Table Joins

Understanding Table Joins in SQL

Working with data often involves the need to utilize multiple data sources, usually stored in different data tables (in case of database storage) or data frames (when it comes to programming languages or data visualization tools.)  In order to put power of this data to a good use we want to be able to join these tables based on a field or fields they have in common (foreign key[s]) or sometimes values in the field that are different. Not only basic principles of table joins – INNER, OUTER (FULL, LEFT, and RIGHT), CROSS (or Cartesian) or even UNION-ing tables are universal to most relational databases and flavors of SQL, they also apply to working with data frames. In this post we will explore examples of using these table joins in a PostgreSQL database, while adding SELF, and LEFT/RIGHT exclusive joins for a good measure.

Continue reading

 

A beginner’s Guide to BigQuery Sandbox and exploring public datasets.

A beginner's guide to BigQuery Sandbox

          As you might realize by now, writing SQL queries is one of the essential skills any inspiring data analyst needs to master. After all, larger datasets are typically stored in relational databases and Structured Query Language is the language that helps us communicate with such databases. Sure, NoSQL is gaining prominence amid the growing popularity of nontraditional databases, but we need to learn to crawl before we start walking. Merely 10 years ago, you would need to download and install a RDBMS software package (be it MySQL, PostgreSQL, or SQLite), load a sample database and do a hundred pushups before you could write your very first SQL query. Luckily technology sprung ahead and we now have a plethora of web-based SQL editor options from SQL Lite Online to SQL Fiddle that eliminate the software setup step, but might still require us to load sample data. What if you wanted to access real-world big data sets from the comfort of your browser without having to download any software, no hassle, no trial, no credit card required? Well, you’re in luck, what follows is the beginner’s guide to Google BigQuery’s Sandbox. An active Google account is your cost of admission. BONUS: Machine Learning models are powered by nothing else but SQL are also included.
Continue reading

Working with Google Sheets’ QUERY Function


Using QUERY Function in Google Sheets

          So far we’ve covered some ground when it comes to using spreadsheet functions within MS Excel ecosystem. Perhaps it’s time to turn to one of its leading competitors: Google Sheets ? When GoogleSheets was launched noone, but perhaps its creators would seriously consider it to be a real threat the mighty, ubiquitous Excel spreadsheet package. Over the years this product went through a plethora of incremental, as well as break-through enhancements. By now GoogleSheets achieved an admirable user base backing it’s strong functionality matching and in some instances surpassing the gold standard continuously maintained by the much beloved Excel program.

          One of most versatile Sheets’ functions is the QUERY function. Its SQL-like syntax allows us to retrieve specific information from our worksheet, enabling various filtering, sorting, lookup, and aggregation capabilities. While this function is more powerful than INDEX/MATCH, VLOOKUP, SUMPRODUCT, SUMIFS and others, its syntax is rather simple: QUERY(Data, Query, [Headers]) .

Data – range of cells we want to query from

Query – actual query code, to be enclosed in quotation marks or be a reference to a cell containing query text.

Headers – OPTIONAL, number of header rows at the top of data, when omitted value is set to -1.

          If you can comfortably compose basic SQL queries, you will find this function rather intuitive and easy to use. However, you might also find yourself wanting to throw your keyboard across the room when some of the SQL functionality we take for granted is either not found or not implemented properly in this function. Working with aliases has definitely topped my personal list of annoyances.
Continue reading

How to pass an analytics job test – Part I – SQL.


How to pass an analytics job test - Part I - SQL

          With the demand in data analytics professionals growing stronger than ever, recruiters find themselves in a peculiar position of having to screen hundreds of potential, seemingly qualified candidates. Some firms turned to the proven selection tool: pre-employment skills assessments; and analytics-related tests are on the rise, especially for the junior level industry positions. No tests are the same, but most are designed with the sole purpose of gauging candidate’s cognitive ability to understand the problem at hand and having the technical know-how to implement a working solution. Two types of analytics tests that my students shared involved using either: Microsoft Excel or SQL language. Most of relational databases can be queried using a dialect of SQL, and as such, knowledge of SQL is as essential for a data analytics professional, as their excellent communications skills. In this post we will go through an example of a SQL job test, while in the next article we would focus on an Excel problem.

Continue reading