Table Joins

Understanding Table Joins in SQL

Working with data often means drawing on multiple data sources, usually stored in different tables (in the case of database storage) or data frames (in programming languages or data visualization tools). To put the power of this data to good use, we want to be able to join these tables on a field or fields they have in common (foreign keys), or sometimes on values in a field that differ. The basic principles of table joins, namely INNER, OUTER (FULL, LEFT, and RIGHT), CROSS (or Cartesian), and even UNION-ing tables, are not only universal to most relational databases and flavors of SQL; they also apply to working with data frames. In this post we will explore examples of these table joins in a PostgreSQL database, adding SELF and LEFT/RIGHT exclusive joins for good measure.
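To make the join types above concrete, here is a minimal sketch in Python. It uses the standard-library `sqlite3` module with an in-memory database as a stand-in for PostgreSQL (an assumption, so the example runs without a database server), and the `customers` and `orders` tables and their values are hypothetical. Because RIGHT and FULL OUTER JOIN only arrived in SQLite 3.39.0, the sketch emulates a FULL OUTER join as a LEFT join plus the right-exclusive rows; in PostgreSQL you would simply write `FULL OUTER JOIN`.

```python
import sqlite3

# Hypothetical sample data: in-memory SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, referred_by INTEGER);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
-- order 13 points at customer 4, who does not exist
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0), (13, 4, 99.0);
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return cur.execute(sql).fetchall()

# INNER: only rows with a match on both sides
inner = run("""SELECT c.name, o.id FROM customers c
               JOIN orders o ON o.customer_id = c.id ORDER BY o.id""")
# -> [('Ann', 10), ('Ann', 11), ('Bob', 12)]

# LEFT: every customer; NULL (None) where no order matches
left = run("""SELECT c.name, o.id FROM customers c
              LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.id""")
# -> [('Ann', 10), ('Ann', 11), ('Bob', 12), ('Cy', None)]

# LEFT exclusive: customers with no orders at all
left_excl = run("""SELECT c.name FROM customers c
                   LEFT JOIN orders o ON o.customer_id = c.id
                   WHERE o.id IS NULL""")
# -> [('Cy',)]

# FULL OUTER (emulated): LEFT join rows plus orders with no matching customer
full = run("""SELECT c.name, o.id FROM customers c
              LEFT JOIN orders o ON o.customer_id = c.id
              UNION ALL
              SELECT c.name, o.id FROM orders o
              LEFT JOIN customers c ON o.customer_id = c.id
              WHERE c.id IS NULL""")

# CROSS (Cartesian): every customer paired with every order
cross = run("SELECT COUNT(*) FROM customers CROSS JOIN orders")
# -> [(12,)]

# SELF: join customers to itself to find who referred whom
self_join = run("""SELECT c.name, r.name FROM customers c
                   JOIN customers r ON c.referred_by = r.id ORDER BY c.id""")
# -> [('Bob', 'Ann'), ('Cy', 'Ann')]

# UNION: stack two result sets vertically, dropping duplicates
union = run("""SELECT id FROM customers
               UNION
               SELECT customer_id FROM orders ORDER BY id""")
# -> [(1,), (2,), (3,), (4,)]
```

Comparing the results shows how the row count grows from INNER (3 rows) to LEFT (4) to the emulated FULL OUTER (5), while CROSS pairs every customer with every order (3 × 4 = 12).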

US COVID-19 Cases

During these uncertain times, how can you make sense of the tsunami of data being published on the state of the pandemic in the US? For the last couple of months, many Americans have found themselves checking the spread of COVID-19 cases on a daily basis. As most US states went into shelter-in-place mode, resources like Johns Hopkins and 91-DIVOC became a daily refuge for those seeking to stay informed. In today’s post, we will create our own version of a web-based, interactive, and visually appealing COVID-19 dashboard using Google Data Studio. In doing so, we will gain a better understanding of the underlying data, decide which data we deem most relevant, and keep control over how best to visualize it so our audience can make the most sense of it. Along the way, we will use a range of objects and features of the mighty GDS application, including (but not limited to) the Google Sheets connector, calculated fields, scorecards, tables, geo maps, line and combo charts, date range and filter controls, and the recently released optional metrics.

Google Dataset Search

Google has dominated web search for nearly two decades, and its acquisition of YouTube gave it the second most popular search engine in the world. Yet it seemingly lost the product search niche to Amazon. It’s not surprising that, amid growing interest in all things data, including public and open data, this tech giant would be keen to develop a search product geared toward making dataset search easier. What is surprising is how long it took to develop and release this product, which was officially introduced to the general public on January 23rd, 2020, after more than 16 months in beta testing. You can embark on your own dataset search journey here.

A Beginner’s Guide to BigQuery Sandbox and Exploring Public Datasets

A beginner's guide to BigQuery Sandbox

As you might realize by now, writing SQL queries is one of the essential skills any aspiring data analyst needs to master. After all, larger datasets are typically stored in relational databases, and Structured Query Language is how we communicate with such databases. Sure, NoSQL is gaining prominence amid the growing popularity of nontraditional databases, but we need to learn to crawl before we start walking. Merely 10 years ago, you would need to download and install an RDBMS software package (be it MySQL, PostgreSQL, or SQLite), load a sample database, and do a hundred pushups before you could write your very first SQL query. Luckily, technology has sprung ahead, and we now have a plethora of web-based SQL editors, from SQLite Online to SQL Fiddle, that eliminate the software setup step but might still require us to load sample data. What if you wanted to access real-world big datasets from the comfort of your browser without having to download any software: no hassle, no trial, no credit card required? Well, you’re in luck: what follows is a beginner’s guide to Google BigQuery’s Sandbox. An active Google account is your cost of admission. BONUS: machine learning models powered by nothing but SQL are also included.