Working with sample datasets in BigQuery
In the previous post we added public tables to our BigQuery interface. However, Google already provides sample data on various topics by default. While most of these tables are not updated, they still present some interest in terms of learning trends or insights on a multitude of topics. We will focus on 3 of these tables:
Natality (daily US births from 1969 to 2008),
GSOD (daily weather information by a station number from 1929 to 2009),
and Shakespeare (word index of Shakespeare’s works.)
Let’s start our exploration with the Natality dataset. The graph above charts share of teenager births, comparing to grand total by year. Between 1969, nominal number of births by teenagers went up from 307,561 to 441,110. However, this is not necessarily a bad news, considering growing US population. While in 1973, almost every fifth birth (19.55%) was by a teenager mother, by 2005 this ratio dropped to every 10th birth (10.18%.) To pull relevant source data, we simply need to run the following query (which would incidentally retrieve preteen births as well [outliers representing fewer than 200 births a year.]):
Continue reading