Working with sample datasets in BigQuery

Working with sample datasets in BigQuery

          In the previous post we added public tables to our BigQuery interface. However, Google already provides sample data on various topics by default. While most of these tables are not updated, they still present some interest in terms of learning trends or insights on a multitude of topics. We will focus on 3 of these tables:
Natality (daily US births from 1969 to 2008),
GSOD (daily weather information by a station number from 1929 to 2009),
and Shakespeare (word index of Shakespeare’s works.)

          Let’s start our exploration with the Natality dataset. The graph above charts share of teenager births, comparing to grand total by year. Between 1969, nominal number of births by teenagers went up from 307,561 to 441,110. However, this is not necessarily a bad news, considering growing US population. While in 1973, almost every fifth birth (19.55%) was by a teenager mother, by 2005 this ratio dropped to every 10th birth (10.18%.) To pull relevant source data, we simply need to run the following query (which would incidentally retrieve preteen births as well [outliers representing fewer than 200 births a year.]):
Continue reading

Calculate your age, WITHOUT using Excel’s DateDif function.


Calculate Your Age in Excel, without DATEDIF function

          My previous post on using Excel’s DATEDIF function resulted in rather productive discussions on LinkedIn. First of all, some users are concerned that the DATEDIF function will no longer be supported in Excel and will cease to exist at some point in the future. Microsoft openly states that “this function is provided for compatibility with Lotus 1-2-3.” , and given the history of this support, I don’t see why it will be discontinued. Nonetheless, I will make an attempt to replace DATEDIF function with other functions readily available in the program. Secondly, as a reminder that English is not the only language in the world, Microsoft created “local” versions of this function. A user confirmed existence of the SIFECHA function, but the syntax to use it, involves using semicolons, instead of commas. If you are using Spanish version of Excel, you might want to try both versions:

SIFECHA(fecha_inicial,fecha_final,unidad)
SIFECHA(fecha_inicial;fecha_final;unidad)

          According to this site , remaining versions of this function include:
Continue reading

Calculate your age, using Excel’s DateDif function.


Excel's DateDif function - Calculate Your Age

          During this time of the year, most of us find ourselves performing some sorts of date manipulations in Excel: running year-end reports, creating a project plan for the next year, or simply setting up a new calendar. Let’s use one of Excel’s “hidden” functions to calculate one’s age in years, months, and days.

          We remember that Excel stores dates as serial numbers, in consecutive order. According to Microsoft, Day 1 fell on January 1st, 1900. Therefore, today’s date is equivalent to number 42,020 (meaning, 42,020 days passed since January 1st, 1900.) This makes it fairly easily to calculate difference between two dates [in days] by subtracting one date from another. While Excel does not support dates prior to January 1st, 1900 out of the box, if you need to work with such dates, John Walkenbach offers an Excel add in: XDate to help with this task.
Continue reading

Using Nested If Statements.


Nested IF Function

          A reader asked me to explain logic behind the formula used in my Holiday Email post. That solution involved using multiple IF functions as arguments within another IF function, technique also known as nested IF logic. I find that the best way to use nested if functionality is through using decision trees AND Excel’s VBA syntax. Charting a decision tree will help us better understand the task at hand, and putting various conditions and the ways to handle them in writing, will prepare us to enter necessary formula in Excel.

          Let’s suppose that we need to implement the following logic with 4 possible scenarios. If the value of cell 1 is string “Case 1” AND cell 2 is string “Case 2”, we need the output to be 4. If cell 1 equals “Case 1”, but cell 2 does NOT equal “Case 2”, we need the output to be 3. If cell 1 does NOT equal “Case 1” , but cell 2 equals “Case 2”, the output should be 2. Finally, if cell 1 does not equal “Case 1”, NOR cell 2 equals “Case 2”, the output is 1. Hopefully, using the decision tree below makes it easier to conceptualize these scenarios.
Continue reading