How to pass an analytics job test – Part II – MS Excel.


How to pass an analytics job test - Part II - Microsoft Excel

          Even with the rise of use of R, Python, SAS and other more scientific analytical tools, Microsoft Excel remains the most popular data analysis tool. While we have gone over a solution for an analytics job test in SQL last month , you are much more likely to encounter a job test in Microsoft Excel for your next analytics opportunity. While I would personally argue that this particular test is actually better solved with SQL, the employer believes that the applicants instead need to apply their Excel skills to demonstrate their proficiency and acumen. As before, we should start by asking questions about the problem at hand and trying to get as much clarification as needed or state our assumptions. However, since spreadsheets are less forgiving from the presentation point of view than the databases, I would strongly recommend that we would also take a few minutes to format any workbooks provided by the prospective employer. Chances are they would recognize your level of professionalism by looking at clean and presentable file. Your stylistic preferences might be different, but as a minimum I would remove gridlines, add filters/format as tables larger datasets, freeze panes, and add at least one to two colors to the otherwise monochrome layout.

Continue reading

How to pass an analytics job test – Part I – SQL.


How to pass an analytics job test - Part I - SQL

          With the demand in data analytics professionals growing stronger than ever, recruiters find themselves in a peculiar position of having to screen hundreds of potential, seemingly qualified candidates. Some firms turned to the proven selection tool: pre-employment skills assessments; and analytics-related tests are on the rise, especially for the junior level industry positions. No tests are the same, but most are designed with the sole purpose of gauging candidate’s cognitive ability to understand the problem at hand and having the technical know-how to implement a working solution. Two types of analytics tests that my students shared involved using either: Microsoft Excel or SQL language. Most of relational databases can be queried using a dialect of SQL, and as such, knowledge of SQL is as essential for a data analytics professional, as their excellent communications skills. In this post we will go through an example of a SQL job test, while in the next article we would focus on an Excel problem.

Continue reading

Working with sample datasets in BigQuery

Working with sample datasets in BigQuery

          In the previous post we added public tables to our BigQuery interface. However, Google already provides sample data on various topics by default. While most of these tables are not updated, they still present some interest in terms of learning trends or insights on a multitude of topics. We will focus on 3 of these tables:
Natality (daily US births from 1969 to 2008),
GSOD (daily weather information by a station number from 1929 to 2009),
and Shakespeare (word index of Shakespeare’s works.)

          Let’s start our exploration with the Natality dataset. The graph above charts share of teenager births, comparing to grand total by year. Between 1969, nominal number of births by teenagers went up from 307,561 to 441,110. However, this is not necessarily a bad news, considering growing US population. While in 1973, almost every fifth birth (19.55%) was by a teenager mother, by 2005 this ratio dropped to every 10th birth (10.18%.) To pull relevant source data, we simply need to run the following query (which would incidentally retrieve preteen births as well [outliers representing fewer than 200 births a year.]):
Continue reading

Getting started with Google BigQuery and GDELT Project

Getting started with Google BigQuery and GDELT

          Once upon the time, the new kid on the block left more established search engines in the dust, then, after reinventing web-based email service, Google introduced its Apps. Today, let’s talk about one of the myriad services Google offers to us: BigQuery. Basically, this cloud-based service allows us to utilize Google’s hardware to store our own datasets or access public data on the go. Google provides API for Java, PHP, and Python access. In addition, various third-party tools now connect directly to BigQuery: Tableau, R, JasperSoft, and Simba to name a few. We get a 1 TB monthly usage quota to query BigQuery’s data for free. Some of the downsides of this service include: premiums for storing our own data and querying in excess of the free quota. We are also limited with data manipulation tasks we can perform in BigQuery; in fact, we can only append records to our table, we cannot update or delete them. Finally, this service uses a SQL language dialect, which lacks some of the SQL commands we are accustomed to: DISTINCT comes to mind, or resort us to some convoluted workarounds (try using the TOP command.) Meet, the GDELT Project – “the largest, most comprehensive, and highest resolution open database of human society ever created.” In this tutorial, we will learn some interesting facts about different countries, using GDELT data in BigQuery.

Continue reading