How to pass an analytics job test – Part I – SQL.
With the demand in data analytics professionals growing stronger than ever, recruiters find themselves in a peculiar position of having to screen hundreds of potential, seemingly qualified candidates. Some firms turned to the proven selection tool: pre-employment skills assessments; and analytics-related tests are on the rise, especially for the junior level industry positions. No tests are the same, but most are designed with the sole purpose of gauging candidate’s cognitive ability to understand the problem at hand and having the technical know-how to implement a working solution. Two types of analytics tests that my students shared involved using either: Microsoft Excel or SQL language. Most of relational databases can be queried using a dialect of SQL, and as such, knowledge of SQL is as essential for a data analytics professional, as their excellent communications skills. In this post we will go through an example of a SQL job test, while in the next article we would focus on an Excel problem.
Let’s suppose we applied for a retail analytics job, and a potential employer offers the below data snapshot. Before proceeding any further, we need to understand relationship between 3 tables provided: Salespeople find Customers , who place Orders and help the company earn revenue. Salespeople table houses some pertinent salespeople information: their ID, Name, and the Region. Customers table contains similar data points for our customers: Customer ID, Name, City, and State. Finally, Orders table puts it all together and shows us order-related information: from the order number to order date (month and year), linking orders to both: customer ID (allowing us to get any relevant information about the customer placing this order) to salesperson ID, which would help us look up our salesperson information and reward salespeople and regions they represent.
Data tables for a SQL job test.
The above exercise is essentially the first step in applying the STAR (Situation, Task, Action, and Result) framework to our problem solving process. The situational context requires us to understand data. While this specific example is rather straight-forward, we might need to either get more clarification, or document our assumptions at this time. For example, should we assume that there are business rules in place that prevent salespeople from selling to existing customers, signed up by other salespeople? If there is such a rule, but we encounter a data point contradicting it, how should we resolve such data issue? The next step: task starts when our employer asks some questions that we need to answer. Indeed, we would have to compose some SQL queries as our next action , let’s not forget to ensure that our SQL logic is valid in both: using the SQL syntax valid for the type of database we are using, AND accurate in terms of the logic used. Once we retrieve the results of these queries and validate them further, perhaps utilizing a spot-checking technique, we complete our analytical process.
Without further adieu, let’s answer specific questions asked in this job exam:
1. List the names and total revenue earned for all salespeople who have orders with customers in IL.
2. List the names of all salespeople who sold more than two orders.
3. List the names of all salespeople who do not have any orders on record.
4. List each Order’s % of total revenue and % of regional revenue.
5. Create a table with Region, Order_Mth, and Total_Revenue fields. For each month, insert one row into this table containing the Region that had the highest revenue.
1. List the names and total revenue earned for all salespeople who have orders with customers in IL
To solve this problem, we would have to use the Orders table to look up records for all Customers residing in IL and retrieve Salespeople who originated orders for these customers. We already touched upon some practical applications of Microsoft Access SQL before , building upon this knowledge base, we can construct the query below. Let’s be cognizant of some Access SQL nuances when it comes to joining more than two tables in the same query:
SELECT Salespeople.Name, SUM(Orders.Revenue) AS Revenue FROM (Customers INNER JOIN Orders ON Customers.ID = Orders.Cust_ID) INNER JOIN Salespeople ON Orders.Salesperson_ID = Salespeople.ID WHERE Customers.State="IL" GROUP BY Salespeople.Name;
2. List the names of all salespeople who sold more than two orders .
First, we would query the Orders table for get the list of salesperson IDs for all salespeople ringing up more than 2 orders, then we would simply retrieve corresponding salespeople names from the Salespeople table. Since we are using a condition, based on aggregate function (SUM), we would have to use the HAVING clause, instead of the WHERE clause.
SELECT Name FROM Salespeople INNER JOIN Orders ON Salespeople.ID = Orders.Salesperson_ID GROUP BY Name HAVING COUNT(Order_Num) >= 2
3. List the names of all salespeople who do not have any orders on record.
To zoom in onto salespeople who have not been able to secure any orders, we simply need to retrieve the list of salespeople names, who do not have any matches in the Orders table:
SELECT Name FROM Salespeople LEFT JOIN Orders ON Salespeople.ID = Orders.Salesperson_ID WHERE Orders.Order_Num IS NULL;
4. List each Order’s % of total revenue and % of regional revenue.
This is an example of a somewhat more involved query. The first step involves figuring out the level of granularity required in our reporting, first we have to report at the Total level, then at the Region level. The easiest way to accomplish this task is by using performing these two separate calculations and joining the results based on the common denominator: Order ID . To add some finishing touches, let’s format Share of total as percent:
SELECT A.Order_Num, A.Revenue, FORMAT(A.Revenue/ (SELECT Sum(Revenue) FROM Orders), "Percent") AS ShareOfTotal, A.Region, FORMAT(A.Revenue/SUM(B.Revenue), "Percent") AS ShareOfRegion FROM (SELECT Order_Num, Sum(Orders.Revenue) AS Revenue, Region FROM Salespeople INNER JOIN Orders ON Salespeople.ID = Orders.Salesperson_ID GROUP BY Order_Num, Region) AS A, (SELECT SUM(Orders.Revenue) AS Revenue, Region FROM Salespeople INNER JOIN Orders ON Salespeople.ID = Orders.Salesperson_ID GROUP BY Region) AS B WHERE A.Region = B.Region GROUP BY A.Order_Num, A.Revenue, A.Region;
5. Create a table with Region, Order_Mth, and Total_Revenue fields. For each month, insert one row into this table containing the Region that had the highest revenue
Having just finished a relatively complex query, this exercise is not overly difficult after all. There is some simple syntax in Access SQL that allows us to save results of our query into a new table, let’s call it tblMonthlyTopRegion :
SELECT * INTO tblMonthlyTopRegion FROM (SELECT A.Order_Mth, B.Region, A.Revenue FROM (SELECT C.Order_Mth, MAX(C.Revenue) AS Revenue FROM (SELECT Order_Mth, Region, SUM(Orders.Revenue) AS Revenue FROM Orders INNER JOIN Salespeople ON Orders.Salesperson_ID = Salespeople.ID GROUP BY Order_Mth, Region) AS C GROUP BY C.Order_Mth ) AS A, (SELECT Order_Mth, Region, SUM(Orders.Revenue) AS Revenue FROM Orders INNER JOIN Salespeople ON Orders.Salesperson_ID = Salespeople.ID GROUP BY Order_Mth, Region) AS B WHERE A.Order_Mth = B.Order_Mth AND A.Revenue = B.Revenue) D;
We are always looking for more examples of analytics-related pre-employment tests, would you share any such tests that you happen to have encountered?