Solving ModelOff Data Analysis problem using Microsoft Access SQL.
Last week we solved ModelOff’s Data Analysis problem from their 2013 championship. Since the second round of 2014 Model Off competition takes place this Saturday, November 8th, let’s pay respect to the data superheroes making it thus far. Our previous ModelOff solution involved using PivotTable feature of Microsoft Excel. Would you believe that we can realistically conceive a solution to the Data analysis problem, using Microsoft Access, or even better, Microsoft’s flavor of the SQL language?
I am as big of an Excel fan as the next guy, much bigger, actually, on the second thought. However, I also believe that when possible, using the right tools for the job will yield better, faster results, than duct-taping your workarounds. So, why Access? Why on Earth, SQL? Well, let’s go to the source: ” If you often have to view your data in a variety of ways, depending on changing conditions or events, Access might be the better choice for storing and working with your data. Access lets you use Structured Query Language (SQL) queries to quickly retrieve just the rows and columns of data that you want, whether the data is contained in one table or many tables. You can also use expressions in queries to create calculated fields.”
Looking at the questions asked, it seems like this problem begs us to utilize Access queries, especially while using SQL language. Another case that i would make for the database approach is the fact that database is the place where data lives, and instead of relying on some grumpy IT folks to retrieve it for us and “dump” it in Excel, we would be better off handling this treasure of terabytes ourselves, with extra care that it needs. As you will see, we don’t even need any help from Excel with our axillary calculations. Finally, could all of the employers seeking Data Analyst s and requiring SQL skills be wrong? Of course they can, but their wrongness should not stand between you and the dream job that you deserve…
To get started, we need to move our data from the Excel worksheet into the Access database. This scenario is not as common as the other way around, but given the problem at hand, let’s proceed:
First things first, we need to open Access and create a new database file. I’ll name it ModelOff_2013
Then, we need to navigate to the External Data ribbon and select Excel function to import our Excel file.
Next step is to select our Excel file, and import its data into the database.
We can trust Access to properly rename our fields.
In this case we know that first row contains column headings
Let’s not index our Date field.
Our table is small enough to not require a primary key, but creating one wouldn’t hurt either.
Let’s call our table tblModelOff2013
If necessary, we can finish this export process by deleting any blank records that Access might have created. We are almost ready to solve this ModelOff challenge.
Going to the Create ribbon and selecting the Query Design option will get us started with a new query.
Let’s choose our data table.
Click on View menu, select the SQL View option
and by now we are are all set create a MoldelOff solution, using Access SQL.
Let’s do this!
This is a rather straight-forward question. All we need to do is to select total quantity (sum) of units sold, where our conditions are: Item Code is 10, and Manager On Duty is “John Jones.” Note that in Microsoft SQL (MS SQL), we need to use double quotes for text/string field values, while other databases typically use single quotes for the same reason. Remembering that SQL syntax calls for enclosure of multi-word field names in brackets, our SQL code is the following:
SELECT SUM (Quantity) AS Units
FROM tblModelOff2013
WHERE [Item Code]=10 AND [Manager On Duty]="John Jones";
We can run this code in Access by clicking on the Run button (exclamation point) and the output should be 554, making our answer B – 600 is the closest number to 554 out of all amounts listed in the answers provided for this problem.
This question is easily resolved using the TOP SQL command (very similar to Excel’s PivotTable feature of the same name.) In addition, we ought to remember that SQL dictates using its Group By clause whenever retrieving aggregated fields (any calculations, such as SUM or Average ) in conjunction with non-aggregated fields (fields from our table.) In our example, we will be grouping by Item Code , while aggregating Quantity field. Finally, we will have to sort in descending order, evoking SQL’s Order By clause:
SELECT TOP 3 [Item Code]
FROM tblModelOff2013
GROUP BY [Item Code]
ORDER BY SUM(Quantity) DESC;
After executing this query, we realize that our answer is A
Drawing on our new SQL knowledge from the last two questions, we can construct the statement below:
SELECT TOP 1 [Sales Person], [Item Code]
FROM tblModelOff2013
GROUP BY [Sales Person], [Item Code]
ORDER BY SUM(Quantity) DESC;
Correct answer is A
This simple question is solved using calculated Sales field: product of Sales Price Per Unit and Quantity
SELECT SUM ([Sales Price Per Unit]*[Quantity]) AS Sales
FROM tblModelOff2013
WHERE [Sales Person] Like "*Wendel*";
C is the correct answer.
Finally, this is a tricky question! Notice that our raw data presented at the product line level, where any invoice can have one or more lines, based on the number of items present on invoice. As a result, we cannot simply calculate number of records by a sales person, we need to calculate unique occurrences of invoice numbers. This is something that even Excel PivotTables can’t help us to solve. Yest, SQL has this nifty command called Distinct that comes to our rescue. While Access SQL, lets us to select distinct records, SQL snobs often look down on Access SQL due to its inability to perform a “given” command in other databases: COUNT(DISTINCT). We are not afraid to overcome any obstacles present in our way, and using a subquery method, we can in fact perform calculations needed. Finally, note using a wildcard symbol *, within the LIKE command to retrieve results for any salesperson, whose names starts with Sally
SELECT Count(*) AS InvoiceCount
FROM
(SELECT DISTINCT [Invoice Number]
FROM tblModelOff2013
WHERE [Sales Person] Like "Sally*") ;
D is our final answer.
This question tests our knowledge of SQL’s date functions. We need to format our Date field as a month format to answer the question asked. In fact, using Format(Date,”mmmm”) code should be of assistance here.
SELECT TOP 1 [Postal Code]
FROM tblModelOff2013
WHERE Format(Date,"mmmm") = "May" AND [Item Code] = 5
GROUP BY [Postal Code], Format(Date,"mmmm"), [Item Code]
ORDER BY SUM(Quantity) DESC;
Again, we find that C is the correct answer.
This is another involved question, where we can use another way to retrieve month from a date field. We will use a combination of Month and Monthname commands. You might be familiar with the Month through using Excel functions – it works exactly the same: retrieving number corresponding to the month of the year, January being 1. MonthName will retrieve exactly what this function name suggests. In addition, using the Having is essentially the Where equivalent for aggregated criteria.
SELECT COUNT(*) AS PostalCodes
FROM (SELECT [Postal Code]
FROM tblModelOff2013
GROUP BY MONTHNAME(MONTH([Date])), [Postal Code]
HAVING MONTHNAME(MONTH([Date]))="February" AND SUM(Quantity)>400
ORDER BY Sum(Quantity) DESC) ;
Consistently with the last couple of questions, C is the correct answer.
This question tests our knowledge of SQL’s date functions. We need to format our Date field as a month format to answer the question asked. In fact, using Format(Date,”mmmm”) code should be of assistance here.
SELECT TOP 3 [Item Code]
FROM tblModelOff2013
WHERE [Postal Code]=3020
GROUP BY [Postal Code], [Item Code]
ORDER BY SUM([Cost per Unit]*[Quantity]) DESC;
A is the answer that we are looking for.
No comments needed. By now we are ready for anything that ModelOff throws our way.
SELECT [Sales Person]
FROM
(SELECT DISTINCT [Invoice Number], [Sales Person]
FROM tblModelOff2013 WHERE Format(Date,"mmmm") = "May") AS A
GROUP BY [Sales Person]
ORDER BY Count(*) DESC;
We’ll choose C as our answer.
Building off our strong SQL knowledge, we have:
SELECT TOP 1 [Invoice Number]
FROM tblModelOff2013
WHERE [Sales Person] Like "Wendel*"
GROUP BY [Invoice Number]
ORDER BY Sum([Sales Price Per Unit]*[Quantity]) DESC;
Correct answer is: A.
After figuring our how to calculate Discount given, we have
SELECT [Sales Person],
FROM tblModelOff2013
GROUP BY [Sales Person]
ORDER BY SUM([Discount Given]*[Ticket Price Price Per Unit]*[Quantity]) DESC;
Correct answer is: C.
Using SQL magic,
SELECT TOP 1 MonthName(Month([Date])) AS [Month]
FROM tblModelOff2013
GROUP BY MonthName(Month([Date]))
ORDER BY SUM([Sales Price Per Unit]*[Quantity]) DESC;
Correct answer is: A.
Using SQL magic,
SELECT TOP 1 [Item Code]
FROM tblModelOff2013
WHERE [Postal Code] In (3013,3017,3031)
GROUP BY [Item Code], MonthName(Month([Date]))
HAVING MonthName(Month([Date]))="February"
ORDER BY Sum(([Sales Price Per Unit]-[Cost Per Unit])*[Quantity]) DESC;
Correct answer is: C.
Using SQL magic,
SELECT MonthName(Month([Date])) AS [Month]
FROM tblModelOff2013
GROUP BY MonthName(Month([Date]))
ORDER BY Sum(([Sales Price Per Unit]-[Cost Per Unit])*[Quantity]) DESC;
Correct answer is: A.
We can easily set up multiple criteria for this query
SELECT TOP 3 MonthName(Month([Date])) AS [Month]
FROM tblModelOff2013
WHERE [Postal Code] Not In (3019,3028)
AND [Item Code] Not In (4,5,6,17,18)
AND [Manager On Duty]="John Jones"
GROUP BY MonthName(Month([Date])), [Manager On Duty]
ORDER BY SUM(([Sales Price Per Unit]-[Cost Per Unit])*[Quantity]) DESC;
Correct answer is: B.
Using SQL magic,
SELECT SUM(Quantity) AS Units
FROM tblModelOff2013
WHERE [Sales Person] Like "Benny*" Or [Sales Person] Like "Kelly*"
GROUP BY [Item Code], MonthName(Month([Date]))
HAVING [Item Code]=3 AND MonthName(Month([Date]))="June";
Correct answer is: B.
Should you have any further questions, you can download my Access file for further review.
Slide share version of this post.