Understanding Table Joins in SQL
Working with data often involves the need to utilize multiple data sources, usually stored in different data tables (in case of database storage) or data frames (when it comes to programming languages or data visualization tools.) In order to put power of this data to a good use we want to be able to join these tables based on a field or fields they have in common (foreign key[s]) or sometimes values in the field that are different. Not only basic principles of table joins – INNER, OUTER (FULL, LEFT, and RIGHT), CROSS (or Cartesian) or even UNION-ing tables are universal to most relational databases and flavors of SQL, they also apply to working with data frames. In this post we will explore examples of using these table joins in a PostgreSQL database, while adding SELF, and LEFT/RIGHT exclusive joins for a good measure.
Setting up data tables
To illustrate how table joins work let’s start by setting up two tables for illustrative purposes. tbl_Students contains student names, student ids and their dates of registration. We will work on joining it with the tbl_Student_Registrations, which includes course ids, titles, semester, AND student id fields for students registered for respective classes:
For your convenience, I uploaded this data to SQLFiddle, which allows you to interactively test your SQL commands against our data.
Before proceeding any further, let’s observe that Ava Stuart has not registered for any course, while Ethical Implications of AI class has only one student registration, which does not appear to have a match in the Students table.
INNER JOIN
Any user of SQL is likely familiar with the most popular type of table joins – the INNER JOIN, sometimes referred to as an EQUI JOIN. Should we want to retrieve all records from both tables where there is a match, we can run the below query:
SELECT * FROM tbl_Students A INNER JOIN tbl_StudentRegistrations B ON A.Student_Id = B.Student_Id;
If we want to be more specific, we can list students taking any given course, let’s say with the term “Statistics” in its title, we can do so by utilizing wildcards and specifying a term of interest. Running the below query will retrieve any course with the term “Statistics”, also listing names of students registered for such a course.
SELECT A.First_Name || ' ' || A.Last_Name AS Student_Name, B.Title AS Course_Title FROM tbl_Students A INNER JOIN tbl_StudentRegistrations B ON A.Student_Id = B.Student_Id WHERE B.Title Like '%Statistics%';
FULL OUTER JOIN
This type of join will select all records from both tables even when there is no match available. Aforementioned Ava Stuart is listed in the results, even though she has not registered for any courses. Similarly, the Ethical Implications of AI class is also listed, albeit there appear to be no students registered for it:
SELECT * FROM tbl_Students A FULL OUTER JOIN tbl_StudentRegistrations B ON A.Student_Id = B.Student_Id;
Alternatively, should we want to ONLY show records from both tables when no match was found we could use the IS NULL clause. Now it is much easier to locate records that don’t have a match in one of the two tables referenced:
SELECT * FROM tbl_Students A FULL OUTER JOIN tbl_StudentRegistrations B ON A.Student_Id = B.Student_Id WHERE A.Student_Id IS NULL OR B.Student_ID IS NULL;
LEFT (Outer) JOIN – Inclusive
LEFT JOIN will help us see all records from the Students table, plus corresponding records from the Student Registrations table, even when no match was found:
SELECT * FROM tbl_Students A LEFT JOIN tbl_StudentRegistrations B ON A.Student_Id = B.Student_Id;
LEFT (Outer) JOIN – Exclusive
Should we want to look at the records from the Students table where there was no match in the Students Registrations table and quickly identify Ava, we would make the below revision to our code:
SELECT * FROM tbl_Students A LEFT JOIN tbl_StudentRegistrations B ON A.Student_Id = B.Student_Id WHERE B.Student_Id IS NULL;
NOTE that the RIGHT JOIN is the exact opposite of the LEFT JOIN in terms of the order of tables listed in the query, running the code below will yield the same results:
SELECT * FROM tbl_StudentRegistrations A RIGHT JOIN tbl_Students B ON A.Student_Id = B.Student_Id WHERE A.Student_Id IS NULL;
SELF JOIN
There are instances when we need to join a table to itself. For example, to identify students (Daniela and Ava) who were added to our system on the same date, we would run this query:
SELECT A.First_Name || ' ' || A.Last_Name AS Student_Name, B.Registration_Date FROM tbl_Students A, tbl_Students B WHERE A.Student_Id <> B.Student_Id AND A.Registration_Date = B.Registration_Date ORDER BY Student_Name;
UNION
UNION command combines records from both tables and we can utilized it to create a unique and ordered list of Students and their courses:
SELECT Student_ID, First_Name AS COURSE_NAME_OR_STUDENT_NAME FROM tbl_Students A UNION SELECT Student_Id, Title FROM tbl_StudentRegistrations B Order BY Student_ID;
CROSS JOIN (aka Cartisean)
Selecting all theoretically possible combinations of combined records from both tables can be done with the help of CROSS JOIN. This type of join will create such combinations even if they don’t exist in our records. As such, it can be extremely helpful in circumstances when we want to create an extensive deduped list of records that we would join to other tables or populate over time:
SELECT * FROM tbl_Students A, tbl_StudentRegistrations B;
Which types of table joins have you encountered out in the wild? Which ones are your favorite or go-to ones?