In practice, it is very rare to have an SQL query involving a single table. We may need to merge multiple tables by rows (records) or columns (fields) to get the desired result. In this article, we’ll discuss the operators/commands in SQL that enable use to merge tables by rows or columns. Multiple tables can be merged by columns in SQL using joins. Joins merge two tables based on the specified columns (generally, the primary key of one table and a foreign key of the other). Below is the generic syntax of SQL joins. In the above syntax, table_1 and table_2 are the two tables with the key column (matching column in both the tables), id. We use the keyword USING only when the key column has the same name in both the tables. Otherwise, we need to explicitly mention the key columns of both the tables as shown below. In the above syntax, t1 is an alias of table_1 and t2 is of table_2. When the names of the key columns are not same in both the tables, we need to match them using the ON keyword as shown above. We’ll now discuss a few important joins in SQL. Inner join merges two tables by columns and returns only the matching records (based on the specified columns) in both the tables. In the below query result, we can see that only the records with common id in both left_table and right_table are returned. Or Left join merges two tables by columns and returns all the records in the left table but only the matching records (based on the specified columns) from the right table. In the below query result, we can see the records with common id in both the tables along with all the records of the left_table. Records in the right_table with no matching id in the left_table have NULL. Or Right join merges two tables by columns and returns all the records in the right table but only the matching records (based on the specified columns) from the left table. In the below query result, we can see the records with common id in both the tables along with all the records of the right_table. Records in the left_table with no matching id in the right_table have NULL. Or Full join can be considered as a combination of left and right joins. Full join merges two tables by columns and returns all the records in both the left and right tables. In the below query result, we can see that all the records of both the tables are returned. Records with no matching id in the other table have NULL. Or Full Join (Image by author)Cross JoinCross join returns the cartesian product of two tables. Cartesian product of two sets A = {1, 2}, B = {3, 4} is A x B = {(1, 3), (1, 4), (2, 3), (2, 4)}. We need not specify a key column in cross joins. Cross Join (Image by author)Semi JoinSemi join is technically not an SQL join but works like a join. Semi join returns the matching records in the left table based on a key column in the right table. Semi join doesn’t include the columns of the right table in the query result. In the below example, we want to return the records from the left_table with matching id in the right_table. In other words, we want the records in the left_table whose id is present in the right_table. Semi Join (Image by author)Anti JoinAnti join is also technically not an SQL join but works like a join. Anti join returns the non-matching records in the left table based on a key column in the right table. Anti join also doesn’t include the columns of the right table in the query result. In the below example, we want to return the records from the left_table whose id doesn’t match with the id of the right_table. In other words, we want the records in the left_table whose id is not present in the right_table. Anti Join (Image by author)Self JoinSelf join enables us to join a table with itself. In the below query, we need to find the records with the same left value. For this, we have joined the table with itself and filtered the records with same left value but different id. Self join (Image by author)Union merges two tables by rows, provided the data types of the columns of one table matches with that of the other. We cannot merge a table having column data types as integer and text with a table having column data types as text and integer. However, we can merge two tables even if the column names of one table doesn’t match with that of the other. Union returns only the unique records of both the tables. Union (Image by author)Union AllSimilar to Union, Union All also merges tables by rows. Unlike Union, Union All retains the duplicate records of both the tables. In the below query result, we have merged the id of left_table and right_table. We can see a few duplicates in the result. Union All (Image by author)IntersectIntersect returns the common records of both the tables. In the below query result, we can see the common ids of left_table and right_table. Intersect (Image by author)EXCEPTExcept returns the records from the first table (left table) which are not present in the second table (right table). In the below query result, we can see the ids of left_table which aren’t present in the right_table. Except (Image by author)We’ll use the dvd_rental database downloaded from here and restore it. Below is the documentation to restore a database in PostgreSQL. 1. Top 5 frequent rentersIn this example, we need to find the top 5 customers who rented the most. For this, we’ll
In this example, we’ll use common table expressions (CTE). With CTEs, we can create temporary table that exist for a particular query. Below is the official Postgres documentation on CTEs. In this example, we need to find out top and bottom 5 customers who generated the most revenue. For this, we’ll 1. Create a CTE named revenue_per_customer by
2. Select top 5 customers by revenue from the above CTE by
3. Select bottom 5 customers by revenue from the above CTE by
4. Merging the above two results using UNION. Image by authorWe can also get the above query result using window functions. Below is the official Postgres documentation on window functions. To find out the top and bottom 5 customers who generated the most revenue using window functions, we’ll 1. Create a CTE named total_amt_rank by
2. Select top 5 customers by revenue by selecting the customers whose total_amount_rank is BETWEEN 1 and 5 from the above CTE. 3. Select bottom 5 customers by revenue from the above CTE by
4. Merging the above two results using UNION. Image by authorIn this example, we need to find the top 5 countries with the highest rentals. For this, we’ll
There are a few addresses and cities with no customers. Using inner join omits such records. In the below query, we’ll look at how the result will include addresses without customers on using left join. There are a few cities and addresses without any customers (these may be store addresses). Using inner joins would have omitted them from the results as there are no matching entries in the other table. For example, a city named London in Canada has no matching city_id in the address table. Using inner join would have omitted London in Canada from the result. Similarly, four addresses in Canada and Australia have no matching address_id in the customer table. Image by authorIn this example, we’ll find the countries with no customers by 1. Creating a subquery to find the countries with at least one customer by
2. Selecting country from country table where country_id is not present in the country_id of the above subquery. Image by authorIn the above example, we saw that Australia has no customers. In this example, we’ll see are there any stores in Australia by
Left join ensures that countries with no cities and cities with no stores are also included in the query result. Image by authorThere is one store in Australia. In fact, there are just two stores in the whole database. We’ll view them using the below query. Image by authorIn this example, we’ll see if there are any languages with no films by
We see a few languages with no films in the database. We’ll make sure that it’s not an error by selecting the films with language_id in (2,3,4,5,6) from the film table. The query result should return no records. In this example, we’ll find the number of rentals per film category in India by joining the required tables as discussed in the earlier examples and
In this example, we‘ll find the films with a single actor by
In this example, we’ll find the number of films of an actor by film category by
In the above example, we found the number of films of an actor by film category. In this example, we’ll find the popular categories of an actor (i.e. the categories in which an actor has the most films) by
This brings this article to an end. We’ve discussed ways of merging tables by rows or columns using SQL along with a few examples using the dvd_rental database. These are the fundamental concepts that are used in almost every query we write in SQL. We may not frequently use a few of them in practice, but knowing them is necessary. |