Joining CSV data by columns
Joining data in different tables with a common column is a frequent operation in SQL. This operation combines columns from several tables to create a single table. Depending on how you want the component tables combined, there are several types of commonly used joins: INNER, LEFT_OUTER, RIGHT_OUTER and FULL_OUTER.
An inner join uses a column (called the join column) to join two (or more) tables where the column has the same value in each row of each table.
The example below shows two tables employees and departments which can be joined with an inner join using the value of the DepartmentID column (the join column). The result is a table of employees and their respective departments.
The result is a table of employees and their respective departments as shown below. The columns LastName and DepartmentID are from the left table (employees), while the columns r_DepartmentID and r_DepartmentName are from the right table (which is departments).
- DepartmentID has the same value since it is the join column.
- Rows with no value (NULL) for DepartmentID are dropped in the resulting table.
Using ArgonStudio, you can do an inner join of two CSV tables right in your browser without having to import the data into an SQL database first. For the sake of this example, we will use the employees and the departments table shown above.
- Import the first CSV into the grid editor. We will call this the left table.
- Click the menu-bars icon and select Join CSV from the menu.
- In the Join CSV dialog, click the folder-open button (1) which brings up the file selection box. Navigate to the second CSV location and select it. We will call this the right table. The columns from this table will appear under the Target Column dropdown.
- Select the join column DepartmentID in the Target Column dropdown (2). Also ensure that Join Type is set to Inner Join (3).
- Click the dropdown under Source Column and choose DepartmentID as the join column (1). Finally, click Join Columns to perform the join (2).
- We have now joined two CSV files on a common column using an inner join.
Left outer join
A left outer join includes all rows from the left table and rows matching the join column from the right table.
In our example, it includes all rows from the employees table and nulls for right-table columns for non-matching rows.
Left outer join in ArgonStudio with CSV data
Left outer join in ArgonStudio is essentially similar to inner join with the difference that you choose Left Join under Join Type.
- Choose Left Join under Join Type:
Right outer join
A right outer join includes all rows from the right table and rows matching the join column from the left table.
As we can see above, all rows from the departments table are included with nulls for left-table columns for non-matching rows.
A full join is a combination which includes all rows from both the left and the right tables matched using the value of the join column, and also includes non-matching rows with nulls for columns from the other table
In the above picture, we can see that all rows from both the employees table as well as the departments table are included with nulls for missing columns.
Joining multiple tables of CSV data is a very useful operation to combine data from different sources. While you could always import the CSV into an SQL database to perform a join, the process is tedious. ArgonStudio instead offers you the ability to perform a join in your browser using just the CSV files. We covered the process of how to do that in this article. It basically involves the following:
- Import the first CSV data into the grid editor.
- Invoke the Join CSV dialog to perform a join.
- Import the second CSV data into the dialog and choose the join column in both the tables.
- Pick the type of join to perform: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN and FULL JOIN.
- Complete the join and you have easily joined CSV data from different sources.