Matching data from two different files with one or more common keys can be done without loading files into the database by
using the sort and join command from the Unix. Join is the command which joins data from different files but before you join the two files
it needs to be in sorted order, that’s where we going to use sort.
Here is the example with two files having data delimited by the pipe
Data from employee.dat with key employee_id, Name, and department_id delimited by the pipe
Employee.dat
———————
4|John | 10
1|Monica|4
12|Louis|5
20|Peter|2
21|David|3
13|Barbara|6
Department file with deparment_id and its name delimited by the pipe.
Dept.dat
————
1|HR
2| Manufacturing
3| Engineering
4 |Marketing
5| Sales
6| Information technology
7| Security
Our aim is to put the department name in front of each person. As deparment_id is common in both files, we going to join by department_id. As mentioned above, the field we going to use for joining should be in sorted order. Department file it is already in sorted but in Employee.dat is not, so we going to sort it first.
Here is command for sorting Employee.dat file
sort -t “|” -k3 Employee.dat > Employee_sort.dat
Here -t “|” indicates that the fields are delimited by the pipe
-k3 for sorting file by the third field which is deparment_id
> Employee_sort.dat for sending output to Employee_sort.dat
Content of Employee_sort.dat file
20|Peter|2
21|David|3
1|Monica|4
12|Louis|5
13|Barbara|6
4|John | 7
Now we have both files are in sorted order by deparment_id, we can use the join command to put employee and department name
join -t “|” -1 3 -2 1 Employee_sort.dat Dept.dat
-t “| ” indicated files are delimited by the pipe
-1 3 for the third column of file 1 i.e deparment_id from Employee_sort.dat file
-2 1 for the first column of file 2 i.e. deparment_id from Dept.dat file
By using the above command, we get the following output.
2|20|Peter| Manufacturing
3|21|David| Engineering
4|1|Monica| Marketing
5|12|Louis| Sales
6|13|Barbara| Information technology
If you want to get everything from file 2 and corresponding entries in file 1
You can also use -a and -v options
try following commands
join -t “|” -1 3 -2 1 -v2 Employee_sort.dat Dept.dat
join -t “|” -1 3 -2 1 -a2 Employee_sort.dat Dept.dat