Sunday, 20 October 2013

Combine two CSV files into one using the Command Line (batch file)

Introduction
Batch files are still useful! Often we need to perform some routine tasks over and over. There are macros, scripting languages (AutoIt) and other solutions - but what about using batch files? In this article I will explain how recently I needed to combine a couple of CSV files regulary.

IMPORTANT: This article is only a guide, the steps in it are specific to my requirements but I hope the principals maybe of use to you!


Scenario
The BI (Business Intelligence) software we use at work output files as CSV (Comma Separated Values). This is great for opening in Excel and using pivot tables with. However, the BI software outputs only one year of budget figures at once. If you want to compare figures from year to year you must run a report for each year you need. You can then add these files together to make one big Excel worksheet. This procedure is not difficult, you can copy/paste the data from one CSV file to the other. However, wouldn't it be great if you could just run a batch file for this to be done for you, automatically?


Think and Plan
Before I did anything I had a good think about what I wanted to do and how I was going to do it. You need to understand each step in the process and think of a way to automate that step in the batch file.

I opened the CSV file in Notepad and took a look at it. The first three lines were title text telling me what the report is - this can be removed as it's not needed. Line four is the header, the column titles. This is important so I must keep it. This means I will need to find a way to delete the first three lines of text from a text file.

Imagine the second CSV file, this also has the first three lines of text to be deleted. But also line four, the header is not needed in the second CSV file because it will be added to the first CSV. This means I also need to delete not three but four lines from the second CSV file.

The last thing to do will be to add the two files together and to check everything is fine.

I decided to rename the CSV files to a.csv and b.csv. That way the batch file will just refer to those file names and in the future I will not have to keep editing the batch file to change the names again.


Delete lines from a text file
The MORE command outputs the contents of a text file. It has a switch /E that means it outputs everything but the first x number of lines. Using this we can create a new CSV file without the first three lines.

MORE /E +3 a.csv > tempa.csv
MORE /E +4 b.csv > tempb.csv

The above code worked for me. The first line removes the first three lines (rows) from the a.csv file. It creates a new file called tempa.csv

The second MORE command removes the first four lines from the b.csv file. A file called tempb.csv is created.


Combine two files
COPY tempa.csv + tempb.csv tempCombined.csv

The above command will copy the two files tempa.csv and tempb.csv into one new file tempCombined.csv.


Delete a blank line
There is a little problem with the tempCombined.csv file - a blank line appears in the middle between the sets of data. If you used the tempCombined.csv file in Excel you'd have a blank line which would show up when using pivot tables. We need to delete the blank line...

TYPE tempCombined.csv | FINDSTR /v "^$" > Combined.csv

The TYPE command outputs the file and the "|" pipes the output into the FINDSTR command. This looks for the string "^$", it means blank line. Output the result to Combined.csv, done!


The Finished Batch File
@echo off
echo.
echo Combine two CSV report files into one
echo.

echo Remove titles rows...
MORE /E +3 a.csv > tempa.csv
MORE /E +4 b.csv > tempb.csv
 
echo.
echo Add the CSV files together
COPY tempa.csv + tempb.csv tempCombined.csv
 
echo.
echo Delete blank rows
TYPE tempCombined.csv | FINDSTR /v "^$" > Combined.csv
 
echo.
echo Delete any previous temporary working files
del temp*.csv
 
echo.
echo Finished!

Copy and paste the above into Notepad and save as Combine2CSV.cmd.


Conclusion
I save my reports as a.csv and b.csv, double click the Combine2CSV.cmd file and it makes Combined.csv! Open that in Excel and it's ready to insert pivot tables. This saves me some time and hassle. I hope it will help you too or at least parts of this article may be of help in what you want to do.
Post a Comment