Fun with Python: Reading CSV Files
CSV files are among the most commonly used flat-file formats in today’s data-intensive world. Anyone who has worked with large amounts of data knows that CSV files can easily range from a few hundred MB to a few GB in size. Files at the upper end of that range are often too large for standard text editors such as Notepad, WordPad, or Notepad++ to open, so we may have no choice but to process them programmatically.
Because CSV files are so ubiquitous, the Python standard library includes a module called, unsurprisingly, csv. From the module’s documentation:
“The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.”
In this article, I’ll show you how to use this module to read a CSV file line by line.
Let’s begin with a CSV file, say, test_file.csv.
- We’ll start our Python code by importing the csv module:
import csv
- Next, we open the file and pass the file object to csv.reader(), which returns a reader object. Iterating over this object yields each row of the file as a list of strings.
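# newline='' lets the csv module handle line endings itself (recommended in Python 3)
fp = open('test_file.csv', 'r', newline='')
reader = csv.reader(fp)
print(type(reader))  # shows the type of the reader object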
- To avoid loading the entire file into memory, we iterate over the reader object with a for loop; the reader parses one row at a time as the loop asks for it. (Since this is only an example, I’ll stop after the first 10 rows.)
for i, row in enumerate(reader):
    print('Row {rownum} is {data}'.format(rownum=i+1, data=str(row)))
    # Break after printing the first 10 lines
    if i+1 > 9:
        break

if not fp.closed:
    fp.close()
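To make the "list of strings" behaviour concrete, here is a minimal sketch that uses hypothetical in-memory data (via io.StringIO) in place of test_file.csv, so it runs without a file on disk. It also shows why the csv module is worth using over a plain str.split(','): the reader respects quoted fields that contain commas.

```python
import csv
import io

# Hypothetical sample data kept in memory (a stand-in for test_file.csv),
# so the example runs without a file on disk.
sample = io.StringIO(
    'id,name,city\n'
    '1,Alice,"Portland, OR"\n'
    '2,Bob,Austin\n'
)

rows = list(csv.reader(sample))
for row in rows:
    # Every field is a str, even numeric-looking values such as '1'.
    print(row)

# For comparison: a naive split on commas breaks the quoted field apart.
naive = '1,Alice,"Portland, OR"'.split(',')
print(naive)
```

If you need numbers, convert the fields yourself, for example int(row[0]).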
Using the with statement:
The code above can be written more concisely with a with statement, which also closes the file automatically when the block exits, even if an exception is raised:
import csv

with open('test_file.csv', 'r', newline='') as fp:
    reader = csv.reader(fp)
    for i, row in enumerate(reader):
        print('Row {rownum} is {data}'.format(rownum=i+1, data=str(row)))
        # Exit the loop after printing the first 10 lines
        if i+1 > 9:
            break
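A possible alternative to the enumerate-and-break pattern is itertools.islice from the standard library, which stops pulling rows from the reader once it has taken n items. The sketch below is only one way to write it: first_rows is a hypothetical helper name, and the in-memory data stands in for test_file.csv.

```python
import csv
import io
from itertools import islice


def first_rows(fileobj, n=10):
    """Hypothetical helper: return up to the first n rows of a CSV file object.

    islice stops requesting rows from the reader after n items,
    so the remaining rows are never parsed.
    """
    reader = csv.reader(fileobj)
    return list(islice(reader, n))


# Hypothetical in-memory data standing in for test_file.csv (25 rows).
data = io.StringIO(''.join(f'{i},value{i}\n' for i in range(1, 26)))

for rownum, row in enumerate(first_rows(data), start=1):
    print('Row {rownum} is {data}'.format(rownum=rownum, data=row))
```

With a real file, you would call first_rows(fp) inside the same with open('test_file.csv', 'r', newline='') as fp: block.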
Files with different delimiters:
It’s very common to find flat files that use delimiters other than a comma. The most widespread alternatives are the pipe (|) and the tab (\t), and files that use them usually carry the extensions .psv and .tsv, respectively. Python’s csv module handles such files just as easily.
If you have a pipe-separated .psv file, you could write your Python code as follows:
import csv

with open('test_file.psv', 'r', newline='') as fp:
    reader = csv.reader(fp, delimiter='|')
    for i, row in enumerate(reader):
        print('Row {rownum} is {data}'.format(rownum=i+1, data=str(row)))
        # Exit the loop after printing the first 10 lines
        if i+1 > 9:
            break
Note that the pipe delimiter is passed to csv.reader() through its delimiter argument.
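Tab-separated files work the same way. Here is a minimal sketch, assuming hypothetical tab-separated content held in memory rather than a real .tsv file: you can pass delimiter='\t' directly, or use the standard library’s registered 'excel-tab' dialect, which sets the same delimiter.

```python
import csv
import io

# Hypothetical tab-separated content standing in for a file such as test_file.tsv.
tsv_text = 'id\tname\tscore\n1\tAlice\t90\n2\tBob\t85\n'

# Option 1: name the delimiter explicitly.
rows = list(csv.reader(io.StringIO(tsv_text), delimiter='\t'))
for i, row in enumerate(rows):
    print('Row {rownum} is {data}'.format(rownum=i + 1, data=row))

# Option 2: use the built-in 'excel-tab' dialect, which uses '\t' as its delimiter.
rows_dialect = list(csv.reader(io.StringIO(tsv_text), dialect='excel-tab'))
```

Both options produce the same rows for this input; the dialect form is handy when you want to reuse a named set of formatting options.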
Reference:
https://docs.python.org/2/library/csv.html