This article will show you how to read CSV (Comma Separated Values) files in the Python programming language, with examples.
What is CSV (Comma Separated Values)
CSV (Comma Separated Values) is a format for storing and transmitting data, usually in a text file, in which the individual values are separated by a comma (,). Each row in a CSV file represents an individual record containing multiple comma separated values.
A CSV file looks like this:
name,age,favourite colour
Fred,54,green
Mary,31,pink
Steve,12,orange
Above, the CSV describes three people, with information about their name, age, and favourite colour. Each row holds the record for one person, with the values separated by commas. The first row in a CSV file usually contains the header, which describes the purpose of the values.
The order of the values in each record is important: the position of each value determines what it means, because it corresponds to the header in the same position.
Basically, think of it like a spreadsheet, where the columns are separated by commas, and the rows are separated by newlines.
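To make the idea of position-based matching concrete, here is a minimal sketch that pairs a header row with one record. The variable names (header, record, person) are only illustrative, and the values are taken from the sample data above.

```python
# Illustrative names only; the values come from the sample CSV above.
header = ["name", "age", "favourite colour"]
record = ["Fred", "54", "green"]

# zip() pairs the items that share the same position in both lists,
# which is exactly how a CSV value is matched to its header.
person = dict(zip(header, record))

print(person["favourite colour"])  # green
```

If the values in the record were in a different order, they would be paired with the wrong headers, which is why the order matters.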
Parsing CSV Data Manually
As CSV data is just text, it is possible to parse it manually by splitting the string variable containing the CSV data into a list of records, and then splitting each record into a list of values:
myCSV = "Sandra,26,red\nTim,19,yellow"
myRows = myCSV.split("\n")
myData = []
for row in myRows:
    myData.append(row.split(","))
Above, some CSV data is defined as a string using commas and newlines (\n); it could equally be read from a file. The string is split into rows, which are stored in the variable myRows. A for statement then iterates over myRows, splitting each row at the commas and appending the resulting list to myData. When the loop finishes, myData holds the parsed CSV information as a list of lists, with one inner list per record.
Reading CSV Files Using the csv Python Library
The manual approach is a bit tedious, and would quickly become hard to manage when dealing with lots of data from multiple files. It also breaks as soon as a value itself contains a comma. Python includes the csv library specifically for generating, saving, and reading CSV data and files.
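As a rough illustration of where manual splitting falls short, the sketch below parses a made-up line whose first value contains a comma and is therefore wrapped in quotes. The sample line is an assumption for demonstration, and io.StringIO simply stands in for a file.

```python
import csv
import io

# Made-up sample: the first value contains a comma, so it is quoted.
line = '"Smith, Jane",42,blue'

# Naive splitting cuts the quoted name into two pieces.
naive = line.split(",")
print(naive)   # ['"Smith', ' Jane"', '42', 'blue']

# csv.reader understands the quotes and keeps the value intact.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['Smith, Jane', '42', 'blue']
```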
The code examples below read from an example CSV file called people.csv:
import csv
with open('people.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
The above code opens the people.csv file in read ('r') mode and passes the file object to csv.reader(). The newline='' argument is what the csv documentation recommends when opening a file for this library. The reader is an iterator that yields each row as a list of strings, which the for loop prints, resulting in the following output:
['name', 'age', 'favourite colour']
['Fred', '54', 'green']
['Mary', '31', 'pink']
['Steve', '12', 'orange']
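Note that every value in the output is a string, and that the header comes back as an ordinary row. The following sketch shows one possible way to separate the header and convert the ages to integers. It uses io.StringIO with the sample data as a stand-in for people.csv, and the ages dictionary is a hypothetical structure chosen for the example.

```python
import csv
import io

# In-memory stand-in for people.csv, using the sample data from earlier.
data = "name,age,favourite colour\nFred,54,green\nMary,31,pink\nSteve,12,orange\n"

reader = csv.reader(io.StringIO(data))
header = next(reader)  # consume the first row so it is not treated as a record

ages = {}
for name, age, colour in reader:
    ages[name] = int(age)  # the reader returns strings, so convert explicitly

print(header)  # ['name', 'age', 'favourite colour']
print(ages)    # {'Fred': 54, 'Mary': 31, 'Steve': 12}
```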
Delimiters, Initial Spaces, and Quotes
Some CSV files will use a different delimiter, so rather than a comma, they may use a pipe (|), tab (\t), semicolon (;), or other character to separate values. If something other than a comma is used as a delimiter, the delimiter option needs to be provided so that the csv library knows where to split the values. Below, a semicolon is used as the delimiter:
import csv
with open('people.csv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter=';')
    for row in reader:
        print(row)
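To see what the delimiter option changes, this small sketch reads the same semicolon-separated text twice, once with the default settings and once with delimiter=';'. The sample text is made up for the comparison, and io.StringIO stands in for a file.

```python
import csv
import io

# Made-up semicolon-separated sample.
data = "name;age\nFred;54\n"

# Without the option, the reader looks for commas and finds none,
# so each line comes back as a single value.
default_rows = list(csv.reader(io.StringIO(data)))
print(default_rows)    # [['name;age'], ['Fred;54']]

# With delimiter=';' the values are split as intended.
semicolon_rows = list(csv.reader(io.StringIO(data), delimiter=";"))
print(semicolon_rows)  # [['name', 'age'], ['Fred', '54']]
```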
Similarly, some CSV files place a space after the delimiter, and some don't. If there are spaces after the delimiter that you don't want included in the data read from the CSV file, supply the skipinitialspace option:
import csv
with open('people.csv', 'r', newline='') as file:
    reader = csv.reader(file, skipinitialspace=True)
    for row in reader:
        print(row)
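The effect of skipinitialspace is easiest to see side by side. The sketch below uses a made-up sample with a space after each comma, read from an in-memory string rather than a file.

```python
import csv
import io

# Made-up sample with a space after every comma.
data = "name, age\nFred, 54\n"

# By default the leading space is kept as part of the value.
kept = list(csv.reader(io.StringIO(data)))
print(kept)      # [['name', ' age'], ['Fred', ' 54']]

# skipinitialspace=True drops whitespace that directly follows the delimiter.
stripped = list(csv.reader(io.StringIO(data), skipinitialspace=True))
print(stripped)  # [['name', 'age'], ['Fred', '54']]
```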
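Some CSV files wrap every value in quotes. The reader already removes these quote characters by default, so no extra option is needed just to strip them; the quoting option changes how quotes are interpreted. The csv.QUOTE_ALL constant shown below is accepted by the reader, but it mainly controls how a writer quotes its output:
import csv
with open('people.csv', 'r', newline='') as file:
    reader = csv.reader(file, quoting=csv.QUOTE_ALL)
    for row in reader:
        print(row)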
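As a hedged comparison of quoting settings, the sketch below reads made-up in-memory samples with the default behaviour, with csv.QUOTE_NONE, and with csv.QUOTE_NONNUMERIC. The sample strings and variable names are assumptions made for this demonstration.

```python
import csv
import io

# Made-up sample in which every value is quoted.
quoted = '"Fred","54","green"\n'

# Default behaviour: the surrounding quotes are removed.
default_row = next(csv.reader(io.StringIO(quoted)))
print(default_row)  # ['Fred', '54', 'green']

# QUOTE_NONE: quote characters get no special treatment and stay in the values.
raw_row = next(csv.reader(io.StringIO(quoted), quoting=csv.QUOTE_NONE))
print(raw_row)      # ['"Fred"', '"54"', '"green"']

# QUOTE_NONNUMERIC: unquoted values are converted to float.
mixed = '"Fred",54,"green"\n'
typed_row = next(csv.reader(io.StringIO(mixed), quoting=csv.QUOTE_NONNUMERIC))
print(typed_row)    # ['Fred', 54.0, 'green']
```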