Python: CSV to Dictionary


This post is about writing a CSV reader that turns a CSV file into a list of dictionaries, one per row. The reader accepts the path to a CSV file and a list of Python types, one for each CSV column, as input parameters. The output is a list of dictionaries generated from the specified file. By the end of this post you will have a good understanding of lists, dictionaries and zip() in Python.


import csv
def read_csv(file, types):
    with open(file, 'r', encoding='utf-8-sig') as f:  # utf-8-sig skips a leading BOM
        rows = csv.reader(f)
        head = next(rows)  # first line holds the column headers
        records = []
        for row_num, row in enumerate(rows):  # row_num counts data rows from 0
            try:
                record = dict(zip(head,  # pair each header with its converted value
                             [func(value) for func, value in zip(types, row)]))
                records.append(record)
            except ValueError as ve:  # a value could not be converted
                print('Ignored row {} - {} in {} due to {}'
                      .format(row_num, row, file, ve))

        return records

Calling the above function:


import pprint

records = read_csv('stocks/stocks.csv',[str, str, int, float])
pprint.pprint(records)

Now let us dissect the above code line by line. The function first opens the file and passes the stream to csv.reader(). We use encoding='utf-8-sig' so that a byte order mark (BOM) at the start of the file, if present, is skipped. The first line is the header, and it is grabbed using next(). To number the rows we use enumerate(), and the row number lets us show a meaningful error when a value fails to convert. Note that enumerate() counts from 0 by default, so the number is the index of the data row, not the line number in the file. Before dissecting the next line, let us understand how zip() works.
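To see next() and enumerate() in isolation, here is a minimal sketch that uses an in-memory string in place of a file. The sample data and the start=2 argument are assumptions made for illustration; the function above keeps the default count from 0.

```python
import csv
import io

# In-memory stand-in for a CSV file (hypothetical sample data).
sample = io.StringIO("Name,Shares\nHPQ,100\nIBM,50\n")

rows = csv.reader(sample)
head = next(rows)   # consumes the header row
print(head)         # ['Name', 'Shares']

# start=2 makes the counter match the line number in the file,
# because line 1 is the header.
numbered = list(enumerate(rows, start=2))
for line_num, row in numbered:
    print(line_num, row)
```

With start=2, an error message would point at the line a reader sees when opening the file in an editor.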


names = ['John', 'Bond', 'Gavin']
ages = [33, 32, 23]
name_age = zip(names, ages)
for name, age in name_age:
    print(name, age)

#output
John 33
Bond 32
Gavin 23


zip() pairs up the corresponding elements of two lists as tuples. In Python 3 it returns an iterator rather than a list, but wrapping it in list() gives a list of tuples that looks like this.

[('John', 33), ('Bond', 32), ('Gavin', 23)]
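Two details of zip() are worth seeing in action: in Python 3 it returns a one-shot iterator, and it stops at the shortest input. The sketch below reuses the sample names and ages; the shortened ages list is an assumption added for illustration.

```python
names = ['John', 'Bond', 'Gavin']
ages = [33, 32, 23]

pairs = zip(names, ages)
as_list = list(pairs)        # materialize the iterator
print(as_list)               # [('John', 33), ('Bond', 32), ('Gavin', 23)]

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = list(pairs)
print(second_pass)           # []

# zip() stops at the shortest input, so 'Gavin' is dropped here.
short = list(zip(names, [33, 32]))
print(short)                 # [('John', 33), ('Bond', 32)]
```

The second point matters for CSV rows: a row with fewer fields than there are types is silently shortened rather than reported as an error.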

Now back to line 10 of our code, the list comprehension. Let us consider a snapshot of our input CSV file, as shown below.


Name,Date,Shares,Price
HPQ,7/11/2007,100,32.2
IBM,7/12/2007,50,91.9
GE,7/13/2007,150,83.44
CAT,7/14/2007,200,51.23
MSFT,7/15/2007,95,40.37
HPE,7/16/2007,50,65.1
AFL,7/17/2007,100,70.44

[func(value) for func, value in zip(types, row)]

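zip(types, row) generates the following pairs for the first row in rows. The values are still strings at this point, because csv.reader() returns every field as a string.

[(str, 'HPQ'), (str, '7/11/2007'), (int, '100'), (float, '32.2')]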

It is important to remember that the str, int and float we are passing are the actual types, which can be called like functions, and not strings.
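Because each entry in types is simply called on a value, any callable that takes a string should work in its place. The sketch below illustrates this with parse_date, a hypothetical converter that is not part of the original code.

```python
from datetime import datetime

def parse_date(text):
    """Hypothetical converter: 'M/D/YYYY' string -> datetime.date."""
    return datetime.strptime(text, '%m/%d/%Y').date()

# A function sits alongside the built-in types.
types = [str, parse_date, int, float]
row = ['HPQ', '7/11/2007', '100', '32.2']

converted = [func(value) for func, value in zip(types, row)]
print(converted)
```

The date column now holds a date object instead of a string, while the other columns are converted exactly as before.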

Now [func(value) for func, value in zip(types, row)] calls each type on its value and generates the following output for that row.


['HPQ', '7/11/2007', 100, 32.2]   

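So we get a single list with every value converted to its proper type. It is important to handle exceptions here, because a value that cannot be converted, such as int('abc'), raises a ValueError. Next we apply zip() to this list together with the headers we grabbed earlier, as shown below.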
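The conversion failure can be sketched in isolation as follows. The convert helper and the bad row with 'N/A' in the Shares column are hypothetical, added only to trigger the error.

```python
types = [str, str, int, float]
good_row = ['HPQ', '7/11/2007', '100', '32.2']
bad_row = ['IBM', '7/12/2007', 'N/A', '91.9']   # hypothetical bad value

def convert(row):
    """Return the converted row, or None if any field fails to convert."""
    try:
        return [func(value) for func, value in zip(types, row)]
    except ValueError as ve:
        print('Ignored row {} due to {}'.format(row, ve))
        return None

print(convert(good_row))
print(convert(bad_row))
```

Catching only ValueError keeps other problems, such as a type that is not callable, visible instead of hiding them.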

record = dict(zip(head, [func(value) for func, value in zip(types, row)]))

The zip(head, ...) call generates pairs like these for each row.


[('Name','HPQ'), ('Date','7/11/2007'), ('Shares',100), ('Price',32.2)]   

Now we convert these pairs to a dictionary using dict(), which generates the following output for a single row.

{'Name': 'HPQ', 'Date': '7/11/2007', 'Shares': 100, 'Price': 32.2}
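The dict(zip(...)) step can be tried on its own with the first row's values, as in this minimal sketch:

```python
head = ['Name', 'Date', 'Shares', 'Price']
values = ['HPQ', '7/11/2007', 100, 32.2]

# Each (header, value) pair becomes one key/value entry.
record = dict(zip(head, values))
print(record)

# Values can now be looked up by column name.
print(record['Shares'])   # 100
```

Since Python 3.7, dictionaries keep insertion order, so the keys come out in header order.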

In line 11, records.append(record) adds the dictionary for each row to a list called records. The final output for the above CSV file looks like this.


[{'Date': '7/11/2007', 'Name': 'HPQ', 'Price': 32.2, 'Shares': 100},
 {'Date': '7/12/2007', 'Name': 'IBM', 'Price': 91.9, 'Shares': 50},
 {'Date': '7/13/2007', 'Name': 'GE', 'Price': 83.44, 'Shares': 150},
 {'Date': '7/14/2007', 'Name': 'CAT', 'Price': 51.23, 'Shares': 200},
 {'Date': '7/15/2007', 'Name': 'MSFT', 'Price': 40.37, 'Shares': 95},
 {'Date': '7/16/2007', 'Name': 'HPE', 'Price': 65.1, 'Shares': 50},
 {'Date': '7/17/2007', 'Name': 'AFL', 'Price': 70.44, 'Shares': 100}]

So we have successfully converted a CSV file to a list of Python dictionaries.
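For comparison, the standard library's csv.DictReader already performs the header-pairing step, so a variant of the same idea could look like the sketch below. This is an alternative, not the approach used in this post; the in-memory sample stands in for the first two data rows of the file.

```python
import csv
import io

# In-memory stand-in for stocks.csv (first two data rows from above).
sample = io.StringIO(
    "Name,Date,Shares,Price\n"
    "HPQ,7/11/2007,100,32.2\n"
    "IBM,7/12/2007,50,91.9\n"
)

types = [str, str, int, float]

records = []
for row in csv.DictReader(sample):
    # row maps each header to its string value, in column order;
    # pair each column with its type and convert the value.
    record = {key: func(value)
              for func, (key, value) in zip(types, row.items())}
    records.append(record)

print(records)
```

Error handling is left out to keep the sketch short; the try/except from read_csv() would wrap the dictionary comprehension in the same way.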

Coding is fun, enjoy!