Python Forum

Read csv file, parse data, and store in a dictionary
I have a file that contains songs recently played by a radio station, the artist, and time played in this format: "November 4, 2019 8:02 PM","Wagon Wheel","Darius Rucker". I am trying to store the content of this file in string variable playlist_csv, use splitlines() to store records in variable lines, and then iterate through the lines to store data in a dictionary. The key should be a datetime object of the timestamp, and the value should be a tuple of song and artist: {datetime_key: (song, artist)}

This is what I have for code so far:
# read the file and store content in string variable playlist_csv
with open('playlist.txt', 'r') as csv_file:
    playlist_csv = csv_file.read().replace('\n', '')
    # use splitlines() method to store records in variable lines (it is list)
    split_playlist = playlist_csv.splitlines()
    # iterate through lines to store data in playlist_dict dictionary
    playlist_dict = {}
    for l in csv.reader(split_playlist, quotechar='"', delimiter=',',
       quoting=csv.QUOTE_ALL, skipinitialspace=True):
       dt=datetime.strptime(l[0], '%B %d, %Y %I:%M %p')
       playlist_dict[l[dt]].append(dt)
print(playlist_dict)
However, I keep running into errors when trying to store this data in a dictionary (specifically "'datetime.datetime' object is not subscriptable" and "list indices must be integers or slices" when modifying the code). Desired output looks like: {datetime.datetime(2019, 11, 4, 20, 2): ('Wagon Wheel', 'Darius Rucker'),...}

I appreciate any help!
Try playlist_dict[dt] = l[1:] perhaps.
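A minimal sketch of that suggestion, assuming the row has already been parsed into a list of three strings (the `row` variable below is hypothetical and stands in for what csv.reader would yield for the sample line in the question). Since `l[1:]` gives a list, it is wrapped in `tuple()` here to match the desired output:

```python
from datetime import datetime

# Hypothetical row, shaped like what csv.reader yields for the sample line.
row = ["November 4, 2019 8:02 PM", "Wagon Wheel", "Darius Rucker"]

playlist_dict = {}
dt = datetime.strptime(row[0], "%B %d, %Y %I:%M %p")

# Use the datetime itself as the key and assign the value directly,
# instead of indexing the row with it (row[dt]) or calling append().
playlist_dict[dt] = tuple(row[1:])

print(playlist_dict)
```

The two errors in the question come from `l[dt]`: a list can only be indexed with integers or slices, not with a datetime.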
If you're sure that you have only 3 columns everywhere, you can use item unpacking.

with open('playlist.txt', 'r') as csv_file:
    playlist_dict = {}
    reader = csv.reader(
        csv_file, quotechar='"', delimiter=',',
        quoting=csv.QUOTE_ALL, skipinitialspace=True
    )
    for timestamp, song, artist in reader:
       dt = datetime.strptime(timestamp, '%B %d, %Y %I:%M %p')
       playlist_dict[dt].append((song, artist))


print(playlist_dict)
You can make it shorter.
There is no need for splitlines, because csv.reader iterates over the lines of the file itself.

I corrected the assignment in line 11 of your code.

If you want to assign a value to a key, it looks like this:
some_dict = {}
a_key = 'my_key'
some_value = 42
some_value = (1,2,3) # could be a tuple
some_value = [1,2,3] # could be a list
some_value = {'foo': 'bar'} # or  a dict
some_value = {1,2,3} # could be a set

# assignment 
some_dict[a_key] = some_value # in this case the name was last overwritten by a set
Keys can only be hashable objects. This means you cannot use mutable mappings/sequences as keys.
A datetime object, for example, is immutable and hashable. The values of a dict don't need to be hashable.
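As a rough illustration of the hashability rule, here is a small sketch. The variable names are made up for this example; it only shows that a datetime is accepted as a key while a list is rejected:

```python
from datetime import datetime

playlist_dict = {}

# A datetime is immutable and hashable, so it works as a key.
key = datetime(2019, 11, 4, 20, 2)
playlist_dict[key] = ("Wagon Wheel", "Darius Rucker")

# A list is mutable and therefore unhashable; using it as a key raises TypeError.
try:
    playlist_dict[["Wagon Wheel", "Darius Rucker"]] = key
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

print(list_key_error)
```

The value side has no such restriction, which is why a list of (song, artist) tuples can be stored under a datetime key.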

Since Python 3.6, dicts keep insertion order as an implementation detail of CPython.
Since Python 3.7 it is part of the language specification and guaranteed.

If you test your code with an older Python version, you'll get scrambled results.
Previously dicts didn't keep the order. In some versions, hash randomization made the order differ between runs.

If you want to write code for older Python versions, you have to be aware of this.
In that case you can use collections.OrderedDict.
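A short sketch of the ordering point, using made-up placeholder entries. OrderedDict keeps insertion order on every Python version, while a plain dict is only guaranteed to do so from Python 3.7 on:

```python
from collections import OrderedDict

# Placeholder entries, inserted in a deliberately non-alphabetical order.
entries = [("c", 1), ("a", 2), ("b", 3)]

plain = {}
ordered = OrderedDict()
for k, v in entries:
    plain[k] = v
    ordered[k] = v

# OrderedDict: insertion order on all versions.
print(list(ordered))
# Plain dict: insertion order is only guaranteed on Python 3.7+.
print(list(plain))
```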
Thanks for helping here. I am sure I only have 3 columns everywhere, and I am also using Python 3.6. However, when I run this code, I keep getting the error:
playlist_dict[dt].append((song, artist))
KeyError: datetime.datetime(2019, 11, 4, 20, 2)

Any idea what is causing this?

(Nov-26-2019, 07:44 AM) DeaD_EyE wrote: If you're sure that you have only 3 columns everywhere, you can use item unpacking.

My mistake.

Change line number 9 from:
playlist_dict[dt].append((song, artist))
# the key dt does not exist yet,
# so there is no list to append to
to:

playlist_dict[dt] = (song, artist)
If you expect several songs/artists with the same date, then the value should be a list.
from collections import defaultdict

playlist_dict = defaultdict(list)
# a missing key gets a new empty list,
# which can then be modified

playlist_dict['this key does not exist'].append(42)  # <-- the lookup creates an empty list, assigns it to the key, then 42 is appended
# now the key 'this key does not exist' exists.
playlist_dict['this key does not exist'].append(43)  # <-- adds the next object to the existing list
So you can decide. You can just assign a tuple with song/artist to the key.
If there is another song/artist with the same date, the old one is simply overwritten.
If you expect that to happen, the easiest way is to use a defaultdict.



import csv
from collections import defaultdict
from datetime import datetime


with open('playlist.txt', 'r') as csv_file:
    playlist_dict = defaultdict(list)
    reader = csv.reader(
        csv_file, quotechar='"', delimiter=',',
        quoting=csv.QUOTE_ALL, skipinitialspace=True
    )
    for timestamp, song, artist in reader:
        # parse e.g. "November 4, 2019 8:02 PM" into a datetime key
        dt = datetime.strptime(timestamp, '%B %d, %Y %I:%M %p')
        playlist_dict[dt].append((song, artist))


print(playlist_dict)
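To try this without a playlist.txt on disk, here is a self-contained sketch that reads from an in-memory string via io.StringIO. The first sample line comes from the question; the second line ("Example Song", "Example Artist") is invented purely to show what happens when two entries share a timestamp:

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Sample data: line 1 is from the question, line 2 is a made-up entry
# with the same timestamp to show how defaultdict(list) collects both.
sample = (
    '"November 4, 2019 8:02 PM","Wagon Wheel","Darius Rucker"\n'
    '"November 4, 2019 8:02 PM","Example Song","Example Artist"\n'
)

playlist_dict = defaultdict(list)
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)
for timestamp, song, artist in reader:
    dt = datetime.strptime(timestamp, "%B %d, %Y %I:%M %p")
    playlist_dict[dt].append((song, artist))

print(dict(playlist_dict))
```

With plain assignment (playlist_dict[dt] = (song, artist)) the second entry would replace the first; with the defaultdict both are kept in a list under the same key.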