Python Forum

what data structure to use?
Hello,

I have a two-column CSV file that lists the amenities cities have:
ZIP;Amenity
51454;5
51454;6
53130;5
59437;6
63178;1
69029;6
69081;5
69290;5
71540;5
75101;6
75101;6
75101;5
75101;4
etc.


The first column contains ZIP codes, and the second column contains an amenity for that town (e.g. 1=church, 2=school, 3=restaurant, etc.)


I need to write a loop that will fill an array:
1. If it doesn't yet exist, add the zip code as a key in that array
2. For that ZIP, add the key for that kind of amenity (i.e. 1, 2, 3, etc.) if it doesn't exist yet, then increment its value (e.g. 1=3 means that the town now has three churches).

What kind of data structure do you think is best for that task?

Thank you.
Not a CSV file. A delimited file, but not a COMMA separated values file.

Which Python data types are you familiar with, or which Python data types are you allowed to use? There are many ways to solve this problem.
If it were me I'd use a class.

class Zip_Code_Entry:
    def __init__(self, zip_code):
        self.zip_code = zip_code
        self.amenities = {}  # maps amenity -> count

    def add_amenity(self, amenity: str):
        # Increment the count, starting at 1 for a new amenity
        if amenity in self.amenities:
            self.amenities[amenity] += 1
        else:
            self.amenities[amenity] = 1

    def show_amenities(self):
        for key, value in self.amenities.items():
            print(f'There are {value} {key} in {self.zip_code}.')

first_one = Zip_Code_Entry('12345')
first_one.add_amenity('Churches')
first_one.show_amenities()
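To cover many ZIP codes with the class approach, one option is to keep the entries in a dict keyed by ZIP code. This is only a rough sketch: the class is repeated in condensed form so the snippet runs on its own, and the `entries` dict and `sample_rows` list are made-up names with rows borrowed from the question.

```python
class Zip_Code_Entry:
    """Condensed copy of the class above so this sketch runs on its own."""

    def __init__(self, zip_code):
        self.zip_code = zip_code
        self.amenities = {}

    def add_amenity(self, amenity):
        # get() returns 0 the first time an amenity is seen
        self.amenities[amenity] = self.amenities.get(amenity, 0) + 1


# Illustrative rows in the question's ZIP;Amenity format (assumption: no header)
sample_rows = ['51454;5', '51454;6', '75101;6', '75101;6']

entries = {}  # maps ZIP code -> Zip_Code_Entry
for row in sample_rows:
    zip_code, amenity = row.split(';')
    if zip_code not in entries:
        entries[zip_code] = Zip_Code_Entry(zip_code)
    entries[zip_code].add_amenity(amenity)

for zip_code, entry in entries.items():
    print(zip_code, entry.amenities)
```

The outer dict does the "add the ZIP if it doesn't exist" step, and each entry object handles the per-amenity counting.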
Thank you.
You have a number of options.
One way is to have a dict of dicts. The outer dict will have ZIP codes as keys and dicts as values. Each inner dict will have the amenity code as key and the count of that amenity as value. You iterate over the data and populate the data structure.
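The dict-of-dicts idea can be sketched with plain dictionaries and no imports. This is a minimal sketch rather than the only way to do it: the function name `count_amenities` and the `sample_rows` list are made up for illustration, and the rows are assumed to be `ZIP;Amenity` strings with the header already removed.

```python
def count_amenities(rows):
    """Build {zip_code: {amenity_code: count}} from 'ZIP;Amenity' strings."""
    counts = {}
    for row in rows:
        zip_code, amenity = row.strip().split(';')
        # Create the inner dict the first time a ZIP code is seen
        inner = counts.setdefault(zip_code, {})
        # Start at 0 the first time an amenity is seen, then add one
        inner[amenity] = inner.get(amenity, 0) + 1
    return counts


# Rows taken from the question (header line left out)
sample_rows = ['51454;5', '51454;6', '75101;6', '75101;6', '75101;5', '75101;4']
result = count_amenities(sample_rows)
print(result['75101'])  # {'6': 2, '5': 1, '4': 1}
```

Keys stay as strings here, which also preserves any leading zeros in ZIP codes.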
You can do this in a number of different ways.

For example, you can use collections.defaultdict twice:

from collections import defaultdict
import csv

mydata = defaultdict(lambda: defaultdict(int))
with open('sample.txt') as f:
    rdr = csv.reader(f, delimiter=';')
    next(rdr) # skip header row
    for zipcode, amenity in rdr:
        mydata[zipcode][amenity] += 1
print(mydata)
Or use defaultdict just once, together with collections.Counter:
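from collections import Counter, defaultdict

mydata = defaultdict(dict)
with open('sample.txt') as f:
    next(f)  # skip header row, otherwise 'ZIP' ends up as a key
    # Count identical lines, ignoring blank ones
    line_counts = Counter(line.strip() for line in f if line.strip())
for key, value in line_counts.items():
    zipcode, amenity = key.split(';')
    mydata[zipcode][amenity] = value
print(mydata)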
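Whichever variant is used, the result maps each ZIP code to its amenity counts. Here is a small sketch of reading that structure back in a readable form. The `AMENITY_NAMES` table contains only the three codes the question spells out, `describe` is a made-up helper name, and the `counts` literal is a stand-in for the `mydata` built above.

```python
# Only the codes named in the question; anything else is shown as "code N"
AMENITY_NAMES = {'1': 'church', '2': 'school', '3': 'restaurant'}

# Stand-in for the nested counts built above (values are illustrative)
counts = {'63178': {'1': 1}, '75101': {'6': 2, '5': 1, '4': 1}}


def describe(counts):
    """Return one readable line per (ZIP, amenity) pair, sorted by key."""
    lines = []
    for zip_code in sorted(counts):
        for amenity, n in sorted(counts[zip_code].items()):
            name = AMENITY_NAMES.get(amenity, f'code {amenity}')
            lines.append(f'{zip_code}: {n} x {name}')
    return lines


report = describe(counts)
print('\n'.join(report))
```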
Another way is to use pandas

import pandas as pd 
df = pd.read_csv('sample.txt', sep=';')
df = df.groupby(['ZIP', 'Amenity'], as_index=False).size()
print(df)
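If a wide table (one row per ZIP, one column per amenity) is more convenient than the long format, `pd.crosstab` is one possible alternative. This is a sketch under a few assumptions: the data is inlined through `io.StringIO` instead of read from `sample.txt`, and `dtype=str` is passed so that ZIP codes are kept as text and do not lose leading zeros.

```python
import io

import pandas as pd

# Inline stand-in for the file, using rows from the question
sample = io.StringIO("ZIP;Amenity\n51454;5\n51454;6\n75101;6\n75101;6\n75101;5\n")

# dtype=str keeps ZIP codes and amenity codes as text
df = pd.read_csv(sample, sep=';', dtype=str)

# Rows are ZIP codes, columns are amenity codes, cells are counts
table = pd.crosstab(df['ZIP'], df['Amenity'])
print(table)
```

Combinations that never occur show up as 0, so a lookup such as `table.loc['75101', '6']` always returns a count.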