Editing spreadsheet/csv - Printable Version

Editing spreadsheet/csv - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Editing spreadsheet/csv (/thread-40646.html)

Editing spreadsheet/csv - BSDevo - Aug-31-2023

Hi, sorry if something wrong im new its my first post.
Im learning python but english is not my native language and i dont know how to properly put sentence to search so i thought i might ask a question in this forum.

I have spreadhseet table with thousands of records and looking to edit it using python.
My column consist of " city, state,zip" i manually added column to add "city,state" and "zip" but im looking to make it easier. I would rather tell python to read specific column and separate zips and once zip separated - read the first 3 digits of it and if zip contains XXX number - should be writen "YYYY" (XXX and YYY is just generic for example purpose ) and this need to be done by creating another column after "zip" .

Is this possible ? If yes - i would be appreciated if pythonistas could guide me to the right direction to create this code or an example.
As i have strugle to undserstand - example and how to would be even more appretiated.

P.s. ill be using streamlit, pandas as well and i think few more extras so if things are combined together - im more than hhappy to use them.
Also im using Python version 3.11

Thank You and have a Great Day !

RE: Editing spreadsheet/csv - deanhystad - Aug-31-2023

You should post a sample spreadsheet table (19 rows or less) and post a mock-up of what you want to produce. What format is your "spreadsheet table". Is this an excel spreadsheet?

This sounds like a good fit for Pandas. You can load in your entire table into a pandas DataFrame, then you can extract rows from the dataframe based on your conditions.

RE: Editing spreadsheet/csv - BSDevo - Aug-31-2023

(Aug-31-2023, 05:13 PM)deanhystad Wrote: You should post a sample spreadsheet table (19 rows or less) and post a mock-up of what you want to produce. What format is your "spreadsheet table". Is this an excel spreadsheet?

This sounds like a good fit for Pandas. You can load in your entire table into a pandas DataFrame, then you can extract rows from the dataframe based on your conditions.

Yes, excel spreadsheet.
I dont know how to post/show sample of spreadsheet.
Left of the red color - BEFORE
RIght after black - AFTER
Example

I will create some sort of script where i will note if 780 - Laredo, if 333 - Atlanta , if 498 - Green Bay, etc. But most important is to divide column into few other and write to it.
I hope its a bit clearer.
Thank You.

RE: Editing spreadsheet/csv - deanhystad - Aug-31-2023

What do you see if you run this:

import pandas as pd

df = pd.read_excel("spreadshet_filename.xlsx").head(10)  # Change to name of your spreadsheet file.
print(df)

Post results if they don't contain anything sensitive.

RE: Editing spreadsheet/csv - BSDevo - Aug-31-2023

(Aug-31-2023, 06:48 PM)deanhystad Wrote: What do you see if you run this:
import pandas as pd

df = pd.read_excel("spreadshet_filename.xlsx").head(10)  # Change to name of your spreadsheet file.
print(df)
Post results if they don't contain anything sensitive.

I have not run anythiong yet , im trying to understand how to write this code and where can i find info about it as at the moment im maping zip codes inside another excel sheet so i can know what zip belongs to which Area.
But i will start writing the code tomorrow but before i start googling in the morning i woanted to get some links today to read and start experimenting tomorrow.
Multitasking Angel

RE: Editing spreadsheet/csv - deanhystad - Aug-31-2023

Starting with an excel spreadsheet that looks like this:

Output:Location
Laredo, TX 78045
Atlanta, GA 33333
Chicago, IL 60000
Escanaba, MI 49888
Pendergrass, GA 33345

I read the spreadsheet into a DataFrame and split the location column into City and State_ZIP columns using ", " as the delimiter. After that I split the State_ZIP column into state and zip columns using " " as the delimiter.

import pandas as pd

df = pd.read_excel("test.xlsx", header=0)
df[["City", "State Zip"]] = df["Location"].str.split(", ", expand=True)
df[["State", "Zip"]] = df["State Zip"].str.split(" ", expand=True)
print(df)

Output:                Location         City State Zip State    Zip
0       Laredo, TX 78045       Laredo  TX 78045    TX  78045
1      Atlanta, GA 33333      Atlanta  GA 33333    GA  33333
2      Chicago, IL 60000      Chicago  IL 60000    IL  60000
3     Escanaba, MI 49888     Escanaba  MI 49888    MI  49888
4  Pendergrass, GA 33345  Pendergrass  GA 33345    GA  33345

You can generate the Area column values in python.

import pandas as pd

df = pd.read_excel("test.xlsx", header=0)
df[["City", "State Zip"]] = df["Location"].str.split(", ", expand=True)
df[["State", "Zip"]] = df["State Zip"].str.split(" ", expand=True)

area_zips = {"333": "Atlanta"}
cities = df["City"].values
codes = df["Zip"].values
df["Area"] = [area_zips.get(code[:3], city) for code, city in zip(codes, cities)]
print(df)

Output:                Location         City State Zip State    Zip      Area
0       Laredo, TX 78045       Laredo  TX 78045    TX  78045    Laredo
1      Atlanta, GA 33333      Atlanta  GA 33333    GA  33333   Atlanta
2      Chicago, IL 60000      Chicago  IL 60000    IL  60000   Chicago
3     Escanaba, MI 49888     Escanaba  MI 49888    MI  49888  Escanaba
4  Pendergrass, GA 33345  Pendergrass  GA 33345    GA  33345   Atlanta

Another way is to use a map function. If you have a dictionary of Area zip codes with full coverage, you can use the dictionary.

import pandas as pd

df = pd.read_excel("test.xlsx", header=0)
df[["City", "State Zip"]] = df["Location"].str.split(", ", expand=True)
df[["State", "Zip"]] = df["State Zip"].str.split(" ", expand=True)

area_zips = {"333": "Atlanta", "600": "Chicago", "498": "Marquette", "780": "Laredo"}
df["zip_prefix"] = df["Zip"].str[:3]
df["Area"] = df["zip_prefix"].map(area_zips)
print(df)

Output:                Location         City State Zip State    Zip zip_prefix       Area
0       Laredo, TX 78045       Laredo  TX 78045    TX  78045        780     Laredo
1      Atlanta, GA 33333      Atlanta  GA 33333    GA  33333        333    Atlanta
2      Chicago, IL 60000      Chicago  IL 60000    IL  60000        600    Chicago
3     Escanaba, MI 49888     Escanaba  MI 49888    MI  49888        498  Marquette
4  Pendergrass, GA 33345  Pendergrass  GA 33345    GA  33345        333    Atlanta

map can also call a function.

import pandas as pd


def area_zips(zipcode):
    zips = {"333": "Atlanta", "600": "Chicago", "498": "Marquette", "780": "Laredo"}
    return zips[zipcode[:3]]


df = pd.read_excel("test.xlsx", header=0)
df[["City", "State Zip"]] = df["Location"].str.split(", ", expand=True)
df[["State", "Zip"]] = df["State Zip"].str.split(" ", expand=True)
df["Area"] = df["Zip"].map(area_zips)
df = df[["City", "State", "Zip", "Area"]]
print(df)

None of these methods (using python on values, map dictionary, map function) are particularly fast, but should be fine as long as you aren't processing hundreds of millions of locations.

When you are all done you either drop the unwanted rows or extract the rows you want to keep.

df = df[["City", "State", "Zip", "Area"]]

Output:          City State    Zip       Area
0       Laredo    TX  78045     Laredo
1      Atlanta    GA  33333    Atlanta
2      Chicago    IL  60000    Chicago
3     Escanaba    MI  49888  Marquette
4  Pendergrass    GA  33345    Atlanta

RE: Editing spreadsheet/csv - BSDevo - Sep-01-2023

This is awesome !
Thank You very much.
I dont care if its slow as long as it works for now and no, there wont be hundreds of millions of location but good to know that will be to slow for such a huge DB.