Editing spreadsheet/csv - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: Editing spreadsheet/csv (/thread-40646.html) |
Editing spreadsheet/csv - BSDevo - Aug-31-2023 Hi, sorry if something wrong im new its my first post. Im learning python but english is not my native language and i dont know how to properly put sentence to search so i thought i might ask a question in this forum. I have spreadhseet table with thousands of records and looking to edit it using python. My column consist of " city, state,zip" i manually added column to add "city,state" and "zip" but im looking to make it easier. I would rather tell python to read specific column and separate zips and once zip separated - read the first 3 digits of it and if zip contains XXX number - should be writen "YYYY" (XXX and YYY is just generic for example purpose ) and this need to be done by creating another column after "zip" . Is this possible ? If yes - i would be appreciated if pythonistas could guide me to the right direction to create this code or an example. As i have strugle to undserstand - example and how to would be even more appretiated. P.s. ill be using streamlit, pandas as well and i think few more extras so if things are combined together - im more than hhappy to use them. Also im using Python version 3.11 Thank You and have a Great Day ! RE: Editing spreadsheet/csv - deanhystad - Aug-31-2023 You should post a sample spreadsheet table (19 rows or less) and post a mock-up of what you want to produce. What format is your "spreadsheet table". Is this an excel spreadsheet? This sounds like a good fit for Pandas. You can load in your entire table into a pandas DataFrame, then you can extract rows from the dataframe based on your conditions. RE: Editing spreadsheet/csv - BSDevo - Aug-31-2023 (Aug-31-2023, 05:13 PM)deanhystad Wrote: You should post a sample spreadsheet table (19 rows or less) and post a mock-up of what you want to produce. What format is your "spreadsheet table". Is this an excel spreadsheet? Yes, excel spreadsheet. I dont know how to post/show sample of spreadsheet. Left of the red color - BEFORE RIght after black - AFTER Example I will create some sort of script where i will note if 780 - Laredo, if 333 - Atlanta , if 498 - Green Bay, etc. But most important is to divide column into few other and write to it. I hope its a bit clearer. Thank You. RE: Editing spreadsheet/csv - deanhystad - Aug-31-2023 What do you see if you run this: import pandas as pd df = pd.read_excel("spreadshet_filename.xlsx").head(10) # Change to name of your spreadsheet file. print(df)Post results if they don't contain anything sensitive. RE: Editing spreadsheet/csv - BSDevo - Aug-31-2023 (Aug-31-2023, 06:48 PM)deanhystad Wrote: What do you see if you run this: I have not run anythiong yet , im trying to understand how to write this code and where can i find info about it as at the moment im maping zip codes inside another excel sheet so i can know what zip belongs to which Area. But i will start writing the code tomorrow but before i start googling in the morning i woanted to get some links today to read and start experimenting tomorrow. Multitasking RE: Editing spreadsheet/csv - deanhystad - Aug-31-2023 Starting with an excel spreadsheet that looks like this: I read the spreadsheet into a DataFrame and split the location column into City and State_ZIP columns using ", " as the delimiter. After that I split the State_ZIP column into state and zip columns using " " as the delimiter.import pandas as pd df = pd.read_excel("test.xlsx", header=0) df[["City", "State Zip"]] = df["Location"].str.split(", ", expand=True) df[["State", "Zip"]] = df["State Zip"].str.split(" ", expand=True) print(df) You can generate the Area column values in python.import pandas as pd df = pd.read_excel("test.xlsx", header=0) df[["City", "State Zip"]] = df["Location"].str.split(", ", expand=True) df[["State", "Zip"]] = df["State Zip"].str.split(" ", expand=True) area_zips = {"333": "Atlanta"} cities = df["City"].values codes = df["Zip"].values df["Area"] = [area_zips.get(code[:3], city) for code, city in zip(codes, cities)] print(df) Another way is to use a map function. If you have a dictionary of Area zip codes with full coverage, you can use the dictionary.import pandas as pd df = pd.read_excel("test.xlsx", header=0) df[["City", "State Zip"]] = df["Location"].str.split(", ", expand=True) df[["State", "Zip"]] = df["State Zip"].str.split(" ", expand=True) area_zips = {"333": "Atlanta", "600": "Chicago", "498": "Marquette", "780": "Laredo"} df["zip_prefix"] = df["Zip"].str[:3] df["Area"] = df["zip_prefix"].map(area_zips) print(df) map can also call a function.import pandas as pd def area_zips(zipcode): zips = {"333": "Atlanta", "600": "Chicago", "498": "Marquette", "780": "Laredo"} return zips[zipcode[:3]] df = pd.read_excel("test.xlsx", header=0) df[["City", "State Zip"]] = df["Location"].str.split(", ", expand=True) df[["State", "Zip"]] = df["State Zip"].str.split(" ", expand=True) df["Area"] = df["Zip"].map(area_zips) df = df[["City", "State", "Zip", "Area"]] print(df)None of these methods (using python on values, map dictionary, map function) are particularly fast, but should be fine as long as you aren't processing hundreds of millions of locations. When you are all done you either drop the unwanted rows or extract the rows you want to keep. df = df[["City", "State", "Zip", "Area"]]
RE: Editing spreadsheet/csv - BSDevo - Sep-01-2023 This is awesome ! Thank You very much. I dont care if its slow as long as it works for now and no, there wont be hundreds of millions of location but good to know that will be to slow for such a huge DB. |