How to combine multiple column values into 1? - Printable Version
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: How to combine multiple column values into 1? (/thread-37934.html)

How to combine multiple column values into 1? - cubangt - Aug-11-2022

I've looked around and found some steps for accomplishing this, but it seems like a lot of code. Maybe I'm not searching for the right thing, or maybe it goes by another name. I am using pandas and working with CSV and Excel files in other scripts, so I'm just looking for suggestions on the proper way to code this.

I have a CSV file that has 4 legitimate columns: Date, Time, User, Message. My problem is that if the "Message" string value contains a comma of its own, the CSV ends up with additional columns, which prevents the data from being imported correctly unless we consolidate those into the main "Message" column.

What I'm trying to do is save ourselves the manual step of copying those rare extra columns back into the main message column. Say the CSV has 3000 rows and maybe 20 of them have extra columns. Sometimes a row has only 1 extra column, and other times there may be 5.

How can I run a script against this file to check for the extra columns and, if found, copy those column values into the main message column so that we have 1 message string value?

RE: How to combine multiple column values into 1? - deanhystad - Aug-11-2022

Please provide samples of the data with and without the extra commas.

RE: How to combine multiple column values into 1? - cubangt - Aug-11-2022

Yeah, I'm working on finding a file that has not been cleaned up yet so I can upload it.

RE: How to combine multiple column values into 1? - cubangt - Aug-11-2022

OK, here is a small sample. This is a real example of how we get the file: there are extra commas in the message value, which throws everything off. You will also see one row that has the string value in the first column, which is the date column. That seems to happen when the message column holds a huge paragraph of data: the text gets placed into other columns instead of just into new columns.

RE: How to combine multiple column values into 1? - deanhystad - Aug-11-2022

That is not a csv file. Do you have any control over how the file is generated?

RE: How to combine multiple column values into 1? - cubangt - Aug-11-2022

How do you mean it's not a CSV file? When I open it in Notepad this is what I get:

Date,Time,User,Message,,,,
5/22/2022,8:44 AM,Don,Oh no,,,,
5/22/2022,8:43 AM,Jenn,Did i tell you my mom, dad, nephew, my sister's husband, and i think my brother have covid
5/22/2022,8:42 AM,Jenn,A little sore,,,,
5/21/2022,10:11 PM,Don,Ok,,,,
5/21/2022,10:11 PM,Jenn,I will talk to you tomorrow... Yes it's in Hulu.... With Jessica Beal,,,,
5/21/2022,10:10 PM,Don,Candy?,,,,
5/21/2022,10:10 PM,Jenn,That's good,,,,
And I've also been told you are with everyone is praying her husband doesn't find out,,,,,,,
5/11/2022,7:29 PM,Jenn,Buttttttt,,,,

Unfortunately I do not have access to the generation of this file. It comes from a 3rd party and I'm just trying to clean it up as best as possible before pulling the data into our side of things.

RE: How to combine multiple column values into 1? - bowlofred - Aug-11-2022

If the extra commas can only be in the message column, then split on comma to get the first columns, then rsplit on comma to get the last columns.

# name, age, notes, zip_code
table = '''Susan,27,works with HR on Zoom calls,02134
Roger,41,Gets coffee, bagels, and sodas for all the meetings,90210
'''

for row in table.splitlines():
    name, age, rest = row.split(",", maxsplit=2)
    notes, zip_code = rest.rsplit(",", maxsplit=1)
    print(f"Name: {name}. Notes: {notes}")

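Applied to the Date/Time/User/Message layout from this thread, the same idea needs only the left-hand split, because Message is the last real column. The following is a minimal sketch, not a definitive fix: `split_message_row` is a hypothetical helper name, and it assumes the padding columns always appear as trailing commas, as they do in the posted sample.

```python
def split_message_row(line, n_columns=4):
    """Split a row into n_columns fields, keeping commas inside the last one."""
    # Trailing commas are the empty padding columns; drop them first.
    line = line.rstrip("\n").rstrip(",")
    # maxsplit = n_columns - 1 leaves everything after the third comma intact.
    return line.split(",", maxsplit=n_columns - 1)


# Two rows copied from the sample posted in this thread.
rows = [
    "5/22/2022,8:44 AM,Don,Oh no,,,,",
    "5/22/2022,8:43 AM,Jenn,Did i tell you my mom, dad, nephew, "
    "my sister's husband, and i think my brother have covid",
]
parsed = [split_message_row(r) for r in rows]
for date, time, user, message in parsed:
    print(f"{date} {time} {user}: {message}")
```

Two limitations of this sketch: a message that genuinely ends in a comma would lose it, and lines that do not start with a date (message continuations) are not handled here.
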
RE: How to combine multiple column values into 1? - cubangt - Aug-11-2022

So I tried to follow your example with my file as the source and I get an error:

import csv

with open("sample.csv", "r") as file_in:
    dataReader = csv.reader(file_in)
    for row in dataReader.splitlines():
        date, time, user, rest = row.split(",", maxsplit=2)
        message = rest.rsplit(",", maxsplit = 1)
        print(f"Date: {date}. Message: {message}")

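The post does not say which error was raised, but the attempt above would most likely fail at `dataReader.splitlines()`: the object returned by `csv.reader` is an iterator over already-split rows and has no `splitlines` method. Unpacking four names from `split(",", maxsplit=2)` would also fail, since that call returns at most three parts. A hedged sketch of the same loop without the `csv` module follows; `io.StringIO` is only a stand-in for `open("sample.csv")` so the example runs on its own.

```python
import io

# Stand-in for open("sample.csv"); the contents are copied from the posted sample.
file_in = io.StringIO(
    "Date,Time,User,Message,,,,\n"
    "5/22/2022,8:44 AM,Don,Oh no,,,,\n"
    "5/22/2022,8:43 AM,Jenn,Did i tell you my mom, dad, nephew, "
    "my sister's husband, and i think my brother have covid\n"
)

next(file_in)  # skip the header row
messages = []
for row in file_in:  # a file object already yields one line at a time
    # Four names need three splits, so maxsplit is 3, not 2.
    date, time, user, rest = row.rstrip(",\n").split(",", maxsplit=3)
    messages.append(rest)
    print(f"Date: {date}. Message: {rest}")
```

This version still assumes every line starts with a date, so the continuation line in the sample would need separate handling.
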
RE: How to combine multiple column values into 1? - deanhystad - Aug-11-2022

This is not a csv format file. Do not use csv reader.

import re
import pandas as pd

date_pattern = re.compile(r"\d+/\d+/\d+")
lines = []
with open("Sample.csv", "r") as f:
    # Get column headers
    columns = next(f).rstrip(",\n").split(",")
    for line in f:
        line = line.rstrip(",\n")
        # Check if line starts with date, time,
        if re.match(date_pattern, line):
            # This is a new row. Split into columns
            row = line.split(",", maxsplit=len(columns) - 1)
            lines.append(row)
        else:
            # This is a continuation of previous message.
            row = lines[-1]
            row[-1] = f"{row[-1]}\n{line}"

df = pd.DataFrame(lines, columns=columns)
print(df)

RE: How to combine multiple column values into 1? - cubangt - Aug-11-2022

OK, so that I can understand: how are you identifying that it is not a true csv format file?
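
One general way to see the difference, offered as an illustration rather than as the reasoning behind the reply above: in a conventionally formatted CSV file, a field that contains commas is wrapped in double quotes, so a parser can tell data commas from separators. The sketch below feeds Python's `csv` module a made-up quoted line and the same line without quotes, shortened from the sample.

```python
import csv
import io

# The quoted line is hypothetical; it shows what a conventional export would look like.
quoted = '5/22/2022,8:43 AM,Jenn,"Did i tell you my mom, dad, nephew"\n'
# The unquoted line mirrors the style of the posted sample.
unquoted = '5/22/2022,8:43 AM,Jenn,Did i tell you my mom, dad, nephew\n'

quoted_row = next(csv.reader(io.StringIO(quoted)))
unquoted_row = next(csv.reader(io.StringIO(unquoted)))

print(len(quoted_row))    # 4 fields: the message stays in one piece
print(len(unquoted_row))  # 6 fields: the message is split at every comma
```

Without the quotes, nothing in the line marks which commas belong to the message, which is why the reader ends up with extra columns.
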