How to combine multiple column values into 1?

cubangt · Aug-11-2022, 02:00 PM

So ive looked around and found some steps on how to accomplish this, but seem like alot of code for this, maybe im not searching for the right thing, maybe it goes by another name, but i am using pandas and working with CSV and excel files in other scripts..

So here is what i have, just looking for suggestions on the proper or right coding to use to accomplish.

So i have a CSV file that has 4 columns(legit columns)
Date/Time/User/Message

BUT my problem is that if the "Message" string value has their own comma in the value, then the CSV ends up with additional columns.. which then prevents the data from being imported correctly, unless we consolidate those into the main "message" column..

So what im trying to do is save ourselves the manual step of copying over those rare instances of extra columns into the main message column.

Say the CSV has 3000 rows.. and MAYBE about 20 rows have those extra columns.. sometimes the row may only have 1 extra column and other times there may be 5 extra columns for a row..

How can i run a script against this file to check for the extra columns, if found, then copy those column values into the main message column so that we have 1 message string value?

**deanhystad** · Aug-11-2022, 02:08 PM

Please provide samples of the data with and without the extra commas.

cubangt · Aug-11-2022, 02:18 PM

yea im working on finding a file that has not been cleaned up yet so i can upload.

cubangt · Aug-11-2022, 02:34 PM

ok here is a small sample

This is literally a real example on how we get he file because there are extra commas in the message value which then throws everything off
And there is one row you will see has the string value in the first column which is the date column and in those cases it seems to happen when the message column has a huge paragraph worth of data, it gets places into other columns instead of just new columns.

**deanhystad** · Aug-11-2022, 03:19 PM

That is not a csv file. Do you have any control over how the file is generated?

cubangt · Aug-11-2022, 03:25 PM

How do you mean its not a CSV file?

When i open it in notepad this is what i get...

Date,Time,User,Message,,,,
5/22/2022,8:44 AM,Don,Oh no,,,,
5/22/2022,8:43 AM,Jenn,Did i tell you my mom, dad, nephew, my sister's husband, and i think my brother have covid
5/22/2022,8:42 AM,Jenn,A little sore,,,,
5/21/2022,10:11 PM,Don,Ok,,,,
5/21/2022,10:11 PM,Jenn,I will talk to you tomorrow... Yes it's in Hulu.... With Jessica Beal,,,,
5/21/2022,10:10 PM,Don,Candy?,,,,
5/21/2022,10:10 PM,Jenn,That's good,,,,
And I've also been told you are with everyone is praying her husband doesn't find out,,,,,,,
5/11/2022,7:29 PM,Jenn,Buttttttt,,,,

And unfortunately i do not have access to the generation of this file, it comes from a 3rd party and I'm just trying to clean it up as best as possible before pulling the data into our side of things.

bowlofred · Aug-11-2022, 03:45 PM

If the extra commas can only be in the message column, then split on comma to get the first columns, then rsplit on comma to get the last columns.

# name, age, notes, zip_code
table = '''Susan,27,works with HR on Zoom calls,02134
Roger,41,Gets coffee, bagels, and sodas for all the meetings,90210
'''

for row in table.splitlines():
    name, age, rest = row.split(",", maxsplit=2)
    notes, zip_code = rest.rsplit(",", maxsplit = 1)

    print(f"Name: {name}.  Notes: {notes}")

Output:Name: Susan.  Notes: works with HR on Zoom calls
Name: Roger.  Notes: Gets coffee, bagels, and sodas for all the meetings

cubangt · Aug-11-2022, 04:27 PM

So i tried to follow your example with my file as the source and get an error:

import csv

with open("sample.csv", "r") as file_in: 
    dataReader = csv.reader(file_in)
    
    for row in dataReader.splitlines():
        date, time, user, rest = row.split(",", maxsplit=2)
        message = rest.rsplit(",", maxsplit = 1)
        
        print(f"Date: {date}.  Message: {message}")

Error:
AttributeError: '_csv.reader' object has no attribute 'splitlines'

**deanhystad** · (This post was last modified: Aug-11-2022, 06:04 PM by deanhystad.)

This is not a csv format file. Do not use csv reader.

import re
import pandas as pd

date_pattern = re.compile("\d+/\d+/\d+")

lines = []
with open("Sample.csv", "r") as f:
    # Get column headers
    columns = next(f).rstrip(",\n").split(",")
    for line in f:
        line = line.rstrip(",\n")
        # Check if line starts with date, time,
        if re.match(date_pattern, line)
            # This is a new row.  Split into columns
            row = line.split(",", maxsplit=len(columns) - 1)
            lines.append(row)
        else:
            # This is a continuation of previous message.
            row = lines[-1]
            row[-1] = f"{row[-1]}\n{line}"

df = pd.DataFrame(lines, columns=columns)
print(df)

cubangt · Aug-11-2022, 06:02 PM

ok, so that i can understand, how are you identifying that is not a true csv format file?

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Get an average of the unique values of a column with group by condition and assign it	klllmmm	0	2,155	Feb-17-2024, 05:53 PM Last Post: klllmmm
	Converting column of values into muliple columns of counts	highland44	0	955	Feb-01-2024, 12:48 AM Last Post: highland44
	__init__() got multiple values for argument 'schema'	dawid294	4	10,109	Jan-03-2024, 09:42 AM Last Post: buran
	PowerBI: Using Python Regex to look for values in column	MarcusR44	1	2,192	Oct-14-2022, 01:03 PM Last Post: ibreeden
	Reshaping a single column in to multiple column using Python	sahar	7	3,746	Jun-20-2022, 12:35 PM Last Post: deanhystad
	df column aggregate and group by multiple columns	SriRajesh	0	1,713	May-06-2022, 02:26 PM Last Post: SriRajesh
	Creating a numpy array from specific values of a spreadsheet column	JulianZ	0	1,785	Apr-19-2022, 07:36 AM Last Post: JulianZ
	How to split file by same values from column from imported CSV file?	Paqqno	5	6,008	Mar-24-2022, 05:25 PM Last Post: Paqqno
	Float Slider - Affecting Values in Column 'Pandas'	planckepoch86	0	2,043	Jan-22-2022, 02:18 PM Last Post: planckepoch86
	Split single column to multiple columns	SriRajesh	1	1,987	Jan-07-2022, 06:43 PM Last Post: jefsummers

How to combine multiple column values into 1?

User Panel Messages

Announcements