Python Forum
How to combine multiple column values into 1?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to combine multiple column values into 1?
So ive looked around and found some steps on how to accomplish this, but seem like alot of code for this, maybe im not searching for the right thing, maybe it goes by another name, but i am using pandas and working with CSV and excel files in other scripts..

So here is what i have, just looking for suggestions on the proper or right coding to use to accomplish.

So i have a CSV file that has 4 columns(legit columns)

BUT my problem is that if the "Message" string value has their own comma in the value, then the CSV ends up with additional columns.. which then prevents the data from being imported correctly, unless we consolidate those into the main "message" column..

So what im trying to do is save ourselves the manual step of copying over those rare instances of extra columns into the main message column.

Say the CSV has 3000 rows.. and MAYBE about 20 rows have those extra columns.. sometimes the row may only have 1 extra column and other times there may be 5 extra columns for a row..

How can i run a script against this file to check for the extra columns, if found, then copy those column values into the main message column so that we have 1 message string value?
Please provide samples of the data with and without the extra commas.
yea im working on finding a file that has not been cleaned up yet so i can upload.
ok here is a small sample

This is literally a real example on how we get he file because there are extra commas in the message value which then throws everything off
And there is one row you will see has the string value in the first column which is the date column and in those cases it seems to happen when the message column has a huge paragraph worth of data, it gets places into other columns instead of just new columns.

Attached Files

.csv   Sample.csv (Size: 556 bytes / Downloads: 18)
That is not a csv file. Do you have any control over how the file is generated?
How do you mean its not a CSV file?

When i open it in notepad this is what i get...

5/22/2022,8:44 AM,Don,Oh no,,,,
5/22/2022,8:43 AM,Jenn,Did i tell you my mom, dad, nephew, my sister's husband, and i think my brother have covid
5/22/2022,8:42 AM,Jenn,A little sore,,,,
5/21/2022,10:11 PM,Don,Ok,,,,
5/21/2022,10:11 PM,Jenn,I will talk to you tomorrow... Yes it's in Hulu.... With Jessica Beal,,,,
5/21/2022,10:10 PM,Don,Candy?,,,,
5/21/2022,10:10 PM,Jenn,That's good,,,,
And I've also been told you are with everyone is praying her husband doesn't find out,,,,,,,
5/11/2022,7:29 PM,Jenn,Buttttttt,,,,

And unfortunately i do not have access to the generation of this file, it comes from a 3rd party and I'm just trying to clean it up as best as possible before pulling the data into our side of things.
If the extra commas can only be in the message column, then split on comma to get the first columns, then rsplit on comma to get the last columns.

# name, age, notes, zip_code
table = '''Susan,27,works with HR on Zoom calls,02134
Roger,41,Gets coffee, bagels, and sodas for all the meetings,90210

for row in table.splitlines():
    name, age, rest = row.split(",", maxsplit=2)
    notes, zip_code = rest.rsplit(",", maxsplit = 1)

    print(f"Name: {name}.  Notes: {notes}")
Name: Susan. Notes: works with HR on Zoom calls Name: Roger. Notes: Gets coffee, bagels, and sodas for all the meetings
So i tried to follow your example with my file as the source and get an error:

import csv

with open("sample.csv", "r") as file_in: 
    dataReader = csv.reader(file_in)
    for row in dataReader.splitlines():
        date, time, user, rest = row.split(",", maxsplit=2)
        message = rest.rsplit(",", maxsplit = 1)
        print(f"Date: {date}.  Message: {message}")
AttributeError: '_csv.reader' object has no attribute 'splitlines'
This is not a csv format file. Do not use csv reader.

import re
import pandas as pd

date_pattern = re.compile("\d+/\d+/\d+")

lines = []
with open("Sample.csv", "r") as f:
    # Get column headers
    columns = next(f).rstrip(",\n").split(",")
    for line in f:
        line = line.rstrip(",\n")
        # Check if line starts with date, time,
        if re.match(date_pattern, line)
            # This is a new row.  Split into columns
            row = line.split(",", maxsplit=len(columns) - 1)
            # This is a continuation of previous message.
            row = lines[-1]
            row[-1] = f"{row[-1]}\n{line}"

df = pd.DataFrame(lines, columns=columns)
ok, so that i can understand, how are you identifying that is not a true csv format file?

Possibly Related Threads…
Thread Author Replies Views Last Post
  PowerBI: Using Python Regex to look for values in column MarcusR44 1 230 Oct-14-2022, 01:03 PM
Last Post: ibreeden
  Reshaping a single column in to multiple column using Python sahar 7 822 Jun-20-2022, 12:35 PM
Last Post: deanhystad
  df column aggregate and group by multiple columns SriRajesh 0 459 May-06-2022, 02:26 PM
Last Post: SriRajesh
  Creating a numpy array from specific values of a spreadsheet column JulianZ 0 429 Apr-19-2022, 07:36 AM
Last Post: JulianZ
  How to split file by same values from column from imported CSV file? Paqqno 5 1,020 Mar-24-2022, 05:25 PM
Last Post: Paqqno
  Float Slider - Affecting Values in Column 'Pandas' planckepoch86 0 698 Jan-22-2022, 02:18 PM
Last Post: planckepoch86
  Split single column to multiple columns SriRajesh 1 702 Jan-07-2022, 06:43 PM
Last Post: jefsummers
  How to combine multiple rows of strings into one using pandas? shantanu97 1 2,168 Aug-22-2021, 05:26 AM
Last Post: klllmmm
  Function - Return multiple values tester_V 10 2,917 Jun-02-2021, 05:34 AM
Last Post: tester_V
  Ploting single column with multiple category drunkenneo 1 1,502 May-26-2021, 04:51 PM
Last Post: jefsummers

Forum Jump:

User Panel Messages

Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020