Python Forum
How to combine multiple column values into 1?
#11
Your file is not valid CSV because it does not use its delimiter consistently. Sometimes a comma is a delimiter, and other times it is just part of a string. You can include the delimiter in a string, but the string has to be quoted.

Another problem with your file is that it doesn't have a tabular format. Sometimes a line is a new row, other times it is a continuation of the previous row. I think this can also be fixed by quoting the string.

My guess is your file is a botched attempt at making a CSV format file. It is easy to botch writing a CSV file. Really easy.
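To illustrate the quoting point, here is a minimal sketch using only the standard csv module. The sample rows are made up for the example. With the writer's default quoting, a field containing a comma or a newline gets wrapped in quotes and reads back as a single field.

```python
import csv
import io

# Made-up sample rows: one field contains a comma, another contains a newline.
rows = [
    ["02/17/2024", "05:53:00 PM", "Alice", "Hello, world"],
    ["02/17/2024", "05:54:10 PM", "Bob", "line one\nline two"],
]

# csv.writer defaults to QUOTE_MINIMAL, so only the fields that need quotes get them.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back yields the same four fields per row.
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed == rows)
```

The second record spans two physical lines in the output, but the quotes tell the reader it is still one row.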
Reply
#12
OK, that makes sense now, thanks.

After asking around, it turns out that even though we get the data from a 3rd party, another team is the one translating that data into the "csv" format.. lol

So I asked for the code, and this is what they are doing, which would explain the problems you describe. Apparently the data from the 3rd party is a JSON file, which another team here tries to convert and clean up into CSV format (apparently not correctly). That file is then sent to my team, and the code above was my attempt to clean it. So maybe the first team is the one that needs to do better at parsing that JSON file.

import json
from datetime import datetime
import pandas as pd
import os
import csv

fileNumber = '2'

f = open('message_'+ fileNumber+'.json')

data = json.load(f)

lv = []

for message in data["messages"]:
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    
    lv.append([
        timestamp.strftime("%m/%d/%Y"),
        timestamp.strftime("%I:%M:%S %p"),
        message["sender_name"],
        message["content"] if "content" in message else "Media Link"])  

df = pd.DataFrame(lv, columns=["Date", "Time", "Sender", "Content"])

df.to_csv('igMess_'+ fileNumber+'_.csv', header=None, index=None, quoting=csv.QUOTE_NONE,  escapechar=",", mode='a')

f.close()
Can the example provided above be integrated into this "parsing" code to better clean the data and create a legit CSV?
Reply
#13
That does not include the code that creates the CSV file.

DataFrame.to_csv should automatically quote strings that contain commas or newlines. Check df and see if it has the right number of columns. If df is messed up, check lv. If lv is messed up, take a look at message.
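A rough way to run that check, assuming made-up rows in place of the real lv: build the DataFrame, look at its shape, then write it with the default quoting and read it back.

```python
import io

import pandas as pd

# Made-up stand-in for lv; the real rows come from the JSON messages.
lv = [
    ["02/17/2024", "05:53:00 PM", "Alice", "Hello, world"],
    ["02/17/2024", "05:54:10 PM", "Bob", "line one\nline two"],
]

df = pd.DataFrame(lv, columns=["Date", "Time", "Sender", "Content"])
print(df.shape)  # expect 4 columns if lv is well formed

# With no path argument, to_csv returns the CSV text; default quoting is used.
csv_text = df.to_csv(index=False)
print(csv_text)

# Reading the text back should give the same number of rows and columns.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.shape)
```

If the shapes match here but not with the real data, the problem is upstream of to_csv, or in the arguments passed to it.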
Reply
#14
I have updated the code to show the save portion.
Reply
#15
They should not be using "QUOTE_NONE". The default is "QUOTE_MINIMAL" which is what you want. This will not quote strings unless they contain a delimiter (comma is the default delimiter) or a newline. They should not be using escapechar either.

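mode="a" is also suspect. This will append to an existing file, so running the script twice writes the same rows twice.
header=None is probably wrong too. It leaves the column names out of the file.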
index=None. Hey, they got that one right!
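Putting those suggestions together, a sketch of the conversion might look like this. The helper name messages_to_frame and the sample data are made up for the example, and whether you actually want a header row and a fresh file each run is an assumption about how the output is used.

```python
import io
from datetime import datetime

import pandas as pd


def messages_to_frame(data):
    """Build the Date/Time/Sender/Content table from the parsed JSON (hypothetical helper)."""
    rows = []
    for message in data["messages"]:
        timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
        rows.append([
            timestamp.strftime("%m/%d/%Y"),
            timestamp.strftime("%I:%M:%S %p"),
            message["sender_name"],
            # Messages without text are treated as media, as in the original code.
            message.get("content", "Media Link"),
        ])
    return pd.DataFrame(rows, columns=["Date", "Time", "Sender", "Content"])


# Made-up sample standing in for json.load(f).
sample = {
    "messages": [
        {"timestamp_ms": 1700000000000, "sender_name": "Alice", "content": "Hi, how are you?"},
        {"timestamp_ms": 1700000060000, "sender_name": "Bob"},
    ]
}

df = messages_to_frame(sample)

# No quoting= or escapechar= arguments: the default QUOTE_MINIMAL handles commas and newlines.
out = io.StringIO()
df.to_csv(out, index=False)
csv_text = out.getvalue()
print(csv_text)
```

To write to disk instead, pass a file name to to_csv in place of the StringIO buffer. The default mode overwrites the file rather than appending to it.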
Reply
#16
OMG, removing just quoting=csv.QUOTE_NONE from the save call fixed the output file.

So all this trouble for my team came down to one parameter that was added to the save settings.

We will know for sure next week, when we get another JSON file and they run it without that parameter.

Thank you for the explanations and suggestions.
Reply

