Python Forum
Need help formatting dataframe data before saving to CSV
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Need help formatting dataframe data before saving to CSV
#1
I am trying to parse and save my IG messages for stat purposes.

I have the following code that works and saves to a csv file, but when i open the file each line is wrapped in " " like so ("2022-06-29,08:50 AM,Joe,Liked a message"), so when i try to import the data into excel, its not split out correctly, thats issue #1


Issue #2 is how the date is formatted, either im not choosing the correct format below or just missing something, but the date that is placed in the file is formatter as the below code shows yyyy-mm-dd i need to have it formatted in mm/dd/yyyy

Issue #3 same thing with the time, in this case i need to include the seconds in the format, currently the below returns 07:05 PM, but need it to return 07:05:00 PM

Can someone suggest what im doing wrong or offer some samples that i can change or implement in my code to clean up the results some more.
import json
from datetime import datetime
import pandas as pd
import os

f = open('message_1.json')

data = json.load(f)

lv = []

for message in data["messages"]:
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    
    date_val = timestamp.strftime('%Y-%m-%d')
    time_val = timestamp.strftime("%I:%M %p")
       
    if 'content' not in message:
        st = date_val + "," + time_val + "," + message["sender_name"] + "," + ""
        lv.append(st)

    else:
        st = date_val +"," + time_val + "," + message["sender_name"] + "," + message["content"]
        lv.append(st)

df = pd.DataFrame(lv)
Reply
#2
Ok i figured out #2 and #3

    date_val = timestamp.strftime("%m/%d/%Y")
    time_val = timestamp.strftime("%I:%M:%S %p")
How can i prevent the " " wrapped around each line?
Reply
#3
Here is the latest updated code and getting an error:

import json
from datetime import datetime
import pandas as pd
import os
import csv

f = open('message_1.json')

data = json.load(f)

lv = []

for message in data["messages"]:
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    
    date_val = timestamp.strftime("%m/%d/%Y")
    time_val = timestamp.strftime("%I:%M:%S %p")
       
    if 'content' not in message:
        st = date_val + "," + time_val + "," + message["sender_name"] + "," + ""
        lv.append(st)

    else:
        st = date_val +"," + time_val + "," + message["sender_name"] + "," + message["content"]
        lv.append(st)

df = pd.DataFrame(lv)


df.to_csv('igMess.csv', header=None, index=None, quoting=csv.QUOTE_NONE,  mode='a')

f.close()
Error:
Error: need to escape, but no escapechar set
The "quoting=csv.QUOTE_NONE" parameter got rid of the quotes on each line, but the above give that error now..
Reply
#4
Please show the complete error message. And which line in your program causes this error?
Reply
#5
Error:
File "pandas\_libs\writers.pyx", line 72, in pandas._libs.writers.write_csv_rows Error: need to escape, but no escapechar set
Reply
#6
set escapechar, replace your_escapechar with yours

df.to_csv('igMess.csv', header=None, index=None, quoting=csv.QUOTE_NONE, escapechar="your_escapechar", mode='a')
Reply
#7
ok, so doing that adds the delimeter, but since im building the string already with the comma, i now have 2 commas between each column.. so how can i change this line:
        st = date_val + "," + time_val + "," + message["sender_name"] + "," + ""
        lv.append(st)
so that i get my 3 values comma delimited?

This is whats in my csv now with the addition of the escapecharacter param

06/29/2022,,08:50:16 AM,,Joe,,Liked a message
06/29/2022,,08:50:04 AM,,Ron,,Liked a message
06/29/2022,,08:49:58 AM,,Ron,,Liked a message
Reply
#8
How is message_1.json structured? Can you show some lines?
Reply
#9
You are making a 1 column dataframe of strings. If you made a dataframe that had columns for date, time, sender and content you would get a proper csv file from df.to_csv(). Here's a short example:
mport pandas as pd
from random import choice
import time, datetime

senders = ["Me", "You", "Dog named Boo"]
content = [
    "Just wanted to sa Hi!",
    "Coming home soon.",
    "Take me for a walk.",
    "We're out of dog food.",
    None,
]

messages = []
for _ in range(10):
    timestamp = datetime.datetime.now()
    messages.append(
        [
            timestamp.strftime("%m/%d/%Y"),
            timestamp.strftime("%I:%M:%S %p"),
            choice(senders),
            choice(content),
        ]
    )
    time.sleep(1)

df = pd.DataFrame(messages, columns=["Date", "Time", "Sender", "Content"])
print(df)
df.to_csv("test.csv")
The dataframe looks like this:
Output:
Date Time Sender Content 0 06/30/2022 01:57:50 PM Dog named Boo None 1 06/30/2022 01:57:51 PM You Just wanted to sa Hi! 2 06/30/2022 01:57:52 PM You Coming home soon. 3 06/30/2022 01:57:53 PM Dog named Boo Coming home soon. 4 06/30/2022 01:57:54 PM You None 5 06/30/2022 01:57:55 PM You Take me for a walk. 6 06/30/2022 01:57:56 PM Dog named Boo None 7 06/30/2022 01:57:57 PM You Take me for a walk. 8 06/30/2022 01:57:58 PM Me We're out of dog food. 9 06/30/2022 01:58:00 PM Me Just wanted to sa Hi!
And the generated CSV file like this.
Output:
,Date,Time,Sender,Content 0,06/30/2022,01:57:50 PM,Dog named Boo, 1,06/30/2022,01:57:51 PM,You,Just wanted to sa Hi! 2,06/30/2022,01:57:52 PM,You,Coming home soon. 3,06/30/2022,01:57:53 PM,Dog named Boo,Coming home soon. 4,06/30/2022,01:57:54 PM,You, 5,06/30/2022,01:57:55 PM,You,Take me for a walk. 6,06/30/2022,01:57:56 PM,Dog named Boo, 7,06/30/2022,01:57:57 PM,You,Take me for a walk. 8,06/30/2022,01:57:58 PM,Me,We're out of dog food. 9,06/30/2022,01:58:00 PM,Me,Just wanted to sa Hi!
If I want to remove the column or row headers this is easily accomplished using arguments in the to_csv() call.
Reply
#10
{
  "participants": [
    {
      "name": "Ron"
    },
    {
      "name": "Joe"
    }
  ],
  "messages": [
    {
      "sender_name": "Ron",
      "timestamp_ms": 1656510616932,
      "content": "Liked a message",
      "type": "Generic",
      "is_unsent": false,
      "is_taken_down": false,
      "bumped_message_metadata": {
        "is_bumped": false
      }
    },
    {
      "sender_name": "Ron",
      "timestamp_ms": 1656510604636,
      "content": "Liked a message",
      "type": "Generic",
      "is_unsent": false,
      "is_taken_down": false,
      "bumped_message_metadata": {
        "is_bumped": false
      }
    },
    {
      "sender_name": "Ron",
      "timestamp_ms": 1656510598826,
      "content": "Liked a message",
      "type": "Generic",
      "is_unsent": false,
      "is_taken_down": false,
      "bumped_message_metadata": {
        "is_bumped": false
      }
    },
    {
      "sender_name": "Joe",
      "timestamp_ms": 1656510439241,
      "reactions": [
        {
          "reaction": "\u00e2\u009d\u00a4\u00ef\u00b8\u008f",
          "actor": "Ron"
        },
        {
          "reaction": "\u00e2\u009d\u00a4\u00ef\u00b8\u008f",
          "actor": "Ron"
        }
      ],
      "type": "Generic",
      "is_unsent": false,
      "is_taken_down": false,
      "bumped_message_metadata": {
        "bumped_message": "",
        "is_bumped": false
      }
    },
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Filter data into new dataframe as main dataframe is being populated cubangt 8 1,007 Oct-23-2023, 12:43 AM
Last Post: cubangt
  Seeing al the data in a dataframe or numpy.array Led_Zeppelin 1 1,150 Jul-11-2022, 08:54 PM
Last Post: Larz60+
  Problem in saving .xlsm (excel) file using pandas dataframe in python shantanu97 2 4,318 Aug-29-2021, 12:39 PM
Last Post: snippsat
  Reading data to python: turn into list or dataframe hhchenfx 2 5,395 Jun-01-2021, 10:28 AM
Last Post: Larz60+
  How to save json data in a dataframe shantanu97 1 2,166 Apr-15-2021, 02:44 PM
Last Post: klllmmm
  Formatting date in a dataframe WiPi 1 1,744 Jan-06-2021, 11:26 AM
Last Post: WiPi
  Pandas Extract data from two dataframe nio74maz 1 2,188 Dec-26-2020, 09:52 PM
Last Post: nio74maz
  Error when Excelwriter saving a dataframe with datetime datatype with timezone klllmmm 3 13,438 Dec-08-2020, 11:37 AM
Last Post: Larz60+
  saving data from text file to CSV file in python having delimiter as space K11 1 2,411 Sep-11-2020, 06:28 AM
Last Post: bowlofred
  Formatting Data/Time with Pyodbc and openpyxl bearcats6001 0 2,294 Aug-17-2020, 03:44 PM
Last Post: bearcats6001

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020