Need help formatting dataframe data before saving to CSV

cubangt · (This post was last modified: Jun-29-2022, 08:32 PM by cubangt.)

I am trying to parse and save my IG messages for stat purposes.

I have the following code that works and saves to a csv file, but when i open the file each line is wrapped in " " like so ("2022-06-29,08:50 AM,Joe,Liked a message"), so when i try to import the data into excel, its not split out correctly, thats issue #1

Issue #2 is how the date is formatted, either im not choosing the correct format below or just missing something, but the date that is placed in the file is formatter as the below code shows yyyy-mm-dd i need to have it formatted in mm/dd/yyyy

Issue #3 same thing with the time, in this case i need to include the seconds in the format, currently the below returns 07:05 PM, but need it to return 07:05:00 PM

Can someone suggest what im doing wrong or offer some samples that i can change or implement in my code to clean up the results some more.

import json
from datetime import datetime
import pandas as pd
import os

f = open('message_1.json')

data = json.load(f)

lv = []

for message in data["messages"]:
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    
    date_val = timestamp.strftime('%Y-%m-%d')
    time_val = timestamp.strftime("%I:%M %p")
       
    if 'content' not in message:
        st = date_val + "," + time_val + "," + message["sender_name"] + "," + ""
        lv.append(st)

    else:
        st = date_val +"," + time_val + "," + message["sender_name"] + "," + message["content"]
        lv.append(st)

df = pd.DataFrame(lv)

cubangt · (This post was last modified: Jun-29-2022, 08:45 PM by cubangt.)

Ok i figured out #2 and #3

    date_val = timestamp.strftime("%m/%d/%Y")
    time_val = timestamp.strftime("%I:%M:%S %p")

How can i prevent the " " wrapped around each line?

cubangt · (This post was last modified: Jun-29-2022, 09:12 PM by cubangt.)

Here is the latest updated code and getting an error:

import json
from datetime import datetime
import pandas as pd
import os
import csv

f = open('message_1.json')

data = json.load(f)

lv = []

for message in data["messages"]:
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    
    date_val = timestamp.strftime("%m/%d/%Y")
    time_val = timestamp.strftime("%I:%M:%S %p")
       
    if 'content' not in message:
        st = date_val + "," + time_val + "," + message["sender_name"] + "," + ""
        lv.append(st)

    else:
        st = date_val +"," + time_val + "," + message["sender_name"] + "," + message["content"]
        lv.append(st)

df = pd.DataFrame(lv)


df.to_csv('igMess.csv', header=None, index=None, quoting=csv.QUOTE_NONE,  mode='a')

f.close()

Error:
Error: need to escape, but no escapechar set

The "quoting=csv.QUOTE_NONE" parameter got rid of the quotes on each line, but the above give that error now..

ibreeden · Jun-30-2022, 07:10 AM

Please show the complete error message. And which line in your program causes this error?

cubangt · Jun-30-2022, 01:19 PM

Error:  File "pandas\_libs\writers.pyx", line 72, in pandas._libs.writers.write_csv_rows

Error: need to escape, but no escapechar set

Axel_Erfurt · (This post was last modified: Jun-30-2022, 04:55 PM by Axel_Erfurt.)

set escapechar, replace your_escapechar with yours

df.to_csv('igMess.csv', header=None, index=None, quoting=csv.QUOTE_NONE, escapechar="your_escapechar", mode='a')

cubangt · (This post was last modified: Jun-30-2022, 06:29 PM by cubangt.)

ok, so doing that adds the delimeter, but since im building the string already with the comma, i now have 2 commas between each column.. so how can i change this line:

        st = date_val + "," + time_val + "," + message["sender_name"] + "," + ""
        lv.append(st)

so that i get my 3 values comma delimited?

This is whats in my csv now with the addition of the escapecharacter param

06/29/2022,,08:50:16 AM,,Joe,,Liked a message
06/29/2022,,08:50:04 AM,,Ron,,Liked a message
06/29/2022,,08:49:58 AM,,Ron,,Liked a message

Axel_Erfurt · Jun-30-2022, 06:50 PM

How is message_1.json structured? Can you show some lines?

**deanhystad** · Jun-30-2022, 06:58 PM

You are making a 1 column dataframe of strings. If you made a dataframe that had columns for date, time, sender and content you would get a proper csv file from df.to_csv(). Here's a short example:

mport pandas as pd
from random import choice
import time, datetime

senders = ["Me", "You", "Dog named Boo"]
content = [
    "Just wanted to sa Hi!",
    "Coming home soon.",
    "Take me for a walk.",
    "We're out of dog food.",
    None,
]

messages = []
for _ in range(10):
    timestamp = datetime.datetime.now()
    messages.append(
        [
            timestamp.strftime("%m/%d/%Y"),
            timestamp.strftime("%I:%M:%S %p"),
            choice(senders),
            choice(content),
        ]
    )
    time.sleep(1)

df = pd.DataFrame(messages, columns=["Date", "Time", "Sender", "Content"])
print(df)
df.to_csv("test.csv")

The dataframe looks like this:

Output:         Date         Time         Sender                 Content
0  06/30/2022  01:57:50 PM  Dog named Boo                    None
1  06/30/2022  01:57:51 PM            You   Just wanted to sa Hi!
2  06/30/2022  01:57:52 PM            You       Coming home soon.
3  06/30/2022  01:57:53 PM  Dog named Boo       Coming home soon.
4  06/30/2022  01:57:54 PM            You                    None
5  06/30/2022  01:57:55 PM            You     Take me for a walk.
6  06/30/2022  01:57:56 PM  Dog named Boo                    None
7  06/30/2022  01:57:57 PM            You     Take me for a walk.
8  06/30/2022  01:57:58 PM             Me  We're out of dog food.
9  06/30/2022  01:58:00 PM             Me   Just wanted to sa Hi!

And the generated CSV file like this.

Output:,Date,Time,Sender,Content
0,06/30/2022,01:57:50 PM,Dog named Boo,
1,06/30/2022,01:57:51 PM,You,Just wanted to sa Hi!
2,06/30/2022,01:57:52 PM,You,Coming home soon.
3,06/30/2022,01:57:53 PM,Dog named Boo,Coming home soon.
4,06/30/2022,01:57:54 PM,You,
5,06/30/2022,01:57:55 PM,You,Take me for a walk.
6,06/30/2022,01:57:56 PM,Dog named Boo,
7,06/30/2022,01:57:57 PM,You,Take me for a walk.
8,06/30/2022,01:57:58 PM,Me,We're out of dog food.
9,06/30/2022,01:58:00 PM,Me,Just wanted to sa Hi!

If I want to remove the column or row headers this is easily accomplished using arguments in the to_csv() call.

cubangt · Jun-30-2022, 06:58 PM

{
  "participants": [
    {
      "name": "Ron"
    },
    {
      "name": "Joe"
    }
  ],
  "messages": [
    {
      "sender_name": "Ron",
      "timestamp_ms": 1656510616932,
      "content": "Liked a message",
      "type": "Generic",
      "is_unsent": false,
      "is_taken_down": false,
      "bumped_message_metadata": {
        "is_bumped": false
      }
    },
    {
      "sender_name": "Ron",
      "timestamp_ms": 1656510604636,
      "content": "Liked a message",
      "type": "Generic",
      "is_unsent": false,
      "is_taken_down": false,
      "bumped_message_metadata": {
        "is_bumped": false
      }
    },
    {
      "sender_name": "Ron",
      "timestamp_ms": 1656510598826,
      "content": "Liked a message",
      "type": "Generic",
      "is_unsent": false,
      "is_taken_down": false,
      "bumped_message_metadata": {
        "is_bumped": false
      }
    },
    {
      "sender_name": "Joe",
      "timestamp_ms": 1656510439241,
      "reactions": [
        {
          "reaction": "\u00e2\u009d\u00a4\u00ef\u00b8\u008f",
          "actor": "Ron"
        },
        {
          "reaction": "\u00e2\u009d\u00a4\u00ef\u00b8\u008f",
          "actor": "Ron"
        }
      ],
      "type": "Generic",
      "is_unsent": false,
      "is_taken_down": false,
      "bumped_message_metadata": {
        "bumped_message": "",
        "is_bumped": false
      }
    },

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Confused by the different ways of extracting data in DataFrame	leea2024	1	685	Aug-17-2024, 01:34 PM Last Post: deanhystad
	Filter data into new dataframe as main dataframe is being populated	cubangt	8	3,129	Oct-23-2023, 12:43 AM Last Post: cubangt
	Seeing al the data in a dataframe or numpy.array	Led_Zeppelin	1	1,725	Jul-11-2022, 08:54 PM Last Post: Larz60+
	Problem in saving .xlsm (excel) file using pandas dataframe in python	shantanu97	2	6,185	Aug-29-2021, 12:39 PM Last Post: snippsat
	Reading data to python: turn into list or dataframe	hhchenfx	2	6,799	Jun-01-2021, 10:28 AM Last Post: Larz60+
	How to save json data in a dataframe	shantanu97	1	2,851	Apr-15-2021, 02:44 PM Last Post: klllmmm
	Formatting date in a dataframe	WiPi	1	2,414	Jan-06-2021, 11:26 AM Last Post: WiPi
	Pandas Extract data from two dataframe	nio74maz	1	2,746	Dec-26-2020, 09:52 PM Last Post: nio74maz
	Error when Excelwriter saving a dataframe with datetime datatype with timezone	klllmmm	3	20,039	Dec-08-2020, 11:37 AM Last Post: Larz60+
	saving data from text file to CSV file in python having delimiter as space	K11	1	3,249	Sep-11-2020, 06:28 AM Last Post: bowlofred

Need help formatting dataframe data before saving to CSV

User Panel Messages

Announcements