Posts: 170
Threads: 43
Joined: May 2019
Jun-29-2022, 08:32 PM
(This post was last modified: Jun-29-2022, 08:32 PM by cubangt.)
I am trying to parse and save my IG messages for stat purposes.
I have the following code that works and saves to a csv file, but when i open the file each line is wrapped in " " like so ("2022-06-29,08:50 AM,Joe,Liked a message"), so when i try to import the data into excel, its not split out correctly, thats issue #1
Issue #2 is how the date is formatted, either im not choosing the correct format below or just missing something, but the date that is placed in the file is formatter as the below code shows yyyy-mm-dd i need to have it formatted in mm/dd/yyyy
Issue #3 same thing with the time, in this case i need to include the seconds in the format, currently the below returns 07:05 PM, but need it to return 07:05:00 PM
Can someone suggest what im doing wrong or offer some samples that i can change or implement in my code to clean up the results some more.
import json
from datetime import datetime
import pandas as pd
import os
f = open('message_1.json')
data = json.load(f)
lv = []
for message in data["messages"]:
timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
date_val = timestamp.strftime('%Y-%m-%d')
time_val = timestamp.strftime("%I:%M %p")
if 'content' not in message:
st = date_val + "," + time_val + "," + message["sender_name"] + "," + ""
lv.append(st)
else:
st = date_val +"," + time_val + "," + message["sender_name"] + "," + message["content"]
lv.append(st)
df = pd.DataFrame(lv)
Posts: 170
Threads: 43
Joined: May 2019
Jun-29-2022, 08:45 PM
(This post was last modified: Jun-29-2022, 08:45 PM by cubangt.)
Ok i figured out #2 and #3
date_val = timestamp.strftime("%m/%d/%Y")
time_val = timestamp.strftime("%I:%M:%S %p") How can i prevent the " " wrapped around each line?
Posts: 170
Threads: 43
Joined: May 2019
Jun-29-2022, 09:12 PM
(This post was last modified: Jun-29-2022, 09:12 PM by cubangt.)
Here is the latest updated code and getting an error:
import json
from datetime import datetime
import pandas as pd
import os
import csv
f = open('message_1.json')
data = json.load(f)
lv = []
for message in data["messages"]:
timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
date_val = timestamp.strftime("%m/%d/%Y")
time_val = timestamp.strftime("%I:%M:%S %p")
if 'content' not in message:
st = date_val + "," + time_val + "," + message["sender_name"] + "," + ""
lv.append(st)
else:
st = date_val +"," + time_val + "," + message["sender_name"] + "," + message["content"]
lv.append(st)
df = pd.DataFrame(lv)
df.to_csv('igMess.csv', header=None, index=None, quoting=csv.QUOTE_NONE, mode='a')
f.close() Error: Error: need to escape, but no escapechar set
The "quoting=csv.QUOTE_NONE" parameter got rid of the quotes on each line, but the above give that error now..
Posts: 582
Threads: 1
Joined: Aug 2019
Please show the complete error message. And which line in your program causes this error?
Posts: 170
Threads: 43
Joined: May 2019
Error: File "pandas\_libs\writers.pyx", line 72, in pandas._libs.writers.write_csv_rows
Error: need to escape, but no escapechar set
Posts: 1,023
Threads: 16
Joined: Dec 2016
Jun-30-2022, 04:54 PM
(This post was last modified: Jun-30-2022, 04:55 PM by Axel_Erfurt.)
set escapechar, replace your_escapechar with yours
df.to_csv('igMess.csv', header=None, index=None, quoting=csv.QUOTE_NONE, escapechar="your_escapechar", mode='a')
Posts: 170
Threads: 43
Joined: May 2019
Jun-30-2022, 06:29 PM
(This post was last modified: Jun-30-2022, 06:29 PM by cubangt.)
ok, so doing that adds the delimeter, but since im building the string already with the comma, i now have 2 commas between each column.. so how can i change this line:
st = date_val + "," + time_val + "," + message["sender_name"] + "," + ""
lv.append(st) so that i get my 3 values comma delimited?
This is whats in my csv now with the addition of the escapecharacter param
06/29/2022,,08:50:16 AM,,Joe,,Liked a message
06/29/2022,,08:50:04 AM,,Ron,,Liked a message
06/29/2022,,08:49:58 AM,,Ron,,Liked a message
Posts: 1,023
Threads: 16
Joined: Dec 2016
How is message_1.json structured? Can you show some lines?
Posts: 6,775
Threads: 20
Joined: Feb 2020
You are making a 1 column dataframe of strings. If you made a dataframe that had columns for date, time, sender and content you would get a proper csv file from df.to_csv(). Here's a short example:
mport pandas as pd
from random import choice
import time, datetime
senders = ["Me", "You", "Dog named Boo"]
content = [
"Just wanted to sa Hi!",
"Coming home soon.",
"Take me for a walk.",
"We're out of dog food.",
None,
]
messages = []
for _ in range(10):
timestamp = datetime.datetime.now()
messages.append(
[
timestamp.strftime("%m/%d/%Y"),
timestamp.strftime("%I:%M:%S %p"),
choice(senders),
choice(content),
]
)
time.sleep(1)
df = pd.DataFrame(messages, columns=["Date", "Time", "Sender", "Content"])
print(df)
df.to_csv("test.csv") The dataframe looks like this:
Output: Date Time Sender Content
0 06/30/2022 01:57:50 PM Dog named Boo None
1 06/30/2022 01:57:51 PM You Just wanted to sa Hi!
2 06/30/2022 01:57:52 PM You Coming home soon.
3 06/30/2022 01:57:53 PM Dog named Boo Coming home soon.
4 06/30/2022 01:57:54 PM You None
5 06/30/2022 01:57:55 PM You Take me for a walk.
6 06/30/2022 01:57:56 PM Dog named Boo None
7 06/30/2022 01:57:57 PM You Take me for a walk.
8 06/30/2022 01:57:58 PM Me We're out of dog food.
9 06/30/2022 01:58:00 PM Me Just wanted to sa Hi!
And the generated CSV file like this.
Output: ,Date,Time,Sender,Content
0,06/30/2022,01:57:50 PM,Dog named Boo,
1,06/30/2022,01:57:51 PM,You,Just wanted to sa Hi!
2,06/30/2022,01:57:52 PM,You,Coming home soon.
3,06/30/2022,01:57:53 PM,Dog named Boo,Coming home soon.
4,06/30/2022,01:57:54 PM,You,
5,06/30/2022,01:57:55 PM,You,Take me for a walk.
6,06/30/2022,01:57:56 PM,Dog named Boo,
7,06/30/2022,01:57:57 PM,You,Take me for a walk.
8,06/30/2022,01:57:58 PM,Me,We're out of dog food.
9,06/30/2022,01:58:00 PM,Me,Just wanted to sa Hi!
If I want to remove the column or row headers this is easily accomplished using arguments in the to_csv() call.
Posts: 170
Threads: 43
Joined: May 2019
{
"participants": [
{
"name": "Ron"
},
{
"name": "Joe"
}
],
"messages": [
{
"sender_name": "Ron",
"timestamp_ms": 1656510616932,
"content": "Liked a message",
"type": "Generic",
"is_unsent": false,
"is_taken_down": false,
"bumped_message_metadata": {
"is_bumped": false
}
},
{
"sender_name": "Ron",
"timestamp_ms": 1656510604636,
"content": "Liked a message",
"type": "Generic",
"is_unsent": false,
"is_taken_down": false,
"bumped_message_metadata": {
"is_bumped": false
}
},
{
"sender_name": "Ron",
"timestamp_ms": 1656510598826,
"content": "Liked a message",
"type": "Generic",
"is_unsent": false,
"is_taken_down": false,
"bumped_message_metadata": {
"is_bumped": false
}
},
{
"sender_name": "Joe",
"timestamp_ms": 1656510439241,
"reactions": [
{
"reaction": "\u00e2\u009d\u00a4\u00ef\u00b8\u008f",
"actor": "Ron"
},
{
"reaction": "\u00e2\u009d\u00a4\u00ef\u00b8\u008f",
"actor": "Ron"
}
],
"type": "Generic",
"is_unsent": false,
"is_taken_down": false,
"bumped_message_metadata": {
"bumped_message": "",
"is_bumped": false
}
},
|