Posts: 170
Threads: 43
Joined: May 2019
So I'm playing around with parsing a JSON file in Python.
I'm able to read in the file and print it to the console, but now I want to extract 3 values from each "section" (not sure what the proper terminology is).
Here is an example of the data structure:
"messages": [
    {
        "sender_name": "Me",
        "timestamp_ms": 1653260883178,
        "content": "There are plenty of leftovers",
        "type": "Generic",
        "is_unsent": false,
        "is_taken_down": false,
        "bumped_message_metadata": {
            "bumped_message": "There are plenty of leftovers",
            "is_bumped": false
        }
    },
    {
        "sender_name": "Me",
        "timestamp_ms": 1653260872966,
        "content": "Watching the new scream movie",
        "type": "Generic",
        "is_unsent": false,
        "is_taken_down": false,
        "bumped_message_metadata": {
            "bumped_message": "Watching the new scream movie",
            "is_bumped": false
        }
    },
I basically need to pull out only the first 3 key/value pairs from each message and save them into a CSV file:
"sender_name": "Me",
"timestamp_ms": 1653260883178,
"content": "There are plenty of leftovers",
Right now I have this basic code, but I need to figure out how to get inside the "messages" section and pull out those 3 values per group.
import json
f = open('message_1.json')
data = json.load(f)
# Each item in data['messages'] is one message dict
for i in data['messages']:
    print(i)
f.close()
Posts: 6,779
Threads: 20
Joined: Feb 2020
import json
from datetime import datetime
json_str = """
{
"messages": [
{
"sender_name": "Me",
"timestamp_ms": 1653260883178,
"content": "There are plenty of leftovers",
"type": "Generic",
"is_unsent": false,
"is_taken_down": false,
"bumped_message_metadata": {
"bumped_message": "There are plenty of leftovers",
"is_bumped": false
}
},
{
"sender_name": "Me",
"timestamp_ms": 1653260872966,
"content": "Watching the new scream movie",
"type": "Generic",
"is_unsent": false,
"is_taken_down": false,
"bumped_message_metadata": {
"bumped_message": "Watching the new scream movie",
"is_bumped": false
}
}
]
}
"""
data = json.loads(json_str)
for message in data["messages"]:
    # timestamp_ms is in milliseconds; fromtimestamp() expects seconds
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    print(
        f"""{timestamp} from {message["sender_name"]}\n{message["content"]}\n"""
    )
Output:
2022-05-22 18:08:03.178000 from Me
There are plenty of leftovers
2022-05-22 18:07:52.966000 from Me
Watching the new scream movie
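Since the original question was about getting these three values into a CSV file rather than onto the console, the same loop can feed the standard library's csv module. This is only a minimal sketch, assuming the export has the shape shown above. The helper name messages_to_rows and the output filename messages.csv are made up for illustration, and the timestamps are rendered in UTC here purely so the result is reproducible (the code above uses local time).

```python
import csv
import json
from datetime import datetime, timezone

# Sample input in the same shape as the export shown above (trimmed).
json_str = """
{"messages": [
  {"sender_name": "Me", "timestamp_ms": 1653260883178,
   "content": "There are plenty of leftovers", "type": "Generic"},
  {"sender_name": "Me", "timestamp_ms": 1653260872966,
   "content": "Watching the new scream movie", "type": "Generic"}
]}
"""

def messages_to_rows(data):
    """Return [timestamp, sender, content] per message (hypothetical helper)."""
    rows = []
    for message in data["messages"]:
        # timestamp_ms is milliseconds since the epoch; UTC is an assumption.
        ts = datetime.fromtimestamp(message["timestamp_ms"] / 1000, tz=timezone.utc)
        rows.append([
            ts.strftime("%Y-%m-%d %H:%M:%S"),
            message["sender_name"],
            message.get("content", ""),  # empty string if the key is missing
        ])
    return rows

rows = messages_to_rows(json.loads(json_str))

# newline="" is what the csv docs recommend when opening a file for csv.writer.
with open("messages.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "sender_name", "content"])
    writer.writerows(rows)
```

Using message.get("content", "") keeps the row at three columns even for messages that have no "content" key.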
Posts: 170
Threads: 43
Joined: May 2019
So here is what I have, and it seems to work. Now I'm trying to save this to a CSV so I can test importing it into my Excel report.
import json
from datetime import datetime
f = open('messages.json')
data = json.load(f)
for message in data["messages"]:
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    # Some messages have no "content" key, so check before using it
    if 'content' not in message:
        print(
            f"""{timestamp} from {message["sender_name"]}\n"""
        )
    else:
        print(
            f"""{timestamp} from {message["sender_name"]}\n{message["content"]}\n"""
        )
f.close()
Posts: 170
Threads: 43
Joined: May 2019
May-25-2022, 07:38 PM
(This post was last modified: May-25-2022, 07:38 PM by cubangt.)
What am I doing wrong?
import json
from datetime import datetime
import pandas as pd
f = open('message_1.json')
data = json.load(f)
for message in data["messages"]:
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    if 'content' not in message:
        rw = pd.DataFrame([timestamp,message["sender_name"],pd.NA], columns=['Date', 'Name', 'Comment'])
    else:
        rw = pd.DataFrame([timestamp,message["sender_name"], message["content"]], columns=['Date', 'Name', 'Comment'])
    rw.to_csv('igMess.csv',columns=["Date", "Name", "Comment"], header=None, index=None, mode='a')
f.close()
I get this error:
Error: ValueError: Shape of passed values is (3, 1), indices imply (3, 3)
Posts: 170
Threads: 43
Joined: May 2019
OK, got past the error and a file was generated, BUT I'm not sure how to split out the timestamp so that I have the date and the time in separate columns in the CSV.
import json
from datetime import datetime
import pandas as pd
f = open('message_1.json')
data = json.load(f)
lv = []
for message in data["messages"]:
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    date_val = timestamp.strftime('%Y-%m-%d')
    if 'content' not in message:
        st = date_val + "," + message["sender_name"] + "," + ""
        lv.append(st)
    else:
        st = date_val + "," + message["sender_name"] + "," + message["content"]
        lv.append(st)
df = pd.DataFrame(lv)
df.to_csv('igMess.csv', header=None, index=None, mode='a')
f.close()
The file that was generated when the above was run contained this output:
"2022-05-22,Me,There are plenty of leftovers"
"2022-05-22,Me,Watching the new scream movie"
The expected results should look like this:
5/17/22, 5:28 PM,Me: There are plenty of leftovers
5/17/22, 5:28 PM,Me: Watching the new scream movie
If you notice, the generated results have "" around each row and are missing the 5:28 PM time.
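One way to get separate date and time values in the style of the expected results is to format the same datetime twice. The sketch below is only an illustration: the helper name split_timestamp is made up, and it defaults to UTC so the result is reproducible (pass tz=None to get local time, as the code above does). It builds the strings from the datetime's attributes because strftime zero-pads the month, day and hour, and the flags that drop the padding differ between platforms.

```python
from datetime import datetime, timezone

def split_timestamp(timestamp_ms, tz=timezone.utc):
    """Return (date, time) strings such as ('5/22/22', '11:08 PM').

    Hypothetical helper; tz=None would use the machine's local time zone.
    """
    ts = datetime.fromtimestamp(timestamp_ms / 1000, tz=tz)
    # Month and day without zero padding, two-digit year.
    date_val = f"{ts.month}/{ts.day}/{ts:%y}"
    # 12-hour clock without zero padding: 0 and 12 both display as 12.
    hour12 = ts.hour % 12 or 12
    suffix = "AM" if ts.hour < 12 else "PM"
    time_val = f"{hour12}:{ts:%M} {suffix}"
    return date_val, time_val

date_val, time_val = split_timestamp(1653260883178)
print(date_val, time_val)
```

The two strings can then go into the row as separate columns instead of one combined timestamp.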
Posts: 170
Threads: 43
Joined: May 2019
OK, got the time added and working, so now the only question is how to remove the " " around each row in the file.
Here is the currently working code:
import json
from datetime import datetime
import pandas as pd
f = open('message_1.json')
data = json.load(f)
lv = []
for message in data["messages"]:
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    date_val = timestamp.strftime('%Y-%m-%d')
    time_val = timestamp.strftime("%I:%M %p")
    if 'content' not in message:
        st = date_val + "," + time_val + "," + message["sender_name"] + "," + ""
        lv.append(st)
    else:
        st = date_val + "," + time_val + "," + message["sender_name"] + "," + message["content"]
        lv.append(st)
df = pd.DataFrame(lv)
df.to_csv('igMess.csv', header=None, index=None, mode='a')
f.close()
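The quotes most likely come from the fact that each row is built as one pre-joined string, so the DataFrame has a single column whose value contains commas, and a CSV writer with the default minimal quoting has to quote such a field. The sketch below shows the same effect with the standard library's csv module (pandas' to_csv uses the same minimal-quoting default); the helper name to_csv_text is made up for illustration.

```python
import csv
import io

def to_csv_text(rows):
    """Write rows with csv.writer's default (minimal) quoting; return the text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

# One pre-joined string per row: a single field that contains commas,
# so the writer wraps it in quotes.
joined = to_csv_text([["2022-05-22,06:08 PM,Me,There are plenty of leftovers"]])

# One list element per column: no field contains a comma, so nothing is quoted.
split = to_csv_text([["2022-05-22", "06:08 PM", "Me", "There are plenty of leftovers"]])

print(joined)  # "2022-05-22,06:08 PM,Me,There are plenty of leftovers"
print(split)   # 2022-05-22,06:08 PM,Me,There are plenty of leftovers
```

Appending a list of four values per message, rather than one joined string, lets the writer place the commas itself.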
Posts: 170
Threads: 43
Joined: May 2019
So I have been running this a few times since the above post and found a few things I'm hoping I can fix in the code. I noticed that if a message is very large, it gets split up in my CSV file. I only want my CSV to have 4 columns.
Here is the current code that does work; it just needs some adjustments to make sure my "Content" column is all-inclusive and not split out. When I ran this code today against the newest JSON file, I found data in 4 or 6 other columns: certain rows had data spread across columns A through M.
import json
from datetime import datetime
import pandas as pd
import os
import csv
f = open('message_1.json')
data = json.load(f)
lv = []
for message in data["messages"]:
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    lv.append([
        timestamp.strftime("%m/%d/%Y"),
        timestamp.strftime("%I:%M:%S %p"),
        message["sender_name"],
        message["content"] if "content" in message else "Media Link"])
df = pd.DataFrame(lv, columns=["Date", "Time", "Sender", "Content"])
df.to_csv('igMess.csv', header=None, index=None, quoting=csv.QUOTE_NONE, escapechar=",", mode='a')
f.close()
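A likely cause of the extra columns is the combination of quoting=csv.QUOTE_NONE and escapechar=",": with quoting turned off, commas (and line breaks) inside a message are no longer protected, so a spreadsheet reads them as column and row boundaries. The sketch below uses the standard library's csv module with its default minimal quoting to show that a message containing commas, quotes and a line break still comes back as a single fourth field. The sample message is invented for illustration.

```python
import csv
import io

# Hypothetical long message containing commas, quotes and a line break.
row = ["05/22/2022", "06:08:03 PM", "Me",
       'Leftovers: rice, beans, and "the good" plantains\nHelp yourself']

buf = io.StringIO()
# Default dialect: QUOTE_MINIMAL, so only fields that need it are quoted.
csv.writer(buf).writerow(row)
text = buf.getvalue()

# Reading the text back yields exactly four fields, content intact.
parsed = next(csv.reader(io.StringIO(text, newline="")))
print(len(parsed))
print(parsed == row)
```

The trade-off is that the file will contain quotes around any message that needs them, but CSV readers such as Excel treat those quotes as field delimiters and do not show them in the cell.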
Posts: 6,779
Threads: 20
Joined: Feb 2020
What are all the possible keys that contain content values? How should the content values be combined?