Python Forum
Need help formatting dataframe data before saving to CSV
#11
(Jun-30-2022, 06:58 PM)deanhystad Wrote: You are making a 1 column dataframe of strings. If you made a dataframe that had columns for date, time, sender and content you would get a proper csv file from df.to_csv(). Here's a short example:
import pandas as pd
from random import choice
import time, datetime

senders = ["Me", "You", "Dog named Boo"]
content = [
    "Just wanted to sa Hi!",
    "Coming home soon.",
    "Take me for a walk.",
    "We're out of dog food.",
    None,
]

messages = []
for _ in range(10):
    timestamp = datetime.datetime.now()
    messages.append(
        [
            timestamp.strftime("%m/%d/%Y"),
            timestamp.strftime("%I:%M:%S %p"),
            choice(senders),
            choice(content),
        ]
    )
    time.sleep(1)

df = pd.DataFrame(messages, columns=["Date", "Time", "Sender", "Content"])
print(df)
df.to_csv("test.csv")
The dataframe looks like this:
Output:
         Date         Time         Sender                 Content
0  06/30/2022  01:57:50 PM  Dog named Boo                    None
1  06/30/2022  01:57:51 PM            You   Just wanted to sa Hi!
2  06/30/2022  01:57:52 PM            You       Coming home soon.
3  06/30/2022  01:57:53 PM  Dog named Boo       Coming home soon.
4  06/30/2022  01:57:54 PM            You                    None
5  06/30/2022  01:57:55 PM            You     Take me for a walk.
6  06/30/2022  01:57:56 PM  Dog named Boo                    None
7  06/30/2022  01:57:57 PM            You     Take me for a walk.
8  06/30/2022  01:57:58 PM             Me  We're out of dog food.
9  06/30/2022  01:58:00 PM             Me   Just wanted to sa Hi!
And the generated CSV file looks like this:
Output:
,Date,Time,Sender,Content
0,06/30/2022,01:57:50 PM,Dog named Boo,
1,06/30/2022,01:57:51 PM,You,Just wanted to sa Hi!
2,06/30/2022,01:57:52 PM,You,Coming home soon.
3,06/30/2022,01:57:53 PM,Dog named Boo,Coming home soon.
4,06/30/2022,01:57:54 PM,You,
5,06/30/2022,01:57:55 PM,You,Take me for a walk.
6,06/30/2022,01:57:56 PM,Dog named Boo,
7,06/30/2022,01:57:57 PM,You,Take me for a walk.
8,06/30/2022,01:57:58 PM,Me,We're out of dog food.
9,06/30/2022,01:58:00 PM,Me,Just wanted to sa Hi!
If I want to remove the column or row headers this is easily accomplished using arguments in the to_csv() call.
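
A minimal sketch of those arguments, using a made-up one-row frame with the same columns: `index=False` drops the row labels and `header=False` drops the column names. Calling `to_csv()` without a path returns the CSV text as a string, which makes the effect easy to inspect.

```python
import pandas as pd

# Hypothetical one-row frame with the same columns as the example above.
df = pd.DataFrame(
    [["06/30/2022", "01:57:50 PM", "Me", "Coming home soon."]],
    columns=["Date", "Time", "Sender", "Content"],
)

# With no path argument, to_csv() returns the CSV text instead of writing a file.
with_both = df.to_csv()                            # row labels and column names
no_index = df.to_csv(index=False)                  # drop the leading row-label column
data_only = df.to_csv(index=False, header=False)   # drop the column names as well

print(with_both)
print(no_index)
print(data_only)
```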

That suggestion helped and fixed it, thank you.

Here is what I changed my code to:

import json
from datetime import datetime
import pandas as pd
import os
import csv

f = open('message_1.json')

data = json.load(f)

lv = []

for message in data["messages"]:
    timestamp = datetime.fromtimestamp(message["timestamp_ms"] / 1000)
    
    if 'content' not in message:
        lv.append(
            [
                timestamp.strftime("%m/%d/%Y"),
                timestamp.strftime("%I:%M:%S %p"),
                message["sender_name"],
                ""
                ])

    else:
        lv.append(
            [
                timestamp.strftime("%m/%d/%Y"),
                timestamp.strftime("%I:%M:%S %p"),
                message["sender_name"],
                message["content"]
                ])   

df = pd.DataFrame(lv, columns=["Date", "Time", "Sender", "Content"])

df.to_csv('igMess.csv', header=None, index=None, quoting=csv.QUOTE_NONE,  escapechar=",", mode='a')

f.close()
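
One thing that may be worth checking in the `to_csv()` call above: with `quoting=csv.QUOTE_NONE` and `escapechar=","`, a message whose text contains a comma gets an extra comma written in front of it instead of being quoted, so the row can gain columns when it is read back. Below is a minimal sketch using the standard `csv` module, which accepts the same `quoting` and `escapechar` options. The sample row and the `write_row` helper are made up for illustration.

```python
import csv
import io

# Hypothetical row whose content contains a comma.
row = ["06/30/2022", "01:57:50 PM", "Me", "Hello, world"]


def write_row(values, **fmt):
    """Write one row with csv.writer and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, **fmt).writerow(values)
    return buf.getvalue()


# Default settings: fields containing the delimiter are wrapped in quotes.
default_text = write_row(row)
# Settings from the post: no quoting, commas "escaped" with another comma.
escaped_text = write_row(row, quoting=csv.QUOTE_NONE, escapechar=",")

# Read both lines back with a default reader and compare the field counts.
default_back = next(csv.reader(io.StringIO(default_text)))
escaped_back = next(csv.reader(io.StringIO(escaped_text)))

print(len(default_back), default_back)
print(len(escaped_back), escaped_back)
```

If the messages can contain commas, leaving `quoting` and `escapechar` at their defaults is probably the safer choice, since the quoted form reads back as the original four fields.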
#12
This is sloppy:
    if 'content' not in message:
        lv.append(
            [
                timestamp.strftime("%m/%d/%Y"),
                timestamp.strftime("%I:%M:%S %p"),
                message["sender_name"],
                ""
                ])
 
    else:
        lv.append(
            [
                timestamp.strftime("%m/%d/%Y"),
                timestamp.strftime("%I:%M:%S %p"),
                message["sender_name"],
                message["content"]
                ]) 
I hate duplicate code, even as little as that. I would do this:
lv.append([
    timestamp.strftime("%m/%d/%Y"),
    timestamp.strftime("%I:%M:%S %p"),
    message["sender_name"],
    message["content"] if "content" in message else ""])
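
For reference, the inline form is Python's conditional expression, `value_if_true if condition else value_if_false`. A small self-contained sketch, with made-up messages standing in for the JSON data:

```python
# Hypothetical messages: one with a "content" key, one without.
messages = [
    {"sender_name": "Me", "content": "Coming home soon."},
    {"sender_name": "You"},
]

rows = []
for message in messages:
    # The expression picks message["content"] only when the key is present,
    # so the missing key never raises a KeyError.
    content = message["content"] if "content" in message else ""
    rows.append([message["sender_name"], content])

print(rows)
```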
#13
Well, that's definitely cleaner. I didn't know you could write the if inline like that. Thank you.
#14
Since message is a dictionary you could also do this:
lv.append([
    timestamp.strftime("%m/%d/%Y"),
    timestamp.strftime("%I:%M:%S %p"),
    message["sender_name"],
    message.get("content", "")])
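
A short sketch of how `dict.get()` behaves, using made-up messages. Note that the default is only used when the key is missing; a key that is present with the value `None` still returns `None`, so `or ""` is shown as one possible way to cover that case too (it also turns other falsy values into `""`).

```python
# Hypothetical messages covering three cases.
with_content = {"sender_name": "Me", "content": "Coming home soon."}
without_content = {"sender_name": "You"}
null_content = {"sender_name": "Dog named Boo", "content": None}

present = with_content.get("content", "")     # key exists: its value is returned
missing = without_content.get("content", "")  # key missing: the default "" is returned
still_none = null_content.get("content", "")  # key exists with None: None is returned
coerced = null_content.get("content") or ""   # fall back to "" for None as well

print(repr(present), repr(missing), repr(still_none), repr(coerced))
```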
#15
What can I look up to get more familiar with such things? Just "Python dictionary", or something else?
#16
Every time you use something, read all of its documentation. Do that until you know the thing upside down and inside out. A lot of the Python documentation references related topics. For example, when reading up on dictionaries:

https://docs.python.org/3/library/stdtyp...types-dict

If you look at the above link you'll see on the left of the page several related topics. I don't really know about Unions, so maybe I'll go take a look at that. Stay curious.
#17
Awesome, I appreciate the help and suggestions.
Thank you.