Python Forum
Trying to parse and work with the WhatsApp export of chats
So I have been trying to parse the export of a WhatsApp chat history. I chose it because I thought it would be a simple text format to work with, which it is up to a point, but I found some issues when I opened the file in Notepad++.

For the most part, the rows in the file are formatted correctly and look like this:
9/5/21, 8:07 PM - Me: Lol romantic comedy
9/5/21, 8:38 PM - Friend: Yup

But mixed in are rows that break the format, so my Python code doesn't parse those lines well:

9/5/21, 8:07 PM - Me: Lol romantic comedy
https://music.youtube.com/
2006 present
9/5/21, 8:38 PM - Friend: Yup

If I go back to the actual chats, those messages were multi-line, so the export may have split them. I'm not looking to fix the export; I just want to keep ONLY the rows/lines that have a date at the beginning.

Here is one piece of logic I tried that got close, but I can't seem to exclude the incomplete lines. The end goal is to parse the lines in a way that gives me total messages per day. Clearing out the orphan lines that don't have a date or a sender may work as well.
import pandas as pd
from datetime import datetime as DateT

# Open both files in one with-statement so they are closed automatically.
with open("Chat.txt", "r", encoding="utf-8") as file_in, \
        open("Dates.txt", "w", encoding="utf-8") as file2:
    for line in file_in:
        # Split on the first '-'; dt[0] is everything before it.
        dt = line.partition('-')

        #datetime_obj = DateT.strptime(dt[0],'%m/%d/%y').date()

        print(dt[0])
        file2.write(dt[0] + '\n')
I'm only using Python to learn more about what it can do. If I could use it to clean up the file, that would work, because I could then import it into Excel and build a pivot table from the clean rows to get my total counts per day. If I want to see the messages, I can get to them from the pivot table.

In my first attempt, I was trying to grab just the "dates" out of the file and count them in Excel, but rows without dates were still being pulled in by the dt[0] method I was using.
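A small illustration of why the orphan rows slip through: `str.partition` returns the whole string as its first element when the separator is not found, so `dt[0]` is never empty for a non-blank row. The sample strings are the rows quoted earlier in this post.

```python
# str.partition(sep) returns (head, sep, tail). If sep is missing,
# head is the entire string and the other two parts are empty.
samples = [
    "9/5/21, 8:07 PM - Me: Lol romantic comedy",
    "https://music.youtube.com/",
    "2006 present",
]

heads = [s.partition("-")[0] for s in samples]
for head in heads:
    print(repr(head))
```

Only the first row yields a date-like prefix; the other two come back unchanged, which is why they end up in the output file.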

Can anyone suggest what I can do or focus on to clean out the rows that are not 100% complete?
Do I have to read each line and, if it doesn't start with a date, exclude it from being written to the file? If so, what functions or methods should I look at?
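One possible approach, sketched under assumptions: test each row against a regular expression for the timestamp prefix, keep only the rows that match, and tally them per day with `collections.Counter`. The pattern assumes the `M/D/YY, H:MM AM/PM - ` shape seen in the sample rows; exports from phones with other locale settings may need a different pattern. The names `DATED_ROW`, `split_dated_rows`, and `clean_file` are made up for this illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed row shape, taken from the sample rows: "9/5/21, 8:07 PM - Name: text".
# Exports from phones with other locale settings would need a different pattern.
DATED_ROW = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2}), \d{1,2}:\d{2}\s?[AP]M - ")


def split_dated_rows(lines):
    """Return (kept_rows, per_day_counts) for rows that start with a date."""
    kept = []
    counts = Counter()
    for line in lines:
        match = DATED_ROW.match(line)
        if match is None:
            continue  # orphan row: no date and no sender, so skip it
        try:
            day = datetime.strptime(match.group(1), "%m/%d/%y").date()
        except ValueError:
            continue  # looked like a date but is not one, e.g. 13/45/21
        kept.append(line.rstrip("\n"))
        counts[day] += 1
    return kept, counts


def clean_file(src_path, dst_path):
    """Write only the dated rows of src_path to dst_path; return the counts."""
    with open(src_path, "r", encoding="utf-8") as src:
        kept, counts = split_dated_rows(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(kept) + "\n")
    return counts


# Demo on the rows quoted in the post (no files needed).
sample = [
    "9/5/21, 8:07 PM - Me: Lol romantic comedy\n",
    "https://music.youtube.com/\n",
    "2006 present\n",
    "9/5/21, 8:38 PM - Friend: Yup\n",
]
kept, counts = split_dated_rows(sample)
for day, total in sorted(counts.items()):
    print(day, total)
```

With the files named in the post, `clean_file("Chat.txt", "Dates.txt")` would write the cleaned rows and return the per-day totals. Note that this drops the continuation rows entirely, which matches the stated goal of counting messages rather than preserving their full text.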
Messages In This Thread
Trying to parse and work with the WhatsApp export of chats - by cubangt - Dec-15-2021, 11:00 PM