Posts: 14
Threads: 5
Joined: Dec 2020
Dec-21-2020, 06:04 AM
(Dec-21-2020, 04:14 AM)bowlofred Wrote: In this case you only have one line above it. Move that line below the function, and leave a couple of blank lines below the function to serve as a visual break.
Now you can start reading your code execution from that point (and mentally ignore the stuff inside the function).
You can't use a variable (like history ) unless you've assigned a value to it. The assignment inside read_history() doesn't count.
You call the function later in the for loop, and assign history at that time (line 59). You can refer to or print history anytime after that.
Thank you for the input! I have made the changes you recommended: I unindented lines 63+ and moved the one line that was above the function to below it. I feel I have made progress because I am getting new errors. It now says it can't concatenate the one file that is saved in the data folder. I know there is only one file in the data folder, so there is nothing to concatenate, but why does that give me an error? Please have a look.
Do you have any thoughts about this new error?
Thanks,
#Very good file. Third revision!
import os
import pandas as pd
pd.set_option('display.max_rows', 500)
import itertools
import datetime as dt
from matplotlib import pyplot as plt
import matplotlib as mpl
mpl.use('Agg')
import numpy as np
import matplotlib.pyplot as plt
#%matplotlib inline
#from IPython import get_ipython
#get_ipython().run_line_magic('matplotlib', 'inline')
import seaborn as sns
import re
from collections import Counter
import string
import emoji
import pickle
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
import pandas as pd
import time
import sys
from wordcloud import WordCloud, STOPWORDS
from PIL import Image
def read_history(file,conv_type):
    f = open('data'.format(conv_type,file), 'r',)
    # Feed the file text into findall(); it returns a list of all the found strings
    messages = re.findall('\[(\d+-\d+-\d+, \d+:\d+:\d+ [A-Z]*)\] (.*?): (.*)', f.read())
    f.close()
    #Convert list to a dataframe and name columns
    history = pd.DataFrame(messages,columns =['date','name','msg'])
    history['date'] = pd.to_datetime(history['date'],format="%Y-%m-%d, %I:%M:%S %p")
    history['date1'] = history['date'].apply(lambda x: x.date())
    history['msg_len'] = history['msg'].str.len()
    history['conv_name'] = file[19:-4]
    history['conv_name'] = file[19:-4]
    # Get Media shared in the Message
    history['Media']=history['msg'].str.contains('omitted')
    return history

files_groups = os.listdir('data/')
all = []
for file in files_groups:
    history = read_history(file,'')
    history['tipo'] = 'g'
    all.append(history)
history = pd.concat(all).reset_index()
history_clean = history[history['msg']!=' <Media omitted>'].sort_values(by=['conv_name','name','date1'])
history_clean.shape

The error is this now:
Error: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-5xolmerl because the default path (/config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-ium3ovbd because the default path (/config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Traceback (most recent call last):
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-2qnvo_ba because the default path (/config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Traceback (most recent call last):
File "main.py", line 63, in <module>
history = pd.concat(all).reset_index()
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 274, in concat
op = _Concatenator(
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
Posts: 1,583
Threads: 3
Joined: Mar 2020
all is normally a built-in function. By using that name as your variable, you can't use that function. I would recommend using another name.
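To illustrate the shadowing issue, here is a minimal sketch. The function names and the sample list are made up for the example and do not come from the original script.

```python
def check_with_shadowing(values):
    all = []  # this local name hides the built-in all() inside the function
    try:
        return all(values)  # tries to call a list, which is not callable
    except TypeError as exc:
        return f"TypeError: {exc}"


def check_without_shadowing(values):
    frames = []  # a different name leaves the built-in all() available
    return all(values)


shadowed_result = check_with_shadowing([True, True])
plain_result = check_without_shadowing([True, True])
print(shadowed_result)
print(plain_result)
```

Shadowing is not what causes the "No objects to concatenate" error, but renaming the list avoids a confusing failure later if the built-in is ever needed.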
Are you sure the file is being found? Perhaps right before line 63 you could add some diagnostic messaging.
...
print(f"all has {len(all)} elements")
history = pd.concat(all).reset_index()
...
If it has zero elements, you'll have to diagnose why. Presumably the file isn't being found.
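A slightly fuller diagnostic could also report where the script is running and whether the directory exists at all. This is only a sketch: the helper name describe_data_dir is hypothetical, and the default of "data" is taken from the os.listdir('data/') call in the posted script.

```python
import os


def describe_data_dir(data_dir="data"):
    """Collect facts that explain why a file loop might never run."""
    report = {
        "cwd": os.getcwd(),                     # relative paths are resolved from here
        "resolved": os.path.abspath(data_dir),  # where data_dir actually points
        "exists": os.path.isdir(data_dir),
        "entries": [],
    }
    if report["exists"]:
        report["entries"] = sorted(os.listdir(data_dir))
    return report


report = describe_data_dir("data")
for key, value in report.items():
    print(f"{key}: {value}")
if not report["entries"]:
    print("Nothing to read, so the list passed to pd.concat would be empty.")
```

If "exists" is True but "entries" is empty, the directory is there and the file is simply not inside it. If "exists" is False, the relative path does not match the working directory.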
Posts: 14
Threads: 5
Joined: Dec 2020
(Dec-21-2020, 07:18 AM)bowlofred Wrote: all is normally a built-in function. By using that name as your variable, you can't use that function. I would recommend using another name.
Are you sure the file is being found? Perhaps right before line 63 you could add some diagnostic messaging.
...
print(f"all has {len(all)} elements")
history = pd.concat(all).reset_index()
...
If it has zero elements, you'll have to diagnose why. Presumably the file isn't being found.
The file is not being found: there are zero elements. I will google how to load a file in Repl.it. It's very strange. I have pasted my new code and the new error.
#Very good file. Third revision!
import os
import pandas as pd
pd.set_option('display.max_rows', 500)
import itertools
import datetime as dt
from matplotlib import pyplot as plt
import matplotlib as mpl
mpl.use('Agg')
import numpy as np
import matplotlib.pyplot as plt
#%matplotlib inline
#from IPython import get_ipython
#get_ipython().run_line_magic('matplotlib', 'inline')
import seaborn as sns
import re
from collections import Counter
import string
import emoji
import pickle
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
import pandas as pd
import time
import sys
from wordcloud import WordCloud, STOPWORDS
from PIL import Image
def read_history(file,conv_type):
    f = open('data'.format(conv_type,file), 'r',)
    # Feed the file text into findall(); it returns a list of all the found strings
    messages = re.findall('\[(\d+-\d+-\d+, \d+:\d+:\d+ [A-Z]*)\] (.*?): (.*)', f.read())
    f.close()
    #Convert list to a dataframe and name columns
    history = pd.DataFrame(messages,columns =['date','name','msg'])
    history['date'] = pd.to_datetime(history['date'],format="%Y-%m-%d, %I:%M:%S %p")
    history['date1'] = history['date'].apply(lambda x: x.date())
    history['msg_len'] = history['msg'].str.len()
    history['conv_name'] = file[19:-4]
    history['conv_name'] = file[19:-4]
    # Get Media shared in the Message
    history['Media']=history['msg'].str.contains('omitted')
    return history

files_groups = os.listdir('data/')
allnow = []
for file in files_groups:
    history = read_history(file,'')
    history['tipo'] = 'g'
    allnow.append(history)
print(f"all has {len(allnow)} elements")
history = pd.concat(allnow).reset_index()
history_clean = history[history['msg']!=' <Media omitted>'].sort_values(by=['conv_name','name','date1'])
history_clean.shape

The error I am getting:
Error: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-f7k83q3t because the default path (/config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
all has 0 elements
Traceback (most recent call last):
File "main.py", line 63, in <module>
history = pd.concat(allnow).reset_index()
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 274, in concat
op = _Concatenator(
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
Posts: 14
Threads: 5
Joined: Dec 2020
(Dec-21-2020, 07:18 AM)bowlofred Wrote: all is normally a built-in function. By using that name as your variable, you can't use that function. I would recommend using another name.
Are you sure the file is being found? Perhaps right before line 63 you could add some diagnostic messaging.
...
print(f"all has {len(all)} elements")
history = pd.concat(all).reset_index()
...
If it has zero elements, you'll have to diagnose why. Presumably the file isn't being found.
Hello,
The file is not being found. I googled it, and as far as I can tell my code for accessing the WhatsApp chat text file is the right one for Repl.it. Is there something else I can try?
Thanks,
YK
Posts: 1,583
Threads: 3
Joined: Mar 2020
Depends on the environment and how the data is saved. Your code seems to presume there is a data directory. That may or may not be true.
Perhaps just look at the output of os.listdir(). Do you see the files you expect? If that works, you can give it other directories until you find the file you're expecting.
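One way to do that search in a single pass is sketched below. The helper name find_files is hypothetical, and the ".txt" suffix is an assumption about how the chat export is named.

```python
import os


def find_files(root=".", suffix=".txt"):
    """Return paths under root whose file names end with suffix."""
    matches = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            if name.endswith(suffix):
                matches.append(os.path.join(folder, name))
    return sorted(matches)


print("Working directory:", os.getcwd())
print("Top-level entries:", sorted(os.listdir(".")))
for path in find_files(".", ".txt"):
    print(path)
```

If the chat file shows up under a different folder than data, pass that folder to os.listdir() in the script, or move the file.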