(Dec-21-2020, 04:14 AM)bowlofred Wrote: In this case you only have one line above it. Move that line below the function, and leave a couple of blank lines below the function to serve as a visual break.
Now you can start reading your code execution from that point (and mentally ignore the stuff inside the function).
You can't use a variable (like history) unless you've assigned a value to it. The assignment inside read_history() doesn't count.
You call the function later in the for loop, and assign history at that time (line 59). You can refer to or print history anytime after that.
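A minimal sketch of the point in the quote, using a made-up function (load_history is a hypothetical stand-in, not the read_history() from the thread). The name assigned inside the function is local to it, so the name only exists outside once the return value has been assigned there:

```python
def load_history():
    # `history` here is local to the function and vanishes when it returns.
    history = ["first message", "second message"]
    return history


# Using the name before any assignment outside the function fails.
try:
    print(history)
    seen_before = True
except NameError:
    seen_before = False

# The assignment that counts: bind the return value to a name out here.
history = load_history()
print(seen_before, history)
```

After the last assignment, history can be read or printed anywhere further down the script.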
Thank you for the input! I have made the changes you recommended. I unindented lines 63+ and moved the one line that was above the function to below it. I feel I have made progress because I am getting a new error. It now says it can't concatenate the one file that is saved in the data folder. I know there is only one file in the data folder, so there is nothing to concatenate, but why is it giving me an error? Please have a look.
Do you have any thoughts about this new error?
Thanks,
#Very good file. Third revision!
import os
import pandas as pd
pd.set_option('display.max_rows', 500)
import itertools
import datetime as dt
from matplotlib import pyplot as plt
import matplotlib as mpl
mpl.use('Agg')
import numpy as np
import matplotlib.pyplot as plt
#%matplotlib inline
#from IPython import get_ipython
#get_ipython().run_line_magic('matplotlib', 'inline')
import seaborn as sns
import re
from collections import Counter
import string
import emoji
import pickle
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
import pandas as pd
import time
import sys
from wordcloud import WordCloud, STOPWORDS
from PIL import Image

def read_history(file,conv_type):
    f = open('data'.format(conv_type,file), 'r',)
    # Feed the file text into findall(); it returns a list of all the found strings
    messages = re.findall('\[(\d+-\d+-\d+, \d+:\d+:\d+ [A-Z]*)\] (.*?): (.*)', f.read())
    f.close()
    #Convert list to a dataframe and name columns
    history = pd.DataFrame(messages,columns =['date','name','msg'])
    history['date'] = pd.to_datetime(history['date'],format="%Y-%m-%d, %I:%M:%S %p")
    history['date1'] = history['date'].apply(lambda x: x.date())
    history['msg_len'] = history['msg'].str.len()
    history['conv_name'] = file[19:-4]
    history['conv_name'] = file[19:-4]
    # Get Media shared in the Message
    history['Media']=history['msg'].str.contains('omitted')
    return history

files_groups = os.listdir('data/')
all = []
for file in files_groups:
    history = read_history(file,'')
    history['tipo'] = 'g'
    all.append(history)
history = pd.concat(all).reset_index()
history_clean = history[history['msg']!=' <Media omitted>'].sort_values(by=['conv_name','name','date1'])
history_clean.shape

The error is this now:
Error:
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-5xolmerl because the default path (/config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Traceback (most recent call last):
  File "main.py", line 63, in <module>
    history = pd.concat(all).reset_index()
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 274, in concat
    op = _Concatenator(
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
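A small sketch of what this ValueError means, separate from the script above. pd.concat accepts a list holding a single DataFrame without complaint; it raises "No objects to concatenate" when the list it receives is empty. Going only by the traceback, that suggests nothing was appended to the list before line 63 ran, for example if os.listdir('data/') returned no entries from the directory the script was started in. That is a guess, not something the post confirms. The helper concat_or_empty is hypothetical and only shows one way to guard against the empty case:

```python
import pandas as pd


def concat_or_empty(frames):
    """Concatenate a list of DataFrames; return an empty DataFrame for an empty list."""
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames).reset_index(drop=True)


# A list with a single DataFrame concatenates fine.
single = pd.concat([pd.DataFrame({"msg": ["hi"]})])

# An empty list reproduces the error from the traceback.
try:
    pd.concat([])
    error_text = None
except ValueError as exc:
    error_text = str(exc)

print(len(single), error_text)
print(concat_or_empty([]).empty)
```

Printing the list returned by os.listdir just before the loop would show whether any files were actually found.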