Apr-07-2020, 01:06 PM
Hi,
I am reading a CSV file and applying a function, process_text, to the Description column to remove unnecessary data.
If I apply it to the first 174 rows with "dict = (dc_data['Description'].head(174).apply(process_text))", I get the error below.
If I specify 100 rows, it works.
The requirement is to apply it to all rows.
Any help is appreciated.
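Since the first 100 rows work and the first 174 do not, a value somewhere in positions 100 to 173 is probably not a string. A minimal sketch for listing such rows, assuming the culprit is a missing Description cell (the inline CSV below is made up and only stands in for dc.csv):

```python
import io

import pandas as pd

# Hypothetical stand-in for dc.csv: row 2 has an empty Description cell.
csv_text = "Id,Description\n1,First row.\n2,\n3,Third row!\n"
df = pd.read_csv(io.StringIO(csv_text))

# Flag every Description that is not a string (empty cells are read as NaN)
not_text = ~df["Description"].map(lambda value: isinstance(value, str))
bad_rows = df[not_text]

print(bad_rows.index.tolist())  # [1]
print(df["Description"].isna().sum())  # 1
```

Against the real data, the same two lines with df replaced by dc_data should point at the row(s) that break process_text.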
Error:
Traceback (most recent call last):
  File "C:\Python\test\DC\dc_mar2020.py", line 26, in <module>
    dict = (ec_data['Description'].head(174).apply(process_text))
  File "C:\Python\lib\site-packages\pandas\core\series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer
  File "C:\Python\test\DC\dc_mar2020.py", line 16, in process_text
    nopunc = [char for char in text if char not in string.punctuation]
TypeError: 'float' object is not iterable
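The last frame shows process_text received a float instead of a string. A likely explanation, assumed here rather than confirmed by the post, is an empty Description cell: pandas stores it as NaN, which is a float, and a float cannot be iterated character by character. This reproduces the same error without pandas:

```python
value = float("nan")  # what pandas stores for an empty CSV cell

try:
    # Same pattern as step #1 of process_text
    nopunc = [char for char in value]
except TypeError as exc:
    message = str(exc)

print(type(value).__name__)  # float
print(message)  # 'float' object is not iterable
```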
Code:
import pandas as pd
from textblob import TextBlob
import string
import nltk
from nltk.corpus import stopwords

dc_data = pd.read_csv('dc.csv', encoding="ISO-8859-1", index_col=False)
print(dc_data.head())
desc = dc_data['Description']
print(desc.shape)

def process_text(text):
    #1 remove punctuation characters
    nopunc = [char for char in text if char not in string.punctuation]
    nopunc = ''.join(nopunc)
    #2 drop English stopwords
    clean_words = [word for word in nopunc.split() if word.lower() not in stopwords.words('english')]
    #3 return the list of tokens
    return clean_words

#Show the Tokenization (a list of tokens)
dict = (dc_data['Description'].head(174).apply(process_text))
print("Dict: ", dict)
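One possible fix, sketched under the assumption that the non-string values are missing cells: make process_text return an empty token list for anything that is not a string, then apply it to the whole column instead of head(n). To keep the sketch runnable without the NLTK corpus, STOPWORDS is a tiny hypothetical set; in the real script it would be set(stopwords.words('english')), built once outside the function. The inline CSV is made-up sample data.

```python
import io
import string

import pandas as pd

# Assumption: hypothetical stand-in for set(stopwords.words('english'))
STOPWORDS = {"the", "a", "is", "of"}


def process_text(text):
    """Strip punctuation and stopwords; return [] for missing values."""
    if not isinstance(text, str):  # NaN from an empty CSV cell is a float
        return []
    nopunc = "".join(char for char in text if char not in string.punctuation)
    return [word for word in nopunc.split() if word.lower() not in STOPWORDS]


# Hypothetical sample data; row 2 has an empty Description
csv_text = "Id,Description\n1,The price is high!\n2,\n3,A list of items.\n"
dc_data = pd.read_csv(io.StringIO(csv_text))

# Apply to every row, not just head(n)
tokens = dc_data["Description"].apply(process_text)
print(tokens.tolist())  # [['price', 'high'], [], ['list', 'items']]
```

Alternatives are dc_data['Description'].fillna('') before apply (empty rows become empty lists) or .dropna() (empty rows are skipped). Storing the result in a name like tokens rather than dict also avoids shadowing Python's built-in dict, and building the stopword set once avoids repeating the stopwords.words('english') call for every word.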