Python Forum

Hi,

I am reading a CSV file and applying a function to remove unnecessary data.
If I apply it to 174 rows, with "dict = (dc_data['Description'].head(174).apply(process_text))", it gives the error below.
If I specify 100 rows, it works.
The requirement is to apply it to all rows.
Any help is appreciated.

Error:
Traceback (most recent call last):
  File "C:\Python\test\DC\dc_mar2020.py", line 26, in <module>
    dict = (ec_data['Description'].head(174).apply(process_text))
  File "C:\Python\lib\site-packages\pandas\core\series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer
  File "C:\Python\test\DC\dc_mar2020.py", line 16, in process_text
    nopunc = [char for char in text if char not in string.punctuation]
TypeError: 'float' object is not iterable

Code:-

import pandas as pd
from textblob import TextBlob
import string
import nltk
from nltk.corpus import stopwords

dc_data = pd.read_csv('dc.csv', encoding="ISO-8859-1", index_col=False)
print(dc_data.head())

desc = dc_data['Description']
print(desc.shape)

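def process_text(text):
    # 1: drop punctuation characters and rejoin the rest into one string
    nopunc = [char for char in text if char not in string.punctuation]
    nopunc = ''.join(nopunc)

    # 2: split into words and drop English stop words
    clean_words = [word for word in nopunc.split() if word.lower() not in stopwords.words('english')]

    # 3: return the list of remaining words
    return clean_words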

#Show the Tokenization (a list of tokens )
dict = (dc_data['Description'].head(174).apply(process_text))
print("Dict: ", dict)

naab
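
Going by the traceback, the most likely cause is an empty Description cell somewhere between rows 101 and 174. pandas reads empty CSV cells as NaN, which is a float, so the loop "for char in text" fails on that row. Below is a minimal sketch of a guard, not a definitive fix. STOP_WORDS is a small stand-in set (an assumption, used so the sketch runs without the NLTK data); in the original script it would be built once from stopwords.words('english').

```python
import string

# Stand-in stop-word set (assumption) so the sketch runs without NLTK data.
# In the original script: STOP_WORDS = set(stopwords.words('english'))
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}


def process_text(text, stop_words=STOP_WORDS):
    """Strip punctuation and stop words; return [] for non-string cells."""
    # An empty CSV cell arrives as float NaN, which cannot be iterated,
    # so anything that is not a string is treated as "no words".
    if not isinstance(text, str):
        return []

    # 1: drop punctuation characters
    nopunc = ''.join(char for char in text if char not in string.punctuation)

    # 2: split into words and drop stop words
    return [word for word in nopunc.split() if word.lower() not in stop_words]


# A missing cell is simulated here with float('nan')
print(process_text(float('nan')))
print(process_text("The pump, in unit 4, is leaking!"))
```

With a guard like this, dc_data['Description'].apply(process_text) should run over all rows. To confirm the diagnosis first, dc_data['Description'].isna().sum() counts the missing cells. Alternatives are to call .dropna() or .fillna('') on the column before .apply(). Building the stop-word set once, instead of calling stopwords.words('english') for every word, should also make the function noticeably faster.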