Mar-08-2018, 03:18 PM
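# find sentences containing HTML tags
i = 0
for sent in final['Text'].values:
    # '<.*?>' matches '<', then as few characters as possible, then '>'
    if re.findall('<.*?>', sent):
        print(i)     # index of the first review that contains a tag
        print(sent)  # the review text itself
        break        # stop at the first match
    i += 1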
Need help with this piece of code. Please explain each line of the code.
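For readers following along, here is a minimal sketch of what the snippet above appears to do, run on made-up data. The list `texts` and the helper name `first_with_html` are assumptions for illustration; in the original, the strings come from `final['Text'].values`.

```python
import re

# Hypothetical sample data standing in for final['Text'].values
texts = [
    "Great coffee, will buy again.",
    "Tasty snack.<br />My kids loved it.",
    "Too <b>sweet</b> for me.",
]

def first_with_html(sentences):
    """Return (index, sentence) for the first sentence with an HTML-like tag, else None."""
    for i, sent in enumerate(sentences):
        # '<.*?>' is non-greedy: '<', the shortest possible run of characters, then '>'
        if re.search(r'<.*?>', sent):
            return i, sent
    return None

result = first_with_html(texts)
print(result)
```

Using `enumerate` replaces the manual `i` counter, and `re.search` stops at the first tag instead of collecting all of them, which is enough for a yes/no check.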
Mar-08-2018, 03:20 PM
import re
import string
import nltk  # needed because the stemmer below is referenced as nltk.stem.SnowballStemmer
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.stem.wordnet import WordNetLemmatizer

stop = set(stopwords.words('english'))  # set of stopwords
sno = nltk.stem.SnowballStemmer('english')  # initialising the snowball stemmer

def cleanhtml(sentence):  # function to clean the word of any html-tags
    cleanr = re.compile('<.*?>')
    cleantext = re.sub(cleanr, ' ', sentence)
    return cleantext

def cleanpunc(sentence):  # function to clean the word of any punctuation or special characters
    # note: inside [...] the '|' is a literal pipe, not "or", so pipes are removed too
    cleaned = re.sub(r'[?|!|\'|"|#]', r'', sentence)
    cleaned = re.sub(r'[.|,|)|(|\|/]', r' ', cleaned)
    return cleaned

print(stop)
print('************************************')
print(sno.stem('tasty'))
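One way to see what the two helper functions do is to run them on a single made-up string. In this sketch the functions are copied from the post above, and the sample review text is an assumption. It needs only the standard library, so it runs without NLTK.

```python
import re

def cleanhtml(sentence):
    # replace every '<...>' tag with a single space
    return re.sub(r'<.*?>', ' ', sentence)

def cleanpunc(sentence):
    # first pass deletes ? ! ' " # and |; second pass turns . , ) ( | / into spaces
    cleaned = re.sub(r'[?|!|\'|"|#]', r'', sentence)
    cleaned = re.sub(r'[.|,|)|(|\|/]', r' ', cleaned)
    return cleaned

sample = 'Wow!<br />Tasty, crunchy (and cheap).'  # hypothetical review text
no_tags = cleanhtml(sample)
no_punc = cleanpunc(no_tags)

print(no_tags)          # tags replaced by spaces
print(no_punc.split())  # remaining words after punctuation is stripped
```

The first pass deletes characters outright, while the second replaces them with spaces, so words separated only by a comma or slash do not get glued together.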
Mar-08-2018, 03:22 PM
# Code for implementing step-by-step the checks mentioned in the pre-processing phase
# this code takes a while to run as it needs to run on 500k sentences.
i = 0
str1 = ' '
final_string = []
all_positive_words = []  # store words from +ve reviews here
all_negative_words = []  # store words from -ve reviews here.
s = ''
for sent in final['Text'].values:
    filtered_sentence = []
    #print(sent);
    sent = cleanhtml(sent)  # remove HTML tags
    for w in sent.split():
        for cleaned_words in cleanpunc(w).split():
            if cleaned_words.isalpha() and len(cleaned_words) > 2:
                if cleaned_words.lower() not in stop:
                    s = (sno.stem(cleaned_words.lower())).encode('utf8')
                    filtered_sentence.append(s)
                    if (final['Score'].values)[i] == 'positive':
                        all_positive_words.append(s)  # list of all words used to describe positive reviews
                    if (final['Score'].values)[i] == 'negative':
                        all_negative_words.append(s)  # list of all words used to describe negative reviews
                else:
                    continue
            else:
                continue
    #print(filtered_sentence)
    str1 = b" ".join(filtered_sentence)  # final string of cleaned words
    #print("***********************************************************************")
    final_string.append(str1)
    i += 1
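The loop above can be tried on a couple of made-up reviews with the sketch below. It is a simplified stand-in, not the original code: the stop-word set `STOP`, the placeholder `stem` function (lowercasing only, not a real Snowball stemmer), the slightly simplified regexes, and the sample `reviews` are all assumptions chosen so that it runs without NLTK. It also keeps words as `str` rather than encoding them to bytes.

```python
import re

# Assumption: a tiny hand-written stop-word set instead of NLTK's English list
STOP = {"the", "and", "was", "this", "for"}

def stem(word):
    # Placeholder for sno.stem(...): only lowercases, does no real stemming
    return word.lower()

def clean_review(text):
    """Strip tags and punctuation, then keep alphabetic non-stop words longer than 2 letters."""
    text = re.sub(r'<.*?>', ' ', text)     # HTML tags -> space
    text = re.sub(r'[?!\'"#]', '', text)   # delete these characters
    text = re.sub(r'[.,)(|/]', ' ', text)  # turn these into spaces
    words = []
    for w in text.split():
        if w.isalpha() and len(w) > 2 and w.lower() not in STOP:
            words.append(stem(w))
    return words

# Hypothetical (text, score) pairs standing in for final['Text'] and final['Score']
reviews = [
    ("This was tasty!<br />Loved the crunch.", "positive"),
    ("Stale and bland.", "negative"),
]

cleaned = []
positive_words, negative_words = [], []
for text, score in reviews:
    words = clean_review(text)
    cleaned.append(" ".join(words))  # one cleaned string per review
    if score == "positive":
        positive_words.extend(words)
    elif score == "negative":
        negative_words.extend(words)

print(cleaned)
print(positive_words, negative_words)
```

Iterating over the text and score together avoids the separate index `i`, which in the original has to be kept in step with the loop by hand.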
I have merged your 3 threads; it was a close call with deletion.
We are not going to explain what each line does. You have to make the effort; if there are certain lines you wonder about, ask about them.
Mar-08-2018, 04:28 PM
(This post was last modified: Mar-08-2018, 04:29 PM by AkashDubey.)
(Mar-08-2018, 03:49 PM) snippsat wrote: I have merged your 3 threads; it was a close call with deletion.
Then please give an idea of what the three code snippets above do, with a brief explanation of each snippet.
Mar-08-2018, 06:55 PM
The code is already very well documented as to what it does. As was pointed out, we are not going to explain the purpose of each line. You might want to start with finding a tutorial for Python beginners to learn the basics of the language.
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition