Aug-29-2020, 06:37 AM
I am new to regular expressions. In Google Colaboratory, while doing a bag-of-words assignment on a Twitter dataset, I wrote the following code to clean the tweets from the given dataset.
import re
import string
from sklearn.feature_extraction.text import CountVectorizer

# Replace 2 or more repetitions of a character with two copies of that character
def replaceTwoOrMore(s):
    pattern = re.compile(r"(.)\1{1,}", re.DOTALL)
    return pattern.sub(r"\1\1", s)
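For reference, this is how the backreference in `replaceTwoOrMore` behaves: `(.)` captures one character and `\1{1,}` (equivalently `\1+`) requires at least one more copy of it, so each match is a run of two or more identical characters. The replacement `\1\1` keeps two copies rather than one. The sketch below is a hypothetical generalization (`collapse_repeats` and its `keep` parameter are illustrative names, not from the original code) that caps every run at `keep` characters:

```python
import re

# (.) captures any single character; \1+ requires at least one more copy of it,
# so every match is a run of two or more identical characters.
_REPEAT = re.compile(r"(.)\1+", re.DOTALL)


def collapse_repeats(text: str, keep: int = 2) -> str:
    """Hypothetical helper: shorten each run of identical characters to at most `keep` copies."""
    # m.group(0) is the whole run; slicing never lengthens a run shorter than `keep`.
    return _REPEAT.sub(lambda m: m.group(0)[:keep], text)


print(collapse_repeats("soooo goooood!!!"))          # soo good!!
print(collapse_repeats("soooo goooood!!!", keep=1))  # so god!
```

With `keep=1`, legitimate double letters are lost ("good" becomes "god"), which is presumably why the original keeps two copies.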
def tweets_reconstruction(tweet):
    # Convert to lower case
    tweet = tweet.lower()
    # Remove URLs (before punctuation is stripped, so "://" is still present)
    tweet = re.sub(r"https?://\S+|www\.\S+", "", tweet)
    # Replace "@username" with "AT_USER"
    tweet = re.sub(r"@\w+", "AT_USER", tweet)
    # Replace "#word" with "word"
    tweet = re.sub(r"#([^\s]+)", r"\1", tweet)
    # Remove numbers
    tweet = re.sub(r"[0-9]", "", tweet)
    # Remove punctuation, keeping "_" so that AT_USER survives
    tweet = tweet.translate(str.maketrans("", "", string.punctuation.replace("_", "")))
    # Replace multiple whitespaces with a single space
    tweet = re.sub(r"\s+", " ", tweet).strip()
    tweet = replaceTwoOrMore(tweet)
    return tweet
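A common pitfall in this kind of cleaning step: square brackets create a character class, so a pattern like `[\b(http)]+` matches any run of the individual characters `h`, `t`, `p`, `(`, `)` and backspace (inside a class, `\b` means backspace, not a word boundary) rather than the word "http". Likewise `[\b(@)]+` matches `@` and parentheses, not a whole username. Order matters too: if punctuation is stripped first, the `@`, `#` and `://` these patterns rely on are already gone. The sketch below assumes URLs start with `http://`, `https://` or `www.` and that usernames consist of `\w` characters; `preprocess_tweet` is a hypothetical name.

```python
import re
import string

# Inside [...], each character is matched individually, so this strips every
# "t", "h" and "p" from the text instead of removing the word "http".
broken = re.sub(r"[\b(http)]+", "", "the path")
print(repr(broken))  # 'e a'

# string.punctuation minus "_", so the AT_USER placeholder keeps its underscore.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation.replace("_", ""))


def preprocess_tweet(tweet: str) -> str:
    """Hypothetical cleaner: handle URLs, mentions and hashtags before stripping punctuation."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+|www\.\S+", "", tweet)  # drop URLs
    tweet = re.sub(r"@\w+", "AT_USER", tweet)           # replace @username with a placeholder
    tweet = re.sub(r"#(\w+)", r"\1", tweet)             # "#word" -> "word"
    tweet = re.sub(r"[0-9]", "", tweet)                 # drop digits
    tweet = tweet.translate(_PUNCT_TABLE)               # drop remaining punctuation
    return re.sub(r"\s+", " ", tweet).strip()           # collapse whitespace


print(preprocess_tweet("Check http://t.co/abc @Bob #Python rocks!!! 2020"))
# check AT_USER python rocks
```

Since `CountVectorizer` lowercases by default, the placeholder ends up in the vocabulary as the token `at_user`.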
processedTweets = []
for tweet in tweets:
    processedTweets.append(tweets_reconstruction(tweet))
vectorizer = CountVectorizer()
featurevector = vectorizer.fit_transform(processedTweets)  # sparse document-term matrix
featurevector.todense()  # converts it to a dense matrix

While running, it shows "Session crashed after using all available RAM".
Please help me out.
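On the crash itself: `fit_transform` returns a SciPy sparse matrix that stores only the non-zero counts, while `.todense()` allocates a full (number of tweets) × (vocabulary size) array. With purely illustrative numbers, 100,000 tweets and a 50,000-word vocabulary at 8 bytes per `int64` count would need 100,000 × 50,000 × 8 bytes = 40 GB, well beyond the RAM of a typical Colab session. The sketch below (`memory_report` is a hypothetical helper) estimates both sizes without ever building the dense matrix; it assumes `CountVectorizer`'s default `int64` dtype.

```python
from sklearn.feature_extraction.text import CountVectorizer


def memory_report(docs, **vectorizer_kwargs):
    """Hypothetical helper: compare dense vs sparse storage of a bag-of-words matrix."""
    vectorizer = CountVectorizer(**vectorizer_kwargs)
    X = vectorizer.fit_transform(docs).tocsr()  # sparse document-term matrix
    n_docs, n_terms = X.shape
    # Bytes that .todense()/.toarray() would allocate: one cell per (doc, term) pair.
    dense_bytes = n_docs * n_terms * X.dtype.itemsize
    # CSR stores only the non-zero values plus two index arrays.
    sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
    return X, dense_bytes, sparse_bytes


docs = ["good day", "good night", "bad day"]
X, dense_bytes, sparse_bytes = memory_report(docs)
print(X.shape, dense_bytes, sparse_bytes)

# Illustrative arithmetic for a larger corpus (numbers are made up):
print(100_000 * 50_000 * 8 / 1e9, "GB")  # 40.0 GB
```

scikit-learn estimators such as `MultinomialNB` and `LogisticRegression` accept the sparse matrix directly, so the `.todense()` call can usually just be dropped; to inspect the result, `X[:5].toarray()` densifies only a few rows. Passing `max_features` or `min_df` to `CountVectorizer` (for example `memory_report(docs, min_df=2)`) also shrinks the vocabulary.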