Python Forum (https://python-forum.io) - General Coding Help - Thread: Unable to understand a statement in an existing code (/thread-28745.html)
Unable to understand a statement in an existing code - ateestructural - Aug-01-2020

I have the following code:

    import re
    import string

    import nltk
    import nltk.corpus

    from load_file_with_function import load_doc

    nltk.download('stopwords')

    # turn a doc into clean tokens
    def clean_doc(doc):
        # split the tokens by white space
        tokens = doc.split()
        # prepare regex for char filtering
        re_punc = re.compile('[%s]' % re.escape(string.punctuation))
        # remove punctuation from each word
        tokens = [re_punc.sub('', w) for w in tokens]
        # remove remaining tokens that are not alphabetic
        tokens = [word for word in tokens if word.isalpha()]
        # filter out stop-words
        stop_words = set(nltk.corpus.stopwords.words('english'))
        tokens = [word for word in tokens if word not in stop_words]
        # filter out short tokens
        tokens = [word for word in tokens if len(word) > 1]
        print(tokens)

It works, but it is someone else's code and I have to build on it. I'm unable to understand how the statement below filters the non-alphabetic entries out of my list of words (tokens):

    tokens = [word for word in tokens if word.isalpha()]

I know about the string method isalpha(), but I do not follow how the "new" tokens gets rid of the non-alphabetic entries in a single statement like this. Can anyone please explain?

RE: Unable to understand a statement in an existing code - deanhystad - Aug-01-2020

This is a list comprehension. It is a compact way of writing this:

    temp = []
    for word in tokens:
        if word.isalpha():
            temp.append(word)
    tokens = temp

tokens = [...] says the resulting list is assigned to "tokens".
[word for word in tokens] says the list is going to be made up of words from the original "tokens".
if word.isalpha() says only include words for which isalpha() returns True.
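
To see that the two forms really do the same thing, here is a small sketch that runs both on the same input. The sample list and the two function names are made up for illustration; they are not part of the original code.

```python
# Sample tokens invented for illustration; not taken from the original post.
sample_tokens = ["hello", "world2", "", "42", "NLTK", "dont", "a-b"]


def keep_alpha_loop(tokens):
    """Explicit loop: collect only the words for which isalpha() is True."""
    kept = []
    for word in tokens:
        if word.isalpha():
            kept.append(word)
    return kept


def keep_alpha_comprehension(tokens):
    """Same filter written as a list comprehension."""
    return [word for word in tokens if word.isalpha()]


loop_result = keep_alpha_loop(sample_tokens)
comp_result = keep_alpha_comprehension(sample_tokens)

print(loop_result)                 # ['hello', 'NLTK', 'dont']
print(loop_result == comp_result)  # True
```

Note that the comprehension builds a brand new list first and only then is that list bound to the name tokens, so reusing the same name on both sides of the assignment is safe. The empty string is dropped as well, because "".isalpha() is False.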