Aug-01-2020, 08:54 PM
(This post was last modified: Aug-01-2020, 08:54 PM by ateestructural.)
I have the following code:
import nltk
nltk.download('stopwords')
import nltk.corpus
import re
import string

from load_file_with_function import load_doc

# turn a doc into clean tokens
def clean_doc(doc):
    # split the tokens by white space
    tokens = doc.split()
    # prepare regex for char filtering
    re_punc = re.compile('[%s]' % re.escape(string.punctuation))
    # remove punctuation from each word
    tokens = [re_punc.sub('', w) for w in tokens]
    # remove remaining tokens that are not alphabetic
    tokens = [word for word in tokens if word.isalpha()]
    # filter out stop-words
    stop_words = set(nltk.corpus.stopwords.words('english'))
    tokens = [word for word in tokens if word not in stop_words]
    # filter out short tokens
    tokens = [word for word in tokens if len(word) > 1]
    print(tokens)

It works because it is someone else's code - I have to work further on it.
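For context, here is the same filtering pipeline run on a small made-up sample string, with the NLTK stop-word step left out so it runs standalone (the sample text and variable names are my own, not from the original code):

```python
import re
import string

doc = "Hello, world! It's 2020 - a test."
# split on whitespace
tokens = doc.split()
# strip punctuation characters from each token
re_punc = re.compile('[%s]' % re.escape(string.punctuation))
tokens = [re_punc.sub('', w) for w in tokens]
# drop tokens that are not purely alphabetic (numbers, empty strings)
tokens = [w for w in tokens if w.isalpha()]
# drop single-character tokens
tokens = [w for w in tokens if len(w) > 1]
print(tokens)  # -> ['Hello', 'world', 'Its', 'test']
```

Note how '2020' is dropped by isalpha(), '-' becomes an empty string after punctuation removal and is dropped too, and 'a' is removed by the length filter.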
I'm unable to understand how the statement below filters non-alphabetic tokens out of my set of words (tokens):
tokens = [word for word in tokens if word.isalpha()]

I know about the string method isalpha(), but I don't follow how this single statement produces a "new" tokens list with the non-alphabetic words removed. Can anyone please explain?