Python Forum
Unable to understand a statement in an existing code
#1
I have the following code:

import nltk

nltk.download('stopwords')

import nltk.corpus
import re
import string

# turn a doc into clean tokens
from load_file_with_function import load_doc


def clean_doc(doc):
    # split the tokens by white space
    tokens = doc.split()
    # prepare regex for char filtering
    re_punc = re.compile('[%s]' % re.escape((string.punctuation)))
    # remove punctuation from each word
    tokens = [re_punc.sub('', w) for w in tokens]
    # remove remaining tokens that are not alphabetic
    tokens = [word for word in tokens if word.isalpha()]
    # filter out stop-words
    stop_words = set(nltk.corpus.stopwords.words('english'))
    tokens = [word for word in tokens if word not in stop_words]
    # filter out short tokens
    tokens = [word for word in tokens if len(word) > 1]
    print(tokens)
The code works, but it was written by someone else and I have to build on it.

I'm unable to understand how the statement below filters non-alphabetic words out of my list of words (tokens):

tokens = [word for word in tokens if word.isalpha()]
I know about the string method isalpha(), but I do not follow how the "new" tokens list gets rid of non-alphabetic words in a single statement like this. Can anyone please explain?
#2
This is a list comprehension. It is a compact way of writing this:
temp = []
for word in tokens:
    if word.isalpha():
        temp.append(word)
tokens = temp
tokens = [...] says the resulting list is assigned to "tokens".
[word for word in tokens] says the list is going to be made up of words from the original "tokens".
if word.isalpha() says only include words for which isalpha() returns True, that is, non-empty strings made up entirely of letters.
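To see that the two forms behave the same, here is a small sketch you can run. The function names and the sample list are made up for illustration and are not part of the original code. Note that an empty string (what is left after punctuation is stripped from a token like "--") is also dropped, because ''.isalpha() is False.

```python
def keep_alphabetic_loop(tokens):
    # Explicit loop: build a new list, adding only purely alphabetic words.
    kept = []
    for word in tokens:
        if word.isalpha():
            kept.append(word)
    return kept


def keep_alphabetic_comprehension(tokens):
    # Same logic as the loop above, written as a list comprehension.
    return [word for word in tokens if word.isalpha()]


# Hypothetical sample: includes an empty string, digits, and a mixed token.
sample = ["hello", "world", "", "42", "abc123", "ok"]

print(keep_alphabetic_loop(sample))           # ['hello', 'world', 'ok']
print(keep_alphabetic_comprehension(sample))  # ['hello', 'world', 'ok']
```

In both versions the original list is not modified in place. A new list is built, and the name "tokens" is then pointed at that new list.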