Python Forum
How to store the resulting Doc objects into a list named A - Printable Version




How to store the resulting Doc objects into a list named A - xinyulon - Mar-08-2022

Hey folks. I am a complete beginner in Python and NLP.
I am stuck on this question: "Store the resulting Doc objects into a list named A".
Is it possible to do this with "A=list(doc)"? How would the items remain Doc objects inside the list? Thank you so much!

This is my exercise:
1. Load text files from a directory and read their contents
The directory data contains a subdirectory named sotu with five State of the Union speeches from various presidents of the United States, which are stored as UTF-8 encoded plain text files.

Import the pathlib module and use the module to read the contents of each text file into string objects.

Then import the spacy library and load a small language model for English. Assign the model under the variable nlp.

Process the texts using the language model and store the resulting Doc objects into a list named speeches.

And my answer:
from pathlib import Path
corpus_dir = Path ("data/sotu")
files = list(corpus_dir.glob(pattern='*.txt'))
for file in files:
    text = file.read_text(encoding='utf-8')
    import spacy
    nlp=spacy.load('en_core_web_sm')
    doc=nlp(text)
    speeches=list(doc)



RE: How to store the resulting Doc objects into a list named A - bowlofred - Mar-08-2022

Generally you should put imports at the start of your program. There's no reason to put the import inside a loop where it is run multiple times.

To create a list of things, you can generally append() each item to the list, something like:

speeches = []  # list is created outside the loop
for file in files:
    text = file.read_text(encoding='utf-8')
    doc = nlp(text)  # create the doc (nlp is loaded once, before the loop)
    speeches.append(doc)  # items are added to the list inside the loop
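
To tie the advice above together, here is a minimal sketch of the whole pattern. It is not the exercise's official solution. The helper name process_directory and its parameters are hypothetical, made up for illustration; nlp can be any callable that takes a string, such as a loaded spaCy pipeline. The point to notice is that the whole result is appended for each file. Calling list(doc) on a spaCy Doc iterates over it and gives a list of Token objects, and assigning that inside the loop overwrites the list on every pass, so only the last file's tokens would remain.

```python
from pathlib import Path
from typing import Callable, List, TypeVar

T = TypeVar("T")


def process_directory(
    corpus_dir: Path,
    nlp: Callable[[str], T],
    pattern: str = "*.txt",
) -> List[T]:
    """Read every matching file and return one processed object per file.

    Hypothetical helper for illustration; `nlp` is any callable that
    accepts a string (for example, a loaded spaCy pipeline).
    """
    results: List[T] = []  # created once, before the loop
    for file in sorted(corpus_dir.glob(pattern)):  # sorted for a stable order
        text = file.read_text(encoding="utf-8")
        results.append(nlp(text))  # append the whole result, not list(result)
    return results


# Possible usage with spaCy (assumes the en_core_web_sm model is installed):
# import spacy
# nlp = spacy.load("en_core_web_sm")  # load the model once, outside any loop
# speeches = process_directory(Path("data/sotu"), nlp)
```

With a spaCy pipeline passed as nlp, each element of the returned list should be a Doc, one per text file, which is what the exercise asks for.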