Python Forum
Please help NLP: Stanza - Printable Version


Please help NLP: Stanza - xinyulon - May-08-2022

Here is the question:

The directory data contains 10 articles from the Estonian Wikipedia, whose filenames follow the pattern et_wiki_X.txt, in which X stands for a number that identifies the article.

Open each file, read the contents and store the resulting string objects into a list named texts.

Prepare these texts for processing using Stanza by creating Document objects without annotations.

Store the resulting Document objects into a list named docs_in.

Here is my answer:
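import stanza
from pathlib import Path

nlp_et = stanza.Pipeline(lang='et')
corpus_dir = Path('data')
files = sorted(corpus_dir.glob('et_wiki_*.txt'))

# Create both lists once, before the loop, so they are not reset on every pass.
texts = []
docs_in = []
for file in files:
    text = file.read_text(encoding='utf-8')
    texts.append(text)
    # Calling the pipeline annotates the text, so this Document is not blank.
    processed = nlp_et(text)
    docs_in.append(processed)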
I wonder how Stanza can create a Document object without annotations. Isn't Stanza bound to produce annotations? I tried the line below, but it doesn't seem right. Could anyone please offer a hint?
nlp_et = stanza.Pipeline(lang='et', processors = ' ')
Thank you so much!
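
For the part about Document objects without annotations, here is a minimal sketch of one possible approach. It follows the pattern shown in the Stanza documentation for processing multiple documents: wrap each raw string in stanza.Document([], text=...) instead of calling the pipeline on it. The helper names read_texts and make_blank_docs are hypothetical, and the et_wiki_*.txt pattern is assumed from the assignment text.

```python
from pathlib import Path


def read_texts(corpus_dir, pattern="et_wiki_*.txt"):
    """Read every file matching the pattern and return the contents as a list of strings."""
    # sorted() only makes the order reproducible; it is alphabetical, not numeric.
    paths = sorted(Path(corpus_dir).glob(pattern))
    return [path.read_text(encoding="utf-8") for path in paths]


def make_blank_docs(texts):
    """Wrap each string in a Stanza Document that carries the text but no annotations."""
    # Imported here so that read_texts can be used even without Stanza installed.
    import stanza

    # An empty sentence list plus the raw text gives a Document with nothing annotated yet.
    return [stanza.Document([], text=text) for text in texts]


# Example usage (assumes a local 'data' directory as described in the assignment):
# texts = read_texts("data")
# docs_in = make_blank_docs(texts)
```

If this matches what the assignment expects, no Pipeline is needed at this stage. The blank documents could later be passed to the pipeline as a list, for example docs_out = nlp_et(docs_in), which should annotate all of them in one call.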