May-08-2022, 09:53 PM
Here is the question:

The directory data contains 10 articles from the Estonian Wikipedia, whose filenames follow the pattern et_wiki_X.txt, in which X stands for a number that identifies the article.
Open each file, read the contents and store the resulting string objects into a list named texts.
Prepare these texts for processing using Stanza by creating Document objects without annotations.
Store the resulting Document objects into a list named docs_in.

Here is my answer:
import stanza
from pathlib import Path

nlp_et = stanza.Pipeline(lang='et')
corpus_dir = Path('data')
files = list(corpus_dir.glob(pattern='*_*_*.txt'))

# Create the lists once, before the loop, so they are not reset on every file
texts = []
docs_in = []

for file in files:
    text = file.read_text(encoding='utf-8')
    texts.append(text)
    # This runs the full default pipeline, so the Document comes back annotated
    processed = nlp_et(text)
    docs_in.append(processed)

I wonder how Stanza can create a Document object without annotations. Isn't Stanza bound to produce annotations? I tried the line below, but it doesn't seem right. Could anyone please offer a hint?

nlp_et = stanza.Pipeline(lang='et', processors = ' ')

Thank you so much!
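
For reference, here is a minimal sketch of one way the task might be approached. As far as I know, the Stanza documentation on processing multiple documents builds unannotated Document objects with stanza.Document([], text=...), so no Pipeline is needed at this step. The helper names read_texts and make_unannotated_docs are hypothetical, and the glob pattern et_wiki_*.txt is an assumption based on the filename pattern in the task.

```python
from pathlib import Path


def read_texts(corpus_dir):
    """Read every et_wiki_X.txt file in corpus_dir into a list of strings."""
    # sorted() gives a stable, lexicographic order (et_wiki_10 sorts before et_wiki_2)
    files = sorted(Path(corpus_dir).glob('et_wiki_*.txt'))
    return [f.read_text(encoding='utf-8') for f in files]


def make_unannotated_docs(texts):
    """Wrap each raw string in a Stanza Document that carries no annotations."""
    # Imported here so that read_texts can be used without Stanza installed
    import stanza

    # An empty sentence list plus the raw text yields a Document
    # that no processor has touched yet
    return [stanza.Document([], text=t) for t in texts]
```

With these helpers, texts = read_texts('data') followed by docs_in = make_unannotated_docs(texts) should give the two lists the task asks for. The list in docs_in could later be passed to a pipeline such as nlp_et to annotate all documents in one call.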
