Python Forum (https://python-forum.io) > Forum: Python Coding (https://python-forum.io/forum-7.html) > Forum: Homework (https://python-forum.io/forum-9.html) > Thread: Please help NLP: Stanza (/thread-37176.html)
Please help NLP: Stanza - xinyulon - May-08-2022

Here is the question:

The directory data contains 10 articles from the Estonian Wikipedia, whose filenames follow the pattern et_wiki_X.txt, in which X stands for a number that identifies the article. Open each file, read the contents and store the resulting string objects into a list named texts. Prepare these texts for processing using Stanza by creating Document objects without annotations. Store the resulting Document objects into a list named docs_in.

Here is my answer:

    import stanza
    from pathlib import Path

    nlp_et = stanza.Pipeline(lang='et')
    corpus_dir = Path('data')
    files = list(corpus_dir.glob(pattern='*_*_*.txt'))

    # create both lists once, before the loop, so they are not reset for every file
    texts = []
    docs_in = []
    for file in files:
        text = file.read_text(encoding='utf-8')
        texts.append(text)
        processed = nlp_et(text)  # running the pipeline annotates the text
        docs_in.append(processed)

I wonder how Stanza can create a Document object without annotations. Don't Stanza Documents always have annotations? I tried the line below, but it doesn't seem right:

    nlp_et = stanza.Pipeline(lang='et', processors=' ')

Could anyone please offer a hint? Thank you so much!
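For reference, Stanza's documentation on processing multiple documents wraps raw strings as `stanza.Document([], text=...)`: an empty sentence list plus the raw text, so no Pipeline is run and no annotations exist yet. The sketch below applies that pattern to the exercise. It assumes the data directory layout described in the question. The helper names `read_texts` and `make_docs` are hypothetical, and the numeric sort order is an added choice, not something the exercise requires.

```python
from pathlib import Path


def read_texts(corpus_dir):
    """Read every et_wiki_X.txt file in corpus_dir and return the contents as strings."""
    files = Path(corpus_dir).glob("et_wiki_*.txt")
    # Sort by the numeric part X so that et_wiki_2 comes before et_wiki_10
    # (assumes X is always an integer, as the exercise describes).
    files = sorted(files, key=lambda p: int(p.stem.rsplit("_", 1)[1]))
    texts = []  # created once, before the loop, so it is not reset for each file
    for file in files:
        texts.append(file.read_text(encoding="utf-8"))
    return texts


def make_docs(texts):
    """Wrap each string in a stanza Document that carries no annotations yet."""
    import stanza  # imported here so read_texts() also works without stanza installed

    # An empty sentence list plus the raw text gives an unannotated Document;
    # no Pipeline is involved at this step.
    return [stanza.Document([], text=text) for text in texts]


# Usage (requires stanza and its Estonian models, e.g. stanza.download('et')):
# import stanza
# texts = read_texts("data")
# docs_in = make_docs(texts)
# nlp_et = stanza.Pipeline(lang="et")
# docs_out = nlp_et(docs_in)  # annotations are added only at this step
```

Passing the whole list to the Pipeline in one call, as in the commented usage, is how the Stanza docs batch several documents. This assumes a reasonably recent Stanza version that accepts a list of Document objects.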