Python Forum
Please help NLP: Stanza
Here is the question:

The directory data contains 10 articles from the Estonian Wikipedia, whose filenames follow the pattern et_wiki_X.txt, in which X stands for a number that identifies the article.

Open each file, read the contents and store the resulting string objects into a list named texts.

Prepare these texts for processing using Stanza by creating Document objects without annotations.

Store the resulting Document objects into a list named docs_in.

Here is my answer:
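import stanza
from pathlib import Path

# Load the Estonian pipeline (assumes the 'et' models are already downloaded)
nlp_et = stanza.Pipeline(lang='et')
corpus_dir = Path('data')
# Match only the article files, in a stable order
files = sorted(corpus_dir.glob('et_wiki_*.txt'))

# Create the lists once, before the loop, so they are not reset on every file
texts = []
docs_in = []
for file in files:
    text = file.read_text(encoding='utf-8')
    texts.append(text)
    # Note: calling the pipeline returns an annotated Document
    processed = nlp_et(text)
    docs_in.append(processed)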
I wonder how Stanza can create a Document object without annotations. Doesn't a Stanza Document always come with annotations? I tried the line below, but it doesn't seem right. Could anyone please offer a hint?
nlp_et = stanza.Pipeline(lang='et', processors = ' ')
Thank you so much!
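One possible approach, offered as a sketch rather than a definitive answer: as far as I know, a Stanza Document can be built directly from raw text by passing an empty sentence list, which gives an object with no annotations yet. The helper names load_texts and make_blank_docs are made up for this illustration, and the snippet assumes the files sit in a directory named data as the exercise describes.

```python
from pathlib import Path


def load_texts(corpus_dir):
    """Read every et_wiki_X.txt file in corpus_dir into a list of strings."""
    # sorted() gives a deterministic (lexicographic) order; glob() alone does not.
    files = sorted(Path(corpus_dir).glob('et_wiki_*.txt'))
    return [f.read_text(encoding='utf-8') for f in files]


def make_blank_docs(texts):
    """Wrap raw strings in Stanza Document objects that carry no annotations."""
    # Imported here so load_texts() can be used without Stanza installed.
    import stanza
    # An empty sentence list plus the raw text yields an unannotated Document.
    return [stanza.Document([], text=t) for t in texts]


# Usage for the exercise (assumes the 'data' directory exists):
# texts = load_texts('data')
# docs_in = make_blank_docs(texts)
```

No pipeline is needed for this step. The unannotated Document objects in docs_in can later be handed to a pipeline as a list, for example nlp_et(docs_in), which should return the annotated versions.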