Python Forum

Here is the question:

Get sentences from Stanza Document objects.

The directory data contains a file named docs.pkl, which contains 10 articles from the Estonian Wikipedia that have been processed using Stanza. The code provided in the cell below loads these Document objects and stores them in a list named docs.

Assume that we want to study the linguistic features of introductory sentences in Estonian Wikipedia.

The first sentence in each Document object in the list docs corresponds to the title of the article, which means we must retrieve the second sentence in the Document.

Collect the second sentence of each Document object into a list named intros.

When I evaluate docs, it returns a dictionary-like list in which each token is listed with its linguistic annotations. I wanted a text-like, sentence-segmented Doc object to which I could apply the

docs.sents/docs.sentence

attributes directly, but I don't know how.
I am really clueless here and have only written some random steps. I assume it shouldn't be too hard, but somehow I just can't get it right. Please offer any hint if you happen to know how to do this. Thank you!
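A note on the attributes: docs is a plain Python list, so sentence attributes live on each element, not on the list itself. A Stanza Document exposes its sentences as doc.sentences (a list of Sentence objects, each with .text and .words), while .sents is the attribute name on a spaCy Doc. The sketch below walks one document under those assumptions. Because the real docs.bin is not available here, the document is a hypothetical stand-in built with SimpleNamespace that mimics only the attributes used (.sentences, .text, .words, .upos); the sentence and its tags are made up for illustration.

```python
from types import SimpleNamespace


def describe_document(doc):
    """Return (sentence_text, [(word, upos), ...]) pairs for one document.

    Assumes the Stanza layout: doc.sentences holds Sentence objects,
    and sentence.words holds Word objects with .text and .upos.
    """
    return [
        (sent.text, [(word.text, word.upos) for word in sent.words])
        for sent in doc.sentences
    ]


# Hypothetical stand-ins mimicking a tiny annotated Stanza Document.
# With the real data you would pass an element of the unpickled list,
# e.g. describe_document(docs[0]), not the list itself.
def make_word(text, upos):
    return SimpleNamespace(text=text, upos=upos)


example_doc = SimpleNamespace(sentences=[
    SimpleNamespace(text="Tallinn", words=[make_word("Tallinn", "PROPN")]),
    SimpleNamespace(text="Tallinn on Eesti pealinn.", words=[
        make_word("Tallinn", "PROPN"),
        make_word("on", "AUX"),
        make_word("Eesti", "PROPN"),
        make_word("pealinn", "NOUN"),
        make_word(".", "PUNCT"),
    ]),
])

for text, tagged in describe_document(example_doc):
    print(text, tagged)
```

Since doc.sentences is an ordinary list (unlike spaCy's Doc.sents, which is a generator), it can be indexed directly, e.g. doc.sentences[1].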

# Import the 'pickle' module from Python for serializing data
import pickle
import spacy
import spacy_stanza

# Open the file with pickled Stanza Document objects for reading
with open('data/docs.bin', mode='rb') as f:
    
    # Load the pickled Documents and assign under variable 'docs'
    docs = pickle.load(f)

# Write your answer below this line. Please enter your entire solution in this cell.
intros = []
#for doc in docs:
           
docs[1].text
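One possible way to fill in the commented-out loop: docs[1] selects the second document, whereas the task asks for the second sentence of every document, i.e. index 1 of each doc.sentences (index 0 being the title). The sketch below assumes the unpickled objects are Stanza Documents with a .sentences list; collect_intros and make_doc are hypothetical helper names, and the sample documents are made-up stand-ins so the code runs without the data file.

```python
from types import SimpleNamespace


def collect_intros(docs):
    """Return the second sentence of every document in docs.

    Index 0 holds the article title, so index 1 is the first sentence
    of the article body. Raises IndexError if a document has fewer
    than two sentences.
    """
    return [doc.sentences[1] for doc in docs]


# Hypothetical stand-ins for unpickled Stanza Documents; in the
# exercise, docs comes from pickle.load(f) instead.
def make_doc(*sentence_texts):
    return SimpleNamespace(
        sentences=[SimpleNamespace(text=t) for t in sentence_texts]
    )


docs = [
    make_doc("Title A", "Intro sentence of A.", "More text of A."),
    make_doc("Title B", "Intro sentence of B."),
]
intros = collect_intros(docs)
print([sent.text for sent in intros])
# prints ['Intro sentence of A.', 'Intro sentence of B.']
```

This keeps the Sentence objects themselves, so their word-level annotations remain available for studying linguistic features. If plain strings were wanted instead, [doc.sentences[1].text for doc in docs] would collect just the text.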

I am not promising anything here, but I wanted to mention that I have used NLTK quite a bit and have never played with Stanza. After reading a bit, I decided to learn more about Stanza, as it has stirred my curiosity.
I'll start by seeing if I can answer your question. Give me a few days, as I have to fit this in with other projects; if you don't already have an answer by then, I'll make an attempt.

(May-09-2022, 11:27 AM) Larz60+ wrote: I am not promising anything here, but I wanted to mention that I have used NLTK quite a bit and have never played with Stanza. After reading a bit, I decided to learn more about Stanza, as it has stirred my curiosity.
I'll start by seeing if I can answer your question. Give me a few days, as I have to fit this in with other projects; if you don't already have an answer by then, I'll make an attempt.

Hi, thank you so much! I think I have figured it out, but now I'm stuck on another one!

xinyulon

Larz60+

xinyulon