lexical diversity calculation

AOCL1234 · Jun-25-2020, 08:52 PM

How does one limit text length when calculating lexical diversity? For instance, say I would like to calculate the lexical diversity of Text A. Its total length is 10,000 tokens, but I would like to consider only the first 1,000 tokens. Thank you.

**Larz60+** · (This post was last modified: Jun-26-2020, 03:36 AM by Larz60+.)

You can slice a string to keep only the part you want:

>>> zz = 'This is a rather long rambling, uninformative sentence.'
>>> zstr = zz[:20]
>>> len(zstr)
20
>>> zstr
'This is a rather lon'
>>>

But this slices characters, not tokens, so it is not exactly what you are asking.
Split the text into tokens first, then slice the list:

>>> tokens = zz.split()
>>> tokens
['This', 'is', 'a', 'rather', 'long', 'rambling,', 'uninformative', 'sentence.']
>>> yy = ' '.join(tokens[:5])
>>> yy
'This is a rather long'
>>>
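
To tie this back to the original question, here is a minimal sketch. It assumes "lexical diversity" means the type-token ratio (unique tokens divided by total tokens) and that splitting on whitespace is a good enough tokenizer for your text. The function name `lexical_diversity` and its `limit` parameter are made up for illustration, not taken from any library.

```python
def lexical_diversity(text, limit=None):
    """Type-token ratio over the first `limit` whitespace-separated tokens.

    Assumption: lexical diversity = unique tokens / total tokens.
    If `limit` is None, every token in the text is used.
    """
    tokens = text.split()        # naive whitespace tokenization (assumption)
    if limit is not None:
        tokens = tokens[:limit]  # keep only the first `limit` tokens
    if not tokens:
        return 0.0               # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


zz = 'This is a rather long rambling, uninformative sentence.'
print(lexical_diversity(zz, limit=5))  # first five tokens are all distinct -> 1.0
```

For the 10,000-token text you would call something like `lexical_diversity(text_a, limit=1000)`, where `text_a` is a placeholder for your own string. Note that punctuation stays attached to words and case is not normalized here, so 'long' and 'Long,' count as different tokens; swap in a proper tokenizer if that matters.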
