Python Forum

How does one limit the text length when calculating lexical diversity? For instance, say I would like to calculate the lexical diversity of Text A. While its total length is 10,000, I would like to consider only the first 1,000 tokens of Text A. Thank you.

You can slice off what you want:

>>> zz = 'This is a rather long rambling, uninformative sentence.'
>>> zstr = zz[:20]
>>> len(zstr)
20
>>> zstr
'This is a rather lon'
>>>

But that slices characters, not tokens, so it is not exactly what you are asking. Split the text into tokens first and slice the list instead:

>>> tokens = zz.split()
>>> tokens
['This', 'is', 'a', 'rather', 'long', 'rambling,', 'uninformative', 'sentence.']
>>> yy = ' '.join(tokens[:5])
>>> yy
'This is a rather long'
>>>
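Putting the two steps together for the original question, here is a minimal sketch. It assumes the common definition of lexical diversity (number of distinct tokens divided by total tokens) and plain whitespace tokenization with `str.split()`, with no lowercasing or punctuation stripping. The function names `lexical_diversity` and `lexical_diversity_first_n` are hypothetical, not from any library. If you already have a list of tokens from another tokenizer, slice that list with `[:1000]` instead of calling `split()`.

```python
def lexical_diversity(tokens):
    """Distinct tokens / total tokens (hypothetical helper; 0.0 for an empty list)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def lexical_diversity_first_n(text, n=1000):
    """Lexical diversity of only the first n whitespace-separated tokens of text.

    Assumes str.split() tokenization; slicing past the end of the list is safe,
    so a text shorter than n tokens simply uses all of its tokens.
    """
    tokens = text.split()[:n]
    return lexical_diversity(tokens)


sample = 'the cat sat on the mat the end'
# First 5 tokens: the, cat, sat, on, the -> 4 distinct out of 5
print(lexical_diversity_first_n(sample, 5))
```

For Text A you would call `lexical_diversity_first_n(text_a, 1000)`, where `text_a` is a hypothetical variable holding the full text. Because the slice happens before the ratio is computed, the remaining 9,000 tokens never affect the result.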

AOCL1234

Larz60+