Python Forum
lexical diversity calculation


lexical diversity calculation - AOCL1234 - Jun-25-2020

How does one limit text length when calculating lexical diversity? For instance, say I would like to calculate the lexical diversity of Text A. Its total length is 10,000 tokens, but I would like to consider only the first 1,000 tokens. Thank you.


RE: lexical diversity calculation - Larz60+ - Jun-26-2020

You can slice off the part you want. Slicing the string itself limits it by characters:
>>> zz = 'This is a rather long rambling, uninformative sentence.'
>>> zstr = zz[:20]
>>> len(zstr)
20
>>> zstr
'This is a rather lon'
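But this is not exactly what you are asking, since it counts characters rather than tokens. To limit by tokens, split the text first and slice the resulting list: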
>>> tokens = zz.split()
>>> tokens
['This', 'is', 'a', 'rather', 'long', 'rambling,', 'uninformative', 'sentence.']
>>> yy = ' '.join(tokens[:5])
>>> yy
'This is a rather long'
>>>
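
To get from the token slice to an actual lexical diversity score, here is a minimal sketch. It assumes that "lexical diversity" means the type-token ratio (unique tokens divided by total tokens) and that tokens are split on whitespace, as in the example above. The function name lexical_diversity and the max_tokens parameter are made up for illustration; they do not come from any library.

```python
def lexical_diversity(text, max_tokens=None):
    """Type-token ratio over the first max_tokens whitespace-separated tokens.

    Assumption: lexical diversity = unique tokens / total tokens.
    Tokens are compared as-is, so 'Long' and 'long,' count as different types.
    """
    tokens = text.split()
    if max_tokens is not None:
        # Keep only the first max_tokens tokens (all of them if the text is shorter).
        tokens = tokens[:max_tokens]
    if not tokens:
        # Avoid dividing by zero on empty input.
        return 0.0
    return len(set(tokens)) / len(tokens)


zz = 'This is a rather long rambling, uninformative sentence.'
print(lexical_diversity(zz, max_tokens=5))  # 1.0, the first five tokens are all distinct
```

For the original question you would call it as lexical_diversity(text_a, max_tokens=1000), where text_a holds the full text. If case and punctuation should not count as differences, lowercase the tokens and strip punctuation before building the set.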