Python Forum
lexical diversity calculation
#1
How does one limit the text length when calculating lexical diversity? For instance, say I would like to calculate the lexical diversity of Text A. Text A is 10,000 tokens long in total, but I would like to consider only its first 1,000 tokens. Thank you.
#2
You can slice off what you want:
>>> zz = 'This is a rather long rambling, uninformative sentence.'
>>> zstr = zz[:20]
>>> len(zstr)
20
>>> zstr
'This is a rather lon'
>>>
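But slicing a string cuts by characters, which is not exactly what you are asking. To cut by tokens, split the text first and then slice the list of tokens: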
>>> tokens = zz.split()
>>> tokens
['This', 'is', 'a', 'rather', 'long', 'rambling,', 'uninformative', 'sentence.']
>>> yy = ' '.join(tokens[:5])
>>> yy
'This is a rather long'
>>>
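Putting the two steps together, here is a minimal sketch that computes lexical diversity on only the first n tokens. It assumes lexical diversity means the type-token ratio (distinct tokens divided by total tokens), which is how the NLTK book defines its `lexical_diversity` function. It also assumes plain whitespace splitting via `str.split()` as the tokenizer. The helper names `lexical_diversity` and `lexical_diversity_first_n` are hypothetical, chosen for this example.

```python
def lexical_diversity(tokens):
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


def lexical_diversity_first_n(text, n=1000):
    """Lexical diversity of only the first n whitespace-separated tokens."""
    tokens = text.split()[:n]  # tokenize first, then keep at most n tokens
    return lexical_diversity(tokens)


sample = 'the cat sat on the mat and the dog sat too'
print(lexical_diversity_first_n(sample, n=5))  # → 0.8 (4 distinct of 5 tokens)
```

If the text has fewer than n tokens, the slice simply returns all of them. Because `str.split()` does not normalize case or strip punctuation, 'The' and 'the' count as different types, and so do 'rambling,' and 'rambling'. Lowercasing the text or using a proper tokenizer before slicing would change the score.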