Unfortunately, each token is prefixed with the letter `u`. That prefix simply indicates that the token is a unicode string; it is how Python 2 displays unicode objects. Try this to get rid of it:
```python
from nltk.tokenize import word_tokenize

Text = getText('Doc.docx')   # getText is your own helper for reading the .docx
words = word_tokenize(Text)
words = map(str, words)      # convert each unicode token to a plain byte string
print(words)
```

In Python 3 every string is unicode, so you won't run into this at all (in fact it isn't really an issue, just a display artifact). Use Python 3, or the trick above if you are stuck on Python 2.
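One caveat with the trick above: in Python 2, `str()` raises a `UnicodeEncodeError` as soon as a token contains a non-ASCII character (accented letters, curly quotes, etc.). A sketch of a safer Python 2 variant is to encode each token explicitly; the filename `'Doc.docx'` and the `getText` helper are carried over from your question:

```python
# -*- coding: utf-8 -*-
from nltk.tokenize import word_tokenize

Text = getText('Doc.docx')   # your existing .docx reader
words = word_tokenize(Text)
# unicode.encode('utf-8') handles non-ASCII characters,
# where a bare str() call would raise UnicodeEncodeError
words = [w.encode('utf-8') for w in words]
print(words)
```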