Python Forum
Trouble importing text from a .docx file
#2
"But unfortunately, each token is getting prefixed with a letter 'u'"
The prefix only indicates that each token is a unicode string. Try this to get rid of it:

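from nltk.tokenize import word_tokenize

# getText() is assumed to be the helper from the original post
Text = getText('Doc.docx')
words = word_tokenize(Text)
# Python 2 only: turn each unicode token into a plain str (ASCII text only)
words = map(str, words)
print(words)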
In Python 3 every string is unicode, so you won't see this prefix at all (in fact, it's not even an issue, since the u only shows up when the list itself is printed). Use Python 3, or the trick above if you are on Python 2.
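To make the point about the prefix concrete, here is a minimal sketch that needs neither NLTK nor a .docx file. The `words` list is a hypothetical stand-in for what `word_tokenize` would return, not the output from the original post.

```python
# Hypothetical stand-in for the tokens returned by word_tokenize()
words = [u"Hello", u"world"]

# Printing a list shows the repr() of each element.
# Python 2 prints [u'Hello', u'world']; Python 3 prints ['Hello', 'world'].
print(words)

# Printing the tokens themselves never shows a prefix, on either version.
for word in words:
    print(word)

# Joining the tokens gives ordinary text as well.
joined = u" ".join(words)
print(joined)

# On Python 3, map() returns a lazy iterator, so wrap it in list()
# if you still want to apply the map(str, ...) trick and print the result.
as_list = list(map(str, words))
print(as_list)
```

So the u is only part of how Python 2 displays a list of unicode strings; the tokens themselves contain no extra letter.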


Messages In This Thread
RE: Trouble importing text from a .docx file - by hshivaraj - Dec-24-2017, 11:53 PM

