Python Forum
Trouble importing text from a .docx file
I am trying to tokenize the words of a Word document, Doc.docx, which contains the sentence "This is a doc file". Unfortunately, each token is prefixed with the letter 'u'.

from nltk.tokenize import word_tokenize
import docx

def getText(filename):
    # Read every paragraph of the .docx file and join them with newlines
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

Text = getText('Doc.docx')
words = word_tokenize(Text)
print(words)
Output:
Actual output   : [u'This', u'is', u'a', u'doc', u'file']
Expected output : ['This', 'is', 'a', 'doc', 'file']
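
A note on the prefix, assuming the script was run under Python 2 (the post does not state the version): there, python-docx returns unicode strings, and printing a list shows the repr of each element, which for unicode strings includes a leading u. The u is not part of the token text. The sketch below uses a hard-coded list as a stand-in for the tokens in the post, so it does not need nltk or a .docx file; the variable names first, joined and plain are illustrative only.

```python
# Hard-coded stand-in for the tokens returned by word_tokenize() in the post.
# The u'' literals are valid in Python 2 and in Python 3.3+.
words = [u'This', u'is', u'a', u'doc', u'file']

# The u marker belongs to the repr of the string, not to its contents,
# so printing a single token shows only its text.
first = words[0]
print(first)

# Joining the tokens, or printing them one by one, never shows the prefix.
joined = ' '.join(words)
print(joined)

# Under Python 2, str() turns an ASCII-only unicode token into a byte string,
# so the printed list loses the prefix. Non-ASCII tokens would raise
# UnicodeEncodeError there. Under Python 3 this line changes nothing.
plain = [str(w) for w in words]
print(plain)
```

Under Python 3 all strings are unicode and the list prints without any prefix, so running the original script with Python 3 is probably the simplest way to get the expected output.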


Messages In This Thread
Trouble importing text from a .docx file - by atinesh922 - Dec-24-2017, 04:46 PM
