Dec-24-2017, 04:46 PM
I am trying to tokenize the words of a Word document, Doc.docx, which contains the sentence "This is a doc file". But unfortunately, each token is getting prefixed with the letter 'u':

from nltk.tokenize import word_tokenize
import docx

def getText(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

Text = getText('Doc.docx')
words = word_tokenize(Text)
print(words)
Output : [u'This', u'is', u'a', u'doc', u'file']
Expected Output : ['This', 'is', 'a', 'doc', 'file']
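A note that may help: the u prefix is not part of the token text at all; it is just how Python 2 displays unicode strings when a list is printed. The tokens themselves are fine, and under Python 3 (where str is already unicode) the same list prints without any prefix. A minimal check, using the token list from the output above:

```python
# Tokens exactly as shown in the question's output.
words = [u'This', u'is', u'a', u'doc', u'file']

# Under Python 3, u'...' and '...' are the same type, so the
# u prefix disappears from the printed repr.
print(words)  # ['This', 'is', 'a', 'doc', 'file']

# The 'u' was never inside the string itself.
assert words[0] == 'This'
assert u'This' == 'This'
```

So nothing needs to change in the tokenizing code itself; running the script under Python 3, or simply ignoring the repr prefix under Python 2, gives the expected result.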