Python Forum
Error while parsing tables from docx file
Hi all,
I am using Python 3.6 and the python-docx package to parse .docx files.
The code:

import re
from docx import Document

my_doc = Document(doc)  # doc is the path to the .docx file

def extract(my_doc, w1, w2):
    tabdata = []
    for table in my_doc.tables:  # loop through all tables in the .docx file
        # only process tables whose cell in row 0, column 1 contains the keyword
        if re.search("My String", table.cell(0, 1).text, re.IGNORECASE):
            for row in table.rows:  # loop through all rows of the matching table
                for cell in row.cells:
                    tabdata.append(cell.text)  # collect each cell's text instead of overwriting the list
    return tabdata

The same code behaves differently across my document files. Each document contains a mix of text and tables, and I am trying to parse only the tables.
For some files, parsing works even though the file contains both text and tables.
For other files, the error below shows up.
All the files are similar and contain the keywords and tables I am searching for with re.search(). All the tables in the different files have the same number of rows and columns.
The error doesn't show up if the document file contains only the tables and no other text/paragraphs.
I am unsure whether the issue is a corrupted docx file, characters in the docx file that my script cannot parse, or something missing from my script.

The error I am facing:

Traceback (most recent call last):
  File "my_python_script.py", line 579, in main_1
    extract(mld,w1,w2)
  File "my_python_script.py", line 128, in extract
    if re.search("My String", table.cell(0,1).text, re.IGNORECASE):
  File "$PYTHONPATH/python3.6/site-packages/docx/table.py", line 81, in cell
    return self._cells[cell_idx]
IndexError: list index out of range
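
The traceback shows that table.cell(0,1) was asked for a cell that one of the tables does not have. One possible way to narrow this down is sketched below. It assumes (this is only a guess, not something confirmed for these files) that the failing documents contain an extra table, for example a one-column table, that lacks a cell at row 0, column 1. The helper names find_matching_tables and table_text are hypothetical and not part of python-docx; the sketch relies only on the tables, rows, columns, cell() and text members already used or implied by the code above.

```python
import re


def find_matching_tables(document, pattern, row_idx=0, col_idx=1):
    """Return (matches, skipped) for the tables of a python-docx Document.

    matches: tables whose cell (row_idx, col_idx) text matches `pattern`.
    skipped: (table_number, n_rows, n_cols) for tables that have no such cell.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matches, skipped = [], []
    for number, table in enumerate(document.tables):
        try:
            header = table.cell(row_idx, col_idx).text
        except IndexError:
            # The table has no cell at this position: record its size and move on.
            skipped.append((number, len(table.rows), len(table.columns)))
            continue
        if regex.search(header):
            matches.append(table)
    return matches, skipped


def table_text(table):
    """Collect the text of every cell, row by row."""
    return [[cell.text for cell in row.cells] for row in table.rows]
```

Calling find_matching_tables(my_doc, "My String") on one of the failing files and printing the skipped list should show which table raised the error and how many rows and columns it reports. If that list is empty for a failing file, the assumption above does not hold and the cause lies elsewhere.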

Any help on this issue would be much appreciated!


Messages In This Thread
Error while parsing tables from docx file - by aditi - Jul-10-2020, 08:04 PM
