Python Forum
Use module docx to get text from a file with a table
#4
This finds the tables in a document and converts each one to a pandas DataFrame, using the table's first row as the column names.
from docx import Document
from docx.document import Document as _Document
from docx.oxml.table import CT_Tbl
from docx.table import _Cell, Table
import pandas as pd

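def tables(parent):
    """Yield each table directly inside parent as a DataFrame."""
    if isinstance(parent, _Document):
        element = parent.element.body
    elif isinstance(parent, _Cell):
        element = parent._tc
    else:
        # Without this branch, element would be undefined for other types.
        raise TypeError(f"expected a Document or a table cell, got {type(parent).__name__}")

    # Only direct children are visited; tables nested inside cells are not.
    for child in element.iterchildren():
        if isinstance(child, CT_Tbl):
            table = Table(child, parent)
            data = [[cell.text for cell in row.cells] for row in table.rows]
            if data:  # skip a table that has no rows
                # The first row supplies the column names.
                yield pd.DataFrame(data[1:], columns=data[0])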

for table in tables(Document('data.docx')):
    print(table, "\n")
I made a Word document with multiple paragraphs and two tables. Running the program prints both tables correctly and skips the paragraphs.
Output:
   A  B  C  D
0  1  3  5  7
1  2  4  6  8

   A  B  C
0  1  2  3
1  4  5  6
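The step that turns a table into a DataFrame relies on the first row being a header row. As a rough, pandas-free sketch of that same split, the snippet below works on plain lists of cell text. The helper names split_header and rows_to_records are hypothetical (they are not part of python-docx or pandas), and the sample rows are copied from the first table in the output above.

```python
def split_header(rows):
    """Split rows of cell text into (header, body).

    Assumes, as the code in the post does, that the first row holds
    the column names. An empty table gives an empty header and body.
    """
    if not rows:
        return [], []
    header, *body = rows
    return list(header), [list(row) for row in body]


def rows_to_records(rows):
    """Return one dict per body row, keyed by the header cells."""
    header, body = split_header(rows)
    return [dict(zip(header, row)) for row in body]


# Cell text is always a string, so the numbers appear as "1", "3", ...
data = [
    ["A", "B", "C", "D"],
    ["1", "3", "5", "7"],
    ["2", "4", "6", "8"],
]

for record in rows_to_records(data):
    print(record)
```

If a table has no header row, or has repeated header text (merged cells can cause this), the dict keys collide and columns are lost, so it may be safer to pass the rows to pandas without columns= in that case.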


Messages In This Thread
RE: Use module docx to get text from a file with a table - by deanhystad - Aug-29-2022, 01:20 AM
