Aug-29-2022, 01:20 AM
(This post was last modified: Aug-29-2022, 01:20 AM by deanhystad.)
This finds the tables in a Word (.docx) document and converts each one to a pandas DataFrame.
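```python
from docx import Document
from docx.document import Document as _Document
from docx.oxml.table import CT_Tbl
from docx.table import _Cell, Table
import pandas as pd


def tables(parent):
    """Yield each table found directly inside parent as a DataFrame.

    parent is either the whole Document or a table _Cell (a cell can hold a nested table).
    The first row of each table is used as the column headers.
    """
    if isinstance(parent, _Document):
        element = parent.element.body
    elif isinstance(parent, _Cell):
        element = parent._tc
    else:
        raise TypeError(f"expected a Document or _Cell, got {type(parent).__name__}")

    # Walk the XML children in document order and keep only the table elements
    for child in element.iterchildren():
        if isinstance(child, CT_Tbl):
            table = Table(child, parent)
            data = [[cell.text for cell in row.cells] for row in table.rows]
            yield pd.DataFrame(data[1:], columns=data[0])


for table in tables(Document('data.docx')):
    print(table, "\n")
```

I made a Word document with several paragraphs and two tables. Running the program correctly shows both tables.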
Output:
```
   A  B  C  D
0  1  3  5  7
1  2  4  6  8

   A  B  C
0  1  2  3
1  4  5  6
```
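To show what the loop over element.iterchildren() is walking, here is a minimal sketch that reads the same structure using only the standard library instead of python-docx. This is a swapped-in technique, not part of the approach above. It assumes the body lives in word/document.xml, as it does in a standard .docx file, and the names docx_tables and cell_text are hypothetical. In that XML a table is a w:tbl element, each row is a w:tr, each cell is a w:tc, and the text sits in w:t elements inside the cell's paragraphs (w:p).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the tags inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def cell_text(tc):
    """Hypothetical helper: join the text of the cell's direct child paragraphs with newlines.

    Each paragraph's text is the concatenation of all its w:t elements.
    """
    return "\n".join(
        "".join(t.text or "" for t in p.iter(f"{W}t"))
        for p in tc.findall(f"{W}p")
    )


def docx_tables(source):
    """Hypothetical helper: yield each top-level table in the body as a list of rows.

    source is a path or a binary file-like object holding a .docx (zip) file.
    Each row is a list of cell strings.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(f"{W}body")
    # Same idea as element.iterchildren(): visit body children in document order
    for child in body:
        if child.tag == f"{W}tbl":
            yield [
                [cell_text(tc) for tc in tr.findall(f"{W}tc")]
                for tr in child.findall(f"{W}tr")
            ]
```

Each yielded list of rows could then be passed to pd.DataFrame(rows[1:], columns=rows[0]), just as the code above does. Unlike python-docx's row.cells, this sketch does not account for merged cells, and it skips the text of any table nested inside a cell.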