How to remove unwanted images and tables from a Word file using Python?

Pedroski55 · Feb-04-2025, 08:30 AM

I have never used Colab, so I have no idea what that does.

You can parse XML files using BeautifulSoup. A Word .docx file is a zip archive of XML files.

Take a copy of your Word document and change the extension from .docx to .zip, so it looks like mydocument.zip. Now your OS should recognise it as a zip file.

Unpack the zip file to a folder like temp. Look in temp. You will see a folder called mydocument. Look in mydocument and you will see a folder called word.
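If you would rather not unpack by hand, the same step can be done from Python with the standard zipfile module. This is only a minimal sketch: unpack_docx is a name I made up for illustration, and the paths in the usage line are placeholders.

```python
import zipfile
from pathlib import Path


def unpack_docx(docx_path, dest_dir):
    """Extract every part of a .docx archive into dest_dir and return the part names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # zipfile reads the archive by content, so the .docx extension can stay as it is.
    with zipfile.ZipFile(docx_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Calling unpack_docx('mydocument.docx', 'temp/mydocument') should leave you with temp/mydocument/word/document.xml, the same layout as unpacking manually.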

Look in word and you will find a file document.xml, which contains all the text and tables, the references to the images, and much of the formatting for your file mydocument.docx. If you double click on document.xml, it should open in your browser. Have a look at it.

The image files themselves are stored in word/media/.
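To see which image files a document carries without unpacking it, you can list the archive members under word/media/. A small sketch, with list_media being a hypothetical helper name:

```python
import zipfile


def list_media(docx_path):
    """Return the names of the archive members stored under word/media/."""
    with zipfile.ZipFile(docx_path) as archive:
        return [name for name in archive.namelist()
                if name.startswith('word/media/') and not name.endswith('/')]
```

A document without pictures simply gives an empty list, because Word only creates the media folder when it is needed.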

It won't be simple at first, but you can learn to edit xml, find what you want and change or remove it.

There are other Python tools for editing XML, such as the standard library's xml.etree.ElementTree, or lxml.
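As a rough idea of what removing things can look like, here is a sketch using xml.etree.ElementTree from the standard library. It assumes images appear as w:drawing elements and tables as w:tbl elements in the usual WordprocessingML namespace. The function name strip_drawings_and_tables is mine, not part of any library. ElementTree rewrites the namespace declarations when it serialises, so check that Word still opens the result before relying on it.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, normally bound to the "w" prefix.
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'


def strip_drawings_and_tables(xml_text):
    """Detach every w:drawing and w:tbl element; return (new_xml, removed_count)."""
    ET.register_namespace('w', W_NS)  # keep the familiar "w:" prefix on output
    root = ET.fromstring(xml_text)
    targets = {f'{{{W_NS}}}drawing', f'{{{W_NS}}}tbl'}
    removed = 0
    # ElementTree has no parent pointers, so visit each element as a
    # possible parent and drop matching children from it.
    # list(...) takes a snapshot so the tree can be changed while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in targets:
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding='unicode'), removed
```

This removes all drawings and tables. To remove only the unwanted ones, you would add your own test inside the inner loop, for example on the text inside a table. The image files in word/media/ are left untouched by this sketch.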

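from bs4 import BeautifulSoup

path2xml = 'docx/docxFiles/temp/testme/word/document.xml'

# read the XML (Word writes document.xml as UTF-8)
with open(path2xml, 'r', encoding='utf-8') as f:
    data = f.read()

# parse with BeautifulSoup's XML parser (this needs lxml installed)
bs_data = BeautifulSoup(data, 'xml')

# find all <w:drawing> elements, which wrap the images
images = bs_data.find_all('drawing')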
