Python Forum
extract data inside a table from a .doc file
I have more than 4000 hateful Microsoft Office Word .doc files from which I need to extract some data (both numbers and words, although in most cases the fields are actually empty). Then I want to combine everything into a single .csv file where each row corresponds to one .doc file.

Here is a screenshot of one of those files. The items underlined in blue are examples of what I need to extract:
[Image: Immagine.jpg]
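Once a .doc has been turned into plain text, a table like the one in the screenshot usually comes out as one line per row. Below is a minimal sketch, assuming the converter draws each row as `|cell 1 |cell 2 |` (my understanding is that antiword renders tables this way, but check it against the real output of these files). The label-in-column-one, value-in-column-two layout used by `rows_to_record` is a hypothetical guess at the form's structure, and both function names are made up for illustration.

```python
def parse_table_rows(text):
    """Split pipe-delimited table lines into lists of stripped cell strings.

    Assumes each table row sits on its own line in the form
    "|cell 1 |cell 2 |"; every other line is ignored.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        # drop the outer pipes, then split on the inner ones
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        rows.append(cells)
    return rows


def rows_to_record(rows):
    """Map the first cell of each row to its second cell.

    Hypothetical layout: label in column one, value in column two.
    Rows with no value produce an empty string instead of an error.
    """
    record = {}
    for cells in rows:
        if cells and cells[0]:
            record[cells[0]] = cells[1] if len(cells) > 1 else ""
    return record
```

Empty cells simply become `""`, so the many blank fields do not need special handling at this stage.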

I uploaded the file here if someone wants to test something:
https://ufile.io/vt2zq

Since my Python experience is quite limited, I thought it would be useful to come here and gather some ideas and hints before starting.
From Google, it seems my options for working with this file format are limited:
1) textract
2) antiword (which extracts plain text from .doc files), or converting the .doc files to .docx and then using docx2txt
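For the antiword route, one straightforward option is to call the command-line tool from Python with `subprocess` and capture what it prints. This sketch assumes `antiword` is installed and on PATH and that running `antiword file.doc` writes the document text to stdout. The output encoding depends on antiword's character-mapping setup (I am not sure of the default on every system), so it is a parameter here. The `command` parameter exists only so a different converter can be swapped in.

```python
import subprocess


def doc_to_text(path, command=("antiword",), encoding="utf-8"):
    """Return the plain text of a legacy .doc file via an external converter.

    Assumes the converter takes the file path as its last argument and
    prints the extracted text to stdout (antiword by default).
    """
    result = subprocess.run(
        [*command, str(path)],
        capture_output=True,
        check=True,  # raise CalledProcessError if the converter fails
    )
    # decode leniently so one odd character does not abort a batch run
    return result.stdout.decode(encoding, errors="replace")
```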

My idea was to:
1) open the folder and read the first .doc file
2) extract the data and handle the many empty values with a try/except
3) go to the next file
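The three steps above map onto a loop over the folder plus `csv.DictWriter`. A minimal sketch, assuming `extract_record(path)` is a hypothetical function you supply that returns a dict of field name to string for one file. `DictWriter`'s `restval=""` writes missing fields as empty cells, so the empty values need no try/except; the try/except is kept only around each file, so one unreadable document out of 4000 does not stop the run. Note that `glob("*.doc")` is case-sensitive on Linux and will skip files named `*.DOC`.

```python
import csv
from pathlib import Path


def docs_to_csv(folder, out_csv, fieldnames, extract_record):
    """Write one CSV row per .doc file in `folder`.

    `extract_record(path)` is a hypothetical, user-supplied callable that
    returns a dict mapping field names to strings. Fields it omits are
    written as empty cells. Returns a list of (file name, exception)
    pairs for the files that could not be processed.
    """
    failed = []
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(
            fh,
            fieldnames=["file", *fieldnames],
            restval="",             # missing keys -> empty cell
            extrasaction="ignore",  # unexpected keys are dropped
        )
        writer.writeheader()
        for path in sorted(Path(folder).glob("*.doc")):
            try:
                record = extract_record(path)
            except Exception as exc:  # keep going; report at the end
                failed.append((path.name, exc))
                continue
            # the file name always wins over any "file" key in the record
            writer.writerow({**record, "file": path.name})
    return failed
```

In practice `extract_record` would convert the file to text and pick out the fields you need; the returned `failed` list tells you which documents to inspect by hand.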

Right now I have no idea how to do any of those steps. What would you do in my situation? How would you open the files? How would you proceed?


Messages In This Thread
extract data inside a table from a .doc file - by aster - Feb-26-2018, 11:28 PM
