Python Forum
extract data inside a table from a .doc file
I am moving on to the second part of my project: extracting the values I need from the file!

To do that, I search the newly created string with str.find() and then try to work out where each piece of data starts and ends.
I made an example of what I am doing, but I am sure there is a better way to handle this, and my coding style is not very Pythonic.
Meanwhile, I am learning about the re library!
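A quick sketch of the two re notes in the code comments: how raw strings keep a backslash, and how re.escape() handles labels that contain regex special characters. The label "Name (first):" and the sample line are hypothetical, not taken from my file.

```python
import re

# A raw string keeps the backslash as a literal character
print(len(r"\n"))  # 2: a backslash and the letter n
print(len("\n"))   # 1: a single newline character

# Hypothetical table label containing the special characters "(" and ")"
label = "Name (first):"
line = "Name (first): Lorem"

# Unescaped, "(first)" is treated as a capture group, so the literal
# parentheses in the line are not matched
print(re.search(label, line))  # None

# re.escape() puts a backslash before each special character for you;
# \s* skips spaces and (\w+) captures the word after the label
match = re.search(re.escape(label) + r"\s*(\w+)", line)
print(match.group(1))  # Lorem
```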
import re
#https://docs.python.org/3/library/re.html
#r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline
# special characters in a pattern must be escaped with a "\" before them (re.escape() can do this)

text = "Lorem ipsum dolor sit amet, consectetur adipisci elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

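# print(text)
debug = True

# re.sub(pattern_to_find, replace_with, text_input, count=0, flags=0)
# put each sentence and clause on its own line; the raw string r'...'
# avoids Python's "invalid escape sequence" warning for "\."
text = re.sub(r'(\. |, )', '.\n', text)

print(text)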

#data that i need to find
name = "incidunt"
surname = "consectetur"
birth = "veniam" 

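# index of each label (str.find() returns -1 when the label is missing)
ix_name = text.find(name)
ix_surname = text.find(surname)
ix_birth = text.find(birth)
if -1 in (ix_name, ix_surname, ix_birth):
    raise ValueError("one of the labels was not found in the text")

# move each index past its label, to where the data starts
ix_name += len(name)
ix_surname += len(surname)
ix_birth += len(birth)

if debug:
    print("start of data:")
    print("name position:", ix_name)

# my data: the next 20 characters after each label (a fixed width is fragile)
data_name = text[ix_name:ix_name + 20]
data_surname = text[ix_surname:ix_surname + 20]
if debug:
    print(data_name)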
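One possible refinement of the fixed 20-character slice above: a minimal sketch that searches for an end delimiter instead, assuming each value ends at the next newline (as it does after the re.sub() step). The helper name extract_after is hypothetical.

```python
def extract_after(text, label, end="\n"):
    """Return the text between `label` and the next `end` delimiter.

    Returns None if `label` is not in `text`. If `end` never appears
    after the label, everything up to the end of `text` is returned.
    """
    start = text.find(label)
    if start == -1:          # str.find() returns -1 when nothing is found
        return None
    start += len(label)      # skip past the label itself
    stop = text.find(end, start)
    if stop == -1:           # no delimiter after the label: take the rest
        stop = len(text)
    return text[start:stop].strip()


# Sample in the same layout as after the re.sub() step: one clause per line
text = (
    "Lorem ipsum dolor sit amet.\n"
    "consectetur adipisci elit.\n"
    "sed eiusmod tempor incidunt ut labore et dolore magna aliqua.\n"
    "Ut enim ad minim veniam.\n"
)

print(extract_after(text, "incidunt"))     # ut labore et dolore magna aliqua.
print(extract_after(text, "consectetur"))  # adipisci elit.
print(extract_after(text, "missing"))      # None
```

Returning None for a missing label, rather than silently slicing from index -1 + len(label), makes a typo in a label easy to spot.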
As always, any suggestions for a better approach, libraries, or examples are very welcome :D
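A regex-based alternative, sketched under the same assumption (the value is the rest of the line after its label): re.search() combined with re.escape() looks up every label in a single loop. The function extract_fields and the labels dictionary are hypothetical names for illustration.

```python
import re

# Sample in the same layout as after the re.sub() step: one clause per line
text = (
    "Lorem ipsum dolor sit amet.\n"
    "consectetur adipisci elit.\n"
    "sed eiusmod tempor incidunt ut labore et dolore magna aliqua.\n"
    "Ut enim ad minim veniam.\n"
)

# Field name -> label that precedes the value in the text
labels = {"name": "incidunt", "surname": "consectetur"}


def extract_fields(text, labels):
    """Map each field to the rest of the line after its label (None if absent)."""
    result = {}
    for field, label in labels.items():
        # "." does not match a newline, so (.*) stops at the end of the line
        match = re.search(re.escape(label) + r"\s*(.*)", text)
        result[field] = match.group(1) if match else None
    return result


print(extract_fields(text, labels))
# {'name': 'ut labore et dolore magna aliqua.', 'surname': 'adipisci elit.'}
```

Keeping the labels in one dictionary means adding a new field is a one-line change instead of another find/slice pair.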
RE: extract data inside a table from a .doc file - by aster - Mar-04-2018, 05:46 PM
