Python Forum
How to Extract Specific Words from PDFs with Python
#1
I have to copy specific strings from a PDF file and paste each one into a specific tag in an XML file. In the attached picture, the number 3 from the PDF goes into the label tag in the XML. A bold string (the first one, Melnik BC.) goes into the <collab> tag, a normal string (The Pathogenic Role…) goes into the <article-title> tag, and an italic string (Current Diabetes…) goes into the <source> tag. Then the year 2015 goes into the year tag, 11 into volume, 46 into fpage, and 62 into lpage. Can anybody help me with an idea of how to solve this in Python?

Thank you very much,
Dan

[Image: 1.jpg?dl=0]
#2
see: https://pypi.org/search/?q=pdf+image+extract

minecart (https://pypi.org/project/minecart/) looks promising. Sample code:
>>> import minecart
>>> pdffile = open('example.pdf', 'rb')
>>> doc = minecart.Document(pdffile)
>>> page = doc.get_page(3)
>>> # print the path and fill color of every shape inside the bounding box
>>> for shape in page.shapes.iter_in_bbox((0, 0, 100, 200)):
...     print(shape.path, shape.fill.color.as_rgb())
...
>>> im = page.images[0].as_pil()  # requires pillow
>>> im.show()
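
Once you can get each string together with its font name, the mapping to XML tags described in the question could look roughly like the sketch below. It is only an outline under several assumptions that are not from the thread: the spans arrive as (text, fontname) pairs from whichever PDF library you end up using (pdfminer.six, for example, exposes a fontname on each character), bold and italic are guessed from the font name, the numbers look like "2015;11:46-62", and the enclosing <ref> element is a made-up wrapper. Only the standard library is used here.

```python
import re
import xml.etree.ElementTree as ET

# Style-to-tag mapping as described in the question.
STYLE_TO_TAG = {"bold": "collab", "normal": "article-title", "italic": "source"}

# Assumed layout of the trailing numbers: year;volume:fpage-lpage
TAIL_RE = re.compile(
    r"(?P<year>\d{4})\s*;\s*(?P<volume>\d+)\s*:\s*"
    r"(?P<fpage>\d+)\s*[-\u2013]\s*(?P<lpage>\d+)"
)


def style_of(fontname):
    """Guess the style from a font name such as 'Times-Bold' (a heuristic)."""
    name = fontname.lower()
    if "bold" in name:
        return "bold"
    if "italic" in name or "oblique" in name:
        return "italic"
    return "normal"


def build_ref(label, spans, tail):
    """Build a <ref> element (hypothetical wrapper) from the extracted pieces.

    spans is a list of (text, fontname) pairs; tail holds the numeric part.
    """
    ref = ET.Element("ref")
    ET.SubElement(ref, "label").text = str(label)
    for text, fontname in spans:
        tag = STYLE_TO_TAG[style_of(fontname)]
        ET.SubElement(ref, tag).text = text.strip()
    match = TAIL_RE.search(tail)
    if match:  # leave the numeric tags out if the layout is different
        for tag in ("year", "volume", "fpage", "lpage"):
            ET.SubElement(ref, tag).text = match.group(tag)
    return ref


if __name__ == "__main__":
    # Font names here are illustrative, not read from the actual PDF.
    spans = [
        ("Melnik BC.", "Times-Bold"),
        ("The Pathogenic Role…", "Times-Roman"),
        ("Current Diabetes…", "Times-Italic"),
    ]
    ref = build_ref(3, spans, "2015;11:46-62")
    print(ET.tostring(ref, encoding="unicode"))
```

Font names vary a lot between PDFs (some embed subsets with names that say nothing about weight), so check what your file actually reports before relying on the "bold"/"italic" test.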