Python Forum
Working with Dict Object
#1
I am new to Python. I am trying to extract text from the bookmarks in a PDF file to provide the data for a Word template merge. I have gotten as far as a string of text pulled out of the list object that I got from the PyPDF2 module. I am stuck on how to get the data I need out of that string. I am calling it a string, but Python recognizes it as a dictionary object.

Here is the string:

{'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}

What I want is for the following to end up as fields in my Word template merge:
MedSourceFirstName: "John"
MedSourceLastName: "Milani"
MedSourceLastTreatment: "05/28/2014"

If I use keys() on the dictionary, I get this:
['/Title', '/Page', '/Type']
I was hoping "Src." and "Tmt. Dt." would be treated as keys. It seems like the key/value pairs of a dictionary would translate nicely to field names and field data for a Word document merge. Here is my code so far.
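A dictionary's keys are only its top-level keys; "Src." and "Tmt. Dt." are just text inside the string stored under '/Title'. A minimal sketch using a plain dict copied from the printed output (the '/Page' IndirectObject is replaced by a placeholder string, an assumption made so the snippet runs without a PDF):

```python
# Plain-dict stand-in for the PyPDF2 bookmark; '/Page' is a placeholder,
# not the real IndirectObject.
bookmark = {
    '/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: '
              '05/12/2014 - 05/28/2014 (9 pages)',
    '/Page': 'placeholder',
    '/Type': '/FitB',
}

print(list(bookmark.keys()))   # only the three top-level keys
print('Src.' in bookmark)      # "in" on a dict checks keys only -> False
title = bookmark['/Title']     # the value is an ordinary str
print(type(title).__name__)    # str
print('Src.' in title)         # substring search in the value -> True
```

So the fields have to be parsed out of the title string; they will not appear as keys on their own.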

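import PyPDF2

pdfFileObj = open('x.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)  # older PyPDF2 reader API
MyList = pdfReader.getOutlines()              # (possibly nested) list of bookmarks
MyDict = MyList[-1][0]                        # first item of the last (nested) entry
print(isinstance(MyDict, dict))
print(MyDict)
print(list(MyDict.keys()))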


I get this output in Sublime Text:
True
{'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}
['/Title', '/Page', '/Type']
[Finished in 0.4s]

Thank you in advance for any suggestions.
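As far as I know, getOutlines() in older PyPDF2 releases returns a list in which a nested list holds the child bookmarks of the entry just before it, which is why MyList[-1][0] reaches a child entry. A hedged sketch of a recursive walker that collects every '/Title'; it is exercised on plain dicts and lists, assuming PyPDF2's bookmark objects behave like dicts (the isinstance check above suggests they do). The name iter_titles and the 'Section A' entry are hypothetical.

```python
def iter_titles(outline):
    """Yield every '/Title' in a (possibly nested) outline list.

    Assumes entries are dict-like and that nested lists hold child bookmarks.
    """
    for entry in outline:
        if isinstance(entry, list):
            # A nested list holds the children of the previous bookmark.
            yield from iter_titles(entry)
        elif isinstance(entry, dict) and '/Title' in entry:
            yield entry['/Title']


# Hypothetical plain-data stand-in for pdfReader.getOutlines().
sample_outline = [
    {'/Title': 'Section A'},
    [
        {'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: '
                   '05/12/2014 - 05/28/2014 (9 pages)'},
    ],
]

for title in iter_titles(sample_outline):
    print(title)
```

With a real reader the call would be iter_titles(pdfReader.getOutlines()), so every bookmark is visited instead of only MyList[-1][0].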
#2
After playing a bit, it looks like you should split on the colon. Using your example text:
>>> m = "{'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}"
>>> n = m.split(':')
>>> for item in n:
...     print(item)
...
{'/Title'
 '1F
 Progress Notes Src.
 MILANI, JOHN C Tmt. Dt.
 05/12/2014 - 05/28/2014 (9 pages)', '/Page'
 IndirectObject(465, 0), '/Type'
 '/FitB'}
>>>
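Splitting the whole printed dictionary drags quotes and braces along with the data. Splitting only the '/Title' string and stripping each piece gives cleaner fields; a short sketch:

```python
title = ('1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: '
         '05/12/2014 - 05/28/2014 (9 pages)')

# Split on ':' and trim the surrounding spaces from each piece.
parts = [p.strip() for p in title.split(':')]
print(parts)
# ['1F', 'Progress Notes Src.', 'MILANI, JOHN C Tmt. Dt.', '05/12/2014 - 05/28/2014 (9 pages)']
```

Note that each label ends up glued to the end of the previous piece (the name piece still carries "Tmt. Dt."), so a colon split alone does not isolate the name cleanly.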
#3
Thanks so much for the response. However, I get the following error:

'Destination' object has no attribute 'split'
It seems like split() is not an available method for a dictionary? Was your "n" a string?
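The error arises because split() is a str method, while the bookmark object is a dict (PyPDF2 reports its own 'Destination' class, which passed the isinstance(..., dict) check). The method has to be called on the string stored under '/Title'. A sketch with a plain dict standing in for the PyPDF2 object, an assumption so it runs without a PDF:

```python
bookmark = {'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: '
                      '05/12/2014 - 05/28/2014 (9 pages)',
            '/Type': '/FitB'}

try:
    bookmark.split(':')                  # dicts have no split() method
except AttributeError as exc:
    print(exc)                           # prints the AttributeError message

pieces = bookmark['/Title'].split(':')   # split the string value instead
print(len(pieces))                       # 4
```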
#4
Note the receiving variable is 'n', not 'm'.
λ python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> m = "{'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}"
>>> n = m.split()
>>> n
["{'/Title':", "'1F:", 'Progress', 'Notes', 'Src.:', 'MILANI,', 'JOHN', 'C', 'Tmt.', 'Dt.:', '05/12/2014', '-', '05/28/2014', '(9', "pages)',", "'/Page':", 'IndirectObject(465,', '0),', "'/Type':", "'/FitB'}"]
>>>
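Once the title is split on whitespace, the name and the last treatment date can be located relative to the 'Src.:', 'Tmt.' and '-' tokens. A sketch that assumes every bookmark title follows this exact token layout (inferred from one sample; other bookmarks may differ):

```python
title = ('1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: '
         '05/12/2014 - 05/28/2014 (9 pages)')
tokens = title.split()

# The name sits between 'Src.:' and 'Tmt.'; assumes both tokens are present.
start = tokens.index('Src.:') + 1
end = tokens.index('Tmt.')
name_tokens = tokens[start:end]           # ['MILANI,', 'JOHN', 'C']
last_name = name_tokens[0].rstrip(',')    # 'MILANI'
first_name = name_tokens[1]               # 'JOHN'

# The last treatment date is the token right after the '-' separator.
last_treatment = tokens[tokens.index('-') + 1]   # '05/28/2014'
print(first_name, last_name, last_treatment)
```

tokens.index() raises ValueError when a marker is missing, so titles with a different layout would need a guard.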
#5
Right. N is a dictionary per isinstance.
#6
Perhaps your m is a string? My m is a dict created by the getOutlines() function. It seems that when I try to create n, the split function is not available because m is a dict?
#7
What I posted is verbatim from the Python interpreter,
and the text there was converted to a string.
I can do it with a dictionary too; give me a bit (I need some sleep, back in about 3-4 hours).
It works ... Python 3.6.4
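The interpreter examples in #2 and #4 start from a string literal (the pasted output), while getOutlines() returns a dict. Calling str() on a dict produces that printed text, which is why split() worked there. A short sketch with a plain dict (an assumption standing in for the PyPDF2 object); splitting this printed form is fragile, and indexing '/Title' is the cleaner route:

```python
bookmark = {'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: '
                      '05/12/2014 - 05/28/2014 (9 pages)',
            '/Type': '/FitB'}

as_text = str(bookmark)          # the printed form of the dict, as a str
print(type(as_text).__name__)    # str
print(as_text.split(':')[:2])    # works, but the pieces include quotes and braces
```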
#8
I have some working example code.

from collections import namedtuple


def get_data(row):
    columns = row['/Title'].split(':')
    lastname, firstname = [c.strip() for c in columns[2].split(',')]
    last_treatment = columns[3].split('-')[-1].split()[0]
    # take the 4th col, split at -, take the last element,
    # split the last element and get the first which is
    # the second date
    return firstname, lastname, last_treatment

sample_row = {
    '/Page': lambda: None,
    '/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)',
    '/Type': '/FitB'
    }

Result = namedtuple('row', 'first_name last_name last_treatment')
# Result is callable
result = Result(*get_data(sample_row))
# for each column in the namedtuple
# an argument is required.
# the * in front of get_data does this
# it unpacks the result from get_data
# and put the elements as arguments into
# Result(arg1, arg2, ...)
print(result)
Output:
row(first_name='JOHN C Tmt. Dt.', last_name='MILANI', last_treatment='05/28/2014')
I guess a regex would be better for getting the data out of '/Title'.
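Following up on the regex idea: a hedged sketch that pulls the three merge fields from '/Title' with the re module. The pattern assumes every title follows "Src.: LAST, FIRST ... Tmt. Dt.: START - END", a layout inferred from a single sample; the field names are the ones from the first post, and TITLE_RE and merge_fields are hypothetical names. str.title() turns "JOHN" into "John" and drops the middle initial problem seen in the output above.

```python
import re

# Assumed layout, inferred from one sample title:
# "... Src.: LAST, FIRST [MIDDLE] Tmt. Dt.: MM/DD/YYYY - MM/DD/YYYY ..."
TITLE_RE = re.compile(
    r'Src\.:\s*(?P<last>[^,]+),\s*(?P<first>\S+)'               # "MILANI, JOHN"
    r'.*?Tmt\. Dt\.:\s*\S+\s*-\s*(?P<end>\d{2}/\d{2}/\d{4})'    # end date
)


def merge_fields(title):
    """Return the Word merge fields for one bookmark title, or None."""
    match = TITLE_RE.search(title)
    if match is None:
        return None  # title does not follow the assumed layout
    return {
        'MedSourceFirstName': match.group('first').title(),
        'MedSourceLastName': match.group('last').strip().title(),
        'MedSourceLastTreatment': match.group('end'),
    }


sample = ('1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: '
          '05/12/2014 - 05/28/2014 (9 pages)')
print(merge_fields(sample))
```

Returning None for non-matching titles lets a loop over all bookmarks skip entries that are not progress notes instead of crashing.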
#9
Thank you so much, DeadEye, for your kind response.

