Working with Dict Object

standenman · Feb-09-2018, 05:03 AM

I am new to Python. I am trying to extract text from the bookmarks in a PDF file to provide the data for a Word template merge. I have gotten down to a string of text pulled out of the list object that I got from the PyPDF2 module. I am stuck on how to get the data I need out of that string. I am calling it a string, but Python recognizes it as a dictionary object.

Here is the string:

{'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}

What I want is for the following to end up as fields in my Word template merge:
MedSourceFirstName: "John"
MedSourceLastName: "Milani"
MedSourceLastTreatment: "05/28/2014"

If I use keys() on the dictionary I get this:
['/Title', '/Page', '/Type']

I was hoping "Src." and "Tmt. Dt." would be treated as keys. It seems like the key/value pairs of a dictionary would translate nicely to field name and field data for a Word document merge. Here is my code so far.

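import PyPDF2

# Open the PDF in binary mode and read its outline (bookmarks)
pdfFileObj = open('x.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
MyList = pdfReader.getOutlines()

# First entry of the last item in the outline list
MyDict = MyList[-1][0]
print(isinstance(MyDict, dict))
print(MyDict)
print(list(MyDict.keys()))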

I get this output in Sublime Text:
True
{'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}
['/Title', '/Page', '/Type']
[Finished in 0.4s]

Thank you in advance for any suggestions.

**Larz60+** · Feb-09-2018, 05:32 AM

After playing a bit, it looks like you should split on colon:
from your example text:

>>> m = "{'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}"
>>> n = m.split(':')
>>> for item in n:
...     print(item)
...
{'/Title'
 '1F
 Progress Notes Src.
 MILANI, JOHN C Tmt. Dt.
 05/12/2014 - 05/28/2014 (9 pages)', '/Page'
 IndirectObject(465, 0), '/Type'
 '/FitB'}
>>>

standenman · Feb-13-2018, 12:21 AM

Thanks so much for the response. I get the following error, however:

'Destination' object has no attribute 'split'

Seems like split is not an available function for a dictionary? Was your "n" a string?
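
The error comes from calling split on the bookmark object itself rather than on the text inside it. Here is a minimal sketch, assuming a plain dict as a stand-in for the PyPDF2 Destination object (which passes the isinstance(..., dict) check shown earlier in the thread). The text to parse is the string stored under the '/Title' key, and split is a str method. The name bookmark and the None placeholder for '/Page' are illustrative only.

```python
# Stand-in for the bookmark entry; a real one would come from getOutlines().
bookmark = {
    '/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)',
    '/Page': None,  # placeholder for IndirectObject(465, 0)
    '/Type': '/FitB',
}

# A dict has no split() method, so bookmark.split(':') raises AttributeError.
# The value under '/Title' is a str, and that is what can be split.
title = bookmark['/Title']
parts = [p.strip() for p in title.split(':')]

for part in parts:
    print(part)
```

This prints four pieces: '1F', 'Progress Notes Src.', 'MILANI, JOHN C Tmt. Dt.' and '05/12/2014 - 05/28/2014 (9 pages)'.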

**Larz60+** · Feb-13-2018, 12:27 AM

Note the receiving variable is 'n' not 'm'

λ python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> m = "{'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}"
>>> n = m.split()
>>> n
["{'/Title':", "'1F:", 'Progress', 'Notes', 'Src.:', 'MILANI,', 'JOHN', 'C', 'Tmt.', 'Dt.:', '05/12/2014', '-', '05/28/2014', '(9', "pages)',", "'/Page':", 'IndirectObject(465,', '0),', "'/Type':", "'/FitB'}"]
>>>

standenman · Feb-13-2018, 04:41 AM

Right. N is a dictionary per isinstance.

standenman · Feb-13-2018, 12:59 PM

Perhaps your m is a string? My m is a dict created from the getOutlines() function. It seems that when trying to create n, the split function is not available because m is a dict?

**Larz60+** · (This post was last modified: Feb-13-2018, 01:48 PM by Larz60+.)

What I posted is verbatim from the Python interpreter,
and there m is converted to a string.
I can do dictionary, give me a bit (need some sleep, back in about 3-4 hours)
It works ... Python 3.6.4

DeaD_EyE · (This post was last modified: Feb-13-2018, 05:51 PM by DeaD_EyE.)

I have some working example code.

from collections import namedtuple


def get_data(row):
    columns = row['/Title'].split(':')
    lastname, firstname = [c.strip() for c in columns[2].split(',')]
    last_treatment = columns[3].split('-')[-1].split()[0]
    # take the 4th col, split at -, take the last element,
    # split the last element and get the first which is
    # the second date
    return firstname, lastname, last_treatment

sample_row = {
    '/Page': lambda: None,
    '/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)',
    '/Type': '/FitB'
    }

Result = namedtuple('row', 'first_name last_name last_treatment')
# Result is callable
result = Result(*get_data(sample_row))
# for each column in the namedtuple
# an argument is required.
# the * in front of get_data does this
# it unpacks the result from get_data
# and put the elements as arguments into
# Result(arg1, arg2, ...)
print(result)

Output:
row(first_name='JOHN C Tmt. Dt.', last_name='MILANI', last_treatment='05/28/2014')

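Note that first_name still carries the trailing 'C Tmt. Dt.', because splitting on ':' does not separate the name from the next label. I guess a regex is a better way to get the data out of the '/Title'.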
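
A regex version might look like the sketch below. It assumes every title follows the same "Src.: LAST, FIRST ... Tmt. Dt.: start - end" layout as the sample, which the thread does not confirm for other bookmarks. The pattern, the merge_fields helper and the use of str.title() for capitalization are illustrative choices, not part of PyPDF2. Note that str.title() will mangle names such as "McDONALD".

```python
import re

# Pattern for titles shaped like the sample in this thread (an assumption).
TITLE_RE = re.compile(
    r'Src\.:\s*(?P<last>[^,]+),\s*(?P<first>\S+)'  # "MILANI, JOHN"
    r'.*?Tmt\. Dt\.:\s*'                           # skip middle initial, find date label
    r'(?:\d{2}/\d{2}/\d{4}\s*-\s*)?'               # optional start date
    r'(?P<last_treatment>\d{2}/\d{2}/\d{4})'       # end (or only) date
)


def merge_fields(title):
    """Return the Word merge fields for one bookmark title, or None if it does not match."""
    m = TITLE_RE.search(title)
    if m is None:
        return None
    return {
        'MedSourceFirstName': m.group('first').title(),
        'MedSourceLastName': m.group('last').strip().title(),
        'MedSourceLastTreatment': m.group('last_treatment'),
    }


sample_title = '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)'
fields = merge_fields(sample_title)
print(fields)
```

For the sample title this gives MedSourceFirstName 'John', MedSourceLastName 'Milani' and MedSourceLastTreatment '05/28/2014'. Returning None for titles that do not match lets the caller skip bookmarks with a different layout.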

standenman · Feb-18-2018, 04:16 PM

Thank you so much DeadEye for your kind response.
