Python Forum
Transform a list
#1
I have a list of the bookmarks in a PDF that I wish to transform. The list prints out in the form:

[
[2, 'Medical Evidence of Record (MER)  Src.:  HELEN HASKELL HOBBS Tmt. Dt.:  Unknown - Unknown (10 pages)', 88, {'kind': 4, 'xref': 7541, 'page': '88', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}], [2, 'Copy of Evidence Request (CPYEVREQ)  Src.:  DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.:  Unknown - Unknown (7 pages)', 98, {'kind': 4, 'xref': 7552, 'page': '98', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}], [2, 'Medical Evidence of Record (MER)  Src.:   Tmt. Dt.:  Unknown - Unknown (7 pages)', 105, {'kind': 4, 'xref': 7560, 'page': '105', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}], [2, 'Medical Evidence of Record (MER)  Src.:   Tmt. Dt.:  Unknown - Unknown (7 pages)', 112, {'kind': 4, 'xref': 7568, 'page': '112', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}], [2, 'Copy of Evidence Request (CPYEVREQ)  Src.:  BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.:  Unknown - Unknown (3 pages)', 119, {'kind': 4, 'xref': 7576, 'page': '119', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}], [2, 'Medical Evidence of Record (MER)  Src.:  DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.:  Unknown - Unknown (119 pages)', 122, {'kind': 4, 'xref': 7580, 'page': '122', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}], [2, 'Copy of Evidence Request (CPYEVREQ)  Src.:  BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.:  Unknown - Unknown (7 pages)', 241, {'kind': 4, 'xref': 7700, 'page': '241', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}]]
The titles of these items have too much information crammed into them. I want to preserve the full title in my new dictionary so that I can refer to the bookmark, but I need to parse it into separate fields: the text that appears before "Src.:" as the "Document Type", and the text between "Src.:" and "Tmt. Dt.:" as the "Source". For example, I want the following output for the first two items:

[{'Title': 'Medical Evidence of Record (MER)  Src.:  HELEN HASKELL HOBBS Tmt. Dt.:  Unknown - Unknown (10 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': 'HELEN HASKELL HOBBS'},{'Title': 'Copy of Evidence Request (CPYEVREQ)  Src.:  DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.:  Unknown - Unknown (7 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'DALLAS DIAGNOSTIC ASSOCIATION'}]
#2
This looks interesting. It extracts the bookmarks directly from the PDF file in the form of a dictionary.

https://stackoverflow.com/questions/5430...genumber-a

You could use regex to split your bookmark title into document type and source like this:
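import re


bookmarks = [[2, 'Medical Evidence of Record (MER)  Src.:  HELEN HASKELL HOBBS Tmt. Dt.:  Unknown - Unknown (10 pages)']]
for bookmark in bookmarks:
    title = bookmark[1]
    # Split on the literal "Src.:" and "Tmt. Dt.:" labels, absorbing the
    # whitespace around them so the fields come back already stripped.
    dt, source, *_ = re.split(r'\s*(?:Src\.:|Tmt\. Dt\.:)\s*', title)
    print({"title": title, "document type": dt, "source": source})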
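To get the exact keys asked for in post #1 across the whole list, the same split could be wrapped in a small helper. This is only a sketch: the names `LABELS` and `parse_bookmark` are made up for illustration, and it assumes every title uses the literal "Src.:" and "Tmt. Dt.:" labels seen in the sample data.

```python
import re

# Assumption: every title contains the literal labels "Src.:" and "Tmt. Dt.:".
LABELS = re.compile(r"\s*(?:Src\.:|Tmt\. Dt\.:)\s*")


def parse_bookmark(bookmark):
    """Turn one bookmark entry into a dict with the three wanted fields."""
    title = bookmark[1]  # the title is the second element of each entry
    parts = LABELS.split(title)
    doc_type = parts[0]
    # Fall back to an empty source if the "Src.:" label is missing.
    source = parts[1] if len(parts) > 1 else ""
    return {"Title": title, "Document Type": doc_type, "Source": source}


bookmarks = [
    [2, 'Medical Evidence of Record (MER)  Src.:  HELEN HASKELL HOBBS Tmt. Dt.:  Unknown - Unknown (10 pages)', 88],
    [2, 'Copy of Evidence Request (CPYEVREQ)  Src.:  DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.:  Unknown - Unknown (7 pages)', 98],
    [2, 'Medical Evidence of Record (MER)  Src.:   Tmt. Dt.:  Unknown - Unknown (7 pages)', 105],
]

records = [parse_bookmark(b) for b in bookmarks]
for record in records:
    print(record)
```

Titles with nothing between the two labels, like the third one here, come back with an empty 'Source' string rather than raising an error.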
#3
Yes, thank you, I have done something like that. I now have one Python dictionary per bookmark:

{'Title': 'Medical Evidence of Record (MER)  Src.:  HELEN HASKELL HOBBS Tmt. Dt.:  Unknown - Unknown (10 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': 'HELEN HASKELL HOBBS'}
{'Title': 'Copy of Evidence Request (CPYEVREQ)  Src.:  DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.:  Unknown - Unknown (7 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'DALLAS DIAGNOSTIC ASSOCIATION'}
{'Title': 'Medical Evidence of Record (MER)  Src.:   Tmt. Dt.:  Unknown - Unknown (7 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': ''}
{'Title': 'Medical Evidence of Record (MER)  Src.:   Tmt. Dt.:  Unknown - Unknown (7 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': ''}
{'Title': 'Copy of Evidence Request (CPYEVREQ)  Src.:  BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.:  Unknown - Unknown (3 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'BAYLOR & SCOTT MEDICAL CENTER'}
{'Title': 'Medical Evidence of Record (MER)  Src.:  DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.:  Unknown - Unknown (119 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': 'DALLAS DIAGNOSTIC ASSOCIATION'}
{'Title': 'Copy of Evidence Request (CPYEVREQ)  Src.:  BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.:  Unknown - Unknown (7 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'BAYLOR & SCOTT MEDICAL CENTER'}
I now want to reorganize, or perhaps simply iterate over, these dictionaries. For each 'Source' value that has a row with a 'Document Type' of 'Copy of Evidence Request (CPYEVREQ)', I want to find whether there is also a row, for that same Source, with a 'Document Type' of 'Medical Evidence of Record (MER)'. The former Document Type represents a request for medical records, and the latter represents compliance with that request. I am trying to identify the records requests that have not been complied with.
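One way this check might be sketched is to collect the set of document types seen for each source, then keep the sources that have a request but no matching record. The function name `unanswered_requests` and the two constants are hypothetical, and the sample rows are cut down to the two fields the check needs.

```python
from collections import defaultdict

# Assumption: these two strings match the 'Document Type' values exactly.
REQUEST = "Copy of Evidence Request (CPYEVREQ)"
RESPONSE = "Medical Evidence of Record (MER)"


def unanswered_requests(records):
    """Return sources that have a request row but no evidence row."""
    types_by_source = defaultdict(set)
    for record in records:
        types_by_source[record["Source"]].add(record["Document Type"])
    return sorted(
        source
        for source, types in types_by_source.items()
        if REQUEST in types and RESPONSE not in types
    )


# Same rows as the dictionaries above, without the 'Title' field.
records = [
    {"Document Type": RESPONSE, "Source": "HELEN HASKELL HOBBS"},
    {"Document Type": REQUEST, "Source": "DALLAS DIAGNOSTIC ASSOCIATION"},
    {"Document Type": RESPONSE, "Source": ""},
    {"Document Type": RESPONSE, "Source": ""},
    {"Document Type": REQUEST, "Source": "BAYLOR & SCOTT MEDICAL CENTER"},
    {"Document Type": RESPONSE, "Source": "DALLAS DIAGNOSTIC ASSOCIATION"},
    {"Document Type": REQUEST, "Source": "BAYLOR & SCOTT MEDICAL CENTER"},
]

print(unanswered_requests(records))
# → ['BAYLOR & SCOTT MEDICAL CENTER']
```

Note that this compares source names exactly, so the evidence rows with an empty 'Source' can never be matched to a request, and a spelling difference between two rows would count as two different sources.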
#4
Thanks for help!