Transform a list

**standenman** · (This post was last modified: Feb-19-2024, 07:53 PM by standenman.)

I have a list of the bookmarks in a PDF that I wish to transform. The list prints out in the form:

[[2, 'Medical Evidence of Record (MER)  Src.:  HELEN HASKELL HOBBS Tmt. Dt.:  Unknown - Unknown (10 pages)', 88, {'kind': 4, 'xref': 7541, 'page': '88', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Copy of Evidence Request (CPYEVREQ)  Src.:  DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.:  Unknown - Unknown (7 pages)', 98, {'kind': 4, 'xref': 7552, 'page': '98', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Medical Evidence of Record (MER)  Src.:   Tmt. Dt.:  Unknown - Unknown (7 pages)', 105, {'kind': 4, 'xref': 7560, 'page': '105', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Medical Evidence of Record (MER)  Src.:   Tmt. Dt.:  Unknown - Unknown (7 pages)', 112, {'kind': 4, 'xref': 7568, 'page': '112', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Copy of Evidence Request (CPYEVREQ)  Src.:  BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.:  Unknown - Unknown (3 pages)', 119, {'kind': 4, 'xref': 7576, 'page': '119', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Medical Evidence of Record (MER)  Src.:  DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.:  Unknown - Unknown (119 pages)', 122, {'kind': 4, 'xref': 7580, 'page': '122', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Copy of Evidence Request (CPYEVREQ)  Src.:  BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.:  Unknown - Unknown (7 pages)', 241, {'kind': 4, 'xref': 7700, 'page': '241', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}]]

The titles of these items have too much information crammed into them. I want to preserve the full title in my new dictionary so that I can refer back to the bookmark, but I need to parse the text that appears before "Src.:" into a separate "Document Type" field, and the text between "Src.:" and "Tmt. Dt.:" into a "Source" field. So, for example, I want the following output for the first two items:

[{'Title': 'Medical Evidence of Record (MER)  Src.:  HELEN HASKELL HOBBS Tmt. Dt.:  Unknown - Unknown (10 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': 'HELEN HASKELL HOBBS'},{'Title': 'Copy of Evidence Request (CPYEVREQ)  Src.:  DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.:  Unknown - Unknown (7 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'DALLAS DIAGNOSTIC ASSOCIATION'}]

**deanhystad** · (This post was last modified: Feb-19-2024, 09:03 PM by deanhystad.)

This looks interesting. It extracts the bookmarks directly from the PDF file in the form of a dictionary.

https://stackoverflow.com/questions/5430...genumber-a

You could use regex to split your bookmark title into document type and source like this:

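import re

# Each bookmark is [level, title, ...]; only the title is needed here.
bookmarks = [[2, 'Medical Evidence of Record (MER)  Src.:  HELEN HASKELL HOBBS Tmt. Dt.:  Unknown - Unknown (10 pages)']]
for bookmark in bookmarks:
    title = bookmark[1]
    # Split on the "Src.:" and "Tmt. Dt.:" labels, dropping the whitespace around them.
    dt, source, *_ = re.split(r'\s*Src\.:\s*|\s*Tmt\. Dt\.:\s*', title)
    print({"title": title, "document type": dt, "source": source})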
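
To get the list-of-dictionaries shape asked for in the question, one option is a single pattern with named groups. This is only a minimal sketch: the function name `parse_bookmarks`, the pattern, and the fallback for titles that do not match are illustrative assumptions, and it assumes every title follows the "type, Src.:, source, Tmt. Dt.:" layout shown above.

```python
import re

# Assumed title layout: "<document type>  Src.:  <source> Tmt. Dt.:  <dates> (<n> pages)"
TITLE_PATTERN = re.compile(
    r"^(?P<doc_type>.*?)\s*Src\.:\s*(?P<source>.*?)\s*Tmt\. Dt\.:"
)


def parse_bookmarks(bookmarks):
    """Return one dict per bookmark: full title, document type, and source."""
    parsed = []
    for bookmark in bookmarks:
        title = bookmark[1]  # entries look like [level, title, page, details]
        match = TITLE_PATTERN.match(title)
        if match is None:
            # Title does not follow the assumed layout: keep it, leave fields empty.
            parsed.append({"Title": title, "Document Type": "", "Source": ""})
            continue
        parsed.append({
            "Title": title,
            "Document Type": match.group("doc_type"),
            "Source": match.group("source"),
        })
    return parsed


bookmarks = [
    [2, "Medical Evidence of Record (MER)  Src.:  HELEN HASKELL HOBBS Tmt. Dt.:  Unknown - Unknown (10 pages)", 88],
    [2, "Medical Evidence of Record (MER)  Src.:   Tmt. Dt.:  Unknown - Unknown (7 pages)", 105],
]
for row in parse_bookmarks(bookmarks):
    print(row)
```

A bookmark with nothing between "Src.:" and "Tmt. Dt.:" comes back with an empty 'Source' string rather than being dropped, so it can still be traced by its title.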

**standenman** · (This post was last modified: Feb-19-2024, 11:23 PM by standenman.)

Yes, thank you, I have done something like that. I now have the following Python dictionaries as the result:

{'Title': 'Medical Evidence of Record (MER)  Src.:  HELEN HASKELL HOBBS Tmt. Dt.:  Unknown - Unknown (10 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': 'HELEN HASKELL HOBBS'}
{'Title': 'Copy of Evidence Request (CPYEVREQ)  Src.:  DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.:  Unknown - Unknown (7 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'DALLAS DIAGNOSTIC ASSOCIATION'}
{'Title': 'Medical Evidence of Record (MER)  Src.:   Tmt. Dt.:  Unknown - Unknown (7 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': ''}
{'Title': 'Medical Evidence of Record (MER)  Src.:   Tmt. Dt.:  Unknown - Unknown (7 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': ''}
{'Title': 'Copy of Evidence Request (CPYEVREQ)  Src.:  BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.:  Unknown - Unknown (3 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'BAYLOR & SCOTT MEDICAL CENTER'}
{'Title': 'Medical Evidence of Record (MER)  Src.:  DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.:  Unknown - Unknown (119 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': 'DALLAS DIAGNOSTIC ASSOCIATION'}
{'Title': 'Copy of Evidence Request (CPYEVREQ)  Src.:  BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.:  Unknown - Unknown (7 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'BAYLOR & SCOTT MEDICAL CENTER'}

Now I want to reorganize, or perhaps simply iterate over, these dictionaries. For each 'Source' value that has a row with a 'Document Type' of 'Copy of Evidence Request (CPYEVREQ)', I want to find out whether there is also a row, for that same Source, with a 'Document Type' of 'Medical Evidence of Record (MER)'. The former Document Type represents a request for medical records, and the latter represents compliance with that request. I am trying to identify the records requests that have not been complied with.
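
One possible way to do this is with two sets: the sources that have a request row and the sources that have a records row. Whatever is in the first set but not the second is a request with no matching records. The sketch below is only an illustration. The name `unanswered_requests` is made up, the rows are abbreviated to the two keys that matter, and it assumes the 'Source' strings match exactly between a request and its records.

```python
REQUEST = "Copy of Evidence Request (CPYEVREQ)"
RECORD = "Medical Evidence of Record (MER)"


def unanswered_requests(rows):
    """Return the sources that have a CPYEVREQ row but no MER row."""
    requested = {row["Source"] for row in rows if row["Document Type"] == REQUEST}
    received = {row["Source"] for row in rows if row["Document Type"] == RECORD}
    # Set difference: requested sources for which no records row exists.
    return sorted(requested - received)


# Abbreviated copies of the seven rows above ('Title' omitted for brevity).
rows = [
    {"Document Type": RECORD, "Source": "HELEN HASKELL HOBBS"},
    {"Document Type": REQUEST, "Source": "DALLAS DIAGNOSTIC ASSOCIATION"},
    {"Document Type": RECORD, "Source": ""},
    {"Document Type": RECORD, "Source": ""},
    {"Document Type": REQUEST, "Source": "BAYLOR & SCOTT MEDICAL CENTER"},
    {"Document Type": RECORD, "Source": "DALLAS DIAGNOSTIC ASSOCIATION"},
    {"Document Type": REQUEST, "Source": "BAYLOR & SCOTT MEDICAL CENTER"},
]
print(unanswered_requests(rows))  # ['BAYLOR & SCOTT MEDICAL CENTER']
```

Note that the two MER rows with an empty 'Source' cannot be matched to any request this way, so they may need to be checked by hand against the sources that show up as unanswered.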
