Transform a list - Python Forum (https://python-forum.io), Forum: Data Science, Thread: /thread-41628.html
Transform a list - standenman - Feb-19-2024

I have a list of the bookmarks in a PDF that I wish to transform. The list prints out in the form:

[
 [2, 'Medical Evidence of Record (MER) Src.: HELEN HASKELL HOBBS Tmt. Dt.: Unknown - Unknown (10 pages)', 88, {'kind': 4, 'xref': 7541, 'page': '88', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Copy of Evidence Request (CPYEVREQ) Src.: DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.: Unknown - Unknown (7 pages)', 98, {'kind': 4, 'xref': 7552, 'page': '98', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Medical Evidence of Record (MER) Src.: Tmt. Dt.: Unknown - Unknown (7 pages)', 105, {'kind': 4, 'xref': 7560, 'page': '105', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Medical Evidence of Record (MER) Src.: Tmt. Dt.: Unknown - Unknown (7 pages)', 112, {'kind': 4, 'xref': 7568, 'page': '112', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Copy of Evidence Request (CPYEVREQ) Src.: BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.: Unknown - Unknown (3 pages)', 119, {'kind': 4, 'xref': 7576, 'page': '119', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Medical Evidence of Record (MER) Src.: DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.: Unknown - Unknown (119 pages)', 122, {'kind': 4, 'xref': 7580, 'page': '122', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}],
 [2, 'Copy of Evidence Request (CPYEVREQ) Src.: BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.: Unknown - Unknown (7 pages)', 241, {'kind': 4, 'xref': 7700, 'page': '241', 'view': 'FitB', 'collapse': False, 'zoom': 0.0}]
]

The titles of these items have too much information crammed into them. I want to preserve the full title in my new dictionary so that I can refer to the bookmark, but I need to parse into separate fields the text in the title that appears before "Src.:" as the "Document Type" and the text between "Src.:" and "Tmt. Dt.:"
as "Source". So, for example, I want output as follows for the first two items:

[{'Title': 'Medical Evidence of Record (MER) Src.: HELEN HASKELL HOBBS Tmt. Dt.: Unknown - Unknown (10 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': 'HELEN HASKELL HOBBS'},
 {'Title': 'Copy of Evidence Request (CPYEVREQ) Src.: DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.: Unknown - Unknown (7 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'DALLAS DIAGNOSTIC ASSOCIATION'}]

RE: Transform a list - deanhystad - Feb-19-2024

This looks interesting. It extracts the bookmarks directly from the PDF file in the form of a dictionary.
https://stackoverflow.com/questions/54303318/read-all-bookmarks-from-a-pdf-document-and-create-a-dictionary-with-pagenumber-a

You could use regex to split your bookmark title into document type and source like this:

    import re

    bookmarks = [[2, 'Medical Evidence of Record (MER) Src.: HELEN HASKELL HOBBS Tmt. Dt.: Unknown - Unknown (10 pages)']]
    for bookmark in bookmarks:
        title = bookmark[1]
        # Split on the full "Src.:" and "Tmt. Dt.:" labels. Splitting on r' \S+\.: '
        # matches only " Dt.: " and would leave a trailing "Tmt." in the source.
        dt, source, *_ = re.split(r'\s*(?:Src\.:|Tmt\. Dt\.:)\s*', title)
        print({"title": title, "document type": dt, "source": source})

RE: Transform a list - standenman - Feb-19-2024

Yes, thank you, I have done something like that. I now have a Python dictionary result:

{'Title': 'Medical Evidence of Record (MER) Src.: HELEN HASKELL HOBBS Tmt. Dt.: Unknown - Unknown (10 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': 'HELEN HASKELL HOBBS'}
{'Title': 'Copy of Evidence Request (CPYEVREQ) Src.: DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.: Unknown - Unknown (7 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'DALLAS DIAGNOSTIC ASSOCIATION'}
{'Title': 'Medical Evidence of Record (MER) Src.: Tmt. Dt.: Unknown - Unknown (7 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': ''}
{'Title': 'Medical Evidence of Record (MER) Src.: Tmt. Dt.: Unknown - Unknown (7 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': ''}
{'Title': 'Copy of Evidence Request (CPYEVREQ) Src.: BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.: Unknown - Unknown (3 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'BAYLOR & SCOTT MEDICAL CENTER'}
{'Title': 'Medical Evidence of Record (MER) Src.: DALLAS DIAGNOSTIC ASSOCIATION Tmt. Dt.: Unknown - Unknown (119 pages)', 'Document Type': 'Medical Evidence of Record (MER)', 'Source': 'DALLAS DIAGNOSTIC ASSOCIATION'}
{'Title': 'Copy of Evidence Request (CPYEVREQ) Src.: BAYLOR & SCOTT MEDICAL CENTER Tmt. Dt.: Unknown - Unknown (7 pages)', 'Document Type': 'Copy of Evidence Request (CPYEVREQ)', 'Source': 'BAYLOR & SCOTT MEDICAL CENTER'}

I now want to reorganize, or perhaps simply iterate over, these dictionaries. For each 'Source' value that has a row with a 'Document Type' of 'Copy of Evidence Request (CPYEVREQ)', I want to find whether there is also a row, for that same Source, with a 'Document Type' of 'Medical Evidence of Record (MER)'. The former Document Type represents a request for medical records, and the latter represents compliance with that request. I am trying to identify the records requests that have not been complied with.

RE: Transform a list - Thadectives - Feb-20-2024

Thanks for help!
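The last question in the thread, which sources have a 'Copy of Evidence Request (CPYEVREQ)' row but no 'Medical Evidence of Record (MER)' row, did not get an answer. One possible approach is sketched below. It assumes the parsed rows are held in a list of dictionaries with the 'Document Type' and 'Source' keys shown above; the function name `unfulfilled_sources` and the constants `REQUEST` and `RESPONSE` are made up for this illustration. It groups the document types seen for each source into a set and then keeps the sources whose set contains the request type but not the MER type.

```python
from collections import defaultdict

REQUEST = "Copy of Evidence Request (CPYEVREQ)"
RESPONSE = "Medical Evidence of Record (MER)"


def unfulfilled_sources(records):
    """Return sorted Source names that have a request row but no MER row."""
    # Collect the set of document types seen for each source.
    doc_types_by_source = defaultdict(set)
    for record in records:
        doc_types_by_source[record["Source"]].add(record["Document Type"])
    # Keep sources that were sent a request but never produced an MER row.
    return sorted(
        source
        for source, doc_types in doc_types_by_source.items()
        if REQUEST in doc_types and RESPONSE not in doc_types
    )


# (Document Type, Source) pairs taken from the output posted in the thread;
# the 'Title' key is omitted here only to keep the example short.
records = [
    {"Document Type": RESPONSE, "Source": "HELEN HASKELL HOBBS"},
    {"Document Type": REQUEST, "Source": "DALLAS DIAGNOSTIC ASSOCIATION"},
    {"Document Type": RESPONSE, "Source": ""},
    {"Document Type": RESPONSE, "Source": ""},
    {"Document Type": REQUEST, "Source": "BAYLOR & SCOTT MEDICAL CENTER"},
    {"Document Type": RESPONSE, "Source": "DALLAS DIAGNOSTIC ASSOCIATION"},
    {"Document Type": REQUEST, "Source": "BAYLOR & SCOTT MEDICAL CENTER"},
]

missing = unfulfilled_sources(records)
print(missing)  # ['BAYLOR & SCOTT MEDICAL CENTER']
```

This matches on the exact source string, so the two MER rows with a blank Source cannot be credited to any request, and a source spelled differently in two bookmarks would be treated as two sources. It also only checks that at least one MER row exists for a source; it does not pair individual requests with individual responses.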