Python Forum
Correct data structure for this problem
#11
Hi,

I need to parse the source file by simply looping over its lines, then splitting each line on the asterisk.
The first part of the line determines the segments (there's an opening tag and a closing tag).
Within each segment, the first part of the line has an ID that marks a block of lines that belong together (the same transaction).

The prefix in the output file is simply built from some of the parts of the split line: the asterisk splits up the line and we need to grab the 3rd or 4th item, something like that. That is not the difficult part.

The difficult part is identifying the different segments, and then the blocks within a segment. I have it all working in VBA, but now I am looking at performance and speed :-) which is why I'm here with Python. The only thing I was unsure of is how to store the line numbers so that I can use them in a second pass to build the output text file.
#12
Python offers different collections like list, dict, tuple, namedtuple, etc.
The point is that because we don't know the specifications of the file (and your very general explanation does not help much), we can hardly help with anything more concrete.
Can you at least tell us where the data comes from, or point to some specifications?
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#13
Understood. I will provide more details by showing an anonymized version of the source file.

- A segment starts with ISA* and ends with IEA*. This means that the excerpt below has 2 segments. The real file can contain between 1 and 100 segments.
- The CLP lines mark a payment within a segment. So segment 1 has 6 payments, segment 2 has 3 payments. The real file can contain between 1 and 2,000 payments per segment.
- The output text file is equal to the source file, but with a prefix within the segments, in front of the current lines. The prefix is determined by the contents of the segment: we need to look for certain codes and grab some of the characters on that line, to be used as a prefix on other lines.

My approach in VBA was to have a 2D array: columns (dynamic) for the segments and, within each segment, rows for the line numbers where certain ID codes are found. 10 fields are fixed (the info in yellow in the screenshot); the rest vary with how many CLP payments we encounter within the segment.

So in Python I need an object to hold a dynamic number of "columns" (segments), each with a dynamic number of "rows" (payments). Is this a list of lists, maybe? Do I need arrays?
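
One possible shape for this is a plain list with one dict per segment, each dict holding a list of line numbers. A minimal sketch, assuming the file has one record per line as in the excerpt below; the function name `index_lines`, the key names `start`, `end` and `clp_lines`, and the sample lines are all made up for illustration:

```python
def index_lines(lines):
    """Record, per ISA...IEA segment, the 0-based line numbers of interest."""
    segments = []    # one dict per ISA...IEA segment (a "column" in the VBA array)
    current = None   # the segment being filled, or None outside a segment
    for lineno, line in enumerate(lines):
        tag = line.split('*', 1)[0]
        if tag == 'ISA':
            current = {'start': lineno, 'end': None, 'clp_lines': []}
        elif current is None:
            continue                             # line outside any segment
        elif tag == 'CLP':
            current['clp_lines'].append(lineno)  # one "row" per payment
        elif tag == 'IEA':
            current['end'] = lineno
            segments.append(current)
            current = None
    return segments


# Invented sample lines, only to show the resulting structure.
sample = [
    'ISA*00*ZZ',
    'CLP*A*1*10*10',
    'CAS*CO*45*1~',
    'CLP*B*1*20*20',
    'IEA*1*1~',
    'ISA*00*ZZ',
    'CLP*C*4*30*0',
    'IEA*1*2~',
]
index = index_lines(sample)
print(index)
```

Both dimensions grow as needed, so no fixed-size array is required. A second pass can then loop over `index` and use `start`, `end` and `clp_lines` to decide which prefix goes in front of which line.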



ISA*00* *00* *ZZ*000000006
GS*HP*00000000021*1982611190*20200908*0544*1
TS3*1982611190*11*20201231*6*262918~
TS2*12805.44*12805.44****512.17*****4*11*11*
CLP*2000000001I73748370*22*-102890*-14262.35
CAS*CO*45*-87219.65~
QTY*CA*-15~
CLP*2000000002I73908052*1*106041*14262.35*14
CAS*CO*45*90370.65~
CAS*PR*1*1408~
AMT*AU*106041~
QTY*CA*15~
CLP*2000000003I1798623*1*73390*11909.61*1408
CAS*CO*45*60072.39~
AMT*AU*73390~
QTY*CA*11~
CLP*2000000004I73906905*4*182611*0**MA*22004
CAS*CO*16*182611*45~
NM1*QC*1*MOUSE*MICKEY****MI*6H00A18PX83~
N1*PE*MY MEDICAL CENT*XX*1200641514~
N3*PO BOX 125000-7400~
N4*PHILADELPHIA*PA*191957400~
REF*PQ*1200641514~
REF*TJ*112241326~
LX*111712~
TS3*1285641514*11*20171231*1*63523.78~
TS2*22945.58*15114.47***1521.94*****5*1**5**
CLP*2000000009I73719738*4*63523.78*0*1316*MA
CAS*CO*29*62207.78~
CAS*PR*1*1316~
NM1*QC*1*MOUSE*MINNIE****MI*2001A83HA09~
LX*112012~
TS3*1285641514*11*20201231*160*10759657.82~
TS2*1992134.24*1361024.14**86068.29*139133.
CLP*2000000005I73812607*1*55402.84*3072.54*
CAS*CO*74*52330.3~
NM1*QC*1*DUCK*DONALD****MI*4W000N2EP80~
GE*1*13754195~
IEA*1*005440685~
ISA*00* *00* *ZZ*0000000
GS*HP*00000000021*1982611190*20200908*0544
TS3*1982611190*11*20201231*6*262918~
TS2*12805.44*12805.44****512.17*****4*11*1
CAS*CO*45*60072.39~
AMT*AU*73390~
QTY*CA*11~
CLP*2000000004I73906905*4*192611*0**MA*220
CAS*CO*16*182611*45~
NM1*QC*1*MOUSE*MICKEY****MI*6H00A18PX83~
N1*PE*MY MEDICAL CENT*XX*1200641514~
N3*PO BOX 125000-7400~
N4*PHILADELPHIA*PA*191957400~
REF*PQ*1200641514~
REF*TJ*112241326~
LX*111712~
TS3*1285641514*11*20171231*1*63523.78~
TS2*22945.58*15114.47***1521.94*****5*1**5***2.2027*1209.02**102.89~
CLP*2000000009I73719738*4*63523.78*0*1316*MA*22002700746204NYA*11*1**286*2.2027*1~
CAS*CO*29*62207.78~
CAS*PR*1*1316~
NM1*QC*1*MOUSE*MINNIE****MI*2001A83HA09~
LX*112012~
TS3*1285641514*11*20201231*160*10759657.82~
TS2*1992134.24*1361024.14**86068.29*139133.04*310147.09**2468.34*41514.58*3*134*647*647***1.2445*107586.27**9963.87~
CLP*2000000005I73812607*1*55402.84*3072.54**MA*22000000176404NYA*11*1**470*1.9684*.952~
CAS*CO*74*52330.3~
NM1*QC*1*DUCK*DONALD****MI*4W000N2EP80~
GE*1*13754195~
IEA*1*005440685~
#14
This is the third time I have started to write this (due to our problems with the site); I lost 2 long drafts, so now I am pissed off and this time my post will be as short as possible.
This is the X12 EDI format, a HIPAA 835 file to be precise. I don't know why you were reluctant to say so from the start or when I asked.
I looked for specifications online, but it's hard to obtain one for free. There are different companion guides available, but they are not exhaustive and, at the same time, they are company-specific. I found this one most useful: https://passporthealthplan.com/wp-conten...-guide.pdf
It's still outdated, e.g. the CLP segment they show has only 6 elements, while yours has more.
I am sure you know all this, but I say it for the benefit of others.
I also found sample files here: https://www.emedny.org/HIPAA/5010/5010_s...index.aspx. I downloaded the 835 Sample (Professional Claims Only - With Payment) file and saved it as sample835.txt.
Now I will work with it.
Output:
ISA*00* *00* *ZZ*EMEDNYBAT *ZZ*ETIN *100101*1000*^*00501*006000600*0*T*:~GS*HP*EMEDNYBAT*ETIN*20100101*1050*6000600*X*005010X221A1~ST*835*1740~BPR*I*45.75*C*ACH*CCP*01*111*DA*33*1234567890**01*111*DA*22*20100101~TRN*1*10100000000*1000000000~REF*EV*ETIN~DTM*405*20100101~N1*PR*NYSDOH~N3*OFFICE OF HEALTH INSURANCE PROGRAMS*CORNING TOWER, EMPIRE STATE PLAZA~N4*ALBANY*NY*122370080~PER*BL*PROVIDER SERVICES*TE*8003439000*UR*www.emedny.org~N1*PE*MAJOR MEDICAL PROVIDER*XX*9999999995~REF*TJ*000000000~LX*1~CLP*PATIENT ACCOUNT NUMBER*1*34.25*34.25**MC*1000210000000030*11~NM1*QC*1*SUBMITTED LAST*SUBMITTED FIRST****MI*LL99999L~NM1*74*1*CORRECTED LAST*CORRECTED FIRST~REF*EA*PATIENT ACCOUNT NUMBER~DTM*232*20100101~DTM*233*20100101~AMT*AU*34.25~SVC*HC:V2020:RB*6*6**1~DTM*472*20100101~AMT*B6*6~SVC*HC:V2700:RB*2.75*2.75**1~DTM*472*20100101~AMT*B6*2.75~SVC*HC:V2103:RB*5.5*5.5**1~DTM*472*20100101~AMT*B6*5.5~SVC*HC:S0580*20*20**2~DTM*472*20100101~AMT*B6*20~CLP*PATIENT ACCOUNT NUMBER*2*34*0**MC*1000220000000020*11~NM1*QC*1*SUBMITTED LAST*SUBMITTED FIRST****MI*LL88888L~NM1*74*1*CORRECTED LAST*CORRECTED FIRST~REF*EA*PATIENT ACCOUNT NUMBER~DTM*232*20100101~DTM*233*20100101~SVC*HC:V2020*12*0**0~DTM*472*20100101~CAS*CO*29*12~SVC*HC:V2103*22*0**0~DTM*472*20100101~CAS*CO*29*22~CLP*PATIENT ACCOUNT NUMBER*2*34.25*11.5**MC*1000230000000020*11~NM1*QC*1*SUBMITTED LAST*SUBMITTED FIRST****MI*LL77777L~NM1*74*1*CORRECTED LAST*CORRECTED FIRST~REF*EA*PATIENT ACCOUNT NUMBER~DTM*232*20100101~DTM*233*20100101~AMT*AU*11.5~SVC*HC:V2020:RB*6*6**1~DTM*472*20100101~AMT*B6*6~SVC*HC:V2103:RB*5.5*5.5**1~DTM*472*20130917~AMT*B6*5.5~SVC*HC:V2700:RB*2.75*0**0~DTM*472*20100101~CAS*CO*251*2.75~LQ*HE*N206~SVC*HC:S0580*20*0**0~DTM*472*20100101~CAS*CO*251*20~LQ*HE*N206~SE*65*1740~GE*1*6000600~IEA*1*006000600~
My point is that you will have a deeply nested structure: File -> Interchange(s) -> Functional group(s) -> Transaction set(s) -> Loop(s) (I may be wrong about some of these, but anyway), and at each nested level you can use either a built-in container like list, dict, tuple, namedtuple, etc. or write your own class.
Which one you choose depends on you: what you plan to do, whether you want to validate the data, whether you plan to expand it, and so on.
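
To make the "write your own class" option concrete, here is a minimal sketch using dataclasses, assuming only two levels are needed (interchange and claim payment). The class names `Interchange` and `Payment`, their fields, the `parse` function and the `END_OF_CLAIM` set are invented for illustration and are not defined by X12:

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: these tags mean "the current claim is over". A simplification.
END_OF_CLAIM = {'LX', 'SE', 'GE'}


@dataclass
class Payment:
    clp: List[str]                      # elements of the CLP segment
    segments: List[List[str]] = field(default_factory=list)  # CAS, NM1, ... after it


@dataclass
class Interchange:
    isa: List[str]                      # elements of the ISA segment
    payments: List[Payment] = field(default_factory=list)


def parse(text, segment_sep='~', element_sep='*'):
    """Group segments into Interchange objects, each holding its Payments."""
    interchanges = []
    current = None    # Interchange being built, or None outside ISA...IEA
    payment = None    # Payment currently collecting follow-up segments
    for raw in text.split(segment_sep):
        raw = raw.strip('\r\n')
        if not raw:
            continue
        elements = raw.split(element_sep)
        tag = elements[0]
        if tag == 'ISA':
            current, payment = Interchange(isa=elements), None
        elif current is None:
            continue
        elif tag == 'IEA':
            interchanges.append(current)
            current, payment = None, None
        elif tag == 'CLP':
            payment = Payment(clp=elements)
            current.payments.append(payment)
        elif tag in END_OF_CLAIM:
            payment = None
        elif payment is not None:
            payment.segments.append(elements)
    return interchanges


# Invented miniature input, only to show the nesting.
result = parse('ISA*00~GS*HP~LX*1~CLP*A*1*10*10~CAS*CO*45*1~NM1*QC*1~CLP*B*2*5*0~SE*1~GE*1~IEA*1~')
print(result)
```

The functional group and transaction set levels are skipped here; they could be added as further dataclasses in the same way if they turn out to be needed.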

For a start, a very basic example:

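import pprint

line_sep = '~'      # segment terminator
element_sep = '*'   # element separator
with open(r'.\835\sample835.txt') as f:
    x12 = f.read()

x12 = x12.split(line_sep)
message = []    # one dict per ISA...IEA interchange
isa = None      # the interchange currently being read, if any
for segment in x12:
    segment = segment.strip('\r\n')  # the file may have a newline after each '~'
    if segment.startswith('ISA'):
        isa = {} # create empty dict
        isa['ISA'] = segment.split(element_sep)
        isa['payments'] = []
    elif isa is None:
        continue    # ignore anything outside an interchange
    elif segment.startswith('CLP'):
        payment = segment.split(element_sep)
        isa['payments'].append(payment)
    elif segment.startswith('IEA'):
        message.append(isa)
        isa = None
pprint.pprint(message)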
Output:
[{'ISA': ['ISA', '00', ' ', '00', ' ', 'ZZ', 'EMEDNYBAT ', 'ZZ', 'ETIN ', '100101', '1000', '^', '00501', '006000600', '0', 'T', ':'], 'payments': [['CLP', 'PATIENT ACCOUNT NUMBER', '1', '34.25', '34.25', '', 'MC', '1000210000000030', '11'], ['CLP', 'PATIENT ACCOUNT NUMBER', '2', '34', '0', '', 'MC', '1000220000000020', '11'], ['CLP', 'PATIENT ACCOUNT NUMBER', '2', '34.25', '11.5', '', 'MC', '1000230000000020', '11']]}]
As you can see, it is a list (to allow multiple interchange blocks); each interchange is a dict, and the value for the key "payments" is again a list, holding multiple lists, etc.
I work with just the ISA and CLP segments, but I guess you will need to work on other segments/loops too.
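
Once the data is in that shape, reading it back is plain nested iteration. A small sketch, using a hand-typed copy of the structure (trimmed to the elements used here) instead of re-parsing the file; index 4 of each CLP list is the claim payment amount:

```python
from decimal import Decimal

# Hand-typed, trimmed copy of the parsed structure shown above.
message = [
    {'ISA': ['ISA', '00'],
     'payments': [
         ['CLP', 'PATIENT ACCOUNT NUMBER', '1', '34.25', '34.25'],
         ['CLP', 'PATIENT ACCOUNT NUMBER', '2', '34', '0'],
         ['CLP', 'PATIENT ACCOUNT NUMBER', '2', '34.25', '11.5'],
     ]},
]

for number, interchange in enumerate(message, start=1):
    payments = interchange['payments']
    # Decimal avoids float rounding surprises when adding money amounts.
    total_paid = sum(Decimal(clp[4]) for clp in payments)
    print(f'interchange {number}: {len(payments)} payments, paid {total_paid}')
```

For the sample file this adds up to 45.75, which matches the amount in its BPR segment.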

From here you can expand, e.g. replace the list for each segment with a namedtuple:
from collections import namedtuple
import pprint

ISA = namedtuple('ISA', ['identifier', 'authorization_information_qualifier', 'authorization_information',
                         'security_information_qualifier', 'security_information', 'interchange_id_qualifier_isa5',
                         'interchange_sender_id', 'interchange_id_qualifier_isa7', 'interchange_receiver_id',
                         'interchange_date', 'interchange_time', 'interchange_control_standards',
                         'interchange_control_version_number', 'interchange_control_number',
                         'acknowledgement_requested', 'usage_indicator', 'component_element_separator'],
                 defaults=(None,) * 16 + ('>',))

# Field names follow the CLP element order: CLP05 is the patient responsibility amount,
# CLP06 the claim filing indicator code, CLP07 the payer claim control number and
# CLP08 the facility type code.
CLP = namedtuple('CLP', ['identifier', 'patient_control_number', 'claim_status_code', 'total_claim_charge_amount',
                         'claim_payment_amount', 'patient_responsibility_amount', 'claim_filing_indicator_code',
                         'payer_claim_control_number', 'facility_type_code'])


line_sep = '~'      # segment terminator
element_sep = '*'   # element separator
with open(r'.\835\sample835.txt') as f:
    x12 = f.read()

x12 = x12.split(line_sep)

message = []    # one dict per ISA...IEA interchange
isa = None      # the interchange currently being read, if any
for segment in x12:
    segment = segment.strip('\r\n')  # the file may have a newline after each '~'
    if segment.startswith('ISA'):
        isa = {} # create empty dict
        isa['ISA'] = ISA(*segment.split(element_sep))
        isa['payments'] = []
    elif isa is None:
        continue    # ignore anything outside an interchange
    elif segment.startswith('CLP'):
        # raises TypeError if the segment does not have exactly 9 elements
        payment = CLP(*segment.split(element_sep))
        isa['payments'].append(payment)
    elif segment.startswith('IEA'):
        message.append(isa)
        isa = None

pprint.pprint(message)
print('\n')
for isa in message:
    for payment in isa['payments']:
        print(f'Claim payment amount: {payment.claim_payment_amount}')
Output:
[{'ISA': ISA(identifier='ISA', authorization_information_qualifier='00', authorization_information=' ', security_information_qualifier='00', security_information=' ', interchange_id_qualifier_isa5='ZZ', interchange_sender_id='EMEDNYBAT ', interchange_id_qualifier_isa7='ZZ', interchange_receiver_id='ETIN ', interchange_date='100101', interchange_time='1000', interchange_control_standards='^', interchange_control_version_number='00501', interchange_control_number='006000600', acknowledgement_requested='0', usage_indicator='T', component_element_separator=':'),
  'payments': [CLP(identifier='CLP', patient_control_number='PATIENT ACCOUNT NUMBER', claim_status_code='1', total_claim_charge_amount='34.25', claim_payment_amount='34.25', patient_responsibility_amount='', claim_filing_indicator_code='MC', payer_claim_control_number='1000210000000030', facility_type_code='11'),
               CLP(identifier='CLP', patient_control_number='PATIENT ACCOUNT NUMBER', claim_status_code='2', total_claim_charge_amount='34', claim_payment_amount='0', patient_responsibility_amount='', claim_filing_indicator_code='MC', payer_claim_control_number='1000220000000020', facility_type_code='11'),
               CLP(identifier='CLP', patient_control_number='PATIENT ACCOUNT NUMBER', claim_status_code='2', total_claim_charge_amount='34.25', claim_payment_amount='11.5', patient_responsibility_amount='', claim_filing_indicator_code='MC', payer_claim_control_number='1000230000000020', facility_type_code='11')]}]


Claim payment amount: 34.25
Claim payment amount: 0
Claim payment amount: 11.5
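
One caveat with building a namedtuple via `CLP(*segment.split(element_sep))`: it raises `TypeError` as soon as a segment has more elements than the namedtuple has fields (or fewer, when the fields have no defaults), and the CLP lines posted earlier in the thread are longer than the ones in the sample file. A possible workaround, sketched with a hypothetical helper `make_segment` and a deliberately shortened field list, is to default every field to `None` and keep the surplus elements separately:

```python
from collections import namedtuple

# Shortened, illustrative field list; every field defaults to None.
CLP = namedtuple('CLP', ['identifier', 'patient_control_number', 'claim_status_code',
                         'total_claim_charge_amount', 'claim_payment_amount'],
                 defaults=(None,) * 5)


def make_segment(cls, elements):
    """Build cls from the leading elements; return (record, surplus elements)."""
    n = len(cls._fields)
    return cls(*elements[:n]), elements[n:]


# A CLP line with more elements than fields, and an invented one with fewer.
long_clp, extra = make_segment(CLP, 'CLP*2000000005I73812607*1*55402.84*3072.54**MA'.split('*'))
short_clp, _ = make_segment(CLP, 'CLP*X*4'.split('*'))
print(long_clp.claim_payment_amount, extra)
print(short_clp.claim_payment_amount)
```

Whether silently accepting short or long segments is acceptable depends on how strictly you want to validate the file.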
In addition to the above, which is my own code, I found this: https://hiplab.mc.vanderbilt.edu/git/lab/parse-edi
It's not great in terms of the quality of the Python code, ease of installation, etc. I tried to run it but was not very successful with the sample file. Anyway, it may be useful and give you some additional hints if you decide to look at it further.

That's it for now. I apologise if I happened to use incorrect terminology here and there.