Python Forum
Looking for help in Parse multiple XMLs and update key node values and generate Out..
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Looking for help in Parse multiple XMLs and update key node values and generate Out..
#1
Hello,

I got struck in Parsing multiple XML files and update key node values. Can you please help..

The Code below works as expected for single file (I limited the code to just 3 nodes for an example, In real i have around 20 nodes).

from lxml import etree
from datetime import datetime, timedelta
import random, string
import time


doc = etree.parse(r'C:\Users\relangovan\Desktop\Project_Document\XML\python.xml') #if parsing from a file

# <PolicyId> - Random generated policy number
Policy_Random_Choice = 'POL' + ''.join(random.choices(string.digits, k=6)) + 'NGJ'

# PolicyTermEffectiveDate
for ROW in doc.findall('ROW'):
    
    today = datetime.now()
    PolicyTermEffectiveDate = ROW.find('PolicyTermEffectiveDate').text
    transactiondate = ROW.find('TransactionDate').text
    
    trans = datetime.strptime(transactiondate, '%Y-%m-%dT%H:%M:%S.%f')
    eff = datetime.strptime(PolicyTermEffectiveDate, '%Y-%m-%dT%H:%M:%S.%f')
    
    delta = (trans - eff).days
    b = today - timedelta (days=delta)
    
    if trans > eff:
         PolicyTermEffectiveDate = b.strftime('%Y-%m-%dT%H:%M:%S.%f')
    else: 
         PolicyTermEffectiveDate = today.strftime('%Y-%m-%dT%H:%M:%S.%f')


# PolicyTermExpirationDate
for ROW in doc.findall('ROW'):
    
    today = datetime.now()
    PolicyTermEffective_Date = ROW.find('PolicyTermEffectiveDate').text
    PolicyTermExpirationDate = ROW.find('PolicyTermExpirationDate').text
    transactiondate = ROW.find('TransactionDate').text
    
    trans1 = datetime.strptime(transactiondate, '%Y-%m-%dT%H:%M:%S.%f')
    eff1 = datetime.strptime(PolicyTermEffective_Date, '%Y-%m-%dT%H:%M:%S.%f')
    exp1 = datetime.strptime(PolicyTermExpirationDate, '%Y-%m-%dT%H:%M:%S.%f')
    
    delta1 = (exp1 - eff1).days
    c = today + timedelta (days=delta1)
    
    if trans1 < exp1:
        PolicyTermExpirationDate = c.strftime('%Y-%m-%dT%H:%M:%S.%f')
    else:
        PolicyTermExpirationDate = today.strftime('%Y-%m-%dT%H:%M:%S.%f')
        
# <TransactionDate> 
for ROW in doc.findall('ROW'):
    today = datetime.now()
    transactiondate = ROW.find('TransactionDate').text
    previous_update = datetime.strptime(transactiondate, '%Y-%m-%dT%H:%M:%S.%f')
    if previous_update < today:
                    TransactionDate = today.strftime('%Y-%m-%dT%H:%M:%S.%f')

replacements = [Policy_Random_Choice , PolicyTermEffectiveDate , PolicyTermExpirationDate , TransactionDate]

targets = doc.xpath('//ROW[PolicyId="POL000002NGJ"]')
for target in targets:
    
    target.xpath('./PolicyId')[0].text = replacements[0]
    target.xpath('./PolicyTermEffectiveDate')[0].text = replacements[1]
    target.xpath('./PolicyTermExpirationDate')[0].text = replacements[2]
    target.xpath('./TransactionDate')[0].text = replacements[3]
        
print(etree.tostring(doc).decode())

moment=time.strftime("%Y-%m-%d--%H_%M_%S",time.localtime())

doc.write(r'C:\Users\relangovan\Desktop\Project_Document\XML\OutputFile\new_policy_'+moment+'.xml', pretty_print=True, xml_declaration=True,   encoding="utf-8")
Here i would like to apply the same node logic's and update values by parsing multiple XML files which has the same file structure in single python .Please note i want to generate the individual output XML's with only the Updated ROWS.

The reason why i am asking is for eg,
1. Today i create a policy, so after submission one XML will be generated for NEW POLICY.
2. Then next week i cancels the same policy, In this case 2nd XML will be generated for CANCEL based on cancellation date and status attributes.

But both point 1 and 2, POLICY NUMBER is same and only other relevant node values will be different.

In this case, when i want to regenerates both the XML's just updating the KEY Nodes like policy number and policy effective dates to send as a Runtime policy, expected to generate 2 output XML's with NEW randomly generated policy id's and other key values updated based on the logic in code in both the files.

I created two variables to parse the base file "etree.parse" and two targets to parse the second file in same script like below, While executing, non key node values will update which is good. But the key nodes are not updating because in For loop I defined only one variable "doc=etree.parse()". i struck here to parse multiple XML's to update the key values. Please find below what i did for multiple file parsing

from lxml import etree
from datetime import datetime, timedelta
import random, string
import time


doc = etree.parse(r'C:\Users\relangovan\Desktop\Project_Document\XML\python.xml') #if parsing from a file
doc1 = etree.parse(r'C:\Users\relangovan\Desktop\Project_Document\XML\python1.xml')

-
-



# <PolicyId> - Random generated policy number
Policy_Random_Choice = 'POL' + ''.join(random.choices(string.digits, k=6)) + 'NGJ'

# PolicyTermExpirationDate
for ROW in doc.findall('ROW'):
    
    today = datetime.now()
    PolicyTermEffective_Date = ROW.find('PolicyTermEffectiveDate').text
    PolicyTermExpirationDate = ROW.find('PolicyTermExpirationDate').text
    transactiondate = ROW.find('TransactionDate').text
    
    trans1 = datetime.strptime(transactiondate, '%Y-%m-%dT%H:%M:%S.%f')
    eff1 = datetime.strptime(PolicyTermEffective_Date, '%Y-%m-%dT%H:%M:%S.%f')
    exp1 = datetime.strptime(PolicyTermExpirationDate, '%Y-%m-%dT%H:%M:%S.%f')
    
    delta1 = (exp1 - eff1).days
    c = today + timedelta (days=delta1)
    
    if trans1 < exp1:
        PolicyTermExpirationDate = c.strftime('%Y-%m-%dT%H:%M:%S.%f')
    else:
        PolicyTermExpirationDate = today.strftime('%Y-%m-%dT%H:%M:%S.%f')
-
-
-

targets = doc.xpath('//ROW[PolicyId="POL000002NGJ"]')
for target in targets:
    
    target.xpath('./PolicyId')[0].text = replacements[0]
    target.xpath('./PolicyTermEffectiveDate')[0].text = replacements[1]
    target.xpath('./PolicyTermExpirationDate')[0].text = replacements[2]


targets = doc1.xpath('//ROW[PolicyId="POL000002NGJ"]')
 for target in targets:
    
     target.xpath('./PolicyId')[0].text = replacements[0]
     target.xpath('.//TransactionDate')[0].text = replacements[1]
     target.xpath('./PolicyTermEffectiveDate')[0].text = replacements[2]
-
-
-
doc.write('output.xml', pretty_print=True, xml_declaration=True,   encoding="utf-8")
doc1.write('output1.xml', pretty_print=True, xml_declaration=True,   encoding="utf-8")
Can you please help to parse multiple variables in one python or any best solution in my code?

Expected Results : If you see the below 2 XML's, Policy "POL000002NGJ" exists in both the files with different status, So, after executing the code i would like to generate 2 output file only for the updated rows and with new policy number(same in both the files) and update key node values based on the code added..

(In below XML's, policies POL111111NGJ and POL222222NGJ should not added to the output files. To filter these policies, I would like parse it as an object instead of harcoding the not needed policies. Because it varies all the times)

Sample 2 XML's i would like to parse

[inline]
-- New Policy

<TABLE>
<ROW>
<PolicyId>POL000002NGJ</PolicyId>
<BusinessCoverageCode>COV00002D3X1</BusinessCoverageCode>
<TransactionDate>2020-03-23T10:56:15.00</TransactionDate>
<PolicyTermEffectiveDate>2019-07-30T0:00:00.000</PolicyTermEffectiveDate>
<Status>New</Status>
</ROW>
<ROW>
<PolicyId>POL000002NGJ</PolicyId>
<BusinessCoverageCode>COV00004AZ2</BusinessCoverageCode>
<TransactionDate>2020-03-23T10:56:15.00</TransactionDate>
<PolicyTermEffectiveDate>2019-07-30T0:00:00.000</PolicyTermEffectiveDate>
<Status>New</Status>
</ROW>
<ROW>
<PolicyId>POL111111NGJ</PolicyId>
<BusinessCoverageCode>COV00002D3X4</BusinessCoverageCode>
<TransactionDate>2020-03-23T10:56:15.00</TransactionDate>
<PolicyTermEffectiveDate>2019-07-30T0:00:00.000</PolicyTermEffectiveDate>
<Status>New</Status>
</ROW>
</TABLE>


-- Cancels

<TABLE>
<ROW>
<PolicyId>POL000002NGJ</PolicyId>
<BusinessCoverageCode>COV00002D3X1</BusinessCoverageCode>
<TransactionDate>2020-03-23T10:56:15.00</TransactionDate>
<PolicyTermEffectiveDate>2019-08-10T0:00:00.000</PolicyTermEffectiveDate>
<Status>Cancels</Status>
</ROW>
<ROW>
<PolicyId>POL000002NGJ</PolicyId>
<BusinessCoverageCode>COV00004AZ2</BusinessCoverageCode>
<TransactionDate>2020-03-23T10:56:15.00</TransactionDate>
<PolicyTermEffectiveDate>2019-08-10T0:00:00.000</PolicyTermEffectiveDate>
<Status>Cancels</Status>
</ROW>
<ROW>
<PolicyId>POL222222NGJ</PolicyId>
<BusinessCoverageCode>COV00002D3X4</BusinessCoverageCode>
<TransactionDate>2020-03-23T10:56:15.00</TransactionDate>
<PolicyTermEffectiveDate>2019-07-30T0:00:00.000</PolicyTermEffectiveDate>
<Status>New</Status>
</ROW>
</TABLE>

[/inline]
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  __init__() got multiple values for argument 'schema' dawid294 4 1,886 Jan-03-2024, 09:42 AM
Last Post: buran
Question Using SQLAlchemy, prevent SQLite3 table update by multiple program instances Calab 3 703 Aug-09-2023, 05:51 PM
Last Post: Calab
  How to combine multiple column values into 1? cubangt 15 2,630 Aug-11-2022, 08:25 PM
Last Post: cubangt
  Trying to parse only 3 key values from json file cubangt 8 3,338 Jul-16-2022, 02:05 PM
Last Post: deanhystad
  Need to parse a list of boolean columns inside a list and return true values Python84 4 2,036 Jan-09-2022, 02:39 AM
Last Post: Python84
  How do you format Update statement with multiple conditions hammer 4 2,038 Dec-16-2021, 10:49 PM
Last Post: hammer
  Generate Multiple sql Files With csv inputs vkomarag 13 4,109 Aug-20-2021, 07:03 PM
Last Post: vkomarag
  Generate a string of words for multiple lists of words in txt files in order. AnicraftPlayz 2 2,757 Aug-11-2021, 03:45 PM
Last Post: jamesaarr
  Function - Return multiple values tester_V 10 4,319 Jun-02-2021, 05:34 AM
Last Post: tester_V
  Xlsxwriter: Create Multiple Sheets Based on Dataframe's Sorted Values KMV 2 3,441 Mar-09-2021, 12:24 PM
Last Post: KMV

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020