Python Forum
Parsing XML and output as CSV
#1
Hi Guys,

I have a huge XML file that contains millions of tree objects. I would like to parse only specific objects and output them to a .csv file.

An example of part of the XML is included further down.

I expect the output to be a semicolon-separated CSV table (first line is the header) with one row for each matching child, like this:
MRBTS;LNBTS;LNCEL;offsetFreqIntra;operationalState;p0NomPucch
111111;111111;1;15;1;-100
111112;111112;2;15;1;-100


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
<cmData type="actual">
<header>
</header>
<managedObject class="MODULE" version="HW1.0" distName="PLMN-PLMN/MRBTS-103009/HW-1/MODULE-102" id="14926899">
<p name="identificationCode">472703A.101</p>
<p name="state">working</p>
<p name="subrackSpecificType">472703A.101-102</p>
<p name="userLabel">FRPA</p>
<p name="vendorName">HW</p>
<p name="version">101</p>
</managedObject>
<managedObject class="LNCEX" version="JKLM" distName="ABCS-ABCS/MRBTS-111111/LNBTS-111111/LNCEL-1" id="301995">
<p name="offsetFreqIntra">15</p>
<p name="operationalState">1</p>
<p name="p0NomPucch">-100</p>
<p name="p0NomPusch">-100</p>
<p name="pMax">460</p>
<p name="ulsPhrQci1Low">0</p>
</managedObject>
<managedObject class="LNCEX" version="JKLM" distName="ABCS-ABCS/MRBTS-111112/LNBTS-111112/LNCEL-2" id="301996">
<p name="offsetFreqIntra">15</p>
<p name="operationalState">1</p>
<p name="p0NomPucch">-100</p>
<p name="p0NomPusch">-100</p>
<p name="pMax">460</p>
<p name="ulsPhrQci1Low">0</p>
</managedObject>
</cmData>
</raml>
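Each expected row corresponds to one managedObject above: the MRBTS, LNBTS and LNCEL values are the IDs embedded in its distName path, and the remaining columns are <p> values. A minimal sketch of the distName split, assuming every path segment has the form CLASS-ID (split_dist_name is a hypothetical helper name, not from any library):

```python
def split_dist_name(dist_name):
    """Map each CLASS-ID path segment of a distName to {CLASS: ID}.

    Assumption: segments are separated by '/' and the class name is
    everything before the first '-'.
    """
    parts = {}
    for segment in dist_name.split('/'):
        cls, _, ident = segment.partition('-')
        parts[cls] = ident
    return parts


print(split_dist_name('ABCS-ABCS/MRBTS-111112/LNBTS-111112/LNCEL-2'))
# {'ABCS': 'ABCS', 'MRBTS': '111112', 'LNBTS': '111112', 'LNCEL': '2'}
```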

Thank you
#2
To give you an idea and a tool to use, since you haven't shown any code yet ;)
lxml and BeautifulSoup (BS) are good parsers for Python. Since the file is large, you can use lxml (a fast parser) on its own, or use it as the parser inside BS, as I do here.
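When you pass 'lxml' rather than 'xml' as the parser, i.e. BeautifulSoup(..., 'lxml') instead of BeautifulSoup(..., 'xml'), BS uses lxml's HTML parser, which lowercases tag and attribute names. So you have to search in all lowercase: managedObject becomes managedobject (and distName becomes distname). Passing 'xml' uses lxml's XML parser instead, which keeps the original case.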
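If you use a real XML parser instead (lxml.etree on its own, or the standard library's xml.etree.ElementTree, whose find/iterfind calls lxml.etree largely mirrors), tag names keep their case. The catch with this file is the default namespace xmlns="raml20.xsd": every tag is stored as {raml20.xsd}managedObject, so a bare 'managedObject' matches nothing. A small sketch with the standard library; the prefix r and the function name list_objects are arbitrary choices for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of one object from the question's sample
# (XML declaration and DOCTYPE left out for brevity).
SAMPLE = """<raml version="2.0" xmlns="raml20.xsd">
<cmData type="actual">
<managedObject class="LNCEX" version="JKLM" distName="ABCS-ABCS/MRBTS-111111/LNBTS-111111/LNCEL-1" id="301995">
<p name="offsetFreqIntra">15</p>
<p name="p0NomPucch">-100</p>
</managedObject>
</cmData>
</raml>"""

# Map a prefix of our choosing ('r') to the document's default namespace URI.
NS = {'r': 'raml20.xsd'}


def list_objects(xml_text):
    """Return (class, distName, {p name: text}) for every managedObject."""
    root = ET.fromstring(xml_text)
    result = []
    for mo in root.iterfind('.//r:managedObject', NS):
        # Direct <p> children only, keyed by their name attribute.
        params = {p.get('name'): p.text for p in mo.iterfind('r:p', NS)}
        result.append((mo.get('class'), mo.get('distName'), params))
    return result


root = ET.fromstring(SAMPLE)
print(root.tag)                          # {raml20.xsd}raml
print(root.findall('.//managedObject'))  # [] because the bare name has no namespace
print(list_objects(SAMPLE))
```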

Example:
from bs4 import BeautifulSoup

# 'lxml' = lxml's HTML parser, so tag names are lowercased
with open('raml.xml', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'lxml')
r = soup.find('managedobject')
Some usage:
>>> r
<managedobject class="MODULE" distname="PLMN-PLMN/MRBTS-103009/HW-1/MODULE-102" id="14926899" version="HW1.0">
<p name="identificationCode">472703A.101</p>
<p name="state">working</p>
<p name="subrackSpecificType">472703A.101-102</p>
<p name="userLabel">FRPA</p>
<p name="vendorName">HW</p>
<p name="version">101</p>
</managedobject>
>>>
>>> r.attrs
{'class': ['MODULE'],
 'distname': 'PLMN-PLMN/MRBTS-103009/HW-1/MODULE-102',
 'id': '14926899',
 'version': 'HW1.0'}
>>> r.attrs['id']
'14926899'
>>> 
>>> [i for i in r.find_all('p')]
[<p name="identificationCode">472703A.101</p>,
 <p name="state">working</p>,
 <p name="subrackSpecificType">472703A.101-102</p>,
 <p name="userLabel">FRPA</p>,
 <p name="vendorName">HW</p>,
 <p name="version">101</p>]
>>>
>>> [i.text for i in r.find_all('p')]
['472703A.101', 'working', '472703A.101-102', 'FRPA', 'HW', '101']
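BeautifulSoup builds the whole document tree in memory, which may not be practical for a file with millions of objects. A streaming alternative (swapped in here, not what the reply above uses) is iterparse from the standard library's xml.etree.ElementTree; lxml.etree.iterparse works along the same lines. This sketch writes the semicolon-separated table from the question. It assumes the wanted objects are the ones with class="LNCEX" as in the sample, that the file is well-formed, and the names WANTED_CLASS, raml_to_rows and write_csv are made up for illustration:

```python
import csv
import xml.etree.ElementTree as ET

NS = '{raml20.xsd}'     # default namespace declared on <raml>
WANTED_CLASS = 'LNCEX'  # assumption: class of the objects to export
ID_COLUMNS = ['MRBTS', 'LNBTS', 'LNCEL']                           # from distName
P_COLUMNS = ['offsetFreqIntra', 'operationalState', 'p0NomPucch']  # from <p>


def split_dist_name(dist_name):
    """'MRBTS-1/LNBTS-1/LNCEL-1' -> {'MRBTS': '1', 'LNBTS': '1', 'LNCEL': '1'}"""
    return dict(seg.partition('-')[::2] for seg in dist_name.split('/'))


def raml_to_rows(source):
    """Yield one row per matching managedObject without keeping the whole tree.

    source is a file name or a binary file object.
    """
    stack = []  # currently open elements, so a finished object can be detached
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        if event == 'start':
            stack.append(elem)
            continue
        stack.pop()
        if elem.tag != NS + 'managedObject':
            continue
        if elem.get('class') == WANTED_CLASS:
            ids = split_dist_name(elem.get('distName', ''))
            # Direct <p> children only, keyed by their name attribute.
            params = {p.get('name'): p.text for p in elem.findall(NS + 'p')}
            yield ([ids.get(col, '') for col in ID_COLUMNS]
                   + [params.get(col, '') for col in P_COLUMNS])
        if stack:
            stack[-1].remove(elem)  # drop the processed object from its parent


def write_csv(source, out):
    """Write the header plus all matching rows to the text stream out."""
    writer = csv.writer(out, delimiter=';', lineterminator='\n')
    writer.writerow(ID_COLUMNS + P_COLUMNS)
    writer.writerows(raml_to_rows(source))
```

For a real file, something like `with open('out.csv', 'w', newline='') as f: write_csv('raml.xml', f)` should do (file names are examples; the csv docs recommend opening the output with newline=''). Removing each managedObject from its parent after it is processed is what keeps memory roughly flat; if the real class name is LNCEL rather than LNCEX, change WANTED_CLASS.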