Python Forum
extracting XML data into datframe
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
extracting XML data into datframe
#1
I am querying an API that returns the following, I am trying to extract the name and flow rate data using the below code, but can't work out why its not working?

Quote:b'<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body><GetInstantaneousFlowDataResponse xmlns="http://www.NationalGrid.com/EDP/UI/"><GetInstantaneousFlowDataResult>
<ReportName xmlns="http://www.NationalGrid.com/EDP/BusinessEntities/Public">Instantaneous Flow Report</ReportName>
<PublishedTime xmlns="http://www.NationalGrid.com/EDP/BusinessEntities/Public">2018-07-29T10:47:01</PublishedTime>
<EDPReportPage xmlns="http://www.NationalGrid.com/EDP/BusinessEntities/Public">
<PageName>page1</PageName>
<CurrentGasDay>2018-07-29T00:00:00</CurrentGasDay>
<EDPEnergyGraphTableCollection>
<EDPEnergyGraphTableBE>
<EDPEnergyGraphTableName>table1</EDPEnergyGraphTableName>
<ItemPosition>1</ItemPosition>
<EDPObjectCollection><EDPObjectBE>
<EDPObjectName>ALDBROUGH</EDPObjectName>
<EnergyDataList><EDPEnergyDataBE>
<ApplicableAt>2018-07-29T10:36:00</ApplicableAt>
<FlowRate>0</FlowRate>
<QualityIndicator />
<ScheduleTime>2018-07-29T10:48:00</ScheduleTime>
</EDPEnergyDataBE><EDPEnergyDataBE>
<ApplicableAt>2018-07-29T10:38:00</ApplicableAt>
<FlowRate>0</FlowRate>
<QualityIndicator />
<ScheduleTime>2018-07-29T10:48:00</ScheduleTime>
</EDPEnergyDataBE><EDPEnergyDataBE>
<ApplicableAt>2018-07-29T10:40:00</ApplicableAt>
<FlowRate>0</FlowRate><QualityIndicator

import requests
from lxml import etree
    
fromDate = "2018-07-29"

def getXML():
    url="http://energywatch.natgrid.co.uk/EDP-PublicUI/PublicPI/InstantaneousFlowWebService.asmx"
    headers = {'content-type': 'application/soap+xml; charset=utf-8'}
    body ="""<soap12:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
  <soap12:Body>
    <GetInstantaneousFlowData xmlns="http://www.NationalGrid.com/EDP/UI/" />
  </soap12:Body>
</soap12:Envelope>"""

    response = requests.post(url,data=body,headers=headers)
    return response.content

import pandas as pd
df1 = pd.DataFrame(columns=("name","value"))   
for pd_date in pd.date_range(fromDate, periods=1):
    day = pd_date.strftime('%Y-%m-%d')
     
    root = etree.fromstring(getXML())
     
    #map prefix 'd' to the default namespace URI
    ns = {'d': 'http://www.NationalGrid.com/EDP/UI/'}
     
    publication_objects = root.xpath('//d:EDPObjectCollection', namespaces=ns)
     
    for obj in publication_objects:
        name = obj.find('d:EDPObjectName', ns).text
     
        for data in obj.findall('d:EnergyDataList/d:EDPEnergyDataBE', ns):
            value = float(data.find('d:FlowRate', ns).text)
        
            df1.loc[len(df1) +1] = [name, value]
            
Reply
#2
(Jul-29-2018, 02:54 PM)hey_arnold Wrote: can't work out why its not working
not working is not clear. not working how?
Do you get an exception? In this case post the full traceback in error tags.
The result is not what you expect? In this case show us what you expect it to be.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#3
This is the returned XML:
Output:
b' <?xml version="1.0" encoding="utf-8"?> <soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <soap:Body> <GetInstantaneousFlowDataResponse xmlns="http://www.NationalGrid.com/EDP/UI/"> <GetInstantaneousFlowDataResult> <ReportName xmlns="http://www.NationalGrid.com/EDP/BusinessEntities/Public">Instantaneous Flow Report </ReportName> <PublishedTime xmlns="http://www.NationalGrid.com/EDP/BusinessEntities/Public">2018-07-29T17:11:00 </PublishedTime> <EDPReportPage xmlns="http://www.NationalGrid.com/EDP/BusinessEntities/Public"> <PageName>page1</PageName> <CurrentGasDay>2018-07-29T00:00:00</CurrentGasDay> <EDPEnergyGraphTableCollection> <EDPEnergyGraphTableBE> <EDPEnergyGraphTableName>table1</EDPEnergyGraphTableName> <ItemPosition>1</ItemPosition> <EDPObjectCollection> <EDPObjectBE> <EDPObjectName>ALDBROUGH</EDPObjectName> <EnergyDataList> <EDPEnergyDataBE> <ApplicableAt>2018-07-29T17:00:00</ApplicableAt> <FlowRate>0</FlowRate> <QualityIndicator /> <ScheduleTime>2018-07-29T17:12:00</ScheduleTime> </EDPEnergyDataBE> <EDPEnergyDataBE> <ApplicableAt>2018-07-29T17:02:00</ApplicableAt> <FlowRate>0</FlowRate> <QualityIndicator /> <ScheduleTime>2018-07-29T17:12:00</ScheduleTime> </EDPEnergyDataBE> <EDPEnergyDataBE> <ApplicableAt>2018-07-29T17:04:00</ApplicableAt> <FlowRate>0</FlowRate> <QualityIndicator /> <ScheduleTime>2018-07-29T17:12:00</ScheduleTime> </EDPEnergyDataBE> <EDPEnergyDataBE> <ApplicableAt>2018-07-29T17:06:00</ApplicableAt> <FlowRate>0</FlowRate> <QualityIndicator /> <ScheduleTime>2018-07-29T17:12:00</ScheduleTime> </EDPEnergyDataBE> <EDPEnergyDataBE> <ApplicableAt>2018-07-29T17:08:00</ApplicableAt> <FlowRate>0</FlowRate> <QualityIndicator /> <ScheduleTime>2018-07-29T17:12:00</ScheduleTime> </EDPEnergyDataBE> <EDPEnergyDataBE> <ApplicableAt>2018-07-29T17:10:00</ApplicableAt> <FlowRate>0</FlowRate> <QualityIndicator /> <ScheduleTime>2018-07-29T17:12:00</ScheduleTime> </EDPEnergyDataBE> </EnergyDataList> </EDPObjectBE> </EDPObjectCollection> <Description>Total Supply</Description> </EDPEnergyGraphTableBE> </EDPEnergyGraphTableCollection> </EDPReportPage> </GetInstantaneousFlowDataResult> </GetInstantaneousFlowDataResponse> </soap:Body> </soap:Envelope>'
I would like to extract the following information: EDPObjectName, ApplicableAt, Flow Rate & ScheduleTime. I have tried modifying some code I have found on this forum, but my dataframe does not populate with any data, it just has the column names.

No exceptions are produced when I run the code.

I was hoping that the data would be picked out of the XML and put into a dataframe.

import requests
from lxml import etree
    
fromDate = "2018-07-29"

def getXML():
    url="http://energywatch.natgrid.co.uk/EDP-PublicUI/PublicPI/InstantaneousFlowWebService.asmx"
    headers = {'content-type': 'application/soap+xml; charset=utf-8'}
    body ="""<soap12:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
  <soap12:Body>
    <GetInstantaneousFlowData xmlns="http://www.NationalGrid.com/EDP/UI/" />
  </soap12:Body>
</soap12:Envelope>"""

    response = requests.post(url,data=body,headers=headers)
    return response.content

import pandas as pd
df1 = pd.DataFrame(columns=("name","value"))   
for pd_date in pd.date_range(fromDate, periods=1):
    day = pd_date.strftime('%Y-%m-%d')
     
    root = etree.fromstring(getXML())
     
    #map prefix 'd' to the default namespace URI
    ns = {'d': 'http://www.NationalGrid.com/EDP/UI/'}
     
    publication_objects = root.xpath('//d:EDPReportBE', namespaces=ns)
     
    for obj in publication_objects:
        name = obj.find('d:EDPObjectName', ns).text

        for data in obj.findall('d:EDPObjectBE/d:EDPEnergyDataBE', ns):
            value = float(data.find('d:FlowRate', ns).text)
        
            df1.loc[len(df1) +1] = [name, value]
            
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Searching a .txt file for a specific number and extracting the corresponding data nrozman 3 3,220 Jul-27-2018, 02:07 PM
Last Post: nrozman

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020