extracting data from XML/SOAP

hey_arnold · May-06-2018, 08:30 AM

Hi All,

My code below is currently returning:

LNG Stock Level 2016-09-30T14:00:14Z 2016-03-14T00:00:00Z 6722.422335

That is the format I want, but there should be a lot more data. The XML that the API returns contains many data points, yet I cannot get my code to output them all.

It should be doing something like this, but it's not:

LNG Stock Level 2016-09-30T14:00:14Z 2016-03-14T00:00:00Z 6722.422335
LNG Stock Level 2016-09-30T14:00:14Z 2016-03-14T00:00:00Z 3048.422335
LNG Stock Level 2016-09-30T14:00:14Z 2016-03-14T00:00:00Z 3430.422335

import requests
from lxml import etree

def getXML():

    toDate = "2016-03-16"
    fromDate = "2016-03-14"
    dateType = "gasday"

    url="http://marketinformation.natgrid.co.uk/MIPIws-public/public/publicwebservice.asmx"
    headers = {'content-type': 'application/soap+xml; charset=utf-8'}

    body ="""<soap12:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
        <soap12:Body>
            <GetPublicationDataWM xmlns="http://www.NationalGrid.com/MIPI/">
                <reqObject>
                    <LatestFlag>Y</LatestFlag>
                    <ApplicableForFlag>Y</ApplicableForFlag>
                    <ToDate>%s</ToDate>
                    <FromDate>%s</FromDate>
                    <DateType>%s</DateType>
                    <PublicationObjectNameList>
                        <string>LNG Stock Level</string>
                    </PublicationObjectNameList>
                </reqObject>
            </GetPublicationDataWM>
        </soap12:Body>
    </soap12:Envelope>""" % (toDate, fromDate,dateType)


    response = requests.post(url,data=body,headers=headers)

    return response.content

root = etree.fromstring(getXML())

# map prefix 'd' to the default namespace URI
ns = { 'd': 'http://www.NationalGrid.com/MIPI/'}

publication_objects = root.xpath('//d:CLSMIPIPublicationObjectBE', namespaces=ns)
for obj in publication_objects:
    name = obj.find('d:PublicationObjectName', ns).text
    data = obj.find('d:PublicationObjectData/d:CLSPublicationObjectDataBE', ns)  
    applicable_at = data.find('d:ApplicableAt', ns).text    
    applicable_for = data.find('d:ApplicableFor', ns).text
    value = float(data.find('d:Value', ns).text)
    
    
print(name,applicable_at,applicable_for,value)

killerrex · May-06-2018, 09:52 AM

You are printing just the last entry because the print call sits outside the for loop, so it runs only once, after the loop has finished.

for obj in publication_objects:
    name = obj.find('d:PublicationObjectName', ns).text
    data = obj.find('d:PublicationObjectData/d:CLSPublicationObjectDataBE', ns)  
    applicable_at = data.find('d:ApplicableAt', ns).text    
    applicable_for = data.find('d:ApplicableFor', ns).text
    value = float(data.find('d:Value', ns).text)
    # Report each entry
    print(name,applicable_at,applicable_for,value)

Either move the print inside the loop, as above, or collect the parts in lists (or your favourite data structure) and write them later.
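For the "collect the parts" option, here is a minimal sketch. It assumes the same element names as the code above, but it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs, and the two-object SAMPLE document and the collect_rows helper are made up purely for illustration.

```python
import xml.etree.ElementTree as ET

NS = {'d': 'http://www.NationalGrid.com/MIPI/'}

# Made-up sample with two publication objects, only to exercise the loop.
SAMPLE = """<Result xmlns="http://www.NationalGrid.com/MIPI/">
  <CLSMIPIPublicationObjectBE>
    <PublicationObjectName>Sample A</PublicationObjectName>
    <PublicationObjectData>
      <CLSPublicationObjectDataBE>
        <ApplicableAt>2016-09-30T14:00:14Z</ApplicableAt>
        <ApplicableFor>2016-03-14T00:00:00Z</ApplicableFor>
        <Value>1.5</Value>
      </CLSPublicationObjectDataBE>
    </PublicationObjectData>
  </CLSMIPIPublicationObjectBE>
  <CLSMIPIPublicationObjectBE>
    <PublicationObjectName>Sample B</PublicationObjectName>
    <PublicationObjectData>
      <CLSPublicationObjectDataBE>
        <ApplicableAt>2016-09-30T14:00:14Z</ApplicableAt>
        <ApplicableFor>2016-03-14T00:00:00Z</ApplicableFor>
        <Value>2.5</Value>
      </CLSPublicationObjectDataBE>
    </PublicationObjectData>
  </CLSMIPIPublicationObjectBE>
</Result>"""


def collect_rows(root):
    """Return one (name, applicable_at, applicable_for, value) tuple per object."""
    rows = []
    for obj in root.findall('.//d:CLSMIPIPublicationObjectBE', NS):
        name = obj.find('d:PublicationObjectName', NS).text
        data = obj.find('d:PublicationObjectData/d:CLSPublicationObjectDataBE', NS)
        # Append inside the loop so every iteration is kept, not just the last.
        rows.append((
            name,
            data.find('d:ApplicableAt', NS).text,
            data.find('d:ApplicableFor', NS).text,
            float(data.find('d:Value', NS).text),
        ))
    return rows


rows = collect_rows(ET.fromstring(SAMPLE))

# Write the collected rows later, once the loop is done.
for row in rows:
    print(*row)
```

The point is only that the append happens on every pass through the loop; what you do with rows afterwards (print, write to a file, build a table) is up to you.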

hey_arnold · May-06-2018, 07:44 PM

If I do what you are suggesting, it doesn't change how many values are returned; I still get only one row of data.

If you could provide me a link on how to collect the parts, that would be great. Thanks for the reply.

killerrex · May-06-2018, 10:11 PM

When I try to run your code this is the result I receive:

Output:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <GetPublicationDataWMResponse xmlns="http://www.NationalGrid.com/MIPI/">
      <GetPublicationDataWMResult>
        <CLSMIPIPublicationObjectBE>
          <PublicationObjectName>LNG Stock Level</PublicationObjectName>
          <PublicationObjectData>
            <CLSPublicationObjectDataBE>
              <ApplicableAt>2016-09-30T14:00:14Z</ApplicableAt>
              <ApplicableFor>2016-03-14T00:00:00Z</ApplicableFor>
              <Value>6722.422335</Value>
              <GeneratedTimeStamp>2016-09-30T14:56:00Z</GeneratedTimeStamp>
              <QualityIndicator> </QualityIndicator>
              <Substituted>N</Substituted>
              <CreatedDate>2016-09-30T14:56:39Z</CreatedDate>
            </CLSPublicationObjectDataBE>
            <CLSPublicationObjectDataBE>
              <ApplicableAt>2016-09-30T14:00:14Z</ApplicableAt>
              <ApplicableFor>2016-03-15T00:00:00Z</ApplicableFor>
              <Value>6406.486959</Value>
              <GeneratedTimeStamp>2016-09-30T14:56:00Z</GeneratedTimeStamp>
              <QualityIndicator> </QualityIndicator>
              <Substituted>N</Substituted>
              <CreatedDate>2016-09-30T14:56:39Z</CreatedDate>
            </CLSPublicationObjectDataBE>
            <CLSPublicationObjectDataBE>
              <ApplicableAt>2016-09-30T14:00:14Z</ApplicableAt>
              <ApplicableFor>2016-03-16T00:00:00Z</ApplicableFor>
              <Value>6064.522162</Value>
              <GeneratedTimeStamp>2016-09-30T14:56:00Z</GeneratedTimeStamp>
              <QualityIndicator> </QualityIndicator>
              <Substituted>N</Substituted>
              <CreatedDate>2016-09-30T14:56:39Z</CreatedDate>
            </CLSPublicationObjectDataBE>
          </PublicationObjectData>
        </CLSMIPIPublicationObjectBE>
      </GetPublicationDataWMResult>
    </GetPublicationDataWMResponse>
  </soap:Body>
</soap:Envelope>

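As you can see, there is just one CLSMIPIPublicationObjectBE element, so the outer loop runs only once, and find() returns only the first CLSPublicationObjectDataBE inside it. If what you want is to iterate over all the CLSPublicationObjectDataBE elements (and not get just the first one), you can do something like: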

publication_objects = root.xpath('//d:CLSMIPIPublicationObjectBE', namespaces=ns)
for obj in publication_objects:
    name = obj.find('d:PublicationObjectName', ns).text

    # findall() returns every matching element; find() returns only the first
    for data in obj.findall('d:PublicationObjectData/d:CLSPublicationObjectDataBE', ns):
        applicable_at = data.find('d:ApplicableAt', ns).text
        applicable_for = data.find('d:ApplicableFor', ns).text
        value = float(data.find('d:Value', ns).text)

        print(name, applicable_at, applicable_for, value)

With that the output I get is:

Output:
LNG Stock Level 2016-09-30T14:00:14Z 2016-03-14T00:00:00Z 6722.422335
LNG Stock Level 2016-09-30T14:00:14Z 2016-03-15T00:00:00Z 6406.486959
LNG Stock Level 2016-09-30T14:00:14Z 2016-03-16T00:00:00Z 6064.522162
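If you want to check the find() versus findall() difference without calling the web service, here is a rough offline sketch. It embeds a trimmed copy of the response shown above (only the fields being read) and uses the standard library's xml.etree.ElementTree rather than lxml, which is an assumption made so the snippet runs anywhere. The names RESPONSE, path and rows are just illustrative.

```python
import xml.etree.ElementTree as ET

NS = {'d': 'http://www.NationalGrid.com/MIPI/'}

# Trimmed copy of the response from this thread, kept to the fields we read.
RESPONSE = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <GetPublicationDataWMResponse xmlns="http://www.NationalGrid.com/MIPI/">
      <GetPublicationDataWMResult>
        <CLSMIPIPublicationObjectBE>
          <PublicationObjectName>LNG Stock Level</PublicationObjectName>
          <PublicationObjectData>
            <CLSPublicationObjectDataBE>
              <ApplicableAt>2016-09-30T14:00:14Z</ApplicableAt>
              <ApplicableFor>2016-03-14T00:00:00Z</ApplicableFor>
              <Value>6722.422335</Value>
            </CLSPublicationObjectDataBE>
            <CLSPublicationObjectDataBE>
              <ApplicableAt>2016-09-30T14:00:14Z</ApplicableAt>
              <ApplicableFor>2016-03-15T00:00:00Z</ApplicableFor>
              <Value>6406.486959</Value>
            </CLSPublicationObjectDataBE>
            <CLSPublicationObjectDataBE>
              <ApplicableAt>2016-09-30T14:00:14Z</ApplicableAt>
              <ApplicableFor>2016-03-16T00:00:00Z</ApplicableFor>
              <Value>6064.522162</Value>
            </CLSPublicationObjectDataBE>
          </PublicationObjectData>
        </CLSMIPIPublicationObjectBE>
      </GetPublicationDataWMResult>
    </GetPublicationDataWMResponse>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(RESPONSE)
obj = root.find('.//d:CLSMIPIPublicationObjectBE', NS)
path = 'd:PublicationObjectData/d:CLSPublicationObjectDataBE'

first = obj.find(path, NS)           # a single element: the first match
all_entries = obj.findall(path, NS)  # a list with every match
print(len(all_entries))              # 3

# One dict per data entry, collected instead of printed.
name = obj.find('d:PublicationObjectName', NS).text
rows = [
    {
        'name': name,
        'applicable_at': data.find('d:ApplicableAt', NS).text,
        'applicable_for': data.find('d:ApplicableFor', NS).text,
        'value': float(data.find('d:Value', NS).text),
    }
    for data in all_entries
]

for row in rows:
    print(row['name'], row['applicable_at'], row['applicable_for'], row['value'])
```

Collecting dicts like this also covers the earlier question about storing the parts: rows can later be written to a file or passed on to whatever needs it.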

hey_arnold · May-07-2018, 09:23 AM

Thanks, very helpful.
