Posts: 19
Threads: 7
Joined: Apr 2018
Aug-05-2018, 09:40 AM
(This post was last modified: Aug-05-2018, 09:41 AM by hey_arnold.)
I am trying to extract certain data from an XML response using lxml's etree, but I am unsure why my code doesn't work. Any guidance would be helpful.
It appears that the paths I have used aren't correct, as no values are returned. I think I am fairly close... hopefully?
import requests
from lxml import etree

fromDate = "2018-07-29"

def getXML():
    headers = {'content-type': 'application/soap+xml; charset=utf-8'}
    body =  # (SOAP request body not shown)
    response = requests.post(url, data=body, headers=headers)
    return response.content

import pandas as pd

df1 = pd.DataFrame(columns=("applicable_at", "name", "value", "created_date"))

for pd_date in pd.date_range(fromDate, periods=1):
    day = pd_date.strftime('%Y-%m-%d')
    root = etree.fromstring(getXML())
    publication_objects = root.xpath('//d:EDPObjectCollection', namespaces=ns)
    for obj in publication_objects:
        name = obj.find('d:EDPObjectName', ns).text
        for data in obj.findall('d:EnergyDataList/d:EDPEnergyDataBE', ns):
            applicable_at = pd.to_datetime(data.find('d:ApplicableAt', ns).text)
            value = float(data.find('d:FlowRate', ns).text)
            created_date = pd.to_datetime(data.find('d:ScheduleTime', ns).text)
            df1.loc[len(df1) + 1] = [applicable_at, name, value, created_date]
Posts: 2,953
Threads: 48
Joined: Sep 2016
Did you check if response.content contains the expected data?
Posts: 19
Threads: 7
Joined: Apr 2018
Aug-05-2018, 11:33 AM
(This post was last modified: Aug-05-2018, 11:33 AM by hey_arnold.)
Yep. I printed response.content and it contains the XML that I need.
The problem is to do with the paths I have told etree to look at. I think they might not be right.
Posts: 2,953
Threads: 48
Joined: Sep 2016
The response is returned as bytes. What happens if you convert it to string?
Posts: 19
Threads: 7
Joined: Apr 2018
It should work as bytes. It’s something to do with the XML structure and how I’m trying to extract data from it.
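The bytes-versus-string question can be checked in isolation. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml (an assumption made so it runs without extra packages), and the `payload` value is a made-up stand-in for `response.content`. As far as I know, lxml's `etree.fromstring` also accepts bytes, so the input type is unlikely to be the reason the lookups return nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for response.content, which requests returns as bytes.
payload = b"<root><item>42</item></root>"

# Parse the raw bytes directly.
root_from_bytes = ET.fromstring(payload)

# Parse the decoded text; the resulting tree has the same content.
root_from_str = ET.fromstring(payload.decode("utf-8"))

print(root_from_bytes.find("item").text)
print(root_from_str.find("item").text)
```

Both parses expose the same element, which suggests that an empty result comes from the search path or the namespace mapping rather than from the type of the input.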
Posts: 2,953
Threads: 48
Joined: Sep 2016
In what way does the code "not work"? Do you get an error, or do you just not get a result?
Posts: 19
Threads: 7
Joined: Apr 2018
Aug-05-2018, 07:48 PM
(This post was last modified: Aug-05-2018, 07:48 PM by hey_arnold.)
(Aug-05-2018, 12:25 PM)wavic Wrote: In what way does the code "not work"? Do you get an error, or do you just not get a result?
Here is a working example of what I want to do. However, in the case I am stuck on, the XML returned from the API has a more complicated structure, and when I try to get etree to pick up certain elements of the tree, it won't do it.
Working Example:
import requests
from lxml import etree

toDate = "2018-04-25"
fromDate = "2018-04-25"
dateType = "gasday"

def getXML():
    headers = {'content-type': 'application/soap+xml; charset=utf-8'}
    body =  # (SOAP request body not shown)
    % (toDate, fromDate, dateType)
    response = requests.post(url, data=body, headers=headers)
    return response.content

root = etree.fromstring(getXML())

import pandas as pd

df1 = pd.DataFrame(columns=("applicable_at", "applicable_for", "name", "value", "quality_indicator", "substituted", "created_date"))

for pd_date in pd.date_range(fromDate, periods=1):
    day = pd_date.strftime('%Y-%m-%d')
    root = etree.fromstring(getXML())
    publication_objects = root.xpath('//d:CLSMIPIPublicationObjectBE', namespaces=ns)
    for obj in publication_objects:
        name = obj.find('d:PublicationObjectName', ns).text
        for data in obj.findall('d:PublicationObjectData/d:CLSPublicationObjectDataBE', ns):
            applicable_at = pd.to_datetime(data.find('d:ApplicableAt', ns).text)
            applicable_for = pd.to_datetime(data.find('d:ApplicableFor', ns).text)
            value = float(data.find('d:Value', ns).text)
            quality_indicator = data.find('d:Value', ns).text
            substituted = data.find('d:Substituted', ns).text
            created_date = pd.to_datetime(data.find('d:CreatedDate', ns).text)
            df1.loc[len(df1) + 1] = [applicable_at, applicable_for, name, value, quality_indicator, substituted, created_date]
My non-working example:
import requests
from lxml import etree

fromDate = "2018-07-29"

def getXML():
    headers = {'content-type': 'application/soap+xml; charset=utf-8'}
    body =  # (SOAP request body not shown)
    response = requests.post(url, data=body, headers=headers)
    return response.content

import pandas as pd

df1 = pd.DataFrame(columns=("applicable_at", "name", "value", "created_date"))

for pd_date in pd.date_range(fromDate, periods=1):
    day = pd_date.strftime('%Y-%m-%d')
    root = etree.fromstring(getXML())
    publication_objects = root.xpath('//d:EDPObjectCollection', namespaces=ns)
    for obj in publication_objects:
        name = obj.find('d:EDPObjectName', ns).text
        for data in obj.findall('d:EnergyDataList/d:EDPEnergyDataBE', ns):
            applicable_at = pd.to_datetime(data.find('d:ApplicableAt', ns).text)
            value = float(data.find('d:FlowRate', ns).text)
            created_date = pd.to_datetime(data.find('d:ScheduleTime', ns).text)
            df1.loc[len(df1) + 1] = [applicable_at, name, value, created_date]
Have I set the correct namespace and root path in order to extract data from EDPObjectName?
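One way to answer the namespace question is to list the namespace URIs the parsed document actually uses and compare them with the mapping passed to `find`. The following is a minimal sketch using the standard library's xml.etree.ElementTree; the `SAMPLE` document, its `http://example.com/...` URIs and the element layout are invented for illustration and are not the real API response.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: URIs and layout are assumptions, not the real API output.
SAMPLE = b"""<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <Response xmlns="http://example.com/energy">
      <EDPObjectCollection>
        <EDPObjectName>Example Terminal</EDPObjectName>
      </EDPObjectCollection>
    </Response>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(SAMPLE)

# Namespaced tags look like "{uri}LocalName"; collect the distinct URIs in use.
uris = sorted({el.tag[1:].split("}")[0] for el in root.iter() if el.tag.startswith("{")})
print(uris)

# The prefix "d" is only a local label; what must match is the URI it maps to.
wrong_ns = {"d": "http://example.com/other"}
right_ns = {"d": "http://example.com/energy"}

missing = root.find(".//d:EDPObjectName", wrong_ns)  # URI mismatch: no match
found = root.find(".//d:EDPObjectName", right_ns)    # URI matches: element found

print(missing)
print(found.text)
```

Note also that a path without a leading `.//` only matches direct children of the element it is called on, so each step in a path such as `d:EnergyDataList/d:EDPEnergyDataBE` has to mirror the real nesting.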
Posts: 2,953
Threads: 48
Joined: Sep 2016
Aug-05-2018, 09:57 PM
(This post was last modified: Aug-05-2018, 09:58 PM by wavic.)
Well, I managed to get a non-empty publication_objects list.
I don't know how I did it. I am not using lxml and xpath is almost magic for me.
However, here it is (some print calls are added for my convenience):
import requests
from lxml import etree

fromDate = "2018-07-29"

def getXML():
    headers = {'content-type': 'application/soap+xml; charset=utf-8'}
    body =  # (SOAP request body not shown)
    response = requests.post(url, data=body, headers=headers)
    print(response.content)
    return response.content

import pandas as pd

df1 = pd.DataFrame(columns=("applicable_at", "name", "value", "created_date"))

for pd_date in pd.date_range(fromDate, periods=1):
    day = pd_date.strftime('%Y-%m-%d')
    root = etree.fromstring(getXML())
    print(root)
    publication_objects = root.xpath('//soap:EDPObjectCollection', namespaces=ns)
    print('\n\n')
    print(publication_objects)
    for obj in publication_objects:
        name = obj.find('soap:EDPObjectName', ns).text()
        for data in obj.findall('soap:EnergyDataList/d:EDPEnergyDataBE', ns):
            applicable_at = pd.to_datetime(data.find('d:ApplicableAt', ns).text)
            value = float(data.find('soap:FlowRate', ns).text)
            created_date = pd.to_datetime(data.find('soap:ScheduleTime', ns).text)
            df1.loc[len(df1) + 1] = [applicable_at, name, value, created_date]
It produces another error but I am too tired. It's an hour after midnight here.
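Two details in the snippet above may explain the change in behaviour and the follow-up error. Switching the prefix from `d:` to `soap:` only helps if `soap` happens to be mapped to the URI the data elements live in, and `.text` is an attribute, not a method. Here is a small sketch with the standard library's xml.etree.ElementTree; the document and its namespace URI are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the namespace URI is an assumption.
doc = ET.fromstring(
    '<Collection xmlns="http://example.com/energy">'
    "<EDPObjectName>Example Terminal</EDPObjectName>"
    "</Collection>"
)

# Two different prefixes mapped to the same URI select the same element.
ns = {"d": "http://example.com/energy", "soap": "http://example.com/energy"}
via_d = doc.find("d:EDPObjectName", ns)
via_soap = doc.find("soap:EDPObjectName", ns)
same_element = via_d is via_soap

# .text is a plain attribute holding a str (or None); calling it fails.
name = via_d.text
try:
    via_d.text()
    call_error = None
except TypeError as exc:
    call_error = type(exc).__name__

print(same_element, name, call_error)
```

Because each prefix in a path is resolved on its own, a mixed path such as `soap:EnergyDataList/d:EDPEnergyDataBE` only matches when both prefixes map to the URIs used at their respective levels.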
Posts: 19
Threads: 7
Joined: Apr 2018
Thanks for your help so far. I haven't had a chance to look at what you posted, as I have been super busy. I will try to take a look this evening.
Posts: 19
Threads: 7
Joined: Apr 2018
Aug-09-2018, 02:53 PM
(This post was last modified: Aug-09-2018, 02:53 PM by hey_arnold.)
Does that mean something further up in my code didn't work correctly?
Here is my error:
Error: Traceback (most recent call last):
File "python", line 34, in <module>
AttributeError: 'NoneType' object has no attribute 'text'
import requests
from lxml import etree

fromDate = "2018-07-29"

def getXML():
    headers = {'content-type': 'application/soap+xml; charset=utf-8'}
    body =  # (SOAP request body not shown)
    response = requests.post(url, data=body, headers=headers)
    print(response.content)
    return response.content

import pandas as pd

df1 = pd.DataFrame(columns=("applicable_at", "name", "value", "created_date"))

for pd_date in pd.date_range(fromDate, periods=1):
    day = pd_date.strftime('%Y-%m-%d')
    root = etree.fromstring(getXML())
    print(root)
    publication_objects = root.xpath('//soap:EDPObjectCollection', namespaces=ns)
    print('\n\n')
    print(publication_objects)
    for obj in publication_objects:
        name = obj.find('soap:EDPObjectName', ns).text()
        for data in obj.findall('soap:EnergyDataList/d:EDPEnergyDataBE', ns):
            applicable_at = pd.to_datetime(data.find('d:ApplicableAt', ns).text)
            value = float(data.find('soap:FlowRate', ns).text)
            created_date = pd.to_datetime(data.find('soap:ScheduleTime', ns).text)
            df1.loc[len(df1) + 1] = [applicable_at, name, value, created_date]
Could it be something to do with how the API response is parsed with etree?
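The `'NoneType' object has no attribute 'text'` message usually means that a `find` call matched nothing and returned `None`, so parsing itself probably succeeded and the path or prefix is what missed. A hedged sketch of how to make the failing lookup visible, using the standard library's xml.etree.ElementTree; the document, the namespace URI and the `child_text` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the namespace URI is an assumption.
doc = ET.fromstring(
    '<Item xmlns="http://example.com/energy">'
    "<FlowRate>12.5</FlowRate>"
    "</Item>"
)
ns = {"d": "http://example.com/energy"}


def child_text(parent, path, namespaces):
    """Return the text of the first match, or raise an error naming the path."""
    node = parent.find(path, namespaces)
    if node is None:
        raise LookupError(f"no element matches {path!r} under <{parent.tag}>")
    return node.text


flow_rate = float(child_text(doc, "d:FlowRate", ns))

# findtext returns the default instead of failing when nothing matches.
schedule_time = doc.findtext("d:ScheduleTime", default=None, namespaces=ns)

try:
    child_text(doc, "d:ScheduleTime", ns)
    message = ""
except LookupError as exc:
    message = str(exc)

print(flow_rate, schedule_time)
print(message)
```

Raising with the path in the message points straight at the lookup that missed, which is harder to see when the failure surfaces as an `AttributeError` on `None`.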