Python Forum

Reading and Extracting XML
Can someone please tell me why this code isn't printing integers for the items in the count list? The response is encoded as UTF-8, but I thought fromstring() parses the XML directly into elements, so that I could pull the count elements out into lst. If I need to decode lst, how would I do that?

import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET

xml = 'http://py4e-data.dr-chuck.net/comments_42.xml'

url = urllib.request.urlopen(xml)
data = url.read()
print(data.decode())
tree = ET.fromstring(data)
lst = tree.findall('.//count')
print(lst)


Here is the output of print(lst):
[<Element 'count' at 0x000000000BD1EBD8>, <Element 'count' at 0x000000000BD1EEA8>, <Element 'count' at 0x000000000BD1EF98>, <Element 'count' at 0x000000000BD28228>, <Element 'count' at 0x000000000BD28318>, <Element 'count' at 0x000000000BD28408>, <Element 'count' at 0x000000000BD284F8>, <Element 'count' at 0x000000000BD285E8>, <Element 'count' at 0x000000000BD286D8>, <Element 'count' at 0x000000000BD287C8>, <Element 'count' at 0x000000000BD288B8>, <Element 'count' at 0x000000000BD289A8>, <Element 'count' at 0x000000000BD28A98>, <Element 'count' at 0x000000000BD28B88>, <Element 'count' at 0x000000000BD28C78>, <Element 'count' at 0x000000000BD28D68>, <Element 'count' at 0x000000000BD28E58>, <Element 'count' at 0x000000000BD28F48>, <Element 'count' at 0x000000000BD2B098>, <Element 'count' at 0x000000000BD2B188>, <Element 'count' at 0x000000000BD2B278>, <Element 'count' at 0x000000000BD2B368>, <Element 'count' at 0x000000000BD2B458>, <Element 'count' at 0x000000000BD2B548>, <Element 'count' at 0x000000000BD2B638>, <Element 'count' at 0x000000000BD2B728>, <Element 'count' at 0x000000000BD2B818>, <Element 'count' at 0x000000000BD2B908>, <Element 'count' at 0x000000000BD2B9F8>, <Element 'count' at 0x000000000BD2BAE8>, <Element 'count' at 0x000000000BD2BBD8>, <Element 'count' at 0x000000000BD2BCC8>, <Element 'count' at 0x000000000BD2BDB8>, <Element 'count' at 0x000000000BD2BEA8>, <Element 'count' at 0x000000000BD2BF98>, <Element 'count' at 0x000000000BD2E0E8>, <Element 'count' at 0x000000000BD2E1D8>, <Element 'count' at 0x000000000BD2E2C8>, <Element 'count' at 0x000000000BD2E3B8>, <Element 'count' at 0x000000000BD2E4A8>, <Element 'count' at 0x000000000BD2E598>, <Element 'count' at 0x000000000BD2E688>, <Element 'count' at 0x000000000BD2E778>, <Element 'count' at 0x000000000BD2E868>, <Element 'count' at 0x000000000BD2E958>, <Element 'count' at 0x000000000BD2EA48>, <Element 'count' at 0x000000000BD2EB38>, <Element 'count' at 0x000000000BD2EC28>, <Element 'count' at 
0x000000000BD2ED18>, <Element 'count' at 0x000000000BD2EE08>]
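Please use code tags when posting code; see the BBCode help.
Nothing needs decoding. findall() returns a list of Element objects, so you have to extract the text from each object in the list.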
import urllib.request
import xml.etree.ElementTree as ET

xml = 'http://py4e-data.dr-chuck.net/comments_42.xml'
url = urllib.request.urlopen(xml)
data = url.read()  # raw bytes; fromstring() accepts them without decoding
#print(data.decode())
tree = ET.fromstring(data)
# findall() returns a list of Element objects, not their contents
lst = tree.findall('.//count')
for item in lst:
    # .text holds the element's content as a string
    print(item.text)
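If you want actual integers rather than strings, item.text still has to go through int(). Here is a minimal sketch of that step. The function name extract_counts and the small sample document are made up for illustration (the sample only assumes that the real feed contains count elements holding whole numbers), so treat it as one way to do it rather than the only one.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded bytes; the real feed is larger.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<commentinfo>
  <comments>
    <comment><name>A</name><count>97</count></comment>
    <comment><name>B</name><count>90</count></comment>
    <comment><name>C</name><count>3</count></comment>
  </comments>
</commentinfo>"""


def extract_counts(xml_bytes):
    """Return the content of every <count> element as an int."""
    root = ET.fromstring(xml_bytes)  # bytes are fine here, no decode() needed
    # Each match is an Element; .text is a str, so convert it explicitly.
    return [int(el.text) for el in root.findall('.//count')]


counts = extract_counts(sample)
print(counts)
print(sum(counts))
```

With the real data you would pass the result of url.read() to extract_counts() instead of sample.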
Of course. Thank you!!