Python Forum
Parse XML line by line
#1
Hi, I want to retrieve a value from a line of XML that looks like this:

<pdce:ExploratoryDrilling contextRef="FD2016Q4YTD" decimals="-3" id="Fact-FA88F003169A4B0FBD05B0E8D5017E3E" unitRef="usd">180000</pdce:ExploratoryDrilling>

The value I need is 180000.

This line has a unique identifier: pdce:ExploratoryDrilling contextRef="FD2013Q4YTD". Using only "pdce:ExploratoryDrilling" will not work, since other lines also contain this text. I also cannot use id="Fact-0AD7AA10634C504BB2614E0B821523C8" as the identifier, because later I want to iterate through several XML files and this parameter changes from file to file. So the only identifier that remains constant is pdce:ExploratoryDrilling contextRef="FD2013Q4YTD".

I used to copy the XML file to a .txt file and parse it line by line until Python encountered the identifier, then apply a regex to retrieve the content between the > and < symbols.
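To make that description concrete, here is a minimal sketch of the text-file approach. The helper name find_value and the sample lines are made up for illustration (the "pdce:Other" line is not from any real filing); only the ExploratoryDrilling line is the one quoted above.

```python
import re

def find_value(lines, marker):
    """Return the text between > and < on the first line containing marker."""
    pattern = re.compile(r'>(.*?)<')
    for line in lines:
        if marker in line:
            match = pattern.search(line)
            if match:
                return match.group(1)
    return None  # marker not found in any line

# Illustrative input: the first line is invented, the second is the line from this post.
sample = [
    '<pdce:Other contextRef="FD2016Q4YTD" unitRef="usd">5</pdce:Other>',
    '<pdce:ExploratoryDrilling contextRef="FD2016Q4YTD" decimals="-3" '
    'id="Fact-FA88F003169A4B0FBD05B0E8D5017E3E" unitRef="usd">180000'
    '</pdce:ExploratoryDrilling>',
]

value = find_value(sample, '<pdce:ExploratoryDrilling contextRef="FD2016Q4YTD"')
print(value)
```

This only works when the whole element sits on a single line and contextRef is the first attribute, which is the case for the line shown here but may not hold for every file.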

But when I apply the same concept to the XML fetched directly from the URL, it does not work, since urllib.request returns each line as a bytes object, and a str tag cannot be searched for in bytes.

Can you please advise what would be a workaround for this task? I guess lxml.etree could be an alternative, but I cannot figure out how to find the required line.

So far this is what I have

import re
import urllib.request

url = 'https://www.sec.gov/Archives/edgar/data/77877/000007787714000013/pdce-20131231.xml'
tag = '<pdce:ExploratoryDrilling contextRef="FD2013Q4YTD'

# readlines() returns a list of bytes objects, one per line
source = urllib.request.urlopen(url).readlines()

for line in source:
    # fails here: tag is a str but line is bytes, so "in" raises TypeError
    if tag in line:
        print(re.findall(r'>(.*?)<', line))

UPDATE:
I actually managed to parse line by line with str(line, 'utf-8'), which converts each bytes object to a str. But I am still interested in other, more Pythonic solutions.
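
One possible alternative, sketched here with the standard library's xml.etree.ElementTree rather than lxml.etree, is to parse the document and match on the element name plus the contextRef attribute instead of scanning lines. The helper name find_fact, the wrapping xbrl root, the namespace URI http://example.com/pdce, and the FD2015Q4YTD row are all placeholders invented for this sample; the real file declares its own namespace, which is why the sketch compares only the local part of the tag.

```python
import xml.etree.ElementTree as ET

def find_fact(xml_bytes, local_name, context_ref):
    """Return the text of the first element matching local_name and contextRef."""
    root = ET.fromstring(xml_bytes)  # accepts bytes, so no manual decoding is needed
    for elem in root.iter():
        # Namespaced tags look like '{uri}LocalName'; keep the part after '}'.
        name = elem.tag.rsplit('}', 1)[-1]
        if name == local_name and elem.get('contextRef') == context_ref:
            return elem.text
    return None  # no matching element

# Small stand-in document; the namespace URI and the 2015 row are made up.
sample = b'''<?xml version="1.0" encoding="utf-8"?>
<xbrl xmlns:pdce="http://example.com/pdce">
  <pdce:ExploratoryDrilling contextRef="FD2015Q4YTD" unitRef="usd">1000</pdce:ExploratoryDrilling>
  <pdce:ExploratoryDrilling contextRef="FD2016Q4YTD" decimals="-3" id="Fact-FA88F003169A4B0FBD05B0E8D5017E3E" unitRef="usd">180000</pdce:ExploratoryDrilling>
</xbrl>'''

result = find_fact(sample, 'ExploratoryDrilling', 'FD2016Q4YTD')
print(result)
```

With a downloaded file, the bytes from urllib.request.urlopen(url).read() could be passed in place of sample. Because the match uses the element name and attribute rather than line contents, it does not depend on attribute order or on the element fitting on one line.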


Messages In This Thread
Parse XML line by line - by rakhmadiev - Jun-06-2017, 09:20 PM
RE: Parse XML line by line - by nilamo - Jun-07-2017, 03:53 AM
RE: Parse XML line by line - by snippsat - Jun-07-2017, 05:46 AM
RE: Parse XML line by line - by AussieSusan - Jun-07-2017, 05:52 AM
RE: Parse XML line by line - by Larz60+ - Jun-07-2017, 11:16 AM
RE: Parse XML line by line - by rakhmadiev - Jun-07-2017, 01:44 PM
