GPX and XML parsing

Adriaan · (This post was last modified: Feb-27-2023, 01:37 PM by Adriaan.)

Gents, I am new here and a hobby Python developer. I am trying to read a GPX file for my OpenCPN.
I cannot get to the data in the nested namespace.

This is what I do. First, the input GPX (XML):

<?xml version="1.0" encoding="UTF-8"?>
<gpx OriginalSource="RWS dataservices" TimeStamp="06-02-2023 01:06">
    <wpt lat="51.231260864601" lon="4.4071404648151">
        <extensions>
            <opencpn:scale_min_max UseScale="true" ScaleMin="199999" />
        </extensions>
        <name></name>
        <sym>Brug_dicht</sym>
        <desc>Antwerpen: Haven Antwerpen&#x0A;Londenbrug&#x0A;Beweegbaar&#x0A;</desc>
    </wpt>
    <wpt lat="51.241605678043" lon="4.4062392425861">
        <extensions>
            <opencpn:scale_min_max UseScale="true" ScaleMin="199999" />
        </extensions>
        <name></name>
        <sym>Brug_dicht</sym>
        <desc>Antwerpen: Haven Antwerpen&#x0A;Siberiabrug&#x0A;Vast  &#x0A;</desc>
    </wpt>
    <wpt lat="51.236823170286" lon="4.4090716553058">
        <extensions>
            <opencpn:scale_min_max UseScale="true" ScaleMin="199999" />
        </extensions>
        <name></name>
        <sym>Brug_open</sym>
        <desc>Antwerpen: Haven Antwerpen&#x0A;Mexicobrug&#x0A;Vast  &#x0A;ABC</desc>
    </wpt>
</gpx>

And this is my Python 3 code:

#!/usr/bin/env python
from xml.etree import ElementTree as ET #import ElementTree module as an alias ET
from lxml import objectify, etree
parser = etree.XMLParser(encoding="UTF-8", resolve_entities=False, strip_cdata=False, recover=True, ns_clean=True)
ns = {'gpx': 'http://www.topografix.com/GPX/1/1',
        'opencpn': 'http://www.opencpn.org'}
with open('test.xml') as fobj:
    xml = fobj.read()
    root = etree.fromstring(xml.encode(),parser=parser)
    print('Root: ',root)
    for wpt in root.findall('wpt', ns):
        print(wpt.attrib['lon'],  wpt.attrib['lat'])
        sym = wpt.find('sym')
        print(sym.text)
        for ext in wpt.findall('extensions', ns):
           print('ext in wpt')
           et = ext.findall('opencpn:scale_min_max', ns)
           print(ext)

I am not that good in understanding namespaces and stuff.
Please help me with some clues
Greetings from the Netherlands
Adriaan

buran wrote Feb-27-2023, 01:21 PM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See BBcode help for more info.

***snippsat*** · (This post was last modified: Feb-27-2023, 04:26 PM by snippsat.)

I think it is much easier if you use a parser like Beautiful Soup.
There are also GPX-specific parsers, e.g. gpxpy.
Example:

from bs4 import BeautifulSoup

soup = BeautifulSoup(open('ggx.xml'), 'html.parser')

Example: find all opencpn tags.

>>> open_scale = soup.find_all('opencpn:scale_min_max')
>>> open_scale
[<opencpn:scale_min_max scalemin="199999" usescale="true"></opencpn:scale_min_max>,
 <opencpn:scale_min_max scalemin="199999" usescale="true"></opencpn:scale_min_max>,
 <opencpn:scale_min_max scalemin="199999" usescale="true"></opencpn:scale_min_max>]

>>> open_scale[0]
<opencpn:scale_min_max scalemin="199999" usescale="true"></opencpn:scale_min_max>

# The attributes as dictionary
>>> open_scale[0].attrs
{'scalemin': '199999', 'usescale': 'true'}
>>> open_scale[0].attrs['scalemin']
'199999'

Example: find desc.

>>> desc = soup.find_all('desc')
>>> desc
[<desc>Antwerpen: Haven Antwerpen
Londenbrug
Beweegbaar
</desc>,
 <desc>Antwerpen: Haven Antwerpen
Siberiabrug
Vast  
</desc>,
 <desc>Antwerpen: Haven Antwerpen
Mexicobrug
Vast  
ABC</desc>]

# Get the text of the last one
>>> desc[2]
<desc>Antwerpen: Haven Antwerpen
Mexicobrug
Vast  
ABC</desc>

>>> desc[2].text
'Antwerpen: Haven Antwerpen\nMexicobrug\nVast  \nABC'

>>> print(desc[2].text.strip())
Antwerpen: Haven Antwerpen
Mexicobrug
Vast  
ABC
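If the lines of a description are needed as separate values, a minimal sketch could look like this. The string is copied from the desc[2].text output above, and the variable names are only illustrative.

```python
# Text of the third <desc>, copied from the output above.
desc_text = 'Antwerpen: Haven Antwerpen\nMexicobrug\nVast  \nABC'

# Split on the embedded newlines, trim each part, and drop empty parts.
parts = [line.strip() for line in desc_text.splitlines() if line.strip()]
print(parts)
# ['Antwerpen: Haven Antwerpen', 'Mexicobrug', 'Vast', 'ABC']
```

The strip() call also removes the trailing spaces after 'Vast' that are present in the source data.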

This is a little more advanced: CSS selectors also work for XML in BS4,
so you can go to a value directly.

>>> op = soup.select_one('wpt:nth-child(3) > desc')
>>> op
<desc>Antwerpen: Haven Antwerpen
Mexicobrug
Vast  
ABC</desc>

Adriaan · (This post was last modified: Feb-28-2023, 08:48 AM by Adriaan.)

@snippsat Thanks!
I will work this out today
About GPX-specific parsers such as gpxpy: I found on Stack Overflow that gpxpy cannot process nested namespaces like <opencpn> here. These GPX (XML) files come from OpenCPN, my ship navigation tool. I am trying to re-create (edit) some GPX files.
Regards,
Adriaan

**Larz60+** · Feb-28-2023, 11:21 AM

Adriaan Wrote: I found on Stack Overflow that gpxpy cannot process nested namespaces like <opencpn> here. These GPX (XML) files come from OpenCPN, my ship navigation tool. I am trying to re-create (edit) some GPX files.

Please look closely at Snippsat's code. He shows how to isolate all <opencpn> tags.

Adriaan · (This post was last modified: Mar-03-2023, 01:44 PM by Adriaan.)

(Feb-28-2023, 11:21 AM)Larz60+ Wrote:
Adriaan Wrote: I found on Stack Overflow that gpxpy cannot process nested namespaces like <opencpn> here. These GPX (XML) files come from OpenCPN, my ship navigation tool. I am trying to re-create (edit) some GPX files.
Please look closely at Snippsat's code. He shows how to isolate all <opencpn> tags.

Thanks a lot, the problem is solved. I am now able to read and write these OpenCPN GPX files.
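The thread does not show the write step. As a rough sketch of one way it could be done with the standard library (an assumption, not necessarily what was used here), the example below builds one waypoint like the first one in post 1. The namespace URI is taken from the ns dict in post 1 and is assumed to be the right one for OpenCPN.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI, taken from the ns dict in post 1.
OPENCPN_NS = "http://www.opencpn.org"

# Make ElementTree write the "opencpn" prefix instead of a generated one (ns0).
ET.register_namespace("opencpn", OPENCPN_NS)

# Build one waypoint with the same child order as the sample file.
gpx = ET.Element("gpx")
wpt = ET.SubElement(gpx, "wpt", lat="51.231260864601", lon="4.4071404648151")
ext = ET.SubElement(wpt, "extensions")
ET.SubElement(ext, f"{{{OPENCPN_NS}}}scale_min_max", UseScale="true", ScaleMin="199999")
ET.SubElement(wpt, "sym").text = "Brug_dicht"
ET.SubElement(wpt, "desc").text = "Antwerpen: Haven Antwerpen\nLondenbrug\nBeweegbaar\n"

# Serialize to a string; the xmlns:opencpn declaration is added automatically.
xml_text = ET.tostring(gpx, encoding="unicode")
print(xml_text)
```

Because the prefix is declared in the output, the result can be read back with any namespace-aware XML parser.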

jnunez · Apr-10-2023, 08:18 AM

(Mar-03-2023, 01:44 PM)Adriaan Wrote:
(Feb-28-2023, 11:21 AM)Larz60+ Wrote: Please look closely at Snippsat's code. He shows how to isolate all <opencpn> tags.

Thanks a lot, the problem is solved. I am now able to read and write these OpenCPN GPX files.

Can you explain how you solved it?

***snippsat*** · Apr-10-2023, 11:37 AM

(Apr-10-2023, 08:18 AM)jnunez Wrote: Can you explain how you solved it?

He used my code from post 2.
The problem was finding namespaced tags like <opencpn>.

from bs4 import BeautifulSoup

soup = BeautifulSoup(open('ggx.xml'), 'html.parser')
open_scale = soup.find_all('opencpn:scale_min_max')
# First one
print(open_scale[0])
# The attributes as dictionary
print(open_scale[0].attrs)
print(open_scale[0].get('scalemin'))

Output:
<opencpn:scale_min_max scalemin="199999" usescale="true"></opencpn:scale_min_max>
{'usescale': 'true', 'scalemin': '199999'}
199999

As you can see, I copied his GPX (XML) from post 1 into ggx.xml and then showed how to parse the <opencpn> values.
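For anyone who prefers to stay with ElementTree: the sample in post 1 uses the opencpn: prefix without an xmlns:opencpn declaration, so a strict XML parser rejects it with an "unbound prefix" error. The sketch below is only one possible workaround and not what was used in this thread. It assumes the namespace URI from the ns dict in post 1, works on a trimmed copy of the sample, and the helper name parse_with_bound_prefix is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from post 1; note the missing xmlns:opencpn declaration.
SAMPLE = """<gpx OriginalSource="RWS dataservices">
    <wpt lat="51.231260864601" lon="4.4071404648151">
        <extensions>
            <opencpn:scale_min_max UseScale="true" ScaleMin="199999" />
        </extensions>
        <sym>Brug_dicht</sym>
    </wpt>
</gpx>"""

# Assumed namespace URI, taken from the ns dict in post 1.
OPENCPN_NS = "http://www.opencpn.org"


def parse_with_bound_prefix(text):
    """Hypothetical helper: declare the opencpn prefix on <gpx> if it is missing."""
    if "xmlns:opencpn=" not in text:
        # Simple text patch of the first <gpx start tag; good enough for a sketch.
        text = text.replace("<gpx", f'<gpx xmlns:opencpn="{OPENCPN_NS}"', 1)
    return ET.fromstring(text)


# Parsing the sample unchanged fails because the prefix is not bound to a URI.
try:
    ET.fromstring(SAMPLE)
    strict_error = None
except ET.ParseError as exc:
    strict_error = str(exc)
print("Strict parse error:", strict_error)

# With the prefix declared, a namespace-aware search finds the element.
root = parse_with_bound_prefix(SAMPLE)
ns = {"opencpn": OPENCPN_NS}
scales = root.findall("wpt/extensions/opencpn:scale_min_max", ns)
for scale in scales:
    print(scale.get("UseScale"), scale.get("ScaleMin"))
```

ElementTree keeps the attribute names as written (UseScale, ScaleMin), while the html.parser output above shows them lowercased (usescale, scalemin). That difference matters if the values are written back to a GPX file.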

jnunez · (This post was last modified: Apr-10-2023, 01:34 PM by jnunez.)


Thanks for the fast reply. Your parser (BeautifulSoup) is more powerful than those I have used: gpxpy 1.5.0 and gpx 0.2.1
