Python Forum

[gpxpy] "Error parsing XML: not well-formed (invalid token): line 1, column 1"
Hello,

For some reason, the gpxpy library fails to parse the following very basic GPX (XML) file:

<?xml version="1.0" encoding="UTF-8"?>
<gpx 
 xmlns="http://www.topografix.com/GPX/1/1" 
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
 xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd" 
 creator="BRouter-1.5.5" version="1.1">
  <trk>
    <trkseg>
      <trkpt lat="46.718843188136816" lon="-2.339932657778263">
        <ele>0</ele>
      </trkpt>
    </trkseg>
  </trk>
</gpx>


import gpxpy
import gpxpy.gpx

# Open the file in a context manager so it is closed after parsing
with open('input.gpx', 'r') as f:
    gpx = gpxpy.parse(f)
"""
Traceback (most recent call last):
gpxpy.gpx.GPXXMLSyntaxException: Error parsing XML: not well-formed (invalid token): line 1, column 1
"""
Google shows that other users have hit the same issue, but no solution was offered. Any idea what it could be?

Thank you.
Hi :)
Unfortunately I could not reproduce the error. I copied your code and created a GPX file like yours, and everything worked perfectly on my system. I used Python 3.8 and gpxpy 1.4.0.
What Python and gpxpy version are you using?
This package was just released 9 days ago.
If you find issues, you should probably contact the author and let him know.
[email protected]
Python 3.7.0 and gpxpy 0.9.8.

Turns out it's an old version, so I ran "pip install gpxpy --upgrade", but still no go: other GPX files in the series show the same problem. I wonder if some unprintable character is keeping gpxpy from parsing them correctly :-/

BTW, Google didn't return any example of how to read and handle <trk> items from a GPX file. Do you know how it's done?
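
One way to check for a stray unprintable character is to look at the raw bytes at the start of the file rather than the decoded text. The following is only a sketch: the helper names `leading_bytes` and `has_utf8_bom` are made up for illustration, and it assumes the culprit, if any, sits in the first few bytes (a UTF-8 byte order mark is a common candidate).

```python
import codecs


def leading_bytes(path, count=8):
    """Return the first `count` raw bytes of the file, undecoded."""
    with open(path, "rb") as fh:
        return fh.read(count)


def has_utf8_bom(path):
    """True if the file starts with the UTF-8 byte order mark (EF BB BF)."""
    return leading_bytes(path, 3) == codecs.BOM_UTF8
```

For a clean file, `print(leading_bytes("input.gpx"))` shows `b'<?xml ve'`; anything printed before the `<` is a candidate for the "invalid token" the parser complains about.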

import gpxpy

with open('input.gpx', 'r') as f:
    gpx = gpxpy.parse(f)

# File could have more than one <trk>
for track in gpx.tracks:
    # How to merge multiple <trk>s into a single GPX file?
    print(track)
    with open("merged.gpx", 'a') as tempf:
        # TypeError: write() argument must be str, not GPXTrack
        # tempf.write(track)
        pass
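
On the merge question: a track object cannot be written to a file directly, so the tracks have to be collected under one <gpx> root and serialized as XML. Below is a rough sketch that sidesteps gpxpy and uses the standard library's xml.etree.ElementTree instead. The function name `merge_tracks` and the `creator` value are made up for illustration, and it assumes every input uses the GPX 1.1 namespace shown in the file above.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 namespace, as declared in the sample file
GPX_NS = "http://www.topografix.com/GPX/1/1"
# Serialize GPX elements with a plain default namespace (no ns0: prefixes)
ET.register_namespace("", GPX_NS)


def merge_tracks(gpx_texts, creator="merge-sketch"):
    """Collect every <trk> from the given GPX documents under one <gpx> root.

    `gpx_texts` is an iterable of GPX documents as strings; the merged
    document is returned as a string.
    """
    merged = ET.Element(f"{{{GPX_NS}}}gpx", {"version": "1.1", "creator": creator})
    for text in gpx_texts:
        # Defensive: drop a leading BOM character if one survived decoding
        root = ET.fromstring(text.lstrip("\ufeff"))
        for trk in root.findall(f"{{{GPX_NS}}}trk"):
            merged.append(trk)
    return ET.tostring(merged, encoding="unicode")
```

The returned string can be written with an ordinary `open("merged.gpx", "w", encoding="utf-8")`. With gpxpy itself, the equivalent would be to append each GPXTrack to the `tracks` list of a fresh `gpxpy.gpx.GPX()` object and write out its `to_xml()` string.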
Found it: they all start with the bytes "EF BB BF", the UTF-8 byte order mark (BOM). The BOM is optional in UTF-8 files, and here it was tripping up the parser.

Removing those bytes solved the issue.
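
If editing the files is not convenient, a possible alternative is to decode them with Python's "utf-8-sig" codec, which drops a leading BOM when there is one and otherwise behaves like "utf-8". This is a sketch only: the helper name `read_text_without_bom` is made up, and it assumes the resulting string is then handed to a parser that accepts text, as `gpxpy.parse` does.

```python
def read_text_without_bom(path):
    """Read a UTF-8 file as text, dropping a leading BOM if present."""
    with open(path, encoding="utf-8-sig") as fh:
        return fh.read()
```

The text returned for a BOM-prefixed GPX file starts directly with `<?xml`, whereas reading the same file with `encoding="utf-8"` leaves the BOM in the text as a leading `\ufeff` character.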

HTH,
For those needing to remove the BOM from UTF-8 files:

import os
from glob import glob

for filename in glob("*.GPX"):
    basicname, file_extension = os.path.splitext(filename)
    # 'utf-8-sig' strips a leading BOM if present, otherwise it reads like 'utf-8'
    with open(filename, mode='r', encoding='utf-8-sig') as src:
        s = src.read()
    # Write a BOM-free copy next to the original
    with open(f"{basicname}_NOBOM{file_extension}", mode='w', encoding='utf-8') as dst:
        dst.write(s)