Python Forum

Using xml.parsers.expat I can't get parser.ParseFile(f) to work
This code comes from Dave Beazley and it is a bit old (Python 2.x). I am trying to get it to work in Python 3.

This is from Part 3: Coroutines and Event Dispatching, coexpat.py

Up to now, everything worked.

I have this function and various others, all available on the link above.

def expat_parse(f,target):
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_size = 65536
    parser.buffer_text = True
    #parser.returns_unicode = False
    parser.StartElementHandler = \
       lambda name,attrs: target.send(('start',(name,attrs)))
    parser.EndElementHandler = \
       lambda name: target.send(('end',name))
    parser.CharacterDataHandler = \
       lambda data: target.send(('text',data))
    parser.ParseFile(f)
When I enter as per the given code:

expat_parse(open(xmlfile), buses_to_dicts(filter_on_field("route","22", filter_on_field("direction","North Bound", bus_locations()))))
I get this error:

Quote:Traceback (most recent call last):
File "/usr/lib/python3.10/idlelib/run.py", line 578, in runcode
exec(code, self.locals)
File "<pyshell#14>", line 1, in <module>
File "<pyshell#13>", line 12, in expat_parse
TypeError: read() did not return a bytes object (type=str)

But read() should return a string, not bytes, I believe.

The docs say:

Quote:xmlparser.ParseFile(file)
Parse XML data reading from the object file. file only needs to provide the read(nbytes) method, returning the empty string when there’s no more data.

If I do this there is no problem:

Quote:data = open(xmlfile)
type(data)
<class '_io.TextIOWrapper'>
mystring = data.read()
type(mystring)
<class 'str'>

So now I don't know what the problem is, or what file in xmlparser.ParseFile(file) should be!

I thought maybe the xml file is corrupt, but I can open it and .read() it to a string (13817 lines) and it displays in my browser ok.

What does xmlparser.ParseFile(f) want for f?

Is the module too old?
Have you tried opening in binary mode?
open(xmlfile, 'rb')
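A minimal sketch of why binary mode matters, using in-memory file objects instead of a real file. The sample XML and the helper name collect_start_tags are made up for illustration. In Python 3, ParseFile() expects read() to return bytes, while Parse() also accepts a str, which is why reading the file into a string by hand looked fine.

```python
import io
import xml.parsers.expat

# Made-up sample data, standing in for the real XML file.
XML_BYTES = b"<buses><bus><route>22</route></bus></buses>"

def collect_start_tags(parse):
    """Hypothetical helper: run parse(parser) and return the element names seen."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parse(parser)
    return names

# Binary file-like object: read() returns bytes, so ParseFile() accepts it.
from_bytes = collect_start_tags(lambda p: p.ParseFile(io.BytesIO(XML_BYTES)))

# Text file-like object: read() returns str, so ParseFile() raises TypeError.
text_mode_error = None
try:
    collect_start_tags(lambda p: p.ParseFile(io.StringIO(XML_BYTES.decode("utf-8"))))
except TypeError as exc:
    text_mode_error = exc

# Parse() (not ParseFile) accepts a str directly; True marks the final chunk.
from_str = collect_start_tags(lambda p: p.Parse(XML_BYTES.decode("utf-8"), True))

print(from_bytes)
print(type(text_mode_error).__name__, text_mode_error)
print(from_str)
```

So open(xmlfile, 'rb') is the smallest change to the original call. Passing the already-read string to Parse() would be an alternative, at the cost of loading the whole file into memory first.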
(Apr-24-2024, 05:33 AM)Pedroski55 Wrote: What does xmlparser.ParseFile(f) want for f?

Is the module too old?
It still works with a few changes for Python 3. As Gribouillis mentioned, the file needs to be opened with 'rb'.
# coexpat.py
#
# An example of pushing XML events generated by the low-level expat
# XML library into coroutines.

import xml.parsers.expat

def expat_parse(f,target):
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_size = 65536
    parser.buffer_text = True
    # parser.returns_unicode = False
    parser.StartElementHandler = \
       lambda name,attrs: target.send(('start',(name,attrs)))
    parser.EndElementHandler = \
       lambda name: target.send(('end',name))
    parser.CharacterDataHandler = \
       lambda data: target.send(('text',data))
    parser.ParseFile(f)

# Example.  This uses the bus processing code from earlier with no changes.

if __name__ == '__main__':
    from buses import *

    with open("allroutes.xml", 'rb') as file:
        expat_parse(file,
            buses_to_dicts(
            filter_on_field("route", "22",
            filter_on_field("direction", "North Bound",
            bus_locations()))))
Output:
22,1485,"North Bound",41.880481123924255,-87.62948191165924
22,1629,"North Bound",42.01851969751819,-87.6730209876751
22,1489,"North Bound",41.962393500588156,-87.66610128229314
.....
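To try the event flow without allroutes.xml or buses.py, here is a rough sketch that feeds expat_parse from an in-memory bytes buffer into a stand-in sink. The collect coroutine and the sample XML are assumptions for illustration and are not part of the original code.

```python
import io
import xml.parsers.expat

def expat_parse(f, target):
    # Same shape as the function above: push each expat event into target.
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True
    parser.StartElementHandler = \
        lambda name, attrs: target.send(('start', (name, attrs)))
    parser.EndElementHandler = \
        lambda name: target.send(('end', name))
    parser.CharacterDataHandler = \
        lambda data: target.send(('text', data))
    parser.ParseFile(f)

def collect(events):
    """Hypothetical sink coroutine: append every received event to events."""
    while True:
        event = (yield)
        events.append(event)

events = []
sink = collect(events)
next(sink)  # advance to the first yield so send() can deliver values

# io.BytesIO behaves like a file opened with 'rb': read() returns bytes.
sample = io.BytesIO(b"<bus><route>22</route></bus>")
expat_parse(sample, sink)

for event in events:
    print(event)
```

Each tuple is what the downstream coroutines in the bus example would receive through send().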
The only other changes, in the two files this code uses, are conversions to the print() function.
E.g. in coroutine.py, line 21, the Python 2 statement print line, becomes print(line, end='') in Python 3.
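As a small illustration of that print() change, the sketch below uses a priming decorator and a printer sink written for this example. Both are assumptions and not a copy of coroutine.py. The lines sent in already end with a newline, so end='' keeps print() from adding a second one.

```python
import contextlib
import io

def coroutine(func):
    """Assumed helper: create the generator and advance it to its first yield."""
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start

@coroutine
def printer():
    """Hypothetical sink: print each line it receives, unchanged."""
    while True:
        line = (yield)
        print(line, end='')   # Python 2 spelling: print line,

# Capture stdout so the result can be inspected afterwards.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    p = printer()
    p.send("first\n")
    p.send("second\n")
captured = buf.getvalue()
```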
Thanks!

Worked first time with 'rb'!