Python Forum

Full Version: Invalid syntax on input string
Hi,

I'm not an avid Python user, so there's probably a simple solution to this, but I've yet to find it. I have a string containing a mix of spaces, alphanumerics and special characters. I want to extract the text between two key tags and assign it to a variable. I'm using re.search to accomplish this, as shown below.

_This works_
import re

text = '<dc:title>Flames</dc:title>'

m = re.search('<dc:title>(.*)</dc:title>', text)
if m:
    title = m.group(1)
    print(title)
However, trying to search on a much larger, more complex string causes a syntax error as shown below.

import re

text = '<dc:title>{u'AbsTime': u'NOT_IMPLEMENTED', u'@xmlns:u': u'urn:schemas-upnp-org:service:AVTransport:1', u'Track': u'5', u'TrackDuration': u'0:03:15', u'TrackURI': u'x-sonosapi-hls-static:catalog%2ftracks%2fB07BGCPYBM%2f71090898-695d-4d13-b8ff-4028fccf0a05%2fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%2fA3F2HSL9IRWOWF%2fn%2fPRIME%2f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%2fPRIME_STATION%2f57637924-c963-4136-8d51-e0c887aab1f5%2f?sid=201&flags=0&sn=1', u'RelTime': u'0:00:02', u'TrackMetaData': u'<DIDL-Lite xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:upnp="urn:schemas-upnp-org:metadata-1-0/upnp/" xmlns:r="urn:schemas-rinconnetworks-com:metadata-1-0/" xmlns="urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/"><item id="-1" parentID="-1" restricted="true"><res protocolInfo="sonos.com-http:*:application/x-mpegURL:*" duration="0:03:15">x-sonosapi-hls-static:catalog%2ftracks%2fB07BGCPYBM%2f71090898-695d-4d13-b8ff-4028fccf0a05%2fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%2fA3F2HSL9IRWOWF%2fn%2fPRIME%2f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%2fPRIME_STATION%2f57637924-c963-4136-8d51-e0c887aab1f5%2f?sid=201&amp;flags=0&amp;sn=1</res><r:streamContent></r:streamContent><upnp:albumArtURI>/getaa?s=1&amp;u=x-sonosapi-hls-static%3acatalog%252ftracks%252fB07BGCPYBM%252f71090898-695d-4d13-b8ff-4028fccf0a05%252fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%252fA3F2HSL9IRWOWF%252fn%252fPRIME%252f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%252fPRIME_STATION%252f57637924-c963-4136-8d51-e0c887aab1f5%252f%3fsid%3d201%26flags%3d0%26sn%3d1</upnp:albumArtURI><dc:title>Flames</dc:title><upnp:class>object.item.audioItem.musicTrack</upnp:class><dc:creator>David Guetta &amp; Sia</dc:creator><upnp:album>Flames</upnp:album></item></DIDL-Lite>', u'RelCount': u'2147483647', u'AbsCount': u'2147483647'}</dc:title>'

m = re.search('<dc:title>(.*)</dc:title>', text)
if m:
    title = m.group(1)
    print(title)
The above raises:
SyntaxError: invalid syntax
on the input string. I'm assuming it's because the input string contains such a mix of characters, ' being a problem I imagine.

So I've tried to format the string on input to remove all the chars I think Python doesn't like as shown below:

import re

text = '{u'AbsTime': u'NOT_IMPLEMENTED', u'@xmlns:u': u'urn:schemas-upnp-org:service:AVTransport:1', u'Track': u'1', u'TrackDuration': u'0:03:34', u'TrackURI': u'x-sonosapi-hls-static:catalog%2ftracks%2fB07DLVN8GS%2f%3fplaylistAsin%3dB07JB7Z9YC%26playlistType%3dprimePlaylist?sid=201&flags=65536&sn=1', u'RelTime': u'0:00:18', u'TrackMetaData': u'<DIDL-Lite xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:upnp="urn:schemas-upnp-org:metadata-1-0/upnp/" xmlns:r="urn:schemas-rinconnetworks-com:metadata-1-0/" xmlns="urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/"><item id="-1" parentID="-1" restricted="true"><res protocolInfo="sonos.com-http:*:application/x-mpegURL:*" duration="0:03:34">x-sonosapi-hls-static:catalog%2ftracks%2fB07DLVN8GS%2f%3fplaylistAsin%3dB07JB7Z9YC%26playlistType%3dprimePlaylist?sid=201&amp;flags=65536&amp;sn=1</res><r:streamContent></r:streamContent><upnp:albumArtURI>/getaa?s=1&amp;u=x-sonosapi-hls-static%3acatalog%252ftracks%252fB07DLVN8GS%252f%253fplaylistAsin%253dB07JB7Z9YC%2526playlistType%253dprimePlaylist%3fsid%3d201%26flags%3d65536%26sn%3d1</upnp:albumArtURI><dc:title>Girls Like You [Explicit]</dc:title><upnp:class>object.item.audioItem.musicTrack</upnp:class><dc:creator>Maroon 5</dc:creator><upnp:album>Best of Prime Music</upnp:album></item></DIDL-Lite>', u'RelCount': u'2147483647', u'AbsCount': u'2147483647'}'

# Strip the characters I suspect Python doesn't like
new_text = re.sub('[!@#:$\']', '', text)

print(new_text)
However, I'm still not able to get past the error on the input string.

If anyone can point me in the right direction, I'd appreciate it.

Thanks.
(Nov-01-2018, 04:18 PM) Callahan wrote:
text = '<dc:title>{u'AbsTime': u'NOT_IMPLEMENTED', u'@xmlns:u': u'urn:schemas-upnp-org:service:AVTransport:1', u'Track': u'5', u'TrackDuration': u'0:03:15', u'TrackURI': u'x-sonosapi-hls-static:catalog%2ftracks%2fB07BGCPYBM%2f71090898-695d-4d13-b8ff-4028fccf0a05%2fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%2fA3F2HSL9IRWOWF%2fn%2fPRIME%2f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%2fPRIME_STATION%2f57637924-c963-4136-8d51-e0c887aab1f5%2f?sid=201&flags=0&sn=1', u'RelTime': u'0:00:02', u'TrackMetaData': u'<DIDL-Lite xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:upnp="urn:schemas-upnp-org:metadata-1-0/upnp/" xmlns:r="urn:schemas-rinconnetworks-com:metadata-1-0/" xmlns="urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/"><item id="-1" parentID="-1" restricted="true"><res protocolInfo="sonos.com-http:*:application/x-mpegURL:*" duration="0:03:15">x-sonosapi-hls-static:catalog%2ftracks%2fB07BGCPYBM%2f71090898-695d-4d13-b8ff-4028fccf0a05%2fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%2fA3F2HSL9IRWOWF%2fn%2fPRIME%2f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%2fPRIME_STATION%2f57637924-c963-4136-8d51-e0c887aab1f5%2f?sid=201&amp;flags=0&amp;sn=1</res><r:streamContent></r:streamContent><upnp:albumArtURI>/getaa?s=1&amp;u=x-sonosapi-hls-static%3acatalog%252ftracks%252fB07BGCPYBM%252f71090898-695d-4d13-b8ff-4028fccf0a05%252fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%252fA3F2HSL9IRWOWF%252fn%252fPRIME%252f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%252fPRIME_STATION%252f57637924-c963-4136-8d51-e0c887aab1f5%252f%3fsid%3d201%26flags%3d0%26sn%3d1</upnp:albumArtURI><dc:title>Flames</dc:title><upnp:class>object.item.audioItem.musicTrack</upnp:class><dc:creator>David Guetta &amp; Sia</dc:creator><upnp:album>Flames</upnp:album></item></DIDL-Lite>', u'RelCount': u'2147483647', u'AbsCount': u'2147483647'}</dc:title>'

Allow me to rewrite your string, as the interpreter sees it:
text = '<dc:title>{u'nonsense
Our syntax highlighter makes that clear, and the syntax error you got probably included more detail pointing at the same spot. If you want to include quotes in your string, you can triple-quote it so that any embedded quotes don't end the string. Check this out:
text = '''<dc:title>{u'AbsTime': u'NOT_IMPLEMENTED', u'@xmlns:u': u'urn:schemas-upnp-org:service:AVTransport:1', u'Track': u'5', u'TrackDuration': u'0:03:15', u'TrackURI': u'x-sonosapi-hls-static:catalog%2ftracks%2fB07BGCPYBM%2f71090898-695d-4d13-b8ff-4028fccf0a05%2fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%2fA3F2HSL9IRWOWF%2fn%2fPRIME%2f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%2fPRIME_STATION%2f57637924-c963-4136-8d51-e0c887aab1f5%2f?sid=201&flags=0&sn=1', u'RelTime': u'0:00:02', u'TrackMetaData': u'<DIDL-Lite xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:upnp="urn:schemas-upnp-org:metadata-1-0/upnp/" xmlns:r="urn:schemas-rinconnetworks-com:metadata-1-0/" xmlns="urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/"><item id="-1" parentID="-1" restricted="true"><res protocolInfo="sonos.com-http:*:application/x-mpegURL:*" duration="0:03:15">x-sonosapi-hls-static:catalog%2ftracks%2fB07BGCPYBM%2f71090898-695d-4d13-b8ff-4028fccf0a05%2fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%2fA3F2HSL9IRWOWF%2fn%2fPRIME%2f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%2fPRIME_STATION%2f57637924-c963-4136-8d51-e0c887aab1f5%2f?sid=201&amp;flags=0&amp;sn=1</res><r:streamContent></r:streamContent><upnp:albumArtURI>/getaa?s=1&amp;u=x-sonosapi-hls-static%3acatalog%252ftracks%252fB07BGCPYBM%252f71090898-695d-4d13-b8ff-4028fccf0a05%252fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%252fA3F2HSL9IRWOWF%252fn%252fPRIME%252f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%252fPRIME_STATION%252f57637924-c963-4136-8d51-e0c887aab1f5%252f%3fsid%3d201%26flags%3d0%26sn%3d1</upnp:albumArtURI><dc:title>Flames</dc:title><upnp:class>object.item.audioItem.musicTrack</upnp:class><dc:creator>David Guetta &amp; Sia</dc:creator><upnp:album>Flames</upnp:album></item></DIDL-Lite>', u'RelCount': u'2147483647', u'AbsCount': u'2147483647'}</dc:title>'''
Use triple quotes in order to fix the Invalid Syntax error.
import re
 
text = """<dc:title>{u'AbsTime': u'NOT_IMPLEMENTED', u'@xmlns:u': u'urn:schemas-upnp-org:service:AVTransport:1', u'Track': u'5', u'TrackDuration': u'0:03:15', u'TrackURI': u'x-sonosapi-hls-static:catalog%2ftracks%2fB07BGCPYBM%2f71090898-695d-4d13-b8ff-4028fccf0a05%2fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%2fA3F2HSL9IRWOWF%2fn%2fPRIME%2f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%2fPRIME_STATION%2f57637924-c963-4136-8d51-e0c887aab1f5%2f?sid=201&flags=0&sn=1', u'RelTime': u'0:00:02', u'TrackMetaData': u'<DIDL-Lite xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:upnp="urn:schemas-upnp-org:metadata-1-0/upnp/" xmlns:r="urn:schemas-rinconnetworks-com:metadata-1-0/" xmlns="urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/"><item id="-1" parentID="-1" restricted="true"><res protocolInfo="sonos.com-http:*:application/x-mpegURL:*" duration="0:03:15">x-sonosapi-hls-static:catalog%2ftracks%2fB07BGCPYBM%2f71090898-695d-4d13-b8ff-4028fccf0a05%2fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%2fA3F2HSL9IRWOWF%2fn%2fPRIME%2f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%2fPRIME_STATION%2f57637924-c963-4136-8d51-e0c887aab1f5%2f?sid=201&amp;flags=0&amp;sn=1</res><r:streamContent></r:streamContent><upnp:albumArtURI>/getaa?s=1&amp;u=x-sonosapi-hls-static%3acatalog%252ftracks%252fB07BGCPYBM%252f71090898-695d-4d13-b8ff-4028fccf0a05%252fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%252fA3F2HSL9IRWOWF%252fn%252fPRIME%252f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%252fPRIME_STATION%252f57637924-c963-4136-8d51-e0c887aab1f5%252f%3fsid%3d201%26flags%3d0%26sn%3d1</upnp:albumArtURI><dc:title>Flames</dc:title><upnp:class>object.item.audioItem.musicTrack</upnp:class><dc:creator>David Guetta &amp; Sia</dc:creator><upnp:album>Flames</upnp:album></item></DIDL-Lite>', u'RelCount': u'2147483647', u'AbsCount': u'2147483647'}</dc:title>"""
 
m = re.search('<dc:title>(.*)</dc:title>', text)
if m:
    title = m.group(1)
    print(title)
Also, when posting a traceback, always post the full traceback, not just the last line. In this case the problem was easy to spot, but in more complex code the traceback contains valuable information.
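To illustrate the point about tracebacks: a SyntaxError is raised when Python compiles the source, before any statement runs, which is also why cleaning the string afterwards with re.sub cannot help. Here is a minimal sketch, using a shortened stand-in for the string in the thread and a made-up file name `<example>` (both are assumptions for illustration only):

```python
import traceback

# Shortened stand-in for the failing assignment (illustrative, not the full payload).
bad_source = "text = '<dc:title>{u'AbsTime': u'NOT_IMPLEMENTED'}</dc:title>'\n"

error = None
report = None
try:
    # compile() only parses the source; nothing in it is executed.
    compile(bad_source, "<example>", "exec")
except SyntaxError as exc:
    error = exc
    # The full report names the file and line and marks the offending position.
    report = traceback.format_exc()
    print(report)
    print("line:", exc.lineno, "column:", exc.offset)
```

The exact wording of the message can differ between Python versions, but the full report should point at the place where the first embedded quote ends the string early.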
Ouch, simple solution.

However, despite fixing the invalid syntax issue (thanks for that), it doesn't work the way the simpler

import re
 
text = """<dc:title>Flames</dc:title>"""
 
m = re.search('<dc:title>(.*)</dc:title>', text)
if m:
    title = m.group(1)
    print(title)
does. The script just returns the complete input string rather than the string between the tags.
https://www.regexpal.com/ says your regex is fine (I would have escaped the angle brackets out of habit, although Python's re module treats them as literal characters here), so you should be getting results. What is the result of print(m.groups())?
I can also confirm your regex works the same in both cases.
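One possible explanation for getting the whole string back, offered as a sketch rather than a diagnosis: in the long example the data itself is wrapped in an extra <dc:title>...</dc:title> pair, and .* is greedy, so the match runs from the first opening tag to the last closing tag. The snippet below uses a shortened stand-in for the payload (an assumption, not the real data) and shows that dropping the outer wrapper and using the non-greedy .*? captures only the title:

```python
import re

# Shortened stand-in for the payload in the thread (illustrative only).
inner = "{u'Track': u'5', u'TrackMetaData': u'<item><dc:title>Flames</dc:title></item>'}"

# The long example wraps the payload in a second pair of dc:title tags.
wrapped = "<dc:title>" + inner + "</dc:title>"

# Greedy: .* stretches to the LAST </dc:title>, so group(1) is the whole payload.
greedy = re.search(r"<dc:title>(.*)</dc:title>", wrapped).group(1)

# No outer wrapper, non-greedy: .*? stops at the first </dc:title>.
title = re.search(r"<dc:title>(.*?)</dc:title>", inner).group(1)

print(greedy == inner)
print(title)
```

If the string really is a printed dict with XML inside it, reading the TrackMetaData value and handing it to an XML parser would likely be more robust than a regex, but that depends on where the data comes from.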