Nov-01-2018, 04:18 PM
Hi,
I'm not an avid Python user, so there's probably a simple solution to this, but I've yet to find it. I have a string containing a mix of spaces, alphanumerics and special chars. I want to extract the text between two key tags and assign it to a variable. I'm using re.search to accomplish this as shown below.
_This works_
import re
text = '<dc:title>Flames</dc:title>'
m = re.search('<dc:title>(.*)</dc:title>', text)
if m:
    title = m.group(1)
    print(title)

However, trying to search on a much larger, more complex string causes a syntax error as shown below.
import re
text = '<dc:title>{u'AbsTime': u'NOT_IMPLEMENTED', u'@xmlns:u': u'urn:schemas-upnp-org:service:AVTransport:1', u'Track': u'5', u'TrackDuration': u'0:03:15', u'TrackURI': u'x-sonosapi-hls-static:catalog%2ftracks%2fB07BGCPYBM%2f71090898-695d-4d13-b8ff-4028fccf0a05%2fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%2fA3F2HSL9IRWOWF%2fn%2fPRIME%2f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%2fPRIME_STATION%2f57637924-c963-4136-8d51-e0c887aab1f5%2f?sid=201&flags=0&sn=1', u'RelTime': u'0:00:02', u'TrackMetaData': u'<DIDL-Lite xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:upnp="urn:schemas-upnp-org:metadata-1-0/upnp/" xmlns:r="urn:schemas-rinconnetworks-com:metadata-1-0/" xmlns="urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/"><item id="-1" parentID="-1" restricted="true"><res protocolInfo="sonos.com-http:*:application/x-mpegURL:*" duration="0:03:15">x-sonosapi-hls-static:catalog%2ftracks%2fB07BGCPYBM%2f71090898-695d-4d13-b8ff-4028fccf0a05%2fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%2fA3F2HSL9IRWOWF%2fn%2fPRIME%2f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%2fPRIME_STATION%2f57637924-c963-4136-8d51-e0c887aab1f5%2f?sid=201&flags=0&sn=1</res><r:streamContent></r:streamContent><upnp:albumArtURI>/getaa?s=1&u=x-sonosapi-hls-static%3acatalog%252ftracks%252fB07BGCPYBM%252f71090898-695d-4d13-b8ff-4028fccf0a05%252fbaf8f5ea-7f11-4cfe-b0cd-ce07704b31df%252fA3F2HSL9IRWOWF%252fn%252fPRIME%252f26ac853f-8dad-4dbe-b5aa-2bb2e22df554%252fPRIME_STATION%252f57637924-c963-4136-8d51-e0c887aab1f5%252f%3fsid%3d201%26flags%3d0%26sn%3d1</upnp:albumArtURI><dc:title>Flames</dc:title><upnp:class>object.item.audioItem.musicTrack</upnp:class><dc:creator>David Guetta & Sia</dc:creator><upnp:album>Flames</upnp:album></item></DIDL-Lite>', u'RelCount': u'2147483647', u'AbsCount': u'2147483647'}</dc:title>'
m = re.search('<dc:title>(.*)</dc:title>', text)
if m:
    title = m.group(1)
    print(title)

The above will return a:
SyntaxError: invalid syntax

on the input string. I'm assuming it's because the input string contains such a mix of characters, ' being a problem I imagine.
So I've tried to format the string on input to remove all the chars I think Python doesn't like as shown below:
import re
text = '[36m{u'AbsTime': u'NOT_IMPLEMENTED', u'@xmlns:u': u'urn:schemas-upnp-org:service:AVTransport:1', u'Track': u'1', u'TrackDuration': u'0:03:34', u'TrackURI': u'x-sonosapi-hls-static:catalog%2ftracks%2fB07DLVN8GS%2f%3fplaylistAsin%3dB07JB7Z9YC%26playlistType%3dprimePlaylist?sid=201&flags=65536&sn=1', u'RelTime': u'0:00:18', u'TrackMetaData': u'<DIDL-Lite xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:upnp="urn:schemas-upnp-org:metadata-1-0/upnp/" xmlns:r="urn:schemas-rinconnetworks-com:metadata-1-0/" xmlns="urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/"><item id="-1" parentID="-1" restricted="true"><res protocolInfo="sonos.com-http:*:application/x-mpegURL:*" duration="0:03:34">x-sonosapi-hls-static:catalog%2ftracks%2fB07DLVN8GS%2f%3fplaylistAsin%3dB07JB7Z9YC%26playlistType%3dprimePlaylist?sid=201&flags=65536&sn=1</res><r:streamContent></r:streamContent><upnp:albumArtURI>/getaa?s=1&u=x-sonosapi-hls-static%3acatalog%252ftracks%252fB07DLVN8GS%252f%253fplaylistAsin%253dB07JB7Z9YC%2526playlistType%253dprimePlaylist%3fsid%3d201%26flags%3d65536%26sn%3d1</upnp:albumArtURI><dc:title>Girls Like You [Explicit]</dc:title><upnp:class>object.item.audioItem.musicTrack</upnp:class><dc:creator>Maroon 5</dc:creator><upnp:album>Best of Prime Music</upnp:album></item></DIDL-Lite>', u'RelCount': u'2147483647', u'AbsCount': u'2147483647'}'
# Delete Python-style comments
new_text = re.sub('[!@#:$\']', '', text)
print(new_text)

However, I'm still not able to get past the error on the input string.
If anyone can point me in the right direction, I'd appreciate it.
Thanks.
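
For anyone landing here with the same error, here is a minimal sketch of one possible way around it. The SyntaxError is raised while Python parses the string literal itself, before re.sub or re.search ever runs: the first unescaped ' inside the text closes the literal. Wrapping the text in a triple-quoted raw string avoids that. The payload below is a shortened, hypothetical stand-in for the real response, and the non-greedy (.*?) is my own swap-in so the match stops at the first closing tag.

```python
import re

# Triple-quoted raw string: the embedded ' and " characters no longer
# terminate the literal. Shortened, hypothetical stand-in for the real data.
text = r'''{u'Track': u'5', u'TrackMetaData': u'<DIDL-Lite><item><dc:title>Flames</dc:title><dc:creator>David Guetta & Sia</dc:creator></item></DIDL-Lite>', u'RelCount': u'2147483647'}'''

# Non-greedy group: stop at the first </dc:title> rather than the last one.
m = re.search(r'<dc:title>(.*?)</dc:title>', text)
title = m.group(1) if m else None
print(title)
```

If the text arrives at runtime (read from a file, or returned by a library call) rather than being pasted into the script, the quoting problem probably does not arise at all, since only literals typed into source code need their quotes escaped.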