string parsing with re.search()

delahug · Jun-03-2020, 09:23 PM

(Jun-03-2020, 01:35 PM)snippsat Wrote: The u'' prefix is not needed in Python 3, so follow the advice above.

# Python 3.8
>>> s = u'\xbd' 
>>> s
'½'

# The u prefix can be removed; it makes no difference
>>> s = '\xbd'
>>> s
'½'

# Python 2.7
>>> s = u'\xbd' 
>>> s
u'\xbd'
>>> s.encode()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 0: ordinal not in range(128)

# Try the obvious one first
>>> s.encode('utf-8')
'\xc2\xbd'
>>> print(s.encode('utf-8'))
Â½

# Make a guess
>>> print(s.encode('latin-1'))
½

One of the biggest changes in moving to Python 3 was better Unicode handling ;)
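For comparison, here is a minimal Python 3 sketch of the same experiment. It uses only built-in string methods, and the variable names are purely illustrative. It also suggests why the UTF-8 bytes printed as Â½ in the Python 2.7 session above: the terminal was presumably interpreting those two bytes as Latin-1.

```python
# Python 3: str is always Unicode, and encoding to bytes is explicit.
s = '\xbd'  # the same character as '½'

utf8_bytes = s.encode('utf-8')      # two bytes: 0xC2 0xBD
latin1_bytes = s.encode('latin-1')  # one byte: 0xBD

# Decoding the UTF-8 bytes with the wrong codec reproduces the mojibake.
mojibake = utf8_bytes.decode('latin-1')

# The ascii codec still cannot represent '½', but it is only used on request.
try:
    s.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = exc
else:
    ascii_error = None

print(utf8_bytes, latin1_bytes, mojibake)
```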

Thanks for your help.

But I don't see where this fits into my code.

Specifically what I am looking at is this:


2½
[5½]


I want what's in the square brackets, within the second nested span.
If I grab the whole lot by referencing the span class, I then run into the problem above when using re.search() on the square brackets. It's (apparently) caused by the fraction in the first span.

Can I get at the second span directly?
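One possible approach, sketched under assumptions: the markup below is hypothetical (the real tags and class name are not shown in this post), and the SpanText helper is an illustrative name, not part of any library. Assuming Python 3, the fraction is an ordinary character, so re.search() can match the bracketed text either in the second span alone or in the combined text.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: an outer span holding two nested spans.
html = '<span class="dist"><span>2½</span><span>[5½]</span></span>'


class SpanText(HTMLParser):
    """Collect the text found inside nested spans (depth 2 or more)."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <span> elements are currently open
        self.texts = []  # text of each nested span, in document order

    def handle_starttag(self, tag, attrs):
        if tag == 'span':
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'span':
            self.depth -= 1

    def handle_data(self, data):
        if self.depth >= 2 and data.strip():
            self.texts.append(data.strip())


parser = SpanText()
parser.feed(html)
parser.close()

# Take the second nested span directly, then pull out what is in the brackets.
second_span = parser.texts[1]
inner = re.search(r'\[([^\]]+)\]', second_span).group(1)

# The same pattern also works on the combined text of both spans.
direct = re.search(r'\[([^\]]+)\]', '2½ [5½]').group(1)

print(second_span, inner, direct)
```

If the page is already being parsed with a library such as BeautifulSoup, selecting the second nested span there and running the same pattern on its text would follow the same idea.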

thanks
