Python Forum
unicode within a RE grouping
#1
Hi

I'm getting an error for the following:
import re
pattern = re.compile(r"(?u)\w+")
list = pattern.findall(ur"ñ")
print(list)
Error:
    list = pattern.findall(ur"ñ")
                                ^
SyntaxError: invalid syntax
Can anybody suggest what the problem might be?
#2
If you look at Python 3's lexical analysis rules, you can see that stringprefix doesn't include ur as a string prefix. So ur"some_string" is an illegal construction in Python 3.
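A quick way to confirm this on your own interpreter is to ask Python to parse a literal with each prefix. This is only a sketch: the helper name prefix_is_legal is made up for illustration, and the built-in compile() is used so that a rejected prefix shows up as a catchable SyntaxError instead of stopping the script.

```python
# Check which string prefixes the running Python interpreter accepts.
# compile() parses source text without executing it, so an illegal
# prefix raises a SyntaxError that we can catch.

def prefix_is_legal(prefix):
    """Hypothetical helper: True if prefix + '"abc"' parses as an expression."""
    source = prefix + '"abc"'
    try:
        compile(source, "<prefix-check>", "eval")
    except SyntaxError:
        return False
    return True

# Try a few prefixes, including the one from the question.
results = {p: prefix_is_legal(p) for p in ("r", "u", "b", "rb", "ur")}
for p, ok in results.items():
    print(p, "legal" if ok else "illegal")
```

On Python 3.3 or later, only ur should be reported as illegal here.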
#3
It's illegal in Python 3, as pointed out by @scidam.
In Python 2, the ur prefix was used to combine a raw string and a Unicode string, for example in a regex pattern.
# Python 2.7
>>> import re
>>>
>>> pattern = re.compile(ur"(ñ)")
>>> uni_char = pattern.search(u'helloñ world')
>>> uni_char.group(1)
u'\xf1'
>>> print(uni_char.group(1))
ñ
One of the biggest changes in the move to Python 3 was Unicode.
In Python 3, all strings are sequences of Unicode characters.
So the u prefix is no longer needed, and we will not see u'\xf1'.
A raw string (r prefix) should still always be used for regex patterns, because of escape characters.
# Python 3.6
>>> s = 'ñ'
>>> s
'ñ'
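As a small illustration of the point about the u prefix, the sketch below (variable names are just for the example) shows that Python 3.3+ still accepts u'...' but treats it exactly like a plain string literal.

```python
# Python 3: the u prefix is accepted for compatibility but is redundant.
plain = 'ñ'
prefixed = u'ñ'

print(plain == prefixed)              # both literals produce the same text
print(type(plain) is type(prefixed))  # both are ordinary str objects
print(ascii(plain))                   # ASCII-only repr, close to what Python 2 displayed
```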
# Python 3.6
>>> import re
>>>
>>> pattern = re.compile(r"(ñ)")
>>> uni_char = pattern.search('helloñ world')
>>> uni_char.group(1)
'ñ'
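Applied to the code from the original question, a Python 3 version might look like the sketch below. It assumes Python 3, where str patterns match Unicode word characters by default, so the (?u) flag is left out as redundant. The result is stored in words instead of list so the built-in list is not shadowed. The second half shows why the r prefix still matters for patterns.

```python
import re

# Python 3: neither the ur prefix nor the (?u) flag is needed for \w
# to match non-ASCII letters in a str.
pattern = re.compile(r"\w+")
words = pattern.findall("ñ")
print(words)

# Why the r prefix still matters: in a normal string "\b" is a backspace
# character, while in a raw string it reaches the regex engine as the
# word-boundary assertion.
raw_hits = re.findall(r"\bñ\b", "hello ñ world")
plain_hits = re.findall("\bñ\b", "hello ñ world")
print(raw_hits, plain_hits)
```

The raw pattern finds the character, while the non-raw pattern looks for literal backspace characters and finds nothing.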