Python Forum

Unable to replace a pattern using the method re.sub()
My question: I am unable to replace a pattern using the method re.sub().
import re

# initialize string
x = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'

# extract part of the string
b = re.findall(r'(<.+\s.+\?>)', x)  # result: b = ['<?xml version="1.0" encoding="UTF-8"?>']

# now I want to remove the extracted part, so I am using
print(re.sub(b[0], '', x))
# expected result: <rule stage="Source">
# however, I am getting: <?xml version="1.0" encoding="UTF-8"?><rule stage="Source">

Please help.
text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
regex = r'(<.+\s.+\?>)'
re.sub(regex, 'Replacement for regex', text)
Output:
'Replacement for regex<rule stage="Source">'
re.sub replaces every match of the pattern (1st argument, your regex) in the string (3rd argument) with the replacement (2nd argument).
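A minimal sketch of the argument order, using a made-up sample string rather than the XML from the question. It also shows the optional count argument, which limits how many matches are replaced (the default, 0, replaces all of them):

```python
import re

text = "a1 b2 c3"  # hypothetical sample input

# re.sub(pattern, repl, string): every non-overlapping match is replaced
all_replaced = re.sub(r"\d", "#", text)
print(all_replaced)  # a# b# c#

# count=1 stops after the first replacement
first_only = re.sub(r"\d", "#", text, count=1)
print(first_only)  # a# b2 c3
```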

You can refer to a captured group inside the replacement with \1, \2, and so on (\n for group n):

re.sub(regex, r'\1Foo', text)
Output:
'<?xml version="1.0" encoding="UTF-8"?>Foo<rule stage="Source">'
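Groups can also be referenced as \g<1> or, for named groups, \g<name>. The sketch below uses a hypothetical pattern with two named groups (decl and rest), anchored on the XML declaration; \g<2>\g<1> swaps the two parts. The \g<...> form also avoids ambiguity when a digit follows the reference: \20 would mean group 20, while \g<2>0 means group 2 followed by a literal 0.

```python
import re

text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'

# Hypothetical pattern: declaration in group "decl", everything after it in "rest"
regex = r'(?P<decl><\?xml.*?\?>)(?P<rest>.*)'

# Keep only the named group "rest"
only_rest = re.sub(regex, r'\g<rest>', text)
print(only_rest)  # <rule stage="Source">

# Numbered \g<...> references: put group 2 before group 1
swapped = re.sub(regex, r'\g<2>\g<1>', text)
print(swapped)  # <rule stage="Source"><?xml version="1.0" encoding="UTF-8"?>
```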
Are you sure that you want to use regex for this task?
Regex is not a good choice for parsing HTML/XML.

You can use the xml package from the stdlib instead,
which in my opinion is not easier, but it's safer.
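A minimal sketch with xml.etree.ElementTree from the stdlib. It assumes the real document closes the rule element: the one-line fragment from the question is not well-formed XML on its own, so the parser raises ET.ParseError on it.

```python
import xml.etree.ElementTree as ET

# The fragment from the question: <rule> is never closed
fragment = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
try:
    ET.fromstring(fragment.encode("utf-8"))
except ET.ParseError as err:
    print("not well-formed:", err)

# Assumption: the complete document closes the element
text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source"></rule>'

# Parse from bytes so the parser handles the encoding declaration itself;
# the declaration is consumed by the parser and no regex is needed to strip it
root = ET.fromstring(text.encode("utf-8"))
print(root.tag)     # rule
print(root.attrib)  # {'stage': 'Source'}
```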

Quote:
Programmer 1: We have a problem.
Programmer 2: Let’s use RegEx!
Programmer 1: Now we have two problems.
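re.sub() takes a regex as its first parameter. You're passing it the text you got from a different regex, and that text contains regex metacharacters (? and .), so it is interpreted as a different pattern that doesn't match anything, and the string comes back unchanged. If you skip the findall() step and pass the original regex to re.sub(), you'll get what you're looking for.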
>>> import re
>>> text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
>>> re.findall(r'(<.+\s.+\?>)', text)
['<?xml version="1.0" encoding="UTF-8"?>']
>>> re.sub(r'(<.+\s.+\?>)', '', text)
'<rule stage="Source">'
edit: I was ninja'd :(
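If you do need the findall() step, here is a sketch of why the original code fails and two ways around it: re.escape() makes the extracted text match literally, and str.replace() avoids regex for the removal entirely.

```python
import re

x = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
b = re.findall(r'(<.+\s.+\?>)', x)

# Used as a pattern, b[0] means something else ("?" makes the preceding
# character optional, "." matches any character), so nothing matches
print(re.search(b[0], x))  # None

# Escape the extracted text so every character is matched literally
escaped = re.sub(re.escape(b[0]), '', x)
print(escaped)  # <rule stage="Source">

# Or use a plain string replacement for the removal step
replaced = x.replace(b[0], '')
print(replaced)  # <rule stage="Source">
```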
(Aug-15-2017, 03:37 PM) DeaD_EyE wrote: You can use the xml package from the stdlib instead, which in my opinion is not easier, but it's safer.
That's an option, but I never use the parser from the stdlib when BeautifulSoup and lxml are better and easier to use.
>>> from bs4 import BeautifulSoup
>>> text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
>>> soup = BeautifulSoup(text, 'lxml')
>>> s = soup.find('rule')
>>> s
<rule stage="Source"></rule>
>>> s.attrs
{'stage': 'Source'}
As mentioned, regex is the wrong tool for this.
Sure, for one line it's not a problem to use regex,
but HTML/XML documents usually span many lines, and there are many ways a regex can break.
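One concrete way the regex from this thread breaks, using a hypothetical document with a second processing instruction (<?pi data?>, made up for the example): .+ is greedy, so the match runs to the last ?> in the string and swallows the rule start tag as well. A lazy pattern anchored at the start of the string is a narrower patch, but it is still a regex and still makes assumptions about the input.

```python
import re

# Hypothetical input with a second processing instruction after the declaration
text = '<?xml version="1.0"?><rule stage="Source"><?pi data?></rule>'

# Greedy .+ runs up to the LAST "?>" in the string
greedy = re.sub(r'(<.+\s.+\?>)', '', text)
print(greedy)  # </rule>

# Lazy .*? anchored at the start stops at the FIRST "?>" after "<?xml"
lazy = re.sub(r'^<\?xml.*?\?>', '', text)
print(lazy)  # <rule stage="Source"><?pi data?></rule>
```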

This funny answer, which has been posted many times, is worth reading.
I tried to use the same method, but I'm getting a different error.
import re

desc = "fsafdsdf [23]"
reg = r'\[[0-9]?[0-9]\]'
re.sub(re, "hi", desc)
Error:
Traceback (most recent call last):
  File "E:\PYTHON\123.py", line 6, in <module>
    re.sub(re, "hi", desc)
  File "C:\Program Files (x86)\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Program Files (x86)\Python37\lib\re.py", line 285, in _compile
    raise TypeError("first argument must be string or compiled pattern")
TypeError: first argument must be string or compiled pattern
Could you tell me what is going wrong?
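The traceback points at the first argument: in re.sub(re, "hi", desc), re is the re module itself rather than the pattern string reg, hence "first argument must be string or compiled pattern". A corrected sketch of the same snippet:

```python
import re

desc = "fsafdsdf [23]"
reg = r'\[[0-9]?[0-9]\]'  # one or two digits inside square brackets

# Pass the pattern variable (reg), not the module (re)
result = re.sub(reg, "hi", desc)
print(result)  # fsafdsdf hi

# A compiled pattern is also accepted, as the error message says
compiled_result = re.compile(reg).sub("hi", desc)
print(compiled_result)  # fsafdsdf hi
```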

(Aug-15-2017, 03:06 PM) Jeevananda wrote: