Python Forum

I am unable to replace a pattern using the method re.sub()

import re

#initialize string

x='<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'

#extract a part of the string

b=re.findall(r'(<.+\s.+\?>)',x) ##result will be b=['<?xml version="1.0" encoding="UTF-8"?>']

#now I want to remove the extracted part, so I am using

print(re.sub(b[0],'',x))

# The expected result is <rule stage="Source">, however I am getting <?xml version="1.0" encoding="UTF-8"?#><rule stage="Source">. Please help.

text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
regex = r'(<.+\s.+\?>)'
re.sub(regex, 'Replacement for regex', text)

Output:
'Replacement for regex<rule stage="Source">'

re.sub replaces every match of the pattern (1st argument, your regex) with the replacement (2nd argument) in the string (3rd argument).

You can refer to a matched group inside the replacement with \1, \2, and so on.

re.sub(regex, r'\1Foo', text)

Output:
'<?xml version="1.0" encoding="UTF-8"?>Foo<rule stage="Source">'

Are you sure that you want to use regex for this task?
Regex is not a good choice for parsing HTML/XML.

Instead, you can use xml from the stdlib,
which in my opinion is not easier, but it is safer.
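
A minimal sketch of the stdlib route, assuming the document is well-formed. The fragment from the question never closes <rule>, so the input below uses a self-closing <rule/> instead (that change is an assumption for illustration, not the asker's data). xml.etree.ElementTree handles the XML declaration itself, so nothing has to be stripped by hand.

```python
import xml.etree.ElementTree as ET

# Assumed input: the question's fragment, made well-formed by self-closing <rule>.
# Passed as bytes so the parser reads the encoding declaration itself.
data = b'<?xml version="1.0" encoding="UTF-8"?><rule stage="Source"/>'

root = ET.fromstring(data)

print(root.tag)      # rule
print(root.attrib)   # {'stage': 'Source'}

# Serialising the element back to text yields the markup without the declaration.
markup = ET.tostring(root, encoding="unicode")
print(markup)
```

On real files, ET.parse(path).getroot() would be the usual entry point instead of ET.fromstring().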

Quote:
Programmer 1: We have a problem.
Programmer 2: Let’s use RegEx!
Programmer 1: Now we have two problems.

re.sub() takes a regex as its first parameter. You're passing it a string that you got from a different regex, and characters in it such as ? and . are treated as regex syntax rather than as literal text. If you skip the findall() step, you'll have what you're looking for.

>>> import re
>>> text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
>>> re.findall(r'(<.+\s.+\?>)', text)
['<?xml version="1.0" encoding="UTF-8"?>']
>>> re.sub(r'(<.+\s.+\?>)', '', text)
'<rule stage="Source">'
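
If the findall() step really is needed, here is a small sketch of why the original call left the string untouched and two ways around it. re.escape() and str.replace() are standard; the variable names unchanged, escaped and replaced are only for illustration.

```python
import re

x = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
b = re.findall(r'(<.+\s.+\?>)', x)

# Used as a pattern, b[0] is re-interpreted as regex syntax: "<?" means
# "an optional <" and '"?' means "an optional quote", so it no longer
# matches the literal text it was extracted from.
unchanged = re.sub(b[0], '', x)

# Option 1: escape the metacharacters so the string matches literally.
escaped = re.sub(re.escape(b[0]), '', x)

# Option 2: a literal substring does not need a regex at all.
replaced = x.replace(b[0], '')

print(unchanged == x)   # True
print(escaped)          # <rule stage="Source">
print(replaced)         # <rule stage="Source">
```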

edit: I was ninja'd :(

(Aug-15-2017, 03:37 PM) DeaD_EyE wrote: Instead, you can use xml from the stdlib, which in my opinion is not easier, but it is safer.

That's an option, but I never use the parser from the stdlib when BeautifulSoup and lxml are better and easier to use.

>>> from bs4 import BeautifulSoup
>>> text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
>>> soup = BeautifulSoup(text, 'lxml')
>>> s = soup.find('rule')
>>> s
<rule stage="Source"></rule>
>>> s.attrs
{'stage': 'Source'}

As mentioned, regex is the wrong tool for this.
Sure, for one line it's not a problem to use regex,
but HTML/XML usually spans many lines, and there are many ways a regex can break.

This funny answer, which has been posted many times, is worth reading.

I tried to use the same method, but I'm getting a different error.

import re

desc = "fsafdsdf [23]"
reg = r'\[[0-9]?[0-9]\]'
re.sub(re, "hi", desc)

Error:
Traceback (most recent call last):
  File "E:\PYTHON\123.py", line 6, in <module>
    re.sub(re, "hi", desc)
  File "C:\Program Files (x86)\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Program Files (x86)\Python37\lib\re.py", line 285, in _compile
    raise TypeError("first argument must be string or compiled pattern")
TypeError: first argument must be string or compiled pattern

Could you tell me what is going wrong?
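
Going by the traceback, the call passes re, the module itself, as the first argument instead of the pattern stored in reg, which would explain the TypeError. A sketch of the corrected call, keeping the names from the post (result is an added name for illustration):

```python
import re

desc = "fsafdsdf [23]"
reg = r'\[[0-9]?[0-9]\]'   # one or two digits inside square brackets

# Pass the pattern variable reg, not the module re.
result = re.sub(reg, "hi", desc)

print(result)   # fsafdsdf hi
```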

(Aug-15-2017, 03:06 PM) Jeevananda wrote:
I am unable to replace a pattern using the method re.sub()

Jeevananda

DeaD_EyE

nilamo

snippsat

oneclick