my question here
I am unable to replace a pattern using the method re.sub()
import re
#initialize string
x='<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
#extracting a part of string
b=re.findall('(<.+\s.+\?>)',x) ##result will be b=['<?xml version="1.0" encoding="UTF-8"?>']
#now I want to remove the extract part so I am using
print(re.sub(b[0],'',x))
# expected result is <rule stage="Source">, however I am getting <?xml version="1.0" encoding="UTF-8"?#><rule stage="Source">. Please help.
text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
regex = r'(<.+\s.+\?>)'
re.sub(regex, 'Replacement for regex', text)
Output:
'Replacement for regex<rule stage="Source">'
re.sub replaces every match of the pattern (1st argument, your regex) with the replacement (2nd argument) in the string (3rd argument).
You can refer to a captured group inside the replacement with \1 through \n, where n is the group number:
re.sub(regex, r'\1Foo', text)
Output:
'<?xml version="1.0" encoding="UTF-8"?>Foo<rule stage="Source">'
Are you sure that you want to use regex for this task?
Regex is not a good choice for parsing HTML/XML.
You can instead use xml from the stdlib,
which in my opinion is not easier, but is safer.
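For what it's worth, here is a minimal sketch of the stdlib route using xml.etree.ElementTree. It assumes the rule element is closed (the fragment in the question leaves it open, and a strict XML parser will reject that), so the closing tag below is an addition made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the <rule> element is closed here so the document is well-formed.
# The original fragment leaves it open, which a strict XML parser rejects.
document = b'<?xml version="1.0" encoding="UTF-8"?><rule stage="Source"></rule>'

# The parser consumes the XML declaration itself, so nothing has to be
# stripped by hand before working with the element.
root = ET.fromstring(document)

print(root.tag)     # rule
print(root.attrib)  # {'stage': 'Source'}
```

With the document parsed, the declaration is no longer part of the data you work with, so there is nothing left to remove with re.sub().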
Quote: Programmer 1: We have a problem.
Programmer 2: Let’s use RegEx!
Programmer 1: Now we have two problems.
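re.sub() takes a regex as its first parameter. You're passing it a plain string that you got from a different regex, and characters such as ? and . in that string are interpreted as regex metacharacters rather than literal text. If you skip the findall() step and pass the pattern itself, you'll get what you're looking for.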
>>> import re
>>> text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
>>> re.findall(r'(<.+\s.+\?>)', text)
['<?xml version="1.0" encoding="UTF-8"?>']
>>> re.sub(r'(<.+\s.+\?>)', '', text)
'<rule stage="Source">'
edit: I was ninja'd :(
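If you really do want to reuse the text that findall() returned, one option is to escape it first. The sketch below uses re.escape(), which backslash-escapes regex metacharacters so the string matches itself literally. The variable names are only illustrative, and for a plain literal removal str.replace() does the same job without any regex.

```python
import re

text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
found = re.findall(r'(<.+\s.+\?>)', text)[0]

# Used as-is, the '?' and '.' in the extracted text act as regex
# metacharacters, so the pattern no longer matches the literal text.
unescaped_result = re.sub(found, '', text)

# re.escape() escapes the metacharacters, so the text is matched literally.
escaped_result = re.sub(re.escape(found), '', text)

# No regex needed at all for removing a known literal substring.
replaced_result = text.replace(found, '')

print(unescaped_result == text)  # True: nothing was replaced
print(escaped_result)            # <rule stage="Source">
print(replaced_result)           # <rule stage="Source">
```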
(Aug-15-2017, 03:37 PM)DeaD_EyE Wrote: You can instead use xml from the stdlib, which in my opinion is not easier, but is safer.
That's an option, but I never use a parser from the stdlib when
BeautifulSoup and lxml are better and easier to use.
>>> from bs4 import BeautifulSoup
>>> text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
>>> soup = BeautifulSoup(text, 'lxml')
>>> s = soup.find('rule')
>>> s
<rule stage="Source"></rule>
>>> s.attrs
{'stage': 'Source'}
As mentioned, regex is the wrong tool for this.
Sure, for one line it's not a problem to use regex,
but HTML/XML usually spans many lines, and there are many ways a regex can break.
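As one illustration of how this can break, here is a small sketch. The input is a made-up example (the self-closed rule element and the second processing instruction, target, are not from the original post). Because .+ is greedy, the pattern from this thread runs from the first < to the last ?> and swallows everything in between.

```python
import re

# Hypothetical input: same declaration, plus a second processing instruction.
text = ('<?xml version="1.0" encoding="UTF-8"?>'
        '<rule stage="Source"/>'
        '<?target some data?>')

# The pattern used earlier in the thread: greedy, so it matches up to the
# LAST '?>' in the string and removes the <rule> element as well.
greedy_result = re.sub(r'(<.+\s.+\?>)', '', text)

# A narrower, non-greedy pattern that only targets the leading declaration.
# It is still fragile compared to a real parser.
narrow_result = re.sub(r'<\?xml\s.*?\?>', '', text, count=1)

print(repr(greedy_result))  # ''
print(narrow_result)        # <rule stage="Source"/><?target some data?>
```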
This funny answer, which has been posted many times, is worth reading.
I tried to use the same method, but I'm getting a different error.
import re
desc = "fsafdsdf [23]"
reg = r'\[[0-9]?[0-9]\]'
re.sub(re, "hi", desc)
Error:
Traceback (most recent call last):
  File "E:\PYTHON\123.py", line 6, in <module>
    re.sub(re, "hi", desc)
  File "C:\Program Files (x86)\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Program Files (x86)\Python37\lib\re.py", line 285, in _compile
    raise TypeError("first argument must be string or compiled pattern")
TypeError: first argument must be string or compiled pattern
Could you tell me what is going wrong?
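Going by the traceback, the first argument in that call is re, which is the module itself, rather than reg, the variable holding the pattern. That would explain the "first argument must be string or compiled pattern" message. A minimal sketch of the likely fix, reusing the names from the post:

```python
import re

desc = "fsafdsdf [23]"
reg = r'\[[0-9]?[0-9]\]'

# Pass the pattern string `reg` as the first argument, not the module `re`.
result = re.sub(reg, "hi", desc)

print(result)  # fsafdsdf hi
```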
(Aug-15-2017, 03:06 PM)Jeevananda Wrote: my question here
I am unable to replace a pattern using the method re.sub()
import re
#initialize string
x='<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
#extracting a part of string
b=re.findall('(<.+\s.+\?>)',x) ##result will be b=['<?xml version="1.0" encoding="UTF-8"?>']
#now I want to remove the extract part so I am using
print(re.sub(b[0],'',x))
# expected result is <rule stage="Source">, however I am getting <?xml version="1.0" encoding="UTF-8"?#><rule stage="Source">. Please help.