Python Forum
Unable to replace a pattern using re.sub()
#1
I am unable to replace a pattern using re.sub().
import re

#initialize string

x='<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'

# extract part of the string

b = re.findall(r'(<.+\s.+\?>)', x)  # result: b == ['<?xml version="1.0" encoding="UTF-8"?>']

# now I want to remove the extracted part, so I am using

print(re.sub(b[0], '', x))
# Expected result is <rule stage="Source">, however I am getting <?xml version="1.0" encoding="UTF-8"?#><rule stage="Source">. Please help.
#2
text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
regex = r'(<.+\s.+\?>)'
re.sub(regex, 'Replacement for regex', text)
Output:
'Replacement for regex<rule stage="Source">'
re.sub replaces every match of the pattern (1st argument, your regex) with the replacement (2nd argument) in the string (3rd argument).

You can refer to a captured group inside the replacement with a backreference: \1 for the first group, \2 for the second, and so on.

re.sub(regex, r'\1Foo', text)
Output:
'<?xml version="1.0" encoding="UTF-8"?>Foo<rule stage="Source">'
Are you sure that you want to use regex for this task?
Regex is not a good choice for parsing HTML/XML.

You can use the xml package from the stdlib instead,
which in my opinion is not easier, but it is safer.
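A rough sketch of the stdlib route, using xml.etree.ElementTree. One assumption to note: the string in the question has an unclosed <rule> tag, which a strict XML parser would reject, so a closing </rule> is added here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the <rule> element is closed. The snippet in the question is
# truncated and would not parse as-is with a strict XML parser.
document = b'<?xml version="1.0" encoding="UTF-8"?><rule stage="Source"></rule>'

# The parser consumes the XML declaration; only the element tree remains.
root = ET.fromstring(document)
print(root.tag)     # rule
print(root.attrib)  # {'stage': 'Source'}

# Serializing the root as text gives the element without the declaration.
without_declaration = ET.tostring(root, encoding="unicode")
print(without_declaration)
```

If the goal is only to drop the declaration, parsing and re-serializing like this avoids having to match it by hand.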

Quote:Programmer 1: We have a problem.
Programmer 2: Let’s use RegEx!
Programmer 1: Now we have two problems.
#3
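re.sub() takes a regex as its first parameter.  You're passing it the string that findall() returned, and that string gets interpreted as a regex too: characters in it such as ? and . are treated as metacharacters, so it no longer matches the literal text.  If you skip the findall() step and pass the original pattern, you'll have what you're looking for.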
>>> import re
>>> text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
>>> re.findall(r'(<.+\s.+\?>)', text)
['<?xml version="1.0" encoding="UTF-8"?>']
>>> re.sub(r'(<.+\s.+\?>)', '', text)
'<rule stage="Source">'
edit: I was ninja'd :(
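If the findall() step has to stay for some reason, here is a small sketch of two ways to remove the extracted text literally. The variable names (found, unchanged, via_escape, via_replace) are only illustrative. re.escape() backslash-escapes the metacharacters so the string matches itself, and str.replace() avoids regex altogether.

```python
import re

text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
found = re.findall(r'(<.+\s.+\?>)', text)

# Used directly as a pattern, the extracted string does not match itself:
# '?' and '.' are regex metacharacters, so nothing is replaced.
unchanged = re.sub(found[0], '', text)
print(unchanged == text)  # True

# Option 1: escape the metacharacters before using it as a pattern.
via_escape = re.sub(re.escape(found[0]), '', text)
print(via_escape)  # <rule stage="Source">

# Option 2: plain string replacement, no regex involved.
via_replace = text.replace(found[0], '', 1)
print(via_replace)  # <rule stage="Source">
```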
#4
(Aug-15-2017, 03:37 PM)DeaD_EyE Wrote: You can use the xml package from the stdlib instead, which in my opinion is not easier, but it is safer.
That's an option, but I never use the parser from the stdlib when BeautifulSoup and lxml are better and easier to use.
>>> from bs4 import BeautifulSoup
>>> text = '<?xml version="1.0" encoding="UTF-8"?><rule stage="Source">'
>>> soup = BeautifulSoup(text, 'lxml')
>>> s = soup.find('rule')
>>> s
<rule stage="Source"></rule>
>>> s.attrs
{'stage': 'Source'}
As mentioned, regex is the wrong tool for this.
Sure, for one line it's not a problem to use regex.
But HTML/XML is usually many lines long, and there are many ways a regex can break.

This funny answer, which has been posted many times, is worth reading.
#5
I tried to use the same method, but I'm getting a different error.
import re

desc = "fsafdsdf [23]"
reg = r'\[[0-9]?[0-9]\]'
re.sub(re, "hi", desc)
Error:
Traceback (most recent call last):
  File "E:\PYTHON\123.py", line 6, in <module>
    re.sub(re, "hi", desc)
  File "C:\Program Files (x86)\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Program Files (x86)\Python37\lib\re.py", line 285, in _compile
    raise TypeError("first argument must be string or compiled pattern")
TypeError: first argument must be string or compiled pattern
Could you tell me what is going wrong?
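Going by the traceback, the first argument in the call is re, which is the module itself, not the pattern stored in reg. That would explain the "first argument must be string or compiled pattern" message. A minimal sketch of the same snippet with the pattern variable passed instead (result is just an illustrative name):

```python
import re

desc = "fsafdsdf [23]"
reg = r'\[[0-9]?[0-9]\]'

# Pass the pattern string (reg), not the re module.
result = re.sub(reg, "hi", desc)
print(result)  # fsafdsdf hi
```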
