Hi,
how can I read the inputfile with Regex and store it into outputfile with python?
## inputfile ##
<Color>blue</Color>
<Code>1</Code>
<Color>red</Color>
<Code>2</Code>
## inputfile ##
## outputfile ##
Color:blue
Code:1
Color:red
Code:2
## outputfile ##
Regular expressions are a bad idea for markup languages. Python has
html and xml support built in, and there are good third party modules like Beautiful Soup and lxml.
I think maybe that's overstating it a little bit.
Just to show one way,it's not normal to parse all input when have markup input.
Think of a web-site of XML doc that can have 100/1000 of tags,then want to narrow it down to find parse info needed.
There had been no problem to solve this specific task with with regex,
it just that when see markup there is a better way which is using a parser.
Also as info Regex is supported for use inside both
BeautifulSoup and
lxml.
Eg:
soup.find_all(re.compile("^b"))
.
//a[re:test(@id, "^hypProduct_[0-9]+$")]
from bs4 import BeautifulSoup
data = '''\
<Color>blue</Color>
<Code>1</Code>
<Color>red</Color >
<Code>2</Code>'''
soup = BeautifulSoup(data, 'lxml')
for item in soup.find_all(['color', 'code']):
print(f'{item.name.capitalize()}:{item.text}')
Output:
Color:blue
Code:1
Color:red
Code:2
ichabod801 Wrote:I think maybe that's overstating it a little bit.
Sure it is,but it's one the best and funny post about this topic.
I have like to that post bye @bobince many times over the years.
Is this possible with python:
1. Search for static Text
2. If 1. found, search from this position for Text between XML-Tags with Regex and write it to file
3. See 1. from current position in file
No. Step 2 is too complicated for a regex to solve.