Python Forum

Full Version: regX find XYZ when it occurs after ABC with stuff inbetween?
given a string with characters, numbers, spaces, symbols... like a web page source

strStuff = "lmnoqABCrstuvwXYZdefghijk)"
and ignoring case

how would you create a Regular Expression to return XYZ when it occurs after ABC with stuff in between?

thanks for any help.
I really like using regex pal to test these things out, while you play around with them to find something that works:

Starting with ABC, any number of things that aren't XYZ, and ending with XYZ (there's probably a way to do it with forward-lookahead match groups, but my regex fu isn't that powerful).

Testing it out:
>>> import re
>>> tests = [
... 'lmnoqABCrstuvwXYZdefghijk)',
... 'missing_start_groupXYZ',
... 'missing_end_ABC_group',
... 'missing_both'
... ]
>>> for test in tests:
...   match = re.search("ABC[^XYZ]*(XYZ)", test, re.IGNORECASE)
...   print(match)
...   if match:
...     print(match.groups())
...
<_sre.SRE_Match object; span=(5, 17), match='ABCrstuvwXYZ'>
('XYZ',)
None
None
None
(Aug-13-2017, 11:41 PM)Fran_3 Wrote: like a web page source
Web page source is a bad example for regex,
because for HTML/XML you should use a parser such as BeautifulSoup or lxml.
I have a tutorial about this here: Web-Scraping part-1.
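
To give a rough idea of the parser approach, here is a minimal sketch that uses only the standard library's html.parser as a stand-in. BeautifulSoup and lxml have their own, friendlier APIs; the class name TitleGrabber and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    """Collect the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only start collecting for the first <title> we see.
        if tag == 'title' and self.title is None:
            self.in_title = True
            self.title = ''

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# Made-up sample page, not real site content.
html = '<html><head><title>ABC and XYZ</title></head><body><p>stuff</p></body></html>'
parser = TitleGrabber()
parser.feed(html)
print(parser.title)   # ABC and XYZ
```

The parser follows the tag structure, so it does not care how much other markup sits between the pieces you want.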
/ABC[^XYZ]*(XYZ)/ fails on 'ABCwZYXwXYZ'. What about /ABC.*?(XYZ)/?
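
For anyone following along, here is a quick sketch of that difference. The helper name first_group is just made up for this example.

```python
import re

def first_group(pattern, text):
    """Return group 1 of the first case-insensitive match, or None."""
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group(1) if match else None

text = 'ABCwZYXwXYZ'

# [^XYZ] is a character class: it rejects any single X, Y or Z,
# so the stray 'ZYX' in the middle blocks the match.
print(first_group(r'ABC[^XYZ]*(XYZ)', text))   # None

# .*? is lazy: it takes as little as possible before the first XYZ.
print(first_group(r'ABC.*?(XYZ)', text))       # XYZ
```

If the stuff in between can span several lines (as it would in a web page source), the lazy version also needs re.DOTALL, since . does not match a newline by default.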
/ABC[^\1]*(XYZ)/ ?

I try to avoid .* if I can, as it doesn't really make it clear right away what you're expecting to happen.
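
One caveat on the [^\1] idea, as far as I can tell from the re docs: inside a character class Python reads \1 as the character chr(1), not as a reference to group 1. So the pattern ends up being a greedy "anything except chr(1)", which lands on the last XYZ instead of the first. A small sketch with a made-up test string:

```python
import re

text = 'ABCwXYZwXYZ'

# Inside [...] the \1 is the character chr(1), not "group 1",
# so [^\1]* behaves like a greedy "anything but chr(1)".
greedy = re.search(r'ABC[^\1]*(XYZ)', text, re.IGNORECASE)
print(greedy.span(1))   # (8, 11): the last XYZ

# The lazy version stops at the first XYZ instead.
lazy = re.search(r'ABC.*?(XYZ)', text, re.IGNORECASE)
print(lazy.span(1))     # (4, 7): the first XYZ

# A chr(1) between ABC and XYZ stops the character-class version.
print(re.search(r'ABC[^\1]*(XYZ)', 'ABC\x01XYZ'))   # None
```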
Thanks, guys. I'll play with ichabod801's sample later today.

Meanwhile I came up with...
- using the regex search method with () to end up with groups
- using the regex findall method to return tuples

This way I can get to the data I want and ignore the other parts.
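
A minimal sketch of what those two approaches might look like. The pattern and the extended sample string are assumptions for illustration, not the exact code from this post.

```python
import re

# Sample string from the question, plus a made-up second occurrence.
str_stuff = 'lmnoqABCrstuvwXYZdefghijk abc-123-xyz'

# search: first match only; groups() returns the captured pieces.
match = re.search(r'(ABC)(.*?)(XYZ)', str_stuff, re.IGNORECASE)
if match:
    print(match.groups())   # ('ABC', 'rstuvw', 'XYZ')

# findall: with more than one group, each match comes back as a tuple.
found = re.findall(r'(ABC)(.*?)(XYZ)', str_stuff, re.IGNORECASE)
print(found)   # [('ABC', 'rstuvw', 'XYZ'), ('abc', '-123-', 'xyz')]
```

Indexing into the tuples (for example found[0][2]) picks out just the part you care about and leaves the rest behind.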