Thread: regX find XYZ when it occurs after ABC with stuff inbetween? (/thread-4398.html)
Python Forum (https://python-forum.io), Forum: Python Coding, Web Scraping & Web Development
Fran_3 - Aug-13-2017

Given a string containing letters, numbers, spaces, and symbols (like a web page source), for example

    strStuff = "lmnoqABCrstuvwXYZdefghijk)"

and ignoring case, how would you create a regular expression that returns XYZ when it occurs after ABC with stuff in between? Thanks for any help.

RE: nilamo - Aug-14-2017

/ABC[^XYZ]*(XYZ)/

I really like using regex pal to test these things out while you play around with them to find something that works: http://www.regexpal.com/

Starting with ABC, any number of things that aren't XYZ, and ending with XYZ (there's probably a way to do it with lookahead match groups, but my regex fu isn't that powerful). Testing it out:

    >>> import re
    >>> tests = [
    ...     'lmnoqABCrstuvwXYZdefghijk)',
    ...     'missing_start_groupXYZ',
    ...     'missing_end_ABC_group',
    ...     'missing_both'
    ... ]
    >>> for test in tests:
    ...     match = re.search("ABC[^XYZ]*(XYZ)", test, re.IGNORECASE)
    ...     print(match)
    ...     if match:
    ...         print(match.groups())
    ...
    <_sre.SRE_Match object; span=(5, 17), match='ABCrstuvwXYZ'>
    ('XYZ',)
    None
    None
    None
    >>>

RE: snippsat - Aug-14-2017

(Aug-13-2017, 11:41 PM) Fran_3 Wrote: like a web page source

Web page source is a bad example for regex, because for HTML/XML you should use a parser such as BeautifulSoup or lxml. I have a tutorial about this here: Web-Scraping part-1.

RE: ichabod801 - Aug-14-2017

/ABC[^XYZ]*(XYZ)/ fails on 'ABCwZYXwXYZ'. What about /ABC.*?(XYZ)/?

RE: nilamo - Aug-14-2017

/ABC[^\1]*(XYZ)/ ? I try to avoid .* if I can, as it doesn't make it clear right away what you're expecting to happen.

RE: Fran_3 - Aug-14-2017

Thanks, guys.
I'll play with ichabod801's sample later today. Meanwhile I came up with:
- using the regex search method and () to end up with groups, or
- using the regex findall method to return tuples

This way I can get to the data I want and ignore the other parts.
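
For anyone reading the thread later, here is a minimal sketch comparing the patterns suggested above. The helper name find_after, the strings "ABC1XYZ2XYZ" and "abc-1-xyz ABC-2-XYZ", and the re.DOTALL flag (so that the "stuff in between" may span line breaks) are assumptions added for illustration; none of the posters used them. It suggests that [^XYZ] excludes the individual characters X, Y and Z rather than the sequence XYZ, and that Python reads \1 inside a character class as a literal character rather than a backreference.

```python
import re

# Patterns from the thread, written without the /.../ delimiters.
CHAR_CLASS = r"ABC[^XYZ]*(XYZ)"     # [^XYZ] rejects any single X, Y or Z
LAZY = r"ABC.*?(XYZ)"               # lazy: stops at the first XYZ after ABC
NUMERIC_CLASS = r"ABC[^\1]*(XYZ)"   # \1 in a class is chr(1), not group 1


def find_after(pattern, text):
    """Return the text captured by group 1, or None if nothing matches."""
    # re.DOTALL is an assumption: it lets "." cross newlines in page source.
    match = re.search(pattern, text, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None


samples = [
    "lmnoqABCrstuvwXYZdefghijk)",   # both patterns find XYZ
    "ABCwZYXwXYZ",                  # only the lazy pattern finds XYZ
    "missing_start_groupXYZ",       # no ABC, so no match
    "missing_end_ABC_group",        # no XYZ, so no match
]

for text in samples:
    print(text, find_after(CHAR_CLASS, text), find_after(LAZY, text))

# The numeric-escape class behaves like a greedy "match anything",
# so it runs to the last XYZ while the lazy pattern stops at the first.
greedy = re.search(NUMERIC_CLASS, "ABC1XYZ2XYZ", re.IGNORECASE)
lazy = re.search(LAZY, "ABC1XYZ2XYZ", re.IGNORECASE)
print(greedy.group(0), lazy.group(0))   # ABC1XYZ2XYZ ABC1XYZ

# findall with two groups returns one tuple per match.
pairs = re.findall(r"ABC(.*?)(XYZ)", "abc-1-xyz ABC-2-XYZ",
                   re.IGNORECASE | re.DOTALL)
print(pairs)   # [('-1-', 'xyz'), ('-2-', 'XYZ')]
```

With re.IGNORECASE the captured text keeps its original case, which is why the first tuple contains 'xyz' rather than 'XYZ'.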