Creating new list based on exact regex match in original list
Python Forum (https://python-forum.io), General Coding Help
Creating new list based on exact regex match in original list - interjectdirector - Mar-08-2020

I've searched pretty thoroughly for an existing topic but can't find anything. I know the answer must be out there; I just don't think I fully understand what I'm trying to achieve. Learning regex has been a rocky road for me, and although I have the basics mostly down, I can't figure out how to achieve this particular goal.

I am trying to use a list comprehension to find a regex match for each item in the list firstList. For each item, the exact text matched by the regex should be written to the list secondList. If there is no match, the item from firstList should not be written to secondList. However, I also want this list comprehension to strip the path following the domain name before writing to secondList (e.g. "https://gmail.com/test123" at firstList[1] should be written to secondList as "https://gmail.com/").

    import re

    regex = re.compile(r'^http[s]?:\/?\/?([^:\/\s]+)/')
    firstList = ['http:google.com/test', 'https://gmail.com/test123', 'http://youtube.com/watch', 'notaurl', '/home/images']
    secondList = [i for i in firstList if regex.match(i)]

    print(firstList)
    print(secondList)

As desired, my list comprehension is eliminating items that do not have URL components, but it is still including the path following the domain. Why is this? If I use

    print(re.match(regex, firstList[1]))

the output shows that the match is only https://gmail.com/. I understand that my list comprehension adds an item to secondList whenever there is any regex match at all, but how do I get it to write the matched text, as shown by re.match, to secondList instead of the entire item that matched?

I don't need anyone to post the solution code; I just need help wrapping my head around this regex concept that I'm clearly misunderstanding. Thanks for any help.
EDIT: I should add that the regex in my code above matches the components of a URL up to the / that begins the path following the domain.

RE: Creating new list based on exact regex match in original list - deanhystad - Mar-08-2020

re.match returns a match object or None. Your comprehension only uses that result as a test and then adds i to the list, and i is the original string (the same text the match object exposes as its "string" attribute, which is the string passed to match). Take a look at the match object and see how it can provide the string you really want. Note that the comprehension has to loop over the match objects rather than the strings, because a str has no group method. This may do what you want:

    secondList = [m.group(0) for m in map(regex.match, firstList) if m]
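
To make the difference between the input string and the matched text concrete, here is a minimal sketch using the pattern and list from the question. The name m is only illustrative, and this is one possible way to write the comprehension, not necessarily the form the reply's author had in mind.

```python
import re

# Pattern and data copied from the question.
regex = re.compile(r'^http[s]?:\/?\/?([^:\/\s]+)/')
firstList = ['http:google.com/test', 'https://gmail.com/test123',
             'http://youtube.com/watch', 'notaurl', '/home/images']

# A match object carries both the full input and the part that matched.
m = regex.match(firstList[1])
print(m.string)    # https://gmail.com/test123  (the whole input string)
print(m.group(0))  # https://gmail.com/         (only the matched text)
print(m.group(1))  # gmail.com                  (the capture group)

# Call match() once per item, drop the None results (no match),
# and keep only the matched text of the rest.
secondList = [m.group(0) for m in map(regex.match, firstList) if m]
print(secondList)
# ['http:google.com/', 'https://gmail.com/', 'http://youtube.com/']
```

Filtering with "if regex.match(i)" and then appending i only uses the match as a yes/no test; the matched text is available only from the match object itself, which is why the comprehension iterates over the results of regex.match.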