Mar-08-2020, 07:48 PM
(This post was last modified: Mar-08-2020, 07:48 PM by interjectdirector.)
I've searched thoroughly for an existing topic but can't find anything. I know the answer must be out there; I just don't think I fully understand what it is I'm trying to achieve. Learning regex has been a rocky road for me, and although I have the basics mostly down, I can't figure out how to achieve this particular goal.
I am trying to use a list comprehension to find a regex match for each element in the list `firstList`. For each element, the exact regex match should be written to the list `secondList`. If there is no match, the element from `firstList` will not be written to `secondList`. However, I also want this list comprehension to strip the path following the domain name and write the result to `secondList` (e.g. `"https://gmail.com/test123"` at `firstList[1]` should be written to `secondList` as `"https://gmail.com/"`).

```python
import re

# scheme and domain, up to the / that starts the path
regex = re.compile(r'^http[s]?:\/?\/?([^:\/\s]+)/')
firstList = ['http:google.com/test', 'https://gmail.com/test123', 'http://youtube.com/watch', 'notaurl', '/home/images']
secondList = [i for i in firstList if regex.match(i)]
print(firstList)
print(secondList)
```

Output:
```
['http:google.com/test', 'https://gmail.com/test123', 'http://youtube.com/watch', 'notaurl', '/home/images']
['http:google.com/test', 'https://gmail.com/test123', 'http://youtube.com/watch']
```
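A list comprehension with an `if` clause behaves like the loop below, which may make the behavior easier to see: the `if` only decides whether an element is kept, and what gets appended is the loop variable `i` itself. This sketch reuses the pattern and list from the post and assumes nothing else.

```python
import re

# Same pattern and input list as in the post
regex = re.compile(r'^http[s]?:\/?\/?([^:\/\s]+)/')
firstList = ['http:google.com/test', 'https://gmail.com/test123',
             'http://youtube.com/watch', 'notaurl', '/home/images']

# Loop form of: secondList = [i for i in firstList if regex.match(i)]
secondList = []
for i in firstList:
    if regex.match(i):        # a Match object is truthy, None is falsy
        secondList.append(i)  # the whole original string is appended, not the match

print(secondList)
```

The match result is only used as a yes/no test here and is then thrown away.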
As desired, my list comprehension is eliminating list elements that do not contain URL components, but it is still including the path following the domain. Why is this? If I use `print(re.match(regex, firstList[1]))`, the output shows that the match is only `https://gmail.com/`:

Output:
```
<re.Match object; span=(0, 18), match='https://gmail.com/'>
```
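`re.match` (and the compiled pattern's `.match` method) returns a `re.Match` object, or `None` when the pattern does not match. The matched text lives inside that object, not in the string being tested. A short sketch of pulling it out with the standard `Match` methods, using the post's pattern:

```python
import re

regex = re.compile(r'^http[s]?:\/?\/?([^:\/\s]+)/')

m = regex.match('https://gmail.com/test123')
if m:                      # match() returns None when the pattern does not match
    print(m.group(0))      # whole matched text: https://gmail.com/
    print(m.group(1))      # text captured by ([^:\/\s]+): gmail.com
    print(m.span())        # (0, 18), the same span shown in the repr

print(regex.match('notaurl'))  # None
```

`m.group()` with no argument is the same as `m.group(0)`.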
I understand that my list comprehension adds an element to `secondList` if there is any regex match at all, but how do I get it to write the matched text seen in the `re.match` output to `secondList`, instead of the entire element that contains a match?

I don't need anyone to post the solution code; I just need help wrapping my head around this concept of regex I'm clearly misunderstanding. Thanks for any help.
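The key point is that the comprehension's output expression, not its `if` clause, decides what gets stored. One way to store the matched text instead of the whole string, sketched here with `map` so it runs on any Python 3 version: iterate over the match results, drop the `None` ones, and use `m.group(0)` as the output expression. The name `m` is just an illustrative choice.

```python
import re

regex = re.compile(r'^http[s]?:\/?\/?([^:\/\s]+)/')
firstList = ['http:google.com/test', 'https://gmail.com/test123',
             'http://youtube.com/watch', 'notaurl', '/home/images']

# Run the pattern once per element, then keep only real matches.
# The output expression is m.group(0) (the matched text), not the original string.
secondList = [m.group(0) for m in map(regex.match, firstList) if m]

print(secondList)
```

On Python 3.8 or later, an assignment expression gives an equivalent one-pass form: `[m.group(0) for s in firstList if (m := regex.match(s))]`.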
EDIT: I should add that the regex in my code above matches the components of a URL up to the / that begins the path following the domain.
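One side effect of the pattern is worth noting: each `\/?` is optional, so a string with no slashes after the colon, such as `'http:google.com/test'`, still matches, which is why it appears in the output above. If the intent is to require `://` (an assumption, not something the post states), a stricter variant could look like the hypothetical `strict` pattern below; `matched_text` is likewise a hypothetical helper used only for the comparison.

```python
import re

# Pattern from the post: each \/? is optional, so 0, 1 or 2 slashes are accepted.
loose = re.compile(r'^http[s]?:\/?\/?([^:\/\s]+)/')
# Hypothetical stricter variant: requires exactly "://" after the scheme.
strict = re.compile(r'^https?://([^:/\s]+)/')


def matched_text(pattern, s):
    """Hypothetical helper: return the matched text, or None if there is no match."""
    m = pattern.match(s)
    return m.group(0) if m else None


for url in ['http:google.com/test', 'https://gmail.com/test123']:
    print(url, matched_text(loose, url), matched_text(strict, url))
```

Both patterns still require a `/` after the domain, so a bare `https://gmail.com` would not match either one.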