Creating new list based on exact regex match in original list

interjectdirector · (This post was last modified: Mar-08-2020, 07:48 PM by interjectdirector.)

I've searched pretty well for a preexisting topic but I can't find anything. I know the answer must be out there, I just don't think I perfectly understand what it is I'm trying to achieve. Learning regex has been a rocky road for me, and although I have the basics mostly down, I can't figure out how to achieve this particular goal.

I am trying to use list comprehension to find a regex match for each index in list firstList. For each index, the exact matching regex should be written to list secondList. If there is no matching regex, the index from firstList will not be written to secondList. However, I also want this list comprehension to strip the path following the domain name and write it to secondList (e.g. "https://gmail.com/test123" at firstList[1] should be written to secondList as "https://gmail.com/")

import re

regex = re.compile(r'^http[s]?:\/?\/?([^:\/\s]+)/')
firstList = ['http:google.com/test', 'https://gmail.com/test123', 'http://youtube.com/watch', 'notaurl', '/home/images']
secondList = [i for i in firstList if regex.match(i)]
print(firstList)
print(secondList)

Output:

Output:['http:google.com/test', 'https://gmail.com/test123', 'http://youtube.com/watch', 'notaurl', '/home/images']
['http:google.com/test', 'https://gmail.com/test123', 'http://youtube.com/watch']

As desired, my list comprehension is eliminating list index values that do not have URL components, but it is still including the path following the domain. Why is this? If I use

print(re.match(regex, firstList[1]))

My output shows the match is only https://gmail.com/ through output

Output:
<re.Match object; span=(0, 18), match='https://gmail.com/'>

I understand that my list comprehension method is adding to secondList if there is any regex match at all, but how do I get it to write the match output as seen in re.match to secondList instead of the entirety of the index that has a match?

I don't need anyone to post the solution code, I just need help wrapping my head around this concept of regex I'm clearly misunderstanding. Thanks for any help.

EDIT: I should add that the regex I have in my code above matches components of a URL up to the / that defines the path following a domain.

**deanhystad** · (This post was last modified: Mar-08-2020, 09:30 PM by deanhystad.)

re.match returns a MatchObject or None. Your code is taking the "string" attribute of the MatchObject and adding that to the list. The string attribute is the string passed to match. Take a look at MatchObject and see how it can provide the string you really want.

This may do what you want:

secondList = [i.group(0) for i in firstList if regex.match(i)]

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	unable to remove all elements from list based on a condition	sg_python	3	451	Jan-27-2024, 04:03 PM Last Post: deanhystad
	Facing issue in python regex newline match	Shr	6	1,319	Oct-25-2023, 09:42 AM Last Post: Shr
	No matter what I do I get back "List indices must be integers or slices, not list"	Radical	4	1,175	Sep-24-2023, 05:03 AM Last Post: deanhystad
	Move Files based on partial Match	mohamedsalih12	2	832	Sep-20-2023, 07:38 PM Last Post: snippsat
	in this code, I input Key_word, it can not find although all data was exact Help me!	duchien04x4	3	1,054	Aug-31-2023, 05:36 PM Last Post: deanhystad
	Delete strings from a list to create a new only number list	Dvdscot	8	1,547	May-01-2023, 09:06 PM Last Post: deanhystad
	Failing regex, space before and after the "match"	tester_V	6	1,199	Mar-06-2023, 03:03 PM Last Post: deanhystad
	List all possibilities of a nested-list by flattened lists	sparkt	1	922	Feb-23-2023, 02:21 PM Last Post: sparkt
	Regex pattern match	WJSwan	2	1,276	Feb-07-2023, 04:52 AM Last Post: WJSwan
	Split pdf in pypdf based upon file regex	standenman	1	2,100	Feb-03-2023, 12:01 PM Last Post: SpongeB0B

Creating new list based on exact regex match in original list

User Panel Messages

Announcements