Python Forum



Regex not specific enough - Clunk_Head - Mar-23-2019

I'm having trouble with a regex.
I'm looking for the double-quoted string starting with GET.

>>> import re
>>> entry = '77.247.22.51 - - [08/Mar/2019:18:29:01 -0700] "GET /access_130930.log HTTP/1.1" 404 73 "http://finasteridrabatt.npage.de/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" "redlug.com"'
>>> re.findall('\"GET .+\"', entry)
['"GET /access_130930.log HTTP/1.1" 404 73 "http://finasteridrabatt.npage.de/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" "redlug.com"']
I've got it part of the way there, but the pattern cannot isolate just the string that has the word GET in it.

The result that I'm expecting is:
['"GET /access_130930.log HTTP/1.1"']



RE: Regex not specific enough - woooee - Mar-23-2019

Regex is the difficult way to do it.

entry = '77.247.22.51 - - [08/Mar/2019:18:29:01 -0700] "GET /access_130930.log HTTP/1.1" 404 73 "http://finasteridrabatt.npage.de/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" "redlug.com"'
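# Test for the same marker that index() searches for, so index() cannot raise ValueError.
if '"GET ' in entry:
    location = entry.index('"GET ')
    # Find the closing quote, searching from just past the opening one.
    location_2 = entry.index('"', location + 1)
    print(location, entry[location:location_2 + 1])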



RE: Regex not specific enough - DeaD_EyE - Mar-23-2019

Try this regex:
reg = r'"GET \S+'
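As a quick check (a sketch added for illustration, using the log entry from the first post), this shows what the pattern captures. Note that \S+ stops at the first whitespace, so the match ends before HTTP/1.1 and does not include the closing quote.

```python
import re

# Log entry copied from the first post in the thread.
entry = ('77.247.22.51 - - [08/Mar/2019:18:29:01 -0700] '
         '"GET /access_130930.log HTTP/1.1" 404 73 '
         '"http://finasteridrabatt.npage.de/" '
         '"Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" '
         '"redlug.com"')

reg = r'"GET \S+'

# \S+ consumes non-whitespace characters only, so it ends at the space
# that follows the requested path.
matches = re.findall(reg, entry)
print(matches)  # ['"GET /access_130930.log']
```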



RE: Regex not specific enough - Clunk_Head - Mar-23-2019

(Mar-23-2019, 05:26 AM)woooee Wrote: Regex is the difficult way to do it

entry = '77.247.22.51 - - [08/Mar/2019:18:29:01 -0700] "GET /access_130930.log HTTP/1.1" 404 73 "http://finasteridrabatt.npage.de/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" "redlug.com"'
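# Test for the same marker that index() searches for, so index() cannot raise ValueError.
if '"GET ' in entry:
    location = entry.index('"GET ')
    # Find the closing quote, searching from just past the opening one.
    location_2 = entry.index('"', location + 1)
    print(location, entry[location:location_2 + 1])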

I appreciate it, but the point is that I'm teaching myself about regex.


RE: Regex not specific enough - ichabod801 - Mar-23-2019

I would use a non-greedy regex:

re.findall('\"GET .+?\"', entry)
By default, regex quantifiers are greedy: they consume as much as possible and return the longest possible match. Adding ? after a quantifier such as + or * makes it non-greedy, so it stops at the first point where the rest of the pattern can match.
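To make the difference concrete, here is a minimal side-by-side sketch on the log entry from the first post. The variable names greedy and lazy are only for illustration.

```python
import re

# Log entry copied from the first post in the thread.
entry = ('77.247.22.51 - - [08/Mar/2019:18:29:01 -0700] '
         '"GET /access_130930.log HTTP/1.1" 404 73 '
         '"http://finasteridrabatt.npage.de/" '
         '"Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" '
         '"redlug.com"')

# Greedy: .+ runs to the end of the line, then backs up to the LAST quote.
greedy = re.findall(r'"GET .+"', entry)

# Non-greedy: .+? grows one character at a time and stops at the FIRST quote.
lazy = re.findall(r'"GET .+?"', entry)

print(greedy[0].endswith('"redlug.com"'))  # True
print(lazy)  # ['"GET /access_130930.log HTTP/1.1"']
```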


RE: Regex not specific enough - Clunk_Head - Mar-23-2019

(Mar-23-2019, 05:27 AM)DeaD_EyE Wrote: Try this regex:
reg = r'"GET \S+'
This makes a lot of sense to me. I read it as: after the string '"GET ', match a run of one or more characters until whitespace is hit. Is it possible to have this pattern also return the second quotation mark?

(Mar-23-2019, 04:31 PM)ichabod801 Wrote: I would use a non-greedy regex:

re.findall('\"GET .+?\"', entry)
By default, regex quantifiers are greedy: they consume as much as possible and return the longest possible match. Adding ? after a quantifier such as + or * makes it non-greedy, so it stops at the first point where the rest of the pattern can match.

As usual, ichabod, you have great advice. Unlike DeaD_EyE's solution, yours returns the second quotation mark. My follow-up question: is there a situation where your solution would return a different result than DeaD_EyE's? Other than the second quotation mark, that is.

Thanks to you both.


RE: Regex not specific enough - woooee - Mar-23-2019

Quote: appreciate it but the point is that I'm teaching myself about regex
In the future, put things like this in your post so we don't waste our time with answers you cannot use.


RE: Regex not specific enough - Clunk_Head - Mar-23-2019

(Mar-23-2019, 05:42 PM)woooee Wrote:
Quote: appreciate it but the point is that I'm teaching myself about regex
In the future, put things like this in your post so we don't waste our time with answers you cannot use.

I specifically asked about regex. There was no ambiguity and no need for chiding.


RE: Regex not specific enough - ichabod801 - Mar-23-2019

First, everyone chill. There was ambiguity. You said you had trouble with a regex. You said nothing about a solution needing to be a regex. Frequently we have to tell people posting here that they are using the wrong solution, such as people trying to use regexes to process HTML. But we have clarity now, so let's just move forward.

Second, Dead_Eye's solution can catch the second quote if you put it in there (after the +).

Third, my solution matches any character, while his matches only non-whitespace characters. If there is whitespace before the closing quote, his will not match. If a non-whitespace character directly follows the closing quote, his will need the non-greedy operator (?), just as mine did.
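A small sketch of those two cases follows. The string request is trimmed from the entry in the first post; csv_style is a made-up line (an assumption, not from the thread) in which a comma directly follows the closing quote.

```python
import re

# Trimmed from the thread's log entry: there is a space inside the quotes.
request = '"GET /access_130930.log HTTP/1.1" 404 73 "redlug.com"'

# Hypothetical line: a non-whitespace character (a comma) follows the quote.
csv_style = '"GET /index.html","redlug.com"'

# Whitespace before the closing quote: \S+ cannot cross it, so no match.
no_match = re.findall(r'"GET \S+"', request)
print(no_match)  # []

# The any-character version is not stopped by the space.
any_char = re.findall(r'"GET .+?"', request)
print(any_char)  # ['"GET /access_130930.log HTTP/1.1"']

# \S also matches quotes, so the greedy form runs on to the last quote.
greedy_s = re.findall(r'"GET \S+"', csv_style)
print(greedy_s)  # ['"GET /index.html","redlug.com"']

# Adding ? makes it stop at the first closing quote.
lazy_s = re.findall(r'"GET \S+?"', csv_style)
print(lazy_s)  # ['"GET /index.html"']
```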


RE: Regex not specific enough - DeaD_EyE - Mar-24-2019

I try out my regexes at regex101.com.
You can select the Python flavor, and the syntax is highlighted.