Python Forum
Regex not specific enough
#1
I'm having trouble with regex.
I'm looking for the double-quoted string starting with GET.

>>> import re
>>> entry = '77.247.22.51 - - [08/Mar/2019:18:29:01 -0700] "GET /access_130930.log HTTP/1.1" 404 73 "http://finasteridrabatt.npage.de/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" "redlug.com"'
>>> re.findall('\"GET .+\"', entry)
['"GET /access_130930.log HTTP/1.1" 404 73 "http://finasteridrabatt.npage.de/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" "redlug.com"']
I've got it part of the way there, but it cannot isolate just the quoted string that has the word GET in it.

The result that I'm expecting is:
['"GET /access_130930.log HTTP/1.1"']
Reply
#2
Regex is the difficult way to do it

entry = '77.247.22.51 - - [08/Mar/2019:18:29:01 -0700] "GET /access_130930.log HTTP/1.1" 404 73 "http://finasteridrabatt.npage.de/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" "redlug.com"'
if '"GET ' in entry:
    # Find the start of the quoted request, then the next double quote after it
    location = entry.index('"GET ')
    location_2 = entry.index('"', location + 1)
    print(location, entry[location:location_2 + 1])
Reply
#3
Try this regex:
reg = r'"GET \S+'
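For reference, here is a quick sketch of what this pattern returns on the log line from the first post. Because `\S+` stops at the first whitespace, the match ends before ` HTTP/1.1` and does not include the closing quote. The variable name `matches` is only for illustration.

```python
import re

entry = '77.247.22.51 - - [08/Mar/2019:18:29:01 -0700] "GET /access_130930.log HTTP/1.1" 404 73 "http://finasteridrabatt.npage.de/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" "redlug.com"'

# \S+ consumes non-whitespace characters only, so it stops at the space
# that precedes HTTP/1.1.
reg = r'"GET \S+'
matches = re.findall(reg, entry)
print(matches)  # ['"GET /access_130930.log']
```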
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#4
(Mar-23-2019, 05:26 AM)woooee Wrote: Regex is the difficult way to do it

entry = '77.247.22.51 - - [08/Mar/2019:18:29:01 -0700] "GET /access_130930.log HTTP/1.1" 404 73 "http://finasteridrabatt.npage.de/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" "redlug.com"'
if '"GET ' in entry:
    # Find the start of the quoted request, then the next double quote after it
    location = entry.index('"GET ')
    location_2 = entry.index('"', location + 1)
    print(location, entry[location:location_2 + 1])

I appreciate it but the point is that I'm teaching myself about regex.
Reply
#5
I would use a non-greedy regex:

re.findall('\"GET .+?\"', entry)
Regexes by default will get as much as possible, returning the largest possible match (they're greedy). The ? makes things like + and * stop at the first possible match.
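A small side-by-side sketch of the difference, using the entry from the first post. The names `greedy` and `lazy` are just illustrative.

```python
import re

entry = '77.247.22.51 - - [08/Mar/2019:18:29:01 -0700] "GET /access_130930.log HTTP/1.1" 404 73 "http://finasteridrabatt.npage.de/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" "redlug.com"'

# Greedy: .+ runs to the end of the line, then backtracks to the LAST quote.
greedy = re.findall(r'"GET .+"', entry)

# Non-greedy: .+? expands one character at a time and stops at the FIRST quote.
lazy = re.findall(r'"GET .+?"', entry)

print(greedy[0].endswith('"redlug.com"'))  # True
print(lazy)  # ['"GET /access_130930.log HTTP/1.1"']
```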
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
Reply
#6
(Mar-23-2019, 05:27 AM)DeaD_EyE Wrote: Try this regex:
reg = r'"GET \S+'
This makes a lot of sense to me. I read this as after the string '"GET ', return the first string of characters of size one or more until a space is hit. Is it possible to have this method return the second quotation mark?

(Mar-23-2019, 04:31 PM)ichabod801 Wrote: I would use a non-greedy regex:

re.findall('\"GET .+?\"', entry)
Regexes by default will get as much as possible, returning the largest possible match (they're greedy). The ? makes things like + and * stop at the first possible match.

As usual, ichabod, you have great advice. Unlike DeaD_EyE's solution, yours returns the second quotation mark. My follow-up question: is there a situation where your solution would return a different result than DeaD_EyE's? Other than the second quotation mark, that is.

Thanks to you both.
Reply
#7
Quote: appreciate it but the point is that I'm teaching myself about regex
In the future, put things like this in your post so we don't waste our time with answers you cannot use.
Reply
#8
(Mar-23-2019, 05:42 PM)woooee Wrote:
Quote: appreciate it but the point is that I'm teaching myself about regex
In the future, put things like this in your post so we don't waste our time with answers you cannot use.

I specifically asked about regex. There was no ambiguity and no need for chiding.
Reply
#9
First, everyone chill. There was ambiguity. You said you had trouble with a regex. You said nothing about a solution needing to be a regex. Frequently we have to tell people posting here that they are using the wrong solution, such as people trying to use regexes to process HTML. But we have clarity now, let's just move forward.

Second, Dead_Eye's solution can catch the second quote if you put it in there (after the +).

Third, my solution is based on any character; his is based on non-whitespace characters. If there is whitespace anywhere between GET's argument and the closing quote (as there is before HTTP/1.1 in your entry), his will not match once the quote is added. If there is a non-whitespace character directly after the closing quote, his will need the non-greedy operator (?) just like mine did.
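To make those two cases concrete, here is a rough sketch. The first call uses the entry from the first post; `sample` is a made-up string (not from any real log) chosen only to show the second case.

```python
import re

entry = '77.247.22.51 - - [08/Mar/2019:18:29:01 -0700] "GET /access_130930.log HTTP/1.1" 404 73 "http://finasteridrabatt.npage.de/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0" "redlug.com"'

# Case 1: whitespace before the closing quote. \S+ cannot cross the space
# in front of HTTP/1.1, so adding the quote makes the pattern fail here.
no_match = re.findall(r'"GET \S+"', entry)
print(no_match)  # []

# Case 2: hypothetical input with a non-whitespace character right after
# the closing quote. \S also matches '"', so the greedy form overshoots.
sample = '"GET /a"-"ref"'
greedy_s = re.findall(r'"GET \S+"', sample)
lazy_s = re.findall(r'"GET \S+?"', sample)
print(greedy_s)  # ['"GET /a"-"ref"']
print(lazy_s)    # ['"GET /a"']
```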
Reply
#10
I try out my regex here: regex101.com
You can select Python and the syntax is highlighted.
Reply

