The regex r"(.*).js" has a mistake. It also matches foojs, because an unescaped dot matches any character. You have to escape the dot with a backslash: r"(.*)\.js"
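A quick way to check the difference between the two patterns. The two test strings are made up for illustration:

```python
import re

unescaped = r"(.*).js"   # the dot matches any character
escaped = r"(.*)\.js"    # the dot matches only a literal "."

for name in ("foo.js", "foojs"):
    # print the name and whether each pattern matches it
    print(name, bool(re.match(unescaped, name)), bool(re.match(escaped, name)))

# foo.js True True
# foojs True False
```

Only the escaped pattern rejects foojs.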
You should put your code in a function. If you use yield instead of return, the function becomes a generator.
    import re


    def log_reader(file):
        with open(file) as fd:
            for line in fd:
                # raw string, so the backslash reaches the regex engine unchanged
                if re.match(r"(.*)\.js", line):
                    yield line.split()[6].split('/')[2]


    my_reader = log_reader('/home/kali/Desktop/access_log.txt')
    # nothing happens yet
    # the generator is evaluated lazily

    # consume the generator
    paths = set(my_reader)  # unique elements
    # paths now has elements and my_reader is exhausted / empty
    print(paths)

    # sort unique paths
    print(sorted(paths))

You can also solve it without regex:
    def read_log(file, allowed_method=None):
        # use a context manager
        with open(file) as fd:
            # fd is an iterator and it iterates over lines
            # the line end is not stripped
            for line in fd:
                # splitting the log line by " brings a good result
                _, request, *_ = line.split('"')
                # the request is in the second field
                # _ is a placeholder for a throw-away object
                # *_ consumes the rest of the elements
                # request is what you need
                meth, path, proto = request.split()
                # A request consists of: Method, Path, Protocol-Version
                #
                # allowed_method is evaluated first
                # if it's None, the second part after the "and" is not evaluated
                # this allows you to set allowed_method to None to
                # skip this check
                if allowed_method and meth.upper() != allowed_method:
                    # skip this line if the method is a different one
                    continue
                if path.endswith(".js"):
                    yield path.rsplit("/", 1)[-1]

Accessing the generator:
    log_file = "access.log"
    js_files = sorted(set(read_log(log_file)))
    # first set consumes the generator read_log
    # then sorted consumes the set
    # sorted returns a sorted list

And if you need to do something with your data for each file:
    for js_file in js_files:
        print(js_file)
        # code ...
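To see what the unpacking inside read_log does, here is a small sketch on a single line. The log line is made up for this example and assumes the common access-log layout with the request in double quotes. Your real log may look different:

```python
# Hypothetical access-log line (not taken from the original log file)
sample = '127.0.0.1 - - [10/Oct/2020:13:55:36 +0000] "GET /static/js/app.js HTTP/1.1" 200 2326'

# the request is the part between the first pair of double quotes
_, request, *_ = sample.split('"')
meth, path, proto = request.split()

print(meth)                     # GET
print(path)                     # /static/js/app.js
print(path.rsplit("/", 1)[-1])  # app.js
```

rsplit("/", 1) splits only at the last slash, so the file name is found no matter how deep the path is.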
If you don't want a generator, you need two more lines:

    def log_reader(file):
        results = set()
        with open(file) as fd:
            for line in fd:
                if re.match(r"(.*)\.js", line):
                    results.add(line.split()[6].split('/')[2])
        return sorted(results)

In this case I return a unique sorted list instead of a generator.
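One practical difference between the two variants: a generator can be consumed only once, while a returned list can be iterated as often as you like. A minimal sketch with a stand-in generator (the function name and the file names are hypothetical):

```python
def fake_reader():
    # stand-in for a generator like log_reader, yields made-up names
    yield from ("b.js", "a.js", "b.js")


gen = fake_reader()
first = sorted(set(gen))   # consumes the generator
second = sorted(set(gen))  # the generator is already exhausted

print(first)   # ['a.js', 'b.js']
print(second)  # []
```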
To add an element to a set, you have to use the add method. A list has append to add an element.
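A short side-by-side sketch of both methods. The file names are made up:

```python
results_set = set()
results_list = []

for name in ("app.js", "main.js", "app.js"):
    results_set.add(name)      # a set ignores duplicates
    results_list.append(name)  # a list keeps every element in order

print(sorted(results_set))  # ['app.js', 'main.js']
print(results_list)         # ['app.js', 'main.js', 'app.js']
```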
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!