help with project of reading and searching big log file - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: help with project of reading and searching big log file (/thread-34079.html)
help with project of reading and searching big log file - korenron - Jun-24-2021

Hello,

I have a log file that grows to about 6 GB of text by the end of the day. I want to be able to cut a certain time window out of it, for example from 08:00:00 until 08:15:00. I have checked, and 15 minutes of log is around 1.5 million lines (1,500,000).

When I run the code in the morning, when the log file is less than 1 GB, everything works. When I run the code at the end of the day (when the log is more than 5 GB) it gets stuck, and sometimes I get a Memory Error on my computer. When I try to search a later window (7:00pm-7:20pm) it can take more than 3 minutes before it gets stuck.

My question is: what can I do to make this run better and faster? Can Python handle this amount of data?

This is the function:

    def FilterLogFile(StartDate, EndDate):
        StartDate = datetime.datetime.strptime(StartDate, '%d/%m/%Y-%H:%M:%S')
        EndDate = datetime.datetime.strptime(EndDate, '%d/%m/%Y-%H:%M:%S')
        EndDate = EndDate.strftime('%d/%m/%Y-%H:%M:%S')
        StartDate = StartDate.strftime('%d/%m/%Y-%H:%M:%S')
        StartDate = str(StartDate)
        EndDate = str(EndDate)
        print(StartDate)
        print(EndDate)
        count = 0
        StartLine = 0
        EndLine = 0
        FullLogFile = open('/home/pi/logs/java.txt', 'r')
        Lines = FullLogFile.readlines()  ###------->>>> this part takes too much time, when it doesn't get stuck with "Memory Error"
        FullLogFile.close()
        for line in Lines:
            count += 1
            if StartDate in line and StartLine == 0:
                print("Start Line {}: {}".format(count, line.strip()))
                StartLine = count
            if EndDate in line and EndLine == 0:
                print("End Line {}: {}".format(count, line.strip()))
                EndLine = count
            if StartLine != 0 and EndLine != 0:
                break  ## stop the scan at the wanted end time, no need to scan after it
        count = 0
        print('start line is %d , end line is %d' % (StartLine, EndLine))
        print('total number of line is %d' % (EndLine - StartLine))
        with open(OutputFile, 'w') as f:
            for line in Lines:
                count += 1
                if StartLine <= count <= EndLine:
                    f.write(line.strip() + "\r\n")
        return OutputFile

Thanks, maybe to read

RE: help with project of reading and searching big log file - Gribouillis - Jun-24-2021

There are many things to do to improve the code. First steps:
RE: help with project of reading and searching big log file - korenron - Jun-24-2021

Up to here I understand:

    FullLogFile = open('C:\\Users\\David\\Desktop\\java.txt', 'r')
    # Lines = FullLogFile.readlines()
    # FullLogFile.close()
    for line, count in enumerate(FullLogFile, 1):
        #count += 1
        if StartDate in line and StartLine == 0:
            print("Start Line {}: {}".format(count, line.strip()))
            StartLine = count
        if EndDate in line and EndLine == 0:
            print("End Line {}: {}".format(count, line.strip()))
            EndLine = count
    count = 0
    print('start line is %d , end line is %d' % (StartLine, EndLine))
    print('total number of line is %d' % (EndLine - StartLine))

Can you explain this line?

    "before the second pass, use FullLogFile.seek(0) to go back to the beginning of the file, then again use for line in FullLogFile:"

I couldn't understand what you meant.

Thank you,

RE: help with project of reading and searching big log file - Gribouillis - Jun-24-2021

NB: the indentation is wrong in the new code that you showed above.

When an open file is read, there is an internal cursor in the file object pointing to the 'current position' in the file, exactly like there is a current page when you are reading a book. When Python reads the next line, it reads it from this current position. If you call FullLogFile.seek(0), the current position goes back to the beginning of the file and you can start reading lines again from there. This allows you to run the second for loop over the same file.

RE: help with project of reading and searching big log file - korenron - Jun-24-2021

OK, now I (think I) understand the use of seek, but why is the indentation wrong? I am running the for loop until it finds the end time, then it writes the current text into my output file, no?
This is what I have now (if I understood you correctly):

    def FilterLogFile(StartDate, EndDate):
        StartDate = datetime.datetime.strptime(StartDate, '%d/%m/%Y-%H:%M:%S')
        EndDate = datetime.datetime.strptime(EndDate, '%d/%m/%Y-%H:%M:%S')
        EndDate = EndDate.strftime('%d/%m/%Y-%H:%M:%S')
        StartDate = StartDate.strftime('%d/%m/%Y-%H:%M:%S')
        StartDate = str(StartDate)
        EndDate = str(EndDate)
        print(StartDate)
        print(EndDate)
        count = 0
        StartLine = 0
        EndLine = 0
        FullLogFile = open('C:\\Users\\David\\Desktop\\java.txt', 'r')
        for line, count in enumerate(FullLogFile, 1):
            if StartDate in line and StartLine == 0:
                print("Start Line {}: {}".format(count, line.strip()))
                StartLine = count
            if EndDate in line and EndLine == 0:
                print("End Line {}: {}".format(count, line.strip()))
                EndLine = count
            if StartLine != 0 and EndLine != 0:
                break  ## stop the scan when it gets to the wanted end time
        count = 0
        print('start line is %d , end line is %d' % (StartLine, EndLine))
        print('total number of line is %d' % (EndLine - StartLine))
        FullLogFile.seek(0)  ## return to the first line in the text file
        with open(OutputFile, 'w') as f:
            for line in FullLogFile:
                count += 1
                if StartLine <= count <= EndLine:
                    f.write(line.strip() + "\r\n")
        return OutputFile

I get this error when it enters the first loop:

    argument of type 'int' is not iterable

Why is line an "int" and not a "string"?

RE: help with project of reading and searching big log file - Gribouillis - Jun-24-2021

Sorry, it should be for count, line instead of for line, count
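To illustrate why the order matters: enumerate yields (counter, item) pairs, so the first name in the for statement receives the integer and the second receives the line. The snippet below is only a small sketch with made-up data; sample_lines and start_stamp are illustrative names, not part of the code posted in this thread.

```python
# Hypothetical sample data standing in for lines read from the log file.
sample_lines = [
    "24/06/2021-08:00:00 service started\n",
    "24/06/2021-08:00:01 request handled\n",
]
start_stamp = "24/06/2021-08:00:01"

# enumerate(iterable, 1) yields (number, item) tuples, numbering from 1.
pairs = list(enumerate(sample_lines, 1))

# Correct order: the counter comes first, the line second.
start_line = 0
for count, line in enumerate(sample_lines, 1):
    if start_stamp in line and start_line == 0:
        start_line = count

# Swapped order: 'line' is bound to the integer counter, so the
# membership test `start_stamp in line` raises a TypeError.
error_message = ""
try:
    for line, count in enumerate(sample_lines, 1):
        if start_stamp in line:
            pass
except TypeError as exc:
    error_message = str(exc)

print(start_line)
print(error_message)
```

With the names swapped, the test `StartDate in line` is applied to an int, which is where the error in the post above comes from.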
RE: help with project of reading and searching big log file - korenron - Jun-24-2021

OK, great! Now it seems to be working faster and I don't get any memory error. The log file is ~2.1 GB; I will wait until it is around ~5 GB and see the result.

Thank you so much for the help so far.
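For reference, the two-pass approach discussed in this thread (iterate over the file object instead of calling readlines, then seek(0) before the second loop) could be sketched roughly as below. This is only an illustration under assumptions: the function name filter_log and its parameters are hypothetical, the timestamps are passed as already-formatted strings, and lines are copied unchanged rather than stripped and rewritten with "\r\n" as in the posted code.

```python
def filter_log(log_path, start_stamp, end_stamp, output_path):
    """Copy the lines from the first start_stamp match through the first
    end_stamp match (inclusive) into output_path.

    Returns the number of lines written (0 if either stamp is missing,
    in which case the output file is left empty).
    """
    start_line = 0
    end_line = 0
    written = 0
    with open(log_path, "r") as log, open(output_path, "w") as out:
        # Pass 1: iterating over the file object reads lazily, so the
        # whole log is never held in memory the way readlines() holds it.
        for count, line in enumerate(log, 1):
            if start_line == 0 and start_stamp in line:
                start_line = count
            if end_line == 0 and end_stamp in line:
                end_line = count
            if start_line and end_line:
                break  # both boundaries found, no need to scan further
        if not (start_line and end_line):
            return 0

        # Move the file cursor back to the beginning for the second pass.
        log.seek(0)

        # Pass 2: copy only the lines inside the window.
        for count, line in enumerate(log, 1):
            if count > end_line:
                break  # past the window, stop reading
            if count >= start_line:
                out.write(line)
                written += 1
    return written
```

A call might look like filter_log('/home/pi/logs/java.txt', '24/06/2021-08:00:00', '24/06/2021-08:15:00', 'window.txt'), where the output name and the dates are placeholders. Stopping the second loop once count passes end_line avoids reading the rest of a multi-gigabyte file.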