Python Forum
python one line file processing
#1
Hi coders,

I have a file that stores alerts. It only holds alerts generated today; alerts from before today have been archived to other files with a datestamp.

In this file, each line holds one alert. First I need to find alerts of type A; a command like grep gives me a lot of rows that belong to type A.
Then I need to check whether a row contains the string "srcip". If it doesn't, I move on to the next row. If it does, I need to look up "srcport" and "dstip" as well, and store all three values.
Next I need to search the type B alerts. Type B also has a lot of rows, and each row has a field called "timestamp". A type A alert's "timestamp" should be only a few seconds apart from the type B alert's; if the gap is too large, they are not the same event and shouldn't be correlated.
If A's srcip, srcport and dstip are the same as B's, it's a match, and I need to extract "dstport" from the type B alert.

My main question is: how do I know which rows have already been processed, so that I only search the new rows?
#2
Your question isn't very clear. Say you find A(n), a type A error. You want to find B(n), the matching type B error. Is the file such that B(n) is going to appear in the file after A(n) but before A(n+1), the next type A error? If so, this is easy: Keep track of the last type A you found, and check it against any type B's you find.

If that is not the case, you need to keep track of all the type A's you find (that match your other criteria, of course). I would put them in a list, probably ordered by timestamp. If there's a match, put the match in the output, and remove the matching type A.

Depending on the data, I might use a dictionary. The key would be a tuple of (srcip, srcport, dstip), and the value would be a list of matching type A errors. Then for a given type B, you could find all of the potentially matching type A's, and check their time stamps.
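
The dictionary idea could be sketched roughly as follows. This is only an illustration under several assumptions that the thread does not confirm: each alert has already been parsed into a dict, the key names "type", "dstport" and a numeric "timestamp" in seconds are hypothetical, type A alerts appear in the file before their matching type B alerts, and the 5 second window stands in for "a few seconds".

```python
from collections import defaultdict

MAX_GAP_SECONDS = 5  # assumed window; the thread only says "a few seconds"


def correlate(alerts, max_gap=MAX_GAP_SECONDS):
    """Return (key, dstport) for each type B alert that matches a type A alert.

    `alerts` is an iterable of dicts (a hypothetical pre-parsed form of the
    file's lines), in file order.
    """
    # (srcip, srcport, dstip) -> timestamps of type A alerts not yet matched
    pending_a = defaultdict(list)
    matches = []
    for alert in alerts:
        if "srcip" not in alert:
            continue  # rows without srcip are skipped, as in the question
        key = (alert["srcip"], alert.get("srcport"), alert.get("dstip"))
        if alert["type"] == "A":
            pending_a[key].append(alert["timestamp"])
        elif alert["type"] == "B":
            candidates = pending_a.get(key, [])
            for i, ts in enumerate(candidates):
                if abs(alert["timestamp"] - ts) <= max_gap:
                    matches.append((key, alert["dstport"]))
                    del candidates[i]  # remove the matched type A
                    break
    return matches
```

Removing a type A entry once it has matched keeps one A from being paired with several B's; whether that is the desired behavior depends on the data.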
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
#3
Aha, but my main question is still: how do I know which rows have already been processed, so that I only search the new rows? Does Python have a library for this?
#4
The first time, you can process the whole file. After iterating over the lines, call the file object's tell() method, which tells you where you are in the file (at which byte). Convert that integer to a str and write it to a second file. The next time the script runs, it looks for this file; if the file is present, the script loads its content, converts it back to an int, and calls seek(position) on the file object before it starts iterating over the lines. That puts you at the position where your script finished last time.

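In [20]: with open('birds.txt') as fd:
    ...:     for line in fd:
    ...:         print(line.strip())
    ...:     # still inside the with block, so the file is open and tell() works
    ...:     print(fd.tell())
    ...: # calling fd.tell() out here would fail: the file is already closed
    ...: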
2010-01-01 01:01:00.0000 left
2010-11-01 01:01:00.0000 right
2010-10-01 01:01:00.0000 right
91
So if another program now appends to birds.txt, the new data starts at byte position 91.
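
Put together, the tell()/seek() approach might look something like the sketch below. The function name read_new_lines and both path parameters are made up for illustration. It also makes three assumptions beyond what is described above: the file is opened in binary mode so that the offset is a plain byte count, a trailing line without a newline is treated as still being written and left for the next run, and a saved offset larger than the file is taken to mean the file was archived and restarted.

```python
import os


def read_new_lines(log_path, offset_path):
    """Return the complete lines appended to log_path since the last call."""
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)

    # Assumption: a file smaller than the saved offset was archived/truncated.
    if offset > os.path.getsize(log_path):
        offset = 0

    lines = []
    with open(log_path, "rb") as fd:
        fd.seek(offset)  # jump to where the previous run stopped
        while True:
            raw = fd.readline()
            if not raw.endswith(b"\n"):
                break  # end of file, or a line that is not finished yet
            lines.append(raw.decode("utf-8").rstrip("\r\n"))
            offset = fd.tell()  # position just after the last complete line

    # Remember the position for the next run.
    with open(offset_path, "w") as f:
        f.write(str(offset))
    return lines
```

Calling read_new_lines('birds.txt', 'birds.offset') once would return every line; calling it again right away would return an empty list until something new is appended.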
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#5
Amazing! Thank you DeaD_EyE, this is what I need.