Python Forum

Full Version: Extract parts of a log-file and put it in a dataframe
Hi

I'm a Python beginner and would like to extract parts of a log file and put them in a dataframe.
I tried something, but the result is not what I want.

The log file content looks like this:

Output:
----------------------------------------------------------
Model: Hamilton-C1
S/N: 25576
Export timestamp: 2020-09-17_11-03-40
SW-Version: 2.2.9
I want to extract only Hamilton-C1, 25576, 2020-09-17_11-03-40, 2.2.9.

#list
result = []

#function
def appendlines(line, result, word):
  if line.startswith(word):
    del result[:]
  result.append(line)
  return line, result

with open(file, "r") as lines: 
  for line in lines:              
    appendlines(line, result, ":")
new_result = [line.split() for line in result[1:5]]

print(new_result)
Output:
[['Model:', 'Hamilton-C1'], ['S/N:', '25455'], ['Export', 'timestamp:', '2020-09-16_21-12-40'], ['SW-Version:', '2.2.9']]
I want only this output:
Output:
[['Hamilton-C1'], ['25455'], ['2020-09-16_21-12-40'], ['2.2.9']]
How do I have to change my code?

Thanks for your help!
This makes a dictionary where "Model" is a key and "Hamilton-C1" the value. You can use dictionary operations to get the keys or the values, or get the value associated with a key.
with open("data.txt", "r") as file:
    items = {}
    for line in file:
        if ":" in line:
            a, b = map(str.strip, line.split(":"))
            items[a] = b

print(items)
print(*items.keys())
print(*items.values())
Output:
{'Model': 'Hamilton-C1', 'S/N': '25576', 'Export timestamp': '2020-09-17_11-03-40', 'SW-Version': '2.2.9'}
Model S/N Export timestamp SW-Version
Hamilton-C1 25576 2020-09-17_11-03-40 2.2.9
If you have no interest in a dictionary, make items a list and append to it.
with open("data.txt", "r") as file:
    items = []
    for line in file:
        if ":" in line:
            a, b = map(str.strip, line.split(":"))
            items.append(a)

print(items)
Output:
['Model', 'S/N', 'Export timestamp', 'SW-Version']
If you want each item in the list to be a list, do this.
with open("data.txt", "r") as file:
    items = []
    for line in file:
        if ":" in line:
            a, b = map(str.strip, line.split(":"))
            items.append([b])

print(items)
Output:
[['Hamilton-C1'], ['25576'], ['2020-09-17_11-03-40'], ['2.2.9']]
If every line in your log file is in the form "name: value", you could open it as a CSV file using ":" as the delimiter, or you could read it using pandas.
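As a rough sketch of the CSV idea, the standard csv module can split each line at ":". The sample text below is a stand-in that assumes one "name: value" pair per line plus a dashed separator line, as in the original post. Everything after the first field is joined back together, so a value that itself contains ":" is kept whole.

```python
import csv
import io

# Stand-in for the log file; with a real file use open("data.txt", newline="").
sample = io.StringIO(
    "----------------------------------------------------------\n"
    "Model: Hamilton-C1\n"
    "S/N: 25576\n"
    "Export timestamp: 2020-09-17_11-03-40\n"
    "SW-Version: 2.2.9\n"
)

items = {}
for row in csv.reader(sample, delimiter=":"):
    if len(row) < 2:
        continue  # skip lines without a ":", such as the dashed separator
    # Rejoin everything after the first field in case the value contains ":".
    items[row[0].strip()] = ":".join(row[1:]).strip()

print(list(items.values()))
```

This prints the four values in file order. If the real log has lines that are not in "name: value" form, they would need extra filtering.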
(Apr-05-2022, 08:32 PM)deanhystad Wrote: This makes a dictionary where "Model" is a key and "Hamilton-C1" the value. You can use dictionary operations to get the keys or the values, or get the value associated with a key.

Thank you for helping. For me, the output with only the values (the last solution) is the best.
But when I execute the code, I get the following error:

Output:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [167], in <module>
      5     for line in file:
      6         if ":" in line:
----> 7             a,b = map(str.strip, line.split(":"))
      8             items.append([b])
     10 print(items)

ValueError: too many values to unpack (expected 2)
How can I deal with this?
There must be lines with more than one ":" in them. Maybe a time?

You can tell split when to stop splitting. This tells split to stop after the first split.
a, b = map(str.strip, line.split(":", maxsplit=1))
items.append([b])
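To see the difference, here is a small sketch using a made-up line with a time in it (the line is hypothetical, not taken from the actual log file). It also shows str.partition, an alternative that always returns exactly three parts.

```python
line = "Export time: 11:03:40\n"  # hypothetical line with extra colons

# Without maxsplit the line breaks into four pieces, so "a, b = ..." fails.
print(line.split(":"))

# maxsplit=1 splits only at the first ":", leaving the rest in the value.
a, b = map(str.strip, line.split(":", maxsplit=1))
print(a, "|", b)

# str.partition splits at the first ":" and always returns three parts.
key, sep, value = line.partition(":")
print(key.strip(), "|", value.strip())
```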
(Apr-06-2022, 09:14 PM)deanhystad Wrote: There must be lines with more than one ":" in them. Maybe a time?

Thank you, it is working now with the following code:

with open(data, "r") as file:
    items = []
    for line in file:
        if ":" in line:
            a,b = map(str.strip, line.split(":", maxsplit=1))
            items.append(b)

new_result = items[0:4]
            
print(new_result)
Output:
['Hamilton-C1', '25455', '2020-09-16_21-12-40', '2.2.9']
After building the list, I put it in a dataframe:
import pandas as pd

df = pd.DataFrame([new_result], columns=['Model', 'S/N', 'timestamp', 'SW'])

df
Output:
         Model    S/N            timestamp     SW
0  Hamilton-C1  25455  2020-09-16_21-12-40  2.2.9
Now I have a folder with thousands of log files, and I want to get this information from each of them and put it in a dataframe as well.
How can I do this as simply as possible?

Thanks for helping me again.
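One possible way to approach this, as a sketch only: wrap the parsing loop in a function and call it for every file in the folder. The ".log" extension, the helper name parse_log, and the temporary demo folder are assumptions for illustration. The key names are the ones shown in the log excerpt above.

```python
from pathlib import Path
import tempfile

# Keys to keep, as they appear in the log excerpt from the first post.
WANTED = ("Model", "S/N", "Export timestamp", "SW-Version")

def parse_log(path):
    """Return the wanted 'name: value' pairs from one log file as a dict."""
    row = {}
    with open(path, "r") as file:
        for line in file:
            if ":" not in line:
                continue
            key, value = map(str.strip, line.split(":", maxsplit=1))
            # Keep only the first occurrence of each wanted key.
            if key in WANTED and key not in row:
                row[key] = value
            if len(row) == len(WANTED):
                break  # all wanted keys found, skip the rest of the file
    return row

# Demo: a throwaway folder with two small log files (made-up file names).
with tempfile.TemporaryDirectory() as folder:
    samples = (
        ("a.log", "25576", "2020-09-17_11-03-40"),
        ("b.log", "25455", "2020-09-16_21-12-40"),
    )
    for name, serial, stamp in samples:
        Path(folder, name).write_text(
            "----------\n"
            "Model: Hamilton-C1\n"
            f"S/N: {serial}\n"
            f"Export timestamp: {stamp}\n"
            "SW-Version: 2.2.9\n"
        )
    # With real data, point Path(...) at the folder that holds the log files.
    rows = [parse_log(path) for path in sorted(Path(folder).glob("*.log"))]

print(rows)
```

Passing rows to pd.DataFrame(rows) then gives one row per file, with the dictionary keys as column names. A file that lacks one of the keys would show NaN in that column.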