Python Forum
Extract parts of multiple log-files and put it in a dataframe




Extract parts of multiple log-files and put it in a dataframe - hasiro - Apr-25-2022

Hi everyone

As I posted a few weeks ago (https://python-forum.io/thread-36845-post-155721.html#pid155721), I want to extract parts of multiple log files and put them in a DataFrame.

For a single log file, the code looks like this:
import pandas as pd

data = '/Users/roger/Data/Logs/train/2020-09-16_21-12-40_eventLog_enGB.txt'

items = []
with open(data, "r") as file:
    for line in file:
        # keep the text after the first colon of every line that has one
        if ":" in line:
            _, value = map(str.strip, line.split(":", maxsplit=1))
            items.append(value)

# the first four values are Model, S/N, timestamp and SW version
new_result = items[0:4]

df = pd.DataFrame([new_result], columns=['Model', 'S/N', 'timestamp', 'SW'])

print(df)
Output:
         Model    S/N            timestamp     SW
0  Hamilton-C1  25455  2020-09-16_21-12-40  2.2.9
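The key call in the single-file version is `line.split(":", maxsplit=1)`: it splits only at the first colon, so anything after it (including further colons, e.g. in a time like 21:12:40) stays in the value. Below is a minimal sketch of that step as a reusable function. The name `header_values` and the sample lines (labels such as `Model:` or `Date:`) are hypothetical, since the real log layout is not shown in the thread; unlike the original loop, it stops as soon as four values are found.

```python
def header_values(lines, count=4):
    """Return the stripped text after the first ':' for the first `count` lines containing one.

    Hypothetical helper for illustration; mirrors the split/strip logic of the original loop.
    """
    values = []
    for line in lines:
        if ":" in line:
            # maxsplit=1: split at the first colon only, keep the rest of the line intact
            _, value = map(str.strip, line.split(":", maxsplit=1))
            values.append(value)
            if len(values) == count:
                break  # only the header values are needed, skip the rest of the file
    return values


# Hypothetical log lines, for illustration only
sample = [
    "Event log\n",
    "Model: Hamilton-C1\n",
    "S/N: 25455\n",
    "Date: 2020-09-16_21-12-40\n",
    "SW: 2.2.9\n",
    "21:12:41 Event: Ventilation started\n",
]
print(header_values(sample))
# ['Hamilton-C1', '25455', '2020-09-16_21-12-40', '2.2.9']
```

The early `break` also avoids splitting later event lines that start with a time, where the first colon would fall inside the time itself.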
How can I do this for multiple files in a folder? Each file should produce an additional row, like this:

Output:
         Model    S/N            timestamp     SW
0  Hamilton-C1  25455  2020-09-16_21-12-40  2.2.9
1  Hamilton-C1  25456  2020-09-17_21-12-42  2.2.9
Thanks for helping me!


RE: Extract parts of multiple log-files and put it in a dataframe - menator01 - Apr-25-2022

Look into os.listdir and os.walk
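A minimal sketch of both suggestions, assuming the goal is a list of full paths to the `.txt` files. `os.listdir` returns bare file names for one folder only, so each name has to be joined with the folder before opening it; `os.walk` also descends into subfolders. The function names `txt_files_flat` and `txt_files_recursive` are hypothetical.

```python
import os


def txt_files_flat(folder):
    """Full paths of the .txt files directly inside `folder` (subfolders are not searched)."""
    return sorted(
        # listdir gives bare names, so join each one with the folder
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(".txt") and os.path.isfile(os.path.join(folder, name))
    )


def txt_files_recursive(folder):
    """Full paths of the .txt files in `folder` and all of its subfolders."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if name.endswith(".txt"):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

Sorting is optional; `os.listdir` makes no ordering promise, so sorting keeps the rows in a predictable (here alphabetical) order.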


RE: Extract parts of multiple log-files and put it in a dataframe - hasiro - Apr-26-2022

(Apr-25-2022, 07:19 PM)menator01 Wrote: Look into os.listdir and os.walk

Hi

Many thanks for the help. I used glob to handle multiple files with the .txt extension.
import glob
import pandas as pd

path = '/Users/roger/Data/Logs/train/'
all_files = glob.glob(path + "/*.txt")

for filename in all_files:
    if filename.endswith(".txt"):
        with open(filename, "r") as input_file:
            items = []
            for line in input_file:
                if ":" in line:
                    a,b = map(str.strip, line.split(":", maxsplit=1))
                    items.append(b)

        new_result = items[0:4]

df = pd.concat([new_result], axis=0, ignore_index=True)

print(df)
With that code I get the following error:
Output:
TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid
How can I solve this problem?
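The TypeError comes from passing `[new_result]`, a list holding a plain Python list, to `pd.concat`, which only accepts Series and DataFrame objects. In addition, `new_result` is reassigned on every pass of the loop, so only the last file's values would be left by the time `concat` runs. A minimal sketch of the concat route: wrap each file's values in a one-row DataFrame, collect those, and concatenate once after the loop. To keep it file-free, the two rows below are the example values from the first post; in the real loop each row would be one file's `items[0:4]`.

```python
import pandas as pd

columns = ["Model", "S/N", "timestamp", "SW"]

# One list of values per file (example values from the first post)
rows = [
    ["Hamilton-C1", "25455", "2020-09-16_21-12-40", "2.2.9"],
    ["Hamilton-C1", "25456", "2020-09-17_21-12-42", "2.2.9"],
]

# pd.concat([rows[0]]) would raise the TypeError: a list is not a Series/DataFrame.
# Wrapping each row in a one-row DataFrame gives concat valid objects.
frames = [pd.DataFrame([row], columns=columns) for row in rows]

# ignore_index=True renumbers the rows 0, 1, ... instead of repeating index 0
df = pd.concat(frames, ignore_index=True)
print(df)
```

`pd.concat` raises a ValueError when given an empty list (for example if the glob pattern matched no files), so checking that `frames` is non-empty first may be worthwhile.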


RE: Extract parts of multiple log-files and put it in a dataframe - menator01 - Apr-27-2022

I have not used glob, so I don't really know about it.

Using os.listdir (code not tested):

from os import listdir
import pandas as pd

all_files = listdir('my_dir')

items = []

for filename in all_files:
    if filename.endswith('.txt'):
        with open(filename, 'r') as input_file:
            for line in input_file:
                if ':' in line:
                    a, b = map(str.strip, line.split(':', maxsplit=1))
                    items.append(b)
df = pd.DataFrame(items, columns=['Model', 'S/N', 'Timestamp', 'SW-Version'])

print(df)



RE: Extract parts of multiple log-files and put it in a dataframe - hasiro - Apr-27-2022

(Apr-27-2022, 08:39 AM)menator01 Wrote: I have not used glob so, don't really know about it.

Thanks for the help. When I execute your code, I get this error:
Output:
Shape of passed values is (3991, 1), indices imply (3991, 4)
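This ValueError is about how `pd.DataFrame` reads a list: a flat list of strings becomes one column with one row per element, so four column names do not fit. A list of lists is read row by row. A minimal reproduction with the values from the first post (the 3991 in the message is presumably every colon-containing line of every file, all appended to one flat list):

```python
import pandas as pd

columns = ["Model", "S/N", "Timestamp", "SW-Version"]
flat = ["Hamilton-C1", "25455", "2020-09-16_21-12-40", "2.2.9"]

# A flat list is read as ONE column with one row per element
print(pd.DataFrame(flat).shape)  # (4, 1)

# ...so four column names do not match the single column
try:
    pd.DataFrame(flat, columns=columns)
except ValueError as err:
    print(err)  # Shape of passed values is (4, 1), indices imply (4, 4)

# A list of lists is read row by row: each inner list is one row
print(pd.DataFrame([flat], columns=columns).shape)  # (1, 4)
```

Chunking the big flat list into groups of four would not help here, because each file contributes far more than four values; the grouping has to happen per file.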
To fix this, I made this small change to your code:
from os import listdir
import pandas as pd
 
all_files = listdir('D:/Data/Deep Learning/LogFiles/C1/')
 
items = []
 
for filename in all_files:
    if filename.endswith('.txt'):
        with open(filename, 'r') as input_file:
            for line in input_file:
                if ':' in line:
                    a, b = map(str.strip, line.split(':', maxsplit=1))
                    items.append(b)
        new_result = items[0:4]
df = pd.DataFrame([new_result], columns=['Model', 'S/N', 'Timestamp', 'SW-Version'])
 
print(df)
After adding "new_result = items[0:4]", I get this output:
Output:
         Model    S/N            Timestamp  SW-Version
0  Hamilton-C1  25455  2020-09-16_21-12-40       2.2.9
But this is the same output as for a single file. I want a DataFrame with the specific content of all files inside the folder.
How do I have to change my code for this? I tried a lot, with no luck.
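In the code above, `items = []` is created once before the loop, so it keeps growing with the lines of every file and `items[0:4]` is always the first file's values; `new_result` is then overwritten on each pass, so the DataFrame ends up with a single row. A minimal sketch of one way to fix it, assuming (as in the single-file version) that the four wanted values are the first four colon-containing lines of each file: start a fresh list per file, append each file's values to a `rows` list, and build the DataFrame once after the loop. The helper names `read_header` and `collect_logs` are hypothetical.

```python
import os

import pandas as pd

COLUMNS = ["Model", "S/N", "Timestamp", "SW-Version"]


def read_header(path, count=4):
    """First `count` values (text after the first ':') from one log file."""
    items = []  # fresh list for every file
    with open(path, "r") as input_file:
        for line in input_file:
            if ":" in line:
                _, value = map(str.strip, line.split(":", maxsplit=1))
                items.append(value)
                if len(items) == count:
                    break  # header found, no need to read the rest
    return items


def collect_logs(folder):
    """One DataFrame row per .txt file in `folder`."""
    rows = []  # one list of values per file
    for filename in sorted(os.listdir(folder)):
        if filename.endswith(".txt"):
            # listdir returns bare names; join with the folder so open() finds the file
            path = os.path.join(folder, filename)
            rows.append(read_header(path))
    # build the DataFrame once, after every file has been read
    return pd.DataFrame(rows, columns=COLUMNS)
```

Usage would be `df = collect_logs('D:/Data/Deep Learning/LogFiles/C1/')` followed by `print(df)`. Collecting plain lists and calling `pd.DataFrame` once is usually simpler than concatenating one-row DataFrames.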