Create csv file with 4 columns for process mining

thomaskissas33 · Nov-05-2023, 02:54 PM

Hello,
I am trying to learn a little Python and process mining. I want to create a file from a text with 4 columns (case ID, name, process, and time), but my problem is that my code puts everything in the same column of the CSV/Excel file, which I don't want. I want the data in 4 different columns, and the same for the titles.

import re
import pandas as pd

# Sample text paragraph (replace with your actual text)
text_paragraph = """
Character: Maria
Case1 - 2023-11-01 09:00 AM: Started the process
Character: George
Case2 - 2023-11-01 10:30 AM: Joined the project
Character: Maria
Case1 - 2023-11-01 11:45 AM: Continued working
Character: George
Case2 - 2023-11-01 12:15 PM: Left for a meeting
"""

# Initialize variables to store event data
event_data = {
    'Case ID': [],
    'Character': [],
    'Process': [],
    'Time': []
}

# Use regular expressions to extract character, case ID, process, and time information
event_pattern = r"(Character: (.+)|Case(\d+) - (\d{4}-\d{2}-\d{2} \d{2}:\d{2} [APM]{2}): (.+))"
matches = re.findall(event_pattern, text_paragraph)

current_character = None

for match in matches:
    character, case_id, timestamp, process = match[1], match[2], match[3], match[4]

    if character:
        current_character = character
    else:
        event_data['Character'].append(current_character)
        event_data['Case ID'].append(case_id)
        event_data['Time'].append(timestamp)
        event_data['Process'].append(process)

# Create a DataFrame from the event data
df = pd.DataFrame(event_data)

# Save the DataFrame as a CSV file
df.to_csv('process_mining_data_4_columns.csv', index=False)

**deanhystad** · (This post was last modified: Nov-06-2023, 09:32 PM by deanhystad.)

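Unless told otherwise, ^ in an re pattern only matches at the very start of the text, and . never matches a newline. Each of your records spans two lines, so match both lines in one pattern (with an explicit \n between them) and use re.MULTILINE so that ^ matches at the start of every line.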

import re
import pandas as pd


text_paragraph = """
Character: Maria
Case1 - 2023-11-01 09:00 AM: Started the process
Character: George
Case2 - 2023-11-01 10:30 AM: Joined the project
Character: Maria
Case1 - 2023-11-01 11:45 AM: Continued working
Character: George
Case2 - 2023-11-01 12:15 PM: Left for a meeting
"""

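# Each record is two lines: "Character: <name>", then "Case<N> - <time>: <process>".
# re.MULTILINE lets ^ match at the start of every line, not only at the start of the text.
event_pattern = re.compile(
    r"^Character: (.+)\nCase(\d+) - (\d{4}-\d{2}-\d{2} \d{2}:\d{2} [APM]{2}): (.+)",
    re.MULTILINE
)
# findall returns one (character, case number, time, process) tuple per record.
df = pd.DataFrame(
    event_pattern.findall(text_paragraph),
    columns=["Character", "Case Num", "Time", "Process"]
)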
print(df)

Output:
  Character Case Num                 Time              Process
0     Maria        1  2023-11-01 09:00 AM  Started the process
1    George        2  2023-11-01 10:30 AM   Joined the project
2     Maria        1  2023-11-01 11:45 AM    Continued working
3    George        2  2023-11-01 12:15 PM   Left for a meeting
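
To see what the flag changes, here is a minimal sketch that runs the same pattern with and without re.MULTILINE. The shortened sample text and the variable names are only for illustration. Because the text starts with a newline, ^ without the flag can only match at index 0, where "Character" is not found.

```python
import re

# Same layout as the sample text in the thread: the string starts with a newline.
text = """
Character: Maria
Case1 - 2023-11-01 09:00 AM: Started the process
Character: George
Case2 - 2023-11-01 10:30 AM: Joined the project
"""

pattern = r"^Character: (.+)\nCase(\d+) - (\d{4}-\d{2}-\d{2} \d{2}:\d{2} [APM]{2}): (.+)"

# Without the flag, ^ matches only at index 0, which holds "\n", so nothing is found.
without_flag = re.findall(pattern, text)

# With re.MULTILINE, ^ also matches right after every newline.
with_flag = re.findall(pattern, text, flags=re.MULTILINE)

print(len(without_flag))  # 0
print(len(with_flag))     # 2
```

The \n inside the pattern matches the line break between the two lines of a record regardless of the flag; MULTILINE only affects where ^ and $ are allowed to match.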

thomaskissas33 · Nov-06-2023, 07:23 PM

OK, thank you. But can I save it as a CSV file with the data above, in 4 columns with titles?

**deanhystad** · (This post was last modified: Nov-06-2023, 09:36 PM by deanhystad.)

I don't understand the question. The code from your first post wrote a CSV file. I just ran your code and the CSV file looks like this:

Output:
Case ID,Character,Process,Time
1,Maria,Started the process,2023-11-01 09:00 AM
2,George,Joined the project,2023-11-01 10:30 AM
1,Maria,Continued working,2023-11-01 11:45 AM
2,George,Left for a meeting,2023-11-01 12:15 PM

4 columns with titles. Please describe how this is not what you want.
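
One possible explanation for the "everything in one column" symptom, offered as an assumption and not something confirmed in this thread: the CSV file is fine, but Excel splits columns on the list separator from the regional settings, which in many European locales is a semicolon, not a comma. A minimal sketch of that workaround uses the sep parameter of DataFrame.to_csv. The two rows are copied from the sample data, and the in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Two rows copied from the thread's sample data.
df = pd.DataFrame(
    {
        "Case ID": ["1", "2"],
        "Character": ["Maria", "George"],
        "Process": ["Started the process", "Joined the project"],
        "Time": ["2023-11-01 09:00 AM", "2023-11-01 10:30 AM"],
    }
)

# Assumption: the spreadsheet program expects ";" between fields.
buffer = io.StringIO()
df.to_csv(buffer, sep=";", index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back with the same separator yields four separate columns.
round_trip = pd.read_csv(io.StringIO(csv_text), sep=";", dtype=str)
print(list(round_trip.columns))
```

To write an actual file, pass a path instead of the buffer, for example df.to_csv('process_mining_data_4_columns.csv', sep=';', index=False). If the regional separator is not the cause, Excel's "Text to Columns" or "From Text/CSV" import lets you pick the delimiter by hand, and df.to_excel (which needs an Excel engine such as openpyxl installed) sidesteps the delimiter question entirely.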
