Python Forum
Create csv file with 4 columns for process mining
#1
Hello
I am just starting to work with Python and process mining. I am trying to create a file from a text with 4 columns (case ID, name, process, and time), but my problem is that when I open the CSV file in Excel, everything ends up in the same column, which I don't want. I want the data in 4 separate columns, and the same for the titles.

import re
import pandas as pd

# Sample text paragraph (replace with your actual text)
text_paragraph = """
Character: Maria
Case1 - 2023-11-01 09:00 AM: Started the process
Character: George
Case2 - 2023-11-01 10:30 AM: Joined the project
Character: Maria
Case1 - 2023-11-01 11:45 AM: Continued working
Character: George
Case2 - 2023-11-01 12:15 PM: Left for a meeting
"""

# Initialize variables to store event data
event_data = {
    'Case ID': [],
    'Character': [],
    'Process': [],
    'Time': []
}

# Use regular expressions to extract character, case ID, process, and time information
event_pattern = r"(Character: (.+)|Case(\d+) - (\d{4}-\d{2}-\d{2} \d{2}:\d{2} [APM]{2}): (.+))"
matches = re.findall(event_pattern, text_paragraph)

current_character = None

for match in matches:
    character, case_id, timestamp, process = match[1], match[2], match[3], match[4]

    if character:
        current_character = character
    else:
        event_data['Character'].append(current_character)
        event_data['Case ID'].append(case_id)
        event_data['Time'].append(timestamp)
        event_data['Process'].append(process)

# Create a DataFrame from the event data
df = pd.DataFrame(event_data)

# Save the DataFrame as a CSV file
df.to_csv('process_mining_data_4_columns.csv', index=False)
#2
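By default, ^ matches only at the very start of the string, not at the start of each line. Each of your events spans two lines, so the pattern below matches both lines at once (note the \n) and uses re.MULTILINE so that ^ can match at the start of every line.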
import re
import pandas as pd


text_paragraph = """
Character: Maria
Case1 - 2023-11-01 09:00 AM: Started the process
Character: George
Case2 - 2023-11-01 10:30 AM: Joined the project
Character: Maria
Case1 - 2023-11-01 11:45 AM: Continued working
Character: George
Case2 - 2023-11-01 12:15 PM: Left for a meeting
"""

event_pattern = re.compile(
    r"^Character: (.+)\nCase(\d+) - (\d{4}-\d{2}-\d{2} \d{2}:\d{2} [APM]{2}): (.+)",
    re.MULTILINE
)
df = pd.DataFrame(
    re.findall(event_pattern, text_paragraph), 
    columns=["Character", "Case Num", "Time", "Process"]
)
print(df)
Output:
  Character Case Num                 Time              Process
0     Maria        1  2023-11-01 09:00 AM  Started the process
1    George        2  2023-11-01 10:30 AM   Joined the project
2     Maria        1  2023-11-01 11:45 AM    Continued working
3    George        2  2023-11-01 12:15 PM   Left for a meeting
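As a small illustration of what the flag changes, here is a minimal sketch using only the standard library. The shortened sample string and the variable names are made up for this example; only re.findall and re.MULTILINE come from the post above.

```python
import re

# Shortened sample in the same two-line-per-event layout as the thread's data.
text = (
    "Character: Maria\n"
    "Case1 - 2023-11-01 09:00 AM: Started the process\n"
    "Character: George\n"
    "Case2 - 2023-11-01 10:30 AM: Joined the project\n"
)

# Without re.MULTILINE, ^ matches only at the very start of the string.
first_only = re.findall(r"^Character: (.+)", text)

# With re.MULTILINE, ^ also matches immediately after each newline.
every_line = re.findall(r"^Character: (.+)", text, re.MULTILINE)

# A literal \n in the pattern lets one match cover two consecutive lines.
pairs = re.findall(r"^Character: (.+)\nCase(\d+)", text, re.MULTILINE)

print(first_only)  # ['Maria']
print(every_line)  # ['Maria', 'George']
print(pairs)       # [('Maria', '1'), ('George', '2')]
```

So the flag only affects where ^ and $ may match; it is the \n inside the pattern that ties the "Character" line to the "Case" line that follows it.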
Reply
#3
OK, thank you. But can I save it as a CSV file with the above data, in 4 columns with titles and data?
#4
I don't understand the question. The code from your first post wrote a CSV file. I just ran your code and the CSV file looks like this:
Output:
Case ID,Character,Process,Time
1,Maria,Started the process,2023-11-01 09:00 AM
2,George,Joined the project,2023-11-01 10:30 AM
1,Maria,Continued working,2023-11-01 11:45 AM
2,George,Left for a meeting,2023-11-01 12:15 PM
4 columns with titles. Please describe how this is not what you want.
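One possible explanation, which nothing in this thread confirms, is that the file is fine but Excel is splitting on a different delimiter: under some regional settings Excel expects a semicolon as the list separator, so a comma-separated file shows up in a single column. The sketch below is only a guess at a workaround under that assumption. It uses the standard csv module rather than pandas, and the write_csv helper is hypothetical; with pandas, the equivalent would be passing sep=';' to to_csv.

```python
import csv
import io

# Same header and rows as the CSV output shown above.
header = ["Case ID", "Character", "Process", "Time"]
rows = [
    ("1", "Maria", "Started the process", "2023-11-01 09:00 AM"),
    ("2", "George", "Joined the project", "2023-11-01 10:30 AM"),
    ("1", "Maria", "Continued working", "2023-11-01 11:45 AM"),
    ("2", "George", "Left for a meeting", "2023-11-01 12:15 PM"),
]


def write_csv(delimiter):
    """Hypothetical helper: return the table as CSV text with the given delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


comma_text = write_csv(",")
semicolon_text = write_csv(";")

# Reading the comma version back shows every row really has 4 fields.
field_counts = {len(row) for row in csv.reader(io.StringIO(comma_text))}
print(field_counts)                    # {4}

# The semicolon version is what Excel may expect in some locales (assumption).
print(semicolon_text.splitlines()[0])  # Case ID;Character;Process;Time
```

If the semicolon version opens in 4 columns, the delimiter was the issue; otherwise the import settings in Excel are worth checking.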