Jun-12-2024, 06:56 PM
I have returned to this same project and wish to extend my data analysis. My two latest code snippets and graphs can be found below.
Here are my questions for each pair:
- In the first snippet and graph, pandas and matplotlib in my Jupyter Notebook now show both categories, thanks to the helpful feedback from other forum members. Thank you to everyone who has contributed to the discussion so far. However, I noticed that when I change the alpha (translucency) value, the bars for the two categories overlap each other. How do I stack the data instead? That’s my first question.
- In the second snippet and graph, only one category shows up (“Magick”). How do I get the other category, “Research”, to show? As far as I can tell, the function and method calls I use to parse, modify, and cast the two dataframes should work. I’ve swapped out variable names, tried refactoring, and made other changes large and small, all without success. Can anyone identify what I might be missing to get both categories to show instead of one? I would also like them to stack rather than overlap, as I am trying to do with the first graph.
First pair:

    import pandas as pd
    import matplotlib.pyplot as plt

    pd.set_option('display.expand_frame_repr', False)

    bulk_df = pd.read_csv('data/all-comments-removed.csv', parse_dates=["From", "To"])
    bulk_df['Duration'] = pd.to_timedelta(bulk_df['Duration'])
    bulk_df['Duration_hours'] = bulk_df['Duration'].dt.total_seconds() / 3600

    # Copy so changes made to python_df do not affect bulk_df and vice versa
    python_df = bulk_df[bulk_df["Activity"] == "Python"].copy()
    python_df.set_index('From', inplace=True)

    # Calculate rolling means using the index now
    python_df['Rolling_Mean_90'] = python_df['Duration_hours'].rolling('90D').mean()
    python_df['Rolling_Mean_182'] = python_df['Duration_hours'].rolling('182D').mean()

    # Copy so changes made to django_df do not affect bulk_df and vice versa
    django_df = bulk_df[bulk_df["Activity"] == "Django"].copy()
    django_df.set_index('From', inplace=True)

    # Calculate rolling means using the index now
    django_df['Rolling_Mean_90'] = django_df['Duration_hours'].rolling('90D').mean()
    django_df['Rolling_Mean_182'] = django_df['Duration_hours'].rolling('182D').mean()

    python_df_Month = python_df['Rolling_Mean_90'].resample('MS').sum()
    django_df_Month = django_df['Rolling_Mean_90'].resample('MS').sum()
    # py_dj_Month_combined = python_df_Month.add(django_df_Month, fill_value=0)

    plt.figure(figsize=(14, 8))
    plt.bar(python_df_Month.index, python_df_Month, label='Python 90-Day Rolling Mean', width=20, alpha=0.5)  # color='red')
    plt.bar(django_df_Month.index, django_df_Month, label='Django 90-Day Rolling Mean', width=20, alpha=0.5)  #, color='blue')
    plt.legend()
    plt.title('Stacked Bar Chart for Python and Django Activities')
    plt.xlabel('Date')
    plt.ylabel('Hours Spent')
    plt.show()

That renders as:
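For comparison, here is a minimal sketch of the stacked result I am aiming for. It assumes that the `bottom` argument of matplotlib's `bar` is the right tool for this. The series `python_month` and `django_month` are made-up stand-ins for my resampled data, and `combined` is a hypothetical name, so please treat this as an illustration and not a verified fix.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly totals standing in for python_df_Month and django_df_Month.
# The two series deliberately cover different months.
python_month = pd.Series(
    [3.0, 2.0, 4.0, 1.0],
    index=pd.date_range("2024-01-01", periods=4, freq="MS"),
)
django_month = pd.Series(
    [1.0, 2.0, 0.5, 3.0],
    index=pd.date_range("2024-02-01", periods=4, freq="MS"),
)

# Align both series on one shared monthly index; a month missing
# from one series becomes 0 so the bars still line up.
combined = pd.concat(
    {"Python": python_month, "Django": django_month}, axis=1
).fillna(0)

fig, ax = plt.subplots(figsize=(14, 8))
ax.bar(combined.index, combined["Python"], width=20, label="Python")
# bottom= starts each Django bar where the Python bar for that month ends
ax.bar(combined.index, combined["Django"], width=20,
       bottom=combined["Python"], label="Django")
ax.legend()
ax.set_xlabel("Date")
ax.set_ylabel("Hours Spent")
```

In a notebook the figure is displayed at the end of the cell; elsewhere a call to `plt.show()` would be needed. With my real data, `combined` would be built from `python_df_Month` and `django_df_Month` instead of the made-up series.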
Second pair:
    import pandas as pd
    import matplotlib.pyplot as plt

    pd.set_option('display.expand_frame_repr', False)

    # Load the data
    bulk_df = pd.read_csv('data/all-comments-removed.csv', parse_dates=["From", "To"])
    bulk_df['Duration'] = pd.to_timedelta(bulk_df['Duration'])
    bulk_df['Duration_hours'] = bulk_df['Duration'].dt.total_seconds() / 3600

    # Copy and filter data for "Magick" activity and calculate rolling means
    magick_df = bulk_df[bulk_df["Activity"] == "Magick"].copy()
    magick_df.set_index('From', inplace=True)
    magick_df['Rolling_Mean_90'] = magick_df['Duration_hours'].rolling('90D').mean()
    magick_df['Rolling_Mean_182'] = magick_df['Duration_hours'].rolling('182D').mean()

    # Copy and filter data for "Research (general)" activity and calculate rolling means
    research_df = bulk_df[bulk_df["Activity"] == "Research (general)"].copy()
    research_df.set_index('From', inplace=True)
    research_df['Rolling_Mean_90'] = research_df['Duration_hours'].rolling('90D').mean()
    research_df['Rolling_Mean_182'] = research_df['Duration_hours'].rolling('182D').mean()

    # Resample data
    magick_df_Month = magick_df['Rolling_Mean_90'].resample('MS').sum()
    research_df_Month = research_df['Rolling_Mean_90'].resample('MS').sum()

    # Plot the combined data with wider bars
    plt.figure(figsize=(12, 6))
    plt.bar(research_df_Month.index, research_df_Month, label='"Research" 90-Day Rolling Mean', width=20, alpha=0.5, color='blue')
    plt.bar(magick_df_Month.index, magick_df_Month, label='"Magick" ("Philosophy") 90-Day Rolling Mean', width=20, alpha=0.5, color='red')
    plt.legend()
    plt.title('Stacked Bar Chart for Magick and Research Activities')
    plt.xlabel('Date')
    plt.ylabel('Hours Spent')
    plt.show()

That shows as:
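One thing I still want to rule out is whether the `"Research (general)"` filter matches any rows at all. Below is a minimal sketch of that check. The small `bulk_df` here is a made-up stand-in for my CSV, and the trailing space in the label is only an assumed cause that I have not confirmed in my real data.

```python
import pandas as pd

# Hypothetical stand-in for bulk_df. The trailing space after
# "Research (general)" is an assumption, not something I have confirmed.
bulk_df = pd.DataFrame({
    "Activity": ["Magick", "Research (general) ", "Magick", "Research (general) "],
    "Duration_hours": [1.0, 2.0, 0.5, 1.5],
})

# Show the exact labels present; repr() makes stray whitespace visible
labels = [repr(label) for label in bulk_df["Activity"].unique()]
print(labels)

# An exact-match filter returns an empty frame, without any error,
# when the stored label differs from the one being compared
exact = bulk_df[bulk_df["Activity"] == "Research (general)"]
print(len(exact))

# Stripping whitespace before comparing recovers the rows in this example
cleaned = bulk_df[bulk_df["Activity"].str.strip() == "Research (general)"]
print(len(cleaned))
```

If `len(research_df)` turns out to be non-zero in my real notebook, then the label is not the problem and the cause must lie somewhere else, for example in how the two sets of bars are drawn over each other.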