 Market Basket Analysis: Finding Association Rules
#1
Hello. I am trying to do a market basket analysis of transaction data. So far I have been able to organize the transaction data into a Pandas dataframe:

# Import libraries
import pandas as pd

# Load the transaction data (.csv) into a pandas DataFrame
df_transactions_2018 = pd.read_csv(r'C:\Users\kroch13\Desktop\MBA_710_2018.csv')

# Group the product IDs of each transaction into a list (one basket per row), then reset the index
df_basket_2018 = df_transactions_2018.groupby('TXN_ID')['PROD_ID'].apply(list).reset_index()

# Print the first 20 rows to confirm the grouping worked
print(df_basket_2018.head(20))
The pandas manipulation groups each PROD_ID into a list under its corresponding TXN_ID, essentially organizing the data into baskets.

runfile('C:/Users/kroch13/.spyder-py3/MBA_2018_710.py', wdir='C:/Users/kroch13/.spyder-py3')


TXN_ID PROD_ID
0 5328071000000 [1492909, 1829122, 732017]
1 5328071000002 [527887, 1903575]
2 5328071000004 [165031]
3 5328071000017 [732017, 173497, 1730121]
4 5328071000018 [1906819, 159076, 1972349]
5 5328071000019 [1456052, 2032012, 1941105, 19081, 459749, 299...
6 5328071000020 [732017, 54041, 1079145]
7 5328071000021 [868316, 1151328, 934249, 1083290, 950649]
8 5328071000022 [1972349]
9 5328071000024 [289050, 1972349, 924523, 912965, 1575825, 149...
10 5328071000026 [1112955, 971470, 1254388, 632059, 1695567, 37...
11 5328071000027 [170011, 1823366, 268097]
12 5328071000028 [213480, 348515, 1705337, 969985]
13 5328071000030 [1163293]
14 5328071000033 [64879, 1997702]
15 5328071000034 [197918, 219682]
16 5328071000035 [365919, 473903, 1912359, 828673, 521257, 9130...
17 5328071000036 [907493, 935161, 886485, 1988773, 1672096, 185...
18 5328071000037 [614694]
19 5328071000038 [1066552, 1645842]

Now that I have loaded my data into Python and organized it by basket, how do I find frequent item sets and generate association rules? I would like to visualize the association rules so that I can see which ones have the highest support, confidence, and lift.

Example:

Rules          Support   Confidence   Lift
{A} --> {B}    .07       1.00         5.00
{C} --> {D}    .07       1.00         5.00

*I do not need rules whose item sets contain more than one item, e.g. {A,B} --> {E}.

Please let me know the best way to implement market basket analysis on a dataframe organized in this way. I am not sure how to continue from here.

I appreciate any help you can provide!

Thanks,
Kyle
#2
Take a look at the Apriori algorithm and the corresponding Python package, efficient-apriori.
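
Since only rules with a single item on each side are wanted, the three measures can also be computed directly by counting items and item pairs. Here is a minimal sketch using only the standard library. It is a brute-force pair count, not the Apriori algorithm and not the efficient-apriori API; the function name `single_item_rules` and its `min_support` parameter are made up for illustration. It uses the usual definitions: support is the share of baskets containing both items, confidence is that count divided by the number of baskets containing the left-hand item, and lift is confidence divided by the share of baskets containing the right-hand item.

```python
from collections import Counter
from itertools import combinations


def single_item_rules(baskets, min_support=0.0):
    """Return (lhs, rhs, support, confidence, lift) for every rule {A} -> {B}.

    `baskets` is an iterable of item lists, one list per transaction.
    This is an illustrative helper, not part of any library.
    """
    # De-duplicate items within a basket so a transaction counts once per item.
    baskets = [set(b) for b in baskets]
    n = len(baskets)

    # How many baskets contain each item, and each unordered pair of items.
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each co-occurring pair yields two directed rules.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))

    # Highest lift first, then confidence, then support.
    rules.sort(key=lambda r: (r[4], r[3], r[2]), reverse=True)
    return rules


# The first five baskets from the printed dataframe above.
baskets = [
    [1492909, 1829122, 732017],
    [527887, 1903575],
    [165031],
    [732017, 173497, 1730121],
    [1906819, 159076, 1972349],
]

for lhs, rhs, support, confidence, lift in single_item_rules(baskets)[:5]:
    print(f"{{{lhs}}} --> {{{rhs}}}: {support:.2f} {confidence:.2f} {lift:.2f}")
```

With the dataframe from the first post, the baskets could be passed in as `df_basket_2018['PROD_ID'].tolist()`. Counting every pair gets slow when baskets are large, which is where a pruning algorithm such as Apriori helps; raising `min_support` here only filters the output, it does not reduce the counting work.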