 Market Basket Analysis: Finding Association Rules
#1
Hello. I am trying to do a market basket analysis of transaction data. So far I have been able to organize the transaction data into a Pandas dataframe:

#Import Libraries 
import pandas as pd

#Load the transaction data (.csv) into a Pandas dataframe
df_transactions_2018 = pd.read_csv(r'C:\Users\kroch13\Desktop\MBA_710_2018.csv')

#Manipulate the transaction dataframe into a basket dataframe, reset index
df_basket_2018 = df_transactions_2018.groupby('TXN_ID').PROD_ID.apply(list).reset_index()

#Print head of dataframe to confirm proper manipulation 
print(df_basket_2018.head(20)) 
Grouping with pandas collects the PROD_ID values into a list for each TXN_ID, which organizes the data into baskets:

runfile('C:/Users/kroch13/.spyder-py3/MBA_2018_710.py', wdir='C:/Users/kroch13/.spyder-py3')


TXN_ID PROD_ID
0 5328071000000 [1492909, 1829122, 732017]
1 5328071000002 [527887, 1903575]
2 5328071000004 [165031]
3 5328071000017 [732017, 173497, 1730121]
4 5328071000018 [1906819, 159076, 1972349]
5 5328071000019 [1456052, 2032012, 1941105, 19081, 459749, 299...
6 5328071000020 [732017, 54041, 1079145]
7 5328071000021 [868316, 1151328, 934249, 1083290, 950649]
8 5328071000022 [1972349]
9 5328071000024 [289050, 1972349, 924523, 912965, 1575825, 149...
10 5328071000026 [1112955, 971470, 1254388, 632059, 1695567, 37...
11 5328071000027 [170011, 1823366, 268097]
12 5328071000028 [213480, 348515, 1705337, 969985]
13 5328071000030 [1163293]
14 5328071000033 [64879, 1997702]
15 5328071000034 [197918, 219682]
16 5328071000035 [365919, 473903, 1912359, 828673, 521257, 9130...
17 5328071000036 [907493, 935161, 886485, 1988773, 1672096, 185...
18 5328071000037 [614694]
19 5328071000038 [1066552, 1645842]

Now that I have loaded my data into Python and organized it by basket, how do I find frequent item sets and generate association rules? I would like the rules visualized so that I can see which associations have the highest support, confidence, and lift.

Example:

Rules Support: Confidence: Lift:
{A} --> {B}: .07 1.00 5.00
{C} --> {D}: .07 1.00 5.00

*I do not need rules with more than one item on either side, e.g. {A,B} --> {E}.

Please let me know the best way to implement market basket analysis on a dataframe organized this way. I am not sure how to continue from here.

I appreciate any help you can provide!

Thanks,
Kyle
#2
Take a look at the Apriori algorithm and the corresponding Python package, efficient-apriori.
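Since only single-item rules of the form {A} --> {B} are needed, the counting can also be sketched by hand with the standard library. The following is a rough illustration, not the efficient-apriori API: the function name pairwise_rules and its min_support / min_confidence parameters are made up for this example, and the sample baskets are just the first five rows printed in the question. It uses the usual definitions: support is the share of baskets containing both items, confidence is that count divided by the count of baskets containing A, and lift is confidence divided by the share of baskets containing B.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(baskets, min_support=0.01, min_confidence=0.5):
    """Hypothetical helper: single-item rules {A} -> {B} with their metrics."""
    n = len(baskets)
    item_counts = Counter()   # number of baskets containing each item
    pair_counts = Counter()   # number of baskets containing each item pair
    for basket in baskets:
        items = sorted(set(basket))  # count an item once per basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair gives two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence < min_confidence:
                continue
            lift = confidence / (item_counts[rhs] / n)
            rules.append({"lhs": lhs, "rhs": rhs, "support": support,
                          "confidence": confidence, "lift": lift})
    # Strongest associations first.
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


# Sample baskets copied from rows 0-4 of the printed dataframe.
# With the real data this could be df_basket_2018['PROD_ID'].tolist().
baskets = [
    [1492909, 1829122, 732017],
    [527887, 1903575],
    [165031],
    [732017, 173497, 1730121],
    [1906819, 159076, 1972349],
]

for rule in pairwise_rules(baskets, min_support=0.2, min_confidence=0.5):
    print(f"{{{rule['lhs']}}} --> {{{rule['rhs']}}}: "
          f"{rule['support']:.2f} {rule['confidence']:.2f} {rule['lift']:.2f}")
```

The returned list of dicts can be passed to pd.DataFrame(rules) and sorted by any of the three columns to get a table like the one in the question. Counting every pair gets expensive on large baskets, which is where the pruning done by Apriori (and by a package such as efficient-apriori) helps.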