Python Forum
How to check for nested dataframe density? - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/Forum-Python-Coding)
+--- Forum: Data Science (https://python-forum.io/Forum-Data-Science)
+--- Thread: How to check for nested dataframe density? (/Thread-How-to-check-for-nested-dataframe-density)



How to check for nested dataframe density? - python_newbie09 - Aug-27-2018

I have attached a csv file where this data is being stored as a nested dataframe in a main dataframe which i cannot include in here. main_col is the column from the main dataframe that has the data in this csv file stored in it as a nested df. what I want to achieve is to measure the data density but i am getting an index positional error. The code I am currently using looks like below and I am not sure what is causing the problem.

[attachment=469]

import pandas as pd

df = pd.read_csv('test_data.csv')

def data_density(thresh=None):
    counter = 0
    counter_1 = 0
    ix = []
    for ixn, data in df.iterrows():
        counter = counter + 1
        total_matrix = data['main_col'].loc[:, 'A1']['Game1'].shape[0] * \
                       data['main_col'].loc[:, 'A1']['Game2'].shape[1] + \
                       data['main_col'].loc[:, 'A2']['Game1'].shape[0] * \
                       data['main_col'].loc[:, 'A2']['Game2'].shape[1]
        total_values = data['main_col'].loc[:, 'A1']['Game1'].count().sum() + \
                       data['main_col'].loc[:, 'A2']['Game1'].count().sum()


        if total_values != 0:
            data_density = float(total_values) / float(total_matrix)

            if data_density > threshold:
                counter_1 = counter_1 + 1
                ix.append(ixn)
    ratio = float(counter_1) / counter
    return ix, ratio


df3 = pd.DataFrame()
for i in range(80, 100, 5):
    i = float(i) / 100
    ix, ratio = data_density(thresh=i)
    print('data density for', ratio, 'when threshold is:', i)
    print(len(ix))
    df = pd.DataFrame()
    for j in range(0, len(ix)):
        df2 = df[(df.index == ix[j])]
        df = df.append(df2)
    print(df)
    df3 = df3.append(df)