Python Forum

Full Version: Formatting data based on DataFrames values
Hey everyone,

I'm trying to organise and filter data in order to plot variables against each other.
I have a DataFrame with two columns, x and y. I'm trying to draw box plots from the data, but first I want to separate the x values into subgroups and then assign each y value to the subgroup its x value falls in. For example, the subgroups of x are 0-10, 10-20, 20-30, etc. If a row in the original DataFrame has an x value of 15, the new DataFrame should contain that row's y value along with the subgroup it belongs to (10-20).

I've attached an image to hopefully explain the problem better. [Image: jX1C4HL]

So far I've imported the CSV file and created a new DataFrame from the columns I need. I'm not sure how to check which subgroup a value of x belongs to, or what the correct way to go about it is.

I was thinking of making a new list with the subgroups, from 0-10 up to 90-100 in steps of 10, and then attempting to write an if statement.
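A rough sketch of that idea in plain Python might look like the following. It is only an illustration: `bin_label` is a hypothetical helper name, and it assumes the values lie between 0 and 100, that each subgroup includes its lower edge, and that exactly 100 belongs to 90-100.

```python
def bin_label(value, width=10, upper=100):
    """Return a label such as '10-20' for the subgroup containing value.

    Hypothetical helper: assumes 0 <= value <= upper, that each subgroup
    includes its lower edge, and that the top edge falls in the last subgroup.
    """
    if value < 0 or value > upper:
        raise ValueError(f"{value} is outside 0-{upper}")
    # Integer division gives the subgroup index; clamp it so that
    # the top edge (100) lands in 90-100 rather than a non-existent 100-110.
    index = min(int(value // width), upper // width - 1)
    low = index * width
    return f"{low}-{low + width}"


# Made-up x values, just to show the mapping
labels = [bin_label(v) for v in [3, 15, 20, 99.5, 100]]
print(labels)  # ['0-10', '10-20', '20-30', '90-100', '90-100']
```

Using arithmetic on the value avoids writing ten separate if branches, although how the edges are treated (for example, whether 20 counts as 10-20 or 20-30) is a choice that depends on the data.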

import pandas as pd

# importing the main dataset
dataset1 = pd.read_csv('dataset1.csv', index_col=0)

# creating a DataFrame with columns x and y (column labels must be quoted strings)
xydata = dataset1.loc[:, ['x', 'y']]

# creating labels for the subgroups of x values
wd_bins = ["0-10", "10-20", "20-30", "30-40", "40-50", "50-60", "60-70", "70-80", "80-90", "90-100"]
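
One possible way to get from here to the grouped DataFrame is pandas' `pd.cut`, which assigns each value to an interval. The sketch below is not tested against the real file: the small `xydata` frame is made-up stand-in data, the column name `x_group` is an assumption, and `right=False` is a choice that makes each subgroup include its lower edge.

```python
import pandas as pd

# Made-up stand-in for the x and y columns of dataset1.csv
xydata = pd.DataFrame({"x": [3, 15, 27, 15.5, 99],
                       "y": [1.2, 0.7, 3.4, 2.2, 5.0]})

edges = list(range(0, 101, 10))                      # 0, 10, ..., 100
wd_bins = [f"{lo}-{lo + 10}" for lo in edges[:-1]]   # "0-10", ..., "90-100"

# right=False makes the intervals [0, 10), [10, 20), ..., [90, 100).
# With these settings, values outside the range (and exactly 100) become NaN.
xydata["x_group"] = pd.cut(xydata["x"], bins=edges, labels=wd_bins, right=False)

# New DataFrame holding each y value next to the subgroup its x falls in
grouped = xydata[["x_group", "y"]]
print(grouped)
```

The `x_group` column can then serve as the grouping key when plotting, for example through the `by` argument of `DataFrame.boxplot`.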
I am relatively new to Python, so any help or advice would be greatly appreciated.

Thanks!