Python Forum
loop function that parses arrays with condition: no redundant data - Printable Version


loop function that parses arrays with condition: no redundant data - amela - Nov-02-2020

For a classification task, I want to split my data 60%/20%/20% on the condition that no data is repeated across the sets: the training set should contain its own unique data, the test set a different set of data, and the validation set yet another new set of data.

Take the contents of the matrix cells whose cell number i equals a number from the region list.
My data table is:

Quote:
region: 5    5    5    6    6    6    6
ID:     124  254  558  541  57   120  212

As illustrated in the example, "5" is in the region list, so we take the unique contents of region 5; the same goes for 6, 7 and so on up to 10 (10 is the final number).
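The lookup described above (for each region, take the unique IDs found in its cells) can be sketched with NumPy. This is a minimal sketch, assuming the two rows of the example table are the region number and the ID of each cell, stored in two 1-D arrays here called `region` and `ids` (illustrative names, not from the post):

```python
import numpy as np

# Example table from the post (assumed layout: row 1 = region, row 2 = ID).
region = np.array([5, 5, 5, 6, 6, 6, 6])
ids = np.array([124, 254, 558, 541, 57, 120, 212])

# For every distinct region, collect the sorted unique IDs of its cells.
ids_by_region = {int(r): np.unique(ids[region == r]) for r in np.unique(region)}

for r, region_ids in ids_by_region.items():
    print(r, region_ids.tolist())
# 5 [124, 254, 558]
# 6 [57, 120, 212, 541]
```

`np.unique` returns the IDs sorted, which is why region 6 lists 57 first.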

My data are NumPy arrays. The code I have in mind is:

Quote:def sets_split(id, name, region):
1. Extract the unique list of regions.
2. Loop over the regions and extract the unique IDs for each region.
3. Randomly shuffle the IDs.
4. Split the ID list into train, valid and test.
5. In the end, get idx_train, idx_valid and idx_test, maybe with np.where(np.isin(id, Train/Valid/Test)).
return train_bands, valid_bands, test_bands, train_label, valid_label, test_label
My data are:

region: an array of 2145 regions;
Quote: list(set(region.flat)) gives a list of 10 values: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ID: a 2D array of 14587 features
Note: my algorithm could be wrong, so please feel free to give me hints.
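The five steps in the pseudocode above can be turned into a runnable function. This is a minimal sketch under several assumptions not stated in the post: `bands`, `labels`, `ids` and `region` are hypothetical names for arrays aligned per sample (row i of `bands` has label `labels[i]`, ID `ids[i]` and region `region[i]`), `ids` and `region` are 1-D, `id` is renamed `ids` because `id` shadows a Python built-in, and the unclear `name` parameter is left out:

```python
import numpy as np


def sets_split(bands, labels, ids, region, train_frac=0.6, valid_frac=0.2, seed=0):
    """Split samples into train/valid/test so that no ID is in two sets.

    The test set receives whatever IDs remain after train and valid.
    """
    rng = np.random.default_rng(seed)

    # Give every unique ID exactly one region (that of its first occurrence),
    # so an ID cannot be assigned to two sets even if it spans regions.
    uniq_ids, first_idx = np.unique(ids, return_index=True)
    id_region = region[first_idx]

    train_ids, valid_ids, test_ids = [], [], []
    # 1. Loop over the unique list of regions.
    for r in np.unique(id_region):
        # 2. Unique IDs of this region (boolean indexing returns a copy).
        region_ids = uniq_ids[id_region == r]
        # 3. Shuffle the IDs in place.
        rng.shuffle(region_ids)
        # 4. Split the ID list into train / valid / test.
        n_train = int(round(train_frac * len(region_ids)))
        n_valid = int(round(valid_frac * len(region_ids)))
        train_ids.extend(region_ids[:n_train])
        valid_ids.extend(region_ids[n_train:n_train + n_valid])
        test_ids.extend(region_ids[n_train + n_valid:])

    # 5. Boolean masks selecting every sample whose ID belongs to each set.
    train_mask = np.isin(ids, train_ids)
    valid_mask = np.isin(ids, valid_ids)
    test_mask = np.isin(ids, test_ids)

    return (bands[train_mask], bands[valid_mask], bands[test_mask],
            labels[train_mask], labels[valid_mask], labels[test_mask])


# Hypothetical toy data: 20 IDs with 3 samples each, in regions 1 and 2.
ids = np.repeat(np.arange(20), 3)
region = ids // 10 + 1
bands = np.arange(len(ids) * 2).reshape(-1, 2)
labels = ids % 2

train_bands, valid_bands, test_bands, train_label, valid_label, test_label = sets_split(
    bands, labels, ids, region)
print(train_bands.shape, valid_bands.shape, test_bands.shape)  # (36, 2) (12, 2) (12, 2)
```

Splitting inside each region keeps every region represented in all three sets, as long as each region has enough IDs; with only a few IDs in a region, rounding can leave one set without any IDs from that region.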


RE: loop function that parses arrays with condition: no redundant data - jefsummers - Nov-03-2020

Use scikit-learn's sklearn.model_selection.train_test_split function twice: once to split off your validation set, then once more to get your train and test sets.

In other words, the wheel has already been invented. Don't write your own routine for this when a fast, debugged one already exists.


RE: loop function that parses arrays with condition: no redundant data - amela - Nov-03-2020

Yes, I found that the split can be done like this:

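Quote:from sklearn.model_selection import train_test_split

# Hold out 20% of the samples as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Take 25% of the remaining 80% as the validation set: 0.25 x 0.8 = 0.2
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)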

My problem is how to ensure that the data in the training set are different from those in the test and validation sets.
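`train_test_split` shuffles individual rows, so it keeps any single row out of two sets, but rows that share an ID can still be split across sets. One way to keep every ID in a single set, swapped in here and not mentioned in the thread, is scikit-learn's `GroupShuffleSplit`, applied twice in the same 20% / 25% pattern with the IDs passed as `groups`. A minimal sketch on hypothetical toy data (20 IDs with 3 rows each):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy data: 20 IDs with 3 samples (rows) each.
ids = np.repeat(np.arange(20), 3)
X = np.arange(len(ids) * 2).reshape(-1, 2)
y = ids % 2

# First split: hold out 20% of the IDs (with all their rows) as the test set.
outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=1)
trainval_idx, test_idx = next(outer.split(X, y, groups=ids))

# Second split: 25% of the remaining IDs become validation (0.25 x 0.8 = 0.2).
inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=1)
train_rel, val_rel = next(inner.split(X[trainval_idx], y[trainval_idx],
                                      groups=ids[trainval_idx]))
# Map positions inside the train+val subset back to positions in X.
train_idx, val_idx = trainval_idx[train_rel], trainval_idx[val_rel]

X_train, X_val, X_test = X[train_idx], X[val_idx], X[test_idx]
y_train, y_val, y_test = y[train_idx], y[val_idx], y[test_idx]

# No ID is shared between any two sets.
print(set(ids[train_idx]) & set(ids[val_idx]), set(ids[train_idx]) & set(ids[test_idx]))  # set() set()
```

Here `test_size` is a fraction of the IDs, not of the rows, so the row counts only come out near 60/20/20 when IDs have similar numbers of rows.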


RE: loop function that parses arrays with condition: no redundant data - jefsummers - Nov-04-2020

Pretty sure it does that.
Review the documentation.
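At the level of rows this can be checked directly: splitting the row positions shows that each row ends up in exactly one set. The same check also shows what it does not cover when several rows share an ID. A minimal sketch on hypothetical toy data (20 IDs with 3 rows each):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 IDs with 3 rows each; split the row positions.
ids = np.repeat(np.arange(20), 3)
rows = np.arange(len(ids))
train_rows, test_rows = train_test_split(rows, test_size=0.2, random_state=1)

# Each row lands in exactly one of the two sets...
print(set(train_rows.tolist()).isdisjoint(test_rows.tolist()))  # True
# ...but rows belonging to the same ID may still end up on both sides.
shared_ids = set(ids[train_rows].tolist()) & set(ids[test_rows].tolist())
print(sorted(shared_ids))
```

Which IDs are shared depends on the random split, so no output is shown for the last line.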


RE: loop function that parses arrays with condition: no redundant data - amela - Nov-05-2020

Is it the cross_validate function?
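`cross_validate` (in `sklearn.model_selection`) does not return one train/validation/test split; it fits and scores an estimator on several train/test splits produced by its `cv` argument. If it is used here, an ID can be kept on one side of every split by passing the IDs as `groups` together with a group-aware splitter such as `GroupKFold`. A minimal sketch on hypothetical random data, with `LogisticRegression` standing in for whatever model is actually used:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_validate

rng = np.random.default_rng(0)
ids = np.repeat(np.arange(20), 3)           # hypothetical: 20 IDs, 3 rows each
X = rng.normal(size=(len(ids), 4))          # hypothetical features
y = ids % 2                                  # hypothetical labels

# GroupKFold keeps all rows of an ID in the same fold, so no ID is ever
# in both the training part and the test part of a split.
cv = GroupKFold(n_splits=5)
scores = cross_validate(LogisticRegression(), X, y, groups=ids, cv=cv)
print(scores["test_score"])                  # one score per fold
```

A held-out test set for the final evaluation would still be split off beforehand, for example with a group-aware split.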