Python Forum

How to increase the data size
Hi,

My dataset has around 100 rows. Of these, 75 belong to the success category and 25 belong to the failure category. I want to increase the dataset to 500 rows while keeping the success-to-failure ratio at 60%:40%. How can I increase the data size?
You're going to have to be more specific. How is your data stored? And what do you mean by increasing your data size? When I want to increase my data, I go out and collect more data. But it sounds like you want to generate fake/imputed data.
I have only a little data, and I cannot collect any more. That's why I want to generate more data (fake data) based on the trends in the data I have, so that I have more rows, then build a model and make predictions. I'm not able to build a model with this little data. My data is stored in a DataFrame.
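For concreteness, here is a minimal sketch of the most naive way to do what is being asked. It uses random oversampling with replacement in pandas and resamples each class separately to 300 success rows and 200 failure rows (60% and 40% of 500). The column names `feature` and `label`, the toy data, and the helper `oversample_to_ratio` are hypothetical stand-ins for the poster's DataFrame. Every output row is a copy of an existing row, so no new information is added.

```python
import pandas as pd

# Hypothetical toy data standing in for the poster's DataFrame:
# 75 "success" rows and 25 "failure" rows with one numeric feature.
df = pd.DataFrame({
    "feature": range(100),
    "label": ["success"] * 75 + ["failure"] * 25,
})


def oversample_to_ratio(df, label_col, target_counts, seed=0):
    """Resample each class with replacement to the requested row count.

    target_counts maps a class label to the desired number of rows.
    Each output row is a duplicate of an input row; nothing new is generated.
    """
    parts = []
    for label, n in target_counts.items():
        group = df[df[label_col] == label]
        parts.append(group.sample(n=n, replace=True, random_state=seed))
    # Concatenate the classes, shuffle the rows, and renumber the index
    out = pd.concat(parts).sample(frac=1, random_state=seed)
    return out.reset_index(drop=True)


total = 500
targets = {"success": round(total * 0.6), "failure": round(total * 0.4)}  # 300 / 200
resampled = oversample_to_ratio(df, "label", targets)

print(len(resampled))  # 500
print(resampled["label"].value_counts())
# Every resampled row comes from the original 100 rows
print(set(resampled["feature"]) <= set(df["feature"]))  # True
```

Because the 25 failure rows are each repeated 8 times on average (200 / 25), a model trained on the result can easily overfit them.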
There are tons of ways to impute data: median/mean values, resampling, conditional resampling, regression, and so on. But these are for handling missing data points within a data set. They are not meant for creating more data beyond what is in the data set. If you don't have enough data for your model, then you don't have enough data for your model. Making up fake data will bias your model toward whatever assumptions you made when you created that fake data.
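To illustrate what imputation is actually for, here is a minimal sketch of median imputation in pandas. The DataFrame, its `temperature` and `label` columns, and the values are hypothetical. The gaps inside existing rows are filled with the column median, and the row count stays the same.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with missing values (NaN) in one column
df = pd.DataFrame({
    "temperature": [21.0, np.nan, 23.5, 22.0, np.nan],
    "label": ["success", "success", "failure", "success", "failure"],
})

# Median imputation: Series.median() skips NaN by default,
# so this is the median of 21.0, 23.5 and 22.0, i.e. 22.0
median = df["temperature"].median()
filled = df.assign(temperature=df["temperature"].fillna(median))

print(filled)
# Imputation fills gaps in existing rows; it does not add rows
print(len(filled) == len(df))  # True
```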