Python Forum
Binning data to files - Printable Version




Binning data to files - Kappel - Jun-21-2020

Hi,
I am trying to develop a power curve / algorithm that can give me a "Possible Max Power" signal when the solar panels on my roof have their production scaled down.

To make this power curve I fetch data every second, which gives me a dataset similar to the one below (made with random.randint).
How could I separate this data into bins (based on Solar_Radiation) so that I can calculate the correlation between PV_Production and PV_Cell_Temp in each bin?
I have been looking around on the internet, but I haven't found anything in Pandas that I can use to do this.

Timestamp,PV_Production,Solar_Radiation,PV_Cell_Temp,Ambient_Temp
2020-06-21 13:37:02.934901,0,206,164.8,0
2020-06-21 13:37:02.935898,0,312,124.8,0
2020-06-21 13:37:02.942879,0,234,23.4,0
2020-06-21 13:37:02.943877,0,230,230.0,0
2020-06-21 13:37:02.944874,0,273,218.4,0
2020-06-21 13:37:02.948862,0,317,95.1,0
2020-06-21 13:37:02.951855,0,328,328.0,0
2020-06-21 13:37:02.954847,0,311,0.0,0


RE: Binning data to files - Larz60+ - Jun-21-2020

Need further specification:
  • What comprises a bin
  • What are the keys
  • What are the data field names
  • Description of each field



RE: Binning data to files - Kappel - Jun-22-2020

Sorry about the missing info. What I want as my end result is a power curve where the production is determined from these variables:
PV_Production: Total solar panel production [W]
Solar_Radiation: Solar radiation [W/m²]
PV_Cell_Temp: Temperature of the solar panel cells [°C]
Ambient_Temp: Ambient temperature [°C] (not used; PV_Cell_Temp should suffice)

For the bins I imagined that:
- A bin consists of all the data points whose 'Solar_Radiation' falls within the same interval of width 100
E.g.
Bin 1: 0 < Solar_Radiation <= 100
(All data from each string where solar radiation is 0..100)
Bin 2: 100 < Solar_Radiation <= 200
...
Bin 3...
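
One possible way to get the bins described above is `pandas.cut`, which by default produces right-closed intervals such as (0, 100], matching "0 < Solar_Radiation <= 100". The following is only a minimal sketch: the column names come from this post, but the data is synthetic, and the upper limit of 1000 W/m² and the helper `bin_correlation` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): column names follow the post, values are made up.
rng = np.random.default_rng(0)
n = 500
radiation = rng.integers(1, 1001, size=n)    # W/m², 1..1000
cell_temp = rng.uniform(10, 60, size=n)      # °C
production = 5.0 * radiation - 20.0 * cell_temp + rng.normal(0, 50, size=n)

df = pd.DataFrame({
    "PV_Production": production,
    "Solar_Radiation": radiation,
    "PV_Cell_Temp": cell_temp,
})

# Bin edges 0, 100, ..., 1000; right=True gives (0, 100], (100, 200], ...
edges = np.arange(0, 1100, 100)
df["Radiation_Bin"] = pd.cut(df["Solar_Radiation"], bins=edges, right=True)


def bin_correlation(group):
    """Pearson correlation between production and cell temperature in one bin."""
    return group["PV_Production"].corr(group["PV_Cell_Temp"])


# One correlation value per populated bin, keyed by the bin interval.
corr_per_bin = {
    interval: bin_correlation(group)
    for interval, group in df.groupby("Radiation_Bin", observed=True)
}

for interval, corr in corr_per_bin.items():
    print(interval, round(corr, 3))
```

Each `group` in the loop is itself a DataFrame, so it could also be written to its own file with `group.to_csv(...)` if one file per bin is wanted.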

What are the keys: What do you mean by key?

I hope the above makes sense. I apologize for being very new to data analytics.

Maybe to be even more exact: what I am looking for is some way to take a dataframe,
for example:
String_2 = ([100,24,59,19,588,209,345,288,193,294,298])

And then have some sort of function that could bin that string into defined intervals, let's say every 100:
df_1 = [24,59,19] #[0..100[
df_2 = [100, 193] #[100..200[
df_3 = [209, 288, 294, 298] #[200..300[
df_4 = [345] #[300..400[
df_5 = [] #[400..500[
df_6 = [588] #[500..600[


RE: Binning data to files - Larz60+ - Jun-22-2020

By key I mean the field that you organize all data sets by (date, for example).
Also, how are you obtaining the data? Can you interface directly with the panels from Python?


RE: Binning data to files - Kappel - Jun-22-2020

Oh, so the key I use is probably going to be a "Count" variable that I will add as the first column of the strings; it will just be incrementing (+= 1). I also have the Timestamp, but I reckon a count integer would be a better key.

I am obtaining the data using pyads (from a Beckhoff PC that gets the data from my PLC, which is connected to the panels). I have set the variables to update every second; however, I think I will change that to a 1-minute average to keep the data flow down a bit.
Thank you so much for your time!
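
If the 1-minute averaging mentioned above is done on the Python side rather than in the PLC, one option might be `DataFrame.resample` on a timestamp index. A minimal sketch with made-up one-second samples follows; only the column name `Solar_Radiation` is taken from the thread, everything else is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up data (assumption): three minutes of one-second samples.
index = pd.date_range(
    "2020-06-21 13:37:00", periods=180, freq=pd.Timedelta(seconds=1)
)
df = pd.DataFrame({"Solar_Radiation": np.arange(180, dtype=float)}, index=index)

# Average all samples that fall within each 1-minute window.
per_minute = df.resample("1min").mean()
print(per_minute)
```

The result has one row per minute, labelled by the start of each window.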