Python Forum

Binning data to files
Hi,
I am trying to develop a Power Curve / Algorithm that can give me a "Possible Max Power" signal when the Solar Panels on my roof are scaled-down in production.

To make this power curve I fetch data every second and get a dataset similar to the one below (made with random.randint).
How could I separate this data into bins (based on Solar_Radiation) so that I can calculate the correlation between PV_Production and PV_Cell_Temp in each bin?
I have been looking around on the internet, but I haven't found anything in pandas that I can use to do this.

Timestamp,PV_Production,Solar_Radiation,PV_Cell_Temp,Ambient_Temp
2020-06-21 13:37:02.934901,0,206,164.8,0
2020-06-21 13:37:02.935898,0,312,124.8,0
2020-06-21 13:37:02.942879,0,234,23.4,0
2020-06-21 13:37:02.943877,0,230,230.0,0
2020-06-21 13:37:02.944874,0,273,218.4,0
2020-06-21 13:37:02.948862,0,317,95.1,0
2020-06-21 13:37:02.951855,0,328,328.0,0
2020-06-21 13:37:02.954847,0,311,0.0,0
Need further specification:
  • What comprises a bin
  • What are the keys
  • What are the data field names
  • description of each field
Sorry about the missing info. What I want my end result to be is a power curve where the production is determined based on the variables:
PV_Production: Total solar panel production [W]
Solar_Radiation: Solar radiation [W/m²]
PV_Cell_Temp: Temperature of the solar panel cells [°C]
Ambient_Temp: Ambient temperature [°C] (not used; PV_Cell_Temp should suffice)

For the bins I imagined that:
- A bin consists of all the data points whose 'Solar_Radiation' falls within an interval of width 100
E.g.
Bin 1: 0 < Solar_Radiation <= 100
(All data from each string where solar radiation is 0..100)
Bin 2: 100 < Solar_Radiation <= 200
Bin 3: 200 < Solar_Radiation <= 300
...
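
One possible way to get bins like the ones described above is pandas' `pd.cut` combined with `groupby`. The following is only a minimal sketch: the sample values are made up for illustration (they are not real measurements), the column name `Radiation_Bin` is hypothetical, and the bin edges 0..400 are an assumption that would need to be extended to cover the real radiation range.

```python
import pandas as pd

# Made-up sample rows for illustration only (not real measurements).
df = pd.DataFrame({
    "PV_Production":   [50, 80, 150, 170, 160, 260, 240, 300],
    "Solar_Radiation": [60, 95, 120, 180, 200, 230, 290, 300],
    "PV_Cell_Temp":    [20.0, 25.0, 30.0, 35.0, 33.0, 40.0, 38.0, 45.0],
})

# Bin edges every 100 W/m2. By default pd.cut builds right-closed
# intervals, i.e. (0, 100], (100, 200], ... as in the description above.
bin_edges = list(range(0, 401, 100))
df["Radiation_Bin"] = pd.cut(df["Solar_Radiation"], bins=bin_edges)

# Correlation between production and cell temperature within each bin.
# observed=True skips bins that contain no rows.
corr_per_bin = {}
for interval, group in df.groupby("Radiation_Bin", observed=True):
    corr_per_bin[interval] = group["PV_Production"].corr(group["PV_Cell_Temp"])

for interval, corr in corr_per_bin.items():
    print(interval, corr)
```

If each bin should end up in its own file, calling `group.to_csv(...)` inside the loop with a file name of your choice would be one way to do it. Note that a Solar_Radiation of exactly 0 falls outside the first bin with these edges, which matches the `0 < Solar_Radiation` condition above.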

What are the keys: What do you mean by key?

I hope the above made sense. I apologize for being very new to data analytics.

Maybe to be even more exact: what I am looking for is some way to take a dataframe.
Example:
String_2 = ([100,24,59,19,588,209,345,288,193,294,298])

And then have some sort of function that could bin that string into a defined interval, let's say every 100:
df_1 = [24,59,19] #[0..100[
df_2 = [100, 193] #[100..200[
df_3 = [209, 288, 294, 298] #[200..300[
df_4 = [345] #[300..400[
df_5 = [] #[400..500[
df_6 = [588] #[500..600[
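
For a plain list like the one above, the binning can also be sketched without pandas by using integer division to find the lower edge of each bin. The function name `bin_values` and its `width` parameter are hypothetical, and the sketch assumes half-open intervals [lower, lower + width[ as in the example.

```python
from collections import defaultdict


def bin_values(values, width=100):
    """Group values into half-open bins [k*width, (k+1)*width[.

    Returns a dict that maps the lower edge of each non-empty bin
    to the list of values that fall into it (original order kept).
    """
    bins = defaultdict(list)
    for value in values:
        lower_edge = (value // width) * width
        bins[lower_edge].append(value)
    return dict(bins)


String_2 = [100, 24, 59, 19, 588, 209, 345, 288, 193, 294, 298]
binned = bin_values(String_2)

# Print every interval up to 600, including the empty ones.
for lower in range(0, 600, 100):
    print(f"[{lower}..{lower + 100}[", binned.get(lower, []))
```

Empty bins are simply absent from the returned dict, so `binned.get(lower, [])` is used when an empty list is wanted for them.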
By key I mean the field that you organize all data sets by (date, for example).
Also, how are you obtaining the data? Can you interface directly to the panels from Python?
Oh, so the key I use is probably going to be a "Count" variable that I will add as the first column of the strings. It will just be incremented by 1 for each new string. I also have the Timestamp, but I reckon a count integer would be a better key.

I am obtaining the data using Pyads (from a Beckhoff PC that gets the data from my PLC, which is connected to the panels). I have set the variables to update every second; however, I think I will change that to a 1-minute average instead to keep the data flow down a bit.
Thank you so much for your time!
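
If the 1-minute averaging mentioned above were done on the Python side rather than on the PLC, pandas' `resample` would be one option. This is a sketch under assumptions: the 1 Hz values are made up for illustration, and the data is assumed to be indexed by the Timestamp column.

```python
import pandas as pd

# Made-up 1 Hz readings for illustration: three minutes of data.
timestamps = pd.date_range(
    "2020-06-21 13:37:00", periods=180, freq=pd.Timedelta(seconds=1)
)
df = pd.DataFrame({"Solar_Radiation": list(range(180))}, index=timestamps)

# Average all readings that fall into the same minute.
per_minute = df.resample("1min").mean()
print(per_minute)
```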