Sep-28-2022, 02:03 PM
I built a time series of the electricity price in Southern Italy and of the two most important commodities (oil and gas) used to produce electrical energy. I put all these data into a DataFrame, detailed as follows:
- First column (PETROIL): daily price of the oil future on day N;
- Second column (GAS): daily price of the gas future on day N;
- Third column (ELECTRICITY): daily price on the Italian day-ahead electricity market.
PETROIL GAS ELECTRICITY
0 64.138395 2.496172 68.608696
1 65.196161 2.482612 113.739130
2 64.982403 2.505938 112.086957
3 64.272606 2.500000 110.043478
4 65.993436 2.521739 95.260870
On this DataFrame I tried to build the correlation matrix with the pandas method .corr() (using the Pearson method) and ran into one big issue:
If I use all 12 years of data, I get:
- almost zero correlation between the electricity and oil prices;
- low correlation (0.12) between the electricity and gas prices.
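For reference, this is roughly how the correlation matrix is computed. It is only a minimal sketch: it uses just the five sample rows shown above rather than the full 12 years, so the numbers it prints are not the ones I quote in this post.

```python
import pandas as pd

# Only the five sample rows from the post; the real DataFrame covers about 12 years.
df = pd.DataFrame(
    {
        "PETROIL": [64.138395, 65.196161, 64.982403, 64.272606, 65.993436],
        "GAS": [2.496172, 2.482612, 2.505938, 2.500000, 2.521739],
        "ELECTRICITY": [68.608696, 113.739130, 112.086957, 110.043478, 95.260870],
    }
)

# Pearson is the default method of DataFrame.corr; it is spelled out for clarity.
corr_matrix = df.corr(method="pearson")
print(corr_matrix)
```

The result is a symmetric 3x3 DataFrame with ones on the diagonal; the values I am interested in are in the ELECTRICITY row.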
When I split the data into separate time ranges instead, the correlations are higher. So I have these two questions:
- Why do I get such a large difference when I split the time range?
- I am doing this analysis in order to use the oil and gas prices to predict the electricity price. Which of the two analyses should I rely on: the first one (low correlation), which covers the entire time range, or the second one (higher correlation), which is split into different time ranges?
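To make the two analyses concrete, here is a minimal sketch on made-up data (not my real prices). The `PERIOD` column, the split into two periods and all the numbers are assumptions chosen for illustration: within each period the electricity price follows the gas price exactly, but the price level shifts between the periods.

```python
import numpy as np
import pandas as pd

# Synthetic data: two periods of 100 days each (hypothetical values).
gas = np.concatenate([np.linspace(2.0, 3.0, 100), np.linspace(4.0, 5.0, 100)])
electricity = np.concatenate(
    [
        50.0 + 20.0 * (gas[:100] - 2.0),  # first period
        46.6 + 20.0 * (gas[100:] - 4.0),  # second period, shifted level
    ]
)
period = np.repeat(["first", "second"], 100)
df = pd.DataFrame({"GAS": gas, "ELECTRICITY": electricity, "PERIOD": period})

# Analysis 1: a single Pearson correlation over the entire time range.
full_corr = df["GAS"].corr(df["ELECTRICITY"])

# Analysis 2: a separate Pearson correlation for each time range.
split_corr = pd.Series(
    {name: g["GAS"].corr(g["ELECTRICITY"]) for name, g in df.groupby("PERIOD")}
)

print("full range:", full_corr)
print(split_corr)
```

In this constructed case the full-range correlation is close to zero while the correlation inside each period is close to 1, which is the same kind of gap I see in my real data.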
Thank you for your answers.