Mar-09-2019, 02:19 PM
(This post was last modified: Mar-09-2019, 02:19 PM by Jackhammer.)
Hello folks!
I'm totally new to Python (coming from the Java/Scala world) and I'm refactoring a project that does a lot of ETL work using pandas. The project has some classes that extract data from sources like Google AdWords, Analytics, etc.
Right now, I'm starting to refactor the data extraction from Google AdWords. The intern who wrote this part wrote it as a class plus a lot of functions that the class itself doesn't use; they are used in other modules.
For a Java developer it's common to write classes everywhere and use fancy names as namespaces, but I haven't figured out why I should use a class in Python. I can write small, pure functions instead of fancy classes that I'd need to instantiate just to call something like "format my timestamp as string".
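To illustrate what I mean by a small, pure function, here is a minimal sketch. The name `format_timestamp` and the default format string are made up for this example, not taken from the project:

```python
from datetime import datetime


def format_timestamp(ts: datetime, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Format a datetime as a string.

    A plain module-level function (hypothetical name): nothing to
    instantiate, no hidden state, same input always gives same output.
    """
    return ts.strftime(fmt)


print(format_timestamp(datetime(2019, 3, 9, 14, 19)))  # 2019-03-09 14:19:00
```

In Python the module itself already acts as the namespace, so callers can just do `from helpers import format_timestamp` (module name hypothetical) without a wrapper class.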
Going back to my problem, I need to write some code that does the following:
- Get a CSV report from Google Analytics;
- Return a dataframe with new columns;
- Write the data into Redshift (I have a class that encapsulates a lot of the Redshift details).
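The three steps above could be sketched with plain functions roughly like this. This is only a sketch under assumptions: the column names, the derived `ctr` column, and all function names are invented for illustration, the CSV is passed in as text instead of being fetched from Google, and the Redshift writer is represented by any callable that accepts a DataFrame.

```python
import io
from typing import Callable

import pandas as pd


def parse_report(csv_text: str) -> pd.DataFrame:
    """Turn the raw CSV report text into a DataFrame."""
    return pd.read_csv(io.StringIO(csv_text))


def add_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame with a derived column (hypothetical 'ctr').

    assign() returns a copy, so the input frame is left untouched.
    """
    return df.assign(ctr=df["clicks"] / df["impressions"])


def run(csv_text: str, write: Callable[[pd.DataFrame], None]) -> pd.DataFrame:
    """Parse, transform, then hand the result to whatever writer is injected."""
    df = add_columns(parse_report(csv_text))
    write(df)
    return df


# Stand-in data and a stand-in writer (a list's append) for demonstration.
sample = "campaign,clicks,impressions\na,5,100\nb,1,50\n"
written = []
result = run(sample, written.append)
print(result)
```

Because `run` only needs a callable, the existing Redshift class could supply one of its bound methods as `write`, and the transform functions stay testable without any database connection.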
Based on that, I've been reading up on the right approach. Should I write pure functions or keep using a class?
I'm trying to avoid high coupling, which can be common in the Java world.
Any advice, please?