Jul-31-2023, 11:22 AM
I'm writing code to do statistical analysis of ecological data as part of a volunteer research project. Where the independent variables are continuous, I'm planning to use multivariate techniques such as LDA (linear discriminant analysis) or PCA (principal component analysis) with Python libraries. Where the independent variables are categorical, I'm not sure which techniques are best. Does anyone have experience with this, or could anyone offer some advice?
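For the continuous case, a minimal sketch of PCA and LDA with scikit-learn might look like the following. It assumes the iris measurements bundled with scikit-learn as stand-in data (four continuous variables plus a three-level group label); the variable names are illustrative, not taken from any particular project. PCA ignores the group labels, while LDA uses them to find the axes that best separate the groups.

```python
# Minimal sketch: PCA (unsupervised) and LDA (supervised) on continuous
# variables. The iris data bundled with scikit-learn is a stand-in only.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# as_frame=True returns pandas objects: X is a DataFrame, y a Series of group codes
X, y = load_iris(return_X_y=True, as_frame=True)

# PCA: standardize first so variables on different scales contribute
# equally, then keep the first two principal components
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
pca_scores = pca_pipeline.fit_transform(X)
pca = pca_pipeline.named_steps["pca"]
print("PCA explained variance ratio:", pca.explained_variance_ratio_)

# LDA: uses the known group labels; it yields at most (n_groups - 1) axes
lda = LinearDiscriminantAnalysis(n_components=2)
lda_scores = lda.fit_transform(X, y)
print("LDA explained variance ratio:", lda.explained_variance_ratio_)
print("LDA training accuracy:", lda.score(X, y))
```

The training accuracy printed at the end is optimistic because it is measured on the same data used to fit the model; cross-validation would give a fairer estimate.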
The biologists and zoologists I'm working with are experts in their field but have no background in statistics or computing. The other ecologists and zoologists I know tend to use R, but my background is more in .NET and Python.
Given the power of Python's scientific data analysis libraries (particularly pandas, NumPy, SciPy and scikit-learn), its plotting libraries (particularly matplotlib, seaborn and plotnine) and supporting tools such as Jupyter Notebook and JupyterLab, I was planning to do the analysis in Python.
In the area we're studying there aren't many a priori theories that can be expressed as equations, so I expect the analysis to be mainly exploratory, with many of the results displayed graphically.
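As an example of the kind of exploratory graphic this might involve, here is a minimal sketch of a PCA score plot with matplotlib. It again assumes the iris data bundled with scikit-learn as stand-in data; in practice you would substitute your own DataFrame of continuous variables and a grouping column.

```python
# Minimal sketch: exploratory PCA score plot, points coloured by group.
# The iris data bundled with scikit-learn is a stand-in only.
import matplotlib

matplotlib.use("Agg")  # render off-screen; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris(as_frame=True)
X = iris.data  # DataFrame of continuous measurements
groups = pd.Categorical.from_codes(iris.target, iris.target_names)

# Standardize, then project onto the first two principal components
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
scores = pipeline.fit_transform(X)
ratios = pipeline.named_steps["pca"].explained_variance_ratio_

fig, ax = plt.subplots(figsize=(6, 5))
for name in groups.categories:
    mask = groups == name  # boolean array selecting this group's rows
    ax.scatter(scores[mask, 0], scores[mask, 1], label=name, alpha=0.7)

# Label each axis with the share of total variance it captures
ax.set_xlabel(f"PC1 ({ratios[0]:.0%} of variance)")
ax.set_ylabel(f"PC2 ({ratios[1]:.0%} of variance)")
ax.legend(title="Group")
```

Calling `fig.savefig("pca_scores.png")` would write the plot to a file; in a notebook the figure is displayed automatically.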
For multivariate analysis in Python, I found this tutorial:
https://github.com/gatsoulis/a_little_bo...ysis.ipynb
It was written using older versions of the libraries and didn't work with the current versions, so I amended it to work with them. However, a Python expert working in molecular biology and genomics pointed out that the tutorial manually codes calculations that more recent versions of pandas and scikit-learn can do out of the box.
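To illustrate the kind of calculation pandas handles directly (this is not a reconstruction of the tutorial's own code), here is a minimal sketch of per-group summary statistics, a correlation matrix and standardization, each in one or two lines. It assumes the iris data bundled with scikit-learn as a stand-in for field data.

```python
# Minimal sketch: summary statistics pandas computes directly, using the
# iris data bundled with scikit-learn as a stand-in for field data.
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.data.copy()  # DataFrame of the four continuous measurements
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)
features = iris.feature_names  # list of measurement column names

# Mean and standard deviation of every variable within each group
group_summary = df.groupby("species", observed=True)[features].agg(["mean", "std"])

# Pearson correlation matrix of the continuous variables
correlations = df[features].corr()

# Standardize each variable to mean 0 and (sample) standard deviation 1
standardized = (df[features] - df[features].mean()) / df[features].std()

print(group_summary.round(2))
print(correlations.round(2))
```

For standardization inside a modelling workflow, scikit-learn's `StandardScaler` does the same job but uses the population standard deviation (`ddof=0`) rather than pandas' sample default, so the values differ very slightly.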
Unfortunately, I haven't been able to find many other resources on multivariate analysis in Python.
It would be great to get input and some pointers from other people using Python in ecology and zoology!

