Python Forum

Customizing an sklearn submodule with Cython
I'd like to create a custom DecisionTreeRegressor to use with sklearn's RandomForestRegressor. To get the desired effect, however, I also need a custom Splitter, the component that decides how the samples at each node are split (and therefore how the training data ends up divided among the leaf nodes). In sklearn, the Splitter is written in Cython. Here are the relevant sklearn files:

RandomForestRegressor (python): https://github.com/scikit-learn/scikit-l..._forest.py
DecisionTreeRegressor (python): https://github.com/scikit-learn/scikit-l...classes.py
Splitter (cython): https://github.com/scikit-learn/scikit-l...litter.pyx and https://github.com/scikit-learn/scikit-l...litter.pxd
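For context on what the Splitter is responsible for, here is a pure-Python sketch of the exhaustive search a "best" splitter performs with a squared-error criterion: for each feature, sort the samples, try a threshold midway between each pair of adjacent distinct values, and keep the split whose two children have the lowest total squared error. This is only an illustration under simplifying assumptions; `sse` and `best_split` are hypothetical names, not sklearn's Cython API, and sklearn's BestSplitter additionally samples features (`max_features`) and updates its Criterion incrementally instead of recomputing every sum.

```python
def sse(values):
    """Sum of squared deviations from the mean (squared-error impurity times n)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, min_samples_leaf=1):
    """Return (feature, threshold, impurity) for the split that minimises the
    total squared error of the two children, or None if no valid split exists.

    X is a list of rows (lists of floats), y a list of targets.
    """
    best = None
    n_samples, n_features = len(y), len(X[0])
    for f in range(n_features):
        # Sample indices ordered by this feature's value.
        order = sorted(range(n_samples), key=lambda i: X[i][f])
        # k = number of samples sent to the left child.
        for k in range(min_samples_leaf, n_samples - min_samples_leaf + 1):
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # no threshold can separate equal feature values
            left = [y[i] for i in order[:k]]
            right = [y[i] for i in order[k:]]
            impurity = sse(left) + sse(right)
            if best is None or impurity < best[2]:
                best = (f, (lo + hi) / 2, impurity)
    return best
```

A custom splitter would change this search (for example the score being minimised or the candidate thresholds); in sklearn that logic lives in the Cython Splitter and Criterion classes, which is why editing it means recompiling .pyx files.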

The approach that makes intuitive sense to me is to copy the _forest.py file and the entire tree submodule, edit the copies to customize the relevant classes, and then recompile. However, I want to make sure I compile in a way that is consistent with the rest of sklearn. The problem is that I'm not sure exactly how sklearn compiles its Cython files, and I can't reproduce the compilation with the standard methods (https://cython.readthedocs.io/en/latest/...orial.html) without getting errors. Looking at my local sklearn installation, I see a number of .so files that are not in the GitHub repo. These appear to be generated by the setup.py file in the tree submodule (https://github.com/scikit-learn/scikit-l...e/setup.py).

One thing worth mentioning is that someone using sklearn doesn't have to go through a manual compilation step. With that in mind, does anyone know of a way to compile customized Cython code from within a Python file (i.e., without extra command-line steps, similar to how sklearn apparently does it) that also makes recompilation easy whenever a Cython file is edited?
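(For what it's worth, sklearn's users skip compilation because the published wheels already contain the compiled .so/.pyd files.) One stdlib-only way to get "recompile when a Cython file changes" from inside a Python script is sketched below, assuming a setuptools-based setup.py that builds the copied .pyx files sits in `setup_dir`. `needs_rebuild` and `rebuild_if_needed` are hypothetical helpers: they compare file modification times and, if any .pyx/.pxd is newer than the compiled module next to it, run `setup.py build_ext --inplace` in a subprocess.

```python
import importlib.machinery
import subprocess
import sys
from pathlib import Path


def compiled_files(pyx):
    """Compiled extension files next to `pyx`, e.g. _splitter.cpython-311-x86_64-linux-gnu.so."""
    candidates = (pyx.with_suffix(s) for s in importlib.machinery.EXTENSION_SUFFIXES)
    return [p for p in candidates if p.exists()]


def needs_rebuild(pkg_dir):
    """True if some .pyx in pkg_dir was never compiled or is older than the newest source."""
    pkg_dir = Path(pkg_dir)
    pyx_files = list(pkg_dir.glob("*.pyx"))
    sources = pyx_files + list(pkg_dir.glob("*.pxd"))
    if not sources:
        return False
    # Conservative: any edited .pyx/.pxd makes every module stale, because
    # .pxd files are cimported across modules.
    newest_source = max(p.stat().st_mtime for p in sources)
    for pyx in pyx_files:
        built = compiled_files(pyx)
        # Compare against the newest compiled copy of this module.
        if not built or max(p.stat().st_mtime for p in built) < newest_source:
            return True
    return False


def rebuild_if_needed(pkg_dir, setup_dir="."):
    """Run `python setup.py build_ext --inplace` in setup_dir when sources are stale."""
    if not needs_rebuild(pkg_dir):
        return False
    subprocess.run(
        [sys.executable, "setup.py", "build_ext", "--inplace"],
        cwd=setup_dir,
        check=True,  # raise CalledProcessError if compilation fails
    )
    return True
```

Calling `rebuild_if_needed("my_tree")` at the top of the training script, before importing from the (hypothetical) `my_tree` package, keeps the compiled modules in sync with edits. Cython's own `pyximport` does a similar timestamp-based rebuild on import (e.g. `pyximport.install(setup_args={"include_dirs": numpy.get_include()})`) and may be simpler for single modules; I'm less sure how well it copes with a multi-file package like the tree submodule.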