Python Forum

Hello,
I'm working on a project where I want to code a random forest from scratch (to learn both how to code and the details and nuances of this type of machine learning). I know there are libraries that make this easy, but I want to build it by hand and learn.

WM is a global list of lists: WM[d] holds the nodes at depth d. Here d stands for the depth and r for the row, and each (depth, row) pair identifies a node, represented as a list containing the parameters of that node.
I access a node with WM[d][r] and an element of a node with WM[d][r][elementID].

Part of my code looks something like this:

for r in range(len(WM[d])):
    train_node(f, t, d, r)

In the first part, I have to loop over WM to create all the nodes. train_node() contains the code to create the next nodes if need be, in the form of

def train_node(f, t, d, r):
    global WM
    if stuff:
        WM[d].append([])  # create a new row element at that depth
        WM[d][-1] = [param1, param2...]
    return 'I love Yaks'

However, my logic seems to be flawed: a first node is created, but the loop never reaches the new nodes, so the nodes it created never get their parameters.
As I understand it, the for loop evaluates range(len(WM[d])) once to get the range for r, and if the list grows during the loop the range is not reevaluated, meaning I won't train nodes that weren't there when the loop started.

Am I right?
Is there any way to do what I'm trying to do cleanly? (I'm far from an experienced programmer, I'm basically self-taught, and this project has the objective of continuing to teach myself :) )
I suppose I can switch to while loops to do what I want to do here (is that even true?), but I was really interested in understanding what is going on exactly.
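The behaviour described above can be checked with a small experiment. This is only a minimal sketch: the list rows is a made-up stand-in for one depth of WM, and the "split" rule is hypothetical.

```python
# Toy stand-in for WM[d]: each node is a list of parameters (values are made up).
rows = [["root"], ["left"], ["right"]]

visited = []
for r in range(len(rows)):      # range(3) is built once, before the first pass
    visited.append(r)
    if r == 0:
        # Hypothetical rule: the first node creates one new row at this depth.
        rows.append(["new node"])

# The list grew to 4 rows, but the loop only saw the 3 indices that existed
# when range(len(rows)) was evaluated.
print(visited)
print(len(rows))
```

The range object is created from the length at the moment the for statement starts, so rows appended later are never indexed by this loop.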

Quote:For looping over a list, editing the list from inside the loop?

The general rule of thumb is to not edit a sequence as you're iterating over it. An option is to iterate over a shallow copy made with the [:] syntax, or over a deep copy.
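As a rough illustration of that rule of thumb (the lists here are made up and are not the poster's WM), removing items from a list while iterating over it skips elements, whereas iterating over a shallow copy does not. The sketch also shows that a shallow copy only duplicates the outer list, while copy.deepcopy duplicates the inner lists too.

```python
import copy

# Editing the list being iterated: the second 1 is skipped.
items = [1, 1, 2]
for x in items:
    if x == 1:
        items.remove(x)

# Iterating over a shallow copy: every original element is visited.
safe_items = [1, 1, 2]
for x in safe_items[:]:
    if x == 1:
        safe_items.remove(x)

# Shallow vs deep copy of a list of lists.
nodes = [[1], [2], [3]]
shallow = nodes[:]              # new outer list, same inner lists
deep = copy.deepcopy(nodes)     # new outer list and new inner lists

print(items)                    # one 1 is left behind
print(safe_items)               # both 1s removed
print(shallow[0] is nodes[0])   # inner list is shared
print(deep[0] is nodes[0])      # inner list is duplicated
```

A shallow copy costs one extra outer list of references, so for a large structure it is much cheaper than a deep copy.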

I understand, and I recall that from courses I took long ago. However, in this case I can't know in advance how many nodes there will be, nor how they will be connected, etc. More than that, I don't need to know: the program is supposed to work it out by itself if I follow the concept of random forests.
It feels extremely inefficient to redo a loop over each forest, then each tree, then each depth, then each row every time I create a new node.
Also, the WM object, containing all the information of the nodes, will be a very big object, and will be the one that limits my RAM usage (in a future implementation, after getting a single forest to work, I'm planning to have n+6 forests, n being dependent on the size of the data to analyze, each forest containing trees of 2^depth nodes).

For all these reasons, I felt compelled to store the data in one object and never duplicate it, working directly in that object to create all the parameters and using the same object to make predictions afterwards (I know I'm not forced to do that now, but I plan to expand later on the code I'm writing now).
I am aware there is likely another way to do what I'm trying to do, and that's the reason I'm asking here: to get some pointers so that I can find these alternatives :) Also, I'm not explaining all this to say that I'm right, it's just to explain more of the context and why I'm doing it in this weird way.

Anyways, thanks for having taken the time to answer.

Krookroo

I ended up just rewriting everything with while loops instead. Works like a charm.
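For readers wondering what such a while-loop version might look like, here is a minimal sketch. It is not the poster's actual code: train_depth, max_rows and the growth rule are hypothetical, chosen only to show that the loop condition re-reads len(rows) on every pass, so rows appended during the loop are still visited and no copy of the list is made.

```python
def train_depth(rows, max_rows):
    """Visit every row in place, including rows appended while looping."""
    r = 0
    while r < len(rows):                 # len(rows) is re-read on every pass
        rows[r].append("trained")        # stand-in for setting node parameters
        # Hypothetical growth rule: each visited node adds one new row
        # until the depth holds max_rows nodes.
        if len(rows) < max_rows:
            rows.append([f"child of {r}"])
        r += 1
    return rows


rows = [["root"]]
train_depth(rows, max_rows=4)
for row in rows:
    print(row)
```

Because the index is compared against the current length each time, the loop ends only once every row, old or new, has been processed.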

Krookroo
