DecisionTree, DecisionTreeModel, and RandomForest should be in three (or two) different repositories #2

Open
olekscode opened this issue Jan 22, 2020 · 2 comments


@olekscode
Member

olekscode commented Jan 22, 2020

I believe that DecisionTree can be used not only for machine learning, but also for many other applications (for example, to contain expert knowledge). So it would be nice to have it as a standalone project that lives in a separate repository.

Then there is DecisionTreeModel - a machine learning model that is used for building decision trees. This should be a separate repository that contains an abstract class DecisionTreeModel and several algorithms implemented as subclasses:

  • C4.5
  • ID3
  • etc.

DecisionTreeModel repository should depend on DecisionTree repository.
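
To make the proposed split concrete, here is a minimal sketch of how the two layers could relate. Python is used purely for illustration, and every name other than `DecisionTree` and `DecisionTreeModel` is hypothetical (`decide`, `fit`, and `MajorityStump`, which is only a trivial stand-in for a real algorithm such as ID3 or C4.5).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


# --- would live in the standalone DecisionTree repository ---
@dataclass
class DecisionTree:
    """A plain tree of decisions; it knows nothing about machine learning."""
    label: Any = None                            # decision stored at a leaf
    test: Optional[Callable[[Any], Any]] = None  # maps an input to a branch key
    children: Dict[Any, "DecisionTree"] = field(default_factory=dict)

    def decide(self, item: Any) -> Any:
        # A node without a test is a leaf and simply returns its label.
        if self.test is None:
            return self.label
        return self.children[self.test(item)].decide(item)


# --- would live in the DecisionTreeModel repository, depending on the one above ---
class DecisionTreeModel(ABC):
    """Abstract learner whose only job is to build a DecisionTree from data."""

    @abstractmethod
    def fit(self, rows: list, labels: list) -> DecisionTree:
        ...


class MajorityStump(DecisionTreeModel):
    """Hypothetical placeholder subclass: a single leaf with the majority label."""

    def fit(self, rows: list, labels: list) -> DecisionTree:
        majority = max(set(labels), key=labels.count)
        return DecisionTree(label=majority)


# Expert knowledge encoded by hand, with no learning involved.
expert_tree = DecisionTree(
    test=lambda patient: patient["fever"],
    children={
        True: DecisionTree(label="see a doctor"),
        False: DecisionTree(label="rest"),
    },
)

# The same tree type produced by a model.
learned_tree = MajorityStump().fit([[1], [2], [3]], ["a", "b", "a"])
```

The point of the sketch is the direction of the dependency: `DecisionTree` is usable without any model, while every model returns a `DecisionTree`.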

Finally, I don't remember well how RandomForest works, but if it is the ensembling algorithm that averages the output of several DecisionTreeModels, then you can put it into a separate repository and add a dependency on DecisionTreeModel.

However, if it is the kind of DecisionTreeModel that generates random decision trees and averages their outputs, then I would make it a subclass of the abstract DecisionTreeModel and keep it in the same repository.
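
The first of these two options (RandomForest as an ensemble that uses tree models rather than being one) could look roughly like the sketch below. It is an illustration under simplifying assumptions, not a full random forest: it only bootstraps rows, does not sample features, and combines outputs by majority vote, which is the classification counterpart of averaging. Python is used only for illustration, and `Predictor`, `train_tree`, and `majority_trainer` are hypothetical names.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical stand-in: a trained tree is just a function from a row to a label.
Predictor = Callable[[Sequence], str]


class RandomForest:
    """Ensemble variant: holds a trainer for base models instead of subclassing it."""

    def __init__(self, train_tree: Callable[[list, list], Predictor],
                 n_trees: int = 5, seed: int = 0) -> None:
        self.train_tree = train_tree
        self.n_trees = n_trees
        self.rng = random.Random(seed)
        self.trees: List[Predictor] = []

    def fit(self, rows: list, labels: list) -> "RandomForest":
        n = len(rows)
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: draw n row indices with replacement.
            idx = [self.rng.randrange(n) for _ in range(n)]
            self.trees.append(
                self.train_tree([rows[i] for i in idx], [labels[i] for i in idx])
            )
        return self

    def predict(self, row: Sequence) -> str:
        # Majority vote over the individual trees.
        votes = Counter(tree(row) for tree in self.trees)
        return votes.most_common(1)[0][0]


def majority_trainer(rows: list, labels: list) -> Predictor:
    """Hypothetical base learner: always predicts the most common training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda row: majority


forest = RandomForest(majority_trainer, n_trees=3).fit(
    [[0], [1], [2]], ["yes", "yes", "yes"]
)
```

Because the forest only needs something that trains a predictor, it can live in its own repository with a dependency on the model repository. In the second option it would instead subclass the abstract model and stay alongside it.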

This way we can have a clean separation and each module can be used independently.

@olekscode changed the title from "DecisionTree, DecisionTreeModel, and RandomForest should be in thre different repositories" to "DecisionTree, DecisionTreeModel, and RandomForest should be in three (or two) different repositories" on Jan 22, 2020

@Ducasse
Contributor

Ducasse commented Jan 22, 2020

I would keep the organisation simple. Otherwise, each time you add something you will have to make three commits.
Keep things simple, and take action when they grow too large. Be agile and fast at the beginning. There is no problem with having a repo with 5 packages.

@jordanmontt
Member

jordanmontt commented Feb 23, 2022

I also think that at least Random Forest and Decision Tree should be in separate repositories. As Oleks said, they are different algorithms that can be used in different ways.
