This repository implements commonly used machine learning algorithms in C++. Decision trees and random forests are currently implemented, and a neural network module is under development.
The input parameters should be specified in a JSON file. A representative example is given below:
```json
{
    "general": {
        "logfile": "logs.txt"
    },
    "models": [
        {
            "model": "decision_tree_classifier",
            "data": "/mnt/c/Users/65915/example_datasets/iris_processed.csv",
            "search_algorithm": "breadth",
            "impurity_method": "gini",
            "random_seed": 120
        },
        {
            "model": "random_forest_classifier",
            "data": "/mnt/c/Users/65915/example_datasets/iris_processed.csv",
            "search_algorithm": "breadth",
            "impurity_method": "gini",
            "number_trees": 5,
            "random_seed": 120,
            "max_feature_fraction": 1.0
        }
    ]
}
```
The parameters that can be specified for each type of model are explained in the online documentation.
The next few sections explain the algorithms used to implement each type of machine learning model.
The current implementation of decision trees is for classification problems only.
During the training process, the decision tree is grown using either depth-first or breadth-first search. When determining the optimal split for each node, the node impurity can be calculated with either the Gini or the entropy method; more details about these impurity calculation methods can be found here. The algorithm always selects the split that produces the maximum reduction in impurity.
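As a concrete illustration, here is a minimal sketch of the two impurity criteria and of the impurity reduction used to score a candidate split. The function names and the integer class-label representation are assumptions for illustration, not code from this repository.

```cpp
#include <cmath>
#include <unordered_map>
#include <vector>

// Gini impurity: 1 - sum_k(p_k^2), where p_k is the fraction of
// instances in the node that belong to class k.
double gini_impurity(const std::vector<int>& labels) {
    std::unordered_map<int, int> counts;
    for (int label : labels) ++counts[label];
    double impurity = 1.0;
    for (const auto& entry : counts) {
        double p = static_cast<double>(entry.second) / labels.size();
        impurity -= p * p;
    }
    return impurity;
}

// Entropy: -sum_k(p_k * log2(p_k)), taken over the classes present
// in the node (so p_k > 0 and the logarithm is well defined).
double entropy_impurity(const std::vector<int>& labels) {
    std::unordered_map<int, int> counts;
    for (int label : labels) ++counts[label];
    double impurity = 0.0;
    for (const auto& entry : counts) {
        double p = static_cast<double>(entry.second) / labels.size();
        impurity -= p * std::log2(p);
    }
    return impurity;
}

// Impurity reduction of a candidate split: the parent's impurity minus
// the size-weighted impurity of the two children. The split with the
// largest reduction is selected.
double impurity_reduction(const std::vector<int>& parent,
                          const std::vector<int>& left,
                          const std::vector<int>& right) {
    double n = static_cast<double>(parent.size());
    return gini_impurity(parent)
         - (left.size() / n) * gini_impurity(left)
         - (right.size() / n) * gini_impurity(right);
}
```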
When performing inference, the tree grown during the training phase is traversed for each instance in the test dataset until a leaf node is reached. The predicted class is the one that occurs most frequently among the training instances in that leaf node.
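The traversal can be pictured with the following sketch. The node layout, field names, and the feature-less-than-threshold split convention are hypothetical; the repository's actual data structures may differ.

```cpp
#include <vector>

// Hypothetical node layout for illustration. Internal nodes test one
// feature against a threshold; leaves store the majority class of the
// training instances that reached them.
struct Node {
    bool is_leaf;
    int feature;          // feature index tested at an internal node
    double threshold;     // split threshold
    int majority_class;   // prediction stored at a leaf
    Node* left;           // branch taken when feature value < threshold
    Node* right;          // branch taken otherwise
};

// Walk from the root to a leaf and return that leaf's majority class.
int predict(const Node* root, const std::vector<double>& instance) {
    const Node* node = root;
    while (!node->is_leaf) {
        node = (instance[node->feature] < node->threshold) ? node->left
                                                           : node->right;
    }
    return node->majority_class;
}
```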
The current implementation of random forests is for classification problems only.
During the training process, the random forest grows a number of decision trees (controlled by the number_trees parameter) using the procedure described above. The data used to grow each tree is drawn from the training data supplied to this module by bootstrap sampling, i.e. sampling rows with replacement.
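A minimal sketch of bootstrap sampling, assuming the training data is addressed by row index; the function name is illustrative. Drawing n indices with replacement from an n-row dataset gives each tree its own resampled view of the training data.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draw n_rows row indices uniformly at random, with replacement, from
// a dataset of n_rows rows. The seed would correspond to the
// "random_seed" parameter in the configuration file.
std::vector<std::size_t> bootstrap_indices(std::size_t n_rows,
                                           unsigned int seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<std::size_t> dist(0, n_rows - 1);
    std::vector<std::size_t> indices(n_rows);
    for (auto& idx : indices) idx = dist(gen);
    return indices;
}
```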
During inference, the prediction of each tree is obtained for each test instance. The class predicted by the largest number of trees is returned as the predicted class for that test instance.
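Majority voting can be sketched as follows. The tie-breaking behaviour shown (the first class to reach the winning count wins) is an assumption, since the source does not specify a rule.

```cpp
#include <unordered_map>
#include <vector>

// Collect one predicted class per tree and return the class with the
// most votes. Assumes at least one tree prediction is supplied.
int majority_vote(const std::vector<int>& tree_predictions) {
    std::unordered_map<int, int> votes;
    int best_class = tree_predictions.front();
    int best_count = 0;
    for (int cls : tree_predictions) {
        int count = ++votes[cls];
        if (count > best_count) {
            best_count = count;
            best_class = cls;
        }
    }
    return best_class;
}
```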