SoleModels.jl – Symbolic Learning Models


In a nutshell

SoleModels.jl defines the building blocks of symbolic modeling and learning. It features:

  • Definitions for symbolic models (decision trees/forests, rules, branches, etc.);
  • Tools for evaluating them and extracting rules from them;
  • Support for mixed, neuro-symbolic computation.

These definitions provide a unified base for implementing symbolic algorithms, such as:

  • Decision tree/random forest learning;
  • Classification/regression rule extraction;
  • Association rule mining.


Basic models are:

  • Leaf models: wrapping native Julia computation (e.g., constants, functions);
  • Rules: structures with IF antecedent THEN consequent END semantics;
  • Branches: structures with IF antecedent THEN pos_consequent ELSE neg_consequent END semantics.

Remember that:

  • An antecedent is a logical formula that can be checked on a logical interpretation (that is, an instance of a symbolic learning dataset), yielding a truth value (e.g., true/false);
  • A consequent is another model, for example, a (final) constant model or branch to be applied.

Within this framework, a decision tree is nothing other than a branch whose consequents are themselves branches or final (leaf) models. Note that antecedents can consist of logical formulas; in that case, the symbolic models can be applied to logical interpretations. For more information, refer to SoleLogics.jl, the underlying logical layer.
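
To make the IF/THEN/ELSE semantics above concrete, here is a minimal, self-contained sketch in plain Julia. The names ToyLeaf, ToyRule, ToyBranch and toyapply are hypothetical and are not the SoleModels.jl API; antecedents are plain Julia predicates rather than logical formulas, and the thresholds are arbitrary illustrative values, not the output of a trained model.

```julia
# Hypothetical toy types for illustration only; these are NOT the SoleModels.jl API.
abstract type ToyModel end

# Leaf model: wraps a constant outcome.
struct ToyLeaf <: ToyModel
    outcome::String
end

# Rule: IF antecedent THEN consequent END.
struct ToyRule <: ToyModel
    antecedent::Function
    consequent::ToyModel
end

# Branch: IF antecedent THEN pos_consequent ELSE neg_consequent END.
struct ToyBranch <: ToyModel
    antecedent::Function
    pos_consequent::ToyModel
    neg_consequent::ToyModel
end

# Applying a leaf returns its constant outcome.
toyapply(m::ToyLeaf, instance) = m.outcome

# A rule yields `nothing` when its antecedent does not hold on the instance.
toyapply(m::ToyRule, instance) =
    m.antecedent(instance) ? toyapply(m.consequent, instance) : nothing

# A branch always delegates to one of its two consequents.
toyapply(m::ToyBranch, instance) =
    m.antecedent(instance) ? toyapply(m.pos_consequent, instance) :
                             toyapply(m.neg_consequent, instance)

# A decision tree: a branch whose consequents are branches or leaves.
# Thresholds are arbitrary, chosen only for the example.
toytree = ToyBranch(
    x -> x.petal_length < 2.5,
    ToyLeaf("setosa"),
    ToyBranch(x -> x.petal_width < 1.8, ToyLeaf("versicolor"), ToyLeaf("virginica")),
)

toyrule = ToyRule(x -> x.petal_length < 2.5, ToyLeaf("setosa"))

instance = (petal_length = 4.7, petal_width = 1.4)
println(toyapply(toytree, instance))  # follows the ELSE side, then the THEN side
println(toyapply(toyrule, instance))  # antecedent fails, so the rule does not fire
```

The sketch uses one method of toyapply per model type, so nesting models inside one another needs no extra code; this mirrors the idea that a consequent is just another model.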

Other noteworthy models include:

  • Decision List (or decision table): see Wikipedia;
  • Decision Tree: see Wikipedia;
  • Decision Forest (or tree ensemble): see Wikipedia;
  • Mixed Symbolic Model: a nested structure, mixture of many symbolic models.

Usage: rule extraction from a decision tree

First, train a decision tree:

# Load packages (Pkg must be loaded before calling Pkg.add)
using Pkg
Pkg.add("MLJ"); using MLJ
Pkg.add("MLJDecisionTreeInterface"); using MLJDecisionTreeInterface
Pkg.add("DataFrames"); using DataFrames
Pkg.add("Random"); using Random

# Load dataset
X, y = begin
    X, y = @load_iris;
    X = DataFrame(X)
    X, y
end

# Split dataset
X_train, y_train, X_test, y_test = begin
    train, test = partition(eachindex(y), 0.8, shuffle=true, rng = Random.MersenneTwister(42));
    X_train, y_train = X[train, :], y[train];
    X_test, y_test = X[test, :], y[test];
    X_train, y_train, X_test, y_test
end

# Train tree
mach = begin
    Tree = MLJ.@load DecisionTreeClassifier pkg=DecisionTree
    model = Tree(max_depth=-1, rng = Random.MersenneTwister(42))
    machine(model, X_train, y_train) |> fit!
end

# Inspect the tree
🌱 = fitted_params(mach).tree

Then, port it to Sole and play with it:

Pkg.add("SoleModels"); using SoleModels
Pkg.add("DecisionTree"); import DecisionTree as DT

# Convert to 🌞-compliant model
🌲 = solemodel(🌱);

# Print model
printmodel(🌲);

# Inspect the rules
listrules(🌲)

# Inspect rule metrics
metricstable(🌲)
# Inspect normalized rule metrics
metricstable(🌲, normalize = true)

# Make test instances flow into the model, so that test metrics can, then, be computed.
apply!(🌲, X_test, y_test)

# Pretty table of rules and their metrics
metricstable(🌲; normalize = true, metrics_kwargs = (; additional_metrics = (; height = r->SoleLogics.height(antecedent(r)))))

# Join some rules for the same class into a single, sufficient and necessary condition for that class
metricstable(joinrules(🌲; min_ncovered = 1, normalize = true))
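
Conceptually, extracting rules from a decision tree means listing one rule per root-to-leaf path, and joining rules means taking the disjunction of the antecedents that share a consequent. The following self-contained sketch illustrates this idea in plain Julia. The names PathLeaf, PathNode, toyrules and toyjoin are hypothetical and are not the SoleModels.jl API; conditions are kept as plain strings, the toy tree is hand-written rather than trained, and whether joinrules works exactly this way internally is an assumption.

```julia
# Hypothetical toy representation for illustration only; NOT the SoleModels.jl API.
struct PathLeaf
    label::String
end

struct PathNode
    condition::String                  # human-readable test, e.g. "petal_length < 2.5"
    yes::Union{PathNode,PathLeaf}      # subtree taken when the test holds
    no::Union{PathNode,PathLeaf}       # subtree taken when the test fails
end

# One rule per root-to-leaf path: the antecedent is the conjunction
# of the (possibly negated) tests met along the path.
toyrules(leaf::PathLeaf, path = String[]) =
    [(isempty(path) ? "⊤" : join(path, " ∧ ")) => leaf.label]

toyrules(node::PathNode, path = String[]) = vcat(
    toyrules(node.yes, [path..., node.condition]),
    toyrules(node.no, [path..., "¬(" * node.condition * ")"]),
)

# Join rules sharing a consequent into one disjunctive antecedent per class.
function toyjoin(rules)
    grouped = Dict{String,Vector{String}}()
    for (antecedent, label) in rules
        push!(get!(grouped, label, String[]), "(" * antecedent * ")")
    end
    return Dict(label => join(ants, " ∨ ") for (label, ants) in grouped)
end

# Hand-written toy tree with arbitrary thresholds; "virginica" appears in two leaves.
toytree = PathNode(
    "petal_length < 2.5",
    PathLeaf("setosa"),
    PathNode(
        "petal_width < 1.8",
        PathNode("petal_length < 5.0", PathLeaf("versicolor"), PathLeaf("virginica")),
        PathLeaf("virginica"),
    ),
)

rules = toyrules(toytree)
foreach(println, rules)

joined = toyjoin(rules)
println(joined["virginica"])
```

Because the paths of a tree partition the instance space, the joined antecedent for a class holds exactly on the instances that the toy tree assigns to that class, which is the sense in which it is both sufficient and necessary.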

Want to know more?

The formal foundations of the Sole framework are given in giopaglia's PhD thesis: Modal Symbolic Learning: from theory to practice, G. Pagliarini (2024)


The package is developed by the ACLAI Lab @ University of Ferrara.

SoleModels.jl mainly builds upon SoleLogics.jl and SoleData.jl, and it is the core module of Sole.jl, an open-source framework for symbolic machine learning.