Online Prediction Framework
Online Prediction Framework (OPF) is a framework for working with and deriving predictions from online learning algorithms, including Numenta’s Cortical Learning Algorithm (CLA). OPF is designed to work in conjunction with a larger architecture, as well as in a standalone mode (i.e. directly from the command line). It is also designed such that new model algorithms and functionalities can be added with minimal code changes.
The OPF has three main responsibilities:
- Provide an interface/implementations for models
- Compute metrics on the output of models
- Provide an interface to write model output to a permanent store (csv file or some form of database)
Each of these three components lives in a separate set of modules. Metrics and output writing are optional when running models.
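The metrics responsibility can be illustrated with a small sketch. This is a hypothetical running-metric object, not the OPF's actual metrics modules; the metric choice (average absolute error) and the method names are illustrative assumptions.

```python
# Hypothetical sketch of the metrics responsibility: accumulate a running
# metric over model outputs. Not the OPF's actual metrics API.

class AbsErrorMetric:
    """Running average absolute error between predictions and actuals."""

    def __init__(self):
        self._total = 0.0
        self._count = 0

    def addInstance(self, prediction, actual):
        # Accumulate the absolute error for one (prediction, actual) pair.
        self._total += abs(prediction - actual)
        self._count += 1

    def getMetric(self):
        # Average absolute error so far, or None before any instances.
        return self._total / self._count if self._count else None
```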
Figure 1: Data flow in the OPF
- The OPF does not create models. It is up to the client code to figure out how many models to run, and to instantiate the correct types of models
- The OPF does not run models automatically. All the models in the OPF operate under a “push” model. The client is responsible for getting records from some data source, feeding records into the model, and handling the output of models.
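The "push" pattern above can be sketched as a simple client-side loop. The loop itself and the `records` iterable are assumptions for illustration; the OPF provides no such driver, since fetching records and handling output are the client's job.

```python
# Hypothetical client-side driver loop for the "push" model described
# above. The OPF itself does not provide this; the client fetches records
# from its own data source and feeds them to the model one at a time.

def run_model(model, records):
    """Feed records to a model one at a time and collect the results."""
    results = []
    for record in records:          # record: dict of fieldName -> rawValue
        result = model.run(record)  # a real model returns a ModelResult
        results.append(result)
    return results
```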
The OPF defines the abstract "Model" interface for the implementation of any online learning model. All models must implement the following methods:
- __init__(modelDescription, inferenceType)
  Constructor for the model. Must take a modelDescription dictionary, which contains all the parameters necessary to instantiate the model, and an InferenceType value (see below). A model’s __init__() method should always call the __init__() method of the superclass.
- run(inputRecord)
  The main function for the model; it does all the computation required for a new input record. Because the OPF only deals with online streaming models, each record is fed to the model one at a time.
  Returns: A populated ModelResult object (see below)
- getFieldInfo()
  Returns a list of metadata about each of the translated fields (see below about translation). Each entry in the list is a FieldMetaInfo object, which contains information about the field, such as its name and data type.
  Returns: A list of FieldMetaInfo objects
- finishLearning()
  A signal from the client code that the model may be placed in a permanent "finished learning" mode, in which it will not learn from subsequent input records. This allows the model to perform optimizations and clean up any learning-related state.
  Returns: Nothing
- resetSequenceStates()
  Signals the model that a logical sequence has finished. The model should not treat the next input record as a continuation of the previous one.
  Returns: Nothing
- mapInputRecord() – not used
- getRuntimeStats() – [can be a no-op]
  Gets runtime statistics specific to this model, such as "number of records seen" or "average cell overlap".
  Returns: A dictionary where the keys are the statistic names and the values are the statistic values
- _getLogger() – [used by parent class]
  Returns: The logging object for this class. This is used so that operations in the superclass use the same logger object.
It also provides the following functionality, common to all models:
- enableLearning()/disableLearning()
  Sets the learning flag for the model. The flag can be queried internally and externally using the isLearningEnabled() method.
- enableInference(inferenceArgs=None)/disableInference()
  Enables/disables inference output for this model. Enabling inference takes an optional argument inferenceArgs, a dictionary of extra parameters that affect how inference is performed. For instance, an anomaly detection model may have a boolean parameter "doPrediction" that toggles whether a prediction is computed in addition to the anomaly score.
  The inference state of a model can be queried internally and externally using the isInferenceEnabled() method. The inference arguments can be queried using the getInferenceArgs() method.
- save(saveModelDir)
  Saves the model state via pickle, writing the resulting object to the saveModelDir directory.
- _serializeExtraData(extraDataDir)/_deSerializeExtraData(extraDataDir)
  If there is state that cannot be pickled and needs to be saved separately, this can be done by overriding these methods (implemented as no-ops by default).
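The extra-data hooks above can be overridden roughly as follows. This is a hedged sketch: the file name `native.bin` and the byte-string stand-in for unpicklable state are assumptions, not part of the OPF.

```python
# Hypothetical override of the _serializeExtraData/_deSerializeExtraData
# hooks described above, for state that cannot be pickled. The file name
# and the byte-string stand-in for native state are assumptions.
import os


class ModelWithNativeState:
    def __init__(self):
        # Stand-in for state a real model could not pickle
        # (e.g. a handle owned by a C extension).
        self._nativeState = b"\x00\x01"

    def _serializeExtraData(self, extraDataDir):
        # Write the unpicklable state alongside the pickled model.
        with open(os.path.join(extraDataDir, "native.bin"), "wb") as f:
            f.write(self._nativeState)

    def _deSerializeExtraData(self, extraDataDir):
        # Restore the state written by _serializeExtraData().
        with open(os.path.join(extraDataDir, "native.bin"), "rb") as f:
            self._nativeState = f.read()
```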
Figure 2: Records are input to models in the form of dictionary-like objects, where the keys are field names and the values are the raw field values.
Certain field types need to be converted into primitive input types. For example, datetime types are converted to 2 integer values, timeOfDay and dayOfWeek. In the OPF, this process is called translation. Generally, all models will have a translation step. Conceptually, translation produces two parallel lists (for performance reasons): A list of field metadata, and a list of translated field values. In practice, the first list is constant, so it can be pre-computed and stored in the model. This is the return value of getFieldInfo().
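The datetime translation above can be sketched in a few lines. The two output names come from the text; the exact units (hour of day, and Monday-based weekday numbering) are assumptions for illustration.

```python
# Sketch of the translation step for datetime fields, per the convention
# described above: a datetime becomes two integer values, timeOfDay and
# dayOfWeek. The units chosen here (hour 0-23, Monday=0) are assumptions.
from datetime import datetime


def translateDatetime(dt):
    """Translate a datetime into (timeOfDay, dayOfWeek) primitives."""
    timeOfDay = dt.hour       # hour of day, 0-23
    dayOfWeek = dt.weekday()  # Monday=0 .. Sunday=6
    return timeOfDay, dayOfWeek
```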
Additionally, for some model types (such as the CLA model), the translated inputs are quantized (put into buckets) and converted into binary vector representation. This process is called encoding. Most models may not need to encode the input (or, more likely, they will just need to quantize the input).
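The quantize-then-encode step can be illustrated with a simplified scalar encoder. Real CLA encoders produce wider, overlapping binary representations; this single-active-bit version, and its parameter names, are assumptions used only to show the bucketing idea.

```python
# Simplified sketch of the encoding step described above: quantize a
# scalar into one of N buckets, then represent the bucket as a binary
# vector. Real CLA encoders use multiple overlapping active bits; the
# one-hot output here is a deliberate simplification.

def encodeScalar(value, minVal, maxVal, numBuckets):
    """Quantize value into a bucket and return (bucket, one-hot vector)."""
    # Clamp into [minVal, maxVal], then scale to a bucket index.
    clamped = min(max(value, minVal), maxVal)
    span = maxVal - minVal
    bucket = min(int((clamped - minVal) / span * numBuckets), numBuckets - 1)
    vector = [0] * numBuckets
    vector[bucket] = 1
    return bucket, vector
```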