Online Prediction Framework
Online Prediction Framework (OPF) is a framework for working with and deriving predictions from online learning algorithms, including Numenta’s Cortical Learning Algorithm (CLA). OPF is designed to work in conjunction with a larger architecture, as well as in a standalone mode (i.e. directly from the command line). It is also designed such that new model algorithms and functionalities can be added with minimal code changes.
The OPF has three main responsibilities:
- Provide an interface/implementations for models
- Compute metrics on the output of models
- Provide an interface to write model output to a permanent store (csv file or some form of database)
Each of these three components lives in a separate set of modules. Computing metrics and writing output are optional when running models.
Figure 1: Data flow in the OPF
- The OPF does not create models. It is up to the client code to decide how many models to run and to instantiate the correct types of models.
- The OPF does not run models automatically. All the models in the OPF operate under a “push” model. The client is responsible for getting records from some data source, feeding records into the model, and handling the output of models.
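To make the "push" arrangement concrete, here is a minimal sketch of a client loop. The `EchoModel` class, the `consumption` field name, and the dictionary it returns are hypothetical stand-ins for illustration, not part of the OPF.

```python
import csv
import io


class EchoModel:
    """Hypothetical stand-in for a model: predicts the last value it saw."""

    def __init__(self):
        self.last = None

    def run(self, record):
        prediction = self.last
        self.last = record["consumption"]
        return {"rawInput": record, "prediction": prediction}


def run_client(source, model):
    """Client code owns the loop: read a record, push it in, keep the output."""
    results = []
    for row in csv.DictReader(source):
        record = {"consumption": float(row["consumption"])}
        results.append(model.run(record))
    return results


# An in-memory CSV stands in for the real data source.
data = io.StringIO("consumption\n5.0\n6.5\n7.0\n")
results = run_client(data, EchoModel())
predictions = [r["prediction"] for r in results]
```

The model never pulls data itself. Swapping the data source or running several models side by side only changes the client loop.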
The OPF defines the abstract "Model" interface for the implementation of any online learning model. All models must implement the following methods:
-
__init__(modelDescription, inferenceType)
Constructor for the model. Must take a modelDescription dictionary, which contains all the parameters necessary to instantiate the model, and an InferenceType value (see below). A model’s __init__() method should always call the __init__() method of the superclass.
-
run(inputRecord)
The main function for the model, which does all the computation required for a new input record. Because the OPF only deals with online streaming models, records are fed to the model one at a time. Returns: A populated ModelResult object (see below)
-
getFieldInfo()
Returns a list of metadata about each of the translated fields (see below about translation). Each entry in the list is a FieldMetaInfo object, which contains information about the field, such as its name and data type. Returns: A list of FieldMetaInfo objects
-
finishLearning()
This is a signal from the client code that the model may be placed in a permanent "finished learning" mode, in which it will not be able to learn from subsequent input records. This allows the model to perform optimizations and clean up any learning-related state. Returns: Nothing
-
resetSequenceStates()
Signals the model that a logical sequence has finished. The model should not treat the next input record as a continuation of the previous one. Returns: Nothing
-
mapInputRecord() - not used
-
getRuntimeStats() – [can be a no-op]
Get runtime statistics specific to this model. Examples include “number of records seen” or “average cell overlap”
Returns: A dictionary where the keys are the statistic names, and the values are the statistic values
-
_getLogger() – [used by parent class]
Returns: The logging object for this class. This is used so that the operations in the superclass use the same logger object.
It also provides the following functionality, common to all models:
-
enableLearning()/disableLearning()
Sets the learning flag for the model. The flag can be queried internally and externally using the isLearningEnabled() method.
-
enableInference(inferenceArgs=None)/disableInference()
Enables/Disables inference output for this model. Enabling inference takes an optional argument inferenceArgs, which is a dictionary with extra parameters that affect how inference is performed. For instance, an anomaly detection model may have a boolean parameter “doPrediction”, which toggles whether or not a prediction is computed in addition to the anomaly score.
The inference state of a model can be queried internally and externally using the isInferenceEnabled() method. The inference arguments can be queried using the getInferenceArgs() method.
-
save(saveModelDir)
Serializes the model state via pickle and saves the resulting object in the saveModelDir directory.
-
_serializeExtraData(extraDataDir)/_deSerializeExtraData(extraDataDir)
If there is state that cannot be pickled and needs to be saved separately, this can be done by overriding these methods (implemented as no-ops by default).
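The interface described above can be sketched as an abstract base class plus a toy subclass. This is an illustration under stated assumptions, not the OPF's actual base class. `RunningMeanModel` and its `"field"` parameter are hypothetical. `run()` returns a bare float and `getFieldInfo()` returns plain tuples, instead of ModelResult and FieldMetaInfo objects, to keep the example short.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Illustrative base class holding the flags common to all models."""

    def __init__(self, modelDescription, inferenceType):
        self._modelDescription = modelDescription
        self._inferenceType = inferenceType
        self._learningEnabled = True
        self._inferenceEnabled = True
        self._inferenceArgs = None

    @abstractmethod
    def run(self, inputRecord): ...

    @abstractmethod
    def getFieldInfo(self): ...

    @abstractmethod
    def finishLearning(self): ...

    @abstractmethod
    def resetSequenceStates(self): ...

    def getRuntimeStats(self):
        return {}  # may be a no-op

    def enableLearning(self):
        self._learningEnabled = True

    def disableLearning(self):
        self._learningEnabled = False

    def isLearningEnabled(self):
        return self._learningEnabled

    def enableInference(self, inferenceArgs=None):
        self._inferenceEnabled = True
        self._inferenceArgs = inferenceArgs

    def disableInference(self):
        self._inferenceEnabled = False

    def isInferenceEnabled(self):
        return self._inferenceEnabled

    def getInferenceArgs(self):
        return self._inferenceArgs


class RunningMeanModel(Model):
    """Hypothetical model: predicts the mean of the values learned so far."""

    def __init__(self, modelDescription, inferenceType):
        super().__init__(modelDescription, inferenceType)  # always call the superclass
        self._field = modelDescription["field"]
        self._total = 0.0
        self._count = 0

    def run(self, inputRecord):
        if self.isLearningEnabled():
            self._total += inputRecord[self._field]
            self._count += 1
        if not self.isInferenceEnabled() or self._count == 0:
            return None
        return self._total / self._count

    def getFieldInfo(self):
        return [(self._field, "float")]

    def finishLearning(self):
        self.disableLearning()

    def resetSequenceStates(self):
        pass  # this toy model keeps no sequence state

    def getRuntimeStats(self):
        return {"numRecordsSeen": self._count}


model = RunningMeanModel({"field": "value"}, "prediction")
first = model.run({"value": 2.0})
second = model.run({"value": 4.0})
model.finishLearning()
third = model.run({"value": 100.0})  # ignored for learning
stats = model.getRuntimeStats()
```

After finishLearning() is called, the third record no longer changes the running mean, which is the behavior the "finished learning" mode describes.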
Figure 2: Records are input to models in the form of dictionary-like objects, where the keys are field names and the values are the raw field values.
Certain field types need to be converted into primitive input types. For example, datetime types are converted to 2 integer values, timeOfDay and dayOfWeek. In the OPF, this process is called translation. Generally, all models will have a translation step. Conceptually, translation produces two parallel lists (for performance reasons): A list of field metadata, and a list of translated field values. In practice, the first list is constant, so it can be pre-computed and stored in the model. This is the return value of getFieldInfo().
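A minimal sketch of translation follows. The text only says that a datetime becomes two integers, so the exact definitions here are assumptions: `timeOfDay` is taken as minutes since midnight and `dayOfWeek` as Python's `weekday()` (Monday = 0). The field names and the `FIELD_INFO` tuples are also hypothetical.

```python
from datetime import datetime

# Constant metadata list, computed once (name, type) per translated field.
FIELD_INFO = [
    ("timestamp.timeOfDay", "int"),
    ("timestamp.dayOfWeek", "int"),
    ("consumption", "float"),
]


def translate(record):
    """Return translated values, parallel to FIELD_INFO."""
    ts = record["timestamp"]
    return [
        ts.hour * 60 + ts.minute,  # assumed definition of timeOfDay
        ts.weekday(),              # assumed definition of dayOfWeek
        float(record["consumption"]),
    ]


row = translate({"timestamp": datetime(2010, 7, 2, 13, 30), "consumption": 5.3})
# row == [810, 4, 5.3]  (2 July 2010 was a Friday)
```

Because the metadata list never changes, only the value list has to be rebuilt for each record.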
Additionally, for some model types (such as the CLA model), the translated inputs are quantized (put into buckets) and converted into binary vector representation. This process is called encoding. Most models may not need to encode the input (or, more likely, they will just need to quantize the input).
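The two steps can be illustrated with a deliberately simplified scalar encoder. This is not the CLA's encoder. The bucket count, value range, and `width` parameter are assumptions chosen for the example, and plain lists are used in place of numpy arrays.

```python
def quantize(value, minVal, maxVal, numBuckets):
    """Map a scalar to a bucket index in [0, numBuckets - 1]."""
    clipped = min(max(value, minVal), maxVal)
    bucket = int((clipped - minVal) / (maxVal - minVal) * numBuckets)
    return min(bucket, numBuckets - 1)  # maxVal falls into the last bucket


def encode(bucket, numBuckets, width=3):
    """Turn a bucket index into a binary vector with `width` contiguous 1s."""
    bits = [0] * (numBuckets + width - 1)
    for i in range(bucket, bucket + width):
        bits[i] = 1
    return bits


bucket = quantize(5.3, 0.0, 10.0, 5)   # 2
encoding = encode(bucket, 5)           # [0, 0, 1, 1, 1, 0, 0]
```

Neighboring buckets share active bits, so nearby values get overlapping representations. A model that only needs quantization can stop after `quantize()`.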
The ModelResult object is the main data container in the OPF. When a record is fed to a model, it instantiates a new ModelResult instance, which contains model input and inferences, and is shuttled around to the various OPF modules. Below is a description of each of the ModelResult attributes. They default to None when the ModelResult is instantiated, and must be populated by the Model object.
- rawInput: This is the exact record that is fed into the model. It is a dictionary-like object where the keys are the input field names, and the values are input values of the fields. All the input values maintain their original types.
- sensorInput: The translated input record, as well as auxiliary information about the input (see below)
- inferences: A dictionary that contains the output of a model (i.e. its inference). The keys are InferenceElement values (described below), and the values are the corresponding inference values
- metrics: A dictionary where the keys are the unique metric labels, and the values are the metric values (a single float). This is the only element that is not populated by the model object, but by the surrounding code.
As explained above, fields from the raw input are translated into primitive input types. There may also be additional information about the input record that is needed by the OPF. The SensorInput object is a container that stores the translated input record, as well as auxiliary information about the input. More attributes may be added to the SensorInput object as new features require them. Note: not every model needs to populate every field in SensorInput; the exact requirements depend on which inferences and metrics are being computed.
- sequenceReset: Control field for temporal patterns. This field has a value of 1 if an explicit temporal reset was specified for this record, 0 otherwise. Resets are currently not being used.
- dataRow: The translated version of the input row.
- dataEncodings: The encoded version of the input, used by some metrics. This is a list of binary numpy arrays, one for each field in dataRow.
- category: In classification problems, this is the class label for the input record.
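As a rough picture of how the two containers fit together, the sketch below defines plain classes with the attributes listed above and populates them for one record. The class bodies are assumptions for illustration, and string keys stand in for InferenceElement values.

```python
class SensorInput:
    """Illustrative container for the translated record and auxiliary data."""

    def __init__(self, dataRow=None, dataEncodings=None,
                 sequenceReset=None, category=None):
        self.dataRow = dataRow
        self.dataEncodings = dataEncodings
        self.sequenceReset = sequenceReset
        self.category = category


class ModelResult:
    """Illustrative container; every attribute defaults to None."""

    def __init__(self, rawInput=None, sensorInput=None,
                 inferences=None, metrics=None):
        self.rawInput = rawInput
        self.sensorInput = sensorInput
        self.inferences = inferences
        self.metrics = metrics


# What a model might fill in for a single record.
result = ModelResult(
    rawInput={"consumption": "5.3"},
    sensorInput=SensorInput(dataRow=[5.3], sequenceReset=0),
    inferences={"prediction": [5.1], "anomalyScore": 0.2},
)

# Metrics are attached later by the surrounding code, not by the model.
result.metrics = {"hypotheticalErrorLabel": 0.2}
```

Attributes the model does not need, such as `category` in a non-classification problem, simply stay `None`.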
The concept of InferenceElements is a key part of the OPF. A model's inference may have multiple parts to it. For example, a model may output both a prediction and an anomaly score. Models output their set of inferences as a dictionary that is keyed by the enumerated type InferenceElement. Each entry in an inference dictionary is considered a separate inference element, and is handled independently by the OPF.
Data structures related to inference elements are located in opfutils.py.
The OPF handles different data types for inferences differently. This helps automate the handling of new inference types, but can be confusing.
In order to compute metrics and write output, the OPF needs to know which input values (i.e. attributes of SensorInput) correspond to each inference element. This mapping between inputs and outputs is defined in InferenceElement.__inferenceInputMap. By specifying the mapping in one place, the same logic can be used both for writing output and for computing metrics.
Below is an example.
class InferenceElement(...):
    ...
    _inferenceInputMap = {
        "prediction": "dataRow",
        "encodings": "dataEncodings",
        "classification": "category",
        "multiStepPredictions": "dataRow"
    }
Snippet 1: Mapping inferences to input
In this example, we can see that the “prediction” inference element is associated with SensorInput.dataRow, and the “classification” inference element is associated with SensorInput.category.
This association is used to compute metrics and to determine which parts of the input to write to output. For example, to compute error, the value of “prediction” will be compared to the value of SensorInput.dataRow, and the value of “classification” will be compared to the value of SensorInput.category.
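The lookup can be sketched in a few lines. The map is copied from Snippet 1. The `getGroundTruth()` helper, the sample values, and the error calculations are hypothetical, and the sketch ignores the shifting of inferences in time.

```python
from types import SimpleNamespace

INFERENCE_INPUT_MAP = {
    "prediction": "dataRow",
    "encodings": "dataEncodings",
    "classification": "category",
    "multiStepPredictions": "dataRow",
}


def getGroundTruth(inferenceElement, sensorInput):
    """Fetch the SensorInput attribute that an inference element maps to."""
    return getattr(sensorInput, INFERENCE_INPUT_MAP[inferenceElement])


# SimpleNamespace stands in for a SensorInput object.
sensorInput = SimpleNamespace(dataRow=[5.0], dataEncodings=None, category=2)
inferences = {"prediction": [4.5], "classification": 2}

truth = getGroundTruth("prediction", sensorInput)
absError = abs(inferences["prediction"][0] - truth[0])
isCorrect = inferences["classification"] == getGroundTruth("classification", sensorInput)
```

The same helper could serve an output writer that needs to place each inference next to the matching input column.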
Figure 3: Inference Elements
When a new inference element is added, an entry needs to be added in this map to connect it with input.
For example, if you add a new inference element InferenceElement.foo that corresponds to dataRow (i.e. the ground-truth value for foo is contained in dataRow), you will need to add an entry:
{InferenceElement.foo : "dataRow"}
Because OPF Models make predictions about the future, the OPF needs to line up inferences with their respective ground truth values so that it can compute metrics and write results appropriately. For example, InferenceElement.prediction is a prediction about the next record. In order to compute error metrics, this inference needs to be shifted one record forward in time to be compared with its corresponding ground-truth record.
def getTemporalDelay(inferenceElement, key=None):
    if inferenceElement in (InferenceElement.prediction,
                            InferenceElement.encodings):
        return 1
    if inferenceElement in (InferenceElement.anomalyScore,
                            InferenceElement.classification,
                            InferenceElement.classConfidences):
        return 0
    if inferenceElement in (InferenceElement.multiStepPredictions,
                            InferenceElement.multiStepBestPredictions):
        return int(key)
    return 0
Snippet 2: The getTemporalDelay() method defines how inferences are shifted
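To show what a delay means in practice, the sketch below buffers inferences so that each one is paired with the record it was actually predicting. The `alignWithGroundTruth()` function and the sample values are hypothetical, and the OPF's own shifting code may be structured differently.

```python
from collections import deque


def alignWithGroundTruth(predictions, actuals, delay):
    """Pair each actual value with the prediction made `delay` records earlier."""
    buffer = deque([None] * delay)  # nothing to compare for the first `delay` records
    pairs = []
    for prediction, actual in zip(predictions, actuals):
        buffer.append(prediction)
        pairs.append((buffer.popleft(), actual))
    return pairs


actuals = [5, 6, 7]
predictions = [6, 7, 8]  # each entry is a prediction about the next record

shifted = alignWithGroundTruth(predictions, actuals, delay=1)
# shifted == [(None, 5), (6, 6), (7, 7)]

unshifted = alignWithGroundTruth(predictions, actuals, delay=0)
# unshifted == [(6, 5), (7, 6), (8, 7)]
```

With a delay of 1, the perfect predictor above lines up exactly with its ground truth. With a delay of 0, the same predictions would be scored against the wrong records.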