
Glossary

A

Abstract

An abstract is a brief summary of an in-depth analysis of a particular subject, used to help the reader quickly ascertain its results.

Artificial Intelligence (AI)

Any machine or device (intelligent agent) that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. The term is applied when a machine mimics cognitive functions associated with human mental processes, such as learning, and to software that is able to perform tasks that usually require human-level intelligence.

Auto-Suggestion

A platform-recommended search term, data set, or visualization, proactively and dynamically displayed to the user based on the query they are entering or the data they are currently working with in their Notebook.

B

Bar Chart

A chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.

Bot

A Bot is a Knowledge Microservice that provides specialized queries and mutations for performing Bot Actions, which allow the bot to provide asynchronous status updates.

Bot Actions

Knowledge Services can become Knowledge Bots by offering and using Bot Actions, which allow the service to interact more directly with users. Users can configure, start, stop, and schedule Bot Actions. Services can report status and progress for potentially long-running, asynchronous operations, as well as report any errors or messages.

Business Question

Proposal: A question that, if successfully and correctly answered, will provide value to the business unit. Answering Business Questions should be the driving force behind the development of Knowledge Applications. A Business Question is also generally the high-level starting point for the Problem Decomposition process.

C

Candidate

Candidate Join

A potential Relational Join or Common Key Join discovered by the Join Discovery process that is yet to be accepted or rejected by a user. Candidate Joins are ignored by the platform until accepted or rejected by the user.

Candidate Key

A potential Primary Key or Foreign Key discovered by the Key Discovery process and used by the platform as an input to the Join Discovery process.

Candidate Kind

A potential Kind discovered by the Kind Discovery process that is yet to be accepted or rejected by a user. Candidate Kinds are ignored by the platform until accepted or rejected by the user.

Candidate Kind Mapping

A potential Kind Mapping discovered by the Kind Mapping Discovery process that is yet to be accepted or rejected by a user. Candidate Kind Mappings are ignored by the platform until accepted or rejected by the user.

Canonicalization

The process of converting indexed Field or Property values that have more than one possible representation into a standard, normal, or canonical form.
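As a minimal sketch, assuming a Field whose values record a yes/no answer in several spellings, canonicalization could map every variant to a single canonical form. The variant table `BOOLEAN_FORMS` and the function `canonicalize_boolean` are hypothetical illustrations, not part of any Maana API.

```python
# Hypothetical table of alternative representations for a yes/no Field.
BOOLEAN_FORMS = {
    "y": True, "yes": True, "true": True, "t": True, "1": True,
    "n": False, "no": False, "false": False, "f": False, "0": False,
}


def canonicalize_boolean(raw):
    """Return the canonical bool for a raw value, or None if unrecognized."""
    key = str(raw).strip().lower()  # ignore surrounding whitespace and letter case
    return BOOLEAN_FORMS.get(key)


values = ["Yes", " TRUE", "n", "0", "maybe"]
print([canonicalize_boolean(v) for v in values])
# → [True, True, False, False, None]
```

Returning `None` for unrecognized values, rather than guessing, leaves them visible for later curation.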

CKG

The Maana Computational Knowledge Graph™ is a network of models that provides enhanced self-service capabilities to subject-matter experts, enabling them to turn their domain expertise and data from across silos into digital knowledge to make better and faster decisions.

This unique technology adds a knowledge layer over complex enterprise and industrial data, eliminates the need to move data, and enables the creation of re-usable models across the enterprise.

Class

Classes are probabilistic semantic labels assigned to Schema Fields, Query Node Fields, and Kind Properties that describe the likelihood that the data in a Field or Property is of a given class (e.g. PersonName, CompanyName, Year, Latitude, etc.). Classes are used to enable the discovery of Kind Mappings between Kinds and Schema, drive the automated selection of visualizations when summarizing Record and Entity data, and as a means for the user to scope a Query.

Classifier

Code extensions that run inside the Maana framework. Their outputs are Classes associated with Schema Fields (field classifiers) or modifications of the ESG and/or indices. Classifiers are a type of Miner.
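A field classifier could be sketched as a function that checks a Field's values against simple per-class patterns and returns the fraction of matching values as a rough probability for each Class. The regular expressions and class names below are illustrative assumptions, not Maana's built-in classifiers.

```python
import re

# Illustrative patterns for a few Classes; real classifiers would be far richer.
CLASS_PATTERNS = {
    "Year": re.compile(r"^(1[89]|20)\d{2}$"),
    "Latitude": re.compile(r"^-?\d{1,2}\.\d+$"),
}


def classify_field(values):
    """Return {class_name: fraction of non-empty values matching that class}."""
    cleaned = [s for s in (str(v).strip() for v in values if v is not None) if s]
    if not cleaned:
        return {name: 0.0 for name in CLASS_PATTERNS}
    scores = {}
    for name, pattern in CLASS_PATTERNS.items():
        matches = sum(1 for v in cleaned if pattern.match(v))
        scores[name] = matches / len(cleaned)
    return scores


print(classify_field(["1999", "2004", "2018", "n/a"]))
# → {'Year': 0.75, 'Latitude': 0.0}
```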

Command Line Interface

Command line tool for common GraphQL development workflows.

Common Key Join

A type of Join between two Fields of different Schema, signifying the existence of a significantly common set of values in the data indexed against those Fields.
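One simple way to measure a "significantly common set of values" is the overlap between the distinct values indexed against two Fields. The sketch below uses the overlap coefficient (shared distinct values divided by the size of the smaller distinct set) with a hypothetical threshold of 0.8. Both the measure and the threshold are assumptions for illustration, not a documented Join Discovery algorithm. A pair that clears the threshold would be proposed as a Candidate Join awaiting user review.

```python
def overlap_coefficient(values_a, values_b):
    """Shared distinct values divided by the size of the smaller distinct set."""
    a, b = set(values_a), set(values_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def propose_common_key_join(values_a, values_b, threshold=0.8):
    """Return the score and whether it clears the (hypothetical) threshold."""
    score = overlap_coefficient(values_a, values_b)
    return score, score >= threshold


orders_customer_id = ["C1", "C2", "C2", "C3"]
customers_id = ["C1", "C2", "C3", "C4", "C5"]
print(propose_common_key_join(orders_customer_id, customers_id))
# → (1.0, True)
```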

Composite Key

A Key that consists of the values from multiple Fields in a single Schema.

Computational Model

Simulates and studies the behavior of complex systems using mathematics, physics, and computer science. A computational model contains numerous variables used to characterize the system. In a Maana solution, this is the math of the classifiers and the answer service at each stage in the data flow diagram.

Compute Connector

Software that connects the Maana Knowledge Platform to compute sources that are either internal or external to the Maana Knowledge Platform in order to enable the Maana Knowledge Platform to execute logic in different computational environments (e.g., SAQ, Spark, TensorFlow, and gridMathematica).

Concept

A type of thing (aka Kind), e.g., Person, Invoice, Problem Resolution, Well, Event.

Concept Knowledge Graph (CKG)

A network of models built using machine learning techniques and artificial intelligence that powers AI-driven applications used to digitize decision support and operations.

Concept Knowledge Model

definition?

Concrete Kind

definition?

Crawl

The reading of data from its source (e.g. file or database), creating the ESG elements that represent the structure of the source data (e.g. Namespaces, Schemas, Fields), and writing the source values along with their Type to the MID Index.
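The three steps of a Crawl can be sketched for a CSV source. Here the "ESG" is just a dictionary describing the Schema and its Fields, and the "MID Index" is a list of `(schema, field, row, value, type)` tuples. Both structures and the type-inference rules are simplified assumptions, not Maana's actual data model.

```python
import csv
import io


def infer_type(text):
    """Guess a simple Type for a raw string value."""
    for cast, name in ((int, "Int"), (float, "Float")):
        try:
            cast(text)
        except (TypeError, ValueError):
            continue
        return name
    return "String"


def crawl_csv(schema_name, source):
    """Read CSV text; return (Schema description, index entries)."""
    reader = csv.DictReader(io.StringIO(source))
    # Reading the header row yields the Fields of the Schema.
    schema = {"name": schema_name, "fields": list(reader.fieldnames or [])}
    index = []
    for row_number, row in enumerate(reader):
        for field, value in row.items():
            index.append((schema_name, field, row_number, value, infer_type(value)))
    return schema, index


schema, index = crawl_csv("wells", "well_id,depth\nW-1,3200\nW-2,2875.5\n")
print(schema)    # {'name': 'wells', 'fields': ['well_id', 'depth']}
print(index[1])  # ('wells', 'depth', 0, '3200', 'Int')
```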

CRUD

Create, read, update, and delete (CRUD) are the four basic functions of persistent storage. Alternate words are sometimes used when defining the four basic functions of CRUD, such as retrieve instead of read, modify instead of update, or destroy instead of delete. CRUD is also sometimes used to describe user interface conventions that facilitate viewing, searching, and changing information, often using computer-based forms and reports.
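A minimal in-memory sketch of the four CRUD functions, using a Python dictionary as a stand-in for persistent storage. The class and method names are hypothetical.

```python
class InMemoryStore:
    """Toy record store exposing the four CRUD operations."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def create(self, data):
        record_id = self._next_id
        self._next_id += 1
        self._records[record_id] = dict(data)  # store a copy of the caller's data
        return record_id

    def read(self, record_id):
        return self._records.get(record_id)  # None if the record does not exist

    def update(self, record_id, changes):
        if record_id not in self._records:
            raise KeyError(record_id)
        self._records[record_id].update(changes)

    def delete(self, record_id):
        return self._records.pop(record_id, None) is not None  # True if removed


store = InMemoryStore()
rid = store.create({"name": "Well A", "depth": 3200})
store.update(rid, {"depth": 3250})
print(store.read(rid))    # {'name': 'Well A', 'depth': 3250}
print(store.delete(rid))  # True
print(store.read(rid))    # None
```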

Curation

Any action the user takes that changes the original crawled data, e.g. renaming a Schema or Field, refining Field values, defining and mapping Kinds.

Customer

The counter-party to the license agreement with Maana.

Customer Compute Connector

A Compute Connector developed by or for Customer, including Compute Connectors developed by Maana for Customer under a work for hire agreement pursuant to which Maana assigns such Compute Connectors to Customer. Customer Compute Connectors will not execute or operate with or on any platform other than the Maana Software.

Customer Data

All Customer data from any source that is provided by or on behalf of Customer to Maana for Maana’s analysis and/or development of Models using the Maana Software.

Customer Data Connector

A Data Connector developed by or for Customer, including Data Connectors developed by Maana for Customer under a work for hire agreement pursuant to which Maana assigns such Data Connectors to Customer. Customer Data Connectors will not execute or operate with or on any platform other than the Maana Software.

Customer Knowledge Application

A Knowledge Application developed by or for Customer, including Knowledge Applications developed by Maana for Customer under a work for hire agreement pursuant to which Maana assigns such Knowledge Applications to Customer. Customer Knowledge Applications will not execute or operate with or on any platform other than the Maana Software.

Customer Line-of-Business Applications

A Customer’s third-party or internally developed application software (e.g., Oracle, Salesforce, or an internally developed database) that accesses the features and functions of the Maana Software through a Maana application programming interface. For clarity, the Maana Software is not a Customer Line-of-Business Application.

Customer Model

A Model developed by or for Customer and derived from Customer Data, including Models developed by Maana for Customer under a work for hire agreement pursuant to which Maana assigns such Models to Customer. Customer Models will not execute or operate with or on any platform other than the Maana Software.

Customer Solutions Artifacts

A collection of documentation showing how the solution was completed on Maana.

Customer UI

A customized user interface developed by or for Customer, including user interfaces developed by Maana for Customer under a work for hire agreement pursuant to which Maana assigns such user interface to Customer. Customer UIs will not execute or operate with or on any platform other than the Maana Software.

D

Data Connector

The software that connects the Maana Knowledge Platform to data sources in order to import, export, or convert one data format to another data format.

Data Flow Diagram

A diagram showing how the whole solution fits together and the boundaries between Maana and external systems. It also highlights the custom classifiers and answer services needed.

Data Lake

Organizations bring together data from all aspects of their business to create strategic data assets called Data Lakes. This data is generally on a massive scale (big data) and stored in a wide variety of formats, but it is not in a form optimized for discovery and consumption.

Data Science

Data science uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms - both structured and unstructured.

Data Stream

A data stream is an ordered sequence of instances that, in many applications of data stream mining, can be read only once or a small number of times using limited computing and storage capabilities.
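Because a stream may be readable only once and with limited memory, statistics over it are typically computed incrementally. The sketch below uses Welford's online algorithm to maintain a running mean and variance in constant memory. It is a generic technique chosen for illustration, not a description of a specific Maana component.

```python
class RunningStats:
    """Welford's online algorithm: one pass over the data, O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def push(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 for fewer than 2 values."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


stats = RunningStats()
readings = (x for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # consumed once
for reading in readings:
    stats.push(reading)
print(stats.mean, stats.variance())
```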

Data Transformation

The process of converting data or information from one format or structure to another format or structure, usually from the format of a source system into the required format of a new destination system.

Dependent Variable

SYN: Outcome (business, layman), Response (ML). Dependent variables are the observed outcome of a random process (also sometimes called an experiment). Heuristically, a dependent variable is the thing you want to understand the causes of, predict, or otherwise explain. In ML, only supervised learners have a dependent variable.

Docker, Dockerize

Docker is a computer program that performs operating-system-level virtualization, also known as containerization. It automates the deployment of code inside software containers. Packaging an application this way is sometimes referred to as "dockerizing" it.

Domain Model

A domain model represents the relevant concepts, their relevant properties (also concepts), and the relevant relationships between them in order to address the specific Problem Questions under consideration. It is a user-created subsection of the Knowledge Graph made up of the Kinds and sources that the user decides are important for modeling their domain. It will likely be the union (collected set) of Kinds from several Knowledge Models, but is likely to include more than only what is in those Knowledge Models.

E

Elements

The items that make up Maana. See Knowledge Models, Domain Models, Kinds, Knowledge Graph, Presentation View, Join Assists and Similarity Assist.

Entity

A unique instance of a Kind with data from the indexes mapped into its Properties. Kinds are sets of unique Entities. For example, for a Kind representing people, each individual person is an Entity, while age, sex, height, and weight are Properties of that Entity.

Entity Conflation

The process of merging two or more sets of data that represent a single Entity into a single unified set of data representing that Entity.
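As a sketch, assuming each source supplies a dictionary of Property values for the same Entity, conflation could keep, for each Property, the most frequent non-null value across sources. This majority-vote rule is one simple merge policy chosen for illustration; the company records are made up.

```python
from collections import Counter


def conflate(records):
    """Merge several records describing one Entity into a single record.

    For each property, keep the most common non-null value; ties go to the
    value seen first, because Counter.most_common keeps first-seen order.
    """
    properties = []
    for record in records:
        for prop in record:
            if prop not in properties:
                properties.append(prop)
    merged = {}
    for prop in properties:
        values = [r[prop] for r in records if r.get(prop) is not None]
        if values:
            merged[prop] = Counter(values).most_common(1)[0][0]
    return merged


records = [
    {"name": "Acme Corp", "country": "US", "employees": None},
    {"name": "ACME Corporation", "country": "US", "employees": 120},
    {"name": "Acme Corp", "country": None},
]
print(conflate(records))
# → {'name': 'Acme Corp', 'country': 'US', 'employees': 120}
```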

Entity Extraction

The process of reading a set of Records or Entities, generating a set of new Entities and indexing the new Entities directly against a Kind. Entity Extraction relies on Entity Resolution and potentially Entity Conflation. In the case where Entity Extraction is generating Entities from only part of a source Schema/Kind, the Entity Extraction process will also generate Entity References between the source Record / Entity and the newly created Entity.

Entity Reference

A named and indexed relationship between a source Record/Entity and an Entity.

Entity Resolution

The process of using Field or Property data believed to be a reference to an Entity (e.g. an id or name) and resolving it to a specific Entity. Often the goal will be to subsequently replace or augment the Field / Property that contains the Entity identifiers with an Entity Reference connecting the source Record/Entity to the resolved Entity. However, there may also be a need to use Entity Resolution simply to visualize the details of an Entity.
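A minimal sketch, assuming the candidate Entities are available as a mapping from entity ID to canonical name: normalize the raw reference and pick the closest name using `difflib.get_close_matches` from the Python standard library. The 0.8 cutoff, the function name, and the example companies are assumptions for illustration; production resolvers typically combine several signals.

```python
import difflib


def resolve_entity(reference, entities, cutoff=0.8):
    """Return the ID of the Entity whose name best matches `reference`, or None."""
    by_name = {name.lower(): entity_id for entity_id, name in entities.items()}
    matches = difflib.get_close_matches(
        reference.strip().lower(), list(by_name), n=1, cutoff=cutoff
    )
    return by_name[matches[0]] if matches else None


companies = {"E1": "Acme Drilling", "E2": "Example Energy Co"}
print(resolve_entity("acme drilling.", companies))  # E1
print(resolve_entity("Unknown LLC", companies))     # None
```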

ESG ™ (Emergent Semantic Graph)

The ESG is a graph representation of the structure of crawled and indexed data along with any enrichments that have been made to the data either through human intervention (i.e. curation) or from the output of machine learning processes run across the data (i.e. classification and mining). Data is not stored in the ESG, only a representation of the data structure and metadata useful to the connection, navigation, and display of indexed data.

ETL (Extract, Transform, Load)

ETL is a general description of a process where some tool Extracts data from homogeneous or heterogeneous data sources, Transforms the data for storing it in proper format or structure for querying and analysis purposes and Loads it into the final target (e.g. database). Maana does not include ETL capabilities as part of the core platform. This process is assumed to be carried out external to the Maana platform, with the final Load step being the pushing of data into Maana through Maana’s API layer. The Load step in ETL is equivalent to the Crawl process in Maana terminology.

Example (statistics and ml)

An example is a single row or single complete tuple of information according to some defined set of variables in a data table, matrix, or array. SYN: Row (layman), Observation (statistician, business stats).

Events

Knowledge Services can emit events (act as publishers) or consume events, which can trigger actions from subscribers.

Export

The platform allows a user to make a copy of data and/or structure contained in Maana and write it to an external data store such as a file or database.

F

Feature

A real-valued vector representing an information-bearing property of an input object, used in machine learning and statistical models. For example, a featurizer might take a temperature in string form, in either Fahrenheit or Celsius, and convert it to a binary value: 1.0 if it is greater than 100°C, or 0.0 if not. The vector may be dense or sparse. An example of a sparse feature vector is a map from words to counts of those words occurring in some body of text (known as a bag-of-words model).
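The two examples in this definition can be sketched as follows. The temperature parser assumes inputs such as `"212F"` or `"30C"` (a number followed by a unit letter); that input format is an assumption, not something the definition specifies.

```python
from collections import Counter


def hot_feature(reading):
    """Map a temperature string like '212F' or '30C' to 1.0 if above 100 C, else 0.0."""
    value, unit = float(reading[:-1]), reading[-1].upper()
    if unit == "F":
        celsius = (value - 32) * 5 / 9
    elif unit == "C":
        celsius = value
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    return 1.0 if celsius > 100 else 0.0


def bag_of_words(text):
    """Sparse feature vector: word -> count, omitting all the zero entries."""
    return dict(Counter(text.lower().split()))


print(hot_feature("212F"), hot_feature("30C"), hot_feature("110C"))  # 0.0 0.0 1.0
print(bag_of_words("the pump failed and the pump stopped"))
# → {'the': 2, 'pump': 2, 'failed': 1, 'and': 1, 'stopped': 1}
```

Note that 212°F is exactly 100°C, so it maps to 0.0 under the "greater than" rule.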

Feature Model

A compact representation of Maana Software elements.

Featurization

The process of transforming data from its raw form into a form that is useful for a machine learning algorithm. A Featurization consists of an ordered collection of one or more Featurizers. If one or more of the Featurizers in this collection outputs a sparse vector, then the Featurization must also output a sparse vector.

Featurizer

A function f: X → ℝⁿ, where X is a set of objects of some type. A featurizer is an implementation of a feature.
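Treating each Featurizer as a function from an object to a list of floats, a Featurization can be sketched as the ordered concatenation of its Featurizers' outputs. This sketch covers dense outputs only; a Featurization containing a sparse featurizer would need a sparse result, which is omitted here. The featurizer names and record fields are hypothetical.

```python
from typing import Callable, List, Sequence

# f: X -> R^n, with X taken to be dict records in this sketch.
Featurizer = Callable[[dict], List[float]]


def length_featurizer(record: dict) -> List[float]:
    return [float(len(record.get("text", "")))]


def depth_featurizer(record: dict) -> List[float]:
    depth = record.get("depth_m", 0.0)
    return [depth, depth / 1000.0]  # raw value in metres and the value in kilometres


def featurize(record: dict, featurizers: Sequence[Featurizer]) -> List[float]:
    """Apply featurizers in order and concatenate their outputs into one vector."""
    vector: List[float] = []
    for featurizer in featurizers:
        vector.extend(featurizer(record))
    return vector


print(featurize({"text": "leak", "depth_m": 1500.0}, [length_featurizer, depth_featurizer]))
# → [4.0, 1500.0, 1.5]
```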

Field

When Maana Crawls a source of data, it models the structure of the source in the ESG as a Schema with a set of Fields. The Schema represents the top level container of data (e.g. database table, XML document, JSON object, CSV file), and the Fields are the substructure within that container (e.g. database table columns, XML elements, JSON properties, CSV or spreadsheet columns).

Field Analyzer

A process that runs across data indexed against a Field to generate statistics and probabilistic classifications about the Field. For example, a Field could be classified as a Person Name, Company Name, or Distance Measure. The Field Analyzer is extensible through the installation of purpose-specific classifiers.
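A minimal sketch of the statistics half of a Field Analyzer, assuming a Field's values arrive as raw strings with `None` or an empty string for missing values. The statistic names are illustrative choices, not a documented output format.

```python
def analyze_field(values):
    """Compute simple profile statistics for the values indexed against a Field."""
    present = [v for v in values if v not in (None, "")]
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except ValueError:
            pass  # non-numeric values are counted but not used for min/max
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "numeric_ratio": len(numeric) / len(present) if present else 0.0,
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
    }


print(analyze_field(["12", "7", None, "7", "n/a"]))
# → {'count': 5, 'nulls': 1, 'distinct': 3, 'numeric_ratio': 0.75, 'min': 7.0, 'max': 12.0}
```

A `numeric_ratio` near 1.0 is the kind of evidence a classifier might use to suggest a numeric Class for the Field.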

Flume

A distributed, reliable, and available service for efficiently moving large amounts of data as the data is produced. It provides a scalable conduit for shipping data around a cluster and concentrates on reliable logging. The primary use case is as a logging system that gathers a set of log files on every machine in a cluster and aggregates them to a centralized persistent store such as HDFS.

Foreign Key

A Schema Field whose values represent the unique ID of Records indexed against a Field of a different Schema.

Full Outer Join

Produces the set of records that match in both tables according to a common key, plus the records from either table that have no match, with null values in the columns where no match exists.

Function

A function is a process or relationship that associates each element x of a [set](https://en.wikipedia.org/wiki/Set_(mathematics) "Set (mathematics)") X (the domain of the function) to a single element y of another set Y (possibly the same set), called the codomain of the function. A function is uniquely represented by its graph.

Function Node

A special kind of ESG node that can be used in place of a Schema Field or Kind Property. The value of the Field or Property is calculated based on a user specified Function, allowing for values to be derived from other data in the Schema or Kind.
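A Function Node can be pictured as a Field whose value is computed on demand from other values in the same Record. In the sketch below, `FunctionNode` and the example Fields are hypothetical; the derived `full_name` value is calculated from two stored Fields rather than stored itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class FunctionNode:
    """A derived Field: its value is computed from the other Fields of a Record."""
    name: str
    function: Callable[[Dict[str, Any]], Any]

    def value(self, record: Dict[str, Any]) -> Any:
        return self.function(record)


full_name = FunctionNode(
    name="full_name",
    function=lambda r: f"{r['first_name']} {r['last_name']}",
)
record = {"first_name": "Ada", "last_name": "Lovelace"}
print(full_name.value(record))  # Ada Lovelace
```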

G

GraphQL

A query language for APIs and a runtime for fulfilling those queries with existing data.

GraphQL Endpoint

A GraphQL Endpoint consists of types, queries, mutations and subscriptions.

H

HBase

Provides large-scale tabular storage for Hadoop using the Hadoop Distributed File System (HDFS). Cloudera recommends installing HBase in standalone mode before you try to run it on a whole cluster.

HCatalog

A tool that provides table data access for CDH components such as Pig and MapReduce.

Header

The area of Maana at the top of the screen. It contains the Maana Logo, Main Menu, Tabs, Search Bar, Branding Logo, Customer first and last name, and Customer Image.

Hive

A powerful data warehousing application built on top of Hadoop that enables you to access your data using HiveQL, a language similar to SQL.

Hue

A graphical user interface to work with CDH. Hue aggregates several applications which are collected into a desktop-like environment and delivered as a Web application that requires no client installation by individual users.

I

Independent Variable

Independent variables are the set of variables assumed to explain some observed outcome. (It's important to note that correlation does not imply causation, so while changes in a set of variables may be associated with an observed change in a dependent variable, that DOES NOT mean that the changes in the set of predictors CAUSED the change in the outcome, only that they changed together in a quantifiable way.) Example: the price of gas (Y, the dependent variable) is explained by a set of variables X, including Oil Supply and Oil Demand. In this example the independent variables are Oil Supply and Oil Demand.
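To make the two roles concrete, the sketch below fits an ordinary least-squares model with two independent variables (supply and demand) to a dependent variable (price). The numbers are synthetic, generated exactly from price = 10 - 0.5 * supply + 0.8 * demand, so the fit recovers those coefficients. They are not real oil-market data, and a good fit would still say nothing about causation.

```python
def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # zero when the predictors are perfectly collinear
    if det == 0:
        raise ValueError("independent variables are collinear")
    b1 = (s1y * s22 - s2y * s12) / det  # Cramer's rule on the 2x2 system
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


supply = [10, 12, 15, 11, 14]             # independent variable X1 (synthetic)
demand = [20, 18, 25, 22, 19]             # independent variable X2 (synthetic)
price = [21.0, 18.4, 22.5, 22.1, 18.2]    # dependent variable Y (synthetic)
b0, b1, b2 = fit_two_predictors(supply, demand, price)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # 10.0 -0.5 0.8
```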

Indexed Join

A Join where the instance level connections between the participating Schema Field values have been written to the Join Index. This improves the speed at which the query engine can resolve the Join and also allows for arbitrary mappings between instance values because no algorithm is needed to calculate the mapping at runtime.

Inner Join

Produces a set of records that match in both tables according to a common key.

Instance

A particular set of values for an entity within a concept; that is, an example of a concept, or an element of the set of things described by a concept. For example, Maana is an instance of the concept Company, and Jumanji is an instance of the concept Movie. The concept (Kind) "Movie" is itself an instance of the concept Kind.

J

Join

A collective term for Common Key Joins and Relational Joins. Conceptually, a Join appends information from one set of data to another.

Join Assist

definition?

K

Key Discovery

A process that runs across Field data and assigns a probabilistic classification to the Field describing the platform's confidence in whether the Field is a Primary Key or Foreign Key.
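A rough sketch of the kind of evidence Key Discovery might weigh: a Field whose values are all present and distinct is a plausible Primary Key, and a Field whose values are almost all found among another Field's values is a plausible Foreign Key referencing it. The scoring functions below are simplified assumptions, not a documented Maana algorithm.

```python
def primary_key_score(values):
    """Distinct non-null values divided by all values (1.0 = unique and complete)."""
    if not values:
        return 0.0
    distinct = {v for v in values if v is not None}
    return len(distinct) / len(values)


def foreign_key_score(values, referenced_values):
    """Fraction of non-null values that appear among the referenced Field's values."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    referenced = set(referenced_values)
    return sum(1 for v in non_null if v in referenced) / len(non_null)


customer_ids = ["C1", "C2", "C3"]
order_customer_ids = ["C1", "C1", "C3", None]
print(primary_key_score(customer_ids))                      # 1.0
print(primary_key_score(order_customer_ids))                # 0.5
print(foreign_key_score(order_customer_ids, customer_ids))  # 1.0
```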

Kinds

Kinds are collections of unique things, such as concepts people typically think about on a daily basis (e.g. people, movies, wells, etc.). Every business domain may have its own set of Kinds important to that domain. Kinds are user-defined and mapped onto the indexed data, which allows a set of meaningful entities to be generated from raw, indexed data via Search.

Kind Mapping

How the entities in a Kind map onto a source Schema or a Query Node. Through multiple Kind Mappings, entities for a single Kind can be sourced from different Schemas and Query Nodes. Each Kind Mapping specifies how the entities are mapped to Properties via specific property mappings.
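As an illustrative sketch, a Kind Mapping can be modelled as a dictionary from Kind Property to source Field, with one such dictionary per source Schema. Applying two mappings yields Entities of the same Kind from differently shaped sources. The Kind, Schemas, and Field names are hypothetical.

```python
# One Kind ("Well") with two Kind Mappings: Property name -> source Field name.
WELL_MAPPINGS = {
    "drilling_db.wells": {"well_id": "WELL_NO", "depth_m": "TD_METERS"},
    "field_reports.csv": {"well_id": "Well", "depth_m": "Depth (m)"},
}


def map_to_kind(schema_name, records):
    """Turn source Records into Kind Entities using that Schema's Kind Mapping."""
    mapping = WELL_MAPPINGS[schema_name]
    return [
        {prop: record.get(field) for prop, field in mapping.items()}
        for record in records
    ]


entities = map_to_kind("drilling_db.wells", [{"WELL_NO": "W-1", "TD_METERS": 3200}])
entities += map_to_kind("field_reports.csv", [{"Well": "W-2", "Depth (m)": 2875}])
print(entities)
# → [{'well_id': 'W-1', 'depth_m': 3200}, {'well_id': 'W-2', 'depth_m': 2875}]
```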

Knowledge Application

A software application that addresses a Use Case when operating on and in connection with the Maana Software, is developed using the Maana Knowledge Platform, and contains one or more Models and a user interface.

Knowledge Assistant

definition?

Knowledge Bots

Maana Q includes several Knowledge Bots (microservices), which automate much of the computational modeling that data scientists perform, such as field classification, entity recognition, and supervised machine learning. The Bots accelerate the building of AI-driven knowledge applications that enhance both the quality and the speed of day-to-day decisions in the enterprise.

Knowledge Graph

All the data associated with a Tenant. Combined with Maana’s proprietary AI algorithms, the knowledge graph expedites knowledge extraction from data silos, to reveal their relationships in the context of optimizing operations and decision flows.

Knowledge Model

A model that provides an answer to a question in a logical format. A Knowledge Model contains a Problem Question (PQ), which allows the mapping (function) of input concepts into an output concept representing an answer to the Problem Question.

L

Landing Page

The first screen/page displayed to a user after successfully logging into any software application. It may also be referred to as a "home page".

Left Outer Join

Produces the complete set of records from the "left" table (the first table specified, or primary table) with matching records from the right table where matches exist and null otherwise.
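The Inner Join, Left Outer Join, and Full Outer Join entries in this glossary can be compared with one small sketch over lists of dictionaries. It assumes the key is unique within each table and that no non-key column name appears in both tables, which keeps the logic short; real join engines also handle duplicate keys and name clashes. The table contents are made up.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dict rows on `key` (assumed unique within each table).

    how: "inner", "left" (left outer), or "full" (full outer). Columns from the
    side that has no match are filled with None (null).
    """
    if how not in ("inner", "left", "full"):
        raise ValueError(f"unsupported join type: {how!r}")
    right_by_key = {row[key]: row for row in right}
    left_keys = {row[key] for row in left}
    left_cols = {c for row in left for c in row if c != key}
    right_cols = {c for row in right for c in row if c != key}
    result = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            result.append({**row, **match})  # matched in both tables
        elif how in ("left", "full"):
            result.append({**row, **{c: None for c in right_cols}})
    if how == "full":
        for row in right:
            if row[key] not in left_keys:  # right rows with no left match
                result.append({**{c: None for c in left_cols}, **row})
    return result


wells = [{"well_id": "W-1", "field": "North"}, {"well_id": "W-2", "field": "South"}]
production = [{"well_id": "W-1", "bbl": 120}, {"well_id": "W-3", "bbl": 80}]
for how in ("inner", "left", "full"):
    print(how, join(wells, production, "well_id", how))
```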

Line of Business (lob)

A general term that refers to a product or set of related products or services that serve a particular customer transaction or business need. A Knowledge Application is sometimes referred to as an LOB Application.

Liquid Index ™

The values of the data crawled and indexed are stored in the Liquid Indices, a collection of purpose-specific indexes that map the values from the data to the ESG node representing the location where they were found. There are multiple types of indices, each responsible for dealing with a different type of data. There is one forward index, called the MIDIndex, and multiple inverted indexes for data such as n-grams (strings of up to 5 words), geospatial data, and temporal data.
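A toy sketch of the forward/inverted split: the forward index maps an identifier for each indexed value (a tuple standing in for a MID) to the value, and an inverted index maps every word n-gram of up to five words back to the identifiers where it occurs. The identifiers and structures are simplified assumptions, not the actual MIDIndex format.

```python
from collections import defaultdict

MAX_N = 5  # n-grams of up to 5 words


def ngrams(text, max_n=MAX_N):
    """Yield all word n-grams of length 1..max_n, lower-cased."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def build_indexes(values):
    """values: {value_id: text}. Returns (forward index, inverted n-gram index)."""
    forward = dict(values)
    inverted = defaultdict(set)
    for value_id, text in values.items():
        for gram in ngrams(text):
            inverted[gram].add(value_id)
    return forward, inverted


forward, inverted = build_indexes({
    ("wells", "notes", 0): "pump failure at well W-1",
    ("wells", "notes", 1): "scheduled pump maintenance",
})
print(sorted(inverted["pump"]))   # [('wells', 'notes', 0), ('wells', 'notes', 1)]
print(inverted["pump failure"])   # {('wells', 'notes', 0)}
```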

Logical Data Model

Internal schema, kinds and relations.

Lozenge

See Kind.

M

Maana Compute Connector

A Compute Connector developed by or for Maana (but not including any Customer Compute Connector). [Note to Customer: The Maana Software currently includes several dozen Maana Compute Connectors being used by Maana customers.]

Maana Data Connector

A Data Connector developed by or for Maana (but not including any Customer Data Connector). [Note to Customer: The Maana Software currently includes several dozen Maana Data Connectors being used by Maana customers.]

Maana Knowledge Application

A Knowledge Application developed by or for Maana (but not including any Customer Knowledge Application). [Note to Customer: The Maana Software currently includes several dozen Maana Knowledge Applications being used by Maana customers.]

Maana Knowledge Modeling Process

Maana’s proprietary methodology and approach used in the development of Models and Knowledge Applications. For clarity, the Maana Knowledge Modeling Process is “how” Models and Knowledge Applications are developed. Maana has used the Maana Knowledge Modeling Process to develop Maana Models and Maana Knowledge Applications and Models and Knowledge Applications for customers and will use the Maana Knowledge Modeling Process to develop additional Maana Models and Maana Knowledge Applications and Models and Knowledge Applications for other customers in the future, including Models and Knowledge Applications that have the same or similar functionality to one or more Customer Models or Customer Knowledge Applications. Ownership: Maana owns the Maana Knowledge Modeling Process, including all modifications, improvements and derivative works thereof, and the Maana Knowledge Modeling Process is Maana Confidential Information.

Maana Model

A Model developed by or for Maana (but not including any Customer Model). [Note to Customer: The Maana Software currently includes several dozen Maana Models being used by Maana customers.]

Maana Software

(aka the “Maana Knowledge Platform”) Maana's proprietary analysis and development software platform and distributed storage, query, and inference system (aka the “Maana Knowledge Graph”) for asset and process optimization, including all (i) Maana proprietary algorithms used to evaluate one or more input values (e.g., a data point or element, a field, a record, a stream, a binary large object, etc.) and produce a Model (aka “Knowledge Assistants”), (ii) Maana Knowledge Applications, (iii) Maana Models, (iv) Maana Compute Connectors, and (v) Maana Data Connectors, and, in each case of (i) through (v) above, included by Maana in the Maana Knowledge Platform or provided to customers by Maana for use with the Maana Knowledge Platform. The term “Maana Software” also includes all Updates and Upgrades to any of the foregoing developed by or for Maana that Maana makes available to customers. Ownership: Maana owns the Maana Software, including all source code and object code thereof, and all modifications, improvements and derivative works thereof, and the Maana Software is Maana Confidential Information.

Machine Assist

A recurring interaction pattern in Maana, where the platform applies statistical and/or machine learning techniques to analyze the structure and content of data and suggest various curation opportunities to the user. This includes Key Discovery, Join Discovery, Kind Discovery, and Kind Mapping Discovery, as well as various data refinement activities.

Machine Learning

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" with data, without being explicitly programmed.

Machine Learning Models

The process of training a machine learning model involves providing a machine learning algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process.
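To illustrate the distinction between the learning algorithm and the model artifact it produces, the sketch below trains a nearest-centroid classifier: training reduces labelled examples to one mean vector per label, and that dictionary of centroids is the artifact used for later predictions. Nearest-centroid is chosen only because it is short; it is not implied to be an algorithm Maana uses, and the training data is made up.

```python
from math import dist


def train_nearest_centroid(examples):
    """Learning algorithm: examples are (feature_vector, label) pairs.

    Returns the model artifact: {label: centroid vector}.
    """
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(model, features):
    """Use the trained artifact: pick the label whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))


training_data = [([1.0, 1.0], "normal"), ([1.2, 0.8], "normal"),
                 ([5.0, 5.5], "fault"), ([5.4, 4.9], "fault")]
model = train_nearest_centroid(training_data)
print(predict(model, [1.1, 0.9]), predict(model, [5.0, 5.0]))  # normal fault
```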

Magnify Mode

One of two modes that the canvas supports. In this mode, a user sees an element at 100% of the size of the canvas area (aka "In-Full"). The user would (See Workspace mode)

Mahout

A machine-learning tool. It enables the user to build machine-learning libraries that are scalable to “reasonably large” datasets, making building intelligent applications easier and faster.

Materialization

The indexing of data directly against Kinds.

Materialized Entity

Materialized Kinds do not map onto a Schema, instead having consolidated data indexed directly against the Kind Properties (e.g. there is only one “Actor” Harrison Ford although 30 instances of Harrison Ford may exist across multiple movies in the raw data).

Materialized Kind

A Kind that does not map to an underlying Schema but instead has data indexed directly against its Properties. The result of Entity Extraction is a Materialized Kind, but it is also possible for data to be indexed directly against a Kind during crawl.

Microservices

An architectural style that structures an application as a collection of loosely coupled, single-function services with well-defined interfaces. Services are processes that communicate with each other over a network in order to fulfill a goal, using technology-agnostic protocols such as HTTP. This style enables the continuous delivery and deployment of large, complex applications. It also enables an organization to evolve its technology stack.

MID

Common abbreviation of Model ID. Model ID (MID): A structure used throughout the Maana platform to refer to an instance of indexed data. The MID consists of a PID and an ordered array of Keys. The PID identifies the ESG Node where the data belongs, and the Key array contains one Key for each segment of the ESG Path represented by the PID.
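The MID structure can be sketched as a small immutable record. The validation encodes the rule that the Key array holds one Key per segment of the ESG Path; the `path_segments` argument, the integer PID, and the example Keys are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MID:
    """Model ID: the ESG node (PID) plus one Key per segment of that node's Path."""
    pid: int
    keys: Tuple[str, ...]

    @classmethod
    def for_path(cls, pid: int, path_segments: Tuple[str, ...], keys: Tuple[str, ...]):
        if len(keys) != len(path_segments):
            raise ValueError("need exactly one Key per ESG Path segment")
        return cls(pid, tuple(keys))


segments = ("Maana", "wells", "depth")  # hypothetical ESG Path
mid = MID.for_path(pid=42, path_segments=segments, keys=("ns-0", "row-17", "col-0"))
print(mid)  # MID(pid=42, keys=('ns-0', 'row-17', 'col-0'))
```

Making the dataclass frozen also makes each MID hashable, so it can serve directly as a key in an index dictionary.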

Miner

Code that uses the Maana APIs to process data. TODO –expand / refine this.

Model

Software that encodes mathematical relationships among, for example, (i) a set or sets of data (aka a “data model”), (ii) processes (aka a “process model”), (iii) domain knowledge (aka a “domain or knowledge model”), (iv) outcomes (aka a “predictive model”), and (v) matter and motion (aka, a “physics model”) that is developed using the Maana Knowledge Platform and addresses a Use Case when operating on and in connection with the Maana Knowledge Platform.

Modeling (Kinds)

Kind modeling helps relate raw indexed data to meaningful business concepts. In Maana, users model data by creating Kinds and mapping their Properties to indexed data (i.e. to Schema Fields). Once data has been modeled as Kinds, the associated data can more easily be surfaced in Search results.

Mutation

A GraphQL mutation performs operations such as CREATE, UPDATE, and DELETE. Queries do not change data; mutations do.

N

Named Entity Recognition (NER)

The location, classification, and extraction of n-grams or text elements that are expected to identify or refer to Entities. Entity Resolution would typically follow Named Entity Recognition to find the actual Entity referenced.

Named Relation

These allow users to create custom relationships between nodes in the ESG. These can be used by the user for any purpose, but will surface most visibly when Maana is displaying related data in search results. For example, for the two Kinds Employee and Company, the Named Relation might be “works for”.

Namespace

ESG nodes that allow users to create a hierarchical structure (like directories in a file system) into which Schemas and Kinds can be organized.

Node

See Kind.

Normalize (database users)

The process of minimizing data redundancy across data sets or data tables, or within a relational database.

Normalize (statistics and ml)

The process of transforming continuous real-valued variables so that they are centered and scaled by some measure of dispersion. The most common normalization centers each variable on its mean and scales it by its standard deviation, giving unit variance (a z-score).
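Mean centering and scaling by the standard deviation (a z-score) can be sketched with Python's `statistics` module. Using the population standard deviation (`pstdev`) rather than the sample one is a choice made here for simplicity.

```python
from statistics import mean, pstdev


def z_score(values):
    """Center on the mean and scale by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot scale a constant variable")
    return [(v - mu) / sigma for v in values]


print(z_score([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
# → [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```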

notebook

TODO

O

Oozie

A server-based workflow engine specialized in running workflow jobs with actions that execute Hadoop jobs. A command line client is also available that allows remote administration and management of workflows within the Oozie server.

P

P-Value

Suppose an analyst has observed a random variable X and wants to know if, on average, X is different from some other value (for example, the mean of a different variable, zero, or some other fixed value; we'll call it the target value). To answer that question we need the p-value. Assuming the true mean really is the target value (the null hypothesis), the p-value is the probability of observing, purely by chance, a set of values whose mean is at least as far from the target value as the mean actually observed. There are two critical points to remember about the p-value: 1. It's only meaningful in the context of a question comparing two values. 2. Smaller p-values indicate stronger evidence against the null hypothesis.
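A minimal sketch of this calculation, assuming a sample large enough that the sample mean is approximately normally distributed: compute a z statistic for the difference between the sample mean and the target value, then the two-sided p-value from the standard normal distribution using `math.erfc`. This normal approximation is a simplification that keeps to the standard library; for small samples such as the toy one below, a t-test would be more appropriate. The readings are made up.

```python
import math
from statistics import mean, stdev


def two_sided_p_value(sample, target):
    """Two-sided p-value for H0: the true mean equals `target` (z approximation)."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # needs at least 2 values
    z = (mean(sample) - target) / standard_error
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


readings = [5.1, 4.9, 5.3, 5.2, 4.8, 5.4, 5.0, 5.3]
p = two_sided_p_value(readings, target=5.0)
print(f"p = {p:.3f}")
```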

Page

A Page is an area of the Canvas that displays either a single slot at 100% or several slots in combinations of 50% and 25% increments. There are two types of pages: Primary and Secondary.

Path

The fully qualified name of an ESG node representing where the node (Namespace, Schema, Field, Kind, Property, etc.) fits in the hierarchical naming structure.

PID

The ESG can be seen as a hierarchical namespace (like directories in a filesystem) in which interior nodes are simply name disambiguators (e.g., /Maana/, /GE/, /Chevron/), outer (but not leaf) nodes represent tables or objects from XML/JSON, and leaf nodes represent fields, or columns. Any node can therefore be reached using a path expression, which is simply a sequence of names separated by '/' characters. Internally, these paths are represented by a 64-bit hash called a Path Identifier (PID).
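The hash function Maana actually uses for PIDs is not specified here, so the sketch below assumes FNV-1a (64-bit) purely as a stand-in to show the idea: build a '/'-separated path expression from node names, then reduce it to a fixed-width 64-bit identifier. The helper names `path_expression` and `path_id` are hypothetical.

```python
# Assumption: FNV-1a 64-bit stands in for Maana's (unspecified) PID hash.
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = (1 << 64) - 1


def path_expression(names: list[str]) -> str:
    """Join node names into a '/'-separated path expression (hypothetical helper)."""
    return "/" + "/".join(names)


def path_id(path: str) -> int:
    """Hash a path expression to a 64-bit identifier (illustrative FNV-1a)."""
    h = FNV_OFFSET_BASIS_64
    for byte in path.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64  # keep the value within 64 bits
    return h


path = path_expression(["Maana", "Sales", "Orders", "customer_id"])
print(path, hex(path_id(path)))
```

Storing a fixed-size integer instead of the full path makes node lookups and comparisons cheap; the trade-off is that distinct paths can, rarely, collide.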

Pig

Enables you to analyze large amounts of data using Pig’s query language called Pig Latin. Pig Latin queries run in a distributed way on a Hadoop cluster.

POC

Proof of concept.

Predictive Models

Predictive modeling uses statistics to predict outcomes. Most often the event to be predicted is in the future, but predictive modeling can be applied to any type of unknown event, regardless of when it occurred.

Prescriptive Model

Prescriptive analytics goes beyond descriptive and predictive models by recommending one or more courses of action and showing the likely outcome of each decision.

Presentation Assist

definition?

Presentation View

definition?

Primary Key

A Primary Key uniquely specifies a Record within a Schema.
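As an illustration of the uniqueness requirement, the sketch below checks whether a candidate set of fields could serve as a Primary Key: every Record must have a value for each key field, and no two Records may share the same combination. Records are modeled as plain dictionaries and the field names are hypothetical.

```python
from typing import Any


def is_primary_key(records: list[dict[str, Any]], key_fields: list[str]) -> bool:
    """True if key_fields identify every record uniquely (no missing or duplicate keys)."""
    seen: set[tuple[Any, ...]] = set()
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if None in key or key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical records from a single Schema.
rows = [
    {"id": 1, "site": "A", "name": "Pump 1"},
    {"id": 2, "site": "A", "name": "Pump 2"},
    {"id": 1, "site": "B", "name": "Pump 1"},
]
print(is_primary_key(rows, ["id"]))          # False: id 1 appears twice
print(is_primary_key(rows, ["id", "site"]))  # True: (id, site) is unique
```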

Primary Page

Always page 1 on the Canvas. The Primary Page uses both Giant and Petite slot sizes.

Problem Question

The root node in a Knowledge Model. It typically uses the syntax of "Given X, what is Y?"

Process Model

A process model groups processes of the same nature into a single model. A process model is therefore a description of a process at the type level, and each individual process is an instantiation of that model.

Property

Properties are the data fields that define Kinds. Properties of Kinds are mapped to Fields in the underlying Schema.

Provenance

Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.

Publishers

Senders of messages in a publish–subscribe messaging pattern.

Pub-Sub Standard

Enables microservices to be aware of events and react to them. Publish–subscribe is a messaging pattern in which senders of messages (called publishers) do not send messages directly to specific receivers (called subscribers). Instead, publishers categorize published messages into classes without knowing which subscribers, if any, exist. Similarly, subscribers express interest in one or more classes and receive only the messages of interest, without knowing which publishers, if any, exist.
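The decoupling described above can be sketched with a small in-memory broker: publishers and subscribers share only a topic (message class) name, never a reference to each other. This is an illustration of the pattern, not Maana's messaging API; the `Broker` class and the topic names are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Message = dict
Handler = Callable[[Message], None]


class Broker:
    """Minimal in-memory pub-sub broker keyed by topic (message class) name."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register interest in a topic without knowing who publishes to it."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver to every current subscriber of topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
received: list[Message] = []
broker.subscribe("kind.created", received.append)  # hypothetical topic name
broker.publish("kind.created", {"name": "Employee"})
broker.publish("kind.deleted", {"name": "Company"})  # no subscribers, so nobody receives it
print(received)  # [{'name': 'Employee'}]
```

A production system would add persistence, delivery guarantees and asynchronous delivery, but the contract is the same: the publisher never learns who, if anyone, consumed the message.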

Q

Query Node

An Emergent Semantic Graph (ESG) node that contains a query that is executed whenever the data for the node is retrieved. Mapping a Kind to a Query Node (through a Kind Mapping) enables the creation of Kinds that span multiple Schemas and/or apply filters to the source Schema. Query Nodes should not be used if the output needs to be searchable, as Query Node results are temporary.

Queries

In a GraphQL API, a query performs READ operations. A mutation, by contrast, performs write operations such as CREATE, UPDATE and DELETE. In short, queries do not change data; mutations do.

R

Raw Data Kind

A raw data Kind represents files uploaded to Maana and their corresponding metadata.

Reconciliation Assist

The ability to reconcile any two Kinds or any two Properties; outputs are: 3 things returned plus any required Kinds. TODO: expand definition.

Record

A Record is an instance of a Schema with data mapped into its Fields.

Refinement

The act of cleaning up data and transforming it from one format to another. E.g. names in a table may be listed in various formats (First Last vs. Last, First) and require standardization to help improve usability.
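The name example above can be sketched as a small refinement step that rewrites "Last, First" into "First Last" and tidies stray whitespace. It is a deliberately naive illustration: it assumes at most one comma per name and would mishandle forms such as "King, Jr., Martin".

```python
def standardize_name(raw: str) -> str:
    """Convert 'Last, First' to 'First Last'; leave 'First Last' unchanged."""
    name = " ".join(raw.split())  # collapse repeated and surrounding whitespace
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        return f"{first} {last}"
    return name


names = ["Smith, Jill", "Jill Smith", "  Allen,Paul "]
print([standardize_name(n) for n in names])
# ['Jill Smith', 'Jill Smith', 'Paul Allen']
```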

Relations

Connections or dependencies that can be established between two Kinds. Relations can be suggested automatically by the system when new files are loaded into Maana, or created manually by the user.

Related Kind

There are three ways for Kinds to be related to each other and surfaced to the user as a Related Kind: through a shared Property, through an indexed Join or through a Named Relation.

Relation

A relation or relationship, in the context of databases, is a situation that exists between two relational database tables when one table has a foreign key that references the primary key of the other table. Relationships allow relational databases to split and store data in different tables, while linking disparate data items.

Relational Join

A traditional relational join modeled in Maana by creating a connection between a Field that represents the Foreign Key and a Field that represents the Primary Key.

Result

The product or output of analysis performed by the Maana Software on Customer Data, and any reports containing such analysis.

Runtime Join

A Join where the connections between Record Field Values are resolved on demand at query time, as opposed to being resolved and stored at Join creation time, which is the case with Indexed Joins. Runtime Joins may also be referred to as Dynamic Joins.
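The difference between the two strategies can be sketched over a foreign-key relationship in plain Python: an indexed join builds a lookup table once, while a runtime join scans the Records each time it is queried. This only illustrates the trade-off; it does not describe how Maana stores Joins internally, and the records and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: Orders.customer_id is a foreign key to Customers.id.
customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 1},
]


def build_index(records: list[dict], key: str) -> dict:
    """Indexed join: resolve the connections once, at join-creation time."""
    index = defaultdict(list)
    for record in records:
        index[record[key]].append(record)
    return index


def indexed_lookup(index: dict, value) -> list[dict]:
    return index.get(value, [])  # .get avoids inserting empty entries


def runtime_lookup(records: list[dict], key: str, value) -> list[dict]:
    """Runtime (dynamic) join: scan the records on demand at query time."""
    return [r for r in records if r[key] == value]


order_index = build_index(orders, "customer_id")
for customer in customers:
    by_index = [o["order_id"] for o in indexed_lookup(order_index, customer["id"])]
    by_scan = [o["order_id"] for o in runtime_lookup(orders, "customer_id", customer["id"])]
    assert by_index == by_scan  # both strategies find the same matches
    print(customer["name"], by_index)
# Acme [10, 12]
# Globex [11]
```

The index answers each lookup quickly but costs memory and must be rebuilt when the data changes; the runtime scan always reflects the current data at the cost of more work per query.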

S

Scenario

Scenarios are documented variations of a more general Use Case, each describing a specific user interaction in Maana. For example, two scenarios for Edit Class are: add Class to a single Field and add Class to multiple Fields.

Schema

The Schema is a view of how the source data was organized when presented to Maana for indexing. When Maana crawls data, it models the structure of the source in the ESG as a Schema with a set of Fields.

Schema View

When Maana crawls a source of data, such as a CSV, MDB, or XML file, it models the structure of the file in the ESG as a Schema with a set of Fields. The structure is placed at a location in the namespace defined by the person crawling the data. The structure of Kinds created by users is also stored in the ESG, isolated in a separate part of the namespace hierarchy.

Scrolling

A vertical or horizontal bar located on the far right or bottom of a page or slot that allows the user to move either the page or the slot's viewing area up, down, left, or right. Maana supports two types of scrolling: Page Scrolling and Slot Scrolling.

Search

definition?

Search and Query (SAQ)

definition?

Similarity Assist

definition?

Similarity

Similarity is a measure of how alike two data objects are. In a data mining context it is usually derived from a distance in a feature space, where each dimension represents a feature of the objects: the smaller the distance, the greater the similarity.
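Two common ways to turn feature vectors into a similarity score are sketched below: cosine similarity (based on the angle between vectors) and a score derived from Euclidean distance. The mapping `1 / (1 + distance)` is one conventional choice among several, and the function names and feature values are illustrative.

```python
from math import dist, sqrt


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of features")
    dot = sum(x * y for x, y in zip(a, b))
    norms = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    if norms == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norms


def distance_similarity(a: list[float], b: list[float]) -> float:
    """Map Euclidean distance into (0, 1]: identical objects score 1.0."""
    return 1.0 / (1.0 + dist(a, b))


# Each object is described by the same two (illustrative) features.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(distance_similarity([0.0, 0.0], [3.0, 4.0]))
```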

Slot

A container on the Canvas that houses an element (KM, DM, PV) that is dragged and dropped onto the Canvas from the side panel or Search.

Slot Scrolling

The ability to scroll the contents of a slot up or down/left or right. This is supported everywhere a slot is supported (Workspace and Magnify Mode and all slot types and sizes).

Snappy

A compression/decompression library. For Maana V2.0, you do not need to install Snappy if you are already using the native library, but you do need to configure it; see Snappy Installation for more information.

Source Model

definition?

Sqoop

A tool that imports data from relational databases into Hadoop clusters. Using JDBC to interface with databases, Sqoop imports the contents of tables into a Hadoop Distributed File System (HDFS) and generates Java classes that enable users to interpret the table’s schema. Sqoop can also export records from HDFS to a relational database.

Sqoop 2

A server-based tool for transferring data between Hadoop and relational databases. You can use Sqoop 2 to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data with Hadoop MapReduce, and then export it back into an RDBMS.

Statistical Model

Statistical modeling is a simplified, mathematically-formalized way to approximate reality (i.e. what generates your data) and optionally to make predictions from this approximation. The statistical model is the mathematical equation that is used.

Statistical Visualizations

definition?

String

A String is a character sequence of arbitrary length and is the most generic representation. A String could look like: “the quick brown fox jumped over the lazy dog”, “90210”, “CA”, “jill smith”, “TCO”, “aosofiywj fq9h”, “49583736Q3004PIJG074”, “mdkapufpau”.

Structural Model

definition?

Subscribers

Receivers of messages in a publish–subscribe messaging pattern.

System Bus

A system bus is a single computer bus that connects the major components of a computer system, combining the functions of a data bus to carry information, an address bus to determine where it should be sent, and a control bus to determine its operation.

System Kind

The structural elements of the ESG depicted as Kinds when they are surfaced in Query results, allowing them to be used in the same way as user-generated Kinds. System Kinds include Schema, Field, Type, Class, Kind, Property, Join, Entity Reference, Kind Reference and Named Relation.

T

Tab

A Tab represents one of three things in Maana: the landing page (shown with a flag icon), a workspace (shown with the name of the workspace), or a new tab (shown with a plus sign). When there are more Workspaces on the page than can be displayed in the space provided, the plus-sign icon updates to include a down arrow that opens an extended menu.

Tenant

definition?

Training Example Pair

definition?

Tuple

In Maana, Tuples are sets of known, named values whose order is not significant. In mathematics, by contrast, a tuple is a finite ordered list (sequence) of elements.

Type

The fundamental data types in Maana are Integer, Floating Point, String and Date/Time. Types are associated with Fields in the form of a probabilistic score that the Field is of the given Type, based on the values that have been written to the Field. Types are also associated with Properties to identify the Type a Field must have to be considered for machine-assisted mapping to the containing Kind.
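The scoring Maana actually applies is not described here, so the sketch below assumes the simplest possible version: a Field's score for each Type is the fraction of its values that parse as that Type. The parsers (`int`, `float`, `datetime.fromisoformat`) and the helper names are stand-ins chosen for illustration.

```python
from datetime import datetime
from typing import Callable


def _parses(parser: Callable[[str], object], value: str) -> bool:
    """True if parser accepts value without raising."""
    try:
        parser(value)
    except (ValueError, TypeError):
        return False
    return True


# Hypothetical checks for the four fundamental Types.
TYPE_CHECKS: dict[str, Callable[[str], bool]] = {
    "Integer": lambda v: _parses(int, v),
    "Floating Point": lambda v: _parses(float, v),
    "Date/Time": lambda v: _parses(datetime.fromisoformat, v),
    "String": lambda v: True,  # every value is at least a String
}


def type_scores(values: list[str]) -> dict[str, float]:
    """Score each Type by the fraction of a Field's values that parse as that Type."""
    if not values:
        return {name: 0.0 for name in TYPE_CHECKS}
    return {
        name: sum(check(v) for v in values) / len(values)
        for name, check in TYPE_CHECKS.items()
    }


print(type_scores(["90210", "94103", "10001", "CA"]))
```

A mostly numeric column with one stray code scores high, but not certain, for Integer, which is the kind of probabilistic evidence the definition describes.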

Text

Text indicates that a String is expected to be some form of natural language and can be processed as such (entity/fact extraction, association networks, word embeddings, etc.). Text is a stronger classification than String because it carries more meaning; for example, "Paul Allen" is interpreted by the classifier as a person's name.

U

UAT (User Acceptance Testing)

User Acceptance Testing. Typically, the final testing phase in a software development process where software is tested by the intended audience for functionality.

Unnamed Relations

Two or more Kinds become implicitly related when a Relational Join is formed between at least two Schemas that have been mapped to those Kinds.

Use Case

The business problem faced by the Customer that has the potential to be addressed with the Maana Knowledge Platform.

These are user interactions in Maana (i.e. functionality), which include variations (called “scenarios”) on actions such as Browse, Edit, Add, Read, Delete, Search, etc.

UX Storyboard

The interaction between the customer and the core platform to find and filter information, and optionally to request ranked results from the custom answer service through a custom UI.

V

Values

A particular size, measure, or number associated with an entity. For example: 40.

View (Grid)

For intermediate numbers of items or raw records, Maana defaults to a familiar grid view, allowing the user to sort, search, and filter locally. Pagination may be used when large data sets are involved.

View (Item)

(formerly called Entity View) If the results represent a single entity, Maana will display Item cards, providing direct access to information about each item. This is a profile view.

View (Multi-item)

Same as Item view; however, the user will be able to easily select from a list of Items (carousel, list, etc.) one Item to view in detail.

View (Summary)

For larger result sets, Maana displays a Summary View, presenting a set of data visualizations representing the aggregate values across the result set and allowing the user to see trends and patterns in the data.

W

Whirr

Provides a fast way to run cloud services.

Workspace

A container that groups together collections of elements from the Knowledge Graph. A workspace is a dedicated place within the MAANA portal where a user can build, visualize and explore a computational knowledge graph.

Workspace Mode

One of two modes that the Canvas supports. Workspace mode allows the customer to show an unlimited number of slots at a time.

X

Y

Z

Zookeeper

A highly reliable and available service that provides coordination between distributed processes.