We propose adding a dependsOn attribute to the Input Port Object definition to capture relationships between different data products, enabling data product lineage tracking.
Motivation
To map data lineage between source systems and data products, it is essential to define the external dependencies of each input port. Specifically, an input port can consume data from an output port of another data product or from an external system.
Note: an input port cannot consume from multiple external sources (systems or other output ports). If a data product needs to consume data from multiple external sources, it must declare multiple input ports.
Design and examples
We define a field on input ports that specifies where they consume data from. The component on which the port depends is referenced by its fullyQualifiedName. The fully qualified name used should make it possible to distinguish an output port of another data product from a generic external system.
We call the field dependsOn because it indicates a dependency between interfaces (i.e., components of the same type) that, if unmet, prevents the creation of the port and, consequently, the entire product. The dependsOn field, therefore, always has the same meaning regardless of the component on which it is defined.
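As a minimal sketch of how a consumer of the descriptor might tell the two kinds of dependency apart, the Python snippet below classifies a dependsOn entry by the shape of its URN. The two patterns are assumptions inferred from the examples in this RFC (in particular, the `systems` segment for external systems is not defined normatively here), and the function name `classify_dependency` is hypothetical.

```python
import re

# ASSUMPTION: URN shapes inferred from the examples in this RFC, not from a
# normative grammar. The "systems" form in particular is only illustrated,
# never formally specified.
OUTPUT_PORT_RE = re.compile(
    r"^urn:dpds:[^:]+:dataproducts:[^:]+:\d+:outputports:[^:]+$"
)
SYSTEM_RE = re.compile(r"^urn:dpds:[^:]+:systems:[^:]+$")


def classify_dependency(fqn: str) -> str:
    """Return 'outputport', 'system', or 'unknown' for a dependsOn entry."""
    if OUTPUT_PORT_RE.match(fqn):
        return "outputport"
    if SYSTEM_RE.match(fqn):
        return "system"
    return "unknown"
```

A tool could use the result to decide whether a dependency should be resolved against another data product's descriptor or treated as an opaque external source.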
The following table lists the fields of the Input Port Object; see the row that describes the new dependsOn attribute.
| Field Name | Type | Description |
|---|---|---|
| id | string:uuid | (READONLY) A UUID version 5 (see RFC-4122) generated server side during data product creation as the SHA-1 hash of the port's fullyQualifiedName. It MAY be used to reference the port when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is globally unique as well. However, when calling APIs other than those exposed by the data product experience plane, the port's fullyQualifiedName MUST always be used to reference the port. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
| fullyQualifiedName | string:fqn | (READONLY) The unique universal identifier of the port. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:inputports:{port-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:inputports:tmsTripCDC" |
| entityType | string:alphanumeric | (READONLY) The type of the entity. It MUST be a constant value equal to inputport. |
| name | string:name | (REQUIRED) The name of the port. It MUST be unique among the input ports of the same data product. It's RECOMMENDED to use a camel case formatted string. Example: "name": "tmsTripCDC" |
| version | string:version | (REQUIRED) The semantic version number of the data product's port. Every time the major version of the port changes, the major version of the product MUST also be incremented. |
| displayName | string | The human readable name of the port. It SHOULD be used by the frontend tool to visualize the port's name in place of the name property. It's RECOMMENDED not to use the same displayName for different input ports belonging to the same data product. |
| description | string | The port description. CommonMark syntax MAY be used for rich text representation. |
| dependsOn | [string:fqn] | The list of output ports or external systems from which this input port receives data. Each input port SHOULD read from only one output port or external system, so this array SHOULD always have a length of 1. The reference to the output port or external system from which this port receives data must be specified using its fullyQualifiedName. |
| componentGroup | string:name | The name of the group this component belongs to. Grouping different components together is useful to define sub-modules within a data product. A sub-module can be used as a base for creating reusable templates. |
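The id row describes a name-based UUID (version 5) derived from the port's fullyQualifiedName. The sketch below shows how such an identifier can be derived with Python's standard `uuid` module. The RFC does not state which namespace UUID the server uses, so `ASSUMED_NAMESPACE` is a placeholder, and the function name `port_id` is hypothetical. The output is therefore not expected to match the example id in the table.

```python
import uuid

# ASSUMPTION: the namespace UUID used server side is not given in this RFC.
# uuid.NAMESPACE_URL is only a stand-in so that the sketch runs.
ASSUMED_NAMESPACE = uuid.NAMESPACE_URL


def port_id(fully_qualified_name: str) -> str:
    """Derive a deterministic UUID v5 (SHA-1 based) from a port's FQN."""
    return str(uuid.uuid5(ASSUMED_NAMESPACE, fully_qualified_name))
```

Because the derivation is deterministic, the same fullyQualifiedName always yields the same id, which is why the global uniqueness of the fullyQualifiedName carries over to the id.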
The following is an example of an input port receiving data from an upstream data product:
{
  "fullyQualifiedName": "urn:dpds:com.company-xyz:dataproducts:downstreamProduct:1:inputports:inputRawData",
  "name": "inputRawData",
  "displayName": "Input Raw Data",
  "description": "The input port that reads raw data exposed by the upstreamProduct",
  "version": "1.2.0",
  "dependsOn": ["urn:dpds:com.company-xyz:dataproducts:upstreamProduct:1:outputports:outputRawData"]
}
The following is an example of an input port receiving data from an upstream external system:
{
  "fullyQualifiedName": "urn:dpds:com.company-xyz:dataproducts:downstreamProduct:1:inputports:inputRawData",
  "name": "inputRawData",
  "displayName": "Input Raw Data",
  "description": "The input port that ingests data from Salesforce",
  "version": "1.2.0",
  "dependsOn": ["urn:dpds:com.company-xyz:systems:salesforce"]
}
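To illustrate how dependsOn enables lineage tracking, the following Python sketch turns a collection of input port objects into lineage edges, first at port level and then rolled up to product level. It is only an illustration under stated assumptions: the helper names `lineage_edges` and `owner` are hypothetical, and the roll-up relies on the URN layout shown in the examples above rather than on a normative grammar.

```python
from typing import Iterable


def lineage_edges(input_ports: Iterable[dict]) -> list[tuple[str, str]]:
    """Return (source FQN, input port FQN) pairs, one per dependsOn entry."""
    edges = []
    for port in input_ports:
        for source in port.get("dependsOn", []):
            edges.append((source, port["fullyQualifiedName"]))
    return edges


def owner(fqn: str) -> str:
    """Roll a port FQN up to its data product; leave other FQNs unchanged.

    ASSUMPTION: port URNs follow the layout
    urn:dpds:{namespace}:dataproducts:{name}:{major}:{kind}:{port}.
    """
    parts = fqn.split(":")
    if len(parts) >= 6 and parts[3] == "dataproducts":
        return ":".join(parts[:6])
    return fqn


ports = [
    {
        "fullyQualifiedName": "urn:dpds:com.company-xyz:dataproducts:downstreamProduct:1:inputports:inputRawData",
        "dependsOn": ["urn:dpds:com.company-xyz:dataproducts:upstreamProduct:1:outputports:outputRawData"],
    }
]

# Product-level lineage: which product (or system) feeds which product.
product_edges = [(owner(src), owner(dst)) for src, dst in lineage_edges(ports)]
```

With the first example above as input, `product_edges` links upstreamProduct (major version 1) to downstreamProduct (major version 1). An external system such as the Salesforce URN passes through `owner` unchanged.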
Alternatives
Using consumeTo in place of dependsOn would make little sense, as it would not apply in cases where the port receives data from the source in a push rather than a pull manner.
Decision
We have decided to make the modification described in this RFC in version 1.1.0 of the specification.
Consequences
To date, all ports have the same set of attributes. dependsOn would be the first attribute defined specifically for a particular type of port.
References
NA
Data Product Lineage
Champion: @andrea-gioia