Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add integration context spec #335

Open
wants to merge 23 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 16 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions specification/behaviors.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,3 +46,8 @@ Logs](https://github.com/open-telemetry/opentelemetry-specification/tree/main/sp
The logs containing profiling data MUST be sent via OTLP. Instrumentation
libraries SHOULD reuse persistent OTLP connections from other signals (traces,
metrics).

## Trace State Interchange

See [integration_context.md](integration_context.md) for specifics about
exchanging additional context between AppD and splunk-otel based agents.
86 changes: 86 additions & 0 deletions specification/integration_context.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
<!-- markdownlint-configure-file
{
"MD013": {
"tables": false
}
}
-->

# Integration Context

**Status**: [Experimental](../README.md#versioning-and-status-of-the-specification)

In order to create a more complete picture of a distributed environment, we
propose the use of additional headers to pass extra context details between
Cisco observability components. Specifically, we are concerned with passing
information between OpenTelemetry-based Splunk distributions and AppDynamics
agents, in both directions.

# Headers

The following defines several new headers and describes their intended use.
All headers MUST be treated as optional -- peer services will not always
generate them.

* `cisco-ctx-acct-id` - Contains the ID of the AppDynamics account.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe AppD prefix as it feels more unique than cisco?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I like about cisco is that we are able to use the same prefix for data generated in AppD components and data generated in splunk-otel components. With this approach, the intention is to be able to say "Oh, I see this cisco-ctx-* header, I know exactly why it exists -- to facilitate integrated product experience.

* `cisco-ctx-app-id` - Contains the ID of the AppDynamics application.
* `cisco-ctx-tier-id` - Contains the ID of the AppDynamics tier.
* `cisco-ctx-bt-id` - Contains the ID of the AppDynamics business transaction (BT).
* `cisco-ctx-env` - Contains
the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/)
resource value from an OpenTelemetry based component.
* `cisco-ctx-service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service)
resource value from an OpenTelemetry based component.
Comment on lines +28 to +36
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we use the "standardized/canonical" way of defining headers

Suggested change
* `cisco-ctx-acct-id` - Contains the ID of the AppDynamics account.
* `cisco-ctx-app-id` - Contains the ID of the AppDynamics application.
* `cisco-ctx-tier-id` - Contains the ID of the AppDynamics tier.
* `cisco-ctx-bt-id` - Contains the ID of the AppDynamics business transaction (BT).
* `cisco-ctx-env` - Contains
the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/)
resource value from an OpenTelemetry based component.
* `cisco-ctx-service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service)
resource value from an OpenTelemetry based component.
* `Cisco-Ctx-Acc-ID` - Contains the ID of the AppDynamics account.
* `Cisco-Ctx-App-ID` - Contains the ID of the AppDynamics application.
* `Cisco-Ctx-Tier-ID` - Contains the ID of the AppDynamics tier.
* `Cisco-Ctx-BT-ID` - Contains the ID of the AppDynamics business transaction (BT).
* `Cisco-Ctx-Env` - Contains
the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/)
resource value from an OpenTelemetry based component.
* `Cisco-Ctx-Service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service)
resource value from an OpenTelemetry based component.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matter much -- but my purely aesthetic preference is to use lowercase. Most libraries/frameworks standardize on lowercase anyway I believe.

Copy link
Contributor

@pellared pellared Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most libraries/frameworks standardize on lowercase anyway I believe.

I am not sure. At least, this is not true for Go

I don't think it matter much

It does not, but this is how most RFC and docs like https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers follow. But maybe it just me who is not used to lowercase header field names. I won't fight for it, but sometime the difference in "casing" makes it easier for me to "pattern-match" the "type". E.g. when I see SPLUNK_IS_COOL I see an env var 😆

Feel free to do whatever you want. I do not want to end up with an academical discussion 😉


HTTP headers are capable of being multivalued. As such, implementations
Copy link
Contributor

@pellared pellared Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also refer to https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.2:

A sender MUST NOT generate multiple header fields with the same field name in a message unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception (as noted below).

SHOULD use the _last_ value when the above headers contain multiple values.

## Splunk OpenTelemetry distributions
Copy link
Contributor

@pellared pellared Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to describe how this can be implemented using OTel components (via propagator and processor) based on signalfx/splunk-otel-java#2198.

Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?

How would one tell that context is being propagated to an app that is using appd? There is an open spec issue for controlling context propagation boundary. I hope that none of these headers are considered sensitive as limiting where they are propagated will be complicated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laurit raises a good point that a customer might want the control of this behavior to be (optionally) more granular - e.g. cisco.ctx.enabledwould be the preferred way, but also cisco.ctx.enabled.inbound=true/false and cisco.ctx.enabled.outbound as more specific toggles. (If so, further spec issue to settle - what if the "combination" enabled is set and a more specific one is also set - which takes precedence?)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For outbound injection, instrumentation is obviously not going to be able to tell what it's talking to. For security/leakage concerns I've suggested more granular behavior controls, but the fallback could also be telling the user to configure a network proxy/firewall/etc. for their needs.

This feature (appd<->splunk context sharing) needs to be off by default in splunk-otel per PM, who I'm sure would be happy to share reasoning in separate channels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?

No, this was not the intention. What phrasing leads you to believe this?

How would one tell that context is being propagated to an app that is using appd?

You can't and shouldn't need to. The headers shouldn't contain sensitive information. Users who choose to put sensitive information in any of these fields (service name, environment, appd IDs, whatever) shouldn't use this feature.

Copy link
Contributor

@pellared pellared Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What phrasing leads you to believe this?

This:

See integration_context.md for specifics about
exchanging additional context between AppD and splunk-otel based agents.

Threat 1: Information disclosure
If I understand correctly the current design sends splunk-otel context to everyone (via outgoing requests).

Thread 2: Spoofing, Tampering?
The other problem is that the incoming requests are also not verifying who send the data (if this is actually coming from AppD). A malicious actor can send cisco-ctx-acct-id headers which would be stored in spans.

Are we planning to do a Thread Modeling and store somewhere its outcomes?

Given it is an opt-in feature I think that documenting the side-effects for the users might be good enough...


This section applies to Splunk distributions of OpenTelemetry instrumentation
components. In brief, Splunk OTel distributions will need to understand when
spans are being created as part of an AppDynamics "Business Transaction" (BT),
and will add span attributes to clarify the AppDynamics invocation context.

### Configuration

Splunk OTel distributions MUST provide a way for users to opt-into the
consumption and propagation of additional Cisco (bespoke) integration context.

The configuration SHOULD be named `cisco.ctx.enabled` or an idiomatic
equivalent for each language. If using environment variables, the
configuration SHOULD be named `CISCO_CTX_ENABLED`. The default value
MUST be `false` or equivalent language-specific non-truthy value.

## Incoming State

When `cisco.ctx.enabled` is `true`, Splunk implementations MUST
extract fields from the `cisco-ctx-*` headers (above) and add extra
attributes to any Spans created as part of the incoming request context.
Null or missing values MUST be handled gracefully by simply
omitting the span attributes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before merging, we should create POC for this.
I do no like the idea of adding new attributes to every span. Why not just to the first span in the application? Other calculation could be done by the standard trace-id, span-id and the hierarchy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it needs to be on every span for that request. It is more concise that way.

You also have to allow for a scenario where a call enters the app, leaves the app, and comes back in from a different tier. Multiple tiers possibly for the same trace from different tiers.

You said that you don't like adding attributes, but you didn't say why.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just duplicating them, I think that they can be inherited/detected by backends. No strong opinion here.


| header | span attribute | description | example |
|---------------------|----------------------------|-------------------------------------------------------------------------|-----------------------------------|
| `cisco-ctx-acct-id` | `appd.upstream.account_id` | The AppDynamics account ID of the upstream component making the request | 65230, 10018b |
| `cisco-ctx-app-id` | `appd.upstream.app_id` | The ID of the AppDynamics instrumented application. | 0293845, destination-factory-9000 |
| `cisco-ctx-bt-id` | `appd.upstream.bt_id` | The ID of the upstream AppDynamics business transaction (BT) | 209834098273 |
| `cisco-ctx-tier-id` | `appd.upstream.tier_id` | The "tier id" to which the AppDynamics instrumented application belongs | 12, xdev.tier9 |

## Outgoing State

When `cisco.ctx.enabled` is `true`, Splunk implementations MUST
generate extra headers that contain certain values obtained from the
OTel Resource:

| resource attribute | outbound header | description | example |
|-------------------------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|
| `deployment.environment.name` | `cisco-ctx-env` | The name of the OpenTelemetry deployment environment ([spec](https://github.com/open-telemetry/semantic-conventions/blob/4f77620fe731c10d40f7d50c543d4e5c73a46ebf/docs/attributes-registry/deployment.md#deployment-environment-name)) | production, staging |
| `service.name` | `cisco-ctx-service` | The name of the OpenTelemetry service ([spec](https://github.com/open-telemetry/semantic-conventions/blob/4f77620fe731c10d40f7d50c543d4e5c73a46ebf/docs/attributes-registry/service.md#service-name)) | checkout, cart |

All other resource attributes SHOULD be ignored and not placed into
any headers, unless required elsewhere.

# AppDynamics agents

TBD