Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Apollo usage reporting to include persisted query metrics #7166

Open
wants to merge 27 commits into
base: dev
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
d3daf8b
Add PQ ID to context
bonnici Apr 3, 2025
bf3cfcb
Move signature and apollo op id generation into function
bonnici Apr 3, 2025
587e45e
Refactor error handling
bonnici Apr 4, 2025
dd86724
Rename functions
bonnici Apr 4, 2025
4ea39c6
PQ reporting and fixes
bonnici Apr 4, 2025
d0a1ea8
Lint fix
bonnici Apr 4, 2025
84a6f40
Merge branch 'dev' into njm/P-416/pq-reporting
bonnici Apr 4, 2025
4038ace
Fix test
bonnici Apr 7, 2025
adf43bf
Store PQ ID in usagereporting
bonnici Apr 7, 2025
3fc8047
Merge branch 'dev' into njm/P-416/pq-reporting
bonnici Apr 7, 2025
24c66da
Fix lint - include operation name and sig only in EmptyPlan errors
bonnici Apr 7, 2025
d8513dc
Fixes
bonnici Apr 7, 2025
d372dcd
Add changset
bonnici Apr 7, 2025
9d817b6
Add tests
bonnici Apr 7, 2025
a96f8e9
lint fix
bonnici Apr 7, 2025
35a1fa3
Merge branch 'dev' into njm/P-416/pq-reporting
bonnici Apr 9, 2025
a31bfbb
PR feedback
bonnici Apr 9, 2025
6ddd998
Add missing dependency
bonnici Apr 9, 2025
097c583
Merge branch 'dev' into njm/P-416/pq-reporting
bonnici Apr 10, 2025
9743b81
Change UsageReporting to enum
bonnici Apr 10, 2025
aec51eb
Remove for_error func
bonnici Apr 10, 2025
433e9ec
Fix changeset
bonnici Apr 10, 2025
5a0e4a1
Small fixes
bonnici Apr 10, 2025
6dbb2b4
Updater with_pq_id to always require a pq id
bonnici Apr 11, 2025
2720134
Merge branch 'dev' into njm/P-416/pq-reporting
bonnici Apr 11, 2025
94d1265
Fix for PQ with multiple operation IDs
bonnici Apr 11, 2025
93702d5
More refactoring/fixing of op name/op sig/stats report key stuff
bonnici Apr 11, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .changesets/feat_persisted_query_reporting.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
### Brief but complete sentence that stands on its own ([PR #7166](https://github.com/apollographql/router/pull/7166))

Enables reporting of persisted query usage by PQ ID to Apollo.

By [@bonnici](https://github.com/bonnici) in https://github.com/apollographql/router/pull/7166
128 changes: 109 additions & 19 deletions apollo-router/src/apollo_studio_interop/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ use apollo_compiler::schema::ExtendedType;
use apollo_compiler::validation::Valid;
use serde::Deserialize;
use serde::Serialize;
use sha2::Digest;

use crate::json_ext::Object;
use crate::json_ext::Value as JsonValue;
Expand Down Expand Up @@ -165,15 +166,97 @@ impl AddAssign<ReferencedEnums> for AggregatedExtendedReferenceStats {
#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone)]
#[serde(rename_all = "camelCase")]
pub(crate) struct UsageReporting {
/// The `stats_report_key` is a unique identifier derived from schema and query.
/// Metric data sent to Studio must be aggregated
/// via grouped key of (`client_name`, `client_version`, `stats_report_key`).
pub(crate) stats_report_key: String,
/// The operation name, or None if there is no operation name
pub(crate) operation_name: Option<String>,
/// The normalized operation signature, or None if there is no valid signature
pub(crate) operation_signature: Option<String>,
/// The error key to use for the stats report, or None if there is no error
pub(crate) error_key: Option<String>,
/// The persisted query ID used to request this operation, or None if the query was not requested via PQ ID
pub(crate) pq_id: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be crazy to split this into an enum? I feel like we're already splitting logic a lot here in the for_error case.

enum UsageReporting {
  ERROR{key: String},
  OPERATION{name: String, signature: String, pq_id: Option<String>}
}

Copy link
Contributor Author

@bonnici bonnici Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started going down this route but it makes code a bit too complicated. Also there is a possible future where we'll want to be able to have an error that still has referenced fields by type, or maybe an operation name/signature (via query metadata). We'd need to update the reporting code, but if we do that we'd definitely need to change back from an enum.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I'm surprised the code ends up more complex that way. Every time I saw a for_error() in your code I was thinking "that'd look nice as a UsageReporting::Error() instead". And I'd imagine the internal logic is basically just doing match instead of the if/else on what attrs exist in the current code.

Understood on the possible future state -- In my head it'd be easy to add those init params if we need them to the error enum, but I haven't worked with enums in practice so I'll trust your experience on all this 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another go and I think I figured it out

/// a list of all types and fields referenced in the query
#[serde(default)]
pub(crate) referenced_fields_by_type: HashMap<String, ReferencedFieldsForType>,
}

impl UsageReporting {
pub(crate) fn for_error(error_key: String) -> UsageReporting {
UsageReporting {
operation_name: None,
operation_signature: None,
error_key: Some(error_key),
pq_id: None,
referenced_fields_by_type: HashMap::new(),
}
}

pub(crate) fn with_pq_id(&self, pq_id: Option<String>) -> UsageReporting {
let mut updated_usage_report = self.clone();
updated_usage_report.pq_id = pq_id;
updated_usage_report
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a bit cleaner with struct update syntax:

pub(crate) fn with_pq_id(&self, pq_id: Option<String>) -> UsageReporting {  
  UsageReporting {
    pq_id,
    ..self.clone()
  }
}


/// The `stats_report_key` is a unique identifier derived from schema and query.
/// Metric data sent to Studio must be aggregated
/// via grouped key of (`client_name`, `client_version`, `stats_report_key`).
/// For errors, the report key is of the form "## <error name>\n".
/// For operations requested by PQ, the report key is of the form "pq# <pq id>".
/// For operations not requested by PQ, the report key is of the form "# <op name>\n<op sig>".
/// Note that this combination of operation name and signature is sometimes referred to in code as
/// "operation signature" even though it also contains the operation name as the first line.
pub(crate) fn get_stats_report_key(&self) -> String {
if let Some(error_key) = &self.error_key {
format!("## {}\n", error_key)
} else if let Some(persisted_query_id) = &self.pq_id {
format!("pq# {}", persisted_query_id)
} else {
let stats_report_key_op_name =
self.operation_name.as_deref().unwrap_or("-").to_string();
format!(
"# {}\n{}",
stats_report_key_op_name,
self.get_operation_signature()
)
}
}

pub(crate) fn get_operation_signature(&self) -> String {
self.operation_signature
.as_deref()
.unwrap_or("")
.to_string()
}

pub(crate) fn get_operation_id(&self) -> String {
let sig_to_hash = if self.error_key.is_some() {
// To match the logic of Apollo's error key handling, we need to handle error keys specially.
// The correct string to hash in this case is e.g. "# # GraphQLParseFailure\n".
format!("# # {}\n", self.error_key.as_ref().unwrap())
} else {
self.get_stats_report_key()
};

let mut hasher = sha1::Sha1::new();
hasher.update(sig_to_hash.as_bytes());
let result = hasher.finalize();
hex::encode(result)
}

pub(crate) fn get_operation_name(&self) -> String {
if let Some(op_name) = &self.operation_name {
op_name.clone()
} else if let Some(err_key) = &self.error_key {
format!("# {}", err_key)
} else {
"".to_string()
}
}

pub(crate) fn is_error(&self) -> bool {
self.error_key.is_some()
}
}

/// A list of fields that will be resolved for a given type
#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone)]
#[serde(rename_all = "camelCase")]
Expand All @@ -186,9 +269,10 @@ pub(crate) struct ReferencedFieldsForType {
pub(crate) is_interface: bool,
}

/// Generate a UsageReporting containing the stats_report_key (a normalized version of the operation signature)
/// and referenced fields of an operation. The document used to generate the signature and for the references can be
/// different to handle cases where the operation has been filtered, but we want to keep the same signature.
/// Generate a UsageReporting containing the data required to generate a stats_report_key (either a normalized version of
/// the operation signature or an error key or a PQ ID) and referenced fields of an operation. The document used to
/// generate the signature and for the references can be different to handle cases where the operation has been filtered,
/// but we want to keep the same signature.
pub(crate) fn generate_usage_reporting(
signature_doc: &ExecutableDocument,
references_doc: &ExecutableDocument,
Expand Down Expand Up @@ -362,12 +446,23 @@ struct UsageGenerator<'a> {
impl UsageGenerator<'_> {
fn generate_usage_reporting(&mut self) -> UsageReporting {
UsageReporting {
stats_report_key: self.generate_stats_report_key(),
operation_name: self.get_operation_name(),
operation_signature: self.generate_normalized_signature(),
error_key: None,
pq_id: None,
referenced_fields_by_type: self.generate_apollo_reporting_refs(),
}
}

fn generate_stats_report_key(&mut self) -> String {
fn get_operation_name(&self) -> Option<String> {
self.signature_doc
.operations
.get(self.operation_name.as_deref())
.ok()
.and_then(|operation| operation.name.as_ref().map(|node| node.to_string()))
}

fn generate_normalized_signature(&mut self) -> Option<String> {
self.fragments_map.clear();

match self
Expand All @@ -376,10 +471,10 @@ impl UsageGenerator<'_> {
.get(self.operation_name.as_deref())
.ok()
{
None => "".to_string(),
None => None,
Some(operation) => {
self.extract_signature_fragments(&operation.selection_set);
self.format_operation_for_report(operation)
Some(self.format_operation_signature_for_report(operation))
}
}
}
Expand Down Expand Up @@ -410,15 +505,10 @@ impl UsageGenerator<'_> {
}
}

fn format_operation_for_report(&self, operation: &Node<Operation>) -> String {
// The result in the name of the operation
let op_name = match &operation.name {
None => "-".into(),
Some(node) => node.to_string(),
};
let mut result = format!("# {}\n", op_name);
fn format_operation_signature_for_report(&self, operation: &Node<Operation>) -> String {
let mut result = String::new();

// Followed by a sorted list of fragments
// The signature starts with a sorted list of fragments
let mut sorted_fragments: Vec<_> = self.fragments_map.iter().collect();
sorted_fragments.sort_by_key(|&(k, _)| k);

Expand Down
114 changes: 113 additions & 1 deletion apollo-router/src/apollo_studio_interop/tests.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ use super::*;
use crate::Configuration;

fn assert_expected_signature(actual: &UsageReporting, expected_sig: &str) {
assert_eq!(actual.stats_report_key, expected_sig);
assert_eq!(actual.get_stats_report_key(), expected_sig);
}

macro_rules! assert_extended_references {
Expand Down Expand Up @@ -300,3 +300,115 @@ async fn test_enums_from_response_fragments() {
let generated = enums_from_response(query_str, op_name, schema_str, response_str);
assert_enums_from_response!(&generated);
}

#[test]
fn apollo_operation_id_hash() {
let usage_reporting = UsageReporting {
operation_name: Some("IgnitionMeQuery".to_string()),
operation_signature: Some("query IgnitionMeQuery{me{id}}".to_string()),
error_key: None,
pq_id: None,
referenced_fields_by_type: HashMap::new(),
};

assert_eq!(
"d1554552698157b05c2a462827fb4367a4548ee5",
usage_reporting.get_operation_id()
);
}

// The Apollo operation ID hash for these errors is based on a slightly different string. E.g. instead of hashing
// "## GraphQLValidationFailure\n" we should hash "# # GraphQLValidationFailure".
#[test]
fn apollo_error_operation_id_hash() {
assert_eq!(
"ea4f152696abedca148b016d72df48842b713697",
UsageReporting::for_error("GraphQLValidationFailure".into()).get_operation_id()
);
assert_eq!(
"3f410834f13153f401ffe73f7e454aa500d10bf7",
UsageReporting::for_error("GraphQLParseFailure".into()).get_operation_id()
);
assert_eq!(
"7486043da2085fed407d942508a572ef88dc8120",
UsageReporting::for_error("GraphQLUnknownOperationName".into()).get_operation_id()
);
}

#[test]
fn test_get_stats_report_key() {
let usage_reporting_for_errors = UsageReporting::for_error("GraphQLParseFailure".into());
assert_eq!(
"## GraphQLParseFailure\n",
usage_reporting_for_errors.get_stats_report_key()
);

let usage_reporting_for_pq = UsageReporting {
operation_name: Some("SomeQuery".into()),
operation_signature: Some("query SomeQuery{thing{id}}".into()),
error_key: None,
pq_id: Some("SomePqId".into()),
referenced_fields_by_type: HashMap::new(),
};
assert_eq!(
"pq# SomePqId",
usage_reporting_for_pq.get_stats_report_key()
);

let usage_reporting_for_named_operation = UsageReporting {
operation_name: Some("SomeQuery".into()),
operation_signature: Some("query SomeQuery{thing{id}}".into()),
error_key: None,
pq_id: None,
referenced_fields_by_type: HashMap::new(),
};
assert_eq!(
"# SomeQuery\nquery SomeQuery{thing{id}}",
usage_reporting_for_named_operation.get_stats_report_key()
);

let usage_reporting_for_unnamed_operation = UsageReporting {
operation_name: None,
operation_signature: Some("query{thing{id}}".into()),
error_key: None,
pq_id: None,
referenced_fields_by_type: HashMap::new(),
};
assert_eq!(
"# -\nquery{thing{id}}",
usage_reporting_for_unnamed_operation.get_stats_report_key()
);
}

#[test]
fn test_get_operation_name() {
let usage_reporting_for_errors = UsageReporting::for_error("GraphQLParseFailure".into());
assert_eq!(
"# GraphQLParseFailure",
usage_reporting_for_errors.get_operation_name()
);

let usage_reporting_for_named_operation = UsageReporting {
operation_name: Some("SomeQuery".into()),
operation_signature: Some("query SomeQuery{thing{id}}".into()),
error_key: None,
pq_id: None,
referenced_fields_by_type: HashMap::new(),
};
assert_eq!(
"SomeQuery",
usage_reporting_for_named_operation.get_operation_name()
);

let usage_reporting_for_unnamed_operation = UsageReporting {
operation_name: None,
operation_signature: Some("query{thing{id}}".into()),
error_key: None,
pq_id: None,
referenced_fields_by_type: HashMap::new(),
};
assert_eq!(
"",
usage_reporting_for_unnamed_operation.get_operation_name()
);
}
10 changes: 4 additions & 6 deletions apollo-router/src/error.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
//! Router errors.
use std::collections::HashMap;
use std::ops::Deref;
use std::sync::Arc;

Expand Down Expand Up @@ -271,7 +270,7 @@ pub(crate) enum QueryPlannerError {
JoinError(String),

/// empty query plan. This behavior is unexpected and we suggest opening an issue to apollographql/router with a reproduction.
EmptyPlan(UsageReporting), // usage_reporting_signature
EmptyPlan(String), // usage_reporting stats_report_key

/// unhandled planner result
UnhandledPlannerResult,
Expand Down Expand Up @@ -434,10 +433,9 @@ impl IntoGraphQLErrors for QueryPlannerError {
impl QueryPlannerError {
pub(crate) fn usage_reporting(&self) -> Option<UsageReporting> {
match self {
QueryPlannerError::SpecError(e) => Some(UsageReporting {
stats_report_key: e.get_error_key().to_string(),
referenced_fields_by_type: HashMap::new(),
}),
QueryPlannerError::SpecError(e) => {
Some(UsageReporting::for_error(e.get_error_key().to_string()))
}
_ => None,
}
}
Expand Down
41 changes: 13 additions & 28 deletions apollo-router/src/metrics/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,6 @@ use crate::plugins::telemetry::CLIENT_VERSION;
use crate::plugins::telemetry::apollo::ErrorsConfiguration;
use crate::plugins::telemetry::apollo::ExtendedErrorMetricsMode;
use crate::query_planner::APOLLO_OPERATION_ID;
use crate::query_planner::stats_report_key_hash;
use crate::spec::GRAPHQL_PARSE_FAILURE_ERROR_KEY;
use crate::spec::GRAPHQL_UNKNOWN_OPERATION_NAME_ERROR_KEY;
use crate::spec::GRAPHQL_VALIDATION_FAILURE_ERROR_KEY;

pub(crate) mod aggregation;
pub(crate) mod filter;
Expand Down Expand Up @@ -1297,30 +1293,19 @@ pub(crate) fn count_operation_errors(
let client_name = unwrap_context_string(CLIENT_NAME);
let client_version = unwrap_context_string(CLIENT_VERSION);

// Try to get operation ID from the stats report key if it's not in context (e.g. on parse/validation error)
if operation_id.is_empty() {
let maybe_stats_report_key = context.extensions().with_lock(|lock| {
lock.get::<Arc<UsageReporting>>()
.map(|u| u.stats_report_key.clone())
});
if let Some(stats_report_key) = maybe_stats_report_key {
operation_id = stats_report_key_hash(stats_report_key.as_str());

// If the operation name is empty, it's possible it's an error and we can populate the name by skipping the
// first character of the stats report key ("#") and the last newline character. E.g.
// "## GraphQLParseFailure\n" will turn into "# GraphQLParseFailure".
if operation_name.is_empty() {
operation_name = match stats_report_key.as_str() {
GRAPHQL_PARSE_FAILURE_ERROR_KEY
| GRAPHQL_UNKNOWN_OPERATION_NAME_ERROR_KEY
| GRAPHQL_VALIDATION_FAILURE_ERROR_KEY => stats_report_key
.chars()
.skip(1)
.take(stats_report_key.len() - 2)
.collect(),
_ => "".to_string(),
}
}
let maybe_usage_reporting = context
.extensions()
.with_lock(|lock| lock.get::<Arc<UsageReporting>>().cloned());

if let Some(usage_reporting) = maybe_usage_reporting {
// Try to get operation ID from usage reporting if it's not in context (e.g. on parse/validation error)
if operation_id.is_empty() {
operation_id = usage_reporting.get_operation_id();
}

// Also try to get operation name from usage reporting if it's not in context
if operation_name.is_empty() {
operation_name = usage_reporting.get_operation_name();
}
}

Expand Down
Loading