gRPC: parse_query_response: Skip parsing empty Usage (#301)

daverigby · web-flow · commit 921e0b49aa40 · 2024-02-05T08:14:04.000-05:00
## Problem

When parsing the result of a gRPC query() call, we unconditionally
create a Usage model object, even if no usage information was returned
(e.g. non-serverless index).

This adds a small but not insignificant cost to every query() call,
mostly because we use OpenAPI auto-generated model code for the Usage
and QueryResponse objects.

## Solution

Only construct a Usage object (and attach it to the QueryResponse) if a
'usage' field is present in the protobuf response. Benchmarks using a
simple PineconeGRPC-based program making query() calls against a p2.x4
pod show a 1.05x improvement in QPS:

Before:

| Type | Name | # reqs | # fails | Avg | Min | Max | Med | req/s | failures/s |
|------|------|--------|---------|-----|-----|-----|-----|-------|------------|
| grpc | query_pinecone_no_filter | 3223 | 0(0.00%) | 17 | 17 | 139 | 18 | 55.01 | 0.00 |

After:

| Type | Name | # reqs | # fails | Avg | Min | Max | Med | req/s | failures/s |
|------|------|--------|---------|-----|-----|-----|-----|-------|------------|
| grpc | query_pinecone_no_filter | 3408 | 0(0.00%) | 17 | 16 | 96 | 17 | 57.55 | 0.00 |
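
As a quick sanity check on the 1.05x figure, the ratio can be recomputed from the req/s values of the two runs above. This is only arithmetic on the reported numbers; the variable names are illustrative.

```python
# Throughput (req/s) copied from the Before and After runs above.
qps_before = 55.01
qps_after = 57.55

# Ratio of After to Before throughput.
speedup = qps_after / qps_before
print(f"speedup: {speedup:.3f}x ({(speedup - 1) * 100:.1f}% more QPS)")
# prints: speedup: 1.046x (4.6% more QPS)
```

Rounded to two decimal places, this gives the 1.05x quoted in the description.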

## Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Infrastructure change (CI configs, etc)
- [ ] Non-code change (docs, etc)
- [ ] None of the above: (explain here)

## Test Plan

- Existing unit tests
- Tested manually against pinecone-field/pinecone-stress-test
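
A dedicated unit test for the new behaviour might look roughly like the sketch below. To keep it self-contained it uses hypothetical dataclass stand-ins for `Usage` and `QueryResponse` rather than the real OpenAPI-generated models, and a simplified `parse_query_response` that passes matches through untouched. In particular, the stand-in defaults `usage` to `None`, which is an assumption and may not match how the generated model represents an omitted field.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for the OpenAPI-generated models; the real
# Usage and QueryResponse classes live in the pinecone package.
@dataclass
class Usage:
    read_units: int = 0


@dataclass
class QueryResponse:
    namespace: str = ""
    matches: List[dict] = field(default_factory=list)
    usage: Optional[Usage] = None  # assumption: omitted usage reads as None


def parse_usage(usage: dict) -> Usage:
    return Usage(read_units=int(usage.get("readUnits", 0)))


def parse_query_response(response: dict) -> QueryResponse:
    # Simplified: matches are passed through without building model objects.
    args = {
        "namespace": response.get("namespace", ""),
        "matches": response.get("matches", []),
    }
    usage = response.get("usage")
    if usage:  # skips both a missing key and an empty dict
        args["usage"] = parse_usage(usage)
    return QueryResponse(**args)


def test_usage_omitted_when_absent():
    assert parse_query_response({"namespace": "ns"}).usage is None
    assert parse_query_response({"usage": {}}).usage is None


def test_usage_parsed_when_present():
    parsed = parse_query_response({"usage": {"readUnits": 5}})
    assert parsed.usage == Usage(read_units=5)
```

Run under pytest, the two tests cover the pod-style response (no usage) and the serverless-style response (usage present).
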
diff --git a/pinecone/grpc/utils.py b/pinecone/grpc/utils.py
@@ -50,13 +50,13 @@ def parse_fetch_response(response: dict):
     return FetchResponse(
         vectors=vd, 
         namespace=namespace,
-        usage=parse_usage(response),
+        usage=parse_usage(response.get("usage", {})),
         _check_type=False
     )
 
-def parse_usage(response):
-    u = response.get("usage", {})
-    return Usage(read_units=int(u.get("readUnits", 0)))
+
+def parse_usage(usage: dict):
+    return Usage(read_units=int(usage.get("readUnits", 0)))
 
 
 def parse_query_response(response: dict, _check_type: bool = False):
@@ -72,13 +72,16 @@ def parse_query_response(response: dict, _check_type: bool = False):
         )
         matches.append(sc)
 
-    return QueryResponse(
-        namespace=response.get("namespace", ""), 
-        matches=matches,
-        usage = parse_usage(response),
-        _check_type=_check_type
-    )
-
+    # Due to OpenAPI model classes / actual parsing cost, we want to avoid
+    # creating empty `Usage` objects and then passing them into QueryResponse
+    # when they are not actually present in the response from the server.
+    args = {'namespace': response.get("namespace", ""),
+            'matches': matches,
+            '_check_type': _check_type}
+    usage = response.get("usage")
+    if usage:
+        args['usage'] = parse_usage(usage)
+    return QueryResponse(**args)
 
 def parse_stats_response(response: dict):
     fullness = response.get("indexFullness", 0.0)