Author: Emmanuel Odeke, Orijtech Inc
Audience: Google Cloud Spanner engineering and client libraries developers for all languages
Dates of issuance and publication: October 18th 2024 to November 25th 2024
Reference issue for Go: https://github.com/googleapis/google-cloud-go/issues/11073
Reference implementation for Go: https://github.com/googleapis/google-cloud-go/pull/11048
Version | Date | Author | Comments |
---|---|---|---|
v0 | 11-10-2024 | Emmanuel Odeke | Initial presentation of the specification and examples; Solicited comments and raised issues discovered in the initial version |
v1 | 11-26-2024 | Emmanuel Odeke | Listed caveats and the collision risk that came from the weak choice of only 32 bits for randProcessID: 50,000 processes within a project can trigger a roughly 25% chance of a collision. Simil from Google suggested introducing a versioning flag in the header. randProcessID now uses a uint64, not a UUID4 nor a uint32, as a balance between randomness/collision resistance and storage |
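
The collision figure in the v1 entry can be sanity-checked with the standard birthday-bound approximation. The sketch below is illustrative only and is not taken from the reference implementation: the function name `collisionProbability` is hypothetical, and it assumes that randProcessID values are drawn uniformly at random and independently per process.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n
// processes draw the same random ID from a space of 2^bits values,
// using the birthday bound p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)).
// Assumption: IDs are uniform and independent across processes.
func collisionProbability(n float64, bits int) float64 {
	space := math.Exp2(float64(bits))
	// -Expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	const processes = 50_000
	for _, bits := range []int{32, 64} {
		fmt.Printf("%d-bit randProcessID, %d processes: p ≈ %.3g\n",
			bits, processes, collisionProbability(processes, bits))
	}
}
```

Under these assumptions, a 32-bit ID gives roughly a 25% chance of a collision for 50,000 processes, which matches the figure above, while a 64-bit ID brings the same chance below one in a billion.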
Reduce the mean-time-to-solution (MTTS) and make debugging more robust and reliable both for Google Cloud Spanner customers and Google Customer Engineers.
Reporting customer code latency and behavioral malfunctions is an involved and difficult process, simply because there is no always-on, tangible identifier that customers can provide to Google Customer Engineering, who are the first line of answers for customers. Given that the customer engineering team is heavily detached from the actual engineering teams, a great deal of information and trust is lost in translation; worse, the current mechanisms for resolving these problems are laborious because they place the onus on customers to debug their own problems. Using trace-ids is disconnected and quite ineffective: firstly, tracing is by its very nature sampled; secondly, that data is filtered out when it hits certain frontend servers; and worse, it is very expensive to customize solutions per client.
We can achieve this by plumbing a specialized unsampled/always-on requestID header through from customer code to Google’s backend servers, which customers can copy and paste into Google Customer Engineering support tickets anytime that an error occurs. We cannot rely on observability tracing headers because those are sampled (included or dropped depending on premature sampling decisions). Google Cloud Spanner’s engineering team is ready to accept this gRPC header to correlate calls sent to their backend, and it is also provided to customers to reference when filing bugs. This is where we introduce the header x-goog-spanner-request-id, which Google Cloud Spanner’s backends will receive and correlate with other backend operations.
This per-RPC header is recognized by Google Cloud Spanner’s servers and allows direct correlation of customer calls all the way to Google’s backends. It is sent in the header “x-goog-spanner-request-id”.
This header’s structure is made up of a version, uuid4 version and 4 segments each with a max of a uint32 and separated by a dot “.” so