# AI Gateway ADR 002: Exposing proxy endpoints to AI providers
## Summary
AI Gateway exposes proxy endpoints to AI providers so that the existing client libraries in GitLab-Rails can access them. This is a drop-in replacement to be used until stage groups move to single-purpose endpoints. We are veering from our ultimate desired architecture in order to bring these features to market faster for self-managed GitLab instances.
## Context
The original iteration of the blueprint proposed a single-purpose endpoint for each AI-powered feature. There were multiple reasons for this:
- Avoid hard-coding AI-related logic in the GitLab monolith codebase to minimize the time required for customers to adopt our latest features.
- Retain the flexibility to make changes in our product without breaking support for a long-tail of older instances.
In issue 454543, we discussed various options to enable existing AI features in self-managed GitLab.
## Decision
In the issue, we decided to introduce proxy endpoints to AI providers so that our Ruby client libraries `Anthropic::Client` and `VertexAi::Client` work as-is, for the following reasons:
- It's challenging to rewrite the existing business logic in the Python AI Gateway:
  - Some of the business logic uses dependencies that are only available in the GitLab monolith (e.g. feature flags, caching in Redis). We would have to work around these implementations, which is error prone.
  - Due to the intensive inheritance in the `Gitlab::Llm` namespace, it's hard to extract the business logic that is actually in effect.
  - We lack a tool to evaluate whether the quality and functionality of a feature remain consistent before and after changes.
- Duo Chat became GA while still using the existing `POST /v1/chat/agent` endpoint, which serves as a proxy endpoint. Technically, this is not a single-purpose endpoint yet.
## Technical details
Here is the overview of the request flow:
```mermaid
flowchart LR
  subgraph AIGateway
    Proxy["Proxy"]
  end
  subgraph Provider1["Anthropic"]
    direction LR
    Model1(["Claude 2.1"])
  end
  subgraph Provider2["VertexAI"]
    direction LR
    Model2(["text-bison"])
  end
  subgraph GitLab["SM or SaaS GitLab"]
    DuoFeatureA["Duo feature A"]
    DuoFeatureB["Duo feature B"]
  end
  DuoFeatureA -- POST /v1/proxy/anthropic/v1/complete --- Proxy
  DuoFeatureB -- POST /v1/proxy/vertex-ai/v1/text-bison:predict --- Proxy
  Proxy -- POST /v1/complete --- Provider1
  Proxy -- POST /v1/text-bison:predict --- Provider2
```
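The flow above amounts to a path-rewriting step in the proxy. The helper below is a hypothetical sketch, not the actual gateway code; in particular, the upstream base URLs are placeholders (the real Vertex AI URL includes project and location segments).

```python
# Hypothetical sketch: translate an AI Gateway proxy path into the
# upstream provider URL. Base URLs here are illustrative placeholders.
UPSTREAM_BASES = {
    "anthropic": "https://api.anthropic.com",
    "vertex-ai": "https://us-central1-aiplatform.googleapis.com",
}

def upstream_url(proxy_path: str) -> str:
    """Translate /v1/proxy/<provider>/<path> into the provider URL."""
    prefix = "/v1/proxy/"
    if not proxy_path.startswith(prefix):
        raise ValueError(f"not a proxy path: {proxy_path}")
    provider, _, path = proxy_path[len(prefix):].partition("/")
    if provider not in UPSTREAM_BASES:
        raise ValueError(f"unknown provider: {provider}")
    return f"{UPSTREAM_BASES[provider]}/{path}"

print(upstream_url("/v1/proxy/anthropic/v1/complete"))
# https://api.anthropic.com/v1/complete
```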
### Anthropic
Expose the following HTTP/1.1 endpoint in AI Gateway:

`POST /v1/proxy/anthropic/(*path)`

`path` can be forwarded to the following endpoints:

- `/v1/complete`
- `/v1/messages` (future iteration)
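The forwarding rule above implies an allow-list on `path`. This is an illustrative sketch, not the gateway's actual implementation; `validate_anthropic_path` is a hypothetical helper returning the HTTP status the gateway would respond with.

```python
# Hypothetical allow-list check for the Anthropic proxy path.
ANTHROPIC_ALLOWED_PATHS = {
    "/v1/complete",
    # "/v1/messages",  # planned for a future iteration
}

def validate_anthropic_path(path: str) -> int:
    """Return 200 when the path is forwardable, otherwise 404."""
    return 200 if path in ANTHROPIC_ALLOWED_PATHS else 404
```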
### Vertex AI
Expose the following HTTP/1.1 endpoint in AI Gateway:

`POST /v1/proxy/vertex-ai/(*path)`

`path` can be forwarded to the following endpoints:

- `/v1/{endpoint}:predict`, where `endpoint` must be one of: `chat-bison`, `code-bison`, `codechat-bison`, `text-bison`, `textembedding-gecko@003`.
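The Vertex AI rule restricts both the action (`:predict`) and the model name. A hypothetical sketch of that validation, assuming the model list above is exhaustive:

```python
import re

# Hypothetical check: only the :predict action on the models listed in
# this ADR is forwarded; anything else would get a 404 from the gateway.
VERTEX_MODELS = {
    "chat-bison",
    "code-bison",
    "codechat-bison",
    "text-bison",
    "textembedding-gecko@003",
}
VERTEX_PATH_RE = re.compile(r"^/v1/(?P<endpoint>[^:/]+):predict$")

def validate_vertex_path(path: str) -> int:
    """Return 200 when the path is forwardable, otherwise 404."""
    match = VERTEX_PATH_RE.match(path)
    if match and match.group("endpoint") in VERTEX_MODELS:
        return 200
    return 404
```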
### Common behavior
- Request body is sent to AI providers as-is.
- Request headers are filtered/replaced by AI Gateway accordingly, e.g. allow only `accept`, `content-type`, and `anthropic-version` and filter out the rest; `x-api-key` is added.
- Response body is returned to clients as-is.
- Response headers are filtered/replaced by AI Gateway accordingly, e.g. allow only `date`, `content-type`, and `transfer-encoding` and filter out the rest.
- Response status is returned to clients as-is.
- HTTP streaming is supported.
- If an unsupported `path` is specified, AI Gateway responds with a 404 Not Found error.
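The header filtering above can be sketched as follows, assuming headers are plain dicts; the function names and the API-key argument are illustrative, not the gateway's actual code.

```python
# Hypothetical sketch of the request/response header filtering.
# Allow-lists mirror the ones named in this ADR.
REQUEST_HEADER_ALLOWLIST = {"accept", "content-type", "anthropic-version"}
RESPONSE_HEADER_ALLOWLIST = {"date", "content-type", "transfer-encoding"}

def filter_request_headers(headers: dict[str, str], api_key: str) -> dict[str, str]:
    """Drop everything outside the allow-list, then inject the provider key."""
    filtered = {k: v for k, v in headers.items() if k.lower() in REQUEST_HEADER_ALLOWLIST}
    filtered["x-api-key"] = api_key  # added by the gateway, never taken from the client
    return filtered

def filter_response_headers(headers: dict[str, str]) -> dict[str, str]:
    """Drop every response header outside the allow-list."""
    return {k: v for k, v in headers.items() if k.lower() in RESPONSE_HEADER_ALLOWLIST}
```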
### Access control
- Clients must send a JWT issued by GitLab.com or Customer Dot.
  - This JWT contains `scopes` that indicate the permissions given to the GitLab instance. These `scopes` will vary per Duo subscription tier.
  - To access these proxy endpoints, `scopes` must include one of: `explain_vulnerability`, `resolve_vulnerability`, `generate_description`, `summarize_all_open_notes`, `summarize_submitted_review`, `generate_commit_message`, `summarize_review`, `fill_in_merge_request_template`, `analyze_ci_job_failure`.
  - Requests that do not meet the specified criteria will result in a 401 Unauthorized error.
- Clients must send the `X-Gitlab-Feature-Usage` header in HTTP requests.
  - This `X-Gitlab-Feature-Usage` header indicates the purpose of the API request.
  - To access these proxy endpoints, `X-Gitlab-Feature-Usage` must be one of: `explain_vulnerability`, `resolve_vulnerability`, `generate_description`, `summarize_all_open_notes`, `summarize_submitted_review`, `generate_commit_message`, `summarize_review`, `fill_in_merge_request_template`, `analyze_ci_job_failure`.
  - Requests that do not meet the specified criteria will result in a 401 Unauthorized error.
- For logging, we add the value of the `X-Gitlab-Feature-Usage` header to access logs in AI Gateway.
- For metrics, we instrument the concurrent requests with `ModelRequestInstrumentator` and the input/output tokens with `TextGenModelInstrumentator` in AI Gateway. Metrics should be labeled with `X-Gitlab-Instance-Id`, `X-Gitlab-Global-User-Id`, and `X-Gitlab-Feature-Usage`.
- For telemetry, we add Internal Event Tracking for each feature in GitLab-Rails. Alternatively, we could use the existing Snowplow tracker in AI Gateway, which requires additional work to introduce a unified schema.
For further access control improvements, see this issue.
## Consequences
- Experimental AI features are enabled on self-managed instances.
- Stage groups can start improving the business logic of each feature. This proxy work can proceed in parallel.
- Stage groups don't need to rush refactoring business logic into the Python AI Gateway for the GA release. They can take their time post-GA.
- We can detect abusers by checking `X-Gitlab-Instance-Id`, `X-Gitlab-Global-User-Id`, and `X-Gitlab-Feature-Usage` in logs and metrics.
- We can block abusers by gating access at the Cloud Connector LB (Cloudflare) or in AI Gateway middleware.