feature(go/plugins/vertexai): add support for context caching in VertexAI and GoogleAI plugins #1566
Conversation
Force-pushed from 22a2be5 to be381e8
    //copy:stop

    //copy:start vertexai.go translateResponse
VertexAI should not replicate this function anymore, since u.CachedContentTokenCount (@ line 558) is not provided in the response struct (see https://pkg.go.dev/cloud.google.com/go/vertexai/genai#UsageMetadata).
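For context, here is a rough sketch of the usage translation being discussed, assuming the googleai genai.UsageMetadata fields from the linked docs and the CachedTokens field this PR adds to GenerationUsage; the function name and package placement are hypothetical, not the PR's exact code:

    package googleai

    import (
    	"github.com/firebase/genkit/go/ai"
    	"github.com/google/generative-ai-go/genai"
    )

    // translateUsage maps usage metadata to Genkit's GenerationUsage. The googleai
    // UsageMetadata exposes CachedContentTokenCount, while the vertexai package's
    // UsageMetadata does not, so this part cannot simply be copied to vertexai.
    func translateUsage(md *genai.UsageMetadata) *ai.GenerationUsage {
    	if md == nil {
    		return &ai.GenerationUsage{}
    	}
    	return &ai.GenerationUsage{
    		InputTokens:  int(md.PromptTokenCount),
    		OutputTokens: int(md.CandidatesTokenCount),
    		TotalTokens:  int(md.TotalTokenCount),
    		CachedTokens: int(md.CachedContentTokenCount), // googleai-only field
    	}
    }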
go/plugins/googleai/cache.go
Outdated
    		return nil, err
    	}

    	return client.CreateCachedContent(ctx, cc)
One thing to mention: in JS we do a cache lookup based on the displayName (https://github.com/google-gemini/generative-ai-js/blob/0c8d8cde00a33ea9da94e5d1bb033dc261d0588f/types/server/caching.ts#L35) of the CachedContent. In Go this field is not available, which makes us unable to perform such lookups.
apascal07 left a comment:
Overall, this follows the JS implementation a little too closely, resulting in overly complicated logic. Let's keep it simple for now, focusing only on the cache marker + TTL in the metadata logic. If we ever add more fields, we can return a struct. These are all internal functions, so there won't be breaking changes.
go/ai/generate.go
Outdated
     // WithTextPrompt adds a simple text user prompt to ModelRequest.
    -func WithTextPrompt(prompt string) GenerateOption {
    +func WithTextPrompt(prompt string, ttl ...int) GenerateOption {
Why is this modified and why is it a variadic arg?
Good catch, this shouldn't be there.
go/ai/request_helpers.go
Outdated
    	},
    }
    // copy the existing metadata and add cache
    if m.Metadata != nil {
This does the opposite of what the comment says: it will overwrite cache with whatever was already in the metadata. It makes more sense for this deliberate call to overwrite any existing metadata.
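A minimal sketch of the merge order being asked for here, copying the existing metadata first and setting the cache entry last so that this deliberate call wins; the "cache" and "ttlSeconds" keys are assumptions based on this thread, not the PR's exact helpers:

    func withCacheMetadata(existing map[string]any, ttlSeconds int) map[string]any {
    	// Copy whatever metadata is already on the message...
    	merged := make(map[string]any, len(existing)+1)
    	for k, v := range existing {
    		merged[k] = v
    	}
    	// ...then set the cache marker last so it overwrites any previous value.
    	merged["cache"] = map[string]any{"ttlSeconds": ttlSeconds}
    	return merged
    }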
    }

    var ContextCacheSupportedModels = [...]string{
    	"gemini-1.5-flash-001",
gemini-1.5-flash-002, gemini-1.5-pro-002, and Gemini 2.0 don't support caching?
You are right; added the gemini-2.0 stable models as well.
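For illustration only, the expanded list might look roughly like the following; exactly which 001/002 and 2.0 stable model names ended up in the final PR is an assumption here:

    var ContextCacheSupportedModels = [...]string{
    	"gemini-1.5-flash-001",
    	"gemini-1.5-flash-002",
    	"gemini-1.5-pro-001",
    	"gemini-1.5-pro-002",
    	"gemini-2.0-flash-001",
    }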
go/plugins/vertexai/cache.go
Outdated
    	return nil, nil
    }
    if endOfCachedContents < 0 || endOfCachedContents >= len(request.Messages) {
    	return nil, fmt.Errorf("invalid endOfCachedContents index")
Suggested change:
    -	return nil, fmt.Errorf("invalid endOfCachedContents index")
    +	return nil, fmt.Errorf("end of cached content index %q is invalid", cacheEndIdx)
    	Temperature float64       `json:"temperature,omitempty"`
    	TopK        int           `json:"topK,omitempty"`
    	TopP        float64       `json:"topP,omitempty"`
    	TTL         time.Duration `json:"ttl,omitempty"`
This is a generated file; there's no corresponding TTL field in the source:
genkit/genkit-tools/common/src/types/model.ts, line 244 in 54d439e:
    export const GenerationCommonConfigSchema = z.object({
Yeah, after addressing your comments, the TTL config should not live here but in Message.Metadata instead.
    	OutputTokens int     `json:"outputTokens,omitempty"`
    	OutputVideos float64 `json:"outputVideos,omitempty"`
    	TotalTokens  int     `json:"totalTokens,omitempty"`
    	CachedTokens int     `json:"cachedTokens,omitempty"`
Same with this. We can probably either add it as a field or add it to Custom.
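A sketch of the second option (adding it to Custom instead of a new generated field), assuming GenerationUsage keeps its Custom map of numbers; the helper and key names are hypothetical:

    package googleai

    import "github.com/firebase/genkit/go/ai"

    // addCachedTokens surfaces the cached-token count via the Custom map, avoiding
    // a change to the generated GenerationUsage type.
    func addCachedTokens(u *ai.GenerationUsage, cachedTokens int32) {
    	if u.Custom == nil {
    		u.Custom = map[string]float64{}
    	}
    	u.Custom["cachedContentTokens"] = float64(cachedTokens)
    }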
    )

    // DEFAULT_TTL in seconds (5 minutes)
    const DEFAULT_TTL = 300
There's an inconsistent mix of int seconds and time.Duration throughout.
Fixed: ttlSeconds is always an int, and the conversion to time.Duration is done right before detecting the cache marker.
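A minimal sketch of that convention, assuming the cache marker and ttlSeconds sit in the message metadata as discussed above; the package, key names, and function are illustrative, not the PR's exact code:

    package cache

    import "time"

    const defaultTTLSeconds = 300 // mirrors DEFAULT_TTL above (5 minutes)

    // cacheTTL keeps ttlSeconds as a plain int in the metadata and converts it to
    // time.Duration only at the point where the cache marker is detected.
    func cacheTTL(metadata map[string]any) (time.Duration, bool) {
    	cacheCfg, ok := metadata["cache"].(map[string]any)
    	if !ok {
    		return 0, false // no cache marker on this message
    	}
    	ttlSeconds := defaultTTLSeconds
    	if s, ok := cacheCfg["ttlSeconds"].(int); ok && s > 0 {
    		ttlSeconds = s
    	}
    	return time.Duration(ttlSeconds) * time.Second, true
    }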
Since we are moving forward on using the unified …
The following changes are included in this PR:
Checklist (if applicable):