@vedant381 (Contributor) commented Oct 28, 2025

This pull request makes the default handling of embedding model dimensions consistent across the vector store configuration and implementation, in line with the OpenAI embedder's defaults. If the embedding model dimensions are not specified, a default of 1536 is used, and the documentation is updated to reflect this default.

Configuration and Default Handling Improvements:

  • Set the default embedding_model_dims to 1536 in VectorStoreBase, aligning with the OpenAI embedder's default dimensions.
  • Updated the VectorStoreFactory to automatically set embedding_model_dims from VectorStoreBase if not specified in the configuration, ensuring consistent behavior across vector store instantiations.
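The factory fallback described above can be sketched in a few lines. This is a minimal illustration that assumes the vector store config arrives as a plain dict. `VectorStoreBase` here is a stripped-down stand-in, and `DEFAULT_EMBEDDING_MODEL_DIMS` and `create_vector_store_config` are hypothetical names, not the project's actual attributes or factory signature.

```python
from typing import Any, Dict, Optional


class VectorStoreBase:
    # Hypothetical stand-in: the PR sets 1536 as the default,
    # matching the OpenAI embedder's default output dimensions.
    DEFAULT_EMBEDDING_MODEL_DIMS: int = 1536


def create_vector_store_config(config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical factory step: fill in embedding_model_dims only when absent or None."""
    resolved = dict(config or {})  # copy so the caller's dict is not mutated
    if resolved.get("embedding_model_dims") is None:
        resolved["embedding_model_dims"] = VectorStoreBase.DEFAULT_EMBEDDING_MODEL_DIMS
    return resolved


# Usage: an explicit value wins; a missing one falls back to the default.
print(create_vector_store_config({"collection_name": "memories"}))
print(create_vector_store_config({"embedding_model_dims": 1024}))
```

An explicitly configured value is always kept, so an existing 1024-dimension setup is not silently changed; only a missing (or `None`) value falls back to 1536.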

Documentation Updates:

  • Updated the OpenSearch vector database example configuration in the documentation to set embedding_model_dims to 1536 and added an explicit embedder configuration for OpenAI with matching dimensions.

This pull request updates the configuration documentation for integrating OpenSearch with an embedding model, ensuring the embedding dimensions are consistent and adding explicit embedder configuration.

Configuration consistency and embedder specification:

  • Updated the embedding_model_dims parameter in the OpenSearch configuration from 1024 to 1536 to match the embedder's output dimensions.
  • Added a new embedder configuration section specifying the use of OpenAI's text-embedding-3-small model with embedding_dims set to 1536.

The previous OpenSearch config example was missing the 'embedder' configuration. This could lead to a mismatch between the embedding dimensions expected by the vector store and the dimensions produced by the embedder. This change adds the 'embedder' config to the example to prevent this issue and also fixes a minor syntax error.
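Put together, the corrected OpenSearch example pairs the vector store and the embedder roughly as follows. This is a sketch of the config shape only: the host is a placeholder, `auth` is set to `None` where the real example passes an authentication object, and the `provider`/`config` nesting and `host` key are assumptions; the keys named in this PR are `port`, `http_auth`, `embedding_model_dims`, `embedding_dims`, and the model name.

```python
# Placeholder for the real HTTP auth object used in the docs example (assumption).
auth = None

# text-embedding-3-small produces 1536-dimensional vectors by default.
EMBEDDING_DIMS = 1536

config = {
    "vector_store": {
        "provider": "opensearch",  # assumed nesting
        "config": {
            "host": "your-opensearch-domain.example.com",  # placeholder
            "port": 443,
            "http_auth": auth,
            "embedding_model_dims": EMBEDDING_DIMS,  # should match embedder's dimensions
        },
    },
    "embedder": {
        "provider": "openai",  # assumed nesting
        "config": {
            "model": "text-embedding-3-small",
            "embedding_dims": EMBEDDING_DIMS,
        },
    },
}
```

Reading both values from one constant keeps them from drifting apart when only one side of the config is edited.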

Fixes #3677

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g. code style improvements, linting)
  • Documentation update

How Has This Been Tested?



  • Unit Test
  • Test Script (please provide)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Made sure Checks passed
 "port": 443,
 "http_auth": auth,
-"embedding_model_dims": 1024,
+"embedding_model_dims": 1536, # should match embedder's dimensions


Question: Isn't this applicable to every other vector store provider?

@vedant381 (Contributor, Author) Oct 29, 2025


Yes, it is. Many of the vector store configs, such as Qdrant and OpenSearch, default to 1536 dimensions, which matches OpenAI's embedding output. In this case, however, the example explicitly set the dimension to 1024, which caused the mismatch. That's why I described it as a deeper, system-level issue: while this change resolves the immediate problem, it doesn't address the underlying inconsistency between embedder and vector store configurations.

Moreover, the documentation examples for the other vector stores already pass embedding_model_dims as 1536, so the issue doesn't arise there.
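The underlying inconsistency mentioned above could be caught early by comparing the two values when the config is loaded. A minimal sketch, assuming the nested `vector_store`/`embedder` config shape used in the documentation examples; `check_embedding_dims` is a hypothetical helper, not part of the project's API.

```python
from typing import Any, Dict

# Per this PR, the OpenAI embedder's default output size (also the new store default).
DEFAULT_DIMS = 1536


def check_embedding_dims(config: Dict[str, Any], default_dims: int = DEFAULT_DIMS) -> int:
    """Hypothetical helper: raise early if vector store and embedder dimensions disagree.

    A missing value on either side is treated as the 1536 default, which is how
    the old OpenSearch example (1024 in the store, no embedder section) ended
    up mismatched.
    """
    store_cfg = config.get("vector_store", {}).get("config", {})
    embedder_cfg = config.get("embedder", {}).get("config", {})
    store_dims = store_cfg.get("embedding_model_dims", default_dims)
    embedder_dims = embedder_cfg.get("embedding_dims", default_dims)
    if store_dims != embedder_dims:
        raise ValueError(
            f"vector store expects {store_dims}-dim vectors, "
            f"but the embedder produces {embedder_dims}-dim vectors"
        )
    return store_dims


# The old docs example: store pinned to 1024, embedder left at its default.
old_example = {"vector_store": {"provider": "opensearch", "config": {"embedding_model_dims": 1024}}}
try:
    check_embedding_dims(old_example)
except ValueError as err:
    print(f"config error: {err}")
```

For the old example this prints `config error: vector store expects 1024-dim vectors, but the embedder produces 1536-dim vectors`, naming both values instead of leaving the mismatch to surface later.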

@vedant381 changed the title docs: update opensearch config example Oct 29, 2025

Labels

None yet

2 participants