Hello maintainers,
I noticed that when I call vector_store.add_texts_with_embeddings() with the sparse_embeddings parameter, the sparse embeddings are not actually stored in the Vertex AI index.
I followed the LangChain tutorial (link here), and used the following code snippet:
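The snippet boiled down to an add_texts_with_embeddings call with both dense and sparse embeddings. The values below are illustrative stand-ins, not the tutorial's exact code, and the sparse-embedding format (values plus dimensions) is an assumption based on Vertex AI's hybrid-search documentation:

```python
# Illustrative stand-ins only; the tutorial's real snippet uses a configured
# VectorSearchVectorStore and real embedding models.
texts = ["hybrid search with LangChain", "Vertex AI Vector Search"]

# One dense vector per text (toy 4-dimensional vectors here).
embeddings = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]

# One sparse embedding per text: the non-zero values and the vocabulary
# dimensions they occupy.
sparse_embeddings = [
    {"values": [0.9, 0.3], "dimensions": [10, 42]},
    {"values": [0.7], "dimensions": [7]},
]

# The call from the tutorial then looks like this (not executed here,
# since it needs a live Vertex AI index):
#   vector_store.add_texts_with_embeddings(
#       texts=texts,
#       embeddings=embeddings,
#       sparse_embeddings=sparse_embeddings,
#   )
```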
However, the sparse_embeddings do not appear in the resulting Vertex AI index. After investigating the source code, I found that the data_points_to_batch_update_records function (in libs/vertexai/langchain_google_vertexai/vectorstores/_utils.py) drops the sparse embeddings: sparse_embeddings is not included in the record dictionary it builds.
Expected Behavior
sparse_embeddings should be stored in the Vertex AI index so that hybrid search can function as intended.
Actual Behavior
sparse_embeddings are not stored in the index, preventing hybrid search from utilizing them.
Proposed Fix
I tested a local fix that modifies data_points_to_batch_update_records to include the sparse_embeddings data in each record.
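A simplified, self-contained sketch of what the fixed record-building logic looks like. Plain dicts stand in for the Vertex AI datapoint objects the real function receives, and the record field names (id, embedding, sparse_embedding with values/dimensions) are my assumptions based on the Vertex AI batch-update schema, not the library's exact code:

```python
def data_points_to_batch_update_records(data_points):
    """Simplified sketch, not the library's actual code."""
    records = []
    for dp in data_points:
        record = {
            "id": dp["datapoint_id"],
            "embedding": list(dp["feature_vector"]),
        }
        # The proposed fix: carry the sparse embedding into the record
        # instead of silently dropping it.
        sparse = dp.get("sparse_embedding")
        if sparse is not None:
            record["sparse_embedding"] = {
                "values": list(sparse["values"]),
                "dimensions": list(sparse["dimensions"]),
            }
        records.append(record)
    return records
```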
After applying the above fix, I ran the following sample code snippet from the tutorial:
vector_store.similarity_search_by_vector_with_score(
    embedding=embedding,
    sparse_embedding=sparse_embedding,
    k=5,
    rrf_ranking_alpha=0.7,  # 0.7 weight to dense and 0.3 weight to sparse
)
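For intuition on the rrf_ranking_alpha parameter, here is a generic weighted reciprocal-rank-fusion sketch (my own illustration, not the library's implementation) that gives weight alpha to the dense ranking and 1 - alpha to the sparse ranking:

```python
def weighted_rrf(dense_ranking, sparse_ranking, alpha=0.7, k=60):
    """Generic weighted reciprocal-rank fusion (illustrative only).

    alpha weights the dense ranking; (1 - alpha) weights the sparse one.
    k is the usual RRF smoothing constant.
    """
    scores = {}
    for rank, doc_id in enumerate(dense_ranking):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank + 1)
    for rank, doc_id in enumerate(sparse_ranking):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank + 1)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```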
As a result, I confirmed that the sparse embeddings are now retrieved as expected, and the hybrid search functionality works properly.
Request for Guidance
I am a beginner in open-source contributions, so I am unsure if this change might have a broader impact on other parts of the library. I would greatly appreciate any feedback from the maintainers or other contributors regarding:
Whether this fix is valid or if it breaks anything else.
If there is a preferred or more robust approach to including sparse_embeddings in the index.
Thank you in advance for your time and guidance. If you think this is acceptable, I would be happy to open a PR with this proposed fix.