
Add support for jinaai/jina-embeddings-v2-base-de #270

Merged
joein merged 2 commits into qdrant:main from deichrenner:add-jina-base-de on Jun 14, 2024

@deichrenner
Contributor

Issue

This PR resolves #266.

This PR adds the German variant of the Jina v2 embeddings with the following model definition:

    {
        "model": "jinaai/jina-embeddings-v2-base-de",
        "dim": 768,
        "description": "German embedding model supporting 8192 sequence length",
        "size_in_GB": 0.16,
        "sources": {"hf": "jinaai/jina-embeddings-v2-base-de"},
        "model_file": "onnx/model_fp16.onnx",
    },

The quantized ONNX export of the model is hosted directly by jinaai on Hugging Face.
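For reviewers who want to try the model, here is a minimal usage sketch. It assumes a fastembed build that already contains this entry (for example an install from the main branch) and network access to download the ONNX weights. The names `SUPPORTED_MODELS`, `get_model_entry` and `embed_texts` are illustrative and not part of the PR; only `TextEmbedding` and its `embed` method come from fastembed.

```python
from typing import Any, Dict, List, Sequence

# Entry copied from the PR description. The list name is hypothetical and
# only serves this sketch.
SUPPORTED_MODELS: List[Dict[str, Any]] = [
    {
        "model": "jinaai/jina-embeddings-v2-base-de",
        "dim": 768,
        "description": "German embedding model supporting 8192 sequence length",
        "size_in_GB": 0.16,
        "sources": {"hf": "jinaai/jina-embeddings-v2-base-de"},
        "model_file": "onnx/model_fp16.onnx",
    },
]


def get_model_entry(name: str, models: Sequence[Dict[str, Any]] = SUPPORTED_MODELS) -> Dict[str, Any]:
    """Return the definition whose "model" field matches `name` (case-insensitive)."""
    for entry in models:
        if entry["model"].lower() == name.lower():
            return entry
    raise ValueError(f"Model {name!r} is not in the list of supported models")


def embed_texts(texts: Sequence[str], name: str = "jinaai/jina-embeddings-v2-base-de") -> list:
    """Embed `texts` with fastembed and check the output dimension.

    Requires fastembed and a network connection on first use.
    """
    # Imported lazily so the lookup helper works without fastembed installed.
    from fastembed import TextEmbedding

    entry = get_model_entry(name)
    model = TextEmbedding(model_name=entry["model"])
    vectors = list(model.embed(list(texts)))
    for vector in vectors:
        if len(vector) != entry["dim"]:
            raise ValueError(f"Expected {entry['dim']} dimensions, got {len(vector)}")
    return vectors
```

Calling `embed_texts(["hello world", "flag embedding"])` should return two vectors of length 768 if the model loads as described above.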

Changes

The following files were changed to add this model:

  • fastembed/text/jina_onnx_embedding.py: The model definition was added.

  • tests/test_text_onnx_embeddings.py: A test was added. The expected values were generated with the supplied Colab notebook:

    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('jinaai/jina-embeddings-v2-base-de', trust_remote_code=True)
    input_texts = [
        "hello world", "flag embedding"
    ]
    embeddings = model.encode(input_texts, normalize_embeddings=True)
    print(embeddings[0][:5])
    
    [-0.00857827  0.04176599  0.03420503  0.0309742  -0.01496792]
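The reference values above can be compared against the first components of an embedding with an absolute tolerance, since the fp16 ONNX export is not expected to reproduce the sentence-transformers output bit for bit. The sketch below is pure Python; the helper name `matches_canonical` and the tolerance of `1e-3` are assumptions for illustration, not taken from the PR.

```python
from typing import Sequence

# First five components of the "hello world" embedding, as printed above.
CANONICAL_PREFIX = [-0.00857827, 0.04176599, 0.03420503, 0.0309742, -0.01496792]


def matches_canonical(
    embedding: Sequence[float],
    canonical: Sequence[float] = CANONICAL_PREFIX,
    atol: float = 1e-3,  # assumed tolerance, chosen for this sketch
) -> bool:
    """Check that the leading components of `embedding` are within `atol` of `canonical`."""
    if len(embedding) < len(canonical):
        return False
    # zip stops at the shorter sequence, so only the prefix is compared.
    return all(abs(got - want) <= atol for got, want in zip(embedding, canonical))
```

A 768-dimensional vector passes the check when its first five components are close to the reference values; the remaining components are ignored.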
    

Tests

All tests passed after the changes.

@deichrenner deichrenner marked this pull request as ready for review June 11, 2024 11:49
@joein
Member

joein commented Jun 14, 2024

Hey @deichrenner,
thank you for the contribution!

I'll approve it as soon as the CI is green.
It'll be available as of the next release (in the meantime, you can use it from the main branch).

@joein joein self-requested a review June 14, 2024 11:25
@joein joein merged commit fd0b26f into qdrant:main Jun 14, 2024
Anush008 pushed a commit that referenced this pull request Jun 17, 2024
* feat: add support for SOTA german embedding model with long context length jinaai/jina-embeddings-v2-base-de

* Fix jina de model weight

---------

Co-authored-by: George <panchuk.george@outlook.com>

Development

Successfully merging this pull request may close these issues.

Request for model jinaai/jina-embeddings-v2-base-de
