
Qwen3-Embedding-8B

The model can be used with sentence-transformers or Hugging Face Transformers, with both integration paths documented on the official model card.

Qwen3-Embedding-8B is a text embedding model with 8B parameters, built on the dense Qwen3 foundation to deliver state-of-the-art performance on retrieval, ranking, and semantic similarity tasks. As of June 2025, it ranks #1 on the MTEB Multilingual leaderboard with a score of 70.58, outperforming all other open models at the time of release. It is released under the Apache 2.0 license and is part of a broader family of embedding and reranking models available at 0.6B, 4B, and 8B scales.

How to get started

pip install requests
import requests
import json


api_url = "https://api.regolo.ai/v1/embeddings"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_REGOLO_KEY"
}
data = {
  "model": "Qwen3-Embedding-8B",
  "input": [
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.",
    "Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
  ],
  "encoding_format": "float"
}

response = requests.post(api_url, headers=headers, json=data)

if response.status_code == 200:
    # Save the returned embeddings to disk for later use.
    with open("./embedding.json", "w") as _file:
        json.dump(response.json(), _file)
else:
    print("Failed embedding request:", response.status_code, response.text)

Output

{
  "model": "Qwen3-Embedding-8B",
  "data": [
    {
      "index": 0,
      "object": "embedding",
      "embedding": [
        0.013102968223392963,
        -0.00117895065341144, ....
      ]
    }
  ],
  "object": "list",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 99,
    "total_tokens": 99,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
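Each entry in the `data` array carries one embedding vector, returned in the same order as the strings in the `input` list. A minimal sketch of comparing two returned vectors with cosine similarity — the short placeholder vectors below stand in for the full-length embeddings saved to `embedding.json`:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors; in practice, take item["embedding"] for each
# entry in the "data" array of the JSON response above.
v0 = [0.013102968223392963, -0.00117895065341144, 0.02, -0.01]
v1 = [0.012, -0.002, 0.018, -0.009]
print(f"cosine similarity: {cosine_similarity(v0, v1):.4f}")
```

Scores close to 1.0 indicate semantically similar texts; this is the comparison step behind the retrieval and ranking use cases listed below.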

Applications & use cases

  • Semantic search and dense retrieval pipelines where high-recall embedding is needed before reranking.
  • Multilingual and cross-lingual search systems covering 100+ natural languages and programming languages.
  • Code retrieval and developer tooling that requires understanding of both natural language queries and code content.
  • Text classification and clustering workflows that benefit from high-quality contextual representations.
  • Bitext mining and parallel corpus construction for translation and localization pipelines.
  • Vector database integrations where flexible embedding dimensions (32 to 4096) reduce storage and latency costs.
  • RAG (Retrieval-Augmented Generation) systems that require long-context, instruction-conditioned embeddings up to 32K tokens.
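The flexible embedding dimensions mentioned above work by truncation: assuming Matryoshka-style representations (which the model family is reported to support on its official model card), a full-length vector can be cut to a shorter prefix and re-normalized to unit length before being stored in a vector database. The helper below is an illustrative sketch, not part of any SDK:

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 8-dimensional vector standing in for a 4096-dimensional embedding.
full = [0.5, -0.25, 0.125, 0.0625, -0.03, 0.02, -0.01, 0.005]
short = truncate_and_normalize(full, 4)
print(len(short))  # 4
```

Shorter vectors trade a small amount of retrieval quality for proportionally lower storage and similarity-search latency.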