| Rank (Borda) | Model | Zero-shot | Active Params (B) | Total Params (B) | Embedding Dim | Max Tokens |
|---|---|---|---|---|---|---|
| 1 | harrier-oss-v1-27b | 78% | 25.6 | 27.0 | 5376 | 131072 |
| 2 | KaLM-Embedding-Gemma3-12B-2511 | 73% | 10.8 | 11.8 | 3840 | 32768 |
| 3 | llama-embed-nemotron-8b | 99% | 7.0 | 7.5 | 4096 | 32768 |
| 4 | Qwen3-Embedding-8B | 99% | 6.9 | 7.6 | 4096 | 32768 |
| 5 | gemini-embedding-001 | 99% | | | 3072 | 2048 |
| 6 | Qwen3-Embedding-4B | 99% | 3.6 | 4.0 | 2560 | 32768 |
| 7 | Octen-Embedding-8B | 99% | 6.9 | 7.6 | 4096 | 32768 |
| 8 | F2LLM-v2-14B | 88% | 13.2 | 14.0 | 5120 | 40960 |
| 9 | F2LLM-v2-8B | 88% | 6.9 | 7.6 | 4096 | 40960 |
| 10 | harrier-oss-v1-0.6b | 78% | 0.440 | 0.596 | 1024 | |

In addition to the large 27-billion-parameter model, there are two smaller variants (0.6B and 270M) for weaker hardware. All models are available on Hugging Face under the MIT license. The team plans to integrate the technology into Bing and into new grounding services for AI agents in the future.

Embedding models handle searching, retrieving, and organizing information so that AI systems can deliver accurate answers. According to Microsoft, they are becoming increasingly important in the age of AI agents, since such agents must search for information on their own, update context across multiple steps, and retain memory.
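The retrieval step described above works the same way regardless of which model from the table produces the vectors: the query and each document are embedded, and documents are ranked by similarity to the query. A minimal sketch with made-up toy vectors (a real system would obtain its embeddings from one of the models listed, e.g. via the `sentence-transformers` library; the 4-dimensional vectors here stand in for the 1024- to 5376-dimensional embeddings those models produce):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" for three documents (hypothetical values for illustration).
doc_embeddings = {
    "doc_a": np.array([0.9, 0.1, 0.0, 0.2]),
    "doc_b": np.array([0.1, 0.8, 0.3, 0.0]),
    "doc_c": np.array([0.0, 0.2, 0.9, 0.4]),
}
# Toy embedding of the user's query.
query = np.array([0.85, 0.15, 0.05, 0.1])

# Rank documents by similarity to the query, best match first.
ranked = sorted(doc_embeddings.items(),
                key=lambda kv: cosine_similarity(query, kv[1]),
                reverse=True)
print([name for name, _ in ranked])  # → ['doc_a', 'doc_b', 'doc_c']
```

An agent repeats this loop at every step: embed the current question, pull the closest-matching documents or memories, and feed them back into its context.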