According to 1M AI News monitoring, Microsoft has released the harrier-oss-v1 family of open-source multilingual text embedding models on Hugging Face in three sizes: 270M, 0.6B, and 27B. The model card states that the series uses a decoder-only architecture with last-token pooling and L2 normalization, supports context lengths of up to 32768 tokens, and targets retrieval, clustering, semantic similarity, classification, bitext mining, and re-ranking.
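The two pooling details from the model card can be illustrated with a minimal NumPy sketch: take the final-layer hidden state of the last non-padding token as the sentence embedding, then L2-normalize it so that cosine similarity becomes a plain dot product. This is an assumption-laden toy on random activations, not the model's actual implementation.

```python
import numpy as np

def embed_from_hidden_states(hidden_states: np.ndarray,
                             attention_mask: np.ndarray) -> np.ndarray:
    """Last-token pooling + L2 normalization (sketch, not the real model).

    hidden_states: (batch, seq_len, dim) final-layer activations
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    # Index of the last non-padding token in each sequence
    last_idx = attention_mask.sum(axis=1) - 1
    pooled = hidden_states[np.arange(hidden_states.shape[0]), last_idx]
    # L2-normalize so cosine similarity reduces to a dot product
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Toy demonstration with random activations (no real model involved)
rng = np.random.default_rng(0)
h = rng.normal(size=(2, 5, 8))      # batch=2, seq_len=5, dim=8
mask = np.array([[1, 1, 1, 0, 0],   # first sequence has 2 padding tokens
                 [1, 1, 1, 1, 1]])
emb = embed_from_hidden_states(h, mask)
print(np.linalg.norm(emb, axis=1))  # each row has unit norm
print(emb @ emb.T)                  # cosine-similarity matrix
```

Because the embeddings are unit-length, `emb @ emb.T` directly yields pairwise cosine similarities, which is what downstream tasks like retrieval and re-ranking score against.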
Multilingual MTEB v2 is a widely used industry benchmark for multilingual text embeddings, covering tasks such as retrieval, classification, clustering, and semantic similarity. According to Microsoft's model card, the three sizes score 66.5, 69.0, and 74.3 on this benchmark, with the 27B version ranking first on the day of release. The 270M and 0.6B versions were additionally trained via knowledge distillation from larger embedding models. All three models are released under the MIT license.
