Ettin Reranker Family: Six New State-of-the-Art Cross-Encoder Models Released
A new family of six Sentence Transformers CrossEncoder rerankers, the Ettin Reranker Family, has been released, achieving state-of-the-art performance at their respective sizes. Built on the Ettin ModernBERT encoders from Johns Hopkins University, the models range from 17 million to 1 billion parameters. All models support up to 8K (8192) tokens of context and are released under the Apache 2.0 license.
The models were trained with a pointwise MSE distillation recipe, using mixedbread-ai/mxbai-rerank-large-v2 as the teacher, on a dataset of approximately 143 million (query, document, teacher_score) triples released as cross-encoder/ettin-reranker-v1-data. The recipe was bootstrapped with the new train-sentence-transformers Agent Skill in Sentence Transformers v5.5.0. The training script is approximately 150 lines, uses this single published dataset, and applies the same recipe to every model size.
On MTEB(eng, v2) Retrieval, the 17M model outperforms the 33M ms-marco-MiniLM-L12-v2 by +0.051 NDCG@10, and the 1B model matches its teacher to within 0.0001 NDCG@10 while running 2.4x faster. The 150M model is the strongest model under 600M parameters, edging out Qwen/Qwen3-Reranker-0.6B.
In speed benchmarks on an NVIDIA H100, the 17M model processes 7517 (query, document) pairs per second, the fastest in the comparison. The modular Transformer architecture allows unpadded inputs for Flash Attention 2; combined with bf16, this yields up to an 8.3x speedup over fp32 with SDPA.
The models are available on the Hugging Face Hub under the cross-encoder namespace.
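A cross-encoder reranker reads the query and each candidate document together and emits one relevance score per pair; reranking is then a sort by that score. The sketch below shows only that loop. `toy_overlap_scorer` is a hypothetical lexical stand-in so the example runs without downloading a model; with Sentence Transformers you would instead pass the `predict` method of a loaded `CrossEncoder` (the class also offers a built-in `rank` helper). The exact Hub IDs of the Ettin rerankers are not given here, so none are assumed.

```python
import re
from typing import Callable, List, Sequence, Tuple

# A scorer maps a batch of (query, document) pairs to one relevance score per pair.
Scorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(
    query: str, documents: Sequence[str], scorer: Scorer, top_k: int = 10
) -> List[Tuple[int, float]]:
    """Score each (query, document) pair jointly and return (doc_index, score), best first."""
    pairs = [(query, doc) for doc in documents]
    scores = scorer(pairs)
    ranked = sorted(enumerate(scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def toy_overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in for a cross-encoder: share of query words present in the document."""
    scores = []
    for query, doc in pairs:
        q_terms = set(re.findall(r"\w+", query.lower()))
        d_terms = set(re.findall(r"\w+", doc.lower()))
        scores.append(len(q_terms & d_terms) / max(len(q_terms), 1))
    return scores


# Usage: rerank three candidates and keep the best two.
docs = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is in Paris.",
]
top = rerank("capital of France", docs, toy_overlap_scorer, top_k=2)
```

Because the scorer sees query and document together, it cannot precompute document vectors the way a bi-encoder does, which is why rerankers are normally applied to a short candidate list from a first-stage retriever.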
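Pointwise MSE distillation treats each (query, document, teacher_score) triple as an independent regression target: the student's score for a pair is pushed toward the teacher's score (here, from mixedbread-ai/mxbai-rerank-large-v2) by minimizing the mean squared error. The toy sketch below shows that objective with a hypothetical two-feature linear student trained by plain gradient descent. It is not the ~150-line Sentence Transformers training script, which trains a full transformer on cross-encoder/ettin-reranker-v1-data; `features` and `distill` are illustrative names only.

```python
from typing import List, Sequence, Tuple

# One example mirrors the released dataset's shape: (query, document, teacher_score).
Triple = Tuple[str, str, float]


def mse_loss(student_scores: Sequence[float], teacher_scores: Sequence[float]) -> float:
    """Pointwise MSE: each pair is regressed onto its own teacher score."""
    n = len(student_scores)
    if n == 0:
        raise ValueError("need at least one pair")
    return sum((s - t) ** 2 for s, t in zip(student_scores, teacher_scores)) / n


def features(query: str, document: str) -> List[float]:
    """Hypothetical hand-made features standing in for a transformer encoding of the pair."""
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().split())
    overlap = len(q_terms & d_terms) / max(len(q_terms), 1)
    return [1.0, overlap]  # bias term + lexical overlap


def distill(triples: Sequence[Triple], lr: float = 0.5, steps: int = 200) -> List[float]:
    """Fit a linear student to teacher scores with full-batch gradient descent on MSE."""
    weights = [0.0, 0.0]
    xs = [features(q, d) for q, d, _ in triples]
    ts = [t for _, _, t in triples]
    n = len(triples)
    for _ in range(steps):
        preds = [sum(w * x for w, x in zip(weights, feats)) for feats in xs]
        # Gradient of mean((pred - t)^2) w.r.t. weight j: 2/n * sum((pred - t) * x_j)
        grads = [
            2.0 / n * sum((p - t) * feats[j] for p, t, feats in zip(preds, ts, xs))
            for j in range(len(weights))
        ]
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights
```

Because every pair carries its own target, pointwise training data can be shuffled and batched freely, without grouping candidates per query as listwise losses require.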
Key facts
- Six new CrossEncoder rerankers released: 17M, 32M, 68M, 150M, 400M, 1B parameters.
- Built on Ettin ModernBERT encoders from Johns Hopkins University.
- Trained via pointwise MSE distillation from mixedbread-ai/mxbai-rerank-large-v2.
- Training dataset: ~143M triples, released as cross-encoder/ettin-reranker-v1-data.
- All models support up to 8192 tokens of context.
- Released under Apache 2.0 license.
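- 17M model beats the 33M ms-marco-MiniLM-L12-v2 by +0.051 NDCG@10 on MTEB(eng, v2) Retrieval.
- 1B model matches its teacher within 0.0001 NDCG@10 while running 2.4x faster.
- 150M model is the strongest under 600M parameters on MTEB(eng, v2) Retrieval, edging out Qwen/Qwen3-Reranker-0.6B.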
- 17M model achieves 7517 pairs per second on H100, fastest in comparison.
- Modular Transformer enables unpadded inputs for Flash Attention 2.
- Speedup up to 8.3x over fp32+SDPA with bf16+FA2 and unpadding.
- Training recipe bootstrapped with the train-sentence-transformers Agent Skill, new in Sentence Transformers v5.5.0.
- Training script ~150 lines, single recipe for all sizes.
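NDCG@10, the metric behind the MTEB comparisons above, rewards placing relevant documents near the top of the first 10 results and normalizes by the best achievable ordering. A minimal single-query sketch using the common linear-gain form (gain divided by log2(rank + 1)) follows; benchmark harnesses use their own evaluation code, and the function names here are illustrative.

```python
import math
from typing import Dict, Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    # enumerate() is 0-based, so the 1-based rank r corresponds to log2(index + 2).
    return sum(g / math.log2(index + 2) for index, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids: Sequence[str], qrels: Dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query: DCG of the system ranking over DCG of the ideal ranking."""
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_doc_ids]
    ideal = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A benchmark score averages this value over queries (and, for an aggregate such as MTEB(eng, v2) Retrieval, over datasets), so a gap like +0.051 is a difference between averages rather than a per-query bound.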
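Unpadding removes the padding tokens that a rectangular (batch_size x max_len) tensor would otherwise carry: sequences are concatenated into one flat token stream, and cumulative boundaries tell the attention kernel where each sequence starts and ends. The sketch below computes that layout in plain Python with made-up token IDs. The names `cu_seqlens` and `max_seqlen` follow the convention of the flash-attn library's variable-length kernels, but `unpad_batch` and `padding_overhead` are hypothetical helpers that call neither Flash Attention 2 nor Sentence Transformers; the 8.3x figure also includes the switch from fp32 to bf16, which this sketch does not model.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def unpad_batch(batch: Sequence[Sequence[int]]) -> Tuple[List[int], List[int], int]:
    """Concatenate variable-length token sequences without padding.

    Returns the flat token list, cumulative sequence boundaries (cu_seqlens), and the
    longest sequence length. Sequence i occupies flat[cu_seqlens[i]:cu_seqlens[i + 1]],
    so a varlen attention kernel can keep each token attending within its own sequence.
    """
    flat = [tok for seq in batch for tok in seq]
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in batch))
    max_seqlen = max((len(seq) for seq in batch), default=0)
    return flat, cu_seqlens, max_seqlen


def padding_overhead(batch: Sequence[Sequence[int]]) -> float:
    """Fraction of a padded (batch_size x max_len) tensor that would be padding tokens."""
    lengths = [len(seq) for seq in batch]
    if not lengths or max(lengths) == 0:
        return 0.0
    padded_size = len(lengths) * max(lengths)
    return 1.0 - sum(lengths) / padded_size


# Three tokenized (query, document) pairs of different lengths (made-up token IDs).
batch = [[1, 7, 8, 2], [1, 9, 2], [1, 4, 5, 6, 11, 12, 2]]
flat, cu_seqlens, max_seqlen = unpad_batch(batch)
```

Here a padded batch would spend a third of its 21 slots on padding; that waste grows when pairs of very different lengths, up to the 8K-token limit, share a batch.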
Entities
Institutions
- Johns Hopkins University
- Hugging Face
- Mixedbread AI
- LightOn
- Alibaba
- IBM
- BAAI
- Qwen
- NVIDIA
Software and benchmarks
- Sentence Transformers
- MTEB
- NanoBEIR
Hardware
- H100
- RTX 3090