ARTFEED — Contemporary Art Intelligence

PrismML Releases Ternary Bonsai 1.58-Bit Language Models with 9x Memory Reduction

ai-technology · 2026-04-21

PrismML has introduced Ternary Bonsai, a new family of language models that use a 1.58-bit representation throughout their architecture. Available in 8B, 4B, and 1.7B parameter sizes, the models employ ternary weights {-1, 0, +1} with group-wise quantization, achieving memory footprints approximately nine times smaller than those of standard 16-bit models. The 8B variant averages 75.5 across benchmarks, outperforming most peers in its parameter class despite its compact size, and improves on the earlier 1-bit Bonsai 8B by 5 points while requiring only 600MB more memory. Gains are broad-based across MMLU Redux, MuSR, GSM8K, HumanEval+, IFEval, and BFCLv3. Throughput is strong: 82 tokens per second on an M4 Pro and 27 tokens per second on an iPhone 17 Pro Max, with energy efficiency 3-4 times better than 16-bit counterparts. The models run natively on Apple devices via MLX and are released under the Apache 2.0 License. PrismML was founded by Caltech researchers focused on neural network compression, with backing from Khosla Ventures, Cerberus, and Google. Full technical details are available in a whitepaper.
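
For readers curious about the mechanics, the sketch below illustrates the general shape of group-wise ternary quantization: each group of weights is reduced to {-1, 0, +1} plus one higher-precision scale per group. The group size and the absmean rounding rule are illustrative assumptions drawn from the broader 1.58-bit literature, not PrismML's published recipe.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, group_size: int = 64):
    """Group-wise ternary quantization (illustrative sketch).

    Each group of `group_size` weights is mapped to {-1, 0, +1} plus one
    float scale per group. The absmean scale rule below is a common choice
    in 1.58-bit work; PrismML's exact recipe may differ.
    """
    groups = w.reshape(-1, group_size)
    # Per-group scale: mean absolute value of the group.
    scales = np.abs(groups).mean(axis=1, keepdims=True)
    # Normalize by the scale, round to the nearest integer, clip to {-1, 0, +1}.
    q = np.clip(np.rint(groups / (scales + 1e-8)), -1, 1).astype(np.int8)
    return q, scales.astype(np.float16)

def ternary_dequantize(q: np.ndarray, scales: np.ndarray, shape):
    """Reconstruct an approximate float tensor from ternary codes plus scales."""
    return (q.astype(np.float32) * scales.astype(np.float32)).reshape(shape)

# Toy usage: quantize a random weight matrix and check reconstruction error.
w = np.random.randn(256, 256).astype(np.float32)
q, s = ternary_quantize(w)
w_hat = ternary_dequantize(q, s, w.shape)
print("mean abs error:", np.abs(w - w_hat).mean())
```

In a real deployment the ternary codes would be bit-packed (five ternary values fit in one byte, since 3^5 = 243 < 256) rather than stored as int8; the sketch keeps int8 for clarity.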

Key facts

  • Ternary Bonsai models use 1.58-bit representation throughout the entire network architecture
  • Models are available in 8B, 4B, and 1.7B parameter sizes
  • Memory footprint is approximately 9x smaller than standard 16-bit models (see the arithmetic sketch after this list)
  • Ternary Bonsai 8B averages 75.5 across benchmarks, outperforming most peers
  • Compared to 1-bit Bonsai 8B, it shows a 5-point improvement with 600MB more memory
  • Models run natively on Apple devices via MLX and are released under the Apache 2.0 License
  • PrismML was founded with support from Khosla Ventures, Cerberus, and Google
  • Full technical details are available in a whitepaper
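
The "approximately 9x" figure squares with simple arithmetic: a ternary weight carries log2(3) ≈ 1.585 bits of information, so the ideal ratio against 16-bit weights is about 10.1x, and per-group scale metadata erodes that toward 9x. A minimal back-of-the-envelope check, assuming one 16-bit scale per 64-weight group (our assumption, not a published spec):

```python
import math

bits_per_ternary_weight = math.log2(3)           # ≈ 1.585 bits
group_size, scale_bits = 64, 16                  # assumed group/scale layout
effective_bits = bits_per_ternary_weight + scale_bits / group_size

print(f"ideal ratio vs 16-bit:  {16 / bits_per_ternary_weight:.1f}x")  # ~10.1x
print(f"with per-group scales:  {16 / effective_bits:.1f}x")           # ~8.7x
```

The result lands near the quoted nine-times figure; the exact number depends on packing and scale precision.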

Entities

Institutions

  • PrismML
  • Khosla Ventures
  • Cerberus
  • Google
  • Caltech
