Concentrating intelligence

Large models can't fit on smartphones. Datacenters can't sustain them. PrismML is building ultra-dense intelligence to solve both.
14× less memory
8× faster
5× less energy

1-bit Bonsai

The first commercially viable models with 1-bit weights. Available in 8B, 4B, and 1.7B sizes, these models were engineered for robotics, real-time agents, and edge computing. They have a 14× smaller footprint than their full-precision counterparts, run 8× faster, and are 5× more energy efficient, while matching leading models at similar parameter counts on benchmarks. This results in over 10× the intelligence density of full-precision equivalents¹.

Ternary Bonsai

Ternary Bonsai models use {-1, 0, 1} weights to deliver a powerful balance between model quality and deployment efficiency. Available in 8B, 4B, and 1.7B sizes, these models have a 9× smaller footprint than full-precision counterparts and run roughly 5× faster, while delivering substantially stronger benchmark performance than most models at similar parameter counts. This creates a compelling tradeoff between capability and efficiency².

Bonsai Image 4B

Available in 1-bit and Ternary variants, Bonsai Image 4B brings high-quality image generation to everyday devices. Built for local inference on iPhone, Mac, and GPUs, it reduces the diffusion transformer size by up to 8× and speeds image generation by up to 5.6× versus its full-precision counterparts, while preserving strong visual quality. The result is a lighter, more deployable image generation model.

Benchmark palette

Intelligence density

Negative log of the model's error rate divided by the model size
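As a rough illustration of the definition above, the sketch below reads it as -log(error rate) / size. Several details are assumptions rather than anything stated here: the error rate is taken as one minus the average benchmark score, the logarithm is natural, size is measured in gigabytes, and the function name and inputs are hypothetical.

```python
import math


def intelligence_density(avg_score_pct: float, size_gb: float) -> float:
    """Return -ln(error rate) / size under the assumptions stated above.

    avg_score_pct: average benchmark score in percent (assumed 0 <= score < 100).
    size_gb: model size in gigabytes (assumed unit).
    """
    if not 0.0 <= avg_score_pct < 100.0:
        raise ValueError("avg_score_pct must be in [0, 100)")
    if size_gb <= 0.0:
        raise ValueError("size_gb must be positive")
    error_rate = 1.0 - avg_score_pct / 100.0  # assumed: error = 1 - accuracy
    return -math.log(error_rate) / size_gb


# Purely illustrative inputs, not measured results: two models with the same
# score, one of them 14x smaller than the other.
dense = intelligence_density(70.0, 16.0)
compact = intelligence_density(70.0, 16.0 / 14.0)
ratio = compact / dense
```

Under this reading, two models with equal scores differ in density by exactly their size ratio, so any drop in score at the smaller size pulls the density gain below the raw footprint reduction.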

Model benchmark comparison

Average score (IFEval, GSM8K, HumanEval+, BFCL, MuSR, MMLU-Redux)

Throughput

Tokens per second across hardware platforms (higher is better)

Performance vs. size

Average score (IFEval, GSM8K, HumanEval+, BFCL, MuSR, MMLU-Redux)

Bonsai 8B canopy

Average score (IFEval, GSM8K, HumanEval+, BFCL, MuSR, MMLU-Redux)

Energy consumption

Milliwatt-hours per token across hardware (lower is better)
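For readers who want to relate this metric to throughput, energy per token is average power divided by tokens per second. The sketch below is only a unit conversion; the function name and the example power and throughput figures are hypothetical, not measurements from these models.

```python
def mwh_per_token(avg_power_watts: float, tokens_per_second: float) -> float:
    """Convert average power draw and throughput into milliwatt-hours per token."""
    if tokens_per_second <= 0.0:
        raise ValueError("tokens_per_second must be positive")
    joules_per_token = avg_power_watts / tokens_per_second  # W / (tok/s) = J/tok
    return joules_per_token / 3.6  # 1 mWh = 3.6 J


# Hypothetical device: 36 W average draw while generating 100 tokens per second.
example = mwh_per_token(36.0, 100.0)  # roughly 0.1 mWh per token
```

At a fixed power draw, doubling throughput halves the energy per token, which is why the throughput and energy charts tend to move together.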

16.0 GB 16-bit (standard)

1-bit Bonsai 8B
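The 16.0 GB figure follows from simple arithmetic on weight storage, which the sketch below reproduces. It assumes that "8B" means exactly 8 × 10⁹ parameters, that 1 GB is 10⁹ bytes, and that only weights are counted; the function names are hypothetical. The second helper back-computes the effective bits per weight implied by a stated footprint reduction.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes, ignoring activations and metadata."""
    return num_params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(reduction_factor: float, baseline_bits: float = 16.0) -> float:
    """Effective bits per weight implied by an N-times smaller footprint."""
    return baseline_bits / reduction_factor


params_8b = 8e9  # assumed exact parameter count for an "8B" model

fp16_gb = weight_footprint_gb(params_8b, 16)    # 16.0 GB, matching the figure above
ideal_1bit_gb = weight_footprint_gb(params_8b, 1)  # 1.0 GB if every weight took exactly 1 bit

bits_at_14x = implied_bits_per_weight(14)  # about 1.14 bits per weight
bits_at_9x = implied_bits_per_weight(9)    # about 1.78 bits per weight
```

The stated 14× and 9× reductions imply slightly more than 1 bit and slightly more than log₂3 ≈ 1.58 bits per weight respectively. One plausible explanation, offered here as an assumption, is overhead such as scale factors or layers kept at higher precision.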

Centering AI research on efficiency

Successful artificial intelligence isn’t just about making models larger; it’s about making them smarter. Drawing on breakthrough research at Caltech, PrismML is pushing the frontier of intelligence density by reshaping how models are designed, prioritizing intelligence per bit over sheer parameter count.