P011 (Filed)

Attention Mechanism Compression via Ternary Quantization

The transformer attention mechanism — rebuilt for three values.

AU Application: 2023900011

Filing Date: 5 March 2023

Index Number: P011

Figures: 15

Batch / Category: Core 1

Explore the Vision

Discover this technology through five complementary perspectives — from technical architecture to partnership outcomes. Each layer reveals a different aspect of how this innovation creates value.

What It IS

Technical Vision

The architectural essence — what makes this technology work

The multi-headed attention mechanism of a large language model rendered as a crown of parallel beams, each beam now carrying ternary queries, keys, and values. Where attention scores once demanded expensive matrix multiplications, compare-and-select operations produce the same focus in a fraction of the energy.


Abstract

Methods for compressing transformer attention mechanisms into ternary representation while maintaining semantic information and model expressiveness.
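As a rough illustration of what attention over ternary values can look like, the sketch below maps queries and keys to {-1, 0, +1} and scores them with compare-and-select steps instead of multiplications. It is not the method claimed in the application: the function names, the 0.7 threshold ratio, and the softmax scaling are all assumptions chosen for this example.

```python
import math

def ternarize(vec, ratio=0.7):
    """Map each float to -1, 0, or +1 using a magnitude threshold.

    The threshold ratio * mean(|x|) is an illustrative heuristic
    (an assumption), not a value taken from the application.
    """
    threshold = ratio * sum(abs(x) for x in vec) / len(vec)
    return [0 if abs(x) <= threshold else (1 if x > 0 else -1) for x in vec]

def ternary_score(q, k):
    """Dot product of two ternary vectors without any multiplication."""
    score = 0
    for a, b in zip(q, k):
        if a == 0 or b == 0:
            continue                      # a zero entry contributes nothing
        score += 1 if a == b else -1      # signs agree: +1, signs differ: -1
    return score

def attention_weights(query, keys):
    """Softmax over ternary scores for one query against several keys."""
    q = ternarize(query)
    scores = [ternary_score(q, ternarize(k)) for k in keys]
    scale = math.sqrt(len(query))         # conventional sqrt(d) scaling
    peak = max(scores)                    # subtract the max for stability
    exps = [math.exp((s - peak) / scale) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    query = [0.9, -0.8, 0.05, 0.1]
    keys = [[0.9, -0.8, 0.05, 0.1], [-0.9, 0.8, 0.05, 0.1]]
    print(ternarize(query))
    print(attention_weights(query, keys))
```

The inner loop of `ternary_score` only compares signs and increments or decrements a counter, which is the kind of operation the description contrasts with full matrix multiplication. The final softmax still uses floating point in this sketch.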

Visual Essence

Visual Family: silicon-awakening

Technology Domains

NPU Inference Core (21)

Related Patents

From the silicon-awakening visual family

Ternary Neural Processing Unit Architecture for Binary NPU Optimization

Existing chips run ternary — no new silicon required.

Zero-Skip Gating for Ternary Neural Networks

Normalisation layers dissolve into the ternary fabric — no floating-point tax.

Ternary Weight Pruning and Sparsification

Weights and activations co-designed — the whole pipeline speaks three values.

Mixed-Precision Ternary Inference Scheduling

The architecture searches itself — evolution finds the optimal ternary shape.