P038 · Filed

Ternary Batch Matrix Multiplication (GEMM)

One execution unit handles both ternary and conventional arithmetic, giving switchable precision in a single core.

AU Application: 2023900038

Filing Date: 20 July 2023

Index Number: P038

Figures: 12

Batch / Category: Core 2

What It Is

Technical Vision

The architectural essence — what makes this technology work

A single processing element with two faces — one crystalline and ternary, one smooth and conventional. A configuration signal flips between modes mid-inference, running ternary layers at extreme efficiency and conventional layers at full precision. Bilingual silicon.
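
The mode switch described above can be illustrated with a minimal software sketch. It is not the patented design: the names `Mode`, `pe_dot`, and `run_layers` are hypothetical, and the sketch assumes that ternary weights take values in {-1, 0, 1}, so ternary mode can replace each multiply with an add, a subtract, or a skip.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for the configuration signal."""
    TERNARY = "ternary"
    CONVENTIONAL = "conventional"


def pe_dot(weights, activations, mode):
    """Dot product computed by one processing element in the given mode."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have equal length")
    acc = 0
    if mode is Mode.TERNARY:
        for w, x in zip(weights, activations):
            if w == 1:
                acc += x          # +1: add the activation
            elif w == -1:
                acc -= x          # -1: subtract the activation
            elif w != 0:
                raise ValueError("ternary mode expects weights in {-1, 0, 1}")
            # 0: skip, no work needed
    else:
        for w, x in zip(weights, activations):
            acc += w * x          # conventional multiply-accumulate
    return acc


def run_layers(layers, x):
    """Apply (mode, weight_matrix) layers in order, switching mode per layer."""
    for mode, matrix in layers:
        x = [pe_dot(row, x, mode) for row in matrix]
    return x


layers = [
    (Mode.TERNARY, [[1, -1], [0, 1]]),   # ternary layer
    (Mode.CONVENTIONAL, [[2, 3]]),       # conventional layer
]
result = run_layers(layers, [4, 1])
```

Here the same `pe_dot` routine serves both layers, and only the `mode` argument changes between them, which mirrors the idea of one unit switching precision mid-inference.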

Abstract

Optimized batch matrix multiplication kernel for ternary data on GPU and NPU, achieving 90% of theoretical peak throughput on modern silicon.
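
For readers unfamiliar with the operation, the following is a plain reference sketch of a batched matrix multiplication with ternary weights. It shows only what the kernel computes, not how the filed kernel is optimized; the function name `ternary_batch_gemm` and the nested-list layout are assumptions made for illustration.

```python
def ternary_batch_gemm(a_batch, w_batch):
    """Compute C[b] = A[b] @ W[b] for each batch item b.

    A[b] is an M x K list of rows; W[b] is a K x N list of rows whose
    entries are assumed to be in {-1, 0, 1}. No multiplications are used.
    """
    if len(a_batch) != len(w_batch):
        raise ValueError("batch sizes differ")
    out = []
    for a, w in zip(a_batch, w_batch):
        k, n = len(w), len(w[0])
        c = []
        for row in a:
            if len(row) != k:
                raise ValueError("inner dimensions differ")
            acc = [0] * n
            for kk in range(k):
                x = row[kk]
                for j, t in enumerate(w[kk]):
                    if t == 1:
                        acc[j] += x      # weight +1: add
                    elif t == -1:
                        acc[j] -= x      # weight -1: subtract
                    elif t != 0:
                        raise ValueError("weights must be in {-1, 0, 1}")
            c.append(acc)
        out.append(c)
    return out


a_batch = [[[1, 2], [3, 4]], [[5, 6]]]
w_batch = [[[1, 0], [-1, 1]], [[0, -1], [1, 1]]]
c_batch = ternary_batch_gemm(a_batch, w_batch)
```

A production kernel would typically pack the weights and tile the loops for the target hardware; this sketch only fixes the expected results that such a kernel would have to reproduce.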

Visual Family: silicon-blueprint

Technology Domains

NPU Inference Core (21)

Related Patents

From the silicon-blueprint visual family

Ternary NPU Compiler Optimization Passes

The ternary chip itself — a complete microarchitecture specification.

Ternary Model Compression via Knowledge Distillation

Memory redesigned from the ground up for three-valued data.

Hardware-Aware Ternary Network Architecture Search

Ternary data streams through dataflow hardware — bandwidth-optimal inference.

Ternary Tensor Decomposition and Factorization

The compiler knows the hardware — scheduling ternary execution across heterogeneous cores.