P048 (Filed)

Edge Language Model Inference with Ternary Quantization

Large language models running on edge devices: sparse attention, compressed KV cache, local personality.

AU Application: 2023900048
Filing Date: 10 September 2023
Index Number: P048
Figures: 14
Batch / Category: Applications

What It IS

Technical Vision

The architectural essence — what makes this technology work

A large language model that runs entirely on a local device: ternary sparse attention gates 85% of computation, a compressed key-value cache fits in on-chip memory, and responses are personalised to the user. It is a personal language model that never phones home.
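The listing does not say how the attention gate is computed. As a rough illustration only, the sketch below assumes that "gating 85% of computation" means skipping 85% of key positions for each query, and it uses a simple top-k rule as a stand-in for whatever ternary gate the application describes. The function name `gated_attention_weights` and the `keep_fraction` parameter are hypothetical.

```python
import math
from typing import List


def gated_attention_weights(scores: List[float], keep_fraction: float = 0.15) -> List[float]:
    """Softmax over the highest-scoring keys only; gated-out keys get weight 0.

    Assumption: the gate is a plain top-k selection. The source does not
    specify the gating rule, so this is a stand-in, not the patented method.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    # Always keep at least one key so the softmax is well defined.
    keep = max(1, math.ceil(keep_fraction * len(scores)))
    # Indices of the `keep` largest scores; every other key is gated out.
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:keep]
    # Subtract the largest kept score for numerical stability.
    top = max(scores[i] for i in kept)
    exps = {i: math.exp(scores[i] - top) for i in kept}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]
```

With the default `keep_fraction` of 0.15, only the kept keys would need their value vectors read from the cache, which is where a saving in computation and memory traffic would come from under this assumption.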

Abstract

Deploys large language models to edge devices via ternary quantization, enabling on-device LLM inference with latency under 500 ms.
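The abstract names ternary quantization without giving the scheme. A minimal sketch, assuming a per-tensor mean-absolute-value scale (one common choice, not confirmed by the source), shows the basic idea: each weight is stored as -1, 0, or +1 plus one shared floating-point scale. The names `ternary_quantize` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def ternary_quantize(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Map each weight to -1, 0, or +1 and return the codes with a shared scale.

    Assumption: the scale is the mean absolute weight of the tensor.
    The source does not state which scaling rule the application uses.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    scale = sum(abs(w) for w in weights) / len(weights)
    codes = []
    for w in weights:
        # Round to the nearest integer, then clamp into the ternary set.
        q = round(w / (scale + eps))
        codes.append(max(-1, min(1, q)))
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Approximate the original weights from ternary codes and the scale."""
    return [c * scale for c in codes]


codes, scale = ternary_quantize([0.9, -0.05, -1.2, 0.4, 0.0])
approx = dequantize(codes, scale)
```

Because every code takes one of three values, a weight needs under two bits of storage, and multiplying by a code reduces to an add, a subtract, or a skip. Those properties are what make the approach attractive on memory-constrained edge hardware.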

Visual Essence

Visual Family: edge-bloom
