Curious Now

NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference

Computing · Artificial Intelligence

Key takeaway

Researchers have developed a way to cryptographically verify the model used in large language model (LLM) queries, preventing providers from substituting cheaper models or using cached responses. This could give users confidence that they are getting the expected AI inference.

Quick Explainer

NANOZK is a system that enables verifiable inference for large language models. It uses a novel approach that breaks down the overall model computation into independent layer-wise zero-knowledge proofs. This allows NANOZK to generate compact, constant-size proofs that can be produced in parallel, rather than a monolithic proof that would overwhelm current proving systems. To handle non-arithmetic operations, NANOZK employs lookup table approximations that preserve model accuracy. The system also includes a technique to prioritize the verification of the most influential layers, providing a flexible trade-off between proof cost and coverage of the model's sensitivity.

Deep Dive

Technical Deep Dive: NANOZK

Overview

NANOZK is a system that provides verifiable inference for large language models (LLMs). It uses a novel layerwise zero-knowledge proof framework to cryptographically confirm that an LLM service provider executed the claimed model computation, without revealing the model's proprietary weights. Key contributions include:

  • A layerwise proof framework that decomposes the monolithic proving task into independent layer proofs, enabling constant-size 6.9KB proofs that can be generated in parallel.
  • Lookup table approximations for non-arithmetic operations (softmax, GELU, LayerNorm) that introduce zero measurable accuracy loss.
  • A Fisher information-guided verification prioritization strategy that captures 65-86% of model sensitivity at 50% of the full proving cost.

Problem & Context

The rise of LLM-as-a-service has created a "trust asymmetry" where users pay high fees for frontier models but have no cryptographic way to verify that the claimed model was actually used. Service providers could substitute cheaper models or apply aggressive quantization without users' knowledge.

Zero-knowledge proofs (ZKPs) offer a solution, allowing the provider to demonstrate correct computation without revealing proprietary model weights. However, the scale of modern LLMs poses challenges, as monolithic ZK circuit representations can overwhelm current proving systems.

Methodology

The key insight behind NANOZK is that transformer inference has a natural compositional structure, where each layer's output depends only on the previous layer's output. This enables decomposing the monolithic proving task into independent layer proofs, connected by cryptographic commitments.

Specifically, NANOZK generates a proof $\pi_\ell$ for each layer $\ell$ demonstrating:

$h_\ell = f_\ell(h_{\ell-1}; W_\ell)$

where $h_{\ell-1}$ is the input activation, $h_\ell$ is the output, $W_\ell$ are the layer weights, and $f_\ell$ is the layer computation (attention + feed-forward network).

To ensure layers chain correctly, each proof includes cryptographic commitments $c_{\ell-1} = \mathcal{H}(h_{\ell-1})$ and $c_\ell = \mathcal{H}(h_\ell)$, where $\mathcal{H}$ is SHA-256. The verifier checks that adjacent proofs share consistent commitments.
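The chaining check can be sketched in a few lines of Python. This is a toy model, not NANOZK's implementation: the `LayerProof` structure and function names are hypothetical, the actual ~6.9KB zero-knowledge proof body is elided, and raw bytes stand in for activation tensors. Only the SHA-256 commitment chain, which the source does describe, is modeled.

```python
import hashlib
from dataclasses import dataclass

def commit(activation: bytes) -> str:
    """SHA-256 commitment to a layer's activation, as in the paper."""
    return hashlib.sha256(activation).hexdigest()

@dataclass
class LayerProof:
    layer: int
    c_in: str   # commitment to h_{l-1}
    c_out: str  # commitment to h_l
    # the ZK proof body pi_l (that h_l = f_l(h_{l-1}; W_l)) is elided

def prove_layer(layer: int, h_prev: bytes, h_next: bytes) -> LayerProof:
    # A real prover would also emit the per-layer proof here;
    # this sketch records only the commitment endpoints.
    return LayerProof(layer, commit(h_prev), commit(h_next))

def verify_chain(proofs: list[LayerProof]) -> bool:
    """Adjacent proofs must agree on the shared activation commitment."""
    return all(a.c_out == b.c_in for a, b in zip(proofs, proofs[1:]))

# Toy activations standing in for h_0 ... h_3.
acts = [bytes([i]) * 8 for i in range(4)]
proofs = [prove_layer(l, acts[l], acts[l + 1]) for l in range(3)]
assert verify_chain(proofs)

# Tampering with one layer's claimed output breaks the chain.
proofs[1].c_out = commit(b"forged")
assert not verify_chain(proofs)
```

Because each `LayerProof` depends only on its own input/output pair, the proofs can be generated in parallel and stitched together by this purely local consistency check.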

This layerwise approach provides several benefits:

  • Constant proof size (6.9KB per layer) regardless of layer width.
  • Parallel proving, reducing wall-clock time.
  • Selective verification, allowing users to verify only critical layers.

The framework is shown to satisfy "compositional soundness": the composite proof inherits soundness from the individual layer proofs, achieving an overall soundness error below $10^{-37}$ for a 32-layer model.
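The reported figure is consistent with a simple union bound. Assuming roughly 128-bit soundness per layer proof (an assumption on our part; the paper states only the composite error), the error across 32 layers is at most:

$\epsilon_{\text{total}} \le \sum_{\ell=1}^{32} \epsilon_\ell = 32 \cdot \epsilon_{\text{layer}} \approx 32 \cdot 2^{-128} \approx 9.4 \times 10^{-38} < 10^{-37}$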

To handle transformer operations that are not directly representable in arithmetic circuits (softmax, GELU, LayerNorm), NANOZK employs lookup table approximations with 16-bit precision. Surprisingly, these approximations introduce zero measurable perplexity degradation on standard benchmarks.
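A lookup-table approximation of this kind is easy to illustrate. The sketch below precomputes GELU over a 16-bit grid and answers queries by index lookup; the input range, clipping, and indexing scheme are our assumptions, not NANOZK's actual table construction (inside a circuit, the lookup would be enforced by a lookup argument rather than a Python list).

```python
import math

# Assumed table parameters: a clipped input range and a 16-bit grid.
LO, HI, BITS = -8.0, 8.0, 16
N = 1 << BITS  # 65,536 entries

def gelu(x: float) -> float:
    """Exact GELU via the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Precompute once; in a ZK circuit this table backs a lookup argument.
TABLE = [gelu(LO + (HI - LO) * i / (N - 1)) for i in range(N)]

def gelu_lut(x: float) -> float:
    x = min(max(x, LO), HI)                      # clip into the table range
    idx = round((x - LO) / (HI - LO) * (N - 1))  # nearest 16-bit grid point
    return TABLE[idx]

# Worst-case error on a dense test grid over [-5, 5].
err = max(abs(gelu_lut(x / 100) - gelu(x / 100)) for x in range(-500, 501))
print(f"max abs error: {err:.2e}")
```

With a grid step of $16/65535 \approx 2.4 \times 10^{-4}$, the nearest-entry error is on the order of $10^{-4}$, far below typical 16-bit activation noise, which is consistent with the paper's finding of no measurable perplexity degradation.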

Finally, NANOZK introduces a Fisher information-guided layer selection strategy. For resource-constrained scenarios, it identifies the most influential layers to verify, capturing 65-86% of model sensitivity at 50% of the full proving cost.
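The selection step can be sketched as a greedy knapsack over per-layer sensitivity scores. The scores, costs, and greedy rule below are illustrative assumptions; the paper's exact Fisher-information estimator and selection procedure may differ.

```python
def select_layers(fisher, cost_per_layer, budget):
    """Greedy: verify the highest-sensitivity layers first, within budget."""
    order = sorted(range(len(fisher)), key=lambda l: fisher[l], reverse=True)
    chosen, spent = [], 0.0
    for l in order:
        if spent + cost_per_layer <= budget:
            chosen.append(l)
            spent += cost_per_layer
    return sorted(chosen)

fisher = [0.9, 0.1, 0.7, 0.2, 0.8, 0.3, 0.6, 0.4]   # toy per-layer scores
chosen = select_layers(fisher, cost_per_layer=1.0, budget=4.0)  # 50% budget
coverage = sum(fisher[l] for l in chosen) / sum(fisher)
print(chosen, f"{coverage:.0%}")  # prints: [0, 2, 4, 6] 75%
```

Even in this toy setting, spending half the proving budget on the highest-scoring layers covers well over half the total sensitivity, mirroring the 65-86% coverage the paper reports at 50% cost.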

Data & Experimental Setup

NANOZK is evaluated on GPT-2 architecture variants with hidden dimensions $d \in \{64, 128, 256, 512, 768\}$, matching the 124M-parameter GPT-2-Small configuration. Experiments run on a single Intel Xeon CPU.

For accuracy evaluation, the authors test on GPT-2 (124M), GPT-2-Medium (355M), and TinyLLaMA-1.1B (1.1B) on the WikiText-2 benchmark. For the Fisher information ablation, they also evaluate the 32-layer Phi-2 (2.7B) model.

Results

Key findings:

  • Transformer block proofs have constant 6.9KB size, regardless of hidden dimension. Proving time remains flat at ~6.2 seconds per block, with 23ms verification.
  • Compared to EZKL, the most widely used ZKML toolkit, NANOZK achieves a 52× average speedup, reaching 228× for larger models where EZKL encounters memory pressure.
  • The lookup table approximations preserve model perplexity exactly across the evaluated GPT-2, GPT-2-Medium, and TinyLLaMA-1.1B models.
  • Fisher information-guided layer selection captures 65-86% of model sensitivity at 50% of the full proving cost, outperforming random selection by 7-12 percentage points.

Limitations & Uncertainties

  • Proving remains orders of magnitude slower than native inference (3.2 minutes for full GPT-2 verification vs. 3 seconds for native).
  • Fisher-guided selection provides probabilistic rather than cryptographic guarantees.
  • The current CPU-based implementation could potentially be accelerated on GPU, though porting the IPA-based proving to GPU requires careful handling of the sequential multi-scalar multiplication steps.

What Comes Next

Verifiable inference addresses a growing trust gap in AI deployment, as LLMs are integrated into critical systems. Zero-knowledge proofs can provide cryptographic assurance of correct computation while protecting model intellectual property. The authors hope this work contributes toward a future where AI systems are verifiable by default.