ZKML Reading Notes
A chronological survey of the zkML field, through 2024.
2017 SafetyNets: Verifiable Execution of Deep Neural Networks on an Untrusted Cloud
- A restricted class of neural nets that can be represented as arithmetic circuits over finite fields.
- Applies Thaler’s IP protocols for “regular” arithmetic circuits and matrix multiplication.
- Evaluates on a 2-layer CNN (MNIST) and a 4-layer fully-connected network (TIMIT)
2018 Model Cards for Model Reporting
- Thoughtful approach to ML model transparency
- Model cards are a template for documenting ML models
- Model cards contain benchmarks on evaluation datasets
2020 On Polynomial Approximations for Privacy-Preserving and Verifiable ReLU Networks
- Replaces the ReLU activation function with a degree-2 polynomial, f(x) = x^2 + x (sketched below)
- Evaluates on CNNs, small decrease in accuracy relative to ReLU
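A minimal NumPy sketch of the substitution (the names are mine, not the paper's): the polynomial uses only additions and multiplications, so unlike ReLU it maps directly onto finite-field arithmetic.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def poly_act(x):
    # The paper's degree-2 surrogate: f(x) = x^2 + x. No comparisons,
    # only add/multiply, hence expressible as an arithmetic circuit.
    return x * x + x

x = np.linspace(-2.0, 2.0, 9)
print(relu(x))
print(poly_act(x))
```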
2020 Zero Knowledge Proofs for Decision Tree Predictions and Accuracy
- Setup: Prover commits to a decision tree in linear time.
- Proves validity of a decision path in time proportional to the prediction path length (toy sketch below)
- Uses Aurora
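A toy illustration of the commit-then-prove-a-path idea, assuming a Merkle-style hash commitment; the paper's actual protocol hides the path inside a zero-knowledge argument (built on Aurora) rather than revealing it as done here.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def commit(node) -> bytes:
    # Hash the tree bottom-up: one pass, linear in tree size.
    if "label" in node:
        return h(b"leaf", str(node["label"]).encode())
    enc = str((node["feat"], node["thr"])).encode()
    return h(b"node", enc, commit(node["lo"]), commit(node["hi"]))

def predict_with_path(node, x):
    # Returns the prediction and the root-to-leaf decisions; checking
    # them against the commitment costs time proportional to path length.
    path = []
    while "label" not in node:
        go_hi = x[node["feat"]] >= node["thr"]
        path.append((node["feat"], node["thr"], go_hi))
        node = node["hi"] if go_hi else node["lo"]
    return node["label"], path

tree = {"feat": 0, "thr": 0.5,
        "lo": {"label": "A"},
        "hi": {"feat": 1, "thr": 0.2,
               "lo": {"label": "B"}, "hi": {"label": "C"}}}
root = commit(tree)                          # published once
print(predict_with_path(tree, [0.9, 0.1]))   # ('B', path of length 2)
```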
2021 Mystique: Efficient Conversions for Zero-Knowledge Proofs with Applications to Machine Learning
2021 A Greedy Algorithm for Quantizing Neural Networks
- Quantizes the weights of a pre-trained neural network
- Proposes the Greedy Path Following Quantization (GPFQ) algorithm
2021 VeriDL: Integrity Verification of Outsourced Deep Learning Services
2021 ZEN: An Optimizing Compiler for Verifiable, Zero-Knowledge Neural Network Inferences
- Compiles floating-point PyTorch models to R1CS constraints
- Quantization algorithm with R1CS-friendly optimizations
- SIMD-style optimization encodes multiple 8-bit integers in one large finite-field element (packing sketch below)
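A sketch of the packing trick, assuming 32-bit lanes inside the BN254 scalar field; ZEN's actual lane widths and carry handling differ.

```python
# BN254 scalar field modulus (the field ZEN-style circuits work over).
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def pack(vals, slot_bits=32):
    # Place each 8-bit value in its own wide slot; the slack bits absorb
    # carries so one big-field operation acts on all lanes at once.
    acc = 0
    for i, v in enumerate(vals):
        assert 0 <= v < 256
        acc |= v << (i * slot_bits)
    return acc % P

def unpack(x, n, slot_bits=32):
    return [(x >> (i * slot_bits)) & ((1 << slot_bits) - 1) for i in range(n)]

packed = pack([17, 42, 255])
print(unpack(packed, 3))   # [17, 42, 255]
```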
2021 zkCNN: Zero Knowledge Proofs for Convolutional Neural Network Predictions and Accuracy
- zkCNN repo
- Sumcheck protocol for FFT and 2-D convolutions (FFT-convolution identity checked below)
- Based on a refinement of GKR to arbitrary circuits
- Efficient gadgets for ReLU activation and max pooling
- Evaluated on a 15M-parameter network
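The convolution theorem zkCNN builds on, checked numerically: a 2-D (circular) convolution is a pointwise product of FFTs, so proving a convolution reduces to proving FFTs plus a Hadamard product, both sum-check-friendly.

```python
import numpy as np

x = np.random.rand(8, 8)                  # input feature map
k = np.random.rand(3, 3)                  # kernel

# FFT route: transform, multiply pointwise, transform back.
Xf = np.fft.fft2(x)
Kf = np.fft.fft2(k, s=x.shape)            # zero-pad kernel to input size
y_fft = np.real(np.fft.ifft2(Xf * Kf))

# Direct circular convolution, for comparison.
y_direct = np.zeros_like(x)
for i in range(8):
    for j in range(8):
        for a in range(3):
            for b in range(3):
                y_direct[i, j] += k[a, b] * x[(i - a) % 8, (j - b) % 8]

assert np.allclose(y_fft, y_direct)
```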
2022 pvCNN: Privacy-Preserving and Verifiable Convolutional Neural Network Testing
- Privacy-preserving CNN testing via FHE and collaborative inference
- zk-SNARK for privacy-preserving CNN testing
- Quadratic Matrix Program (QMP) arithmetic circuit for 2D convolution
2022 Scaling up Trustless DNN Inference with Zero-Knowledge Proofs
- zk-SNARK (Halo2) for convolutional NN
- 8-bit quantized deep neural networks
- Custom gates for linear layers
- Lookup arguments for non-linearities (toy lookup table below)
- Evaluated on ImageNet (avg. resolution: 469 × 387 pixels)
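Why lookups pair so well with 8-bit quantization, in toy form (not the paper's Halo2 code): a pointwise non-linearity over int8 activations is a function on only 256 inputs, so it can be tabulated once and enforced with a lookup argument instead of arithmetic constraints.

```python
# Tabulate quantized ReLU over the full int8 domain.
relu_table = {q: max(q, 0) for q in range(-128, 128)}

def relu_q(q: int) -> int:
    # In-circuit, the prover would show the pair (q, output) lies in
    # the published table; here we just read it.
    return relu_table[q]

print(relu_q(-7), relu_q(5))   # 0 5
```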
2022 ZK-IMG: Attested Images via Zero-Knowledge Proofs to Fight Disinformation
- Compiles high-level image transformations to zk-SNARKs.
- Attests to (multiple) image transformations while hiding the pre-transformed image
Proving nanoGPT in EZKL (Halo2-KZG)
- nanoGPT expressed in ONNX format
- Expresses many ONNX operations as a single operation (Einstein summation; einsum examples below)
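A NumPy illustration of why einsum is a useful common denominator: plain matmul and batched attention scores are both single einsum calls, so a compiler that optimizes one einsum gadget covers many ONNX operators.

```python
import numpy as np

A = np.random.rand(4, 8)
B = np.random.rand(8, 5)
assert np.allclose(np.einsum("ik,kj->ij", A, B), A @ B)   # matmul as einsum

Q = np.random.rand(2, 6, 8)   # (batch, seq, head_dim)
K = np.random.rand(2, 6, 8)
scores = np.einsum("bqd,bkd->bqk", Q, K)   # attention scores as einsum
print(scores.shape)           # (2, 6, 6)
```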
2023 The Cost of Intelligence: Proving Machine Learning Inference with Zero-Knowledge
- Modulus Labs
- Benchmarks several ZKP systems for NNs: Groth16, Gemini, Winterfell, Halo2, Plonky2, zkCNN
- Good survey of ZKML papers
2023 zkDL: Efficient Zero-Knowledge Proofs of Deep Learning Training
- zkReLU: a specialized proof for the ReLU activation and its backpropagation
2024 vCNN: Verifiable Convolutional Neural Network Based on zk-SNARKs
- Quadratic Polynomial Program (QPP) SNARK for convolutions
- Quadratic Arithmetic Program (QAP) SNARK for pooling and activation layers
- Commit-and-Prove SNARK for connecting adjacent layers
- Significant performance improvements on MNIST-scale networks and on VGG16
2024 BatchZK: A Fully Pipelined GPU-Accelerated System for Batch Generation of Zero-Knowledge Proofs
- Evaluated on a ZKML application (convolutional NN) at 9.5 proofs per second
- Pipelined modules for Sum-Check, Merkle tree, and linear-time encoder
2024 Sparsity-Aware Protocol for ZK-friendly ML Models: Shedding Lights on Practical ZKML
- SpaGKR sparsity-aware ZKML framework
- Efficient results with ternary networks
- Sparsity-Aware Sum-Check
- Evaluated on…
2024 An Efficient and Extensible Zero-knowledge Proof Framework for Neural Networks
- Improves proofs for non-linear layers by converting complex non-linear relations into range and exponent relations
- Efficient proofs for range and exponent lookups
- Evaluated on a CNN and a Transformer (GPT-2)
2024 zkLLM: Zero Knowledge Proofs for Large Language Models
- Parallelized lookup argument for non-arithmetic tensor operations
- zkAttn ZKP for the attention mechanism
- Evaluated on a 13B-parameter LLM
- CUDA implementation
2024 Scaling Intelligence: Verifiable Decision Forest Inference with Remainder
- Modulus Labs
- Verifiable decision forest inference
- Remainder: GKR prover with Ligero polynomial commitment scheme
2024 ZKML: An Optimizing System for ML Inference in Zero-Knowledge Proofs
- Compiles from TensorFlow to optimized Halo2 circuits
- Gadgets for ML operations (dot products, softmax, pointwise non-linearities)
- Supports CNNs, LSTMs, Transformers, MLPs, diffusion models
2024 ZENO: A Type-based Optimization Framework for Zero Knowledge Neural Network Inference
- ZENO: ZEro knowledge Neural network Optimizer
- ZENO language maintains high-level semantics to enable compiler optimizations
- Knit encoding combines multiple low-bit (e.g., 8-bit) scalars into a high-bit (e.g., 254-bit) scalar
- Reduces VGG16 proof from 6 minutes to 48 seconds
2024 Zero-Knowledge Proofs of Training for Deep Neural Networks
- GKR-style (sumcheck-based) proof system for gradient descent
- Recursively compose proofs across multiple iterations to attain succinctness
Quantization
Quantization is the conversion of model weights to a more compact numeric format. Models are commonly trained with 16-bit or 32-bit floating-point numbers. Converting these weights to small (e.g., 4-bit) integers or binary values can significantly reduce model size and inference cost. Representing weights as integers in a finite field also makes them far more practical to handle inside zero-knowledge proofs.
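A minimal sketch of symmetric per-tensor int8 quantization (a zk backend would additionally embed the integers into a prime field):

```python
import numpy as np

def quantize(w, bits=8):
    # Symmetric per-tensor scheme: w ≈ scale * q with q an integer
    # in [-2^(bits-1), 2^(bits-1) - 1].
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize(w)
print(np.abs(w - scale * q).max())   # small reconstruction error
```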
2016 Ternary Neural Networks for Resource-Efficient AI Applications
- Ternary weights {-1, 0, 1}
- Almost as small as binary networks, but with better performance
2018 A Survey on Methods and Theories of Quantized Neural Networks
2018 Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
2018 LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
2018 Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
- Quantizes weights and activations as 8-bit integers
- Inference with integer-only arithmetic (requantization sketch below)
- Quantization-aware training framework
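A sketch of the integer-only matmul-plus-requantize step in the spirit of this paper; the real scheme replaces the floating-point multiplier below with a fixed-point multiply and shift.

```python
import numpy as np

def int_matmul_requant(qa, qb, scale_a, scale_b, scale_out):
    acc = qa.astype(np.int32) @ qb.astype(np.int32)   # int32 accumulation
    m = scale_a * scale_b / scale_out                 # fixed-point in practice
    return np.clip(np.round(acc * m), -128, 127).astype(np.int8)

qa = np.random.randint(-128, 128, (4, 8), dtype=np.int8)
qb = np.random.randint(-128, 128, (8, 3), dtype=np.int8)
print(int_matmul_requant(qa, qb, 0.02, 0.01, 0.05))
```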
2022 Post-training Quantization for Neural Networks with Provable Guarantees
- Modifies GPFQ to promote sparsity of weights
- Quantization error analysis
2023 BitNet: Scaling 1-bit Transformers for Large Language Models
- 1-bit Transformer architecture for LLMs
- Competitive with 8-bit quantization and FP16
2024 The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- Every parameter (weight) of the LLM is ternary {-1, 0, 1} (ternarization sketched below)
- Performance comparable to the full-precision (FP16) model
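A sketch of the absmean ternarization the paper describes; training-time details such as straight-through gradient estimation are omitted.

```python
import numpy as np

def ternarize(w):
    # Absmean quantization: scale by the mean absolute weight, then
    # round and clip every entry to {-1, 0, 1}.
    gamma = np.abs(w).mean() + 1e-8
    return np.clip(np.round(w / gamma), -1, 1).astype(np.int8), gamma

w = np.random.randn(4, 4)
q, gamma = ternarize(w)
print(np.unique(q))   # values drawn from {-1, 0, 1}
```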
2024 Scaling Trust in the Age of Artificial Intelligence
GKR and Sum-Check
The Unreasonable Power of the Sum-Check Protocol
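A self-contained toy run of the sum-check protocol over a small prime field, assuming the multilinear polynomial f is given by its evaluation table on {0,1}^n (the verifier's final oracle query is simulated by the fully folded table; prover and verifier roles are merged for brevity):

```python
import random

P = 2**61 - 1   # toy prime field

def sumcheck(evals, n):
    # Claim: the sum of the multilinear f over {0,1}^n equals `claim`.
    claim = sum(evals) % P
    table = [e % P for e in evals]   # f's values; first variable = high bit
    for _ in range(n):
        half = len(table) // 2
        g0 = sum(table[:half]) % P   # g(0); g is linear in the free variable
        g1 = sum(table[half:]) % P   # g(1)
        assert (g0 + g1) % P == claim        # verifier's round check
        r = random.randrange(P)              # verifier's challenge
        # Fold: fix the first variable to r via multilinear interpolation.
        table = [(a + r * (b - a)) % P
                 for a, b in zip(table[:half], table[half:])]
        claim = (g0 + r * (g1 - g0)) % P     # g(r) becomes the next claim
    assert claim == table[0]   # one oracle query to f at the random point
    return claim

print(sumcheck([3, 1, 4, 1, 5, 9, 2, 6], 3))
```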
2012 Practical verified computation with streaming interactive proofs
- Prover time improved to quasi-linear, $O(|C| \log |C|)$
2013 Time-Optimal Interactive Proofs for Circuit Evaluation
- For “regular” circuits, gives a prover that runs in time linear in the circuit size
- IP for matrix multiplication
2017 vSQL: Verifying Arbitrary SQL Queries over Dynamic Outsourced Databases
- GKR + polynomial commitment scheme = ZK argument
2019 Libra: Succinct Zero-Knowledge Proofs with Optimal Prover Computation
- Prover time improved to strictly linear, $O(|C|)$, for layered arithmetic circuits without assuming any structure
2021 Doubly Efficient Interactive Proofs for General Arithmetic Circuits with Linear Prover Time
- “Generalizes the GKR protocol to work on arbitrary arithmetic circuits efficiently for the first time. For a general circuit of size $|C|$ and depth d, the prover time is $O(|C|)$, the same as the original GKR protocol on a layered circuit with the same size.”
Hyperplonk, Brakedown, Orion
- Combine Sum-Check with multilinear polynomial commitments to build SNARKs
2023 Modular Sumcheck Proofs with Applications to Machine Learning and Image Processing
- “A new information-theoretic primitive called Verifiable Evaluation Scheme on Fingerprinted Data (VE) that captures the properties of diverse sumcheck-based interactive proofs, including the well-established GKR protocol”
2024 Sparsity-Aware Protocol for ZK-friendly ML Models: Shedding Lights on Practical ZKML
- SpaGKR: GKR protocol exploiting sparse parameters
- SpaSum: Sparsity-aware Sum-Check protocol