Google TechTalks
January 27, 2026

Optimistic Verifiable Training by Controlling Hardware Nondeterminism

Quick Read

This research details a novel method for verifiable machine learning model training by controlling hardware non-determinism, ensuring identical model outputs across different GPUs for enhanced security and accountability.
Third-party ML training services introduce significant trust and security risks.
Hardware non-determinism (especially GPUs) prevents identical training runs, hindering verification.
A new method uses 'rounding logs' and Merkle trees to enable verifiable, identical training across different GPUs.

Summary

The increasing reliance on third-party services for large-scale machine learning model training introduces significant trust issues, as clients cannot easily verify if models were trained correctly. Traditional verification methods, like cryptographic proofs, are too computationally expensive for large models, while statistical tests are vulnerable to attacks and require excessive storage. This presentation introduces an 'optimistic verifiable training' approach that addresses hardware non-determinism, particularly from GPU differences and floating-point non-associativity. The method involves the model trainer storing and sharing 'rounding decisions' during intermediate computations. An auditor can then copy these decisions to achieve an identical training run, enabling efficient verification via a Merkle tree of model checkpoint hashes. This approach eliminates GPU-induced non-determinism, offers significant speed improvements over proof-based systems, and drastically reduces storage costs compared to statistical tests, making verifiable training practical for modern ML workloads.
As AI models grow in complexity and training shifts to specialized third-party services, users face a critical trust gap regarding model integrity. Without verifiable training, malicious actors can introduce backdoors or data poisoning, leading to compromised models and significant security risks. This work provides a practical path to ensuring accountability and trust in outsourced ML training, which is essential for the widespread adoption and reliability of large-scale AI applications.

Takeaways

  • The shift to large-scale ML models necessitates reliance on third-party training services, creating a trust problem.
  • Existing verification methods (cryptographic proofs, statistical tests) are impractical or insecure for large-scale ML.
  • Hardware non-determinism, particularly from GPUs and floating-point arithmetic, causes different training runs to produce non-identical models even with identical inputs.
  • The proposed 'optimistic verifiable training' method controls GPU non-determinism by recording and sharing intermediate rounding decisions.
  • This allows an auditor to replicate a trainer's exact model, enabling verification via Merkle tree hash comparisons.
  • The method is significantly faster and more storage-efficient than prior cryptographic or statistical approaches.

Insights

1. Trust Deficit in Third-Party ML Training

The growing computational demands of large-scale ML models force users to rely on third-party services (e.g., OpenAI API, Amazon SageMaker) for training. This abstraction means users have limited visibility into the training process, creating a trust problem where malicious trainers could introduce data poisoning or backdoors without detection.

The speaker highlights the shift from users training models on their own GPUs to delegating to services, showing simple API calls that abstract away the training complexity.

2. Limitations of Prior Verification Approaches

Cryptographic proof-based systems are too slow and memory-intensive for large-scale ML, often requiring fixed-point arithmetic that harms model performance. Statistical tests, while less resource-intensive, require frequent saving of full model weights (massive storage) and are still vulnerable to sophisticated attacks like backdoor injection with minimal data alteration.

Examples show cryptographic proofs taking 72 seconds and 350MB per epoch for logistic regression, and 22 minutes per batch for a small VGG11. Statistical tests require frequent full model weight saves and are easily spoofed.

3. Hardware Non-Determinism as a Core Challenge

Even with identical data, hyperparameters, and deterministic algorithms, training the same model on different GPUs (or even different runs on the same GPU without specific flags) results in fundamentally different models. This is due to variations in GPU architecture (computation units, parallel operation partitioning, memory caches) leading to different operation orderings, which, combined with floating-point non-associativity, cause errors to accumulate.

Experiments show identical training setups on different NVIDIA GPUs (A40, Titan XP, and RTX 2080 Ti) yield models with similar aggregate metrics (accuracy, perplexity) but drastically different individual predictions and token likelihoods.

4. Optimistic Verifiable Training via Rounding Logs

The proposed solution leverages a 'verification game' where a client, suspicious of a trainer's model, asks an auditor to re-train. To ensure identical results despite hardware non-determinism, the trainer stores 'rounding decisions' (up, down, or none) made during intermediate computations (e.g., convolution, linear layers). The auditor copies these decisions when their own computation results fall within a 'logging region' near a rounding boundary, effectively forcing deterministic outcomes. This enables the use of Merkle trees of model checkpoint hashes for efficient, step-by-step discrepancy detection.

The speaker details the verification game, the role of Merkle trees, and the mechanism of storing and copying rounding decisions for intermediate computations.
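The logging mechanism can be sketched in a few lines. This is a toy model, not the paper's implementation: it rounds values onto a coarse decimal grid instead of from float32 to float16, and the grid spacing `EPS`, logging half-width `DELTA`, and function names are all illustrative.

```python
import math

EPS = 1e-3      # spacing of the low-precision grid (toy stand-in for fp16)
DELTA = 1e-5    # half-width of the "logging region" around each boundary

def _grid(x):
    lo = math.floor(x / EPS) * EPS   # nearest grid point below x
    return lo, lo + EPS / 2          # (lower neighbor, rounding boundary)

def trainer_round(x, log):
    """Round x to the nearest grid point. If x lies within DELTA of the
    boundary between two grid points, record the chosen direction, since
    another GPU's value might land on the other side."""
    lo, mid = _grid(x)
    direction = "up" if x >= mid else "down"
    if abs(x - mid) < DELTA:
        log.append(direction)
    return lo + EPS if direction == "up" else lo

def auditor_round(x, decisions):
    """Re-round on different hardware: inside the logging region, copy the
    trainer's recorded decision instead of trusting the local value."""
    lo, mid = _grid(x)
    if abs(x - mid) < DELTA:
        direction = next(decisions)  # copy the trainer's choice
    else:
        direction = "up" if x >= mid else "down"
    return lo + EPS if direction == "up" else lo

# Two slightly different intermediate values (simulating two GPUs) that
# straddle the boundary at 0.0125: without the log they would round apart.
log = []
t = trainer_round(0.0125004, log)        # rounds up, logs "up"
a = auditor_round(0.0124998, iter(log))  # copies "up" -> identical result
assert t == a and log == ["up"]
```

In the real system the grid is the float16 lattice and rounding happens after each matrix multiply or convolution accumulated in higher precision; the property the sketch preserves is that decisions are logged only in the narrow ambiguous region, which is why the log stays small.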

5. Efficiency and Security of the Approach

Empirical results demonstrate that this method successfully eliminates GPU-induced non-determinism, allowing for identical training runs. It achieves significantly faster training speeds compared to cryptographic proof-based systems and offers over a 100x reduction in storage cost compared to statistical test methods by only storing rounding decisions (not full model weights). The security relies on the fact that an adversarial trainer can only manipulate rounding logs when the auditor's values are also close to a rounding boundary, which occurs very infrequently.

The method eliminated all GPU non-determinism across three NVIDIA GPUs for ResNet-50 and GPT-2 training. Speed comparisons show significant gains, and storage is reduced by logging only rounding decisions, which are needed for a tiny fraction of intermediate computations.

Bottom Line

Only a minuscule percentage of intermediate computations (e.g., 2e-6% for ResNet-50) actually require rounding decision correction to ensure identical training runs across different GPUs.

So What?

This implies that the storage cost for rounding logs could be dramatically reduced if there were a more precise way to predict when these critical divergences occur, rather than logging all potential instances.

Impact

Develop predictive models or analytical bounds for identifying floating-point divergence points in ML computations, enabling highly optimized, sparse logging of rounding decisions and further reducing storage overhead for verifiable training.

The problem of non-determinism extends beyond model training to LLM inference, where a single divergent token generation can cascade into entirely different outputs.

So What?

This makes verifiable inference a significant challenge for companies building trusted, decentralized LLM inference services, as ensuring consistent output across distributed hardware is critical.

Impact

Adapt and apply the 'rounding log' approach to LLM inference pipelines to ensure deterministic token generation, enabling trusted and verifiable decentralized inference for large language models.

Opportunities

Premium Verifiable ML Training Service

Offer a specialized ML model training service that, for a premium, provides clients with a 'rounding log' alongside their trained model. This log enables clients to audit the training process independently using a third-party auditor, ensuring the model's integrity and adherence to specifications.

Source: The speaker suggests that a service could offer the ability to store rounding decisions for a premium, allowing for verification. (37:30)

Independent ML Model Auditing Service

Establish a third-party service specializing in auditing machine learning models. This service would receive a client's data, training specifications, the trained model, and the trainer's rounding log, then re-run the training to verify its integrity and pinpoint any discrepancies using the Merkle tree-based verification game.

Source: The entire talk revolves around the need for and mechanism of a third-party auditor to verify model training. (08:41)

Deterministic LLM Inference Platform

Develop a platform specifically designed for trusted, decentralized LLM inference that guarantees deterministic token generation. This would address the challenge of non-determinism in sequential token generation, crucial for applications requiring high consistency and verifiability in AI outputs.

Source: The speaker identifies LLM inference as a future work area where non-determinism causes discrepancies to 'blow up' and is a challenge for trusted decentralized services. (55:45)

Key Concepts

Merkle Tree for Verification

A data structure where each leaf node is a hash of a data block (e.g., model checkpoint), and each non-leaf node is a hash of its children. This allows for efficient verification of data integrity and pinpointing discrepancies through a binary search, without needing to store the full data at each step.
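A minimal sketch of this idea in Python. The hash choice, helper names, and the carry-up rule for odd levels are illustrative, and the descent through the tree is approximated here by a direct binary search over the checkpoint hashes rather than the interactive game itself:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Pairwise-hash each level up to a single root (odd node carried up)."""
    level = list(leaves)
    while len(level) > 1:
        pairs = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            pairs.append(level[-1])
        level = pairs
    return level[0]

def first_divergence(leaves_a, leaves_b):
    """Binary-search the first differing checkpoint hash. Valid because once
    two training runs diverge, every later checkpoint differs too."""
    lo, hi = 0, len(leaves_a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if leaves_a[mid] == leaves_b[mid]:
            lo = mid + 1
        else:
            hi = mid
    return lo

# Toy example: trainer and auditor agree up to checkpoint 4, then diverge.
trainer = [h(f"weights-{i}".encode()) for i in range(8)]
auditor = trainer[:5] + [h(f"honest-{i}".encode()) for i in range(5, 8)]
assert merkle_root(trainer) != merkle_root(auditor)
assert first_divergence(trainer, auditor) == 5
```

Comparing roots costs one hash; when roots differ, the disputed training step is found in O(log n) comparisons, and only that single step needs to be re-executed and checked.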

Floating-Point Non-Associativity

Floating-point arithmetic operations (like addition) are not associative due to precision limitations. The order in which numbers are summed can lead to slightly different results, which accumulate over many operations (e.g., during ML training), causing divergence in model weights across different hardware or execution paths.
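A two-line demonstration, assuming IEEE-754 double precision (ordinary Python floats): the same three numbers summed in two orders give different answers, because the small term is absorbed when it is added to the large one first.

```python
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c    # a + b cancels exactly, then c survives: 1.0
right = a + (b + c)   # c is absorbed into b (the ulp at 1e16 is 2.0): 0.0
assert left != right

# A parallel reduction on a GPU is effectively a different parenthesization
# of the same sum, so two GPUs (or two kernel configurations) can disagree
# in exactly this way, and the tiny discrepancies compound over training.
```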

Lessons

  • ML service providers should investigate implementing mechanisms to control hardware non-determinism, such as logging and sharing rounding decisions, to offer verifiable training options to clients.
  • Clients delegating large-scale ML training should inquire about the verifiability of the training process and consider engaging independent auditors or using services that provide verifiable logs.
  • Researchers should focus on optimizing the storage cost of verifiable training by developing methods to predict when critical floating-point divergences occur, rather than logging all intermediate computations.

Quotes

"

"If it was possible for two different parties to be able to train a model and get identical results then we wouldn't need a statistical test and we wouldn't be vulnerable to attacks."

MEA
"

"Everything else is identical like the initialization, the data being trained on, the data ordering, all that has changed is the GPU type that model training was carried on over and still that led to pretty fundamentally different models."

MEA
