Threat Models for Memorization: Privacy, Copyright, and Everything In-Between
Takeaways
- Heuristic privacy defenses are often evaluated incorrectly, focusing on average-case privacy rather than the worst-case leakage from vulnerable data.
- Privacy should always be treated as a worst-case metric, even when dealing with natural datasets, to accurately assess risks.
- Large Language Models (LLMs) can inadvertently reproduce significant portions of their training data, including copyrighted text, during benign, everyday usage.
- LLMs systematically reproduce more existing online text than humans performing similar tasks, raising concerns about copyright infringement.
- Differentially Private Stochastic Gradient Descent (DP-SGD) can be a surprisingly effective heuristic defense against memorization, even without strong theoretical guarantees.
Insights
1. Flawed Evaluation of Heuristic Privacy Defenses
Typical evaluations of heuristic privacy defenses are misleading because they implicitly report an average-case notion of privacy, which fails to reflect leakage from the most vulnerable data: outliers, mislabeled samples, and out-of-distribution points, which are precisely the data one would most want to protect. These evaluations also tend to use weak attacks and to compare against unfairly private (and practically useless) differentially private (DP) baselines.
The speaker demonstrates that when evaluated correctly, the actual privacy leakage for heuristic defenses was 7 to over 50 times larger than reported, and none provided meaningful privacy in practice. This is shown by shifting from averaging over the entire dataset to auditing 'canaries' that mimic the most vulnerable data.
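As an illustration of why canary auditing matters, here is a minimal, self-contained sketch of a loss-threshold membership-inference attack. All loss distributions below are synthetic stand-ins chosen for illustration, not the speaker's actual measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-example losses from a hypothetical trained model:
# members (training data) have slightly lower loss than non-members,
# while injected "canaries" (outlier-like points) are strongly memorized.
loss_member_typical = rng.normal(1.0, 0.3, 1000)
loss_nonmember = rng.normal(1.2, 0.3, 1000)
loss_member_canary = rng.normal(0.2, 0.1, 50)

def attack_tpr_at_fpr(member_losses, nonmember_losses, target_fpr=0.01):
    """Loss-threshold membership inference: predict 'member' when the loss
    falls below a threshold calibrated to `target_fpr` on non-members."""
    threshold = np.quantile(nonmember_losses, target_fpr)
    return float(np.mean(member_losses < threshold))

avg_case = attack_tpr_at_fpr(loss_member_typical, loss_nonmember)
worst_case = attack_tpr_at_fpr(loss_member_canary, loss_nonmember)
print(f"TPR @ 1% FPR, averaged over typical members: {avg_case:.3f}")
print(f"TPR @ 1% FPR, on canaries:                   {worst_case:.3f}")
```

Averaging over typical members suggests little leakage, while the canaries are identified almost perfectly, illustrating how an average-case number can hide near-total leakage for the most vulnerable points.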
2. Privacy as a Worst-Case Metric for Natural Data
Even when relaxing the threat model from pathological worst-case datasets to more natural data, privacy should still be treated as a worst-case metric. It is crucial to report the privacy leakage of the most vulnerable data points within a natural dataset, as leakage can be highly non-uniform and sensitive data (e.g., rare medical cases) is often the most vulnerable.
The speaker illustrates this by comparing average-case privacy (low leakage) with worst-case privacy for individual samples (high leakage, especially for outliers), showing a 'very non-uniform' leakage pattern. The example of medical settings with rare diseases highlights the importance of protecting these vulnerable individuals.
3. DP-SGD as a Strong Heuristic Defense
Differentially Private Stochastic Gradient Descent (DP-SGD), even when configured with very high epsilon parameters that provide 'meaningless' theoretical privacy guarantees (e.g., epsilon on the order of 10^8), can still act as a very strong heuristic defense. It empirically outperforms other heuristic approaches at protecting privacy, particularly when considering the worst-case leakage of vulnerable data.
When DP-SGD baselines are trained for high utility (comparable to the heuristics) and evaluated on worst-case privacy leakage, they perform surprisingly well; depending on the constraints, DP-SGD can even be 'the best thing you can do,' a result that surprised the speaker.
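For context, the mechanism behind DP-SGD is per-example gradient clipping followed by Gaussian noise. The NumPy sketch below shows one update step under that scheme; it is a toy illustration, not the speaker's setup, and real training would use a DP library such as Opacus:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_mult=1.0, rng=None):
    """One DP-SGD update: clip each example's gradient to L2 norm
    `clip_norm`, average, then add Gaussian noise scaled to the clip norm.
    Clipping bounds any single example's influence on the update, which is
    why DP-SGD can still help heuristically even when `noise_mult` is so
    small that the formal epsilon guarantee is meaningless."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=avg.shape)
    return params - lr * (avg + noise)

params = np.zeros(3)
grads = [np.array([3.0, 4.0, 0.0]), np.array([0.1, 0.0, 0.0])]
print(dp_sgd_step(params, grads, noise_mult=0.0))  # clipping only, no noise
```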
4. Non-Adversarial Reproduction in Large Language Models (LLMs)
LLMs can inadvertently reproduce verbatim training data, including copyrighted material, even when used by benign users for everyday tasks (e.g., creative writing, factual explanations). This 'non-adversarial reproduction' challenges the argument that copyright violations only occur under adversarial usage.
Examples show LLMs generating text with many long snippets (50+ characters) found verbatim online, while human-written text for the same prompts contains far fewer. For factual tasks like explanations, up to 25% of LLM output can consist of such snippets. This phenomenon is long-tailed, with rare instances of very long reproductions (e.g., entire Wikipedia articles or code snippets).
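The overlap metric described above (the fraction of output text covered by 50+-character substrings appearing verbatim elsewhere) can be sketched with a naive substring scan. The `corpus` string below is a hypothetical stand-in for the web-scale index such a measurement would actually query:

```python
def reproduced_fraction(text, corpus, k=50):
    """Fraction of characters in `text` covered by length-k (or longer)
    substrings that appear verbatim in `corpus`. Naive O(n * |corpus|) scan;
    a real pipeline would use a suffix array or an inverted index."""
    n = len(text)
    covered = [False] * n
    for i in range(n - k + 1):
        if text[i:i + k] in corpus:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / max(n, 1)

# Toy demo: 60 of the 80 output characters are copied from the corpus.
corpus = "The quick brown fox jumps over the lazy dog near the river bank at dawn."
text = "A" * 10 + corpus[:60] + "B" * 10
print(reproduced_fraction(text, corpus))  # 0.75
```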
5. LLMs Reproduce More Than Humans
Compared to human writers, LLMs systematically reproduce more existing online text. While human-generated text for creative, persuasive, and factual tasks often has zero median overlap with existing online snippets, LLMs consistently show non-trivial median and mean reproduction rates.
A comparison using prompts from Reddit and IMDb shows that the median fraction of reproduced text for all human baselines is zero, whereas for LLMs, the median and mean are 'much much larger or at least very non-trivially non zero.'
Bottom Line
The 'long-tailed phenomenon' of LLM reproduction means that while most outputs are original, rare instances can involve extremely long, verbatim copies of training data (e.g., entire articles or code blocks).
This makes detection and mitigation challenging, as average reproduction rates might seem low, but the risk of significant copyright infringement in specific cases remains high, potentially leading to substantial legal exposure.
Develop advanced, real-time detection systems for long-form verbatim reproduction in LLM outputs, potentially integrated into LLM APIs or content creation platforms, to flag or prevent high-risk outputs before distribution.
Opportunities
AI Content Plagiarism & Copyright Compliance Tool
A service that scans AI-generated content (especially from LLMs) for verbatim reproduction of existing online text, identifying potential copyright infringements or privacy leaks. This tool would go beyond standard plagiarism checkers by specifically targeting the 'long-tailed' reproduction phenomenon observed in LLMs, offering detailed reports on snippet length, source, and risk assessment.
Privacy-Preserving ML Evaluation Platform
A platform offering rigorous, worst-case privacy evaluations for machine learning models, particularly for heuristic defenses. It would implement the 'proper' evaluation protocol (e.g., using canaries to audit vulnerable data, comparing apples-to-apples with DP baselines) to provide accurate privacy leakage metrics, helping developers build more trustworthy and compliant AI systems.
Key Concepts
Worst-Case vs. Average-Case Privacy
Privacy should be evaluated against the worst-case scenario, focusing on the most vulnerable data points, rather than an average across all data. Average-case metrics can mask significant privacy leakage for specific, sensitive data.
Adversarial vs. Benign User Threat Models
Assuming users are benign does not eliminate harm. Even non-adversarial interactions with AI models can lead to unintended consequences, such as copyright infringement through verbatim reproduction of training data.
Lessons
- When evaluating heuristic privacy defenses, always assess the worst-case privacy leakage of the most vulnerable data points in your dataset, rather than relying on average-case metrics.
- Consider using differentially private mechanisms like DP-SGD as a strong heuristic defense, even if theoretical guarantees are weak, due to its empirical effectiveness in protecting privacy.
- Implement robust detection mechanisms for verbatim reproduction in LLM outputs, especially for long snippets, to mitigate inadvertent copyright violations and data leakage, even from benign user interactions.
- Educate users and developers about the inherent risk of non-adversarial reproduction in LLMs, emphasizing that terms of service alone are insufficient to prevent copyright issues.
Protocol for Rigorous Heuristic Privacy Defense Evaluation
**Identify Vulnerable Data:** Do not average privacy leakage over the entire dataset. Instead, identify and audit the most vulnerable data points (e.g., outliers, rare examples, mislabeled samples) using 'canaries' or similar techniques.
**Employ Strong Attacks:** Use state-of-the-art membership inference attacks that are adapted to the specific privacy defense being evaluated, rather than weak or generic attacks.
**Compare Fairly to DP Baselines:** When comparing to differentially private (DP) baselines, ensure an 'apples-to-apples' comparison by matching utility (e.g., test accuracy) rather than comparing high-utility heuristics to overly private (and practically useless) DP models.
**Report Worst-Case Metrics:** Explicitly report the worst-case privacy leakage observed across the most vulnerable data, treating privacy as a worst-case metric even in natural data settings.
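A common way to report the result of such an audit is to convert the strongest attack's operating point into an empirical epsilon lower bound via the standard differential-privacy hypothesis-testing relation. The sketch below omits finite-sample confidence intervals for simplicity:

```python
import math

def empirical_epsilon(tpr, fpr):
    """Lower bound on the effective epsilon implied by a membership-inference
    attack with true-positive rate `tpr` and false-positive rate `fpr`.
    Uses eps >= max(log(TPR/FPR), log((1-FPR)/(1-TPR))); assumes tpr > 0."""
    if fpr == 0.0 or tpr == 1.0:
        return float("inf")  # perfect separation: no finite epsilon fits
    return max(math.log(tpr / fpr), math.log((1 - fpr) / (1 - tpr)))

# An attack that identifies 50% of members at a 1% false-positive rate
# already implies an effective epsilon of about log(50), roughly 3.9.
print(empirical_epsilon(0.5, 0.01))
```

Comparing this empirical epsilon against a DP baseline trained to matching utility gives the apples-to-apples comparison the protocol calls for.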
Quotes
"You really need to report the worst-case privacy leakage of the most vulnerable samples in your dataset, and not just an average over all samples, precisely because privacy leakage can be very non-uniform, and because the most vulnerable data, the data with potentially the most privacy leakage, could, at least intuitively, often also be the data that you want to protect the most."
"Just because you assume there is no attacker doesn't mean that there are no copyright violations."
Related Episodes

Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training
"Research reveals how dynamic LLM training, including PII additions and removals, creates 'assisted memorization' and 'privacy ripple effects,' making sensitive data extractable even when initially unmemorized."