Threat Models for Memorization: Privacy, Copyright, and Everything In-Between
Takeaways
- Heuristic privacy defenses are often evaluated incorrectly, focusing on average-case privacy rather than the worst-case leakage from vulnerable data.
- Privacy should always be treated as a worst-case metric, even when dealing with natural datasets, to accurately assess risks.
- Large Language Models (LLMs) can inadvertently reproduce significant portions of their training data, including copyrighted text, during benign, everyday usage.
- LLMs systematically reproduce more existing online text than humans performing similar tasks, raising concerns about copyright infringement.
- Differentially Private Stochastic Gradient Descent (DP-SGD) can be a surprisingly effective heuristic defense against memorization, even without strong theoretical guarantees.
Insights
1. Flawed Evaluation of Heuristic Privacy Defenses
Typical evaluations of heuristic privacy defenses are misleading because they implicitly report an average-case notion of privacy, which fails to reflect leakage from the most vulnerable data: outliers, mislabeled samples, and out-of-distribution points, which are precisely the data one would most want to protect. These evaluations also tend to use weak attacks and to compare against unfairly private (and practically useless) differentially private (DP) baselines.
The speaker demonstrates that when evaluated correctly, the actual privacy leakage for heuristic defenses was 7 to over 50 times larger than reported, and none provided meaningful privacy in practice. This is shown by shifting from averaging over the entire dataset to auditing 'canaries' that mimic the most vulnerable data.
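As an illustration of why canary auditing matters, here is a minimal, self-contained sketch of a loss-threshold membership-inference attack. All loss distributions below are synthetic stand-ins chosen for illustration, not the speaker's actual measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-example losses from a hypothetical trained model:
# members (training data) have slightly lower loss than non-members,
# while injected "canaries" (outlier-like points) are strongly memorized.
loss_member_typical = rng.normal(1.0, 0.3, 1000)
loss_nonmember = rng.normal(1.2, 0.3, 1000)
loss_member_canary = rng.normal(0.2, 0.1, 50)

def attack_tpr_at_fpr(member_losses, nonmember_losses, target_fpr=0.01):
    """Loss-threshold membership inference: predict 'member' when the loss
    falls below a threshold calibrated to `target_fpr` on non-members."""
    threshold = np.quantile(nonmember_losses, target_fpr)
    return float(np.mean(member_losses < threshold))

avg_case = attack_tpr_at_fpr(loss_member_typical, loss_nonmember)
worst_case = attack_tpr_at_fpr(loss_member_canary, loss_nonmember)
print(f"TPR @ 1% FPR, averaged over typical members: {avg_case:.3f}")
print(f"TPR @ 1% FPR, on canaries:                   {worst_case:.3f}")
```

Averaging over typical members suggests little leakage, while the canaries are identified almost perfectly, illustrating how an average-case number can hide near-total leakage for the most vulnerable points.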
2. Privacy as a Worst-Case Metric for Natural Data
Even when relaxing the threat model from pathological worst-case datasets to more natural data, privacy should still be treated as a worst-case metric. It is crucial to report the privacy leakage of the most vulnerable data points within a natural dataset, as leakage can be highly non-uniform and sensitive data (e.g., rare medical cases) is often the most vulnerable.
The speaker illustrates this by comparing average-case privacy (low leakage) with worst-case privacy for individual samples (high leakage, especially for outliers), showing a 'very non-uniform' leakage pattern. The example of medical settings with rare diseases highlights the importance of protecting these vulnerable individuals.
3. DP-SGD as a Strong Heuristic Defense
Differentially Private Stochastic Gradient Descent (DP-SGD), even when configured with very high epsilon parameters that provide 'meaningless' theoretical privacy guarantees (e.g., epsilon on the order of 10^8), can still act as a very strong heuristic defense. It empirically outperforms other heuristic approaches at protecting privacy, particularly when considering the worst-case leakage of vulnerable data.
When DP-SGD baselines are trained for high utility (comparable to the heuristics) and evaluated on worst-case privacy leakage, they perform surprisingly well; depending on the constraints, DP-SGD can even be 'the best thing you can do,' a result that surprised the speaker.
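For context, the mechanism behind DP-SGD is per-example gradient clipping followed by Gaussian noise. The NumPy sketch below shows one update step under that scheme; it is a toy illustration, not the speaker's setup, and real training would use a DP library such as Opacus:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_mult=1.0, rng=None):
    """One DP-SGD update: clip each example's gradient to L2 norm
    `clip_norm`, average, then add Gaussian noise scaled to the clip norm.
    Clipping bounds any single example's influence on the update, which is
    why DP-SGD can still help heuristically even when `noise_mult` is so
    small that the formal epsilon guarantee is meaningless."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=avg.shape)
    return params - lr * (avg + noise)

params = np.zeros(3)
grads = [np.array([3.0, 4.0, 0.0]), np.array([0.1, 0.0, 0.0])]
print(dp_sgd_step(params, grads, noise_mult=0.0))  # clipping only, no noise
```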
4. Non-Adversarial Reproduction in Large Language Models (LLMs)
LLMs can inadvertently reproduce verbatim training data, including copyrighted material, even when used by benign users for everyday tasks (e.g., creative writing, factual explanations). This 'non-adversarial reproduction' challenges the argument that copyright violations only occur under adversarial usage.
Examples show LLMs generating text with many long snippets (50+ characters) found verbatim online, while human-written text for the same prompts contains far fewer. For factual tasks like explanations, up to 25% of LLM output can consist of such snippets. This phenomenon is long-tailed, with rare instances of very long reproductions (e.g., entire Wikipedia articles or code snippets).
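The overlap metric described above (the fraction of output text covered by 50+-character substrings appearing verbatim elsewhere) can be sketched with a naive substring scan. The `corpus` string below is a hypothetical stand-in for the web-scale index such a measurement would actually query:

```python
def reproduced_fraction(text, corpus, k=50):
    """Fraction of characters in `text` covered by length-k (or longer)
    substrings that appear verbatim in `corpus`. Naive O(n * |corpus|) scan;
    a real pipeline would use a suffix array or an inverted index."""
    n = len(text)
    covered = [False] * n
    for i in range(n - k + 1):
        if text[i:i + k] in corpus:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / max(n, 1)

# Toy demo: 60 of the 80 output characters are copied from the corpus.
corpus = "The quick brown fox jumps over the lazy dog near the river bank at dawn."
text = "A" * 10 + corpus[:60] + "B" * 10
print(reproduced_fraction(text, corpus))  # 0.75
```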
5. LLMs Reproduce More Than Humans
Compared to human writers, LLMs systematically reproduce more existing online text. While human-generated text for creative, persuasive, and factual tasks often has zero median overlap with existing online snippets, LLMs consistently show non-trivial median and mean reproduction rates.
A comparison using prompts from Reddit and IMDb shows that the median fraction of reproduced text for all human baselines is zero, whereas for LLMs, the median and mean are 'much much larger or at least very non-trivially non zero.'
Bottom Line
The 'long-tailed phenomenon' of LLM reproduction means that while most outputs are original, rare instances can involve extremely long, verbatim copies of training data (e.g., entire articles or code blocks).
This makes detection and mitigation challenging, as average reproduction rates might seem low, but the risk of significant copyright infringement in specific cases remains high, potentially leading to substantial legal exposure.
Develop advanced, real-time detection systems for long-form verbatim reproduction in LLM outputs, potentially integrated into LLM APIs or content creation platforms, to flag or prevent high-risk outputs before distribution.
Opportunities
AI Content Plagiarism & Copyright Compliance Tool
A service that scans AI-generated content (especially from LLMs) for verbatim reproduction of existing online text, identifying potential copyright infringements or privacy leaks. This tool would go beyond standard plagiarism checkers by specifically targeting the 'long-tailed' reproduction phenomenon observed in LLMs, offering detailed reports on snippet length, source, and risk assessment.
Privacy-Preserving ML Evaluation Platform
A platform offering rigorous, worst-case privacy evaluations for machine learning models, particularly for heuristic defenses. It would implement the 'proper' evaluation protocol (e.g., using canaries to audit vulnerable data, comparing apples-to-apples with DP baselines) to provide accurate privacy leakage metrics, helping developers build more trustworthy and compliant AI systems.
Key Concepts
Worst-Case vs. Average-Case Privacy
Privacy should be evaluated against the worst-case scenario, focusing on the most vulnerable data points, rather than an average across all data. Average-case metrics can mask significant privacy leakage for specific, sensitive data.
Adversarial vs. Benign User Threat Models
Assuming users are benign does not eliminate harm. Even non-adversarial interactions with AI models can lead to unintended consequences, such as copyright infringement through verbatim reproduction of training data.
Lessons
- When evaluating heuristic privacy defenses, always assess the worst-case privacy leakage of the most vulnerable data points in your dataset, rather than relying on average-case metrics.
- Consider using differentially private mechanisms like DP-SGD as a strong heuristic defense, even if theoretical guarantees are weak, due to its empirical effectiveness in protecting privacy.
- Implement robust detection mechanisms for verbatim reproduction in LLM outputs, especially for long snippets, to mitigate inadvertent copyright violations and data leakage, even from benign user interactions.
- Educate users and developers about the inherent risk of non-adversarial reproduction in LLMs, emphasizing that terms of service alone are insufficient to prevent copyright issues.
Protocol for Rigorous Heuristic Privacy Defense Evaluation
**Identify Vulnerable Data:** Do not average privacy leakage over the entire dataset. Instead, identify and audit the most vulnerable data points (e.g., outliers, rare examples, mislabeled samples) using 'canaries' or similar techniques.
**Employ Strong Attacks:** Use state-of-the-art membership inference attacks that are adapted to the specific privacy defense being evaluated, rather than weak or generic attacks.
**Compare Fairly to DP Baselines:** When comparing to differentially private (DP) baselines, ensure an 'apples-to-apples' comparison by matching utility (e.g., test accuracy) rather than comparing high-utility heuristics to overly private (and practically useless) DP models.
**Report Worst-Case Metrics:** Explicitly report the worst-case privacy leakage observed across the most vulnerable data, treating privacy as a worst-case metric even in natural data settings.
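A common way to report the result of such an audit is to convert the strongest attack's operating point into an empirical epsilon lower bound via the standard differential-privacy hypothesis-testing relation. The sketch below omits finite-sample confidence intervals for simplicity:

```python
import math

def empirical_epsilon(tpr, fpr):
    """Lower bound on the effective epsilon implied by a membership-inference
    attack with true-positive rate `tpr` and false-positive rate `fpr`.
    Uses eps >= max(log(TPR/FPR), log((1-FPR)/(1-TPR))); assumes tpr > 0."""
    if fpr == 0.0 or tpr == 1.0:
        return float("inf")  # perfect separation: no finite epsilon fits
    return max(math.log(tpr / fpr), math.log((1 - fpr) / (1 - tpr)))

# An attack that identifies 50% of members at a 1% false-positive rate
# already implies an effective epsilon of about log(50), roughly 3.9.
print(empirical_epsilon(0.5, 0.01))
```

Comparing this empirical epsilon against a DP baseline trained to matching utility gives the apples-to-apples comparison the protocol calls for.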
Quotes
"You really need to report the worst-case privacy leakage of the most vulnerable samples in your dataset, and not just an average over all samples, precisely because privacy leakage can be very non-uniform, and because the most vulnerable data, the data with potentially the most privacy leakage, could, at least intuitively, often also be the data that you want to protect the most."
"Just because you assume there is no attacker doesn't mean that there are no copyright violations."
Related Episodes

Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training
"Research reveals how dynamic LLM training, including PII additions and removals, creates 'assisted memorization' and 'privacy ripple effects,' making sensitive data extractable even when initially unmemorized."