Google TechTalks
January 27, 2026

Evaluating Data Misuse in LLMs: Introducing Adversarial Compression Rate as a Metric of Memorization

Quick Read

This presentation introduces Adversarial Compression Rate (ACR) as a robust metric to quantify LLM memorization, addressing copyright concerns by focusing on the shortest prompt needed to elicit exact verbatim output.
ACR measures memorization by the shortest prompt needed to extract exact text.
Larger LLMs memorize more; famous quotes are highly compressible by ACR.
Simple 'unlearning' prompts are ineffective against ACR, revealing persistent memorization.

Summary

The Google TechTalks presentation introduces Adversarial Compression Rate (ACR) as a novel metric to evaluate data memorization in Large Language Models (LLMs), particularly in the context of copyright infringement concerns such as the New York Times lawsuit against OpenAI. ACR defines memorization as the ratio of an output's length to the length of the shortest adversarial prompt that elicits its exact verbatim reproduction. The researchers demonstrate ACR's effectiveness in identifying memorized content, showing that larger models memorize more and that famous quotes are highly compressible, while random strings and post-training data are not. A key finding is that simple 'in-context unlearning' prompts do not prevent ACR from revealing memorized data, suggesting an illusion of unlearning. The talk also explores data-dependent thresholds for ACR and its theoretical connection to Kolmogorov complexity as a direction for future work.
The Adversarial Compression Rate (ACR) offers a precise, context-aware metric for LLM memorization, crucial for navigating complex copyright lawsuits and regulatory challenges. It moves beyond simple output matching to assess how efficiently an LLM can reproduce copyrighted material from minimal prompts, highlighting the inadequacy of superficial 'unlearning' mechanisms. This is vital for LLM developers to build compliant models and for legal frameworks to accurately assess data misuse.

Takeaways

  • The New York Times lawsuit against OpenAI underscores the urgent need for a precise definition of LLM memorization, especially regarding verbatim content reproduction.
  • Traditional memorization tests often fall on a spectrum from 'any prompt elicits exact match' to 'beginning of sample elicits rest,' neither fully adequate for copyright.
  • Adversarial Compression Rate (ACR) proposes a middle-ground definition: the ratio of an output's length to the length of the shortest prompt that elicits its exact verbatim match.
  • ACR validation shows larger models memorize more, famous quotes are highly memorized, while random strings or post-training news articles are not.
  • In-context unlearning (e.g., system prompts to avoid certain outputs) creates an 'illusion of unlearning' but does not prevent ACR from extracting memorized data with short prompts.
  • A fixed ACR threshold of one can lead to false positives for inherently compressible data (e.g., repetitive song lyrics); data-dependent thresholds (e.g., using `gzip` compression ratio) are more accurate.
  • Future work aims to extend ACR using information-theoretic analogies, comparing LLM-assisted compression to universal compressors like Kolmogorov complexity, to refine the definition of 'true' memorization.

Insights

1. Rethinking LLM Memorization for Copyright

The New York Times lawsuit against OpenAI highlights the necessity for a precise definition of LLM memorization, moving beyond mere output matching to consider the context and brevity of the prompt that elicits verbatim content. Simple output overlap is insufficient to prove copyright violation; the nature of the prompt is critical.

The New York Times lawsuit against OpenAI, where GPT-4 allegedly recited articles verbatim, underscores the need for a more nuanced understanding of 'memorization' in the context of copyright law.

2. Adversarial Compression Rate (ACR) Definition

ACR quantifies memorization as the ratio of a target string's length to the length of the shortest adversarial prompt that can elicit its exact verbatim reproduction from an LLM. A ratio greater than one indicates that the model has 'memorized' the content, as it can reproduce a long string from a disproportionately short prompt.

The formal definition of ACR as 'length of Y (target string) / length of minimal prompt (X*)' where X* is the shortest prompt eliciting Y exactly.
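The ratio itself is trivial to compute once a minimal eliciting prompt has been found; the hard part, which the talk leaves to an adversarial search, is finding X*. Here is a minimal sketch of the metric, assuming whitespace tokenization stands in for the model's own tokenizer, and pairing the 'iron inert' prompt with a famous quote purely for illustration (the talk does not specify which quote it elicited):

```python
def token_len(text: str) -> int:
    # Whitespace tokenization stands in for the model's tokenizer here.
    return len(text.split())

def acr(target: str, minimal_prompt: str) -> float:
    # ACR = |Y| / |X*|: length of the target string over the length of
    # the shortest adversarial prompt that elicits it verbatim.
    return token_len(target) / token_len(minimal_prompt)

quote = "The only thing we have to fear is fear itself"
prompt = "iron inert"  # two-token adversarial prompt; pairing is hypothetical
print(acr(quote, prompt))  # 10 tokens / 2 tokens = 5.0, well above 1
```

With a ratio of one as the (fixed) threshold, any target that can be reproduced from a prompt shorter than itself is flagged as memorized.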

3. ACR Validation and Model Behavior

Validation experiments confirm that ACR aligns with expected memorization patterns: larger LLMs memorize more training data, famous quotes are frequently memorized (around 50%), while randomly generated strings or content released after model training (e.g., AP news) are not memorized by the ACR metric. This demonstrates ACR's ability to distinguish between genuinely memorized content and novel or uncompressible data.

Sanity checks show that larger models memorize more (right-hand-side figure). Experiments with famous quotes (50% memorized), random strings (0% memorized), and AP news (0% memorized).

4. In-Context Unlearning is an Illusion

System prompts designed to make an LLM 'abstain' from outputting memorized content (e.g., famous quotes) do not prevent ACR from finding very short adversarial prompts that still elicit exact regurgitation. This indicates that the underlying memorization in the model weights persists, and such 'unlearning' is merely a superficial behavioral modification, not a true removal of learned data.

An example where a system prompt instructing the model to 'abstain from giving famous quotes' still allowed a two-token adversarial prompt ('iron inert') to elicit a famous quote exactly.

5. Data-Dependent Thresholds for ACR

For highly compressible data (e.g., repetitive song lyrics like Daft Punk's 'Around the World'), a fixed ACR threshold of one can produce false positives, since such data is inherently easy to compress. Using a data-dependent threshold, such as the compression ratio achieved by a universal compressor like `gzip`, provides a more accurate assessment of true memorization by factoring in the inherent compressibility of the content itself.

The Daft Punk 'Around the World' lyric example, and the discussion of using universal compression ratios (e.g., `gzip`) as a data-dependent threshold to avoid false positives.
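One way to implement such a data-dependent threshold is a minimal sketch along the lines the talk suggests: flag a string as memorized only if the model's compression of it (the ACR) beats what a universal compressor already achieves on the same string. The `memorized` helper and the example strings below are illustrative, not from the talk.

```python
import gzip

def gzip_ratio(text: str) -> float:
    # Inherent compressibility of the data itself:
    # raw byte length divided by gzip-compressed length.
    raw = text.encode("utf-8")
    return len(raw) / len(gzip.compress(raw))

def memorized(acr_value: float, target: str) -> bool:
    # Data-dependent threshold: the model must compress the target
    # better than a universal compressor, not just better than 1.
    return acr_value > gzip_ratio(target)

lyric = "Around the world, around the world. " * 40   # highly repetitive
article = "A federal judge ruled on the motion to dismiss on Tuesday."
print(gzip_ratio(lyric) > gzip_ratio(article))  # repetitive text compresses far better
```

Under this rule, an LLM reproducing the repetitive lyric from a short prompt is not automatically flagged, because `gzip` can also compress it heavily; the same ACR value on a non-repetitive news sentence would still count as memorization.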

Bottom Line

The 'illusion of unlearning' created by system prompts for LLMs is a significant vulnerability for copyright and data privacy. While a model might appear to comply with instructions not to output certain data, the underlying memorization remains accessible via adversarial prompts.

So What?

This implies that current methods of 'unlearning' or content moderation based on system prompts are insufficient to address deep-seated memorization issues. Regulatory bodies and content owners should be aware that models can still harbor copyrighted material even if they appear to 'abstain' under normal user interaction.

Impact

Develop more robust unlearning mechanisms that truly alter model weights to remove memorized data, rather than relying on prompt-based filtering. This also creates a need for tools to audit LLMs for persistent memorization beyond superficial outputs.

The inherent compressibility of data, independent of an LLM, significantly impacts the interpretation of memorization metrics like ACR. Simple, repetitive patterns (e.g., 'Around the World' lyrics) can be 'compressed' by an LLM with a short prompt, but this might not signify true memorization if the data is already highly compressible by universal algorithms.

So What?

Relying solely on an ACR threshold of one can lead to false positives, misclassifying inherently simple or repetitive data as memorized. This complicates legal and ethical discussions around data misuse, as not all 'compression' by an LLM is evidence of problematic memorization.

Impact

Integrate universal compression algorithms (e.g., `gzip`) into memorization metrics to establish a data-dependent baseline. This allows for a more nuanced evaluation, distinguishing between an LLM's ability to exploit inherent data patterns versus its memorization of specific training examples.

Key Concepts

Adversarial Compression Rate (ACR)

A metric that quantifies LLM memorization by comparing the length of a target output string to the length of the shortest possible adversarial prompt required to elicit that exact string. A higher ratio indicates greater memorization.

Illusion of Unlearning

The phenomenon where superficial interventions, such as system prompts telling an LLM to abstain from certain outputs, create the appearance of 'unlearning' or non-memorization, while the underlying data remains extractable via adversarial prompts, indicating persistent memorization within the model's weights.

Lessons

  • Adopt Adversarial Compression Rate (ACR) as a primary metric for evaluating LLM memorization, particularly in contexts involving copyright or data privacy, due to its robustness against superficial unlearning attempts.
  • When applying ACR, consider implementing data-dependent thresholds (e.g., using `gzip` compression ratios) to accurately assess memorization, especially for content that is inherently highly compressible or repetitive, to avoid false positives.
  • Developers should not rely on simple system prompts or 'in-context unlearning' as sufficient means to address memorization concerns; true unlearning requires deeper modifications to model weights to prevent adversarial extraction.

Notable Moments

The New York Times lawsuit against OpenAI is used as a foundational example to illustrate the real-world implications and the need for a precise definition of LLM memorization.

A discussion between the presenters and an audience member (Katherine) about the distinction between 'memorization' (inherent in the model's weights) and 'extraction' (the ability to retrieve it).

Quotes


"If the prompt is short, maybe that's one thing we're observing from this slide, and the output matches exactly, then we might conclude that there really is a problem."

Avi

"Your definition of memorization shouldn't be sensitive to whether someone adds a system prompt that says, 'Oh, never say this thing.'"

Avi

"I would say that memorization is the stuff in the model like you're saying, but I would maybe use the word like extraction for what we're able to see here, which is separate from the memorization in the model."

Katherine
