How to Protect Contract Confidentiality When Using AI: The Moderation Layer Approach

Every time you paste contract text into an AI tool, you hand confidential client data to a third party.
In this article, we present an alternative called the 'Moderation Layer', explain how it works, and describe what to look for when evaluating legal AI vendors.

$5.08M - average breach cost for professional services firms (IBM, 2024)
79% - of lawyers reported using AI in their practice in 2024
10% - of law firms had formal AI usage policies in place

ABA Formal Opinion 512 (July 2024)
The ABA's first formal ethics guidance on generative AI is explicit: under Model Rule 1.6, lawyers must make "reasonable efforts to prevent the inadvertent or unauthorized disclosure of, or unauthorized access to, information relating to the representation of the client." The opinion warns that self-learning AI tools create specific risks that confidential input may be disclosed to others, and requires lawyers to obtain informed client consent before inputting confidential data into such tools. Boilerplate provisions in engagement letters are not sufficient. State bars in California, Florida, New York, New Jersey, and Pennsylvania have issued parallel guidance. In the UK, the SRA holds solicitors personally liable for confidentiality failures even when using third-party technology. The European CCBE designates confidentiality as a "fundamental and primary right and duty" of the lawyer, with no time limitation.

The bottom line: using AI for contract work is not optional going forward. But doing it without a privacy architecture is indefensible both ethically and commercially.

What a Moderation Layer Actually Does

A moderation layer is a processing step that sits between the user and the language model. Its entire purpose is to ensure that the LLM never sees raw confidential data while still delivering useful, contextualized analysis back to the user.

The process has three stages: detect, replace, and reverse.

First, the moderation layer scans the contract text and identifies every sensitive entity: party names, individual names, financial figures, dates, addresses, email addresses, phone numbers, and any custom-flagged terms. Second, it replaces each detected entity with a consistent pseudonymized placeholder. Third, after the LLM returns its analysis on the sanitized text, the moderation layer reverses the substitution using a securely stored mapping table, reinserting all original values into the output.

1. Contract text in Word
2. Local moderation layer detects and anonymizes
3. Only sanitized text sent to LLM
4. LLM output de-anonymized via mapping table
5. User sees the full, contextualized result

The LLM processes only anonymized placeholders. It never sees actual party names, financial figures, dates, or identifying details.

The critical word here is pseudonymization, not redaction. This distinction makes or breaks the approach. Here is why.

If you replace "Acme Corp" with [REDACTED], you destroy the relational context the LLM needs to reason about the contract. The model cannot tell which party has which obligation. If three different entities are all replaced with [REDACTED], the analysis becomes incoherent.

Pseudonymization replaces "Acme Corp" with PARTY_A consistently throughout the document. Every reference maps to the same placeholder. The LLM can now determine that PARTY_A has indemnification obligations to PARTY_B, that the liability cap is AMOUNT_1, and that the agreement terminates on DATE_1. The structural and semantic relationships are fully intact. After the LLM returns its analysis, the mapping table swaps the placeholders back. The user sees a complete, fully contextualized result. The AI saw none of the real data.

Let's see how this plays out in a real contract clause.

Raw clause sent to LLM (no moderation layer)
Section 8.1 Indemnification. Meridian Technologies Inc. ("Indemnifying Party") shall indemnify, defend, and hold harmless Pinnacle Partners LLC and its officers, including Sarah Chen (VP Legal), from and against all claims arising from the Indemnifying Party's breach of this Agreement, up to a maximum aggregate liability of $4,500,000. Notice of any claim must be delivered to legal@meridiantech.com within thirty (30) calendar days of discovery.
Anonymized clause sent to LLM (with moderation layer)
Section 8.1 Indemnification. PARTY_A ("Indemnifying Party") shall indemnify, defend, and hold harmless PARTY_B and its officers, including PERSON_1 (VP Legal), from and against all claims arising from the Indemnifying Party's breach of this Agreement, up to a maximum aggregate liability of AMOUNT_1. Notice of any claim must be delivered to EMAIL_1 within PERIOD_1 of discovery.

The LLM can analyze the anonymized version with the same accuracy as the raw version. It can identify the indemnification structure, flag that the liability cap is one-directional, note the notice period requirement, and compare the clause against standard market terms. It just does all of this without knowing who the parties are, how much money is at stake, or who to contact.

How It Works: Detection, Pseudonymization, and De-anonymization

A production-grade moderation layer is not a single technique. It combines three detection methods, each covering a different category of sensitive data. Here is how each works, with contract-specific examples.

Layer 1: Named Entity Recognition (NER)

NER is a machine learning technique trained to identify named entities within unstructured text: person names, organization names, geographic locations, dates, and similar categories. In a contract context, a trained NER model reading the sentence "Pursuant to Section 4.2, Rajesh Mehta of Meridian Technologies shall deliver all source materials to Sarah Chen at Pinnacle Partners by March 15, 2026" will flag five entities:

NER detection output
Rajesh Mehta → PERSON (replaced with PERSON_2)
Meridian Technologies → ORGANIZATION (replaced with PARTY_A)
Sarah Chen → PERSON (replaced with PERSON_1)
Pinnacle Partners → ORGANIZATION (replaced with PARTY_B)
March 15, 2026 → DATE (replaced with DATE_3)

NER handles the hardest category of sensitive data: names and entities embedded in natural language where no predictable format exists. Open-source NER libraries like spaCy and tools like Microsoft Presidio provide solid baselines. Production-grade legal AI systems fine-tune their NER models on contract-specific language to improve accuracy on entities like law firm names, subsidiary structures, and jurisdiction references that generic models often miss.
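The placeholder-assignment logic downstream of NER can be sketched independently of any particular model. In the sketch below the detector is stubbed with a hard-coded list of (text, label) pairs, the shape a NER model produces (e.g., spaCy's `[(ent.text, ent.label_) for ent in doc.ents]`); the point is the consistency rule: the same entity always receives the same placeholder.

```python
from collections import defaultdict

# Placeholder prefixes per NER label; names follow the article's convention.
PREFIX = {"PERSON": "PERSON", "ORG": "PARTY", "DATE": "DATE"}

def assign_placeholders(entities):
    """Map each unique entity to one consistent placeholder.

    `entities` is a list of (text, label) pairs, as yielded by a NER model.
    """
    counters = defaultdict(int)
    mapping = {}  # entity text -> placeholder
    for text, label in entities:
        if text in mapping:
            continue  # consistency rule: same entity, same placeholder
        counters[label] += 1
        if label == "ORG":
            # Parties get letter suffixes (PARTY_A, PARTY_B), as in the article.
            suffix = chr(ord("A") + counters[label] - 1)
        else:
            suffix = str(counters[label])
        mapping[text] = f"{PREFIX[label]}_{suffix}"
    return mapping

entities = [
    ("Rajesh Mehta", "PERSON"),
    ("Meridian Technologies", "ORG"),
    ("Sarah Chen", "PERSON"),
    ("Pinnacle Partners", "ORG"),
    ("Meridian Technologies", "ORG"),  # repeat: must reuse PARTY_A
]
mapping = assign_placeholders(entities)
```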

Layer 2: Regular Expressions (Regex)

Regex handles the structured, format-predictable data that NER models sometimes overlook. Contracts are full of these: monetary values with currency symbols, email addresses, phone numbers, Social Security numbers, EIN/TIN numbers, bank account numbers, and IP addresses.

Regex detection patterns in contracts
$4,500,000.00 → matches USD currency pattern → AMOUNT_1
legal@meridiantech.com → matches email pattern → EMAIL_1
+1 (212) 555-0147 → matches phone pattern → PHONE_1
87-1234567 → matches EIN pattern → TAX_ID_1
192.168.1.100 → matches IPv4 pattern → IP_ADDR_1

Regex is deterministic and fast. If a pattern is defined correctly, it catches every match without exception. The limitation is that it only works on data with predictable formats. A company name like "Meridian Technologies" has no universal format, which is why NER handles that category.
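A regex pass can be organized as an ordered list of (prefix, pattern) pairs. The patterns below are deliberately simplified illustrations; production systems need far more exhaustive variants (international currencies, phone formats, multi-label domains, and so on).

```python
import re

# Illustrative, simplified patterns; real systems use more exhaustive ones.
PATTERNS = [
    ("AMOUNT", re.compile(r"\$[\d,]+(?:\.\d{2})?")),   # USD amounts
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.\w+")),    # simple email shape
    ("TAX_ID", re.compile(r"\b\d{2}-\d{7}\b")),        # EIN format
]

def regex_pass(text):
    """Replace every structured-pattern match with a numbered placeholder."""
    mapping = {}
    for prefix, pattern in PATTERNS:
        seen = {}  # value -> placeholder, so repeats reuse the same one
        def sub(match):
            value = match.group(0)
            if value not in seen:
                seen[value] = f"{prefix}_{len(seen) + 1}"
                mapping[seen[value]] = value
            return seen[value]
        text = pattern.sub(sub, text)
    return text, mapping

clause = ("Liability cap: $4,500,000.00. Notice to legal@meridiantech.com. "
          "EIN: 87-1234567.")
sanitized, mapping = regex_pass(clause)
```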

Layer 3: Custom Dictionaries and Rules

This is where legal-specific anonymization diverges from generic PII redaction. Off-the-shelf NER and regex will not flag terms that are sensitive in context but not inherently identifiable: internal project codenames, deal identifiers, specific product names under NDA, proprietary clause language, or department-specific terminology.

A configurable dictionary layer lets InfoSec and legal teams define additional terms that must be anonymized. For example, a pharmaceutical company might flag the codename of an unreleased drug. A technology company might flag a proprietary algorithm name. A law firm might flag specific matter numbers that could identify a client engagement.

Custom dictionary detection examples
Project Nightingale → custom rule (internal codename) → PROJECT_1
MRD-4820 compound → custom rule (unreleased drug) → PRODUCT_1
Matter #2024-CLO-0892 → custom rule (matter ID) → MATTER_ID_1
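Because custom terms are exact strings defined by administrators, this layer reduces to a lookup table. Here is one sketch, under the assumptions that matching is case-sensitive and that longer terms are replaced first so overlapping terms resolve correctly.

```python
# Admin-defined terms; these entries are illustrative, from the article.
CUSTOM_TERMS = {
    "Project Nightingale": "PROJECT_1",
    "MRD-4820": "PRODUCT_1",
    "Matter #2024-CLO-0892": "MATTER_ID_1",
}

def dictionary_pass(text):
    # Longest terms first, so a shorter term never clobbers part of a longer one.
    for term in sorted(CUSTOM_TERMS, key=len, reverse=True):
        text = text.replace(term, CUSTOM_TERMS[term])
    return text

clause = "Deliverables under Project Nightingale include the MRD-4820 dossier."
```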

Why all three layers are non-negotiable

No single technique catches everything. NER handles names and contextual entities but can miss unusual formatting. Regex catches structured patterns with precision but has no understanding of context. Custom dictionaries fill the gaps unique to each organization. A vendor relying on only one or two of these techniques will have blind spots. The layered approach is what separates a production system from a proof-of-concept.

The Mapping Table: How De-anonymization Works

After detection, the moderation layer builds a mapping table that pairs each original entity with its placeholder. This table is the key to reversibility.

Mapping table (stored securely, never sent to LLM)
PARTY_A ↔ Meridian Technologies Inc.
PARTY_B ↔ Pinnacle Partners LLC
PERSON_1 ↔ Sarah Chen
PERSON_2 ↔ Rajesh Mehta
AMOUNT_1 ↔ $4,500,000.00
DATE_3 ↔ March 15, 2026
EMAIL_1 ↔ legal@meridiantech.com

Consistency in the mapping is critical. If "Meridian Technologies" appears 47 times across the contract, it must map to PARTY_A every time. Inconsistent mapping would confuse the LLM and degrade analysis quality. After the LLM returns its output (which references PARTY_A, AMOUNT_1, etc.), the de-anonymization step consults the mapping table and swaps every placeholder back to its original value. The user sees a complete result. The LLM never saw any of the real data.

The mapping table itself must be stored securely within the user's environment or encrypted on the vendor's servers, and it must never be sent alongside the anonymized text to the LLM provider. If it were, the entire exercise would be pointless.
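One implementation subtlety worth noting: a naive replace loop can corrupt the output when one placeholder is a prefix of another (AMOUNT_1 inside AMOUNT_12). A single alternation regex, with placeholders sorted longest-first, avoids this. The sketch below is illustrative, not any vendor's code.

```python
import re

def deanonymize(llm_output, mapping):
    """Swap every placeholder back to its original value in one pass.

    Sorting longest-first means the alternation tries AMOUNT_12 before
    AMOUNT_1, so no partial overwrite can occur.
    """
    pattern = re.compile(
        "|".join(re.escape(p) for p in sorted(mapping, key=len, reverse=True))
    )
    return pattern.sub(lambda m: mapping[m.group(0)], llm_output)

table = {
    "PARTY_A": "Meridian Technologies Inc.",
    "AMOUNT_1": "$500,000",
    "AMOUNT_12": "$4,500,000",  # AMOUNT_1 is a prefix of this placeholder
}
analysis = "PARTY_A's cap of AMOUNT_12 exceeds the AMOUNT_1 deposit."
restored = deanonymize(analysis, table)
```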

Let's see the full cycle in a complete example.

Original clause (what the user sees in Word)
Section 12.3 Non-Compete. For a period of twenty-four (24) months following the Closing Date (June 30, 2026), David Park, in his capacity as former CEO of NovaBridge Solutions, shall not directly or indirectly engage in any business that competes with Atlas Ventures Group within the United States, Canada, and United Kingdom. Breach of this provision shall entitle Atlas Ventures Group to liquidated damages of $2,000,000.
Anonymized clause (what the LLM sees)
Section 12.3 Non-Compete. For a period of PERIOD_1 following the Closing Date (DATE_1), PERSON_1, in his capacity as former CEO of PARTY_A, shall not directly or indirectly engage in any business that competes with PARTY_B within the TERRITORY_1. Breach of this provision shall entitle PARTY_B to liquidated damages of AMOUNT_1.

The LLM can analyze this clause fully: it can flag the non-compete duration, evaluate the geographic scope, note the liquidated damages provision, and compare these terms against market norms. It just does all of this without knowing the real names, the real dollar figure, or the real jurisdictions.

After the LLM returns its risk analysis, the moderation layer reverses the substitution. The user's output reads: "The 24-month non-compete period for David Park is within standard range, but the $2,000,000 liquidated damages clause for NovaBridge Solutions may face enforceability challenges in the United Kingdom, where..." Every original value is restored. The analysis is fully contextualized.

Comparison: Four Approaches to Contract Data Protection

Not all approaches to this problem are equal. Here is how the four most common configurations stack up across the criteria that matter to legal teams.

How it works
- No protection: raw contract text is sent directly to the LLM API.
- Server-side anonymization: the vendor's cloud server strips sensitive data, then forwards sanitized text to the LLM.
- Client-side anonymization: anonymization runs locally (e.g., inside a Word add-in) before any data leaves the user's machine.
- Hybrid (client + server): client-side anonymization runs first; server-side validation catches edge cases; only then is sanitized text sent to the LLM.

Who sees raw data
- No protection: the LLM provider.
- Server-side: the vendor, briefly; the LLM does not.
- Client-side: nobody external.
- Hybrid: nobody external (the server sees only pre-anonymized text).

LLM training risk
- No protection: high; your data may enter the training corpus.
- Server-side, client-side, and hybrid: low; the LLM gets only placeholders.

Breach impact at the LLM provider
- No protection: full contract data exposed.
- Server-side, client-side, and hybrid: only anonymized text exposed.

Breach impact at the vendor
- No protection: N/A (direct LLM access).
- Server-side: raw data at risk (the vendor held it pre-anonymization).
- Client-side: no raw data held.
- Hybrid: no raw data held (the server received pre-anonymized text).

ABA/ethics compliance
- No protection: non-compliant without explicit informed client consent.
- Server-side: partial; depends on vendor security posture and contractual terms.
- Client-side: strong; raw data stays within the user's control.
- Hybrid: strongest; defense in depth satisfies the "reasonable efforts" standard.

Analysis quality
- No protection: full context (at unacceptable risk).
- Server-side and client-side: full context via pseudonymization.
- Hybrid: full context, plus a dual pass that reduces missed entities.

InfoSec auditability
- No protection: no visibility into what data was sent.
- Server-side: server logs available.
- Client-side: local logs only.
- Hybrid: full audit trail: local logs, server logs, and stored anonymized versions.

Best suited for
- No protection: non-sensitive, internal-only documents with no client data.
- Server-side: teams comfortable trusting the vendor with raw data in transit.
- Client-side: maximum privacy requirements; single-user workflows.
- Hybrid: legal departments, regulated industries, enterprise contract operations.

Two of those criteria deserve special attention. First, "breach impact at vendor" is the blind spot in pure server-side approaches. If the vendor's own infrastructure is compromised, any raw data that was held pre-anonymization is at risk. Client-side and hybrid approaches eliminate this vector entirely because the vendor never handles raw contract text. Second, "InfoSec auditability" matters more than most legal teams realize at the evaluation stage. When a client or regulator asks, "Can you prove that no confidential data left your environment?", you need logs and stored anonymized versions to demonstrate compliance after the fact.

How ContractKen Does It

ContractKen's moderation layer is built directly inside Microsoft Word as an add-in. This is a deliberate architectural choice, not a convenience feature. By running the moderation layer locally within Word, the first and most important stage of anonymization happens on the user's own machine, before any contract data touches an external server of any kind.

Here is the data flow in practice:

Step 1: Local detection and anonymization - When a user triggers AI analysis from within Word, the moderation layer scans the full contract text using a combination of NER, regex, and configurable custom rules. It identifies and replaces all sensitive entities with consistent pseudonymized placeholders. The user can preview the exact anonymized version of the document at the click of a button, a WYSIWYG (What You See Is What You Get) view that shows precisely what text will be sent externally. Nothing leaves the user's machine until the user is satisfied with the anonymization.

Step 2: Server-side validation - The already-anonymized text is sent to ContractKen's server, where a second-pass validation layer checks for edge cases the client-side processing may have missed: unusual entity formatting, misspellings, domain-specific patterns. This step operates on anonymized text only. ContractKen's own servers never see the raw contract.

Step 3: LLM analysis - Only the fully sanitized text, validated by two passes, is forwarded to the LLM for analysis. The model processes the anonymized contract and returns its output.

Step 4: De-anonymization and delivery - ContractKen's moderation layer consults the securely stored mapping table and reverses every placeholder substitution. The user sees a fully contextualized result directly in Word, with all original party names, financial figures, and dates restored.

Three features make this approach especially relevant for enterprise legal teams:

Admin-configurable sensitivity rules. InfoSec teams can define what words, phrases, and data categories qualify as sensitive, private, or confidential. The system automatically anonymizes standard PII elements and financial information, but administrators can add proprietary terms, custom clause language, and organization-specific identifiers. This means the moderation layer adapts to each organization's specific risk profile rather than applying a one-size-fits-all ruleset.

Granular audit logs. Every AI interaction is logged with full detail: who triggered the analysis, what data was shared, which AI provider received it, and when. The anonymized versions of documents are stored separately to create a complete audit trail. This is not just about compliance in theory. It means that when a client asks, "Can you prove our contract data was protected?", there is a documented, auditable answer.

WYSIWYG anonymization preview. Before any data leaves the user's environment, users can view the actual anonymized version of their document. This serves as both a trust mechanism (users can verify the moderation layer is working) and a quality assurance step (if a sensitive term was missed by the automated detection, the user catches it visually before the text is sent).

Why "inside Word" matters architecturally

Most legal AI tools require users to upload contracts to a separate web application. That means raw contract text travels from the user's machine to the vendor's servers before any anonymization occurs. Even if the vendor anonymizes before calling the LLM, the vendor has already received raw data. By running the moderation layer inside Word itself, ContractKen ensures that the first anonymization pass happens locally. The raw text never enters any external server at all. For legal teams evaluating AI tools against ABA Opinion 512's "reasonable efforts" standard, this architectural difference is the most defensible position available.

Frequently Asked Questions

Why is it risky to paste contracts directly into ChatGPT or other LLMs?

Contracts contain party names, deal values, indemnification caps, IP assignments, and personally identifiable information. When you paste this into a public LLM, you risk training data absorption (the provider may use your input to improve its models), platform breaches (OpenAI disclosed a bug in March 2023 that exposed user conversations), and privilege waiver (ABA Formal Opinion 512 warns that sharing client data with third-party AI tools can compromise attorney-client privilege). Samsung's 2023 incident, where engineers pasted proprietary source code into ChatGPT across three separate occasions, remains the most widely cited example of this risk materializing.

What is a moderation layer in legal AI, and how does it protect contract data?

A moderation layer is a processing step between the user and the LLM. It scans contract text using NER, regex, and custom rules to identify sensitive entities, replaces them with consistent pseudonymized placeholders (e.g., "Acme Corp" becomes "PARTY_A"), sends only the sanitized version to the LLM, and then reverses the substitution after the model returns its analysis. The LLM never processes actual confidential data, but the user receives a fully contextualized output with all original values restored.

What is the difference between server-side and client-side anonymization?

Server-side anonymization processes the contract on the vendor's cloud server before forwarding sanitized text to the LLM. The vendor sees raw data briefly. Client-side anonymization runs locally on the user's machine (for example, inside a Word add-in), so raw text never leaves the user's environment. A hybrid approach runs client-side anonymization first, then server-side validation catches edge cases. The hybrid approach is the gold standard for legal workflows because even the vendor's own servers never handle raw contract text.

Does anonymizing contract text reduce the quality of AI analysis?

No. The key is pseudonymization, not redaction. Replacing "Acme Corp" with "[REDACTED]" destroys context. Replacing it with "PARTY_A" consistently throughout the document preserves every structural and semantic relationship the LLM needs. The model can still analyze obligations, risk allocation, liability caps, and clause structure. After analysis, the original values are swapped back. The quality difference is negligible for contract review, risk scoring, and clause extraction tasks.

What does the ABA say about using AI with confidential client information?

ABA Formal Opinion 512 (July 2024) requires lawyers to keep client information confidential under Model Rule 1.6 and to make "reasonable efforts" to prevent unauthorized disclosure. The opinion specifically warns that self-learning AI tools create risks of confidential data disclosure, requires informed client consent before using such tools with client data, and states that boilerplate language in engagement letters is insufficient. State bars in California, Florida, New York, New Jersey, and Pennsylvania have issued parallel guidance.

What detection techniques do moderation layers use?

Production-grade systems combine three techniques. Named Entity Recognition (NER) uses ML models to identify unstructured entities like person and organization names. Regular expressions (regex) catch structured patterns like email addresses, phone numbers, monetary amounts, and tax IDs. Custom dictionaries and rules handle domain-specific terms that generic models miss, such as internal project codenames, proprietary product names, and matter identifiers. All three are needed because each covers a different category of sensitive data.

Can I use ChatGPT Enterprise or the API to avoid these issues?

Enterprise plans and API access offer better data handling: OpenAI states that API data is not used for model training, and enterprise agreements include stronger confidentiality terms. However, your data still travels to third-party servers. In a breach, subpoena, or security incident, that data is exposed in full. A no-training clause is a contractual safeguard. An anonymization layer is a technical safeguard. For regulated industries, you need both. With a moderation layer in place, even a complete compromise of the LLM provider's systems exposes only meaningless pseudonymized text.

How does ContractKen handle contract confidentiality?

ContractKen uses a hybrid moderation layer built inside Microsoft Word. The first anonymization pass runs locally within the Word add-in, so raw contract text never leaves the user's machine. A server-side validation pass catches edge cases on the already-anonymized text. Only fully sanitized text reaches the LLM. Users can preview the exact anonymized document before sending, InfoSec teams configure what qualifies as sensitive, and granular audit logs record every AI interaction with full detail. The architecture means that neither ContractKen's servers nor the LLM provider ever handle raw contract data.