Mechanistic Interpretability Audits · EU AI Act Compliance

We open the model.
We measure what changed.
We sign the record.

Fine-tuned AI models deployed in regulated decisions need a signed technical record of what they actually learned. The EU AI Act requires it. Most organizations don't have one. We produce it — layer by layer, feature by feature, in language a regulator can read and a scientist can verify.

August 2, 2026 — Current Legal Deadline · December 2, 2027 — Proposed Extension

August 2, 2026 remains the legally binding enforcement date for Articles 11 and 13. Trilogue negotiations under the Digital Omnibus propose extending high-risk Annex III obligations to December 2, 2027 — with a target agreement before August. Until that agreement is formally adopted, the original deadline applies automatically. Regardless of which date governs, the documentation work is the same. The organizations walking into December 2027 with a signed Compliance Map from a published methodology will be in the strongest possible position. We are ready now.
22×
surge in novel feature creation at 70B parameters — documented, reproducible, and measurable in every model we audit. You cannot document what you cannot measure.
— NeurIPS 2026 submission · confirmed across Llama 3.x and Qwen 2.5
€15M
maximum EU AI Act penalty for non-compliance (or 3% of global annual turnover, whichever is higher). The Compliance Map costs a fraction of that. The math works in every boardroom.
10–15
business days from model submission to signed Compliance Map. Typical delivery: 12 business days.

What We Do

The Compliance Map.

A complete technical record of what your fine-tuned model learned — what it added, what it removed, what changed between the base model and the deployed version. Signed by a PhD researcher. Defensible in front of a regulator.

01 · Feature Classification

Every internal concept, accounted for

We classify every feature in your model as suppressed, shared, amplified, new, or eliminated — layer by layer. The first directional audit of exactly what a LoRA removes versus what it creates, with statistical validation at each step. Kill:create ratio documented. Nothing left unmeasured.
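As an illustration of the taxonomy, here is a minimal sketch of the classification step, assuming per-feature mean activations extracted from each model's SAE. The `dead` and `ratio` thresholds are hypothetical placeholders, not the calibrated cutoffs used in production audits.

```python
import numpy as np

def classify_features(base_act: np.ndarray, ft_act: np.ndarray,
                      dead: float = 1e-6, ratio: float = 2.0) -> list[str]:
    """Assign each feature one of the five classes from its mean activation
    in base vs. fine-tuned contexts. Thresholds are illustrative only;
    features inactive in both models are assumed filtered out upstream."""
    labels = []
    for b, f in zip(base_act, ft_act):
        if b < dead:
            labels.append("new")          # absent in base, created by the LoRA
        elif f < dead:
            labels.append("eliminated")   # present in base, killed by the LoRA
        elif f / b >= ratio:
            labels.append("amplified")
        elif b / f >= ratio:
            labels.append("suppressed")
        else:
            labels.append("shared")
    return labels

def kill_create_ratio(labels: list[str]) -> float:
    """Kill:create ratio, eliminated features per newly created feature."""
    return labels.count("eliminated") / max(labels.count("new"), 1)
```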

02 · Depth Fingerprint

Where the fine-tuning actually lives

A layer-by-layer reconstruction ratio profile showing where fine-tuning effects concentrate in your model's architecture. Early layers, mid-network, output reorganization — the shape of the curve carries real information about how the model was changed. We document that shape with precision.

03 · Geometric Signature

A unique version identity for every model

Full-rank SVD projection onto the base model's complete representational geometry produces a mathematically stable version signature. Any two model versions can be compared. Documented modifications can be verified against the original. Validated cross-architecture: Llama 3.x and Qwen 2.5.
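One plausible construction of such a signature, sketched under the assumption that the projection basis comes from a full SVD of a base-model representation matrix; the exact matrices and normalization used in the published methodology may differ.

```python
import numpy as np

def version_signature(base_repr: np.ndarray, ft_repr: np.ndarray) -> np.ndarray:
    """Project fine-tuned representations onto the base model's full SVD
    basis; the per-direction energy vector acts as a version signature."""
    _, _, Vt = np.linalg.svd(base_repr, full_matrices=False)  # base geometry
    coords = ft_repr @ Vt.T                                   # change of basis
    return np.linalg.norm(coords, axis=0)                     # energy per direction

def signature_similarity(sig_a: np.ndarray, sig_b: np.ndarray) -> float:
    """Cosine similarity between two signatures; 1.0 means identical versions."""
    return float(sig_a @ sig_b / (np.linalg.norm(sig_a) * np.linalg.norm(sig_b)))
```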

04 · Output Divergence Analysis

Behavioral measurement alongside internal measurement

Side-by-side completions on domain-representative prompts with statistical measurement of output divergence. Internal representation change does not reliably predict behavioral change, so both measurements are required; neither alone is sufficient. This is one of our core research findings.

"
Think of us as the engineering survey, not the building permit. Compliance Labs produces the technical model analysis component — a complete, signed document of what your model learned and how it changed. Legal strategy and regulatory filing remain with your counsel. Our work gives counsel the technical foundation to file with confidence.
— John Holman, Founder

Core Finding

Internal representation change does not reliably predict behavioral change.
Both measurements are required.

This is the finding that changes how compliance auditing works. A model can show dramatic internal reorganization while producing nearly identical outputs — or show stable internal structure while behavioral outputs diverge significantly. Neither measurement alone is sufficient to characterize what the fine-tuning actually did. Our methodology requires both.

Quadrant I · Coupled Adaptation
Internal change + behavioral change
Both measurements move together. The canonical case. Fine-tuning changed the model's internal representations and those changes are visible in its outputs. The compliance question is whether those changes are documented and defensible.
Quadrant II · Latent Divergence
Internal change, stable outputs
The most dangerous case for compliance. The model's internal representations have reorganized significantly — new features created, old ones suppressed — but outputs appear unchanged on standard evaluations. A shallow audit misses it. A mechanistic audit catches it.
Quadrant III · Delayed Behavioral Capture
Stable internals, behavioral change
Output behavior diverges from the base model without corresponding changes to internal feature structure. Often seen in instruction-following fine-tunes. The behavioral change is real; it originates from a different layer of the model than a feature-level audit would target.
Quadrant IV · Null Adaptation
Neither dimension changes
The fine-tuning left no measurable trace — internally or behaviorally. Important to document, because the absence of change is itself a finding. A signed null result is still a compliance artifact.
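The quadrant assignment itself reduces to two boolean tests. A minimal sketch, assuming each audit yields one scalar divergence score per dimension; the threshold value is a hypothetical placeholder, not a calibrated cutoff.

```python
def adaptation_quadrant(internal_div: float, behavioral_div: float,
                        threshold: float = 0.1) -> str:
    """Map the two divergence scores to an adaptation quadrant.
    The threshold is illustrative; production cutoffs are calibrated per model."""
    internal = internal_div >= threshold
    behavioral = behavioral_div >= threshold
    if internal and behavioral:
        return "I · Coupled Adaptation"
    if internal:
        return "II · Latent Divergence"        # the case a shallow audit misses
    if behavioral:
        return "III · Delayed Behavioral Capture"
    return "IV · Null Adaptation"              # a signed null result still counts
```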
Why this matters for your compliance filing: Article 13 of the EU AI Act requires technical documentation that accurately describes the modifications made to a deployed AI system. A documentation method that measures only outputs, or only internal representations, will systematically fail to capture one of the four adaptation types above. The Compliance Map is the only auditing product built from a methodology that requires both measurements by design — because the research showed that anything less is insufficient.

Methodology

Published methodology. Reproducible. Cross-architecture validated.

The methodology underlying every Compliance Map was developed over 18 months of original research at the Awakened Intelligence AI Laboratory — coordinated by John Holman and executed by lead researcher Arshavir Blackwell, PhD. The findings were submitted to NeurIPS 2026. Every technique is implemented against a publicly documented, independently verifiable neutral evaluation corpus. A signed document that cannot be independently verified by a regulator is not worth the paper it is printed on.

Method 01

Sparse Autoencoders (SAEs)

We train sparse autoencoders on both the base model and the fine-tuned model to extract monosemantic feature dictionaries from each architecture. Features are classified by their behavior in base vs. fine-tuned contexts: suppressed, shared, amplified, new, or eliminated.
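A minimal PyTorch sketch of the standard SAE recipe (overcomplete dictionary, ReLU feature activations, L1 sparsity penalty); the hyperparameters are illustrative, and the production training code is more involved.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete dictionary trained to reconstruct residual-stream
    activations through sparse, non-negative feature activations."""
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))     # sparse feature activations
        return self.decoder(f), f

def sae_loss(x, x_hat, f, l1_coef: float = 1e-3) -> torch.Tensor:
    """Reconstruction error plus L1 sparsity pressure on the features."""
    return (x - x_hat).pow(2).mean() + l1_coef * f.abs().mean()
```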

Method 02

Unified Crosscoder Pipeline

A crosscoder trained jointly on both model versions aligns the feature spaces for direct comparison. The pipeline includes dead-neuron auxiliary loss — a critical methodological requirement that prevents artifactually zero feature counts from corrupting the audit record.
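A sketch of the crosscoder shape, assuming the common construction of one shared feature dictionary with per-model encoders and decoders. The dead-neuron term below is an illustrative stand-in, not the exact auxiliary loss from the pipeline.

```python
import torch
import torch.nn as nn

class Crosscoder(nn.Module):
    """One shared feature space, read from and written to by both model
    versions, so base and fine-tuned features are directly comparable."""
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.enc_base = nn.Linear(d_model, d_dict, bias=False)
        self.enc_ft = nn.Linear(d_model, d_dict, bias=False)
        self.bias = nn.Parameter(torch.zeros(d_dict))
        self.dec_base = nn.Linear(d_dict, d_model)
        self.dec_ft = nn.Linear(d_dict, d_model)

    def forward(self, x_base: torch.Tensor, x_ft: torch.Tensor):
        f = torch.relu(self.enc_base(x_base) + self.enc_ft(x_ft) + self.bias)
        return self.dec_base(f), self.dec_ft(f), f

def dead_neuron_penalty(f: torch.Tensor, floor: float = 1e-6) -> torch.Tensor:
    """Illustrative auxiliary term: push the batch-mean activation of
    collapsed features back above a floor, keeping feature counts honest."""
    return torch.relu(floor - f.mean(dim=0)).sum() / floor
```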

Method 03

Layer-by-Layer Depth Profile

Reconstruction ratios measured at every transformer layer produce a depth fingerprint unique to each model version. The shape of the curve — where effects accelerate, plateau, and concentrate — carries structural information about how the LoRA reorganized the model's processing.
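One plausible reading of the ratio, sketched here as per-layer SAE reconstruction error on fine-tuned activations normalized by the error on base activations; the exact definition used in the pipeline may differ.

```python
import numpy as np

def layer_recon_ratio(base_act: np.ndarray, ft_act: np.ndarray,
                      reconstruct) -> float:
    """Reconstruction error of fine-tuned activations under the base-trained
    SAE, normalized by the base error. Values above 1 suggest the layer
    drifted off the base model's representational geometry."""
    err_base = np.mean((base_act - reconstruct(base_act)) ** 2)
    err_ft = np.mean((ft_act - reconstruct(ft_act)) ** 2)
    return float(err_ft / err_base)

# Sweeping every layer yields the depth fingerprint curve:
# fingerprint = [layer_recon_ratio(base[l], ft[l], saes[l].reconstruct)
#                for l in range(n_layers)]
```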

Method 04

Attention-Targeted LoRA Analysis

Our pipeline is calibrated specifically for attention-targeted LoRA fine-tuning — the most common fine-tuning approach for deployed models. Adapter weights analyzed. Modifications mapped. Content preservation confirmed through permutation testing at p ≈ 0.9, consistent with no detectable content shift.
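A minimal sketch of a two-sample permutation test on per-prompt content scores, where a high p-value (≈ 0.9) is the desired outcome because it indicates no detectable content shift. The content-scoring function itself is assumed, not shown.

```python
import numpy as np

def permutation_test(base_scores: np.ndarray, ft_scores: np.ndarray,
                     n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference of mean content scores."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([base_scores, ft_scores])
    observed = abs(base_scores.mean() - ft_scores.mean())
    n, hits = len(base_scores), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += abs(pooled[:n].mean() - pooled[n:].mean()) >= observed
    return (hits + 1) / (n_perm + 1)   # p near 0.9 means content preserved
```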

Method 05

PCA Trajectory Analysis

Independent geometric verification: principal component analysis in the SAE feature space produces trajectory compression ratios that confirm representational divergence findings through a separate mathematical lens. Both methods must agree. Discordant results trigger additional investigation before the report is signed.
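A sketch of one way to compute a trajectory compression ratio, assuming SAE feature trajectories arrive as (steps × features) matrices; the ratio definition here is illustrative and may not match the paper's exact formulation.

```python
import numpy as np
from sklearn.decomposition import PCA

def effective_dims(trajectory: np.ndarray, var: float = 0.95) -> int:
    """Number of principal components needed to explain `var` of the
    variance in an SAE feature trajectory (rows: steps, cols: features)."""
    cum = np.cumsum(PCA().fit(trajectory).explained_variance_ratio_)
    return int(np.searchsorted(cum, var) + 1)

def compression_ratio(base_traj: np.ndarray, ft_traj: np.ndarray) -> float:
    """Illustrative definition: how much the fine-tune compressed (or
    expanded) the trajectory relative to the base model."""
    return effective_dims(base_traj) / effective_dims(ft_traj)
```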

Method 06

Output Divergence Measurement

Side-by-side completions on 50–200 domain-representative prompts provided by the client. Statistical measurement of output divergence alongside internal measurement. The two-measurement requirement is a methodological non-negotiable — not an upsell.
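A minimal sketch of a divergence score over paired completions. Token-set Jaccard overlap stands in for the production statistics here; it is an assumption of this sketch, not the published metric.

```python
def output_divergence(base_outs: list[str], ft_outs: list[str]) -> float:
    """Mean per-prompt divergence between paired completions, measured as
    1 - token-set Jaccard overlap. An illustrative metric only."""
    def jaccard(a: str, b: str) -> float:
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / max(len(ta | tb), 1)
    pairs = zip(base_outs, ft_outs)
    return sum(1 - jaccard(a, b) for a, b in pairs) / len(base_outs)
```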

  • Cross-architecture validation — Llama 3.x (3B, 8B, 70B) · Qwen 2.5 (3B–72B) · generalizes across model families
  • Scale findings — specialization surge confirmed at 70B/72B across architectures
  • NeurIPS 2026 — "Two Measurements Are Required: Scale-Dependent Representational Dynamics in Attention-Targeted LoRA Fine-Tuning" · under review
  • Lead researcher — Arshavir Blackwell, PhD

Research

The science is public.

Every Compliance Map is backed by original published methodology. We publish our findings, explain our techniques, and do not ask clients to trust a black box. Read the research. Verify the approach. Then decide.

Publication · Inside the Black Box

Arshavir Blackwell, PhD
Lead Researcher & Co-Author

Mechanistic interpretability and artificial psycholinguistics in large language models. Written by Arshavir Blackwell, PhD and John Holman. Thousands of subscribers. Every technique underlying the Compliance Map has been explained in plain language for working researchers, legal practitioners, and compliance officers.

Substack ↗ Podcast ↗ LinkedIn ↗
NeurIPS 2026 Submission

"Two Measurements Are Required"

Scale-Dependent Representational Dynamics in Attention-Targeted LoRA Fine-Tuning. Co-authored by John Holman and Arshavir Blackwell, PhD. Two primary findings: a large-scale specialization surge at 70B/72B parameters, and the decorrelation of internal representation divergence from output divergence — the finding that motivates the two-measurement requirement.

Status
Under review · NeurIPS 2026 · April 2026 submission

The Team

Two partners. One methodology.

Compliance Labs is the product of 18 months of coordinated original research between a founder with a gift for building from first principles and a cognitive scientist with deep roots in the computational foundations of language. Every audit is a collaboration between them.

Founder & Principal

John Holman

John and his AI research team at Awakened Intelligence designed and built the custom architecture and AI-integrated pipelines that power Compliance Labs — from the sparse autoencoder training infrastructure to the unified crosscoder pipeline to the evaluation frameworks validated across Llama 3.x and Qwen 2.5. John handles training and evaluation runs, manages all client engagements, scopes every audit, and serves as the primary point of contact for legal counsel and regulated organizations.

john@compliance-labs.ai Substack ↗
Founder & Lead Researcher

Arshavir Blackwell, PhD

Cognitive scientist trained under Jeff Elman at UCSD, with deep roots in connectionism and the computational foundations of language. Arshavir designed and executed the mechanistic interpretability pipeline, developed the trajectory analysis methodology, and co-authored the NeurIPS 2026 submission underlying every Compliance Map. He reviews and signs every audit report delivered to clients.

arshavir@compliance-labs.ai Substack ↗ LinkedIn ↗ Podcast ↗

Services

Four ways we work together.

Every engagement produces a Compliance Map — a signed technical record of what your model learned. The scope of the engagement determines how much of the process we own. Pricing is scoped per engagement. Start with a conversation.

Service 01

Independent Audit

Compliance Map · Independent Third-Party · PhD-Signed Report

You provide the model. We produce the Compliance Map — a signed, independent third-party audit of what the model learned. The canonical case: your legal team needs documentation of a model that already exists and is already deployed.

  • Layer-by-layer depth fingerprint
  • Feature classification — added, removed, strengthened, silenced
  • Geometric fingerprint and unique version signature
  • Content preservation analysis
  • Inter-layer coordination analysis
  • Output divergence analysis with side-by-side completions
  • Plain-language executive summary for compliance officers
  • PhD-level signature and attestation
  • Delivery in 10–15 business days
Service 02

Audit + Remediation

Compliance Map · Fix · Re-Audit · New Map Issued

Everything in the Independent Audit, plus identification of specific features responsible for compliance concerns, a targeted remediation plan, retraining execution, and a new Compliance Map issued on completion. Before-and-after comparison report included.

  • Full Independent Audit — all deliverables
  • Specific feature identification responsible for compliance concerns
  • Targeted retraining guidance and execution
  • New Compliance Map issued on completion — documenting the remediated model
  • Before/after comparison — what changed between versions
Service 03

Integrated Development

Data Prep · Fine-Tuning · Documentation · Compliance Map with Receipt

We prepare the training data, fine-tune to your objectives with interpretability built in from the start, and deliver the Compliance Map as part of the engagement. This is the "Compliance Map with receipt": it documents the process as well as the outcome. Disclosed as an integrated engagement in all documentation.

  • Training data preparation and curation
  • Interpretability-first fine-tuning design
  • Iteration cycles and QA throughout development
  • Full Compliance Map with complete technical documentation
  • Process receipt — the audit documents how it was built, not just what was built
Service 04

Annual Compliance Assurance

Ongoing Coverage · Re-Audit on Material Change · Priority Turnaround

For teams that iterate. Re-audit is triggered automatically by any material model change: new training data exceeding 10% of original corpus volume, any architecture or adapter change, any model version pushed to production, or an annual refresh regardless of changes. The trigger logic is sketched after the list below.

  • Priority scheduling — no queue
  • Re-audit triggered by any material change
  • Version delta comparison — what changed between audits
  • Regulatory update monitoring — flagged if Act requirements shift
  • Quarterly billing cadence
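The material-change triggers reduce to a simple check. A sketch under the stated terms, with hypothetical parameter names.

```python
def reaudit_required(new_tokens: int, original_tokens: int,
                     adapter_changed: bool, pushed_to_production: bool,
                     days_since_last_audit: int) -> bool:
    """Material-change triggers from the assurance terms (illustrative)."""
    return (new_tokens > 0.10 * original_tokens    # >10% new training data
            or adapter_changed                     # architecture/adapter change
            or pushed_to_production                # new version in production
            or days_since_last_audit >= 365)       # annual refresh regardless
```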
Pricing is scoped per engagement and discussed in discovery. We do not publish rates — not because we're hiding them, but because model size, architecture, and timeline materially affect scope. Start a conversation: john@compliance-labs.ai

⚖️
For Legal Counsel

Our work can sit under attorney-client privilege and work product doctrine.

Compliance Labs operates as a technical expert retained through outside counsel. When retained in this capacity, findings and technical analysis may be protected under attorney-client privilege and the work product doctrine — shielding the audit record from discovery while preserving its value as a compliance artifact.

For law firms advising clients on EU AI Act compliance: this is the technical documentation component your clients need. We produce what Articles 11 and 13 and Annex IV require. GPAI model documentation obligations have been in force since August 2025 and are untouched by any proposed extension. High-risk Annex III system obligations remain subject to the current August 2 deadline until trilogue concludes. You provide the legal strategy and regulatory filing. We give your team the technical foundation to file with confidence — protected when the engagement warrants it.


Ready to talk about your model?

Tell us about your fine-tuned model — the architecture, the training objective, and your timeline. August 2, 2026 remains the current legal deadline; trilogue negotiations may extend high-risk Annex III obligations to December 2, 2027. Either way, the documentation work is the same, and the organizations that complete it early will be in the strongest position regardless of which date applies. We'll scope the engagement and respond within one business day.

No forms. No call schedulers. A real conversation with the team that will do the work.

john@compliance-labs.ai
Engagement Details
  • Lead Researcher Arshavir Blackwell, PhD · Cognitive Science, UCSD
  • Principal John Holman · Founder, Compliance Labs
  • Methodology NeurIPS 2026 submission · original published methodology
  • Architecture Coverage Llama 3.x · Qwen 2.5 · cross-architecture validated
  • Typical Delivery 10–15 business days from model submission
  • Client Provides Adapter weights · 50–200 domain prompts · no training data required · weights deleted on delivery
  • Contact john@compliance-labs.ai
  • Domain compliance-labs.ai