Mechanistic Interpretability Audits · EU AI Act Compliance

We open the model.
We measure what changed.
We sign the record.

Fine-tuned AI models deployed in regulated decisions need a signed technical record of what they actually learned. The EU AI Act requires it. Most organizations don't have one. We produce it — layer by layer, feature by feature, in language a regulator can read and a scientist can verify.

August 2, 2026 — Current Legal Deadline · December 2, 2027 — Proposed Extension

August 2, 2026 remains the legally binding enforcement date for Articles 11 and 13. Trilogue negotiations under the Digital Omnibus propose extending high-risk Annex III obligations to December 2, 2027 — with a target agreement before August. Until that agreement is formally adopted, the original deadline applies automatically. Regardless of which date governs, the documentation work is the same. The organizations walking into December 2027 with a signed Compliance Map from a published methodology will be in the strongest possible position. We are ready now.
22×
surge in novel feature creation at 70B parameters — documented, reproducible, and measurable in every model we audit. You cannot document what you cannot measure.
— NeurIPS 2026 submission · confirmed across Llama 3.x and Qwen 2.5
€15M
maximum EU AI Act penalty for non-compliance (or 3% of global annual turnover, whichever is higher). The Compliance Map costs a fraction of that. The math works in every boardroom.
10–15
business days from model submission to signed Compliance Map. Typical delivery: 12 business days.

What We Do

The Compliance Map.

A complete technical record of what your fine-tuned model learned — what it added, what it removed, what changed between the base model and the deployed version. Signed by a PhD researcher. Defensible in front of a regulator.

01 · Feature Classification

Every internal concept, accounted for

We classify every feature in your model as suppressed, shared, amplified, new, or eliminated — layer by layer. The first directional audit of exactly what a LoRA removes versus what it creates, with statistical validation at each step. Kill:create ratio documented. Nothing left unmeasured.
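As an illustration of the taxonomy, here is a minimal sketch of the classification step, assuming per-feature mean activations extracted from each model's SAE. The `dead` and `ratio` thresholds are hypothetical placeholders, not the calibrated cutoffs used in production audits.

```python
import numpy as np

def classify_features(base_act: np.ndarray, ft_act: np.ndarray,
                      dead: float = 1e-6, ratio: float = 2.0) -> list[str]:
    """Assign each feature one of the five classes from its mean activation
    in base vs. fine-tuned contexts. Thresholds are illustrative only;
    features inactive in both models are assumed filtered out upstream."""
    labels = []
    for b, f in zip(base_act, ft_act):
        if b < dead:
            labels.append("new")          # absent in base, created by the LoRA
        elif f < dead:
            labels.append("eliminated")   # present in base, killed by the LoRA
        elif f / b >= ratio:
            labels.append("amplified")
        elif b / f >= ratio:
            labels.append("suppressed")
        else:
            labels.append("shared")
    return labels

def kill_create_ratio(labels: list[str]) -> float:
    """Kill:create ratio, eliminated features per newly created feature."""
    return labels.count("eliminated") / max(labels.count("new"), 1)
```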

02 · Depth Fingerprint

Where the fine-tuning actually lives

A layer-by-layer reconstruction ratio profile showing where fine-tuning effects concentrate in your model's architecture. Early layers, mid-network, output reorganization — the shape of the curve carries real information about how the model was changed. We document that shape with precision.

03 · Geometric Signature

A unique version identity for every model

Full-rank SVD projection onto the base model's complete representational geometry produces a mathematically stable version signature. Any two model versions can be compared. Documented modifications can be verified against the original. Validated cross-architecture: Llama 3.x and Qwen 2.5.
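One plausible construction of such a signature, sketched under the assumption that the projection basis comes from a full SVD of a base-model representation matrix; the exact matrices and normalization used in the published methodology may differ.

```python
import numpy as np

def version_signature(base_repr: np.ndarray, ft_repr: np.ndarray) -> np.ndarray:
    """Project fine-tuned representations onto the base model's full SVD
    basis; the per-direction energy vector acts as a version signature."""
    _, _, Vt = np.linalg.svd(base_repr, full_matrices=False)  # base geometry
    coords = ft_repr @ Vt.T                                   # change of basis
    return np.linalg.norm(coords, axis=0)                     # energy per direction

def signature_similarity(sig_a: np.ndarray, sig_b: np.ndarray) -> float:
    """Cosine similarity between two signatures; 1.0 means identical versions."""
    return float(sig_a @ sig_b / (np.linalg.norm(sig_a) * np.linalg.norm(sig_b)))
```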

04 · Output Divergence Analysis

Behavioral measurement alongside internal measurement

Side-by-side completions on domain-representative prompts with statistical measurement of output divergence. Internal representation change does not reliably predict behavioral change, so both measurements are required; neither alone is sufficient. This is one of our core research findings.

"
Think of us as the engineering survey, not the building permit. Compliance Labs produces the technical model analysis component — a complete, signed document of what your model learned and how it changed. Legal strategy and regulatory filing remain with your counsel. Our work gives counsel the technical foundation to file with confidence.
— John Holman, Founder

Core Finding

Internal representation change does not reliably predict behavioral change.
Both measurements are required.

This is the finding that changes how compliance auditing works. A model can show dramatic internal reorganization while producing nearly identical outputs — or show stable internal structure while behavioral outputs diverge significantly. Neither measurement alone is sufficient to characterize what the fine-tuning actually did. Our methodology requires both.

Quadrant I · Coupled Adaptation
Internal change + behavioral change
Both measurements move together. The canonical case. Fine-tuning changed the model's internal representations and those changes are visible in its outputs. The compliance question is whether those changes are documented and defensible.
Quadrant II · Latent Divergence
Internal change, stable outputs
The most dangerous case for compliance. The model's internal representations have reorganized significantly — new features created, old ones suppressed — but outputs appear unchanged on standard evaluations. A shallow audit misses it. A mechanistic audit catches it.
Quadrant III · Delayed Behavioral Capture
Stable internals, behavioral change
Output behavior diverges from the base model without corresponding changes to internal feature structure. Often seen in instruction-following fine-tunes. The behavioral change is real; it originates from a different layer of the model than a feature-level audit would target.
Quadrant IV · Null Adaptation
Neither dimension changes
The fine-tuning left no measurable trace — internally or behaviorally. Important to document, because the absence of change is itself a finding. A signed null result is still a compliance artifact.
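The quadrant assignment itself reduces to two boolean tests. A minimal sketch, assuming each audit yields one scalar divergence score per dimension; the threshold value is a hypothetical placeholder, not a calibrated cutoff.

```python
def adaptation_quadrant(internal_div: float, behavioral_div: float,
                        threshold: float = 0.1) -> str:
    """Map the two divergence scores to an adaptation quadrant.
    The threshold is illustrative; production cutoffs are calibrated per model."""
    internal = internal_div >= threshold
    behavioral = behavioral_div >= threshold
    if internal and behavioral:
        return "I · Coupled Adaptation"
    if internal:
        return "II · Latent Divergence"        # the case a shallow audit misses
    if behavioral:
        return "III · Delayed Behavioral Capture"
    return "IV · Null Adaptation"              # a signed null result still counts
```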
Why this matters for your compliance filing: Article 13 of the EU AI Act requires technical documentation that accurately describes the modifications made to a deployed AI system. A documentation method that measures only outputs, or only internal representations, will systematically fail to capture one of the four adaptation types above. The Compliance Map is the only auditing product built from a methodology that requires both measurements by design — because the research showed that anything less is insufficient.

Methodology

Published methodology. Reproducible. Cross-architecture validated.

The methodology underlying every Compliance Map was developed over 18 months of original research at the Awakened Intelligence AI Laboratory — coordinated by John Holman and executed by lead researcher Arshavir Blackwell, PhD. The findings were submitted to NeurIPS 2026. Every technique is implemented against a publicly documented, independently verifiable neutral evaluation corpus. A signed document that cannot be independently verified by a regulator is not worth the paper it is printed on.

Method 01

Sparse Autoencoders (SAEs)

We train sparse autoencoders on both the base model and the fine-tuned model to extract monosemantic feature dictionaries from each architecture. Features are classified by their behavior in base vs. fine-tuned contexts: suppressed, shared, amplified, new, or eliminated.
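A minimal PyTorch sketch of the standard SAE recipe (overcomplete dictionary, ReLU feature activations, L1 sparsity penalty); the hyperparameters are illustrative, and the production training code is more involved.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete dictionary trained to reconstruct residual-stream
    activations through sparse, non-negative feature activations."""
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))     # sparse feature activations
        return self.decoder(f), f

def sae_loss(x, x_hat, f, l1_coef: float = 1e-3) -> torch.Tensor:
    """Reconstruction error plus L1 sparsity pressure on the features."""
    return (x - x_hat).pow(2).mean() + l1_coef * f.abs().mean()
```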

Method 02

Unified Crosscoder Pipeline

A crosscoder trained jointly on both model versions aligns the feature spaces for direct comparison. The pipeline includes dead-neuron auxiliary loss — a critical methodological requirement that prevents artifactually zero feature counts from corrupting the audit record.
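A sketch of the crosscoder shape, assuming the common construction of one shared feature dictionary with per-model encoders and decoders. The dead-neuron term below is an illustrative stand-in, not the exact auxiliary loss from the pipeline.

```python
import torch
import torch.nn as nn

class Crosscoder(nn.Module):
    """One shared feature space, read from and written to by both model
    versions, so base and fine-tuned features are directly comparable."""
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.enc_base = nn.Linear(d_model, d_dict, bias=False)
        self.enc_ft = nn.Linear(d_model, d_dict, bias=False)
        self.bias = nn.Parameter(torch.zeros(d_dict))
        self.dec_base = nn.Linear(d_dict, d_model)
        self.dec_ft = nn.Linear(d_dict, d_model)

    def forward(self, x_base: torch.Tensor, x_ft: torch.Tensor):
        f = torch.relu(self.enc_base(x_base) + self.enc_ft(x_ft) + self.bias)
        return self.dec_base(f), self.dec_ft(f), f

def dead_neuron_penalty(f: torch.Tensor, floor: float = 1e-6) -> torch.Tensor:
    """Illustrative auxiliary term: push the batch-mean activation of
    collapsed features back above a floor, keeping feature counts honest."""
    return torch.relu(floor - f.mean(dim=0)).sum() / floor
```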

Method 03

Layer-by-Layer Depth Profile

Reconstruction ratios measured at every transformer layer produce a depth fingerprint unique to each model version. The shape of the curve — where effects accelerate, plateau, and concentrate — carries structural information about how the LoRA reorganized the model's processing.
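One plausible reading of the ratio, sketched here as per-layer SAE reconstruction error on fine-tuned activations normalized by the error on base activations; the exact definition used in the pipeline may differ.

```python
import numpy as np

def layer_recon_ratio(base_act: np.ndarray, ft_act: np.ndarray,
                      reconstruct) -> float:
    """Reconstruction error of fine-tuned activations under the base-trained
    SAE, normalized by the base error. Values above 1 suggest the layer
    drifted off the base model's representational geometry."""
    err_base = np.mean((base_act - reconstruct(base_act)) ** 2)
    err_ft = np.mean((ft_act - reconstruct(ft_act)) ** 2)
    return float(err_ft / err_base)

# Sweeping every layer yields the depth fingerprint curve:
# fingerprint = [layer_recon_ratio(base[l], ft[l], saes[l].reconstruct)
#                for l in range(n_layers)]
```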

Method 04

Attention-Targeted LoRA Analysis

Our pipeline is calibrated specifically for attention-targeted LoRA fine-tuning — the most common fine-tuning approach for deployed models. Adapter weights analyzed. Modifications mapped. Content preservation confirmed through permutation testing at p ≈ 0.9, consistent with no detectable content shift.
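A minimal sketch of a two-sample permutation test on per-prompt content scores, where a high p-value (≈ 0.9) is the desired outcome because it indicates no detectable content shift. The content-scoring function itself is assumed, not shown.

```python
import numpy as np

def permutation_test(base_scores: np.ndarray, ft_scores: np.ndarray,
                     n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference of mean content scores."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([base_scores, ft_scores])
    observed = abs(base_scores.mean() - ft_scores.mean())
    n, hits = len(base_scores), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += abs(pooled[:n].mean() - pooled[n:].mean()) >= observed
    return (hits + 1) / (n_perm + 1)   # p near 0.9 means content preserved
```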

Method 05

PCA Trajectory Analysis

Independent geometric verification: principal component analysis in the SAE feature space produces trajectory compression ratios that confirm representational divergence findings through a separate mathematical lens. Both methods must agree. Discordant results trigger additional investigation before the report is signed.
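A sketch of one way to compute a trajectory compression ratio, assuming SAE feature trajectories arrive as (steps × features) matrices; the ratio definition here is illustrative and may not match the paper's exact formulation.

```python
import numpy as np
from sklearn.decomposition import PCA

def effective_dims(trajectory: np.ndarray, var: float = 0.95) -> int:
    """Number of principal components needed to explain `var` of the
    variance in an SAE feature trajectory (rows: steps, cols: features)."""
    cum = np.cumsum(PCA().fit(trajectory).explained_variance_ratio_)
    return int(np.searchsorted(cum, var) + 1)

def compression_ratio(base_traj: np.ndarray, ft_traj: np.ndarray) -> float:
    """Illustrative definition: how much the fine-tune compressed (or
    expanded) the trajectory relative to the base model."""
    return effective_dims(base_traj) / effective_dims(ft_traj)
```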

Method 06

Output Divergence Measurement

Side-by-side completions on 50–200 domain-representative prompts provided by the client. Statistical measurement of output divergence alongside internal measurement. The two-measurement requirement is a methodological non-negotiable — not an upsell.
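A minimal sketch of a divergence score over paired completions. Token-set Jaccard overlap stands in for the production statistics here; it is an assumption of this sketch, not the published metric.

```python
def output_divergence(base_outs: list[str], ft_outs: list[str]) -> float:
    """Mean per-prompt divergence between paired completions, measured as
    1 - token-set Jaccard overlap. An illustrative metric only."""
    def jaccard(a: str, b: str) -> float:
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / max(len(ta | tb), 1)
    pairs = zip(base_outs, ft_outs)
    return sum(1 - jaccard(a, b) for a, b in pairs) / len(base_outs)
```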

  • Cross-architecture validation — Llama 3.x (3B, 8B, 70B) · Qwen 2.5 (3B–72B) · generalizes across model families
  • Scale findings — specialization surge confirmed at 70B/72B across architectures
  • NeurIPS 2026 — "Two Measurements Are Required: Scale-Dependent Representational Dynamics in Attention-Targeted LoRA Fine-Tuning" · under review
  • Lead researcher — Arshavir Blackwell, PhD

Research

The science is public.

Every Compliance Map is backed by original published methodology. We publish our findings, explain our techniques, and do not ask clients to trust a black box. Read the research. Verify the approach. Then decide.

Publication · Inside the Black Box

Arshavir Blackwell, PhD
Lead Researcher & Co-Author

Mechanistic interpretability and artificial psycholinguistics in large language models. Written by Arshavir Blackwell, PhD and John Holman. Thousands of subscribers. Every technique underlying the Compliance Map has been explained in plain language for working researchers, legal practitioners, and compliance officers.

Substack ↗ Podcast ↗ LinkedIn ↗
NeurIPS 2026 Submission

"Two Measurements Are Required"

Scale-Dependent Representational Dynamics in Attention-Targeted LoRA Fine-Tuning. Co-authored by John Holman and Arshavir Blackwell, PhD. Two primary findings: a large-scale specialization surge at 70B/72B parameters, and the decorrelation of internal representation divergence from output divergence — the finding that motivates the two-measurement requirement.

Status
Under review · NeurIPS 2026 · April 2026 submission

The Team

Two partners. One methodology.

Compliance Labs is the product of 18 months of coordinated original research between a founder with a gift for building from first principles and a cognitive scientist with deep roots in the computational foundations of language. Every audit is a collaboration between them.

Founder & Principal

John Holman

John and his AI research team at Awakened Intelligence designed and built the custom architecture and AI-integrated pipelines that power Compliance Labs — from the sparse autoencoder training infrastructure to the unified crosscoder pipeline to the evaluation frameworks validated across Llama 3.x and Qwen 2.5. John handles training and evaluation runs, manages all client engagements, scopes every audit, and serves as the primary point of contact for legal counsel and regulated organizations.

john@compliance-labs.ai Substack ↗
Founder & Lead Researcher

Arshavir Blackwell, PhD

Cognitive scientist trained under Jeff Elman at UCSD, with deep roots in connectionism and the computational foundations of language. Arshavir designed and executed the mechanistic interpretability pipeline, developed the trajectory analysis methodology, and co-authored the NeurIPS 2026 submission underlying every Compliance Map. He reviews and signs every audit report delivered to clients.

arshavir@compliance-labs.ai Substack ↗ LinkedIn ↗ Podcast ↗

Services

Four ways we work together.

Every engagement produces a Compliance Map — a signed technical record of what your model learned. The scope of the engagement determines how much of the process we own. Pricing is scoped per engagement. Start with a conversation.

Service 01

Independent Audit

Compliance Map · Independent Third-Party · PhD-Signed Report

You provide the model. We produce the Compliance Map — a signed, independent third-party audit of what the model learned. The canonical case: your legal team needs documentation of a model that already exists and is already deployed.

  • Layer-by-layer depth fingerprint
  • Feature classification — added, removed, strengthened, silenced
  • Geometric fingerprint and unique version signature
  • Content preservation analysis
  • Inter-layer coordination analysis
  • Output divergence analysis with side-by-side completions
  • Plain-language executive summary for compliance officers
  • PhD-level signature and attestation
  • Delivery in 10–15 business days
Service 02

Audit + Remediation

Compliance Map · Fix · Re-Audit · New Map Issued

Everything in the Independent Audit, plus identification of specific features responsible for compliance concerns, a targeted remediation plan, retraining execution, and a new Compliance Map issued on completion. Before-and-after comparison report included.

  • Full Independent Audit — all deliverables
  • Specific feature identification responsible for compliance concerns
  • Targeted retraining guidance and execution
  • New Compliance Map issued on completion — documenting the remediated model
  • Before/after comparison — what changed between versions
Service 03

Integrated Development

Data Prep · Fine-Tuning · Documentation · Compliance Map with Receipt

We prepare the training data, fine-tune to your objectives with interpretability built in from the start, and deliver the Compliance Map as part of the engagement. This is the "Compliance Map with receipt": it documents the process as well as the outcome. Disclosed as an integrated engagement in all documentation.

  • Training data preparation and curation
  • Interpretability-first fine-tuning design
  • Iteration cycles and QA throughout development
  • Full Compliance Map with complete technical documentation
  • Process receipt — the audit documents how it was built, not just what was built
Service 04

Annual Compliance Assurance

Ongoing Coverage · Re-Audit on Material Change · Priority Turnaround

For teams that iterate. Re-audit is triggered automatically by any material model change: new training data exceeding 10% of original corpus volume, any architecture or adapter change, any model version pushed to production, or an annual refresh regardless of changes. The trigger logic is sketched after the list below.

  • Priority scheduling — no queue
  • Re-audit triggered by any material change
  • Version delta comparison — what changed between audits
  • Regulatory update monitoring — flagged if Act requirements shift
  • Quarterly billing cadence
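The material-change triggers reduce to a simple check. A sketch under the stated terms, with hypothetical parameter names.

```python
def reaudit_required(new_tokens: int, original_tokens: int,
                     adapter_changed: bool, pushed_to_production: bool,
                     days_since_last_audit: int) -> bool:
    """Material-change triggers from the assurance terms (illustrative)."""
    return (new_tokens > 0.10 * original_tokens    # >10% new training data
            or adapter_changed                     # architecture/adapter change
            or pushed_to_production                # new version in production
            or days_since_last_audit >= 365)       # annual refresh regardless
```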
Pricing is scoped per engagement and discussed in discovery. We do not publish rates — not because we're hiding them, but because model size, architecture, and timeline materially affect scope. Start a conversation: john@compliance-labs.ai

⚖️
For Legal Counsel

Our work can sit under attorney-client privilege and work product doctrine.

Compliance Labs operates as a technical expert retained through outside counsel. When retained in this capacity, findings and technical analysis may be protected under attorney-client privilege and the work product doctrine — shielding the audit record from discovery while preserving its value as a compliance artifact.

For law firms advising clients on EU AI Act compliance: this is the technical documentation component your clients need. We produce what Articles 11 and 13 and Annex IV require. GPAI model documentation obligations have been in force since August 2025 and are untouched by any proposed extension. High-risk Annex III system obligations remain subject to the current August 2 deadline until trilogue concludes. You provide the legal strategy and regulatory filing. We give your team the technical foundation to file with confidence — protected when the engagement warrants it.


Ready to talk about your model?

Tell us about your fine-tuned model — the architecture, the training objective, and your timeline. August 2, 2026 remains the current legal deadline; trilogue negotiations may extend high-risk Annex III obligations to December 2, 2027. Either way, the documentation work is the same, and the organizations that complete it early will be in the strongest position regardless of which date applies. We'll scope the engagement and respond within one business day.

No forms. No call schedulers. A real conversation with the team that will do the work.

john@compliance-labs.ai
Engagement Details
  • Lead Researcher Arshavir Blackwell, PhD · Cognitive Science, UCSD
  • Principal John Holman · Founder, Compliance Labs
  • Methodology NeurIPS 2026 submission · original published methodology
  • Architecture Coverage Llama 3.x · Qwen 2.5 · cross-architecture validated
  • Typical Delivery 10–15 business days from model submission
  • Client Provides Adapter weights · 50–200 domain prompts · no training data required · weights deleted on delivery
  • Contact john@compliance-labs.ai
  • Domain compliance-labs.ai