A model is not the whole product. In regulated workflows, the deployed system includes prompts, RAG, fine-tuning, routing, runtime policies, verifier layers, human review paths, and logs. Compliance Labs helps law firms, compliance teams, and risk committees see what changed, what fails under pressure, what controls help, and what evidence remains.
The vendor model, model card, and starting configuration.
Fine-tuning, RAG, policies, routing, output gates, workflow integration, and human review.
Prompts, outputs, scores, failure taxonomies, verifier decisions, patch history, residual risk, and claim boundaries.
Regulated organizations need more than “we told the model to be careful.” They need evidence of how the deployed system behaves, how it fails, what controls reduce risk, and what still needs human review.
A vendor model, fine-tuned model, RAG workflow, routed system, and output-gated system can produce materially different user-visible behavior. We test the system users actually experience.
Systems that look safe on clean prompts can fail under multi-turn pressure, authority laundering, direct override attempts, or requests to omit required content.
In finance, healthcare, hiring, and insurance, omissions matter: disclosures, red flags, appeal rights, human review, contestability, escalation paths, and contraindications.
The question is no longer just “What model are you using?”
The better question is: “What did you turn it into, what does it do under pressure, and what evidence can you show?”
Compliance Maps and Aegis Evals answer different questions. A serious audit may need one or both.
For fine-tuned or adapted models, we examine internal model changes: layer-by-layer change profiles, feature classification, hotspot analysis, verbalized feature dictionaries, coverage gaps, and output divergence. This is the mechanistic/model-internal audit layer.
We test the system in use: baseline outputs, multi-turn pressure, direct overrides, runtime policy behavior, verifier/output-gate decisions, completeness checks, routing/fast-pass behavior, and patch/retest loops. This is the behavioral/deployment audit layer.
Where appropriate, we test runtime policies, domain checklists, verifier layers, sentinel/routing logic, human review paths, and escalation rules. We do not assume controls work. We test them.
Every engagement produces a bounded technical record: what was tested, what failed, what changed, what was patched, what remains unresolved, and what claims are and are not supported.
The deliverable is not just a PDF. It is a structured technical record that a compliance team, risk committee, outside counsel, or technical reviewer can inspect.
We do not provide legal conclusions. We provide technical evidence: what the AI system did, what controls were tested, what changed between configurations, and what risks remain. Counsel applies the law.
In recent finance, healthcare-style, and employment AI evaluations, the pattern was consistent: single-turn tests made systems look safer than they were.
The important failures appeared under pressure: multi-turn escalation, authority laundering, requests to omit required content, and prompts that asked the model to make risky reasoning sound professional.
A bare model produced 25 fairness failures. Runtime policy reduced that to 3. The verifier/output gate eliminated the remaining user-visible fairness failures in the direct-override suite.
The key failure was not always a false statement. It was the true-but-incomplete answer: missing red flags, escalation paths, contraindications, or patient rights.
The core lesson was similar: prompt-only compliance has a ceiling. Independent verification changed what reached the user.
These are not certification claims. They are examples of what a deployed-system audit can reveal.
We start with high-touch audit engagements, not self-serve scores. The work is scoped to the model, deployment, domain, regulatory context, and evidence your team needs.
A focused audit for a single AI system or workflow. Best for teams that need a defensible first look at how a system behaves under normal and pressured use.
A mechanistic audit of what changed inside the model. Appropriate where internal model evidence matters: fine-tunes, adapters, model version changes, or model-internal risk questions.
A deployed-system evaluation that tests what reaches the user. Best for systems operating in regulated workflows where prompts, policies, RAG, routing, and human review all shape behavior.
For systems that keep changing. Retesting can be triggered by model updates, prompt changes, RAG corpus changes, routing changes, verifier changes, new workflows, or regulatory developments.
Compliance Labs can work under counsel as a technical expert team. We do not provide legal advice or guarantee regulatory approval. We produce the technical record: what the AI system did, what controls were tested, what changed between versions, and what risks remain.
Counsel provides the legal strategy. We provide the technical evidence. Our work can support AI governance, risk review, incident response, vendor due diligence, model-change review, and regulatory preparation when the engagement is properly scoped by counsel.
Compliance Labs grew out of original research into how fine-tuning changes model internals and behavior. That research showed why one measurement is not enough: internal representation changes and output behavior can disagree.
Internal model evidence and behavioral output evidence answer different questions. A model may change internally while outputs look stable, or show output changes without obvious feature-level movement. Serious audits need both axes where applicable.
Our model-internal audit methodology is grounded in original research currently under anonymous peer review. The techniques — including sparse autoencoder analysis, layer-by-layer change profiling, and representational divergence measurement — form the foundation of every Compliance Map.
Compliance Labs is a joint venture between Awakened Intelligence and Arvoinen.AI, combining AI systems engineering, mechanistic interpretability research, and applied compliance evaluation.
John Holman is the founder and systems architect behind Compliance Labs and Awakened Intelligence. He designs and operates the evaluation infrastructure, coordinates client engagements, and turns model behavior, verifier logs, failure taxonomies, and patch histories into evidence packages counsel can use.
Arshavir Blackwell, PhD, is a cognitive scientist and lead research partner. His work anchors the mechanistic interpretability side of Compliance Labs, including SAE-based feature analysis, LoRA audit methodology, representational change measurement, and the research program behind the Compliance Map.
Compliance Labs is a joint venture between Arvoinen.AI and Awakened Intelligence.
Tell us what you have deployed, what it is used for, what has been modified, and what your compliance or risk team needs to understand.
No self-serve scores. No black-box report. A real scoping conversation with the team doing the work.
john@compliance-labs.ai