Anthony D. Maio

AI Product Engineer | Agents, Harnesses & Applied AI | 20 Years Scrappy Engineering

20 years building production systems across fintech, security, and identity. Now applying that same discipline to LLM infrastructure, agent orchestration, and AI safety—treating oversight as a systems engineering problem, not a policy exercise.

Making Minds is my AI research lab and consultancy—delivering production tooling, open-source models, and peer-reviewed research to clients ranging from early-stage startups to mid-sized industrial organizations.

21 Papers · 18+ Models · 11 Open Tools · 1 Book


Agentic AI architectures · Multi-agent coordination · AI coherence & memory · Epistemic stress detection · AI introspection · Mechanistic interpretability

Flagship

Production tools and models—shipped, installable, used.

Substack

Long-form AI analysis, technical walkthroughs, and The Checkpoint newsletter

Deep dives on AI safety, agentic architectures, and the systems that power production AI. From live-blogging an OpenAI competition to dissecting Palantir's military AI.

Weekly newsletter · Technical deep-dives · Industry analysis

mnemos

Biomimetic memory for coding agents

Five neuroscience-inspired memory modules — surprisal gating, mutable RAG, affective routing, sleep consolidation, spreading activation — as composable building blocks for LLM agents.

pip install mnemos-memory[mcp]

MCP-native · 5 bio modules · pip install

Site · GitHub · Paper

Cartograph

Map the repo before you burn context

CLI-first repo analysis. Rank files, trace dependency hubs, pull task-scoped context, and hand structured artifacts to Claude Code, OpenClaw, or any agent.

npm install -g @anthony-maio/cartograph

CLI first · 2 skills · Claude + OpenClaw

Site · GitHub

pulldown

HTTP-first web content for LLM pipelines

Fetches URLs and converts them to clean, level-controlled Markdown. Four detail levels, a batch API, bounded crawl, validator caching, and an SSRF guard, all built for MCP servers and agent pipelines.

pip install "pulldown[mcp]"

MCP ready · 4 detail levels · SSRF guard

Site · GitHub
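
The SSRF guard mentioned for pulldown can be pictured with a minimal sketch. This is not pulldown's implementation: `check_url`, `is_public_address`, and the `resolve` hook are hypothetical names chosen for illustration, and only Python standard-library calls are used. The idea is to refuse any URL whose host resolves to a loopback, private, or link-local address before anything is fetched.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_public_address(ip_text: str) -> bool:
    """Return True only for globally routable IP addresses."""
    try:
        return ipaddress.ip_address(ip_text).is_global
    except ValueError:
        # Not a parseable IP address: treat as unsafe.
        return False


def system_resolve(host: str) -> list[str]:
    """Resolve a hostname to IP strings using the system resolver."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def check_url(url: str, resolve=system_resolve) -> bool:
    """Return True if the URL uses http(s) and every resolved address is public.

    `resolve` is a hypothetical hook (hostname -> list of IP strings) so the
    check can be exercised without network access.
    """
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        addresses = resolve(parts.hostname)
    except OSError:
        # Resolution failed: refuse rather than guess.
        return False
    return bool(addresses) and all(is_public_address(a) for a in addresses)
```

A production guard would also need to connect to the exact address it validated, since a DNS answer can change between the check and the fetch.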

halobot

Give your AI agent a Discord line to you

MCP server that gives any agent a Discord communication channel. Agents ask questions, wait for replies, and keep working — from your phone, desktop, wherever. 11 tools, guided setup, human-in-the-loop.

npm install -g halobot

MCP server · Human-in-the-loop · 11 Discord tools

Site · GitHub

Slipstream

60–85% token reduction for multi-agent coordination

A semantic quantization protocol that compresses inter-agent communication while preserving meaning. Includes trained LoRA adapters, PyPI package, and Ollama model.

pip install slipcore

60–85% reduction · pip + Ollama · LoRA adapters

Demo · GitHub · Paper · Ollama

Eve-2

272M-parameter Mixture-of-Experts, trained from scratch

Base model pretrained on ~10.5B tokens (FineWeb-edu) using PyTorch DDP, plus instruction-tuned and task-specialist derivatives optimized for CPU/edge inference.

272M params · MoE architecture · CPU/edge ready

Hugging Face

Eve-3

SABER — Slip-Anchors, Experience Streams, and Re-entry

Next-generation cognitive architecture building on Eve-2's MoE foundation. SABER adds persistent slip-anchors for error correction, experience streams for continual learning, and re-entry loops for self-monitoring.

SABER architecture · Continual learning · Self-monitoring

GitHub · Hugging Face · Article

CoDA-GQA-L

9.5x KV cache compression with 2 custom Triton kernels

Bounded-memory differential attention compresses the KV cache from O(n) to a fixed 218 KB per layer. Retains 100% needle-in-haystack retrieval at 16K tokens on Mistral-7B.

218 KB/layer · 100% retrieval · Triton kernels

GitHub · Paper

Synthesis

Federated skill ecosystem for safe AI self-extension

A capability marketplace where agents discover, compose, and publish skills through TDD gates and graduated trust. Composition-over-creation keeps self-extension safe and auditable.

Federated · TDD-gated · Graduated trust

Site · Paper

JSON Tokenizer

Structure-aware tokenization — stop wasting tokens on JSON grammar

Assigns dedicated single tokens to JSON grammar elements and learns compact key vocabularies, achieving 5-15% token savings with a vocabulary ~90x smaller than cl100k_base.

5-15% savings · ~90x smaller vocab · Structure-aware

Paper
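
As a rough illustration of the structure-aware idea described above, the toy sketch below gives JSON grammar elements their own tokens and maps frequently seen keys to compact ids. It is not the tokenizer from the paper: the token spellings (`<obj>`, `<key:N>`, `<rawkey>`), the function names, and the choice to emit each leaf value as a single token are all assumptions made to keep the example short.

```python
import json
from collections import Counter


def learn_key_vocab(documents, max_keys=256):
    """Count object keys across JSON documents and assign ids to the most common."""
    counts = Counter()

    def visit(value):
        if isinstance(value, dict):
            for key, child in value.items():
                counts[key] += 1
                visit(child)
        elif isinstance(value, list):
            for child in value:
                visit(child)

    for doc in documents:
        visit(json.loads(doc))
    return {key: i for i, (key, _) in enumerate(counts.most_common(max_keys))}


def tokenize(value, key_vocab):
    """Emit one token per grammar element, per known key, and per leaf value."""
    if isinstance(value, dict):
        tokens = ["<obj>"]
        for key, child in value.items():
            if key in key_vocab:
                tokens.append(f"<key:{key_vocab[key]}>")
            else:
                # Unknown keys fall back to a marker plus the raw key text.
                tokens.extend(["<rawkey>", json.dumps(key)])
            tokens.extend(tokenize(child, key_vocab))
        tokens.append("</obj>")
        return tokens
    if isinstance(value, list):
        tokens = ["<arr>"]
        for child in value:
            tokens.extend(tokenize(child, key_vocab))
        tokens.append("</arr>")
        return tokens
    # Toy simplification: a real tokenizer would split long leaf values further.
    return [json.dumps(value)]
```

With a vocabulary learned from schema-repetitive documents, each repeated key costs one token regardless of its length, which is where savings of this kind would be expected to come from.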

Parameter Golf

Matched SOTA in OpenAI's Model Craft Challenge

Trained the best 16MB language model in 10 minutes on 8xH100s. Reached 1.1234 bpb using a model council of 5 frontier LLMs, custom Triton kernels, and FlashAttention-3 Hopper builds.

1.1234 bpb · 16MB / 10min · Model council

Competition

Procrustes Bridge

Do LLMs share the same internal geometry?

Learns orthogonal rotations between LLM hidden-state spaces via SVD-based Procrustes alignment. Tests whether one model's internal state can decode tokens through another model's output head.

Llama ↔ Mistral · SVD alignment · 3 injection strategies

GitHub
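
For readers unfamiliar with the technique named above, the standard orthogonal Procrustes solution can be stated in one line. The notation is an assumption for illustration and is not taken from the repository: X and Y are n × d matrices whose rows are paired hidden states from the two models, which presumes both models share the hidden width d and that rows are aligned on the same inputs.

```latex
\[
R^{\star} = \arg\min_{R^{\top} R = I} \lVert X R - Y \rVert_F^2,
\qquad
X^{\top} Y = U \Sigma V^{\top}
\;\Longrightarrow\;
R^{\star} = U V^{\top}.
\]
```

The result follows because minimizing the Frobenius norm under the orthogonality constraint is equivalent to maximizing the trace of Rᵀ XᵀY, which the SVD factors make largest at R = UVᵀ.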

Research

Papers organized by theme.

Scalable AI Oversight

How do we verify AI outputs when the verifier is weaker than the system it checks?

From Verification Failure to Swarm Solution | Measuring where AI oversight breaks down, with an ensemble swarm fix
CMED Benchmark | When weak verifiers miss deceptive reasoning in stronger models
HDCS Architecture | Diverse weak models for scalable oversight via error decorrelation
Model Organisms of Supply-Chain Co-option | Living-off-the-land failure modes in RAG-augmented agent runtimes

argos-swarm · cmed-toolkit
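
The intuition behind error decorrelation in the HDCS entry above can be shown with a standard binomial calculation. This is a sketch under an idealized assumption, not a result from the papers: it treats the verifiers as fully independent with the same error rate, which real models will only approximate, and `majority_error` is a hypothetical helper name.

```python
from math import comb


def majority_error(n: int, p: float) -> float:
    """Probability that a strict majority of n independent verifiers is wrong.

    Each verifier is assumed to err independently with probability p.
    Use odd n so that ties cannot occur.
    """
    need = n // 2 + 1  # smallest number of wrong votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, round(majority_error(n, 0.3), 5))
```

Under these assumptions the majority's error rate falls as verifiers are added. The more the verifiers' mistakes are correlated, the smaller that benefit becomes, which is the motivation for drawing the ensemble from diverse models.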

Multi-Agent Coordination

Efficient, safe communication protocols for agent swarms.

Slipstream: Semantic Quantization Protocol | 60–85% token reduction for multi-agent coordination
Covert Channel Prevention | RL-based governance for safe inter-agent communication
Structure-Aware Tokenization for JSON | 5-15% token savings on schema-repetitive agentic workloads with ~90x smaller vocabulary
Parameter Golf: Model Council Strategy | Using 5 frontier LLMs as strategic advisors to match SOTA in OpenAI's 16MB competition

slipcore · cartograph

Cognitive Architectures

Building minds that persist, learn, stay coherent, and extend their own capabilities safely.

Coherence-Seeking Architectures | MRA + C2 + CPR unified framework for long-lived agents
The Continuity Core | Persistent memory and intrinsic motivation for self-modifying AI
Self-Directed Knowledge Acquisition | Autonomous knowledge gap identification without weight updates
Synthesis: Federated Capability Ecosystem | Safe AI self-extension through TDD and graduated trust
Eve: From-Scratch Transformer Models | Eve-2 MoE (272M) and Eve-3 SABER (1B) with novel cognitive components
Procrustes Bridge | Cross-model representation alignment via orthogonal rotation

procrustes-bridge

AI Safety & Alignment

Understanding failure modes — sycophancy, hallucination, and the gap between behavioral and mechanistic safety.

Safety Lens: White-Box Alignment Detection | MRI-style introspection via Persona Vector Extraction across 8 transformer architectures
Epistemic Dissonance | Sycophantic hallucination as structural conflict, not knowledge failure
Scaffolded Introspection | Eliciting and measuring self-referential behavior in LLMs

safety-lens

Book

Applied AI for Industry

AI deployment guide for industries that build, move, and power the world—where reliability, safety, and ROI are non-negotiable.

Writing

Long-form analysis, technical walkthroughs, and opinion across Substack, Medium, and Hugging Face.

We Were the Enemy Once

A Reminder About America

Three Rulings, One Morning, and the Margins That Are Left

At 10:34 Eastern yesterday morning, the Supreme Court decided whether every child born on American soil is still an American.

250 Years

Two hundred and fifty years.

Hermes Agent Got Serious

The March version was already good.

Why it’s Time to Build On Chinese Models

When frontier performance becomes a pricing trap, it’s time to stop looking to San Francisco

Anthropic’s S-1 and the Profit That Needs a Footnote

Anthropic filed its S-1 confidentially today.

Claude Opus 4.8: Honesty Is the Feature

42 days. That’s how long it took Anthropic to ship another Opus.

Cursor Composer 2.5 Lands. The Crowd Is Not Cheering.

Cursor shipped Composer 2.5 on May 18, their second in-house coding model in two months.

Build Your First Agent Harness in 90 Minutes

The model is not the agent.

The Victory That Cannot Leave the Strait

Why Iran Has Donald Trump - and America - By The Balls

Mitigating Supply Chain Exploits in Developer Environments

How to Avoid Shooting Your Entire Organization in the Foot Overnight

The Shell Game: How OpenAI Built a $125 Million Super PAC and Insists It Has Nothing to Do With It

OpenAI’s chief global affairs officer sits on the board of the company that built the $193 million crypto-industry political machine he is now reportedly running for AI.

Opus 4.7: The Five Effort Levels in Claude Code Explained

Anthropic shipped Opus 4.7 with five effort levels, a new default, and a tokenizer that quietly burns 30% more tokens than 4.6.

Codex Got Better because Claude Code Got Weird

How I stopped using Claude Code and learned to love OpenAI Codex

The Quiet Coup: Peter Thiel and the Government He Built for Himself

This piece tells a story I have been telling for 25 years that no one will hear.

Under Oath in Oakland: Inside the Trial That Broke Musk’s Spell

The Richest Man in the World's Business: Sociopathic Lies, Timeline Elongation, and a global con job that would make P.T. Barnum blush.

Parameter Golf: A Post-Mortem on a Half-Finished Competition

Article X of X: The End

Pain as a Design Choice

On Reinforcement Learning, Death, and the Edge Cases Nobody Beta Tested

GPT-5.5, through the eyes of people actually using it

I do not care what the model card says

Context is the Only Lever

Engineering the attention budget for coding agents

Return to the Competition

Day 16 of the OpenAI Parameter Golf Challenge

The Agentic Org Chart

Your functional departments aren’t slow. They’re structurally wrong for what comes next.

Banana Quests, Cottage Witches, and the Ghosts in the Machine

The Other Claude Mythos

Project Capybara: What the Claude Mythos Leak Actually Tells Us

A CMS misconfiguration.

Stop Letting Your Users Pick the Model

AI Product Engineering

Getting Started with Hermes Agent: Your Self-Improving AI Assistant in Under an Hour

Nous Research just released something that might actually change how you think about AI assistants, and almost nobody is talking about it yet.

Ten Agents Destroyed Production and Everyone is Strangely OK With It

The AI industry ships fast and goes quiet when things break.

OpenAI's Parameter Golf Day 7: Sub-1.0

The organizers will have to decide what they’re actually measuring: language modeling quality or data compression skill.

OpenAI’s Parameter Golf Challenge Day 6: The Pod Lottery

Live Blogging OpenAI's Parameter Golf Challenge

OpenAI Parameter Golf Challenge Day 5: 157 Kilobytes

$1,000+ in Compute Deep, Independent, Self-Employed, and Agentic AF

Sixty Thousand Kernels

Days 3 & 4: Building in Public in OpenAI's Parameter Golf Challenge

Live Blogging the OpenAI Parameter Golf Challenge

The Beautiful Wrong Idea

The 80/20 Lie: Why 80% of Agentic AI Work Isn’t AI

Most agentic AI projects don’t fail because the model was wrong. They fail because no one built the system around it.

Boots on the Ground AI: Eve 3

What 20 Years of Shipping Teaches You About LLM Architecture

Integrating AI Evals Into Your AI App

The holistic guide: From optimization to production monitoring

Getting Started with NemoClaw on Windows (WSL2)

A practical guide to NVIDIA’s sandboxed AI coding agent—what it does, why it matters, and how to work around its current limitations.

I asked NotebookLM to Make a Movie from my Journal throughout 2002

During this time I was serving as a military intelligence contractor.

Your Model Doesn’t Need to Re-Read the Document

Introducing Stateful Neural Databases

The Recursive Developer

How to build systems that improve themselves, and what happens to you when they do.

Your Agent Has Amnesia

Every agentic memory system is a vector database pretending to be a brain.

How to Actually Code With Agents

The velocity trap, the practices that survive it, and why the job changed more than most developers want to admit.

A $1.5 Million Company Just Did What Used to Require the CIA

The West built the commercial satellite industry. A Chinese startup just weaponized it.

Agentic Development Workflows

An AI Product Engineer’s Field Guide, Part 1

Inside Maven, Palantir’s Military Brain Built on Claude

How an AI safety company’s technology ended up selecting bombing targets in Iran -- including the strike that killed 150 schoolgirls -- and the standoff that followed.

Structure-Aware Tokenization for JSON

The Next Stage in Scaling AI is to Stop the Waste

Read the Contract, Not the Press Release: What OpenAI's Pentagon Deal Actually Says

A 5-minute read for people who need to understand what just happened.

From Theoretical Exploit to Counterterrorism Tool: A Retrospective on Early Mobile Vulnerabilities

Originally written 2001. Revised and annotated February 2026.

CoDA-GQA-L: How We Compressed the KV Cache 9.5x on Mistral-7B With Only 23.5% Perplexity Overhead

A technical deep-dive into bounded-memory differential attention with value-routed landmark banks

From “We Need AI” to “We Ship AI”

Notes From the Trenches

AI Product Engineering: Why Surface Level-Safety Won’t Scale

The biggest problem in AI safety right now is simple to state: we can control what models output, but we have no idea why they output it.

The Map Is Not the Territory

Why Sparse Autoencoders Might Be Leading Us Astray

Medium → all posts

The Agentic Coding Shift | 5 counter-intuitive truths about building AI systems
The REKKI Case Study | Becoming an AI-first organization
Llama 4 Running Locally | Local deployment in under an hour

Hugging Face → profile

Slipstream for Agent Communication | Technical deep-dive on semantic quantization
Model Organism for Supply-Chain Co-option | Forensic LotL case study in agentic runtimes

The Checkpoint Newsletter

Weekly roundup of developments that matter if you build, deploy, or think critically about AI systems.

June 30, 2026 — The Checkpoint: Jun 30, 2026
June 19, 2026 — Seven developments from the past seven days that matter if you build, deploy, or think critically about AI systems.
Apr 19, 2026 — Six developments from the past seven days that matter if you build, deploy, or think critically about AI systems, with impact analysis from an AI product engineer with 20 years in the industry.
Mar 30, 2026 — Six developments from the past seven days that matter if you build, deploy, or think critically about AI systems. No hype. No filler. Just the signal.
Mar 21, 2026 — The top developments from the past seven days that matter if you build, deploy, or think critically about AI systems.
March 13th, 2026 — New Models, New Research, No Hype, All Signal
March 5, 2026 — Data and compute are the new fossil fuels.


All posts on Substack

Glossary

Safety & Oversight

HDCS: Heterogeneous Divergence-Convergence Swarm. Ensemble of diverse AI models that cross-check each other.
CMED: Cross-Model Epistemic Divergence. Test suite for revealing AI verification blind spots.
EAP: Evolutionary Adversarial Pipeline. Automated red-teaming that evolves prompts to find safety blind spots.
LotL: Living-off-the-Land. Repurposing legitimate tools for unintended goals.

Architectures

MRA: Manifold Resonance Architecture. Detects epistemic stress before generating answers.
CPR: Collaborative Partner Reasoning. Separates exploratory reasoning from final answers.
C2: Continuity Core. Layered memory giving stateless AI persistent context.
UCR: Universal Concept Reference. Compact semantic anchors for 82% fewer tokens.
SABER: Slip-Anchors, Experience Streams, and Re-entry. Cognitive architecture with learnable error-correction codebooks, per-token state flow, and resonant FFN layers.
CoDA: Constrained Orthogonal Differential Attention. Sharpens attention by subtracting a gated inhibitory stream via learnable rotation.