



Building a Multi-Agent Orchestration System on AWS
A multi-agent orchestration system routes diverse incoming requests to the right specialist and then executes each as a multi-step agentic task. The mistake most teams make is using one framework for both jobs and getting the worst of each. This is Part 4 of The Token Scarcity Playbook — assembling routing (Part 1), model-agnostic design (Part 2), and the right harness (Part 3) into one production system. "AgentCore is the execution environment; the framework is the orchestra
TomT
Jun 7 · 5 min read
Agent Frameworks Compared: A Technical Feature-by-Feature Guide
An agent framework is the harness that orchestrates a model's reasoning, tool calls, state, and multi-agent coordination. This is Part 3 of The Token Scarcity Playbook — a technical comparison of the major frameworks built so you can score them against your own needs. It centers on a decision matrix rated on the criteria that actually drive selection — ease of use, multi-agent orchestration, model-agnostic capability, AWS-service and Amazon Bedrock AgentCore integration (incl
TomT
May 23 · 10 min read


Making Applications Model-Agnostic by Design
A model-agnostic application is one you can move from one model to another — to capture a cheaper option or route per task — without rewriting logic or shipping silent regressions. Routing (Part 1) only pays off if your system survives the swaps. This is Part 2 of The Token Scarcity Playbook. "The schema is the contract; the model is an implementation detail." Table of Contents Why model swaps break things Designing for agnosticism Grouping models into tiers The benchmarks th
TomT
May 7 · 6 min read


How Model Routing Works: Techniques, Players, and Frameworks
Model routing is the practice of sending each request to the cheapest model that can actually handle it, instead of defaulting every task to the flagship. It is the single largest cost lever in the token scarcity era — and, done well, it improves quality and cost at the same time. This is Part 1 of The Token Scarcity Playbook. "The insight isn't that the cheap model beat the flagship. It's that routing beat brute force." Table of Contents The four approaches to routing Cascad
TomT
Apr 23 · 8 min read
The Token Scarcity Era: From AI Subsidy to the New Constraint
The token scarcity era is the moment the economics of building with AI inverted — when "use the biggest model for everything" stopped being clever and started being expensive. This is Part 0 of The Token Scarcity Playbook: why every team shipping AI in 2026 is suddenly counting tokens, and the tactics that follow. "Per-token pricing is the rate. Tokens-to-completion is the invoice." Table of Contents From subsidized inference to metered billing Demand growth outpacing compute
TomT
Apr 1 · 5 min read


AI Infrastructure Boom
How Energy, Memory, and Compute Are Converging Into the New Industrial Revolution. AI is often described as “software,” but that framing hides the most important truth about this moment: modern AI behaves like an industrial system, not a digital product. Intelligence is no longer compiled once and distributed cheaply forever. It is manufactured continuously, in real time, using enormous amounts of electricity, silicon, memory, and physical infrastructure. The scale of this b
TomT
Jan 118 min read


Graph RAG: Knowledge Graphs for Multi-Hop Reasoning
Graph RAG - The RAG technique that uses knowledge graphs to enable multi-hop reasoning across entity relationships. This article explores how Graph RAG solves relational queries that traditional vector search cannot handle, when to use it, and how to implement it with Neo4j and other graph databases. For a comprehensive comparison of RAG frameworks including Graph RAG, see this research analysis. Key Topics: The multi-hop reasoning problem in traditional RAG How knowledge
TomT
Nov 25, 2025 · 16 min read


Contextual RAG: Anthropic's 67% Breakthrough for High-Stakes Accuracy
Context: Contextual RAG - Anthropic's breakthrough technique that reduces retrieval failures by 67% through LLM-generated context augmentation. This article explores how Contextual RAG solves the ambiguous chunk problem, when to use it for high-stakes applications, and how to implement it for legal, medical, and financial use cases. For a comprehensive comparison of RAG frameworks including Contextual RAG, see this research analysis. Key Topics: The ambiguous chunk problem
TomT
Nov 18, 2025 · 14 min read


Hybrid RAG: The Production Standard for Enterprise Search
Context: Hybrid RAG - The production-standard RAG technique that combines keyword search (BM25) with vector similarity search. This article explores why Hybrid RAG has become the de facto standard for enterprise deployments, how it works, and when it delivers the best results. For a comprehensive comparison of RAG frameworks including Hybrid RAG, see this research analysis. Key Topics: Why Hybrid RAG: combining keywords and semantics BM25 keyword search fundamentals Recip
TomT
Nov 11, 2025 · 16 min read


Naive RAG: The Foundation of Retrieval-Augmented Generation
Context: Naive RAG - The foundational RAG technique that combines vector similarity search with LLM generation. This article explores how Naive RAG works, when to use it, real-world applications, and why it remains the starting point for most RAG implementations despite its limitations. For a comprehensive comparison of RAG frameworks including Naive RAG, see this research analysis. Key Topics: Vector similarity search fundamentals Embedding models and vector databases Retr
TomT
Nov 4, 2025 · 12 min read


What Is RAG? Why Retrieval-Augmented Generation Is Transforming AI Applications
Context: What Is RAG? - A comprehensive introduction to Retrieval-Augmented Generation (RAG) for technical practitioners new to the concept. This article explains why RAG exists, how it solves fundamental LLM limitations, and why it's become essential for production AI applications. Key Topics: The fundamental problem: LLM knowledge limitations What RAG is and how it works Why RAG matters: real-world impact RAG vs. fine-tuning: when to use each The RAG landscape: from simple
TomT
Oct 28, 2025 · 15 min read


AI Innovations in Healthcare: Reimagining Care from Digital Front Door to Drug Discovery
Every few decades, technology reshapes the foundation of healthcare. We saw it when electronic medical records replaced paper charts and again when mRNA redefined how we think about vaccines. Today, that seismic shift is being driven by artificial intelligence. Across clinics, labs, and trial sites, AI isn’t just crunching data — it’s rewiring how medicine is practiced, how drugs are developed, and how patients experience care. In the last year alone, over 22% of healthcare o
TomT
Oct 21, 2025 · 12 min read


Building Your First MLOps Pipeline: A Practical Step-by-Step Guide
"The first pipeline is never perfect. But a working pipeline you can improve beats a perfect design you'll never finish. Start simple, automate relentlessly, and iterate based on what breaks." Table of Contents The Weekend That Changed Everything The Six-Step Pipeline Architecture Step 1: Data Ingestion & Validation Step 2: Data Preprocessing & Feature Engineering Step 3: Model Training & Experiment Tracking Step 4: Model Evaluation & Validation Step 5: Model Deployment & Ser
TomT
Sep 5, 2025 · 16 min read


MLOps Fundamentals: Understanding the Complete Model Development Lifecycle
"The difference between a machine learning experiment and a production ML system is like the difference between a recipe scribbled on a napkin and a commercial kitchen that serves 10,000 meals a day. One is creative chaos; the other is systematic excellence." Table of Contents The $50 Million Jupyter Notebook What Is MLOps, Really? The Model Development Lifecycle: Six Phases That Matter Why Traditional DevOps Isn't Enough The Three Pillars: CI/CD/CT Governance: Not a Checkbox
TomT
Aug 26, 2025 · 20 min read