Blog | mCloud

Building a Multi-Agent Orchestration System on AWS

A multi-agent orchestration system routes diverse incoming requests to the right specialist and then executes each as a multi-step agentic task. The mistake most teams make is using one framework for both jobs and getting the worst of each. This is Part 4 of The Token Scarcity Playbook — assembling routing (Part 1), model-agnostic design (Part 2), and the right harness (Part 3) into one production system. "AgentCore is the execution environment; the framework is the orchestra

TomT

Jun 7 · 5 min read

Agent Frameworks Compared: A Technical Feature-by-Feature Guide

An agent framework is the harness that orchestrates a model's reasoning, tool calls, state, and multi-agent coordination. This is Part 3 of The Token Scarcity Playbook — a technical comparison of the major frameworks built so you can score them against your own needs. It centers on a decision matrix rated on the criteria that actually drive selection — ease of use, multi-agent orchestration, model-agnostic capability, AWS-service and Amazon Bedrock AgentCore integration (incl

TomT

May 23 · 10 min read

Making Applications Model-Agnostic by Design

A model-agnostic application is one you can move from one model to another — to capture a cheaper option or route per task — without rewriting logic or shipping silent regressions. Routing (Part 1) only pays off if your system survives the swaps. This is Part 2 of The Token Scarcity Playbook. "The schema is the contract; the model is an implementation detail." Table of Contents Why model swaps break things Designing for agnosticism Grouping models into tiers The benchmarks th

TomT

May 7 · 6 min read

How Model Routing Works: Techniques, Players, and Frameworks

Model routing is the practice of sending each request to the cheapest model that can actually handle it, instead of defaulting every task to the flagship. It is the single largest cost lever in the token scarcity era — and, done well, it improves quality and cost at the same time. This is Part 1 of The Token Scarcity Playbook. "The insight isn't that the cheap model beat the flagship. It's that routing beat brute force." Table of Contents The four approaches to routing Cascad

TomT

Apr 23 · 8 min read

The Token Scarcity Era: From AI Subsidy to the New Constraint

The token scarcity era is the moment the economics of building with AI inverted — when "use the biggest model for everything" stopped being clever and started being expensive. This is Part 0 of The Token Scarcity Playbook: why every team shipping AI in 2026 is suddenly counting tokens, and the tactics that follow. "Per-token pricing is the rate. Tokens-to-completion is the invoice." Table of Contents From subsidized inference to metered billing Demand growth outpacing compute

TomT

Apr 1 · 5 min read

AI Infrastructure Boom

How Energy, Memory, and Compute Are Converging Into the New Industrial Revolution AI is often described as “software,” but that framing hides the most important truth about this moment: modern AI behaves like an industrial system, not a digital product. Intelligence is no longer compiled once and distributed cheaply forever. It is manufactured continuously, in real time, using enormous amounts of electricity, silicon, memory, and physical infrastructure. The scale of this b

TomT

Jan 118 min read

Graph RAG: Knowledge Graphs for Multi-Hop Reasoning

Graph RAG - The RAG technique that uses knowledge graphs to enable multi-hop reasoning across entity relationships. This article explores how Graph RAG solves relational queries that traditional vector search cannot handle, when to use it, and how to implement it with Neo4j and other graph databases. For a comprehensive comparison of RAG frameworks including Graph RAG, see this research analysis. Key Topics: The multi-hop reasoning problem in traditional RAG How knowledge

TomT

Nov 25, 2025 · 16 min read

Contextual RAG: Anthropic's 67% Breakthrough for High-Stakes Accuracy

Context: Contextual RAG - Anthropic's breakthrough technique that reduces retrieval failures by 67% through LLM-generated context augmentation. This article explores how Contextual RAG solves the ambiguous chunk problem, when to use it for high-stakes applications, and how to implement it for legal, medical, and financial use cases. For a comprehensive comparison of RAG frameworks including Contextual RAG, see this research analysis. Key Topics: The ambiguous chunk problem

TomT

Nov 18, 2025 · 14 min read

Hybrid RAG: The Production Standard for Enterprise Search

Context: Hybrid RAG - The production-standard RAG technique that combines keyword search (BM25) with vector similarity search. This article explores why Hybrid RAG has become the de facto standard for enterprise deployments, how it works, and when it delivers the best results. For a comprehensive comparison of RAG frameworks including Hybrid RAG, see this research analysis. Key Topics: Why Hybrid RAG: combining keywords and semantics BM25 keyword search fundamentals Recip

TomT

Nov 11, 2025 · 16 min read

Naive RAG: The Foundation of Retrieval-Augmented Generation

Context: Naive RAG - The foundational RAG technique that combines vector similarity search with LLM generation. This article explores how Naive RAG works, when to use it, real-world applications, and why it remains the starting point for most RAG implementations despite its limitations. For a comprehensive comparison of RAG frameworks including Naive RAG, see this research analysis. Key Topics: Vector similarity search fundamentals Embedding models and vector databases Retr

TomT

Nov 4, 2025 · 12 min read

What Is RAG? Why Retrieval-Augmented Generation Is Transforming AI Applications

Context: What Is RAG? - A comprehensive introduction to Retrieval-Augmented Generation (RAG) for technical practitioners new to the concept. This article explains why RAG exists, how it solves fundamental LLM limitations, and why it's become essential for production AI applications. Key Topics: The fundamental problem: LLM knowledge limitations What RAG is and how it works Why RAG matters: real-world impact RAG vs. fine-tuning: when to use each The RAG landscape: from simple

TomT

Oct 28, 2025 · 15 min read

AI Innovations in Healthcare: Reimagining Care from Digital Front Door to Drug Discovery

Every few decades, technology reshapes the foundation of healthcare. We saw it when electronic medical records replaced paper charts and again when mRNA redefined how we think about vaccines. Today, that seismic shift is being driven by artificial intelligence. Across clinics, labs, and trial sites, AI isn’t just crunching data — it’s rewiring how medicine is practiced, how drugs are developed, and how patients experience care. In the last year alone, over 22% of healthcare o

TomT

Oct 21, 2025 · 12 min read

Building Your First MLOps Pipeline: A Practical Step-by-Step Guide

"The first pipeline is never perfect. But a working pipeline you can improve beats a perfect design you'll never finish. Start simple, automate relentlessly, and iterate based on what breaks." Table of Contents The Weekend That Changed Everything The Six-Step Pipeline Architecture Step 1: Data Ingestion & Validation Step 2: Data Preprocessing & Feature Engineering Step 3: Model Training & Experiment Tracking Step 4: Model Evaluation & Validation Step 5: Model Deployment & Ser

TomT

Sep 5, 2025 · 16 min read

MLOps Fundamentals: Understanding the Complete Model Development Lifecycle

"The difference between a machine learning experiment and a production ML system is like the difference between a recipe scribbled on a napkin and a commercial kitchen that serves 10,000 meals a day. One is creative chaos; the other is systematic excellence." Table of Contents The $50 Million Jupyter Notebook What Is MLOps, Really? The Model Development Lifecycle: Six Phases That Matter Why Traditional DevOps Isn't Enough The Three Pillars: CI/CD/CT Governance: Not a Checkbox

TomT

Aug 26, 2025 · 20 min read