Agent Frameworks Compared: A Technical Feature-by-Feature Guide

TomT
May 23
10 min read

An agent framework is the harness that orchestrates a model's reasoning, tool calls, state, and multi-agent coordination. This is Part 3 of The Token Scarcity Playbook — a technical comparison of the major frameworks built so you can score them against your own needs. It centers on a decision matrix rated on the criteria that actually drive selection — ease of use, multi-agent orchestration, model-agnostic capability, AWS-service and Amazon Bedrock AgentCore integration (including AgentCore Memory), latency, and observability so you can weight the rows that matter to you.

The frameworks covered: LangGraph, LangChain, CrewAI, AWS Strands, Google ADK, LangChain deepagents, and Microsoft Agent Framework (the successor to AutoGen and Semantic Kernel).

What an agent framework does — the jobs to evaluate
Adoption and usage data
Development activity
Decision matrix — score against your needs
Capability reference
Context and memory management
Use-case fit
Per-framework assessment
Convergence trends — where the field is heading
Decision summary
References

What an agent framework does — the jobs to evaluate

An agent framework is far more than a router; it bundles the capabilities you would otherwise build yourself. These are the value areas to weigh when choosing — each is a row or a section later in this guide:

Orchestration & control flow — the core job, in two distinct shapes: horizontal routing (dispatch many diverse requests to the right tool or agent) and vertical, long-horizon execution (drive one complex task through planning, sub-agents, and many sequential tool calls). Most frameworks favor one; the matrix shows which. The field is converging on explicit, graph-based control (see Convergence trends).
State management & persistence — durable execution, checkpointing, and resume-after-failure, so long-running agents survive crashes and support rollback and human-in-the-loop pauses.
Memory & context management — short-term/conversational memory, cross-session long-term memory, summarization, and context-window control. This is the difference between an agent that remembers and one that forgets every turn; it's also the area most tied to AWS AgentCore (detailed in Context and memory management below).
Tool integration — function calling, OpenAPI, and MCP, with automatic schema generation and tool-access governance.
Multi-agent coordination — supervisor, hierarchical, swarm, and agent-as-tool patterns, plus cross-framework interoperability via the A2A protocol.
Model abstraction (provider-agnosticism) — a unified interface so you can route per task and swap models without rewriting logic (the subject of Part 2).
Human-in-the-loop & guardrails — approval gates, steering/interrupts, and policy enforcement before a tool runs.
Observability & evaluation — step-level tracing, replay debugging, and built-in evaluation harnesses.
Streaming & multimodality — token streaming and, increasingly, bidirectional audio/image/video.
Deployment & runtime hosting — where the agent actually runs, increasingly decoupled from the framework via managed runtimes

The decision matrix scores the frameworks on the subset of these that most often drives selection; the capability reference and the memory section give the detail.

Adoption and usage data (primary sources, 2026-06-08)

Framework	GitHub stars	Forks	PyPI downloads / month	First released	License
LangChain	138,831	22,989	301.6M	2022	MIT
AutoGen (→ MS Agent Framework)	58,782	8,874	1.4M (autogen-agentchat)	2023	MIT
CrewAI	53,089	7,424	14.9M	2023	MIT
LlamaIndex	50,012	7,524	11.8M	2022	MIT
LangGraph	34,197	5,745	56.1M	2023	MIT
Google ADK (adk-python)	20,028	3,533	26.0M	2025	Apache-2.0
AWS Strands	6,064	871	14.8M	2025	Apache-2.0

Development activity (how actively maintained)

Release cadence and recent commit volume are the clearest live signals of how actively a project is being developed. Captured from the GitHub API on 2026-06-09:

Framework	Latest release	Last commit	Commits (last 12 wk)	Releases (last 90 d)
Microsoft Agent Framework	python-1.8.0 (2026-06-04)	2026-06-08	515	30
Google ADK	v2.2.0 (2026-06-04)	2026-06-09	444	24
AWS Strands	v1.42.0 (2026-06-01)	2026-06-08	439	16
CrewAI	1.14.6 (2026-05-28)	2026-06-09	408	55
LangGraph	1.2.4 (2026-06-02)	2026-06-07	317	69
LlamaIndex	v0.14.22 (2026-05-14)	2026-06-04	144	6
AutoGen (legacy)	python-v0.7.5 (2025-09-30)	2026-04-15	2	0

Decision matrix — score against your needs

Legend: ● strong · ◐ moderate · ○ basic/weak · — none. Ratings are a qualitative read of each project's official documentation and documented integrations (2026-06-08). For every row, ● is better — including "low framework overhead," where ● means less overhead. Latency is directional: in practice the model and task dominate end-to-end latency far more than framework overhead does.

Selection criterion	LangGraph	CrewAI	AWS Strands	Google ADK	MS Agent Framework	LlamaIndex
Ease of use (authoring)	◐	●	●	◐	◐	◐
Multi-agent orchestration	●	●	●	●	●	◐
Explicit control / auditability	●	◐	◐	●	●	○
Model-agnostic capability	●	●	●	◐	◐	●
Integration with AWS services	◐	◐	●	○	○	◐
AgentCore Runtime support	●	●	●	●	◐	●
Context / memory management	●	●	●	●	◐	●
AgentCore Memory integration	◐	◐	●	◐	◐	◐
Active development (commits/12 wk)	● 317	● 408	● 439	● 444	● 515	◐ 144
Ease of use with AgentCore	◐	◐	●	◐	○	◐
Low framework overhead (latency)	●	○	●	◐	◐	◐
Observability	●	◐	●	◐	●	◐
Maturity / adoption	●	●	◐	◐	◐	●

How to read it: weight the rows that matter for your situation and compare columns. The AWS/AgentCore rows are the sharpest differentiators:

AgentCore Runtime support — AgentCore is framework-agnostic and officially documents CrewAI, LangGraph, LlamaIndex, Google ADK, OpenAI Agents SDK, and AWS Strands. Microsoft Agent Framework can run as custom code but isn't AWS-documented, hence ◐.
AgentCore Memory integration — only Strands ships a native helper, the AgentCoreMemorySessionManager, wiring short- and long-term AgentCore Memory through its hook system (AWS — Strands SDK Memory; Strands docs). Every other framework can use AgentCore Memory, but through the MemoryClient API directly (◐) rather than a built-in integration.
Integration with AWS services — Strands is AWS-native (Bedrock, AgentCore, Lambda/Fargate/EKS) (AWS Prescriptive Guidance); LangGraph/CrewAI/LlamaIndex integrate well via Bedrock + the AgentCore SDK; ADK and Microsoft Agent Framework are GCP- and Azure-native respectively.
Model-agnostic capability — LangGraph, CrewAI, Strands, and LlamaIndex route to any provider; ADK and Microsoft Agent Framework support multiple providers but lean toward Gemini and Azure OpenAI.

Capability reference (from official docs)

Dimension	LangGraph	CrewAI	Google ADK	AWS Strands	MS Agent Framework
Orchestration model	Low-level graph (StateGraph); explicit nodes + conditional edges	Role-based Crews + event-driven Flows	LLM agents + Workflow agents (Sequential/Parallel/Loop) + Graph Workflows (ADK 2.0)	Model-driven agent loop + multi-agent primitives	Agents + graph Workflows (type-safe routing)
Control style	Explicit, deterministic	Higher-level abstraction (agent personas)	Model-driven + deterministic graphs (2.0)	Autonomous (model decides the loop)	Explicit workflows alongside autonomous agents
State / persistence / HITL	Durable execution, checkpointing, human-in-the-loop, short + long-term memory	Flow-level state & event control	Context mgmt: auto filtering, summarization, lazy-loading; sessions	Conversation managers (Summarizing, SlidingWindow); session state	Session state, checkpointing, human-in-the-loop
Multi-agent primitives	Supervisor / graph patterns	Crews (teams); sequential & hierarchical process	Workflow agents; hierarchical delegation; A2A	Agent-as-tool, Swarm, graph, workflow	Multi-agent workflows with type-safe routing
Long-horizon / context	via deepagents layer (see below)	Token-optimized execution	Auto summarization + artifact lazy-loading	Summarizing conversation manager	Context providers (memory)
Tools & MCP	LangChain tools + langchain-mcp-adapters	Flexible tools; MCP	Function / OpenAPI / MCP tools; Google Search grounding	Native MCP (MCPClient); @tool decorators	Tools + hosted MCP servers
Model providers / lock-in	Any (LangChain / LiteLLM) — none	Any (LiteLLM) — none	Gemini-native + LiteLLM + Anthropic — GCP-leaning	Any model, any cloud — none	Foundry / Azure OpenAI / OpenAI / Anthropic / Ollama — Azure-leaning
Streaming / multimodal	Streaming	—	Gemini Live: bidirectional audio/image/video	Streaming	Streaming
Observability	LangSmith (deep tracing, replay)	Third-party	adk eval + Cloud Trace	OpenTelemetry + hooks	Telemetry (OpenTelemetry)
Deployment	LangGraph Platform; any runtime	CrewAI Cloud or self-host	Cloud Run / GKE / Vertex Agent Engine	AgentCore / Lambda / Fargate / EKS / Docker	Azure or self-host
Cross-framework interop	Limited	—	A2A native (150+ orgs)	Runs on AgentCore beside other frameworks	—
Languages	Python (+ JS)	Python	Python, TypeScript, Go, Java, Kotlin	Python, TypeScript	.NET, Python
License	MIT	MIT	Apache-2.0	Apache-2.0	MIT

LangChain is the foundational library beneath much of this — the broad, batteries-included ecosystem (138.8K stars) from which LangGraph emerged; it's the fastest way to prototype and integrate, at the cost of heavier default token and memory overhead. deepagents is LangChain's long-horizon harness — write_todos planning, filesystem context-offload, sub-agent spawning, and auto-summarization layered on the LangGraph runtime — purpose-built for vertical single-task depth rather than horizontal routing. LlamaIndex remains the strongest option when the workload is RAG- and data-connector-heavy.

Context and memory management

Memory is where these frameworks differ in kind, not just degree — and it's the area most tied to AWS AgentCore.

LangGraph — memory is its core competency. Checkpointers save a state snapshot at every super-step into threads (short-term/conversational memory), with get_state/update_state and tunable durability modes; the Store interface adds cross-thread long-term memory with semantic search over embeddings. (persistence & memory docs)
CrewAI — a unified Memory class (it consolidated short-term, long-term, entity, and external memory into one), backed by LanceDB, with LLM-inferred scope/importance and adaptive recall combining semantic similarity, recency, and importance scores. (memory docs)
Google ADK — Session/State for short-term context, plus a pluggable MemoryService: InMemoryMemoryService for prototyping or the managed Vertex AI Memory Bank (VertexAiMemoryBankService), which uses Gemini to extract durable memories; retrieval via PreloadMemoryTool/LoadMemoryTool. (sessions & memory docs)
AWS Strands — built-in conversation managers (SummarizingConversationManager, SlidingWindowConversationManager) for context-window control, and — uniquely — a native Amazon Bedrock AgentCore Memory integration via the AgentCoreMemorySessionManager, wiring short- and long-term AgentCore Memory through the hook system. (Strands AgentCore Memory · AWS docs)
Microsoft Agent Framework — context providers supply agent memory alongside session-based state management. (overview)
LlamaIndex — strongest where memory is retrieval: chat memory buffers layered on its indexing/RAG core. (docs)

Best at context/memory generally: LangGraph (first-class state + cross-thread Store) and Google ADK (managed Vertex AI Memory Bank) lead for durable, searchable long-term memory; CrewAI's unified semantic memory is the easiest to adopt.

Native AgentCore Memory: AWS Strands is the only framework with a first-class AgentCore Memory integration (AgentCoreMemorySessionManager). Every other framework can use AgentCore Memory — it's a managed AWS service callable through the MemoryClient API, but without a built-in helper, so you wire it up yourself. If AgentCore Memory is central to your design, Strands is the path of least resistance.

Use-case fit

If you need…	Best fit	Why
Explicit, auditable control; HITL approval gates; compliance trails	LangGraph	Low-level graph with durable state, checkpointing, and named routing nodes you can test and log
Fast role-based multi-agent prototyping	CrewAI	Crews + Flows let you define agent personas and event-driven control without graph theory
One long-horizon task (deep research, large code changes)	deepagents (on LangGraph)	Todos, filesystem offload, sub-agents, and summarization for vertical depth
Fastest path on AWS / Bedrock	AWS Strands	AWS-native, native MCP, and one-step AgentCore/Lambda/Fargate deployment
GCP + Gemini multimodal, or a multi-framework estate	Google ADK	Gemini Live (bidi audio/video), A2A interop (150+ orgs), Vertex Agent Engine
.NET / Azure enterprise environments	Microsoft Agent Framework	First-class .NET + Python, Azure integration; AutoGen + Semantic Kernel successor
Broad integrations or RAG-heavy pipelines	LangChain / LlamaIndex	Largest connector and retrieval ecosystems

Per-framework assessment

LangGraph — the control-first choice and the adoption leader on production pull (56.1M downloads/month). Its low-level StateGraph, durable execution, and checkpointing make routing logic explicit, testable, and auditable; you pay for that in boilerplate and a steeper curve. Best where correctness, rollback, and human-in-the-loop matter more than speed of authoring. (docs)

CrewAI — the velocity choice (53.1K stars). Crews model teams of role-playing agents; Flows add event-driven state and control. It's standalone and token-conscious, and gets a working multi-agent system up quickly. The trade-off is less explicit low-level control than a graph gives you. (docs)

Google ADK — the most feature-complete on multimodality and interoperability (20.0K stars, 26.0M downloads/month, Apache-2.0). ADK 2.0's graph runtime closed much of the deterministic-control gap with LangGraph; Gemini Live and the A2A protocol are genuine differentiators. Strongest for GCP shops and cross-framework estates; leans toward Gemini/GCP. (docs)

AWS Strands — the AWS-native choice (6.1K stars but 14.8M downloads/month, reflecting AWS-backed production use). A model-driven loop with native MCP, rich multi-agent primitives (swarm, graph, agent-as-tool), OpenTelemetry, and deployment straight to AgentCore/Lambda/Fargate. The trade-off is less explicit routing transparency than a graph framework. (docs)

Microsoft Agent Framework — the Azure/.NET choice and the direct successor to AutoGen + Semantic Kernel (GA, Feb 2026). It pairs simple agent abstractions with graph-based workflows (type-safe routing, checkpointing, HITL), MCP, and first-class .NET + Python. Best where the estate is Azure-centric. (docs)

Convergence trends — where the field is heading

The frameworks are converging faster than their branding suggests. Five trends stand out:

Explicit, deterministic graph control is becoming universal. Early frameworks were model-driven — the LLM decided the loop. For production reliability and auditability, everyone is now wrapping that autonomy in deterministic, graph-based orchestration. The clearest example is Google ADK: it began hierarchical and model-driven, then shipped Graph Workflows in 2.0 (2026-05-19) — a graph execution engine with routing, fan-out/fan-in, loops, retries, and explicit state, pitched as "weave deterministic code with adaptive AI reasoning" (ADK docs · ADK 2.0 release). That is a direct convergence on the StateGraph + conditional-edges model LangGraph pioneered. Microsoft Agent Framework did the same with graph Workflows (type-safe routing, checkpointing), and CrewAI added event-driven Flows beside its autonomous Crews. The field has effectively agreed that production agents need explicit control around autonomous reasoning — LangGraph's original thesis.
MCP is the universal tool protocol. Every framework here speaks Model Context Protocol, so tools are increasingly portable across frameworks (ADK MCP tools · Strands MCP · Microsoft Agent Framework MCP).
A2A is standardizing cross-framework interop — 150+ organizations, letting agents built in different frameworks call one another.
Hosting is decoupling from the framework. Managed runtimes — AWS AgentCore, Vertex AI Agent Engine, and Azure — run any framework, so framework choice and deployment target are now independent decisions.
Model-agnostic is the default. LiteLLM (or equivalent) is table stakes; single-provider lock-in is now the exception.

The practical consequence: frameworks are differentiating less on core orchestration capability — which is converging on the LangGraph-style explicit graph — and more on ecosystem, cloud-nativeness, and ergonomics. Increasingly you pick for your cloud and your team's style, not for a capability only one framework has.

Decision summary

Compliance / explicit control / AWS-agnostic → LangGraph (+ deepagents for long single tasks).
Rapid multi-agent prototyping → CrewAI.
AWS / Bedrock → AWS Strands.
GCP / Gemini multimodal / A2A interop → Google ADK.
Azure / .NET → Microsoft Agent Framework.
RAG / broad integrations → LangChain or LlamaIndex.

One framework-independent point: all of these run on managed agent runtimes such as AWS AgentCore, so the framework and the hosting are separate decisions — which is what Part 4 assembles into a production system.

Next → Part 4: Building a Multi-Agent Orchestration System.