Agent Frameworks Compared: A Technical Feature-by-Feature Guide
- TomT
- May 23
- 10 min read
An agent framework is the harness that orchestrates a model's reasoning, tool calls, state, and multi-agent coordination. This is Part 3 of The Token Scarcity Playbook — a technical comparison of the major frameworks built so you can score them against your own needs. It centers on a decision matrix rated on the criteria that actually drive selection — ease of use, multi-agent orchestration, model-agnostic capability, AWS-service and Amazon Bedrock AgentCore integration (including AgentCore Memory), latency, and observability so you can weight the rows that matter to you.
The frameworks covered: LangGraph, LangChain, CrewAI, AWS Strands, Google ADK, LangChain deepagents, and Microsoft Agent Framework (the successor to AutoGen and Semantic Kernel).
Table of Contents
What an agent framework does — the jobs to evaluate
An agent framework is far more than a router; it bundles the capabilities you would otherwise build yourself. These are the value areas to weigh when choosing — each is a row or a section later in this guide:
Orchestration & control flow — the core job, in two distinct shapes: horizontal routing (dispatch many diverse requests to the right tool or agent) and vertical, long-horizon execution (drive one complex task through planning, sub-agents, and many sequential tool calls). Most frameworks favor one; the matrix shows which. The field is converging on explicit, graph-based control (see Convergence trends).
State management & persistence — durable execution, checkpointing, and resume-after-failure, so long-running agents survive crashes and support rollback and human-in-the-loop pauses.
Memory & context management — short-term/conversational memory, cross-session long-term memory, summarization, and context-window control. This is the difference between an agent that remembers and one that forgets every turn; it's also the area most tied to AWS AgentCore (detailed in Context and memory management below).
Tool integration — function calling, OpenAPI, and MCP, with automatic schema generation and tool-access governance.
Multi-agent coordination — supervisor, hierarchical, swarm, and agent-as-tool patterns, plus cross-framework interoperability via the A2A protocol.
Model abstraction (provider-agnosticism) — a unified interface so you can route per task and swap models without rewriting logic (the subject of Part 2).
Human-in-the-loop & guardrails — approval gates, steering/interrupts, and policy enforcement before a tool runs.
Observability & evaluation — step-level tracing, replay debugging, and built-in evaluation harnesses.
Streaming & multimodality — token streaming and, increasingly, bidirectional audio/image/video.
Deployment & runtime hosting — where the agent actually runs, increasingly decoupled from the framework via managed runtimes
The decision matrix scores the frameworks on the subset of these that most often drives selection; the capability reference and the memory section give the detail.
Adoption and usage data (primary sources, 2026-06-08)
Framework | GitHub stars | Forks | PyPI downloads / month | First released | License |
LangChain | 22,989 | 2022 | MIT | ||
AutoGen (→ MS Agent Framework) | 8,874 | 1.4M (autogen-agentchat) | 2023 | MIT | |
CrewAI | 7,424 | 2023 | MIT | ||
LlamaIndex | 7,524 | 2022 | MIT | ||
LangGraph | 5,745 | 2023 | MIT | ||
Google ADK (adk-python) | 3,533 | 2025 | Apache-2.0 | ||
AWS Strands | 871 | 2025 | Apache-2.0 |
Development activity (how actively maintained)
Release cadence and recent commit volume are the clearest live signals of how actively a project is being developed. Captured from the GitHub API on 2026-06-09:
Framework | Latest release | Last commit | Commits (last 12 wk) | Releases (last 90 d) |
python-1.8.0 (2026-06-04) | 2026-06-08 | 515 | 30 | |
v2.2.0 (2026-06-04) | 2026-06-09 | 444 | 24 | |
v1.42.0 (2026-06-01) | 2026-06-08 | 439 | 16 | |
1.14.6 (2026-05-28) | 2026-06-09 | 408 | 55 | |
1.2.4 (2026-06-02) | 2026-06-07 | 317 | 69 | |
v0.14.22 (2026-05-14) | 2026-06-04 | 144 | 6 | |
AutoGen (legacy) | python-v0.7.5 (2025-09-30) | 2026-04-15 | 2 | 0 |
Decision matrix — score against your needs
Legend: ● strong · ◐ moderate · ○ basic/weak · — none. Ratings are a qualitative read of each project's official documentation and documented integrations (2026-06-08). For every row, ● is better — including "low framework overhead," where ● means less overhead. Latency is directional: in practice the model and task dominate end-to-end latency far more than framework overhead does.
Selection criterion | ||||||
Ease of use (authoring) | ◐ | ● | ● | ◐ | ◐ | ◐ |
Multi-agent orchestration | ● | ● | ● | ● | ● | ◐ |
Explicit control / auditability | ● | ◐ | ◐ | ● | ● | ○ |
Model-agnostic capability | ● | ● | ● | ◐ | ◐ | ● |
Integration with AWS services | ◐ | ◐ | ● | ○ | ○ | ◐ |
AgentCore Runtime support | ● | ● | ● | ● | ◐ | ● |
Context / memory management | ● | ● | ● | ● | ◐ | ● |
AgentCore Memory integration | ◐ | ◐ | ● | ◐ | ◐ | ◐ |
Active development (commits/12 wk) | ● 317 | ● 408 | ● 439 | ● 444 | ● 515 | ◐ 144 |
Ease of use with AgentCore | ◐ | ◐ | ● | ◐ | ○ | ◐ |
Low framework overhead (latency) | ● | ○ | ● | ◐ | ◐ | ◐ |
Observability | ● | ◐ | ● | ◐ | ● | ◐ |
Maturity / adoption | ● | ● | ◐ | ◐ | ◐ | ● |
How to read it: weight the rows that matter for your situation and compare columns. The AWS/AgentCore rows are the sharpest differentiators:
AgentCore Runtime support — AgentCore is framework-agnostic and officially documents CrewAI, LangGraph, LlamaIndex, Google ADK, OpenAI Agents SDK, and AWS Strands. Microsoft Agent Framework can run as custom code but isn't AWS-documented, hence ◐.
AgentCore Memory integration — only Strands ships a native helper, the AgentCoreMemorySessionManager, wiring short- and long-term AgentCore Memory through its hook system (AWS — Strands SDK Memory; Strands docs). Every other framework can use AgentCore Memory, but through the MemoryClient API directly (◐) rather than a built-in integration.
Integration with AWS services — Strands is AWS-native (Bedrock, AgentCore, Lambda/Fargate/EKS) (AWS Prescriptive Guidance); LangGraph/CrewAI/LlamaIndex integrate well via Bedrock + the AgentCore SDK; ADK and Microsoft Agent Framework are GCP- and Azure-native respectively.
Model-agnostic capability — LangGraph, CrewAI, Strands, and LlamaIndex route to any provider; ADK and Microsoft Agent Framework support multiple providers but lean toward Gemini and Azure OpenAI.
Capability reference (from official docs)
Dimension | |||||
Orchestration model | Low-level graph (StateGraph); explicit nodes + conditional edges | Role-based Crews + event-driven Flows | LLM agents + Workflow agents (Sequential/Parallel/Loop) + Graph Workflows (ADK 2.0) | Model-driven agent loop + multi-agent primitives | Agents + graph Workflows (type-safe routing) |
Control style | Explicit, deterministic | Higher-level abstraction (agent personas) | Model-driven + deterministic graphs (2.0) | Autonomous (model decides the loop) | Explicit workflows alongside autonomous agents |
State / persistence / HITL | Durable execution, checkpointing, human-in-the-loop, short + long-term memory | Flow-level state & event control | Context mgmt: auto filtering, summarization, lazy-loading; sessions | Conversation managers (Summarizing, SlidingWindow); session state | Session state, checkpointing, human-in-the-loop |
Multi-agent primitives | Supervisor / graph patterns | Crews (teams); sequential & hierarchical process | Workflow agents; hierarchical delegation; A2A | Agent-as-tool, Swarm, graph, workflow | Multi-agent workflows with type-safe routing |
Long-horizon / context | via deepagents layer (see below) | Token-optimized execution | Auto summarization + artifact lazy-loading | Summarizing conversation manager | Context providers (memory) |
Tools & MCP | LangChain tools + langchain-mcp-adapters | Flexible tools; MCP | Function / OpenAPI / MCP tools; Google Search grounding | Native MCP (MCPClient); @tool decorators | Tools + hosted MCP servers |
Model providers / lock-in | Any (LangChain / LiteLLM) — none | Any (LiteLLM) — none | Gemini-native + LiteLLM + Anthropic — GCP-leaning | Any model, any cloud — none | Foundry / Azure OpenAI / OpenAI / Anthropic / Ollama — Azure-leaning |
Streaming / multimodal | Streaming | — | Gemini Live: bidirectional audio/image/video | Streaming | Streaming |
Observability | LangSmith (deep tracing, replay) | Third-party | adk eval + Cloud Trace | OpenTelemetry + hooks | Telemetry (OpenTelemetry) |
Deployment | LangGraph Platform; any runtime | CrewAI Cloud or self-host | Cloud Run / GKE / Vertex Agent Engine | AgentCore / Lambda / Fargate / EKS / Docker | Azure or self-host |
Cross-framework interop | Limited | — | A2A native (150+ orgs) | Runs on AgentCore beside other frameworks | — |
Languages | Python (+ JS) | Python | Python, TypeScript, Go, Java, Kotlin | Python, TypeScript | .NET, Python |
License | MIT | MIT | Apache-2.0 | Apache-2.0 | MIT |
LangChain is the foundational library beneath much of this — the broad, batteries-included ecosystem (138.8K stars) from which LangGraph emerged; it's the fastest way to prototype and integrate, at the cost of heavier default token and memory overhead. deepagents is LangChain's long-horizon harness — write_todos planning, filesystem context-offload, sub-agent spawning, and auto-summarization layered on the LangGraph runtime — purpose-built for vertical single-task depth rather than horizontal routing. LlamaIndex remains the strongest option when the workload is RAG- and data-connector-heavy.
Context and memory management
Memory is where these frameworks differ in kind, not just degree — and it's the area most tied to AWS AgentCore.
LangGraph — memory is its core competency. Checkpointers save a state snapshot at every super-step into threads (short-term/conversational memory), with get_state/update_state and tunable durability modes; the Store interface adds cross-thread long-term memory with semantic search over embeddings. (persistence & memory docs)
CrewAI — a unified Memory class (it consolidated short-term, long-term, entity, and external memory into one), backed by LanceDB, with LLM-inferred scope/importance and adaptive recall combining semantic similarity, recency, and importance scores. (memory docs)
Google ADK — Session/State for short-term context, plus a pluggable MemoryService: InMemoryMemoryService for prototyping or the managed Vertex AI Memory Bank (VertexAiMemoryBankService), which uses Gemini to extract durable memories; retrieval via PreloadMemoryTool/LoadMemoryTool. (sessions & memory docs)
AWS Strands — built-in conversation managers (SummarizingConversationManager, SlidingWindowConversationManager) for context-window control, and — uniquely — a native Amazon Bedrock AgentCore Memory integration via the AgentCoreMemorySessionManager, wiring short- and long-term AgentCore Memory through the hook system. (Strands AgentCore Memory · AWS docs)
Microsoft Agent Framework — context providers supply agent memory alongside session-based state management. (overview)
LlamaIndex — strongest where memory is retrieval: chat memory buffers layered on its indexing/RAG core. (docs)
Best at context/memory generally: LangGraph (first-class state + cross-thread Store) and Google ADK (managed Vertex AI Memory Bank) lead for durable, searchable long-term memory; CrewAI's unified semantic memory is the easiest to adopt.
Native AgentCore Memory: AWS Strands is the only framework with a first-class AgentCore Memory integration (AgentCoreMemorySessionManager). Every other framework can use AgentCore Memory — it's a managed AWS service callable through the MemoryClient API, but without a built-in helper, so you wire it up yourself. If AgentCore Memory is central to your design, Strands is the path of least resistance.
Use-case fit
If you need… | Best fit | Why |
Explicit, auditable control; HITL approval gates; compliance trails | LangGraph | Low-level graph with durable state, checkpointing, and named routing nodes you can test and log |
Fast role-based multi-agent prototyping | CrewAI | Crews + Flows let you define agent personas and event-driven control without graph theory |
One long-horizon task (deep research, large code changes) | deepagents (on LangGraph) | Todos, filesystem offload, sub-agents, and summarization for vertical depth |
Fastest path on AWS / Bedrock | AWS Strands | AWS-native, native MCP, and one-step AgentCore/Lambda/Fargate deployment |
GCP + Gemini multimodal, or a multi-framework estate | Google ADK | Gemini Live (bidi audio/video), A2A interop (150+ orgs), Vertex Agent Engine |
.NET / Azure enterprise environments | Microsoft Agent Framework | First-class .NET + Python, Azure integration; AutoGen + Semantic Kernel successor |
Broad integrations or RAG-heavy pipelines | LangChain / LlamaIndex | Largest connector and retrieval ecosystems |
Per-framework assessment
LangGraph — the control-first choice and the adoption leader on production pull (56.1M downloads/month). Its low-level StateGraph, durable execution, and checkpointing make routing logic explicit, testable, and auditable; you pay for that in boilerplate and a steeper curve. Best where correctness, rollback, and human-in-the-loop matter more than speed of authoring. (docs)
CrewAI — the velocity choice (53.1K stars). Crews model teams of role-playing agents; Flows add event-driven state and control. It's standalone and token-conscious, and gets a working multi-agent system up quickly. The trade-off is less explicit low-level control than a graph gives you. (docs)
Google ADK — the most feature-complete on multimodality and interoperability (20.0K stars, 26.0M downloads/month, Apache-2.0). ADK 2.0's graph runtime closed much of the deterministic-control gap with LangGraph; Gemini Live and the A2A protocol are genuine differentiators. Strongest for GCP shops and cross-framework estates; leans toward Gemini/GCP. (docs)
AWS Strands — the AWS-native choice (6.1K stars but 14.8M downloads/month, reflecting AWS-backed production use). A model-driven loop with native MCP, rich multi-agent primitives (swarm, graph, agent-as-tool), OpenTelemetry, and deployment straight to AgentCore/Lambda/Fargate. The trade-off is less explicit routing transparency than a graph framework. (docs)
Microsoft Agent Framework — the Azure/.NET choice and the direct successor to AutoGen + Semantic Kernel (GA, Feb 2026). It pairs simple agent abstractions with graph-based workflows (type-safe routing, checkpointing, HITL), MCP, and first-class .NET + Python. Best where the estate is Azure-centric. (docs)
Convergence trends — where the field is heading
The frameworks are converging faster than their branding suggests. Five trends stand out:
Explicit, deterministic graph control is becoming universal. Early frameworks were model-driven — the LLM decided the loop. For production reliability and auditability, everyone is now wrapping that autonomy in deterministic, graph-based orchestration. The clearest example is Google ADK: it began hierarchical and model-driven, then shipped Graph Workflows in 2.0 (2026-05-19) — a graph execution engine with routing, fan-out/fan-in, loops, retries, and explicit state, pitched as "weave deterministic code with adaptive AI reasoning" (ADK docs · ADK 2.0 release). That is a direct convergence on the StateGraph + conditional-edges model LangGraph pioneered. Microsoft Agent Framework did the same with graph Workflows (type-safe routing, checkpointing), and CrewAI added event-driven Flows beside its autonomous Crews. The field has effectively agreed that production agents need explicit control around autonomous reasoning — LangGraph's original thesis.
MCP is the universal tool protocol. Every framework here speaks Model Context Protocol, so tools are increasingly portable across frameworks (ADK MCP tools · Strands MCP · Microsoft Agent Framework MCP).
A2A is standardizing cross-framework interop — 150+ organizations, letting agents built in different frameworks call one another.
Hosting is decoupling from the framework. Managed runtimes — AWS AgentCore, Vertex AI Agent Engine, and Azure — run any framework, so framework choice and deployment target are now independent decisions.
Model-agnostic is the default. LiteLLM (or equivalent) is table stakes; single-provider lock-in is now the exception.
The practical consequence: frameworks are differentiating less on core orchestration capability — which is converging on the LangGraph-style explicit graph — and more on ecosystem, cloud-nativeness, and ergonomics. Increasingly you pick for your cloud and your team's style, not for a capability only one framework has.
Decision summary
Compliance / explicit control / AWS-agnostic → LangGraph (+ deepagents for long single tasks).
Rapid multi-agent prototyping → CrewAI.
AWS / Bedrock → AWS Strands.
GCP / Gemini multimodal / A2A interop → Google ADK.
Azure / .NET → Microsoft Agent Framework.
RAG / broad integrations → LangChain or LlamaIndex.
One framework-independent point: all of these run on managed agent runtimes such as AWS AgentCore, so the framework and the hosting are separate decisions — which is what Part 4 assembles into a production system.
References
Official documentation
AWS AgentCore integration
Context & memory documentation
14. CrewAI — memory
Usage & development-activity data (primary)

