top of page

Agent Frameworks Compared: A Technical Feature-by-Feature Guide

  • TomT
  • May 23
  • 10 min read

An agent framework is the harness that orchestrates a model's reasoning, tool calls, state, and multi-agent coordination. This is Part 3 of The Token Scarcity Playbook — a technical comparison of the major frameworks built so you can score them against your own needs. It centers on a decision matrix rated on the criteria that actually drive selection — ease of use, multi-agent orchestration, model-agnostic capability, AWS-service and Amazon Bedrock AgentCore integration (including AgentCore Memory), latency, and observability so you can weight the rows that matter to you.


The frameworks covered: LangGraph, LangChain, CrewAI, AWS Strands, Google ADK, LangChain deepagents, and Microsoft Agent Framework (the successor to AutoGen and Semantic Kernel).


Table of Contents


What an agent framework does — the jobs to evaluate

An agent framework is far more than a router; it bundles the capabilities you would otherwise build yourself. These are the value areas to weigh when choosing — each is a row or a section later in this guide:

  • Orchestration & control flow — the core job, in two distinct shapes: horizontal routing (dispatch many diverse requests to the right tool or agent) and vertical, long-horizon execution (drive one complex task through planning, sub-agents, and many sequential tool calls). Most frameworks favor one; the matrix shows which. The field is converging on explicit, graph-based control (see Convergence trends).

  • State management & persistence — durable execution, checkpointing, and resume-after-failure, so long-running agents survive crashes and support rollback and human-in-the-loop pauses.

  • Memory & context management — short-term/conversational memory, cross-session long-term memory, summarization, and context-window control. This is the difference between an agent that remembers and one that forgets every turn; it's also the area most tied to AWS AgentCore (detailed in Context and memory management below).

  • Tool integration — function calling, OpenAPI, and MCP, with automatic schema generation and tool-access governance.

  • Multi-agent coordination — supervisor, hierarchical, swarm, and agent-as-tool patterns, plus cross-framework interoperability via the A2A protocol.

  • Model abstraction (provider-agnosticism) — a unified interface so you can route per task and swap models without rewriting logic (the subject of Part 2).

  • Human-in-the-loop & guardrails — approval gates, steering/interrupts, and policy enforcement before a tool runs.

  • Observability & evaluation — step-level tracing, replay debugging, and built-in evaluation harnesses.

  • Streaming & multimodality — token streaming and, increasingly, bidirectional audio/image/video.

  • Deployment & runtime hosting — where the agent actually runs, increasingly decoupled from the framework via managed runtimes

The decision matrix scores the frameworks on the subset of these that most often drives selection; the capability reference and the memory section give the detail.


Adoption and usage data (primary sources, 2026-06-08)

Framework

GitHub stars

Forks

PyPI downloads / month

First released

License

LangChain

22,989

2022

MIT

AutoGen (→ MS Agent Framework)

8,874

1.4M (autogen-agentchat)

2023

MIT

CrewAI

7,424

2023

MIT

LlamaIndex

7,524

2022

MIT

LangGraph

5,745

2023

MIT

Google ADK (adk-python)

3,533

2025

Apache-2.0

AWS Strands

871

2025

Apache-2.0


Development activity (how actively maintained)

Release cadence and recent commit volume are the clearest live signals of how actively a project is being developed. Captured from the GitHub API on 2026-06-09:

Framework

Latest release

Last commit

Commits (last 12 wk)

Releases (last 90 d)

python-1.8.0 (2026-06-04)

2026-06-08

515

30

v2.2.0 (2026-06-04)

2026-06-09

444

24

v1.42.0 (2026-06-01)

2026-06-08

439

16

1.14.6 (2026-05-28)

2026-06-09

408

55

1.2.4 (2026-06-02)

2026-06-07

317

69

v0.14.22 (2026-05-14)

2026-06-04

144

6

AutoGen (legacy)

python-v0.7.5 (2025-09-30)

2026-04-15

2

0


Decision matrix — score against your needs

Legend: ● strong · ◐ moderate · ○ basic/weak · — none. Ratings are a qualitative read of each project's official documentation and documented integrations (2026-06-08). For every row, ● is better — including "low framework overhead," where ● means less overhead. Latency is directional: in practice the model and task dominate end-to-end latency far more than framework overhead does.

Selection criterion

Ease of use (authoring)

Multi-agent orchestration

Explicit control / auditability

Model-agnostic capability

Integration with AWS services

AgentCore Runtime support

Context / memory management

AgentCore Memory integration

Active development (commits/12 wk)

● 317

● 408

● 439

● 444

● 515

◐ 144

Ease of use with AgentCore

Low framework overhead (latency)

Observability

Maturity / adoption

How to read it: weight the rows that matter for your situation and compare columns. The AWS/AgentCore rows are the sharpest differentiators:

  • AgentCore Runtime support — AgentCore is framework-agnostic and officially documents CrewAI, LangGraph, LlamaIndex, Google ADK, OpenAI Agents SDK, and AWS Strands. Microsoft Agent Framework can run as custom code but isn't AWS-documented, hence ◐.

  • AgentCore Memory integration — only Strands ships a native helper, the AgentCoreMemorySessionManager, wiring short- and long-term AgentCore Memory through its hook system (AWS — Strands SDK Memory; Strands docs). Every other framework can use AgentCore Memory, but through the MemoryClient API directly (◐) rather than a built-in integration.

  • Integration with AWS services — Strands is AWS-native (Bedrock, AgentCore, Lambda/Fargate/EKS) (AWS Prescriptive Guidance); LangGraph/CrewAI/LlamaIndex integrate well via Bedrock + the AgentCore SDK; ADK and Microsoft Agent Framework are GCP- and Azure-native respectively.

  • Model-agnostic capability — LangGraph, CrewAI, Strands, and LlamaIndex route to any provider; ADK and Microsoft Agent Framework support multiple providers but lean toward Gemini and Azure OpenAI.


Capability reference (from official docs)

Dimension

Orchestration model

Low-level graph (StateGraph); explicit nodes + conditional edges

Role-based Crews + event-driven Flows

LLM agents + Workflow agents (Sequential/Parallel/Loop) + Graph Workflows (ADK 2.0)

Model-driven agent loop + multi-agent primitives

Agents + graph Workflows (type-safe routing)

Control style

Explicit, deterministic

Higher-level abstraction (agent personas)

Model-driven + deterministic graphs (2.0)

Autonomous (model decides the loop)

Explicit workflows alongside autonomous agents

State / persistence / HITL

Durable execution, checkpointing, human-in-the-loop, short + long-term memory

Flow-level state & event control

Context mgmt: auto filtering, summarization, lazy-loading; sessions

Conversation managers (Summarizing, SlidingWindow); session state

Session state, checkpointing, human-in-the-loop

Multi-agent primitives

Supervisor / graph patterns

Crews (teams); sequential & hierarchical process

Workflow agents; hierarchical delegation; A2A

Agent-as-tool, Swarm, graph, workflow

Multi-agent workflows with type-safe routing

Long-horizon / context

via deepagents layer (see below)

Token-optimized execution

Auto summarization + artifact lazy-loading

Summarizing conversation manager

Context providers (memory)

Tools & MCP

LangChain tools + langchain-mcp-adapters

Flexible tools; MCP

Function / OpenAPI / MCP tools; Google Search grounding

Native MCP (MCPClient); @tool decorators

Tools + hosted MCP servers

Model providers / lock-in

Any (LangChain / LiteLLM) — none

Any (LiteLLM) — none

Gemini-native + LiteLLM + Anthropic — GCP-leaning

Any model, any cloud — none

Foundry / Azure OpenAI / OpenAI / Anthropic / Ollama — Azure-leaning

Streaming / multimodal

Streaming

Gemini Live: bidirectional audio/image/video

Streaming

Streaming

Observability

LangSmith (deep tracing, replay)

Third-party

adk eval + Cloud Trace

OpenTelemetry + hooks

Telemetry (OpenTelemetry)

Deployment

LangGraph Platform; any runtime

CrewAI Cloud or self-host

Cloud Run / GKE / Vertex Agent Engine

AgentCore / Lambda / Fargate / EKS / Docker

Azure or self-host

Cross-framework interop

Limited

A2A native (150+ orgs)

Runs on AgentCore beside other frameworks

Languages

Python (+ JS)

Python

Python, TypeScript, Go, Java, Kotlin

Python, TypeScript

.NET, Python

License

MIT

MIT

Apache-2.0

Apache-2.0

MIT

LangChain is the foundational library beneath much of this — the broad, batteries-included ecosystem (138.8K stars) from which LangGraph emerged; it's the fastest way to prototype and integrate, at the cost of heavier default token and memory overhead. deepagents is LangChain's long-horizon harness — write_todos planning, filesystem context-offload, sub-agent spawning, and auto-summarization layered on the LangGraph runtime — purpose-built for vertical single-task depth rather than horizontal routing. LlamaIndex remains the strongest option when the workload is RAG- and data-connector-heavy.


Context and memory management

Memory is where these frameworks differ in kind, not just degree — and it's the area most tied to AWS AgentCore.

  • LangGraph — memory is its core competency. Checkpointers save a state snapshot at every super-step into threads (short-term/conversational memory), with get_state/update_state and tunable durability modes; the Store interface adds cross-thread long-term memory with semantic search over embeddings. (persistence & memory docs)

  • CrewAI — a unified Memory class (it consolidated short-term, long-term, entity, and external memory into one), backed by LanceDB, with LLM-inferred scope/importance and adaptive recall combining semantic similarity, recency, and importance scores. (memory docs)

  • Google ADKSession/State for short-term context, plus a pluggable MemoryService: InMemoryMemoryService for prototyping or the managed Vertex AI Memory Bank (VertexAiMemoryBankService), which uses Gemini to extract durable memories; retrieval via PreloadMemoryTool/LoadMemoryTool. (sessions & memory docs)

  • AWS Strands — built-in conversation managers (SummarizingConversationManager, SlidingWindowConversationManager) for context-window control, and — uniquely — a native Amazon Bedrock AgentCore Memory integration via the AgentCoreMemorySessionManager, wiring short- and long-term AgentCore Memory through the hook system. (Strands AgentCore Memory · AWS docs)

  • Microsoft Agent Frameworkcontext providers supply agent memory alongside session-based state management. (overview)

  • LlamaIndex — strongest where memory is retrieval: chat memory buffers layered on its indexing/RAG core. (docs)

Best at context/memory generally: LangGraph (first-class state + cross-thread Store) and Google ADK (managed Vertex AI Memory Bank) lead for durable, searchable long-term memory; CrewAI's unified semantic memory is the easiest to adopt.

Native AgentCore Memory: AWS Strands is the only framework with a first-class AgentCore Memory integration (AgentCoreMemorySessionManager). Every other framework can use AgentCore Memory — it's a managed AWS service callable through the MemoryClient API, but without a built-in helper, so you wire it up yourself. If AgentCore Memory is central to your design, Strands is the path of least resistance.


Use-case fit

If you need…

Best fit

Why

Explicit, auditable control; HITL approval gates; compliance trails

LangGraph

Low-level graph with durable state, checkpointing, and named routing nodes you can test and log

Fast role-based multi-agent prototyping

CrewAI

Crews + Flows let you define agent personas and event-driven control without graph theory

One long-horizon task (deep research, large code changes)

deepagents (on LangGraph)

Todos, filesystem offload, sub-agents, and summarization for vertical depth

Fastest path on AWS / Bedrock

AWS Strands

AWS-native, native MCP, and one-step AgentCore/Lambda/Fargate deployment

GCP + Gemini multimodal, or a multi-framework estate

Google ADK

Gemini Live (bidi audio/video), A2A interop (150+ orgs), Vertex Agent Engine

.NET / Azure enterprise environments

Microsoft Agent Framework

First-class .NET + Python, Azure integration; AutoGen + Semantic Kernel successor

Broad integrations or RAG-heavy pipelines

LangChain / LlamaIndex

Largest connector and retrieval ecosystems


Per-framework assessment

LangGraph — the control-first choice and the adoption leader on production pull (56.1M downloads/month). Its low-level StateGraph, durable execution, and checkpointing make routing logic explicit, testable, and auditable; you pay for that in boilerplate and a steeper curve. Best where correctness, rollback, and human-in-the-loop matter more than speed of authoring. (docs)

CrewAI — the velocity choice (53.1K stars). Crews model teams of role-playing agents; Flows add event-driven state and control. It's standalone and token-conscious, and gets a working multi-agent system up quickly. The trade-off is less explicit low-level control than a graph gives you. (docs)

Google ADK — the most feature-complete on multimodality and interoperability (20.0K stars, 26.0M downloads/month, Apache-2.0). ADK 2.0's graph runtime closed much of the deterministic-control gap with LangGraph; Gemini Live and the A2A protocol are genuine differentiators. Strongest for GCP shops and cross-framework estates; leans toward Gemini/GCP. (docs)

AWS Strands — the AWS-native choice (6.1K stars but 14.8M downloads/month, reflecting AWS-backed production use). A model-driven loop with native MCP, rich multi-agent primitives (swarm, graph, agent-as-tool), OpenTelemetry, and deployment straight to AgentCore/Lambda/Fargate. The trade-off is less explicit routing transparency than a graph framework. (docs)

Microsoft Agent Framework — the Azure/.NET choice and the direct successor to AutoGen + Semantic Kernel (GA, Feb 2026). It pairs simple agent abstractions with graph-based workflows (type-safe routing, checkpointing, HITL), MCP, and first-class .NET + Python. Best where the estate is Azure-centric. (docs)


The frameworks are converging faster than their branding suggests. Five trends stand out:

  1. Explicit, deterministic graph control is becoming universal. Early frameworks were model-driven — the LLM decided the loop. For production reliability and auditability, everyone is now wrapping that autonomy in deterministic, graph-based orchestration. The clearest example is Google ADK: it began hierarchical and model-driven, then shipped Graph Workflows in 2.0 (2026-05-19) — a graph execution engine with routing, fan-out/fan-in, loops, retries, and explicit state, pitched as "weave deterministic code with adaptive AI reasoning" (ADK docs · ADK 2.0 release). That is a direct convergence on the StateGraph + conditional-edges model LangGraph pioneered. Microsoft Agent Framework did the same with graph Workflows (type-safe routing, checkpointing), and CrewAI added event-driven Flows beside its autonomous Crews. The field has effectively agreed that production agents need explicit control around autonomous reasoning — LangGraph's original thesis.

  2. MCP is the universal tool protocol. Every framework here speaks Model Context Protocol, so tools are increasingly portable across frameworks (ADK MCP tools · Strands MCP · Microsoft Agent Framework MCP).

  3. A2A is standardizing cross-framework interop150+ organizations, letting agents built in different frameworks call one another.

  4. Hosting is decoupling from the framework. Managed runtimes — AWS AgentCore, Vertex AI Agent Engine, and Azure — run any framework, so framework choice and deployment target are now independent decisions.

  5. Model-agnostic is the default. LiteLLM (or equivalent) is table stakes; single-provider lock-in is now the exception.

The practical consequence: frameworks are differentiating less on core orchestration capability — which is converging on the LangGraph-style explicit graph — and more on ecosystem, cloud-nativeness, and ergonomics. Increasingly you pick for your cloud and your team's style, not for a capability only one framework has.


Decision summary

  • Compliance / explicit control / AWS-agnostic → LangGraph (+ deepagents for long single tasks).

  • Rapid multi-agent prototyping → CrewAI.

  • AWS / Bedrock → AWS Strands.

  • GCP / Gemini multimodal / A2A interop → Google ADK.

  • Azure / .NET → Microsoft Agent Framework.

  • RAG / broad integrations → LangChain or LlamaIndex.

One framework-independent point: all of these run on managed agent runtimes such as AWS AgentCore, so the framework and the hosting are separate decisions — which is what Part 4 assembles into a production system.



References

Official documentation

AWS AgentCore integration

Context & memory documentation

Usage & development-activity data (primary)

bottom of page