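MetaGPT Agent: Dynamic Action Mechanism Explained

Overview

This document walks through building a MetaGPT agent that can create actions dynamically. Along the way it covers MetaGPT's React mechanism (run → react → think → act) and shows how to swap the action sequence at runtime.

Assignment goals:

Create an agent that starts with three actions: Print1, Print2, Print3
Run those three actions in order
Once they finish, generate new actions dynamically: Print4, Print5, Print6
Run the new actions in order

What you will learn:

MetaGPT's React loop
State management (state, todo, actions)
Creating and replacing actions dynamically
Python object references and polymorphism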

Core concepts

1. MetaGPT's React mechanism

┌─────────────────────────────────────┐
│            run(message)             │
│  - store the message in memory      │
│  - call react()                     │
└──────────────┬──────────────────────┘
               ↓
┌─────────────────────────────────────┐
│            react()                  │
│  - while True loop                  │
│    - think() → pick the next action │
│    - check whether todo is None     │
│    - act() → run the current action │
└──────────────┬──────────────────────┘
               ↓
┌──────────────┴──────────────────────┐
│                                     │
│  ┌─────────────┐  ┌──────────────┐  │
│  │   think()   │  │    act()     │  │
│  │ assign the  │  │   run the    │  │
│  │ next action │  │ current one  │  │
│  └─────────────┘  └──────────────┘  │
│                                     │
└─────────────────────────────────────┘

2. Key attributes

self.actions      # list of actions [Action1, Action2, ...]
self.states       # list of state labels ['0. Action1', '1. Action2', ...]
self.rc.state     # index of the current state (int)
self.rc.todo      # action to run next (an Action object, or None)

3. State transitions

# What _set_state() does (simplified)
def _set_state(self, state: int):
    self.rc.state = state
    self.rc.todo = self.actions[state] if state >= 0 else None

Meaning of each state value:

state = -1 → nothing to do, todo = None
state = 0 → run the first action, todo = actions[0]
state = 1 → run the second action, todo = actions[1]
and so on...
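
To see the mapping between state and todo in isolation, here is a minimal sketch that does not import MetaGPT. MiniRole and set_state are hypothetical stand-ins that copy only the rule shown above; the real Role keeps these fields on self.rc.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MiniRole:
    """Hypothetical stand-in for a Role; this is not MetaGPT code."""
    actions: List[str] = field(default_factory=list)
    state: int = -1
    todo: Optional[str] = None

    def set_state(self, state: int) -> None:
        # Same rule as above: a negative state means "nothing to do".
        self.state = state
        self.todo = self.actions[state] if state >= 0 else None


role = MiniRole(actions=["Print1", "Print2", "Print3"])
for s in (-1, 0, 1, 2):
    role.set_state(s)
    print(s, role.todo)
```

Because state and todo are written together in one method, they cannot drift apart as long as every change goes through it.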

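Full implementation

Step 1: Define the Action class

from metagpt.actions import Action
from metagpt.logs import logger

class PrintAction(Action):
    """A simple action that prints its content.

    Args:
        name: name of the action
        content: text to print
    """

    name: str = "PrintAction"
    content: str = ""

    async def run(self, *args, **kwargs) -> str:
        """Log the content and return it."""
        logger.info(f"Running {self.name}: {self.content}")
        return self.content

Key points:

Inherits from the Action base class
run() must be async: the role awaits it, and a real action may await an LLM call
Returns the result of the action (a string)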
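
The reason run() is a coroutine becomes clearer when a caller awaits several actions in turn. The sketch below is only a stand-in: EchoAction and run_in_order are hypothetical names, nothing here imports MetaGPT, and asyncio.sleep(0) merely marks the spot where a real action might await an LLM.

```python
import asyncio


class EchoAction:
    """Hypothetical stand-in for an Action subclass."""

    def __init__(self, content: str) -> None:
        self.content = content

    async def run(self) -> str:
        # A real action might await an LLM call at this point.
        await asyncio.sleep(0)
        return self.content


async def run_in_order(actions):
    # Await each coroutine in turn so results come back in list order.
    return [await action.run() for action in actions]


results = asyncio.run(run_in_order([EchoAction("1"), EchoAction("2")]))
print(results)
```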

Step 2: Define the Agent class skeleton

from metagpt.roles.role import Role, RoleReactMode
from metagpt.schema import Message

class SimpleAgent(Role):
    """A simple agent that runs its actions in order.

    Args:
        name: name of the role
        profile: description of the role
    """

    name: str = "SimpleAgent"
    profile: str = "Simple Sequential Agent"

    phase: int = 1  # current phase (1 or 2)

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        # Set up the three phase 1 actions
        action1 = PrintAction(content="1")
        action2 = PrintAction(content="2")
        action3 = PrintAction(content="3")

        self._init_actions([action1, action2, action3])
        self._set_react_mode(react_mode=RoleReactMode.REACT.value)

Key points:

The phase attribute tracks the current phase
_init_actions() stores the action list in self.actions (newer MetaGPT releases name this method set_actions())
_set_react_mode() selects REACT mode (the react logic itself is customized in Step 6)

Step 3: Implement the `_think()` method

async def _think(self) -> None:
    """Decide which action to run next."""
    logger.info(f"{self._setting}: thinking about the next action to take")

    # Case 1: first pass, or the phase was just switched: start at action 0
    if self.rc.todo is None:
        self._set_state(0)
        return

    # Case 2: there is a next action: advance to it
    if self.rc.state + 1 < len(self.states):
        self._set_state(self.rc.state + 1)

    # Case 3: every action has run: set todo to None (the stop signal)
    else:
        self.rc.todo = None
Decision flow:

┌──────────────────────────────┐        ┌──────────────────────────────┐
│  todo is None?               │  Yes   │  _set_state(0)               │
│  (first pass, or right after │ ─────→ │  state=0, todo=actions[0]    │
│   a phase switch)            │        │                              │
└──────┬───────────────────────┘        └──────────────────────────────┘
       │ No
       ↓
┌──────────────────────────────┐        ┌──────────────────────────────┐
│  state+1 < len(states)?      │  Yes   │  _set_state(state+1)         │
│  (is there a next action?)   │ ─────→ │  move to the next action     │
└──────┬───────────────────────┘        └──────────────────────────────┘
       │ No
       ↓
┌──────────────────────────────┐
│  todo = None                 │
│  (all actions finished)      │
└──────────────────────────────┘
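
The three branches can be checked without MetaGPT by writing them as a pure function. This is a sketch under that simplification: next_step and trace are hypothetical helpers, and plain strings stand in for Action objects.

```python
from typing import List, Optional, Tuple


def next_step(state: int, todo: Optional[str], actions: List[str]) -> Tuple[int, Optional[str]]:
    """Return the (state, todo) pair chosen by the three-branch rule."""
    if todo is None:                  # case 1: start (or restart) at index 0
        return 0, actions[0]
    if state + 1 < len(actions):      # case 2: advance to the next action
        return state + 1, actions[state + 1]
    return state, None                # case 3: finished, clear todo


def trace(actions: List[str]) -> List[str]:
    """Collect the todo chosen on each pass until the stop signal appears."""
    state, todo, seen = -1, None, []
    while True:
        state, todo = next_step(state, todo, actions)
        if todo is None:
            return seen
        seen.append(todo)


print(trace(["Print1", "Print2", "Print3"]))
```

Note that case 3 leaves state untouched and only clears todo, so the next call after a reset falls into case 1 again.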

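Step 4: Implement the `_act()` method

async def _act(self) -> Message:
    """Run the current action."""
    todo = self.rc.todo
    logger.info(f"{self._setting}: executing {todo.name}")

    # Run the current action
    result = await todo.run()

    # Check whether it is time to switch phases
    if self.phase == 1 and self.rc.state == 2:
        logger.info("========== Phase 1 complete, moving to phase 2 ==========")
        self._switch_to_phase_2()

    return Message(content=result, role=self.name)

Key points:

todo = self.rc.todo fetches the current action object
await todo.run() runs the action (a polymorphic call)
After the action runs, check whether the phase should switch
The result is wrapped in a Message object and returned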

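Why does todo.run() end up calling PrintAction.run()?

# Following the object reference:
action1 = PrintAction(content="1")  # create a PrintAction instance
  ↓
self.actions = [action1, ...]  # stored in the list (as a reference)
  ↓
self.rc.todo = self.actions[0]  # rc.todo refers to action1
  ↓
todo = self.rc.todo  # the local name todo refers to the same PrintAction object
  ↓
todo.run()  # Python dispatches on the object's type and calls PrintAction.run()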

Step 5: Implement the phase switch

def _switch_to_phase_2(self) -> None:
    """Switch to phase 2."""
    logger.info("Creating phase 2 actions: Print4, Print5, Print6")

    # Update the phase flag
    self.phase = 2

    # Create the new actions
    action4 = PrintAction(content="4")
    action5 = PrintAction(content="5")
    action6 = PrintAction(content="6")

    # Replace the action list
    self._init_actions([action4, action5, action6])

    # Reset todo so the next _think() starts from the beginning
    self.rc.todo = None

    logger.info("Phase 2 actions are ready")

Key points:

Why _init_actions() rather than append()?
- _init_actions() replaces the whole self.actions list
- Phase 2 then contains only Print4, 5, 6 instead of all of 1-6

Why set self.rc.todo = None?

Compare the two approaches:

❌ Wrong: call _set_state(0)

def _switch_to_phase_2(self):
    self._init_actions([...])
    self._set_state(0)  # state=0, todo=actions[0]

# What happens next:
# 1. Control returns to the _react() loop
# 2. _think() is called again
# 3. if self.rc.todo is None: → False (todo is already actions[0])
# 4. if self.rc.state + 1 < len(self.states): → True (0+1 < 3)
# 5. self._set_state(1) → Print4 is skipped and Print5 runs first!

✓ Right: set self.rc.todo = None

def _switch_to_phase_2(self):
    self._init_actions([...])
    self.rc.todo = None  # reset to None

# What happens next:
# 1. Control returns to the _react() loop
# 2. _think() is called again
# 3. if self.rc.todo is None: → True
# 4. self._set_state(0) → state=0, todo=actions[0] (Print4)
# 5. Execution correctly starts at Print4!
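
The difference between the two resets is easy to reproduce outside MetaGPT. The sketch below models the loop with a hypothetical MiniAgent class: plain strings replace Action objects, and the reset_with_set_state flag exists only to make this comparison possible.

```python
from typing import List, Optional


class MiniAgent:
    """Hypothetical, MetaGPT-free model of the think/act loop."""

    def __init__(self, reset_with_set_state: bool) -> None:
        self.reset_with_set_state = reset_with_set_state
        self.actions: List[str] = ["1", "2", "3"]
        self.state: int = -1
        self.todo: Optional[str] = None
        self.phase: int = 1
        self.executed: List[str] = []

    def set_state(self, state: int) -> None:
        self.state = state
        self.todo = self.actions[state] if state >= 0 else None

    def think(self) -> None:
        if self.todo is None:
            self.set_state(0)
        elif self.state + 1 < len(self.actions):
            self.set_state(self.state + 1)
        else:
            self.todo = None

    def act(self) -> None:
        self.executed.append(self.todo)
        if self.phase == 1 and self.state == 2:
            self.phase = 2
            self.actions = ["4", "5", "6"]
            if self.reset_with_set_state:
                self.set_state(0)   # the next think() advances past index 0
            else:
                self.todo = None    # the next think() restarts at index 0

    def run(self) -> List[str]:
        while True:
            self.think()
            if self.todo is None:
                return self.executed
            self.act()


print(MiniAgent(reset_with_set_state=True).run())
print(MiniAgent(reset_with_set_state=False).run())
```

With the flag set, the run records 1, 2, 3, 5, 6; without it, all six contents appear in order.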

Step 6: Implement the `_react()` method

async def _react(self) -> Message:
    """Alternate think and act until no action is left."""
    msg = None

    while True:
        # Decide the next step
        await self._think()

        # Stop once every task is done
        if self.rc.todo is None:
            break

        # Run the current task
        msg = await self._act()

    return msg

Execution flow:

Start
  ↓
┌───────────────┐
│ msg = None    │
└───────┬───────┘
        ↓
    ┌───────────────┐
    │ while True:   │
    └───┬───────────┘
        ↓
    ┌───────────────┐
    │ _think()      │  ← decide the next action
    └───┬───────────┘
        ↓
    ┌───────────────┐
    │ todo is None? │
    └───┬───────────┘
        │ Yes → break
        │ No
        ↓
    ┌───────────────┐
    │ msg = _act()  │  ← run the action
    └───┬───────────┘
        │
        └──→ loop back to _think()

return msg
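
A while True loop only ends if _think() eventually leaves todo as None. For example, if a phase switch forgot to update the phase flag, the switch would fire again after every third action and the loop would never stop. One optional safeguard, which is not part of the tutorial's code, is a step budget. The sketch below is generic and hypothetical: bounded_react, max_steps, and the think/act callables are illustrative names, with think returning False in place of "todo is None".

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def bounded_react(
    think: Callable[[], Awaitable[bool]],
    act: Callable[[], Awaitable[str]],
    max_steps: int = 20,
) -> Optional[str]:
    """Run think/act until think reports no work or the budget runs out."""
    last = None
    for _ in range(max_steps):
        if not await think():   # False plays the role of "todo is None"
            return last
        last = await act()
    raise RuntimeError(f"no stop signal after {max_steps} steps")


async def demo() -> Optional[str]:
    queue: List[str] = ["1", "2", "3"]
    current: List[str] = []

    async def think() -> bool:
        if not queue:
            return False
        current[:] = [queue.pop(0)]   # remember the action chosen for act()
        return True

    async def act() -> str:
        return current[0]

    return await bounded_react(think, act)


print(asyncio.run(demo()))
```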

Step 7: Test code

import asyncio

async def main():
    """Entry point for the test run."""
    logger.info("========== Test started ==========")

    # Create the agent
    agent = SimpleAgent()

    # Run the agent
    result = await agent.run("Start")

    logger.info(f"========== All done, final result: {result} ==========")

if __name__ == "__main__":
    asyncio.run(main())

Execution flow in detail

Full execution timeline

Time T1: initialization
├─ actions = [Print1, Print2, Print3]
├─ states = ['0. PrintAction', '1. PrintAction', '2. PrintAction']
├─ state = -1
├─ todo = None
└─ phase = 1

─────────────────────────────────────────

Time T2: run("Start")
└─ calls react()

─────────────────────────────────────────

Time T3: loop iteration 1
├─ _think()
│  └─ todo is None → _set_state(0)
│     ├─ state = 0
│     └─ todo = actions[0] (Print1)
└─ _act()
   └─ runs PrintAction: 1

─────────────────────────────────────────

Time T4: loop iteration 2
├─ _think()
│  └─ state+1 < 3 → _set_state(1)
│     ├─ state = 1
│     └─ todo = actions[1] (Print2)
└─ _act()
   └─ runs PrintAction: 2

─────────────────────────────────────────

Time T5: loop iteration 3
├─ _think()
│  └─ state+1 < 3 → _set_state(2)
│     ├─ state = 2
│     └─ todo = actions[2] (Print3)
└─ _act()
   ├─ runs PrintAction: 3
   └─ detects phase==1 and state==2
      └─ _switch_to_phase_2()
         ├─ phase = 2
         ├─ actions = [Print4, Print5, Print6]
         └─ todo = None

─────────────────────────────────────────

Time T6: loop iteration 4
├─ _think()
│  └─ todo is None → _set_state(0)
│     ├─ state = 0
│     └─ todo = actions[0] (Print4)
└─ _act()
   └─ runs PrintAction: 4

─────────────────────────────────────────

Time T7: loop iteration 5
├─ _think()
│  └─ state+1 < 3 → _set_state(1)
│     ├─ state = 1
│     └─ todo = actions[1] (Print5)
└─ _act()
   └─ runs PrintAction: 5

─────────────────────────────────────────

Time T8: loop iteration 6
├─ _think()
│  └─ state+1 < 3 → _set_state(2)
│     ├─ state = 2
│     └─ todo = actions[2] (Print6)
└─ _act()
   └─ runs PrintAction: 6

─────────────────────────────────────────

Time T9: loop iteration 7
├─ _think()
│  └─ state+1 < 3 → False
│     └─ todo = None
└─ todo is None → break

─────────────────────────────────────────

Returned result: SimpleAgent: 6

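Key technical points

1. Python object references

In Python, a variable is a "label" attached to an object, not a "box" that holds its own copy:

# Create an object
action1 = PrintAction(content="1")

# Several names can refer to the same object
self.actions[0] = action1    # same object
self.rc.todo = action1        # same object
todo = action1                # same object

# Check: equal ids mean it is the same object
id(self.actions[0]) == id(self.rc.todo) == id(todo)  # True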
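
The same behavior can be observed with any object. In this sketch Box is a hypothetical placeholder for PrintAction. It also shows why a stale todo survives when the list is replaced, which is the reason Step 5 resets todo explicitly.

```python
class Box:
    """Hypothetical placeholder; any Python object behaves the same way."""

    def __init__(self, content: str) -> None:
        self.content = content


action1 = Box("1")
actions = [action1]      # the list stores a reference, not a copy
todo = actions[0]        # another name for the same object

todo.content = "changed"
print(action1.content)                # the change is visible through every name
print(todo is action1 is actions[0])  # identity check, equivalent to comparing id()

actions = [Box("4")]     # rebinding the list does not touch `todo`
print(todo.content)      # `todo` still refers to the old object
```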

2. Polymorphism

# _act() does not need to know the concrete type of todo
async def _act(self):
    todo = self.rc.todo  # may be any Action subclass
    result = await todo.run()  # Python finds the run() of the object's own class

# When todo is a PrintAction → PrintAction.run() is called
# When todo is a WriteDirectory → WriteDirectory.run() is called
# This is "programming to an interface"
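
A runnable version of the same idea, without MetaGPT, might look like the sketch below. BaseAction, PrintLike, and WriteLike are hypothetical classes that stand in for Action and its subclasses.

```python
import asyncio
from typing import List


class BaseAction:
    async def run(self) -> str:
        raise NotImplementedError


class PrintLike(BaseAction):
    async def run(self) -> str:
        return "printed"


class WriteLike(BaseAction):
    async def run(self) -> str:
        return "written"


async def act(todo: BaseAction) -> str:
    # The caller relies only on the shared run() interface.
    return await todo.run()


async def main() -> List[str]:
    return [await act(a) for a in (PrintLike(), WriteLike())]


outputs = asyncio.run(main())
print(outputs)
```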

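3. Rules for async/await

# ✓ Needs async (the body contains await)
async def _act(self):
    todo = self.rc.todo
    result = await todo.run()  # ← await here
    return result

# ✓ Does not need async (no await in the body)
def _switch_to_phase_2(self):
    self.phase = 2  # ← no await
    self._init_actions([...])

Rules of thumb:

A function that contains await → must be declared with async def
Calling an async def function → returns a coroutine; await it to run it and get its result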
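
The second rule is the one that most often causes surprises: calling a coroutine function does not run its body. A small sketch (run_action and caller are illustrative names) makes this visible.

```python
import asyncio
import inspect


async def run_action() -> str:
    return "done"


async def caller() -> str:
    pending = run_action()                 # nothing has run yet
    assert inspect.iscoroutine(pending)    # it is only a coroutine object
    return await pending                   # awaiting executes the body


value = asyncio.run(caller())
print(value)
```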

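4. Why the state reset matters

# After creating actions dynamically, todo must be reset
self._init_actions([new actions...])
self.rc.todo = None  # ← the key step: lets _think() start over

# Why not _set_state(0)?
# Because _think() would then take the "advance" branch and skip the first action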

Complete code

import asyncio

from metagpt.actions import Action
from metagpt.logs import logger
from metagpt.roles.role import Role, RoleReactMode
from metagpt.schema import Message


class PrintAction(Action):
    """A simple action that prints its content."""
    name: str = "PrintAction"
    content: str = ""

    async def run(self, *args, **kwargs) -> str:
        logger.info(f"Running {self.name}: {self.content}")
        return self.content


class SimpleAgent(Role):
    """A simple agent that runs its actions in order."""
    name: str = "SimpleAgent"
    profile: str = "Simple Sequential Agent"
    phase: int = 1

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        action1 = PrintAction(content="1")
        action2 = PrintAction(content="2")
        action3 = PrintAction(content="3")
        self._init_actions([action1, action2, action3])
        self._set_react_mode(react_mode=RoleReactMode.REACT.value)

    async def _think(self) -> None:
        """Decide which action to run next."""
        logger.info(f"{self._setting}: thinking about the next action to take")

        if self.rc.todo is None:
            self._set_state(0)
            return

        if self.rc.state + 1 < len(self.states):
            self._set_state(self.rc.state + 1)
        else:
            self.rc.todo = None

    async def _act(self) -> Message:
        """Run the current action."""
        todo = self.rc.todo
        logger.info(f"{self._setting}: executing {todo.name}")
        result = await todo.run()

        if self.phase == 1 and self.rc.state == 2:
            logger.info("========== Phase 1 complete, moving to phase 2 ==========")
            self._switch_to_phase_2()

        return Message(content=result, role=self.name)

    def _switch_to_phase_2(self) -> None:
        """Switch to phase 2."""
        logger.info("Creating phase 2 actions: Print4, Print5, Print6")
        self.phase = 2
        action4 = PrintAction(content="4")
        action5 = PrintAction(content="5")
        action6 = PrintAction(content="6")
        self._init_actions([action4, action5, action6])
        self.rc.todo = None
        logger.info("Phase 2 actions are ready")

    async def _react(self) -> Message:
        """Alternate think and act until no action is left."""
        msg = None
        while True:
            await self._think()
            if self.rc.todo is None:
                break
            msg = await self._act()
        return msg


async def main():
    logger.info("========== Test started ==========")
    agent = SimpleAgent()
    result = await agent.run("Start")
    logger.info(f"========== All done, final result: {result} ==========")


if __name__ == "__main__":
    asyncio.run(main())

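FAQ

Q1: Why drive `_react()` with a loop instead of recursion?

Answer:

A loop does not grow the call stack, so a long action sequence cannot hit Python's recursion limit
It is easier to control (break exits at any point)
It matches MetaGPT's design pattern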

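Q2: Can all six actions be initialized in `__init__`?

Answer:
Yes, but the agent is then no longer dynamic:

# Static approach (not recommended)
self._init_actions([Print1, Print2, Print3, Print4, Print5, Print6])

# Dynamic approach (recommended)
# Start with Print1-3 only and create Print4-6 at runtime when the condition is met

Advantages of the dynamic approach:

More flexible (phase 2 can depend on the results of phase 1)
Closer to real use cases (for example, TutorialAssistant creates chapters dynamically from an outline)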

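Q3: What is the difference between `_set_state()` and assigning `self.rc.state = n` directly?

Answer:

# ✓ Recommended: use the method
self._set_state(0)  # updates state and todo together

# ❌ Not recommended: assign directly
self.rc.state = 0  # updates state only; todo still holds the old action

_set_state() keeps state and todo in sync.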

Q4: How do I add a third phase?

Answer:
Add another check in _act():

async def _act(self) -> Message:
    todo = self.rc.todo
    result = await todo.run()

    # Phase 1 → phase 2
    if self.phase == 1 and self.rc.state == 2:
        self._switch_to_phase_2()

    # Phase 2 → phase 3
    elif self.phase == 2 and self.rc.state == 2:
        self._switch_to_phase_3()

    return Message(content=result, role=self.name)

def _switch_to_phase_3(self) -> None:
    self.phase = 3
    action7 = PrintAction(content="7")
    action8 = PrintAction(content="8")
    action9 = PrintAction(content="9")
    self._init_actions([action7, action8, action9])
    self.rc.todo = None
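
Each extra phase adds one more elif and one more _switch_to_phase_N method. If that grows unwieldy, one possible alternative is to describe the phases as data. The sketch below is a MetaGPT-free illustration of that idea, not the tutorial's method: PHASES, next_phase, and run_all are hypothetical names, and strings stand in for PrintAction objects.

```python
from typing import Dict, List, Optional

# Hypothetical phase table: phase number -> contents printed in that phase.
PHASES: Dict[int, List[str]] = {
    1: ["1", "2", "3"],
    2: ["4", "5", "6"],
    3: ["7", "8", "9"],
}


def next_phase(phase: int, state: int) -> Optional[int]:
    """Return the following phase if `state` is the last index of `phase`."""
    is_last_action = state == len(PHASES[phase]) - 1
    if is_last_action and phase + 1 in PHASES:
        return phase + 1
    return None


def run_all() -> List[str]:
    """Walk through every phase in order and record what would be printed."""
    phase: Optional[int] = 1
    executed: List[str] = []
    while phase is not None:
        contents = PHASES[phase]
        executed.extend(contents)
        phase = next_phase(phase, len(contents) - 1)
    return executed


print(run_all())
```

Comparing against len(PHASES[phase]) - 1 rather than the literal 2 also keeps the check correct when a phase has a different number of actions.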

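Going further

Extension 1: Let the LLM decide the next phase

async def _act(self) -> Message:
    todo = self.rc.todo
    result = await todo.run()

    if self.phase == 1 and self.rc.state == 2:
        # Ask the LLM what to do next
        prompt = "Phase 1 is complete. Decide which actions phase 2 should run and reply in JSON."
        # _aask() is defined on Action, not Role, so the role calls its LLM directly
        response = await self.llm.aask(prompt)
        # Parse the JSON returned by the LLM
        # (_parse_llm_response and _switch_to_dynamic_phase are helpers you would write yourself)
        next_actions = self._parse_llm_response(response)
        self._switch_to_dynamic_phase(next_actions)

    return Message(content=result, role=self.name)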
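
The snippet above calls _parse_llm_response without defining it. One way such a helper could look is sketched below as a standalone function. The reply shape {"actions": [...]} is an assumption made for this example only; the tutorial does not specify the JSON format, and a real LLM reply may need more cleanup before it parses.

```python
import json
from typing import List


def parse_llm_response(response: str) -> List[str]:
    """Extract a list of print contents from a JSON reply.

    Assumed reply shape (not defined by the tutorial): {"actions": ["4", "5"]}.
    """
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return []                      # unparseable reply: schedule nothing
    if not isinstance(data, dict):
        return []
    items = data.get("actions", [])
    if not isinstance(items, list):
        return []
    return [str(item) for item in items]


print(parse_llm_response('{"actions": ["4", "5", "6"]}'))
print(parse_llm_response("not json"))
```

Returning an empty list on bad input lets the caller decide whether to stop or fall back to a fixed phase, instead of failing in the middle of _act().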

Extension 2: Conditional branching

async def _act(self) -> Message:
    todo = self.rc.todo
    result = await todo.run()

    if self.phase == 1 and self.rc.state == 2:
        # Pick a branch based on the result of phase 1
        if int(result) % 2 == 0:
            self._switch_to_phase_2A()  # even branch
        else:
            self._switch_to_phase_2B()  # odd branch

    return Message(content=result, role=self.name)

Extension 3: Generating actions in a loop

def _switch_to_phase_2(self) -> None:
    self.phase = 2

    # Create a longer list of actions
    actions = []
    for i in range(4, 10):  # Print4 through Print9
        actions.append(PrintAction(content=str(i)))

    self._init_actions(actions)
    self.rc.todo = None

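Summary

This tutorial has covered:

MetaGPT's React loop
- The full run → react → think → act flow
- State management (state, todo, actions)
Dynamic action creation
- Creating new actions at runtime when a condition is met
- Replacing the action list with _init_actions()
- Resetting the state with self.rc.todo = None
Core Python concepts
- Object references and polymorphism
- Correct use of async/await
- The state machine design pattern
Practical techniques
- How to trace an agent's execution flow
- How to avoid common mistakes
- How to extend the agent

Key takeaways:

_think() decides the next action (the state transition)
_act() runs the action and handles side effects (such as switching phases)
_react() is the loop that drives the whole process
After creating actions dynamically, todo must be reset to None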
