Agent Gym

Convert any model into an R1-like reasoning agent. Agent Gym builds on TRL, Hugging Face, and several other libraries. This is a work in progress; the goal is to make it easy to train any model into a reasoning agent.

Installation

pip3 install -U agentgym

Usage

from agentgym.r1_pipeline import R1Pipeline, SFTConfig

r1_pipeline = R1Pipeline(
    sft_model="Qwen/Qwen2-0.5B-Instruct",
    tokenizer_name="Qwen/Qwen2-0.5B-Instruct",
    sft_dataset="trl-lib/tldr",
    sft_args=SFTConfig(output_dir="/tmp"),
    only_grpo=True,
    model_name="Qwen/Qwen2-0.5B-Instruct"
)

r1_pipeline.run()

Architecture

The architecture is as follows:

  • SFT: Supervised Fine-Tuning
  • GRPO: Group Relative Policy Optimization
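GRPO, introduced in the DeepSeekMath paper, drops the learned value model that PPO uses. For each prompt it samples a group of completions, scores each one, and uses each reward's deviation from the group mean, divided by the group's standard deviation, as the advantage. The sketch below shows only that normalization step in plain Python. The function name and the `eps` value are illustrative assumptions, not Agent Gym's code, and real implementations differ in details such as the epsilon and whether the sample or population standard deviation is used.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize the rewards of one group of completions sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps); eps avoids
    division by zero when every completion received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption; some use sample std)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one prompt, two judged correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group average get a positive advantage and are reinforced; those below it are pushed down. When every completion in a group receives the same reward, all advantages are zero and that group contributes no reward signal.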

model -> sft -> grpo -> reasoning model

graph TD;
    A[model] --> B[sft]
    B --> C[grpo]
    C --> D[reasoning model]
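The GRPO stage needs a reward signal. DeepSeek-R1-style training often rewards completions that put their reasoning inside `<think>` tags and the final answer inside `<answer>` tags. Below is a minimal sketch of such a format reward, assuming TRL's convention that a reward function receives the generated completions (plus other dataset columns as keyword arguments) and returns one float per completion. The tag template and the name `format_reward` are hypothetical; this README does not say which rewards Agent Gym uses.

```python
import re

# Hypothetical R1-style template: reasoning inside <think>...</think>,
# followed by the final answer inside <answer>...</answer>.
# DOTALL lets ".*?" span multi-line reasoning.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)


def format_reward(completions: list[str], **kwargs) -> list[float]:
    """Return 1.0 for each completion that follows the template, else 0.0.

    Extra keyword arguments (e.g. prompts or dataset columns) are accepted
    and ignored, so the function fits TRL's reward-function calling convention.
    """
    return [1.0 if FORMAT_PATTERN.match(c.strip()) else 0.0 for c in completions]


rewards = format_reward([
    "<think>2 + 2 is 4</think> <answer>4</answer>",  # follows the template
    "The answer is 4",                               # no tags
])
```

When using TRL directly, a function like this can be passed in the `reward_funcs` argument of `GRPOTrainer`; whether `R1Pipeline` accepts custom reward functions is not documented here. The sketch handles plain-text completions only; with conversational datasets TRL delivers each completion as a list of message dicts, so the text would need to be pulled from the message content first.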

License

MIT