# Flux

**Adaptive Post-Training Framework for LLMs**

*The best of all worlds — Synchronous stability + Asynchronous efficiency + Native simplicity*
## Why Flux?
Existing RLHF frameworks force you to choose between stability (synchronous training) and efficiency (asynchronous training). Flux breaks this false dichotomy with adaptive async control that dynamically adjusts based on training dynamics.
- **Adaptive Async**: Dynamically adjusts the sync/async ratio based on measured staleness. Get 85% GPU utilization with synchronous-level stability.
- **Native Performance**: Direct Megatron-LM + SGLang integration without Ray overhead. Maximum performance with minimal abstraction.
- **Algorithm Agnostic**: Supports PPO, GRPO, DPO, REINFORCE, DAPO, and RLOO, with easy extensibility for custom algorithms.
- **Simple & Extensible**: Less than 5,000 lines of core code. Easy to understand, debug, and extend for your research needs.
## The Spectrum, Not a Binary Choice

```
Sync  ◄────────────────────────────────────────────────────►  Async

VERL   ████████████░░░░░░░░░░░░░░░░░░   Stable but slow
AReaL  ░░░░░░░░░░░░░░░░░░████████████   Fast but risky
Flux   ◄═══════ adapts here ═══════►    Best of both
```
Flux treats the sync/async ratio as a continuous control variable, not a binary choice. A PID controller maintains your target staleness level, automatically adjusting based on real-time training dynamics.
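The control loop described above can be sketched in a few lines. The class below is an illustrative, self-contained example of PID-driven async-ratio control; the name `AsyncRatioController` and the gain values are assumptions for the sketch, not Flux's actual implementation:

```python
class AsyncRatioController:
    """PID sketch: drive measured staleness toward a target by
    adjusting the fraction of rollouts generated asynchronously."""

    def __init__(self, target=0.15, kp=0.5, ki=0.1, kd=0.05,
                 min_ratio=0.1, max_ratio=0.9):
        self.target = target
        self.kp, self.ki, self.kd = kp, ki, kd
        self.min_ratio, self.max_ratio = min_ratio, max_ratio
        self.ratio = max_ratio  # start fully async, back off as needed
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measured_staleness: float) -> float:
        # Positive error means staleness is too high, so lower the async ratio.
        error = measured_staleness - self.target
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        adjustment = (self.kp * error
                      + self.ki * self.integral
                      + self.kd * derivative)
        # Clamp to the configured operating range.
        self.ratio = min(self.max_ratio,
                         max(self.min_ratio, self.ratio - adjustment))
        return self.ratio
```

Calling `update()` once per training step with the latest staleness measurement yields the async ratio to use for the next batch of rollouts.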
## Quick Comparison
| Aspect | VERL | AReaL | Slime | Flux |
|---|---|---|---|---|
| Sync Strategy | Fixed sync | Fixed async | Both modes | Adaptive |
| Orchestration | Ray | Custom | HTTP | asyncio |
| Training Backend | Megatron/FSDP | Custom | Megatron | Megatron |
| Inference Backend | vLLM/SGLang | Custom | SGLang | SGLang |
| Weight Sync | Ray Object Store | Custom | CUDA IPC | CUDA IPC |
| Staleness Handling | N/A | Staleness-aware | APRIL | Unified |
| Code Complexity | ~15k LOC | ~25k LOC | ~8k LOC | <5k LOC |
## Quick Start

### Installation

```bash
pip install flux-rlhf

# Or from source
git clone https://github.com/flux-team/flux.git
cd flux && pip install -e ".[dev]"
```
### Basic Training

```python
from flux import FluxConfig, FluxTrainer

config = FluxConfig(
    model_path="Qwen/Qwen3-8B",
    adaptive_async={
        "target_staleness": 0.15,
        "min_async_ratio": 0.1,
        "max_async_ratio": 0.9,
    },
    algorithm="grpo",
)

trainer = FluxTrainer(config)
trainer.fit(prompts="data/prompts.jsonl")
```
## Supported Algorithms
| Algorithm | Type | Best For |
|---|---|---|
| PPO | On-policy | Stable general training |
| GRPO | On-policy | Multi-sample efficiency |
| DPO | Preference | Direct preference learning |
| REINFORCE | On-policy | Simple baselines |
| DAPO | On-policy | High-variance rewards |
| RLOO | On-policy | Variance reduction |
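To illustrate the variance-reduction idea behind RLOO: each sampled completion is scored against a baseline equal to the mean reward of the *other* samples in its group, so no learned value function is needed. This is a generic sketch of the leave-one-out estimator, not Flux's internal code:

```python
def rloo_advantages(rewards):
    """Leave-one-out (RLOO) advantages for one group of sampled
    completions: baseline_i = mean of all rewards except reward_i."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least 2 samples per prompt")
    total = sum(rewards)
    # advantage_i = r_i - (sum of the others) / (n - 1)
    return [r - (total - r) / (n - 1) for r in rewards]
```

Because the baseline excludes the sample's own reward, it stays unbiased while still centering the group's advantages around zero.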
## Architecture

```mermaid
graph TB
    subgraph Control["Adaptive Control Plane"]
        AC[Adaptive Async Controller]
        BC[Smart Batch Composer]
        SM[Staleness Monitor]
    end
    subgraph Coordinator["Lightweight Coordinator"]
        CO[FluxCoordinator]
        WS[Weight Sync Manager]
    end
    subgraph Engines["Native Execution Engines"]
        ME[Megatron Engine]
        SG[SGLang Server]
    end
    AC --> CO
    BC --> CO
    SM --> AC
    CO --> ME
    CO --> SG
    WS --> ME
    WS --> SG
    ME <-->|CUDA IPC| SG
```
## Performance Targets

| Metric | Target | Description |
|---|---|---|
| GPU Utilization | > 80% | Measured via `nvidia-smi` |
| Throughput | 2x VERL | Samples per hour |
| Staleness | Mean < 0.2 | Combined staleness metric |
| Scaling | > 85% at 64 GPUs | Linear scaling efficiency |
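For intuition on how a staleness figure like "mean < 0.2" could be produced: one plausible definition treats each sample's staleness as its policy-update lag, clipped and normalized to a window. The function and the `max_lag` parameter below are hypothetical illustrations, not necessarily the metric Flux reports:

```python
def mean_staleness(sample_versions, current_version, max_lag=10):
    """Hypothetical staleness metric: average policy-update lag of a
    batch, normalized to [0, 1] by a clipping window of max_lag updates."""
    lags = [min(current_version - v, max_lag) / max_lag
            for v in sample_versions]
    return sum(lags) / len(lags)
```

A batch generated mostly by the current policy then scores near 0, while one full of `max_lag`-old rollouts scores 1.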
## Community

- **GitHub**: Star the repo, report issues, and contribute code.
- **Discord**: Join our community for discussions and support.
- **Documentation**: Comprehensive guides, tutorials, and API reference.
## Citation

If you use Flux in your research, please cite:

```bibtex
@software{flux2025,
  title = {Flux: An Adaptive Post-Training Framework for LLMs},
  year  = {2025},
  url   = {https://github.com/flux-team/flux}
}
```
*Flux: Where stability meets efficiency*