# Flux

**Adaptive Post-Training Framework for LLMs**

*The best of all worlds — Synchronous stability + Asynchronous efficiency + Native simplicity*
## Why Flux?
Existing RLHF frameworks force you to choose between stability (synchronous training) and efficiency (asynchronous training). Flux breaks this false dichotomy with adaptive async control that dynamically adjusts based on training dynamics.
- **Adaptive Async**: Dynamically adjusts the sync/async ratio based on measured staleness. Get 85% GPU utilization with synchronous-level stability.
- **Native Performance**: Direct Megatron-LM + SGLang integration without Ray overhead. Maximum performance with minimal abstraction.
- **Algorithm Agnostic**: Supports PPO, GRPO, DPO, REINFORCE, DAPO, and RLOO, with easy extensibility for custom algorithms.
- **Simple & Extensible**: Less than 5,000 lines of core code. Easy to understand, debug, and extend for your research needs.
## The Spectrum, Not a Binary Choice

```
Sync  ◄────────────────────────────────────────────────────►  Async

VERL   ████████████░░░░░░░░░░░░░░░░░░   Stable but slow
AReaL  ░░░░░░░░░░░░░░░░░░████████████   Fast but risky
Flux   ◄═══════ adapts here ═══════►    Best of both
```
Flux treats the sync/async ratio as a continuous control variable, not a binary choice. A PID controller maintains your target staleness level, automatically adjusting based on real-time training dynamics.
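The control loop described above can be sketched in a few lines. The class below is an illustrative, self-contained example of PID-driven async-ratio control; the name `AsyncRatioController` and the gain values are assumptions for the sketch, not Flux's actual implementation:

```python
class AsyncRatioController:
    """PID sketch: drive measured staleness toward a target by
    adjusting the fraction of rollouts generated asynchronously."""

    def __init__(self, target=0.15, kp=0.5, ki=0.1, kd=0.05,
                 min_ratio=0.1, max_ratio=0.9):
        self.target = target
        self.kp, self.ki, self.kd = kp, ki, kd
        self.min_ratio, self.max_ratio = min_ratio, max_ratio
        self.ratio = max_ratio  # start fully async, back off as needed
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measured_staleness: float) -> float:
        # Positive error means staleness is too high, so lower the async ratio.
        error = measured_staleness - self.target
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        adjustment = (self.kp * error
                      + self.ki * self.integral
                      + self.kd * derivative)
        # Clamp to the configured operating range.
        self.ratio = min(self.max_ratio,
                         max(self.min_ratio, self.ratio - adjustment))
        return self.ratio
```

Calling `update()` once per training step with the latest staleness measurement yields the async ratio to use for the next batch of rollouts.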
## Quick Comparison
| Aspect | VERL | AReaL | Slime | Flux |
|---|---|---|---|---|
| Sync Strategy | Fixed sync | Fixed async | Both modes | Adaptive |
| Orchestration | Ray | Custom | HTTP | asyncio |
| Training Backend | Megatron/FSDP | Custom | Megatron | Megatron |
| Inference Backend | vLLM/SGLang | Custom | SGLang | SGLang |
| Weight Sync | Ray Object Store | Custom | CUDA IPC | CUDA IPC |
| Staleness Handling | N/A | Staleness-aware | APRIL | Unified |
| Code Complexity | ~15k LOC | ~25k LOC | ~8k LOC | <5k LOC |
## Quick Start

### Installation

```bash
pip install flux-rlhf

# Or from source
git clone https://github.com/flux-team/flux.git
cd flux && pip install -e ".[dev]"
```
### Basic Training

```python
from flux import FluxConfig, FluxTrainer

config = FluxConfig(
    model_path="Qwen/Qwen3-8B",
    adaptive_async={
        "target_staleness": 0.15,
        "min_async_ratio": 0.1,
        "max_async_ratio": 0.9,
    },
    algorithm="grpo",
)

trainer = FluxTrainer(config)
trainer.fit(prompts="data/prompts.jsonl")
```
## Supported Algorithms
| Algorithm | Type | Best For |
|---|---|---|
| PPO | On-policy | Stable general training |
| GRPO | On-policy | Multi-sample efficiency |
| DPO | Preference | Direct preference learning |
| REINFORCE | On-policy | Simple baselines |
| DAPO | On-policy | High-variance rewards |
| RLOO | On-policy | Variance reduction |
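To illustrate the variance-reduction idea behind RLOO: each sampled completion is scored against a baseline equal to the mean reward of the *other* samples in its group, so no learned value function is needed. This is a generic sketch of the leave-one-out estimator, not Flux's internal code:

```python
def rloo_advantages(rewards):
    """Leave-one-out (RLOO) advantages for one group of sampled
    completions: baseline_i = mean of all rewards except reward_i."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least 2 samples per prompt")
    total = sum(rewards)
    # advantage_i = r_i - (sum of the others) / (n - 1)
    return [r - (total - r) / (n - 1) for r in rewards]
```

Because the baseline excludes the sample's own reward, it stays unbiased while still centering the group's advantages around zero.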
## Architecture

```mermaid
graph TB
    subgraph Control["Adaptive Control Plane"]
        AC[Adaptive Async Controller]
        BC[Smart Batch Composer]
        SM[Staleness Monitor]
    end
    subgraph Coordinator["Lightweight Coordinator"]
        CO[FluxCoordinator]
        WS[Weight Sync Manager]
    end
    subgraph Engines["Native Execution Engines"]
        ME[Megatron Engine]
        SG[SGLang Server]
    end
    AC --> CO
    BC --> CO
    SM --> AC
    CO --> ME
    CO --> SG
    WS --> ME
    WS --> SG
    ME <-->|CUDA IPC| SG
```
## Performance Targets

| Metric | Target | Description |
|---|---|---|
| GPU Utilization | > 80% | Measured via `nvidia-smi` |
| Throughput | 2x VERL | Samples per hour |
| Staleness | Mean < 0.2 | Combined staleness metric |
| Scaling | > 85% at 64 GPUs | Linear scaling efficiency |
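For intuition on how a staleness figure like "mean < 0.2" could be produced: one plausible definition treats each sample's staleness as its policy-update lag, clipped and normalized to a window. The function and the `max_lag` parameter below are hypothetical illustrations, not necessarily the metric Flux reports:

```python
def mean_staleness(sample_versions, current_version, max_lag=10):
    """Hypothetical staleness metric: average policy-update lag of a
    batch, normalized to [0, 1] by a clipping window of max_lag updates."""
    lags = [min(current_version - v, max_lag) / max_lag
            for v in sample_versions]
    return sum(lags) / len(lags)
```

A batch generated mostly by the current policy then scores near 0, while one full of `max_lag`-old rollouts scores 1.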
## Community

- **GitHub**: Star the repo, report issues, and contribute code.
- **Discord**: Join our community for discussions and support.
- **Documentation**: Comprehensive guides, tutorials, and API reference.
## Citation

If you use Flux in your research, please cite:

```bibtex
@software{flux2025,
  title = {Flux: An Adaptive Post-Training Framework for LLMs},
  year  = {2025},
  url   = {https://github.com/flux-team/flux}
}
```
*Flux: Where stability meets efficiency*