Adaptive Async Control¶

The core innovation in Flux is adaptive async control - dynamically adjusting the balance between synchronous and asynchronous training based on real-time measurements.

The Problem¶

Traditional RLHF frameworks force a binary choice:

Synchronous Training (VERL-style)¶

┌─────────────────────────────────────────────────────────┐
│                   Synchronous Training                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Step 1:  [Generate][Generate][Generate][   WAIT   ]   │
│  Step 2:  [  Train  ][  Train  ][  Train  ]            │
│  Step 3:  [Generate][Generate][Generate][   WAIT   ]   │
│                                                         │
│  GPU Utilization: ~45%  (lots of waiting)              │
│  Staleness: 0 (perfect)                                 │
│  Stability: ★★★★★                                       │
│                                                         │
└─────────────────────────────────────────────────────────┘

Pros: Fresh data, stable training Cons: GPU bubbles, low throughput

Asynchronous Training (AReaL-style)¶

┌─────────────────────────────────────────────────────────┐
│                   Asynchronous Training                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Gen:     [Gen][Gen][Gen][Gen][Gen][Gen][Gen][Gen]     │
│  Train:   [Train][Train][Train][Train][Train][Train]   │
│                                                         │
│  GPU Utilization: ~95%  (always busy)                  │
│  Staleness: High (old data)                             │
│  Stability: ★★★☆☆                                       │
│                                                         │
└─────────────────────────────────────────────────────────┘

Pros: High GPU utilization, fast throughput Cons: Training on stale data, potential instability

The Flux Solution¶

Flux treats sync/async as a continuous spectrum controlled by a PID controller:

┌─────────────────────────────────────────────────────────┐
│                   Adaptive Training                      │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Early:   [Gen][Gen][ Wait ][Train][Gen][ Wait ]       │
│  Mid:     [Gen][Gen][Gen][Train][Train][Gen]           │
│  Late:    [Gen][Gen][Gen][Gen][Train][Gen][Gen]        │
│                                                         │
│  GPU Utilization: ~85%  (adaptive)                     │
│  Staleness: Controlled (~0.15)                          │
│  Stability: ★★★★☆                                       │
│                                                         │
└─────────────────────────────────────────────────────────┘

How It Works¶

The Control Loop¶

graph LR
    A[Measure Staleness] --> B[Compute Error]
    B --> C[PID Controller]
    C --> D[Adjust Async Ratio]
    D --> E[Execute Training]
    E --> A

Key Variables¶

Variable	Range	Description
`staleness`	[0, 1]	Current staleness measurement
`target_staleness`	[0, 1]	Desired staleness level
`async_ratio`	[0.1, 0.9]	Current sync/async balance
`error`	[-1, 1]	`target - current` staleness

PID Controller¶

class AdaptiveAsyncController:
    def __init__(self, config):
        self.target = config.target_staleness
        self.kp = config.kp  # Proportional gain
        self.ki = config.ki  # Integral gain
        self.kd = config.kd  # Derivative gain

        self.async_ratio = 0.5
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, staleness: float) -> float:
        # Compute error
        error = self.target - staleness

        # PID terms
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error

        # PID output
        adjustment = (
            self.kp * error +
            self.ki * self.integral +
            self.kd * derivative
        )

        # Update and clip
        self.async_ratio = clip(
            self.async_ratio + adjustment,
            min=0.1,  # Never fully sync
            max=0.9   # Never fully async
        )

        return self.async_ratio

Async Ratio Effects¶

The async_ratio controls several subsystems:

1. Sync Policy¶

# When to trigger a sync barrier
should_sync = (
    staleness > target + tolerance or
    steps_since_sync > max_steps or
    buffer_capacity <= 0
)

Async Ratio	Sync Frequency	Behavior
0.1 (sync)	Every ~2 steps	Wait for fresh data
0.5 (balanced)	Every ~10 steps	Mixed fresh/stale
0.9 (async)	Every ~50 steps	Rarely wait

2. Buffer Capacity¶

# How many old trajectories to keep
max_buffer = (max_version_gap + 1) * batch_size

Higher async ratio → larger buffer → more old data allowed.

3. Batch Composition¶

# How to mix fresh and stale data
fresh_ratio = 1.0 - async_ratio
stale_ratio = async_ratio

# Stratified sampling
batch = sample_stratified(
    fresh_trajectories, fresh_ratio,
    stale_trajectories, stale_ratio
)

Configuration Guide¶

Basic Configuration¶

adaptive_async:
  target_staleness: 0.15  # Sweet spot for most tasks
  min_async_ratio: 0.1    # Never fully sync
  max_async_ratio: 0.9    # Never fully async
  kp: 0.1                 # Proportional gain
  ki: 0.01                # Integral gain
  kd: 0.05                # Derivative gain

Tuning Guidelines¶

Scenario	Recommended `target_staleness`	Notes
Stable training (default)	0.15	Good balance
Maximum throughput	0.3-0.4	Risk of instability
Critical stability	0.05-0.1	Slower but safer
Early training	0.1	Policy changing fast
Late fine-tuning	0.25	Policy stable

PID Tuning¶

Gain	Too Low	Too High
`kp`	Slow response	Oscillation
`ki`	Steady-state error	Overshoot
`kd`	Overshoot	Noise sensitivity

Default values (kp=0.1, ki=0.01, kd=0.05) work well for most cases.

Monitoring¶

Key Metrics¶

# Log these during training
metrics = {
    "staleness/current": controller.staleness_ema,
    "staleness/target": controller.target_staleness,
    "async/ratio": controller.async_ratio,
    "async/error": controller.prev_error,
    "sync/count": controller.sync_count,
}

Healthy Training Signs¶

staleness oscillates around target_staleness
async_ratio adjusts smoothly (not jumping)
GPU utilization > 70%
Loss decreasing steadily

Warning Signs¶

staleness consistently above target → increase sync
async_ratio stuck at min/max → adjust target
High loss variance → reduce target_staleness

Advanced: Phase-Aware Control¶

For advanced users, Flux supports phase-aware control:

config = FluxConfig(
    adaptive_async={
        "target_staleness": 0.15,
        "phase_schedule": {
            "warmup": {"target": 0.05, "steps": 100},
            "main": {"target": 0.15},
            "cooldown": {"target": 0.25, "steps": 50},
        }
    }
)

This automatically adjusts the target staleness based on training progress.

Comparison¶

Framework	Approach	Staleness Control
VERL	Fixed sync	None (always 0)
AReaL	Fixed async	Staleness-aware PPO
Slime	Manual switch	APRIL strategy
Flux	Adaptive	PID controller

Next Steps¶

Staleness & Importance - How staleness is measured
APRIL Strategy - Efficient rollout generation
Configuration Guide - Full configuration reference