Core Concepts¶
Understanding these concepts will help you get the most out of Flux and make informed decisions about your training configuration.
Overview¶
Flux is built around several key innovations that together enable adaptive, efficient RLHF training:
- **Adaptive Async Control**: Dynamic sync/async ratio adjustment based on real-time staleness measurements
- **Staleness & Importance**: Quantifying and correcting for off-policy data in asynchronous training
- **APRIL Strategy**: Active Partial Rollout for efficient long-tail handling
- **Batch Composition**: Smart batching for optimal padding and curriculum learning
- **Weight Synchronization**: Efficient weight transfer between training and inference
- **Architecture**: Three-layer architecture for maximum flexibility
The Big Picture¶
The False Dichotomy¶
Traditional RLHF frameworks force a binary choice:
| Approach | Pros | Cons |
|---|---|---|
| Synchronous (VERL) | Stable training, fresh data | GPU bubbles, low utilization |
| Asynchronous (AReaL) | High throughput, no waiting | Staleness issues, instability |
Flux treats this as a continuous spectrum, not a binary choice.
The Flux Approach¶
```text
Sync ◄──────────────────────────────────────────────► Async
  │                                                │
  │   VERL                                 AReaL   │
  │   ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░████   │
  │                                                │
  │        ◄══════ Flux adapts here ══════►        │
  │                                                │
  └────────────────────────────────────────────────┘
```
**Key insight:** the optimal sync/async configuration changes during training:
- Early training: More sync (policy changing rapidly)
- Mid training: Balanced (stable, efficient)
- Late training: More async (fine-tuning, policy stable)
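As an illustration only, the schedule above could be approximated by a fixed heuristic keyed on training progress. The function below is a sketch; Flux does not hard-code such a schedule, it derives the ratio adaptively from measured staleness:

```python
def suggested_async_ratio(progress: float) -> float:
    """Map training progress in [0, 1] to an async ratio in [0, 1].

    Illustrative thresholds only: Flux adapts this ratio from measured
    staleness rather than from a fixed schedule like this one.
    """
    if progress < 0.2:   # early: policy changing rapidly, stay mostly sync
        return 0.2
    if progress < 0.8:   # mid: balanced between throughput and freshness
        return 0.5
    return 0.8           # late: policy stable, lean async
```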
Concept Map¶
```mermaid
graph TB
    subgraph Measurement["Staleness Measurement"]
        KL[KL Divergence]
        IW[Importance Weights]
        VG[Version Gap]
    end
    subgraph Control["Adaptive Control"]
        PID[PID Controller]
        AR[Async Ratio]
    end
    subgraph Execution["Training Execution"]
        BC[Batch Composer]
        IC[Importance Correction]
        WS[Weight Sync]
    end
    KL --> PID
    IW --> PID
    VG --> PID
    PID --> AR
    AR --> BC
    AR --> WS
    IW --> IC
    IC --> Training
    BC --> Training
    WS --> Inference
```
Key Formulas¶
Staleness Score¶
\[
\text{staleness} = \tfrac{1}{3}\left(\text{KL}_{norm} + \text{IW}_{norm} + \text{version}_{norm}\right)
\]
Where:
- \(\text{KL}_{norm} = \min(1, \frac{D_{KL}(\pi_{behavior} \| \pi_{current})}{0.1})\)
- \(\text{IW}_{norm} = \min(1, \frac{\text{Var}(w)}{2.0})\)
- \(\text{version}_{norm} = \min(1, \frac{\text{version\_gap}}{5})\)
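A sketch of the combined score, assuming the three normalized terms are averaged with equal weight (that weighting is an assumption for illustration; only the three normalizations are given above):

```python
def staleness_score(kl: float, iw_var: float, version_gap: int) -> float:
    """Combine the three normalized staleness signals into one score in [0, 1].

    The normalization constants follow the definitions above; the equal
    weighting of the three terms is an illustrative assumption.
    """
    kl_norm = min(1.0, kl / 0.1)            # KL divergence, saturates at 0.1
    iw_norm = min(1.0, iw_var / 2.0)        # importance-weight variance, saturates at 2.0
    version_norm = min(1.0, version_gap / 5)  # policy version gap, saturates at 5
    return (kl_norm + iw_norm + version_norm) / 3
```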
Importance Weight¶
\[
w_t = \frac{\pi_{current}(a_t \mid s_t)}{\pi_{behavior}(a_t \mid s_t)}
\]
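A minimal sketch of computing per-token importance weights, and the \(\text{Var}(w)\) signal used for staleness, from log-probabilities (the function name and list-based interface are illustrative, not Flux's API):

```python
import math
from statistics import pvariance

def importance_weights(behavior_logprobs, current_logprobs):
    """Per-token w_t = exp(logp_current - logp_behavior).

    Working in log space avoids underflow when multiplying many
    small per-token probabilities over long sequences.
    """
    return [math.exp(c - b) for b, c in zip(behavior_logprobs, current_logprobs)]

# Identical policies give w_t = 1 everywhere, so Var(w) = 0 (fully on-policy).
w = importance_weights([-1.0, -2.3], [-1.0, -2.3])
iw_variance = pvariance(w)  # the Var(w) that feeds IW_norm above
```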
PID Control¶
\[
\Delta\,\text{async\_ratio} = k_p\, e + k_i \sum e + k_d\, \Delta e
\]
Where \(e = \text{target\_staleness} - \text{EMA(staleness)}\)
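A self-contained sketch of such a controller. The gain names mirror the `kp`, `ki`, `kd` config keys; the class name, default gains, EMA smoothing factor, and clamping to \([0, 1]\) are illustrative assumptions, not Flux's actual implementation:

```python
class AsyncRatioPID:
    """Minimal PID controller nudging async_ratio toward a target staleness."""

    def __init__(self, target, kp=0.5, ki=0.1, kd=0.05, ema_alpha=0.3):
        self.target = target
        self.kp, self.ki, self.kd = kp, ki, kd
        self.ema_alpha = ema_alpha
        self.ema = None          # EMA of measured staleness
        self.integral = 0.0      # accumulated error for the I term
        self.prev_error = 0.0    # last error for the D term

    def update(self, staleness, async_ratio):
        # Smooth the noisy per-step staleness measurement with an EMA
        if self.ema is None:
            self.ema = staleness
        else:
            self.ema = self.ema_alpha * staleness + (1 - self.ema_alpha) * self.ema
        error = self.target - self.ema        # e = target - EMA(staleness)
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        delta = self.kp * error + self.ki * self.integral + self.kd * derivative
        # High staleness drives error negative, pulling the ratio toward sync;
        # clamp the result to the valid [0, 1] range.
        return min(1.0, max(0.0, async_ratio + delta))
```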
How They Work Together¶
Training Step Flow¶
```mermaid
sequenceDiagram
    participant C as Coordinator
    participant S as SGLang
    participant SM as Staleness Monitor
    participant AC as Adaptive Controller
    participant BC as Batch Composer
    participant T as Trainer
    loop Every Step
        C->>S: Request rollouts
        S-->>C: Streaming trajectories
        C->>SM: Compute staleness
        SM-->>AC: staleness = 0.12
        AC->>AC: PID update
        AC-->>C: async_ratio = 0.45, should_sync = false
        C->>BC: Compose batch
        BC-->>T: Balanced batch (length, staleness)
        T->>T: Importance-corrected update
        T-->>S: Lazy weight sync
    end
```
Sync Decision Logic¶
```python
def should_sync(staleness, steps_since_sync, buffer_capacity,
                target_staleness=0.15, tolerance=0.05,
                max_steps_between_sync=20, min_capacity=0.1):
    # Threshold defaults here are illustrative; Flux takes them from config.
    # Sync if staleness exceeds threshold
    if staleness > target_staleness + tolerance:
        return True
    # Sync if too many steps have passed without a sync
    if steps_since_sync > max_steps_between_sync:
        return True
    # Sync if remaining buffer capacity is low
    if buffer_capacity < min_capacity:
        return True
    return False
```
Configuration Impact¶
| Concept | Key Config | Effect |
|---|---|---|
| Adaptive Async | `target_staleness` | Higher = more async; faster but riskier |
| Adaptive Async | `kp`, `ki`, `kd` | Controller responsiveness |
| Staleness | `staleness_decay` | How fast old data loses weight |
| APRIL | `oversample_ratio` | How much to oversample rollouts |
| APRIL | `batch_timeout` | When to abort long generations |
| Batch Composer | `length_bucket_boundaries` | Padding efficiency |
| Batch Composer | `curriculum_enabled` | Easy-to-hard ordering |
| Weight Sync | `method` | `"full"`, `"delta"`, `"snapshot"` |
| Weight Sync | `interval` | Steps between syncs |
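As a hypothetical example, the knobs above might be grouped as follows. The key names come from the table; the nesting and every default value shown are assumptions, not Flux's shipped configuration:

```python
# Illustrative grouping of the configuration knobs; nesting and all
# values are assumptions for the sake of example.
flux_config = {
    "adaptive_async": {"target_staleness": 0.15, "kp": 0.5, "ki": 0.1, "kd": 0.05},
    "staleness": {"staleness_decay": 0.9},
    "april": {"oversample_ratio": 1.5, "batch_timeout": 30.0},
    "batch_composer": {
        "length_bucket_boundaries": [512, 1024, 2048],
        "curriculum_enabled": True,
    },
    "weight_sync": {"method": "delta", "interval": 10},
}
```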
Deep Dives¶
Ready to learn more? Explore each concept in detail:
- Adaptive Async Control - The PID controller and sync/async balance
- Staleness & Importance - Measuring and correcting for off-policy data
- APRIL Strategy - Efficient rollout generation
- Batch Composition - Smart batching strategies
- Weight Synchronization - Efficient weight transfer
- Architecture - System design and component interaction