
Changelog

All notable changes to Flux are documented here.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.


[Unreleased]

Added

Native Trainer Contract (v0.2 Architecture)

  • TrainingBackend ABC (flux/training/base.py): Abstract base class for all training backends
      ◦ GPU-direct assumptions: all tensors already on target device
      ◦ Async-safe: train_step() can be called from asyncio event loop
      ◦ Version tracking: increments after each successful train step
  • GPUBatch dataclass: Frozen, device-owned tensor batch for training
      ◦ Required tensors: input_ids, attention_mask, behavior_log_probs, rewards, version_gaps
      ◦ Optional tensors: loss_mask, token_rewards, ref_log_probs, values, advantages, returns
      ◦ Validation and device transfer methods
  • TrainStepResult dataclass: Standardized return type with loss, metrics, and timing info
  • create_training_backend() factory: Creates backend from config enum
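
The contract described above can be pictured with a minimal sketch. It is not the code in flux/training/base.py: plain Python lists stand in for device tensors, train_step() is assumed to be a coroutine, and the names _run_step, step_time_s, EchoBackend, and demo are hypothetical. Only the class names and tensor field names come from the entries above.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class GPUBatch:
    """Frozen batch; field names mirror the required tensors listed above."""
    input_ids: Any
    attention_mask: Any
    behavior_log_probs: Any
    rewards: Any
    version_gaps: Any
    loss_mask: Optional[Any] = None  # one of the optional tensors

    def validate(self) -> None:
        # Hypothetical check: every required field must be present.
        for name in ("input_ids", "attention_mask", "behavior_log_probs",
                     "rewards", "version_gaps"):
            if getattr(self, name) is None:
                raise ValueError(f"missing required tensor: {name}")


@dataclass
class TrainStepResult:
    loss: float
    metrics: Dict[str, float] = field(default_factory=dict)
    step_time_s: float = 0.0  # assumed name for the timing info


class TrainingBackend(ABC):
    def __init__(self) -> None:
        self._version = 0

    @property
    def version(self) -> int:
        return self._version

    @abstractmethod
    async def _run_step(self, batch: GPUBatch) -> TrainStepResult:
        ...

    async def train_step(self, batch: GPUBatch) -> TrainStepResult:
        batch.validate()
        result = await self._run_step(batch)
        self._version += 1  # bumped only after a successful step
        return result


class EchoBackend(TrainingBackend):
    """Toy backend: the 'loss' is just the negated mean reward."""

    async def _run_step(self, batch: GPUBatch) -> TrainStepResult:
        rewards = list(batch.rewards)
        return TrainStepResult(loss=-sum(rewards) / len(rewards))


def demo():
    backend = EchoBackend()
    batch = GPUBatch([[1, 2]], [[1, 1]], [[-0.1, -0.2]], [1.0, 3.0], [0, 1])
    result = asyncio.run(backend.train_step(batch))
    return result.loss, backend.version
```

Keeping the version bump inside the base class means a backend that raises during its step leaves the version untouched, which matches the "increments after each successful train step" wording.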

Training Backends

  • TransformersBackend (flux/training/backends/transformers.py): HuggingFace Transformers-based backend
      ◦ Suitable for development, single-GPU, and multi-GPU with DDP
      ◦ Supports Flash Attention 2, gradient checkpointing
      ◦ PPO-style clipped surrogate loss implementation
      ◦ Checkpoint save/load support
  • MegatronEngine refactoring: Now implements both TrainingBackend and legacy TrainingEngine interfaces
      ◦ Dual interface support for backward compatibility
      ◦ GPU-direct batch handling via GPUBatch
      ◦ Importance weight computation on GPU
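
For readers unfamiliar with the PPO-style clipped surrogate loss and the importance weights mentioned above, here is a small sketch of the standard formulation. It uses plain Python floats rather than GPU tensors, and the function name and the clip_eps default are illustrative choices, not Flux's API.

```python
import math
from typing import Sequence


def clipped_surrogate_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Standard PPO clipped objective, negated so that lower is better."""
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Importance weight: probability ratio between current and behavior policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

When the two policies agree the ratio is 1 and the loss reduces to the negated mean advantage; when the ratio drifts outside the clip range, the clipped term caps how much a positive advantage can be exploited.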

Mode Gate (Sync/Async State Machine)

  • AsyncMode enum: SYNC_BARRIER, ASYNC_RUNNING, THROTTLED
  • ModeGate class (flux/controller/mode_gate.py): State machine controlling sync/async transitions
      ◦ Priority-based state transitions (capacity > staleness > buffer fill)
      ◦ Hysteresis to prevent rapid oscillation
      ◦ Barrier enforcement with timeout
  • ModeGateConfig dataclass: Configuration for thresholds and watermarks
  • ModeGateState dataclass: Current state with reason and metrics
  • ModeGateIntegration helper: Integration with staleness manager and trajectory buffer
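
A rough sketch of how a priority-ordered gate with hysteresis might behave is shown below. Only the AsyncMode member names come from the entry above. GateConfig, SimpleModeGate, the threshold values, and the mapping from each condition to a mode are assumptions made for illustration; the real ModeGate may map conditions differently, and barrier timeouts are left out.

```python
from dataclasses import dataclass
from enum import Enum


class AsyncMode(Enum):
    SYNC_BARRIER = "sync_barrier"
    ASYNC_RUNNING = "async_running"
    THROTTLED = "throttled"


@dataclass(frozen=True)
class GateConfig:
    """Hypothetical stand-in for ModeGateConfig; all values are made up."""
    max_in_flight: int = 8
    staleness_high: float = 0.5
    staleness_low: float = 0.3
    buffer_high: float = 0.9
    buffer_low: float = 0.7


class SimpleModeGate:
    def __init__(self, config: GateConfig = GateConfig()) -> None:
        self.config = config
        self.mode = AsyncMode.ASYNC_RUNNING
        self.reason = "init"

    def update(self, in_flight: int, staleness: float, buffer_fill: float) -> AsyncMode:
        c = self.config
        # Hysteresis: a condition that currently holds the gate only releases
        # once the signal drops below the LOW watermark, not the high one.
        stale_limit = c.staleness_low if self.reason == "staleness" else c.staleness_high
        fill_limit = c.buffer_low if self.reason == "buffer_fill" else c.buffer_high

        # Checks run in priority order: capacity > staleness > buffer fill.
        if in_flight >= c.max_in_flight:
            self.mode, self.reason = AsyncMode.THROTTLED, "capacity"
        elif staleness > stale_limit:
            self.mode, self.reason = AsyncMode.SYNC_BARRIER, "staleness"
        elif buffer_fill > fill_limit:
            self.mode, self.reason = AsyncMode.THROTTLED, "buffer_fill"
        else:
            self.mode, self.reason = AsyncMode.ASYNC_RUNNING, "healthy"
        return self.mode
```

With these made-up thresholds, a staleness reading of 0.6 trips the barrier, 0.4 keeps it in place (still above the low watermark), and 0.2 releases it. The gap between the two watermarks is what prevents rapid oscillation.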

Documentation

  • Initial documentation website with MkDocs Material
  • Comprehensive tutorials and how-to guides
  • API reference documentation
  • Updated architecture documentation with new components
  • Training backend and Mode Gate API documentation

Changed

  • Reorganized documentation structure for better navigation
  • Updated flux/training/__init__.py with new exports
  • Updated flux/controller/__init__.py with ModeGate exports

Fixed

  • Documentation links and cross-references

[0.1.0] - 2025-01-XX

Added

Core Features

  • Adaptive Async Controller: PID-based dynamic sync/async ratio adjustment
  • Staleness Measurement: KL divergence, importance weight variance, version gap tracking
  • Unified Importance Correction: Algorithm-agnostic off-policy correction
  • APRIL Strategy: Oversample, abort long-tail, reuse partial trajectories
  • Smart Batch Composer: Length bucketing, staleness balancing, curriculum learning
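
To make the PID-based adjustment concrete, the sketch below shows a textbook PID loop that nudges an async ratio toward a staleness target. The class name, gains, target, and the choice of staleness as the measured signal are assumptions for illustration; the changelog does not describe how Flux's controller is actually tuned or wired.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    kp: float
    ki: float
    kd: float
    target: float          # desired value of the measured signal (e.g. staleness)
    out_min: float = 0.0   # the async ratio is kept within [out_min, out_max]
    out_max: float = 1.0
    _integral: float = 0.0
    _prev_error: float = 0.0

    def step(self, measured: float, current_output: float, dt: float = 1.0) -> float:
        """Return the adjusted output after one control step."""
        error = self.target - measured
        self._integral += error * dt
        derivative = (error - self._prev_error) / dt
        self._prev_error = error

        adjustment = (
            self.kp * error
            + self.ki * self._integral
            + self.kd * derivative
        )
        # Clamp so the ratio stays a valid fraction.
        return min(self.out_max, max(self.out_min, current_output + adjustment))
```

When measured staleness exceeds the target, the error is negative and the async ratio is pulled down; when staleness is below the target, the ratio is allowed to grow again.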

Algorithms

  • PPO (Proximal Policy Optimization)
  • GRPO (Group Relative Policy Optimization)
  • DPO (Direct Preference Optimization)
  • REINFORCE with configurable baselines
  • DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization)
  • RLOO (Leave-One-Out baseline)
  • GSPO (Group Sequence Policy Optimization)
  • Registry pattern for custom algorithms
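
The registry pattern mentioned in the last item usually takes the form of a decorator that records implementations under a name. The sketch below is a generic version of that idea. ALGORITHMS, register_algorithm, get_algorithm, and the "mean_baseline" example are hypothetical names, not Flux's actual registry API.

```python
from typing import Callable, Dict, List

AdvantageFn = Callable[[List[float]], List[float]]

# Global name -> implementation table.
ALGORITHMS: Dict[str, AdvantageFn] = {}


def register_algorithm(name: str) -> Callable[[AdvantageFn], AdvantageFn]:
    """Decorator that records a function under the given name."""
    def decorator(fn: AdvantageFn) -> AdvantageFn:
        if name in ALGORITHMS:
            raise ValueError(f"algorithm {name!r} is already registered")
        ALGORITHMS[name] = fn
        return fn
    return decorator


def get_algorithm(name: str) -> AdvantageFn:
    try:
        return ALGORITHMS[name]
    except KeyError:
        known = ", ".join(sorted(ALGORITHMS))
        raise KeyError(f"unknown algorithm {name!r}; known: {known}") from None


@register_algorithm("mean_baseline")
def mean_baseline_advantages(rewards: List[float]) -> List[float]:
    """Toy custom algorithm: subtract the batch-mean reward as a baseline."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]
```

A config file can then refer to an algorithm by its string name, and user code can add new entries without touching the library itself.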

Infrastructure

  • Megatron-LM integration for distributed training
  • SGLang HTTP client for inference
  • CUDA IPC weight synchronization
  • Delta compression for efficient weight transfer
  • Checkpoint management with best model tracking
  • Prometheus metrics export
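
The changelog does not say how delta compression is implemented. As one simple illustration of the general idea, the sketch below sends only the entries that changed between two weight snapshots. Flat Python lists stand in for weight tensors, and compress_delta and apply_delta are hypothetical names.

```python
from typing import Dict, List


def compress_delta(old: List[float], new: List[float], tol: float = 0.0) -> Dict[int, float]:
    """Return {index: new_value} for entries that moved by more than tol."""
    if len(old) != len(new):
        raise ValueError("snapshots must have the same length")
    return {
        i: n
        for i, (o, n) in enumerate(zip(old, new))
        if abs(n - o) > tol
    }


def apply_delta(old: List[float], delta: Dict[int, float]) -> List[float]:
    """Rebuild the new snapshot from the old one plus the sparse update."""
    updated = list(old)  # copy so the caller's snapshot is not mutated
    for index, value in delta.items():
        updated[index] = value
    return updated
```

Storing the new value rather than the arithmetic difference keeps the round trip exact with tol=0.0; a nonzero tol trades a bounded error for a smaller payload.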

Configuration

  • Pydantic-based hierarchical configuration
  • YAML configuration file support
  • Environment variable overrides
  • Configuration validation
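
The layering of defaults and environment variable overrides can be illustrated without any third-party dependency. The sketch below deliberately uses only the standard library as a stand-in for the Pydantic models. The FLUX_ prefix, the double-underscore separator, and the keys in DEFAULTS are all assumptions, not documented Flux settings.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Hypothetical two-level defaults; real Flux config keys may differ.
DEFAULTS: Dict[str, Dict[str, Any]] = {
    "training": {"learning_rate": 1e-5, "batch_size": 32},
    "algorithm": {"name": "ppo"},
}


def apply_env_overrides(
    config: Mapping[str, Mapping[str, Any]],
    environ: Optional[Mapping[str, str]] = None,
    prefix: str = "FLUX_",
) -> Dict[str, Dict[str, Any]]:
    """Return a copy of config with PREFIX + SECTION__FIELD variables applied."""
    environ = os.environ if environ is None else environ
    result = {section: dict(values) for section, values in config.items()}

    for key, raw in environ.items():
        if not key.startswith(prefix):
            continue
        section, sep, field_name = key[len(prefix):].lower().partition("__")
        if not sep or section not in result or field_name not in result[section]:
            raise KeyError(f"unknown config override: {key}")
        default = result[section][field_name]
        # Validation by coercion to the default's type (int, float, or str here);
        # a malformed value raises ValueError.
        result[section][field_name] = type(default)(raw)
    return result
```

For example, passing `{"FLUX_TRAINING__BATCH_SIZE": "64"}` as `environ` yields a config whose batch size is the integer 64 while leaving DEFAULTS unchanged.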

CLI

  • flux train - Run training
  • flux test - Test configuration
  • flux generate - Generate samples
  • flux info - System information

Reward Functions

  • LengthReward
  • FormatReward
  • KeywordReward
  • CompositeReward
  • FunctionReward
  • RewardModel (neural)
  • LLMJudge
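
As an illustration of how rule-based rewards might be combined, the sketch below builds a weighted composite from a length score and a keyword score. The function names and scoring formulas are hypothetical. They are loosely modeled on the LengthReward, KeywordReward, and CompositeReward entries above, not copied from their implementations.

```python
from typing import Callable, Sequence, Tuple

RewardFn = Callable[[str], float]


def length_reward(target_words: int) -> RewardFn:
    """Score 1.0 at the target word count, falling linearly to 0.0."""
    def score(text: str) -> float:
        distance = abs(len(text.split()) - target_words)
        return max(0.0, 1.0 - distance / target_words)
    return score


def keyword_reward(keywords: Sequence[str]) -> RewardFn:
    """Fraction of keywords that appear in the text (case-insensitive)."""
    def score(text: str) -> float:
        lowered = text.lower()
        hits = sum(1 for keyword in keywords if keyword.lower() in lowered)
        return hits / len(keywords)
    return score


def composite_reward(parts: Sequence[Tuple[float, RewardFn]]) -> RewardFn:
    """Weighted average of several reward functions."""
    total_weight = sum(weight for weight, _ in parts)

    def score(text: str) -> float:
        return sum(weight * fn(text) for weight, fn in parts) / total_weight
    return score


reward = composite_reward([
    (1.0, length_reward(4)),
    (1.0, keyword_reward(["answer", "because"])),
])
```

Here `reward("The answer is 42")` combines a perfect length score with a half keyword score into a single value in the 0 to 1 range.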

Project Infrastructure

  • Project scaffolding and CI/CD setup
  • pytest test suite with unit and integration tests
  • Type hints throughout codebase
  • Ruff linting and Black formatting

Version History

Version   Date      Highlights
0.1.0     2025-01   Initial release

Upgrade Guides

Upgrading to 0.1.0

This is the initial release. No upgrade path required.


Deprecation Policy

  • Minor versions (0.x.0): May include breaking changes during initial development (pre-1.0)
  • Patch versions (0.0.x): Bug fixes and documentation only
  • Major versions (x.0.0): May include breaking changes after 1.0

Deprecated features will be marked in documentation and emit warnings for at least one minor version before removal.
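
One common way to emit such warnings in Python is a small decorator built on the standard warnings module. The sketch below shows that approach. The deprecated helper, the version numbers, and old_api are hypothetical examples, not part of Flux.

```python
import functools
import warnings
from typing import Any, Callable, Optional


def deprecated(since: str, removal: str, replacement: Optional[str] = None) -> Callable:
    """Mark a function as deprecated; calling it emits a DeprecationWarning."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        message = (
            f"{fn.__name__} is deprecated since {since} "
            f"and is scheduled for removal in {removal}"
        )
        if replacement:
            message += f"; use {replacement} instead"

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # stacklevel=2 points the warning at the caller, not this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical example: versions and names are made up for illustration.
@deprecated(since="0.2.0", removal="0.3.0", replacement="new_api")
def old_api(x: int) -> int:
    return x * 2
```

The function keeps working for the deprecation window, and callers see where the deprecated call originates, which makes it easier to migrate before the feature is removed.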


Release Schedule

We aim to release:

  • Patch releases: As needed for bug fixes
  • Minor releases: Monthly with new features
  • Major releases: When significant changes warrant one


Contributing

See the Contributing Guide for how to contribute to Flux.