Tutorials
Hands-on tutorials to help you master Flux for various RLHF training scenarios.
Learning Path
graph LR
    A[Basic RLHF] --> B[Custom Rewards]
    B --> C[Multi-GPU]
    C --> D[DPO Training]
    D --> E[Adaptive Async]
    E --> F[Production]
Beginner Tutorials
- Basic RLHF Training
  Complete walkthrough of training an LLM with RLHF using Flux.
  Time: 30 minutes. Prerequisites: Flux installed.
- Custom Reward Functions
  Learn to create custom reward functions for your specific task.
  Time: 20 minutes. Prerequisites: Basic RLHF tutorial.
Intermediate Tutorials
- Multi-GPU Training
  Scale your training across multiple GPUs on a single node.
  Time: 45 minutes. Prerequisites: a working Basic RLHF training run.
- Fine-tuning with DPO
  Use Direct Preference Optimization (DPO) to train on preference data.
  Time: 30 minutes. Prerequisites: preference data available.
Advanced Tutorials
- Adaptive Async in Practice
  Deep dive into configuring and monitoring adaptive async control.
  Time: 60 minutes. Prerequisites: Multi-GPU Training tutorial.
- Production Deployment
  Deploy Flux training at scale with monitoring and fault tolerance.
  Time: 90 minutes. Prerequisites: all previous tutorials.
Quick Reference
| Tutorial | Difficulty | Time | Key Topics |
|---|---|---|---|
| Basic RLHF | Beginner | 30 min | FluxTrainer, GRPO, basic config |
| Custom Rewards | Beginner | 20 min | RewardFunction, FunctionReward |
| Multi-GPU | Intermediate | 45 min | TP, DP, distributed training |
| DPO Training | Intermediate | 30 min | DPO algorithm, preference data |
| Adaptive Async | Advanced | 60 min | PID tuning, staleness monitoring |
| Production | Advanced | 90 min | Monitoring, checkpoints, scaling |
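For readers unfamiliar with the "PID tuning" topic listed for the Adaptive Async tutorial, here is a minimal sketch of a generic discrete PID controller in plain Python. It is not Flux's implementation and does not use Flux's API. The class name, the gains, and the choice of "measured staleness" as the controlled signal are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDController:
    """Generic discrete PID controller (hypothetical helper, not part of Flux)."""

    kp: float          # proportional gain
    ki: float          # integral gain
    kd: float          # derivative gain
    setpoint: float    # target value, e.g. a desired staleness level (assumption)
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, measurement: float, dt: float = 1.0) -> float:
        """Return a control signal from the latest measurement."""
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call, since there is no previous error.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Usage: staleness is above target, so the controller returns a negative
# correction that a training loop could use to reduce asynchrony.
pid = PIDController(kp=0.5, ki=0.1, kd=0.0, setpoint=2.0)
correction = pid.update(measurement=4.0)
```

Tuning then amounts to choosing `kp`, `ki`, and `kd` so that the controlled signal settles near the setpoint without oscillating. The Adaptive Async tutorial covers the settings Flux actually exposes.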
Tutorial Format
Each tutorial follows a consistent structure:
- Overview - What you'll learn
- Prerequisites - What you need before starting
- Setup - Environment and data preparation
- Step-by-Step - Detailed instructions
- Verification - How to know it worked
- Troubleshooting - Common issues and solutions
- Next Steps - Where to go from here
Sample Projects
Complete example projects you can clone and run:
Math Reasoning (GSM8K)
Train a model for mathematical reasoning using GRPO.
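As a rough illustration of the reward side of such a project, the sketch below scores a completion by comparing its final answer with the reference answer. It then normalizes rewards within a group of samples, which is the group-relative idea behind GRPO. It is plain Python and does not use Flux's API, and all function names are hypothetical. It assumes the model is prompted to end its answer with the `#### <number>` marker that GSM8K reference solutions use.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional


def extract_final_answer(text: str) -> Optional[str]:
    """Return the number after the last '####' marker, or None if absent."""
    matches = re.findall(r"####\s*(-?[\d,]*\.?\d+)", text)
    if not matches:
        return None
    # GSM8K answers may contain thousands separators such as "1,234".
    return matches[-1].replace(",", "")


def gsm8k_reward(completion: str, reference: str) -> float:
    """1.0 if the completion's final answer matches the reference, else 0.0."""
    predicted = extract_final_answer(completion)
    expected = extract_final_answer(reference)
    if predicted is None or expected is None:
        return 0.0
    return 1.0 if float(predicted) == float(expected) else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Center and scale rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: four sampled completions for one prompt, two of them correct.
reference = "6 * 7 = 42\n#### 42"
completions = ["... #### 42", "... #### 41", "no final answer", "... #### 42.0"]
rewards = [gsm8k_reward(c, reference) for c in completions]
advantages = group_relative_advantages(rewards)
```

Correct completions receive positive advantages and incorrect ones negative. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.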
Code Generation (HumanEval)
Train a model for code generation with custom evaluation.
Chat Assistant (UltraChat)
Build a general-purpose chat assistant.
Video Tutorials
Coming soon! Subscribe to our YouTube channel for video walkthroughs.
Community Tutorials
Have you written a tutorial about Flux? Submit it here and we'll feature it!
Getting Help
Stuck on a tutorial? Here's how to get help:
- Check the FAQ
- Search GitHub Issues
- Ask on Discord
- Open a new issue with the tutorial-help label