Smart Batch Composition¶
Flux composes training batches with three strategies that improve training efficiency: length bucketing, staleness balancing, and curriculum learning.
Strategies¶
1. Length Bucketing¶
Groups sequences of similar length into the same batch, so fewer positions in each batch are spent on padding.
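A minimal sketch of the idea, not Flux's actual implementation: sort sample indices by sequence length, then cut the sorted order into batches. The names `bucket_by_length` and `padding_tokens` are hypothetical helpers introduced for illustration.

```python
from typing import List, Sequence


def bucket_by_length(sequences: Sequence[Sequence[int]], batch_size: int) -> List[List[int]]:
    """Return batches of sample indices, grouping sequences of similar length.

    Hypothetical helper: sorts indices by sequence length and slices the
    sorted order into consecutive batches.
    """
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padding_tokens(sequences: Sequence[Sequence[int]], batches: List[List[int]]) -> int:
    """Count padding positions when each batch is padded to its longest member."""
    total = 0
    for batch in batches:
        longest = max(len(sequences[i]) for i in batch)
        total += sum(longest - len(sequences[i]) for i in batch)
    return total


# Four toy sequences with lengths 1, 9, 2, 10.
seqs = [[0] * 1, [0] * 9, [0] * 2, [0] * 10]
naive = [[0, 1], [2, 3]]                 # batches in arrival order
bucketed = bucket_by_length(seqs, 2)     # batches grouped by length
print(padding_tokens(seqs, naive), padding_tokens(seqs, bucketed))  # 16 2
```

On this toy input, arrival-order batching pads 16 positions while bucketed batching pads 2. A real sampler would typically also shuffle the order of the batches so that training does not always see short sequences first.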
2. Staleness Balancing¶
Stratified sampling over staleness levels keeps the staleness distribution balanced within each batch.
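The following sketch shows one way stratified sampling over staleness could work. It assumes each sample carries an integer staleness value (for example, how many policy versions old it is); that representation, and the function name `stratified_batch`, are assumptions made for illustration rather than Flux's API.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def stratified_batch(staleness: Sequence[int], batch_size: int, seed: int = 0) -> List[int]:
    """Draw one batch of sample indices, taking turns across staleness levels.

    Hypothetical helper: samples are grouped by staleness value (the strata),
    each stratum is shuffled, and indices are drawn round-robin so that every
    level contributes as evenly as its size allows.
    """
    rng = random.Random(seed)
    strata: Dict[int, List[int]] = defaultdict(list)
    for idx, level in enumerate(staleness):
        strata[level].append(idx)

    pools = []
    for level in sorted(strata):
        pool = list(strata[level])
        rng.shuffle(pool)
        pools.append(pool)

    batch: List[int] = []
    # Stop when the batch is full or every stratum is exhausted.
    while len(batch) < batch_size and any(pools):
        for pool in pools:
            if pool and len(batch) < batch_size:
                batch.append(pool.pop())
    return batch


# 18 samples: six each at staleness 0, 1, and 2.
levels = [0] * 6 + [1] * 6 + [2] * 6
batch = stratified_batch(levels, batch_size=6)
print(Counter(levels[i] for i in batch))
```

With equal-sized strata and a batch size that is a multiple of the number of levels, each staleness level contributes the same number of samples. Round-robin drawing is only one design choice; drawing from each stratum in proportion to its size is an equally valid form of stratified sampling.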
3. Curriculum Learning¶
Orders training samples by progressive difficulty, from easy to hard.
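As a rough illustration of easy-to-hard ordering, the sketch below assumes each sample has a numeric difficulty score, where lower means easier. How Flux actually measures difficulty is not stated here, so the score and the name `curriculum_batches` are hypothetical.

```python
from typing import List, Sequence


def curriculum_batches(difficulty: Sequence[float], batch_size: int) -> List[List[int]]:
    """Return batches of sample indices ordered from easiest to hardest.

    Hypothetical helper: sorts indices by ascending difficulty score and
    slices the sorted order into consecutive batches.
    """
    order = sorted(range(len(difficulty)), key=lambda i: difficulty[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


# Assumed difficulty scores for four samples (lower = easier).
scores = [0.9, 0.1, 0.5, 0.3]
schedule = curriculum_batches(scores, batch_size=2)
print(schedule)  # [[1, 3], [2, 0]]
```

The first batch holds the two easiest samples (indices 1 and 3) and the second holds the two hardest. Python's `sorted` is stable, so samples with equal scores keep their original relative order.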