Optimizer
The optimizer module contains utility functions for building optimizers and learning rate schedulers from a configuration, as well as an implementation of the Muon optimizer.
build_optimizers(model_params, config)
Build a list of optimizers based on the model parameters and configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_params | List[Parameter] | List of model parameters to optimize. | required |
| config | Mapping | Configuration dictionary containing optimizer settings. | required |
Returns:
| Type | Description |
|---|---|
| list[Optimizer] | List of optimizers. |
Source code in bfm/training/optimizer/builders.py
build_schedulers(optimizers, config, training_setup)
Build learning rate schedulers for the given optimizers. Warmup and falloff schedules are both supported, and each is optional.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| optimizers | Optimizer | List of optimizers to build schedulers for. | required |
| config | Mapping | Configuration dictionary containing scheduler settings. | required |
| training_setup | | Training setup object containing dataloaders. | required |
Returns:
| Type | Description |
|---|---|
| list[_LRScheduler] | List of learning rate schedulers. |
Source code in bfm/training/optimizer/builders.py
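To make the warmup and falloff idea concrete, here is a minimal sketch of one common shape: a linear warmup followed by a cosine falloff, expressed as a multiplier on the base learning rate. The function name `lr_multiplier` and the arguments `warmup_steps` and `total_steps` are hypothetical; the actual schedule shapes and configuration keys used by `build_schedulers` are not shown in this page.

```python
import math


def lr_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """Return the factor applied to the base learning rate at `step`.

    Hypothetical helper: linear warmup, then cosine falloff to zero.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup: ramps from 1/warmup_steps up to 1.0.
        return (step + 1) / warmup_steps
    # Cosine falloff from 1.0 down to 0.0 over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

A function of this form, with `warmup_steps` and `total_steps` bound (for example via `functools.partial`), has the signature that `torch.optim.lr_scheduler.LambdaLR` expects for its `lr_lambda` argument. Setting `warmup_steps=0` skips the warmup phase.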
Muon is a recent optimizer that this module presents as working better than Adam.
Muon
Bases: Optimizer
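Muon optimizer. It applies momentum (optionally Nesterov) to the gradient and orthogonalizes the resulting update with the configured backend before taking the step.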
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| params | Parameter | Parameters to optimize. | required |
| lr | float | Learning rate. | 0.02 |
| momentum | float | Momentum factor. | 0.95 |
| nesterov | bool | Whether to use Nesterov momentum. | True |
| weight_decay | float | Weight decay (L2 penalty). | 0.0 |
| backend | str | Backend used to orthogonalize the update. | 'newtonschulz5' |
| backend_steps | int | Number of iterations the backend runs. | 5 |
Source code in bfm/training/optimizer/muon.py
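The `momentum` and `nesterov` parameters can be illustrated with a small sketch on plain Python lists. It assumes the conventional momentum form used by PyTorch's SGD (buffer update `buf = momentum * buf + grad`, Nesterov update `grad + momentum * buf`); whether `muon.py` uses exactly this form is not shown here. The names `momentum_update`, `apply_step`, and `post_process` are hypothetical, and `post_process` is only a placeholder for where an orthogonalization step would go.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def momentum_update(
    grad: Vector, buf: Vector, momentum: float = 0.95, nesterov: bool = True
) -> Tuple[Vector, Vector]:
    """Return (update, new_buffer) for one step (assumed SGD-style momentum)."""
    new_buf = [momentum * b + g for b, g in zip(buf, grad)]
    if nesterov:
        # Nesterov: look ahead by adding the scaled buffer to the raw gradient.
        update = [g + momentum * b for g, b in zip(grad, new_buf)]
    else:
        update = list(new_buf)
    return update, new_buf


def apply_step(
    params: Vector,
    update: Vector,
    lr: float = 0.02,
    post_process: Callable[[Vector], Vector] = lambda u: u,
) -> Vector:
    """Apply `params - lr * post_process(update)`.

    `post_process` defaults to the identity; it marks the point where an
    orthogonalization backend would transform the update.
    """
    direction = post_process(update)
    return [p - lr * d for p, d in zip(params, direction)]
```

With the identity `post_process` this reduces to SGD with (Nesterov) momentum, which is useful as a baseline when checking the effect of the orthogonalization step.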
orthogonalize(G)
Orthogonalize G using the Newton-Schulz zeroth-power iteration (iterative inverse square root method).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| G | Tensor | Input tensor to orthogonalize. | required |
Returns:
| Type | Description |
|---|---|
| Tensor | Orthogonalized tensor, ≈ G (GᵀG)^(-1/2). |
Source code in bfm/training/optimizer/muon.py
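As an illustration of the idea behind `orthogonalize`, the sketch below runs the classic cubic Newton-Schulz iteration, X ← 1.5·X − 0.5·X·(XᵀX), which drives every singular value of X toward 1 and so converges to G (GᵀG)^(-1/2) for a full-column-rank G. It is written with plain Python lists to stay dependency-free; the implementation in `muon.py` operates on tensors and may use a different polynomial and fewer steps, so the names `newton_schulz_orthogonalize`, `matmul`, and `transpose` and the default `steps=30` are assumptions made for this example only.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate G (G^T G)^(-1/2) with the cubic Newton-Schulz iteration."""
    # Dividing by the Frobenius norm bounds every singular value by 1,
    # which lies inside the iteration's convergence region (0, sqrt(3)).
    norm = math.sqrt(sum(v * v for row in g for v in row)) + 1e-12
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xtx = matmul(transpose(x), x)
        x_xtx = matmul(x, xtx)
        # X <- 1.5 X - 0.5 X (X^T X)
        x = [
            [1.5 * a - 0.5 * b for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(x, x_xtx)
        ]
    return x
```

For a tall input such as a 3×2 matrix, the result has orthonormal columns, so `matmul(transpose(q), q)` is close to the 2×2 identity. Small singular values grow by roughly a factor of 1.5 per step at first, which is why this cubic variant needs more iterations than a tuned polynomial would.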