Optimizer
from streamer.optimizers.streamer_optimizer import StreamerOptimizerArguments, StreamerOptimizer
optim_args = StreamerOptimizerArguments(world_size=8,
                                        alpha=3,
                                        max_layers=3,
                                        optimize_every=100,
                                        average_every=100,
                                        hgn_timescale=True,
                                        hgn_reach=True,
                                        )
# `model` is the streamer model to optimize, created beforehand
optimizer = StreamerOptimizer(model, optim_args)
- class streamer.optimizers.streamer_optimizer.StreamerOptimizerArguments(world_size: int = 1, alpha: int = 3, max_layers: int = 3, optimize_every: int = 100, average_every: int = 1000, hgn_timescale: bool = True, hgn_reach: bool = True)
- world_size: int = 1
- Number of GPUs across which the dataset is distributed
 - alpha: int = 3
- The reach parameter for Hierarchical Gradient Normalization
 - max_layers: int = 3
- The maximum number of layers to stack
 - optimize_every: int = 100
- Take a gradient step once every this many calls to step()
 - average_every: int = 1000
- Interval between model-averaging passes across GPUs
 - hgn_timescale: bool = True
- Enable the timescale parameter in Hierarchical Gradient Normalization
 - hgn_reach: bool = True
- Enable the reach parameter in Hierarchical Gradient Normalization
 
- class streamer.optimizers.streamer_optimizer.StreamerOptimizer(model, args: StreamerOptimizerArguments)
- The optimizer used with streamer. This class handles optimization, gradient normalization, and model averaging across GPUs.
- Parameters:
- model – The streamer model to optimize
- args (StreamerOptimizerArguments) – The parameters used for the Streamer optimizer
 - get_param_groups(layer_num)
- Calculates the parameter groups and their weights. For example, if layer_num is 1 and the model has 4 layers, the parameter groups are [[1], [0, 2], [3]]. Their weights depend on alpha, with earlier groups typically weighted more heavily (e.g., [0.8, 0.15, 0.05]).
- Parameters:
- layer_num (int) – The index of the layer
- Returns:
- (List(List(int))): List of lists dividing the layers into groups that are assigned different gradient multipliers
- (List(float)): The weights assigned to the parameter groups
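
The grouping in the example above is consistent with bucketing layers by their distance from layer_num. The following is a minimal sketch under that reading, not the library's implementation: the function name `sketch_param_groups` is hypothetical, and the geometric weighting (alpha raised to the negative distance, normalized to sum to 1) is an assumption chosen only to illustrate weights that decay across groups.

```python
from collections import defaultdict
from typing import List, Tuple


def sketch_param_groups(
    layer_num: int, num_layers: int, alpha: float = 3.0
) -> Tuple[List[List[int]], List[float]]:
    """Group layer indices by distance from `layer_num` and weight each group.

    ASSUMPTION: the weighting alpha ** -distance (normalized to sum to 1) is
    illustrative only; it is not the documented formula of the library.
    """
    by_distance = defaultdict(list)
    for idx in range(num_layers):
        by_distance[abs(idx - layer_num)].append(idx)

    # Closest group first: distance 0 is the layer itself.
    distances = sorted(by_distance)
    groups = [by_distance[d] for d in distances]

    # Hypothetical geometric decay, normalized so the weights sum to 1.
    raw = [alpha ** -d for d in distances]
    total = sum(raw)
    weights = [w / total for w in raw]
    return groups, weights


groups, weights = sketch_param_groups(layer_num=1, num_layers=4, alpha=3)
print(groups)  # [[1], [0, 2], [3]]
print([round(w, 3) for w in weights])
```

Under this assumed weighting the first group always receives the largest weight, and a larger alpha concentrates more of the total on it.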
 
 
 - get_gradients()
- Accumulates gradients on all layers' parameters from all losses in the model
 - scale_gradients()
- Scales the gradients of all modules by their counters to normalize them
 - average_models()
- Averages the model parameters across all GPUs at the interval set by average_every
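
To make the averaging concrete, here is a minimal single-process sketch of what averaging parameters across replicas amounts to. It is an illustration, not the library's code: the function name `average_replicas` and the parameter names are hypothetical, and plain Python lists stand in for tensors. In a real multi-GPU run this would usually be done with a collective operation rather than by gathering every replica in one place.

```python
from typing import Dict, List


def average_replicas(replicas: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Return the element-wise mean of each named parameter across replicas."""
    n = len(replicas)
    averaged = {}
    for name in replicas[0]:
        # Pair up the i-th element of this parameter from every replica.
        columns = zip(*(replica[name] for replica in replicas))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


# Two hypothetical replicas of the same one-layer model.
replicas = [
    {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]},
    {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]},
]
averaged = average_replicas(replicas)
print(averaged)  # {'layer0.weight': [2.0, 3.0], 'layer0.bias': [0.5]}
```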
 - step()
- Calls the optimizer, which calculates the gradients and accumulates them on the parameters. This function does not apply a gradient step on every call; an internal counter triggers one every optimize_every calls.
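
The counter-based behaviour described for step() can be sketched with a toy scalar optimizer. This is an illustration under stated assumptions, not the library's implementation: the class `AccumulatingStepper`, its learning rate, and the choice to divide the accumulated gradient by the counter before updating are all hypothetical.

```python
class AccumulatingStepper:
    """Toy stand-in: accumulate gradients, apply them every `optimize_every` calls."""

    def __init__(self, param: float, lr: float = 0.1, optimize_every: int = 100):
        self.param = param
        self.lr = lr
        self.optimize_every = optimize_every
        self.grad_sum = 0.0
        self.counter = 0

    def step(self, grad: float) -> bool:
        """Accumulate `grad`; return True when a parameter update was applied."""
        self.grad_sum += grad
        self.counter += 1
        if self.counter < self.optimize_every:
            return False
        # ASSUMPTION: normalize by the number of accumulated gradients, then update.
        self.param -= self.lr * (self.grad_sum / self.counter)
        self.grad_sum = 0.0
        self.counter = 0
        return True


stepper = AccumulatingStepper(param=1.0, lr=0.1, optimize_every=3)
updates = [stepper.step(grad=2.0) for _ in range(7)]
print(updates)  # [False, False, True, False, False, True, False]
print(round(stepper.param, 6))  # 0.6
```

Only every third call changes the parameter here, which mirrors how most calls to step() only accumulate.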