Layers#
- class streamer.models.networks.CNNEncoder(feature_dim)[source]#
A 4-layer CNN Encoder model used to encode an image into a feature vector
- Parameters:
feature_dim (int) – the output feature dimension
- class streamer.models.networks.CNNDecoder(feature_dim)[source]#
A 4-layer CNN Decoder model used to decode a feature vector back into an image
- Parameters:
feature_dim (int) – the input feature dimension
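The library does not document the internal architecture, but a minimal sketch of a 4-layer convolutional encoder/decoder pair with this interface might look like the following (the channel widths and the 3×64×64 input size are assumptions for illustration, not the library's actual values):

```python
import torch
import torch.nn as nn

class TinyCNNEncoder(nn.Module):
    """Hypothetical 4-layer CNN encoder: images -> feature vectors."""
    def __init__(self, feature_dim):
        super().__init__()
        chans = [3, 32, 64, 128, 256]  # assumed widths
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1), nn.ReLU()]
        self.conv = nn.Sequential(*layers)
        self.proj = nn.Linear(256 * 4 * 4, feature_dim)  # assumes 64x64 input

    def forward(self, x):               # x: [S, 3, 64, 64]
        h = self.conv(x).flatten(1)     # [S, 256*4*4]
        return self.proj(h)             # [S, feature_dim]

class TinyCNNDecoder(nn.Module):
    """Hypothetical 4-layer CNN decoder: feature vectors -> images."""
    def __init__(self, feature_dim):
        super().__init__()
        self.proj = nn.Linear(feature_dim, 256 * 4 * 4)
        chans = [256, 128, 64, 32, 3]
        layers = []
        for i, (c_in, c_out) in enumerate(zip(chans[:-1], chans[1:])):
            layers.append(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1))
            if i < 3:
                layers.append(nn.ReLU())
        self.deconv = nn.Sequential(*layers)

    def forward(self, z):               # z: [S, feature_dim]
        h = self.proj(z).view(-1, 256, 4, 4)
        return self.deconv(h)           # [S, 3, 64, 64]

enc, dec = TinyCNNEncoder(256), TinyCNNDecoder(256)
imgs = torch.randn(5, 3, 64, 64)
feats = enc(imgs)                       # [5, 256]
recon = dec(feats)                      # [5, 3, 64, 64]
```

Each stride-2 convolution halves the spatial size (64 → 32 → 16 → 8 → 4), and the transposed convolutions reverse that, which is why four layers pair naturally in an encoder/decoder of this shape.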
- class streamer.models.networks.TemporalEncoding(feature_dim, buffer_size, lr, n_heads=4, num_layers=2, encoder=None, patch=False)[source]#
The temporal encoding module receives as input a sequence of feature vectors [S, feature_dim] or a sequence of images [S, 3, H, W] and returns a single summary feature vector [1, feature_dim].
If the input is a sequence of images, it instantiates a user-defined encoder class to encode the images into a sequence of feature vectors.
- Parameters:
feature_dim (int) – the input feature dimension
buffer_size (int) – the maximum buffer size to create positional encoding
lr (float) – the learning rate of this module
n_heads (int) – the number of heads for the attention layer
num_layers (int) – the number of transformer encoder layers
encoder (torch.nn.Module) – the encoder class to be used. Default: None
patch (bool) – patch the transformer model to retain attention information. Default: False
- get_params()[source]#
Function to extract the parameters of this module
- Returns:
(List(torch.tensor)): List of parameters
- step_params()[source]#
Applies a gradient step on the parameters of this module. Called by StreamerOptimizer.
- zero_params()[source]#
Zeros out the gradients of the parameters of this module. Called by StreamerOptimizer.
- forward(x)[source]#
Forward propagation function that receives a sequence of inputs and returns a single feature vector summarizing the sequence.
- Parameters:
x (torch.Tensor) – a sequence of feature vectors [S, feature_dim] or a sequence of images [S, 3, H, W]
- Returns:
(torch.tensor): The output feature vector representation of the input sequence
(torch.tensor or None): The output attention values of the encoder. Returns None if the encoder is not defined or if CNNEncoder is used
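The internal computation is library-specific, but the interface above can be sketched with a standard transformer encoder. The learned positional table sized to buffer_size and the mean-pooled summary are assumptions for illustration:

```python
import torch
import torch.nn as nn

class TemporalSummarizer(nn.Module):
    """Hypothetical stand-in for TemporalEncoding: [S, D] -> [1, D]."""
    def __init__(self, feature_dim, buffer_size, n_heads=4, num_layers=2):
        super().__init__()
        # Learned positional encoding sized to the maximum buffer length
        self.pos = nn.Parameter(torch.zeros(buffer_size, feature_dim))
        layer = nn.TransformerEncoderLayer(d_model=feature_dim, nhead=n_heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x):                  # x: [S, feature_dim]
        s = x.size(0)
        h = x + self.pos[:s]               # positions for the filled slots only
        h = self.encoder(h.unsqueeze(1))   # [S, 1, feature_dim]
        return h.mean(dim=0)               # [1, feature_dim] summary

summ = TemporalSummarizer(feature_dim=64, buffer_size=16)
out = summ(torch.randn(5, 64))             # [1, 64]
```

Sizing the positional table to buffer_size (rather than to the current sequence) matches the documented role of the buffer_size parameter: it bounds how long a sequence the module can position-encode.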
- class streamer.models.networks.HierarchicalPrediction(feature_dim, max_layers, lr, layer_num, n_heads=4, num_layers=2, decoder=None, patch=False)[source]#
The hierarchical prediction module receives as input a sequence of feature vectors [S, feature_dim], where S is the number of layers, and returns a single prediction feature vector [1, feature_dim], or [1, 3, H, W] if a decoder class is provided.
- Parameters:
feature_dim (int) – the input feature dimension
max_layers (int) – the maximum number of layers
lr (float) – the learning rate of this module
layer_num (int) – the layer number where this class is instantiated
n_heads (int) – the number of heads for the attention layer
num_layers (int) – the number of transformer encoder layers
decoder (torch.nn.Module) – the decoder class to be used. Default: None
patch (bool) – patch the transformer model to retain attention information. Default: False
- get_params()[source]#
Function to extract the parameters of this module
- Returns:
(List(torch.tensor)): List of parameters
- step_params()[source]#
Applies a gradient step on the parameters of this module. Called by StreamerOptimizer.
- zero_params()[source]#
Zeros out the gradients of the parameters of this module. Called by StreamerOptimizer.
- forward(x)[source]#
Forward propagation function that receives a sequence of inputs and returns a single feature vector (or a single decoded image) predicting the next input.
- Parameters:
x (torch.Tensor) – a sequence of feature vectors [S, feature_dim]
- Returns:
(torch.tensor): The output prediction feature vector [1, feature_dim] or decoded image [1, 3, H, W]
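The prediction mechanism itself is internal, but the shape contract can be sketched the same way: a transformer encoder over the per-layer contexts, with the output read from the last position. The per-layer positional table (sized to max_layers) and the last-position readout are assumptions for illustration:

```python
import torch
import torch.nn as nn

class HierarchicalPredictor(nn.Module):
    """Hypothetical stand-in: per-layer contexts [S, D] -> prediction [1, D]."""
    def __init__(self, feature_dim, max_layers, n_heads=4, num_layers=2):
        super().__init__()
        # One learned position per layer in the hierarchy
        self.pos = nn.Parameter(torch.zeros(max_layers, feature_dim))
        layer = nn.TransformerEncoderLayer(d_model=feature_dim, nhead=n_heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x):                 # x: [S, feature_dim], S = no. of layers
        h = x + self.pos[: x.size(0)]
        h = self.encoder(h.unsqueeze(1)).squeeze(1)   # [S, feature_dim]
        return h[-1:]                     # [1, feature_dim] prediction

pred = HierarchicalPredictor(feature_dim=64, max_layers=3)
y = pred(torch.randn(2, 64))              # [1, 64]
```

Note the different meaning of S here: for TemporalEncoding it indexes time steps in a buffer, while for HierarchicalPrediction it indexes layers in the stack.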
- class streamer.models.layer.StreamerLayerArguments(max_layers: int, feature_dim: int, evolve_every: int, buffer_size: int, loss_threshold: float, lr: float, reps_fn: function, snippet_size: float, demarcation_mode: str = 'average', distance_mode: str = 'similarity', force_base_dist: bool = False, window_size: int = 50, modifier_type: str = 'multiply', modifier: float = 1.0, force_fixed_buffer: bool = False)[source]#
- max_layers: int#
The maximum number of layers to stack
- feature_dim: int#
Feature dimension of the model embeddings
- evolve_every: int#
Create/stack a new layer every ‘evolve_every’
- buffer_size: int#
Maximum input buffer size to be used
- loss_threshold: float#
Loss threshold value. Not used in average demarcation mode
- lr: float#
Learning rate to be used in all modules
- reps_fn: function#
Function to aggregate representations from all layers
- snippet_size: float#
Snippet size of input video (seconds/image). Typically 0.5 seconds per image
- demarcation_mode: str = 'average'#
Demarcation mode used to detect boundaries
- distance_mode: str = 'similarity'#
Distance mode for loss calculation
- force_base_dist: bool = False#
Force the lowest layer to use MSE instead of Cosine Similarity
- window_size: int = 50#
Window size for average demarcation mode
- modifier_type: str = 'multiply'#
Modifier type to apply to average demarcation mode [‘multiply’, ‘add’]
- modifier: float = 1.0#
Modifier to apply to average demarcation mode
- force_fixed_buffer: bool = False#
Force the buffer to be fixed (not replacing inputs) by triggering a boundary when buffer is full
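To make the demarcation fields concrete, here is a plain-Python sketch of how an 'average' demarcation rule with window_size, modifier_type, and modifier could plausibly behave: a boundary fires when the current loss exceeds the (modified) running average of recent losses. The exact rule used by the library may differ; this is an illustration of the parameters' roles only.

```python
from collections import deque

def make_average_demarcator(window_size=50, modifier_type='multiply', modifier=1.0):
    """Sketch of 'average' demarcation: boundary when the current loss
    exceeds the running-window average, scaled or shifted by `modifier`."""
    window = deque(maxlen=window_size)

    def is_boundary(loss):
        if not window:            # no history yet: never a boundary
            window.append(loss)
            return False
        avg = sum(window) / len(window)
        threshold = avg * modifier if modifier_type == 'multiply' else avg + modifier
        window.append(loss)
        return loss > threshold

    return is_boundary

demarcate = make_average_demarcator(window_size=4)
flags = [demarcate(l) for l in [1.0, 1.0, 1.0, 5.0, 1.0]]
# only the loss spike at 5.0 is flagged as a boundary
```

This also shows why loss_threshold is unused in average mode: the threshold is derived from the window statistics instead of being a fixed value.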
- class streamer.models.layer.StreamerLayer(args, layer_num, init_count, encoder=None, decoder=None, logger=None)[source]#
STREAMER layer implementation.
- This layer can:
Create/stack other StreamerLayer layers recursively for a maximum of max_layers
Call the other StreamerLayer layers by propagating the current representation
Calculate and store the loss for the StreamerOptimizer to use
- Parameters:
args (StreamerLayerArguments) – arguments provided to every streamer layer
layer_num (int) – the index of the current layer in the layers stack
init_count (int) – used to create more layers at initialization. Useful for an inference model using pretrained weights.
encoder (torch.nn.Module) – Encoder class to be used at this layer. Passed later to the TemporalEncoding module
decoder (torch.nn.Module) – Decoder class to be used at this layer. Passed later to the HierarchicalPrediction module
logger (Logger) – Logger to be used for tensorboard.
- reset_layer()[source]#
Reset function to be used at the beginning of a new video. Recursively applied to every layer.
- optimize_layer()[source]#
Optimization step function. Steps then zeros the gradients by calling the step_params() and zero_params() functions of every module. Recursively applied to every layer.
- get_num_layers(num)[source]#
Recursive function to get the total number of layers
- Parameters:
num (int) – current number of layers at the previous layer
- Returns:
(int): Previous number of layers + 1
- create_parent(create)[source]#
Function to create/stack another StreamerLayer layer
- Parameters:
create (bool) – only add another layer if create is True
- predict()[source]#
Prediction function that calls the TemporalEncoding and HierarchicalPrediction modules
- forward(x, base_counter)[source]#
Forward propagation function for a layer. Recursively calls the layer above at an event boundary determined by the EventDemarcation module.
- Parameters:
x (torch.Tensor) – the input feature vector [1, feature_dim] or image [1, 3, H, W]
base_counter (int) – the location of this input in the video, used for timescale calculation in the StreamerOptimizer
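The recursive stacking behavior described above can be sketched, in highly simplified form, as follows. Each layer buffers inputs and, at a boundary, sends a summary to a lazily created parent. Summarization is mocked by averaging and the boundary rule by a fixed buffer length; both stand in for TemporalEncoding and EventDemarcation, which are far richer in the real library.

```python
class SketchLayer:
    """Toy sketch of recursive layer stacking; not the real StreamerLayer."""
    def __init__(self, max_layers, layer_num=0, boundary_every=3):
        self.max_layers = max_layers
        self.layer_num = layer_num
        self.boundary_every = boundary_every
        self.buffer = []
        self.parent = None

    def forward(self, x):
        self.buffer.append(x)
        if len(self.buffer) >= self.boundary_every:       # mock boundary rule
            summary = sum(self.buffer) / len(self.buffer)  # mock summarization
            self.buffer.clear()
            # Lazily create the layer above, up to max_layers
            if self.parent is None and self.layer_num + 1 < self.max_layers:
                self.parent = SketchLayer(self.max_layers, self.layer_num + 1,
                                          self.boundary_every)
            if self.parent is not None:
                self.parent.forward(summary)               # propagate upward

    def get_num_layers(self, num=0):
        num += 1
        return self.parent.get_num_layers(num) if self.parent else num

root = SketchLayer(max_layers=3)
for t in range(10):
    root.forward(float(t))
depth = root.get_num_layers()   # layers created so far, capped at max_layers
```

Each boundary at one layer produces a single input for the layer above, so higher layers naturally operate at coarser timescales, which is the core idea behind the hierarchy.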