Layers#
- class streamer.models.networks.CNNEncoder(feature_dim)[source]#
A 4-layer CNN Encoder model used to encode an image into a feature vector
- Parameters:
feature_dim (int) – the output feature dimension
- class streamer.models.networks.CNNDecoder(feature_dim)[source]#
A 4-layer CNN Decoder model used to decode a feature vector back into an image
- Parameters:
feature_dim (int) – the input feature dimension
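The library does not document the internal architecture, but a minimal sketch of a 4-layer convolutional encoder/decoder pair with this interface might look like the following (the channel widths and the 3×64×64 input size are assumptions for illustration, not the library's actual values):

```python
import torch
import torch.nn as nn

class TinyCNNEncoder(nn.Module):
    """Hypothetical 4-layer CNN encoder: images -> feature vectors."""
    def __init__(self, feature_dim):
        super().__init__()
        chans = [3, 32, 64, 128, 256]  # assumed widths
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1), nn.ReLU()]
        self.conv = nn.Sequential(*layers)
        self.proj = nn.Linear(256 * 4 * 4, feature_dim)  # assumes 64x64 input

    def forward(self, x):               # x: [S, 3, 64, 64]
        h = self.conv(x).flatten(1)     # [S, 256*4*4]
        return self.proj(h)             # [S, feature_dim]

class TinyCNNDecoder(nn.Module):
    """Hypothetical 4-layer CNN decoder: feature vectors -> images."""
    def __init__(self, feature_dim):
        super().__init__()
        self.proj = nn.Linear(feature_dim, 256 * 4 * 4)
        chans = [256, 128, 64, 32, 3]
        layers = []
        for i, (c_in, c_out) in enumerate(zip(chans[:-1], chans[1:])):
            layers.append(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1))
            if i < 3:
                layers.append(nn.ReLU())
        self.deconv = nn.Sequential(*layers)

    def forward(self, z):               # z: [S, feature_dim]
        h = self.proj(z).view(-1, 256, 4, 4)
        return self.deconv(h)           # [S, 3, 64, 64]

enc, dec = TinyCNNEncoder(256), TinyCNNDecoder(256)
imgs = torch.randn(5, 3, 64, 64)
feats = enc(imgs)                       # [5, 256]
recon = dec(feats)                      # [5, 3, 64, 64]
```

Each stride-2 convolution halves the spatial size (64 → 32 → 16 → 8 → 4), and the transposed convolutions reverse that, which is why four layers pair naturally in an encoder/decoder of this shape.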
- class streamer.models.networks.TemporalEncoding(feature_dim, buffer_size, lr, n_heads=4, num_layers=2, encoder=None, patch=False)[source]#
The temporal encoding module receives as input a sequence of feature vectors [S, feature_dim] or a sequence of images [S, 3, H, W] and returns a single summary feature vector [1, feature_dim].
If the input is a sequence of images, it instantiates a user-defined encoder class to encode the images into a sequence of feature vectors.
- Parameters:
feature_dim (int) – the input feature dimension
buffer_size (int) – the maximum buffer size to create positional encoding
lr (float) – the learning rate of this module
n_heads (int) – the number of heads for the attention layer
num_layers (int) – the number of transformer encoder layers
encoder (torch.nn.Module) – the encoder class to be used. Default: None
patch (bool) – patch the transformer model to retain attention information. Default: False
- get_params()[source]#
Function to extract the parameters of this module
- Returns:
(List(torch.tensor)): List of parameters
- step_params()[source]#
Applies a gradient step on the parameters of this module. Called by StreamerOptimizer.
- zero_params()[source]#
Zeros out the gradients of the parameters of this module. Called by StreamerOptimizer.
- forward(x)[source]#
Forward propagation function that receives a sequence of inputs and returns a single feature vector summarizing the sequence.
- Parameters:
x (torch.Tensor) – a sequence of feature vectors [S, feature_dim] or a sequence of images [S, 3, H, W]
- Returns:
(torch.tensor): The output feature vector representation of the input sequence
(torch.tensor or None): The output attention values of the encoder. Returns None if the encoder is not defined or if CNNEncoder is used
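The internal computation is library-specific, but the interface above can be sketched with a standard transformer encoder. The learned positional table sized to buffer_size and the mean-pooled summary are assumptions for illustration:

```python
import torch
import torch.nn as nn

class TemporalSummarizer(nn.Module):
    """Hypothetical stand-in for TemporalEncoding: [S, D] -> [1, D]."""
    def __init__(self, feature_dim, buffer_size, n_heads=4, num_layers=2):
        super().__init__()
        # Learned positional encoding sized to the maximum buffer length
        self.pos = nn.Parameter(torch.zeros(buffer_size, feature_dim))
        layer = nn.TransformerEncoderLayer(d_model=feature_dim, nhead=n_heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x):                  # x: [S, feature_dim]
        s = x.size(0)
        h = x + self.pos[:s]               # positions for the filled slots only
        h = self.encoder(h.unsqueeze(1))   # [S, 1, feature_dim]
        return h.mean(dim=0)               # [1, feature_dim] summary

summ = TemporalSummarizer(feature_dim=64, buffer_size=16)
out = summ(torch.randn(5, 64))             # [1, 64]
```

Sizing the positional table to buffer_size (rather than to the current sequence) matches the documented role of the buffer_size parameter: it bounds how long a sequence the module can position-encode.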
- class streamer.models.networks.HierarchicalPrediction(feature_dim, max_layers, lr, layer_num, n_heads=4, num_layers=2, decoder=None, patch=False)[source]#
The hierarchical prediction module receives as input a sequence of feature vectors [S, feature_dim], where S is the number of layers, and returns a single prediction feature vector [1, feature_dim], or [1, 3, H, W] if a decoder class is provided.
- Parameters:
feature_dim (int) – the input feature dimension
max_layers (int) – the maximum number of layers
lr (float) – the learning rate of this module
layer_num (int) – the layer number where this class is instantiated
n_heads (int) – the number of heads for the attention layer
num_layers (int) – the number of transformer encoder layers
decoder (torch.nn.Module) – the decoder class to be used. Default: None
patch (bool) – patch the transformer model to retain attention information. Default: False
- get_params()[source]#
Function to extract the parameters of this module
- Returns:
(List(torch.tensor)): List of parameters
- step_params()[source]#
Applies a gradient step on the parameters of this module. Called by StreamerOptimizer.
- zero_params()[source]#
Zeros out the gradients of the parameters of this module. Called by StreamerOptimizer.
- forward(x)[source]#
Forward propagation function that receives a sequence of inputs and returns a single feature vector (or a single decoded image) predicting the next input.
- Parameters:
x (torch.Tensor) – a sequence of feature vectors [S, feature_dim]
- Returns:
(torch.tensor): The output prediction feature vector [1, feature_dim] or decoded image [1, 3, H, W]
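The prediction mechanism itself is internal, but the shape contract can be sketched the same way: a transformer encoder over the per-layer contexts, with the output read from the last position. The per-layer positional table (sized to max_layers) and the last-position readout are assumptions for illustration:

```python
import torch
import torch.nn as nn

class HierarchicalPredictor(nn.Module):
    """Hypothetical stand-in: per-layer contexts [S, D] -> prediction [1, D]."""
    def __init__(self, feature_dim, max_layers, n_heads=4, num_layers=2):
        super().__init__()
        # One learned position per layer in the hierarchy
        self.pos = nn.Parameter(torch.zeros(max_layers, feature_dim))
        layer = nn.TransformerEncoderLayer(d_model=feature_dim, nhead=n_heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x):                 # x: [S, feature_dim], S = no. of layers
        h = x + self.pos[: x.size(0)]
        h = self.encoder(h.unsqueeze(1)).squeeze(1)   # [S, feature_dim]
        return h[-1:]                     # [1, feature_dim] prediction

pred = HierarchicalPredictor(feature_dim=64, max_layers=3)
y = pred(torch.randn(2, 64))              # [1, 64]
```

Note the different meaning of S here: for TemporalEncoding it indexes time steps in a buffer, while for HierarchicalPrediction it indexes layers in the stack.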
- class streamer.models.layer.StreamerLayerArguments(max_layers: int, feature_dim: int, evolve_every: int, buffer_size: int, loss_threshold: float, lr: float, reps_fn: function, snippet_size: float, demarcation_mode: str = 'average', distance_mode: str = 'similarity', force_base_dist: bool = False, window_size: int = 50, modifier_type: str = 'multiply', modifier: float = 1.0, force_fixed_buffer: bool = False)[source]#
- max_layers: int#
The maximum number of layers to stack
- feature_dim: int#
Feature dimension of the model embeddings
- evolve_every: int#
Create/stack a new layer every ‘evolve_every’
- buffer_size: int#
Maximum input buffer size to be used
- loss_threshold: float#
Loss threshold value. Not used in average demarcation mode
- lr: float#
Learning rate to be used in all modules
- reps_fn: function#
Function to aggregate representations from all layers
- snippet_size: float#
Snippet size of input video (seconds/image). Typically 0.5 seconds per image
- demarcation_mode: str = 'average'#
Demarcation mode used to detect boundaries
- distance_mode: str = 'similarity'#
Distance mode for loss calculation
- force_base_dist: bool = False#
Force the lowest layer to use MSE instead of Cosine Similarity
- window_size: int = 50#
Window size for average demarcation mode
- modifier_type: str = 'multiply'#
Modifier type to apply to average demarcation mode [‘multiply’, ‘add’]
- modifier: float = 1.0#
Modifier to apply to average demarcation mode
- force_fixed_buffer: bool = False#
Force the buffer to be fixed (not replacing inputs) by triggering a boundary when buffer is full
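To make the demarcation fields concrete, here is a plain-Python sketch of how an 'average' demarcation rule with window_size, modifier_type, and modifier could plausibly behave: a boundary fires when the current loss exceeds the (modified) running average of recent losses. The exact rule used by the library may differ; this is an illustration of the parameters' roles only.

```python
from collections import deque

def make_average_demarcator(window_size=50, modifier_type='multiply', modifier=1.0):
    """Sketch of 'average' demarcation: boundary when the current loss
    exceeds the running-window average, scaled or shifted by `modifier`."""
    window = deque(maxlen=window_size)

    def is_boundary(loss):
        if not window:            # no history yet: never a boundary
            window.append(loss)
            return False
        avg = sum(window) / len(window)
        threshold = avg * modifier if modifier_type == 'multiply' else avg + modifier
        window.append(loss)
        return loss > threshold

    return is_boundary

demarcate = make_average_demarcator(window_size=4)
flags = [demarcate(l) for l in [1.0, 1.0, 1.0, 5.0, 1.0]]
# only the loss spike at 5.0 is flagged as a boundary
```

This also shows why loss_threshold is unused in average mode: the threshold is derived from the window statistics instead of being a fixed value.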
- class streamer.models.layer.StreamerLayer(args, layer_num, init_count, encoder=None, decoder=None, logger=None)[source]#
STREAMER layer implementation.
- This layer can:
Create/stack other StreamerLayer layers recursively for a maximum of max_layers
Call the other StreamerLayer layers by propagating the current representation
Calculate and store the loss for the StreamerOptimizer to use
- Parameters:
args (StreamerLayerArguments) – arguments provided to every streamer layer
layer_num (int) – the index of the current layer in the layers stack
init_count (int) – used to create more layers at initialization. Useful for an inference model using pretrained weights.
encoder (torch.nn.Module) – Encoder class to be used at this layer. Passed later to the TemporalEncoding module
decoder (torch.nn.Module) – Decoder class to be used at this layer. Passed later to the HierarchicalPrediction module
logger (Logger) – Logger to be used for tensorboard.
- reset_layer()[source]#
Reset function to be used at the beginning of a new video. Recursively applied to every layer.
- optimize_layer()[source]#
Optimization step function. Steps then zeros the gradients by calling the step_params() and zero_params() functions of every module. Recursively applied to every layer.
- get_num_layers(num)[source]#
Recursive function to get the total number of layers
- Parameters:
num (int) – current number of layers at the previous layer
- Returns:
(int): Previous number of layers + 1
- create_parent(create)[source]#
Function to create/stack another StreamerLayer layer
- Parameters:
create (bool) – only add another layer if create is True
- predict()[source]#
Prediction function that calls the TemporalEncoding and HierarchicalPrediction modules
- forward(x, base_counter)[source]#
Forward propagation function for a layer. Recursively calls the layer above at an event boundary determined by the EventDemarcation module.
- Parameters:
x (torch.Tensor) – the input feature vector [1, feature_dim] or image [1, 3, H, W]
base_counter (int) – the location of this input in the video, used for timescale calculation in the StreamerOptimizer
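The recursive stacking behavior described above can be sketched, in highly simplified form, as follows. Each layer buffers inputs and, at a boundary, sends a summary to a lazily created parent. Summarization is mocked by averaging and the boundary rule by a fixed buffer length; both stand in for TemporalEncoding and EventDemarcation, which are far richer in the real library.

```python
class SketchLayer:
    """Toy sketch of recursive layer stacking; not the real StreamerLayer."""
    def __init__(self, max_layers, layer_num=0, boundary_every=3):
        self.max_layers = max_layers
        self.layer_num = layer_num
        self.boundary_every = boundary_every
        self.buffer = []
        self.parent = None

    def forward(self, x):
        self.buffer.append(x)
        if len(self.buffer) >= self.boundary_every:       # mock boundary rule
            summary = sum(self.buffer) / len(self.buffer)  # mock summarization
            self.buffer.clear()
            # Lazily create the layer above, up to max_layers
            if self.parent is None and self.layer_num + 1 < self.max_layers:
                self.parent = SketchLayer(self.max_layers, self.layer_num + 1,
                                          self.boundary_every)
            if self.parent is not None:
                self.parent.forward(summary)               # propagate upward

    def get_num_layers(self, num=0):
        num += 1
        return self.parent.get_num_layers(num) if self.parent else num

root = SketchLayer(max_layers=3)
for t in range(10):
    root.forward(float(t))
depth = root.get_num_layers()   # layers created so far, capped at max_layers
```

Each boundary at one layer produces a single input for the layer above, so higher layers naturally operate at coarser timescales, which is the core idea behind the hierarchy.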