Layers#

class streamer.models.networks.CNNEncoder(feature_dim)[source]#

A 4-layer CNN Encoder model used to encode an image into a feature vector

Parameters:

feature_dim (int) – the output feature dimension

forward(x)[source]#

The forward propagation function that takes an input image and returns an output feature vector

Parameters:

x (torch.Tensor) – tensor of shape [1, 3, H, W]

Returns:

  • (torch.Tensor): feature vector of shape [1, feature_dim]

  • (None): returned for compatibility with other models that return attention
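
Example (a minimal usage sketch; the input resolution and feature_dim below are illustrative, not required values):

    import torch
    from streamer.models.networks import CNNEncoder

    encoder = CNNEncoder(feature_dim=1024)
    x = torch.randn(1, 3, 128, 128)   # [1, 3, H, W]
    feature, attn = encoder(x)        # feature: [1, 1024]; attn is None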

class streamer.models.networks.CNNDecoder(feature_dim)[source]#

A 4-layer CNN Decoder model used to decode a feature vector back into an image

Parameters:

feature_dim (int) – the input feature dimension

forward(x)[source]#

The forward propagation function that takes a feature vector and returns an image

Parameters:

x (torch.Tensor) – tensor of shape [1, feature_dim]

Returns:

(torch.Tensor): image tensor of shape [1, 3, H, W]
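
Example (a minimal encode/decode round trip; shapes and feature_dim are illustrative, and the reconstructed spatial size depends on the decoder architecture):

    import torch
    from streamer.models.networks import CNNEncoder, CNNDecoder

    encoder = CNNEncoder(feature_dim=1024)
    decoder = CNNDecoder(feature_dim=1024)

    x = torch.randn(1, 3, 128, 128)   # [1, 3, H, W]
    z, _ = encoder(x)                 # [1, 1024]
    x_hat = decoder(z)                # [1, 3, H, W]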

class streamer.models.networks.TemporalEncoding(feature_dim, buffer_size, lr, n_heads=4, num_layers=2, encoder=None, patch=False)[source]#

The temporal encoding module receives as input a sequence of feature vectors [S, feature_dim] or a sequence of images [S, 3, H, W] and returns a single summary feature vector [1, feature_dim].

If the input is a sequence of images, it instantiates a user-defined encoder class to encode the images into a sequence of feature vectors.

Parameters:
  • feature_dim (int) – the input feature dimension

  • buffer_size (int) – the maximum buffer size to create positional encoding

  • lr (float) – the learning rate of this module

  • n_heads (int) – the number of heads for the attention layer

  • num_layers (int) – the number of transformer encoder layers

  • encoder (torch.nn.Module) – the encoder class to be used. Default: None

  • patch (bool) – patch the transformer model to retain attention information. Default: False

get_params()[source]#

Function to extract the parameters of this module

Returns:

(List[torch.Tensor]): List of parameters

step_params()[source]#

Applies gradient step on the parameters of this module. Called by StreamerOptimizer.

zero_params()[source]#

Zeros out the gradients of the parameters of this module. Called by StreamerOptimizer.

forward(x)[source]#

Forward propagation function that receives a sequence of inputs and returns a single feature vector summarizing the sequence.

Parameters:

x (torch.Tensor) – an input sequence of feature vectors [S, feature_dim] or a sequence of images [S, 3, H, W]

Returns:

  • (torch.Tensor): The output feature vector representation of the input sequence

  • (torch.Tensor or None): The output attention values of the encoder. None is returned if no encoder is defined or if CNNEncoder is used
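
Example (a minimal sketch of summarizing a buffered sequence of feature vectors; all argument values are illustrative):

    import torch
    from streamer.models.networks import TemporalEncoding

    temporal = TemporalEncoding(feature_dim=1024, buffer_size=20, lr=1e-4)
    seq = torch.randn(5, 1024)      # [S, feature_dim], with S <= buffer_size
    summary, attn = temporal(seq)   # summary: [1, 1024]; attn is None here (no encoder, patch=False)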

class streamer.models.networks.HierarchicalPrediction(feature_dim, max_layers, lr, layer_num, n_heads=4, num_layers=2, decoder=None, patch=False)[source]#

The hierarchical prediction module receives as input a sequence of feature vectors [S, feature_dim], where S is the number of layers, and returns a single prediction feature vector [1, feature_dim], or an image [1, 3, H, W] if a decoder class is provided.

Parameters:
  • feature_dim (int) – the input feature dimension

  • max_layers (int) – the maximum number of layers

  • lr (float) – the learning rate of this module

  • layer_num (int) – the layer number where this class is instantiated

  • n_heads (int) – the number of heads for the attention layer

  • num_layers (int) – the number of transformer encoder layers

  • decoder (torch.nn.Module) – the decoder class to be used. Default: None

  • patch (bool) – patch the transformer model to retain attention information. Default: False

get_params()[source]#

Function to extract the parameters of this module

Returns:

(List[torch.Tensor]): List of parameters

step_params()[source]#

Applies gradient step on the parameters of this module. Called by StreamerOptimizer.

zero_params()[source]#

Zeros out the gradients of the parameters of this module. Called by StreamerOptimizer.

forward(x)[source]#

Forward propagation function that receives a sequence of inputs and returns a single feature vector (or a single decoded image) predicting the next input.

Parameters:

x (torch.Tensor) – an input sequence of feature vectors [S, feature_dim]

Returns:

(torch.Tensor): The output prediction feature vector [1, feature_dim] or decoded image [1, 3, H, W]
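
Example (a minimal sketch; the module is given one feature vector per layer and no decoder, so it returns a feature-space prediction; argument values are illustrative):

    import torch
    from streamer.models.networks import HierarchicalPrediction

    predictor = HierarchicalPrediction(feature_dim=1024, max_layers=3, lr=1e-4, layer_num=0)
    context = torch.randn(3, 1024)    # [S, feature_dim], one vector per layer
    prediction = predictor(context)   # [1, 1024] since no decoder is provided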

class streamer.models.layer.StreamerLayerArguments(max_layers: int, feature_dim: int, evolve_every: int, buffer_size: int, loss_threshold: float, lr: float, reps_fn: function, snippet_size: float, demarcation_mode: str = 'average', distance_mode: str = 'similarity', force_base_dist: bool = False, window_size: int = 50, modifier_type: str = 'multiply', modifier: float = 1.0, force_fixed_buffer: bool = False)[source]#
max_layers: int#

The maximum number of layers to stack

feature_dim: int#

Feature dimension of the model embeddings

evolve_every: int#

Create/stack a new layer every ‘evolve_every’ inputs

buffer_size: int#

Maximum input buffer size to be used

loss_threshold: float#

Loss threshold value. Not used in average demarcation mode

lr: float#

Learning rate to be used in all modules

reps_fn: function#

Function to aggregate representations from all layers

snippet_size: float#

Snippet size of input video (seconds/image). Typically 0.5 seconds per image

demarcation_mode: str = 'average'#

Demarcation mode used to detect boundaries

distance_mode: str = 'similarity'#

Distance mode for loss calculation

force_base_dist: bool = False#

Force the lowest layer to use MSE instead of Cosine Similarity

window_size: int = 50#

Window size for average demarcation mode

modifier_type: str = 'multiply'#

Modifier type to apply to average demarcation mode [‘multiply’, ‘add’]

modifier: float = 1.0#

Modifier to apply to average demarcation mode

force_fixed_buffer: bool = False#

Force the buffer to be fixed (not replacing inputs) by triggering a boundary when the buffer is full
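
Example (a minimal sketch of constructing the arguments; every value is illustrative, and the identity reps_fn is a hypothetical placeholder for a real aggregation function):

    from streamer.models.layer import StreamerLayerArguments

    args = StreamerLayerArguments(
        max_layers=3,
        feature_dim=1024,
        evolve_every=5000,          # stack a new layer every 5000 inputs
        buffer_size=20,
        loss_threshold=0.1,         # ignored in 'average' demarcation mode
        lr=1e-4,
        reps_fn=lambda reps: reps,  # hypothetical placeholder aggregation
        snippet_size=0.5,           # 0.5 seconds per image
    )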

class streamer.models.layer.StreamerLayer(args, layer_num, init_count, encoder=None, decoder=None, logger=None)[source]#

STREAMER layer implementation.

This layer can encode its buffered inputs into a single representation, predict the next input, detect event boundaries, and create/stack a parent layer.

Parameters:
  • args (StreamerLayerArguments) – arguments provided to every streamer layer

  • layer_num (int) – the index of the current layer in the layers stack

  • init_count (int) – used to create more layers at initialization. Useful for an inference model using pretrained weights.

  • encoder (torch.nn.Module) – Encoder class to be used at this layer. Passed later to the TemporalEncoding module

  • decoder (torch.nn.Module) – Decoder class to be used at this layer. Passed later to the HierarchicalPrediction module

  • logger (Logger) – Logger to be used for tensorboard.

reset_layer()[source]#

Reset function to be used at the beginning of a new video. Recursively applied to every layer.

optimize_layer()[source]#

Optimization step function. Steps and then zeros the gradients by calling the step_params() and zero_params() functions of every module. Recursively applied to every layer.

get_num_layers(num)[source]#

Recursive function to get the total number of layers

Parameters:

num (int) – current number of layers at the previous layer

Returns:

(int): the previous number of layers + 1

create_parent(create)[source]#

Function to create/stack another StreamerLayer

Parameters:

create (bool) – only add another layer if create is True

predict()[source]#

Prediction function that calls the TemporalEncoding and HierarchicalPrediction modules

forward(x, base_counter)[source]#

Forward propagation function for a layer. Recursively calls the layer above at event boundary determined by the EventDemarcation module.

Parameters:
  • x (torch.Tensor) – the input feature vector [1, feature_dim] or image [1, 3, H, W]

  • base_counter (int) – the location of this input in the video for timescale calculation in the StreamerOptimizer
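
Example (a minimal streaming-loop sketch over precomputed feature vectors; all argument values are illustrative, and init_count=1 is assumed to create a single layer):

    import torch
    from streamer.models.layer import StreamerLayer, StreamerLayerArguments

    args = StreamerLayerArguments(
        max_layers=3, feature_dim=1024, evolve_every=5000, buffer_size=20,
        loss_threshold=0.1, lr=1e-4, reps_fn=lambda reps: reps, snippet_size=0.5)

    layer = StreamerLayer(args, layer_num=0, init_count=1)
    layer.reset_layer()                   # call at the start of each new video

    for t in range(100):                  # one feature vector per video snippet
        x = torch.randn(1, 1024)          # [1, feature_dim]
        layer.forward(x, base_counter=t)  # may recurse into parent layers at event boundaries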