
SlowFast Model Tutorial

Basics

Requirements: PyTorch, TorchVision and PyTorchVideo (torch, torchvision, pytorchvideo).

Bringing in the SlowFast model

There are two ways to bring up a SlowFast model. The first is through torch.hub, here with randomly initialised weights:

model_name = "slowfast_r50" # or "slowfast_r101"
model = torch.hub.load("facebookresearch/pytorchvideo", model=model_name, pretrained=False)

To load the pretrained weights instead:

model_name = "slowfast_r50" # or "slowfast_r101"
model = torch.hub.load("facebookresearch/pytorchvideo", model=model_name, pretrained=True)
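
Before running inference it is usually worth putting the model in evaluation mode and moving it to the target device. A minimal sketch; the device string is an assumption:

device = "cuda:0" if torch.cuda.is_available() else "cpu"  # assumed device choice
model = model.eval().to(device)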

We can verify the model by inspecting the model variable. The following helper counts the trainable parameters:

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(count_parameters(model))

The second way is to build the model directly with pytorchvideo's create_slowfast factory:

from pytorchvideo.models import create_slowfast
model = create_slowfast(model_depth=50) # alternatives: 18, 101, 152

The created model should have the same number of parameters as before. To use pretrained weights with this model, we need to download the checkpoint from the PyTorchVideo model zoo and load it manually. We can use the following to load the weights:

pretrain = True
if pretrain:
    # checkpoint downloaded from the PyTorchVideo model zoo
    wt = torch.load("../Directory/SLOWFAST_8x8_R50.pyth")
    model.load_state_dict(wt["model_state"])
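
As an optional sanity check (just a sketch), we can confirm that the hand-built model matches the torch.hub version in parameter count, which is the claim made above:

# optional sanity check: the hand-built model should match the hub model in size
hub_model = torch.hub.load("facebookresearch/pytorchvideo", model="slowfast_r50", pretrained=True)
print(count_parameters(model) == count_parameters(hub_model))  # expected: True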

However, to load the model-zoo weights of the SlowFast R101 model, the architecture has to match the checkpoint, so create it as follows:

model = create_slowfast(model_depth=101, slowfast_fusion_conv_kernel_size=(5, 1, 1))
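
Loading the checkpoint then works the same way as for the R50 model; the path and file name below are just placeholders for whichever R101 checkpoint you downloaded from the model zoo:

# path/filename are placeholders for the downloaded R101 checkpoint
wt = torch.load("../Directory/SLOWFAST_R101_checkpoint.pyth")
model.load_state_dict(wt["model_state"])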

Running inference with the model

The PyTorchVideo SlowFast model takes a list (formatted in a specific way) as input. It is built on a class called MultiPathWayWithFuse, which expects a list of pathway tensors. To run inference we first need the PackPathway helper (defined next) and then a small transform pipeline.

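PackPathway is not shipped as a transform in pytorchvideo; it is the small helper module from the official PyTorchVideo SlowFast tutorial that splits a clip into the slow and fast pathways. A minimal sketch of it, with alpha = 4 as the usual slow/fast frame-rate ratio:

import torch

class PackPathway(torch.nn.Module):
    """Split a clip of shape (C, T, H, W) into [slow_pathway, fast_pathway]."""
    def __init__(self, alpha=4):
        super().__init__()
        self.alpha = alpha  # the slow pathway keeps every alpha-th frame

    def forward(self, frames):
        fast_pathway = frames
        # temporally subsample the frames for the slow pathway
        slow_pathway = torch.index_select(
            frames, 1,
            torch.linspace(0, frames.shape[1] - 1, frames.shape[1] // self.alpha).long(),
        )
        return [slow_pathway, fast_pathway]

With PackPathway available, the inference steps are:
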
#%% Defining functions
from torchvision.transforms import Compose
from pytorchvideo.transforms import UniformTemporalSubsample

num_frames = 32  # frames kept after temporal subsampling; 32 is the usual value for the 8x8 R50 setting
transform1 = Compose(
    [UniformTemporalSubsample(num_frames),
     PackPathway()]
)

#%% Input processing
ipt_vid_frm = torch.rand(3, 64, 224, 224)   # stand-in for 64 consecutive RGB frames, shape (3, 64, 224, 224)
f = transform1(ipt_vid_frm)                 # [slow_pathway, fast_pathway]
f = [_.to(device)[None, ...] for _ in f]    # add a batch dimension to each pathway
output = model(f)

Interesting fact: running the model changes the shape of the input f. Since f is a Python list, it is passed by reference, so when the model processes the list internally, the caller's list changes as well.
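
If you want to keep the original pathway tensors around, one simple workaround (a sketch, not an official API) is to pass a shallow copy of the list, so the model reassigns the copy's entries rather than those of f:

# pass a shallow copy so the caller's list f keeps its original entries
output = model(list(f))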

Intermediate

Batch Data processing

One issue I faced was batching the data for the model to train/infer. Here's my code to solve it.

device = 'cuda:0'  # or 'cpu'

#%% Defining requirements
def sf_data_ready(vdat):
    """Turn a batch of clips (BS, 3, T, H, W) into the two-pathway list the model expects."""

    transform1 = Compose(
        [
            UniformTemporalSubsample(num_frames),
            # Lambda(lambda x: x/255.0),
            # NormalizeVideo(mean, std),
            # ShortSideScale(size=side_size),
            # CenterCropVideo(crop_size),
            PackPathway()
        ]
    )

    # Transform each clip separately, then concatenate per pathway.
    s2 = [[], []]
    for i in range(vdat.shape[0]):
        f = transform1(vdat[i])                   # [slow_pathway, fast_pathway]
        f = [_.to(device)[None, ...] for _ in f]  # add a batch dimension
        s2[0].append(f[0])
        s2[1].append(f[1])

    for i in range(2):
        s2[i] = torch.cat(s2[i], dim=0)           # stack along the batch dimension

    return s2

#%% Implementation
BS = 4  # batch size
video_batch = torch.rand(BS, 3, 64, 224, 224)
inputs = sf_data_ready(video_batch)
outputs = model(inputs)
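
With the inputs batched like this, a minimal training step looks like standard PyTorch; the labels, class count and optimizer settings below are assumptions for illustration:

import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)  # assumed hyper-parameters
labels = torch.randint(0, 400, (BS,), device=device)                    # dummy Kinetics-400 labels

model.train()
optimizer.zero_grad()
outputs = model(sf_data_ready(video_batch))  # shape (BS, 400) for the default head
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()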

Modifying Model Structure

We can build a custom SlowFast model by passing different argument choices to create_slowfast, as listed in the docs. The list is long, but we will go over as much as we can. For example:

slowfast = create_slowfast(slowfast_channel_reduction_ratio=(v_r,))  # v_r <= 64

The idea is that we can go through these arguments and create a customized model. One drawback of this approach is that we may not have pretrained weights for our custom model!

Here are some arguments to look at: input_channels (one entry per pathway, so colour and grayscale could in principle be mixed), model_depth (18, 50, 101, 152), and head (whether we want the projection head after the final fusion). A small example follows.
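
As a concrete sketch, the following builds a smaller custom model for a 10-class problem; the argument names follow the create_slowfast signature in the docs, and the class count is an assumption:

custom_model = create_slowfast(
    model_depth=50,
    model_num_class=10,      # our own number of classes instead of Kinetics-400
    input_channels=(3, 3),   # RGB on both pathways
    dropout_rate=0.5,
)
print(count_parameters(custom_model))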

Advanced

An interesting fact: the SlowFast model works with varying spatial input sizes, e.g. (224, 224) or (256, 256). How is this possible when there is a linear layer towards the end? Shouldn't we run into a matrix-multiplication shape error?
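
As far as I can tell, the answer is that the head pools over the spatio-temporal dimensions and the projection only acts on the channel dimension, so the feature fed to the linear layer has a fixed size regardless of spatial resolution. A quick check, reusing sf_data_ready from above:

# the classifier output should have the same shape for both resolutions
for hw in (224, 256):
    clip = torch.rand(1, 3, 64, hw, hw)
    with torch.no_grad():
        out = model(sf_data_ready(clip))
    print(hw, out.shape)  # e.g. torch.Size([1, 400]) in both cases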

Modifying the Model Structure Heavily

We can also dig into the layers and modify them to suit our needs, as in the example below.

import torch.nn as nn
model.blocks[6] = nn.Sequential(
    *list(model.blocks[6].children())[:-2],  # keep the head minus its last two sub-modules
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(in_features=2304, out_features=1000, bias=True),
    nn.ReLU(),
)

We can also change only the final linear layer:

model.blocks[6].proj = nn.Linear(in_features=2304, out_features=1000)
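
A common use of this is transfer learning: freeze the pretrained backbone and train only the new projection layer. A sketch, where the class count is an assumption:

import torch.nn as nn

num_classes = 10                 # assumed target task size
for p in model.parameters():
    p.requires_grad = False      # freeze the pretrained backbone
model.blocks[6].proj = nn.Linear(in_features=2304, out_features=num_classes)  # new trainable head
print(count_parameters(model))   # now counts only the new head's parameters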

However, beware of a known bug related to this in the PyTorch 1.9 release.

Happy coding.