Optical Flow

Between consecutive video frames, we assume that an object moves but the intensity of each of its pixels remains the same (the brightness constancy assumption).

$I(x, y, t) = I(x + dx, y + dy, t + dt)$

Now using a Taylor expansion:

$I(x + dx, y + dy, t + dt) = I(x, y, t) + \frac{\partial I}{\partial x} dx + \frac{\partial I}{\partial y} dy + \frac{\partial I}{\partial t} dt + ...$

Combining the two equations and dropping the higher-order terms gives

$\frac{\partial I}{\partial x} dx + \frac{\partial I}{\partial y} dy + \frac{\partial I}{\partial t} dt = 0$

Dividing by $dt$, with $u = dx/dt$ and $v = dy/dt$:

$\frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v + \frac{\partial I}{\partial t} = 0$

This shows the relation between the image gradients along the x, y, and time axes. The unknowns are u and v. A single equation cannot determine two unknowns, so this requires methods like mean-shift color histogram tracking or the Lucas-Kanade method. It is an optimization problem.

A distinction to keep in mind when recovering motion:

Feature tracking: extract visual features and track them across frames.
Optical flow: recover the image motion at each pixel from spatio-temporal variations in image brightness (the brightness constancy, small motion, and spatial coherence assumptions must hold).

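To solve for the flow, the earlier equation is written for every pixel in a small window and stacked in matrix form. Each row of $A$ holds the spatial gradients $(\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y})$ at one pixel, $u$ here denotes the flow vector $(u, v)^T$, and $b$ holds the corresponding $-\frac{\partial I}{\partial t}$ values.

$Au = b$

The least-squares solution satisfies the normal equations:

$A^TAu = A^Tb$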
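As a rough illustration of the normal equations above, the sketch below solves the 2x2 system for one window in plain Python. The function name `lucas_kanade_window` and the synthetic gradient values are assumptions made for this example, not part of any library; real implementations work on image arrays and usually add pyramids and iteration.

```python
def lucas_kanade_window(Ix, Iy, It):
    """Solve A^T A u = A^T b for one window.

    Ix, Iy, It are flat lists of the spatial and temporal gradients at
    each pixel of the window. Row i of A is (Ix[i], Iy[i]), b[i] = -It[i].
    """
    # Entries of the symmetric 2x2 matrix A^T A
    sxx = sum(gx * gx for gx in Ix)
    sxy = sum(gx * gy for gx, gy in zip(Ix, Iy))
    syy = sum(gy * gy for gy in Iy)
    # Entries of the 2-vector A^T b
    bx = -sum(gx * gt for gx, gt in zip(Ix, It))
    by = -sum(gy * gt for gy, gt in zip(Iy, It))

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        # All gradients point the same way, so the flow is ambiguous
        raise ValueError("A^T A is singular for this window")

    # Closed-form inverse of the 2x2 matrix
    u = (syy * bx - sxy * by) / det
    v = (sxx * by - sxy * bx) / det
    return u, v


# Synthetic gradients that satisfy Ix*u + Iy*v + It = 0 for a known flow
Ix = [1.0, 0.0, 2.0, 1.0]
Iy = [0.0, 1.0, 1.0, -1.0]
true_u, true_v = 0.5, -0.25
It = [-(gx * true_u + gy * true_v) for gx, gy in zip(Ix, Iy)]

u, v = lucas_kanade_window(Ix, Iy, It)
```

With noise-free gradients the recovered `(u, v)` matches the flow used to build `It`; with real images the window must contain gradients in more than one direction for the system to be well conditioned.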

Deep learning has also been applied to optical flow, notably in FlowNet and its variations.

FlowNetS
- Simple implementation
- Encoder-decoder layers
EPE/APE
- Euclidean distance between the predicted and ground-truth flow vectors
FlowNetC
- Correlation-based
- Two similar structures
FlowNet 2.0
- 1st layer of FlowNetC
- FlowNetCS (combination of C and S)
- Warping
- Training dataset: synthetic data
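The EPE metric in the list above can be sketched in a few lines. This assumes the common convention of averaging the per-pixel Euclidean distance over all pixels; the function name `endpoint_error` and the sample vectors are made up for illustration.

```python
import math


def endpoint_error(pred, truth):
    """Mean Euclidean distance between predicted and ground-truth flow.

    pred and truth are equal-length lists of (u, v) flow vectors.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equal length")
    total = sum(
        math.hypot(pu - tu, pv - tv)
        for (pu, pv), (tu, tv) in zip(pred, truth)
    )
    return total / len(pred)


pred = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
truth = [(1.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
epe = endpoint_error(pred, truth)  # per-pixel errors are 0, 2 and 5
```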

Eulerian Video Magnification

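A computational technique for visualizing small changes in video. The idea is to approximate the signal with a function and then magnify the variation of that function. The name comes from fluid mechanics, where the Eulerian perspective (observing change at fixed locations) is contrasted with the Lagrangian perspective (following moving particles). The phase-based variant transforms each image into a complex steerable pyramid and exaggerates the phase variations to amplify small motions.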

Linear Video Magnification: first-order Taylor arguments.

1.1 1D Translation: The goal is to magnify the motion of a signal $I(x, t) = f(x - \delta(t))$, with $I(x, 0) = f(x)$, producing:

$\hat{I}(x, t) = f(x - (1+\alpha)\delta(t))$

$\hat{I}(x, t) \approx f(x) - (1+\alpha)\delta(t) \frac{\partial f(x)}{\partial x}$

The interesting part is the change over time:

$B(x, t) := I(x, t) - I(x, 0)$

Using a Taylor expansion:

$I(x, t) \approx f(x) - \delta(t) \frac{\partial f(x)}{\partial x}$

$B(x, t) \approx - \delta(t) \frac{\partial f(x)}{\partial x}$

Now the magnification: $\hat{I}(x, t) = I(x, t) + \alpha B(x, t)$

Substituting the two approximations gives $f(x) - (1+\alpha)\delta(t) \frac{\partial f(x)}{\partial x}$, so the motion is amplified by a factor of $(1 + \alpha)$.
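The 1D argument can be checked numerically. The sketch below uses $f(x) = \sin(x)$ with a small displacement; the signal, the values of `delta` and `alpha`, and the helper name `magnify_linear` are all assumptions chosen for illustration.

```python
import math


def f(x):
    return math.sin(x)


def magnify_linear(I_t, I_0, alpha):
    """Return I_hat = I + alpha * B, where B = I(x, t) - I(x, 0)."""
    return [it + alpha * (it - i0) for it, i0 in zip(I_t, I_0)]


xs = [i * 0.1 for i in range(64)]
delta = 0.01  # small true displacement at time t
alpha = 9.0   # magnified displacement is (1 + alpha) * delta = 0.1

I_0 = [f(x) for x in xs]
I_t = [f(x - delta) for x in xs]
I_hat = magnify_linear(I_t, I_0, alpha)

# Ideal result: the signal shifted by the magnified displacement
target = [f(x - (1 + alpha) * delta) for x in xs]
max_err = max(abs(a - b) for a, b in zip(I_hat, target))
unmagnified_err = max(abs(a - b) for a, b in zip(I_t, target))
```

Here `max_err` stays well below `unmagnified_err`, but it is not zero: the first-order approximation degrades as `(1 + alpha) * delta` grows.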

1.2 General case: Similar to the 1D case, using the general Taylor expansion with the amplification factor.

1.3 Limitations:

Overshoot or undershoot (motion that is too large causes artifacts)
Noise amplification

A better alternative exists.

Phase-Based Magnification: uses wavelets.

2.1 Simplified global case: Assume a functional form for the previous function $f$, a sum of complex sinusoids:

$f(x) = \sum_\omega A_\omega e^{i\phi_\omega} e^{i\omega x}$

A translation by $\delta(t)$ changes the phase of each frequency component by $\omega \delta(t)$, and this phase difference is multiplied by the amplification factor. For real images, the image is broken into local sinusoids using a complex steerable pyramid.
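A minimal sketch of the global case, assuming the signal is already given as a dictionary of complex coefficients per frequency (a stand-in for a real transform; the names `magnify_phase` and `evaluate` and all numeric values are hypothetical). It amplifies the per-frequency phase difference between a reference frame and the current frame.

```python
import cmath


def magnify_phase(c0, ct, alpha):
    """Amplify the phase change of each frequency component.

    c0 and ct map a frequency omega to its complex coefficient in the
    reference frame and in the current frame.
    """
    out = {}
    for omega, ref in c0.items():
        # Phase difference between the frames, equal to -omega * delta
        dphi = cmath.phase(ct[omega] * ref.conjugate())
        # Adding alpha * dphi turns the shift delta into (1 + alpha) * delta
        out[omega] = ct[omega] * cmath.exp(1j * alpha * dphi)
    return out


def evaluate(coeffs, x):
    """Evaluate f(x) = sum over omega of c_omega * exp(i * omega * x)."""
    return sum(c * cmath.exp(1j * omega * x) for omega, c in coeffs.items())


delta, alpha = 0.05, 3.0
c0 = {1.0: 1.0 * cmath.exp(0.3j), 2.0: 0.5 * cmath.exp(-1.1j)}
# Translating by delta multiplies each coefficient by exp(-i * omega * delta)
ct = {w: c * cmath.exp(-1j * w * delta) for w, c in c0.items()}

c_hat = magnify_phase(c0, ct, alpha)
target = {w: c * cmath.exp(-1j * w * (1 + alpha) * delta) for w, c in c0.items()}
err = max(abs(evaluate(c_hat, x) - evaluate(target, x)) for x in [0.0, 0.5, 1.0, 2.0])
```

For a pure global translation the result matches the shifted signal up to floating-point error, as long as the phase differences are small enough not to wrap around.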

2.2 Complex steerable pyramid: Uses the concept of wavelets and basis functions to localize in both frequency and space.

2.3 Phase shift and translation: Related to phase-based optical flow.

Model Compression

survey 1

Focuses on 4 key approaches:

Parameter pruning/quantization (drop redundant/uncritical information)
- Quantization and binarization
- Network pruning
- Structured matrix
Low-rank factorization (estimate informative params)
Transferred or compact convolutional filters (training from scratch)
Knowledge distillation (from scratch)
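To make the first category concrete, here is a small sketch of magnitude-based pruning and uniform quantization on a flat list of weights. Both function names and the example weights are assumptions for illustration; the surveyed methods operate on full networks and usually retrain after each step.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned


def quantize_uniform(weights, bits):
    """Snap each weight to one of 2**bits evenly spaced levels."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.45, -0.9]
pruned = prune_by_magnitude(weights, 0.5)  # half of the weights become 0.0
quantized = quantize_uniform(weights, 2)   # at most 4 distinct values remain
```

After quantization each weight can be stored as a 2-bit index into a 4-entry table instead of a full floating-point number.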

Also

Dynamic capacity network
Stochastic depth network

Table: Summary [source paper]

blog 1

Deep Compression
- Network pruning, quantization, Huffman encoding
Weight quantization methods
- Binarized Neural Network link
- Trained Ternary Quantization link
SqueezeNet link
MobileNet v1 v2
SepNet link
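The Huffman encoding stage listed under Deep Compression can be illustrated on a toy list of quantized weight indices. This is a generic Huffman code-length computation built on the standard-library `heapq`, not the pipeline from the blog post; the index distribution is invented for the example.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth})
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


# Quantized weight indices with a skewed distribution
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
lengths = huffman_code_lengths(indices)
huffman_bits = sum(lengths[s] for s in indices)
fixed_bits = 2 * len(indices)  # 2 bits per index without entropy coding
```

Frequent indices receive shorter codes, so the skewed distribution needs fewer total bits than the fixed 2-bit encoding.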

Importance Sampling

medium link