---
layout: post
title: "TrendingTopics"
categories: Hotopics
---

Trending Topics 2
Part 1 covered 10 topics; this post is a continuation covering some other interesting topics:
- Collaborative Filtering
- Metric Learning
- Federated Learning
- BERT
- Self-supervision
- Divergence algorithm
- Meta Learning
- Adversarial Training
Adversarial Training
How to defend against adversarial attacks:
- PGD (projected gradient descent), sketched below
- Optimization problem: PGD attempts to find the perturbation that maximises the loss of a model on a particular input while keeping the size of the perturbation smaller than a specified amount, referred to as epsilon.
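A minimal PyTorch sketch of an L-infinity PGD attack, assuming a classifier `model` and inputs scaled to [0, 1] (the epsilon, step size, and step count are illustrative assumptions, not from the post):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Find a perturbation within an L-inf ball of radius eps that
    maximises the model's loss on input x with true label y."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()          # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)     # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                    # stay a valid image
    return x_adv.detach()
```

Adversarial training then simply means training on `pgd_attack(...)` outputs instead of (or alongside) the clean inputs.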
Collaborative Filtering
Metric Learning
Federated Learning
Decentralized training! Google uses it on mobile phones.
- Start from a generalized (global) model
- Personalize it on-device [many other users personalize it too!]
- Track the ensemble of changes (from different users) and update the shared model (a FedAvg-style sketch follows this list)
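A hypothetical FedAvg-style sketch in PyTorch; the client data loaders, learning rate, and plain weight averaging are illustrative assumptions, and it assumes a model without integer buffers such as batch-norm counters:

```python
import copy
import torch
import torch.nn.functional as F

def federated_round(global_model, client_loaders, lr=0.01):
    """One round: each client personalizes a copy of the shared model on its
    own data, then the server averages the resulting weights."""
    client_states = []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)            # start from the generalized model
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for x, y in loader:                            # local, private data
            opt.zero_grad()
            F.cross_entropy(local(x), y).backward()
            opt.step()
        client_states.append(local.state_dict())
    # Ensemble of changes: average each parameter across clients.
    avg = {k: torch.stack([s[k] for s in client_states]).mean(dim=0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model
```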
Atrous Convolution and friends
The name comes from the French "à trous" ("with holes"); it is also known as dilated convolution. In 1-D it computes $y[i] = \sum_{k} x[i + r \cdot k]\, w[k]$, where the rate $r$ is a positive integer and $r = 1$ recovers regular convolution. This enlarges the field of view without adding parameters.
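In PyTorch this is just the `dilation` argument of `nn.Conv2d` (the shapes below are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)

conv_r1 = nn.Conv2d(3, 8, kernel_size=3, dilation=1, padding=1)  # r = 1: regular conv
conv_r2 = nn.Conv2d(3, 8, kernel_size=3, dilation=2, padding=2)  # r = 2: 3x3 taps span a 5x5 window

# Same number of weights and same output size, but conv_r2 sees a wider context.
print(conv_r1(x).shape, conv_r2(x).shape)  # both: torch.Size([1, 8, 64, 64])
```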
Atrous Spatial Pyramid Pooling (ASPP): parallel atrous convolutions at several rates, combined to capture multi-scale context
Fully connected Conditional Random Field (CRF): post-processing that refines segmentation boundaries
BERT
There is one great post on this that also provides the motivation; check it out.
The two most important points:
- Semi-supervised pre-training on large text corpora (Wikipedia, large language corpora) [MLM, NSP]
- Supervised task-specific fine-tuning (QA, ….)
BERT builds on:
- Semi-supervised sequence learning - the fine-tuning concept
- ELMo - contextual embeddings
- ULM-FiT - fine-tuning and transfer learning
- OpenAI Transformer
- Transformer (Vaswani et al.)
Model architecture:
- BERT base: 12 encoder layers, hidden size 768, 12 attention heads
- BERT large: 24 encoder layers, hidden size 1024, 16 attention heads
You need some background on word embeddings and contextual word embeddings.
Transformer: handles long-term dependencies better than LSTMs; an encoder-decoder architecture for machine translation. But how do we use it for single sentences?
OpenAI Transformer: uses only the decoder of the Transformer!! It predicts the next word and was trained on 7,000 books. But it is unidirectional.
BERT: bidirectional, uses the encoders!!
Pretraining:
- MLM (masked language modeling): masking rescues a word from seeing itself in the bidirectional setting; 15% of the words are masked in their approach (a toy sketch follows this list).
- Two-sentence task (NSP): predict whether the second sentence actually follows the first.
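A toy sketch of the masking procedure. BERT replaces a chosen token with [MASK] only 80% of the time, with a random token 10%, and leaves it unchanged 10%; the tokenizer-free setup here is an illustrative assumption:

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]  # stand-in vocabulary

def mask_tokens(tokens, mask_prob=0.15):
    """Select ~15% of tokens as prediction targets, BERT-style."""
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                       # the model must recover this token
            r = random.random()
            if r < 0.8:
                inputs[i] = "[MASK]"              # 80%: mask it
            elif r < 0.9:
                inputs[i] = random.choice(VOCAB)  # 10%: random replacement
            # else 10%: leave the token as-is
    return inputs, labels
```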
Downstream tasks:
- Sentence classification
  - Single sentence
  - Pair of sentences
- QA tasks
- Sentence tagging
BERT for feature extraction:
- Contextual word embedding
- Named entity recognition
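With the Hugging Face `transformers` library, extracting contextual embeddings takes a few lines (the model choice and the use of the last hidden layer are my assumptions for simplicity):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-d vector per sub-word token, conditioned on the whole sentence.
embeddings = outputs.last_hidden_state  # shape: (1, num_tokens, 768)
```

These per-token vectors can then feed a downstream model such as a named entity recognizer.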
Self Supervision
There is a great paper collection on this. Definitely check it out.
A self-supervised task is also called a pretext task; it is solved before the final/downstream task.
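For example, rotation prediction is one classic pretext task (this concrete choice is my illustration, not from the notes): rotate each image and train a classifier to predict the rotation, so the labels come for free:

```python
import torch

def rotation_pretext_batch(images):
    """Rotate each image by a random multiple of 90 degrees; the rotation
    index doubles as a free label for a 4-way classification pretext task."""
    ks = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(k), dims=(-2, -1))
                           for img, k in zip(images, ks)])
    return rotated, ks  # train with cross_entropy(model(rotated), ks)
```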
Divergence Algorithm
Meta-learning
Learning to learn: a generalized approach to learning (supervised, RL).
Where can we apply it? Given the problem definition, the solutions fall into three families: model-based, metric-based, and optimization-based.
Metric-based: Siamese Neural Network, Matching Network, Relation Network, Prototypical Network (a prototypical-network sketch follows this list)
Model-based: Memory-Augmented NN, Meta Networks
Optimization-based: Meta-Learner, MAML, first-order MAML, Reptile
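A minimal sketch of the prototypical-network idea, assuming the support and query examples are already embedded (the embedding network itself is omitted):

```python
import torch

def proto_classify(support_emb, support_labels, query_emb, n_classes):
    """Class prototype = mean embedding of the class's support examples;
    each query is assigned to its nearest prototype."""
    protos = torch.stack([support_emb[support_labels == c].mean(dim=0)
                          for c in range(n_classes)])   # (n_classes, dim)
    dists = torch.cdist(query_emb, protos)              # Euclidean distances
    return dists.argmin(dim=1)                          # predicted class per query
```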
Algorithms for few-shot learning (link):
- Model-Agnostic Meta-Learning (MAML) [gradient-based meta-learning], sketched below
- Matching Network
- Prototypical Network
- Relation Network
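A minimal second-order MAML sketch in PyTorch on a toy regression model; the per-task data here is a random placeholder, and real MAML averages the meta-update over a batch of tasks:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(1, 1)                    # toy base learner
meta_opt = torch.optim.SGD(model.parameters(), lr=1e-2)
inner_lr = 0.1

def forward_with(params, x):
    w, b = params
    return x @ w.t() + b

for step in range(100):                          # pretend each step samples one task
    x_s, y_s = torch.randn(5, 1), torch.randn(5, 1)   # support set (placeholder data)
    x_q, y_q = torch.randn(5, 1), torch.randn(5, 1)   # query set (placeholder data)

    params = list(model.parameters())
    # Inner loop: one adaptation step on the support set. create_graph=True keeps
    # the graph so the outer update can differentiate through this step.
    grads = torch.autograd.grad(
        F.mse_loss(forward_with(params, x_s), y_s), params, create_graph=True)
    fast = [p - inner_lr * g for p, g in zip(params, grads)]

    # Outer loop: evaluate the adapted parameters on the query set and update
    # the initialization so that one inner step generalizes well.
    meta_opt.zero_grad()
    F.mse_loss(forward_with(fast, x_q), y_q).backward()
    meta_opt.step()
```

Dropping `create_graph=True` gives the first-order MAML approximation; Reptile instead just moves the initialization a small step toward the adapted weights.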