---
layout: post
title: "TrendingTopics"
categories: Hotopics
---

Trending Topics 2
Part 1 covered 10 topics; this post is a continuation covering some other interesting topics:
- Collaborative Filtering
- Metric Learning
- Federated Learning
- BERT
- Self-supervision
- Divergence algorithm
- Meta Learning
- Adversarial Training
Adversarial Training
How to defend against adversarial attacks:
- PGD (projected gradient descent), sketched below
- Optimization problem: PGD attempts to find the perturbation that maximises the loss of a model on a particular input while keeping the size of the perturbation smaller than a specified amount, referred to as epsilon.
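A minimal PyTorch sketch of an L-infinity PGD attack, assuming a classifier `model` and inputs scaled to [0, 1] (the epsilon, step size, and step count are illustrative assumptions, not from the post):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Find a perturbation within an L-inf ball of radius eps that
    maximises the model's loss on input x with true label y."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()          # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)     # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                    # stay a valid image
    return x_adv.detach()
```

Adversarial training then simply means training on `pgd_attack(...)` outputs instead of (or alongside) the clean inputs.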
Collaborative Filtering
Metric Learning
Federated Learning
Decentralized training! Google uses it on mobile phones.
- Start from a generalized (global) model
- Personalize it on-device [many other users personalize it too!]
- Track the ensemble of changes (from different users) and update the shared model (a FedAvg-style sketch follows this list)
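A hypothetical FedAvg-style sketch in PyTorch; the client data loaders, learning rate, and plain weight averaging are illustrative assumptions, and it assumes a model without integer buffers such as batch-norm counters:

```python
import copy
import torch
import torch.nn.functional as F

def federated_round(global_model, client_loaders, lr=0.01):
    """One round: each client personalizes a copy of the shared model on its
    own data, then the server averages the resulting weights."""
    client_states = []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)            # start from the generalized model
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for x, y in loader:                            # local, private data
            opt.zero_grad()
            F.cross_entropy(local(x), y).backward()
            opt.step()
        client_states.append(local.state_dict())
    # Ensemble of changes: average each parameter across clients.
    avg = {k: torch.stack([s[k] for s in client_states]).mean(dim=0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model
```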
Atrous Convolution and friends
The name comes from the French "à trous" ("with holes"); it is also known as dilated convolution. In 1-D it computes $y[i] = \sum_{k} x[i + r \cdot k]\, w[k]$, where the rate $r$ is a positive integer and $r = 1$ recovers regular convolution. This enlarges the field of view without adding parameters.
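In PyTorch this is just the `dilation` argument of `nn.Conv2d` (the shapes below are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)

conv_r1 = nn.Conv2d(3, 8, kernel_size=3, dilation=1, padding=1)  # r = 1: regular conv
conv_r2 = nn.Conv2d(3, 8, kernel_size=3, dilation=2, padding=2)  # r = 2: 3x3 taps span a 5x5 window

# Same number of weights and same output size, but conv_r2 sees a wider context.
print(conv_r1(x).shape, conv_r2(x).shape)  # both: torch.Size([1, 8, 64, 64])
```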
Atrous Spatial Pyramid Pooling (ASPP): parallel atrous convolutions at several rates, combined to capture multi-scale context
Fully connected Conditional Random Field (CRF): post-processing that refines segmentation boundaries
BERT
There is one great post on this that also provides the motivation; check it out.
The two most important points:
- Semi-supervised pre-training on large text corpora (Wikipedia, large language corpora) [MLM, NSP]
- Supervised task-specific fine-tuning (QA, ….)
BERT builds on:
- Semi-supervised sequence learning - the fine-tuning concept
- ELMo - contextual embeddings
- ULM-FiT - fine-tuning and transfer learning
- OpenAI Transformer
- Transformer (Vaswani et al.)
Model architecture:
- BERT base: 12 encoder layers, hidden size 768, 12 attention heads
- BERT large: 24 encoder layers, hidden size 1024, 16 attention heads
You need some background on word embeddings and contextual word embeddings.
Transformer: handles long-term dependencies better than LSTMs; an encoder-decoder architecture for machine translation. But how do we use it for single sentences?
OpenAI Transformer: uses only the decoder of the Transformer!! It predicts the next word and was trained on 7,000 books. But it is unidirectional.
BERT: bidirectional, uses the encoders!!
Pretraining:
- MLM (masked language modeling): masking rescues a word from seeing itself in the bidirectional setting; 15% of the words are masked in their approach (a toy sketch follows this list).
- Two-sentence task (NSP): predict whether the second sentence actually follows the first.
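A toy sketch of the masking procedure. BERT replaces a chosen token with [MASK] only 80% of the time, with a random token 10%, and leaves it unchanged 10%; the tokenizer-free setup here is an illustrative assumption:

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]  # stand-in vocabulary

def mask_tokens(tokens, mask_prob=0.15):
    """Select ~15% of tokens as prediction targets, BERT-style."""
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                       # the model must recover this token
            r = random.random()
            if r < 0.8:
                inputs[i] = "[MASK]"              # 80%: mask it
            elif r < 0.9:
                inputs[i] = random.choice(VOCAB)  # 10%: random replacement
            # else 10%: leave the token as-is
    return inputs, labels
```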
Downstream tasks:
- Sentence classification
  - Single sentence
  - Pair of sentences
- QA tasks
- Sentence tagging
BERT for feature extraction:
- Contextual word embedding
- Named entity recognition
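With the Hugging Face `transformers` library, extracting contextual embeddings takes a few lines (the model choice and the use of the last hidden layer are my assumptions for simplicity):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-d vector per sub-word token, conditioned on the whole sentence.
embeddings = outputs.last_hidden_state  # shape: (1, num_tokens, 768)
```

These per-token vectors can then feed a downstream model such as a named entity recognizer.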
Self Supervision
There is a great paper collection on this. Definitely check it out.
A self-supervised task is also called a pretext task; it is solved before the final/downstream task.
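For example, rotation prediction is one classic pretext task (this concrete choice is my illustration, not from the notes): rotate each image and train a classifier to predict the rotation, so the labels come for free:

```python
import torch

def rotation_pretext_batch(images):
    """Rotate each image by a random multiple of 90 degrees; the rotation
    index doubles as a free label for a 4-way classification pretext task."""
    ks = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(k), dims=(-2, -1))
                           for img, k in zip(images, ks)])
    return rotated, ks  # train with cross_entropy(model(rotated), ks)
```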
Divergence Algorithm
Meta-learning
Learning to learn: a generalized approach to learning (supervised, RL).
Where can we apply it? Given the problem definition, the solutions fall into three families: model-based, metric-based, and optimization-based.
Metric-based: Siamese Neural Network, Matching Network, Relation Network, Prototypical Network (a prototypical-network sketch follows this list)
Model-based: Memory-Augmented NN, Meta Networks
Optimization-based: Meta-Learner, MAML, first-order MAML, Reptile
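A minimal sketch of the prototypical-network idea, assuming the support and query examples are already embedded (the embedding network itself is omitted):

```python
import torch

def proto_classify(support_emb, support_labels, query_emb, n_classes):
    """Class prototype = mean embedding of the class's support examples;
    each query is assigned to its nearest prototype."""
    protos = torch.stack([support_emb[support_labels == c].mean(dim=0)
                          for c in range(n_classes)])   # (n_classes, dim)
    dists = torch.cdist(query_emb, protos)              # Euclidean distances
    return dists.argmin(dim=1)                          # predicted class per query
```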
Algorithms for few-shot learning (link):
- Model-Agnostic Meta-Learning (MAML) [gradient-based meta-learning], sketched below
- Matching Network
- Prototypical Network
- Relation Network
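A minimal second-order MAML sketch in PyTorch on a toy regression model; the per-task data here is a random placeholder, and real MAML averages the meta-update over a batch of tasks:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(1, 1)                    # toy base learner
meta_opt = torch.optim.SGD(model.parameters(), lr=1e-2)
inner_lr = 0.1

def forward_with(params, x):
    w, b = params
    return x @ w.t() + b

for step in range(100):                          # pretend each step samples one task
    x_s, y_s = torch.randn(5, 1), torch.randn(5, 1)   # support set (placeholder data)
    x_q, y_q = torch.randn(5, 1), torch.randn(5, 1)   # query set (placeholder data)

    params = list(model.parameters())
    # Inner loop: one adaptation step on the support set. create_graph=True keeps
    # the graph so the outer update can differentiate through this step.
    grads = torch.autograd.grad(
        F.mse_loss(forward_with(params, x_s), y_s), params, create_graph=True)
    fast = [p - inner_lr * g for p, g in zip(params, grads)]

    # Outer loop: evaluate the adapted parameters on the query set and update
    # the initialization so that one inner step generalizes well.
    meta_opt.zero_grad()
    F.mse_loss(forward_with(fast, x_q), y_q).backward()
    meta_opt.step()
```

Dropping `create_graph=True` gives the first-order MAML approximation; Reptile instead just moves the initialization a small step toward the adapted weights.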