
Few-Shot Learning and ZSL

2023

  1. Trosten, Daniel J., et al. “Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
    • RG: FSL classifiers are prone to the hubness problem: a few points (hubs) occur frequently in the nearest-neighbour lists of many other points (a hubness-measurement sketch follows these notes).
    • If a support sample is a hub, many query samples will be assigned to it regardless of their true label, resulting in low accuracy.
    • Hubness negatively impacts distance-based classification when hubs from one class appear often among the nearest neighbors of points from another class, degrading the classifier’s performance.
    • TP: prove that hubness can be eliminated by distributing representations uniformly on the hypersphere.
    • propose two new approaches to embed representations on the hypersphere, which optimize a tradeoff between uniformity and local similarity preservation (LSP) on the hypersphere – reducing hubness while retaining class structure.
      • leverage a decomposition of the KL divergence between representation and embedding similarities
    • Related works: FSL, Hubness problem
      • ZN: zero-mean normalization reduces hubness; hyperspherical uniformity
    • TP: Uniform Hyperspherical Structure-preserving Embeddings (noHub) and noHub with Support labels (noHub-S)
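    • Hubness is commonly quantified by the skewness of the k-occurrence distribution N_k, where N_k(x) counts how often x appears in other points' k-nearest-neighbour lists. A minimal NumPy sketch (function names are mine, not the paper's):

      ```python
      import numpy as np
      from scipy.spatial.distance import cdist
      from scipy.stats import skew

      def hubness_skewness(X, k=10):
          """Skewness of the k-occurrence distribution N_k: a large
          positive skew means a few points (hubs) appear in many
          neighbour lists."""
          d = cdist(X, X)                        # pairwise Euclidean distances
          np.fill_diagonal(d, np.inf)            # exclude self-neighbours
          knn = np.argsort(d, axis=1)[:, :k]     # k nearest neighbours per point
          n_k = np.bincount(knn.ravel(), minlength=len(X))
          return skew(n_k)

      # Toy check: normalizing isotropic Gaussian vectors places them
      # uniformly on the hypersphere, which should reduce the skew.
      X = np.random.randn(1000, 64)
      print(hubness_skewness(X))
      print(hubness_skewness(X / np.linalg.norm(X, axis=1, keepdims=True)))
      ```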

2021

  1. Cui, W., & Guo, Y. (2021, July). Parameterless transductive feature re-representation for few-shot learning. In International Conference on Machine Learning (pp. 2212-2221). PMLR.
    • propose a parameterless (no extra trainable parameters) transductive feature re-representation framework!! Compatible with existing FSL methods, including meta-learning and fine-tuning based models.
    • RG: existing transductive FSL methods typically introduce extra trainable parameters or extra training stages.
    • Experiments: three benchmark datasets, applying the framework to both representative meta-learning baselines and state-of-the-art FSL methods
    • Attention among the query samples!!
    • Section 3.2: methodology description. An attention mechanism propagates information among the query samples (a minimal sketch follows these notes).
      • attention mechanism and cross matching.
        • Low when it accumulates similar examples via attention.
    • what’s preventing collapse?
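    • A minimal sketch of the general idea (not the paper's exact cross-matching formulation): each query embedding is replaced by an attention-weighted average of the query embeddings, so information propagates among queries with no trainable parameters. The temperature tau is illustrative:

      ```python
      import numpy as np

      def softmax(z, axis=-1):
          z = z - z.max(axis=axis, keepdims=True)
          e = np.exp(z)
          return e / e.sum(axis=axis, keepdims=True)

      def re_represent(Q, tau=0.1):
          """Parameterless transductive re-representation (illustrative).
          Q: (n_query, dim) array of query embeddings."""
          A = softmax(Q @ Q.T / tau, axis=1)   # query-to-query attention
          return A @ Q                         # re-represented queries
      ```

    • Repeatedly averaging embeddings like this pulls them together, which is presumably why one asks what prevents collapse.
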
  2. Chen, Da, Yuefeng Chen, Yuhong Li, Feng Mao, Yuan He, and Hui Xue. “Self-supervised learning for few-shot image classification.” In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1745-1749. IEEE, 2021.
    • TP: apply a much larger embedding network trained with self-supervised learning (SSL), combined with episodic-task-based meta-learning.
      • WTF: Pretraining and then Meta-learning!!
  3. Ramesh, Aditya, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. “Zero-shot text-to-image generation.” arXiv preprint arXiv:2102.12092 (2021).
  4. Dumoulin, Vincent, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, and Hugo Larochelle. “Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark.” arXiv preprint arXiv:2104.02638 (2021).

2020

  1. Du, Xianzhi, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song. “SpineNet: Learning scale-permuted backbone for recognition and localization.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11592-11601. 2020.

    • Meta learning, NAS

    • How does a scale-decreased model work as a backbone for object detection?

    • Proposes a scale-permuted network

  2. Wang, Yaqing, Quanming Yao, James T. Kwok, and Lionel M. Ni. “Generalizing from a few examples: A survey on few-shot learning.” ACM Computing Surveys (CSUR) 53, no. 3 (2020): 1-34.

    • Three approaches: (i) Data, (ii) Model, (iii) Algorithm; all of them use prior knowledge!
      • What is prior knowledge!!

        • Interesting section that distinguishes FSL from other learning problems (Section 2.2)
          • FSL admits various options as prior knowledge. For example, weakly supervised learning (semi-supervised learning, active learning) is a kind of FSL when the prior knowledge is unlabeled data; alternatively, the prior knowledge may be a learned model (i.e., transfer learning) [Figure 3]
          • Core issue: with only a few examples, empirical risk minimization becomes unreliable
        • (i) data prior knowledge
          • Data augmentation - Transforming samples from the labeled training data
          • Transforming data from the unlabeled data (semi-supervised)
          • Transforming data from similar datasets (e.g., via a GAN)
        • (ii) Model
          • Multitask learning [i. Parameter sharing ii. Parameter tying: pairwise differences between task parameters are penalized] (see the sketch after this list)
          • Embedding Learning [i. Task specific ii. Task invariant: Matching net, prototypical networks iii. hybrid embedding model]
          • External memory: Key-value memory [i. Refining representation ii. Refining parameters]
          • Generative modeling [i. Decomposable components ii. Groupwise Shared Prior iii. Parameters of inference networks]
        • (iii) Algorithm
          • Refining existing parameters [i. Regularization ii. Aggregation iii. Fine-tuning existing parameters]
          • Refining Meta-learned parameters
          • Learning the optimizer
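
    • A toy sketch of the parameter-tying idea from (ii) (all names illustrative): each task keeps its own parameters, and pairwise L2 differences are penalized so tasks stay close without hard parameter sharing.

      ```python
      import numpy as np

      def tied_multitask_objective(task_losses, task_params, lam=0.1):
          """Sum of per-task losses plus a pairwise parameter-tying
          penalty; lam controls how strongly tasks are pulled together."""
          penalty = sum(
              np.sum((task_params[i] - task_params[j]) ** 2)
              for i in range(len(task_params))
              for j in range(i + 1, len(task_params))
          )
          return sum(task_losses) + lam * penalty
      ```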

2019

  1. Bennequin, Etienne. “Meta-learning algorithms for few-shot computer vision.” arXiv preprint arXiv:1909.13579 (2019).

    • meta-learning algorithms, i.e. algorithms that learn to learn
      • TP: a review of meta-learning for N-way K-shot image classification
        • Support set: N classes, K examples per class (episode sampling is sketched after these notes)
        • Query set, Q
        • Base dataset: its classes are disjoint from the support-set classes
    • Solutions: i) memory-augmented networks ii) metric learning (some approaches combine it with meta-learning) iii) gradient-based meta-learners iv) data generation
    • Meta-learning definition: given a task, an algorithm is learning “if its performance at the task improves with experience”, while, given a family of tasks, an algorithm is learning to learn if “its performance at each task improves with experience and with the number of tasks”; such an algorithm is referred to as a meta-learning algorithm.
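    • A minimal sketch of N-way K-shot episode sampling from the base dataset (names are mine):

      ```python
      import random
      from collections import defaultdict

      def sample_episode(dataset, n_way=5, k_shot=1, q_queries=15):
          """Sample one N-way K-shot episode from (x, label) pairs.
          Labels are remapped to 0..n_way-1 within the episode."""
          by_class = defaultdict(list)
          for x, y in dataset:
              by_class[y].append(x)
          classes = random.sample(list(by_class), n_way)
          support, query = [], []
          for new_label, c in enumerate(classes):
              items = random.sample(by_class[c], k_shot + q_queries)
              support += [(x, new_label) for x in items[:k_shot]]
              query += [(x, new_label) for x in items[k_shot:]]
          return support, query
      ```
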
  2. Chen, Wei-Yu, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, and Jia-Bin Huang. “A closer look at few-shot classification.” arXiv preprint arXiv:1904.04232 (2019).

    • Consistent comparative analysis of few-shot learning methods! (Algorithm performance depends on the backbone network) [a deeper backbone reduces the performance differences between methods]; algorithm rankings vary as the backbone network changes

    • Modified baseline methods (mini-ImageNet and CUB datasets)!

    • New experimental setting for the cross-domain generalization of FSL

    • Some empirical results (comments on backbone size! with a smaller backbone, reducing intra-class variation matters more [as expected])

  3. Sun, Qianru, Yaoyao Liu, Tat-Seng Chua, and Bernt Schiele. “Meta-transfer learning for few-shot learning.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 403-412. 2019.

2018

  1. Ren, Mengye, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, and Richard S. Zemel. “Meta-learning for semi-supervised few-shot classification.” arXiv preprint arXiv:1803.00676 (2018).

    • Extension (??) of Prototypical networks
      • Application scenario: partially labeled data; some classes have no labeled examples.
    • Experiments with Omniglot, miniImageNet, and tieredImageNet (a newly proposed dataset!)
      • With and without distractor classes
      • Three novel extensions of the prototypical network! (a refinement sketch follows this list)
        • (i) Prototypical net with soft k-Means
        • (ii) Prototypical net with soft k-Means with a distractor cluster
        • (iii) Prototypical net with soft k-Means and Masking
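      • A sketch of extension (i), one soft k-means refinement step (shapes and names are mine): unlabeled embeddings are softly assigned to prototypes by distance, and each prototype is recomputed as the weighted mean of its labeled support embeddings plus the softly assigned unlabeled ones.

        ```python
        import numpy as np

        def soft_kmeans_refine(prototypes, support_emb, support_y, unlabeled_emb):
            """One refinement step: prototypes (n_c, d), support_emb (n_s, d),
            support_y (n_s,) in 0..n_c-1, unlabeled_emb (n_u, d)."""
            d = ((unlabeled_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
            logits = -(d - d.min(axis=1, keepdims=True))   # stabilised softmax
            z = np.exp(logits)
            z /= z.sum(axis=1, keepdims=True)              # soft assignments (n_u, n_c)
            new_protos = []
            for c in range(len(prototypes)):
                sup = support_emb[support_y == c]
                num = sup.sum(0) + (z[:, c:c + 1] * unlabeled_emb).sum(0)
                den = len(sup) + z[:, c].sum()
                new_protos.append(num / den)
            return np.stack(new_protos)
        ```
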
  2. Sung, Flood, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M. Hospedales. “Learning to compare: Relation network for few-shot learning.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1199-1208. 2018.

    • Relation network (RN): simple! flexible! general! end-to-end

    • Utilizes a meta-learning approach! Learns a distance to a small set of support examples (a metric-learning-based approach), reducing complexity at inference

    • Example-based classification (FSL) and an extension to ZSL

      • Experiments with 5 benchmark datasets [2 FSL: Omniglot, miniImageNet], [3 ZSL: AwA1, AwA2, and CUB]

      • Two-branch relation network for FSL: an embedding module and a relation module (a nonlinear comparator, a multi-layer NN) [Figure 1]; a sketch follows these notes
        • Both modules are meta-learned
      • Interesting related-work section: learning to fine-tune (MAML), RNN-memory-based methods, embedding and metric-learning approaches
        • Mostly related to prototypical networks and Siamese networks
    • Problem definition and solution: Section 3
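
    • A minimal relation-module sketch (flat vectors and layer sizes are my simplification; the paper's relation module is convolutional, operates on concatenated feature maps, and is trained with MSE against 0/1 match labels):

      ```python
      import torch
      import torch.nn as nn

      class RelationHead(nn.Module):
          """Learned nonlinear comparator: concatenated (query, support)
          embedding pairs -> relation scores in [0, 1]."""
          def __init__(self, emb_dim=64, hidden=8):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Linear(2 * emb_dim, hidden), nn.ReLU(),
                  nn.Linear(hidden, 1), nn.Sigmoid())

          def forward(self, support_emb, query_emb):
              ns, nq = support_emb.size(0), query_emb.size(0)
              pairs = torch.cat(                   # pair every query with every support
                  [query_emb.unsqueeze(1).expand(-1, ns, -1),
                   support_emb.unsqueeze(0).expand(nq, -1, -1)], dim=-1)
              return self.net(pairs).squeeze(-1)   # (nq, ns) relation scores
      ```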

2017 and Earlier

  1. Santoro, Adam, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. “Meta-learning with memory-augmented neural networks.” In International conference on machine learning, pp. 1842-1850. PMLR, 2016.

  2. Vinyals, Oriol, Charles Blundell, Timothy Lillicrap, and Daan Wierstra. “Matching networks for one shot learning.” In Advances in neural information processing systems, pp. 3630-3638. 2016.

    • Metric learning and augmented memory networks!!

    • Non-parametric setup for metric learning

    • Two novelties: modeling & training (an attention-readout sketch follows these notes)
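
    • A minimal sketch of the attention readout (without the paper's full-context embeddings; names are mine): a softmax over cosine similarities gives a weighted nearest-neighbour classifier.

      ```python
      import numpy as np

      def matching_net_predict(query, support, support_y, n_way):
          """query: (d,), support: (n_s, d), support_y: (n_s,) int labels."""
          q = query / np.linalg.norm(query)
          s = support / np.linalg.norm(support, axis=1, keepdims=True)
          a = np.exp(s @ q)
          a /= a.sum()                     # attention over support samples
          scores = np.zeros(n_way)
          for w, y in zip(a, support_y):
              scores[y] += w               # accumulate attention per class
          return scores.argmax()
      ```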

  3. Munkhdalai, Tsendsuren, and Hong Yu. “Meta networks.” In International Conference on Machine Learning, pp. 2554-2563. PMLR, 2017.

  4. Reed, Scott, Zeynep Akata, Honglak Lee, and Bernt Schiele. “Learning deep representations of fine-grained visual descriptions.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 49-58. 2016.

  5. Dinu, Georgiana, Angeliki Lazaridou, and Marco Baroni. “Improving zero-shot learning by mitigating the hubness problem.” arXiv preprint arXiv:1412.6568 (2014).

  6. Snell, Jake, Kevin Swersky, and Richard S. Zemel. “Prototypical networks for few-shot learning.” arXiv preprint arXiv:1703.05175 (2017).

    • Prototypical networks learn a metric space; classification by computing distances to prototype representations of each class. (non-parametric learning!)

    • This paper: New algorithms!

    • Simple design; sometimes (when??) better than meta-learning and complex architectures.

    • Extension to zero-shot??

      • Prior works: Matching networks and episode concepts
        • episodic training for meta-learning
        • The learning is dependent on the episode selection
      • This paper: tries not to overfit by using a simple inductive bias ???
        • Prototypical Networks: a single prototype for each class (classification by finding the nearest prototype) - for zero-shot, the class meta-data are embedded into the shared space
        • works for both few-shot and zero-shot learning
        • how does it differ from clustering?
      • Episodic training: select some instances randomly from the given example pool
        • Algorithm 1: interesting
    • This paper: uses a Bregman divergence (the squared Euclidean distance) to define the embedding metric space; a classification sketch follows these notes
      • One-shot scenario (a single support example per class): prototypical network == matching network
        • matching network: weighted nearest-neighbour classifier.
    • Experiments: (i) Omniglot few-shot classification (ii) miniImageNet few-shot classification (iii) CUB zero-shot classification.
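
    • A minimal classification sketch (names are mine): one prototype per class, softmax over negative squared Euclidean distances.

      ```python
      import numpy as np

      def proto_classify(query, support, support_y, n_way):
          """query: (d,), support: (n_s, d), support_y: (n_s,) int labels.
          Returns class probabilities from distances to class means."""
          protos = np.stack([support[support_y == c].mean(0)
                             for c in range(n_way)])
          logits = -((query[None, :] - protos) ** 2).sum(-1)
          p = np.exp(logits - logits.max())
          return p / p.sum()
      ```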

  7. Finn, Chelsea, Pieter Abbeel, and Sergey Levine. “Model-agnostic meta-learning for fast adaptation of deep networks.” In International Conference on Machine Learning, pp. 1126-1135. PMLR, 2017.

    • learning and adapting quickly (MAML)
      • The key idea: train the initial parameters to maximize performance on a new task after the parameters have been updated through a few gradient steps computed on a small amount of data from that new task.
      • Where does it work! Why didn’t everybody use it afterwards!
      • Contribution: trains a model’s parameters such that a small number of gradient updates will lead to fast learning on a new task.

        • Prior art: learn the update function!
          • MAML: more flexible (loss function and architectures)
    • MAML: problem setup (a toy first-order sketch follows these notes)
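    • A toy first-order sketch (FOMAML: the second-derivative term of the full MAML outer gradient is dropped; the linear task family and all hyperparameters are made up for illustration):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)

      def make_task():
          """Toy task family: 1-D linear regression y = a * x."""
          a = rng.uniform(-2.0, 2.0)
          def sample(n):
              x = rng.uniform(-1.0, 1.0, (n, 1))
              return x, a * x
          return sample

      def grad(w, x, y):
          return 2.0 * x.T @ (x @ w - y) / len(x)       # gradient of MSE

      w = np.zeros((1, 1))            # meta-initialisation
      alpha, beta = 0.1, 0.01         # inner / outer learning rates
      for step in range(2000):
          meta_grad = np.zeros_like(w)
          for _ in range(4):                            # meta-batch of tasks
              sample = make_task()
              xs, ys = sample(5)                        # small support set
              w_adapted = w - alpha * grad(w, xs, ys)   # one inner gradient step
              xq, yq = sample(10)                       # query set for the outer loss
              meta_grad += grad(w_adapted, xq, yq)      # first-order approximation
          w -= beta * meta_grad / 4
      ```
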
  8. Weston, Jason, Sumit Chopra, and Antoine Bordes. “Memory networks.” arXiv preprint arXiv:1410.3916 (2014).