On-Policy Distillation
First published: 2025-12-15
Large models today are typically post-trained with RLHF, which makes them powerful but expensive to train and deploy. Smaller models are usually fine-tuned with SFT or KD methods; they are easier to deploy and adapt but often fall short of larger models' performance.
944 words | 5 min read
Self-Distillation
First published: 2025-12-27
This paper proposes DINO, a self-distillation framework with no labels, for pretraining ViTs. Beyond working well on this kind of architecture, the method yields learned features with two interesting emergent properties:
1005 words | 5 min read
Foundational Models Paper Reading Collection 1
First published: 2026-03-20 | Last updated: 2026-04-10
14538 words | 73 min read