Posts

Page 3 - Showing 8 of 75 posts

Common Computer Commands

First published: 2026-02-01

Dev Journal

commands

A record of some commonly used computer commands.

2537 words

13 minutes

t-SNE

First published: 2025-12-27

Course Notes

Algorithms

Machine Learning

The core objective of t-SNE (t-distributed Stochastic Neighbor Embedding) is to reduce the dimensionality of high-dimensional data while preserving local neighborhood structure.

828 words

4 minutes

Self-Distillation

First published: 2025-12-27

Explorations & Insights

Pretraining Methods

This paper proposes DINO, a self-distillation framework with no labels, to pretrain ViTs. Besides the fact that the DINO method works quite well on this kind of architecture, there are also two interesting properties emerging from the learned features:

1005 words

5 minutes

On-Policy Distillation

First published: 2025-12-15

Explorations & Insights

Large Models

Theory

Currently, large models are post‑trained via RLHF, making them powerful but expensive to train and deploy, while smaller models are usually fine‑tuned with SFT or KD methods and are easier to deploy and adapt but often lack the performance of larger models.

944 words

5 minutes

Fourier and Wavelets for Deep Learning

First published: 2025-12-08

Explorations & Insights

Theory

Let f\in L^2(\mathbb{R}). The Fourier transform (in the L^2 sense) represents a signal as a superposition of global sinusoidal basis functions:

3373 words

17 minutes

Flow Models Appendix

First published: 2025-10-14