Alignment
DPO, reward modeling, preference data, policy shaping.
Reasoning
Test-time compute, verifier design, chain-of-thought reliability.
Retrieval
RAG pipelines, document routing, context selection and ranking.
Efficiency
Long context, KV cache, batching strategies and serving tradeoffs.
Posts
This page collects recent notes on large-model algorithms, paper-reading summaries, and research thoughts organized around concrete problems.
Understanding Preference Optimization Beyond DPO
A working note on how DPO, IPO, and related objective variants differ in gradient behavior,
implicit reward modeling, and data sensitivity. The focus is on what actually changes during
training, rather than only comparing final benchmark numbers.
Date: Apr 2026 | Topic: Alignment | Status: Reading Note
Reasoning, Verifiers, and the Role of Test-Time Compute
This post collects ideas on why extra inference-time computation sometimes helps reasoning,
when self-consistency breaks down, and how verifier-guided decoding can improve answer quality
without retraining the whole model from scratch.
Date: Mar 2026 | Topic: Reasoning | Status: Draft
Designing Better Synthetic Data Pipelines for Instruction Tuning
A practical overview of synthetic data generation for SFT: prompt construction, filtering,
diversity control, and the tradeoff between scale and response quality when building
instruction datasets for domain adaptation.
Date: Feb 2026 | Topic: Data Generation | Status: Working Draft
Efficient Long-Context Inference: KV Cache, Chunking, and Retrieval Boundaries
Notes on serving long-context LLMs efficiently, including KV cache reuse, chunk scheduling,
retrieval granularity, and where the practical bottleneck shifts from model quality to system
design.
Date: Jan 2026 | Topic: Efficient Inference | Status: Research Memo
Archive
Content is archived by topic to make ongoing curation easier and to keep this site reading like an actively maintained research notebook.
Training Objectives
Preference optimization, supervised fine-tuning, reward modeling, and objective design.
12 notes planned
Inference and Systems
Serving efficiency, long-context behavior, memory tradeoffs, batching, and latency analysis.
8 notes planned
Reasoning and Evaluation
Verifier models, self-consistency, evaluation protocol design, and failure case analysis.
10 notes planned
RAG and Knowledge Use
Retrieval pipelines, reranking, context compression, and grounding strategies for LLMs.
7 notes planned