Welcome to optimx

This site collects my study and research notes on large language model algorithms, with a focus on training objectives, reasoning ability, preference optimization, retrieval augmentation, and efficient inference.

LLM algorithms · research notes · reading logs · experiment ideas
Alignment: DPO, reward modeling, preference data, policy shaping.
Reasoning: test-time compute, verifier design, chain-of-thought reliability.
Retrieval: RAG pipelines, document routing, context selection and ranking.
Efficiency: long context, KV cache, batching strategies and serving tradeoffs.

Posts

Recent notes on LLM algorithms, paper-reading summaries, and research thinking organized around concrete problems.

Understanding Preference Optimization Beyond DPO

A working note on how DPO, IPO, and related objective variants differ in gradient behavior, implicit reward modeling, and data sensitivity. I focus on what actually changes during training instead of only comparing final benchmark numbers.
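To make the gradient-behavior comparison concrete, here is a minimal sketch of the DPO objective for a single preference pair. The function name and the beta default are illustrative, not from any particular library; inputs are sequence log-probabilities under the policy and a frozen reference model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss from sequence log-probs under the policy
    and a frozen reference model (names are illustrative)."""
    # Implicit reward = beta * log-prob ratio against the reference.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # Logistic loss on the reward margin; its gradient weight is
    # sigmoid(-margin), so confidently ordered pairs contribute less,
    # which is one place objective variants like IPO differ.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When both responses sit exactly at the reference model's log-probs, the margin is zero and the loss is log 2; raising the chosen response's log-prob relative to the reference shrinks it.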

Reasoning, Verifiers, and the Role of Test-Time Compute

This post collects ideas on why extra inference-time computation sometimes helps reasoning, when self-consistency breaks down, and how verifier-guided decoding can improve answer quality without retraining the whole model from scratch.
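The simplest form of test-time compute discussed here is self-consistency: sample several chains of thought, keep only the final answers, and majority-vote. A toy sketch (the function name and agreement-fraction return are my own framing):

```python
from collections import Counter

def self_consistency(answers):
    """Majority vote over final answers extracted from multiple
    sampled chains of thought. Returns the winning answer and the
    agreement fraction, a cheap confidence signal."""
    counts = Counter(answers)
    best, n = counts.most_common(1)[0]
    return best, n / len(answers)
```

A low agreement fraction is exactly the regime where self-consistency tends to break down and a trained verifier for reranking candidates becomes worth its cost.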

Designing Better Synthetic Data Pipelines for Instruction Tuning

A practical overview of synthetic data generation for SFT: prompt construction, filtering, diversity control, and the tradeoff between scale and response quality when building instruction datasets for domain adaptation.
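The filtering stage can be as simple as a length gate plus exact deduplication on instructions. A deliberately minimal sketch, with illustrative thresholds (real pipelines add semantic dedup and quality scoring on top):

```python
def filter_synthetic(examples, min_len=20, max_len=2000):
    """Toy filter for (instruction, response) pairs: drop responses
    outside a length band, then exact-dedup on normalized
    instructions. Thresholds are illustrative."""
    seen = set()
    kept = []
    for instruction, response in examples:
        if not (min_len <= len(response) <= max_len):
            continue  # crude quality gate on response length
        key = instruction.strip().lower()
        if key in seen:
            continue  # exact duplicate instruction; hurts diversity
        seen.add(key)
        kept.append((instruction, response))
    return kept
```

Even this crude gate illustrates the scale/quality tradeoff: every filter raises average quality but shrinks the dataset, so thresholds have to be tuned per domain.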

Efficient Long-Context Inference: KV Cache, Chunking, and Retrieval Boundaries

Notes on serving long-context LLMs efficiently, including KV cache reuse, chunk scheduling, retrieval granularity, and where the practical bottleneck shifts from model quality to system design.
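One core primitive behind KV cache reuse is prefix matching: a new request can skip prefill for every leading token it shares with an already-cached sequence. A minimal sketch (function name is mine; real servers do this over a radix tree of cached blocks rather than a linear scan):

```python
def shared_prefix_len(cached_tokens, new_tokens):
    """Count leading tokens whose KV entries can be reused when a
    new request shares a prompt prefix with a cached sequence."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break  # divergence point: prefill resumes here
        n += 1
    return n
```

Chunking and retrieval granularity interact with this directly: stable system prompts and coarse, reusable context chunks keep shared prefixes long, while per-request reordering of retrieved passages destroys them.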

Archive

Content is archived by topic so it stays easy to maintain over time, and so the site reads like a research-notes hub that is genuinely being updated.

Training Objectives

Preference optimization, supervised fine-tuning, reward modeling, and objective design.

12 notes planned

Inference and Systems

Serving efficiency, long-context behavior, memory tradeoffs, batching, and latency analysis.

8 notes planned

Reasoning and Evaluation

Verifier models, self-consistency, evaluation protocol design, and failure case analysis.

10 notes planned

RAG and Knowledge Use

Retrieval pipelines, reranking, context compression, and grounding strategies for LLMs.

7 notes planned

Search

This section can later be connected to a real site search. For now, it highlights the main themes that appear across the notes and articles on this homepage.

LLM Alignment · Preference Optimization · Reasoning · Verifier · Synthetic Data · RAG · Long Context · Inference Efficiency