Alignment
DPO, reward modeling, preference data, policy shaping.
Reasoning
Test-time compute, verifier design, chain-of-thought reliability.
Retrieval
RAG pipelines, document routing, context selection and ranking.
Efficiency
Long context, KV cache, batching strategies and serving tradeoffs.
Posts
This page collects recent notes on large-model algorithms, paper-reading summaries, and research thoughts organized around concrete problems.
Understanding Preference Optimization Beyond DPO
A working note on how DPO, IPO, and related objective variants differ in gradient behavior,
implicit reward modeling, and data sensitivity. The focus is on what actually changes during
training, rather than only comparing final benchmark numbers.
Date: Apr 2026 | Topic: Alignment | Status: Reading Note
Reasoning, Verifiers, and the Role of Test-Time Compute
This post collects ideas on why extra inference-time computation sometimes helps reasoning,
when self-consistency breaks down, and how verifier-guided decoding can improve answer quality
without retraining the whole model from scratch.
Date: Mar 2026 | Topic: Reasoning | Status: Draft
Designing Better Synthetic Data Pipelines for Instruction Tuning
A practical overview of synthetic data generation for SFT: prompt construction, filtering,
diversity control, and the tradeoff between scale and response quality when building
instruction datasets for domain adaptation.
Date: Feb 2026 | Topic: Data Generation | Status: Working Draft
Efficient Long-Context Inference: KV Cache, Chunking, and Retrieval Boundaries
Notes on serving long-context LLMs efficiently, including KV cache reuse, chunk scheduling,
retrieval granularity, and where the practical bottleneck shifts from model quality to system
design.
Date: Jan 2026 | Topic: Efficient Inference | Status: Research Memo
Archive
Content is archived by topic to make ongoing curation easier and to keep this site reading like an actively maintained research notebook.
Training Objectives
Preference optimization, supervised fine-tuning, reward modeling, and objective design.
12 notes planned
Inference and Systems
Serving efficiency, long-context behavior, memory tradeoffs, batching, and latency analysis.
8 notes planned
Reasoning and Evaluation
Verifier models, self-consistency, evaluation protocol design, and failure case analysis.
10 notes planned
RAG and Knowledge Use
Retrieval pipelines, reranking, context compression, and grounding strategies for LLMs.
7 notes planned