Tags - Zifeng Mai's Blog

Reinforcement Learning

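Variational Sequence-Level Soft Policy Optimization

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Entropy Dynamics Analysis of RFT

On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Reinforcement Learning by Fitting the Entire Reward Distribution

FlowRL: Matching Reward Distributions for LLM Reasoning

Reinforcement Learning Fundamentals (3)

Solving with Dynamic Programming

Reinforcement Learning Fundamentals (2)

Finite Markov Processes

Entropy in Reinforcement Learning (2)

Entropy-Safe Policies

Entropy in Reinforcement Learning (1)

Policy Entropy

Reinforcement Learning Fundamentals (1)

Multi-Armed Bandits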

Multi-Armed Bandit

Reinforcement Learning Fundamentals (1)

Multi-Armed Bandits

Recommender System

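Predictive Decoding Acceleration in Generative Recommender Systems

Knowledge-Action Alignment in Personalized Search

KARMA: Knowledge-Action Regularized Multimodal Alignment for Personalized Search at Taobao

Supervising Generative Recommendation Models with PRMs

PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations

Multi-Objective Optimization Alignment

HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation

Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage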

OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender

Sequence Modeling

Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage

OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender

Feature Interaction

Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage

OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender

ByteDance

Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage

OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender

Policy Entropy

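Variational Sequence-Level Soft Policy Optimization

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Entropy Dynamics Analysis of RFT

On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Entropy in Reinforcement Learning (2)

Entropy-Safe Policies

Entropy in Reinforcement Learning (1)

Policy Entropy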

Renyi Entropy

Entropy in Reinforcement Learning (1)

Policy Entropy

Policy Gradient

Entropy in Reinforcement Learning (1)

Policy Entropy

Policy Safety

Entropy in Reinforcement Learning (2)

Entropy-Safe Policies

Finite Markov Decision Process

Reinforcement Learning Fundamentals (3)

Solving with Dynamic Programming

Reinforcement Learning Fundamentals (2)

Finite Markov Processes

Bellman Equation

Reinforcement Learning Fundamentals (2)

Finite Markov Processes

Dynamic Programming

Reinforcement Learning Fundamentals (3)

Solving with Dynamic Programming

Policy Iteration

Reinforcement Learning Fundamentals (3)

Solving with Dynamic Programming

Value Iteration

Reinforcement Learning Fundamentals (3)

Solving with Dynamic Programming

Banach Fixed Point Theorem

Reinforcement Learning Fundamentals (3)

Solving with Dynamic Programming

Distribution Matching

Reinforcement Learning by Fitting the Entire Reward Distribution

FlowRL: Matching Reward Distributions for LLM Reasoning

Generative Modeling

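Generative Models (3.3)

Flow Matching

Generative Models (3.2)

Flow Model

Generative Models (3.1)

Flow-based Method

Generative Models (2.1)

Energy-based Model

Generative Models (1.3)

Denoising Diffusion Probabilistic Model

Generative Models (1.2)

Variational Auto-Encoder

Generative Models (1.1)

Variational Inference

Generative Models (0)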

Overview of Deep Generative Modeling

VAE

Generative Models (0)

Overview of Deep Generative Modeling

DDPM

Generative Models (0)

Overview of Deep Generative Modeling

Flow Matching

Generative Models (3.3)

Flow Matching

Generative Models (0)

Overview of Deep Generative Modeling

GAN

Generative Models (0)

Overview of Deep Generative Modeling

Variational Inference

Generative Models (1.1)

Variational Inference

Variational Auto-Encoder

Generative Models (1.2)

Variational Auto-Encoder

Hierarchical VAE

Generative Models (1.2)

Variational Auto-Encoder

Denoising Diffusion Probabilistic Model

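Stochastic Processes (8)

Gaussian Processes (4): Applications of Gaussian Processes

Stochastic Processes (6)

Gaussian Processes (2): Multivariate Gaussian Distribution

Generative Models (1.3)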

Denoising Diffusion Probabilistic Model

Energy-based Model

Generative Models (2.1)

Energy-based Model

Flow-based Method

Generative Models (3.1)

Flow-based Method

Velocity Field

Generative Models (3.1)

Flow-based Method

Optimal Transport

Generative Models (3.1)

Flow-based Method

Flow Model

Generative Models (3.2)

Flow Model

Continuity Equation

Generative Models (3.2)

Flow Model

Instantaneous Change of Variables Formula

Generative Models (3.2)

Flow Model

Bregman Divergence

Generative Models (3.3)

Flow Matching

Multi-objective Optimization

Multi-Objective Optimization Alignment

HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation

AUC Optimization

Multi-Objective Optimization Alignment

HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation

Generative Recommendation

Predictive Decoding Acceleration in Generative Recommender Systems

Supervising Generative Recommendation Models with PRMs

PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations

Process Reward Model

Supervising Generative Recommendation Models with PRMs

PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations

Semantic Drift

Supervising Generative Recommendation Models with PRMs

PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations

Optimizer

Optimizers (3)

Gram Newton-Schulz

Optimizers (2)

Muon

Optimizers (1)

SGD and Adam

Machine Learning

Optimizers (1)

SGD and Adam

Deep Learning

Optimizers (1)

SGD and Adam

Optimization Theory

Optimizers (3)

Gram Newton-Schulz

Optimizers (2)

Muon

Optimizers (1)

SGD and Adam

Muon

Optimizers (3)

Gram Newton-Schulz

Optimizers (2)

Muon

Newton-Schulz

Optimizers (3)

Gram Newton-Schulz

Optimizers (2)

Muon

LLM

Optimizers (3)

Gram Newton-Schulz

NVIDIA Nemotron 3 Super: A Technical Walkthrough

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

The FlashAttention Series

Optimizers (2)

Muon

Entropy Safe

Variational Sequence-Level Soft Policy Optimization

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Entropy Dynamics Analysis of RFT

On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Stochastic Process

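Stochastic Processes (11)

Markov Chains

Stochastic Processes (10)

Poisson Processes (2): Filtered Poisson Processes

Stochastic Processes (9)

Poisson Processes (1): Poisson Distribution

Stochastic Processes (8)

Gaussian Processes (4): Applications of Gaussian Processes

Stochastic Processes (7)

Gaussian Processes (3): Gaussians and Nonlinearity

Stochastic Processes (6)

Gaussian Processes (2): Multivariate Gaussian Distribution

Stochastic Processes (5)

Gaussian Processes (1): Gaussian is Everywhere

Stochastic Processes (4)

Multivariate Correlation

Stochastic Processes (3)

Nonstationary Stochastic Processes

Stochastic Processes (2)

Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Processes

Stochastic Processes (1)

Linear Correlation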

Linear Correlation

Stochastic Processes (1)

Linear Correlation

Correlation Function

Stochastic Processes (2)

Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Processes

Wide-Sense Stationary

Stochastic Processes (2)

Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Processes

Bochner's Theorem

Stochastic Processes (2)

Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Processes

Wiener-Khintchine Theorem

Stochastic Processes (2)

Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Processes

Cyclostationary Process

Stochastic Processes (3)

Nonstationary Stochastic Processes

PAM

Stochastic Processes (3)

Nonstationary Stochastic Processes

Orthogonal Increment Process

Stochastic Processes (3)

Nonstationary Stochastic Processes

Brownian Motion

Stochastic Processes (3)

Nonstationary Stochastic Processes

White Noise

Stochastic Processes (3)

Nonstationary Stochastic Processes

Multivariate Correlation

Stochastic Processes (4)

Multivariate Correlation

Whitening

Stochastic Processes (4)

Multivariate Correlation

PCA

Stochastic Processes (4)

Multivariate Correlation

K-L Expansion

Stochastic Processes (4)

Multivariate Correlation

Stieltjes Integral

Stochastic Processes (4)

Multivariate Correlation

Spectral Representation

Stochastic Processes (4)

Multivariate Correlation

Gaussian Process

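Stochastic Processes (8)

Gaussian Processes (4): Applications of Gaussian Processes

Stochastic Processes (6)

Gaussian Processes (2): Multivariate Gaussian Distribution

Stochastic Processes (5)

Gaussian Processes (1): Gaussian is Everywhere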

Diffusion

Stochastic Processes (5)

Gaussian Processes (1): Gaussian is Everywhere

Central Limit Theorem

Stochastic Processes (5)

Gaussian Processes (1): Gaussian is Everywhere

Law of Large Numbers

Stochastic Processes (5)

Gaussian Processes (1): Gaussian is Everywhere

Random Walk

Stochastic Processes (5)

Gaussian Processes (1): Gaussian is Everywhere

Gaussian Distribution

Stochastic Processes (6)

Gaussian Processes (2): Multivariate Gaussian Distribution

Isoperimetric Inequality

The Isoperimetric Inequality

For a fixed perimeter, the closed figure with the largest area is the circle

FlashAttention

The FlashAttention Series

GPU Optimization

The FlashAttention Series

NVIDIA

NVIDIA Nemotron 3 Super: A Technical Walkthrough

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Nemotron

NVIDIA Nemotron 3 Super: A Technical Walkthrough

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Multimodal Alignment