Zifeng Mai's Blog
Tags
Life is like a box of chocolates, you never know what you're gonna get.
Reinforcement Learning
Multi-Armed Bandit
Recommender System
Sequence Modeling
Feature Interaction
ByteDance
Policy Entropy
Renyi Entropy
Policy Gradient
Policy Safety
Finite Markov Decision Process
Bellman Equation
Dynamic Programming
Policy Iteration
Value Iteration
Banach Fixed Point Theorem
Distribution Matching
Generative Modeling
VAE
DDPM
Flow Matching
GAN
Variational Inference
Variational Auto-Encoder
Hierarchical VAE
Denoising Diffusion Probabilistic Model
Energy-based Model
Flow-based Method
Velocity Field
Optimal Transport
Flow Model
Continuity Equation
Instantaneous Change of Variables Formula
Bregman Divergence
Multi-objective Optimization
AUC Optimization
Generative Recommendation
Process Reward Model
Semantic Drift
Optimizer
Machine Learning
Deep Learning
Optimization Theory
Muon
Newton-Schulz
LLM
Entropy Safe
Stochastic Process
Linear Correlation
Correlation Function
Wide-Sense Stationary
Bochner's Theorem
Wiener-Khintchine Theorem
Cyclostationary Process
PAM
Orthogonal Increment Process
Brownian Motion
White Noise
Multivariate Correlation
Whitening
PCA
K-L Expansion
Stieltjes Integral
Spectral Representation
Gaussian Process
Diffusion
Central Limit Theorem
Law of Large Numbers
Random Walk
Gaussian Distribution
Isoperimetric Inequality
FlashAttention
GPU Optimization
NVIDIA
Nemotron
Multimodal Alignment
Speculative Decoding
Inference Acceleration
Real Analysis
Completeness
Price Theorem
Characteristic Function
Black-Scholes Equation
Ito Integral
Poisson Process
Moment Generating Function
Gamma Distribution
Exponential Distribution
Filtering Poisson Process
Queueing
Markov Chain
Chapman-Kolmogorov
Recurrent
Ergodic Theorem
Reinforcement Learning
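Variational Sequence-Level Soft Policy Optimization
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Entropy Dynamics Analysis of RFT
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
Reinforcement Learning by Fitting the Entire Reward Distribution
FlowRL: Matching Reward Distributions for LLM Reasoning
Reinforcement Learning Basics (3)
Solving with Dynamic Programming
Reinforcement Learning Basics (2)
Finite Markov Processes
Entropy in Reinforcement Learning (2)
Entropy-Safe Policies
Entropy in Reinforcement Learning (1)
Policy Entropy
Reinforcement Learning Basics (1)
Multi-Armed Bandits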
Multi-Armed Bandit
Reinforcement Learning Basics (1)
Multi-Armed Bandits
Recommender System
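Speculative Decoding Acceleration in Generative Recommender Systems
Knowledge-Action Alignment in Personalized Search
KARMA: Knowledge-Action Regularized Multimodal Alignment for Personalized Search at Taobao
Supervising Generative Recommendation Models with PRMs
PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations
Multi-Objective Optimization Alignment
HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation
Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage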
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
Sequence Modeling
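Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
Feature Interaction
Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
ByteDance
Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage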
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
Policy Entropy
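Variational Sequence-Level Soft Policy Optimization
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Entropy Dynamics Analysis of RFT
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
Entropy in Reinforcement Learning (2)
Entropy-Safe Policies
Entropy in Reinforcement Learning (1)
Policy Entropy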
Renyi Entropy
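Entropy in Reinforcement Learning (1)
Policy Entropy
Policy Gradient
Entropy in Reinforcement Learning (1)
Policy Entropy
Policy Safety
Entropy in Reinforcement Learning (2)
Entropy-Safe Policies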
Finite Markov Decision Process
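Reinforcement Learning Basics (3)
Solving with Dynamic Programming
Reinforcement Learning Basics (2)
Finite Markov Processes
Bellman Equation
Reinforcement Learning Basics (2)
Finite Markov Processes
Dynamic Programming
Reinforcement Learning Basics (3)
Solving with Dynamic Programming
Policy Iteration
Reinforcement Learning Basics (3)
Solving with Dynamic Programming
Value Iteration
Reinforcement Learning Basics (3)
Solving with Dynamic Programming
Banach Fixed Point Theorem
Reinforcement Learning Basics (3)
Solving with Dynamic Programming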
Distribution Matching
Reinforcement Learning by Fitting the Entire Reward Distribution
FlowRL: Matching Reward Distributions for LLM Reasoning
Generative Modeling
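Generative Models (3.3)
Flow Matching
Generative Models (3.2)
Flow Model
Generative Models (3.1)
Flow-based Method
Generative Models (2.1)
Energy-based Model
Generative Models (1.3)
Denoising Diffusion Probabilistic Model
Generative Models (1.2)
Variational Auto-Encoder
Generative Models (1.1)
Variational Inference
Generative Models (0)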
Overview of Deep Generative Modeling
VAE
Generative Models (0)
Overview of Deep Generative Modeling
DDPM
Generative Models (0)
Overview of Deep Generative Modeling
Flow Matching
Generative Models (3.3)
Flow Matching
Generative Models (0)
Overview of Deep Generative Modeling
GAN
Generative Models (0)
Overview of Deep Generative Modeling
Variational Inference
Generative Models (1.1)
Variational Inference
Variational Auto-Encoder
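Generative Models (1.2)
Variational Auto-Encoder
Hierarchical VAE
Generative Models (1.2)
Variational Auto-Encoder
Denoising Diffusion Probabilistic Model
Stochastic Processes (8)
Gaussian Processes (4): Applications of Gaussian Processes
Stochastic Processes (6)
Gaussian Processes (2): Multivariate Gaussian Distribution
Generative Models (1.3)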
Denoising Diffusion Probabilistic Model
Energy-based Model
Generative Models (2.1)
Energy-based Model
Flow-based Method
Generative Models (3.1)
Flow-based Method
Velocity Field
Generative Models (3.1)
Flow-based Method
Optimal Transport
Generative Models (3.1)
Flow-based Method
Flow Model
Generative Models (3.2)
Flow Model
Continuity Equation
Generative Models (3.2)
Flow Model
Instantaneous Change of Variables Formula
Generative Models (3.2)
Flow Model
Bregman Divergence
Generative Models (3.3)
Flow Matching
Multi-objective Optimization
Multi-Objective Optimization Alignment
HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation
AUC Optimization
Multi-Objective Optimization Alignment
HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation
Generative Recommendation
Speculative Decoding Acceleration in Generative Recommender Systems
Supervising Generative Recommendation Models with PRMs
PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations
Process Reward Model
Supervising Generative Recommendation Models with PRMs
PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations
Semantic Drift
Supervising Generative Recommendation Models with PRMs
PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations
Optimizer
Optimizers (3)
Gram Newton-Schulz
Optimizers (2)
Muon
Optimizers (1)
SGD and Adam
Machine Learning
Optimizers (1)
SGD and Adam
Deep Learning
Optimizers (1)
SGD and Adam
Optimization Theory
Optimizers (3)
Gram Newton-Schulz
Optimizers (2)
Muon
Optimizers (1)
SGD and Adam
Muon
Optimizers (3)
Gram Newton-Schulz
Optimizers (2)
Muon
Newton-Schulz
Optimizers (3)
Gram Newton-Schulz
Optimizers (2)
Muon
LLM
Optimizers (3)
Gram Newton-Schulz
NVIDIA Nemotron 3 Super: Technical Deep Dive
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
The FlashAttention Series
Optimizers (2)
Muon
Entropy Safe
Variational Sequence-Level Soft Policy Optimization
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Entropy Dynamics Analysis of RFT
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
Stochastic Process
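Stochastic Processes (11)
Markov Chains
Stochastic Processes (10)
Poisson Processes (2): Filtered Poisson Processes
Stochastic Processes (9)
Poisson Processes (1): The Poisson Distribution
Stochastic Processes (8)
Gaussian Processes (4): Applications of Gaussian Processes
Stochastic Processes (7)
Gaussian Processes (3): Gaussians and Nonlinearity
Stochastic Processes (6)
Gaussian Processes (2): Multivariate Gaussian Distribution
Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere
Stochastic Processes (4)
Multivariate Correlation
Stochastic Processes (3)
Non-Stationary Stochastic Processes
Stochastic Processes (2)
Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Processes
Stochastic Processes (1)
Linear Correlation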
Linear Correlation
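Stochastic Processes (1)
Linear Correlation
Correlation Function
Stochastic Processes (2)
Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Processes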
Wide-Sense Stationary
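Stochastic Processes (2)
Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Processes
Bochner's Theorem
Stochastic Processes (2)
Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Processes
Wiener-Khintchine Theorem
Stochastic Processes (2)
Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Processes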
Cyclostationary Process
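Stochastic Processes (3)
Non-Stationary Stochastic Processes
PAM
Stochastic Processes (3)
Non-Stationary Stochastic Processes
Orthogonal Increment Process
Stochastic Processes (3)
Non-Stationary Stochastic Processes
Brownian Motion
Stochastic Processes (3)
Non-Stationary Stochastic Processes
White Noise
Stochastic Processes (3)
Non-Stationary Stochastic Processes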
Multivariate Correlation
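Stochastic Processes (4)
Multivariate Correlation
Whitening
Stochastic Processes (4)
Multivariate Correlation
PCA
Stochastic Processes (4)
Multivariate Correlation
K-L Expansion
Stochastic Processes (4)
Multivariate Correlation
Stieltjes Integral
Stochastic Processes (4)
Multivariate Correlation
Spectral Representation
Stochastic Processes (4)
Multivariate Correlation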
Gaussian Process
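Stochastic Processes (8)
Gaussian Processes (4): Applications of Gaussian Processes
Stochastic Processes (6)
Gaussian Processes (2): Multivariate Gaussian Distribution
Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere
Diffusion
Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere
Central Limit Theorem
Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere
Law of Large Numbers
Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere
Random Walk
Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere
Gaussian Distribution
Stochastic Processes (6)
Gaussian Processes (2): Multivariate Gaussian Distribution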
Isoperimetric Inequality
The Isoperimetric Inequality
For a fixed perimeter, the closed figure with the largest area is the circle
FlashAttention
The FlashAttention Series
GPU Optimization
The FlashAttention Series
NVIDIA
NVIDIA Nemotron 3 Super: Technical Deep Dive
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Nemotron
NVIDIA Nemotron 3 Super: Technical Deep Dive
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Multimodal Alignment
Knowledge-Action Alignment in Personalized Search
KARMA: Knowledge-Action Regularized Multimodal Alignment for Personalized Search at Taobao
Speculative Decoding
Speculative Decoding Acceleration in Generative Recommender Systems
Inference Acceleration
Speculative Decoding Acceleration in Generative Recommender Systems
Real Analysis
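Completeness of the Real Numbers
Seven Equivalent Formulations and Their Proofs
Completeness
Completeness of the Real Numbers
Seven Equivalent Formulations and Their Proofs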
Price Theorem
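Stochastic Processes (7)
Gaussian Processes (3): Gaussians and Nonlinearity
Characteristic Function
Stochastic Processes (7)
Gaussian Processes (3): Gaussians and Nonlinearity
Black-Scholes Equation
Stochastic Processes (8)
Gaussian Processes (4): Applications of Gaussian Processes
Ito Integral
Stochastic Processes (8)
Gaussian Processes (4): Applications of Gaussian Processes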
Poisson Process
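Stochastic Processes (10)
Poisson Processes (2): Filtered Poisson Processes
Stochastic Processes (9)
Poisson Processes (1): The Poisson Distribution
Moment Generating Function
Stochastic Processes (9)
Poisson Processes (1): The Poisson Distribution
Gamma Distribution
Stochastic Processes (9)
Poisson Processes (1): The Poisson Distribution
Exponential Distribution
Stochastic Processes (9)
Poisson Processes (1): The Poisson Distribution
Filtering Poisson Process
Stochastic Processes (10)
Poisson Processes (2): Filtered Poisson Processes
Queueing
Stochastic Processes (10)
Poisson Processes (2): Filtered Poisson Processes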
Markov Chain
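Stochastic Processes (11)
Markov Chains
Chapman-Kolmogorov
Stochastic Processes (11)
Markov Chains
Recurrent
Stochastic Processes (11)
Markov Chains
Ergodic Theorem
Stochastic Processes (11)
Markov Chains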