Zifeng Mai's Blog
Tags
Life is like a box of chocolates, you never know what you're gonna get.
Reinforcement Learning
Multi-Armed Bandit
Recommender System
Sequence Modeling
Feature Interaction
ByteDance
Policy Entropy
Renyi Entropy
Policy Gradient
Policy Safety
Finite Markov Decision Process
Bellman Equation
Dynamic Programming
Policy Iteration
Value Iteration
Banach Fixed Point Theorem
Distribution Matching
Generative Modeling
VAE
DDPM
Flow Matching
GAN
Variational Inference
Variational Auto-Encoder
Hierarchical VAE
Denoising Diffusion Probabilistic Model
Energy-based Model
Flow-based Method
Velocity Field
Optimal Transport
Flow Model
Continuity Equation
Instantaneous Change of Variables Formula
Bregman Divergence
Multi-objective Optimization
AUC Optimization
Optimizer
Machine Learning
Deep Learning
Optimization Theory
Kimi
Newton-Schulz
LLM
Entropy Safe
Stochastic Process
Linear Correlation
Correlation Function
Wide-Sense Stationary
Bochner's Theorem
Wiener-Khintchine Theorem
Cyclostationary Process
PAM
Orthogonal Increment Process
Brownian Motion
White Noise
Multivariate Correlation
Whitening
PCA
K-L Expansion
Stieltjes Integral
Spectral Representation
Gaussian Process
Diffusion
Central Limit Theorem
Law of Large Numbers
Random Walk
Reinforcement Learning
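Entropy Dynamics Analysis of RFT
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
Reinforcement Learning by Fitting the Entire Reward Distribution
FlowRL: Matching Reward Distributions for LLM Reasoning
Reinforcement Learning Basics (3)
Solving with Dynamic Programming
Reinforcement Learning Basics (2)
Finite Markov Processes
Entropy in Reinforcement Learning (2)
Entropy-Safe Policies
Entropy in Reinforcement Learning (1)
Policy Entropy
Reinforcement Learning Basics (1)
Multi-Armed Bandits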
Multi-Armed Bandit
Reinforcement Learning Basics (1)
Multi-Armed Bandits
Recommender System
Multi-Objective Optimization Alignment
HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation
Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
Sequence Modeling
Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
Feature Interaction
Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
ByteDance
Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
Policy Entropy
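Entropy Dynamics Analysis of RFT
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
Entropy in Reinforcement Learning (2)
Entropy-Safe Policies
Entropy in Reinforcement Learning (1)
Policy Entropy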
Renyi Entropy
Entropy in Reinforcement Learning (1)
Policy Entropy
Policy Gradient
Entropy in Reinforcement Learning (1)
Policy Entropy
Policy Safety
Entropy in Reinforcement Learning (2)
Entropy-Safe Policies
Finite Markov Decision Process
Reinforcement Learning Basics (3)
Solving with Dynamic Programming
Reinforcement Learning Basics (2)
Finite Markov Processes
Bellman Equation
Reinforcement Learning Basics (2)
Finite Markov Processes
Dynamic Programming
Reinforcement Learning Basics (3)
Solving with Dynamic Programming
Policy Iteration
Reinforcement Learning Basics (3)
Solving with Dynamic Programming
Value Iteration
Reinforcement Learning Basics (3)
Solving with Dynamic Programming
Banach Fixed Point Theorem
Reinforcement Learning Basics (3)
Solving with Dynamic Programming
Distribution Matching
Reinforcement Learning by Fitting the Entire Reward Distribution
FlowRL: Matching Reward Distributions for LLM Reasoning
Generative Modeling
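Generative Models (3.3)
Flow Matching
Generative Models (3.2)
Flow Model
Generative Models (3.1)
Flow-based Method
Generative Models (2.1)
Energy-based Model
Generative Models (1.3)
Denoising Diffusion Probabilistic Model
Generative Models (1.2)
Variational Auto-Encoder
Generative Models (1.1)
Variational Inference
Generative Models (0)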
Overview of Deep Generative Modeling
VAE
Generative Models (0)
Overview of Deep Generative Modeling
DDPM
Generative Models (0)
Overview of Deep Generative Modeling
Flow Matching
Generative Models (3.3)
Flow Matching
Generative Models (0)
Overview of Deep Generative Modeling
GAN
Generative Models (0)
Overview of Deep Generative Modeling
Variational Inference
Generative Models (1.1)
Variational Inference
Variational Auto-Encoder
Generative Models (1.2)
Variational Auto-Encoder
Hierarchical VAE
Generative Models (1.2)
Variational Auto-Encoder
Denoising Diffusion Probabilistic Model
Generative Models (1.3)
Denoising Diffusion Probabilistic Model
Energy-based Model
Generative Models (2.1)
Energy-based Model
Flow-based Method
Generative Models (3.1)
Flow-based Method
Velocity Field
Generative Models (3.1)
Flow-based Method
Optimal Transport
Generative Models (3.1)
Flow-based Method
Flow Model
Generative Models (3.2)
Flow Model
Continuity Equation
Generative Models (3.2)
Flow Model
Instantaneous Change of Variables Formula
Generative Models (3.2)
Flow Model
Bregman Divergence
Generative Models (3.3)
Flow Matching
Multi-objective Optimization
Multi-Objective Optimization Alignment
HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation
AUC Optimization
Multi-Objective Optimization Alignment
HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation
Optimizer
Optimizers (2)
Muon
Optimizers (1)
SGD and Adam
Machine Learning
Optimizers (1)
SGD and Adam
Deep Learning
Optimizers (1)
SGD and Adam
Optimization Theory
Optimizers (2)
Muon
Optimizers (1)
SGD and Adam
Kimi
Optimizers (2)
Muon
Newton-Schulz
Optimizers (2)
Muon
LLM
Optimizers (2)
Muon
Entropy Safe
Entropy Dynamics Analysis of RFT
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
Stochastic Process
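Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere
Stochastic Processes (4)
Multivariate Correlation
Stochastic Processes (3)
Non-Stationary Stochastic Processes
Stochastic Processes (2)
Time-Frequency Analysis of Correlation Functions of Wide-Sense Stationary Processes
Stochastic Processes (1)
Linear Correlation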
Linear Correlation
Stochastic Processes (1)
Linear Correlation
Correlation Function
Stochastic Processes (2)
Time-Frequency Analysis of Correlation Functions of Wide-Sense Stationary Processes
Wide-Sense Stationary
Stochastic Processes (2)
Time-Frequency Analysis of Correlation Functions of Wide-Sense Stationary Processes
Bochner's Theorem
Stochastic Processes (2)
Time-Frequency Analysis of Correlation Functions of Wide-Sense Stationary Processes
Wiener-Khintchine Theorem
Stochastic Processes (2)
Time-Frequency Analysis of Correlation Functions of Wide-Sense Stationary Processes
Cyclostationary Process
Stochastic Processes (3)
Non-Stationary Stochastic Processes
PAM
Stochastic Processes (3)
Non-Stationary Stochastic Processes
Orthogonal Increment Process
Stochastic Processes (3)
Non-Stationary Stochastic Processes
Brownian Motion
Stochastic Processes (3)
Non-Stationary Stochastic Processes
White Noise
Stochastic Processes (3)
Non-Stationary Stochastic Processes
Multivariate Correlation
Stochastic Processes (4)
Multivariate Correlation
Whitening
Stochastic Processes (4)
Multivariate Correlation
PCA
Stochastic Processes (4)
Multivariate Correlation
K-L Expansion
Stochastic Processes (4)
Multivariate Correlation
Stieltjes Integral
Stochastic Processes (4)
Multivariate Correlation
Spectral Representation
Stochastic Processes (4)
Multivariate Correlation
Gaussian Process
Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere
Diffusion
Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere
Central Limit Theorem
Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere
Law of Large Numbers
Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere
Random Walk
Stochastic Processes (5)
Gaussian Processes (1): Gaussian is Everywhere