Tags - Zifeng Mai's Blog

Reinforcement Learning

Entropy Dynamics Analysis of RFT

On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Reinforcement Learning by Fitting the Entire Reward Distribution

FlowRL: Matching Reward Distributions for LLM Reasoning

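Reinforcement Learning Basics (3)

Solving with Dynamic Programming

Reinforcement Learning Basics (2)

Finite Markov Processes

Entropy in Reinforcement Learning (2)

Entropy-Safe Policies

Entropy in Reinforcement Learning (1)

Policy Entropy

Reinforcement Learning Basics (1)

Multi-Armed Bandits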

Multi-Armed Bandit

Reinforcement Learning Basics (1)

Multi-Armed Bandits

Recommender System

Multi-objective Optimization Alignment

HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation

Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage

OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender

Sequence Modeling

Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage

OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender

Feature Interaction

Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage

OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender

ByteDance

Unifying Feature Interaction and Sequence Modeling in the Fine-Ranking Stage

OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender

Policy Entropy

Entropy Dynamics Analysis of RFT

On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Entropy in Reinforcement Learning (2)

Entropy-Safe Policies

Entropy in Reinforcement Learning (1)

Policy Entropy

Renyi Entropy

Entropy in Reinforcement Learning (1)

Policy Entropy

Policy Gradient

Entropy in Reinforcement Learning (1)

Policy Entropy

Policy Safety

Entropy in Reinforcement Learning (2)

Entropy-Safe Policies

Finite Markov Decision Process

Reinforcement Learning Basics (3)

Solving with Dynamic Programming

Reinforcement Learning Basics (2)

Finite Markov Processes

Bellman Equation

Reinforcement Learning Basics (2)

Finite Markov Processes

Dynamic Programming

Reinforcement Learning Basics (3)

Solving with Dynamic Programming

Policy Iteration

Reinforcement Learning Basics (3)

Solving with Dynamic Programming

Value Iteration

Reinforcement Learning Basics (3)

Solving with Dynamic Programming

Banach Fixed Point Theorem

Reinforcement Learning Basics (3)

Solving with Dynamic Programming

Distribution Matching

Reinforcement Learning by Fitting the Entire Reward Distribution

FlowRL: Matching Reward Distributions for LLM Reasoning

Generative Modeling

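Generative Models (3.3)

Flow Matching

Generative Models (3.2)

Flow Model

Generative Models (3.1)

Flow-based Method

Generative Models (2.1)

Energy-based Model

Generative Models (1.3)

Denoising Diffusion Probabilistic Model

Generative Models (1.2)

Variational Auto-Encoder

Generative Models (1.1)

Variational Inference

Generative Models (0)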

Overview of Deep Generative Modeling

VAE

Generative Models (0)

Overview of Deep Generative Modeling

DDPM

Generative Models (0)

Overview of Deep Generative Modeling

Flow Matching

Generative Models (3.3)

Flow Matching

Generative Models (0)

Overview of Deep Generative Modeling

GAN

Generative Models (0)

Overview of Deep Generative Modeling

Variational Inference

Generative Models (1.1)

Variational Inference

Variational Auto-Encoder

Generative Models (1.2)

Variational Auto-Encoder

Hierarchical VAE

Generative Models (1.2)

Variational Auto-Encoder

Denoising Diffusion Probabilistic Model

Generative Models (1.3)

Denoising Diffusion Probabilistic Model

Energy-based Model

Generative Models (2.1)

Energy-based Model

Flow-based Method

Generative Models (3.1)

Flow-based Method

Velocity Field

Generative Models (3.1)

Flow-based Method

Optimal Transport

Generative Models (3.1)

Flow-based Method

Flow Model

Generative Models (3.2)

Flow Model

Continuity Equation

Generative Models (3.2)

Flow Model

Instantaneous Change of Variables Formula

Generative Models (3.2)

Flow Model

Bregman Divergence

Generative Models (3.3)

Flow Matching

Multi-objective Optimization

Multi-objective Optimization Alignment

HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation

AUC Optimization

Multi-objective Optimization Alignment

HarmonRank: Ranking-aligned Multi-objective Ensemble for Live-streaming E-commerce Recommendation

Optimizer

Optimizers (2)

Muon

Optimizers (1)

SGD and Adam

Machine Learning

Optimizers (1)

SGD and Adam

Deep Learning

Optimizers (1)

SGD and Adam

Optimization Theory

Optimizers (2)

Muon

Optimizers (1)

SGD and Adam

Kimi

Optimizers (2)

Muon

Newton-Schulz

Optimizers (2)

Muon

LLM

Optimizers (2)

Muon

Entropy Safe

Entropy Dynamics Analysis of RFT

On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

Stochastic Process

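Stochastic Processes (5)

Gaussian Processes (1): Gaussian is Everywhere

Stochastic Processes (4)

Multivariate Correlation

Stochastic Processes (3)

Non-stationary Stochastic Processes

Stochastic Processes (2)

Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Stochastic Processes

Stochastic Processes (1)

Linear Correlation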

Linear Correlation

Stochastic Processes (1)

Linear Correlation

Correlation Function

Stochastic Processes (2)

Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Stochastic Processes

Wide-Sense Stationary

Stochastic Processes (2)

Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Stochastic Processes

Bochner's Theorem

Stochastic Processes (2)

Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Stochastic Processes

Wiener-Khintchine Theorem

Stochastic Processes (2)

Time-Frequency Analysis of the Correlation Function of Wide-Sense Stationary Stochastic Processes

Cyclostationary Process

Stochastic Processes (3)

Non-stationary Stochastic Processes

PAM

Stochastic Processes (3)

Non-stationary Stochastic Processes

Orthogonal Increment Process

Stochastic Processes (3)

Non-stationary Stochastic Processes

Brownian Motion

Stochastic Processes (3)

Non-stationary Stochastic Processes

White Noise

Stochastic Processes (3)

Non-stationary Stochastic Processes

Multivariate Correlation

Stochastic Processes (4)

Multivariate Correlation

Whitening

Stochastic Processes (4)

Multivariate Correlation

PCA

Stochastic Processes (4)

Multivariate Correlation

K-L Expansion

Stochastic Processes (4)

Multivariate Correlation

Stieltjes Integral

Stochastic Processes (4)

Multivariate Correlation

Spectral Representation

Stochastic Processes (4)

Multivariate Correlation

Gaussian Process

Stochastic Processes (5)

Gaussian Processes (1): Gaussian is Everywhere

Diffusion

Stochastic Processes (5)

Gaussian Processes (1): Gaussian is Everywhere

Central Limit Theorem

Stochastic Processes (5)

Gaussian Processes (1): Gaussian is Everywhere

Law of Large Numbers

Stochastic Processes (5)

Gaussian Processes (1): Gaussian is Everywhere

Random Walk