Wei XIONG
PhD Student
Phone: (852) Email: wxiongae@connect.ust.hk Office: Room
Papers and Preprints
(* equal contribution or alphabetical order)
-
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong*, Wei Xiong*, Deepanshu Goyal, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum and Tong Zhang, Preprint.
-
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [Slide]
Han Zhong*, Wei Xiong*, Sirui Zheng, Liwei Wang, Zhaoran Wang, Zhuoran Yang and Tong Zhang, Preprint.
-
Reward Teaching for Federated Multi-Armed Bandits
Chengshuai Shi, Wei Xiong, Cong Shen, and Jing Yang, IEEE International Symposium on Information Theory (ISIT 2023).
-
Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources
Chengshuai Shi, Wei Xiong, Cong Shen, and Jing Yang, ICML 2023.
-
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
Chenlu Ye, Wei Xiong, Quanquan Gu and Tong Zhang, ICML 2023.
-
Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game [Slide]
Wei Xiong*, Han Zhong*, Chengshuai Shi, Cong Shen, Liwei Wang, and Tong Zhang, ICLR 2023.
-
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Game
Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, and Tong Zhang, ICML 2022.
-
Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
Han Zhong*, Wei Xiong*, Jiyuan Tan*, Liwei Wang, Tong Zhang, Zhaoran Wang, and Zhuoran Yang, ICML 2022.
-
PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction [Slide]
Haishan Ye*, Wei Xiong*, and Tong Zhang, Under Minor Revision at TPAMI.
-
Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization [Code]
Chengshuai Shi, Wei Xiong, Cong Shen, and Jing Yang, NeurIPS 2021.
-
(Almost) Free Incentivized Exploration from Decentralized Learning Agents [Code]
Chengshuai Shi, Haifeng Xu, Wei Xiong, and Cong Shen, NeurIPS 2021.
-
Distributional Reinforcement Learning for Multi-Dimensional Reward Functions
Pushi Zhang, Xiaoyu Chen, Li Zhao, Wei Xiong, Tao Qin, and Tie-Yan Liu, NeurIPS 2021.
-
Decentralized multi-player multi-armed bandits with no collision information [Code*]
Chengshuai Shi, Wei Xiong, Cong Shen, and Jing Yang, AISTATS 2020.