Liang Xu (徐良)
liangxuy96 [At] gmail [Dot] com

I'm a third-year PhD student in the joint program between Shanghai Jiao Tong University and the Eastern Institute of Technology, Ningbo, supervised by Wenjun Zeng and Xiaokang Yang. I also work closely with Xin Jin and Yichao Yan.

Prior to that, I received my B.S. in Computer Science from Nanjing University in 2018 and my M.S. in Computer Science from Shanghai Jiao Tong University in 2021, supervised by Cewu Lu. I was also fortunate to work with Wenjun Zeng and Cuiling Lan as a research intern in the Intelligent Multimedia Group at Microsoft Research Asia.

Email  /  Google Scholar  /  Github  /  Twitter  /  Zhihu

Research

I have broad research interests in human-centric vision, including human motion modeling, human-centric interactions, and embodied intelligence.

Selected Publications & Projects

* denotes equal contribution

MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations
Liang Xu, Shaoyang Hua, Zili Lin, Yifan Liu, Feipeng Ma, Yichao Yan, Xin Jin, Xiaokang Yang, Wenjun Zeng
arXiv, 2024 [Paper] [Code]

We study how to build and benchmark large motion models using video action datasets with disentangled rule-based annotations.

HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects
Xintao Lv*, Liang Xu*, Yichao Yan, Xin Jin, Congsheng Xu, Shuwen Wu, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng, Xiaokang Yang
ECCV, 2024 [Paper] [Project] [Code]

We build a large-scale dataset of humans interacting with multiple objects, comprising 3.3K HOI sequences and 4.08M frames.

Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang
CVPR, 2024 [Paper] [Project] [Code]

We propose Inter-X, a large-scale dataset of human-human interactions with 11K interaction sequences and more than 8.1M frames.

ReGenNet: Towards Human Action-Reaction Synthesis
Liang Xu, Yizhou Zhou, Yichao Yan, Xin Jin, Wenhan Zhu, Fengyun Rao, Xiaokang Yang, Wenjun Zeng
CVPR, 2024 [Paper] [Project] [Code]

We propose the first multi-setting human action-reaction synthesis benchmark to generate human reactions conditioned on given human actions.

ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation
Liang Xu*, Ziyang Song*, Dongliang Wang, Jing Su, Zhicheng Fang, Chenjing Ding, Weihao Gan, Yichao Yan, Xin Jin, Xiaokang Yang, Wenjun Zeng, Wei Wu
ICCV, 2023 [Paper] [Project] [Code]

We present a GAN-based Transformer for general action-conditioned 3D human motion generation, including single-person actions and multi-person interactive actions.

Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition
Liang Xu, Cuiling Lan, Wenjun Zeng, Cewu Lu
TMM, 2022 [Paper]

We propose a joint learning framework for mutually assisted interacted object localization and human action recognition based on skeleton data.

PAL-Net: Predicate-Aware Learning Network for Visual Relationship Recognition
Liang Xu, Yong-Lu Li, Mingyang Chen, Yan Hao, Cewu Lu
ICME, 2021 (Oral Presentation)

We propose PAL-Net, a novel and concise predicate-aware learning network for visual relationship recognition.

PaStaNet: Toward Human Activity Knowledge Engine
Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Shiyi Wang, Hao-Shu Fang, Ze Ma, Mingyang Chen, Cewu Lu
CVPR, 2020; TPAMI, 2023 [Paper] [Code] [Project]

We build PaStaNet, a large-scale knowledge base containing 7M+ PaSta annotations. We infer PaSta first and then reason out the activities based on part-level semantics.

Transferable Interactiveness Prior for Human-Object Interaction Detection
Yong-Lu Li, Siyuan Zhou, Xijie Huang, Liang Xu, Ze Ma, Hao-Shu Fang, Yan-Feng Wang, Cewu Lu
CVPR, 2019; TPAMI, 2022 [Paper] [Code]

We explore Interactiveness Knowledge, which indicates whether a human and an object interact with each other, for Human-Object Interaction (HOI) Detection.

Experience

Shanghai Jiao Tong University & Eastern Institute of Technology, Ningbo, Shanghai/Ningbo, China
PhD in Computer Science, Sept. 2022 - Present

WeChat, Tencent Inc., Beijing, China
Research Intern, Jan. 2023 - Feb. 2024

SenseTime Technology Development Co., Ltd., Shanghai, China
Computer Vision Researcher, Jun. 2021 - Sept. 2022

Microsoft Research Asia, Beijing, China
Research Intern at Intelligent Multimedia Group, Jul. 2020 - Feb. 2021

Shanghai Jiao Tong University, Shanghai, China
Master of Science and Technology in Computer Science, Sept. 2018 - Mar. 2021

Nanjing University, Nanjing, China
Bachelor of Science in Computer Science, Sept. 2014 - Jun. 2018

Service

Reviewer

  • CVPR'23/24/25
  • NeurIPS Datasets and Benchmarks Track'22/23/24
  • ECCV'24
  • ICLR'25
  • WACV'23/24/25
  • IEEE Transactions on Multimedia
  • IJCV



Last updated: Dec 2024.

Thanks for the awesome design!