Email: xit31 [at] pitt [dot] edu
I am a Research Scientist at Meta (GenAI Llama; previously FAIR). My recent research focuses on data, in particular on understanding the fundamental principles of large-scale data curation and scaling. I have worked as a Core Contributor on several major projects, including Llama 2/3/4, Code Llama, MetaCLIP 1/1.2, and Chameleon. I have also published first-author papers at top conferences such as NeurIPS, ICML, and EMNLP.
Prior to that, I received my Ph.D. in Biostatistics from the University of Pittsburgh in 2022, advised by Lu Tang and Gong Tang. My doctoral research focused on developing novel statistical and machine learning methods for causal inference, data integration, and decision fairness. From 2019 to 2021, I was a visiting student in Computer Science at Carnegie Mellon University. I received my B.S. in Pharmacy and Computer Science from Sun Yat-sen University in 2018.
RISE: Robust Individualized Decision Learning with Sensitive Variables
Tan, X., Qi, Z., Seymour, C., Tang, L.
Advances in Neural Information Processing Systems (NeurIPS) 2022
** Distinguished Student Paper Award for the ENAR 2023 Spring Meeting
[Paper] [Code] [Video]
A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources
Tan, X., Chang, C., Zhou, L., Tang, L.
International Conference on Machine Learning (ICML) 2022
** Winner of the Student Research Award at the 35th New England Statistics Symposium
** Honorable Mention Award at JSM 2021 Student Paper Competition
[Paper] [Code] [Video]
Identifying Principal Stratum Causal Effects Conditional on a Post-treatment Intermediate Response
Tan, X., Abberbock, J., Rastogi, P., Tang, G.
Causal Learning and Reasoning (CLeaR) 2022
[Paper] [Code]
A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources
The 35th New England Statistics Symposium (NESS) 2022, Storrs, CT
Improving personalized causal inference with information borrowed from heterogeneous data sources
The 14th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics) 2021, King's College London, UK