Cheng Perng Phoo

I am a research scientist specializing in multimodal perception and foundation models for real-world applications. My work focuses on learning from limited supervision and unlabeled data across LiDAR, vision, language, and temporal modalities. I have led research efforts resulting in publications at NeurIPS, ICLR, CVPR, ICCV, and ICRA, with experience translating research ideas into large-scale industrial systems at Waymo.

Email / CV / Resume / Google Scholar / GitHub / LinkedIn

Publications

Below is a list of my papers. (*) indicates equal contribution. Representative papers are highlighted.

	MONITRS: Multimodal Observations of Natural Incidents Through Remote Sensing Shreelekha Revankar, Utkarsh Mall, Cheng Perng Phoo, Kavita Bala, and Bharath Hariharan Conference on Neural Information Processing Systems (NeurIPS), 2025. TLDR: We construct a large-scale multimodal dataset for natural incidents using remote sensing data and news articles.
	DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier, Bharath Hariharan, Kavita Bala, Carl Vondrick IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. TLDR: A neurosymbolic framework that learns programs to explain visual observations in visual-spatial scientific domains.
	Scale-Aware Recognition in Satellite Images under Resource Constraints Shreelekha Revankar, Cheng Perng Phoo, Utkarsh Mall, Bharath Hariharan, Kavita Bala International Conference on Learning Representations (ICLR), 2025. TLDR: We introduce a new approach to scale-aware recognition in satellite imagery under resource constraints.
	AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery Hangyu Zhou, Chia Hsiang Kao, Cheng Perng Phoo, Utkarsh Mall, Bharath Hariharan, Kavita Bala Conference on Neural Information Processing Systems (NeurIPS), 2024. TLDR: The largest dataset to investigate cloud removal leveraging temporal and multispectral information.
	Better Monocular 3D Detectors with LiDAR from the Past Yurong You, Cheng Perng Phoo, Carlos Andres Diaz-Ruiz, Katie Luo, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger International Conference on Robotics and Automation (ICRA), 2024. TLDR: Unlabeled LiDAR scans from repeated traversals could be used to improve camera-based 3D object detectors.
	Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment Utkarsh Mall, Cheng Perng Phoo, Meilin Kelsey Liu, Carl Vondrick, Bharath Hariharan, Kavita Bala International Conference on Learning Representations (ICLR), 2024. TLDR: We use ground images as an intermediary to connect satellite imagery to natural language (encoded using CLIP), yielding VLMs without textual annotations.
	Pre-training LiDAR-based 3D Object Detectors through Colorization Tai-Yu Pan, Chenyang Ma, Tianle Chen, Cheng Perng Phoo, Katie Z Luo, Yurong You, Mark Campbell, Kilian Q Weinberger, Bharath Hariharan, Wei-Lun Chao International Conference on Learning Representations (ICLR), 2024. TLDR: We pre-train a point cloud detector by tasking it to fill in the missing colors within the point cloud.
	Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery Katie Z Luo, Zhenzhen Liu, Xiangyu Chen, Yurong You, Sagie Benaim, Cheng Perng Phoo, Mark Campbell, Wen Sun, Bharath Hariharan, Kilian Q. Weinberger Conference on Neural Information Processing Systems (NeurIPS), 2023. TLDR:* We reframe object discovery as an RL problem and design a reward function to enable faster and more accurate discovery of objects in driving scenes without human supervision.
	Emergent Correspondence from Image Diffusion Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, Bharath Hariharan Conference on Neural Information Processing Systems (NeurIPS), 2023. TLDR:* Features from off-the-shelf image diffusion models could be used to identify semantic and geometric correspondence without further training.
	Distilling from Similar Tasks for Transfer Learning on a Budget Kenneth Borup, Cheng Perng Phoo, Bharath Hariharan International Conference on Computer Vision (ICCV), 2023. TLDR: We construct label- and compute-efficient models by identifying and distilling from suitable pre-trained models.
	Unsupervised Adaptation from Repeated Traversals for Autonomous Driving Yurong You, Cheng Perng Phoo, Katie Z Luo, Travis Zhang, Wei-Lun Chao, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger Conference on Neural Information Processing Systems (NeurIPS), 2022. Code TLDR:* Unlabeled LiDAR scans from repeated traversals could be used to disambiguate foreground and background objects, yielding cleaner signals for self-training adaptation.
	Learning to Detect Mobile Objects from LiDAR Scans Without Labels Yurong You, Katie Z Luo, Cheng Perng Phoo, Wei-Lun Chao, Wen Sun, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Code TLDR: Comparing unlabeled LiDAR scans from multiple traversals of the same location can uncover dynamic LiDAR points, which can then be used to train a mobile object detector in an unsupervised/self-supervised manner.
	Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data Samarth Mishra, Rameswar Panda, Cheng Perng Phoo, Chun-Fu (Richard) Chen, Leonid Karlinsky, Kate Saenko, Venkatesh Saligrama, Rogerio S. Feris IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Code TLDR: Different downstream tasks require different representations pre-trained on synthetic data generated using different configurations (lighting, object poses, etc.). We use reinforcement learning to learn a policy that maps a compact task representation to the appropriate synthetic data configuration.
	Coarsely-labeled Data for Better Few-shot Transfer Cheng Perng Phoo, Bharath Hariharan International Conference on Computer Vision (ICCV), 2021. Code TLDR: Coarsely-labeled data can be cheap to acquire and can be used to learn a better representation for few-shot learning.
	Self-training for Few-shot Transfer Across Extreme Task Differences Cheng Perng Phoo, Bharath Hariharan International Conference on Learning Representations (ICLR), 2021. Oral (53/2997 submissions) Code TLDR: We can build strong neural representations for novel domains by (self-)training students to replicate pseudo-labels produced by a teacher from another, unrelated problem domain.
	Predicting Risk of Sport-Related Concussion in Collegiate Athletes and Military Cadets: A Machine Learning Approach Using Baseline Data from the CARE Consortium Study Joel Castellanos, Cheng Perng Phoo, James T. Eckner, Lea Franco, Steven P. Broglio, Mike McCrea, Thomas McAllister, Jenna Wiens Sports Medicine Journal, 2020. TLDR: Baseline tests conducted on college athletes and military cadets before each semester could contain information for identifying athletes and military cadets who are at higher risk of experiencing a concussion.
	Heart Sound Classification based on Temporal Alignment Techniques Jose Javier Gonzalez, Cheng Perng Phoo, Jenna Wiens Computing in Cardiology (CinC), 2016. Code TLDR: We use temporal alignment techniques such as dynamic time warping to extract features from heart sound recordings for identifying patients at risk of adverse cardiovascular outcomes.

Awards/Services/Experiences

Award: ICCV 2023 Doctoral Consortium
Cornell PhD Application Reviewer 2023
Peer Review: CVPR (2022, 2023, 2024, 2025), ECCV (2022, 2024), ICCV 2023, NeurIPS (2023, 2024), ICML 2024, ICLR (2024, 2025)
Research Internships
- Meta FAIR Accel team. Advisors: Rama Kovvuri, Effrosyni Mavroudi, Kevin Liang, Huiyu Wang, Jing Huang. June 2022 - Aug 2022.
- MIT-IBM Watson AI Lab. Advisors: Rogerio Feris, Kate Saenko, Chun-Fu (Richard) Chen, Rameswar Panda. June 2021 - Dec 2021.
Teaching Experiences
- [Cornell] CS4780/5780: Machine Learning for Intelligent Systems [SP18]. Received the Outstanding Teaching Assistant Award from the Computer Science Department.
- [Cornell] CS4786/5786: Machine Learning for Data Science [FA17]
- [University of Michigan, Ann Arbor] EECS445: Introduction to Machine Learning [WN17]
- [University of Michigan, Ann Arbor] EECS203: Discrete Mathematics [FA15|WN16|FA16]

Many thanks to Jon Barron for the awesome template! Some of the icons used on this website are from Flaticon.