Cheng Perng Phoo
I am a Postdoctoral Research Scientist at Apple, working with Vladlen Koltun.
Prior to Apple, I received my PhD in Computer Science from Cornell University, advised by Professor Bharath Hariharan.
Email / CV / Resume / Google Scholar / GitHub / LinkedIn
(*) I am on the job market for a research scientist position. Please feel free to reach out if you have any openings!
|
|
Research
My research lies at the intersection of computer vision and machine learning.
Specifically, I focus on building perception systems capable of recognizing a wide range of concepts across various problem domains (e.g., remote sensing, medical imagery, self-driving vehicles).
Toward this goal, I have identified three key challenges: label efficiency (STARTUP), robust deployment (Rote-DA), and multitasking (GRAFT).
Currently, I am working on developing multimodal large language models capable of processing long videos and tackling multiple tasks simultaneously.
Below is a list of my papers. (*) indicates equal contributions. Representative papers are highlighted.
|
|
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery
Hangyu Zhou*, Chia Hsiang Kao*, Cheng Perng Phoo, Utkarsh Mall, Bharath Hariharan, Kavita Bala
Conference on Neural Information Processing Systems (NeurIPS), 2024.
TLDR: The largest dataset to investigate cloud removal leveraging temporal and multispectral information.
|
|
Better Monocular 3D Detectors with LiDAR from the Past
Yurong You*, Cheng Perng Phoo*, Carlos Andres Diaz-Ruiz, Katie Luo, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger
International Conference on Robotics and Automation (ICRA), 2024.
TLDR: Unlabeled LiDAR scans from repeated traversals can be used to improve camera-based 3D object detectors.
|
|
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
Utkarsh Mall*, Cheng Perng Phoo*, Meilin Kelsey Liu, Carl Vondrick, Bharath Hariharan, Kavita Bala
International Conference on Learning Representations (ICLR), 2024.
TLDR: We use ground images as an intermediary to connect satellite imagery to natural language (encoded using CLIP), yielding VLMs without textual annotations.
|
|
Pre-training LiDAR-based 3D Object Detectors through Colorization
Tai-Yu Pan, Chenyang Ma, Tianle Chen, Cheng Perng Phoo, Katie Z Luo, Yurong You, Mark Campbell, Kilian Q Weinberger, Bharath Hariharan, Wei-Lun Chao
International Conference on Learning Representations (ICLR), 2024.
TLDR: We pre-train a point cloud detector by tasking it to fill in the missing colors within the point cloud.
|
|
Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery
Katie Z Luo*, Zhenzhen Liu*, Xiangyu Chen*, Yurong You, Sagie Benaim, Cheng Perng Phoo, Mark Campbell, Wen Sun, Bharath Hariharan, Kilian Q. Weinberger
Conference on Neural Information Processing Systems (NeurIPS), 2023.
TLDR: We reframe object discovery as an RL problem and design a reward function to enable faster and more accurate discovery of objects in driving scenes without human supervision.
|
|
Emergent Correspondence from Image Diffusion
Luming Tang*, Menglin Jia*, Qianqian Wang*, Cheng Perng Phoo, Bharath Hariharan
Conference on Neural Information Processing Systems (NeurIPS), 2023.
TLDR: Features from off-the-shelf image diffusion models can be used to identify semantic and geometric correspondences without further training.
|
|
Distilling from Similar Tasks for Transfer Learning on a Budget
Kenneth Borup, Cheng Perng Phoo, Bharath Hariharan
International Conference on Computer Vision (ICCV), 2023.
TLDR: We construct label- and compute-efficient models by identifying and distilling from suitable pre-trained models.
|
|
Unsupervised Adaptation from Repeated Traversals for Autonomous Driving
Yurong You*, Cheng Perng Phoo*, Katie Z Luo*, Travis Zhang, Wei-Lun Chao, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger
Conference on Neural Information Processing Systems (NeurIPS), 2022.
Code
TLDR: Unlabeled LiDAR scans from repeated traversals can be used to disambiguate foreground and background objects, yielding cleaner signals for self-training adaptation.
|
|
Learning to Detect Mobile Objects from LiDAR Scans Without Labels
Yurong You*, Katie Z Luo*, Cheng Perng Phoo, Wei-Lun Chao, Wen Sun, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
Code
TLDR: Comparing unlabeled LiDAR scans from multiple traversals of the same location can uncover dynamic LiDAR points, which can then be used to train a mobile object detector in an unsupervised/self-supervised manner.
|
|
Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data
Samarth Mishra, Rameswar Panda, Cheng Perng Phoo, Chun-Fu (Richard) Chen, Leonid Karlinsky, Kate Saenko, Venkatesh Saligrama, Rogerio S. Feris
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
Code
TLDR: Different downstream tasks require different representations, pre-trained on synthetic data generated using different configurations (lighting, object poses, etc.). We use reinforcement learning to learn a policy that maps a compact task representation to the appropriate synthetic data configuration.
|
|
Coarsely-labeled Data for Better Few-shot Transfer
Cheng Perng Phoo, Bharath Hariharan
International Conference on Computer Vision (ICCV), 2021.
Code
TLDR: Coarsely-labeled data can be cheap to acquire and can be used to learn a better representation for few-shot learning.
|
|
Self-training for Few-shot Transfer Across Extreme Task Differences
Cheng Perng Phoo, Bharath Hariharan
International Conference on Learning Representations (ICLR), 2021. Oral (53/2997 submissions)
Code
TLDR: We can build strong neural representations for novel domains by (self-)training students to replicate pseudo-labels produced by a teacher from another, unrelated problem domain.
|
|
Predicting Risk of Sport-Related Concussion in Collegiate Athletes and Military Cadets: A Machine Learning Approach Using Baseline Data from the CARE Consortium Study
Joel Castellanos, Cheng Perng Phoo, James T. Eckner, Lea Franco, Steven P. Broglio, Mike McCrea, Thomas McAllister, Jenna Wiens
Sports Medicine Journal, 2020
TLDR: Baseline tests conducted on college athletes and military cadets before each semester could contain information for identifying athletes/military cadets who are at a higher risk of experiencing a concussion.
|
|
Heart Sound Classification based on Temporal Alignment Techniques
Jose Javier Gonzalez, Cheng Perng Phoo, Jenna Wiens
Computing in Cardiology (CinC), 2016.
Code
TLDR: We use temporal alignment techniques such as dynamic time warping to extract features from heart sound recordings for identifying patients at risk of adverse cardiovascular outcomes.
|
Awards/Services/Experiences
- Award: ICCV 2023 Doctoral Consortium
- Cornell PhD Application Reviewer 2023
- Peer Review: CVPR (2022, 2023, 2024), ECCV (2022, 2024), ICCV 2023, NeurIPS (2023, 2024), ICML 2024
- Research Internships
- Teaching Experiences
- [Cornell] CS4780/5780: Machine Learning for Intelligent Systems [SP18]
* Awarded Outstanding Teaching Assistant Award by the Computer Science Department
- [Cornell] CS4786/5786: Machine Learning for Data Science [FA17]
- [University of Michigan, Ann Arbor] EECS445: Introduction to Machine Learning [WN17]
- [University of Michigan, Ann Arbor] EECS203: Discrete Mathematics [FA15|WN16|FA16]
|
Many thanks to Jon Barron for the awesome template! Some of the icons used in this website are from flaticon.