Linn Chong: No labels required: Zero-shot segmentation with foundation models

Bionote

Linn Chong has been a Ph.D. student at the Center for Robotics at the University of Bonn since November 2021, under the supervision of Prof. Cyrill Stachniss. Her primary research is in computer vision for agricultural robots, focusing on deep learning approaches.
Before pursuing her Ph.D., she worked as a Research Engineer at the National University of Singapore (NUS) from 2017 to 2021. She received a Master of Engineering and a Bachelor of Engineering from the Department of Mechanical Engineering at NUS in 2020 and 2017, respectively.

Agricultural robots apply deep learning methods to perform perception tasks such as locating and identifying weeds in the field.
However, these deep learning approaches are expensive and time-consuming to develop because they require labeled in-domain data from the target field.
To overcome this problem, we propose a way to distinguish and segment weeds from crop plants in RGB field images without requiring any labels. Instead of labeling, we leverage foundation models, specifically SAM and CLIP, which we use out of the box without additional training. In particular, we propose a novel way of prompting SAM to obtain vegetation segments and of encoding these segments into the CLIP feature space.
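The specific prompting scheme is a contribution of the talk itself; as a hedged illustration of the general idea, one label-free way to obtain point prompts for SAM in a field image is to threshold an excess-green (ExG) vegetation index and sample foreground points. The function name, threshold, and sampling scheme below are illustrative assumptions, not the authors' method:

```python
import numpy as np

def vegetation_point_prompts(rgb, num_points=16, seed=0):
    """Derive (x, y) point prompts for SAM from an RGB field image by
    thresholding the excess-green (ExG) vegetation index.
    Illustrative sketch only; not the prompting scheme from the talk."""
    img = rgb.astype(np.float64) / 255.0
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    # ExG = 2g - r - b highlights green vegetation against soil.
    exg = 2.0 * g - r - b
    mask = exg > 0.1  # simple fixed threshold (assumption)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return np.empty((0, 2), dtype=int)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(xs), size=min(num_points, len(xs)), replace=False)
    # SAM expects point prompts as (x, y) pixel coordinates.
    return np.stack([xs[idx], ys[idx]], axis=1)
```

Each sampled point could then be fed to SAM as a foreground prompt, yielding one candidate vegetation segment per point.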
We also propose a new heuristic to distinguish crops from weeds, based on the observation that crops are the plants with the more common and consistent appearance in the field. In this talk, we explain how to apply this heuristic to separate crop and weed CLIP features and obtain crop-weed semantic segmentation without any labels.
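A minimal sketch of this heuristic, assuming the CLIP feature vector of each SAM segment has already been computed: cluster the features into two groups and call the larger (more common) cluster "crop". The 2-means clustering and the majority rule below are illustrative assumptions, not the exact method presented in the talk:

```python
import numpy as np

def split_crop_weed(features, iters=20):
    """Illustrative crop-weed heuristic: 2-means over segment features;
    the more common (larger) cluster is labeled 'crop'. Assumption-based
    sketch, not the talk's exact algorithm."""
    feats = np.asarray(features, dtype=np.float64)
    # L2-normalize, as is customary for CLIP embeddings.
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    # Deterministic init: first point and the point farthest from it.
    c0 = feats[0]
    c1 = feats[np.linalg.norm(feats - c0, axis=1).argmax()]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        d = np.linalg.norm(feats[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = feats[labels == k].mean(axis=0)
    # Crops are the more common, consistent appearance: pick the
    # larger cluster (consistency shows up as a tighter cluster).
    crop = np.bincount(labels, minlength=2).argmax()
    return np.where(labels == crop, "crop", "weed")
```

With the per-segment labels in hand, pasting each segment's label back onto its pixels yields a crop-weed semantic segmentation of the image.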

Co-Authors: Lucas Nunes, Federico Magistri, Xingguang Zhong, Jens Behley, Cyrill Stachniss