Improved Pattern Recognition in Agricultural Applications using Self-Supervised Methods for High-Resolution Longitudinal Remote Sensing Data
ORAL
Abstract
Deep learning approaches thrive in high-data regimes; however, acquiring sufficient annotations is a major bottleneck in most applications. This bottleneck is particularly severe in remote sensing, where data is collected at the petabyte scale but only a small fraction is annotated. While there have been recent efforts to collect large agricultural datasets, even these capture only a minuscule fraction of the data available.
We first collect a dataset of high-resolution (10 cm/pixel) aerial imagery over farm parcels in the US Midwest, captured multiple times over the course of the season, to create a longitudinal dataset over 4 TB in size. We then train current state-of-the-art self-supervised methods based on MoCo v2 (Chen et al., 2020) with pixel consistency (Xie et al., 2021) and temporal structure (Manas et al., 2021), and evaluate performance on classification and segmentation tasks based on the large Agriculture-Vision dataset (Chiu et al., 2020), as well as on a much smaller fine-grained agricultural segmentation task. Finally, we extend these approaches to better capture the invariances in the data through a conditional layer after the encoder, and we leverage the vision transformer architecture (Dosovitskiy et al., 2021).
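The abstract describes the conditional layer only at a high level. Below is a minimal sketch of one plausible reading, assuming a FiLM-style modulation (Perez et al., 2018) of the encoder output by a conditioning vector (e.g. an embedded acquisition date) before a standard MoCo v2 projection head; the module name, dimensions, and conditioning signal here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ConditionalProjectionHead(nn.Module):
    """Hypothetical conditioning layer placed after the backbone encoder.

    FiLM-style modulation: the pooled feature is scaled and shifted by
    parameters predicted from a conditioning vector, then passed through
    the usual MoCo v2 two-layer MLP projection head.
    """
    def __init__(self, feat_dim=2048, cond_dim=16, proj_dim=128):
        super().__init__()
        # Predict per-channel scale (gamma) and shift (beta) from the condition.
        self.film = nn.Linear(cond_dim, 2 * feat_dim)
        # Standard MoCo v2 projection head.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, feats, cond):
        gamma, beta = self.film(cond).chunk(2, dim=-1)
        return self.mlp(feats * (1 + gamma) + beta)

# ResNet-50 backbone as in MoCo v2; the classifier is replaced with an
# identity so the encoder outputs the 2048-d pooled feature.
backbone = models.resnet50()
backbone.fc = nn.Identity()
head = ConditionalProjectionHead()

imgs = torch.randn(4, 3, 224, 224)  # a batch of aerial image crops
cond = torch.randn(4, 16)           # e.g. embedded acquisition dates (assumed)
z = head(backbone(imgs), cond)      # embeddings fed to the contrastive loss
print(z.shape)                      # torch.Size([4, 128])
```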
Presenters
- Jing Wu (University of Illinois at Urbana-Champaign)

Authors
- Jennifer Hobbs (Intelinair, Northwestern University)
- Jing Wu (University of Illinois at Urbana-Champaign)
- David Pichler (Intelinair)
- Daniel Marley (Intelinair)