Fusion for Reducing Domain Specificity in Computer Vision Models
ORAL
Abstract
The dream of computer vision is to enable autonomous visual processing of everything that can be seen in the world. Toward this end, the research community has recently focused on creating single, monolithic, neurally-inspired models (for a given task) that on average outperform the competition on some set of benchmarks, regardless of visual domain. The conventional wisdom has been that to make a neural algorithm perform well simultaneously in several domains, a single neural network should be trained on data from all domains of interest. More recently, synthetic datasets like FlyingThings3D, which contains random everyday objects from many domains flying along random trajectories through space, have attempted to reduce domain specificity by doing away with scene structure entirely. In this work we propose that the answer to the challenge of creating general perception systems is to recognise that different models will have different domains in which they perform well, and to fuse the estimates produced by separate perception models that are each "experts" in their own domains. We present a design paradigm for general model-fusion systems, and evaluate both the quantitative and qualitative performance of such systems on image classification and segmentation.
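As a rough illustration of the fusion idea described above, the sketch below combines the class-probability estimates of several hypothetical domain-expert classifiers using a confidence-weighted average. The expert names, weights, and weighting scheme are assumptions made for illustration only, not the design paradigm presented in this work.

# Illustrative sketch (not the authors' method): fuse class-probability
# estimates from several domain-expert classifiers via a trust-weighted average.
import numpy as np

def fuse_predictions(expert_probs, expert_weights):
    """Fuse per-expert class probabilities into a single estimate.

    expert_probs   : list of (num_classes,) arrays, one per expert,
                     each summing to 1 (e.g. softmax outputs).
    expert_weights : list of non-negative scalars expressing how much
                     each expert is trusted on the current input
                     (e.g. some estimate of domain affinity).
    """
    probs = np.stack(expert_probs)              # (num_experts, num_classes)
    weights = np.asarray(expert_weights, float)
    weights = weights / weights.sum()           # normalise trust weights
    fused = weights @ probs                     # weighted average of estimates
    return fused / fused.sum()                  # renormalise to a distribution

# Hypothetical usage: an "indoor scenes" expert and a "street scenes" expert
indoor = np.array([0.7, 0.2, 0.1])
street = np.array([0.1, 0.3, 0.6])
print(fuse_predictions([indoor, street], [0.8, 0.2]))  # leans on the indoor expert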
Publication: There will eventually be a publication that substantially extends this work.
Presenters
- Laura E Brandt, MIT CSAIL

Authors
- Laura E Brandt, MIT CSAIL
- Nicholas Roy, MIT CSAIL