Dependence of deep learning-based whole organ segmentation on training dataset size in computed tomography (CT) images
ORAL
Abstract
A significant drawback of deep learning-based medical image segmentation is its reliance on large amounts of labeled training data. However, little literature has characterized the dependence of model performance on the amount of training data provided. Here, we examine this dependence in the application of abdominal organ segmentation on patient CT images.
Two public datasets, BTCV (N=30) and VISCERAL.eu (N=20), were pooled for training. A third dataset, pancreasCT (N=43), was used as an independent test set. Segmentation was performed for five abdominal organs: liver, spleen, kidneys, stomach, and pancreas. Instances of the same CNN were trained on varying numbers of randomly selected training scans (N=5-50). The architecture used was DeepMedic, a 3D patch-based CNN. Performance was measured with the Dice coefficient, average surface distance, and 95% Hausdorff distance.
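As an illustration of the evaluation setup, the sketch below computes the Dice coefficient, 2|A∩B| / (|A| + |B|), for two binary organ masks and draws random training subsets of different sizes. It is a minimal sketch and not the authors' code: the function names, the fixed seed, the independent (non-nested) sampling of each subset, and the convention of returning 1.0 when both masks are empty are all assumptions made for this example.

```python
import random

import numpy as np


def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def sample_training_subsets(scan_ids, sizes, seed=0):
    """Draw one random subset of scan IDs per requested size.

    Each subset is sampled independently without replacement
    (an assumption; the abstract does not say whether subsets are nested).
    """
    rng = random.Random(seed)
    return {n: rng.sample(scan_ids, n) for n in sizes}


if __name__ == "__main__":
    # Two toy 3D masks of 32 voxels each, overlapping in 16 voxels.
    a = np.zeros((4, 4, 4), dtype=bool)
    b = np.zeros((4, 4, 4), dtype=bool)
    a[:2] = True
    b[1:3] = True
    print(dice_coefficient(a, b))

    # Hypothetical IDs for the 50 pooled training scans.
    subsets = sample_training_subsets(list(range(50)), [5, 10, 25, 50])
    print({n: len(ids) for n, ids in subsets.items()})
```

In a study of this kind, one model instance would be trained per subset and its per-organ Dice on the held-out test scans plotted against subset size to see where performance levels off.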
We observe that segmentation performance improves with increasing training dataset size but, in some cases, plateaus before the whole training set is used. The absolute performance of our model is comparable to that reported in the literature while minimizing the amount of labeled data required.
This work has implications for optimizing deep learning-based image segmentation pipelines by minimizing time spent on unnecessary dataset labeling.
Presenters
-
Daniel Huff
Medical Physics, University of Wisconsin - Madison
Authors
-
Daniel Huff
Medical Physics, University of Wisconsin - Madison
-
Amy J Weisman
Medical Physics, University of Wisconsin - Madison
-
Robert Jeraj
Medical Physics, University of Wisconsin - Madison