Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, Yair Weiss, editors, Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part VII. Volume 11211 of Lecture Notes in Computer Science, pages 833-851, Springer, 2018.
Authors: Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam

Abstract: Spatial pyramid pooling modules and encoder-decoder structures are used in deep neural networks for the semantic segmentation task. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages of both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 semantic image segmentation dataset and achieve a performance of 89% on the test set without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in TensorFlow.

DeepLab is a state-of-the-art deep learning model for semantic image segmentation, where the goal is to assign semantic labels (e.g., person, dog, cat, and so on) to every pixel in the input image. The current implementation includes the following features:

- DeepLabv1 [1]: We use atrous convolution to explicitly control the resolution at which feature responses are computed within deep convolutional neural networks.
- DeepLabv2 [2]: We use atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales with filters at multiple sampling rates and effective fields-of-view.
- DeepLabv3 [3]: We augment the ASPP module with image-level features [5, 6] to capture longer-range information. We also include batch normalization [7] parameters to facilitate the training. In particular, we apply atrous convolution to extract output features at different output strides during training and evaluation, which efficiently enables training BN at output stride = 16 and attains high performance at output stride = 8 during evaluation.
- DeepLabv3+ [4]: We extend DeepLabv3 to include a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. Furthermore, in this encoder-decoder structure one can arbitrarily control the resolution of the extracted encoder features by atrous convolution to trade off precision and runtime.

If you find the code useful for your research, please consider citing our latest works:
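The atrous convolution used throughout these model variants can be illustrated with a minimal NumPy sketch of the 1-D case; holes of size (rate − 1) are inserted between filter taps, enlarging the field-of-view without adding parameters. The function name and setup here are illustrative, not part of the DeepLab code:

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """Valid 1-D convolution of x with filter w, with holes of size
    (rate - 1) inserted between the filter taps (atrous convolution)."""
    k = len(w)
    span = (k - 1) * rate + 1          # effective field-of-view of the filter
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * rate] for j in range(k))
    return out

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])
# rate=1 is an ordinary convolution; rate=2 covers a span of 5 inputs
print(atrous_conv1d(x, w, rate=1))     # sums of 3 consecutive values
print(atrous_conv1d(x, w, rate=2))     # sums of x[i], x[i+2], x[i+4]
```

With rate = 1 the same code reduces to a standard convolution, which is why the released models can switch output stride without changing the filter weights.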
In the current implementation, we support adopting the following network backbones:
This directory contains our TensorFlow [11] implementation. We provide code allowing users to train the model, evaluate results in terms of mIOU (mean intersection-over-union), and visualize segmentation results. We use the PASCAL VOC 2012 [12] and Cityscapes [13] semantic segmentation benchmarks as examples in the code. Some segmentation results on Flickr images:
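The mIOU metric mentioned above is the per-class intersection-over-union averaged over classes, typically accumulated via a confusion matrix. A minimal NumPy sketch (the function name and toy labels are illustrative, not the evaluation code shipped with DeepLab):

```python
import numpy as np

def mean_iou(pred, label, num_classes):
    """Mean intersection-over-union from flat integer label maps."""
    # confusion[i, j] = number of pixels with ground truth i predicted as j
    confusion = np.bincount(
        label.ravel() * num_classes + pred.ravel(),
        minlength=num_classes * num_classes,
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    iou = intersection / np.maximum(union, 1)   # avoid division by zero
    return iou.mean()

label = np.array([0, 0, 1, 1])
pred  = np.array([0, 1, 1, 1])
print(mean_iou(pred, label, num_classes=2))    # (1/2 + 2/3) / 2
```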
Contacts (Maintainers)
Table of Contents:
- Demo
- Running
- Models
- Misc
Getting Help
To get help with issues you may encounter while using the DeepLab TensorFlow implementation, create a new question on StackOverflow with the tags "tensorflow" and "deeplab". Please report bugs (i.e., broken code, not usage questions) to the tensorflow/models GitHub issue tracker, prefixing the issue name with "deeplab".
What is atrous separable convolution?
Atrous separable convolution is a depthwise separable convolution that uses atrous convolution in its depthwise step. Depthwise separable convolution factorizes a standard convolution into a depthwise convolution followed by a point-wise convolution (i.e., a 1×1 convolution), which drastically reduces computational complexity.
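The saving from this factorization is easy to see from parameter counts. A small sketch comparing a standard k×k convolution to its depthwise + pointwise factorization (the channel sizes are illustrative, not DeepLab's actual layer shapes):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution over c_in -> c_out channels."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) + 1x1 pointwise."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 256, 256
print(conv_params(k, c_in, c_out))        # 589824
print(separable_params(k, c_in, c_out))   # 67840, roughly 8.7x fewer weights
```

The atrous variant changes only the sampling positions of the depthwise filter, so the parameter count is unchanged by the rate.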
What is DeepLab v3+?
DeepLabv3+ is a state-of-the-art deep learning model for semantic image segmentation [3], where the goal is to assign semantic labels (such as person, dog, cat, and so on) to every pixel in the input image.
Why are atrous convolutions used in DeepLab?
Atrous convolution allows us to enlarge the field-of-view of filters to incorporate larger context. It thus offers an efficient mechanism to control the field-of-view and find the best trade-off between accurate localization (small field-of-view) and context assimilation (large field-of-view).
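Concretely, a k×k filter with atrous rate r behaves like a filter of effective size k + (k − 1)(r − 1), so the field-of-view grows with the rate while the parameter count stays fixed. A one-function sketch (the example rates 6, 12, 18 are the ones commonly used by ASPP at output stride 16):

```python
def effective_kernel_size(k, rate):
    """Effective field-of-view of a k x k filter with the given atrous rate."""
    return k + (k - 1) * (rate - 1)

# ASPP in DeepLab applies 3x3 filters at several rates in parallel
for rate in (1, 6, 12, 18):
    print(rate, effective_kernel_size(3, rate))   # 3, 13, 25, 37
```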
What is SegNet?SegNet is a semantic segmentation model. This core trainable segmentation architecture consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network.