Results


Scene Parsing Results

Team Name                  Entry  Average of Pixel Acc. & Mean IoU
CASIA_IVA_JD               5      0.55471
CASIA_IVA_JD               1      0.55451
WinterIsComing             4      0.55439
WinterIsComing             3      0.55432
CASIA_IVA_JD               2      0.55397
WinterIsComing             2      0.55243
WinterIsComing             1      0.55199
CASIA_IVA_JD               4      0.55178
CASIA_IVA_JD               3      0.55171
Xiaodan Liang              1      0.54473
WinterIsComing             5      0.53075
Jiancheng Li               2      0.52942
Jiancheng Li               1      0.52878
AISEG                      1      0.52756
Jiancheng Li               3      0.52355
G-RMI                      1      0.52194
G-RMI                      5      0.52038
AVL                        2      0.51631
G-RMI                      4      0.51619
AVL                        1      0.51455
G-RMI                      2      0.51395
G-RMI                      3      0.50506
Semantic Encoding Network  1      0.50450
WenjingBUAA                2      0.45319
IIP                        1      0.44288
WenjingBUAA                1      0.25669
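
The scene parsing score is the average of pixel accuracy and mean IoU. The following is a minimal sketch of how such a score could be computed, not the official evaluation code. It assumes that label 0 marks unlabeled ground-truth pixels and that mean IoU is averaged over the classes that occur in either the prediction or the ground truth; the function name `scene_parsing_score` and both of those conventions are assumptions made for illustration.

```python
def scene_parsing_score(pred, gt, ignore_label=0):
    """Return (pixel_acc, mean_iou, score) for two flat label sequences.

    ignore_label marks unlabeled ground-truth pixels (an assumption here).
    """
    correct = labeled = 0
    gt_area, pred_area, inter = {}, {}, {}
    for p, g in zip(pred, gt):
        if g == ignore_label:
            continue  # unlabeled pixels do not count toward either metric
        labeled += 1
        gt_area[g] = gt_area.get(g, 0) + 1
        if p != ignore_label:
            pred_area[p] = pred_area.get(p, 0) + 1
        if p == g:
            correct += 1
            inter[g] = inter.get(g, 0) + 1
    if labeled == 0:
        raise ValueError("no labeled pixels to score")
    pixel_acc = correct / labeled
    # IoU per class = intersection / (gt area + predicted area - intersection)
    classes = set(gt_area) | set(pred_area)
    ious = [
        inter.get(c, 0) / (gt_area.get(c, 0) + pred_area.get(c, 0) - inter.get(c, 0))
        for c in classes
    ]
    mean_iou = sum(ious) / len(ious)
    return pixel_acc, mean_iou, (pixel_acc + mean_iou) / 2


# Toy example: six pixels, the fifth one is unlabeled in the ground truth.
gt = [1, 1, 2, 2, 0, 3]
pred = [1, 2, 2, 2, 1, 3]
pixel_acc, mean_iou, score = scene_parsing_score(pred, gt)
print(f"{pixel_acc:.5f} {mean_iou:.5f} {score:.5f}")  # 0.80000 0.72222 0.76111
```

In the toy example, four of the five labeled pixels are correct (pixel accuracy 0.8), and the per-class IoUs are 1/2, 2/3 and 1 (mean IoU 13/18), so the combined score is about 0.761.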

Instance Segmentation Results

Team Name        Entry  Mean AP
Megvii (Face++)  3      0.29772
Megvii (Face++)  2      0.29542
Megvii (Face++)  1      0.27717
G-RMI            1      0.24150
G-RMI            5      0.24131
G-RMI            2      0.23877
G-RMI            3      0.22913
G-RMI            4      0.22544
BlueSky          2      0.15514
BlueSky          1      0.15503
CASIA_IVA_JD     3      0.11171
CASIA_IVA_JD     1      0.11103


Notice: Each team is allowed to submit at most 5 entries. Results are ranked by the score of each entry.



Scene Parsing Team Descriptions

Team Name Team members Description
CASIA_IVA_JD Jun Fu (1), Jing Liu (1), Longteng Guo (1), Haijie Tian (1), Fei Liu (1), Yong Li (2), Yongjun Bao (2), Weipeng Yan (2), Hanqing Lu (1)

(1) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
(2) Modeling Team-Business Growth BU-JD
We implement image semantic segmentation based on the fused result of three deep models: SDN [1], DeepLabv3 [2] (with ResNet101 as the base model), and ResNet38 [3]. During training, all of these models are pretrained only on the ImageNet dataset. Hierarchical supervision and image-level classification information are jointly used to improve performance. In addition, common tricks are used, including multi-scale data augmentation during training and multi-scale inputs at inference.

[1] Fu J, Liu J, Wang Y, et al. Stacked Deconvolutional Network for Semantic Segmentation[J]. arXiv preprint arXiv:1708.04943, 2017.
[2] Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation[J]. arXiv preprint arXiv:1706.05587, 2017.
[3] Wu Z, Shen C, Hengel A. Wider or deeper: Revisiting the resnet model for visual recognition[J]. arXiv preprint arXiv:1611.10080, 2016.
WinterIsComing Riwei Chen, Qi Chen, Xinglong Wu, Yifan Lu, Yudong Jiang, Linfu Wen
Toutiao AI Lab, ByteDance
-
Jiancheng Li Jiancheng Li (Tsinghua University) We mainly applied several model fusion algorithms to effective semantic segmentation models such as FCN [1,2], DeepLab [3,4,5] and DilatedNet [6,7], and used a CRF to capture more context features.

[1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[2] Shelhamer, Evan, Jonathan Long, and Trevor Darrell. Fully convolutional networks for semantic segmentation. IEEE transactions on pattern analysis and machine intelligence 39.4 (2017): 640-651.
[3] Chen, Liang-Chieh, et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv preprint arXiv:1412.7062 (2014).
[4] Chen, Liang-Chieh, et al. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv preprint arXiv:1606.00915 (2016).
[5] Chen, Liang-Chieh, et al. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017).
[6] Yu, Fisher, and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015).
[7] Yu, Fisher, Vladlen Koltun, and Thomas Funkhouser. Dilated residual networks. arXiv preprint arXiv:1705.09914 (2017).
AISEG Bingke Zhu(1), Yingying Chen(1), Jinqiao Wang(1), Ming Tang(1)

(1) CASIA
We developed a fully convolutional network based on the ideas of PSPNet [1], dilated residual networks [2] and HED [3]. PSPNet provides rich semantic information for pixel-wise classification, the dilated residual network helps with the degridding problem, and HED handles detail information and provides deep supervision.

[1] Zhao, Hengshuang, Jianping Shi, Xiaojuan Qi, Xiaogang Wang and Jiaya Jia. Pyramid Scene Parsing Network. CoRR abs/1612.01105.
[2] Yu, F., Koltun, V., & Funkhouser, T.A. (2017). Dilated Residual Networks. CoRR, abs/1705.09914.
[3] Xie, Saining and Zhuowen Tu. Holistically-Nested Edge Detection. 2015 IEEE International Conference on Computer Vision (ICCV) (2015): 1395-1403.
G-RMI Alireza Fathi, Nori Kanazawa, Kai Yang, Kevin Murphy (Google Research and Machine Intelligence) A model based on DeepLab.
AVL Piotr Bilinski (1), Victor Prisacariu (1)

(1) University of Oxford
We use an encoder-decoder architecture that employs ResNet building blocks, context aggregation and multi-level fusion. We initialize the layer weights of the encoder using those from the ResNet model, which was pre-trained on ImageNet. Then, we train our model on the ADE20K data only.
Semantic Encoding Network Hang Zhang (1), Jerry Zhang (2)

(1) Rutgers University
(2) Amazon
A single model built on a pre-trained ResNet50 that leverages global semantic encoding for segmentation. This experiment used only 4 GPUs.
WenjingBUAA Wenjing KE (Beihang University) We mainly use a pretrained dilated-FCN model, add a Conditional Random Field to the network following the "CRF as RNN" paper for end-to-end training, and fine-tune the model.
IIP Wanjie Sun, Wentao Bao, Han Zhu, Yiming Li, Yaosi Hu, Zhenzhong Chen (all members are students or professors at Wuhan University) Cascade dilated-net with dense multi-label optimization.

[1] Zhou, Bolei, et al. Semantic understanding of scenes through the ADE20K dataset. arXiv preprint arXiv:1608.05442 (2016).
[2] Shen, Tong, et al. Learning Multi-level Region Consistency with Dense Multi-label Networks for Semantic Segmentation. arXiv preprint arXiv:1701.07122 (2017).
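
Several scene parsing entries above (CASIA_IVA_JD, Jiancheng Li) describe fusing the outputs of multiple models, but the descriptions do not say how the fusion was done. The sketch below shows one common option, averaging per-pixel class probabilities across models and taking the most probable class. The averaging rule, the function name `fuse_predictions`, the optional `weights` parameter and the toy numbers are all assumptions for illustration, not the teams' actual methods. Plain Python lists keep the example free of dependencies.

```python
def fuse_predictions(model_probs, weights=None):
    """Fuse per-pixel class probabilities from several models.

    model_probs[m][p][c] is the probability that model m assigns to class c
    at pixel p. Returns one fused class index per pixel.
    """
    num_models = len(model_probs)
    if weights is None:
        weights = [1.0 / num_models] * num_models  # plain averaging
    fused = []
    for per_model in zip(*model_probs):  # class vectors of all models for one pixel
        avg = [
            sum(w * v for w, v in zip(weights, class_vals))
            for class_vals in zip(*per_model)
        ]
        fused.append(max(range(len(avg)), key=avg.__getitem__))  # argmax
    return fused


# Toy example: three hypothetical models, two pixels, three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
model_c = [[0.5, 0.4, 0.1], [0.2, 0.3, 0.5]]
fused_labels = fuse_predictions([model_a, model_b, model_c])
print(fused_labels)  # [1, 2]
```

In this toy case `model_a` alone would predict classes 0 and 1, but the averaged probabilities favor classes 1 and 2. Multi-scale testing can be treated the same way, with each rescaled input contributing one probability map after it has been resized back to the original resolution.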


Instance Segmentation Team Descriptions

Team Name Team members Description
Megvii (Face++) *Tete Xiao (1,2), *Ruixuan Luo (1,2), *Borui Jiang (1,2), Shuai Shao (1), Yuning Jiang (1), Yadong Mu (2), Jieqi Shi (1, 2), Chi Zhang (1), Jian Sun (1)

*: Equal contribution

(1) Megvii Research
(2) Peking University
We first trained a modified Mask R-CNN [1] with context modules. Then, we removed the detection branch and inserted four more layers into the mask branch. The new mask branch was then randomly initialized, and the whole network was fine-tuned with a carefully designed regression target for the mask branch. For both training stages, we used a very large batch size via synchronous parallel training (up to 96 GPUs) on Megvii (Face++)'s large-scale deep learning framework, MegBrain. We also applied multi-scale testing and horizontal flipping at test time. The final result was obtained by ensembling three modified ResNet [2] and ResNeXt [3] networks.

[1] Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick. Mask R-CNN. ICCV 2017.
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. CVPR 2016.
[3] Xie S, Girshick R, Dollar P, et al. Aggregated residual transformations for deep neural networks.
G-RMI Alireza Fathi, Nori Kanazawa, Kai Yang, Kevin Murphy (Google Research and Machine Intelligence) A model based on Mask R-CNN.
BlueSky Yongcheng Liu (National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences) We enhance the FCIS model proposed by Dai et al. and use only this single model for the competition.
CASIA_IVA_JD Jun Fu (1), Jing Liu (1), Longteng Guo (1), Haijie Tian (1), Fei Liu (1), Yong Li (2), Yongjun Bao (2), Weipeng Yan (2), Hanqing Lu (1)

(1) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
(2) Modeling Team-Business Growth BU-JD
We implement instance segmentation by jointly exploiting the results of semantic segmentation (the outputs of the first task in this challenge) and Faster R-CNN based object detection. The object region within a given bounding box is taken as an instance segmentation. Meanwhile, we adopt the results of FCIS [1] to refine the instance segmentation result in overlapping regions of the same label. Common tricks are also used: multi-scale data augmentation and anchor augmentation.

[1] Li Y, Qi H, Dai J, et al. Fully convolutional instance-aware semantic segmentation[J]. arXiv preprint arXiv:1611.07709, 2016.
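
The CASIA_IVA_JD instance segmentation entry above takes the object region inside a detected bounding box as the instance mask. A minimal sketch of that idea follows, assuming a per-pixel semantic label map and a detection box with a predicted class. The function name `instance_mask_from_box`, the `(x0, y0, x1, y1)` box format with exclusive upper bounds, and the toy data are hypothetical, and the FCIS-based refinement of overlapping regions is not shown.

```python
def instance_mask_from_box(semantic_map, box, class_id):
    """Return a boolean mask of pixels labeled class_id inside the box.

    semantic_map: 2D list of integer class labels (rows of pixels).
    box: (x0, y0, x1, y1), with x1 and y1 exclusive (an assumed convention).
    """
    height, width = len(semantic_map), len(semantic_map[0])
    x0, y0, x1, y1 = box
    mask = [[False] * width for _ in range(height)]
    # Clip the box to the image, then keep only pixels of the detected class.
    for y in range(max(y0, 0), min(y1, height)):
        for x in range(max(x0, 0), min(x1, width)):
            if semantic_map[y][x] == class_id:
                mask[y][x] = True
    return mask


# Toy example: two separate regions of class 7, one detection box.
semantic_map = [
    [0, 7, 7, 0, 7],
    [0, 7, 7, 0, 7],
    [0, 0, 0, 0, 7],
]
mask = instance_mask_from_box(semantic_map, box=(1, 0, 3, 2), class_id=7)
print(sum(sum(row) for row in mask))  # 4
```

The box selects the four pixels of the left region, while the class 7 pixels in the rightmost column fall outside the box and are left for a different detection. Two boxes of the same class that overlap would claim the same pixels, which is the case the description says is resolved with FCIS results.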