摘要:
单阶段目标检测方法具有训练速度快、检测时间短的特点,然而其特征金字塔网络(FPN)难以抑制合成孔径雷达(SAR)舰船图像中的背景和噪声信息,且检测头存在预测误差。针对上述问题,提出一种基于注意力引导和多样本决策的SAR舰船检测方法。首先,提出一种注意力引导网络,将其添加至特征金字塔的最高层,以抑制背景和噪声干扰,提升特征的表示能力;其次,提出多样本决策网络,使其参与目标位置的预测,该网络通过增加回归分支输出的样本数量,缓解预测误差对检测结果的影响;最后,设计一种新颖的最大似然损失函数,利用多样本决策网络输出的样本构造最大似然函数,用于规范决策网络的训练,进一步提升目标定位精度。以RetinaNet网络模型为基线方法,相较于基线方法及目前先进的目标检测方法,所提方法在舰船检测数据集SSDD上表现出最高的检测精度,AP达到52.8%。相比基线方法,所提方法在AP评价指标上提升了3.4%~5.7%,且训练参数量仅增加2.03×10⁶,帧率仅降低0.5帧/s。
Abstract: The one-stage object detection method has the advantages of fast training and short inference time. However, its feature pyramid network (FPN) has difficulty suppressing the background and noise information in synthetic aperture radar (SAR) ship images, and its detection head suffers from prediction errors. To address these problems, this paper proposes a detection method based on attention guidance and multi-sample decisions for SAR ship detection. Firstly, an attention guidance network is added to the top level of the feature pyramid to suppress background and noise interference, thereby improving feature representation. Secondly, a multi-sample decision network is proposed to participate in predicting ship locations; by increasing the number of samples output by the regression branch, the network reduces the impact of prediction errors on detection results. Finally, a novel maximum likelihood loss function is designed, which constructs a maximum likelihood function from the samples output by the multi-sample decision network to regularize the training of the decision network and further improve localization accuracy. Taking RetinaNet as the baseline, the proposed method achieves the highest detection accuracy on the SSDD ship detection dataset compared with the baseline and current state-of-the-art object detection methods, with an AP of 52.8%. Compared with the baseline, the proposed method improves AP by 3.4%~5.7%, while the number of training parameters increases by only 2.03×10⁶ and the frame rate decreases by only 0.5 frame/s.
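注意力引导网络的具体结构本节未给出,下面给出一个最小示意实现供参考:假设其采用通道注意力与空间注意力串联的常见形式(类似CBAM),仅作用于特征金字塔最高层特征,以抑制背景与噪声响应。其中 `AttentionGuide`、`reduction` 等命名均为示意,并非本文方法的原始实现。

```python
import torch
import torch.nn as nn

class AttentionGuide(nn.Module):
    """最高层特征的注意力引导模块示意(假设为通道+空间注意力的串联)。"""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # 通道注意力:全局平均池化 -> 两层1×1卷积 -> Sigmoid 通道权重
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # 空间注意力:7×7卷积生成逐像素权重,抑制背景与噪声区域
        self.spatial_att = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_att(x)
        # 沿通道维取均值与最大值,拼接后生成空间权重图
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * self.spatial_att(torch.cat([avg, mx], dim=1))

# 用法示意:仅作用于特征金字塔最高层特征(层级命名为假设)
# p_top = AttentionGuide(256)(p_top)
```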
表 1 实验硬件环境
Table 1. Experimental hardware environment
| 类别 | 环境条件 |
| --- | --- |
| CPU | Intel(R) Xeon(R) Silver 4110 |
| 显卡 | TITAN RTX (24 GB) |
| 操作系统 | Ubuntu 18.04 |
| 深度学习框架 | PyTorch 1.6.0 |
| CUDA版本 | CUDA 11.2 |
| cuDNN版本 | cuDNN 7.4.2 |
| 运行环境 | PyCharm 2021.01 |
| 脚本语言 | Python 3.7 |
表 2 不同样本量的影响
Table 2. Effect of different sample sizes
| n | AP | 参数量 | 浮点运算次数 |
| --- | --- | --- | --- |
| 1 | 49.6 | 36.88×10⁶ | 205.13×10⁹ |
| 16 | 52.8 | 38.13×10⁶ | 231.68×10⁹ |
| 30 | 48.0 | 39.29×10⁶ | 256.46×10⁹ |
| 50 | 12.8 | 40.95×10⁶ | 291.86×10⁹ |
| 100 | — | 45.1×10⁶ | 380.36×10⁹ |
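结合摘要对多样本决策网络与最大似然损失函数的描述以及表2中样本量 n 的取值,下面给出一个假设性的示意实现:回归分支对每个坐标输出 n 个预测样本,并以样本均值和方差构造高斯似然、最小化真值的负对数似然。其中高斯似然的具体形式以及 `MultiSampleRegHead`、`max_likelihood_loss` 等命名均为假设,并非本文方法的原始实现。

```python
import math
import torch
import torch.nn as nn

class MultiSampleRegHead(nn.Module):
    """多样本决策回归分支示意:每个锚框的4个坐标各输出 n 个预测样本(假设实现)。"""
    def __init__(self, in_channels: int = 256, n: int = 16, num_anchors: int = 9):
        super().__init__()
        self.n = n
        # 输出通道 = 锚框数 × 4个坐标 × n个样本
        self.reg = nn.Conv2d(in_channels, num_anchors * 4 * n, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feat.shape
        # 输出形状 (B, A×4, n, H, W):最终预测可取 n 维上的均值
        return self.reg(feat).view(b, -1, self.n, h, w)

def max_likelihood_loss(samples: torch.Tensor, target: torch.Tensor,
                        eps: float = 1e-6) -> torch.Tensor:
    """最大似然损失示意:用 n 个样本(最后一维,n≥2)的均值 mu 和方差 var
    构造高斯似然 N(target; mu, var),最小化其负对数似然(似然形式为假设)。"""
    mu = samples.mean(dim=-1)
    var = samples.var(dim=-1) + eps  # eps 防止方差为零
    nll = 0.5 * torch.log(2 * math.pi * var) + (target - mu) ** 2 / (2 * var)
    return nll.mean()

# 用法示意:
# out = MultiSampleRegHead(n=16)(p_level)                        # (B, A*4, 16, H, W)
# loss = max_likelihood_loss(out.permute(0, 1, 3, 4, 2), target) # target: (B, A*4, H, W)
```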
表 3 消融实验结果
Table 3. Results of ablation experiments
| 注意力引导网络 | 多样本决策网络 | 最大似然损失函数 | AP | 参数量 | 浮点运算次数 | 帧率/(帧·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- |
| × | × | × | 48.8 | 36.1×10⁶ | 204.36×10⁹ | 16.2 |
| √ | × | × | 49.6 | 36.88×10⁶ | 205.13×10⁹ | 15.5 |
| × | √ | × | 50.2 | 37.35×10⁶ | 230.91×10⁹ | 15.6 |
| × | √ | √ | 51.3 | 37.35×10⁶ | 230.91×10⁹ | 15.7 |
| √ | √ | √ | 52.8 | 38.13×10⁶ | 231.68×10⁹ | 15.7 |
表 4 特征金字塔输出通道数对检测性能的影响
Table 4. Influence of the number of output channels of feature pyramid on the detection performance
| 输出通道数 | AP | 参数量 | 浮点运算次数 |
| --- | --- | --- | --- |
| 64 | 52.2 | 30.02×10⁶ | 196.71×10⁹ |
| 128 | 48.3 | 32.17×10⁶ | 206.55×10⁹ |
| 256 | 52.8 | 37.35×10⁶ | 230.91×10⁹ |
| 512 | 52.0 | 51.24×10⁶ | 298.26×10⁹ |
表 5 检测头卷积层数对检测性能的影响
Table 5. Influence of convolution layer number of detection head on detection performance
| 检测头卷积层数 | AP | 参数量 | 浮点运算次数 |
| --- | --- | --- | --- |
| 1 | 53.7 | 33.81×10⁶ | 155.36×10⁹ |
| 2 | 53.6 | 34.99×10⁶ | 180.54×10⁹ |
| 3 | 52.6 | 36.17×10⁶ | 205.73×10⁹ |
| 4 | 52.8 | 37.35×10⁶ | 230.91×10⁹ |
表 6 消融实验结果
Table 6. Results of ablation experiments
| 方法 | 骨干网络类型 | 训练策略 | AP | AP50 | AP75 | APS | APM | APL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RetinaNet | ResNet-50 | 1× | 48.8 | 86.7 | 49.5 | 46.2 | 56.0 | 31.2 |
| RetinaNet | ResNet-50 | 2× | 53.8 | 91.5 | 58.1 | 49.6 | 63.0 | 38.6 |
| RetinaNet | ResNet-101 | 1× | 48.9 | 88.3 | 48.7 | 45.7 | 56.3 | 33.3 |
| RetinaNet | ResNet-101 | 2× | 53.8 | 91.7 | 58.6 | 48.9 | 63.5 | 46.0 |
| 本文方法 | ResNet-50 | 1× | 52.8 | 89.4 | 57.2 | 50.7 | 58.9 | 41.1 |
| 本文方法 | ResNet-50 | 2× | 57.2 | 91.8 | 67.1 | 53.5 | 65.4 | 58.8 |
| 本文方法 | ResNet-101 | 1× | 54.6 | 89.5 | 61.8 | 51.3 | 62.0 | 49.9 |
| 本文方法 | ResNet-101 | 2× | 57.5 | 93.1 | 64.1 | 53.3 | 65.6 | 60.3 |
表 7 不同检测方法的对比
Table 7. Comparison of different detection methods
| 方法 | 骨干网络类型 | 训练策略 | AP | AP50 | AP75 | APS | APM | APL | 参数量 | 浮点运算次数 | 帧率/(帧·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FoveaBox | ResNet-50 | 1× | 50.0 | 88.0 | 52.9 | 49.0 | 54.3 | 33.5 | 36.01×10⁶ | 202.39×10⁹ | 11.4 |
| NAS-FCOS | ResNet-50 | 1× | 46.1 | 84.7 | 47.5 | 47.0 | 46.9 | 34.5 | 38.66×10⁶ | 191.81×10⁹ | 15.3 |
| ATSS | ResNet-50 | 1× | 52.4 | 89.3 | 58.5 | 52.0 | 56.6 | 36.8 | 31.89×10⁶ | 201.33×10⁹ | 15.6 |
| GFL | ResNet-50 | 1× | 43.6 | 80.1 | 44.3 | 45.0 | 43.2 | 34.0 | 32.03×10⁶ | 204.42×10⁹ | 16.8 |
| PISA | ResNet-50 | 1× | 50.6 | 88.3 | 56.0 | 47.8 | 57.2 | 29.8 | 36.1×10⁶ | 204.36×10⁹ | 16.1 |
| PAA | ResNet-50 | 1× | 52.7 | 92.0 | 55.0 | 49.0 | 61.4 | 37.0 | 31.89×10⁶ | 201.33×10⁹ | 9.9 |
| RetinaNet | ResNet-50 | 1× | 48.8 | 86.7 | 49.5 | 46.2 | 56.0 | 31.2 | 36.1×10⁶ | 204.36×10⁹ | 16.2 |
| 本文方法 | ResNet-50 | 1× | 52.8 | 89.4 | 57.2 | 50.7 | 58.9 | 41.1 | 38.13×10⁶ | 231.68×10⁹ | 15.7 |
[1] DU L, DAI H, WANG Y, et al. Target discrimination based on weakly supervised learning for high-resolution SAR images in complex scenes[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(1): 461-472. doi: 10.1109/TGRS.2019.2937175
[2] SHAHZAD M, MAURER M, FRAUNDORFER F, et al. Buildings detection in VHR SAR images using fully convolution neural networks[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(2): 1100-1116. doi: 10.1109/TGRS.2018.2864716
[3] HUANG L Q, LIU B, LI B Y, et al. OpenSARShip: A dataset dedicated to Sentinel-1 ship interpretation[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2018, 11(1): 195-208. doi: 10.1109/JSTARS.2017.2755672
[4] ZHANG Z M, WANG H P, XU F, et al. Complex-valued convolutional neural network and its application in polarimetric SAR image classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(12): 7177-7188. doi: 10.1109/TGRS.2017.2743222
[5] YANG G, LI H C, YANG W, et al. Unsupervised change detection of SAR images based on variational multivariate Gaussian mixture model and Shannon entropy[J]. IEEE Geoscience and Remote Sensing Letters, 2019, 16(5): 826-830. doi: 10.1109/LGRS.2018.2879969
[6] GIERULL C H. Demystifying the capability of sublook correlation techniques for vessel detection in SAR imagery[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(4): 2031-2042. doi: 10.1109/TGRS.2018.2870716
[7] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot MultiBox detector[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 21-37.
[8] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2999-3007.
[9] TIAN Z, SHEN C H, CHEN H, et al. FCOS: Fully convolutional one-stage object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 9626-9635.
[10] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[11] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2261-2269.
[12] CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1800-1807.
[13] XIE S N, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5987-5995.
[14] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 936-944.
[15] LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8759-8768.
[16] GHIASI G, LIN T Y, LE Q V. NAS-FPN: Learning scalable feature pyramid architecture for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 7029-7038.
[17] SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
[18] DAI X Y, CHEN Y P, XIAO B, et al. Dynamic head: Unifying object detection heads with attentions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 7369-7378.
[19] 李晨瑄, 顾佼佼, 王磊, 等. 多尺度特征融合的Anchor-Free轻量化舰船要害部位检测算法[J]. 北京航空航天大学学报, 2022, 48(10): 2006-2019. LI C X, GU J J, WANG L, et al. Warship's vital parts detection algorithm based on lightweight Anchor-Free network with multi-scale feature fusion[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(10): 2006-2019 (in Chinese).
[20] 张晓玲, 张天文, 师君, 等. 基于深度分离卷积神经网络的高速高精度SAR舰船检测[J]. 雷达学报, 2019, 8(6): 841-851. doi: 10.12000/JR19111 ZHANG X L, ZHANG T W, SHI J, et al. High-speed and high-accurate SAR ship detection based on a depthwise separable convolution neural network[J]. Journal of Radars, 2019, 8(6): 841-851 (in Chinese). doi: 10.12000/JR19111
[21] JIAO J, ZHANG Y, SUN H, et al. A densely connected end-to-end neural network for multiscale and multiscene SAR ship detection[J]. IEEE Access, 2018, 6: 20881-20892. doi: 10.1109/ACCESS.2018.2825376
[22] ZHANG T W, ZHANG X L, SHI J, et al. Balanced feature pyramid network for ship detection in synthetic aperture radar images[C]//Proceedings of the IEEE Radar Conference. Piscataway: IEEE Press, 2020: 1-5.
[23] CHEN S Q, ZHAN R H, WANG W, et al. Learning slimming SAR ship object detector through network pruning and knowledge distillation[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 14: 1267-1282.
[24] FU J M, SUN X, WANG Z R, et al. An anchor-free method based on feature balancing and refinement network for multiscale ship detection in SAR images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(2): 1331-1344. doi: 10.1109/TGRS.2020.3005151
[25] 张冬冬, 王春平, 付强. 基于Anchor-Free的光学遥感舰船关重部位检测算法[J]. 北京航空航天大学学报, 2024, 50(4): 1365-1374. ZHANG D D, WANG C P, FU Q. Ship's critical part detection algorithm based on Anchor-Free in optical remote sensing[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(4): 1365-1374 (in Chinese).
[26] ZHANG T W, ZHANG X L, LI J W, et al. SAR ship detection dataset (SSDD): Official release and comprehensive data analysis[J]. Remote Sensing, 2021, 13(18): 3690. doi: 10.3390/rs13183690
[27] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[M]//Lecture Notes in Computer Science. Cham: Springer International Publishing, 2014: 740-755.
[28] KONG T, SUN F C, LIU H P, et al. FoveaBox: Beyound anchor-based object detection[J]. IEEE Transactions on Image Processing, 2020, 29: 7389-7398. doi: 10.1109/TIP.2020.3002345
[29] ZHANG S F, CHI C, YAO Y Q, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 9756-9765.
[30] LI X, WANG W H, WU L J, et al. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection[EB/OL]. (2020-06-08)[2022-12-10]. http://doi.org/10.48550/arXiv.2006.04388.
[31] CAO Y H, CHEN K, LOY C C, et al. Prime sample attention in object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 11580-11588.
[32] KIM K, LEE H S. Probabilistic anchor assignment with IoU prediction for object detection[M]//Lecture Notes in Computer Science. Cham: Springer International Publishing, 2020: 355-371.

