基于改进SuperPoint与线性转换器的可见光红外匹配算法

伍薇; 鲜勇; 苏娟; 张大巧; 李少朋; 李冰

doi:10.13700/j.bh.1001-5965.2022.1022

基于改进SuperPoint与线性转换器的可见光红外匹配算法

doi: 10.13700/j.bh.1001-5965.2022.1022

伍薇¹,
鲜勇¹,
苏娟²,
张大巧¹,
李少朋^{1, 3, ,},
李冰¹

1.
火箭军工程大学作战保障学院，西安 710025
2.
火箭军工程大学核工程学院，西安 710025
3.
清华大学自动化系，北京 100084

基金项目:

国家自然科学基金(62103432)；中国博士后科学基金(2022M721841)

详细信息

通讯作者:
E-mail：sp-li16@mails.tsinghua.edu.cn

中图分类号: TP183
计量
- 文章访问数: 538
- HTML全文浏览量: 191
- PDF下载量: 22
- 被引次数: 0
出版历程
- 收稿日期: 2022-12-29
- 录用日期: 2023-05-29
- 网络出版日期: 2023-09-08
- 整期出版日期: 2025-01-31

A matching method based on improved SuperPoint and linear Transformer for optical and infrared images

WU Wei¹,
XIAN Yong¹,
SU Juan²,
ZHANG Daqiao¹,
LI Shaopeng^{1, 3
, ,},
LI Bing¹

1.
College of War Support，Rocket Force University of Engineering，Xi’an 710025，China
2.
College of Nuclear Engineering，Rocket Force University of Engineering，Xi’an 710025，China
3.
Department of Automation，Tsinghua University，Beijing 100084，China

Funds:

National Natural Science Foundation of China (62103432); China Postdoctoral Science Foundation (2022M721841)

More Information

Corresponding author: E-mail：sp-li16@mails.tsinghua.edu.cn

摘要

摘要:
针对可见光和红外图像的异源图像匹配难度大、误匹配率高的问题，提出一种基于改进SuperPoint与线性转换器的深度学习匹配算法。首先在SuperPoint网络结构的基础上，引入特征金字塔的思想构建特征描述分支，基于铰链损失函数进行训练，从而较好地学习可见光与红外图像多尺度深层次特征，增大图像同名点对描述子的相似度；在特征匹配模块，利用线性转换器对SuperGlue匹配算法进行改进，聚合特征以提高匹配性能。在多个数据集上对所提算法进行实验验证，结果表明，与现有的算法相比，所提算法获得了更好的匹配效果，提高了匹配准确率。
- 红外图像 /
- 可见光图像 /
- 异源图像匹配 /
- 特征金字塔 /
- 线性转换器
Abstract:
A deep learning matching algorithm based on improved SuperPoint and linear transformer was proposed to solve the problem of difficult matching and high mismatching rates between visible and infrared heterologous images. Firstly, based on the SuperPoint network structure, the algorithm introduced the idea of a feature pyramid to build a feature description branch and trained it based on the hinge loss function, so as to better learn the multi-scale deep features of visible and infrared images and increase the similarity of the image with correspondence points to the descriptor. In the feature matching module, SuperGlue was improved by adopting a linear transformer for aggregating features to obtain better matching results. Experiments conducted on multiple datasets demonstrate that the proposed method improves the matching precision and provides better matching performance in comparison with existing matching methods.
- infrared image /
- visible light image /
- heterologous image matching /
- feature pyramid network /
- linear transformer

HTML全文

图 1 本文算法框架图

Figure 1. Framework of proposed algorithm

下载: 全尺寸图片幻灯片

图 2 改进SuperPoint网络结构图

Figure 2. Network structure of improved SuperPoint

下载: 全尺寸图片幻灯片

图 3 SIFT与SuperPoint特征点检测对比

Figure 3. Comparison of point feature detection between SIFT and SuperPoint

下载: 全尺寸图片幻灯片

图 4 L-SuperGlue结构图

Figure 4. Structure of L-SuperGlue

下载: 全尺寸图片幻灯片

图 5 线性转换器编码层结构图

Figure 5. Structure of linear transformer encoder layer

下载: 全尺寸图片幻灯片

图 6 自注意力与互注意力可视化

Figure 6. Visualization of self-attention and cross-attention

下载: 全尺寸图片幻灯片

图 7 RGB-NIR数据集匹配结果对比

Figure 7. Comparison of matching results from RGB-NIR dataset

下载: 全尺寸图片幻灯片

图 8 VEDAI数据集匹配结果对比

Figure 8. Comparison of matching results from VEDAI dataset

下载: 全尺寸图片幻灯片

图 9 VIS-IR数据集匹配结果对比

Figure 9. Comparison of matching results fromVIS-IR dataset

下载: 全尺寸图片幻灯片

表 1 不同算法的实验结果对比

Table 1. Comparison experiment results of different algorithms

算法	准确率/%		匹配误差/pixel		匹配耗时/s
算法	RGB-NIR数据集	VEDAI数据集	RGB-NIR数据集	VEDAI数据集	RGB-NIR数据集	VEDAI数据集
SIFT^[3]	58.09	80.26	1.7734	1.5176	0.469	0.518
SuperPoint+SuperGlue^[13]	73.72	85.76	1.3818	1.3538	0.071	0.104
D2-Net^[18]	54.01	52.23	1.4028	1.3529	0.625	0.649
本文算法	83.56	97.65	1.0745	0.9742	0.314	0.365

下载: 导出CSV

表 2 消融实验结果对比

Table 2. Comparison of ablation experiment results

网络结构	准确率/%		匹配误差/pixel		匹配耗时/s
网络结构	RGB-NIR数据集	VEDAI数据集	RGB-NIR数据集	VEDAI数据集	RGB-NIR数据集	VEDAI数据集
A	73.72	85.76	1.3818	1.3538	0.071	0.104
B	82.37	95.37	1.0767	0.9785	0.306	0.339
C	81.46	94.80	1.0867	0.9778	0.075	0.110
D	83.56	97.65	1.0745	0.9742	0.314	0.365

下载: 导出CSV

表 3 网络参数量对比

Table 3. Comparison of network parameters

特征提取与描述		特征匹配
SuperPoint	SuperPoint -FPN	SuperGlue	L- SuperGlue
960000	4920000	11460000	11460000

下载: 导出CSV

参考文献(22)

[1]	MA J Y, JIANG X Y, FAN A X, et al. Image matching from handcrafted to deep features: A survey[J]. International Journal of Computer Vision, 2021, 129(1): 23-79. doi: 10.1007/s11263-020-01359-2
[2]	ZITOVÁ B, FLUSSER J. Image registration methods: A survey[J]. Image and Vision Computing, 2003, 21(11): 977-1000. doi: 10.1016/S0262-8856(03)00137-9
[3]	LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110. doi: 10.1023/B:VISI.0000029664.99615.94
[4]	MA J Y, ZHAO J, MA Y, et al. Non-rigid visible and infrared face registration via regularized Gaussian fields criterion[J]. Pattern Recognition, 2015, 48(3): 772-784. doi: 10.1016/j.patcog.2014.09.005
[5]	YE Y X, SHAN J, BRUZZONE L, et al. Robust registration of multimodal remote sensing images based on structural similarity[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(5): 2941-2958. doi: 10.1109/TGRS.2017.2656380
[6]	CHEN Y J, ZHANG X W, ZHANG Y N, et al. Visible and infrared image registration based on region features and edginess[J]. Machine Vision and Applications, 2018, 29(1): 113-123. doi: 10.1007/s00138-017-0879-6
[7]	LI J Y, HU Q W, AI M Y. RIFT: Multi-modal image matching based on radiation-variation insensitive feature transform[J]. IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society, 2019, 29: 3296-3310.
[8]	ARAR M, GINGER Y, DANON D, et al. Unsupervised multi-modal image registration via geometry preserving image-to-image translation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2020: 13407-13416.
[9]	PIELAWSKI N, WETZER E, ÖFVERSTEDT J, et al. CoMIR: Contrastive multimodal image representation for registration[C]// Proceedings of Annual Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2020: 1-21.
[10]	YI K M, TRULLS E, LEPETIT V, et al. LIFT: Learned invariant feature transform[M]// Lecture Notes in Computer Science. Cham: Springer International Publishing, 2016: 467-483.
[11]	DETONE D, MALISIEWICZ T, RABINOVICH A. SuperPoint: Self-supervised interest point detection and description[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Piscataway: IEEE Press, 2018: 337-33712.
[12]	LI K H, WANG L G, LIU L, et al. Decoupling makes weakly supervised local feature better[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2022: 15817-15827.
[13]	SARLIN P E, DETONE D, MALISIEWICZ T, et al. SuperGlue: Learning feature matching with graph neural networks[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2020: 4937-4946.
[14]	SHI Y, CAI J X, SHAVIT Y, et al. ClusterGNN: Cluster-based coarse-to-fine graph neural network for efficient feature matching[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2022: 12507-12516.
[15]	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2017: 936-944.
[16]	CUTURI M. Sinkhorn distance: Lightspeed computation of optimal transport[C]// Proceedings of Annual Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2013: 1-9.
[17]	KATHAROPOULOS A, VYAS A, PAPPAS N, et al. Transformers are RNNs: Fast autoregressive transformers with linear attention[C]//Proceedings of 2020 International Conference on Machine Learing (ICML). New York: ACM Press, 2020: 1-10.
[18]	DUSMANU M, ROCCO I, PAJDLA T, et al. D2-net: a trainable CNN for joint description and detection of local features[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 8084-8093.
[19]	QUAN D, WANG S, NING H Y, et al. Element-wise feature relation learning network for cross-spectral image patch matching[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(8): 3372-3386. doi: 10.1109/TNNLS.2021.3052756
[20]	RAZAKARIVONY S, JURIE F. Vehicle detection in aerial imagery: A small target detection benchmark[J]. Journal of Visual Communication and Image Representation, 2016, 34: 187-203. doi: 10.1016/j.jvcir.2015.11.002
[21]	ROCCO I, ARANDJELOVIC R, SIVIC J. Convolutional neural network architecture for geometric matching[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(11): 2553-2567. doi: 10.1109/TPAMI.2018.2865351
[22]	HARTLEY R, ZISSERMAN A. Multiple view geometry in computer vision[M]. New York: Cambridge University Press, 2004: 117-123.