Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision
CVPR'25

Integrating RGB and NIR imaging provides complementary spectral information, enhancing robotic vision in challenging lighting conditions. However, existing datasets and imaging systems lack pixel-level alignment between RGB and NIR images, posing challenges for downstream tasks. In this paper, we develop a robotic vision system equipped with two pixel-aligned RGB-NIR stereo cameras and a LiDAR sensor mounted on a mobile robot. The system simultaneously captures RGB stereo images, NIR stereo images, and temporally synchronized LiDAR point clouds. Utilizing the mobility of the robot, we present a dataset containing continuous video frames with pixel-aligned RGB and NIR stereo pairs under diverse lighting conditions. We introduce two methods that utilize our pixel-aligned RGB-NIR images: an RGB-NIR image fusion method and a feature fusion method. The first approach enables existing RGB-pretrained vision models to directly utilize RGB-NIR information without fine-tuning. The second approach fine-tunes existing vision models to more effectively utilize RGB-NIR information. Experimental results demonstrate the effectiveness of using pixel-aligned RGB-NIR images across diverse lighting conditions.
Our robotic vision system integrates two pixel-aligned RGB-NIR stereo cameras and a LiDAR sensor, all mounted on a mobile robotic platform. The system captures RGB stereo images, NIR stereo images, and temporally synchronized LiDAR point clouds, ensuring comprehensive spatial and spectral data acquisition. To enhance the quality and robustness of NIR image capture, we have implemented an NIR LED bar light source, providing consistent illumination across varying environments.
The robotic platform is designed for high maneuverability, featuring an omnidirectional wheel system that enables full 360-degree movement. The system is powered by a high-capacity battery that supports up to six hours of continuous operation, making it suitable for extended field deployments during data acquisition.
Using our system, we collected a dataset comprising:
For each frame, we provide:
The dataset is publicly available on Hugging Face.
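For illustration, here is a minimal sketch of fetching the data with the huggingface_hub client; the repo id below is a hypothetical placeholder, since the actual id is linked from the project page rather than stated here.

# Minimal sketch: download the dataset from the Hugging Face Hub.
# NOTE: "anonymous/rgb-nir-stereo" is a hypothetical repo id -- replace it
# with the actual id linked from this project page.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="anonymous/rgb-nir-stereo",  # hypothetical placeholder
    repo_type="dataset",
)
print(f"Dataset downloaded to {local_path}")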
We categorize our dataset based on lighting conditions:
We propose a novel RGB-NIR image fusion method for three-channel vision tasks. The fusion improves the performance of pretrained vision models, such as those for stereo depth estimation, semantic segmentation, and object detection, without any fine-tuning.
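The exact fusion operator is specified in the paper; the sketch below is one plausible realization, assuming a simple luminance-blending scheme in which the pixel-aligned NIR intensity is blended into the luma of the RGB image while the chroma is kept, yielding a three-channel image any RGB-pretrained model can consume. The function name fuse_rgb_nir and the blending weight alpha are illustrative, not from the paper.

import numpy as np

def fuse_rgb_nir(rgb, nir, alpha=0.5):
    """Blend a pixel-aligned NIR channel into an RGB image (illustrative).

    rgb: float32 array (H, W, 3) in [0, 1]; nir: float32 array (H, W) in [0, 1].
    Returns a three-channel image consumable by RGB-pretrained models.
    """
    # Luma/chroma split with ITU-R BT.601 weights.
    y  = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    cb = 0.5 + 0.564 * (rgb[..., 2] - y)
    cr = 0.5 + 0.713 * (rgb[..., 0] - y)
    # Blend luma with the aligned NIR intensity; chroma stays from RGB.
    y = (1.0 - alpha) * y + alpha * nir
    # Recombine into RGB.
    r = y + 1.403 * (cr - 0.5)
    g = y - 0.344 * (cb - 0.5) - 0.714 * (cr - 0.5)
    b = y + 1.773 * (cb - 0.5)
    return np.clip(np.stack([r, g, b], axis=-1), 0.0, 1.0)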
We estimate a series of disparity maps using the GRU structure of the RAFT-Stereo network, alternating between the fused and NIR correlation volumes as input to the GRU at each iteration. Our scenario assumes RGB capture with active NIR illumination: NIR is used primarily to recover stereo depth, while RGB compensates for the NIR images.
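A schematic sketch of this alternating iteration is shown below; gru, lookup_fused, and lookup_nir are stand-in callables for RAFT-Stereo's update operator and correlation-volume lookups, and the toy demo values are purely illustrative.

import torch

def alternating_refinement(gru, lookup_fused, lookup_nir, disp, hidden, iters=12):
    """RAFT-Stereo-style iterative refinement alternating correlation volumes."""
    disparities = []
    for i in range(iters):
        # Even iterations sample the fused RGB-NIR volume, odd ones the NIR volume.
        lookup = lookup_fused if i % 2 == 0 else lookup_nir
        corr = lookup(disp)                    # correlation features at current disparity
        hidden, delta = gru(hidden, corr, disp)
        disp = disp + delta                    # residual disparity update
        disparities.append(disp)
    return disparities                         # series of progressively refined maps

# Toy demo with stand-in modules (B x 1 x H x W disparity).
disp = torch.zeros(1, 1, 32, 64)
hidden = torch.zeros(1, 8, 32, 64)
gru = lambda h, c, d: (h, 0.1 * (c - d))       # stand-in update operator
lookup = lambda d: torch.ones_like(d)          # stand-in correlation lookup
maps = alternating_refinement(gru, lookup, lookup, disp, hidden)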
@inproceedings{kim2025pixelaligned,
  author    = {Kim, Jinnyeong and Baek, Seung-Hwan},
  title     = {Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025},
  keywords  = {Machine Vision, Robot Vision, Vision Dataset},
  doi       = {10.48550/arXiv.2411.18025},
  url       = {https://arxiv.org/abs/2411.18025},
}