Multimodal RGB-Thermal Object Detection
Equal-contribution author — architecture & experiments (Computer Vision & DL course → IEEE ICUAS 2026 paper, under review)
Detecting people from a drone at medium-to-high altitude is hard: targets occupy only a handful of pixels, and RGB-only detection breaks down at dawn, dusk and night. This project fuses visible and thermal infrared (TIR) imagery to stay robust across lighting conditions, and was built for UAV-assisted monitoring of recreational fishing along protected coastline. The work grew from a Computer Vision & Deep Learning course project into a paper currently under review at IEEE ICUAS 2026.
- Mid-level RGB-TIR fusion on a dual-backbone architecture built on DEYOLO (Dual-Feature-Enhancement YOLO), keeping each modality’s cues distinct before fusing them.
- Small-object refinements: added SPDConv (lossless space-to-depth downsampling) and a redesigned SPANet neck that propagates the high-resolution P2 level to the detection heads, preserving the fine detail that small targets depend on.
- 78% mAP50 on the curated VTUAV-det-tiny benchmark, beating RGB-only (35.5%) and thermal-only (73.8%) baselines; ablations confirm SPDConv and SPANet are complementary.
- Evaluated qualitatively on a custom dual-sensor dataset captured with a DJI Matrice 30T; Class Activation Maps show tighter, better-centred activations for the enhanced model.
- Training ran on an NVIDIA RTX A6000, with its carbon footprint tracked using CodeCarbon.
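
To make the "lossless" claim about space-to-depth downsampling concrete, here is a minimal NumPy sketch of the rearrangement alone. It is not the project's SPDConv layer: in SPD-Conv this step is followed by a non-strided convolution, which is omitted here. The function names, the block size of 2 and the toy tensor shape are assumptions chosen for illustration.

```python
import numpy as np


def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange (C, H, W) into (C * block**2, H // block, W // block).

    Every pixel is kept: spatial resolution is traded for channels,
    unlike a strided convolution or pooling layer, which skips or merges pixels.
    """
    c, h, w = x.shape
    if h % block or w % block:
        raise ValueError("H and W must be divisible by block")
    # Split each spatial axis into (coarse index, offset inside the block).
    x = x.reshape(c, h // block, block, w // block, block)
    # Bring the two in-block offsets to the front so they become channels.
    x = x.transpose(2, 4, 0, 1, 3)
    return x.reshape(block * block * c, h // block, w // block)


def depth_to_space(y: np.ndarray, block: int = 2) -> np.ndarray:
    """Exact inverse of space_to_depth, included to show nothing was lost."""
    cbb, hb, wb = y.shape
    c = cbb // (block * block)
    y = y.reshape(block, block, c, hb, wb)
    # Put each offset back next to its coarse spatial index.
    y = y.transpose(2, 3, 0, 4, 1)
    return y.reshape(c, hb * block, wb * block)


# Toy feature map (hypothetical shape): 2 channels, 4 x 6 pixels.
feature = np.arange(2 * 4 * 6).reshape(2, 4, 6)
packed = space_to_depth(feature)      # half the resolution, 4x the channels
restored = depth_to_space(packed)     # identical to the input
```

Because the mapping is a pure permutation of elements, the round trip reproduces the input exactly. This is why the operation suits targets only a few pixels wide: no pixel is dropped before the following convolution sees it.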