Yuan-Kai Wang, Tung-Ming Pan, and Tian-You Chen, "Domain-Generalized Person Re-Identification via Refined Neuron Dropout and Reciprocal-Expansion Re-Ranking," Scientific Reports 16, 2855, 2026.
Person re-identification (re-ID) under domain shift remains brittle when viewpoint, illumination, and scene context vary. A domain-generalized (DG) feature-learning framework is proposed that couples training-time domain-aware regularization with inference-time neighborhood calibration. During training, refined neuron dropout (RND) extends domain-guided dropout with per-domain neuron-impact scores and temperature-controlled retention, suppressing domain-irrelevant activations while preserving domain-salient ones. At inference, recursive reciprocal-expansion re-ranking (RRE) enforces reciprocal-neighborhood consistency to stabilize similarity estimates on large galleries. Both components are architecture-neutral: RND operates on intermediate activations, RRE on distance matrices. Results are reported with a single compact CNN encoder to maintain a fixed compute budget and a fair comparison with lightweight baselines. Comprehensive ablations show stepwise CMC Rank-1 gains from the baseline to +DGD, +RND, and +RRE. Comparisons against recent DG-ReID methods indicate competitive performance under pronounced variability in viewpoint, illumination, and background, with small gaps in certain normalization-centric settings, while improving cross-domain robustness on heterogeneous benchmarks.
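The paper does not reproduce the RND formulas here, but the idea of per-domain impact scores with temperature-controlled retention can be sketched as follows. All function and variable names (`refined_neuron_dropout`, `impact_scores`, the 0.5 mean-retention target, and the clipping bounds) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def refined_neuron_dropout(activations, impact_scores, temperature=0.5, rng=None):
    """Sketch of temperature-controlled retention (assumed form, not the paper's exact rule).

    activations:   (batch, n_neurons) intermediate activations for samples of one domain
    impact_scores: (n_neurons,) per-domain neuron-impact scores (higher = more domain-salient)
    temperature:   lower values sharpen retention toward high-impact neurons
    """
    rng = np.random.default_rng() if rng is None else rng
    # Softmax over impact scores; temperature controls how peaked retention is.
    p = np.exp(impact_scores / temperature)
    p = p / p.sum()
    # Rescale so the average retention probability is ~0.5, then clip to a valid range.
    keep_prob = np.clip(p * len(p) * 0.5, 0.05, 1.0)
    mask = rng.random(impact_scores.shape) < keep_prob
    # Inverted-dropout scaling preserves the expected activation magnitude.
    return activations * mask / np.maximum(keep_prob, 1e-8)
```

Under this sketch, high-impact (domain-salient) neurons are retained almost surely, while low-impact neurons are dropped with high probability, which matches the stated intent of suppressing domain-irrelevant activations.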
To substantiate robustness to unseen domains beyond standard benchmarks, the framework was deployed in a complete end-to-end surveillance system. The system integrates four subsystems—(a) pedestrian detection, (b) multi-object tracking, (c) person re-ID, and (d) keyframe extraction—processing raw video through to cross-camera retrieval, thereby bridging benchmark evaluation and deployment-oriented testing.
Two video datasets were used: FJU-Detection and the EPFL multi-camera pedestrian dataset. FJU-Detection contains six indoor/outdoor campus cameras; a subject walks naturally through all views, and each camera records 4–8 min at 25 fps, yielding ~3,800 pedestrian bounding boxes. One identity appears across all six cameras, forming a cross-camera re-ID scenario. EPFL provides synchronized recordings from three fixed campus cameras, from which the system produced ~10,000 pedestrian bounding boxes.
The system comprises: MS-CNN for pedestrian detection; an MDP-based tracker for temporal association; keyframe extraction triggered by a face-confidence threshold; and person re-ID using the proposed multi-domain feature learning with RND and RRE to rank the top-10 candidates.
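The recursive expansion step of RRE is not detailed in this excerpt; a minimal sketch of the underlying reciprocal-neighborhood idea—restricting matches to k-reciprocal neighbors and mixing the original distances with a Jaccard distance over those neighbor sets—is given below. Function names (`k_reciprocal_sets`, `reciprocal_rerank`) and the parameters `k` and `lam` are assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np

def k_reciprocal_sets(dist, k):
    """Boolean matrix R where R[i, j] is True iff i and j are in each other's top-k."""
    order = np.argsort(dist, axis=1)
    topk = order[:, 1:k + 1]          # skip index 0 (self, distance 0)
    n = dist.shape[0]
    nbr = np.zeros((n, n), dtype=bool)
    nbr[np.arange(n)[:, None], topk] = True
    return nbr & nbr.T                # reciprocal-neighborhood consistency

def reciprocal_rerank(dist, k=2, lam=0.5):
    """Blend original distances with a Jaccard distance over reciprocal-neighbor sets."""
    recip = k_reciprocal_sets(dist, k)
    inter = (recip[:, None, :] & recip[None, :, :]).sum(-1)
    union = (recip[:, None, :] | recip[None, :, :]).sum(-1)
    jac = 1.0 - inter / np.maximum(union, 1)
    return lam * dist + (1.0 - lam) * jac
```

Pairs that share many reciprocal neighbors are pulled closer, which stabilizes the ranked list on large galleries; RRE's recursive expansion would iterate and grow these neighbor sets further.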
After retrieval, the metadata (frame number, camera ID, bounding box ID) are used to reconstruct the subject's cross-camera trajectory, and a synthesized video is rendered.
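The trajectory-reconstruction step can be sketched as grouping the retrieved detections by camera and ordering the resulting segments by first appearance time. The function name, the dictionary field names (`frame`, `camera_id`, `bbox_id`), and the per-segment summary fields are hypothetical; the source specifies only which metadata are used:

```python
from collections import defaultdict

def reconstruct_trajectory(detections, fps=25):
    """Order retrieved detections into per-camera segments of a cross-camera trajectory.

    detections: list of dicts with 'frame', 'camera_id', 'bbox_id' (assumed field names)
    fps:        frame rate used to convert frame numbers to seconds (25 fps per the datasets)
    """
    by_camera = defaultdict(list)
    for det in sorted(detections, key=lambda d: d["frame"]):
        by_camera[det["camera_id"]].append(det)
    # Order camera segments by the time the subject first appears in each view.
    segments = sorted(by_camera.items(), key=lambda kv: kv[1][0]["frame"])
    return [
        {
            "camera_id": cam,
            "start_s": dets[0]["frame"] / fps,
            "end_s": dets[-1]["frame"] / fps,
            "n_boxes": len(dets),
        }
        for cam, dets in segments
    ]
```

The ordered segments then drive rendering of the synthesized cross-camera video.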