A CNN-RNN Siamese framework with multi-level aggregation for video-based person re-identification