Visual Place Recognition (VPR) is a critical task in computer vision, traditionally enhanced by re-ranking retrieval results with image matching. However, recent advancements in VPR methods have significantly improved performance, challenging the necessity of re-ranking. In this work, we show that modern retrieval systems often reach a point where re-ranking can degrade results, as current VPR datasets are largely saturated. We propose using image matching as a verification step to assess retrieval confidence, demonstrating that inlier counts can reliably predict when re-ranking is beneficial. Our findings shift the paradigm of retrieval pipelines, offering insights for more robust and adaptive VPR systems.
@inproceedings{sferrazza_2025_to_match,
  title         = {To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition},
  author        = {Sferrazza, Davide and Berton, Gabriele and Trivigno, Gabriele and Masone, Carlo},
  booktitle     = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month         = jun,
  year          = {2025},
  keywords      = {visual place recognition, uncertainty estimation, image matching, image retrieval},
  eprint        = {2504.06116},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CV},
}
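The verification idea in the abstract above, using inlier counts to decide whether re-ranking is worthwhile, might look roughly like the following sketch. This is an illustration under stated assumptions, not the paper's implementation: the function name `rerank_if_uncertain`, the `count_inliers` callback, and the threshold of 50 inliers are all hypothetical, and the sketch assumes that a low inlier count on the top-1 candidate signals an uncertain retrieval.

```python
from typing import Callable, List, Sequence


def rerank_if_uncertain(
    candidates: Sequence[str],
    count_inliers: Callable[[str], int],
    min_inliers: int = 50,  # hypothetical threshold, not taken from the paper
) -> List[str]:
    """Keep the retrieval order when the top-1 candidate is verified,
    otherwise re-rank all candidates by their inlier count."""
    if not candidates:
        return []

    # Verification step: match the query only against the top-1 candidate.
    top1_inliers = count_inliers(candidates[0])
    if top1_inliers >= min_inliers:
        # Retrieval is considered confident; leave the order untouched.
        return list(candidates)

    # Retrieval is considered uncertain; match against every candidate
    # and sort by inliers, highest first (sorted() is stable, so ties
    # keep their original retrieval order).
    inliers = {c: count_inliers(c) for c in candidates}
    return sorted(candidates, key=lambda c: -inliers[c])
```

In practice `count_inliers` would wrap an image matcher followed by geometric verification; here it is left as a callback so the gating logic can be read on its own.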
Master’s Thesis
Understanding and Enhancing Visual Place Recognition through Embedding Space Interpretability and Uncertainty Estimation
Visual Place Recognition (VPR) involves determining the geographic location of a photo based solely on its visual content. Recent advancements in Deep Learning (DL) have enabled the representation of images in high-dimensional spaces, where photos taken in the same location tend to cluster together, while images from different places are spread apart. This spatial organization makes it easier to predict locations by performing similarity searches against a database of known places. However, a key gap in current research is understanding the specific information retained in these image embeddings that allows for effective and reliable location prediction. Additionally, existing State-of-the-Art (SOTA) deterministic methods in VPR are unable to quantify the uncertainty of their predictions. This is particularly problematic in safety-critical applications, such as autonomous driving, where knowing the confidence level in a system’s decision is vital for ensuring safety. This thesis addresses two main challenges: first, understanding and visualizing the essential information encoded in image embeddings, and second, providing uncertainty estimates for VPR models through post-hoc techniques. To overcome these challenges, the thesis employs Generative Artificial Intelligence models, particularly Latent Diffusion Models, to explore and visualize the content within image embeddings. Additionally, uncertainty estimation methods are incorporated to enhance the robustness and reliability of VPR systems. The contributions of this thesis provide valuable insights into the interpretability and reliability of VPR systems, offering a framework for analyzing the output of these models and incorporating uncertainty quantification during inference.
@mastersthesis{sferrazza2025understanding,
  title    = {Understanding and Enhancing Visual Place Recognition through Embedding Space Interpretability and Uncertainty Estimation},
  author   = {Sferrazza, Davide},
  month    = apr,
  year     = {2025},
  school   = {Politecnico di Torino},
  keywords = {generative artificial intelligence, visual place recognition, uncertainty estimation, diffusion models, latent diffusion models, masters thesis}
}
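The similarity search mentioned in the thesis abstract above, where a query embedding is compared against a database of known places, can be sketched as follows. This is a minimal illustration, not the thesis code: the names `cosine_similarity` and `retrieve` are hypothetical, cosine similarity is an assumed choice of metric, and embeddings are plain Python lists rather than the output of a real VPR model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    k: int = 1,
) -> List[Tuple[int, float]]:
    """Return (database index, similarity) for the k most similar embeddings."""
    scored = [(i, cosine_similarity(query, emb)) for i, emb in enumerate(database)]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The predicted location is then the known location of the top-ranked database image. A deterministic pipeline like this returns a ranking but no confidence estimate, which is the gap the post-hoc uncertainty methods in the thesis are meant to address.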