Research | Professor Yosi Keller

Research Overview

My research lies at the intersection of computer vision, deep learning, and signal processing, with a particular emphasis on real-world applications. I focus on developing robust models for age estimation, kinship verification, camera localization, deepfake generation and detection, multimodal data analysis, and network localization. These efforts often leverage state-of-the-art techniques including transformers, metric learning, probabilistic modeling, and generative neural networks. The ultimate goal of this work is to design scalable, interpretable, and high-performing systems that address fundamental challenges in visual understanding and biometric analysis.

Age Estimation and Face Analysis

This line of research focuses on computational models that analyze facial attributes to infer age, relative aging, and familial relationships. Our work develops deep learning frameworks that handle age progression, bias mitigation, and robust face embedding under varied demographic and environmental conditions. We also explore tasks such as kinship verification and landmark localization, which are foundational to biometric analysis. These contributions enhance the accuracy, fairness, and interpretability of face-based recognition systems in real-world scenarios.

S. Hiba and Y. Keller, "Hierarchical Attention-based Age Estimation and Bias Estimation"
Introduces hierarchical attention layers to enhance age prediction accuracy.
Also addresses fairness by analyzing dataset-induced estimation bias. Published in: IEEE TPAMI, 2023

O. Sendik and Y. Keller, "DeepAge: Deep Learning of Face-Based Age Estimation"
Presents a deep convolutional model trained on diverse facial images.
Delivers high accuracy across varied demographics and lighting. Published in: Signal Processing: Image Communication, 2019

E. Dahan and Y. Keller, "Age-Invariant Face Embedding using the Wasserstein Distance"
Utilizes Wasserstein distance to produce embeddings robust to age changes.
Improves recognition consistency across different age stages. Published in: arXiv:1908.05932

R. Sandhaus and Y. Keller, "Relative Age Estimation Using Face Images"
Uses pairwise comparisons to estimate relative ages between subjects.
Reduces data labeling needs through weak supervision. Published in: arXiv:2502.04852

E. Dahan and Y. Keller, "A Unified Approach to Kinship Verification"
Unifies kinship and facial feature models into a joint architecture.
Improves kinship classification accuracy across benchmarks. Published in: IEEE TPAMI, 2021

S. Mahpod, R. Das, E. Maiorana, Y. Keller, and P. Campisi, “Facial landmarks localization using cascaded neural networks”
Presents a cascaded CNN architecture for accurate facial landmark detection.
Improves precision in challenging conditions such as occlusion and pose variations. Published in: Computer Vision and Image Understanding, Volume 205, 2021

Camera Localization

Camera localization addresses the challenge of estimating a camera’s position and orientation in space from a single image. We develop transformer-based architectures and hypernetworks that learn robust scene representations across diverse environments. Our models achieve high accuracy in both short- and long-range settings and generalize across multiple scenes. This research supports applications in autonomous systems, AR/VR, and visual SLAM.

R. Ferens and Y. Keller, "HyperPose: Hypernetwork-Infused Camera Pose Localization"
Uses hypernetworks to adapt pose estimators to specific scenes.
Improves localization performance across indoor and outdoor datasets. Published in: CVPR 2025

S. Dekel, Y. Keller, and M. Cadík, "Estimating Extreme 3D Image Rotation with Transformer Cross-Attention"
Introduces a cross-attention transformer model to handle extreme 3D rotations.
Recovers relative rotation between image pairs even under extreme 3D rotations. Published in: CVPR 2024

Y. Shavit, R. Ferens, and Y. Keller, "Coarse-to-Fine Multi-Scene Pose Regression with Transformers"
Proposes a coarse-to-fine transformer for multi-scene pose estimation.
Enhances accuracy in diverse and large-scale environments. Published in: IEEE TPAMI, 2023

Y. Shavit, R. Ferens, and Y. Keller, "Learning Multi-Scene Absolute Pose Regression with Transformers"
Trains on multiple scenes using a shared transformer framework.
Delivers robust results with limited per-scene training. Published in: ICCV 2021

Y. Shavit and Y. Keller, “Camera Pose Auto-Encoders for Improving Pose Regression”
Introduces auto-encoders to refine pose latent space for regression tasks.
Enhances robustness to diverse camera and scene configurations. Published in: ECCV 2022

R. Ferens, Y. Shavit and Y. Keller, “Learning Single and Multi-Scene Camera Pose with Transformer Encoders”
Combines transformer encoders with pose embeddings to unify scene representation.
Demonstrates high accuracy in both single and multi-scene localization. Published in: Computer Vision and Image Understanding

O. Idan, Y. Shavit, and Y. Keller, “Learning to Localize in Unseen Scenes with Relative Pose Regressors”
Introduces relative pose constraints to generalize pose models to novel environments.
Enables training-free localization on previously unseen scenes. Published in: arXiv:2303.02717

Deepfake and Deepfake Detection

Y. Nirkin, T. Hassner, and Y. Keller, “FSGAN: Photo-realistic model-free video face swapping and reenactment”
Performs video face swapping and reenactment without requiring 3D face models.
Achieves real-time results while preserving facial identity and expression. Published in: ICCV 2019

Y. Nirkin, T. Hassner, Y. Keller, and L. Wolf, “DeepFake Detection Based on Discrepancies Between Faces and their Context”
Detects inconsistencies between manipulated faces and original backgrounds.
Boosts the reliability of deepfake detection in varied settings. Published in: IEEE TPAMI, 2022

Y. Nirkin, T. Hassner, and Y. Keller, “FSGANv2: Better Subject Agnostic Face Swapping and Reenactment”
Improves identity preservation and realism over the original FSGAN.
Supports broader face variations and generalization. Published in: IEEE TPAMI, 2023

Multisensor Data Matching and Analysis

This work investigates techniques for matching and analyzing visual data captured from different sensor types, including thermal, infrared, and RGB cameras. Our methods use attention mechanisms and multiscale feature alignment to detect and register features across heterogeneous modalities. These capabilities are essential in surveillance, remote sensing, and robotic perception. The research aims to bridge domain gaps and improve robustness in adverse or variable conditions.

E. Ben Baruch and Y. Keller, “Joint Detection and Matching of Feature Points in Multimodal Images”
Detects and matches features in RGB, infrared, and thermal image modalities.
Improves registration in heterogeneous data conditions. Published in: IEEE TPAMI, 2022

A. Moreshet and Y. Keller, “Attention-Based Multimodal Image Matching”
Aligns features across modalities using cross-attention mechanisms.
Enables better understanding in fused sensor environments. Published in: Computer Vision and Image Understanding, 2023

N. Ofir and Y. Keller, “Multi-scale Processing of Noisy Images using Edge Preservation Losses”
Applies noise-resilient learning guided by edge retention.
Improves performance on degraded sensor imagery. Published in: ICPR 2020

Kinship, Face, and Biometric Research

E. Dahan and Y. Keller, "Improving Kinship Verification by Gender and Age Disentanglement"
Separates gender and age factors from facial features to enhance kinship classification.
Improves interpretability and accuracy in family relationship inference. Published in: arXiv

S. Mahpod and Y. Keller, “Kinship verification using a hybrid distance learning network”
Combines Euclidean and learned distance metrics in a hybrid neural model.
Improves robustness to intra-class variability. Published in: Computer Vision and Image Understanding, 2018

R. Bouzaglo and Y. Keller, "Fingerprint Synthesis and Reconstruction using Generative Adversarial Networks"
Generates synthetic fingerprint data for testing and training biometric systems.
Maintains realistic patterns while supporting reconstruction from latent images. Published in: arXiv:2201.06164

Deep Learning Research

Our deep learning research encompasses a range of foundational and applied topics, including multimodal learning, coreset selection, and depth estimation. We propose algorithms that improve interpretability, efficiency, and transferability across tasks such as image retrieval, biomedical analysis, and OCR for ancient scripts. The models combine transformers, metric learning, and generative architectures. This body of work contributes to both methodological advancement and real-world impact.

N. Malali and Y. Keller, “Learning to Embed Semantic Similarity for Joint Image-Text Retrieval”
Maps textual and visual data into a unified semantic space.
Enables accurate and scalable image-caption retrieval. Published in: IEEE TPAMI, 2022

G. Shapira and Y. Keller, “FaceCoresetNet: Differentiable Coresets for Face Set Recognition”
Reduces training time and memory usage using differentiable coreset selection.
Maintains accuracy on benchmark face recognition datasets. Published in: AAAI 2024

S. Mahpod and Y. Keller, “Auto-ML Deep Learning for Rashi Scripts OCR”
Applies automated machine learning to optimize OCR for ancient Hebrew texts.
Improves recognition on noisy and historical script data. Published in: arXiv:1811.01290

A. Navon and Y. Keller, “Financial Time Series Prediction Using Deep Learning”
Explores CNN and RNN models for stock trend forecasting.
Improves accuracy on volatile time series data. Published in: arXiv:1711.04174