Research

My research focuses on learning robust, generalizable representations in machine learning systems, particularly in settings with limited supervision, cross-modal inputs, or distribution shifts. I am especially interested in how structure emerges from self-supervised objectives, and how generative models can be analyzed and repurposed for safety and interpretability.

Publications

Privacy-Preserving Learning

Under the supervision of Professor SouYoung Jin at Dartmouth, my research focuses on privacy-preserving techniques for training multimodal models on video data. I develop generative pipelines that modify audiovisual content to remove identifiable traits, such as faces, voices, or contextual personal cues, while preserving semantic and temporal coherence. This allows models to learn from rich, real-world video data without compromising individual privacy.

This work includes Unsafe2Safe: Controllable Image Anonymization for Downstream Utility, accepted to CVPR 2026. Unsafe2Safe is a fully automated pipeline that detects privacy-prone content and rewrites only sensitive regions, preserving global structure and downstream task utility while neutralizing private identity cues. The project page with examples is available here.

Representation Learning

As a research assistant at Aalto University’s BRAIN Lab under Prof. Stéphane Deny, I investigated self-supervised learning and invariance in biological and artificial vision systems. I contributed to projects exploring feature disentanglement, local equivariance, and learning from natural scenes without labels.

One recent result from this line of work is Latent Equivariant Operators for Robust Object Recognition: Promises and Challenges, an arXiv preprint selected as a TinyPaper at the ICLR 2026 GRaM Workshop. In this work, we study how learned latent-space operators can improve object recognition under transformations such as rotation and translation that are rarely seen during training.
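The core idea behind latent equivariant operators can be illustrated with a toy sketch. Everything below is invented for illustration (a linear "encoder" and a permutation standing in for a real transformation such as rotation); the point is only to show the mechanism: fit an operator M in latent space so that encoding a transformed input matches M applied to the encoding of the original.

```python
# Toy sketch of a learned latent-space operator (all names and shapes are
# illustrative assumptions, not the paper's actual model).
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder": a fixed random linear map from pixel space to latent space.
D, N = 32, 500
W_enc = rng.standard_normal((D, D)) / np.sqrt(D)

def encode(x):
    return W_enc @ x

# Toy input-space "transformation" (stand-in for rotation): a fixed permutation.
perm = rng.permutation(D)

X = rng.standard_normal((D, N))   # original inputs, one per column
Z = encode(X)                     # latents of originals
Z_t = encode(X[perm])             # latents of transformed inputs

# Fit the latent operator M by least squares so that Z_t ≈ M @ Z.
M_T, *_ = np.linalg.lstsq(Z.T, Z_t.T, rcond=None)
M = M_T.T

# On a held-out input, applying M in latent space should match
# encoding the transformed input directly.
x_new = rng.standard_normal(D)
z_pred = M @ encode(x_new)
z_true = encode(x_new[perm])
err = np.linalg.norm(z_pred - z_true) / np.linalg.norm(z_true)
print(f"relative error: {err:.2e}")
```

With a linear encoder of matched dimension the operator is recovered exactly; with a real nonlinear encoder it is only approximate, which is one of the challenges the preprint discusses.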

Self-supervised model utilizing a learnable VAE and operators for reverting distortion.

An extension of this model processes electromyography (EMG) signals, offering a GUI for real-time EMG visualization and motion classification.

This robustness-oriented thread also led to Impact of Noise on Deep Learning-Based Pseudo-Online Gesture Recognition with High-Density EMG, published at EMBC 2025. The paper studies how additive white Gaussian noise, channel loss, and electrode shift affect pseudo-online gesture recognition from high-density EMG signals.
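The three perturbations studied in the paper can be sketched in a few lines. This is a minimal illustration on a synthetic array, with invented shapes and parameter values (the paper's actual preprocessing and noise levels may differ): additive white Gaussian noise at a target SNR, channel loss as zeroed-out electrodes, and electrode shift approximated as a rotation of the channel axis.

```python
# Hedged sketch of the three EMG perturbations (shapes and values are
# assumptions for illustration, not taken from the paper).
import numpy as np

rng = np.random.default_rng(42)
emg = rng.standard_normal((64, 2000))  # synthetic 64-channel HD-EMG recording

def add_awgn(x, snr_db):
    """Additive white Gaussian noise at a target signal-to-noise ratio (dB)."""
    sig_power = np.mean(x ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), x.shape)

def drop_channels(x, frac):
    """Channel loss: zero out a random fraction of electrode channels."""
    n_drop = int(frac * x.shape[0])
    dead = rng.choice(x.shape[0], size=n_drop, replace=False)
    out = x.copy()
    out[dead] = 0.0
    return out

def shift_electrodes(x, k):
    """Electrode shift: rotate the channel axis by k positions,
    approximating a displaced electrode grid."""
    return np.roll(x, k, axis=0)

noisy = add_awgn(emg, snr_db=10)
lossy = drop_channels(emg, frac=0.25)
shifted = shift_electrodes(emg, k=2)
print(noisy.shape, lossy.shape, shifted.shape)
```

Sweeping parameters such as `snr_db`, `frac`, and `k` is a natural way to probe how classification accuracy degrades under each perturbation, which is the kind of pseudo-online evaluation the paper performs.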

Scene Representation

My Bachelor’s thesis, supervised by Prof. Alexander Ilin, focused on scene representation using neural fields. I implemented object-centric architectures to encode spatial and structural information from 2D observations, contributing to ongoing research in compositional generalization.

Title: Recent Advances in Scene Representation. PDF. Aalto Archive (with credentials).

Diffusion-Based Deepfake Detection

As part of a research exchange at EPFL, I worked in Professor Sabine Süsstrunk's Image and Visual Representation Lab (IVRL) under the supervision of her PhD students Yufan Ren and Peter Grönquist. My research focused on identifying synthetic visual content by analyzing structural patterns and generation dynamics present in visual data. I studied how these signals degrade or persist under real-world conditions such as compression artifacts, partial occlusion, and mismatch between training and target distributions.