Coursework and Projects
Bayesian Modelling of Housing Prices in Finland
January 2023, within course CS-E5710 - Bayesian Data Analysis.
Collected over 4,800 real-estate transactions from Finland's national housing registry by building a custom Python scraper for asuntojen.hintatiedot.fi, which lacks a public API. Performed extensive feature engineering (e.g., floor ratio, age approximation, one-hot encodings) and trained multiple Bayesian models with brms: a pooled regression, two hierarchical models (by region and by sub-region), and a spline-based model.
- Conducted prior sensitivity analysis and diagnostics (R-hat, ESS, divergences) across model variants.
- The spline model achieved the best predictive accuracy (MAD = €56.7k) and the best LOO-CV elpd, and outperformed frequentist baselines by ~20% in RMSE.
- Identified price distortions in low-data regions and assessed predictive uncertainty through posterior predictive checks and regional breakdowns.
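A minimal Python sketch of the kind of feature engineering listed above. The `Listing` fields and the exact definitions (floor ratio as the unit's floor divided by the building's floor count, age as sale year minus construction year) are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical fields; the real scraped schema may differ.
    price_eur: float  # regression target, not used as a feature below
    area_m2: float
    floor: int
    total_floors: int
    build_year: int
    sale_year: int
    region: str


def engineer_features(listings: list[Listing]) -> list[dict[str, float]]:
    """Turn raw listings into numeric feature rows.

    Assumed definitions:
      floor_ratio = floor / total_floors (0.0 when total_floors is unknown or zero)
      age_years   = sale_year - build_year (clipped at 0)
      region_*    = one-hot indicators over the observed regions
    """
    regions = sorted({l.region for l in listings})
    rows = []
    for l in listings:
        row = {
            "area_m2": l.area_m2,
            "floor_ratio": l.floor / l.total_floors if l.total_floors > 0 else 0.0,
            "age_years": float(max(l.sale_year - l.build_year, 0)),
        }
        # Full one-hot encoding; with an intercept, one level is usually
        # dropped as the reference category.
        for r in regions:
            row[f"region_{r}"] = 1.0 if l.region == r else 0.0
        rows.append(row)
    return rows


if __name__ == "__main__":
    data = [
        Listing(250_000, 60.0, 3, 6, 1990, 2022, "Helsinki"),
        Listing(140_000, 75.0, 1, 2, 1975, 2022, "Tampere"),
    ]
    for row in engineer_features(data):
        print(row)
```

In a brms workflow, categorical predictors such as region would more commonly be passed as a factor (or as a grouping term in the hierarchical models) rather than as manual dummies.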
Mispronunciation Detection with Wav2Vec2 & Forced Alignment
Nov–Dec 2022, within course ELEC-E5510 - Speech Recognition.
A deep-learning framework that helps foreign-language learners detect pronunciation mistakes. Fine-tuned a self-supervised Wav2Vec2 model on accented speech data to detect word-level pronunciation errors, combining trellis-based forced alignment with threshold-based classification, and evaluated the system with precision, recall, and F1.
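The threshold-based classification and its evaluation could look roughly like the sketch below. It assumes each word already has a score derived from forced alignment (for example, the mean log-probability of the word's aligned frames); the scores, annotations, and threshold value are hypothetical.

```python
def classify(scores: list[float], threshold: float) -> list[bool]:
    """Flag a word as mispronounced when its alignment score falls below the threshold."""
    return [s < threshold for s in scores]


def precision_recall_f1(pred: list[bool], gold: list[bool]) -> tuple[float, float, float]:
    """Precision, recall and F1 with 'mispronounced' as the positive class."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical per-word scores (higher = closer match to the expected transcript).
    scores = [-0.4, -2.1, -0.9, -3.0, -1.6]
    gold = [False, True, False, True, False]  # annotations: True = mispronounced
    pred = classify(scores, threshold=-1.5)
    print(pred)
    print(precision_recall_f1(pred, gold))
```

Raising the threshold flags more words, which typically increases recall at the cost of precision, so the value would normally be tuned on held-out data.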
Exploring Sanitation via Storytelling Visualization
Sep–Dec 2022, within courses DOM-E2113 (original) and CS-E4450 (overhauled).
Interactive data storytelling built with D3 and Folium maps, covering the aspects of usability, time, and space.
Understanding company risks in Finnish municipalities
Sep–Dec 2022.

A tool that helps the client identify Finnish municipalities with financial risk and those with investment potential, using time-series analysis and a proposed risk function.
Delivered as a Dash application with interactive 2D and 3D visualisation.
Project courtesy of course CS-C3250 - Data Science Project and OP Bank.
Methods and tools: time-series prediction, Dash, Heroku.
Data Dashboard and Interactive Visualisation
Feb–May 2022, within course CS-C2120 - Programming studio A.

A Scala application with a JavaFX-based GUI for interactively visualizing data from online sources. It integrates real-time data pipelines from REST APIs and CSV files and supports variable selection and interactive annotation.
Using Logistic Regression for Prediction of Housing Affordability
Jan–Feb 2022, within course CS-C3240 - Machine Learning D.
Created a binary classification model to identify "affordable" real estate units based on location, size, and historical pricing.
- Defined affordability criteria using price-per-area thresholds adapted to local income levels.
- Engineered features from scraped housing listings, including geospatial variables and inferred amenities.
- Trained and evaluated logistic regression with L1/L2 regularization.
- Demonstrated the interpretability of the learned weights and proposed using the model in public-facing housing search platforms.
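A minimal, dependency-free sketch of the labelling step and a regularized logistic regression. The affordability rule (price per m² at or below a base threshold scaled by a local income index), the listing values, and the hyperparameters are all hypothetical. Training uses plain batch gradient descent, and the L1 penalty is handled with a proximal (soft-thresholding) step; this is one common choice, not necessarily what the project used.

```python
import math


def is_affordable(price_eur: float, area_m2: float, income_index: float,
                  base_eur_per_m2: float = 3000.0) -> bool:
    """Hypothetical rule: affordable if price per m2 <= base threshold scaled by local income."""
    return price_eur / area_m2 <= base_eur_per_m2 * income_index


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Probability that a unit with features x is affordable."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X: list[list[float]], y: list[int], penalty: str = "l2",
                 lam: float = 0.01, lr: float = 0.1,
                 epochs: int = 2000) -> tuple[list[float], float]:
    """Batch gradient descent on mean log-loss plus an L1 or L2 penalty on the weights.

    The bias is not penalized. L2 adds lam * w to the gradient (penalty lam/2 * ||w||^2);
    L1 applies soft-thresholding after each gradient step (penalty lam * ||w||_1),
    which can set weights exactly to zero.
    """
    if penalty not in ("l1", "l2"):
        raise ValueError("penalty must be 'l1' or 'l2'")
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        if penalty == "l2":
            w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        else:
            w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
            w = [math.copysign(max(abs(wj) - lr * lam, 0.0), wj) for wj in w]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Hypothetical listings: (price_eur, area_m2, income_index, distance_to_centre_km)
    listings = [
        (180_000, 70, 1.0, 12.0),
        (420_000, 60, 1.2, 2.0),
        (140_000, 55, 0.9, 20.0),
        (390_000, 75, 1.1, 4.0),
    ]
    y = [int(is_affordable(p, a, inc)) for p, a, inc, _ in listings]
    X = [[dist / 10.0] for *_, dist in listings]  # one illustrative, scaled feature
    w, b = train_logreg(X, y, penalty="l1", lam=0.01)
    print("labels:", y, "weights:", w, "bias:", b)
```

Because each weight multiplies a single (scaled) feature, its sign and magnitude can be read directly, and the L1 variant can zero out uninformative features entirely.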
