Coursework and Projects

Bayesian Modelling of Housing Prices in Finland

January 2023. M. Hölttä, M. Dinh within course CS-E5710 - Bayesian Data Analysis.

Collected over 4,800 real estate transactions from Finland's national housing registry by building a custom Python scraper for asuntojen.hintatiedot.fi, which offers no public API. Performed extensive feature engineering (e.g., floor ratio, age approximation, one-hot encodings) and trained multiple Bayesian models using brms: a pooled regression, two hierarchical models (region and sub-region), and a spline-based model.

  • Conducted prior sensitivity analysis and diagnostics (R̂, ESS, divergences) across model variants.
  • The spline model achieved the best predictive accuracy (MAD = €56.7k) and the best LOO-CV elpd, outperforming frequentist baselines by ~20% in RMSE.
  • Identified price distortions in low-data regions and modeled predictive uncertainty via posterior predictive checks and regional breakdowns.
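
The feature engineering mentioned above can be illustrated with a minimal Python sketch. The "floor/total floors" string format and the helper names floor_ratio and approximate_age are assumptions made for illustration; they are not taken from the project's actual code.

```python
from typing import Optional


def floor_ratio(floor_field: str) -> Optional[float]:
    """Turn a floor string such as "3/5" into floor / total floors.

    The "floor/total" format is an assumption about the scraped data.
    """
    try:
        floor, total = (int(part) for part in floor_field.split("/"))
    except ValueError:
        return None  # missing or malformed floor information
    if total <= 0:
        return None
    return floor / total


def approximate_age(build_year: int, sale_year: int) -> int:
    """Approximate the building age at sale time, clipped at zero."""
    return max(sale_year - build_year, 0)
```

A ratio rather than a raw floor number keeps the feature comparable between low-rise and high-rise buildings, which is one plausible motivation for such a feature.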

Mispronunciation Detection with Wav2Vec2 & Forced Alignment

Nov–Dec 2022. M. Dinh, M. Sairanen, A. Ventura within course ELEC-E5510 - Speech Recognition.

A deep learning framework that helps foreign-language learners detect pronunciation mistakes. Fine-tuned a self-supervised Wav2Vec2 model on accented speech data to detect word-level pronunciation errors. Used trellis-based forced alignment and threshold-based classification, evaluated by precision, recall, and F1.
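
The threshold-based classification and its evaluation can be sketched roughly as follows. This assumes each word already has a single confidence score derived from the forced alignment (how that score is computed is not specified here), and the function names are hypothetical.

```python
from typing import List, Tuple


def flag_mispronounced(scores: List[float], threshold: float) -> List[bool]:
    """Flag a word as mispronounced when its alignment score is below the threshold."""
    return [score < threshold for score in scores]


def precision_recall_f1(predicted: List[bool], actual: List[bool]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 with "mispronounced" as the positive class."""
    tp = sum(p and a for p, a in zip(predicted, actual))      # true positives
    fp = sum(p and not a for p, a in zip(predicted, actual))  # false positives
    fn = sum(a and not p for p, a in zip(predicted, actual))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Lowering the threshold flags fewer words, which tends to trade recall for precision; sweeping it over a validation set is one common way to choose an operating point.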

Live Demo

Exploring Sanitation via Storytelling Visualization

Sep–Dec 2022. Personal Project, within course DOM-E2113 (original) and CS-E4450 (overhauled)

Interactive data storytelling built with D3 and Folium maps, covering the aspects of usability, time, and space.

Understanding company risks in Finnish municipalities

Sep–Dec 2022. S. Karhula, H. Nguyen, H. Phan, V. Tiainen, M. Dinh, A. V. Cardó. Equal contribution.

A cohesive tool that helps the client discover which Finnish municipalities carry financial risk and which show investment potential, using time-series analysis and a proposed risk function.

Delivered as a Dash application with interactive 2D and 3D visualisation.

Project courtesy of the course CS-C3250 - Data Science Project and OP Bank.

Tools used: Time-series prediction, Dash, Heroku

Data Dashboard and Interactive Visualisation

Feb–May 2022. Personal Project, within course CS-C2120 - Programming studio A

A Scala application that visualizes data interactively from online sources via a JavaFX-based GUI. The system integrates real-time data pipelines from REST APIs and CSV files, supporting variable selection and interactive annotation.

Using Logistic Regression for Prediction of Housing Affordability

Jan–Feb 2022. Personal Project, within course CS-C3240 - Machine Learning D

Created a binary classification model to identify "affordable" real estate units based on location, size, and historical pricing.

  • Defined affordability criteria using price-per-area thresholds adapted to local income levels.
  • Engineered features from scraped housing listings, including geospatial variables and inferred amenities.
  • Trained and evaluated logistic regression with L1/L2 regularization.
  • Demonstrated interpretability of learned weights and proposed use in public-facing housing search platforms.
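
As a rough illustration of the labelling step, the binary "affordable" target could be derived from a price-per-area threshold per region. The function name, the region keys, and the idea of a plain lookup table of thresholds are assumptions for this sketch; how the thresholds were adapted to local income levels is not shown.

```python
from typing import Dict


def is_affordable(price_eur: float, area_m2: float, region: str,
                  thresholds_eur_per_m2: Dict[str, float]) -> bool:
    """Label a listing affordable when its price per m² is at or below the regional threshold."""
    if area_m2 <= 0:
        raise ValueError("area_m2 must be positive")
    return price_eur / area_m2 <= thresholds_eur_per_m2[region]


# Placeholder thresholds, made up purely for illustration.
example_thresholds = {"Helsinki": 5000.0, "Oulu": 2500.0}
label = is_affordable(200_000, 50, "Helsinki", example_thresholds)
```

Labels produced this way would then serve as the target for the regularized logistic regression described in the list above.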