Completed · 2021

Song Recommender System

ML-based workout song recommender using BPM and VADER sentiment analysis. Co-authored research with K-Means clustering on Billboard Top 100 to match songs to exercise intensity.

Python, ML, NLP, Sentiment Analysis, K-Means, Research

A machine learning-based music recommendation system that generates adaptive playlists based on physiological signals and song sentiment — built as a co-authored research project with Tanish Maheshwari.

The core insight: BPM alone isn't enough to match a song to an exercise state. Running VADER sentiment analysis on the lyrics yields a neutrality score that, of all the VADER outputs, shows the highest positive covariance with BPM. Combining the two gives a more accurate picture of a song's intensity profile than tempo alone.
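The covariance comparison can be sketched as follows. This is a minimal illustration, not the project's code. It assumes each song already has a BPM value and a score dictionary with the keys `neg`, `neu`, `pos`, and `compound` (the shape returned by `SentimentIntensityAnalyzer.polarity_scores` in the vaderSentiment package). The song values and the helper names `sample_covariance` and `rank_features_by_covariance` are hypothetical, and the toy numbers were chosen to mirror the reported finding rather than to prove it.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy rows: (bpm, VADER-style scores). Not data from the project.
SONGS: List[Tuple[float, Dict[str, float]]] = [
    (90.0,  {"neg": 0.10, "neu": 0.60, "pos": 0.30, "compound": 0.80}),
    (110.0, {"neg": 0.12, "neu": 0.68, "pos": 0.20, "compound": 0.50}),
    (130.0, {"neg": 0.10, "neu": 0.78, "pos": 0.12, "compound": 0.20}),
    (150.0, {"neg": 0.08, "neu": 0.86, "pos": 0.06, "compound": -0.10}),
]


def sample_covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Unbiased sample covariance of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def rank_features_by_covariance(
    songs: Sequence[Tuple[float, Dict[str, float]]],
) -> List[Tuple[str, float]]:
    """Return (feature, covariance with BPM) pairs, largest covariance first."""
    bpms = [bpm for bpm, _ in songs]
    features = ["neg", "neu", "pos", "compound"]
    cov = {
        name: sample_covariance(bpms, [scores[name] for _, scores in songs])
        for name in features
    }
    return sorted(cov.items(), key=lambda item: item[1], reverse=True)


ranking = rank_features_by_covariance(SONGS)
for name, value in ranking:
    print(f"{name:>8}: {value:+.3f}")
```

Covariance depends on the scale of each feature, so a ranking like this is only meaningful when the features share a comparable range, as the three proportion scores do here.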

Billboard Top 100 lyrics were extracted via the Genius API and scored with VADER. K-Means clustering (optimised to K=4 via the elbow method) then grouped the songs into four workout intensity tiers: warm-up, intensity, aggressive-1, and aggressive-2. The recommender takes live BPM from a smartwatch, maps it to the matching cluster, and plays a randomly chosen song from that tier.
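To illustrate the elbow step, here is a plain-Python stand-in for the scikit-learn workflow, offered as a sketch under stated assumptions. It runs Lloyd's algorithm on one-dimensional toy scores and records the within-cluster sum of squares (WCSS) for each K. In scikit-learn the same quantity is the `inertia_` attribute of a fitted `KMeans`. The toy scores, the deterministic initialisation, and the `pick_elbow` threshold rule are all hypothetical; the elbow is often simply read off a plot of the curve.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical neutrality-like scores forming four loose groups (toy data only).
SCORES = [0.50, 0.51, 0.52, 0.63, 0.64, 0.65, 0.76, 0.77, 0.78, 0.89, 0.90, 0.91]


def kmeans_1d(values: Sequence[float], k: int, max_iter: int = 100) -> Tuple[List[float], float]:
    """Lloyd's algorithm on 1-D data; returns (centroids, WCSS)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: centroids at evenly spaced positions in the sorted data.
    # (An assumption for reproducibility; scikit-learn defaults to k-means++.)
    centroids = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(max_iter):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: (v - centroids[i]) ** 2)
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    wcss = sum(min((v - c) ** 2 for c in centroids) for v in ordered)
    return centroids, wcss


def wcss_curve(values: Sequence[float], k_max: int) -> Dict[int, float]:
    """WCSS for every K from 1 to k_max."""
    return {k: kmeans_1d(values, k)[1] for k in range(1, k_max + 1)}


def pick_elbow(curve: Dict[int, float], min_gain: float = 0.01) -> int:
    """Hypothetical heuristic: the smallest K where adding one more cluster
    reduces WCSS by less than `min_gain` of the K=1 WCSS."""
    total = curve[1]
    ks = sorted(curve)
    if total == 0:
        return ks[0]
    for k in ks[:-1]:
        if (curve[k] - curve[k + 1]) / total < min_gain:
            return k
    return ks[-1]


curve = wcss_curve(SCORES, 6)
best_k = pick_elbow(curve)
for k, wcss in curve.items():
    print(f"K={k}: WCSS={wcss:.4f}")
print("elbow at K =", best_k)
```

On this toy data the curve drops steeply up to K=4 and flattens afterwards, which is the shape the elbow method looks for.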

Key Highlights

Co-authored research project with Tanish Maheshwari (Presidency University)
Lyrics extracted from Billboard Top 100 via Genius API and lyrics-extractor library
VADER sentiment analysis (rule-based NLP) to score song polarity across positive, negative, neutral, and compound dimensions
Neutrality identified as the feature with highest positive covariance with BPM — used as primary clustering feature
K-Means clustering optimised to K=4 clusters via WCSS elbow method: warm-up, intensity, aggressive-1, aggressive-2
Live BPM input via smartwatch mapped to cluster range for real-time song selection
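The final step in the list above, mapping a live BPM reading to a tier and picking a song, could look roughly like this. It is a sketch only: the boundary values in `TIER_BOUNDARIES`, the placeholder song titles, and the function names `tier_for_bpm` and `recommend` are assumptions, since the project's actual cluster ranges are not stated here.

```python
import random
from bisect import bisect_right
from typing import Dict, List, Optional

TIERS = ["warm-up", "intensity", "aggressive-1", "aggressive-2"]

# Hypothetical BPM cut-offs between consecutive tiers (not the project's values).
TIER_BOUNDARIES = [100, 130, 160]

# Hypothetical placeholder titles standing in for the clustered songs.
PLAYLISTS: Dict[str, List[str]] = {
    "warm-up": ["song_a", "song_b"],
    "intensity": ["song_c", "song_d"],
    "aggressive-1": ["song_e", "song_f"],
    "aggressive-2": ["song_g", "song_h"],
}


def tier_for_bpm(bpm: float) -> str:
    """Map a BPM reading to a tier; a reading equal to a cut-off goes to the higher tier."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return TIERS[bisect_right(TIER_BOUNDARIES, bpm)]


def recommend(bpm: float, rng: Optional[random.Random] = None) -> str:
    """Pick one song at random from the tier that matches the BPM reading."""
    rng = rng or random.Random()
    return rng.choice(PLAYLISTS[tier_for_bpm(bpm)])


print(tier_for_bpm(145))
print(recommend(145))
```

Keeping the cut-offs in a sorted list means the lookup stays a single binary search, and the boundaries can be replaced with the fitted cluster ranges without touching the rest of the code.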

Tech Stack

ML

Scikit-learn, K-Means, Hierarchical Clustering

NLP

VADER Sentiment, lyrics-extractor, Genius API

Data

Pandas, NumPy, Matplotlib, Billboard Kaggle dataset

Challenges

Noisy heart rate data from consumer wearables affecting cluster boundary accuracy
Subjectivity of workout intensity — same BPM feels different across fitness levels
Cold start problem: new users have no preference history to anchor recommendations

Key Learnings

Combining physiological signals with content features produces more robust clusters than either alone
VADER's neutrality parameter is a stronger covariate with tempo than positive or negative polarity
Elbow method optimisation is critical: setting K=5 without validation produced noisier clusters
