Song Recommender System
ML-based workout song recommender using BPM and VADER sentiment analysis. Co-authored research that applies K-Means clustering to Billboard Top 100 songs to match them to exercise intensity.
A machine learning-based music recommendation system that generates adaptive playlists based on physiological signals and song sentiment — built as a co-authored research project with Tanish Maheshwari.
The core insight: BPM alone isn't enough to match a song to an exercise state. Running VADER sentiment analysis on the lyrics shows that, of VADER's scores, the neutral score (the neutrality parameter) has the highest positive covariance with BPM. Combining tempo with neutrality gives a more accurate picture of a song's intensity profile than tempo alone.
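To illustrate the covariance comparison described above, here is a minimal sketch that computes the sample covariance between BPM and each of the four VADER scores. The song values are invented for illustration and are not the project's data. The keys `neg`, `neu`, `pos`, and `compound` follow VADER's output naming, and `sample_covariance` is a hypothetical helper.

```python
# Illustrative only: these numbers are made up, not taken from the project's dataset.
songs = [
    {"bpm": 80,  "neg": 0.20, "neu": 0.50, "pos": 0.30, "compound": 0.40},
    {"bpm": 100, "neg": 0.15, "neu": 0.60, "pos": 0.25, "compound": 0.50},
    {"bpm": 120, "neg": 0.10, "neu": 0.70, "pos": 0.20, "compound": 0.30},
    {"bpm": 140, "neg": 0.10, "neu": 0.80, "pos": 0.10, "compound": 0.20},
]


def sample_covariance(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


bpm = [s["bpm"] for s in songs]

# Covariance of each VADER score with BPM across the toy songs.
cov_with_bpm = {
    key: sample_covariance(bpm, [s[key] for s in songs])
    for key in ("neg", "neu", "pos", "compound")
}

# Feature whose covariance with BPM is the largest (most positive).
best_feature = max(cov_with_bpm, key=cov_with_bpm.get)
print(best_feature, round(cov_with_bpm[best_feature], 3))
```

On this toy data the neutral score comes out on top only because the values were chosen that way. The sketch shows the mechanics of the comparison and does not reproduce the research finding.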
Billboard Top 100 lyrics were extracted via the Genius API, sentiment scores were computed using VADER, and K-Means clustering (optimised to K=4 via the elbow method) grouped songs into four workout intensity tiers: warm-up, intensity, aggressive-1, and aggressive-2. The recommender takes the user's live heart rate (in BPM) from a smartwatch, maps it to the matching cluster, and plays a randomly selected song from that tier.
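The final lookup step can be sketched as follows. The tier names come from the project, but the heart-rate boundaries in `TIER_RANGES`, the placeholder song titles, and the function names are all assumptions made for illustration. The real boundaries would come from the fitted clusters.

```python
import random

# Hypothetical heart-rate ranges (beats per minute) for each tier, as [low, high).
# The project names these four tiers but the boundaries here are assumed.
TIER_RANGES = [
    ("warm-up", 0, 100),
    ("intensity", 100, 130),
    ("aggressive-1", 130, 160),
    ("aggressive-2", 160, float("inf")),
]

# Placeholder titles standing in for the clustered Billboard songs.
SONGS_BY_TIER = {
    "warm-up": ["song_a", "song_b"],
    "intensity": ["song_c", "song_d"],
    "aggressive-1": ["song_e", "song_f"],
    "aggressive-2": ["song_g", "song_h"],
}


def tier_for_heart_rate(bpm):
    """Return the name of the tier whose [low, high) range contains bpm."""
    if bpm < 0:
        raise ValueError("heart rate cannot be negative")
    for name, low, high in TIER_RANGES:
        if low <= bpm < high:
            return name
    raise ValueError(f"no tier covers {bpm!r}")


def recommend(bpm, rng=random):
    """Map a heart-rate reading to a tier and pick one song from it at random."""
    tier = tier_for_heart_rate(bpm)
    return tier, rng.choice(SONGS_BY_TIER[tier])


tier, song = recommend(142)
print(tier, song)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is convenient for testing.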
Key Highlights
- Co-authored research project with Tanish Maheshwari (Presidency University)
- Lyrics extracted from the Billboard Top 100 via the Genius API and the lyrics-extractor library
- VADER sentiment analysis (rule-based NLP) to score each song's lyrics on positive, negative, neutral, and compound dimensions
- Neutrality identified as the feature with highest positive covariance with BPM — used as primary clustering feature
- K-Means clustering with K=4 chosen via the WCSS elbow method, giving four tiers: warm-up, intensity, aggressive-1, aggressive-2
- Live BPM input via smartwatch mapped to cluster range for real-time song selection
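The WCSS elbow method mentioned in the highlights can be sketched without dependencies as below. This is a plain Lloyd's-algorithm K-Means written for illustration, not the project's code. The points are invented (tempo, neutrality) pairs assumed to be already scaled to the 0 to 1 range, and `kmeans_wcss` is a hypothetical helper. With scikit-learn, the same quantity is available as the `inertia_` attribute of a fitted `KMeans` model.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wcss(points, k, iterations=50, seed=0):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k data points chosen at random
    for _ in range(iterations):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        new_centers = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                new_centers.append(mean)
            else:
                new_centers.append(centers[i])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Invented (scaled tempo, neutrality) pairs forming four loose groups.
points = [
    (0.10, 0.50), (0.12, 0.52), (0.08, 0.49),
    (0.40, 0.60), (0.42, 0.62), (0.38, 0.59),
    (0.70, 0.72), (0.72, 0.70), (0.68, 0.71),
    (0.95, 0.85), (0.97, 0.86), (0.93, 0.84),
]

# WCSS for K = 1..6; the "elbow" is where the curve stops dropping sharply.
wcss = [kmeans_wcss(points, k) for k in range(1, 7)]
for k, value in enumerate(wcss, start=1):
    print(k, round(value, 4))
```

Reading the elbow off the printed values (or a plot of them) is a judgment call. K-Means is also sensitive to initialisation, so in practice several seeds are usually tried for each K.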
Tech Stack
ML
NLP
Data
Challenges
- Noisy heart rate data from consumer wearables affecting cluster boundary accuracy
- Subjectivity of workout intensity — same BPM feels different across fitness levels
- Cold start problem: new users have no preference history to anchor recommendations
Key Learnings
- Combining physiological signals with content features produces more robust clusters than either alone
- VADER's neutrality parameter covaries more strongly with tempo than the positive or negative polarity scores do
- Elbow method optimisation is critical: fixing K=5 without validation produced noisier clusters