Evaluating Cultural Alignment as a Dose-Response Problem
Most alignment evaluations are binary pass/fail tests. But what if we treat inference shaping like a pharmaceutical trial? Testing a 3-tier prompt gradient on a Qwen 2.5-7B live proxy yielded a perfectly monotonic dose-response curve.
Building an Indian Multilingual LLM: Why I Had to Solve Reliability First
The goal was a multilingual LLM for Indian conversational patterns. Before I could build it properly, I had to figure out whether the model I was building on top of could be trusted at all.
Why Production Debugging Is an Underrated Skill
Most engineers learn to build things. The skill that actually separates good engineers from great ones is knowing what to do when something breaks at 3 AM.
Top 4.1% on Kaggle: Lessons That Transferred to Production
I reached Notebooks Expert, ranked #2,441 (personal best: #707). Here's what 34 notebooks actually taught me, and why almost none of it was about machine learning.