Indian Desi Multilingual LLM — Training Pipeline
End-to-end multilingual LLM training pipeline targeting Hindi/English code-switching. Dataset curation, LoRA fine-tuning, inference evaluation, and deployment packaging across 6 Kaggle notebooks.
Built a complete training pipeline for an Indian multilingual LLM — from raw data curation through LoRA fine-tuning to inference evaluation and deployment packaging.
The goal was to address a real gap: most open-source LLMs handle formal Hindi reasonably well but break down on the casual, code-switched Hindi-English that represents how most Indians actually communicate. The project focuses on building the infrastructure that makes multilingual AI reliable and safe for production.
Recognised mid-project that Sarvam AI had advanced significantly in the same space with more compute and a dedicated research team. Rather than continuing a follower project, pivoted toward the more interesting problem hiding inside the work: building rigorous behavioral reliability evaluation infrastructure for open-source LLMs. That pivot became the LLM Reliability Evaluation Platform.
Key Highlights
- Canonical dataset curated from 3 complementary sources — chatbot dataset for tone, large-scale conversation corpus for diversity, sentence-pair dataset for structural grounding
- Unified schema normalising all sources into a clean, consistent format
- LoRA adapter initialisation on a multilingual encoder-decoder base model
- 6-notebook pipeline: tokenisation → model setup → LoRA init → fine-tuning → inference eval → deployment packaging
- Handles Hindi-English code-switching and Devanagari / Romanised Hinglish script diversity
- Persona safety CI: checkpoint testing against adversarial prompts before deployment
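The unified-schema step in the highlights above could look roughly like the sketch below. Field names (`prompt`, `response`, `source`) and the shapes of the source records are illustrative assumptions, not the project's actual schema:

```python
# Hypothetical sketch of the schema-normalisation step: each of the three
# source formats gets its own adapter that maps raw records into one
# canonical row format before tokenisation.

def normalise_chatbot(rec):
    """Chatbot-style record, assumed shape: {"user": ..., "bot": ...}."""
    return {"prompt": rec["user"].strip(),
            "response": rec["bot"].strip(),
            "source": "chatbot"}

def normalise_pairs(rec):
    """Sentence-pair record, assumed shape: {"en": ..., "hi": ...}."""
    return {"prompt": rec["en"].strip(),
            "response": rec["hi"].strip(),
            "source": "sentence_pairs"}

def build_canonical(records, normaliser):
    """Normalise a batch of raw records, dropping empty rows early."""
    out = []
    for rec in records:
        row = normaliser(rec)
        # Enforce consistency before anything reaches downstream notebooks
        if row["prompt"] and row["response"]:
            out.append(row)
    return out
```

The key design choice is that every source gets its own adapter function and everything downstream sees one format, so adding a fourth corpus never touches the training code.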
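The LoRA adapter initialisation bullet can be illustrated with a minimal numeric sketch of the underlying idea, with NumPy standing in for the real training framework. Dimensions, rank, and the `alpha` scale are illustrative, not the project's settings: a frozen base weight `W` is augmented with a low-rank update `B @ A`, and `B` is zero-initialised so the adapted model starts out identical to the base model.

```python
import numpy as np

d, k, r = 64, 64, 8  # layer dims and LoRA rank (all illustrative)

rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))  # frozen pretrained weight, never updated

# Standard LoRA init: A small random, B zero, so the adapter is a no-op
A = rng.normal(scale=0.01, size=(r, k))
B = np.zeros((d, r))
alpha = 16  # scaling hyperparameter

def adapted_forward(x):
    # Base path plus scaled low-rank update: W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(k,))
# At init the adapter contributes nothing: output matches the base model
assert np.allclose(adapted_forward(x), W @ x)
```

Only `A` and `B` are trained (here 1,024 parameters versus 4,096 in `W`), which is what makes fine-tuning feasible within Kaggle notebook compute limits.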
Tech Stack
- Core
- Fine-tuning
- Evaluation
- Infrastructure
Challenges
- Code-switching detection — Indian speakers mix languages mid-sentence unpredictably
- Script diversity — the same Hindi sentence written in Devanagari vs Romanised Hinglish is treated as two different inputs by the tokeniser
- Evaluation subjectivity — BLEU scores don't capture cultural nuance or conversational naturalness
- Recognising when to pivot: continued investment in the model itself offered little differentiation
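The script-diversity challenge above reduces to a classification problem before any modelling happens. A naive sketch (not the project's actual detector; the 0.8/0.2 thresholds are illustrative) uses the fact that Devanagari occupies the Unicode block U+0900–U+097F:

```python
def classify_script(text):
    """Classify a Hindi/Hinglish string by script — a naive sketch.

    Devanagari occupies U+0900-U+097F; ASCII alphabetic characters are
    treated as Romanised. Real code-switched text mixes both scripts,
    so the ratio matters more than a hard label.
    """
    deva = sum(1 for ch in text if '\u0900' <= ch <= '\u097f')
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = deva + latin
    if total == 0:
        return "unknown"
    ratio = deva / total
    if ratio > 0.8:
        return "devanagari"
    if ratio < 0.2:
        return "romanised"
    return "mixed"
```

A per-utterance label like this lets evaluation be sliced by script, which is how you discover that a model fine on Devanagari input falls apart on Romanised Hinglish.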
Key Learnings
- Building safety layers before scaling inference is the right order of operations
- Pivoting is a research decision, not a failure — finding the more interesting problem matters
- Cultural context matters more than raw accuracy metrics for conversational AI
- Dataset quality and schema consistency have more leverage than model architecture choices at this scale
