If you've ever tried to translate Kashmiri text using Google Translate, you already know the answer: it doesn't support Kashmiri. Not partially. Not poorly. Not at all. As of 2026, Google Translate supports 133 languages — and Kashmiri is not one of them.
As of March 2026, Microsoft Translator, DeepL, and the other major commercial MT systems do not support it either.
Why Don't Major Systems Support Kashmiri?
The main reason is data scarcity. Training a neural machine translation system typically requires millions of parallel sentence pairs. Languages like Spanish, French, or Hindi have billions of sentences available across the web. Kashmiri has only thousands, and most of those are not in clean, machine-readable formats.
Kashmiri is also written in two scripts: Perso-Arabic (right-to-left, typically set in the Nastaliq style and used in most formal contexts) and Devanagari (used in some educational materials). The same sentence can therefore appear in two unrelated Unicode ranges, which complicates tokenization and requires script-aware preprocessing that generic multilingual models don't handle well.
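As a rough illustration of the script-aware preprocessing this implies, the sketch below labels a string by script and applies Unicode NFC normalization before any tokenization. It is not our production pipeline: the function names and labels are hypothetical, it uses only Python's standard library, and it assumes that checking the basic Arabic and Devanagari Unicode blocks is good enough for a first pass.

```python
import unicodedata

# Basic Unicode blocks (assumption: extended Arabic blocks are ignored here).
ARABIC_BLOCK = (0x0600, 0x06FF)
DEVANAGARI_BLOCK = (0x0900, 0x097F)


def detect_script(text: str) -> str:
    """Return 'perso_arabic', 'devanagari', 'mixed', or 'other'."""
    arabic = devanagari = 0
    for ch in text:
        cp = ord(ch)
        if ARABIC_BLOCK[0] <= cp <= ARABIC_BLOCK[1]:
            arabic += 1
        elif DEVANAGARI_BLOCK[0] <= cp <= DEVANAGARI_BLOCK[1]:
            devanagari += 1
    if arabic and devanagari:
        return "mixed"
    if arabic:
        return "perso_arabic"
    if devanagari:
        return "devanagari"
    return "other"


def normalize(text: str) -> str:
    """Compose base letters and combining marks into canonical (NFC) form."""
    return unicodedata.normalize("NFC", text)


# The word "Kashmiri" in each script, written with escapes to avoid
# editor or font issues.
KOSHUR_PERSO_ARABIC = "\u06a9\u0672\u0634\u064f\u0631"
KOSHUR_DEVANAGARI = "\u0915\u0949\u0936\u0941\u0930"

print(detect_script(KOSHUR_PERSO_ARABIC))  # perso_arabic
print(detect_script(KOSHUR_DEVANAGARI))    # devanagari
```

Routing each sentence through a check like this lets a corpus keep the two scripts separate, or tag them, instead of mixing them silently.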
What About Multilingual Models like GPT or mBERT?
Large multilingual models like GPT-4, mBERT, and XLM-RoBERTa include some Kashmiri tokens in their vocabularies, but their coverage is minimal. When tested on Kashmiri translation tasks, these models typically produce:
- Partially correct but fluent-sounding outputs that miss key meaning
- Transliterations rather than actual translations
- Hallucinations — grammatically fluent English that bears no relation to the source
- Complete failures on idiomatic expressions or culturally specific phrases
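One way to get a feel for how thin that coverage can be, at least for byte-level BPE tokenizers such as those used by GPT-family models: when a tokenizer has learned few merges for a script, it falls back toward one token per UTF-8 byte. The sketch below computes bytes per character as a worst-case bound on tokens per character. It does not run any real tokenizer, and the helper name is hypothetical. It also does not apply to WordPiece or SentencePiece vocabularies, which handle unseen text differently.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Worst-case tokens per character for a byte-level BPE tokenizer."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return len("".join(chars).encode("utf-8")) / len(chars)


# "Kashmiri" in Latin, Perso-Arabic, and Devanagari script.
samples = {
    "latin": "Kashmiri",
    "perso_arabic": "\u06a9\u0672\u0634\u064f\u0631",
    "devanagari": "\u0915\u0949\u0936\u0941\u0930",
}

for name, word in samples.items():
    print(name, utf8_bytes_per_char(word))
# latin 1.0
# perso_arabic 2.0
# devanagari 3.0
```

In the worst case, then, a Kashmiri sentence can cost two to three times as many tokens per character as English, which leaves less context for the model to work with.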
Our Approach: Fine-Tuned Models + Human Evaluation
At Kashmir AI Research, instead of relying on zero-shot translation from general models, we are fine-tuning dedicated models on a curated Kashmiri→English parallel corpus and validating their output through structured human evaluation.
Our Methodology
- Corpus Construction — curating parallel sentence pairs from diverse Kashmiri sources
- LoRA Fine-Tuning — adapting multilingual LLMs with parameter-efficient methods
- Native Speaker Evaluation — real humans judge translation adequacy and fluency
- Pairwise Benchmarking — comparing fine-tuned models against baseline systems
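To make the last two steps concrete, here is a minimal sketch of how pairwise judgments from native speakers could be summarized. It assumes each evaluator sees two translations of the same sentence (system A and system B) and answers "A", "B", or "tie". The function names, the label format, and the choice of an exact sign test are illustrative assumptions, not a description of our final evaluation protocol.

```python
from collections import Counter
from math import comb


def win_rate(judgments):
    """Summarize pairwise judgments given as 'A', 'B', or 'tie'."""
    counts = Counter(judgments)
    wins_a, wins_b, ties = counts["A"], counts["B"], counts["tie"]
    decided = wins_a + wins_b
    # Ties are excluded from the rate; 0.5 means no evidence either way.
    rate = wins_a / decided if decided else 0.5
    return wins_a, wins_b, ties, rate


def sign_test_p(wins_a, wins_b):
    """Two-sided exact sign test p-value, ignoring ties."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical judgments: A = fine-tuned model, B = baseline system.
judgments = ["A"] * 8 + ["B"] * 2 + ["tie"] * 2
wins_a, wins_b, ties, rate = win_rate(judgments)
print(wins_a, wins_b, ties, rate)    # 8 2 2 0.8
print(sign_test_p(wins_a, wins_b))   # 0.109375
```

With only ten decided comparisons, an 80% win rate is still not significant at the usual 0.05 level, which is why the number of evaluators and rated sentences matters.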
The result will be the first comprehensive Kashmiri machine translation benchmark — and the first public dataset that allows any researcher to replicate and build on our work.
Help Fix This Problem
If you're a native Kashmiri speaker, you can directly help improve AI translation quality. Even 10–15 minutes of your time makes a difference.
Start Evaluating →