Investigating LLM fine-tuning for Kashmiri→English translation
Faizan Ayoub · KashmirAI Research · Paper forthcoming
KashmirAI Research focuses on low-resource machine translation for the Kashmiri language. Our approach combines LoRA (Low-Rank Adaptation) fine-tuning of large language models (LLMs), a curated Kashmiri-English parallel corpus, and a human evaluation framework in which native speakers validate AI-generated translations to improve linguistic accuracy.
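LoRA keeps a pretrained weight matrix W frozen and learns only a low-rank update. An adapted layer computes h = W x + (alpha / r) * B A x, where A has shape r x d_in, B has shape d_out x r, and the rank r is much smaller than either dimension. The pure-Python sketch below illustrates that idea on a toy layer. The rank, alpha, and initialization scale are illustrative assumptions, not this project's settings. A real fine-tuning run would normally use a framework implementation (for example, Hugging Face PEFT) rather than code like this.

```python
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """A frozen linear layer W plus a trainable low-rank update B @ A.

    Illustrative sketch only: rank, alpha, and init scale are assumptions.
    """

    def __init__(self, weight: Matrix, rank: int = 8, alpha: float = 16.0, seed: int = 0):
        self.weight = weight                  # frozen pretrained weight, d_out x d_in
        self.d_out, self.d_in = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank             # LoRA scaling factor alpha / r
        rng = random.Random(seed)
        # A (r x d_in) gets a small random init. B (d_out x r) starts at zero,
        # so the adapted layer initially behaves exactly like the frozen one.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(self.d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)                # W x
        delta = matvec(self.b, matvec(self.a, x))    # B (A x)
        return [h + self.scale * d for h, d in zip(base, delta)]

    def trainable_params(self) -> int:
        # Only A and B are trained: r * d_in + d_out * r parameters.
        return self.rank * (self.d_in + self.d_out)


def lora_param_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Trainable LoRA parameters as a share of the full d_out x d_in matrix."""
    return rank * (d_in + d_out) / (d_in * d_out)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=2, alpha=4.0)
    print(layer.forward([1.0, 1.0]))            # [3.0, 7.0]: B is zero, so output equals W x
    print(lora_param_fraction(4096, 4096, 16))  # 0.0078125, i.e. under 1% of the matrix
```

Initializing B to zero is the standard LoRA choice. Training therefore starts from the unchanged base model, and the adapter learns only the correction needed for the new task.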
This work investigates LLM-based machine translation for Kashmiri (کٲشُر), a low-resource and endangered language spoken by over 7 million people in the Kashmir Valley. We are building tools and models that translate between Kashmiri and English, with the goal of advancing language technology for this underserved language.
The research is currently in progress. Detailed methodology, results, and analysis will be shared when the associated conference paper is published. For further details, please use the contact information below.
Kashmiri (ISO 639-3: kas) is a Dardic language of the Indo-Aryan family, primarily spoken in the Kashmir Valley of Jammu & Kashmir, India. It is written in the Perso-Arabic script (Nastaliq), which poses unique challenges for NLP systems designed primarily for Latin-script or Devanagari-script languages.
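Python's standard `unicodedata` module can show one concrete source of difficulty. The endonym کٲشُر contains U+0672 (ARABIC LETTER ALEF WITH WAVY HAMZA ABOVE), a letter outside the core Arabic alphabet, and a combining vowel mark (DAMMA). Tokenizers and normalizers tuned for other scripts may mishandle characters like these. The sketch below inspects the word and applies a diacritic-stripping step that is common in Arabic-script preprocessing. That step is included only as an illustration and is not claimed to be part of this project's pipeline. For Kashmiri it can discard vowel information.

```python
import unicodedata

# The endonym کٲشُر ("Kashmiri") written with explicit code points.
WORD = "\u06a9\u0672\u0634\u064f\u0631"


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


def strip_marks(text: str) -> str:
    """Drop nonspacing marks (category Mn), such as vowel diacritics.

    Common for Arabic-script text, but lossy for Kashmiri, where such
    marks can carry vowel information.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    for row in describe(WORD):
        print(*row)
    print(len(WORD), len(strip_marks(WORD)))  # 5 code points, 4 after stripping marks
```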
Although Kashmiri is an official language of the Union Territory of Jammu & Kashmir and is spoken by millions, it remains severely underserved in the NLP ecosystem, with minimal digital linguistic resources available.
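Low-resource machine translation remains one of the hardest problems in NLP. Models trained on insufficient data exhibit well-documented failure modes, such as hallucinated content, source text copied untranslated into the output, and output in the wrong target language. Our work aims to study these challenges specifically for Kashmiri.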
Aggregating and quality-filtering Kashmiri-English parallel data from multiple open sources.
Exploring how modern large language models can be adapted for Kashmiri translation tasks.
Building a community-driven evaluation platform with native Kashmiri speakers.
Comparing our approach against existing multilingual translation models.
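The data-curation step above, aggregating and quality-filtering parallel data, usually relies on a few cheap heuristics. The sketch below applies three of them: whitespace-normalized deduplication, a word-count ratio check to catch misaligned pairs, and a script check so that the Kashmiri side is mainly Perso-Arabic and the English side mainly Latin. The `Pair` type, `filter_pairs` function, and every threshold are hypothetical choices made for illustration, not the filters used in this work.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    kas: str  # Kashmiri side, expected in Perso-Arabic script
    eng: str  # English side, expected in Latin script


def script_share(text: str, prefix: str) -> float:
    """Fraction of letters whose Unicode name starts with prefix (e.g. "ARABIC")."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(unicodedata.name(ch, "").startswith(prefix) for ch in letters) / len(letters)


def filter_pairs(
    pairs: list[Pair],
    max_len_ratio: float = 3.0,     # hypothetical threshold
    min_script_share: float = 0.8,  # hypothetical threshold
    max_words: int = 200,           # hypothetical threshold
) -> list[Pair]:
    seen: set[tuple[str, str]] = set()
    kept: list[Pair] = []
    for pair in pairs:
        # Collapse runs of whitespace so near-identical copies deduplicate.
        kas, eng = " ".join(pair.kas.split()), " ".join(pair.eng.split())
        if not kas or not eng:
            continue  # one side is empty
        key = (kas, eng.lower())
        if key in seen:
            continue  # duplicate of a pair already kept
        n_kas, n_eng = len(kas.split()), len(eng.split())
        if max(n_kas, n_eng) > max_words:
            continue  # overly long segment
        if max(n_kas, n_eng) / min(n_kas, n_eng) > max_len_ratio:
            continue  # word counts too different: likely misaligned
        if script_share(kas, "ARABIC") < min_script_share:
            continue  # source side is not mainly Perso-Arabic script
        if script_share(eng, "LATIN") < min_script_share:
            continue  # target side is not mainly Latin script
        seen.add(key)
        kept.append(Pair(kas, eng))
    return kept
```

Heuristics like these are cheap enough to run over every candidate pair before any more expensive filtering, such as model-based scoring.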
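Comparing against existing multilingual models needs at least one automatic metric alongside human judgments. This page does not say which metrics the study reports, so the choice of chrF here is an assumption. chrF is a character n-gram F-score that is often preferred over word-level metrics for morphologically rich, low-resource languages. The sketch below is a simplified sentence-level version: it averages character n-gram precision and recall over n = 1 to 6 and combines them with an F-beta score (beta = 2). It is not guaranteed to reproduce the scores of a standard implementation such as sacrebleu, which a real evaluation should use.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-gram counts with whitespace removed (a common chrF default)."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale.

    Averages character n-gram precision and recall over n = 1..max_n, then
    combines them with an F-beta score (beta = 2 weights recall more heavily).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # a string is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # matches clipped to the smaller count
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(chrf("the cat", "the cat"))  # 100.0 for an exact match
    print(chrf("the cat sat", "the cat sits"))
```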
A core component of this research is human evaluation by native Kashmiri speakers. We built the KashmirAI Research platform specifically for this purpose. Evaluators rate translations from anonymized systems on:
Full results, metrics, and analysis will be published in an upcoming conference paper.
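The human-evaluation setup described above, in which native speakers rate outputs from anonymized systems, depends on hiding system identities and varying presentation order. The sketch below shows one way to do that. For each source sentence, it shuffles the systems, shows outputs only under opaque labels (S1, S2, ...), and keeps a private key so that ratings can be mapped back to systems and averaged later. The function names, the single numeric score, and the label scheme are hypothetical. The platform's actual rating criteria are not listed on this page.

```python
import random
import statistics
from collections import defaultdict

Ratings = list[tuple[int, str, float]]  # (item id, anonymized label, score)


def anonymize(outputs: dict[str, str], rng: random.Random) -> tuple[list[tuple[str, str]], dict[str, str]]:
    """Hide system names behind labels and shuffle their order for one item.

    outputs maps system name -> translation of one source sentence. Returns the
    (label, translation) list shown to the evaluator and the private
    label -> system key used afterwards to de-anonymize ratings.
    """
    systems = list(outputs)
    rng.shuffle(systems)  # fresh random order for every item
    key = {f"S{i + 1}": name for i, name in enumerate(systems)}
    shown = [(label, outputs[name]) for label, name in key.items()]
    return shown, key


def aggregate(ratings: Ratings, keys: dict[int, dict[str, str]]) -> dict[str, float]:
    """Map labels back to systems and average each system's scores."""
    by_system: dict[str, list[float]] = defaultdict(list)
    for item_id, label, score in ratings:
        by_system[keys[item_id][label]].append(score)
    return {system: statistics.mean(scores) for system, scores in by_system.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    outputs = {"system_a": "translation from A", "system_b": "translation from B"}
    shown, key = anonymize(outputs, rng)
    print(shown)  # the evaluator sees only S1/S2 labels
    # Hypothetical scores an evaluator might enter for item 0.
    ratings = [(0, label, 4.0 if key[label] == "system_b" else 3.0) for label, _ in shown]
    print(aggregate(ratings, {0: key}))
```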
For inquiries about this research, collaboration opportunities, or early access to findings:
📱 +91 7006718915
Faizan Ayoub — Lead Researcher, KashmirAI Research