DEVELOPMENT AND OPTIMIZATION OF COMPACT LANGUAGE MODELS (SLMs) FOR AUTONOMOUS OPERATION ON MOBILE DEVICES
Abstract
This paper examines the shift from cloud-based inference to local execution of artificial intelligence on end-user hardware (on-device AI). The primary focus is on Small Language Models (SLMs) with 1 to 3 billion parameters, which can deliver task performance comparable to far larger LLMs while fitting the memory and compute budgets of mobile hardware. Optimization techniques such as 4-bit quantization, knowledge distillation, and Low-Rank Adaptation (LoRA) are examined. As a result, the paper proposes an architecture optimized for mobile processors with NPU accelerators that provides fast text generation at low power consumption while keeping all user data on the device.
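
Two minimal sketches below illustrate the optimization techniques named in the abstract. They use plain NumPy rather than the paper's actual pipeline; the group size, adapter rank, and scaling constants are illustrative assumptions, not values taken from the paper.

First, symmetric per-group 4-bit weight quantization: each group of weights shares one floating-point scale, and the weights are rounded to the 16 integer levels representable in 4 bits (the packing of two codes per byte is omitted for clarity).

import numpy as np

def quantize_int4(weights, group_size=64):
    # Symmetric per-group quantization: one float scale per group,
    # integer codes clipped to the int4 range [-8, 7].
    # group_size=64 is an illustrative choice, not from the paper.
    flat = weights.reshape(-1, group_size)
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0 + 1e-12
    codes = np.clip(np.round(flat / scales), -8, 7).astype(np.int8)
    return codes, scales

def dequantize_int4(codes, scales, shape):
    # Reconstruct approximate float weights for inference.
    return (codes.astype(np.float32) * scales).reshape(shape)

w = np.random.randn(2048, 2048).astype(np.float32)
codes, scales = quantize_int4(w)
w_hat = dequantize_int4(codes, scales, w.shape)
print("mean abs reconstruction error:", float(np.abs(w - w_hat).mean()))

Second, the LoRA forward pass: the frozen base weight W is augmented with a low-rank update B @ A scaled by alpha / r, so only the small matrices A and B are trained during adaptation.

def lora_forward(x, W, A, B, alpha=16.0):
    # x: (batch, d_in); W: (d_out, d_in) frozen base weight;
    # A: (r, d_in) and B: (d_out, r) are the trainable low-rank factors.
    # alpha is an illustrative default scaling constant.
    r = A.shape[0]
    return x @ W.T + ((x @ A.T) @ B.T) * (alpha / r)

Both techniques reduce the volume of weights that must be stored and moved per token, which is the binding constraint for SLM inference on NPU-equipped mobile processors.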