One-Hot Encoding and Bag-of-Words Methods in Processing the Uzbek Language Corpus Texts

EasyChair Preprint no. 11048

6 pages. Date: October 9, 2023

Abstract

Computers are designed to process information in numerical form, but data is not always numerical. This article describes how to process data in the form of characters, words, and text, and how two methods for teaching a computer to process natural language, one-hot encoding and bag-of-words (BoW), apply to the Uzbek language. How do Alexa, Google Home, and many other "smart" assistants understand and respond to our speech today? The article presents approaches to processing Uzbek language corpus texts with one-hot encoding and bag-of-words, both of which belong to the field of artificial intelligence known as natural language processing.
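
The two methods named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the sample Uzbek sentences, the function names, and the whitespace tokenization are all assumptions made for illustration, and a real Uzbek pipeline would likely need morphological analysis, since one stem takes many suffixed forms.

```python
from collections import Counter

def build_vocabulary(sentences):
    """Assign an index to each distinct lowercase token, in sorted order."""
    tokens = {tok for s in sentences for tok in s.lower().split()}
    return {tok: i for i, tok in enumerate(sorted(tokens))}

def one_hot(token, vocab):
    """Vector of zeros with a single 1 at the token's vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec

def bag_of_words(sentence, vocab):
    """Vector of token counts; word order is discarded, unknown tokens skipped."""
    vec = [0] * len(vocab)
    for tok, count in Counter(sentence.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec

# Hypothetical two-sentence corpus (illustrative only, not from the paper).
corpus = ["men kitob oldim", "u kitob va daftar oldi"]
vocab = build_vocabulary(corpus)

print(vocab)                                   # token -> index mapping
print(one_hot("kitob", vocab))                 # one word as a vector
print(bag_of_words("kitob va kitob", vocab))   # one sentence as counts
```

One-hot encoding represents a single word, so its vector always sums to 1, whereas the bag-of-words vector represents a whole sentence and records how often each vocabulary word occurs. Both vectors have the same length as the vocabulary, which is why they grow quickly on a large corpus.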

Keyphrases: bag of words, one-hot encoding, text processing, Uzbek language corpus

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:11048,
  author = {Botir Elov and Shahlo Hamroyeva and Noila Matyakubova and Umidjon Yodgorov},
  title = {One-Hot Encoding and Bag-of-Words Methods in Processing the Uzbek Language Corpus Texts},
  howpublished = {EasyChair Preprint no. 11048},

  year = {EasyChair, 2023}}