Urarina Field Note Restoration Using Machine Learning and Game Play

Zhengyuan Feng; Michael Dorin

doi:10.1007/978-3-031-91481-2_1

Book chapter

Extended Reality and Serious Games for Education, Competitiveness, and Wellbeing, pp.3-17

Information Systems Engineering and Management, Springer Nature Switzerland

2025

DOI: https://doi.org/10.1007/978-3-031-91481-2_1

Keywords: Games; Google Cloud Vision; Optical Character Recognition; Urarina Language

Abstract

Before portable electronics became commonly available, researching endangered languages required recording voices and writing lexicon cards. A lexicon card describes a word and gives phonetic symbols depicting its pronunciation. In a typical study, multiple researchers with different handwriting styles may produce these cards. The variety of writing styles and the assortment of symbols used often make optical character recognition difficult. This research addresses these data capture challenges with a multi-phase process for accurately digitizing handwritten lexicon cards. First, lexicon cards were scanned into images and submitted to Google Cloud Vision for processing. Google Cloud Vision returned the recognized characters along with bounding boxes denoting the locations of all text on the cards. Next, deep learning was employed to decode the phonetic symbols. Images of these symbols were extracted manually using the bounding boxes provided by Google Cloud Vision. A convolutional neural network-based application then processed the images and stored the most promising prediction of the symbol matching each image. Because the automated processes up to this point were not 100% accurate, a further step involved human review and editing. Manual editing is widely regarded as tedious and error-prone, so this step alone also did not meet the accuracy goals. An essential final step was therefore creating a game to encourage review of the digital results. The game not only encourages an additional review but also provides practice and training to linguists studying the language. Through this process, the digitization of lexicon cards reached 100% accuracy. This new approach can significantly help revitalize dormant language studies.
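
As an illustration only, the bounding-box and prediction steps described in the abstract might be sketched as follows. This is not the chapter's implementation: the function names, the list-of-rows image format, and the score dictionary are hypothetical, and the sketch assumes each bounding box arrives as a list of (x, y) pixel vertices treated as inclusive coordinates. It calls neither Google Cloud Vision nor a neural network; it only shows how a box could be turned into a cropped symbol image and how the highest-scoring label could be kept.

```python
from typing import Dict, List, Sequence, Tuple

Vertex = Tuple[int, int]   # (x, y) pixel coordinates; assumed vertex format
Image = List[List[int]]    # grayscale rows; hypothetical in-memory image format


def bounding_rect(vertices: Sequence[Vertex]) -> Tuple[int, int, int, int]:
    """Return the axis-aligned (left, top, right, bottom) rectangle enclosing
    the vertices. Right and bottom are exclusive; vertices count as inclusive."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def crop(image: Image, rect: Tuple[int, int, int, int]) -> Image:
    """Cut the rectangle out of the image, clamping it to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = rect
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, width), min(bottom, height)
    return [row[left:right] for row in image[top:bottom]]


def top_prediction(scores: Dict[str, float]) -> Tuple[str, float]:
    """Keep only the most promising (label, score) pair from classifier scores."""
    if not scores:
        raise ValueError("no scores to choose from")
    return max(scores.items(), key=lambda item: item[1])


# Hypothetical 4x5 image whose pixel value encodes its row and column.
card = [[row * 10 + col for col in range(5)] for row in range(4)]
box = [(1, 1), (3, 1), (3, 2), (1, 2)]   # made-up bounding-box vertices
symbol_image = crop(card, bounding_rect(box))
best_label, best_score = top_prediction({"ɨ": 0.2, "ʉ": 0.7, "ə": 0.1})
```

In a real pipeline the cropped image would be passed to the trained network, and the scores would come from its output layer rather than from a hand-written dictionary.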

Details

Title: Urarina Field Note Restoration Using Machine Learning and Game Play
Author/Creator: Zhengyuan Feng; Michael Dorin
Contributors: Franci Suni-Lopez; Elvira G. Rincón Flores; Hernan Alejandro Quintana Cruz; Eunice Pereira dos Santos Nunes
Publication Details: Extended Reality and Serious Games for Education, Competitiveness, and Wellbeing, pp.3-17
Series: Information Systems Engineering and Management
Publisher: Springer Nature Switzerland; Cham
Academic Unit: Software Engineering and Data Science; School of Engineering
Language: English
Resource Type: Book chapter
Record Identifier: 991015317911303691
