Bridging Modern AI and Historical Language: A Study on Large Language Model Adaptation to 1920s–30s Korean
Abstract: Large Language Models (LLMs) are trained primarily on modern texts, which limits their ability to process historical language. This study investigates methods for adapting LLMs to historical Korean, focusing on the literary domain of the 1920s–30s. We construct a novel dataset of Korean literature from this period and introduce a sentence-final ending prediction task to evaluate adaptation to historical language. Our results show that targeted exposure to historical text improves an LLM's ability to generate era-specific linguistic patterns while maintaining stability in longer contexts. The findings offer insight into the broader challenge of modeling diachronic language variation and highlight the potential of historical text adaptation techniques for computational humanities research.
Paper Type: Short
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Historical Text Processing, Korean NLP
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: Korean
Submission Number: 8471