Bridging Modern AI and Historical Language: A Study on Large Language Model Adaptation to 1920s–30s Korean

ACL ARR 2025 February Submission8471 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: Large Language Models (LLMs) are trained primarily on modern texts, which limits their ability to process historical language. This study investigates methods for adapting LLMs to historical Korean, focusing on the literary domain of the 1920s–30s. We construct a novel dataset of Korean literature from this period and introduce a sentence-final ending prediction task to evaluate historical linguistic adaptation. Our results show that exposing LLMs to targeted historical text improves their ability to generate era-specific linguistic patterns while maintaining stability in longer contexts. The findings offer insight into the broader challenge of modeling diachronic language variation and highlight the potential of historical text adaptation techniques for computational humanities research.

Paper Type: Short

Research Area: Computational Social Science and Cultural Analytics

Research Area Keywords: Historical Text Processing, Korean NLP

Contribution Types: NLP engineering experiment, Data resources, Data analysis

Languages Studied: Korean

Submission Number: 8471
