Bridging behavioral gaps: automatic extrapolation of concreteness norms in Arabic using English-tuned KNN approaches
Abstract
This thesis addresses the automatic extrapolation of concreteness norms for nouns in Modern Standard Arabic and English. The main goal is to enable reliable computational estimation of how concrete or abstract a word is (e.g., “apple” vs. “justice”), in support of applications in psycholinguistics and natural language processing. To this end, a novel dataset of 202 Arabic nouns rated for concreteness is introduced and aligned with established English norms. To predict concreteness, the study compares K-Nearest Neighbors (KNN) regression models built on FastText and transformer-based embeddings with predictions from ChatGPT. On held-out test sets, the KNN models achieve high accuracy: Spearman ρ = 0.92 (RMSE = 0.43) for English and ρ = 0.83 (RMSE = 0.69) for Arabic. By contrast, ChatGPT predictions, while consistent across runs, yield lower correlations (ρ = 0.80 for both English and Arabic) and higher RMSE values, indicating that KNN remains the more accurate approach for concreteness estimation.
Keywords: concreteness norms; abstractness; Arabic; English; K-Nearest Neighbors; word embeddings; FastText; transformer models; ChatGPT; lexical semantics; psycholinguistics; norm extrapolation
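For readers unfamiliar with the general technique, the following is a minimal sketch of KNN regression over word embeddings, scored with Spearman ρ and RMSE as in the abstract. It is not the thesis implementation: the function names, the similarity-weighted averaging, the choice of k, and the tiny two-dimensional vectors and ratings are all assumptions made for illustration. In practice the vectors would come from a pretrained embedding model such as FastText.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(query, train_vecs, train_ratings, k=3):
    """Predict a rating as the similarity-weighted mean of the k nearest neighbors.
    The weighting scheme is an assumption, not taken from the thesis."""
    scored = sorted(
        ((cosine(query, v), r) for v, r in zip(train_vecs, train_ratings)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(max(s, 0.0) for s, _ in scored)
    if total == 0.0:
        # Fall back to an unweighted mean when no neighbor is positively similar.
        return sum(r for _, r in scored) / len(scored)
    return sum(max(s, 0.0) * r for s, r in scored) / total

def rmse(pred, gold):
    """Root mean squared error between predictions and gold ratings."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical toy data: 2-D stand-ins for embeddings, with made-up ratings
# on an assumed 1-5 scale (higher = more concrete).
train_vecs = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8]]
train_ratings = [4.8, 4.5, 1.6, 1.9]
test_vecs = [[0.85, 0.2], [0.15, 0.85], [0.5, 0.5]]
test_gold = [4.6, 1.7, 3.0]

predictions = [knn_predict(v, train_vecs, train_ratings, k=2) for v in test_vecs]
print("Spearman rho:", spearman(predictions, test_gold))
print("RMSE:", rmse(predictions, test_gold))
```

Cross-lingual extrapolation of the kind described above would additionally require aligned or multilingual embeddings, which this sketch does not attempt to model.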