🎃 Valiukas


language model


LLMs are trained to predict words from massive datasets of text from the internet. They typically contain billions of parameters that are jointly optimized—at great computational, energy, and financial expense (Bender et al., 2021)—to make predictions about word occurrence from surrounding context. This setup differs from human language acquisition in data scale and format. On the other hand, the core objective of word prediction is a central piece of human language processing (e.g. Altmann and Kamide, 1999; Hale, 2001; Levy, 2008) and has long been shown capable of providing a learning signal from which linguistic structures and semantic categories can emerge (Elman, 1990).

–Piantadosi & Hill (2022)
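The quote's core claim is that bare next-word prediction is a learning signal from which structure can emerge. As a toy illustration of that objective (not the authors' method, and nothing like an actual LLM's billions of jointly optimized parameters), here is a minimal bigram count model that predicts the next word from the previous one; the corpus and function names are invented for this sketch:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "massive datasets of text from the internet".
corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each context word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(context_word):
    """Return the most frequent next word after context_word, or None."""
    counts = follows[context_word]
    return counts.most_common(1)[0][0] if counts else None

print(predict("the"))  # "cat" follows "the" twice, "mat" once -> cat
```

Even this crude counting scheme recovers a sliver of distributional structure ("the" is followed by nouns); an LLM replaces the count table with a neural network trained on the same prediction objective at vastly larger scale.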