Anchor Function: A Type of Benchmark Functions for Studying Language Models

Zhongwang Zhang; Zhiwei Wang; Junjie Yao; Zhangchen Zhou; Xiaolong Li; Weinan E; Zhi-Qin John Xu

doi:10.4208/jml.250723

Abstract

Understanding transformer-based language models is becoming increasingly crucial, particularly as they play pivotal roles in advancing towards artificial general intelligence. However, language model research faces significant challenges, especially for academic research groups with constrained resources. These challenges include complex data structures, unknown target functions, high computational costs and memory requirements, and a lack of interpretability in the inference process.

Drawing a parallel to the use of simple models in scientific research, we propose the concept of an anchor function. This is a type of benchmark function designed for studying language models in learning tasks that follow an "anchor-key" pattern. Using the concept of an anchor function, we can construct a series of functions to simulate various language tasks. The anchor function plays a role analogous to that of mice in diabetes research, which makes it particularly suitable for academic research.

We demonstrate the utility of the anchor function with an example, revealing two basic operations performed by attention structures in language models: shifting tokens and broadcasting one token from one position to many positions. These operations are also commonly observed in large language models. The anchor function framework therefore opens up a series of valuable and accessible research questions for further exploration, especially for theoretical study.

Keywords:

Language models; Transformer; Anchor function; Machine learning theory

Author Biographies

Zhongwang Zhang

Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, China

Zhiwei Wang

Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, China

Junjie Yao

Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, China

Zhangchen Zhou

Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, China

Xiaolong Li

Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, China

Weinan E

Beijing International Center for Mathematical Research (BICMR), Peking University, Beijing 100871, China

Zhi-Qin John Xu

Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
