Friday, February 10, 2023, 9:00–10:30 a.m. EST, 127 Moore Building and virtually via Zoom
Dr. Shomir Wilson
Assistant Professor and Director of the Human Language Technologies Lab
in the College of Information Sciences and Technology at Penn State
“Sociodemographic Biases in Natural Language Processing: Two Case Studies”
Large language models (LLMs) are widely used in natural language processing (NLP) to
obtain high performance on a variety of tasks. However, the large corpora used to train
these models contain sociodemographic biases, and LLMs tend to inherit those biases, with
potentially harmful results. Shomir Wilson will present two case studies that reveal the
sociodemographic biases of select LLMs within the context of sentiment analysis, a
common NLP task. The first study shows that Word2Vec and GloVe exhibit negative
sentiment bias toward terms for people with disabilities. The second study shows that
GPT-2 exhibits a range of sentiment biases for nationality demonyms, i.e., words that specify
national origins. Shomir will conclude with some thoughts on the significance of these
biases and the challenges of mitigating or eliminating them.