Learning Human Preferences: From Clicks to Conversations

By Suryanarayana Sankagiri, EPFL, Switzerland

Seminar Hall 51, 4th floor, Main Building

Abstract 

People routinely reveal their preferences online, e.g., when choosing
search results, videos, or products. Such data is used by algorithms to learn human
tastes. Recently, curated datasets of human preferences have been used to fine-
tune language models, substantially improving their alignment with human intent.
These successes raise a natural question: can recommender systems learn more
effectively from comparisons than from ratings? The talk will trace a path from basic
models of choice behaviour to new frameworks for recommender systems. The main
focus will be on our theoretical result showing that personalised recommendations
can be learned efficiently from comparison data, despite the underlying optimisation
problem being nonconvex. I will then describe a bandit formulation that addresses
the classical exploration-exploitation trade-off in a novel way. Finally, I will share
empirical insights motivating richer models of human choice. I will conclude by
arguing that learning from human preferences is key to building interactive AI
systems that reliably serve human needs.