2025 Recap

In 2025, I visited ACL in Austria, EMNLP in China, and INLG in Vietnam, and finished my PhD. Most of the papers that really excited me this year were published at the first two conferences, but I also continue to dig the machine learning conferences (especially NeurIPS and ICLR) and the HCI conferences (CHI, FAccT) a lot.

For 2026, I’m directing most of my attention towards EMNLP in Budapest, since it’s close by, it’s my “legacy” conference (2021 in Punta Cana was my very first in-person one), and it continues to be the venue with the highest number of papers relevant to my research.

My top 5 papers are listed below in the “Spotlight” section, followed by a topically arranged list of other favorites.

Please also keep in mind that this entire list is very subjective and comes from a heavily euro/western/WEIRD perspective.


Spotlight

ELI-Why: Evaluating the Pedagogical Utility of Language Model Explanations
Brihi Joshi, Keyu He, Sahana Ramnath, Sadra Sabouri, Kaitlyn Zhou, Souti Chattopadhyay, Swabha Swayamdipta, Xiang Ren
ACL 2025 Findings
doi

Multi-Domain Explainability of Preferences
Nitay Calderon, Liat Ein-Dor, Roi Reichart
EMNLP 2025
doi

The Medium Is Not the Message: Deconfounding Document Embeddings via Linear Concept Erasure
Yu Fan, Yang Tian, Shauli Ravfogel, Mrinmaya Sachan, Elliott Ash, Alexander Hoyle
EMNLP 2025
doi

A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users
Nishant Balepur, Matthew Shu, Yoo Yeon Sung, Seraphina Goldfarb-Tarrant, Shi Feng, Fumeng Yang, Rachel Rudinger, Jordan Boyd-Graber
EMNLP 2025
doi

Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Liwei Jiang, Yuanjun Chai, Margaret Li, Mickel Liu, Raymond Fok, Nouha Dziri, Yulia Tsvetkov, Maarten Sap, Yejin Choi
NeurIPS 2025 Datasets and Benchmarks
OpenReview


Human-centric XAI

One line of work I often publish in concerns the effect of explanations and NLP interpretability on human observers, frequently with explainable fact checking as a use case. The niche of dialogue-based XAI also stays relevant.

Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies
Sunnie S. Y. Kim, Jennifer Wortman Vaughan, Q. Vera Liao, Tania Lombrozo, Olga Russakovsky
CHI 2025
doi

Don’t be Fooled: The Misinformation Effect of Explanations in Human-AI Collaboration
Philipp Spitzer, Joshua Holstein, Katelyn Morrison, Kenneth Holstein, Gerhard Satzger, Niklas KĂŒhl
IJHCI 2025
doi

Contrastive Explanations That Anticipate Human Misconceptions Can Improve Human Decision-Making Skills
Zana Buçinca, Siddharth Swaroop, Amanda E. Paluch, Finale Doshi-Velez, Krzysztof Z. Gajos
CHI 2025
doi

Is Conversational XAI All You Need? Human-AI Decision Making With a Conversational XAI Assistant
Gaole He, Nilay Aishwarya, Ujwal Gadiraju
IUI 2025
doi

Show Me the Work: Fact-Checkers’ Requirements for Explainable Automated Fact-Checking
Greta Warren, Irina Shklovski, Isabelle Augenstein
CHI 2025
doi

Prompting in the Dark: Assessing Human Performance in Prompt Engineering for Data Labeling When Gold Labels Are Absent
Zeyu He, Saniya Naphade, Ting-Hao (Kenneth) Huang
CHI 2025
doi

Investigating Co-Constructive Behavior of Large Language Models in Explanation Dialogues
Leandra Fichtel, Maximilian Spliethöver, Eyke HĂŒllermeier, Patricia Jimenez, Nils Klowait, Stefan Kopp, Axel-Cyrille Ngonga Ngomo, Amelie Robrecht, Ingrid Scharlau, Lutz Terfloth, Anna-Lisa Vollmer, Henning Wachsmuth
SIGDIAL 2025
Proceedings

To Rely or Not to Rely? Evaluating Interventions for Appropriate Reliance on Large Language Models
Jessica Y. Bo, Sophia Wan, Ashton Anderson
CHI 2025
doi

Understanding the Effects of Explaining Predictive but Unintuitive Features in Human-XAI Interaction
Jiaming Qu, Jaime Arguello, Yue Wang
FAccT 2025
doi

Deceptive Explanations by Large Language Models Lead People to Change their Beliefs About Misinformation More Often than Honest Explanations
Valdemar Danry, Pat Pataranutaporn, Matthew Groh, Ziv Epstein
CHI 2025
doi

Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review
Rock Yuren Pang, Hope Schroeder, Kynnedy Simone Smith, Solon Barocas, Ziang Xiao, Emily Tseng, Danielle Bragg
CHI 2025
doi

Mechanistic Interpretability

Since January last year, I’ve been following a lot of what has happened in the MechInterp community. My own contributions this year were co-authoring the PRISM paper and a survey paper on concept descriptions. I still have a few gripes with the field, but thanks to the hype there are plenty of exciting questions being investigated.

Circuit Tracing: Revealing Computational Graphs in Language Models
Emmanuel Ameisen, Jack Lindsey, Adam Pearce, Wes Gurnee, Nicholas L. Turner, Brian Chen, Craig Citro, David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton, Trenton Bricken, Callum McDougall, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson, Sam Zimmerman, Kelley Rivoire, Thomas Conerly, Chris Olah, Joshua Batson
Transformer Circuits 2025
Blog

Inferring Functionality of Attention Heads from their Parameters
Amit Elhelo, Mor Geva
ACL 2025
ACL Anthology

ConSim: Measuring Concept-Based Explanations’ Effectiveness with Automated Simulatability
Antonin Poché, Alon Jacovi, Agustin Martin Picard, Victor Boutin, Fanny Jourdan
ACL 2025
ACL Anthology

Enhancing Automated Interpretability with Output-Centric Feature Descriptions
Yoav Gur-Arieh, Roy Mayan, Chen Agassy, Atticus Geiger, Mor Geva
ACL 2025
ACL Anthology

We Can’t Understand AI Using our Existing Vocabulary
John Hewitt, Robert Geirhos, Been Kim
ICML 2025 Position Papers
OpenReview

Language Models Can Predict Their Own Behavior
Dhananjay Ashok, Jonathan May
NeurIPS 2025
OpenReview

FADE: Why Bad Descriptions Happen to Good Features
Bruno Puri, Aakriti Jain, Elena Golimblevskaia, Patrick Kahardipraja, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
ACL 2025 Findings
ACL Anthology

Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna, Yonatan Belinkov, Sandro Pezzelle
Computational Linguistics 2025
doi

Discursive Circuits: How Do Language Models Understand Discourse Relations?
Yisong Miao, Min-Yen Kan
EMNLP 2025
ACL Anthology

On Relation-Specific Neurons in Large Language Models
Yihong Liu, Runsheng Chen, Lea Hirlimann, Ahmad Dawar Hakimi, Mingyang Wang, Amir Hossein Kargaran, Sascha Rothe, François Yvon, Hinrich SchĂŒtze
EMNLP 2025
ACL Anthology

Position-aware Automatic Circuit Discovery
Tal Haklay, Hadas Orgad, David Bau, Aaron Mueller, Yonatan Belinkov
ACL 2025
ACL Anthology

Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions
Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov, Hannaneh Hajishirzi, Ashish Sabharwal
ICLR 2025
OpenReview

MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, IvĂĄn Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov
ICML 2025
OpenReview

Do different prompting methods yield a common task representation in language models?
Guy Davidson, Todd M. Gureckis, Brenden M. Lake, Adina Williams
NeurIPS 2025
OpenReview

Improved Representation Steering for Language Models
Zhengxuan Wu, Qinan Yu, Aryaman Arora, Christopher D. Manning, Christopher Potts
NeurIPS 2025
OpenReview

Measuring and Guiding Monosemanticity
Ruben HÀrle, Felix Friedrich, Manuel Brack, Stephan WÀldchen, Björn Deiseroth, Patrick Schramowski, Kristian Kersting
NeurIPS 2025
OpenReview

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller
ICLR 2025
OpenReview

Tracing Attention Computation Through Feature Interactions
Harish Kamath, Emmanuel Ameisen, Isaac Kauvar, Rodrigo Luger, Wes Gurnee, Adam Pearce, Sam Zimmerman, Joshua Batson, Thomas Conerly, Chris Olah, Jack Lindsey
Transformer Circuits 2025
Blog

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Zhaofeng Wu, Xinyan Velocity Yu, Dani Yogatama, Jiasen Lu, Yoon Kim
ICLR 2025
OpenReview

Which Attention Heads Matter for In-Context Learning?
Kayo Yin, Jacob Steinhardt
ICML 2025
OpenReview

Causal Differentiating Concepts: Interpreting LM Behavior via Causal Representation Learning
Navita Goyal, Hal Daumé III, Alexandre Drouin, Dhanya Sridhar
NeurIPS 2025
OpenReview

Neuron Empirical Gradient: Discovering and Quantifying Neurons Global Linear Controllability
Xin Zhao, Zehui Jiang, Naoki Yoshinaga
ACL 2025
ACL Anthology

Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
Leon Eshuijs, Shihan Wang, Antske Fokkens
CoNLL @ ACL 2025
doi

Rationale Generation

Back to another niche that never ceases to excite me: synthesizing free-text explanations. It’s certainly my home turf and very intertwined with a lot of other topics on my list. This year saw a lot of focus on evaluating chain-of-thought rationales for faithfulness, but datasets and human evaluation studies also popped up every now and then.

Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models
Wei Jie Yeo, Ranjan Satapathy, Erik Cambria
EMNLP 2025
ACL Anthology

Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasović, Yonatan Belinkov
EMNLP 2025
ACL Anthology

Rubrik’s Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset
Diana Galvan-Sosa, Gabrielle Gaudeau, Pride Kavumba, Yunmeng Li, Hongyi Gu, Zheng Yuan, Keisuke Sakaguchi, Paula Buttery
ACL 2025
ACL Anthology

Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
Katie Matton, Robert Ness, John Guttag, Emre Kıcıman
ICLR 2025
OpenReview

A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations
Lingjun Zhao, Hal Daumé III
EMNLP 2025
ACL Anthology

When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
Quan Shi, Carlos E. Jimenez, Shunyu Yao, Nick Haber, Diyi Yang, Karthik Narasimhan
NeurIPS 2025
OpenReview

Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation
Beiduo Chen, Yang Janet Liu, Anna Korhonen, Barbara Plank
EMNLP 2025
ACL Anthology

LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference
Pingjun Hong, Beiduo Chen, Siyao Peng, Marie-Catherine de Marneffe, Barbara Plank
EMNLP 2025
ACL Anthology

Multi-Level Explanations for Generative Language Models
Lucas Monteiro Paes, Dennis Wei, Hyo Jin Do, Hendrik Strobelt, Ronny Luss, Amit Dhurandhar, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Werner Geyer, Soumya Ghosh
ACL 2025
ACL Anthology

Medical NLP

Where 2024 had fact checking and dialogue systems as two key applications in my research projects, 2025 marked my first deep dive into medical NLP. My main output so far is the Infherno tool for text-to-FHIR translation, but in the background I’ve been working on a few things for my funding project IlluminateCardio, which also revolves around patient-centric explanations, text simplification, and retrieval-augmented generation.

Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
Hanjie Chen, Zhouxiang Fang, Yash Singla, Mark Dredze
NAACL 2025
ACL Anthology

The Medium is the Message: How Non-Clinical Information Shapes Clinical Decisions in LLMs
Abinitha Gourabathina, Walter Gerych, Eileen Pan, Marzyeh Ghassemi
FAccT 2025
doi

Medication information extraction using local large language models
Phillip Richter-Pechanski, Marvin Seiferling, Christina Kiriakou, Dominic M. Schwab, Nicolas A. Geis, Christoph Dieterich, Anette Frank
JBI 2025
doi

Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare
Hiba Ahsan, Arnab Sen Sharma, Silvio Amir, David Bau, Byron C. Wallace
EMNLP 2025 Findings
ACL Anthology

Data Attribution, Memorization & Retrieval-Augmented Generation

Related to that, but more from an NLP/XAI perspective, I’ve been tracking plenty of papers on data-centric explanations, memorization, context usage, and the interplay between parametric and external knowledge.

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
Jiacheng Liu, Taylor Blanton, Yanai Elazar, Sewon Min, YenSung Chen, Arnavi Chheda-Kothary, Huy Tran, Byron Bischoff, Eric Marsh, Michael Schmitz, Cassidy Trier, Aaron Sarnat, Jenna James, Jon Borchardt, Bailey Kuehl, Evie Cheng, Karen Farley, Sruthi Sreeram, Taira Anderson, David Albright, Carissa Schoenick, Luca Soldaini, Dirk Groeneveld, Rock Yuren Pang, Pang Wei Koh, Noah A. Smith, Sophie Lebrecht, Yejin Choi, Hannaneh Hajishirzi, Ali Farhadi, Jesse Dodge
ACL 2025 System Demonstrations
ACL Anthology

Reason to Rote: Rethinking Memorization in Reasoning
Yupei Du, Philipp Mondorf, Silvia Casola, Yuekun Yao, Robert Litschko, Barbara Plank
EMNLP 2025
ACL Anthology

A Reality Check on Context Utilisation for Retrieval-Augmented Generation
Lovisa Hagström, Sara Vera Marjanović, Haeun Yu, Arnav Arora, Christina Lioma, Maria Maistro, Pepa Atanasova, Isabelle Augenstein
ACL 2025
ACL Anthology

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi
NAACL 2025
ACL Anthology

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
Yu Zhao, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini
NAACL 2025
ACL Anthology

AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
Fengyuan Liu, Nikhil Kandpal, Colin Raffel
ICLR 2025
OpenReview

The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation
Patrick Kahardipraja, Reduan Achtibat, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
NeurIPS 2025
OpenReview

Search Engines in the AI Era: A Qualitative Understanding to the False Promise of Factual and Verifiable Source-Cited Responses in LLM-based Search
Pranav Narayanan Venkit, Philippe Laban, Yilun Zhou, Yixin Mao, Chien-Sheng Wu
FAccT 2025
doi

DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models
Cathy Jiao, Yijun Pan, Emily Xiao, Daisy Sheng, Niket Jain, Hanzhang Zhao, Ishita Dasgupta, Jiaqi W. Ma, Chenyan Xiong
NeurIPS 2025 Datasets and Benchmarks
OpenReview

Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning
Jingcheng Niu, Subhabrata Dutta, Ahmed Elshabrawy, Harish Tayyar Madabushi, Iryna Gurevych
TMLR 2025
OpenReview

On Linear Representations and Pretraining Data Frequency in Language Models
Jack Merullo, Noah A. Smith, Sarah Wiegreffe, Yanai Elazar
ICLR 2025
OpenReview

AbsenceBench: Language Models Can’t Tell What’s Missing
Harvey Yiyun Fu, Aryan Shrivastava, Jared Moore, Peter West, Chenhao Tan, Ari Holtzman
NeurIPS 2025 Datasets and Benchmarks
OpenReview

NLG Evaluation

One paper project I’m currently involved in deals with evaluating generated text in a more general sense. I’ve come across fascinating work this year on persona prompting, readability metrics, and considerations for performing human evaluation.

Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations
Chaitanya Malaviya, Joseph Chee Chang, Dan Roth, Mohit Iyyer, Mark Yatskar, Kyle Lo
TACL 2025
doi

Evaluating the Evaluators: Are readability metrics good measures of readability?
Isabel Cachola, Daniel Khashabi, Mark Dredze
EMNLP 2025
ACL Anthology

Natural Language Processing RELIES on Linguistics
Juri Opitz, Shira Wein, Nathan Schneider
Computational Linguistics 2025
doi

The Prompt Makes the Person(a): A Systematic Evaluation of Sociodemographic Persona Prompting for Large Language Models
Marlene Lutz, Indira Sen, Georg Ahnert, Elisa Rogers, Markus Strohmaier
EMNLP 2025 Findings
ACL Anthology

Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
Sanjana Ramprasad, Byron C. Wallace
NeurIPS 2025
OpenReview

How to Select Datapoints for Efficient Human Evaluation of NLG Models?
Vilém Zouhar, Peng Cui, Mrinmaya Sachan
TACL 2025
doi

A Taxonomy of Linguistic Expressions That Contribute To Anthropomorphism of Language Technologies
Alicia DeVrio, Myra Cheng, Lisa Egede, Alexandra Olteanu, Su Lin Blodgett
CHI 2025
doi

Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge
Aparna Elangovan, Jongwoo Ko, Lei Xu, Mahsa Elyasi, Ling Liu, Sravan Babu Bodapati, Dan Roth
ICLR 2025
OpenReview

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernåndez, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni
ACL 2025
ACL Anthology

Topic Modelling & Data Augmentation

Another one of my ongoing collaborations concerns data augmentation and topic models. I noticed a few exciting papers that provided me with great ideas.

Exploring Empty Spaces: Human-in-the-Loop Data Augmentation
Catherine Yeh, Donghao Ren, Yannick Assogba, Dominik Moritz, Fred Hohman
CHI 2025
doi

Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of Topic Models
Zongxia Li, Lorena Calvo-Bartolomé, Alexander Hoyle, Daniel Stephens, Paiheng Xu, Alden Dima, Juan Francisco Fung, Jordan Boyd-Graber
ACL 2025
ACL Anthology

Examining the Expanding Role of Synthetic Data Throughout the AI Development Pipeline
Shivani Kapania, Stephanie Ballard, Alex Kessler, Jennifer Wortman Vaughan
FAccT 2025
doi

Hallucination Detection

A few colleagues have reached out to me about detecting hallucinations. Quite a few works made the topic more tangible for me and proposed some cool ideas for approaching it, often involving an interpretability method. It also frequently overlaps with factuality evaluation and with the section above on context usage and parametric knowledge.

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan, Neel Nanda
ICLR 2025
OpenReview

HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
Abhilasha Ravichander, Shrusti Ghela, David Wadden, Yejin Choi
ACL 2025
ACL Anthology

Precise Information Control in Long-Form Text Generation
Jacqueline He, Howard Yen, Margaret Li, Shuyue Stella Li, Zhiyuan Zeng, Weijia Shi, Yulia Tsvetkov, Danqi Chen, Pang Wei Koh, Luke Zettlemoyer
NeurIPS 2025
OpenReview

Model-based Annotation & LLMs for Scientific Research

Two remarkable avenues of research that I wouldn’t have expected to be possible if you had asked me a few years ago: using models for complex annotation tasks and even as “research agents”. A few notable papers made it onto my list this year.

Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candùs, Dan Jurafsky
NAACL 2025
ACL Anthology

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Chenglei Si, Diyi Yang, Tatsunori Hashimoto
ICLR 2025
OpenReview

LLMs as Research Tools: A Large Scale Survey of Researchers’ Usage and Perceptions
Zhehui Liao, Maria Antoniak, Inyoung Cheong, Evie Yu-Yen Cheng, Ai-Heng Lee, Kyle Lo, Joseph Chee Chang, Amy X. Zhang
COLM 2025
OpenReview

The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
Nitay Calderon, Roi Reichart, Rotem Dror
ACL 2025
ACL Anthology

Other

A few papers don’t really fit into any of the buckets above, but I might come back to them in some fashion. All of them seem highly relevant to the current state of research.

Benchmarking Failures in Tool-Augmented Language Models
Eduardo Treviño, Hugo Contant, James Ngai, Graham Neubig, Zora Zhiruo Wang
NAACL 2025
doi

FlexOLMo: Open Language Models for Flexible Data Use
Weijia Shi, Akshita Bhagia, Kevin Farhat, Niklas Muennighoff, Jacob Morrison, Pete Walsh, Dustin Schwenk, Shayne Longpre, Jake Poznanski, Allyson Ettinger, Daogao Liu, Margaret Li, Dirk Groeneveld, Mike Lewis, Wen-tau Yih, Luca Soldaini, Kyle Lo, Noah A. Smith, Luke Zettlemoyer, Pang Wei Koh, Hannaneh Hajishirzi, Ali Farhadi, Sewon Min
NeurIPS 2025
OpenReview

Checklists Are Better Than Reward Models For Aligning Language Models
Vijay Viswanathan, Yanchao Sun, Shuang Ma, Xiang Kong, Meng Cao, Graham Neubig, Tongshuang Wu
NeurIPS 2025
OpenReview