Explaining Knowledge Conflicts and Factual Errors (of Temporal Generalization) in LLM Generations
How can we expose and express knowledge conflicts in LLMs resulting from poor temporal generalization?

[1] DYNAMICQA (Marjanović et al., EMNLP 2024 Findings)
[2] Survey on Factuality Challenges (Augenstein et al., Nature Machine Intelligence 2024)
[3] Unfaithful Explanations in CoT Prompting (Turpin et al., NeurIPS 2023)
[4] Interventions for Explaining Factual Associations (Geva et al., EMNLP 2023)
[5] Self-Bias in LLMs (Xu et al., ACL 2024)
[6] Mismatches between Token Probabilities and LLM Outputs (Wang et al., ACL 2024 Findings)
[7] Resolving Knowledge Conflicts (Wang et al., COLM 2024)
[8] SAT Probe (Yuksekgonul et al., ICLR 2024)
[9] MONITOR metric (Wang et al., NAACL 2024)
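A minimal sketch of one way to surface such a conflict, assuming a HuggingFace causal LM (the model name, question, and "updated" context below are illustrative assumptions, not taken from the papers above): compare the model's parametric answer with its answer when fresh evidence is placed in the context, and flag disagreements.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

question = "Q: Who is the current Prime Minister of the UK?\nA:"
context = "As of 2024, the Prime Minister of the United Kingdom is Keir Starmer.\n"

def greedy_answer(prompt, max_new_tokens=8):
    # Deterministic decoding so the two answers are directly comparable.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             do_sample=False, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0, inputs["input_ids"].shape[1]:]).strip()

parametric = greedy_answer(question)            # what the model "remembers"
contextual = greedy_answer(context + question)  # what it says given fresh evidence
print("parametric:", parametric)
print("contextual:", contextual)
# A disagreement between the two answers is one simple signal of a knowledge
# conflict induced by poor temporal generalization.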


Conversational Model Refinement

  1. Can we elicit expert human feedback using targeted question generation in a mixed-initiative dialogue setting?
  2. Can we use human feedback on natural language explanations to improve model performance and align it with user preferences?

[1] Compositional Explanations (Yao et al., NeurIPS 2021)
[2] Digital Socrates (Gu et al., ACL 2024)
[3] Explanation Formats (Malaviya et al., NAACL 2024)
[4] FeedbackQA (Li et al., ACL 2022 Findings)
[5] Synthesis Step by Step (Wang et al., EMNLP 2023 Findings)


Simplifying Outcomes of Language Model Component Analyses
How can results and findings from LM component analyses and mechanistic interpretability studies, which are often hard for non-experts to comprehend, be simplified and illustrated?

[1] NLEs for Neurons (Huang et al., BlackboxNLP 2023)
[2] Summarize and Score (Singh et al., 2023)
[3] Interpreting the Semantic Flow with VISIT (Katz & Belinkov, EMNLP 2023 Findings)
[4] Knowledge-Critical Subnetworks (Bayazit et al., EMNLP 2024)
[5] Function Vectors (Todd et al., ICLR 2024)
[6] LM Transparency Tool (Tufanov et al., ACL 2024 Demos)
[7] Primer on Component Analysis Methods (Ferrando et al., 2024)
[8] Mechanistic? (Saphra & Wiegreffe, BlackboxNLP @ EMNLP 2024)
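As one concrete starting point, the sketch below (model, layer, neuron index, and corpus are arbitrary assumptions) follows the recipe behind [1] and [2]: collect the inputs that most activate a single MLP neuron, then turn them into a plain-language summarization prompt for an explainer LLM.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER, NEURON = 5, 300  # arbitrary component to "explain"
corpus = [
    "The capital of France is Paris.",
    "She scored three goals in the final match.",
    "Interest rates rose again this quarter.",
    "The recipe calls for two cups of flour.",
]

activations = {}

def hook(_module, _inp, out):
    # out has shape (batch, seq_len, hidden); keep the chosen neuron's peak activation
    activations["value"] = out[0, :, NEURON].max().item()

handle = model.transformer.h[LAYER].mlp.c_fc.register_forward_hook(hook)
scores = []
for text in corpus:
    with torch.no_grad():
        model(**tokenizer(text, return_tensors="pt"))
    scores.append((activations["value"], text))
handle.remove()

top_examples = [t for _, t in sorted(scores, reverse=True)[:3]]
explainer_prompt = (
    "These sentences strongly activate one unit inside a language model:\n- "
    + "\n- ".join(top_examples)
    + "\nIn one plain-English sentence, what do they have in common?"
)
print(explainer_prompt)  # feed this to any instruction-tuned LLM as the explainer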


The Mindful Mechanic: Interpreting LLMs’ Decision-Making in Tool Use
API calling and tool use [1, 2] are expected capabilities of performant LLMs and frequently appear in evaluations and benchmarks [3, 4, 5], because models rely on external knowledge sources and calculations to ensure temporal generalization and factual correctness. However, it remains unclear which parts of a prompt, and which mechanisms within LLMs, are responsible for deciding when and which tool or API should be used at the next generation step. In a comprehensive study across both instruction-tuned and out-of-the-box LLMs, we will examine decision-making on tool-use benchmarks with interpretability methods that offer information flow routes [6] and feature attributions [7].

[1] Toolformer (Schick et al., ICLR 2023)
[2] Chameleon (Lu et al., NeurIPS 2023)
[3] ToolLLM (Qin et al., ICLR 2024)
[4] "What Are Tools Anyway?" Survey (Wang et al., COLM 2024)
[5] TACT dataset (Caciularu et al., NeurIPS 2024 D&B)
[6] LM Transparency Tool (Tufanov et al., ACL 2024 Demos)
[7] Inseq (Sarti et al., ACL 2023 Demos)
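To make the planned analysis concrete, here is a minimal sketch using Inseq [7]; the model, prompt, and Calculator(...) tool-call string are illustrative assumptions, and the exact call signatures should be checked against the installed Inseq version. The idea is to force a tool-call continuation and attribute it back to the prompt tokens, showing which parts of the input drive the decision to call the tool.

import inseq

# Any HF causal LM paired with a gradient-based attribution method.
model = inseq.load_model("gpt2", "saliency")

prompt = ("Question: What is 37 * 48?\n"
          "You may call a calculator by writing Calculator(<expression>).\n"
          "Answer:")

# Attribute the forced tool-call continuation to the prompt tokens.
attribution = model.attribute(
    input_texts=prompt,
    generated_texts=prompt + " Calculator(37 * 48)",
)
attribution.show()  # token-level attribution heatmap over the prompt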

Explaining Blind Spots of Model-Based Evaluation Metrics for Text Generation

[1] Blindspot NLG (He, Zhang et al., ACL 2023)
[2] AdvEval (Chen et al., ACL 2024 Findings)
[3] LLM Comparative Assessment (Liusie et al., EACL 2024)
[4] Explainable Evaluation Metrics for MT (Leiter et al., JMLR 2024)
[5] TICKing All the Boxes (Cook et al., 2024)
[6] ROSCOE (Golovneva et al., ICLR 2023)
[7] RORA (Jiang, Lu et al., ACL 2024)

Analyzing User Behavior in Explanatory Fact Checking Systems
How can we measure and mitigate human overreliance on persuasive language and on explanations generated by LLM-based fact-checking systems in the fact-checking domain?

[1] LLMs Help Humans Verify Truthfulness (Si et al., NAACL 2024)
[2] Explanations Can Reduce Overreliance (Vasconcelos et al., CSCW 2023)
[3] Explanations to Prevent Overtrust (Mohseni et al., ICWSM 2021)
[4] Explanation Details Affecting Human Performance (Linder et al., Applied AI Letters 2021)
[5] Perception of Explanations in Subjective Decision-Making (Ferguson et al., CHI 2024 TREW)
[6] Role of XAI in Collaborative Disinformation Detection (Schmitt et al., FAccT 2024)
[7] Belief Bias and Explanations (González et al., ACL 2021 Findings)