Explainability for Large Language Models

When and where the weekly meetings will take place

  • TU Berlin, QU Lab
  • Tuesdays 16:00 – 18:00
  • Room: Virtual via Microsoft Teams (link is available via the TU’s ISIS course page; final meeting for group presentations will take place on campus, room TBD)

Description

Large language models such as ChatGPT, Gemini and Llama are increasingly used for natural language processing tasks (question answering, dialogue, text classification, machine translation, etc.), and there is growing concern about whether it is responsible to use these models while they remain poorly understood by humans (including machine learning experts).
Explaining these models, i.e. developing computational approaches to explain the deep neural networks, helps to address safety and ethical concerns and is essential for accountability. In the first weeks of this project, I will give introductory lectures covering desiderata of explanations, prompting-based approaches, data attribution, and the understanding of Transformers. The lectures will be based on tutorials presented at NAACL 2024 and EACL 2024.

Our goal is to implement explainability methods and then analyze and evaluate the results. Special focus should be given to the implications of various types of explanations for usability and to their application to downstream tasks such as question answering, fact checking, commonsense reasoning, and text classification.

Project requirements

  • Intermediate or advanced knowledge of Python
  • Experience from at least one previous course or project in neural natural language processing or deep learning
  • Experience from at least one previous course or project with PyTorch or TensorFlow and Hugging Face
  • Course language: English (student consultations can optionally be done in German)

Eligible modules

Schedule

2024-10-14 | Deadline for application (Google Forms)
2024-10-18 | Acceptance notification
2024-10-21 | Moses registration deadline (14:00)
2024-10-22 | Introduction (Part 1); Proposal of topics
2024-10-29 | Introduction (Part 2); Forming of groups and assignment of topics
2024-11-05 | First individual group meetings (exact dates will be set for each group depending on their preferred slot)
2024-12-20 | Last possible individual group meeting before Christmas break
2025-01-07 | First possible individual group meeting after Christmas break
2025-02-04 | Final presentations (in-person, room TBD)
2025-03-30 | Submission deadline for written project reports

Possible individual group meeting dates

  • Tuesdays 14:00 – 15:00
  • Tuesdays 15:00 – 16:00
  • Tuesdays 16:00 – 17:00
  • Tuesdays 17:00 – 18:00
  • Fridays 16:00 – 17:00
  • Fridays 17:00 – 18:00