08/27 |
Introduction |
Syllabus |
|
08/29 |
Securing AI Coding Assistants |
Constrained Decoding for Secure Code Generation [Slides]
|
Amazon Trusted AI Challenge
Practical Attacks against Black-box Code Completion Engines
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
|
09/03 |
Background |
[Slides]
|
Computer Security Textbook from UC Berkeley
|
09/05 |
Secure Code Generation |
Instruction Tuning for Secure Code Generation [Slides]
|
Large Language Models for Code: Security Hardening and Adversarial Testing
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models
|
09/10 |
SWE Agent |
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [Slides]
|
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
ReAct: Synergizing Reasoning and Acting in Language Models
|
09/12 |
Cybersecurity Risks of LLM Agents |
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models [Slides]
|
NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security
An Empirical Evaluation of LLMs for Solving Offensive Security Challenges
Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag
|
09/17 |
Copyright |
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation [Slides]
|
Detecting Pretraining Data from Large Language Models
Fantastic Copyrighted Beasts and How (Not) to Generate Them
|
09/19 |
Copyright |
Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs [Slides]
|
Counterfactual Memorization in Neural Language Models
Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy
|
09/24 |
RAG Poisoning |
PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models [Slides]
|
Certifiably Robust RAG against Retrieval Corruption
|
09/26 |
Backdoor Detection Competition Report |
Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs [Slides]
|
Universal Jailbreak Backdoors from Poisoned Human Feedback
|
10/01 |
Logical Fallacy Detection |
NL2FOL: Translating Natural Language to First-Order Logic for Logical Fallacy Detection [Slides]
|
|
10/03 |
Prompt Injection Attacks and Defenses |
Formalizing and Benchmarking Prompt Injection Attacks and Defenses [Slides]
|
Universal and Transferable Adversarial Attacks on Aligned Language Models
Black Box Adversarial Prompting for Foundation Models
StruQ: Defending Against Prompt Injection with Structured Queries
|
10/08 |
Attacks against Code Generation |
Practical Attacks against Black-box Code Completion Engines [Slides]
|
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models
|
10/10 |
Code Language Models |
Guest Speaker: Yangruibo (Robin) Ding
|
|
10/15 |
Jailbreaking LLMs: Attack, Defense, and Theory |
Guest Speaker: Eric Wong |
|
10/17 |
Mid-term Take-home Exam |
|
|
10/22 |
LLM Agents Exploiting Web Applications |
LLM Agents can Autonomously Exploit One-day Vulnerabilities [Slides]
|
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
|
10/24 |
LLM for Static Vulnerability Detection |
Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection [Slides]
|
Vulnerability Detection with Code Language Models: How Far Are We?
ARVO: Atlas of Reproducible Vulnerabilities for Open Source Software
|
10/29 |
Formal Assurance of AI Agents |
AI Agents with Formal Security Guarantees [Slides]
|
|
10/31 |
Security of AI Agents |
Security of AI Agents [Slides]
|
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
|
11/05 |
Debugging Capabilities
Mid-term Project Report Due
|
DebugBench: Evaluating Debugging Capability of Large Language Models [Slides]
|
|
11/07 |
Patch Security Issues |
Automated Software Vulnerability Patching using Large Language Models [Slides]
|
Can LLMs Patch Security Issues?
|
11/12 |
LLM for Program Analysis |
SMARTINV: Multimodal Learning for Smart Contract Invariant Inference [Slides]
|
Can Large Language Models Reason about Program Invariants?
|
11/14 |
Safety of AI Agents |
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains [Slides]
|
|
11/19 |
Alignment & Safety |
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation [Slides]
|
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
|
11/21 |
Multi-Modal AI Safety |
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs [Slides]
|
JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models
|
11/26 |
Thanksgiving Break |
|
|
11/28 |
Thanksgiving Break |
|
|
12/03 |
Project Lightning Talks |
|
|
12/05 |
Project Lightning Talks |
|
|
12/10 |
Reading Day (No Class) |
|
|
12/12 |
Final Project Report Due |
|
|