AAAI 2025 Workshop DATASAFE Submissions
The Steganographic Potentials of Language Models
Artem Karpov, Tinuade Adeleke, Seong Hah Cho, Natalia Perez-Campanero
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop Poster
Readers: Everyone
Federated Unlearning via Subparameter Space Partitioning and Selective Freezing
Krishna Yadav, Varala Nandu Swapnik, Kwok Tai Chui, Brij Bhooshan Gupta
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop OralandPoster
Readers: Everyone
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Andrey Anurin, Jonathan Ng, Jason Hoelscher-Obermaier, Esben Kran
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop OralandPoster
Readers: Everyone
Safety First: a Dataset of Harmful Task Plans for Robots
Zainab Altaweel, Mohaiminul Al Nahian, Isaac Lehrer, Adnan Siraj Rakin, Shiqi Zhang
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop Poster
Readers: Everyone
NATURALADV: An Exploratory Framework to Balance Adversarial Strength and Stealth in Autonomous Driving Environments
Meriel von Stein
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop Poster
Readers: Everyone
What is Real Anymore? An AI/ML Image Dataset Using Authenticity Validation and Traceable Origins for Every Data Instance
Andrew McDonald
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop Poster
Readers: Everyone
Evaluating Precise Geolocation Inference Capabilities of Vision Language Models
Neel Jay, Hieu Minh Nguyen, Hoang Trung Dung, Jacob Haimes
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop OralandPoster
Readers: Everyone
ImagiNet: A Multi-Content Benchmark for Synthetic Image Detection
Delyan Boychev, Radostin Cholakov
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop OralandPoster
Readers: Everyone
Data-Centric Safety and Ethical Measures for Data and AI Governance
Srija Chakraborty
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop OralandPoster
Readers: Everyone
Changing Answer Order Can Decrease MMLU Accuracy
Vipul Gupta, David Pantoja, Candace Ross, Adina Williams, Megan Ung
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop OralandPoster
Readers: Everyone
Subversion Strategy Eval: Evaluating AI’s stateless strategic capabilities against control protocols
Alex Troy Mallen, Charlie Griffin, Alessandro Abate, Buck Shlegeris
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop OralandPoster
Readers: Everyone
HumanAgencyBench: Do Language Models Support Human Agency?
Benjamin Sturgeon, Leo Hyams, Daniel Samuelson, Ethan Vorster, Jacob Haimes, Jacy Reese Anthis
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop OralandPoster
Readers: Everyone
Preference Poisoning Attacks on Reward Model Learning
Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop OralandPoster
Readers: Everyone
DarkBench: Benchmarking Dark Patterns in Large Language Models
Esben Kran, Hieu Minh Nguyen, Akash Kundu, Sami Jawhar, Jinsuk Park, Mateusz Maria Jurewicz
Published: 16 Dec 2024, Last Modified: 20 Feb 2025
airrworkshop OralandPoster
Readers: Everyone