EMNLP 2024 Workshop BlackboxNLP Submissions
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models
Xinyu Zhou, Delong Chen, Samuel Cahyawijaya, Xufeng Duan, Zhenguang Cai
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
How do LLMs deal with Syntactic Conflicts in In-context-learning?
Nahyun Kim
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Accelerating Sparse Autoencoder Training via Layer-Wise Transfer Learning in Large Language Models
Davide Ghilardi, Federico Belotti, Marco Molinari, Jaehyuk Lim
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates
Yusuke Sakai, Adam Nohejl, Jiangnan Hang, Hidetaka Kamigaito, Taro Watanabe
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Mechanistic?
Naomi Saphra, Sarah Wiegreffe
Published: 21 Sept 2024, Last Modified: 20 Oct 2024 · BlackboxNLP 2024
Investigating Layer Importance in Large Language Models
Yang Zhang, Yanfei Dong, Kenji Kawaguchi
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Sepehr Kamahi, Yadollah Yaghoobzadeh
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
IvRA: A Framework to Enhance Attention-Based Explanations for Language Models with Interpretability-Driven Training
Sean Xie, Soroush Vosoughi, Saeed Hassanpour
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Exploring the Recall of Language Models: Case Study on Molecules
Knarik Mheryan, Hasmik Mnatsakanyan, Philipp Guevorguian, Hrant Khachatrian
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning
Adib Hasan, Ileana Rugina, Alex Wang
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Can One Token Make All the Difference? Forking Paths in Autoregressive Text Generation
Eric J Bigelow, Ari Holtzman, Hidenori Tanaka, Tomer Ullman
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Attribution Patching Outperforms Automated Circuit Discovery
Aaquib Syed, Can Rager, Arthur Conmy
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Mind Your Manners: Detoxifying Language Models via Attention Head Intervention
Jordan Nikolai Pettyjohn, Nathaniel C Hudson, Mansi Sakarvadia, Aswathy Ajith, Kyle Chard
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Quantifying reliance on external information over parametric knowledge during Retrieval Augmented Generation (RAG) using mechanistic analysis
Reshmi Ghosh, Rahul Seetharaman, Hitesh Wadhwa, Somyaa Aggarwal, Samyadeep Basu, Soundararajan Srinivasan, Wenlong Zhao, Shreyas Chaudhari, Ehsan Aghazadeh
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Do Metadata and Appearance of the Retrieved Webpages Affect LLM's Reasoning in Retrieval-Augmented Generation?
Cheng-Han Chiang, Hung-yi Lee
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions
Seyedali Mohammadi, Edward Raff, Jinendra Malekar, Vedant Palit, Francis Ferraro, Manas Gaur
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Copy Suppression: Comprehensively Understanding a Motif in Language Model Attention Heads
Callum Stuart McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, Neel Nanda
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
How Language Models Prioritize Contextual Grammatical Cues?
Hamidreza Amirzadeh, Afra Alishahi, Hosein Mohebbi
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Self-Assessment Tests are Unreliable Measures of LLM Personality
Akshat Gupta, Xiaoyang Song, Gopala Anumanchipalli
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Tom Lieberum, Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Nicolas Sonnerat, Vikrant Varma, Janos Kramar, Anca Dragan, Rohin Shah, Neel Nanda
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Log Probabilities Are a Reliable Estimate of Semantic Plausibility in Base and Instruction-Tuned Language Models
Carina Kauf, Emmanuele Chersoni, Alessandro Lenci, Evelina Fedorenko, Anna A Ivanova
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Clusters Emerge in Transformer-based Causal Language Models
Xinbo Wu, Lav R. Varshney
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Róbert Csordás, Christopher Potts, Christopher D Manning, Atticus Geiger
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
ToxiSight: Insights Towards Detected Chat Toxicity
Zachary Yang, Domenico Tullo, Reihaneh Rabbany
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
How Does Code Pretraining Affect Language Model Task Performance?
Jackson Petty, Sjoerd van Steenkiste, Tal Linzen
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Page 1 of 2