EMNLP 2024 Workshop BlackboxNLP Submissions
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models
Xinyu Zhou, Delong Chen, Samuel Cahyawijaya, Xufeng Duan, Zhenguang Cai
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
How do LLMs deal with Syntactic Conflicts in In-context-learning?
Nahyun Kim
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Accelerating Sparse Autoencoder Training via Layer-Wise Transfer Learning in Large Language Models
Davide Ghilardi, Federico Belotti, Marco Molinari, Jaehyuk Lim
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates
Yusuke Sakai, Adam Nohejl, Jiangnan Hang, Hidetaka Kamigaito, Taro Watanabe
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Mechanistic?
Naomi Saphra, Sarah Wiegreffe
Published: 21 Sept 2024, Last Modified: 20 Oct 2024 · BlackboxNLP 2024
Investigating Layer Importance in Large Language Models
Yang Zhang, Yanfei Dong, Kenji Kawaguchi
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Sepehr Kamahi, Yadollah Yaghoobzadeh
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
IvRA: A Framework to Enhance Attention-Based Explanations for Language Models with Interpretability-Driven Training
Sean Xie, Soroush Vosoughi, Saeed Hassanpour
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Exploring the Recall of Language Models: Case Study on Molecules
Knarik Mheryan, Hasmik Mnatsakanyan, Philipp Guevorguian, Hrant Khachatrian
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning
Adib Hasan, Ileana Rugina, Alex Wang
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Can One Token Make All the Difference? Forking Paths in Autoregressive Text Generation
Eric J Bigelow, Ari Holtzman, Hidenori Tanaka, Tomer Ullman
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Attribution Patching Outperforms Automated Circuit Discovery
Aaquib Syed, Can Rager, Arthur Conmy
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Mind Your Manners: Detoxifying Language Models via Attention Head Intervention
Jordan Nikolai Pettyjohn, Nathaniel C Hudson, Mansi Sakarvadia, Aswathy Ajith, Kyle Chard
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Quantifying reliance on external information over parametric knowledge during Retrieval Augmented Generation (RAG) using mechanistic analysis
Reshmi Ghosh, Rahul Seetharaman, Hitesh Wadhwa, Somyaa Aggarwal, Samyadeep Basu, Soundararajan Srinivasan, Wenlong Zhao, Shreyas Chaudhari, Ehsan Aghazadeh
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Do Metadata and Appearance of the Retrieved Webpages Affect LLM's Reasoning in Retrieval-Augmented Generation?
Cheng-Han Chiang, Hung-yi Lee
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions
Seyedali Mohammadi, Edward Raff, Jinendra Malekar, Vedant Palit, Francis Ferraro, Manas Gaur
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Copy Suppression: Comprehensively Understanding a Motif in Language Model Attention Heads
Callum Stuart McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, Neel Nanda
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
How Language Models Prioritize Contextual Grammatical Cues?
Hamidreza Amirzadeh, Afra Alishahi, Hosein Mohebbi
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Self-Assessment Tests are Unreliable Measures of LLM Personality
Akshat Gupta, Xiaoyang Song, Gopala Anumanchipalli
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Tom Lieberum, Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Nicolas Sonnerat, Vikrant Varma, Janos Kramar, Anca Dragan, Rohin Shah, Neel Nanda
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Log Probabilities Are a Reliable Estimate of Semantic Plausibility in Base and Instruction-Tuned Language Models
Carina Kauf, Emmanuele Chersoni, Alessandro Lenci, Evelina Fedorenko, Anna A Ivanova
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Clusters Emerge in Transformer-based Causal Language Models
Xinbo Wu, Lav R. Varshney
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Róbert Csordás, Christopher Potts, Christopher D Manning, Atticus Geiger
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
ToxiSight: Insights Towards Detected Chat Toxicity
Zachary Yang, Domenico Tullo, Reihaneh Rabbany
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
How Does Code Pretraining Affect Language Model Task Performance?
Jackson Petty, Sjoerd van Steenkiste, Tal Linzen
Published: 21 Sept 2024, Last Modified: 06 Oct 2024 · BlackboxNLP 2024
Page 1 of 2