Hello, I am Yihuai Hong, a fourth-year undergraduate student at South China University of Technology. My research interests lie in Natural Language Processing and Language Models.

My research centers on the Mechanistic Interpretability and Understanding of LLMs, especially how “Knowledge” and “Reasoning” are formed and processed inside them. This work maps naturally onto adjacent areas such as LLM Safety, Knowledge Editing, and the various Reasoning capabilities of LLMs:

  • Interpretability × Memorized Knowledge
  • Interpretability × Reasoning [Work in Progress]

This year I am very fortunate to work with Prof. Mor Geva Pipek from Tel Aviv University and Google Research on LLM Unlearning, and with Prof. Zhijing Jin from the University of Toronto on the mechanistic understanding of Reasoning. I am also currently a researcher at Alibaba DAMO Academy, working with Dr. Lidong Bing and Dr. Wenxuan Zhang. Last year, I interned at the UCL AI Centre, working on Knowledge Editing with Prof. Aldo Lipani, and also worked with Dr. Haiqin Yang. I started my research career in my second year of undergraduate studies, supervised by Prof. Ziqian Zeng at SCUT. More information is in my CV.

I am actively looking for Fall 2025 Ph.D. opportunities now :)

🔥 News

  • 2024.09:  🎉🎉 My two new first-author papers have both been accepted to EMNLP 2024 Main: “Dissecting Fine-Tuning Unlearning in Large Language Models” and “Interpretability-based Tailored Knowledge Editing in Transformers”! I feel so grateful to all my mentors and collaborators. See you in Miami!
  • 2024.06:  🚀🚀 Please check out my newest paper, “Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces”! This is the first-ever parametric LLM Unlearning benchmark! Thanks to the guidance of Prof. Mor Geva and the help of my other collaborators.
  • 2023.12:  🎉🎉 My first paper has been accepted to the AAAI 2024 main track, and I also won the AAAI-24 Student Scholarship! I am genuinely thankful to Prof. Zeng for her guidance and help along this path!

📝 Research

Preprint

Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces

Yihuai Hong, Lei Yu, Shauli Ravfogel, Haiqin Yang, Mor Geva

Website / arXiv / GitHub / Hugging Face / Twitter

Preprint, 2024.06

  • Our findings reveal that current unlearning methods only modify the model’s behavior without truly erasing the encoded knowledge in its parameters.
  • To address this, we present the ConceptVectors Benchmark, consisting of 285 concept vectors on two open-source LLMs, where each vector is closely tied to a specific concept.
  • Directly ablating these vectors demonstrably removes the associated knowledge from the LLMs and significantly reduces their susceptibility to adversarial manipulation (see the sketch below).
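
As a concrete illustration, here is a minimal sketch of what ablating a parametric concept vector can look like, assuming the vector corresponds to one column of an MLP down-projection matrix in a LLaMA-style model; the layer and neuron indices below are hypothetical placeholders, not values from the benchmark:

```python
# Hypothetical sketch: zero out one MLP value vector in a LLaMA-style model.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

layer_idx, neuron_idx = 20, 1337  # placeholder location of a concept vector

with torch.no_grad():
    # down_proj maps the MLP hidden space back to the residual stream;
    # column `neuron_idx` is the value vector that neuron writes.
    down_proj = model.model.layers[layer_idx].mlp.down_proj.weight
    down_proj[:, neuron_idx] = 0.0  # ablate the knowledge this vector encodes
```
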
EMNLP 2024 Main

Dissecting Fine-Tuning Unlearning in Large Language Models

Yihuai Hong, Yuelin Zou, Lijie Hu, Ziqian Zeng, Di Wang, Haiqin Yang

arXiv / GitHub

EMNLP 2024 Main (Oral presentation)

In this paper, we delve into the limitations of fine-tuning-based unlearning through activation patching and parameter restoration experiments. Our findings reveal that these methods alter the model’s knowledge retrieval process, providing further evidence that they do not genuinely erase the problematic knowledge embedded in the model parameters. Instead, the coefficients generated by the MLP components in the model’s final layer are the primary contributors to these seemingly positive unlearning effects, playing a crucial role in controlling the model’s behaviors.
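
For readers unfamiliar with the setup, the following is a simplified sketch of activation patching (not the paper’s actual code): cache the final-layer MLP output of the original model, then overwrite the corresponding activation in the fine-tuned (“unlearned”) model and observe how its behavior changes. The layer path assumes a LLaMA-style Hugging Face model, and the function name is hypothetical.

```python
import torch

def run_with_patch(base_model, unlearned_model, inputs, layer_idx=-1):
    """Patch the base model's final-layer MLP output into the unlearned model."""
    cached = {}

    def save_hook(module, inp, out):
        cached["mlp_out"] = out.detach()  # cache the clean activation

    def patch_hook(module, inp, out):
        return cached["mlp_out"]  # returning a value overrides the output

    # 1) cache the base model's final-layer MLP output
    handle = base_model.model.layers[layer_idx].mlp.register_forward_hook(save_hook)
    with torch.no_grad():
        base_model(**inputs)
    handle.remove()

    # 2) re-run the unlearned model with that activation patched in
    handle = unlearned_model.model.layers[layer_idx].mlp.register_forward_hook(patch_hook)
    with torch.no_grad():
        patched_logits = unlearned_model(**inputs).logits
    handle.remove()
    return patched_logits
```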

EMNLP 2024 Main

Interpretability-based Tailored Knowledge Editing in Transformers

Yihuai Hong, Aldo Lipani

EMNLP 2024 Main (paper will be released soon)

Our work explores the instability in in-context learning-based Knowledge Editing outcomes, providing insights into its reasons and distinctions from other Knowledge Editing methods. Leveraging findings on the critical role of feed-forward MLPs in decoder-only models, we propose a tailored knowledge editing method, TailoredKE, that considers the unique information flow of each sample. Model interpretability reveals diverse attribute recall across transformer layers, guiding edits to specific features at different depths and mitigating over-editing issues.
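
As one way to visualize “diverse attribute recall across transformer layers”, here is a hedged sketch using the standard logit-lens technique (a stand-in, not necessarily TailoredKE’s exact procedure); it assumes a LLaMA-style Hugging Face model, and the function name is hypothetical:

```python
import torch

@torch.no_grad()
def attribute_rank_per_layer(model, tokenizer, prompt, attribute):
    """Rank of the attribute's first token when each layer's hidden state
    is projected through the unembedding (logit lens)."""
    attr_id = tokenizer(attribute, add_special_tokens=False).input_ids[0]
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model(**inputs, output_hidden_states=True)
    ranks = []
    for h in out.hidden_states:  # embedding output + one state per layer
        logits = model.lm_head(model.model.norm(h[0, -1]))
        ranks.append((logits > logits[attr_id]).sum().item())  # 0 = top-1
    return ranks
```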

AAAI 2024

ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference

Ziqian Zeng*, Yihuai Hong*, Huiping Zhuang, Cen Chen, Hongliang Dai

arXiv / GitHub

AAAI 2024 Main Track

  • We propose an early exiting method that achieves consistency between training and inference by formulating the early exiting problem as a reinforcement learning problem.
  • We propose a concept named Memorized Layer to measure the hardness of an instance, and incorporate it into the reward function so that each instance can balance accuracy and acceleration according to its own hardness.
  • Experimental results show that our method outperforms other baselines on natural language understanding and generation tasks (a simplified sketch of the early-exit inference loop follows below).
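
The sketch below shows the inference-time control flow of early exiting in simplified form, using a plain confidence threshold as a stand-in for the learned reinforcement-learning exit policy; the per-layer classifier heads and class names are illustrative assumptions:

```python
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Transformer encoder with one classifier head per layer (batch size 1)."""

    def __init__(self, layers, heads, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(layers)  # transformer layers
        self.heads = nn.ModuleList(heads)    # one classifier per layer
        self.threshold = threshold           # stand-in for the RL exit policy

    def forward(self, x):
        for layer, head in zip(self.layers, self.heads):
            x = layer(x)
            probs = head(x[:, 0]).softmax(-1)  # classify from the [CLS] token
            conf, pred = probs.max(-1)
            if conf.item() >= self.threshold:  # confident enough: exit early
                return pred                    # skip the remaining layers
        return pred  # fell through: prediction from the final layer
```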

📖 Educations

  • 2020.09 - 2024.06, Bachelor of Engineering, School of Computer Science and Engineering, South China University of Technology

💻 Internships

  • Alibaba DAMO Academy, Researcher, working with Dr. Lidong Bing and Dr. Wenxuan Zhang
  • UCL AI Centre, Research Intern, working on Knowledge Editing with Prof. Aldo Lipani

🎖 Honors and Awards

  • 2023.12 AAAI-24 Student Scholarship
  • 2023.11 Top Ten Excellent Students Nomination Award of South China University of Technology
  • 2023.09 China National Scholarship (top 0.1%)
  • 2023.05 Meritorious Winner of the Mathematical Contest in Modeling (MCM)
  • 2022.03 Kaggle Bronze Medal (Top 6%) - Evaluating Student Writing: Analyze argumentative writing elements from students in grades 6-12
  • 2021.07 Kaggle Silver Medal (Top 5%) - CommonLit Readability Prize: Rate the complexity of literary passages for grades 3-12 classroom use

🙋‍♂️ Academic Services

  • Program Committee: ICLR (2025), AAAI (2024), ACL ARR (Feb. 2024 - June 2024)

📚 Patents

  • 2022.09 Self-supervised pre-training method, system and medium for Chinese Pinyin spelling correction.