Hello, I am Yihuai Hong, a fourth-year undergraduate student at South China University of Technology. My research interests lie in Natural Language Processing and Language Models.
My research story revolves around the mechanistic interpretability and understanding of LLMs, especially how "Knowledge" and "Reasoning" are formed and processed inside them, which further connects to adjacent areas such as LLM Safety, Knowledge Editing, and the reasoning capabilities of LLMs:
- Interpretability × Memorized Knowledge
  - LLM Safety (LLM Unlearning)
    - Parametric Knowledge Benchmark [ConceptVectors]
    - Dissecting Fine-Tuning Unlearning [EMNLP 2024 Main]
    - LLM Unlearning Methods [In Progress]
  - Knowledge Editing [EMNLP 2024 Main]
  - LLM Efficient Inference [AAAI 2024 Main Track]
- Interpretability × Reasoning [In Progress]
This year I am very fortunate to work with Prof. Mor Geva Pipek from Tel Aviv University and Google Research on LLM Unlearning, and with Prof. Zhijing Jin from the University of Toronto on the mechanistic understanding of reasoning. I am also currently a researcher at Alibaba DAMO Academy, working with Dr. Lidong Bing and Dr. Wenxuan Zhang. Last year, I interned at the UCL AI Centre, working on Knowledge Editing with Prof. Aldo Lipani, and also worked with Dr. Haiqin Yang. I started my research career in the second year of my undergraduate studies, supervised by Prof. Ziqian Zeng at SCUT. More information can be found in my CV.
I am actively looking for Fall 2025 Ph.D. opportunities :)
🔥 News
- 2024.09: 🎉🎉 My two new first-author papers have both been accepted to EMNLP 2024 Main: "Dissecting Fine-Tuning Unlearning in Large Language Models" and "Interpretability-based Tailored Knowledge Editing in Transformers"! I feel so grateful to all my mentors and collaborators. See you in Miami!
- 2024.06: 🚀🚀 Please check out my newest paper, "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"! It is the first parametric LLM Unlearning benchmark. Thanks to the guidance of Prof. Mor Geva and the help of my other collaborators.
- 2023.12: 🎉🎉 My first work was accepted to the AAAI 2024 main track, and I also won the AAAI-24 Student Scholarship! I am genuinely thankful to Prof. Zeng for guiding me and for her help along this path!
📝 Research
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces
Yihuai Hong, Lei Yu, Shauli Ravfogel, Haiqin Yang, Mor Geva
Website / Arxiv / GitHub / Huggingface / Twitter
Preprint, 2024.06
- Our findings reveal that current unlearning methods only modify the model’s behavior without truly erasing the encoded knowledge in its parameters.
- To address this, we present the ConceptVectors Benchmark, where each vector is closely tied to a specific concept. It consists of 285 concept vectors on two open-source LLMs.
- Directly ablating these vectors demonstrably removes the associated knowledge from the LLMs and significantly reduces their susceptibility to adversarial manipulation (see the sketch below).
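Below is a minimal sketch of what ablating a single concept vector can look like in practice. The model name, GPT-J-style module path, and the (layer, index) pair are illustrative assumptions for this example, not values from the benchmark.

```python
# Minimal sketch: "ablating" a concept vector by zeroing one column of an MLP
# down-projection in a decoder-only LM. GPT-J-style module names are assumed;
# the layer and column indices are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

layer_idx, vec_idx = 20, 3012  # hypothetical location of one concept vector
with torch.no_grad():
    # fc_out.weight has shape [hidden_dim, intermediate_dim]; column vec_idx is
    # the parameter vector that this MLP "key" writes into the residual stream.
    model.transformer.h[layer_idx].mlp.fc_out.weight[:, vec_idx] = 0.0
```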
Dissecting Fine-Tuning Unlearning in Large Language Models
Yihuai Hong, Yuelin Zou, Lijie Hu, Ziqian Zeng, Di Wang, Haiqin Yang
EMNLP 2024 Main (Oral Presentation)
In this paper, we delve into the limitations of fine-tuning-based unlearning through activation patching and parameter restoration experiments. Our findings reveal that these methods alter the model’s knowledge retrieval process, providing further evidence that they do not genuinely erase the problematic knowledge embedded in the model parameters. Instead, the coefficients generated by the MLP components in the model’s final layer are the primary contributors to these seemingly positive unlearning effects, playing a crucial role in controlling the model’s behaviors.
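As an illustration of the parameter-restoration idea, the sketch below swaps the final-layer MLP of the fine-tuned (unlearned) model back to its pre-unlearning weights; the checkpoint names and LLaMA-style module paths are assumptions for the example, not the paper's released code.

```python
# Minimal sketch of parameter restoration: copy the original final-layer MLP
# weights back into the "unlearned" model and check whether the suppressed
# answers reappear. Checkpoint names below are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM

orig = AutoModelForCausalLM.from_pretrained("path/to/original-model")        # hypothetical
unlearned = AutoModelForCausalLM.from_pretrained("path/to/unlearned-model")  # hypothetical

last = len(unlearned.model.layers) - 1  # assumes a LLaMA-style `model.layers` list
with torch.no_grad():
    unlearned.model.layers[last].mlp.load_state_dict(
        orig.model.layers[last].mlp.state_dict()
    )
# If generations about the "forgotten" facts recover after this swap, the knowledge
# was never erased; the final-layer MLP outputs were merely steering behavior.
```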
Interpretability-based Tailored Knowledge Editing in Transformers
Yihuai Hong, Aldo Lipani
EMNLP 2024 Main
Our work explores the instability in in-context learning-based Knowledge Editing outcomes, providing insights into its reasons and distinctions from other Knowledge Editing methods. Leveraging findings on the critical role of feed-forward MLPs in decoder-only models, we propose a tailored knowledge editing method, TailoredKE, that considers the unique information flow of each sample. Model interpretability reveals diverse attribute recall across transformer layers, guiding edits to specific features at different depths and mitigating over-editing issues.
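A rough, hypothetical sketch of the per-sample layer-selection idea follows; `probe_attribute_recall` and `apply_edit` are placeholder functions for illustration, not the paper's actual API.

```python
# Rough sketch of per-sample edit-layer selection (illustrative placeholders only).
def tailored_edit(model, sample, probe_attribute_recall, apply_edit, num_layers):
    # Score how strongly each layer's hidden states recall the target attribute,
    # then edit at the layer chosen for this particular sample rather than at a
    # single fixed layer shared by all samples.
    scores = [probe_attribute_recall(model, sample, layer) for layer in range(num_layers)]
    edit_layer = max(range(num_layers), key=lambda layer: scores[layer])
    return apply_edit(model, sample, edit_layer)
```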
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng*, Yihuai Hong*, Huiping Zhuang, Cen Chen, Hongliang Dai
AAAI 2024 Main Track
- We propose an early exiting method that can achieve consistency during training and inference by formulating the early exiting problem as a reinforcement learning problem.
- We propose a concept named Memorized Layer to measure the hardness of an instance. We incorporate it into the reward function so that each instance can balance accuracy and acceleration according to its own hardness (a rough sketch follows this list).
- The experimental results show that our method can outperform other baselines on natural language understanding and generation tasks.
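The sketch below is my paraphrase of how such a hardness-weighted reward could look; the exact functional form and scaling in the paper may differ.

```python
# Illustrative reward for an early-exit policy (a paraphrase, not the paper's exact formula):
# harder instances weight correctness more heavily, easier ones weight speed-up.
def exit_reward(correct: bool, exit_layer: int, num_layers: int, hardness: float) -> float:
    """hardness in [0, 1], e.g. derived from the Memorized Layer of the instance."""
    accuracy_term = 1.0 if correct else -1.0
    speedup_term = 1.0 - exit_layer / num_layers  # larger when exiting earlier
    return hardness * accuracy_term + (1.0 - hardness) * speedup_term
```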
📖 Education
- 2020.09 - 2024.06, Bachelor of Engineering, School of Computer Science and Engineering, South China University of Technology
💻 Internships
- 2024.07 - present, Research Intern at the University of Toronto, Supervisor: Prof. Zhijing Jin, Canada (Remote).
- 2024.09 - present, Research Intern at Alibaba DAMO Academy, Supervisors: Dr. Lidong Bing and Dr. Wenxuan Zhang, Hangzhou, China.
- 2024.02 - 2024.08, Research Intern at Tel Aviv University, Supervisor: Prof. Mor Geva, Israel (Remote).
- 2023.11 - 2024.02, Research Intern at the China International Digital Economy Academy, Supervisor: Dr. Haiqin Yang, Shenzhen, China.
- 2023.06 - 2023.12, Research Intern at the UCL AI Centre & Web Intelligence Group, Supervisor: Prof. Aldo Lipani, United Kingdom.
- 2022.06 - 2023.08, Research Intern at South China University of Technology, Supervisor: Prof. Ziqian Zeng, Guangzhou, China.
🎖 Honors and Awards
- 2023.12 AAAI-24 Student Scholarship
- 2023.11 Top Ten Excellent Students Nomination Award of South China University of Technology
- 2023.09 China National Scholarship (top 0.1%)
- 2023.05 Meritorious Winner of The Mathematical Contest in Modeling (MCM)
- 2022.03 Kaggle Bronze Medal (Top 6%) - Evaluating Student Writing: analyze argumentative writing elements from students in grades 6-12
- 2021.07 Kaggle Silver Medal (Top 5%) - CommonLit Readability Prize: rate the complexity of literary passages for grades 3-12 classroom use
🙋‍♂️ Academic Services
- Program Committee: ICLR (2025), AAAI (2024), ACL ARR (Feb. 2024 - June 2024)
📚 Patents
- 2022.09 Self-supervised pre-training method, system and medium for Chinese Pinyin spelling correction.