mtuann/llm-updated-papers

Table of Contents

  1. Large Language Models Papers
  2. Other Topics
  3. Large Language Models Papers with Code

Large Language Models Papers

This GitHub repository contains an updated list of Large Language Models papers as of May 19, 2025.

  • The papers are collected from various sources, including arXiv, NeurIPS, ICML, ICLR, ACL, EMNLP, AAAI, IJCAI, KDD, CVPR, ICCV, ECCV, IEEE, ACM, Springer, ScienceDirect, Wiley, Nature, Science, and other top AI/ML conferences and journals.
  • For a better reading experience, visit the Shinyapps website.

Other Topics

Explore additional research papers on the following topics:


For contributions, inquiries, or suggestions, feel free to reach out via email.


If you find this application helpful and would like to support its development, you can buy me a coffee using one of the following methods:


Large Language Models Papers with Code

Due to GitHub repository limitations, this section includes only those papers that provide accompanying code, sorted by publication date with the newest first. For access to the full list of papers, please visit the Shinyapps website.


No. Title Authors Publish Date Venue Code URL
1 EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models Bohao Xing, Xin Liu, Guoying Zhao, Chengyu Liu, Xiaolan Fu, Heikki Kälviäinen 2025-05-16 arXiv https://github.com/xxtars/EmotionHallucer http://arxiv.org/abs/2505.11405v1
2 Ranked Voting based Self-Consistency of Large Language Models Weiqin Wang, Yile Wang, Hui Huang 2025-05-16 arXiv https://github.com/szu-tera/RankedVotingSC http://arxiv.org/abs/2505.10772v1
3 Unifying Segment Anything in Microscopy with Multimodal Large Language Model Manyu Li, Ruian He, Zixian Zhang, Weimin Tan, Bo Yan 2025-05-16 arXiv https://github.com/ieellee/uLLSAM http://arxiv.org/abs/2505.10769v1
4 GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art Chenkai Zhang, Yiming Lei, Zeming Liu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Yunhong Wang 2025-05-16 arXiv https://github.com/stan-lei/GODBench-ACL2025 http://arxiv.org/abs/2505.11436v1
5 GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction Mohammadtaha Bagherifard, Sahar Rajabi, Ali Edalat, Yadollah Yaghoobzadeh 2025-05-16 arXiv https://github.com/saharsamr/Modular-LLM http://arxiv.org/abs/2505.10939v1
6 AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents Julius Henke 2025-05-15 arXiv https://github.com/JuliusHenke/autopentest http://arxiv.org/abs/2505.10321v1
7 Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M Dario Di Palma, Felice Antonio Merra, Maurizio Sfilio, Vito Walter Anelli, Fedelucio Narducci, Tommaso Di Noia 2025-05-15 arXiv https://github.com/sisinflab/LLM-MemoryInspector http://arxiv.org/abs/2505.10212v1
8 From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models Yidan Wang, Yubing Ren, Yanan Cao, Binxing Fang 2025-05-15 arXiv https://github.com/redwyd/SymMark http://arxiv.org/abs/2505.09924v2
9 ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts Jing-Cheng Pang, Kaiyuan Li, Yidi Wang, Si-Hang Yang, Shengyi Jiang, Yang Yu 2025-05-15 arXiv https://github.com/LAMDA-RL/ImagineBench http://arxiv.org/abs/2505.10010v1
10 PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization Yidan Wang, Yanan Cao, Yubing Ren, Fang Fang, Zheng Lin, Binxing Fang 2025-05-15 arXiv https://github.com/redwyd/PrivacyJailbreak http://arxiv.org/abs/2505.09921v2
11 LAS: Loss-less ANN-SNN Conversion for Fully Spike-Driven Large Language Models Long Chen, Xiaotian Song, Yanan Sun 2025-05-14 arXiv https://github.com/lc783/LAS http://arxiv.org/abs/2505.09659v1
12 Adversarial Attack on Large Language Models using Exponentiated Gradient Descent Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu 2025-05-14 arXiv https://github.com/sbamit/Exponentiated-Gradient-Descent-LLM-Attack http://arxiv.org/abs/2505.09820v1
13 CodePDE: An Inference Framework for LLM-driven PDE Solver Generation Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski, Yiming Yang, Ameet Talwalkar 2025-05-13 arXiv https://github.com/LithiumDA/CodePDE http://arxiv.org/abs/2505.08783v1
14 Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement Haoran Ye, Jing Jin, Yuhang Xie, Xin Zhang, Guojie Song 2025-05-13 arXiv https://github.com/valuebyte-ai/Awesome-LLM-Psychometrics http://arxiv.org/abs/2505.08245v1
15 Optimized Couplings for Watermarking Large Language Models Dor Tsur, Carol Xuan Long, Claudio Mayrink Verdun, Hsiang Hsu, Haim Permuter, Flavio P. Calmon 2025-05-13 arXiv https://github.com/Carol-Long/CC_Watermark http://arxiv.org/abs/2505.08878v1
16 Unlocking Location Intelligence: A Survey from Deep Learning to The LLM Era Xixuan Hao, Yutian Jiang, Xingchen Zou, Jiabo Liu, Yifang Yin, Yuxuan Liang 2025-05-13 arXiv https://github.com/CityMind-Lab/Awesome-Location-Intelligence http://arxiv.org/abs/2505.09651v1
17 HealthBench: Evaluating Large Language Models Towards Improved Human Health Rahul K. Arora, Jason Wei, Rebecca Soskin Hicks, Preston Bowman, Joaquin Quiñonero-Candela, Foivos Tsimpourlas, Michael Sharman, Meghan Shah, Andrea Vallone, Alex Beutel, Johannes Heidecke, Karan Singhal 2025-05-13 arXiv https://github.com/openai/simple-evals http://arxiv.org/abs/2505.08775v1
18 A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models Junjie Ye, Caishuang Huang, Zhuohan Chen, Wenjie Fu, Chenyuan Yang, Leyi Yang, Yilong Wu, Peng Wang, Meng Zhou, Xiaolong Yang, Tao Gui, Qi Zhang, Zhongchao Shi, Jianping Fan, Xuanjing Huang 2025-05-12 arXiv https://github.com/Junjie-Ye/MulDimIF http://arxiv.org/abs/2505.07591v1
19 Are LLMs complicated ethical dilemma analyzers? Jiashen Du, Jesse Yao, Allen Liu, Zhekai Zhang 2025-05-12 arXiv https://github.com/ALT-JS/ethicaLLM http://arxiv.org/abs/2505.08106v1
20 DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation Jiashuo Sun, Xianrui Zhong, Sizhe Zhou, Jiawei Han 2025-05-12 arXiv https://github.com/GasolSun36/DynamicRAG http://arxiv.org/abs/2505.07233v2
21 Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs Yifan Wei, Xiaoyan Yu, Tengfei Pan, Angsheng Li, Li Du 2025-05-12 arXiv https://github.com/weiyifan1023/senator http://arxiv.org/abs/2505.07184v1
22 MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception Zhengye Zhang, Sirui Zhao, Shifeng Liu, Shukang Yin, Xinglong Mao, Tong Xu, Enhong Chen 2025-05-11 arXiv https://github.com/zyzhangUstc/MELLM http://arxiv.org/abs/2505.07007v1
23 From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering Gaurab Sarkar, Sougata Saha 2025-05-11 arXiv https://github.com/sougata-ub/llms_for_ionic_liquids http://arxiv.org/abs/2505.06964v1
24 GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance Jinuk Kim, Marwa El Halabi, Wonpyo Park, Clemens JS Schaefer, Deokjae Lee, Yeonhong Park, Jae W. Lee, Hyun Oh Song 2025-05-11 arXiv https://github.com/snu-mllab/GuidedQuant http://arxiv.org/abs/2505.07004v1
25 POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models Yangguang Shao, Xinjie Lin, Haozheng Luo, Chengshang Hou, Gang Xiong, Jiahao Yu, Junzheng Shi 2025-05-10 arXiv https://github.com/AndyShaw01/PoisonCraft http://arxiv.org/abs/2505.06579v1
26 Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Learning Hang Gao, Chenhao Zhang, Tie Wang, Junsuo Zhao, Fengge Wu, Changwen Zheng, Huaping Liu 2025-05-09 arXiv https://github.com/zch65458525/L2T http://arxiv.org/abs/2505.06321v1
27 HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow You Peng, Youhe Jiang, Chen Wang, Binhang Yuan 2025-05-08 arXiv https://github.com/Relaxed-System-Lab/Hexgen-Flow http://arxiv.org/abs/2505.05286v1
28 KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification Qianbo Zang, Christophe Zgrzendek, Igor Tchappi, Afshin Khadangi, Johannes Sedlmeir 2025-05-08 arXiv https://github.com/QianboZang/KG-HTC http://arxiv.org/abs/2505.05583v1
29 Prompt-Based LLMs for Position Bias-Aware Reranking in Personalized Recommendations Md Aminul Islam, Ahmed Sayeed Faruk 2025-05-08 arXiv https://github.com/aminul7506/LLMForReRanking http://arxiv.org/abs/2505.04948v1
30 Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Jiang Zong, Hao Peng, Jianwei Yin 2025-05-08 arXiv https://github.com/colored-dye/multi_stage_influence_function http://arxiv.org/abs/2505.05017v1
31 Benchmarking LLMs' Swarm intelligence Kai Ruan, Mowen Huang, Ji-Rong Wen, Hao Sun 2025-05-07 arXiv https://github.com/x66ccff/swarmbench http://arxiv.org/abs/2505.04364v1
32 TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, Jinkyoo Park 2025-05-07 arXiv https://github.com/ai4co/trajevo http://arxiv.org/abs/2505.04480v1
33 Advancing and Benchmarking Personalized Tool Invocation for LLMs Xu Huang, Yuefeng Huang, Weiwen Liu, Xingshan Zeng, Yasheng Wang, Ruiming Tang, Hong Xie, Defu Lian 2025-05-07 arXiv https://github.com/hyfshadow/PTBench http://arxiv.org/abs/2505.04072v1
34 Avoid Recommending Out-of-Domain Items: Constrained Generative Recommendation with LLMs Hao Liao, Wensheng Lu, Jianxun Lian, Mingqi Wu, Shuo Wang, Yong Zhang, Yitian Huang, Mingyang Zhou, Xing Xie 2025-05-06 arXiv https://github.com/microsoft/RecAI http://arxiv.org/abs/2505.03336v1
35 CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics Junqi Liu, Xiaohan Lin, Jonas Bayer, Yael Dillies, Weijie Jiang, Xiaodan Liang, Roman Soletskyi, Haiming Wang, Yunzhou Xie, Beibei Xiong, Zhengfeng Yang, Jujian Zhang, Lihong Zhi, Jia Li, Zhengying Liu 2025-05-06 arXiv https://github.com/MoonshotAI/CombiBench/ http://arxiv.org/abs/2505.03171v1
36 Plug-and-Play AMC: Context Is King in Training-Free, Open-Set Modulation with LLMs Mohammad Rostami, Atik Faysal, Reihaneh Gh. Roshan, Huaxia Wang, Nikhil Muralidhar, Yu-Dong Yao 2025-05-06 arXiv https://github.com/RU-SIT/context-is-king http://arxiv.org/abs/2505.03112v1
37 Automatic Calibration for Membership Inference Attack on Large Language Models Saleh Zare Zade, Yao Qiang, Xiangyu Zhou, Hui Zhu, Mohammad Amin Roshani, Prashant Khanduri, Dongxiao Zhu 2025-05-06 arXiv https://github.com/Salehzz/ACMIA http://arxiv.org/abs/2505.03392v1
38 FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models Zhouliang Yu, Ruotian Peng, Keyi Ding, Yizhe Li, Zhongyuan Peng, Minghao Liu, Yifan Zhang, Zheng Yuan, Huajian Xin, Wenhao Huang, Yandong Wen, Ge Zhang, Weiyang Liu 2025-05-05 arXiv https://sphere-ai-lab.github.io/FormalMATH/ http://arxiv.org/abs/2505.02735v1
39 LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis Qingkai Fang, Yan Zhou, Shoutao Guo, Shaolei Zhang, Yang Feng 2025-05-05 arXiv https://github.com/ictnlp/LLaMA-Omni2 http://arxiv.org/abs/2505.02625v1
40 Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models Xiaobao Wu 2025-05-05 arXiv https://github.com/bobxwu/learning-from-rewards-llm-papers http://arxiv.org/abs/2505.02686v1
41 Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data Zhong Guan, Likang Wu, Hongke Zhao, Ming He, Jianpin Fan 2025-05-04 arXiv https://github.com/millioniron/LLM_exploration http://arxiv.org/abs/2505.02130v1
42 MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents Zeyu Zhang, Quanyu Dai, Xu Chen, Rui Li, Zhongyang Li, Zhenhua Dong 2025-05-04 arXiv https://github.com/nuster1128/MemEngine http://arxiv.org/abs/2505.02099v1
43 Amplifying Your Social Media Presence: Personalized Influential Content Generation with LLMs Yuying Zhao, Yu Wang, Xueqi Cheng, Anne Marie Tumlin, Yunchao Liu, Damin Xia, Meng Jiang, Tyler Derr 2025-05-03 arXiv https://github.com/YuyingZhao/LLM-influence-amplifier http://arxiv.org/abs/2505.01698v1
44 A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Sihyeong Park, Sungryeol Jeon, Chaelyn Lee, Seokhun Jeon, Byung-Soo Kim, Jemin Lee 2025-05-03 arXiv https://github.com/sihyeong/Awesome-LLM-Inference-Engine http://arxiv.org/abs/2505.01658v1
45 WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks Jingwen Tong, Jiawei Shao, Qiong Wu, Wei Guo, Zijian Li, Zehong Lin, Jun Zhang 2025-05-02 arXiv https://github.com/jwentong/WirelessAgent_R1 https://doi.org/10.48550/arXiv.2409.07964
46 FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing Gaoxiang Cong, Liang Li, Jiadong Pan, Zhedong Zhang, Amin Beheshti, Anton van den Hengel, Yuankai Qi, Qingming Huang 2025-05-02 arXiv https://galaxycong.github.io/LLM-Flow-Dubber/ http://arxiv.org/abs/2505.01263v1
47 Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Guoxia Wang, Dianhai Yu, Yonggang Wen, Dacheng Tao 2025-05-02 arXiv https://github.com/Hao840/Awesome-Low-Precision-Training http://arxiv.org/abs/2505.01043v1
48 LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou 2025-05-01 arXiv https://github.com/Susan571/LENSLLM http://arxiv.org/abs/2505.03793v1
49 Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models Bang Zhang, Ruotian Ma, Qingxuan Jiang, Peisong Wang, Jiaqi Chen, Zheng Xie, Xingyu Chen, Yue Wang, Fanghua Ye, Jian Li, Yifan Yang, Zhaopeng Tu, Xiaolong Li 2025-05-01 arXiv https://github.com/Tencent/digitalhuman/tree/main/SAGE http://arxiv.org/abs/2505.02847v2
50 SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation Quang P. M. Pham, Khoi T. N. Nguyen, Nhi H. Doan, Cuong A. Pham, Kentaro Inui, Dezhen Song 2025-05-01 arXiv https://github.com/quangpham2006/SmallPlan http://arxiv.org/abs/2505.00831v1
51 A Survey on Large Language Model based Human-Agent Systems Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Langzhou He, Yangning Li, Yuwei Cao, Dongyuan Li, Renhe Jiang, Philip S. Yu 2025-05-01 arXiv https://github.com/HenryPengZou/Awesome-LLM-Based-Human-Agent-System-Papers http://arxiv.org/abs/2505.00753v1
52 DeepCritic: Deliberate Critique with Large Language Models Wenkai Yang, Jingwen Chen, Yankai Lin, Ji-Rong Wen 2025-05-01 arXiv https://github.com/RUCBM/DeepCritic http://arxiv.org/abs/2505.00662v1
53 LLM Ethics Benchmark: A Three-Dimensional Assessment System for Evaluating Moral Reasoning in Large Language Models Junfeng Jiao, Saleh Afroogh, Abhejay Murali, Kevin Chen, David Atkinson, Amit Dhurandhar 2025-05-01 arXiv https://github.com/ http://arxiv.org/abs/2505.00853v1
54 LLM-based Interactive Imitation Learning for Robotic Manipulation Jonas Werner, Kun Chu, Cornelius Weber, Stefan Wermter 2025-04-30 arXiv https://github.com/Tubicor/LLM-iTeach http://arxiv.org/abs/2504.21769v1
55 When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator Md Fahim Anjum 2025-04-30 arXiv https://github.com/MDFahimAnjum/llm-planning-with-reasoning http://arxiv.org/abs/2505.03786v1
56 OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification Shangyu Li, Juyong Jiang, Tiancheng Zhao, Jiasi Shen 2025-04-29 arXiv https://github.com/lishangyu-hkust/OSVBench http://arxiv.org/abs/2504.20964v1
57 Reinforcement Learning for Reasoning in Large Language Models with One Training Example Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen 2025-04-29 arXiv https://github.com/ypwang61/One-Shot-RLVR http://arxiv.org/abs/2504.20571v1
58 Turing Machine Evaluation for Large Language Model Haitao Wu, Zongbo Han, Huaxi Huang, Changqing Zhang 2025-04-29 arXiv https://github.com/HaitaoWuTJU/Turing-Machine-Bench http://arxiv.org/abs/2504.20771v1
59 X-Fusion: Introducing New Modality to Frozen Large Language Models Sicheng Mo, Thao Nguyen, Xun Huang, Siddharth Srinivasan Iyer, Yijun Li, Yuchen Liu, Abhishek Tandon, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, Yuheng Li 2025-04-29 arXiv https://sichengmo.github.io/XFusion/ http://arxiv.org/abs/2504.20996v1
60 AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers Zijie Lin, Yiqing Shen, Qilin Cai, He Sun, Jinrui Zhou, Mingjun Xiao 2025-04-28 arXiv https://github.com/shoushouyu/Automated-Paper-to-Code http://arxiv.org/abs/2504.20115v1
61 Evolution of Cooperation in LLM-Agent Societies: A Preliminary Study Using Different Punishment Strategies Kavindu Warnakulasuriya, Prabhash Dissanayake, Navindu De Silva, Stephen Cranefield, Bastin Tony Roy Savarimuthu, Surangika Ranathunga, Nisansa de Silva 2025-04-28 arXiv https://coin-workshop.github.io/coine-2025-detroit/accepted_for_presentation.html http://arxiv.org/abs/2504.19487v1
62 LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects Guangyi Liu, Pengxiang Zhao, Liang Liu, Yaxuan Guo, Han Xiao, Weifeng Lin, Yuxiang Chai, Yue Han, Shuai Ren, Hao Wang, Xiaoyu Liang, Wenhao Wang, Tianze Wu, Linghao Li, Hao Wang, Guanjing Xiong, Yong Liu, Hongsheng Li 2025-04-28 2025 https://github.com/PhoneLLM/Awesome-LLM-Powered-Phone-GUI-Agents http://arxiv.org/abs/2504.19838v1
63 SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning Jiaqi Chen, Bang Zhang, Ruotian Ma, Peisong Wang, Xiaodan Liang, Zhaopeng Tu, Xiaolong Li, Kwan-Yee K. Wong 2025-04-27 arXiv https://chen-judge.github.io/SPC/ http://arxiv.org/abs/2504.19162v1
64 Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers Dylan Bouchard, Mohit Singh Chauhan 2025-04-27 arXiv https://github.com/cvs-health/uqlm http://arxiv.org/abs/2504.19254v2
65 BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese Peilin Zhou, Bruce Leon, Xiang Ying, Can Zhang, Yifan Shao, Qichen Ye, Dading Chong, Zhiling Jin, Chenxuan Xie, Meng Cao, Yuxin Gu, Sixin Hong, Jing Ren, Jian Chen, Chao Liu, Yining Hua 2025-04-27 arXiv https://github.com/PALIN2018/BrowseComp-ZH http://arxiv.org/abs/2504.19314v2
66 Calibrating Translation Decoding with Quality Estimation on LLMs Di Wu, Yibin Lei, Christof Monz 2025-04-26 arXiv https://github.com/moore3930/calibrating-llm-mt http://arxiv.org/abs/2504.19044v1
67 Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs Mohammad Akbar-Tajari, Mohammad Taher Pilehvar, Mohammad Mahmoody 2025-04-26 arXiv https://github.com/GoAT-pydev/Graph_of_Attacks http://arxiv.org/abs/2504.19019v1
68 SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models Nader Zantout, Haochen Zhang, Pujith Kachana, Jinkai Qiu, Ji Zhang, Wenshan Wang 2025-04-25 arXiv https://github.com/nzantout/SORT3D http://arxiv.org/abs/2504.18684v1
69 DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models Jianyu Liu, Hangyu Guo, Ranjie Duan, Xingyuan Bu, Yancheng He, Shilong Li, Hui Huang, Jiaheng Liu, Yucheng Wang, Chenchen Jing, Xingwei Qu, Xiao Zhang, Yingshui Tan, Yanan Wu, Jihao Gu, Yangguang Li, Jianke Zhu 2025-04-25 arXiv https://github.com/Kizna1ver/DREAM http://arxiv.org/abs/2504.18053v1
70 LEAM: A Prompt-only Large Language Model-enabled Antenna Modeling Method Tao Wu, Kexue Fu, Qiang Hua, Xinxin Liu, Muhammad Ali Imran, Bo Liu 2025-04-25 arXiv https://github.com/TaoWu974/LEAM http://arxiv.org/abs/2504.18271v1
71 An Empirical Study on Prompt Compression for Large Language Models Zheng Zhang, Jinyi Li, Yihuai Lan, Xiang Wang, Hao Wang 2025-04-24 arXiv https://github.com/3DAgentWorld/Toolkit-for-Prompt-Compression http://arxiv.org/abs/2505.00019v1
72 RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning Zihan Wang, Kangrui Wang, Qineng Wang, Pingyue Zhang, Linjie Li, Zhengyuan Yang, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, Eli Gottlieb, Monica Lam, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li 2025-04-24 arXiv https://github.com/RAGEN-AI/RAGEN http://arxiv.org/abs/2504.20073v1
73 Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Tiancheng Gu, Kaicheng Yang, Ziyong Feng, Xingjun Wang, Yanzhao Zhang, Dingkun Long, Yingda Chen, Weidong Cai, Jiankang Deng 2025-04-24 arXiv https://garygutc.github.io/UniME http://arxiv.org/abs/2504.17432v1
74 Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark Hanlei Zhang, Zhuohang Li, Yeshuang Zhu, Hua Xu, Peiwu Wang, Haige Zhu, Jie Zhou, Jinchao Zhang 2025-04-23 arXiv https://github.com/thuiar/MMLA http://arxiv.org/abs/2504.16427v2
75 UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models Yu Zheng, Longyi Liu, Yuming Lin, Jie Feng, Guozhen Zhang, Depeng Jin, Yong Li 2025-04-23 arXiv https://github.com/tsinghua-fib-lab/PlanBench http://arxiv.org/abs/2504.21027v1
76 Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control Hannah Cyberey, David Evans 2025-04-23 arXiv https://github.com/hannahxchen/llm-censorship-steering http://arxiv.org/abs/2504.17130v1
77 Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution Junjie Chen, Haitao Li, Jingli Yang, Yiqun Liu, Qingyao Ai 2025-04-23 arXiv https://github.com/cjj826/GoalAct http://arxiv.org/abs/2504.16563v1
78 LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale Joya Chen, Ziyun Zeng, Yiqi Lin, Wei Li, Zejun Ma, Mike Zheng Shou 2025-04-22 arXiv https://showlab.github.io/livecc http://arxiv.org/abs/2504.16030v1
79 PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Weike Wang, Xudong Tian, Anqi Lv, Laifu Man, Jianxiang Li, Feiyu Tao, Qihua Sun, Zhou Liang, Yushu Mu, Zhongxuan Li, Jing-Jun Zhang, Shutao Zhang, Xiaotian Li, Xingqi Xia, Jiawei Lin, Zheyu Shen, Jiahang Chen, Qiuhao Xiong, Binran Wang, Fengyuan Wang, Ziyang Ni, Bohan Zhang, Fan Cui, Changkun Shao, Qing-Hong Cao, Ming-xing Luo, Muhan Zhang, Hua Xing Zhu 2025-04-22 arXiv https://phybench-official.github.io/phybench-demo/ http://arxiv.org/abs/2504.16074v1
80 WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang 2025-04-22 arXiv https://github.com/elated-sawyer/WALL-E http://arxiv.org/abs/2504.15785v1
81 CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs Yingming Zheng, Xiaoliang Liu, Peng Wu, Li Pan 2025-04-21 arXiv https://github.com/8zym/CRAVE http://arxiv.org/abs/2504.14905v1
82 EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models Peng Wang, Ningyu Zhang, Bozhong Tian, Zekun Xi, Yunzhi Yao, Ziwen Xu, Mengru Wang, Shengyu Mao, Xiaohan Wang, Siyuan Cheng, Kangwei Liu, Yuansheng Ni, Guozhou Zheng, Huajun Chen 2025-04-21 arXiv https://zjunlp.github.io/project/EasyEdit2/video https://doi.org/10.48550/arXiv.2308.07269
83 Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph Qiushi Xiong, Zhipeng Xu, Zhenghao Liu, Mengjia Wang, Zulong Chen, Yue Sun, Yu Gu, Xiaohua Li, Ge Yu 2025-04-21 arXiv https://github.com/NEUIR/MemGraph http://arxiv.org/abs/2504.14845v1
84 Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators Yilun Zhou, Austin Xu, Peifeng Wang, Caiming Xiong, Shafiq Joty 2025-04-21 arXiv https://github.com/SalesforceAIResearch/jetts-benchmark http://arxiv.org/abs/2504.15253v1
85 IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs David Ma, Yuanxing Zhang, Jincheng Ren, Jarvis Guo, Yifan Yao, Zhenlin Wei, Zhenzhu Yang, Zhongyuan Peng, Boyu Feng, Jun Ma, Xiao Gu, Zhoufutu Wen, King Zhu, Yancheng He, Meng Cao, Shiwen Ni, Jiaheng Liu, Wenhao Huang, Ge Zhang, Xiaojie Jin 2025-04-21 arXiv https://github.com/multimodal-art-projection/IV-Bench http://arxiv.org/abs/2504.15415v1
86 VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Weiye Xu, Jiahao Wang, Weiyun Wang, Zhe Chen, Wengang Zhou, Aijun Yang, Lewei Lu, Houqiang Li, Xiaohua Wang, Xizhou Zhu, Wenhai Wang, Jifeng Dai, Jinguo Zhu 2025-04-21 arXiv https://visulogic-benchmark.github.io/VisuLogic http://arxiv.org/abs/2504.15279v1
87 NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang 2025-04-20 arXiv https://github.com/LawrenceRLiu/NoWag http://arxiv.org/abs/2504.14569v1
88 Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding Tong Zeng, Longfeng Wu, Liang Shi, Dawei Zhou, Feng Guo 2025-04-20 arXiv https://github.com/tong-zeng/DVBench http://arxiv.org/abs/2504.14526v1
89 CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations Man Ho Lam, Chaozheng Wang, Jen-tse Huang, Michael R. Lyu 2025-04-19 arXiv https://donaldlamnl.github.io/CodeCrash/ http://arxiv.org/abs/2504.14119v1
90 Integrating LLM-Generated Views into Mean-Variance Optimization Using the Black-Litterman Model Youngbin Lee, Yejin Kim, Suin Kim, Yongjae Lee 2025-04-19 arXiv https://github.com/youngandbin/LLM-MVO-BLM http://arxiv.org/abs/2504.14345v1
91 Towards Explainable Fake Image Detection with Multi-Modal Large Language Models Yikun Ji, Yan Hong, Jiahui Zhan, Haoxing Chen, jun lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang 2025-04-19 arXiv https://github.com/Gennadiyev/mllm-defake http://arxiv.org/abs/2504.14245v1
92 Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator Akshat Ramachandran, Souvik Kundu, Arnab Raha, Shamik Kundu, Deepak K. Mathaikutty, Tushar Krishna 2025-04-19 arXiv https://github.com/FLOW-open-project/FLOW http://arxiv.org/abs/2504.14365v1
93 LLM Sensitivity Evaluation Framework for Clinical Diagnosis Chenwei Yan, Xiangling Fu, Yuxuan Xiong, Tianyi Wang, Siu Cheung Hui, Ji Wu, Xien Liu 2025-04-18 Proceedings of the 31st International Conference on Computational Linguistics, 2025 https://github.com/chenwei23333/DiagnosisQA http://arxiv.org/abs/2504.13475v1
94 ConExion: Concept Extraction with Large Language Models Ebrahim Norouzi, Sven Hertling, Harald Sack 2025-04-17 arXiv https://github.com/ISE-FIZKarlsruhe/concept_extraction http://arxiv.org/abs/2504.12915v1
95 EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting Guanrou Yang, Chen Yang, Qian Chen, Ziyang Ma, Wenxi Chen, Wen Wang, Tianrui Wang, Yifan Yang, Zhikang Niu, Wenrui Liu, Fan Yu, Zhihao Du, Zhifu Gao, ShiLiang Zhang, Xie Chen 2025-04-17 arXiv https://yanghaha0908.github.io/EmoVoice/ http://arxiv.org/abs/2504.12867v1
96 ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition Hisham A. Alyahya, Haidar Khan, Yazeed Alnumay, M Saiful Bari, Bülent Yener 2025-04-17 arXiv https://github.com/facebookresearch/ZeroSumEval http://arxiv.org/abs/2503.10673v1
97 Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration Yicheng Pan, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Quan Liu, Jianqing Gao, Feng Ma 2025-04-17 arXiv https://github.com/ycpNotFound/GeoGen http://arxiv.org/abs/2504.12773v1
98 Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM Zirui Pan, Xin Wang, Yipeng Zhang, Hong Chen, Kwan Man Cheng, Yaofei Wu, Wenwu Zhu 2025-04-16 arXiv https://modular-cam.github.io http://arxiv.org/abs/2504.12048v1
99 d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning Siyan Zhao, Devaansh Gupta, Qinqing Zheng, Aditya Grover 2025-04-16 arXiv https://dllm-reasoning.github.io/ http://arxiv.org/abs/2504.12216v1
100 LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA Xanh Ho, Jiahao Huang, Florian Boudin, Akiko Aizawa 2025-04-16 arXiv https://github.com/Alab-NII/llm-judge-extract-qa http://arxiv.org/abs/2504.11972v1
101 HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks Stefan Abi-Karam, Cong Hao 2025-04-16 arXiv https://github.com/stefanpie/hls-eval http://arxiv.org/abs/2504.12268v1
102 A Human-AI Comparative Analysis of Prompt Sensitivity in LLM-Based Relevance Judgment Negar Arabzadeh, Charles L. A. Clarke 2025-04-16 arXiv https://github.com/Narabzad/prompt-sensitivity-relevance-judgements/ http://arxiv.org/abs/2504.12408v1
103 MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning Zhaopeng Feng, Shaosheng Cao, Jiahan Ren, Jiayuan Su, Ruizhe Chen, Yan Zhang, Zhe Xu, Yao Hu, Jian Wu, Zuozhu Liu 2025-04-15 arXiv …, 2025 https://github.com/fzp0424/MT-R1-Zero http://arxiv.org/abs/2504.10160v1
104 Using LLMs as prompt modifier to avoid biases in AI image generators René Peinl 2025-04-15 arXiv https://iisys-hof.github.io/llm-prompt-img-gen/ http://arxiv.org/abs/2504.11104v1
105 Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From Changjiang Gao, Hankun Lin, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Jiajun Chen 2025-04-15 arXiv https://github.com/NJUNLP/Cross-Lingual-Context-Retrieval http://arxiv.org/abs/2504.10906v1
106 RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-wave Point Cloud Sequence Zengyuan Lai, Jiarui Yang, Songpengcheng Xia, Lizhou Lin, Lan Sun, Renwen Wang, Jianran Liu, Qi Wu, Ling Pei 2025-04-15 arXiv …, 2025 https://inowlzy.github.io/RadarLLM/ http://arxiv.org/abs/2504.09862v1
107 Propaganda via AI? A Study on Semantic Backdoors in Large Language Models Nay Myat Min, Long H. Pham, Yige Li, Jun Sun 2025-04-15 arXiv https://github.com/NayMyatMin/RAVEN http://arxiv.org/abs/2504.12344v1
108 Probing then Editing Response Personality of Large Language Models Tianjie Ju, Zhenyu Shao, Bowen Wang, Yujia Chen, Zhuosheng Zhang, Hao Fei, Mong-Li Lee, Wynne Hsu, Sufeng Duan, Gongshen Liu 2025-04-15 arXiv …, 2025 https://github.com/universe-sky/probing-then-editing-personality http://arxiv.org/abs/2504.10227v1
109 LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models Parshin Shojaee, Ngoc-Hieu Nguyen, Kazem Meidani, Amir Barati Farimani, Khoa D Doan, Chandan K Reddy 2025-04-15 arXiv …, 2025 https://github.com/deep-symbolic-mathematics/llm-srbench http://arxiv.org/abs/2504.10415v1
110 Teaching Large Language Models to Reason through Learning and Forgetting Tianwei Ni, Allen Nie, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, Rasool Fakoor 2025-04-15 arXiv https://github.com/twni2016/llm-reasoning-uft http://arxiv.org/abs/2504.11364v1
111 Dynamic Compressing Prompts for Efficient Inference of Large Language Models Jinwu Hu, Wei Zhang, Yufeng Wang, Yu Hu, Bin Xiao, Mingkui Tan, Qing Du 2025-04-15 arXiv https://github.com/Fhujinwu/DCP http://arxiv.org/abs/2504.11004v1
112 A Dual-Space Framework for General Knowledge Distillation of Large Language Models Xue Zhang, Songming Zhang, Yunlong Liang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou 2025-04-15 arXiv https://github.com/songmzhang/DSKDv2 http://arxiv.org/abs/2504.11426v1
113 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float Tianyi Zhang, Yang Sui, Shaochen Zhong, Vipin Chaudhary, Xia Hu, Anshumali Shrivastava 2025-04-15 arXiv https://github.com/LeanModels/DFloat11 http://arxiv.org/abs/2504.11651v1
114 LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks Soumyadeep Pal, Changsheng Wang, James Diffenderfer, Bhavya Kailkhura, Sijia Liu 2025-04-15 arXiv …, 2025 https://github.com/OPTML-Group/MU-Coreset http://arxiv.org/abs/2504.10185v2
115 CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates Ankit Kumar Shaw, Kun Jiang, Tuopu Wen, Chandan Kumar Sah, Yining Shi, Mengmeng Yang, Diange Yang, Xiaoli Lian 2025-04-14 arXiv https://Ankit-Zefan.github.io/CleanMap/ http://arxiv.org/abs/2504.10738v1
116 ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model Wuyang Lan, Wenzheng Wang, Changwei Ji, Guoxing Yang, Yongbo Zhang, Xiaohong Liu, Song Wu, Guangyu Wang 2025-04-13 arXiv https://github.com/medfound/medfound http://arxiv.org/abs/2504.09421v2
117 Fine-tuning a Large Language Model for Automating Computational Fluid Dynamics Simulations Zhehao Dong, Zhen Lu, Yue Yang 2025-04-13 arXiv https://github.com/YYgroup/AutoCFD http://arxiv.org/abs/2504.09602v2
118 Alleviating the Fear of Losing Alignment in LLM Fine-tuning Kang Yang, Guanhong Tao, Xun Chen, Jun Xu 2025-04-13 arXiv https://github.com/kangyangWHU/LLMAlignment http://arxiv.org/abs/2504.09757v1
119 Can LLM feedback enhance review quality? A randomized study of 20K reviews at ICLR 2025 Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, Animesh Garg, Nanyun Peng, Fei Sha, Rose Yu, Carl Vondrick, James Zou 2025-04-13 arXiv https://github.com/zou-group/review_feedback_agent http://arxiv.org/abs/2504.09737v1
120 DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training Zhenting Wang, Guofeng Cui, Kun Wan, Wentian Zhao 2025-04-13 arXiv https://github.com/ZhentingWang/DUMP http://arxiv.org/abs/2504.09710v1
121 HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs Sharanya Dasgupta, Sujoy Nath, Arkaprabha Basu, Pourya Shamsolmoali, Swagatam Das 2025-04-13 arXiv https://github.com/sharanya-dasgupta001/hallushift http://arxiv.org/abs/2504.09482v1
122 How new data permeates LLM knowledge and how to dilute it Chen Sun, Renat Aksitov, Andrey Zhmoginov, Nolan Andrew Miller, Max Vladymyrov, Ulrich Rueckert, Been Kim, Mark Sandler 2025-04-13 arXiv https://sunchipsster1.github.io/projects/outlandish/ http://arxiv.org/abs/2504.09522v1
123 SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model Kaiyu Li, Zepeng Xin, Li Pang, Chao Pang, Yupeng Deng, Jing Yao, Guisong Xia, Deyu Meng, Zhi Wang, Xiangyong Cao 2025-04-13 arXiv https://github.com/earth-insights/SegEarth-R1 http://arxiv.org/abs/2504.09644v1
124 Span-level Emotion-Cause-Category Triplet Extraction with Instruction Tuning LLMs and Data Augmentation Xiangju Li, Dong Yang, Xiaogang Zhu, Faliang Huang, Peng Zhang, Zhongying Zhao 2025-04-13 arXiv https://github.com/zxgnlp/InstruDa-LLM http://arxiv.org/abs/2504.12331v1
125 Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution Chenghao Li, Chaoning Zhang, Yi Lu, Jiaquan Zhang, Qigan Sun, Xudong Wang, Jiwei Wei, Guoqing Wang, Yang Yang, Heng Tao Shen 2025-04-13 arXiv https://github.com/dlMARiA/Syzygy-of-thoughts http://arxiv.org/abs/2504.09566v2
126 GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation Lang Lin, Xueyang Yu, Ziqi Pang, Yu-Xiong Wang 2025-04-12 arXiv:2504.07962, 2025 https://glus-video.github.io/ http://arxiv.org/abs/2504.07962v1
127 Revisiting LLM Evaluation through Mechanism Interpretability: a New Metric and Model Utility Law Yixin Cao, Jiahao Ying, Yaoning Wang, Xipeng Qiu, Xuanjing Huang, Yugang Jiang 2025-04-12 arXiv …, 2025 https://github.com/ALEX-nlp/MUI-Eva http://arxiv.org/abs/2504.07440v1
128 LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking Qi Liu, Haozhe Duan, Yiqun Chen, Quanfeng Lu, Weiwei Sun, Jiaxin Mao 2025-04-12 arXiv …, 2025 https://github.com/liuqi6777/llm4ranking http://arxiv.org/abs/2504.07439v1
129 Efficient Tuning of Large Language Models for Knowledge-Grounded Dialogue Generation Bo Zhang, Hui Ma, Dailin Li, Jian Ding, Jian Wang, Bo Xu, HongFei Lin 2025-04-12 arXiv …, 2025 https://github.com/zhangbo-nlp/KEDiT http://arxiv.org/abs/2504.07754v1
130 Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models Yuxiang Lin, Jingdong Sun, Zhi-Qi Cheng, Jue Wang, Haomin Liang, Zebang Cheng, Yifei Dong, Jun-Yan He, Xiaojiang Peng, Xian-Sheng Hua 2025-04-12 arXiv …, 2025 https://github.com/Lum1104/EIBench http://arxiv.org/abs/2504.07521v1
131 From Punchlines to Predictions: A Metric to Assess LLM Performance in Identifying Humor in Stand-Up Comedy Adrianna Romanowski, Pedro H. V. Valois, Kazuhiro Fukui 2025-04-12 arXiv https://github.com/swaggirl9000/humor http://arxiv.org/abs/2504.09049v1
132 Task Memory Engine (TME): A Structured Memory Framework with Graph-Aware Extensions for Multi-Step LLM Agent Tasks Ye Ye 2025-04-11 arXiv https://github.com/biubiutomato/TME-Agent http://arxiv.org/abs/2504.08525v3
133 A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis Xin Gao, Qizhi Pei, Zinan Tang, Yu Li, Honglin Lin, Jiang Wu, Conghui He, Lijun Wu 2025-04-11 arXiv https://github.com/GX-XinGao/GRA http://arxiv.org/abs/2504.12322v1
134 Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric Yixin Cao, Jiahao Ying, Yaoning Wang, Xipeng Qiu, Xuanjing Huang, Yugang Jiang 2025-04-10 arXiv https://github.com/ALEX-nlp/MUI-Eva http://arxiv.org/abs/2504.07440v2
135 Exploring the Effectiveness and Interpretability of Texts in LLM-based Time Series Models Zhengke Sun, Hangwei Qian, Ivor Tsang 2025-04-09 arXiv https://github.com/zachysun/TS-Lang-Exp http://arxiv.org/abs/2504.08808v1
136 V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models Xiangxi Zheng, Linjie Li, Zhengyuan Yang, Ping Yu, Alex Jinpeng Wang, Rui Yan, Yuan Yao, Lijuan Wang 2025-04-08 arXiv https://github.com/CSU-JPG/V-MAGE http://arxiv.org/abs/2504.06148v2
137 LLM×MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources Haoyu Wang, Yujia Fu, Zhu Zhang, Shuo Wang, Zirui Ren, Xiaorong Wang, Zhili Li, Chaoqun He, Bo An, Zhiyuan Liu, Maosong Sun 2025-04-08 arXiv https://github.com/thunlp/LLMxMapReduce http://arxiv.org/abs/2504.05732v1
138 Assessing Thai Dialect Performance in LLMs with Automatic Benchmarks and Human Evaluation Peerat Limkonchotiwat, Kanruethai Masuk, Surapon Nonesung, Chalermpun Mai-On, Sarana Nutanong, Wuttikorn Ponwitayarat, Potsawee Manakul 2025-04-08 arXiv https://github.com/mrpeerat/Thai_local_benchmark http://arxiv.org/abs/2504.05898v1
139 MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models Pengfei Zhou, Fanrui Zhang, Xiaopeng Peng, Zhaopan Xu, Jiaxin Ai, Yansheng Qiu, Chuanhao Li, Zhen Li, Ming Li, Yukang Feng, Jianwen Sun, Haoquan Zhang, Zizhen Li, Xiaofeng Mao, Wangbo Zhao, Kai Wang, Xiaojun Chang, Wenqi Shao, Yang You, Kaipeng Zhang 2025-04-08 arXiv https://github.com/LanceZPF/MDK12 http://arxiv.org/abs/2504.05782v1
140 Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models Yubo Li, Xiaobin Shen, Xinyu Yao, Xueying Ding, Yidi Miao, Ramayya Krishnan, Rema Padman 2025-04-07 arXiv https://github.com/yubol-cmu/Awesome-Multi-Turn-LLMs http://arxiv.org/abs/2504.04717v1
141 SEAL: Steerable Reasoning Calibration of Large Language Models for Free Runjin Chen, Zhenyu Zhang, Junyuan Hong, Souvik Kundu, Zhangyang Wang 2025-04-07 arXiv https://github.com/VITA-Group/SEAL http://arxiv.org/abs/2504.07986v1
142 EduPlanner: LLM-Based Multi-Agent Systems for Customized and Intelligent Instructional Design Xueqiao Zhang, Chao Zhang, Jianwen Sun, Jun Xiao, Yi Yang, Yawei Luo 2025-04-07 arXiv https://github.com/Zc0812/Edu_Planner http://arxiv.org/abs/2504.05370v1
143 Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs Will Cai, Tianneng Shi, Xuandong Zhao, Dawn Song 2025-04-07 arXiv https://github.com/sunblaze-ucb/llm-api-audit http://arxiv.org/abs/2504.04715v1
144 Can LLM-Driven Hard Negative Sampling Empower Collaborative Filtering? Findings and Potentials Chu Zhao, Enneng Yang, Yuting Liu, Jianzhe Zhao, Guibing Guo, Xingwei Wang 2025-04-07 arXiv https://github.com/user683/HNLMRec http://arxiv.org/abs/2504.04726v1
145 Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration Ran Xu, Wenqi Shi, Yuchen Zhuang, Yue Yu, Joyce C. Ho, Haoyu Wang, Carl Yang 2025-04-07 arXiv https://github.com/ritaranx/Collab-RAG/ http://arxiv.org/abs/2504.04915v1
146 PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Zonghang Li, Tao Li, Wenjiao Feng, Mohsen Guizani, Hongfang Yu 2025-04-07 arXiv https://github.com/Lizonghang/prima.cpp http://arxiv.org/abs/2504.08791v1
147 ArxivBench: Can LLMs Assist Researchers in Conducting Research? Ning Li, Jingran Zhang, Justin Cui 2025-04-06 arXiv https://github.com/arxivBenchLLM/arXivBench http://arxiv.org/abs/2504.10496v1
148 Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning Xuerui Su, Shufang Xie, Guoqing Liu, Yingce Xia, Renqian Luo, Peiran Jin, Zhiming Ma, Yue Wang, Zun Wang, Yuting Liu 2025-04-06 arXiv https://github.com/XueruiSu/Trust-Region-Preference-Approximation http://arxiv.org/abs/2504.04524v1
149 A Benchmark for End-to-End Zero-Shot Biomedical Relation Extraction with LLMs: Experiments with OpenAI Models Aviv Brokman, Xuguang Ai, Yuhang Jiang, Shashank Gupta, Ramakanth Kavuluru 2025-04-05 arXiv https://github.com/bionlproc/ZeroShotRE http://arxiv.org/abs/2504.04083v1
150 Window Token Concatenation for Efficient Visual Large Language Models Yifan Li, Wentao Bao, Botao Ye, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong 2025-04-05 arXiv https://github.com/JackYFL/WiCo http://arxiv.org/abs/2504.04024v1
151 AiReview: An Open Platform for Accelerating Systematic Reviews with LLMs Xinyu Mao, Teerapong Leelanupab, Martin Potthast, Harrisen Scells, Guido Zuccon 2025-04-05 arXiv https://github.com/ielab/ai-review http://arxiv.org/abs/2504.04193v1
152 A Perplexity and Menger Curvature-Based Approach for Similarity Evaluation of Large Language Models Yuantao Zhang, Zhankui Yang 2025-04-05 arXiv https://github.com/zyttt-coder/LLM_similarity http://arxiv.org/abs/2504.04216v1
153 MSL: Not All Tokens Are What You Need for Tuning LLM as a Recommender Bohao Wang, Feng Liu, Jiawei Chen, Xingyu Lou, Changwang Zhang, Jun Wang, Yuegang Sun, Yan Feng, Chun Chen, Can Wang 2025-04-05 arXiv https://github.com/WANGBohaO-jpg/MSL http://arxiv.org/abs/2504.04178v1
154 VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation Yuhao Wang, Heyang Liu, Ziyang Cheng, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang 2025-04-05 arXiv https://github.com/SJTU-OmniAgent/VocalNet http://arxiv.org/abs/2504.04060v1
155 Align to Structure: Aligning Large Language Models with Structural Information Zae Myung Kim, Anand Ramachandran, Farideh Tavazoee, Joo-Kyung Kim, Oleg Rokhlenko, Dongyeop Kang 2025-04-04 arXiv https://github.com/minnesotanlp/struct_align http://arxiv.org/abs/2504.03622v1
156 EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline Peter Baile Chen, Tomer Wolfson, Michael Cafarella, Dan Roth 2025-04-04 arXiv https://peterbaile.github.io/enrichindex/ http://arxiv.org/abs/2504.03598v1
157 AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology Xiang Feng, Wentao Jiang, Zengmao Wang, Yong Luo, Pingbo Xu, Baosheng Yu, Hua Jin, Bo Du, Jing Zhang 2025-04-03 arXiv https://github.com/MiliLab/AnesBench http://arxiv.org/abs/2504.02404v1
158 BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs Alexander Leszczynski, Sarah Gillet, Iolanda Leite, Fethiye Irmak Dogan 2025-04-03 arXiv https://github.com/1Eggbert7/BT_LLM http://arxiv.org/abs/2504.02779v1
159 Measurement of LLM's Philosophies of Human Nature Minheng Ni, Ennan Wu, Zidong Gong, Zhengyuan Yang, Linjie Li, Chung-Ching Lin, Kevin Lin, Lijuan Wang, Wangmeng Zuo 2025-04-03 arXiv https://github.com/kodenii/M-PHNS http://arxiv.org/abs/2504.02304v1
160 ZClip: Adaptive Spike Mitigation for LLM Pre-Training Abhay Kumar, Louis Owen, Nilabhra Roy Chowdhury, Fabian Güra 2025-04-03 arXiv https://github.com/bluorion-com/ZClip http://arxiv.org/abs/2504.02507v1
161 Comment Staytime Prediction with LLM-enhanced Comment Understanding Changshuo Zhang, Zihan Lin, Shukai Liu, Yongqi Liu, Han Li 2025-04-02 arXiv https://github.com/lyingCS/KuaiComt.github.io http://arxiv.org/abs/2504.01602v1
162 OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Joint LLM and GNN Modeling Heming Zhang, Tim Xu, Dekang Cao, Shunning Liang, Lars Schimmelpfennig, Levi Kaster, Di Huang, Carlos Cruchaga, Guangfu Li, Michael Province, Yixin Chen, Philip Payne, Fuhai Li 2025-04-02 arXiv https://github.com/FuhaiLiAiLab/OmniCellTOSG http://arxiv.org/abs/2504.02148v1
163 TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining Jeffrey Li, Mohammadreza Armandpour, Iman Mirzadeh, Sachin Mehta, Vaishaal Shankar, Raviteja Vemulapalli, Samy Bengio, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari, Fartash Faghri 2025-04-02 arXiv https://github.com/apple/ml-tic-lm http://arxiv.org/abs/2504.02107v1
164 MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits Brandon Radosevich, John Halloran 2025-04-02 arXiv https://github.com/leidosinc/McpSafetyScanner http://arxiv.org/abs/2504.03767v1
165 Urban Computing in the Era of Large Language Models Zhonghang Li, Lianghao Xia, Xubin Ren, Jiabin Tang, Tianyi Chen, Yong Xu, Chao Huang 2025-04-02 arXiv https://github.com/HKUDS/Awesome-LLM4Urban-Papers https://doi.org/10.48550/arXiv.2504.02009
166 CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models Wei Zhou, Yuyang Gao, Xuanhe Zhou, Guoliang Li 2025-04-01 arXiv https://github.com/weAIDB/CrackSQL https://doi.org/10.48550/arXiv.2504.00882
167 RECKON: Large-scale Reference-based Efficient Knowledge Evaluation for Large Language Model Lin Zhang, Zhouhong Gu, Xiaoran Shi, Hongwei Feng, Yanghua Xiao 2025-04-01 arXiv https://github.com/MikeGu721/reckon https://doi.org/10.48550/arXiv.2504.00756
168 ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Qianhao Yuan, Qingyu Zhang, Yanjiang Liu, Jiawei Chen, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han, Le Sun 2025-04-01 arXiv https://github.com/icip-cas/ShortV https://doi.org/10.48550/arXiv.2504.00502
169 m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models Xiaoke Huang, Juncheng Wu, Hui Liu, Xianfeng Tang, Yuyin Zhou 2025-04-01 arXiv https://github.com/UCSC-VLAA/m1 https://doi.org/10.48550/arXiv.2504.00869
170 MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs Juncheng Wu, Wenlong Deng, Xingxuan Li, Sheng Liu, Taomian Mi, Yifan Peng, Ziyang Xu, Yi Liu, Hyunjin Cho, Chang-In Choi, Yihan Cao, Hui Ren, Xiang Li, Xiaoxiao Li, Yuyin Zhou 2025-04-01 arXiv https://github.com/UCSC-VLAA/MedReason http://arxiv.org/abs/2504.00993v2
171 When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning Nishad Singhi, Hritik Bansal, Arian Hosseini, Aditya Grover, Kai-Wei Chang, Marcus Rohrbach, Anna Rohrbach 2025-04-01 arXiv https://github.com/nishadsinghi/sc-genrm-scaling http://arxiv.org/abs/2504.01005v1
172 SACA: A Scenario-Aware Collision Avoidance Framework for Autonomous Vehicles Integrating LLMs-Driven Reasoning Shiyue Zhao, Junzhi Zhang, Neda Masoud, Heye Huang, Xingpeng Xia, Chengkun He 2025-03-31 arXiv https://sean-shiyuez.github.io/SACA/ http://arxiv.org/abs/2504.00115v1
173 What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Zhihan Guo, Yufei Wang, Niklas Muennighoff, Irwin King, Xue Liu, Chen Ma 2025-03-31 arXiv https://github.com/testtimescaling/testtimescaling.github.io/ https://doi.org/10.48550/arXiv.2503.24235
174 SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang, Lin Gui, Yulan He 2025-03-31 arXiv https://github.com/xyzCS/SciReplicate-Bench http://arxiv.org/abs/2504.00255v1
175 LANID: LLM-assisted New Intent Discovery Lu Fan, Jiashu Pu, Rongsheng Zhang, Xiao-Ming Wu 2025-03-31 arXiv https://github.com/floatSDSDS/LANID http://arxiv.org/abs/2503.23740v1
176 Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models Rui Wang, Hongru Wang, Boyang Xue, Jianhui Pang, Shudong Liu, Yi Chen, Jiahao Qiu, Derek Fai Wong, Heng Ji, Kam-Fai Wong 2025-03-31 arXiv https://github.com/DevoAllen/Awesome-Reasoning-Economy-Papers https://doi.org/10.48550/arXiv.2503.24377
177 Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving Wei Gao, Xinyu Zhou, Peng Sun, Tianwei Zhang, Yonggang Wen 2025-03-31 arXiv https://github.com/LLMkvsys/rethink-kv-compression https://doi.org/10.48550/arXiv.2503.24000
178 Text Chunking for Document Classification for Urban System Management using Large Language Models Joshua Rodriguez, Om Sanan, Guillermo Vizarreta-Luna, Steven A. Conrad 2025-03-31 arXiv https://github.com/josh-rodriguez-csu/ChunkingforLLMs https://doi.org/10.48550/arXiv.2504.00274
179 A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well? Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Wenyue Hua, Haolun Wu, Zhihan Guo, Yufei Wang, Niklas Muennighoff, Irwin King, Xue Liu, Chen Ma 2025-03-31 arXiv https://github.com/testtimescaling/testtimescaling.github.io/ http://arxiv.org/abs/2503.24235v3
180 ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance Tong Xie, Jiawang Zhao, Zishen Wan, Zuodong Zhang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li 2025-03-31 arXiv https://github.com/PKU-SEC-Lab/ReaLM_DAC25/ https://doi.org/10.48550/arXiv.2503.24053
181 EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing Hongxiang Jiang, Jihao Yin, Qixiong Wang, Jiaqi Feng, Guo Chen 2025-03-30 arXiv https://github.com/XiangTodayEatsWhat/EagleVision http://arxiv.org/abs/2503.23330v1
182 Agentic Large Language Models, a survey Aske Plaat, Max J. van Duijn, Niki van Stein, Mike Preuss, Peter van der Putten, Kees Joost Batenburg 2025-03-29 arXiv https://askeplaat.github.io/agentic-llm-survey-site/ https://doi.org/10.48550/arXiv.2503.23037
183 Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han 2025-03-28 arXiv https://github.com/tmlr-group/landscape-of-thoughts https://doi.org/10.48550/arXiv.2503.22165
184 MediTools -- Medical Education Powered by LLMs Amr Alshatnawi, Remi Sampaleanu, David Liebovitz 2025-03-28 arXiv https://github.com/NM-Streamlit-Team/meditools http://arxiv.org/abs/2503.22769v1
185 A Refined Analysis of Massive Activations in LLMs Louis Owen, Nilabhra Roy Chowdhury, Abhay Kumar, Fabian Güra 2025-03-28 arXiv https://github.com/bluorion-com/refine_massive_activations http://arxiv.org/abs/2503.22329v1
186 QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? Belinda Z. Li, Been Kim, Zi Wang 2025-03-28 arXiv https://github.com/google-deepmind/questbench http://arxiv.org/abs/2503.22674v1
187 SWI: Speaking with Intent in Large Language Models Yuwei Yin, EunJeong Hwang, Giuseppe Carenini 2025-03-27 arXiv https://github.com/YuweiYin/SWI https://doi.org/10.48550/arXiv.2503.21544
188 Ignite Forecasting with SPARK: An Efficient Generative Framework for Refining LLMs in Temporal Knowledge Graph Forecasting Gongzhu Yin, Hongli Zhang, Yi Luo, Yuchen Yang, Kun Lu, Chao Meng 2025-03-27 arXiv https://github.com/yin-gz/SPARK http://arxiv.org/abs/2503.22748v1
189 Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap Tong Nie, Jian Sun, Wei Ma 2025-03-27 arXiv https://github.com/tongnie/awesome-llm4tr https://doi.org/10.48550/arXiv.2503.21411
190 Large Language Model Agent: A Survey on Methodology, Applications and Challenges Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, Rongcheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang, Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, Dacheng Tao, Philip S. Yu, Ming Zhang 2025-03-27 arXiv https://github.com/luo-junyu/Awesome-Agent-Papers https://doi.org/10.48550/arXiv.2503.21460
191 Dynamic Pyramid Network for Efficient Multimodal Large Language Model Hao Ai, Kunyi Wang, Zezhou Wang, Hao Lu, Jin Tian, Yaxin Luo, Peng Xing, Jen-Yuan Huang, Huaxia Li, Gen Luo 2025-03-26 arXiv https://github.com/aihao2000/DPN-LLaVA https://doi.org/10.48550/arXiv.2503.20322
192 Enhancing the Robustness of LLM-Generated Code: Empirical Study and Framework ZiKe Li, MingWei Liu, Anji Li, Kaifeng He, Yanlin Wang, Xin Peng, Zibin Zheng 2025-03-26 arXiv https://github.com/SYSUSELab/RobGen http://arxiv.org/abs/2503.20197v1
193 Leveraging Implicit Sentiments: Enhancing Reliability and Validity in Psychological Trait Evaluation of LLMs Huanhuan Ma, Haisong Gong, Xiaoyuan Yi, Xing Xie, Dongkuan Xu 2025-03-26 arXiv https://github.com/dependentsign/CSI http://arxiv.org/abs/2503.20182v1
194 Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang 2025-03-26 arXiv https://github.com/naver-ai/JOOD http://arxiv.org/abs/2503.20823v1
195 Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations Haitong Liu, Kuofeng Gao, Yang Bai, Jinmin Li, Jinxiao Shan, Tao Dai, Shu-Tao Xia 2025-03-26 arXiv https://github.com/ttthhl/Protecting_Your_Video_Content http://arxiv.org/abs/2503.21824v1
196 LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation Sarah Martinson, Lingkai Kong, Cheol Woo Kim, Aparna Taneja, Milind Tambe 2025-03-25 arXiv https://github.com/sarahmart/LLM-ABS-ARMMAN-prediction http://arxiv.org/abs/2503.22719v1
197 QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition Yuxuan Hu, Xiaodong Chen, Cuiping Li, Hong Chen, Jing Zhang 2025-03-25 arXiv https://github.com/hyx1999/Quad http://arxiv.org/abs/2503.19353v1
198 CoLLM: A Large Language Model for Composed Image Retrieval Chuong Huynh, Jinyu Yang, Ashish Tawari, Mubarak Shah, Son Tran, Raffay Hamid, Trishul Chilimbi, Abhinav Shrivastava 2025-03-25 arXiv https://collm-cvpr25.github.io/ https://doi.org/10.48550/arXiv.2503.19910
199 PAVE: Patching and Adapting Video Large Language Models Zhuoming Liu, Yiquan Li, Khoi Duc Nguyen, Yiwu Zhong, Yin Li 2025-03-25 arXiv https://github.com/dragonlzm/PAVE https://doi.org/10.48550/arXiv.2503.19794
200 CEFW: A Comprehensive Evaluation Framework for Watermark in Large Language Models Shuhao Zhang, Bo Cheng, Jiale Han, Yuli Chen, Zhixuan Wu, Changbao Li, Pingli Gu 2025-03-24 arXiv https://github.com/DrankXs/BalancedWatermark https://doi.org/10.48550/arXiv.2503.20802
201 I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Andrey V. Galichin, Alexey Dontsov, Polina Druzhinina, Anton Razzhigaev, Oleg Y. Rogov, Elena Tutubalina, Ivan V. Oseledets 2025-03-24 arXiv https://github.com/AIRI-Institute/SAE-Reasoning https://doi.org/10.48550/arXiv.2503.18878
202 LLaVAction: evaluating and training multi-modal large language models for action recognition Shaokai Ye, Haozhe Qi, Alexander Mathis, Mackenzie W. Mathis 2025-03-24 arXiv https://github.com/AdaptiveMotorControlLab/LLaVAction https://doi.org/10.48550/arXiv.2503.18712
203 AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration Zhexuan Wang, Yutong Wang, Xuebo Liu, Liang Ding, Miao Zhang, Jie Liu, Min Zhang 2025-03-24 arXiv https://github.com/wangzx1219/AgentDropout http://arxiv.org/abs/2503.18891v1
204 BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache Dayou Du, Shijie Cao, Jianyi Cheng, Ting Cao, Mao Yang 2025-03-24 arXiv https://github.com/DD-DuDa/BitDecoding http://arxiv.org/abs/2503.18773v1
205 Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? Aabid Karim, Abdul Karim, Bhoomika Lohana, Matt Keon, Jaswinder Singh, Abdul Sattar 2025-03-23 arXiv https://github.com/akarim23131/Lost_in_Cultural_Translation http://arxiv.org/abs/2503.18018v1
206 Reasoning with LLMs for Zero-Shot Vulnerability Detection Arastoo Zibaeirad, Marco Vieira 2025-03-22 arXiv https://github.com/Erroristotle/VulnSage http://arxiv.org/abs/2503.17885v1
207 Safe RLHF-V: Safe Reinforcement Learning from Human Feedback in Multimodal Large Language Models Jiaming Ji, Xinyu Chen, Rui Pan, Han Zhu, Conghui Zhang, Jiahao Li, Donghai Hong, Boyuan Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, Yaodong Yang 2025-03-22 arXiv https://github.com/SafeRLHF-V https://doi.org/10.48550/arXiv.2503.17682
208 RAIDER: Tool-Equipped Large Language Model Agent for Robotic Action Issue Detection, Explanation and Recovery Silvia Izquierdo-Badiola, Carlos Rizzo, Guillem Alenyà 2025-03-22 arXiv https://raider-llmagent.github.io/ https://doi.org/10.48550/arXiv.2503.17703
209 LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language Kun Chu, Xufeng Zhao, Cornelius Weber, Stefan Wermter 2025-03-21 arXiv https://github.com/Kchu/LLM-MAP https://doi.org/10.48550/arXiv.2503.17309
210 TEMPLE: Temporal Preference Learning of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment Shicheng Li, Lei Li, Kun Ouyang, Shuhuai Ren, Yuanxin Liu, Yuanxing Zhang, Fuzheng Zhang, Lingpeng Kong, Qi Liu, Xu Sun 2025-03-21 arXiv https://github.com/lscpku/TEMPLE http://arxiv.org/abs/2503.16929v2
211 Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique Yansi Li, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Qiuzhi Liu, Rui Wang, Zhuosheng Zhang, Zhaopeng Tu, Haitao Mi, Dong Yu 2025-03-21 arXiv https://github.com/puddingyeah/PANEL http://arxiv.org/abs/2503.17363v1
212 RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation Linxi Liang, Jing Gong, Mingwei Liu, Chong Wang, Guangsheng Ou, Yanlin Wang, Xin Peng, Zibin Zheng 2025-03-21 arXiv https://github.com/SYSUSELab/RustEvo http://arxiv.org/abs/2503.16922v1
213 Variance Control via Weight Rescaling in LLM Pre-training Louis Owen, Abhay Kumar, Nilabhra Roy Chowdhury, Fabian Güra 2025-03-21 arXiv https://github.com/bluorion-com/weight_rescaling http://arxiv.org/abs/2503.17500v1
214 MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion Qizhi Pei, Lijun Wu, Zhuoshi Pan, Yu Li, Honglin Lin, Chenlin Ming, Xin Gao, Conghui He, Rui Yan 2025-03-20 arXiv https://github.com/QizhiPei/mathfusion http://arxiv.org/abs/2503.16212v1
215 Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Quy-Anh Dang, Chris Ngo 2025-03-20 arXiv https://github.com/knoveleng/open-rs http://arxiv.org/abs/2503.16219v1
216 The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination Yifan Sun, Han Wang, Dongbai Li, Gang Wang, Huan Zhang 2025-03-20 arXiv https://github.com/ASTRAL-Group/BDC_mitigation_assessment http://arxiv.org/abs/2503.16402v1
217 Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models Zhihang Liu, Chen-Wei Xie, Pandeng Li, Liming Zhao, Longxiang Tang, Yun Zheng, Chuanbin Liu, Hongtao Xie 2025-03-20 arXiv https://github.com/lntzm/HICom https://doi.org/10.48550/arXiv.2503.16036
218 Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen, Xia Ben Hu 2025-03-20 arXiv https://github.com/Eclipsess/Awesome-Efficient-Reasoning-LLMs https://doi.org/10.48550/arXiv.2503.16419
219 Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning Zhaowei Liu, Xin Guo, Fangqi Lou, Lingfeng Zeng, Jinyi Niu, Zixuan Wang, Jiajie Xu, Weige Cai, Ziwei Yang, Xueqian Zhao, Chao Li, Sheng Xu, Dezhi Chen, Yun Chen, Zuo Bai, Liwen Zhang 2025-03-20 arXiv https://github.com/SUFE-AIFLM-Lab/Fin-R1 https://doi.org/10.48550/arXiv.2503.16252
220 Exploring Large Language Models for Word Games: Who is the Spy? Chentian Wei, Jiewei Chen, Jinzhu Xu 2025-03-19 arXiv https://github.com/ct-wei/Who-is-The-Spy https://doi.org/10.48550/arXiv.2503.15235
221 LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning Federico Cocchi, Nicholas Moratelli, Davide Caffagni, Sara Sarto, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara 2025-03-19 arXiv https://github.com/aimagelab/LLaVA-MORE http://arxiv.org/abs/2503.15621v1
222 VisNumBench: Evaluating Number Sense of Multimodal Large Language Models Tengjin Weng, Jingyi Wang, Wenhao Jiang, Zhong Ming 2025-03-19 arXiv https://wwwtttjjj.github.io/VisNumBench/ https://doi.org/10.48550/arXiv.2503.14939
223 Aligning Multimodal LLM with Human Preference: A Survey Tao Yu, Yi-Fan Zhang, Chaoyou Fu, Junkang Wu, Jinda Lu, Kun Wang, Xingyu Lu, Yunhang Shen, Guibin Zhang, Dingjie Song, Yibo Yan, Tianlong Xu, Qingsong Wen, Zhang Zhang, Yan Huang, Liang Wang, Tieniu Tan 2025-03-18 arXiv https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Alignment http://arxiv.org/abs/2503.14504v1
224 CodingGenie: A Proactive LLM-Powered Programming Assistant Sebastian Zhao, Alan Zhu, Hussein Mozannar, David Sontag, Ameet Talwalkar, Valerie Chen 2025-03-18 arXiv https://github.com/sebzhao/CodingGenie/ http://arxiv.org/abs/2503.14724v1
225 Learning on LLM Output Signatures for gray-box LLM Behavior Analysis Guy Bar-Shalom, Fabrizio Frasca, Derek Lim, Yoav Gelberg, Yftah Ziser, Ran El-Yaniv, Gal Chechik, Haggai Maron 2025-03-18 arXiv https://github.com/BarSGuy/LLM-Output-Signatures-Network http://arxiv.org/abs/2503.14043v1
226 Word2Minecraft: Generating 3D Game Levels through Large Language Models Shuo Huang, Muhammad Umair Nasir, Steven James, Julian Togelius 2025-03-18 arXiv https://github.com/JMZ-kk/Word2Minecraft/tree/word2mc_v0 https://doi.org/10.48550/arXiv.2503.16536
227 SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability Jiankang Wang, Zhihan Zhang, Zhihang Liu, Yang Li, Jiannan Ge, Hongtao Xie, Yongdong Zhang 2025-03-18 arXiv https://github.com/Jayce1kk/SpaceVLLM https://doi.org/10.48550/arXiv.2503.13983
228 Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning Junming Liu, Siyuan Meng, Yanting Gao, Song Mao, Pinlong Cai, Guohang Yan, Yirong Chen, Zilin Bian, Botian Shi, Ding Wang 2025-03-17 arXiv https://github.com/Wings-Of-Disaster/VaLiK http://arxiv.org/abs/2503.12972v1
229 Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos Chiara Plizzari, Alessio Tonioni, Yongqin Xian, Achin Kulshrestha, Federico Tombari 2025-03-17 arXiv https://github.com/google-research-datasets/egotempo http://arxiv.org/abs/2503.13646v1
230 xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference Maximilian Beck, Korbinian Pöppel, Phillip Lippe, Richard Kurle, Patrick M. Blies, Günter Klambauer, Sebastian Böck, Sepp Hochreiter 2025-03-17 arXiv https://github.com/NX-AI/xlstm http://arxiv.org/abs/2503.13427v1
231 NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models Sung-Yeon Park, Can Cui, Yunsheng Ma, Ahmadreza Moradipari, Rohit Gupta, Kyungtae Han, Ziran Wang 2025-03-17 arXiv https://github.com/sungyeonparkk/NuPlanQA https://doi.org/10.48550/arXiv.2503.12772
232 A Survey on the Memory Mechanism of Large Language Model based Agents Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, Ji-Rong Wen 2025-03-16 arXiv https://github.com/nuster1128/LLM_Agent_Memory_Survey https://doi.org/10.48550/arXiv.2404.13501
233 SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, Mi Zhang 2025-03-16 arXiv https://github.com/AIoT-MLSys-Lab/SVD-LLM https://doi.org/10.48550/arXiv.2503.12340
234 HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs Tsz Chung Cheng, Chung Shing Cheng, Chaak Ming Lau, Eugene Tin-Ho Lam, Chun Yat Wong, Hoi On Yu, Cheuk Hei Chong 2025-03-16 arXiv https://github.com/hon9kon9ize/hkeval2025 http://arxiv.org/abs/2503.12440v1
235 Plausibility Vaccine: Injecting LLM Knowledge for Event Plausibility Jacob Chmura, Jonah Dauvet, Sebastian Sabry 2025-03-16 arXiv https://github.com/Jacob-Chmura/plausibility-vaccine http://arxiv.org/abs/2503.12667v1
236 FAILS: A Framework for Automated Collection and Analysis of LLM Service Incidents Sándor Battaglini-Fischer, Nishanthi Srinivasan, Bálint László Szarvas, Xiaoyu Chu, Alexandru Iosup 2025-03-15 HotCloudPerf 2025 https://github.com/atlarge-research/FAILS http://arxiv.org/abs/2503.12185v1
237 MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling Zhaopeng Feng, Jiahan Ren, Jiayuan Su, Jiamei Zheng, Zhihang Tang, Hongwei Wang, Zuozhu Liu 2025-03-15 arXiv https://sabijun.github.io/MT_RewardTreePage http://arxiv.org/abs/2503.12123v1
238 An LLM-Integrated Framework for Completion, Management, and Tracing of STPA Ali Raeisdanaei, Juho Kim, Michael Liao, Sparsh Kochhar 2025-03-15 arXiv https://github.com/blueskysolarracing/stpa http://arxiv.org/abs/2503.12043v1
239 A Survey on Federated Fine-tuning of Large Language Models Yebo Wu, Chunlin Tian, Jingguang Li, He Sun, Kahou Tam, Li Li, Chengzhong Xu 2025-03-15 arXiv https://github.com/Clin0212/Awesome-Federated-LLM-Learning https://doi.org/10.48550/arXiv.2503.12016
240 CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning Hao Cui, Zahra Shamsi, Gowoon Cheon, Xuejian Ma, Shutong Li, Maria Tikhanovskaya, Peter Norgaard, Nayantara Mudur, Martyna Plomecka, Paul Raccuglia, Yasaman Bahri, Victor V. Albert, Pranesh Srinivasan, Haining Pan, Philippe Faist, Brian Rohr, Ekin Dogus Cubuk, Muratahan Aykol, Amil Merchant, Michael J. Statt, Dan Morris, Drew Purves, Elise Kleeman, Ruth Alcantara, Matthew Abraham, Muqthar Mohammad, Ean Phing VanLee, Chenfei Jiang, Elizabeth Dorfman, Eun-Ah Kim, Michael P Brenner, Viren Jain, Sameera Ponda, Subhashini Venugopalan 2025-03-14 arXiv https://github.com/google/curie http://arxiv.org/abs/2503.13517v2
241 FastVID: Dynamic Density Pruning for Fast Video Large Language Models Leqi Shen, Guoqiang Gong, Tao He, Yifeng Zhang, Pengzhang Liu, Sicheng Zhao, Guiguang Ding 2025-03-14 arXiv https://github.com/LunarShen/FastVID https://doi.org/10.48550/arXiv.2503.11187
242 Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space Weichen Zhan, Zile Zhou, Zhiheng Zheng, Chen Gao, Jinqiang Cui, Yong Li, Xinlei Chen, Xiao-Ping Zhang 2025-03-14 arXiv https://github.com/WeichenZh/Open3DVQA https://doi.org/10.48550/arXiv.2503.11094
243 ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning Xinyi Wang, Jiashui Wang, Peng Chen, Jinbo Su, Yanming Liu, Long Liu, Yangdong Wang, Qiyuan Chen, Kai Yun, Chunfu Jia 2025-03-14 arXiv https://github.com/wxy3596/ASMA-Tune http://arxiv.org/abs/2503.11617v1
244 Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs using Semantic Space Zhiliang Chen, Xinyuan Niu, Chuan-Sheng Foo, Bryan Kian Hsiang Low 2025-03-14 arXiv https://github.com/chenzhiliang94/convo-plan-SCOPE http://arxiv.org/abs/2503.11586v1
245 MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens Jeong Hun Yeo, Hyeongseop Rha, Se Jin Park, Yong Man Ro 2025-03-14 arXiv https://github.com/JeongHun0716/MMS-LLaMA http://arxiv.org/abs/2503.11315v1
246 TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models Xudong Tan, Peng Ye, Chongjun Tu, Jianjian Cao, Yaoxin Yang, Lin Zhang, Dongzhan Zhou, Tao Chen 2025-03-13 arXiv https://github.com/ShawnTan86/TokenCarve https://doi.org/10.48550/arXiv.2503.10501
247 ZeroMerge: Parameter-Free KV Cache Compression for Memory-Efficient Long-Context LLMs Xin Liu, Pei Liu, Guoming Tang 2025-03-13 arXiv https://github.com/SusCom-Lab/ZeroMerge http://arxiv.org/abs/2503.10714v1
248 RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs Zhongzhan Huang, Guoming Ling, Vincent S. Liang, Yupei Lin, Yandong Chen, Shanshan Zhong, Hefeng Wu, Liang Lin 2025-03-13 arXiv https://github.com/MilkThink-Lab/RouterEval http://arxiv.org/abs/2503.10657v1
249 Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set Florian Eichin, Yang Janet Liu, Barbara Plank, Michael A. Hedderich 2025-03-13 arXiv https://github.com/mainlp/discourse_probes http://arxiv.org/abs/2503.10515v1
250 ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs Xin Liu, Pei Liu, Guoming Tang 2025-03-13 arXiv https://github.com/SusCom-Lab/ZSMerge http://arxiv.org/abs/2503.10714v2
251 Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs Ariba Khan, Stephen Casper, Dylan Hadfield-Menell 2025-03-13 arXiv https://github.com/ariba-k/llm-cultural-alignment-evaluation http://arxiv.org/abs/2503.08688v1
252 Route Sparse Autoencoder to Interpret Large Language Models Wei Shi, Sihang Li, Tao Liang, Mingyang Wan, Guojun Ma, Xiang Wang, Xiangnan He 2025-03-13 arXiv https://github.com/swei2001/RouteSAEs https://doi.org/10.48550/arXiv.2503.08200
253 OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model Bowen Zhang, Pengcheng Luo 2025-03-13 arXiv https://github.com/bwz96sco/or_llm_agent https://doi.org/10.48550/arXiv.2503.10009
254 Learning to Inference Adaptively for Multimodal Large Language Models Zhuoyan Xu, Khoi Duc Nguyen, Preeti Mukherjee, Saurabh Bagchi, Somali Chaterji, Yingyu Liang, Yin Li 2025-03-13 arXiv https://zhuoyan-xu.github.io/ada-llava/ https://doi.org/10.48550/arXiv.2503.10905
255 Adapting Large Language Models for Parameter-Efficient Log Anomaly Detection Ying Fu Lim, Jiawen Zhu, Guansong Pang 2025-03-13 arXiv https://github.com/mala-lab/LogADReft https://doi.org/10.48550/arXiv.2503.08045
256 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models Wanhua Li, Renping Zhou, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister 2025-03-13 arXiv https://4d-langsplat.github.io https://doi.org/10.48550/arXiv.2503.10437
257 Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs Jiani Huang, Shijie Wang, Liang-bo Ning, Wenqi Fan, Shuaiqiang Wang, Dawei Yin, Qing Li 2025-03-12 arXiv https://github.com/jiani-huang/RecBench http://arxiv.org/abs/2503.09382v1
258 RetSTA: An LLM-Based Approach for Standardizing Clinical Fundus Image Reports Jiushen Cai, Weihang Zhang, Hanruo Liu, Ningli Wang, Huiqi Li 2025-03-12 arXiv https://github.com/AB-Story/RetSTA-7B http://arxiv.org/abs/2503.09358v1
259 What's In Your Field? Mapping Scientific Research with Knowledge Graphs and Large Language Models Abhipsha Das, Nicholas Lourie, Siavash Golkar, Mariel Pettee 2025-03-12 arXiv https://github.com/chiral-carbon/kg-for-science http://arxiv.org/abs/2503.09894v1
260 Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents Dongjun Lee, Juyong Lee, Kyuyoung Kim, Jihoon Tack, Jinwoo Shin, Yee Whye Teh, Kimin Lee 2025-03-12 arXiv https://lcowiclr2025.github.io http://arxiv.org/abs/2503.10689v1
261 CyberLLMInstruct: A New Dataset for Analysing Safety of Fine-Tuned LLMs Using Cyber Security Data Adel ElZemity, Budi Arief, Shujun Li 2025-03-12 arXiv https://github.com/Adelsamir01/CyberLLMInstruct http://arxiv.org/abs/2503.09334v1
262 CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection Richard A. Dubniczky, Krisztofer Zoltán Horvát, Tamás Bisztray, Mohamed Amine Ferrag, Lucas C. Cordeiro, Norbert Tihanyi 2025-03-12 arXiv https://github.com/CASTLE-Benchmark http://arxiv.org/abs/2503.09433v1
263 Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wangxiang Che 2025-03-12 arXiv https://long-cot.github.io/ https://doi.org/10.48550/arXiv.2503.09567
264 MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization Zongwu Wang, Peng Xu, Fangxin Liu, Yiwei Hu, Qingxiao Sun, Gezi Li, Cheng Li, Xuan Wang, Li Jiang, Haibing Guan 2025-03-12 arXiv https://github.com/ZongwuWang/MILLION http://arxiv.org/abs/2504.03661v1
265 BYOS: Knowledge-driven Large Language Models Bring Your Own Operating System More Excellent Hongyu Lin, Yuchen Li, Haoran Luo, Kaichun Yao, Libo Zhang, Mingjie Xing, Yanjun Wu 2025-03-12 arXiv https://github.com/LHY-24/BYOS https://doi.org/10.48550/arXiv.2503.09663
266 Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, Jiawei Han 2025-03-12 arXiv https://github.com/PeterGriffinJin/Search-R1 http://arxiv.org/abs/2503.09516v1
267 NVP-HRI: Zero shot natural voice and posture-based human-robot interaction via large language model Yuzhi Lai, Shenghai Yuan, Youssef Nassar, Mingyu Fan, Thomas Weber, Matthias Rätsch 2025-03-12 Expert Syst. Appl. https://github.com/laiyuzhi/NVP-HRI https://doi.org/10.1016/j.eswa.2024.126360
268 Process-Supervised LLM Recommenders via Flow-guided Tuning Chongming Gao, Mengyao Gao, Chenxiao Fan, Shuai Yuan, Wentao Shi, Xiangnan He 2025-03-11 arXiv …, 2025 https://github.com/Mr-Peach0301/Flower http://arxiv.org/abs/2503.07377v1
269 Enhancing Large Language Models for Hardware Verification: A Novel SystemVerilog Assertion Dataset Anand Menon, Samit S. Miftah, Shamik Kundu, Souvik Kundu, Amisha Srivastava, Arnab Raha, Gabriel Theodor Sonnenschein, Suvadeep Banerjee, Deepak Mathaikutty, Kanad Basu 2025-03-11 arXiv https://github.com/AnandMenon12/VERT https://doi.org/10.48550/arXiv.2503.08923
270 V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation Guiwei Zhang, Tianyu Zhang, Mohan Zhou, Yalong Bai, Biye Li 2025-03-11 arXiv https://github.com/zhangguiwei610/V2Flow https://doi.org/10.48550/arXiv.2503.07493
271 DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang, Ilya Zharkov, Se-Young Yun 2025-03-11 arXiv …, 2025 https://github.com/jongwooko/distillm-2 http://arxiv.org/abs/2503.07067v1
272 ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration Mengting Ai, Tianxin Wei, Yifan Chen, Zhichen Zeng, Ritchie Zhao, Girish Varatkar, Bita Darvish Rouhani, Xianfeng Tang, Hanghang Tong, Jingrui He 2025-03-11 arXiv …, 2025 https://github.com/iDEA-iSAIL-Lab-UIUC/ResMoE http://arxiv.org/abs/2503.06881v1
273 Graphormer-Guided Task Planning: Beyond Static Rules with LLM Safety Perception Wanjing Huang, Tongjie Pan, Yalan Ye 2025-03-11 arXiv https://github.com/hwj20/GGTP http://arxiv.org/abs/2503.06866v1
274 Roamify: Designing and Evaluating an LLM Based Google Chrome Extension for Personalised Itinerary Planning Vikranth Udandarao, Noel Abraham Tiju, Muthuraj Vairamuthu, Harsh Mistry, Dhruv Kumar 2025-03-10 arXiv https://github.com/Roamify-Research/Roamify http://arxiv.org/abs/2504.10489v1
275 AutoMisty: A Multi-Agent LLM Framework for Automated Code Generation in the Misty Social Robot Xiao Wang, Lu Dong, Sahana Rangasrinivasan, Ifeoma Nwogu, Srirangaraj Setlur, Venugopal Govindaraju 2025-03-09 arXiv https://wangxiaoshawn.github.io/AutoMisty.html http://arxiv.org/abs/2503.06791v1
276 How LLMs Learn: Tracing Internal Representations with Sparse Autoencoders Tatsuro Inaba, Kentaro Inui, Yusuke Miyao, Yohei Oseki, Benjamin Heinzerling, Yu Takagi 2025-03-09 arXiv https://github.com/llm-jp/llm-jp-sae http://arxiv.org/abs/2503.06394v1
277 DSGBench: A Diverse Strategic Game Benchmark for Evaluating LLM-based Agents in Complex Decision-Making Environments Wenjie Tang, Yuan Zhou, Erqiang Xu, Keyan Cheng, Minne Li, Liquan Xiao 2025-03-08 arXiv https://github.com/DeciBrain-Group/DSGBench http://arxiv.org/abs/2503.06047v1
278 Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices Junyan Lin, Haoran Chen, Yue Fan, Yingqi Fan, Xin Jin, Hui Su, Jinlan Fu, Xiaoyu Shen 2025-03-08 arXiv https://github.com/EIT-NLP/Layer_Select_Fuse_for_MLLM http://arxiv.org/abs/2503.06063v1
279 SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant? Xudong Lu, Haohao Gao, Renshou Wu, Shuai Ren, Xiaoxin Chen, Hongsheng Li, Fangyuan Li 2025-03-08 arXiv https://github.com/Lucky-Lance/SmartBench http://arxiv.org/abs/2503.06029v1
280 Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching Simon A. Aytes, Jinheon Baek, Sung Ju Hwang 2025-03-07 arXiv https://www.github.com/SimonAytes/SoT http://arxiv.org/abs/2503.05179v1
281 RocketEval: Efficient Automated LLM Evaluation via Grading Checklist Tianjun Wei, Wei Wen, Ruizhi Qiao, Xing Sun, Jianghong Ma 2025-03-07 arXiv https://github.com/Joinn99/RocketEval-ICLR http://arxiv.org/abs/2503.05142v1
282 A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval Yu Zhang, Shutong Qiao, Jiaqi Zhang, Tzu-Heng Lin, Chen Gao, Yong Li 2025-03-07 arXiv https://github.com/tsinghua-fib-lab/LLM-Agent-for-Recommendation-and-Search https://doi.org/10.48550/arXiv.2503.05659
283 Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching Bowen Pang, Kai Li, Feifan Wang 2025-03-07 arXiv https://github.com/KevinLee1110/dynamic-batching http://arxiv.org/abs/2503.05248v1
284 TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge Cheng-Han Chiang, Hung-yi Lee, Michal Lukasik 2025-03-06 arXiv https://github.com/d223302/TRACT http://arxiv.org/abs/2503.04381v1
285 Insights from Rights and Wrongs: A Large Language Model for Solving Assertion Failures in RTL Design Jie Zhou, Youshu Ji, Ning Wang, Yuchen Hu, Xinyao Jiao, Bingkun Yao, Xinwei Fang, Shuai Zhao, Nan Guan, Zhe Jiang 2025-03-06 arXiv https://github.com/SEU-ACAL/reproduce-AssertSolver-DAC-25 https://doi.org/10.48550/arXiv.2503.04057
286 Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model Wenke Huang, Jian Liang, Xianda Guo, Yiyang Fang, Guancheng Wan, Xuankun Rong, Chi Wen, Zekun Shi, Qingyun Li, Didi Zhu, Yanbiao Ma, Ke Liang, Bin Yang, He Li, Jiawei Shao, Mang Ye, Bo Du 2025-03-06 arXiv https://github.com/WenkeHuang/Awesome-MLLM-Tuning https://doi.org/10.48550/arXiv.2503.04543
287 Predictable Scale: Part I - Optimal Hyperparameter Scaling Law in Large Language Model Pretraining Houyi Li, Wenzheng Zheng, Jingcheng Hu, Qiufeng Wang, Hanshan Zhang, Zili Wang, Shijie Xuyang, Yuantao Fan, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang 2025-03-06 arXiv https://step-law.github.io/ https://doi.org/10.48550/arXiv.2503.04715
288 Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation Armel Zebaze, Benoît Sagot, Rachel Bawden 2025-03-06 arXiv https://github.com/ArmelRandy/compositional-translation http://arxiv.org/abs/2503.04554v1
289 DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation Amin Karimi, Charalambos Poullis 2025-03-06 arXiv https://github.com/aminpdik/DSV-LFS http://arxiv.org/abs/2503.04006v1
290 Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English Runtao Zhou, Guangya Wan, Saadia Gabriel, Sheng Li, Alexander J Gates, Maarten Sap, Thomas Hartvigsen 2025-03-06 arXiv https://github.com/Runtaozhou/dialect_bias_eval http://arxiv.org/abs/2503.04099v1
291 Lost in Literalism: How Supervised Training Shapes Translationese in LLMs Yafu Li, Ronghao Zhang, Zhilin Wang, Huajian Zhang, Leyang Cui, Yongjing Yin, Tong Xiao, Yue Zhang 2025-03-06 arXiv https://github.com/yafuly/LLM_Translationese http://arxiv.org/abs/2503.04369v1
292 AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks Javier Yong, Haokai Ma, Yunshan Ma, Anis Yusof, Zhenkai Liang, Ee-Chien Chang 2025-03-05 arXiv https://github.com/Javiery3889/AttackSeqBench https://doi.org/10.48550/arXiv.2503.03170
293 LeRAAT: LLM-Enabled Real-Time Aviation Advisory Tool Marc R. Schlichting, Vale Rasmussen, Heba Alazzeh, Houjun Liu, Kiana Jafari, Amelia F. Hardy, Dylan M. Asmar, Mykel J. Kochenderfer 2025-03-05 arXiv https://github.com/sisl/LeRAAT/ http://arxiv.org/abs/2503.16477v1
294 Multi-Agent Systems Powered by Large Language Models: Applications in Swarm Intelligence Cristian Jimenez-Romero, Alper Yegenoglu, Christian Blum 2025-03-05 arXiv https://github.com/crjimene/swarm_gpt https://doi.org/10.48550/arXiv.2503.03800
295 Improving LLM Safety Alignment with Dual-Objective Optimization Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song 2025-03-05 arXiv https://github.com/wicai24/DOOR-Alignment http://arxiv.org/abs/2503.03710v1
296 LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models Xi Zhu, Haochen Xue, Ziwei Zhao, Wujiang Xu, Jingyuan Huang, Minghao Guo, Qifan Wang, Kaixiong Zhou, Yongfeng Zhang 2025-03-05 arXiv https://github.com/agiresearch/PromptGFM http://arxiv.org/abs/2503.03313v1
297 ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks Heng Zhou, Hejia Geng, Xiangyuan Xue, Zhenfei Yin, Lei Bai 2025-03-04 arXiv https://github.com/hengzzzhou/ReSo http://arxiv.org/abs/2503.02390v2
298 Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs Yuzhe Gu, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen 2025-03-04 arXiv https://github.com/open-compass/ANAH http://arxiv.org/abs/2503.02846v1
299 Wikipedia in the Era of LLMs: Evolution and Risks Siming Huang, Yuliang Xu, Mingmeng Geng, Yao Wan, Dongping Chen 2025-03-04 arXiv https://github.com/HSM316/LLM_Wikipedia http://arxiv.org/abs/2503.02879v1
300 Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization Yilun Qiu, Xiaoyan Zhao, Yang Zhang, Yimeng Bai, Wenjie Wang, Hong Cheng, Fuli Feng, Tat-Seng Chua 2025-03-04 arXiv https://github.com/SnowCharmQ/DPL http://arxiv.org/abs/2503.02450v1
301 Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs' Decoding Layers Zicong He, Boxuan Zhang, Lu Cheng 2025-03-04 arXiv https://github.com/ZicongHe2002/HCL-Spark http://arxiv.org/abs/2503.02851v1
302 It Helps to Take a Second Opinion: Teaching Smaller LLMs to Deliberate Mutually via Selective Rationale Optimisation Sohan Patnaik, Milan Aggarwal, Sumit Bhatia, Balaji Krishnamurthy 2025-03-04 arXiv https://github.com/Sohanpatnaik106/coalition http://arxiv.org/abs/2503.02463v1
303 PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models Xueliang Zhao, Wei Wu, Jian Guan, Lingpeng Kong 2025-03-04 arXiv https://github.com/zhaoxlpku/PromptCoT https://doi.org/10.48550/arXiv.2503.02324
304 LoRA-Null: Low-Rank Adaptation via Null Space for Large Language Models Pengwei Tang, Yong Liu, Dongjie Zhang, Xing Wu, Debing Zhang 2025-03-04 arXiv https://github.com/HungerPWAY/LoRA-Null https://doi.org/10.48550/arXiv.2503.02659
305 Haste Makes Waste: Evaluating Planning Abilities of LLMs for Efficient and Feasible Multitasking with Time Constraints Between Actions Zirui Wu, Xiao Liu, Jiayi Li, Lingpeng Kong, Yansong Feng 2025-03-04 arXiv https://github.com/WilliamZR/Recipe2Plan http://arxiv.org/abs/2503.02238v1
306 Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs Wei-Yao Wang, Zhao Wang, Helen Suzuki, Yoshiyuki Kobayashi 2025-03-04 arXiv https://github.com/sony/aki http://arxiv.org/abs/2503.02597v1
307 CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom Yisen Li, Lingfeng Yang, Wenxuan Shen, Pan Zhou, Yao Wan, Weiwei Lin, Dongping Chen 2025-03-03 arXiv https://github.com/listentm/crowdselect http://arxiv.org/abs/2503.01836v1
308 Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia Chenxi Wang, Tianle Gu, Zhongyu Wei, Lang Gao, Zirui Song, Xiuying Chen 2025-03-03 arXiv https://github.com/Aurora-cx/TypoLLM http://arxiv.org/abs/2503.01714v1
309 Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo, Wei Xue 2025-03-03 arXiv https://github.com/SparkAudio/Spark-TTS http://arxiv.org/abs/2503.01710v1
310 Liger: Linearizing Large Language Models to Gated Recurrent Structures Disen Lan, Weigao Sun, Jiaxi Hu, Jusen Du, Yu Cheng 2025-03-03 arXiv https://github.com/OpenSparseLLMs/Linearization https://doi.org/10.48550/arXiv.2503.01496
311 MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Kunlun Zhu, Hongyi Du, Zhaochen Hong, Xiaocheng Yang, Shuyi Guo, Zhe Wang, Zhenhailong Wang, Cheng Qian, Xiangru Tang, Heng Ji, Jiaxuan You 2025-03-03 arXiv https://github.com/MultiagentBench/MARBLE http://arxiv.org/abs/2503.01935v1
312 Position: Don't use the CLT in LLM evals with fewer than a few hundred datapoints Sam Bowyer, Laurence Aitchison, Desi R. Ivanova 2025-03-03 arXiv https://github.com/sambowyer/bayes_evals http://arxiv.org/abs/2503.01747v2
313 Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models Tianjie Ju, Yi Hua, Hao Fei, Zhenyu Shao, Yubin Zheng, Haodong Zhao, Mong-Li Lee, Wynne Hsu, Zhuosheng Zhang, Gongshen Liu 2025-03-03 arXiv https://github.com/illusionhi/ProbingPrivacy https://doi.org/10.48550/arXiv.2503.01208
314 Parameter-Efficient Fine-Tuning of Large Language Models via Deconvolution in Subspace Jia-Chen Zhang, Yu-Jie Xiong, Chun-Ming Xia, Dong-Hai Zhu, Xi-He Qiu 2025-03-03 COLING https://github.com/Godz-z/DCFT https://aclanthology.org/2025.coling-main.265/
315 OptMetaOpenFOAM: Large Language Model Driven Chain of Thought for Sensitivity Analysis and Parameter Optimization based on CFD Yuxuan Chen, Long Zhang, Xu Zhu, Hua Zhou, Zhuyin Ren 2025-03-03 arXiv https://github.com/Terry-cyx/MetaOpenFOAM https://doi.org/10.48550/arXiv.2503.01273
316 Unmasking Implicit Bias: Evaluating Persona-Prompted LLM Responses in Power-Disparate Social Scenarios Bryan Chen Zhengyu Tan, Roy Ka-Wei Lee 2025-03-03 arXiv https://inc0mple.github.io/Implicit_Bias_Interactive_Data_Viz http://arxiv.org/abs/2503.01532v1
317 MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages Chen Zhang, Mingxu Tao, Zhiyuan Liao, Yansong Feng 2025-03-03 arXiv https://github.com/luciusssss/MiLiC-Eval http://arxiv.org/abs/2503.01150v1
318 Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity Yupu Hao, Pengfei Cao, Zhuoran Jin, Huanxuan Liao, Yubo Chen, Kang Liu, Jun Zhao 2025-03-02 arXiv https://github.com/hypasd-art/ETAPP http://arxiv.org/abs/2503.00771v1
319 HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning Zhuohang Jiang, Pangjing Wu, Ziran Liang, Peter Q. Chen, Xu Yuan, Ye Jia, Jiancheng Tu, Chen Li, Peter H. F. Ng, Qing Li 2025-03-02 arXiv https://github.com/jzzzzh/HiBench http://arxiv.org/abs/2503.00912v1
320 LLMDR: LLM-Driven Deadlock Detection and Resolution in Multi-Agent Pathfinding Seungbae Seo, Junghwan Kim, Minjeong Shin, Bongwon Suh 2025-03-02 arXiv https://github.com/ssbacc/llmdr-dhc http://arxiv.org/abs/2503.00717v1
321 Interact, Instruct to Improve: A LLM-Driven Parallel Actor-Reasoner Framework for Enhancing Autonomous Vehicle Interactions Shiyu Fang, Jiaqi Liu, Chengkai Xu, Chen Lv, Peng Hang, Jian Sun 2025-03-01 arXiv https://github.com/FanGShiYuu/Actor-Reasoner http://arxiv.org/abs/2503.00502v1
322 U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack Yunfan Gao, Yun Xiong, Wenlong Wu, Zijing Huang, Bohan Li, Haofen Wang 2025-03-01 arXiv https://github.com/Tongji-KGLLM/U-NIAH http://arxiv.org/abs/2503.00353v1
323 LLM Post-Training: A Deep Dive into Reasoning Large Language Models Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H. S. Torr, Salman H. Khan, Fahad Shahbaz Khan 2025-02-28 arXiv https://github.com/mbzuai-oryx/Awesome-LLM-Post-training https://doi.org/10.48550/arXiv.2502.21321
324 Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs Fakhraddin Alwajih, Abdellah El Mekki, Samar Mohamed Magdy, Abdelrahim A. Elmadany, Omer Nacar, El Moatez Billah Nagoudi, Reem Abdel-Salam, Hanin Atwany, Youssef Nafea, Abdulfattah Mohammed Yahya, Rahaf Alhamouri, Hamzah A. Alsayadi, Hiba Zayed, Sara Shatnawi, Serry Sibaee, Yasir Ech-Chammakhy, Walid Al-Dhabyani, Marwa Mohamed Ali, Imen Jarraya, Ahmed Oumar El-Shangiti, Aisha Alraeesi, Mohammed Anwar Al-Ghrawi, Abdulrahman S. Al-Batati, Elgizouli Mohamed, Noha Taha Elgindi, Muhammed Saeed, Houdaifa Atou, Issam Ait Yahia, Abdelhak Bouayad, Mohammed Machrouh, Amal Makouar, Dania Alkawi, Mukhtar Mohamed, Safaa Taher Abdelfadil, Amine Ziad Ounnoughene, Rouabhia Anfel, Rwaa Assi, Ahmed Sorkatti, Mohamedou Cheikh Tourad, Anis Koubaa, Ismail Berrada, Mustafa Jarrar, Shady Shehata, Muhammad Abdul-Mageed 2025-02-28 arXiv https://github.com/UBC-NLP/palm http://arxiv.org/abs/2503.00151v1
325 UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation Thanet Markchom, Tong Wu, Liting Huang, Huizhi Liang 2025-02-28 arXiv https://github.com/tongwu17/SemEval-2025-Task1-UoR-NCL http://arxiv.org/abs/2502.20984v2
326 InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation Chong Zhang, Yukun Ma, Qian Chen, Wen Wang, Shengkui Zhao, Zexu Pan, Hao Wang, Chongjia Ni, Trung Hieu Nguyen, Kun Zhou, Yidi Jiang, Chaohong Tan, Zhifu Gao, Zhihao Du, Bin Ma 2025-02-28 arXiv https://github.com/FunAudioLLM/InspireMusic https://doi.org/10.48550/arXiv.2503.00084
327 DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning Pengcheng Jiang, Jiacheng Lin, Lang Cao, Runchu Tian, SeongKu Kang, Zifeng Wang, Jimeng Sun, Jiawei Han 2025-02-28 arXiv https://github.com/pat-jj/DeepRetrieval https://doi.org/10.48550/arXiv.2503.00223
328 Self-Training Elicits Concise Reasoning in Large Language Models Tergel Munkhbat, Namgyu Ho, Seo Hyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun 2025-02-27 arXiv https://github.com/TergelMunkhbat/concise-reasoning https://doi.org/10.48550/arXiv.2502.20122
329 Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis Jeffrey Yang Fan Chiang, Seungjae Lee, Jia-Bin Huang, Furong Huang, Yizheng Chen 2025-02-27 arXiv http://vulnerable-ai-agents.github.io http://arxiv.org/abs/2502.20383v1
330 SkipPipe: Partial and Reordered Pipelining Framework for Training LLMs in Heterogeneous Networks Nikolay Blagoev, Lydia Yiyu Chen, Oğuzhan Ersoy 2025-02-27 arXiv https://github.com/gensyn-ai/skippipe http://arxiv.org/abs/2502.19913v1
331 LongRoPE2: Near-Lossless LLM Context Window Scaling Ning Shang, Li Lyna Zhang, Siyuan Wang, Gaokai Zhang, Gilsinia Lopez, Fan Yang, Weizhu Chen, Mao Yang 2025-02-27 arXiv https://github.com/microsoft/LongRoPE http://arxiv.org/abs/2502.20082v1
332 ECCOS: Efficient Capability and Cost Coordinated Scheduling for Multi-LLM Serving Kai Mei, Wujiang Xu, Shuhang Lin, Yongfeng Zhang 2025-02-27 arXiv https://github.com/agiresearch/ECCOS http://arxiv.org/abs/2502.20576v2
333 Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents Qiusi Zhan, Richard Fang, Henil Shalin Panchal, Daniel Kang 2025-02-27 arXiv https://github.com/uiuc-kang-lab/AdaptiveAttackAgent http://arxiv.org/abs/2503.00061v2
334 A Thousand Words or An Image: Studying the Influence of Persona Modality in Multimodal LLMs Julius Broomfield, Kartik Sharma, Srijan Kumar 2025-02-27 arXiv https://github.com/claws-lab/persona-modality http://arxiv.org/abs/2502.20504v1
335 SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model Xinghao Wang, Feng Liu, Rui Su, Zhihui Wang, Lei Bai, Wanli Ouyang 2025-02-27 arXiv https://github.com/StarMoonWang/SeisMoLLM https://doi.org/10.48550/arXiv.2502.19960
336 Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models Huazheng Wang, Yongcheng Jing, Haifeng Sun, Yingjie Wang, Jingyu Wang, Jianxin Liao, Dacheng Tao 2025-02-27 arXiv https://github.com/MaybeLizzy/UGBench https://doi.org/10.48550/arXiv.2502.19982
337 Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents Haochen Sun, Shuwen Zhang, Lei Ren, Hao Xu, Hao Fu, Caixia Yuan, Xiaojie Wang 2025-02-27 arXiv https://github.com/YusaeMeow/Collab-Overcooked https://doi.org/10.48550/arXiv.2502.20073
338 Beneath the Surface: How Large Language Models Reflect Hidden Bias Jinhao Pan, Chahat Raj, Ziyu Yao, Ziwei Zhu 2025-02-27 arXiv https://github.com/JP-25/Hidden-Bias-Benchmark https://doi.org/10.48550/arXiv.2502.19749
339 Foot-In-The-Door: A Multi-turn Jailbreak for LLMs Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xiangyu Zhang 2025-02-27 arXiv https://github.com/Jinxiaolong1129/Foot-in-the-door-Jailbreak http://arxiv.org/abs/2502.19820v2
340 Smart Routing: Cost-Effective Multi-LLM Serving for Multi-Core AIOS Kai Mei, Wujiang Xu, Shuhang Lin, Yongfeng Zhang 2025-02-27 arXiv https://github.com/agiresearch/ECCOS http://arxiv.org/abs/2502.20576v4
341 Protecting multimodal large language models against misleading visualizations Jonathan Tonglet, Tinne Tuytelaars, Marie-Francine Moens, Iryna Gurevych 2025-02-27 arXiv https://github.com/UKPLab/arxiv2025-misleading-visualizations https://doi.org/10.48550/arXiv.2502.20503
342 AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms Yuwei Yan, Yu Shang, Qingbin Zeng, Yu Li, Keyu Zhao, Zhiheng Zheng, Xuefei Ning, Tianji Wu, Shengen Yan, Yu Wang, Fengli Xu, Yong Li 2025-02-26 arXiv https://tsinghua-fib-lab.github.io/AgentSocietyChallenge http://arxiv.org/abs/2502.18754v1
343 TrajLLM: A Modular LLM-Enhanced Agent-Based Framework for Realistic Human Trajectory Simulation Chenlu Ju, Jiaxin Liu, Shobhit Sinha, Hao Xue, Flora Salim 2025-02-26 arXiv https://github.com/cju0/TrajLLM http://arxiv.org/abs/2502.18712v1
344 Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs Yiheng Yang, Yujie Wang, Chi Ma, Lei Yu, Emmanuele Chersoni, Chu-Ren Huang 2025-02-26 arXiv https://github.com/Oldify/CLADA http://arxiv.org/abs/2502.19078v1
345 Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs Dayu Yang, Tianyang Liu, Daoan Zhang, Antoine Simoulin, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Xin Qian, Grey Yang, Jiebo Luo, Julian McAuley 2025-02-26 arXiv https://github.com/dayuyang1999/Awesome-Code-Reasoning http://arxiv.org/abs/2502.19411v1
346 Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang 2025-02-26 arXiv https://zowiezhang.github.io/projects/Amulet http://arxiv.org/abs/2502.19148v1
347 Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation Yuxiang Wang, Xinnan Dai, Wenqi Fan, Yao Ma 2025-02-26 arXiv https://github.com/myflashbarry/LLM-benchmarking http://arxiv.org/abs/2502.18771v1
348 OntologyRAG: Better and Faster Biomedical Code Mapping with Retrieval-Augmented Generation (RAG) Leveraging Ontology Knowledge Graphs and Large Language Models Hui Feng, Yuntzu Yin, Emiliano Reynares, Jay Nanavati 2025-02-26 arXiv https://github.com/iqvianlp/ontologyRAG https://doi.org/10.48550/arXiv.2502.18992
349 A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Angelica I Aviles-Rivero, Chuanlong Xie, Yao Zhu 2025-02-26 arXiv https://github.com/920927/SLM-a-sliding-layer-merging-method http://arxiv.org/abs/2502.19159v3
350 JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models Shuyi Liu, Simiao Cui, Haoran Bu, Yuming Shang, Xi Zhang 2025-02-26 arXiv https://github.com/STAIR-BUPT/JailBench https://doi.org/10.48550/arXiv.2502.18935
351 ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models Danae Sánchez Villegas, Ingo Ziegler, Desmond Elliott 2025-02-26 arXiv https://github.com/danaesavi/ImageChain https://doi.org/10.48550/arXiv.2502.19409
352 Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models Shuliang Liu, Xinze Li, Zhenghao Liu, Yukun Yan, Cheng Yang, Zheni Zeng, Zhiyuan Liu, Maosong Sun, Ge Yu 2025-02-26 arXiv https://github.com/OpenBMB/ConsJudge https://doi.org/10.48550/arXiv.2502.18817
353 Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Features Shinwoo Park, Hyundong Jin, Jeong-won Cha, Yo-Sub Han 2025-02-25 arXiv https://github.com/Shinwoo-Park/detecting_llm_paraphrased_code_via_coding_style_features http://arxiv.org/abs/2502.17749v2
354 Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers Hannah Calzi Kleidermacher, James Zou 2025-02-25 arXiv https://hankleid.github.io/ProjectMundo http://arxiv.org/abs/2502.17882v1
355 RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction Jianhao Yan, Yun Luo, Yue Zhang 2025-02-25 arXiv https://github.com/ElliottYan/RefuteBench-2.0 http://arxiv.org/abs/2502.18308v1
356 Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst 2025-02-25 arXiv https://github.com/gayecolakoglu/LayIE-LLM http://arxiv.org/abs/2502.18179v1
357 LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena Tianmi Ma, Jiawei Du, Wenxin Huang, Wenjie Wang, Liang Xie, Xian Zhong, Joey Tianyi Zhou 2025-02-25 arXiv https://github.com/wekjsdvnm/Agent-Trading-Arena http://arxiv.org/abs/2502.17967v1
358 Detecting LLM-Generated Korean Text through Linguistic Feature Analysis Shinwoo Park, Shubin Kim, Do-Kyung Kim, Yo-Sub Han 2025-02-25 arXiv https://github.com/Shinwoo-Park/detecting_llm_generated_korean_text_through_linguistic_analysis http://arxiv.org/abs/2503.00032v2
359 Can Multimodal LLMs Perform Time Series Anomaly Detection? Xiongxiao Xu, Haoran Wang, Yueqing Liang, Philip S. Yu, Yue Zhao, Kai Shu 2025-02-25 arXiv https://mllm-ts.github.io http://arxiv.org/abs/2502.17812v1
360 Scalable Best-of-N Selection for Large Language Models via Self-Certainty Zhewei Kang, Xuandong Zhao, Dawn Song 2025-02-25 arXiv https://github.com/backprop07/Self-Certainty https://doi.org/10.48550/arXiv.2502.18581
361 LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation Pengzhi Li, Pengfei Yu, Zide Liu, Wei He, Xuhao Pan, Xudong Rao, Tao Wei, Wei Chen 2025-02-25 arXiv https://zrealli.github.io/LDGen https://doi.org/10.48550/arXiv.2502.18302
362 Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference Zhuo Chen, Xinyu Wang, Yong Jiang, Zhen Zhang, Xinyu Geng, Pengjun Xie, Fei Huang, Kewei Tu 2025-02-25 arXiv https://github.com/Chord-Chen-30/VLLM-KnowledgeBoundary https://doi.org/10.48550/arXiv.2502.18023
363 Harnessing Multiple Large Language Models: A Survey on LLM Ensemble Zhijun Chen, Jingzheng Li, Pengpeng Chen, Zhuoran Li, Kai Sun, Yuankai Luo, Qianren Mao, Dingqi Yang, Hailong Sun, Philip S. Yu 2025-02-25 arXiv https://github.com/junchenzhi/Awesome-LLM-Ensemble https://doi.org/10.48550/arXiv.2502.18036
364 Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data Siqi Guo, Ilgee Hong, Vicente Balmaseda, Changlong Yu, Liang Qiu, Xin Liu, Haoming Jiang, Tuo Zhao, Tianbao Yang 2025-02-25 arXiv https://github.com/Optimization-AI/DFT https://doi.org/10.48550/arXiv.2502.18679
365 Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs Himanshu Beniwal, Sailesh Panda, Mayank Singh 2025-02-24 arXiv https://github.com/himanshubeniwal/X-BAT http://arxiv.org/abs/2502.16901v1
366 MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski 2025-02-24 arXiv https://github.com/saccharomycetes/mllms_know http://arxiv.org/abs/2502.17422v1
367 From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs Ruxiao Chen, Chenguang Wang, Yuran Sun, Xilei Zhao, Susu Xu 2025-02-24 arXiv https://github.com/SusuXu-s-Lab/FLARE http://arxiv.org/abs/2502.17701v1
368 Delta Decompression for MoE-based LLMs Compression Hao Gu, Wei Li, Lujun Li, Qiyuan Zhu, Mark Lee, Shengjie Sun, Wei Xue, Yike Guo 2025-02-24 arXiv https://github.com/lliai/D2MoE http://arxiv.org/abs/2502.17298v1
369 ConvoyLLM: Dynamic Multi-Lane Convoy Control Using LLMs Liping Lu, Zhican He, Duanfeng Chu, Rukang Wang, Saiqian Peng, Pan Zhou 2025-02-24 arXiv https://github.com/chuduanfeng/ConvoyLLM http://arxiv.org/abs/2502.17529v2
370 CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought Boxuan Zhang, Ruqi Zhang 2025-02-24 arXiv https://github.com/ZBox1005/CoT-UQ http://arxiv.org/abs/2502.17214v1
371 Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing Yi-Kai Zhang, De-Chuan Zhan, Han-Jia Ye 2025-02-24 arXiv https://cit-llm-routing.github.io http://arxiv.org/abs/2502.17282v1
372 COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs Liming Liu, Zhenghao Xu, Zixuan Zhang, Hao Kang, Zichong Li, Chen Liang, Weizhu Chen, Tuo Zhao 2025-02-24 arXiv https://github.com/lliu606/COSMOS http://arxiv.org/abs/2502.17410v2
373 On Relation-Specific Neurons in Large Language Models Yihong Liu, Runsheng Chen, Lea Hirlimann, Ahmad Dawar Hakimi, Mingyang Wang, Amir Hossein Kargaran, Sascha Rothe, François Yvon, Hinrich Schütze 2025-02-24 arXiv https://github.com/cisnlp/relation-specific-neurons https://doi.org/10.48550/arXiv.2502.17355
374 LongSafety: Evaluating Long-Context Safety of Large Language Models Yida Lu, Jiale Cheng, Zhexin Zhang, Shiyao Cui, Cunxiang Wang, Xiaotao Gu, Yuxiao Dong, Jie Tang, Hongning Wang, Minlie Huang 2025-02-24 arXiv https://github.com/thu-coai/LongSafety https://doi.org/10.48550/arXiv.2502.16971
375 LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences Sijia Yao, Pengcheng Huang, Zhenghao Liu, Yu Gu, Yukun Yan, Shi Yu, Ge Yu 2025-02-24 arXiv https://github.com/NEUIR/LLM-QE https://doi.org/10.48550/arXiv.2502.17057
376 Introducing Visual Perception Token into Multimodal Large Language Model Runpeng Yu, Xinyin Ma, Xinchao Wang 2025-02-24 arXiv https://github.com/yu-rp/VisualPerceptionToken https://doi.org/10.48550/arXiv.2502.17425
377 LogitLens4LLMs: Extending Logit Lens Analysis to Modern Large Language Models Zhenyu Wang 2025-02-24 arXiv https://github.com/zhenyu-02/LogitLens4LLMs https://doi.org/10.48550/arXiv.2503.11667
378 From System 1 to System 2: A Survey of Reasoning Large Language Models Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhiwei Li, Bao-Long Bi, Ling-Rui Mei, Junfeng Fang, Zhijiang Guo, Le Song, Cheng-Lin Liu 2025-02-24 arXiv https://github.com/zzli2022/Awesome-Slow-Reason-System https://doi.org/10.48550/arXiv.2502.17419
379 VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models Jen-tse Huang, Dasen Dai, Jen-Yuan Huang, Youliang Yuan, Xiaoyuan Liu, Wenxuan Wang, Wenxiang Jiao, Pinjia He, Zhaopeng Tu 2025-02-23 arXiv https://github.com/CUHK-ARISE/VisFactor https://doi.org/10.48550/arXiv.2502.16435
380 BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning Haiteng Zhao, Chang Ma, Fangzhi Xu, Lingpeng Kong, Zhi-Hong Deng 2025-02-23 arXiv https://github.com/zhao-ht/BioMaze https://doi.org/10.48550/arXiv.2502.16660
381 CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale Chenlong Wang, Zhaoyang Chu, Zhengxiang Cheng, Xuyi Yang, Kaiyue Qiu, Yao Wan, Zhou Zhao, Xuanhua Shi, Dongping Chen 2025-02-23 arXiv https://github.com/Lucky-voyage/Code-Sync https://doi.org/10.48550/arXiv.2502.16645
382 CER: Confidence Enhanced Reasoning in LLMs Ali Razghandi, Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah 2025-02-22 arXiv https://github.com/ http://arxiv.org/abs/2502.14634v1
383 Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations Chunyang Li, Weiqi Wang, Tianshi Zheng, Yangqiu Song 2025-02-22 arXiv https://github.com/lcy2723/Robust-Rule-Induction http://arxiv.org/abs/2502.16169v1
384 Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens Ziwei Shan, Yaoyu He, Chengfeng Zhao, Jiashen Du, Jingyan Zhang, Qixuan Zhang, Jingyi Yu, Lan Xu 2025-02-22 arXiv https://koyui.github.io/mojito/ http://arxiv.org/abs/2502.16175v1
385 OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models Wenwen Yu, Zhibo Yang, Jianqiang Wan, Sibo Song, Jun Tang, Wenqing Cheng, Yuliang Liu, Xiang Bai 2025-02-22 arXiv https://github.com/AlibabaResearch/AdvancedLiterateMachinery https://doi.org/10.48550/arXiv.2502.16161
386 Dynamic Low-Rank Sparse Adaptation for Large Language Models Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Yang Liu, Jing Lin, Yiwu Yao, Rongrong Ji 2025-02-22 arXiv https://github.com/wzhuang-xmu/LoSA https://doi.org/10.48550/arXiv.2502.14816
387 Plan-over-Graph: Towards Parallelable LLM Agent Schedule Shiqi Zhang, Xinbei Ma, Zouying Cao, Zhuosheng Zhang, Hai Zhao 2025-02-21 arXiv https://github.com/zsq259/Plan-over-Graph http://arxiv.org/abs/2502.14563v1
388 FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs Madhurima Chakraborty, Peter Pirkelbauer, Qing Yi 2025-02-21 arXiv https://github.com/MadhuNimmo/FormalSpecCpp http://arxiv.org/abs/2502.15217v1
389 Investigating the Adaptive Robustness with Knowledge Conflicts in LLM-based Multi-Agent Systems Tianjie Ju, Bowen Wang, Hao Fei, Mong-Li Lee, Wynne Hsu, Yun Li, Qianren Wang, Pengzhou Cheng, Zongru Wu, Zhuosheng Zhang, Gongshen Liu 2025-02-21 arXiv https://github.com/wbw625/MultiAgentRobustness http://arxiv.org/abs/2502.15153v1
390 Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs Danni Liu, Jan Niehues 2025-02-21 arXiv https://github.com/dannigt/mid-align http://arxiv.org/abs/2502.14830v1
391 A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation Shilong Hou, Ruilin Shang, Zi Long, Xianghua Fu, Yin Chen 2025-02-21 arXiv https://github.com/Mebymeby/Pseudonymization-Framework http://arxiv.org/abs/2502.15233v1
392 PredictaBoard: Benchmarking LLM Score Predictability Lorenzo Pacchiardi, Konstantinos Voudouris, Ben Slater, Fernando Martínez-Plumed, José Hernández-Orallo, Lexin Zhou, Wout Schellaert 2025-02-21 arXiv https://github.com/Kinds-of-Intelligence-CFI/PredictaBoard http://arxiv.org/abs/2502.14445v1
393 Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing Qi Le, Enmao Diao, Ziyan Wang, Xinran Wang, Jie Ding, Li Yang, Ali Anwar 2025-02-21 arXiv https://github.com/Qi-Le1/Probe_Pruning http://arxiv.org/abs/2502.15618v1
394 STeCa: Step-level Trajectory Calibration for LLM Agent Learning Hanlin Wang, Jian Wang, Chak Tou Leong, Wenjie Li 2025-02-21 arXiv https://github.com/WangHanLinHenry/STeCa http://arxiv.org/abs/2502.14276v1
395 Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs Giulio Zizzo, Giandomenico Cornacchia, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Beat Buesser, Mark Purcell, Pin-Yu Chen, Prasanna Sattigeri, Kush Varshney 2025-02-21 arXiv https://github.com/IBM/Adversarial-Prompt-Evaluation http://arxiv.org/abs/2502.15427v1
396 Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models Ya Wang, Zhijian Zhuo, Yutao Zeng, Xun Zhou, Jian Yang, Xiaoqing Li 2025-02-21 arXiv https://github.com/kaihemo/SDD https://doi.org/10.48550/arXiv.2502.15499
397 Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization Yupeng Chang, Yi Chang, Yuan Wu 2025-02-21 arXiv https://github.com/llm172/Transfer-Prompting https://doi.org/10.48550/arXiv.2502.14211
398 On the logical skills of large language models: evaluations using arbitrarily complex first-order logic problems Shokhrukh Ibragimov, Arnulf Jentzen, Benno Kuckuck 2025-02-21 arXiv https://github.com/bkuckuck/logical-skills-of-llms https://doi.org/10.48550/arXiv.2502.14180
399 MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models Shrey Pandit, Jiawei Xu, Junyuan Hong, Zhangyang Wang, Tianlong Chen, Kaidi Xu, Ying Ding 2025-02-21 arXiv https://medhallu.github.io/ https://doi.org/10.48550/arXiv.2502.14302
400 From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, Yu Su 2025-02-21 arXiv https://github.com/OSU-NLP-Group/HippoRAG https://doi.org/10.48550/arXiv.2502.14802
401 CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models Zhenhong Zhou, Zherui Li, Jie Zhang, Yuanhe Zhang, Kun Wang, Yang Liu, Qing Guo 2025-02-21 arXiv https://github.com/zhrli324/Corba https://doi.org/10.48550/arXiv.2502.14529
402 Protein Large Language Models: A Comprehensive Survey Yijia Xiao, Wanjia Zhao, Junkai Zhang, Yiqiao Jin, Han Zhang, Zhicheng Ren, Renliang Sun, Haixin Wang, Guancheng Wan, Pan Lu, Xiao Luo, Yu Zhang, James Zou, Yizhou Sun, Wei Wang 2025-02-21 arXiv https://github.com/Yijia-Xiao/Protein-LLM-Survey https://doi.org/10.48550/arXiv.2502.17504
403 Forgotten Polygons: Multimodal Large Language Models are Shape-Blind William Rudman, Michal Golovanevsky, Amir Bar, Vedant Palit, Yann LeCun, Carsten Eickhoff, Ritambhara Singh 2025-02-21 arXiv https://github.com/rsinghlab/Shape-Blind https://doi.org/10.48550/arXiv.2502.15969
404 LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Shang Yang, Junxian Guo, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, Song Han 2025-02-21 arXiv https://github.com/mit-han-lab/omniserve http://arxiv.org/abs/2502.14866v1
405 Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models Yeonjun In, Wonjoong Kim, Kanghoon Yoon, Sungchul Kim, Md. Mehrab Tanjim, Kibum Kim, Chanyoung Park 2025-02-20 arXiv https://github.com/yeonjun-in/U-SafeBench https://doi.org/10.48550/arXiv.2502.15086
406 InductionBench: LLMs Fail in the Simplest Complexity Class Wenyue Hua, Tyler Wong, Sun Fei, Liangming Pan, Adam Jardine, William Yang Wang 2025-02-20 arXiv https://github.com/Wenyueh/inductive_reasoning_benchmark http://arxiv.org/abs/2502.15823v3
407 An LLM-based Agent for Reliable Docker Environment Configuration Ruida Hu, Chao Peng, Xinchen Wang, Cuiyun Gao 2025-02-19 arXiv https://github.com/bytedance/Repo2Run http://arxiv.org/abs/2502.13681v1
408 SIFT: Grounding LLM Reasoning in Contexts via Stickers Zihao Zeng, Xuyao Huang, Boxiu Li, Zhijie Deng 2025-02-19 arXiv https://github.com/zhijie-group/SIFT http://arxiv.org/abs/2502.14922v1
409 Judging the Judges: A Collection of LLM-Generated Relevance Judgements Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, Emine Yilmaz 2025-02-19 arXiv https://llm4eval.github.io/LLMJudge-benchmark/ http://arxiv.org/abs/2502.13908v1
410 DataSciBench: An LLM Agent Benchmark for Data Science Dan Zhang, Sining Zhoubian, Min Cai, Fengzu Li, Lekang Yang, Wei Wang, Tianjiao Dong, Ziniu Hu, Jie Tang, Yisong Yue 2025-02-19 arXiv https://github.com/THUDM/DataSciBench http://arxiv.org/abs/2502.13897v1
411 Benchmarking LLMs for Political Science: A United Nations Perspective Yueqing Liang, Liangwei Yang, Chen Wang, Congying Xia, Rui Meng, Xiongxiao Xu, Haoran Wang, Ali Payani, Kai Shu 2025-02-19 arXiv https://github.com/yueqingliang1/UNBench http://arxiv.org/abs/2502.14122v1
412 Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning Zenan Li, Zhaoyu Li, Wen Tang, Xian Zhang, Yuan Yao, Xujie Si, Fan Yang, Kaiyu Yang, Xiaoxing Ma 2025-02-19 arXiv https://github.com/Lizn-zn/NeqLIPS/ http://arxiv.org/abs/2502.13834v1
413 Craw4LLM: Efficient Web Crawling for LLM Pretraining Shi Yu, Zhiyuan Liu, Chenyan Xiong 2025-02-19 arXiv https://github.com/cxcscmu/Crawl4LLM http://arxiv.org/abs/2502.13347v1
414 $\mathtt{GeLLM^3O}$: Generalizing Large Language Models for Multi-property Molecule Optimization Vishal Dey, Xiao Hu, Xia Ning 2025-02-19 arXiv https://github.com/ninglab/GeLLMO http://arxiv.org/abs/2502.13398v1
415 PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models Guangwei Li, Yuansen Zhang, Yinggui Wang, Shoumeng Yan, Lei Wang, Tao Wei 2025-02-19 arXiv https://github.com/ligw1998/PRIV-QA https://doi.org/10.48550/arXiv.2502.13564
416 AI-Empowered Catalyst Discovery: A Survey from Classical Machine Learning Approaches to Large Language Models Yuanyuan Xu, Hanchen Wang, Wenjie Zhang, Lexing Xie, Yin Chen, Flora Salim, Ying Zhang, Justin Gooding, Toby Walsh 2025-02-19 arXiv https://github.com/LuckyGirl-XU/Awesome-Artificial-Intelligence-Empowered-Catalyst-Discovery https://doi.org/10.48550/arXiv.2502.13626
417 Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Yang You, Guiming Xie, Xuejian Gong, Kunlong Zhou 2025-02-19 arXiv https://github.com/junzhang-zj/LoRAM https://doi.org/10.48550/arXiv.2502.13533
418 REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models DongGeon Lee, Hwanjo Yu 2025-02-19 arXiv https://github.com/oneonlee/REFIND https://doi.org/10.48550/arXiv.2502.13622
419 Lost in Sequence: Do Large Language Models Understand Sequential Recommendation? Sein Kim, Hongseok Kang, Kibum Kim, Jiwan Kim, Donghyun Kim, Minchul Yang, Kwangjin Oh, Julian McAuley, Chanyoung Park 2025-02-19 arXiv https://github.com/Sein-Kim/LLM-SRec https://doi.org/10.48550/arXiv.2502.13909
420 Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems Yaochen Zhu, Chao Wan, Harald Steck, Dawen Liang, Yesu Feng, Nathan Kallus, Jundong Li 2025-02-19 arXiv https://github.com/yaochenzhu/CRAG https://doi.org/10.48550/arXiv.2502.14137
421 ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities Chanjin Zheng, Zengyi Yu, Yilin Jiang, Mingzi Zhang, Xunuo Lu, Jing Jin, Liteng Gao 2025-02-19 arXiv https://artmentor.github.io/ https://doi.org/10.48550/arXiv.2502.13832
422 LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Guanzheng Chen, Xin Li, Michael Qizhe Shieh, Lidong Bing 2025-02-19 arXiv https://github.com/DAMO-NLP-SG/LongPO https://doi.org/10.48550/arXiv.2502.13922
423 Text2World: Benchmarking Large Language Models for Symbolic World Model Generation Mengkang Hu, Tianxing Chen, Yude Zou, Yuheng Lei, Qiguang Chen, Ming Li, Yao Mu, Hongyuan Zhang, Wenqi Shao, Ping Luo 2025-02-18 arXiv https://text-to-world.github.io/ https://doi.org/10.48550/arXiv.2502.13092
424 Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs Adi Simhi, Itay Itzhak, Fazl Barez, Gabriel Stanovsky, Yonatan Belinkov 2025-02-18 arXiv https://github.com/technion-cs-nlp/Trust_me_Im_wrong http://arxiv.org/abs/2502.12964v1
425 SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs Ahmed F. AbouElhamayed, Jordan Dotzel, Yash Akhauri, Chi-Chih Chang, Sameh Gobriel, J. Pablo Muñoz, Vui Seng Chua, Nilesh Jain, Mohamed S. Abdelfattah 2025-02-18 arXiv https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/SparAMX http://arxiv.org/abs/2502.12444v1
426 Soundwave: Less is More for Speech-Text Alignment in LLMs Yuhao Zhang, Zhiheng Liu, Fan Bu, Ruiyu Zhang, Benyou Wang, Haizhou Li 2025-02-18 arXiv https://github.com/FreedomIntelligence/Soundwave http://arxiv.org/abs/2502.12900v1
427 MoBA: Mixture of Block Attention for Long-Context LLMs Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu 2025-02-18 arXiv https://github.com/MoonshotAI/MoBA http://arxiv.org/abs/2502.13189v1
428 PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang, Min Zhang 2025-02-18 arXiv https://github.com/zjq0455/PTQ1.61 https://doi.org/10.48550/arXiv.2502.13179
429 SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings Weikai Lu, Hao Peng, Huiping Zhuang, Cen Chen, Ziqian Zeng 2025-02-18 arXiv https://github.com/ZeroNLP/SEA https://doi.org/10.48550/arXiv.2502.12562
430 Investigating and Extending Homans' Social Exchange Theory with Large Language Model based Agents Lei Wang, Zheqing Zhang, Xu Chen 2025-02-18 arXiv https://github.com/Paitesanshi/SET https://doi.org/10.48550/arXiv.2502.12450
431 Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, Yaowei Wang, Min Zhang, Liqiang Nie 2025-02-18 arXiv https://github.com/zjq0455/PTQ_Benchmark http://arxiv.org/abs/2502.13178v1
432 G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable Recommendation Yuhan Li, Xinni Zhang, Linhao Luo, Heng Chang, Yuxiang Ren, Irwin King, Jia Li 2025-02-18 arXiv https://github.com/Yuhan1i/G-Refer https://doi.org/10.48550/arXiv.2502.12586
433 EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning Xiaoqian Liu, Ke Wang, Yongbin Li, Yuchuan Wu, Wentao Ma, Aobo Kong, Fei Huang, Jianbin Jiao, Junge Zhang 2025-02-18 arXiv https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/EPO http://arxiv.org/abs/2502.12486v1
434 VRoPE: Rotary Position Embedding for Video Large Language Models Zikang Liu, Longteng Guo, Yepeng Tang, Junxian Cai, Kai Ma, Xi Chen, Jing Liu 2025-02-17 arXiv https://github.com/johncaged/VRoPE https://doi.org/10.48550/arXiv.2502.11664
435 Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning Yuqi Pang, Bowen Yang, Haoqin Tu, Yun Cao, Zeyu Zhang 2025-02-17 arXiv https://github.com/Pbhgit/MVCD http://arxiv.org/abs/2502.11751v1
436 Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, Xueyang Liu 2025-02-17 arXiv https://github.com/wanghanbinpanda/CodeVision http://arxiv.org/abs/2502.11829v1
437 Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? Leyi Pan, Aiwei Liu, Shiyu Huang, Yijian Lu, Xuming Hu, Lijie Wen, Irwin King, Philip S. Yu 2025-02-17 arXiv https://github.com/THU-BPM/Watermark-Radioactivity-Attack http://arxiv.org/abs/2502.11598v1
438 Bitnet.cpp: Efficient Edge Inference for Ternary LLMs Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei 2025-02-17 arXiv https://github.com/microsoft/BitNet/tree/paper http://arxiv.org/abs/2502.11880v1
439 "Nuclear Deployed!": Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents Rongwu Xu, Xiaojian Li, Shuo Chen, Wei Xu 2025-02-17 arXiv https://github.com/pillowsofwind/LLM-CBRN-Risks http://arxiv.org/abs/2502.11355v1
440 A Survey of Personalized Large Language Models: Progress and Future Directions Jiahong Liu, Zexuan Qiu, Zhongyang Li, Quanyu Dai, Jieming Zhu, Minda Hu, Menglin Yang, Irwin King 2025-02-17 arXiv https://github.com/JiahongLiu21/Awesome-Personalized-Large-Language-Models https://doi.org/10.48550/arXiv.2502.11528
441 RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration Exemplars Yuncheng Hua, Lizhen Qu, Zhuang Li, Hao Xue, Flora D. Salim, Gholamreza Haffari 2025-02-17 arXiv https://github.com/AnonymousCode-ComputerScience/RIDE https://doi.org/10.48550/arXiv.2502.11681
442 Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents Rongwu Xu, Xiaojian Li, Shuo Chen, Wei Xu 2025-02-17 arXiv https://llm-catastrophic-risks.github.io http://arxiv.org/abs/2502.11355v3
443 Atom of Thoughts for Markov LLM Test-Time Scaling Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo 2025-02-17 arXiv https://github.com/qixucen/atom http://arxiv.org/abs/2502.12018v1
444 Idiosyncrasies in Large Language Models Mingjie Sun, Yida Yin, Zhiqiu Xu, J. Zico Kolter, Zhuang Liu 2025-02-17 arXiv https://eric-mingjie.github.io/llm-idiosyncrasies/index.html https://doi.org/10.48550/arXiv.2502.12150
445 A-MEM: Agentic Memory for LLM Agents Wujiang Xu, Kai Mei, Hang Gao, Juntao Tan, Zujie Liang, Yongfeng Zhang 2025-02-17 arXiv https://github.com/WujiangXu/AgenticMemory http://arxiv.org/abs/2502.12110v5
446 LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning Tianshi Zheng, Jiayang Cheng, Chunyang Li, Haochen Shi, Zihao Wang, Jiaxin Bai, Yangqiu Song, Ginny Y. Wong, Simon See 2025-02-16 arXiv https://github.com/HKUST-KnowComp/LogiDynamics https://doi.org/10.48550/arXiv.2502.11176
447 SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors Bohan Lyu, Siqiao Huang, Zichen Liang, Qi-An Sun, Jiaming Zhang 2025-02-16 arXiv https://github.com/Imbernoulli/SURGE https://doi.org/10.48550/arXiv.2502.11167
448 BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack Zihao Zhu, Hongbao Zhang, Mingda Zhang, Ruotong Wang, Guanzong Wu, Ke Xu, Baoyuan Wu 2025-02-16 arXiv https://github.com/zihao-ai/BoT https://doi.org/10.48550/arXiv.2502.12202
449 CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships? Aashish Anantha Ramakrishnan, Aadarsh Anantha Ramakrishnan, Dongwon Lee 2025-02-16 arXiv https://github.com/aashish2000/CORDIAL https://doi.org/10.48550/arXiv.2502.11300
450 Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models Haoyang Li, Xuejia Chen, Zhanchao Xu, Darian Li, Nicole Hu, Fei Teng, Yiming Li, Luyu Qiu, Chen Jason Zhang, Qing Li, Lei Chen 2025-02-16 arXiv https://github.com/TreeAI-Lab/NumericBench https://doi.org/10.48550/arXiv.2502.11075
451 ReLearn: Unlearning via Learning for Large Language Models Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Huajun Chen, Ningyu Zhang 2025-02-16 arXiv https://github.com/zjunlp/unlearn https://doi.org/10.48550/arXiv.2502.11190
452 Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models Zonghao Ying, Deyue Zhang, Zonglei Jing, Yisong Xiao, Quanchen Zou, Aishan Liu, Siyuan Liang, Xiangzheng Zhang, Xianglong Liu, Dacheng Tao 2025-02-16 arXiv https://github.com/NY1024/RACE https://doi.org/10.48550/arXiv.2502.11054
453 G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems Shilong Wang, Guibin Zhang, Miao Yu, Guancheng Wan, Fanci Meng, Chongye Guo, Kun Wang, Yang Wang 2025-02-16 arXiv https://github.com/wslong20/G-safeguard http://arxiv.org/abs/2502.11127v1
454 How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen 2025-02-16 arXiv https://github.com/zjunlp/DynamicKnowledgeCircuits http://arxiv.org/abs/2502.11196v1
455 MasRouter: Learning to Route LLMs for Multi-Agent Systems Yanwei Yue, Guibin Zhang, Boyang Liu, Guancheng Wan, Kun Wang, Dawei Cheng, Yiyan Qi 2025-02-16 arXiv https://github.com/yanweiyue/masrouter http://arxiv.org/abs/2502.11133v1
456 Ramp Up NTT in Record Time using GPU-Accelerated Algorithms and LLM-based Code Generation Yu Cui, Hang Fu, Licheng Wang, Haibin Zhang 2025-02-16 arXiv https://github.com/LMPC-Lab/GenGPUCrypto http://arxiv.org/abs/2502.11110v1
457 Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls Ante Wang, Linfeng Song, Ye Tian, Dian Yu, Haitao Mi, Xiangyu Duan, Zhaopeng Tu, Jinsong Su, Dong Yu 2025-02-16 arXiv https://github.com/Soistesimmer/Fetch http://arxiv.org/abs/2502.11183v1
458 Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey Zirui Song, Bin Yan, Yuhan Liu, Miao Fang, Mingzhe Li, Rui Yan, Xiuying Chen 2025-02-15 arXiv https://github.com/abilliyb/Knowledge_Injection_Survey_Papers https://doi.org/10.48550/arXiv.2502.10708
459 SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models Daniel Fleischer, Moshe Berchansky, Gad Markovits, Moshe Wasserblat 2025-02-15 arXiv https://github.com/IntelLabs/RAG-FiT/tree/square https://doi.org/10.48550/arXiv.2502.09390
460 Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs Siyan Zhao, Mingyi Hong, Yang Liu, Devamanyu Hazarika, Kaixiang Lin 2025-02-15 arXiv …, 2025 https://prefeval.github.io/ http://arxiv.org/abs/2502.09597v1
461 EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents Rui Yang, Hanyang Chen, Junyu Zhang, Mark Zhao, Cheng Qian, Kangrui Wang, Qineng Wang, Teja Venkat Koripella, Marziyeh Movahedi, Manling Li, Heng Ji, Huan Zhang, Tong Zhang 2025-02-15 arXiv https://embodiedbench.github.io https://doi.org/10.48550/arXiv.2502.09560
462 An Empirical Analysis of Uncertainty in Large Language Model Evaluations Qiujie Xie, Qingqiu Li, Zhuohao Yu, Yuejie Zhang, Yue Zhang, Linyi Yang 2025-02-15 arXiv https://github.com/hasakiXie123/LLM-Evaluator-Uncertainty https://doi.org/10.48550/arXiv.2502.10709
463 LintLLM: An Open-Source Verilog Linting Framework Based on Large Language Models Zhigang Fang, Renzhi Chen, Zhijie Yang, Yang Guo, Huadong Dai, Lei Wang 2025-02-15 arXiv https://github.com/fangzhigang32/Static-Verilog-Analysis https://doi.org/10.48550/arXiv.2502.10815
464 CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs Insu Han, Zeliang Zhang, Zhiyuan Wang, Yifan Zhu, Susan Liang, Jiani Liu, Haiting Lin, Mingjie Zhao, Chenliang Xu, Kun Wan, Wentian Zhao 2025-02-15 arXiv https://github.com/insuhan/calibquant http://arxiv.org/abs/2502.14882v2
465 KKA: Improving Vision Anomaly Detection through Anomaly-related Knowledge from Large Language Models Dong Chen, Zhengqing Hu, Peiguang Fan, Yueting Zhuang, Yafei Li, Qidong Liu, Xiaoheng Jiang, Mingliang Xu 2025-02-14 arXiv https://github.com/Anfeather/KKA https://doi.org/10.48550/arXiv.2502.14880
466 Large Language Diffusion Models Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li 2025-02-14 arXiv https://ml-gsai.github.io/LLaDA-demo/ https://doi.org/10.48550/arXiv.2502.09992
467 LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing Kuan Li, Liwen Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Shuai Wang, Minhao Cheng 2025-02-14 arXiv https://github.com/likuanppd/LaRA http://arxiv.org/abs/2502.09977v1
468 MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Yi-Fan Zhang, Tao Yu, Haochen Tian, Chaoyou Fu, Peiyan Li, Jianshu Zeng, Wulin Xie, Yang Shi, Huanyu Zhang, Junkang Wu, Xue Wang, Yibo Hu, Bin Wen, Fan Yang, Zhang Zhang, Tingting Gao, Di Zhang, Liang Wang, Rong Jin, Tieniu Tan 2025-02-14 arXiv https://mm-rlhf.github.io/ http://arxiv.org/abs/2502.10391v1
469 V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models Hsu-Kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen 2025-02-14 arXiv https://eddyhkchiu.github.io/v2vllm.github.io/ https://doi.org/10.48550/arXiv.2502.09980
470 The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis Wenbo Pan, Zhichao Liu, Qiguang Chen, Xiangyang Zhou, Haining Yu, Xiaohua Jia 2025-02-13 arXiv https://github.com/BMPixel/safety-residual-space http://arxiv.org/abs/2502.09674v1
471 FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents Mostapha Benhenda 2025-02-13 arXiv:2502.07393, 2025 https://github.com/benstaf/FinRL_DeepSeek http://arxiv.org/abs/2502.07393v1
472 Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning Jiayuan Zhu, Junde Wu 2025-02-13 arXiv:2502.07143, 2025 https://github.com/SuperMedIntel/AskPatients http://arxiv.org/abs/2502.07143v1
473 LLM-Generated Microservice Implementations from RESTful API Definitions Saurabh Chauhan, Zeeshan Rasheed, Abdul Malik Sami, Zheying Zhang, Jussi Rasku, Kai-Kristian Kemell, Pekka Abrahamsson 2025-02-13 arXiv https://github.com/sirbh/code-gen http://arxiv.org/abs/2502.09766v1
474 Bag of Tricks for Inference-time Computation of LLM Reasoning Fan Liu, Wenshuo Chao, Naiqiang Tan, Hao Liu 2025-02-13 arXiv:2502.07191, 2025 https://github.com/usail-hkust/benchmark_inference_time_computation_LL http://arxiv.org/abs/2502.07191v2
475 LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation Zican Dong, Junyi Li, Jinhao Jiang, Mingyu Xu, Wayne Xin Zhao, Bingning Wang, Weipeng Chen 2025-02-13 arXiv https://github.com/RUCAIBox/LongReD https://doi.org/10.48550/arXiv.2502.07365
476 LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Eric Tang, Sumanth Hegde, Kourosh Hakhamaneshi, Shishir G. Patil, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica 2025-02-13 arXiv …, 2025 https://github.com/NovaSky-AI/SkyThought http://arxiv.org/abs/2502.07374v2
477 DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization Xuefeng Liu, Songhao Jiang, Siyu Chen, Zhuoran Yang, Yuxin Chen, Ian T. Foster, Rick Stevens 2025-02-13 arXiv https://github.com/xuefeng-cs/DrugImproverGPT https://doi.org/10.48550/arXiv.2502.07237
478 Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models Qingsong Zou, Jingyu Xiao, Qing Li, Zhi Yan, Yuhang Wang, Li Xu, Wenxuan Wang, Kuofeng Gao, Ruoyu Li, Yong Jiang 2025-02-13 arXiv https://github.com/horizonsinzqs/QueryAttack https://doi.org/10.48550/arXiv.2502.09723
479 Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models Yiheng Liu, Xiaohui Gao, Haiyang Sun, Bao Ge, Tianming Liu, Junwei Han, Xintao Hu 2025-02-13 arXiv https://github.com/WhatAboutMyStar/LLM_ACTIVATION https://doi.org/10.48550/arXiv.2502.20408
480 DarwinLM: Evolutionary Structured Pruning of Large Language Models Shengkun Tang, Oliver Sieberling, Eldar Kurtic, Zhiqiang Shen, Dan Alistarh 2025-02-13 arXiv https://github.com/IST-DASLab/DarwinLM https://doi.org/10.48550/arXiv.2502.07780
481 RALLRec: Improving Retrieval Augmented Large Language Model Recommendation with Representation Learning Jian Xu, Sichun Luo, Xiangyu Chen, Haoming Huang, Hanxu Hou, Linqi Song 2025-02-12 arXiv https://github.com/JianXu95/RALLRec https://doi.org/10.48550/arXiv.2502.06101
482 LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM Zhi Zhou, Kun-Yang Yu, Shi-Yu Tian, Xiao-Wen Yang, Jiang-Xin Shi, Pengxiao Song, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li 2025-02-12 arXiv …, 2025 https://github.com/LAMDASZ-ML/Knowledge-Guide-Data-Generation http://arxiv.org/abs/2502.06572v2
483 Calibrating LLMs with Information-Theoretic Evidential Deep Learning Yawei Li, David Rügamer, Bernd Bischl, Mina Rezaei 2025-02-12 arXiv:2502.06351, 2025 https://github.com/sandylaker/ib-edl http://arxiv.org/abs/2502.06351v2
484 Data Augmentation to Improve Large Language Models in Food Hazard and Product Detection Areeg Fahad Rasheed, M. Zarkoosh, Shimam Amer Chasib, Safa F. Abbas 2025-02-12 arXiv https://github.com/AREEG94FAHAD/food-hazard-prdouct-cls https://doi.org/10.48550/arXiv.2502.08687
485 Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation Chengwen Qi, Ren Ma, Bowen Li, He Du, Binyuan Hui, Jinwang Wu, Yuanjun Laili, Conghui He 2025-02-12 arXiv https://github.com/opendatalab/ProverGen https://doi.org/10.48550/arXiv.2502.06563
486 Systematic Outliers in Large Language Models Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang 2025-02-12 arXiv https://github.com/an-yongqi/systematic-outliers https://doi.org/10.48550/arXiv.2502.06415
487 Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models Jiacong Xu, Shao-Yuan Lo, Bardia Safaei, Vishal M. Patel, Isht Dwivedi 2025-02-12 arXiv https://xujiacong.github.io/Anomaly-OV/ https://doi.org/10.48550/arXiv.2502.07601
488 Time2Lang: Bridging Time-Series Foundation Models and Large Language Models for Health Sensing Beyond Prompting Arvind Pillai, Dimitris Spathis, Subigya Nepal, Amanda C Collins, Daniel M Mackin, Michael V Heinz, Tess Z Griffin, Nicholas C Jacobson, Andrew Campbell 2025-02-11 arXiv https://github.com/arvind1609/time2lang http://arxiv.org/abs/2502.07608v3
489 The foundational capabilities of large language models in predicting postoperative risks using clinical notes Charles Alba, Bing Xue, Joanna Abraham, Thomas George Kannampallil, Chenyang Lu 2025-02-11 npj Digit. Medicine https://github.com/cja5553/LLMs_in_perioperative_care https://doi.org/10.1038/s41746-025-01489-2
490 Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining Daouda Sow, Herbert Woisetschläger, Saikiran Bulusu, Shiqiang Wang, Hans-Arno Jacobsen, Yingbin Liang 2025-02-10 arXiv https://github.com/sowmaster/Sample-Level-Loss-Reweighting-ICLR-2025 https://doi.org/10.48550/arXiv.2502.06733
491 LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights Ze Sheng, Zhicheng Chen, Shuning Gu, Heqing Huang, Guofei Gu, Jeff Huang 2025-02-10 arXiv https://github.com/OwenSanzas/LLM-For-Vulnerability-Detection http://arxiv.org/abs/2502.07049v2
492 HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models Paul Darm, Annalisa Riccardi 2025-02-09 arXiv https://github.com/PaulDrm/targeted_intervention http://arxiv.org/abs/2502.05945v2
493 Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators Hritik Bansal, Pratyush Maini 2025-02-09 arXiv https://pratyushmaini.github.io/blog/2024/risks-private-evals/ http://arxiv.org/abs/2503.04756v1
494 AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents Jiabin Tang, Tianyu Fan, Chao Huang 2025-02-09 arXiv https://github.com/HKUDS/AutoAgent http://arxiv.org/abs/2502.05957v2
495 MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents Jiabin Tang, Tianyu Fan, Chao Huang 2025-02-09 arXiv https://github.com/HKUDS/MetaChain http://arxiv.org/abs/2502.05957v1
496 LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning Hanqing Yang, Jingdi Chen, Marie Siew, Tania Lorido-Botran, Carlee Joe-Wong 2025-02-08 arXiv https://happyeureka.github.io/damcs http://arxiv.org/abs/2502.05453v1
497 Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models Sina Tayebati, Divake Kumar, Nastaran Darabi, Dinithi Jayasuriya, Ranganath Krishnan, Amit Ranjan Trivedi 2025-02-08 arXiv https://github.com/sinatayebati/vlm-uncertainty https://doi.org/10.48550/arXiv.2502.06884
498 OntoTune: Ontology-Driven Self-training for Aligning Large Language Models Zhiqiang Liu, Chengtao Gan, Junjie Wang, Yichi Zhang, Zhongpu Bo, Mengshu Sun, Huajun Chen, Wen Zhang 2025-02-08 arXiv https://github.com/zjukg/OntoTune https://doi.org/10.48550/arXiv.2502.05478
499 ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning Yuwei Yin, Giuseppe Carenini 2025-02-07 arXiv https://github.com/YuweiYin/ARR https://doi.org/10.48550/arXiv.2502.04689
500 Confidence Elicitation: A New Attack Vector for Large Language Models Brian Formento, Chuan Sheng Foo, See-Kiong Ng 2025-02-07 arXiv https://github.com/Aniloid2/Confidence_Elicitation_Attacks https://doi.org/10.48550/arXiv.2502.04643
501 Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research Junde Wu, Jiayuan Zhu, Yuyuan Liu 2025-02-07 arXiv https://github.com/theworldofagents/Agentic-Reasoning http://arxiv.org/abs/2502.04644v1
502 DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails Yihe Deng, Yu Yang, Junkai Zhang, Wei Wang, Bo Li 2025-02-07 arXiv https://github.com/yihedeng9/DuoGuard http://arxiv.org/abs/2502.05163v1
503 LLM-Supported Natural Language to Bash Translation Finnian Westenfelder, Erik Hemberg, Miguel Tulla, Stephen Moskal, Una-May O'Reilly, Silviu Chiricescu 2025-02-07 arXiv https://github.com/westenfelder/NL2SH http://arxiv.org/abs/2502.06858v1
504 QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Andrei Panferov, Jiale Chen, Soroush Tabesh, Roberto L. Castro, Mahdi Nikdan, Dan Alistarh 2025-02-07 arXiv https://github.com/IST-DASLab/QuEST http://arxiv.org/abs/2502.05003v1
505 Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization Yuanye Liu, Jiahang Xu, Li Lyna Zhang, Qi Chen, Xuan Feng, Yang Chen, Zhongxin Guo, Yuqing Yang, Peng Cheng 2025-02-06 arXiv https://github.com/HenryLau7/CFPO http://arxiv.org/abs/2502.04295v2
506 ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam 2025-02-06 arXiv https://github.com/Gen-Verse/ScoreFlow http://arxiv.org/abs/2502.04306v1
507 Robotouille: An Asynchronous Planning Benchmark for LLM Agents Gonzalo Gonzalez-Pumariega, Leong Su Yean, Neha Sunkara, Sanjiban Choudhury 2025-02-06 arXiv https://github.com/portal-cornell/robotouille http://arxiv.org/abs/2502.05227v1
508 My LLM might Mimic AAE -- But When Should it? Sandra C. Sandoval, Christabel Acquaye, Kwesi Cobbina, Mohammad Nayeem Teli, Hal Daumé III 2025-02-06 arXiv https://github.com/smelliecat/AAEMime http://arxiv.org/abs/2502.04564v2
509 CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference Zehua Pei, Lancheng Zou, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu 2025-02-06 arXiv https://github.com/JarvisPei/CMoE http://arxiv.org/abs/2502.04416v1
510 FAS: Fast ANN-SNN Conversion for Spiking Large Language Models Long Chen, Xiaotian Song, Andy Song, BaDong Chen, Jiancheng Lv, Yanan Sun 2025-02-06 arXiv https://github.com/lc783/FAS https://doi.org/10.48550/arXiv.2502.04405
511 Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers Daniel Beaglehole, Adityanarayanan Radhakrishnan, Enric Boix-Adserà, Mikhail Belkin 2025-02-06 arXiv https://github.com/dmbeaglehole/neural_controllers http://arxiv.org/abs/2502.03708v1
512 "Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence Shaopeng Fu, Liang Ding, Di Wang 2025-02-06 arXiv https://github.com/fshp971/adv-icl http://arxiv.org/abs/2502.04204v1
513 Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training Changhao Jiang, Ming Zhang, Junjie Ye, Xiaoran Fan, Yifei Cao, Jiajun Sun, Zhiheng Xi, Shihan Dou, Yi Dong, Yujiong Shen, Jingqi Tong, Zhen Wang, Tao Liang, Zhihui Fei, Mingyang Wan, Guojun Ma, Qi Zhang, Tao Gui, Xuanjing Huang 2025-02-06 arXiv https://github.com/yuhui1038/SMI https://doi.org/10.48550/arXiv.2502.04066
514 KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference Xing Li, Zeyu Xing, Yiming Li, Linping Qu, Hui-Ling Zhen, Wulong Liu, Yiwu Yao, Sinno Jialin Pan, Mingxuan Yuan 2025-02-06 arXiv https://github.com/cmd2001/KVTuner http://arxiv.org/abs/2502.04420v1
515 EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models He Hu, Yucheng Zhou, Lianzhong You, Hongbo Xu, Qianning Wang, Zheng Lian, Fei Richard Yu, Fei Ma, Laizhong Cui 2025-02-06 arXiv https://emo-gml.github.io/ https://doi.org/10.48550/arXiv.2502.04424
516 Tool Unlearning for Tool-Augmented LLMs Jiali Cheng, Hadi Amiri 2025-02-05 arXiv:2502.01083, 2025 https://clu-uml.github.io/MU-Bench-Project-Page/ http://arxiv.org/abs/2502.01083v1
517 Preference Leakage: A Contamination Problem in LLM-as-a-judge Dawei Li, Renliang Sun, Yue Huang, Ming Zhong, Bohan Jiang, Jiawei Han, Xiangliang Zhang, Wei Wang, Huan Liu 2025-02-05 arXiv …, 2025 https://github.com/David-Li0406/Preference-Leakage http://arxiv.org/abs/2502.01534v1
518 Picky LLMs and Unreliable RMs: An Empirical Study on Safety Alignment after Instruction Tuning Guanlin Li, Kangjie Chen, Shangwei Guo, Jie Zhang, Han Qiu, Chao Zhang, Guoyin Wang, Tianwei Zhang, Jiwei Li 2025-02-05 arXiv …, 2025 https://github.com/GuanlinLee/llm_instruction_tuning http://arxiv.org/abs/2502.01116v1
519 PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design Yuchao Wu, Xiaofei Yu, Hao Chen, Yang Luo, Yeyu Tong, Yuzhe Ma 2025-02-05 arXiv https://github.com/PICDA/PICBench http://arxiv.org/abs/2502.03159v1
520 PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs Mauricio Soroco, Jialin Song, Mengzhou Xia, Kye Emond, Weiran Sun, Wuyang Chen 2025-02-05 arXiv …, 2025 https://pde-controller.github.io/ http://arxiv.org/abs/2502.00963v1
521 LLM-TA: An LLM-Enhanced Thematic Analysis Pipeline for Transcripts from Parents of Children with Congenital Heart Disease Muhammad Zain Raza, Jiawei Xu, Terence Lim, Lily Boddy, Carlos M. Mery, Andrew Well, Ying Ding 2025-02-05 arXiv …, 2025 https://github.com/jiaweixu98/LLM-TA http://arxiv.org/abs/2502.01620v1
522 Demystifying Long Chain-of-Thought Reasoning in LLMs Edward Yeo, Yuxuan Tong, Morry Niu, Graham Neubig, Xiang Yue 2025-02-05 arXiv https://github.com/eddycmu/demystify-long-cot http://arxiv.org/abs/2502.03373v1
523 A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs Bradley P. Allen, Paul T. Groth 2025-02-05 arXiv https://github.com/bradleypallen/trex-metalinguistic-disagreement http://arxiv.org/abs/2502.02896v1
524 SPRI: Aligning Large Language Models with Context-Situated Principles Hongli Zhan, Muneeza Azmat, Raya Horesh, Junyi Jessy Li, Mikhail Yurochkin 2025-02-05 arXiv https://github.com/honglizhan/SPRI-public https://doi.org/10.48550/arXiv.2502.03397
525 A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava 2025-02-05 arXiv …, 2025 https://probabilistic-inference-scaling.github.io http://arxiv.org/abs/2502.01618v2
526 Knowledge Distillation from Large Language Models for Household Energy Modeling Mohannad Takrouri, Nicolas M. Cuadrado, Martin Takáč 2025-02-05 arXiv https://github.com/Singularity-AI-Lab/LLM-Energy-Knowledge-Distillation https://doi.org/10.48550/arXiv.2502.03034
527 Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Khan, Salman Khan 2025-02-05 arXiv https://github.com/HashmatShadab/Robust-LLaVA https://doi.org/10.48550/arXiv.2502.01576
528 Internal Activation as the Polar Star for Steering Unsafe LLM Behavior Peixuan Han, Cheng Qian, Xiusi Chen, Yuji Zhang, Denghui Zhang, Heng Ji 2025-02-05 arXiv …, 2025 https://github.com/Hanpx20/SafeSwitch http://arxiv.org/abs/2502.01042v2
529 CTR-Driven Advertising Image Generation with Multimodal Large Language Models Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yanyin Chen, Haohan Wang, Linkai Liu, Yaoyu Li, Jinyuan Zhao, Yu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang 2025-02-05 arXiv https://github.com/Chenguoz/CAIG https://doi.org/10.48550/arXiv.2502.06823
530 Intent Representation Learning with Large Language Model for Recommendation Yu Wang, Lei Sang, Yi Zhang, Yiwen Zhang 2025-02-05 arXiv https://github.com/wangyu0627/IRLLRec http://arxiv.org/abs/2502.03307v1
531 AdaSVD: Adaptive Singular Value Decomposition for Large Language Models Zhiteng Li, Mingyuan Xia, Jingyuan Zhang, Zheng Hui, Linghe Kong, Yulun Zhang, Xiaokang Yang 2025-02-05 arXiv https://github.com/ZHITENGLI/AdaSVD https://doi.org/10.48550/arXiv.2502.01403
532 Do Large Language Model Benchmarks Test Reliability? Joshua Vendrow, Edward Vendrow, Sara Beery, Aleksander Madry 2025-02-05 arXiv https://github.com/MadryLab/platinum-benchmarks https://doi.org/10.48550/arXiv.2502.03461
533 Overcoming Vision Language Model Challenges in Diagram Understanding: A Proof-of-Concept with XML-Driven Large Language Models Solutions Shue Shiinoki, Ryo Koshihara, Hayato Motegi, Masumi Morishige 2025-02-05 arXiv https://github.com/galirage/spreadsheet-intelligence https://doi.org/10.48550/arXiv.2502.04389
534 Breaking Focus: Contextual Distraction Curse in Large Language Models Yue Huang, Yanbo Wang, Zixiang Xu, Chujie Gao, Siyuan Wu, Jiayi Ye, Xiuying Chen, Pin-Yu Chen, Xiangliang Zhang 2025-02-05 arXiv https://github.com/wyf23187/LLM_CDV https://doi.org/10.48550/arXiv.2502.01609
535 AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science Chenyue Li, Wen Deng, Mengqian Lu, Binhang Yuan 2025-02-05 arXiv https://github.com/Relaxed-System-Lab/AtmosSci-Bench https://doi.org/10.48550/arXiv.2502.01159
536 CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing Wenhao Zheng, Yixiao Chen, Weitong Zhang, Souvik Kundu, Yun Li, Zhengzhong Liu, Eric P. Xing, Hongyi Wang, Huaxiu Yao 2025-02-04 arXiv https://github.com/aiming-lab/CITER https://doi.org/10.48550/arXiv.2502.01976
537 AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement Shivam Singh, Karthik Swaminathan, Nabanita Dash, Ramandeep Singh, Snehasis Banerjee, Mohan Sridharan, Madhava Krishna 2025-02-04 arXiv https://sssshivvvv.github.io/adaptbot/ http://arxiv.org/abs/2502.02067v1
538 CognArtive: Large Language Models for Automating Art Analysis and Decoding Aesthetic Elements Afshin Khadangi, Amir Sartipi, Igor Tchappi, Gilbert Fridgen 2025-02-04 arXiv https://cognartive.github.io/ https://doi.org/10.48550/arXiv.2502.04353
539 Risk-Aware Driving Scenario Analysis with Large Language Models Yuan Gao, Mattia Piccinini, Johannes Betz 2025-02-04 arXiv https://github.com/yuangao-tum/Riskaware-Scenario-analyse https://doi.org/10.48550/arXiv.2502.02145
540 SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency Qianhao Yuan, Yanjiang Liu, Yaojie Lu, Hongyu Lin, Ben He, Xianpei Han, Le Sun 2025-02-04 arXiv https://github.com/icip-cas/SAISA https://doi.org/10.48550/arXiv.2502.02458
541 A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI) Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han 2025-02-04 arXiv https://github.com/AcademyCityL/GALI http://arxiv.org/abs/2502.02659v1
542 AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs Hongxin Li, Jingfan Chen, Jingran Su, Yuntao Chen, Qing Li, Zhaoxiang Zhang 2025-02-04 arXiv https://autogui-project.github.io/ http://arxiv.org/abs/2502.01977v1
543 Multi-Lingual Cyber Threat Detection in Tweets/X Using ML, DL, and LLM: A Comparative Analysis Saydul Akbar Murad, Ashim Dahal, Nick Rahimi 2025-02-04 arXiv https://github.com/Mmurrad/Tweet-Data-Classification http://arxiv.org/abs/2502.04346v1
544 RankFlow: A Multi-Role Collaborative Reranking Workflow Utilizing Large Language Models Can Jin, Hongwu Peng, Anxiang Zhang, Nuo Chen, Jiahui Zhao, Xi Xie, Kuangzheng Li, Shuya Feng, Kai Zhong, Caiwen Ding, Dimitris N. Metaxas 2025-02-03 arXiv https://github.com/jincan333/RankFlow https://doi.org/10.48550/arXiv.2502.00709
545 Progressive Binarization with Semi-Structured Pruning for LLMs Xianglong Yan, Tianao Zhang, Zhiteng Li, Yulun Zhang 2025-02-03 arXiv https://github.com/XIANGLONGYAN/PBS2P http://arxiv.org/abs/2502.01705v1
546 A Comprehensive Analysis on LLM-based Node Classification Algorithms Xixi Wu, Yifei Shen, Fangzhou Ge, Caihua Shan, Yizhu Jiao, Xiangguo Sun, Hong Cheng 2025-02-03 arXiv …, 2025 https://llmnodebed.github.io/ http://arxiv.org/abs/2502.00829v1
547 MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies Ehsaneddin Asgari, Yassine El Kheir, Mohammad Ali Sadraei Javaheri 2025-02-03 arXiv:2502.00894, 2025 https://github.com/llm-lab-org/MorphBPE http://arxiv.org/abs/2502.00894v1
548 RTBAgent: A LLM-based Agent System for Real-Time Bidding Leng Cai, Junxuan He, Yikai Li, Junjie Liang, Yuanping Lin, Ziming Quan, Yawen Zeng, Jin Xu 2025-02-03 arXiv …, 2025 https://github.com/CaiLeng/RTBAgent http://arxiv.org/abs/2502.00792v1
549 UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models Xin Xu, Qiyun Xu, Tong Xiao, Tianhao Chen, Yuchen Yan, Jiaxin Zhang, Shizhe Diao, Can Yang, Yang Wang 2025-02-02 arXiv https://github.com/YangLabHKUST/UGPhysics https://doi.org/10.48550/arXiv.2502.00334
550 UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs Yizhe Xiong, Wei Huang, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Zhenpeng Su, Jungong Han, Guiguang Ding 2025-02-02 arXiv …, 2025 https://github.com/Bostoncake/UniAttn http://arxiv.org/abs/2502.00439v1
551 MetaOpenFOAM 2.0: Large Language Model Driven Chain of Thought for Automating CFD Simulation and Post-Processing Yuxuan Chen, Xu Zhu, Hua Zhou, Zhuyin Ren 2025-02-02 arXiv https://github.com/Terry-cyx/MetaOpenFOAM https://doi.org/10.48550/arXiv.2502.00498
552 LIBRA: Measuring Bias of Large Language Model from a Local Context Bo Pang, Tingrui Qiao, Caroline Walker, Chris Cunningham, Yun Sing Koh 2025-02-02 arXiv https://github.com/ipangbo/LIBRA https://doi.org/10.48550/arXiv.2502.01679
553 Differentially Private Steering for Large Language Model Alignment Anmol Goel, Yaxi Hu, Iryna Gurevych, Amartya Sanyal 2025-02-01 arXiv https://github.com/UKPLab/iclr2025-psa https://doi.org/10.48550/arXiv.2501.18532
554 Speculative Ensemble: Fast Large Language Model Ensemble via Speculation Jiale Fu, Yuchu Jiang, Junkai Chen, Jiaming Fan, Xin Geng, Xu Yang 2025-02-01 arXiv https://github.com/Kamichanw/Speculative-Ensemble/ https://doi.org/10.48550/arXiv.2502.01662
555 LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models Shenghao Fu, Qize Yang, Qijie Mo, Junkai Yan, Xihan Wei, Jingke Meng, Xiaohua Xie, Wei-Shi Zheng 2025-01-31 arXiv https://github.com/iSEE-Laboratory/LLMDet https://doi.org/10.48550/arXiv.2501.18954
556 Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu 2025-01-31 arXiv https://github.com/git-disl/Virus https://doi.org/10.48550/arXiv.2501.17433
557 Reward-Guided Speculative Decoding for Efficient LLM Reasoning Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong 2025-01-31 arXiv https://github.com/BaohaoLiao/RSD http://arxiv.org/abs/2501.19324v1
558 2SSP: A Two-Stage Framework for Structured Pruning of LLMs Fabrizio Sandri, Elia Cunegatti, Giovanni Iacca 2025-01-31 arXiv:2501.17771, 2025 https://github.com/FabrizioSandri/2SSP http://arxiv.org/abs/2501.17771v1
559 ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation Minghua He, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang 2025-01-30 arXiv https://execoder4trans.github.io/ https://doi.org/10.48550/arXiv.2501.18460
560 Uncertainty Quantification and Decomposition for LLM-based Recommendation Wonbin Kweon, Sanghwan Jang, SeongKu Kang, Hwanjo Yu 2025-01-30 arXiv:2501.17630, 2025 https://github.com/WonbinKweon/UNC_LLM_REC_WWW2025 http://arxiv.org/abs/2501.17630v1
561 CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs Jinlan Fu, Shenzhen Huangfu, Hao Fei, Xiaoyu Shen, Bryan Hooi, Xipeng Qiu, See-Kiong Ng 2025-01-28 arXiv https://github.com/LVUGAI/CHiP http://arxiv.org/abs/2501.16629v1
562 xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking Sunbowen Lee, Shiwen Ni, Chi Wei, Shuaimin Li, Liyang Fan, Ahmadreza Argha, Hamid Alinejad-Rokny, Ruifeng Xu, Yicheng Gong, Min Yang 2025-01-28 arXiv https://github.com/Aegis1863/xJailbreak http://arxiv.org/abs/2501.16727v2
563 Large Language Model Critics for Execution-Free Evaluation of Code Changes Aashish Yadavally, Hoan Nguyen, Laurent Callot, Gauthier Guinet 2025-01-28 arXiv https://github.com/amazon-science/code-agent-eval https://doi.org/10.48550/arXiv.2501.16655
564 SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Hanyu Wang, Feiyu Xiong, Jason Zhaoxin Fan, Bo Tang, Shichao Song, Mengwei Wang, Jiawei Yang 2025-01-28 arXiv https://github.com/IAAR-Shanghai/SafeRAG https://doi.org/10.48550/arXiv.2501.18636
565 AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models Zheng Lian, Haoyu Chen, Lan Chen, Haiyang Sun, Licai Sun, Yong Ren, Zebang Cheng, Bin Liu, Rui Liu, Xiaojiang Peng, Jiangyan Yi, Jianhua Tao 2025-01-27 arXiv https://github.com/zeroQiaoba/AffectGPT https://doi.org/10.48550/arXiv.2501.16566
566 Towards Evaluating and Building Versatile Large Language Models for Medicine Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie 2025-01-27 arXiv https://henrychur.github.io/MedS-Bench/ https://doi.org/10.48550/arXiv.2408.12547
567 LCTG Bench: LLM Controlled Text Generation Benchmark Kentaro Kurihara, Masato Mita, Peinan Zhang, Shota Sasaki, Ryosuke Ishigami, Naoaki Okazaki 2025-01-27 arXiv https://github.com/CyberAgentAILab/LCTG-Bench http://arxiv.org/abs/2501.15875v1
568 TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs Yuxuan Gu, Wuyang Zhou, Giorgos Iacovides, Danilo Mandic 2025-01-26 arXiv https://github.com/guyuxuan9/TensorLLM http://arxiv.org/abs/2501.15674v1
569 Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models Hulingxiao He, Geng Li, Zijun Geng, Jinglin Xu, Yuxin Peng 2025-01-25 arXiv https://github.com/PKU-ICST-MIPL/Finedefics_ICLR2025 https://doi.org/10.48550/arXiv.2501.15140
570 PIP: Perturbation-based Iterative Pruning for Large Language Models Yi Cao, Wei-Jie Xu, Yucheng Shen, Weijie Shi, Chi-Min Chan, Jiajie Xu 2025-01-25 arXiv https://github.com/caoyiiiiii/PIP https://doi.org/10.48550/arXiv.2501.15278
571 MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models Zhongpu Chen, Yinfeng Liu, Long Shi, Zhi-Jie Wang, Xingyan Chen, Yu Zhao, Fuji Ren 2025-01-25 arXiv https://github.com/SWUFE-DB-Group/MDEval-Benchmark https://doi.org/10.48550/arXiv.2501.15000
572 A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models Zhongzhan Huang, Shanshan Zhong, Pan Zhou, Shanghua Gao, Marinka Zitnik, Liang Lin 2025-01-25 arXiv https://lotbench.github.io https://doi.org/10.48550/arXiv.2501.15147
573 UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models Xin Xu, Jiaxin Zhang, Tianhao Chen, Zitong Chao, Jishan Hu, Can Yang 2025-01-24 arXiv https://github.com/YangLabHKUST/UGMathBench https://doi.org/10.48550/arXiv.2501.13766
574 MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications Yixing Jiang, Kameron C. Black, Gloria Geng, Danny Park, Andrew Y. Ng, Jonathan H. Chen 2025-01-24 arXiv https://github.com/stanfordmlgroup/MedAgentBench http://arxiv.org/abs/2501.14654v1
575 Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation Sadegh Mahdavi, Muchen Li, Kaiwen Liu, Christos Thrampoulidis, Leonid Sigal, Renjie Liao 2025-01-24 arXiv https://github.com/DSL-Lab/aops http://arxiv.org/abs/2501.14275v1
576 DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing Xinyu Ma, Yifeng Xu, Yang Lin, Tianlong Wang, Xu Chu, Xin Gao, Junfeng Zhao, Yasha Wang 2025-01-24 arXiv https://github.com/ArthurLeoM/DRESS-LLM http://arxiv.org/abs/2501.14371v1
577 MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents Yixing Jiang, Kameron C. Black, Gloria Geng, Danny Park, James Zou, Andrew Y. Ng, Jonathan H. Chen 2025-01-24 arXiv https://github.com/stanfordmlgroup/MedAgentBench http://arxiv.org/abs/2501.14654v2
578 FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu 2025-01-24 arXiv https://github.com/FireRedTeam/FireRedASR http://arxiv.org/abs/2501.14350v1
579 Evaluating and Improving Graph to Text Generation with Large Language Models Jie He, Yijun Yang, Wanqiu Long, Deyi Xiong, Víctor Gutiérrez-Basulto, Jeff Z. Pan 2025-01-24 arXiv https://github.com/probe2/kg_text https://doi.org/10.48550/arXiv.2501.14497
580 Can Large Language Models Understand Preferences in Personalized Recommendation? Zhaoxuan Tan, Zinan Zeng, Qingkai Zeng, Zhenyu Wu, Zheyuan Liu, Fengran Mo, Meng Jiang 2025-01-24 arXiv https://github.com/TamSiuhin/PerRecBench https://doi.org/10.48550/arXiv.2501.13391
581 JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models Michael K. Chen, Xikun Zhang, Dacheng Tao 2025-01-24 arXiv https://github.com/michaelchen-lab/JustLogic https://doi.org/10.48550/arXiv.2501.14851
582 Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models Bo Gao, Michael W. Spratling 2025-01-24 arXiv https://github.com/iminfine/freeatten https://doi.org/10.48550/arXiv.2501.13428
583 Do as We Do, Not as You Think: the Conformity of Large Language Models Zhiyuan Weng, Guikun Chen, Wenguan Wang 2025-01-24 arXiv https://github.com/Zhiyuan-Weng/BenchForm https://doi.org/10.48550/arXiv.2501.13381
584 Distillation Quantification for Large Language Models Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, Sirui He, Jiaheng Liu, Min Yang, Zhoufutu Wen, Shiwen Ni 2025-01-23 arXiv https://github.com/Aegis1863/LLMs-Distillation-Quantification https://doi.org/10.48550/arXiv.2501.12619
585 OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting Xing Hu, Yuan Cheng, Dawei Yang, Zukang Xu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang, Sifan Zhou 2025-01-23 arXiv https://github.com/BrotherHappy/OSTQuant https://doi.org/10.48550/arXiv.2501.13987
586 Low-Rank Adapters Meet Neural Architecture Search for LLM Compression J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain 2025-01-23 arXiv https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning http://arxiv.org/abs/2501.16372v1
587 LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps Andrey Palaev, Adil Khan, Syed M. Ahsan Kazmi 2025-01-23 arXiv https://github.com/Palandr123/DiffusionU-NetLLM http://arxiv.org/abs/2501.14046v1
588 Quantification of Large Language Model Distillation Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, Sirui He, Haihong Wu, Tianci Liu, Jiaheng Liu, Hamid Alinejad-Rokny, Min Yang, Yitao Liang, Zhoufutu Wen, Shiwen Ni 2025-01-22 arXiv https://github.com/Aegis1863/LLMs-Distillation-Quantification http://arxiv.org/abs/2501.12619v3
589 A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models Qinggang Zhang, Shengyuan Chen, Yuanchen Bei, Zheng Yuan, Huachi Zhou, Zijin Hong, Junnan Dong, Hao Chen, Yi Chang, Xiao Huang 2025-01-21 arXiv https://github.com/DEEP-PolyU/Awesome-GraphRAG https://doi.org/10.48550/arXiv.2501.13958
590 VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model Xianwei Zhuang, Yuxin Xie, Yufan Deng, Liming Liang, Jinghan Ru, Yuguo Yin, Yuexian Zou 2025-01-21 arXiv https://vargpt-1.github.io/ https://doi.org/10.48550/arXiv.2501.12327
591 EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents Zhili Cheng, Yuge Tu, Ran Li, Shiqi Dai, Jinyi Hu, Shengding Hu, Jiahao Li, Yang Shi, Tianyu Yu, Weize Chen, Lei Shi, Maosong Sun 2025-01-21 arXiv https://github.com/thunlp/EmbodiedEval http://arxiv.org/abs/2501.11858v1
592 Can open source large language models be used for tumor documentation in Germany? - An evaluation on urological doctors' notes Stefan Lenz, Arsenij Ustjanzew, Marco Jeray, Meike Ressing, Torsten Panholzer 2025-01-21 arXiv https://github.com/stefan-m-lenz/UroLlmEval https://doi.org/10.48550/arXiv.2501.12106
593 Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, Chao Dong 2025-01-20 arXiv https://depictqa.github.io/deqa-score/ https://doi.org/10.48550/arXiv.2501.11561
594 Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy Saeid Asgari Taghanaki, Joao Monteiro 2025-01-20 arXiv https://github.com/asgsaeid/EQT http://arxiv.org/abs/2501.11721v1
595 Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference Pouya Hamadanian, Sadjad Fouladi 2025-01-20 arXiv https://github.com/microsoft/glinthawk http://arxiv.org/abs/2501.11779v1
596 ChaosEater: Fully Automating Chaos Engineering with Large Language Models Daisuke Kikuta, Hiroki Ikeuchi, Kengo Tajiri, Yuusuke Nakano 2025-01-19 arXiv https://ntt-dkiku.github.io/chaos-eater https://doi.org/10.48550/arXiv.2501.11107
597 InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models Jing Ding, Kai Feng, Binbin Lin, Jiarui Cai, Qiushi Wang, Yu Xie, Xiaojin Zhang, Zhongyu Wei, Wei Chen 2025-01-19 arXiv https://github.com/HaileyFamo/InsQABench https://doi.org/10.48550/arXiv.2501.10943
598 Control LLM: Controlled Evolution for Intelligence Retention in LLM Haichao Wei, Yunxiang Ren, Zhoutong Fu, Aman Lunia, Yi-Lin Chen, Alice Leung, Ya Xu 2025-01-19 arXiv https://github.com/linkedin/ControlLLM http://arxiv.org/abs/2501.10979v1
599 LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport Kyeongha Rho, Hyeongkeun Lee, Valentio Iverson, Joon Son Chung 2025-01-18 arXiv https://github.com/NAVER-INTEL-Co-Lab/gaudi-lavcap http://arxiv.org/abs/2501.09291v1
600 PaSa: An LLM Agent for Comprehensive Academic Paper Search Yichen He, Guanhua Huang, Peiyuan Feng, Yuan Lin, Yuchen Zhang, Hang Li, Weinan E 2025-01-17 arXiv https://github.com/bytedance/pasa http://arxiv.org/abs/2501.10120v1
601 Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design Zhi Zheng, Zhuoliang Xie, Zhenkun Wang, Bryan Hooi 2025-01-17 arXiv https://github.com/zz1358m/MCTS-AHD-master http://arxiv.org/abs/2501.08603v2
602 When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis Ruixuan Zhang, Beichen Wang, Juexiao Zhang, Zilin Bian, Chen Feng, Kaan Ozbay 2025-01-17 arXiv https://github.com/ai4ce/SeeUnsafe https://doi.org/10.48550/arXiv.2501.10604
603 FaceXBench: Evaluating Multimodal LLMs on Face Understanding Kartik Narayan, Vibashan VS, Vishal M. Patel 2025-01-17 arXiv https://kartik-3004.github.io/facexbench/ http://arxiv.org/abs/2501.10360v1
604 PokerBench: Training Large Language Models to become Professional Poker Players Richard Zhuang, Akshat Gupta, Richard Yang, Aniket Rahane, Zhengyu Li, Gopala Anumanchipalli 2025-01-16 arXiv https://github.com/pokerllm/pokerbench https://doi.org/10.48550/arXiv.2501.08328
605 LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding Hongyu Li, Jinyu Chen, Ziyu Wei, Shaofei Huang, Tianrui Hui, Jialin Gao, Xiaoming Wei, Si Liu 2025-01-16 arXiv https://github.com/appletea233/LLaVA-ST https://doi.org/10.48550/arXiv.2501.08282
606 Gandalf the Red: Adaptive Security for LLMs Niklas Pfister, Václav Volhejn, Manuel Knott, Santiago Arias, Julia Bazińska, Mykhailo Bichurin, Alan Commike, Janet Darling, Peter Dienes, Matthew Fiedler, David Haber, Matthias Kraft, Marco Lancini, Max Mathys, Damián Pascual-Ortiz, Jakub Podolak, Adrià Romero-López, Kyriacos Shiarlis, Andreas Signer, Zsolt Terek, Athanasios Theocharis, Daniel Timbrell, Samuel Trautwein, Samuel Watts, Natalie Wu, Mateo Rojas-Carulla 2025-01-16 arXiv …, 2025 https://github.com/lakeraai/dsec-gandalf http://arxiv.org/abs/2501.07927v1
607 CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation Jinjun Peng, Leyi Cui, Kele Huang, Junfeng Yang, Baishakhi Ray 2025-01-16 arXiv https://github.com/Co1lin/CWEval http://arxiv.org/abs/2501.08200v1
608 Multilingual LLMs Struggle to Link Orthography and Semantics in Bilingual Word Processing Eshaan Tanwar, Gayatri Oke, Tanmoy Chakraborty 2025-01-16 arXiv https://github.com/EshaanT/Bilingual_processing_LLMs http://arxiv.org/abs/2501.09127v1
609 OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training Yijiong Yu, Ziyun Dai, Zekun Wang, Wei Wang, Ran Chen, Ji Pei 2025-01-16 arXiv …, 2025 https://github.com/yuyijiong/fineweb-edu-chinese http://arxiv.org/abs/2501.08197v1
610 Automated Retrosynthesis Planning of Macromolecules Using Large Language Models and Knowledge Graphs Qinyu Ma, Yuhao Zhou, Jianfeng Li 2025-01-15 Macromol. Rapid Commun. 2025, 2500065 https://github.com/QinyuMa316/RetroSynthesisAgent http://arxiv.org/abs/2501.08897v2
611 LAMS: LLM-Driven Automatic Mode Switching for Assistive Teleoperation Yiran Tao, Jehan Yang, Dan Ding, Zackory Erickson 2025-01-15 arXiv https://lams-assistance.github.io/ http://arxiv.org/abs/2501.08558v1
612 The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities Irina Bigoulaeva, Harish Tayyar Madabushi, Iryna Gurevych 2025-01-15 arXiv https://github.com/UKPLab/arxiv2025-inherent-limits-plms http://arxiv.org/abs/2501.08716v1
613 A Roadmap to Guide the Integration of LLMs in Hierarchical Planning Israel Puerta-Merino, Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares 2025-01-14 arXiv https://llmforplanning.github.io http://arxiv.org/abs/2501.08068v1
614 Lifelong Learning of Large Language Model based Agents: A Roadmap Junhao Zheng, Chengming Shi, Xidi Cai, Qiuke Li, Duzhen Zhang, Chenxing Li, Dong Yu, Qianli Ma 2025-01-13 arXiv https://github.com/qianlima-lab/awesome-lifelong-llm-agent https://doi.org/10.48550/arXiv.2501.07278
615 SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu 2025-01-12 arXiv https://github.com/TianjinYellow/SPAM-Optimizer http://arxiv.org/abs/2501.06842v1
616 ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning Xiangru Tang, Tianyu Hu, Muyang Ye, Yanjun Shao, Xunjian Yin, Siru Ouyang, Wangchunshu Zhou, Pan Lu, Zhuosheng Zhang, Yilun Zhao, Arman Cohan, Mark Gerstein 2025-01-11 arXiv https://github.com/gersteinlab/chemagent https://doi.org/10.48550/arXiv.2501.06590
617 SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution Chengxing Xie, Bowen Li, Chang Gao, He Du, Wai Lam, Difan Zou, Kai Chen 2025-01-11 arXiv …, 2025 https://github.com/InternLM/SWE-Fixer http://arxiv.org/abs/2501.05040v1
618 FairCode: Evaluating Social Bias of LLMs in Code Generation Yongkang Du, Jen-tse Huang, Jieyu Zhao, Lu Lin 2025-01-11 arXiv https://github.com/YongkDu/FairCode http://arxiv.org/abs/2501.05396v1
619 ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun 2025-01-11 arXiv https://github.com/thunlp/ChartCoder https://doi.org/10.48550/arXiv.2501.06598
620 Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models Qingyu Ren, Jie Zeng, Qianyu He, Jiaqing Liang, Yanghua Xiao, Weikang Zhou, Zeye Sun, Fei Yu 2025-01-11 arXiv https://github.com/Rainier-rq/FollowSoftConstraints https://doi.org/10.48550/arXiv.2501.04945
621 Demystifying Domain-adaptive Post-training for Financial LLMs Zixuan Ke, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty 2025-01-11 arXiv …, 2025 https://github.com/SalesforceAIResearch/FinDap http://arxiv.org/abs/2501.04961v1
622 HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers Yiyao Yang, Fu Teng, Pengju Liu, Mengnan Qi, Chenyang Lv, Ji Li, Xuhong Zhang, Zhezhi He 2025-01-11 arXiv …, 2025 https://github.com/Intelligent-Computing-Research-Group/HaVen http://arxiv.org/abs/2501.04908v1
623 Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models You Li, Heyu Huang, Chi Chen, Kaiyu Huang, Chao Huang, Zonghao Guo, Zhiyuan Liu, Jinan Xu, Yuhua Li, Ruixuan Li, Maosong Sun 2025-01-10 arXiv https://migician-vg.github.io/ https://doi.org/10.48550/arXiv.2501.05767
624 ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events Duygu Sezen Islakoglu, Jan-Christoph Kalo 2025-01-10 arXiv https://github.com/duyguislakoglu/chronosense https://doi.org/10.48550/arXiv.2501.03040
625 Environmental large language model Evaluation (ELLE) dataset: A Benchmark for Evaluating Generative AI applications in Eco-environment Domain Jing Guo, Nan Li, Ming Xu 2025-01-10 arXiv https://github.com/CEEAI/elle https://doi.org/10.48550/arXiv.2501.06277
626 LLM4SR: A Survey on Large Language Models for Scientific Research Ziming Luo, Zonglin Yang, Zexin Xu, Wei Yang, Xinya Du 2025-01-10 arXiv https://github.com/du-nlp-lab/LLM4SR https://doi.org/10.48550/arXiv.2501.04306
627 MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Qian Chen, Yafeng Chen, Yanni Chen, Mengzhe Chen, Yingda Chen, Chong Deng, Zhihao Du, Ruize Gao, Changfeng Gao, Zhifu Gao, Yabin Li, Xiang Lv, Jiaqing Liu, Haoneng Luo, Bin Ma, Chongjia Ni, Xian Shi, Jialong Tang, Hui Wang, Hao Wang, Wen Wang, Yuxuan Wang, Yunlan Xu, Fan Yu, Zhijie Yan, Yexin Yang, Baosong Yang, Xian Yang, Guanrou Yang, Tianyu Zhao, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Pei Zhang, Chong Zhang, Jinren Zhou 2025-01-10 arXiv https://funaudiollm.github.io/minmo https://doi.org/10.48550/arXiv.2501.06282
628 FlairGPT: Repurposing LLMs for Interior Designs Gabrielle Littlefair, Niladri Shekhar Dutt, Niloy J. Mitra 2025-01-10 arXiv https://flairgpt.github.io/ http://arxiv.org/abs/2501.04648v1
629 Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation Xiao Wang, Fuling Wang, Haowen Wang, Bo Jiang, Chuanfu Li, Yaowei Wang, Yonghong Tian, Jin Tang 2025-01-09 arXiv …, 2025 https://github.com/Event-AHU/Medical_Image_Analysis http://arxiv.org/abs/2501.03458v1
630 LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik, Viren Bajaj, Zeya Ahmad 2025-01-06 arXiv https://github.com/cvs-health/langfair https://doi.org/10.48550/arXiv.2501.03112
631 BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning Beichen Zhang, Yuhong Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Haodong Duan, Yuhang Cao, Dahua Lin, Jiaqi Wang 2025-01-06 arXiv https://github.com/beichenzbc/BoostStep https://doi.org/10.48550/arXiv.2501.03226
632 Visual Large Language Models for Generalized and Specialized Applications Yifan Li, Zhixin Lai, Wentao Bao, Zhen Tan, Anh Dao, Kewei Sui, Jiayi Shen, Dong Liu, Huan Liu, Yu Kong 2025-01-06 arXiv https://github.com/JackYFL/awesome-VLLMs https://doi.org/10.48550/arXiv.2501.02765
633 CALM: Curiosity-Driven Auditing for Large Language Models Xiang Zheng, Longxiang Wang, Yi Liu, Xingjun Ma, Chao Shen, Cong Wang 2025-01-06 arXiv https://github.com/x-zheng16/CALM https://doi.org/10.48550/arXiv.2501.02997
634 HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs Saleh Ashkboos, Mahdi Nikdan, Soroush Tabesh, Roberto L. Castro, Torsten Hoefler, Dan Alistarh 2025-01-05 arXiv https://github.com/IST-DASLab/HALO http://arxiv.org/abs/2501.02625v2
635 Multi-LLM Collaborative Caption Generation in Scientific Documents Jaeyoung Kim, Jongho Lee, Hong-Jun Choi, Ting-Yao Hsu, Chieh-Yang Huang, Sungchul Kim, Ryan Rossi, Tong Yu, Clyde Lee Giles, Ting-Hao 'Kenneth' Huang, Sungchul Choi 2025-01-05 arXiv https://github.com/teamreboott/MLBCAP http://arxiv.org/abs/2501.02552v1
636 MIRAGE: Exploring How Large Language Models Perform in Complex Social Interactive Environments Cai Yin, Zhouhong Gu, Du Zhaohan, Ye Zheyu, Cao Shaosheng, Xu Yiqian, Feng Hongwei, Chen Ping 2025-01-04 arXiv https://github.com/lime728/MIRAGE https://doi.org/10.48550/arXiv.2501.01652
637 Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities Tara Radvand, Mojtaba Abdolmaleki, Mohamed Mostagir, Ambuj Tewari 2025-01-04 arXiv https://github.com/TaraRadvand74/llm-text-detection http://arxiv.org/abs/2501.02406v2
638 Aligning Large Language Models for Faithful Integrity Against Opposing Argument Yong Zhao, Yang Deng, See-Kiong Ng, Tat-Seng Chua 2025-01-04 arXiv https://github.com/zhaoy777/AFICE https://doi.org/10.48550/arXiv.2501.01336
639 UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility Yonglin Tian, Fei Lin, Yiduo Li, Tengchao Zhang, Qiyao Zhang, Xuan Fu, Jun Huang, Xingyuan Dai, Yutong Wang, Chunwei Tian, Bai Li, Yisheng Lv, Levente Kovács, Fei-Yue Wang 2025-01-04 arXiv https://github.com/Hub-Tian/UAVs_Meet_LLMs http://arxiv.org/abs/2501.02341v1
640 REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Jian Hu 2025-01-04 arXiv https://github.com/OpenRLHF/OpenRLHF https://doi.org/10.48550/arXiv.2501.03262
641 Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap Weizhi Zhang, Yuanchen Bei, Liangwei Yang, Henry Peng Zou, Peilin Zhou, Aiwei Liu, Yinghui Li, Hao Chen, Jianling Wang, Yu Wang, Feiran Huang, Sheng Zhou, Jiajun Bu, Allen Lin, James Caverlee, Fakhri Karray, Irwin King, Philip S. Yu 2025-01-04 arXiv https://github.com/YuanchenBei/Awesome-Cold-Start-Recommendation https://doi.org/10.48550/arXiv.2501.01945
642 Text Clustering as Classification with LLMs Chen Huang, Guoxiu He 2025-01-04 Available at SSRN 5081002 https://github.com/ECNU-Text-Computing/Text-Clustering-via-LLM http://arxiv.org/abs/2410.00927v2
643 Instruction-Following Evaluation for Large Language Models Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou 2025-01-03 arXiv https://github.com/google-research/google-research/tree/master/instruction_following_eval https://doi.org/10.48550/arXiv.2311.07911
644 FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze 2025-01-03 arXiv …, 2025 http://github.com/flashinfer-ai/flashinfer http://arxiv.org/abs/2501.01005v1
645 Labels Generated by Large Language Model Helps Measuring People's Empathy in Vitro Md. Rakibul Hasan, Yue Yao, Md. Zakir Hossain, Aneesh Krishna, Imre Rudas, Shafin Rahman, Tom Gedeon 2025-01-02 arXiv https://github.com/hasan-rakibul/LLMPathy https://doi.org/10.48550/arXiv.2501.00691
646 Aligning LLMs with Domain Invariant Reward Models David Wu, Sanjiban Choudhury 2025-01-02 arXiv https://github.com/portal-cornell/dial http://arxiv.org/abs/2501.00911v1
647 Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models Anmol Reddy Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid A. Hasan, Elita A. Lobo 2025 arXiv https://github.com/molereddy/Alternate-Preference-Optimization https://doi.org/10.48550/arXiv.2409.13474
648 Surveillance Video-and-Language Understanding: from Small to Large Multimodal Models Tongtong Yuan, Xuange Zhang, Bo Liu, Kun Liu, Jian Jin, Zhenzhen Jiao 2025 IEEE Transactions on Circuits and Systems for Video Technology https://xuange923.github.io/Surveillance-Video-Understanding https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10681489
649 LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework Yiran Qiao, Xiang Ao, Yang Liu, Jiarong Xu, Xiaoqian Sun, Qing He 2025 arXiv https://github.com/QiaoYRan/LOGIN https://doi.org/10.48550/arXiv.2405.13902
650 Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks? Zhongjian Zhang, Xiao Wang, Huichi Zhou, Yue Yu, Mengmei Zhang, Cheng Yang, Chuan Shi 2025 arXiv https://github.com/zhongjian-zhang/LLM4RGNN https://doi.org/10.48550/arXiv.2408.08685
651 TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning Xiang Li, Yunshi Lan, Chao Yang 2025 arXiv https://github.com/Ashura5/TreeEval https://doi.org/10.48550/arXiv.2402.13125
652 Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine Xiaoshuang Huang, Lingdong Shen, Jia Liu, Fangxin Shang, Hongxiang Li, Haifeng Huang, Yehui Yang 2025 AAAI https://github.com/ShawnHuang497/MedPLIB https://doi.org/10.1609/aaai.v39i4.32394
653 Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval Guangyuan Ma, Yongliang Ma, Xing Wu, Zhenpeng Su, Ming Zhou, Songlin Hu 2025 arXiv https://github.com/tdro-llm/tdro https://doi.org/10.48550/arXiv.2408.10613
654 SS-GEN: A Social Story Generation Framework with Large Language Models Yi Feng, Mingyang Song, Jiaqi Wang, Zhuang Chen, Guanqun Bi, Minlie Huang, Liping Jing, Jian Yu 2025 AAAI https://github.com/MIMIFY/SS-GEN https://doi.org/10.1609/aaai.v39i2.32119
655 SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models Muxi Diao, Rumei Li, Shiyang Liu, Guogang Liao, Jingang Wang, Xunliang Cai, Weiran Xu 2025 arXiv https://SEAS-LLM.github.io/ https://doi.org/10.48550/arXiv.2408.02632
656 Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework Jiandong Jin, Xiao Wang, Qian Zhu, Haiyang Wang, Chenglong Li 2025 arXiv https://github.com/Event-AHU/OpenPAR https://doi.org/10.48550/arXiv.2408.09720
657 PAT: Pruning-Aware Tuning for Large Language Models Yijiang Liu, Huanrui Yang, Youxin Chen, Rongyu Zhang, Miao Wang, Yuan Du, Li Du 2025 arXiv https://github.com/kriskrisliu/PAT_Pruning-Aware-Tuning https://doi.org/10.48550/arXiv.2408.14721
658 One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models Yutao Zhu, Zhaoheng Huang, Zhicheng Dou, Ji-Rong Wen 2025 arXiv https://github.com/DaoD/SPRING/ https://doi.org/10.48550/arXiv.2405.19670
659 NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning Xin Yi, Shunfan Zheng, Linlin Wang, Gerard de Melo, Xiaoling Wang, Liang He 2025 AAAI https://github.com/xinykou/NLSR https://doi.org/10.1609/aaai.v39i24.34762
660 MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing Hao Zhou, Zhijun Wang, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Weihua Luo, Jiajun Chen 2025 arXiv https://github.com/zjwang21/MoE-LPR https://doi.org/10.48550/arXiv.2408.11396
661 CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding? Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma 2025 arXiv https://github.com/CodeLLM-Research/CodeJudge-Eval https://doi.org/10.48550/arXiv.2408.10718
662 Mitigating Social Bias in Large Language Models: A Multi-Objective Approach Within a Multi-Agent Framework Zhenjie Xu, Wenqing Chen, Yi Tang, Xuanying Li, Cheng Hu, Zhixuan Chu, Kui Ren, Zibin Zheng, Zhichao Lu 2025 AAAI https://github.com/Cortantse/MOMA https://doi.org/10.1609/aaai.v39i24.34748
663 Medical MLLM Is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models Xijie Huang, Xinyuan Wang, Hantao Zhang, Yinghao Zhu, Jiawen Xi, Jingkun An, Hao Wang, Hao Liang, Chengwei Pan 2025 AAAI https://github.com/dirtycomputer/O2M_attack https://doi.org/10.1609/aaai.v39i4.32396
664 MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang 2025 arXiv https://github.com/wjfu99/MIA-Tuner https://doi.org/10.48550/arXiv.2408.08661
665 LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation Qidong Liu, Xian Wu, Wanyu Wang, Yejing Wang, Yuanshao Zhu, Xiangyu Zhao, Feng Tian, Yefeng Zheng 2025 AAAI https://github.com/Applied-Machine-Learning-Lab/LLMEmb https://doi.org/10.1609/aaai.v39i11.33327
666 LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application Jian Jia, Yipei Wang, Yan Li, Honggang Chen, Xuehan Bai, Zhaocheng Liu, Jian Liang, Quan Chen, Han Li, Peng Jiang, Kun Gai 2025 AAAI https://github.com/adxcreative/LEARN https://doi.org/10.1609/aaai.v39i11.33291
667 Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models Chenhui Hu, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao 2025 arXiv https://github.com/ChenhuiHu/knowledge_in_superposition https://doi.org/10.48550/arXiv.2408.07413
668 ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation Mengyang Wu, Yuzhi Zhao, Jialun Cao, Mingjie Xu, Zhongming Jiang, Xuehui Wang, Qinbin Li, Guangneng Hu, Shengchao Qin, Chi-Wing Fu 2025 AAAI https://github.com/zhaoyuzhi/ICM-Assistant https://doi.org/10.1609/aaai.v39i8.32908
669 IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities Bin Wang, Chunyu Xie, Dawei Leng, Yuhui Yin 2025 arXiv https://github.com/360CVGroup/Inner-Adaptor-Architecture https://doi.org/10.48550/arXiv.2408.12902
670 Geolocation Representation from Large Language Models are Generic Enhancers for Spatio-Temporal Learning Junlin He, Tong Nie, Wei Ma 2025 arXiv https://github.com/Umaruchain/LLMGeovec https://doi.org/10.48550/arXiv.2408.12116
671 Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models Weihao Ye, Qiong Wu, Wenhao Lin, Yiyi Zhou 2025 arXiv https://github.com/ywh187/FitPrune https://doi.org/10.48550/arXiv.2409.10197
672 Awakening Augmented Generation: Learning to Awaken Internal Knowledge of Large Language Models for Question Answering Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Shengping Liu, Kang Liu, Jun Zhao 2025 COLING https://github.com/Xnhyacinth/IAG https://aclanthology.org/2025.coling-main.89/
673 QuickLLaMA: Query-aware Inference Acceleration for Large Language Models Jingyao Li, Han Shi, Sitong Wu, Chuanyang Zheng, Zhenguo Li, Xin Jiang, Hong Xu, Jiaya Jia 2025 COLING https://github.com/dvlab-research/Q-LLM https://aclanthology.org/2025.coling-main.34/
674 Distilling Rule-based Knowledge into Large Language Models Wenkai Yang, Yankai Lin, Jie Zhou, Ji-Rong Wen 2025 COLING https://github.com/RUCBM/rule-distillation https://aclanthology.org/2025.coling-main.61/
675 EarthMarker: A Visual Prompting Multimodal Large Language Model for Remote Sensing Wei Zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Jun Li, Xuerui Mao 2025 IEEE Trans. Geosci. Remote. Sens. https://github.com/wivizhang/EarthMarker https://doi.org/10.1109/TGRS.2024.3523505
676 Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models Jie Ma, Zhitao Gao, Qi Chai, Wangchun Sun, Pinghui Wang, Hongbin Pei, Jing Tao, Lingyun Song, Jun Liu, Chen Zhang, Lizhen Cui 2025 arXiv https://github.com/reml-group/DoG https://doi.org/10.48550/arXiv.2409.03155
677 Towards Efficient and Effective Adaptation of Large Language Models for Sequential Recommendation Hangyu Wang, Jianghao Lin, Bo Chen, Yang Yang, Ruiming Tang, Weinan Zhang, Yong Yu 2025 arXiv https://github.com/justarter/E2URec https://doi.org/10.48550/arXiv.2310.01612
678 Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: a data-driven approach for improved classification Ricardo Bigolin Lanfredi, Pritam Mukherjee, Ronald M. Summers 2025 arXiv https://github.com/rsummers11/CADLab/tree/master/MAPLEZ_LLM_report_labeler/ https://doi.org/10.48550/arXiv.2403.04024
679 Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning Xingchen Zeng, Haichuan Lin, Yilin Ye, Wei Zeng 2025 arXiv https://github.com/zengxingchen/ChartQA-MLLM https://doi.org/10.48550/arXiv.2407.20174
680 Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models Zijun Chen, Wenbo Hu, Guande He, Zhijie Deng, Zheng Zhang, Richang Hong 2025 COLING https://github.com/hfutml/Calibration-MLLM https://aclanthology.org/2025.coling-main.208/
681 Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges Vinay Samuel, Yue Zhou, Henry Peng Zou 2025 arXiv https://github.com/vsamuel2003/data-contamination https://doi.org/10.48550/arXiv.2409.09927
682 The Only Way is Ethics: A Guide to Ethical Research with Large Language Models Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Björn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch 2025 COLING https://github.com/MxEddie/Ethics-Whitepaper https://aclanthology.org/2025.coling-main.603/
683 The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models Zihui Wu, Haichang Gao, Jianping He, Ping Wang 2025 arXiv https://github.com/wooozihui/jailbreakfunction https://doi.org/10.48550/arXiv.2407.17915
684 Retrieval Augmented Instruction Tuning for Open NER with Large Language Models Tingyu Xie, Jian Zhang, Yan Zhang, Yuanyuan Liang, Qi Li, Hongwei Wang 2025 arXiv https://github.com/Emma1066/Retrieval-Augmented-IT-OpenNER https://doi.org/10.48550/arXiv.2406.17305
685 Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models Taiqiang Wu, Chaofan Tao, Jiahao Wang, Runming Yang, Zhe Zhao, Ngai Wong 2025 COLING https://github.com/wutaiqiang/LLM_KD_AKL https://aclanthology.org/2025.coling-main.383/
686 Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen 2025 COLING https://github.com/open-compass/DevEval https://aclanthology.org/2025.coling-main.502/
687 Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching Tianshu Wang, Xiaoyang Chen, Hongyu Lin, Xuanang Chen, Xianpei Han, Le Sun, Hao Wang, Zhenyu Zeng 2025 arXiv https://github.com/tshu-w/ComEM https://doi.org/10.48550/arXiv.2405.16884
688 Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models Xinyu Zhou, Delong Chen, Samuel Cahyawijaya, Xufeng Duan, Zhenguang G. Cai 2025 arXiv https://github.com/ChenDelong1999/Linguistic-Similarity https://doi.org/10.48550/arXiv.2409.12435
689 LLMTreeRec: Unleashing the Power of Large Language Models for Cold-Start Recommendations Wenlin Zhang, Chuhan Wu, Xiangyang Li, Yuhao Wang, Kuicai Dong, Yichao Wang, Xinyi Dai, Xiangyu Zhao, Huifeng Guo, Ruiming Tang 2025 COLING https://github.com/Applied-Machine-Learning-Lab/LLMTreeRec https://aclanthology.org/2025.coling-main.59/
690 KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting Thilini Wijesiriwardene, Ruwan Wickramarachchi, Sreeram Reddy Vennam, Vinija Jain, Aman Chadha, Amitava Das, Ponnurangam Kumaraguru, Amit P. Sheth 2025 COLING https://github.com/Thiliniiw/KnowledgePrompts/ https://aclanthology.org/2025.coling-main.268/
691 Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang 2025 arXiv https://github.com/RUCAIBox/LLM-Knowledge-Boundary https://doi.org/10.48550/arXiv.2307.11019
692 InternLM-Law: An Open Source Chinese Legal Large Language Model Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge 2025 arXiv https://github.com/InternLM/InternLM-Law https://doi.org/10.48550/arXiv.2406.14887
693 ICLEval: Evaluating In-Context Learning Ability of Large Language Models Wentong Chen, Yankai Lin, ZhenHao Zhou, HongYun Huang, Yantao Jia, Zhao Cao, Ji-Rong Wen 2025 arXiv https://github.com/yiye3/ICLEval https://doi.org/10.48550/arXiv.2406.14955
694 Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining Zongru Wu, Pengzhou Cheng, Lingyong Fang, Zhuosheng Zhang, Gongshen Liu 2025 COLING https://github.com/ZrW00/GraceFul https://aclanthology.org/2025.coling-main.220/
695 GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models Zike Yuan, Ming Liu, Hui Wang, Bing Qin 2025 arXiv https://github.com/ZIKEYUAN/GraCoRe https://doi.org/10.48550/arXiv.2407.02936
696 Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion Ben Liu, Jihai Zhang, Fangquan Lin, Cheng Yang, Min Peng 2025 COLING https://github.com/LB0828/FtG https://aclanthology.org/2025.coling-main.740/
697 Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers? Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang 2025 COLING https://github.com/Luckfort/CD https://aclanthology.org/2025.coling-main.37/
698 Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models Jiahui Li, Yongchang Hao, Haoyu Xu, Xing Wang, Yu Hong 2025 COLING https://github.com/jiah-li/magic https://aclanthology.org/2025.coling-main.305/
699 Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation Xiaofeng Zhang, Fanshuo Zeng, Yihao Quan, Zheng Hui, Jiawei Yao 2025 AAAI https://github.com/FanshuoZeng/Simignore https://doi.org/10.1609/aaai.v39i10.33107
700 The Geometry of Categorical and Hierarchical Concepts in Large Language Models Kiho Park, Yo Joong Choe, Yibo Jiang, Victor Veitch 2025 arXiv https://github.com/KihoPark/LLM_Categorical_Hierarchical_Representations https://doi.org/10.48550/arXiv.2406.01506
701 ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models Yeji Park, Deokyeong Lee, Junsuk Choe, Buru Chang 2025 arXiv https://github.com/yejipark-m/ConVis https://doi.org/10.48550/arXiv.2408.13906
702 DiscoveryBench: Towards Data-Driven Discovery with Large Language Models Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark 2025 arXiv https://github.com/allenai/discoverybench https://doi.org/10.48550/arXiv.2407.01725
703 MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation Zhongshen Zeng, Pengguang Chen, Shu Liu, Haiyun Jiang, Jiaya Jia 2025 ICLR https://github.com/dvlab-research/MR-GSM8K https://openreview.net/forum?id=br4H61LOoI
704 LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica 2025 arXiv https://livecodebench.github.io/ https://doi.org/10.48550/arXiv.2403.07974
705 Large Language Models are Interpretable Learners Ruochen Wang, Si Si, Felix X. Yu, Dorothea Wiesmann Rothuizen, Cho-Jui Hsieh, Inderjit S. Dhillon 2025 ICLR https://github.com/ruocwang/llm-symbolic-program https://openreview.net/forum?id=hTphfqtafO
706 LLaMA-Omni: Seamless Speech Interaction with Large Language Models Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng 2025 arXiv https://github.com/ictnlp/LLaMA-Omni https://doi.org/10.48550/arXiv.2409.06666
707 LLM-SR: Scientific Equation Discovery via Programming with Large Language Models Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, Chandan K. Reddy 2025 arXiv https://github.com/deep-symbolic-mathematics/LLM-SR https://doi.org/10.48550/arXiv.2404.18400
708 LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models Xiaohao Yang, He Zhao, Dinh Q. Phung, Wray L. Buntine, Lan Du 2025 arXiv https://github.com/Xiaohao-Yang/Topic_Model_Evaluation https://doi.org/10.48550/arXiv.2406.09008
709 KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models Fan Wang, Juyong Jiang, Chansung Park, Sunghun Kim, Jing Tang 2025 ICLR https://github.com/juyongjiang/KaSA https://openreview.net/forum?id=OQqNieeivq
710 Improved Techniques for Optimization-Based Jailbreaking on Large Language Models Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin 2025 arXiv https://github.com/jiaxiaojunQAQ/I-GCG https://doi.org/10.48550/arXiv.2405.21018
711 FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, Jiang Bian 2025 ICLR https://github.com/microsoft/CADGeneration/FlexCAD https://openreview.net/forum?id=Z0eiiV3Yyh
712 Efficient Evolutionary Search Over Chemical Space with Large Language Models Haorui Wang, Marta Skreta, Cher Tian Ser, Wenhao Gao, Lingkai Kong, Felix Strieth-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang 2025 ICLR http://github.com/zoom-wang112358/MOLLEO https://openreview.net/forum?id=awWiNvQwf3
713 Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification Wenxuan Huang, Zijie Zhai, Yunhang Shen, Shaosheng Cao, Fei Zhao, Xiangfeng Xu, Zheyu Ye, Shaohui Lin 2025 ICLR https://github.com/Osilly/dynamic_llava https://openreview.net/forum?id=hzVpZDrW73
714 Developing safe and responsible large language model: can we balance bias reduction and language understanding? Shaina Raza, Oluwanifemi Bamgbose, Shardul Ghuge, Fatemeh Tavakoli, Deepak John Reji, Syed Raza Bashir 2025 Mach. Learn. https://github.com/shainarazavi/Safe-Responsible-LLM https://doi.org/10.1007/s10994-025-06767-4
715 Neuron based Personality Trait Induction in Large Language Models Jia Deng, Tianyi Tang, Yanbin Yin, Wenhao Yang, Wayne Xin Zhao, Ji-Rong Wen 2025 ICLR https://github.com/RUCAIBox/NPTI https://openreview.net/forum?id=LYHEY783Np
716 Concept Bottleneck Large Language Models Chung-En Sun, Tuomas P. Oikarinen, Berk Ustun, Tsui-Wei Weng 2025 ICLR https://github.com/Trustworthy-ML-Lab/CB-LLMs https://openreview.net/forum?id=RC5FPYVQaH
717 Can Large Language Models Understand Symbolic Graphics Programs? Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf 2025 arXiv https://sgp-bench.github.io/ https://doi.org/10.48550/arXiv.2408.08313
718 CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma Gongque, Jianing Yu, Qiuna Tan, Weiran Xu 2025 arXiv https://github.com/csbench/csbench https://doi.org/10.48550/arXiv.2406.08587
719 Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu 2025 arXiv https://github.com/git-disl/Booster https://doi.org/10.48550/arXiv.2409.01586
720 Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Lyudmila Rvanova, Daniil Vasilev, Akim Tsvigun, Sergey Petrakov, Rui Xing, Abdelrahman Boda Sadallah, Kirill Grishchenkov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, Artem Shelmanov 2025 arXiv https://github.com/IINemo/lm-polygraph https://doi.org/10.48550/arXiv.2406.15627
721 Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, Grace Li Zhang 2025 arXiv https://github.com/TUDa-HWAI/Basis_Sharing https://doi.org/10.48550/arXiv.2410.03765
722 An Engorgio Prompt Makes Large Language Model Babble on Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang, Tianwei Zhang, Hao Wang, Hewu Li, Qi Li, Chao Zhang, Ke Xu, Han Qiu 2025 ICLR https://github.com/jianshuod/Engorgio-prompt https://openreview.net/forum?id=m4eXBo0VNc
723 Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards Xiaoyu Yang, Jie Lu, En Yu 2025 ICLR https://github.com/Anonymous0Knight/ConceptDriftMLLMs https://openreview.net/forum?id=b20VK2GnSs
724 AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models Kim Sung-Bin, Oh Hyun-Bin, JungMok Lee, Arda Senocak, Joon Son Chung, Tae-Hyun Oh 2025 ICLR https://github.com/AVHBench/AVHBench https://openreview.net/forum?id=jTEKTdI3K9
725 A Probabilistic Perspective on Unlearning and Alignment for Large Language Models Yan Scholten, Stephan Günnemann, Leo Schwinn 2025 arXiv https://github.com/yascho/probabilistic-unlearning https://doi.org/10.48550/arXiv.2410.03523
726 A Closer Look into Mixture-of-Experts in Large Language Models Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu 2025 arXiv https://github.com/kamanphoebe/Look-into-MoEs https://doi.org/10.48550/arXiv.2406.18219
727 Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models Jingyang Zhang, Jingwei Sun, Eric C. Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Yang, Hai Helen Li 2025 arXiv https://zjysteven.github.io/mink-plus-plus/ https://doi.org/10.48550/arXiv.2404.02936
728 Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference Zhihang Lin, Mingbao Lin, Luxi Lin, Rongrong Ji 2025 arXiv https://github.com/lzhxmu/VTW https://doi.org/10.48550/arXiv.2405.05803
729 NutriBench: A Dataset for Evaluating Large Language Models in Carbohydrate Estimation from Meal Descriptions Mehak Preet Dhaliwal, Andong Hua, Laya Pullela, Ryan Burke, Yao Qin 2025 arXiv https://mehak126.github.io/nutribench.html https://doi.org/10.48550/arXiv.2407.12843
730 UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model Zhaowei Li, Wei Wang, Yiqing Cai, Qi Xu, Pengyu Wang, Dong Zhang, Hang Song, Botian Jiang, Zhida Huang, Tao Wang 2025 arXiv https://github.com/lzw-lzw/UnifiedMLLM https://doi.org/10.48550/arXiv.2408.02503
731 Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution Wentao Tan, Qiong Cao, Yibing Zhan, Chao Xue, Changxing Ding 2025 AAAI https://github.com/WentaoTan/SENA https://doi.org/10.1609/aaai.v39i7.32774
732 SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models Yibin Chen, Yifu Yuan, Zeyu Zhang, Yan Zheng, Jinyi Liu, Fei Ni, Jianye Hao, Hangyu Mao, Fuzheng Zhang 2025 arXiv https://sheetagent.github.io https://doi.org/10.48550/arXiv.2403.03636
733 Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models Qi Liu, Bo Wang, Nan Wang, Jiaxin Mao 2025 arXiv https://github.com/liuqi6777/pe_rank https://doi.org/10.48550/arXiv.2406.14848
734 Learning Multiple Object States from Actions via Large Language Models Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, Yoichi Sato 2025 WACV https://masatate.github.io/ObjStatefromAction.github.io/ https://doi.org/10.1109/WACV61041.2025.00925
735 Large Language Models Empowered Personalized Web Agents Hongru Cai, Yongqi Li, Wenjie Wang, Fengbin Zhu, Xiaoyu Shen, Wenjie Li, Tat-Seng Chua 2025 WWW https://hongrucai.github.io/PersonalWAB/ https://doi.org/10.1145/3696410.3714842
736 Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval Luo Ji, Feixiang Guo, Teng Chen, Qingqing Gu, Xiaoyu Wang, Ningyuan Xi, Yihong Wang, Peng Yu, Yue Zhao, Hongyang Lei, Zhonglin Jiang, Yong Chen 2025 ECIR https://github.com/flyfree5/LaHoRe https://doi.org/10.1007/978-3-031-88714-7_27
737 CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation Yang Zhang, Fuli Feng, Jizhi Zhang, Keqin Bao, Qifan Wang, Xiangnan He 2025 arXiv https://github.com/zyang1580/CoLLM https://doi.org/10.48550/arXiv.2310.19488
738 Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models Zhijian Zhuo, Ya Wang, Yutao Zeng, Xiaoqing Li, Xun Zhou, Jinwen Ma 2025 ICLR https://github.com/BryceZhuo/PolyCom https://openreview.net/forum?id=CbpWPbYHuv
739 DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation Anna C. Doris, Daniele Grandi, Ryan Tomich, Md Ferdous Alam, Mohammadmehdi Ataei, Hyunmin Cheong, Faez Ahmed 2025 J. Comput. Inf. Sci. Eng. https://github.com/anniedoris/design_qa/ https://doi.org/10.1115/1.4067333
740 Zero-shot Model-based Reinforcement Learning using Large Language Models Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat, Oussama Zekri, Albert Thomas, Giuseppe Paolo, Maurizio Filippone, Ievgen Redko, Balázs Kégl 2025 arXiv https://github.com/abenechehab/dicl https://doi.org/10.48550/arXiv.2410.11711
741 WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jian-Guang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Yansong Tang, Dongmei Zhang 2025 ICLR https://github.com/nlpxucan/WizardLM https://openreview.net/forum?id=mMPMHWOdOy
742 Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning? Shuo Chen, Zhen Han, Bailan He, Jianzhe Liu, Mark Buckley, Yao Qin, Philip Torr, Volker Tresp, Jindong Gu 2025 WACV https://chenxshuo.github.io/m-icl/ https://doi.org/10.1109/WACV61041.2025.00585
743 TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking Danqing Wang, Jianxin Ma, Fei Fang, Lei Li 2025 ICLR https://github.com/dqwang122/ThinkHub https://openreview.net/forum?id=VIUisLx8lQ
744 Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation Mufei Li, Siqi Miao, Pan Li 2025 ICLR https://github.com/Graph-COM/SubgraphRAG https://openreview.net/forum?id=JvkuZZ04O7
745 Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Cehao Yang, Jiaxin Mao, Jian Guo 2025 ICLR https://github.com/IDEA-FinAI/ToG-2 https://openreview.net/forum?id=oFBu7qaZpS
746 REvolve: Reward Evolution with Large Language Models using Human Feedback Rishi Hazra, Alkis Sygkounas, Andreas Persson, Amy Loutfi, Pedro Zuidberg Dos Martires 2025 ICLR https://rishihazra.github.io/REvolve https://openreview.net/forum?id=cJPUpL8mOw
747 Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models Haritz Puerto, Martin Gubri, Sangdoo Yun, Seong Joon Oh 2025 NAACL https://github.com/parameterlab/mia-scaling https://aclanthology.org/2025.findings-naacl.234/
748 Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou 2025 arXiv https://github.com/QwenLM/AutoIF https://doi.org/10.48550/arXiv.2406.13542
749 REEF: Representation Encoding Fingerprints for Large Language Models Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao 2025 ICLR https://github.com/tmylla/REEF https://openreview.net/forum?id=SnDmPkOJ0T
750 Steering Large Language Models between Code Execution and Textual Reasoning Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma, Chuchu Fan, Chi Wang 2025 arXiv https://yongchao98.github.io/CodeSteer/ https://doi.org/10.48550/arXiv.2410.03524
751 StringLLM: Understanding the String Processing Capability of Large Language Models Xilong Wang, Hao Fu, Jindong Wang, Neil Zhenqiang Gong 2025 arXiv https://github.com/wxl-lxw/StringLLM https://doi.org/10.48550/arXiv.2410.01208
752 TESTEVAL: Benchmarking Large Language Models for Test Case Generation Wenhan Wang, Chenyuan Yang, Zhijie Wang, Yuheng Huang, Zhaoyang Chu, Da Song, Lingming Zhang, An Ran Chen, Lei Ma 2025 arXiv https://llm4softwaretesting.github.io https://doi.org/10.48550/arXiv.2406.04531
753 A Closer Look at Machine Unlearning for Large Language Models Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin 2025 arXiv https://github.com/sail-sg/closer-look-LLM-unlearning https://doi.org/10.48550/arXiv.2410.08109
754 Distributed Mixture-of-Agents for Edge Inference with Large Language Models Purbesh Mitra, Priyanka Kaswan, Sennur Ulukus 2024-12-30 arXiv https://github.com/purbeshmitra/distributed_moa http://arxiv.org/abs/2412.21200v1
755 Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study Yulin Fei, Yuhui Gao, Xingyuan Xian, Xiaojin Zhang, Tao Wu, Wei Chen 2024-12-29 arXiv https://github.com/YuHuiGao/FG-Bench http://arxiv.org/abs/2412.20613v1
756 Mind the Data Gap: Bridging LLMs to Enterprise Data Integration Moe Kayali, Fabian Wenz, Nesime Tatbul, Çağatay Demiralp 2024-12-29 arXiv https://goby-benchmark.github.io/ http://arxiv.org/abs/2412.20331v1
757 TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication Zongwu Wang, Fangxin Liu, Mingshuai Li, Li Jiang 2024-12-29 arXiv https://github.com/ACA-Lab-SJTU/token-ring http://arxiv.org/abs/2412.20501v1
758 On the Compositional Generalization of Multimodal LLMs for Medical Imaging Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize Chen, Zixu Zhang, Benyou Wang 2024-12-28 arXiv https://github.com/FreedomIntelligence/Med-MAT http://arxiv.org/abs/2412.20070v1
759 Toward Adaptive Reasoning in Large Language Models with Thought Rollback Sijia Chen, Baochun Li 2024-12-27 ICML https://github.com/iQua/llmpebase/tree/main/examples/ThoughtRollback https://openreview.net/forum?id=aoAPOOtN9E
760 A Survey on Large Language Model Acceleration based on KV Cache Management Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen 2024-12-27 arXiv https://github.com/TreeAI-Lab/Awesome-KV-Cache-Management http://arxiv.org/abs/2412.19442v2
761 Gradient Weight-normalized Low-rank Projection for Efficient LLM Training Jia-Hong Huang, Yixian Shen, Hongyi Zhu, Stevan Rudinac, Evangelos Kanoulas 2024-12-27 arXiv https://github.com/Jhhuangkay/Gradient-Weight-normalized-Low-rank-Projection-for-Efficient-LLM-Training http://arxiv.org/abs/2412.19616v1
762 MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios Jiaqi Fan, Jianhua Wu, Jincheng Gao, Jianhao Yu, Yafei Wang, Hongqing Chu, Bingzhao Gao 2024-12-27 arXiv https://github.com/fjq-tongji/MLLM-SUL http://arxiv.org/abs/2412.19406v1
763 Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, Limin Wang, Yi Wang 2024-12-26 arXiv https://github.com/OpenGVLab/TPO http://arxiv.org/abs/2412.19326v1
764 CoEvo: Continual Evolution of Symbolic Solutions Using Large Language Models Ping Guo, Qingfu Zhang, Xi Lin 2024-12-25 arXiv https://github.com/pgg3/CoEvo http://arxiv.org/abs/2412.18890v1
765 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding Tatiana Zemskova, Dmitry Yudin 2024-12-24 arXiv https://github.com/CognitiveAISystems/3DGraphLLM http://arxiv.org/abs/2412.18450v2
766 Distilling Fine-grained Sentiment Understanding from Large Language Models Yice Zhang, Guangyu Xie, Hongling Xu, Kaiheng Hou, Jianzhu Bao, Qianlong Wang, Shiwei Chen, Ruifeng Xu 2024-12-24 arXiv https://github.com/HITSZ-HLT/FSA-Distillation http://arxiv.org/abs/2412.18552v2
767 Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving Hao Pang, Zhenpo Wang, Guoqiang Li 2024-12-24 arXiv https://bitmobility.github.io/LGDRL/ http://arxiv.org/abs/2412.18511v1
768 Property Enhanced Instruction Tuning for Multi-task Molecule Generation with Large Language Models Xuan Lin, Long Chen, Yile Wang, Xiangxiang Zeng, Philip S. Yu 2024-12-24 arXiv https://github.com/chenlong164/PEIT http://arxiv.org/abs/2412.18084v1
769 Token-Budget-Aware LLM Reasoning Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen 2024-12-24 arXiv https://github.com/GeniusHTX/TALE http://arxiv.org/abs/2412.18547v3
770 Assessing Human Editing Effort on LLM-Generated Texts via Compression-Based Edit Distance Nicolas Devatine, Louis Abraham 2024-12-23 arXiv https://github.com/NDV-tiime/CompressionDistance http://arxiv.org/abs/2412.17321v1
771 Large Language Model Safety: A Holistic Survey Dan Shi, Tianhao Shen, Yufei Huang, Zhigen Li, Yongqi Leng, Renren Jin, Chuang Liu, Xinwei Wu, Zishan Guo, Linhao Yu, Ling Shi, Bojian Jiang, Deyi Xiong 2024-12-23 arXiv https://github.com/tjunlp-lab/Awesome-LLM-Safety-Papers http://arxiv.org/abs/2412.17686v1
772 CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models Yeyuan Wang, Dehong Gao, Bin Li, Rujiao Long, Lei Yi, Xiaoyan Cai, Libin Yang, Jinxia Zhang, Shanqing Yu, Qi Xuan 2024-12-22 arXiv https://github.com/Gavin001201/CoF http://arxiv.org/abs/2412.16869v1
773 MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge Jie He, Nan Hu, Wanqiu Long, Jiaoyan Chen, Jeff Z. Pan 2024-12-22 arXiv https://github.com/probe2/multi-hop/ http://arxiv.org/abs/2412.17032v1
774 PruneVid: Visual Token Pruning for Efficient Video Large Language Models Xiaohu Huang, Hao Zhou, Kai Han 2024-12-20 arXiv https://github.com/Visual-AI/PruneVid http://arxiv.org/abs/2412.16117v1
775 TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use Junjie Ye, Yilong Wu, Sixian Li, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan, Zhengyin Du 2024-12-20 arXiv https://github.com/Junjie-Ye/TL-Training http://arxiv.org/abs/2412.15495v1
776 Template-Driven LLM-Paraphrased Framework for Tabular Math Word Problem Generation Xiaoqiang Kang, Zimu Wang, Xiaobo Jin, Wei Wang, Kaizhu Huang, Qiufeng Wang 2024-12-20 arXiv https://github.com/Jason8Kang/TELL http://arxiv.org/abs/2412.15594v1
777 WebLLM: A High-Performance In-Browser LLM Inference Engine Charlie F. Ruan, Yucheng Qin, Xun Zhou, Ruihang Lai, Hongyi Jin, Yixin Dong, Bohan Hou, Meng-Shiun Yu, Yiyan Zhai, Sudeep Agarwal, Hangrui Cao, Siyuan Feng, Tianqi Chen 2024-12-20 arXiv https://github.com/mlc-ai/web-llm http://arxiv.org/abs/2412.15803v1
778 Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models Wenhan Liu, Xinyu Ma, Yutao Zhu, Ziliang Zhao, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou 2024-12-19 arXiv https://github.com/8421BCD/fullrank http://arxiv.org/abs/2412.14574v1
779 On Verbalized Confidence Scores for LLMs Daniel Yang, Yao-Hung Hubert Tsai, Makoto Yamada 2024-12-19 arXiv https://github.com/danielyxyang/llm-verbalized-uq http://arxiv.org/abs/2412.14737v1
780 ORBIT: Cost-Effective Dataset Curation for Large Language Model Domain Adaptation with an Astronomy Case Study Eric Modesitt, Ke Yang, Spencer Hulsey, Chengxiang Zhai, Volodymyr Kindratenko 2024-12-19 arXiv https://github.com/ModeEric/ORBIT-Llama http://arxiv.org/abs/2412.14436v1
781 Agent-SafetyBench: Evaluating the Safety of LLM Agents Zhexin Zhang, Shiyao Cui, Yida Lu, Jingzhuo Zhou, Junxiao Yang, Hongning Wang, Minlie Huang 2024-12-19 arXiv https://github.com/thu-coai/Agent-SafetyBench http://arxiv.org/abs/2412.14470v1
782 Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes Katarzyna Kobalczyk, Claudio Fanconi, Hao Sun, Mihaela van der Schaar 2024-12-18 arXiv https://github.com/kasia-kobalczyk/few-shot-steerable-alignment http://arxiv.org/abs/2412.13998v1
783 ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals Utkarsh Saxena, Sayeh Sharify, Kaushik Roy, Xin Wang 2024-12-18 arXiv https://github.com/utkarsh-dmx/project-resq http://arxiv.org/abs/2412.14363v1
784 InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models Cong Wei, Yujie Zhong, Haoxian Tan, Yingsen Zeng, Yong Liu, Zheng Zhao, Yujiu Yang 2024-12-18 arXiv https://github.com/congvvc/InstructSeg http://arxiv.org/abs/2412.14006v1
785 Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Jihan Yang, Shusheng Yang, Anjali W. Gupta, Rilyn Han, Li Fei-Fei, Saining Xie 2024-12-18 arXiv https://vision-x-nyu.github.io/thinking-in-space.github.io/ http://arxiv.org/abs/2412.14171v1
786 Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting Vijay Goyal, Mustafa Khan, Aprameya Tirupati, Harveer Saini, Michael Lam, Kevin Zhu 2024-12-18 arXiv https://github.com/alonso130r/knowledge-distillation http://arxiv.org/abs/2412.17846v1
787 Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings Yuanhe Zhang, Zhenhong Zhou, Wei Zhang, Xinyue Wang, Xiaojun Jia, Yang Liu, Sen Su 2024-12-18 arXiv https://github.com/shuita2333/AutoDoS http://arxiv.org/abs/2412.13879v1
788 Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games Wenye Lin, Jonathan Roberts, Yunhan Yang, Samuel Albanie, Zongqing Lu, Kai Han 2024-12-18 arXiv https://visual-ai.github.io/gamebot http://arxiv.org/abs/2412.13602v1
789 Are Your LLMs Capable of Stable Reasoning? Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen 2024-12-17 arXiv https://github.com/open-compass/GPassK http://arxiv.org/abs/2412.13147v2
790 Assessing the Limitations of Large Language Models in Clinical Fact Decomposition Monica Munnangi, Akshay Swaminathan, Jason Alan Fries, Jenelle Jindal, Sanjana Narayanan, Ivan Lopez, Lucia Tu, Philip Chung, Jesutofunmi A. Omiye, Mehr Kashyap, Nigam Shah 2024-12-17 arXiv https://github.com/som-shahlab/factehr http://arxiv.org/abs/2412.12422v1
791 Benchmarking and Understanding Compositional Relational Reasoning of LLMs Ruikang Ni, Da Xiao, Qingye Meng, Xiangyu Li, Shihui Zheng, Hongliang Liang 2024-12-17 arXiv https://github.com/Caiyun-AI/GAR http://arxiv.org/abs/2412.12841v1
792 Graph Learning in the Era of LLMs: A Survey from the Perspective of Data, Models, and Tasks Xunkai Li, Zhengyu Wu, Jiayi Wu, Hanwen Cui, Jishuo Jia, Rong-Hua Li, Guoren Wang 2024-12-17 arXiv https://github.com/xkLi-Allen/Awesome-GNN-in-LLMs-Papers http://arxiv.org/abs/2412.12456v1
793 SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents Sheng Yin, Xianghe Pang, Yuanzhuo Ding, Menglan Chen, Yutong Bi, Yichen Xiong, Wenhao Huang, Zhen Xiang, Jing Shao, Siheng Chen 2024-12-17 arXiv https://github.com/shengyin1224/SafeAgentBench http://arxiv.org/abs/2412.13178v2
794 SafeDrive: Knowledge- and Data-Driven Risk-Sensitive Decision-Making for Autonomous Vehicles with Large Language Models Zhiyuan Zhou, Heye Huang, Boqi Li, Shiyue Zhao, Yao Mu, Jianqiang Wang 2024-12-17 arXiv https://mezzi33.github.io/SafeDrive/ http://arxiv.org/abs/2412.13238v2
795 RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou 2024-12-16 arXiv https://github.com/sunnynexus/RetroLLM http://arxiv.org/abs/2412.11919v1
796 RL-LLM-DT: An Automatic Decision Tree Generation Method Based on RL Evaluation and LLM Enhancement Junjie Lin, Jian Zhao, Lin Liu, Yue Deng, Youpeng Zhao, Lanxiao Huang, Xia Lin, Wengang Zhou, Houqiang Li 2024-12-16 arXiv https://github.com/Linjunjie99/RL-LLM-DT http://arxiv.org/abs/2412.11417v2
797 LLMs Can Simulate Standardized Patients via Agent Coevolution Zhuoyun Du, Lujie Zheng, Renjun Hu, Yuyang Xu, Xiawei Li, Ying Sun, Wei Chen, Jian Wu, Haolei Cai, Haohao Ying 2024-12-16 arXiv https://github.com/ZJUMAI/EvoPatient http://arxiv.org/abs/2412.11716v1
798 Does VLM Classification Benefit from LLM Description Semantics? Pingchuan Ma, Lennart Rietdorf, Dmytro Kotovenko, Vincent Tao Hu, Björn Ommer 2024-12-16 arXiv https://github.com/CompVis/DisCLIP http://arxiv.org/abs/2412.11917v3
799 Analyzing Images of Legal Documents: Toward Multi-Modal LLMs for Access to Justice Hannes Westermann, Jaromir Savelka 2024-12-16 arXiv https://github.com/hwestermann/AI4A2J_analyzing_images_of_legal_documents http://arxiv.org/abs/2412.15260v1
800 BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement Yuhao Du, Shunian Chen, Wenbo Zan, Peizhao Li, Mingxuan Wang, Dingjie Song, Bo Li, Yan Hu, Benyou Wang 2024-12-16 arXiv https://github.com/FreedomIntelligence/BlenderLLM http://arxiv.org/abs/2412.14203v1
801 Empowering LLMs to Understand and Generate Complex Vector Graphics Ximing Xing, Juncheng Hu, Guotao Liang, Jing Zhang, Dong Xu, Qian Yu 2024-12-15 arXiv https://ximinng.github.io/LLM4SVGProject/ http://arxiv.org/abs/2412.11102v1
802 NITRO: LLM Inference on Intel Laptop NPUs Anthony Fei, Mohamed S. Abdelfattah 2024-12-15 arXiv https://github.com/abdelfattah-lab/nitro http://arxiv.org/abs/2412.11053v1
803 Learning to Verify Summary Facts with Fine-Grained LLM Feedback Jihwan Oh, Jeonghwan Choi, Nicole Hee-Yeon Kim, Taewon Yun, Hwanjun Song 2024-12-14 arXiv https://github.com/DISL-Lab/FineSumFact http://arxiv.org/abs/2412.10689v1
804 B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens Zhuqiang Lu, Zhenfei Yin, Mengwei He, Zhihui Wang, Zicheng Liu, Zhiyong Wang, Kun Hu 2024-12-13 arXiv https://github.com/zhuqiangLu/B-VLLM http://arxiv.org/abs/2412.09919v1
805 Can LLMs Convert Graphs to Text-Attributed Graphs? Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye 2024-12-13 arXiv https://github.com/Zehong-Wang/TANS http://arxiv.org/abs/2412.10136v1
806 ChainStream: An LLM-based Framework for Unified Synthetic Sensing Jiacheng Liu, Yuanchun Li, Liangyan Li, Yi Sun, Hao Wen, Xiangyu Li, Yao Guo, Yunxin Liu 2024-12-13 arXiv https://github.com/MobileLLM/ChainStream http://arxiv.org/abs/2412.15240v1
807 CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models Zhihao Du, Yuxuan Wang, Qian Chen, Xian Shi, Xiang Lv, Tianyu Zhao, Zhifu Gao, Yexin Yang, Changfeng Gao, Hui Wang, Fan Yu, Huadai Liu, Zhengyan Sheng, Yue Gu, Chong Deng, Wen Wang, Shiliang Zhang, Zhijie Yan, Jingren Zhou 2024-12-13 arXiv https://funaudiollm.github.io/cosyvoice2 http://arxiv.org/abs/2412.10117v3
808 Can Modern LLMs Act as Agent Cores in Radiology Environments? Qiaoyu Zheng, Chaoyi Wu, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie 2024-12-12 arXiv https://github.com/MAGIC-AI4Med/RadABench http://arxiv.org/abs/2412.09529v2
809 RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios Ruiwen Zhou, Wenyue Hua, Liangming Pan, Sitao Cheng, Xiaobao Wu, En Yu, William Yang Wang 2024-12-12 arXiv https://github.com/skyriver-2000/RuleArena http://arxiv.org/abs/2412.08972v1
810 What Makes Cryptic Crosswords Challenging for LLMs? Abdelrahman Sadallah, Daria Kotova, Ekaterina Kochmar 2024-12-12 COLING 2025 https://github.com/bodasadallah/decrypting-crosswords http://arxiv.org/abs/2412.09012v1
811 Autoformalizing and Simulating Game-Theoretic Scenarios using LLM-augmented Agents Agnieszka Mensfelt, Kostas Stathis, Vince Trencsenyi 2024-12-11 arXiv https://github.com/dicelab-rhul/autoformalizing-agents http://arxiv.org/abs/2412.08805v1
812 Multi-GraspLLM: A Multimodal LLM for Multi-Hand Semantic Guided Grasp Generation Haosheng Li, Weixin Mao, Weipeng Deng, Chenyu Meng, Haoqiang Fan, Tiancai Wang, Ping Tan, Hongan Wang, Xiaoming Deng 2024-12-11 arXiv https://multi-graspllm.github.io http://arxiv.org/abs/2412.08468v1
813 Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation Pedro H. V. Valois, Lincon S. Souza, Erica K. Shimomoto, Kazuhiro Fukui 2024-12-10 arXiv https://github.com/phvv-me/frame-representation-hypothesis http://arxiv.org/abs/2412.07334v2
814 LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation Eunsu Kim, Juyoung Suk, Seungone Kim, Niklas Muennighoff, Dongkwan Kim, Alice Oh 2024-12-10 arXiv https://github.com/interview-eval/ http://arxiv.org/abs/2412.10424v2
815 DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong 2024-12-10 arXiv https://jianzongwu.github.io/projects/diffsensei/ http://arxiv.org/abs/2412.07589v1
816 IntellectSeeker: A Personalized Literature Management System with the Probabilistic Model and Large Language Model Weizhen Bian, Siyan Liu, Yubo Zhou, Dezhi Chen, Yijie Liao, Zhenzhen Fan, Aobo Wang 2024-12-10 KSEM https://github.com/LuckyBian/ISY5001 https://doi.org/10.1007/978-981-97-5489-2_24
817 PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models Qian Zhang, Panfeng Chen, Jiali Li, Linkun Feng, Shuyu Liu, Heng Zhao, Mei Chen, Hui Li, Yanhao Wang 2024-12-09 arXiv https://github.com/ACMISLab/PediaBench http://arxiv.org/abs/2412.06287v2
818 Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study Ehsan Shareghi, Jiuzhou Han, Paul Burgess 2024-12-09 arXiv https://auslawbench.github.io http://arxiv.org/abs/2412.06272v1
819 Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models Xiao Xu, Tianhao Niu, Yuxi Xie, Libo Qin, Wanxiang Che, Min-Yen Kan 2024-12-08 arXiv https://github.com/LooperXX/MMGiC http://arxiv.org/abs/2412.05939v1
820 LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods Haitao Li, Qian Dong, Junjie Chen, Huixue Su, Yujia Zhou, Qingyao Ai, Ziyi Ye, Yiqun Liu 2024-12-07 arXiv https://github.com/CSHaitao/Awesome-LLMs-as-Judges http://arxiv.org/abs/2412.05579v2
821 Towards Learning to Reason: Comparing LLMs with Neuro-Symbolic on Arithmetic Relations in Abstract Reasoning Michael Hersche, Giacomo Camposampiero, Roger Wattenhofer, Abu Sebastian, Abbas Rahimi 2024-12-07 arXiv https://github.com/IBM/raven-large-language-models http://arxiv.org/abs/2412.05586v1
822 Training-Free Bayesianization for Low-Rank Adapters of Large Language Models Haizhou Shi, Yibin Wang, Ligong Han, Huan Zhang, Hao Wang 2024-12-07 arXiv https://github.com/Wang-ML-Lab/bayesian-peft http://arxiv.org/abs/2412.05723v1
823 EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios Lu Qiu, Yuying Ge, Yi Chen, Yixiao Ge, Ying Shan, Xihui Liu 2024-12-05 arXiv https://qiulu66.github.io/egoplanbench2/ http://arxiv.org/abs/2412.04447v1
824 Reinforcement Learning Enhanced LLMs: A Survey Shuhe Wang, Shengyu Zhang, Jie Zhang, Runyi Hu, Xiaoya Li, Tianwei Zhang, Jiwei Li, Fei Wu, Guoyin Wang, Eduard Hovy 2024-12-05 arXiv https://github.com/ShuheWang1998/Reinforcement-Learning-Enhanced-LLMs-A-Survey http://arxiv.org/abs/2412.10400v2
825 LossAgent: Towards Any Optimization Objectives for Image Processing with LLM Agents Bingchen Li, Xin Li, Yiting Lu, Zhibo Chen 2024-12-05 arXiv https://github.com/lbc12345/LossAgent http://arxiv.org/abs/2412.04090v1
826 AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning Yiwu Zhong, Zhuoming Liu, Yin Li, Liwei Wang 2024-12-04 arXiv https://github.com/LaVi-Lab/AIM http://arxiv.org/abs/2412.03248v1
827 Alignment at Pre-training! Towards Native Alignment for Arabic LLMs Juhao Liang, Zhenyang Cai, Jianqing Zhu, Huang Huang, Kewei Zong, Bang An, Mosen Alharthi, Juncai He, Lian Zhang, Haizhou Li, Benyou Wang, Jinchao Xu 2024-12-04 arXiv https://github.com/FreedomIntelligence/AceGPT-v2 http://arxiv.org/abs/2412.03253v1
828 Fine-Grained Behavior Simulation with Role-Playing Large Language Model on Social Media Kun Li, Chenwei Dai, Wei Zhou, Songlin Hu 2024-12-04 arXiv https://github.com/linkseed18612254945/FineRob http://arxiv.org/abs/2412.03148v1
829 From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents Xinyi Mou, Xuanwen Ding, Qi He, Liang Wang, Jingcong Liang, Xinnong Zhang, Libo Sun, Jiayu Lin, Jie Zhou, Xuanjing Huang, Zhongyu Wei 2024-12-04 arXiv https://github.com/FudanDISC/SocialAgent http://arxiv.org/abs/2412.03563v1
830 Improving Linguistic Diversity of Large Language Models with Possibility Exploration Fine-Tuning Long Mai, Julie Carson-Berndsen 2024-12-04 arXiv https://github.com/mailong25/peft_diversity http://arxiv.org/abs/2412.03343v1
831 VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding Chaoyu Li, Eun Woo Im, Pooyan Fazli 2024-12-04 arXiv https://vid-halluc.github.io/ http://arxiv.org/abs/2412.03735v1
832 Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code Timur Galimzyanov, Sergey Titov, Yaroslav Golubev, Egor Bogomolov 2024-12-03 arXiv https://github.com/JetBrains-Research/PandasPlotBench http://arxiv.org/abs/2412.02764v1
833 Unleashing GHOST: An LLM-Powered Framework for Automated Hardware Trojan Design Md Omar Faruque, Peter Jamieson, Ahmad Patooghy, Abdel-Hameed A. Badawy 2024-12-03 arXiv https://github.com/HSTRG1/GHOSTbenchmarks http://arxiv.org/abs/2412.02816v1
834 CNNSum: Exploring Long-Context Summarization with Large Language Models in Chinese Novels Lingxiao Wei, He Yan, Xiangju Lu, Junmin Zhu, Jun Wang, Wei Zhang 2024-12-03 arXiv https://github.com/CxsGhost/CNNSum http://arxiv.org/abs/2412.02819v4
835 AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue 2024-12-03 arXiv https://av-odyssey.github.io/ http://arxiv.org/abs/2412.02611v1
836 DaDu-E: Rethinking the Role of Large Language Model in Robotic Computing Pipeline Wenhao Sun, Sai Hou, Zixuan Wang, Bo Yu, Shaoshan Liu, Xu Yang, Shuai Liang, Yiming Gan, Yinhe Han 2024-12-02 arXiv https://rlc-lab.github.io/dadu-e/ http://arxiv.org/abs/2412.01663v1
837 DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation Jingyang Xiang, Sai Qian Zhang 2024-12-01 arXiv https://github.com/JingyangXiang/DFRot http://arxiv.org/abs/2412.00648v2
838 GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models Kunsheng Tang, Wenbo Zhou, Jie Zhang, Aishan Liu, Gelei Deng, Shuai Li, Peigui Qi, Weiming Zhang, Tianwei Zhang, Nenghai Yu 2024-12 CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security https://github.com/kstanghere/GenderCARE-ccs24 https://dl.acm.org/doi/10.1145/3658644.3670284
839 Mitigating Entity-Level Hallucination in Large Language Models Weihang Su, Yichen Tang, Qingyao Ai, Changyue Wang, Zhijing Wu, Yiqun Liu 2024-12 SIGIR-AP 2024: Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region https://github.com/oneal2000/EntityHallucination https://dl.acm.org/doi/10.1145/3673791.3698403
840 Optimization-based Prompt Injection Attack to LLM-as-a-Judge Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong 2024-12 CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security https://github.com/ShiJiawenwen/JudgeDeceiver https://dl.acm.org/doi/10.1145/3658644.3690291
841 PLeak: Prompt Leaking Attacks against Large Language Model Applications Bo Hui, Haolin Yuan, Neil Zhenqiang Gong, Philippe Burlina, Yinzhi Cao 2024-12 CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security https://github.com/BHui97/PLeak https://dl.acm.org/doi/10.1145/3658644.3670370
842 AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models Yutong Zhou, Masahiro Ryo 2024-11-30 arXiv https://github.com/Yutong-Zhou-cv/AgriBench http://arxiv.org/abs/2412.00465v2
843 Node Importance Estimation Leveraging LLMs for Semantic Augmentation in Knowledge Graphs Xinyu Lin, Tianyu Zhang, Chengbin Hou, Jinbao Wang, Jianye Xue, Hairong Lv 2024-11-30 arXiv https://github.com/XinyuLin-FZ/LENIE http://arxiv.org/abs/2412.00478v1
844 DroidCall: A Dataset for LLM-powered Android Intent Invocation Weikai Xie, Li Zhang, Shihe Wang, Rongjie Yi, Mengwei Xu 2024-11-30 arXiv https://github.com/UbiquitousLearning/DroidCall http://arxiv.org/abs/2412.00402v1
845 Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings Qiong Wu, Wenhao Lin, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji 2024-11-29 arXiv https://github.com/DoubtedSteam/DyVTE http://arxiv.org/abs/2411.19628v1
846 Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models Tian Yu, Shaolei Zhang, Yang Feng 2024-11-29 arXiv https://github.com/ictnlp/Auto-RAG http://arxiv.org/abs/2411.19443v1
847 Ensemble Watermarks for Large Language Models Georg Niess, Roman Kern 2024-11-29 arXiv http://github.com/CommodoreEU/master-generation http://arxiv.org/abs/2411.19563v1
848 T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs Shukang Yin, Chaoyou Fu, Sirui Zhao, Yunhang Shen, Chunjiang Ge, Yan Yang, Zuwei Long, Yuhan Dai, Tong Xu, Xing Sun, Ran He, Caifeng Shan, Enhong Chen 2024-11-29 arXiv https://github.com/xjtupanda/T2Vid http://arxiv.org/abs/2411.19951v2
849 TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension Zipeng Qiu, You Peng, Guangxin He, Binhang Yuan, Chen Wang 2024-11-29 arXiv https://github.com/Relaxed-System-Lab/TQA-Bench http://arxiv.org/abs/2411.19504v1
850 Personalized Federated Fine-Tuning for LLMs via Data-Driven Heterogeneous Model Architectures Yicheng Zhang, Zhen Qin, Zhaomin Wu, Shuiguang Deng 2024-11-28 arXiv https://github.com/zyc140345/FedAMoLE http://arxiv.org/abs/2411.19128v1
851 TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability Shimin Chen, Xiaohan Lan, Yitian Yuan, Zequn Jie, Lin Ma 2024-11-27 arXiv https://github.com/TimeMarker-LLM/TimeMarker/ http://arxiv.org/abs/2411.18211v1
852 ChatRex: Taming Multimodal LLM for Joint Perception and Understanding Qing Jiang, Gen Luo, Yuqin Yang, Yuda Xiong, Yihao Chen, Zhaoyang Zeng, Tianhe Ren, Lei Zhang 2024-11-27 arXiv https://github.com/IDEA-Research/ChatRex http://arxiv.org/abs/2411.18363v2
853 Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models Jingming Liu, Yumeng Li, Boyuan Xiao, Yichang Jian, Ziang Qin, Tianjia Shao, Yao-Xiang Ding, Kun Zhou 2024-11-27 arXiv https://future-item.github.io/autoimagine-site http://arxiv.org/abs/2411.18142v1
854 Can LLMs be Good Graph Judger for Knowledge Graph Construction? Haoyu Huang, Chong Chen, Conghui He, Yang Li, Jiawei Jiang, Wentao Zhang 2024-11-26 arXiv https://github.com/hhy-huang/GraphJudger http://arxiv.org/abs/2411.17388v1
855 Leveraging Large Language Models and Topic Modeling for Toxicity Classification Haniyeh Ehsani Oskouie, Christina Chance, Claire Huang, Margaret Capetz, Elizabeth Eyeson, Majid Sarrafzadeh 2024-11-26 arXiv https://github.com/aheldis/Toxicity-Classification http://arxiv.org/abs/2411.17876v1
856 Star Attention: Efficient LLM Inference over Long Sequences Shantanu Acharya, Fei Jia, Boris Ginsburg 2024-11-26 arXiv https://github.com/NVIDIA/Star-Attention http://arxiv.org/abs/2411.17116v1
857 BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment Shaolei Zhang, Kehao Zhang, Qingkai Fang, Shoutao Guo, Yan Zhou, Xiaodong Liu, Yang Feng 2024-11-25 arXiv https://github.com/ictnlp/BayLing https://doi.org/10.48550/arXiv.2411.16300
858 Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering Federico Cocchi, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara 2024-11-25 arXiv https://github.com/aimagelab/ReflectiVA http://arxiv.org/abs/2411.16863v1
859 CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity Zhengmin Yu, Jiutian Zeng, Siyi Chen, Wenhan Xu, Dandan Xu, Xiangyu Liu, Zonghao Ying, Nan Wang, Yuan Zhang, Min Yang 2024-11-25 arXiv https://github.com/CS-EVAL/CS-Eval http://arxiv.org/abs/2411.16239v2
860 Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models Ronghuan Wu, Wanchao Su, Jing Liao 2024-11-25 arXiv https://chat2svg.github.io/ http://arxiv.org/abs/2411.16602v1
861 Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision Zhiheng Xi, Dingwen Yang, Jixuan Huang, Jiafu Tang, Guanyu Li, Yiwen Ding, Wei He, Boyang Hong, Shihan Do, Wenyu Zhan, Xiao Wang, Rui Zheng, Tao Ji, Xiaowei Shi, Yitao Zhai, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Zuxuan Wu, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Yu-Gang Jiang 2024-11-25 arXiv https://mathcritique.github.io/ http://arxiv.org/abs/2411.16579v1
862 From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattacharjee, Yuxuan Jiang, Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng, Huan Liu 2024-11-25 arXiv https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge http://arxiv.org/abs/2411.16594v4
863 VidHal: Benchmarking Temporal Hallucinations in Vision LLMs Wey Yeh Choong, Yangyang Guo, Mohan Kankanhalli 2024-11-25 arXiv https://github.com/Lookuz/VidHal http://arxiv.org/abs/2411.16771v1
864 ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration Haozhan Shen, Kangjia Zhao, Tiancheng Zhao, Ruochen Xu, Zilun Zhang, Mingwei Zhu, Jianwei Yin 2024-11-25 arXiv https://github.com/om-ai-lab/ZoomEye http://arxiv.org/abs/2411.16044v1
865 Multi-label Sequential Sentence Classification via Large Language Model Mengfei Lan, Lecheng Zheng, Shufan Ming, Halil Kilicoglu 2024-11-23 EMNLP https://github.com/ScienceNLP-Lab/LLM-SSC https://aclanthology.org/2024.findings-emnlp.944
866 ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain Haochen Zhao, Xiangru Tang, Ziran Yang, Xiao Han, Xuanzhi Feng, Yueqing Fan, Senhao Cheng, Di Jin, Yilun Zhao, Arman Cohan, Mark Gerstein 2024-11-23 arXiv https://github.com/HaochenZhao/SafeAgent4Chem http://arxiv.org/abs/2411.16736v1
867 Seed-Free Synthetic Data Generation Framework for Instruction-Tuning LLMs: A Case Study in Thai Parinthapat Pengpun, Can Udomcharoenchaikit, Weerayut Buaphet, Peerat Limkonchotiwat 2024-11-23 arXiv https://github.com/parinzee/seed-free-synthetic-instruct http://arxiv.org/abs/2411.15484v1
868 MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs Chaoyou Fu, Yi-Fan Zhang, Shukang Yin, Bo Li, Xinyu Fang, Sirui Zhao, Haodong Duan, Xing Sun, Ziwei Liu, Liang Wang, Caifeng Shan, Ran He 2024-11-22 arXiv https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Benchmarks http://arxiv.org/abs/2411.15296v2
869 DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu 2024-11-21 arXiv https://github.com/hexuandeng/DRPruning http://arxiv.org/abs/2411.14055v1
870 UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages Bethel Melesse Tessema, Akhil Kedia, Tae-Sun Chung 2024-11-21 arXiv https://github.com/bethelmelesse/unifiedcrawl http://arxiv.org/abs/2411.14343v1
871 SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model Christopher Nguyen, William Nguyen, Atsushi Suzuki, Daisuke Oku, Hong An Phan, Sang Dinh, Zooey Nguyen, Anh Ha, Shruti Raghavan, Huy Vo, Thang Nguyen, Lan Nguyen, Yoshikuni Hirayama 2024-11-21 arXiv https://github.com/aitomatic/semikong http://arxiv.org/abs/2411.13802v2
872 Disentangling Memory and Reasoning Ability in Large Language Models Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang 2024-11-20 arXiv https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning http://arxiv.org/abs/2411.13504v2
873 DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving Xianda Guo, Ruijun Zhang, Yiqun Duan, Yuhang He, Chenming Zhang, Shuai Liu, Long Chen 2024-11-20 arXiv https://github.com/XiandaGuo/Drive-MLLM http://arxiv.org/abs/2411.13112v2
874 On the Consistency of Video Large Language Models in Temporal Comprehension Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang, Angela Yao 2024-11-20 arXiv https://github.com/minjoong507/Consistency-of-Video-LLM http://arxiv.org/abs/2411.12951v1
875 Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods Jai Doshi, Asa Cooper Stickland 2024-11-18 arXiv https://github.com/JaiDoshi/Knowledge-Erasure http://arxiv.org/abs/2411.12103v2
876 FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training Anjia Cao, Xing Wei, Zhiheng Ma 2024-11-18 arXiv https://github.com/MIV-XJTU/FLAME http://arxiv.org/abs/2411.11927v2
877 BianCang: A Traditional Chinese Medicine Large Language Model Sibo Wei, Xueping Peng, Yi-fei Wang, Jiasheng Si, Weiyu Zhang, Wenpeng Lu, Xiaoming Wu, Yinglong Wang 2024-11-17 arXiv https://github.com/QLU-NLP/BianCang http://arxiv.org/abs/2411.11027v1
878 Multilingual Large Language Models: A Systematic Survey Shaolin Zhu, Supryadi, Shaoyang Xu, Haoran Sun, Leiyu Pan, Menglong Cui, Jiangcun Du, Renren Jin, António Branco, Deyi Xiong 2024-11-17 arXiv https://github.com/tjunlp-lab/Awesome-Multilingual-LLMs-Papers http://arxiv.org/abs/2411.11072v2
879 TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models Tingyu Qu, Mingxiao Li, Tinne Tuytelaars, Marie-Francine Moens 2024-11-17 arXiv https://github.com/tingyu215/TS-LLaVA http://arxiv.org/abs/2411.11066v1
880 Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering Zeping Yu, Sophia Ananiadou 2024-11-17 arXiv https://github.com/zepingyu0512/llava-mechanism http://arxiv.org/abs/2411.10950v1
881 Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model Ting Liu, Liangtao Shi, Richang Hong, Yue Hu, Quanjun Yin, Linfeng Zhang 2024-11-16 arXiv https://github.com/liuting20/MustDrop http://arxiv.org/abs/2411.10803v1
882 Orca: Enhancing Role-Playing Abilities of Large Language Models by Integrating Personality Traits Yuxuan Huang 2024-11-15 arXiv https://github.com/Aipura/Orca http://arxiv.org/abs/2411.10006v1
883 Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination Haojie Zheng, Tianyang Xu, Hanchi Sun, Shu Pu, Ruoxi Chen, Lichao Sun 2024-11-15 arXiv https://github.com/Terry-Xu-666/visual_inference_chain http://arxiv.org/abs/2411.12591v1
884 Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash Parsa Hejabi, Elnaz Rahmati, Alireza S. Ziabari, Preni Golazizian, Jesse Thomason, Morteza Dehghani 2024-11-15 arXiv https://github.com/ParsaHejabi/Simulation-Framework-for-Multi-Agent-Balderdash http://arxiv.org/abs/2411.10422v1
885 Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era Thanh Tam Nguyen, Zhao Ren, Trinh Pham, Thanh Trung Huynh, Phi Le Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen 2024-11-15 arXiv https://github.com/tamlhp/awesome-instruction-editing http://arxiv.org/abs/2411.09955v2
886 MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs Mengyuan Zhang, Ruihui Wang, Bo Xia, Yuan Sun, Xiaobing Zhao 2024-11-14 arXiv https://github.com/joenahm/MM-Eval http://arxiv.org/abs/2411.09492v1
887 LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation Zhenshi Li, Dilxat Muhtar, Feng Gu, Xueliang Zhang, Pengfeng Xiao, Guangjun He, Xiaoxiang Zhu 2024-11-14 arXiv https://github.com/NJU-LHRS/LHRS-Bot https://doi.org/10.48550/arXiv.2411.09301
888 DROJ: A Prompt-Driven Attack against Large Language Models Leyang Hu, Boran Wang 2024-11-14 arXiv https://github.com/Leon-Leyang/LLM-Safeguard http://arxiv.org/abs/2411.09125v1
889 DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models Yongdong Wang, Runze Xiao, Jun Younes Louhi Kasahara, Ryosuke Yajima, Keiji Nagatani, Atsushi Yamashita, Hajime Asama 2024-11-13 arXiv https://wyd0817.github.io/project-dart-llm/ http://arxiv.org/abs/2411.09022v1
890 CorrectBench: Automatic Testbench Generation with Functional Self-Correction using LLMs for HDL Design Ruidi Qiu, Grace Li Zhang, Rolf Drechsler, Ulf Schlichtmann, Bing Li 2024-11-13 arXiv https://github.com/AutoBench/CorrectBench http://arxiv.org/abs/2411.08510v1
891 Large Language Models Can Self-Improve in Long-context Reasoning Siheng Li, Cheng Yang, Zesen Cheng, Lemao Liu, Mo Yu, Yujiu Yang, Wai Lam 2024-11-12 arXiv https://github.com/SihengLi99/SEALONG http://arxiv.org/abs/2411.08147v1
892 Verbosity ≠ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models Yusen Zhang, Sarkar Snigdha Sarathi Das, Rui Zhang 2024-11-12 arXiv https://github.com/psunlpgroup/VerbosityLLM http://arxiv.org/abs/2411.07858v2
893 ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? Canyu Chen, Jian Yu, Shan Chen, Che Liu, Zhongwei Wan, Danielle Bitterman, Fei Wang, Kai Shu 2024-11-10 arXiv https://clinicalbench.github.io http://arxiv.org/abs/2411.06469v1
894 Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models Xiaojun Wu, Junxi Liu, Huanyi Su, Zhouchi Lin, Yiyan Qi, Chengjin Xu, Jiajun Su, Jiajie Zhong, Fuwei Wang, Saizhuo Wang, Fengrui Hua, Jia Li, Jian Guo 2024-11-09 arXiv https://github.com/IDEA-FinAI/Golden-Touchstone http://arxiv.org/abs/2411.06272v1
895 TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering Yungeng Liu, Zan Chen, Yu Guang Wang, Yiqing Shen 2024-11-09 arXiv https://github.com/tsynbio/Toursynbio-Search http://arxiv.org/abs/2411.06024v1
896 Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation Dong Shu, Bingbing Duan, Kai Guo, Kaixiong Zhou, Jiliang Tang, Mengnan Du 2024-11-08 arXiv https://github.com/Tizzzzy/LLM-GDM-alignment http://arxiv.org/abs/2411.05316v1
897 Game-theoretic LLM: Agent Workflow for Negotiation Games Wenyue Hua, Ollie Liu, Lingyao Li, Alfonso Amayuelas, Julie Chen, Lucas Jiang, Mingyu Jin, Lizhou Fan, Fei Sun, William Wang, Xintong Wang, Yongfeng Zhang 2024-11-08 arXiv https://github.com/Wenyueh/game_theory http://arxiv.org/abs/2411.05990v2
898 WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models Shengda Fan, Xin Cong, Yuepeng Fu, Zhong Zhang, Shuyan Zhang, Yuanwei Liu, Yesai Wu, Yankai Lin, Zhiyuan Liu, Maosong Sun 2024-11-08 arXiv https://github.com/OpenBMB/WorkflowLLM http://arxiv.org/abs/2411.05451v1
899 FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs? Eric Wu, Kevin Wu, James Zou 2024-11-07 arXiv https://github.com/kevinwu23/StanfordFineTuneBench http://arxiv.org/abs/2411.05059v2
900 Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model Young-Jun Lee, Dokyong Lee, Junyoung Youn, Kyeongjin Oh, Ho-Jin Choi 2024-11-07 arXiv https://github.com/passing2961/Thanos http://arxiv.org/abs/2411.04496v1
901 Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation Ayan Sengupta, Vaibhav Seth, Arinjay Pathak, Natraj Raman, Sriram Gopalakrishnan, Tanmoy Chakraborty 2024-11-07 arXiv https://github.com/LCS2-IIITD/MonteCLoRA http://arxiv.org/abs/2411.04358v2
902 AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering Yungeng Liu, Zan Chen, Yu Guang Wang, Yiqing Shen 2024-11-07 arXiv https://github.com/tsynbio/AutoPE http://arxiv.org/abs/2411.04440v1
903 Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities Shengzhi Li, Kittipat Kampa, Rongyu Lin, Bohang Li, Shichao Pei 2024-11-07 arXiv https://github.com/findalexli/Abstract2Appendix http://arxiv.org/abs/2411.05232v1
904 QUILL: Quotation Generation Enhancement of Large Language Models Jin Xiao, Bowei Zhang, Qianyu He, Jiaqing Liang, Feng Wei, Jinglei Chen, Zujie Liang, Deqing Yang, Yanghua Xiao 2024-11-06 arXiv https://github.com/GraceXiaoo/QUILL http://arxiv.org/abs/2411.03675v1
905 Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy Razvan-Gabriel Dumitru, Paul-Ioan Clotan, Vikas Yadav, Darius Peteleaza, Mihai Surdeanu 2024-11-05 arXiv https://github.com/RazvanDu/DynamicSlicing http://arxiv.org/abs/2411.03513v1
906 Leveraging Large Language Models in Code Question Answering: Baselines and Issues Georgy Andryushchenko, Vladimir Ivanov, Vladimir Makharev, Elizaveta Tukhtina, Aidar Valeev 2024-11-05 arXiv https://github.com/IU-AES-AI4Code/CodeQuestionAnswering http://arxiv.org/abs/2411.03012v1
907 SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents Dawei Li, Zhen Tan, Peijia Qian, Yifan Li, Kumar Satvik Chaudhary, Lijie Hu, Jiayi Shen 2024-11-05 arXiv https://github.com/David-Li0406/SMoA http://arxiv.org/abs/2411.03284v1
908 Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment Jason Vega, Junsheng Huang, Gaokai Zhang, Hangoo Kang, Minjia Zhang, Gagandeep Singh 2024-11-05 arXiv https://github.com/uiuc-focal-lab/stochastic-monkeys/ http://arxiv.org/abs/2411.02785v2
909 Culinary Class Wars: Evaluating LLMs using ASH in Cuisine Transfer Task Hoonick Lee, Mogan Gim, Donghyeon Park, Donghee Choi, Jaewoo Kang 2024-11-04 arXiv http://github.com/dmis-lab/CulinaryASH http://arxiv.org/abs/2411.01996v1
910 Eurekaverse: Environment Curriculum Generation via Large Language Models William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Dinesh Jayaraman, Yecheng Jason Ma 2024-11-04 arXiv https://eureka-research.github.io/eurekaverse http://arxiv.org/abs/2411.01775v1
911 SQL Injection Jailbreak: a structural disaster of large language models Jiawei Zhao, Kejiang Chen, Weiming Zhang, Nenghai Yu 2024-11-03 arXiv https://github.com/weiyezhimeng/SQL-Injection-Jailbreak http://arxiv.org/abs/2411.01565v3
912 Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis Shijia Liao, Yuxuan Wang, Tianyu Li, Yifan Cheng, Ruoyi Zhang, Rongzhi Zhou, Yijin Xing 2024-11-02 arXiv https://github.com/fishaudio/fish-speech http://arxiv.org/abs/2411.01156v2
913 Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection Han Yin, Yang Xiao, Jisheng Bai, Rohan Kumar Das 2024-11-02 arXiv https://github.com/apple-yinhan/Noise-robust-SED http://arxiv.org/abs/2411.01174v1
914 TODO: Enhancing LLM Alignment with Ternary Preferences Yuxiang Guo, Lu Yin, Bo Jiang, Jiaqi Zhang 2024-11-02 arXiv https://github.com/XXares/TODO http://arxiv.org/abs/2411.02442v1
915 LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham 2024-11-01 arXiv https://fsoft-aic.github.io/fsoft-LibMoE.github.io http://arxiv.org/abs/2411.00918v1
916 Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling Yiwen Ding, Zhiheng Xi, Wei He, Zhuoyuan Li, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang 2024-11-01 arXiv https://github.com/Yiwen-Ding/Guided-Self-Improvement http://arxiv.org/abs/2411.00750v1
917 SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models Jianyi Zhang, Da-Cheng Juan, Cyrus Rashtchian, Chun-Sung Ferng, Heinrich Jiang, Yiran Chen 2024-11-01 arXiv https://jayzhang42.github.io/sled_page/ http://arxiv.org/abs/2411.02433v2
918 Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM Xiong Wang, Yangze Li, Chaoyou Fu, Yunhang Shen, Lei Xie, Ke Li, Xing Sun, Long Ma 2024-11-01 arXiv https://freeze-omni.github.io/ http://arxiv.org/abs/2411.00774v5
919 Beyond Utility: Evaluating LLM as Recommender Chumeng Jiang, Jiayin Wang, Weizhi Ma, Charles L. A. Clarke, Shuai Wang, Chuhan Wu, Min Zhang 2024-11-01 arXiv https://github.com/JiangDeccc/EvaLLMasRecommender http://arxiv.org/abs/2411.00331v1
920 MoD: A Distribution-Based Approach for Merging Large Language Models Quy-Anh Dang, Chris Ngo 2024-11-01 arXiv https://github.com/knovel-eng/mod http://arxiv.org/abs/2411.00406v1
921 EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Unified Compression and Adaptive Layer Voting Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reddy Bommu, Yang Katie Zhao, Yingyan Celine Lin 2024-11 DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference https://github.com/GATECH-EIC/Edge-LLM https://dl.acm.org/doi/10.1145/3649329.3658473
922 Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning Hongwei Jin, George Papadimitriou, Krishnan Raghavan, Pawel Zuk, Prasanna Balaprakash, Cong Wang, Anirban Mandal, Ewa Deelman 2024-11 SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis https://github.com/PoSeiDon-Workflows/LLM_AD https://dl.acm.org/doi/10.1109/SC41406.2024.00098
923 Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, Jinyuan Liu, Yichen Gong, Qi Li, Anyu Wang, Xiaoyun Wang 2024-11 LAMPS '24: Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis https://github.com/ThuCCSLab/MergeGuard https://dl.acm.org/doi/10.1145/3689217.3690614
924 BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu 2024-10-31 arXiv https://github.com/xinghaow99/BitStack http://arxiv.org/abs/2410.23918v1
925 DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xinyi Yang, Yulin Yuan, Lidia S. Chao 2024-10-31 arXiv https://github.com/NLP2CT/DetectRL http://arxiv.org/abs/2410.23746v1
926 End-to-End Ontology Learning with Large Language Models Andy Lo, Albert Q. Jiang, Wenda Li, Mateja Jamnik 2024-10-31 arXiv https://github.com/andylolu2/ollm http://arxiv.org/abs/2410.23584v1
927 LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction Andre Niyongabo Rubungo, Kangming Li, Jason Hattrick-Simpers, Adji Bousso Dieng 2024-10-31 arXiv https://github.com/vertaix/LLM4Mat-Bench http://arxiv.org/abs/2411.00177v3
928 LLaMo: Large Language Model-based Molecular Graph Assistant Jinyoung Park, Minseong Bae, Dohwan Ko, Hyunwoo J. Kim 2024-10-31 arXiv https://github.com/mlvlab/LLaMo http://arxiv.org/abs/2411.00871v1
929 What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Ming Li, Yanhong Li, Tianyi Zhou 2024-10-31 arXiv https://github.com/MingLiiii/Layer_Gradient http://arxiv.org/abs/2410.23743v1
930 On Memorization of Large Language Models in Logical Reasoning Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar 2024-10-30 arXiv https://memkklogic.github.io http://arxiv.org/abs/2410.23123v1
931 ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning Millennium Bismay, Xiangjue Dong, James Caverlee 2024-10-30 arXiv https://github.com/millenniumbismay/reasoningrec http://arxiv.org/abs/2410.23180v1
932 Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback Qinqing Zheng, Mikael Henaff, Amy Zhang, Aditya Grover, Brandon Amos 2024-10-30 arXiv https://github.com/facebookresearch/oni http://arxiv.org/abs/2410.23022v2
933 SciPIP: An LLM-based Scientific Paper Idea Proposer Wenxiao Wang, Lihui Gu, Liye Zhang, Yunxiang Luo, Yi Dai, Chen Shen, Liang Xie, Binbin Lin, Xiaofei He, Jieping Ye 2024-10-30 arXiv https://github.com/cheerss/SciPIP http://arxiv.org/abs/2410.23166v1
934 Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning Dong Shu, Mengnan Du 2024-10-30 arXiv https://github.com/Tizzzzy/Demonstration_Selection_Overview http://arxiv.org/abs/2410.23099v1
935 BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference Junqi Zhao, Zhijin Fang, Shu Li, Shaohui Yang, Shichao He 2024-10-30 arXiv https://github.com/JunqiZhao888/buzz-llm http://arxiv.org/abs/2410.23079v1
936 Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning Keqin Bao, Ming Yan, Yang Zhang, Jizhi Zhang, Wenjie Wang, Fuli Feng, Xiangnan He 2024-10-30 arXiv https://github.com/ym689/rec_icl http://arxiv.org/abs/2410.23136v1
937 Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation Yang Zhang, Juntao You, Yimeng Bai, Jizhi Zhang, Keqin Bao, Wenjie Wang, Tat-Seng Chua 2024-10-30 arXiv https://github.com/itsmeyjt/CFT http://arxiv.org/abs/2410.22809v1
938 Distinguishing Ignorance from Error in LLM Hallucinations Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov 2024-10-29 arXiv https://github.com/technion-cs-nlp/hallucination-mitigation http://arxiv.org/abs/2410.22071v1
939 Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach Qingchuan Li, Jiatong Li, Tongxuan Liu, Yuting Zeng, Mingyue Cheng, Weizhe Huang, Qi Liu 2024-10-29 arXiv https://github.com/wufeiwuwoshihua/nshy http://arxiv.org/abs/2410.21779v1
940 Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance Dongmin Park, Sebin Kim, Taehong Moon, Minkyu Kim, Kangwook Lee, Jaewoong Cho 2024-10-29 arXiv https://github.com/krafton-ai/Rare2Frequent http://arxiv.org/abs/2410.22376v1
941 Scaling LLM Inference with Optimized Sample Compute Allocation Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li 2024-10-29 arXiv https://github.com/LeiLiLab/OSCA http://arxiv.org/abs/2410.22480v1
942 Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese 2024-10-28 arXiv https://github.com/pasquini-dario/project_mantis http://arxiv.org/abs/2410.20911v2
943 LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment Ge Yang, Changyi He, Jinyang Guo, Jianyu Wu, Yifu Ding, Aishan Liu, Haotong Qin, Pengliang Ji, Xianglong Liu 2024-10-28 arXiv https://github.com/AboveParadise/LLMCBench http://arxiv.org/abs/2410.21352v2
944 NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu 2024-10-28 arXiv https://github.com/hexuandeng/NewTerm http://arxiv.org/abs/2410.20814v1
945 ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen 2024-10-28 arXiv https://github.com/bytedance/ShadowKV http://arxiv.org/abs/2410.21465v1
946 Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models Yilun Jin, Zheng Li, Chenwei Zhang, Tianyu Cao, Yifan Gao, Pratik Jayarao, Mao Li, Xin Liu, Ritesh Sarkhel, Xianfeng Tang, Haodong Wang, Zhengyang Wang, Wenju Xu, Jingfeng Yang, Qingyu Yin, Xian Li, Priyanka Nigam, Yi Xu, Kai Chen, Qiang Yang, Meng Jiang, Bing Yin 2024-10-28 arXiv https://github.com/KL4805/ShoppingMMLU http://arxiv.org/abs/2410.20745v2
947 SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization Wanhua Li, Zibin Meng, Jiawei Zhou, Donglai Wei, Chuang Gan, Hanspeter Pfister 2024-10-28 arXiv https://mengzibin.github.io/SocialGPT.github.io/ http://arxiv.org/abs/2410.21411v1
948 Instruction-Tuned LLMs Succeed in Document-Level MT Without Fine-Tuning -- But BLEU Turns a Blind Eye Yirong Sun, Dawei Zhu, Yanjun Chen, Erjia Xiao, Xinghao Chen, Xiaoyu Shen 2024-10-28 arXiv https://github.com/EIT-NLP/BLEUless_DocMT http://arxiv.org/abs/2410.20941v2
949 Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data Xinhong Xie, Tao Li, Quanyan Zhu 2024-10-27 arXiv https://github.com/XXXinhong/Detoxification_LLM http://arxiv.org/abs/2410.20298v1
950 Enhancing Inflation Nowcasting with LLM: Sentiment Analysis on News Marc-Antoine Allard, Paul Teiletche, Adam Zinebi 2024-10-26 arXiv https://github.com/paultltc/InflaBERT http://arxiv.org/abs/2410.20198v1
951 LLMs Can Evolve Continually on Modality for X-Modal Reasoning Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen 2024-10-26 arXiv https://github.com/JiazuoYu/PathWeave http://arxiv.org/abs/2410.20178v2
952 Language Agents Meet Causality -- Bridging LLMs and Causal World Models John Gkountouras, Matthias Lindemann, Phillip Lippe, Efstratios Gavves, Ivan Titov 2024-10-25 arXiv https://j0hngou.github.io/LLMCWM/ http://arxiv.org/abs/2410.19923v1
953 APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs Huaxiaoyue Wang, Nathaniel Chin, Gonzalo Gonzalez-Pumariega, Xiangwan Sun, Neha Sunkara, Maximus Adrian Pace, Jeannette Bohg, Sanjiban Choudhury 2024-10-25 arXiv https://portal-cornell.github.io/apricot/ http://arxiv.org/abs/2410.19656v1
954 Delving into the Reversal Curse: How Far Can Large Language Models Generalize? Zhengkai Lin, Zhihang Fu, Kai Liu, Liang Xie, Binbin Lin, Wenxiao Wang, Deng Cai, Yue Wu, Jieping Ye 2024-10-24 arXiv https://github.com/alibaba/thinking_bias http://arxiv.org/abs/2410.18808v2
955 GCoder: Improving Large Language Model for Generalized Graph Problem Solving Qifan Zhang, Xiaobin Hong, Jianheng Tang, Nuo Chen, Yuhan Li, Wenzhong Li, Jing Tang, Jia Li 2024-10-24 arXiv https://github.com/Bklight999/WWW25-GCoder/tree/master http://arxiv.org/abs/2410.19084v1
956 Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang 2024-10-24 arXiv https://github.com/VITA-Group/READ-ME http://arxiv.org/abs/2410.19123v1
957 Distill Visual Chart Reasoning Ability from LLMs to MLLMs Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang 2024-10-24 arXiv https://github.com/hewei2001/ReachQA http://arxiv.org/abs/2410.18798v1
958 CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation Qinsi Wang, Saeed Vahidian, Hancheng Ye, Jianyang Gu, Jianyi Zhang, Yiran Chen 2024-10-23 arXiv https://wangqinsi1.github.io/coreinfer_page/ http://arxiv.org/abs/2410.18311v1
959 Cross-model Control: Improving Multiple Large Language Models in One-time Training Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao 2024-10-23 arXiv https://github.com/wujwyi/CMC http://arxiv.org/abs/2410.17599v1
960 ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage Taewhoo Lee, Chanwoong Yoon, Kyochul Jang, Donghyeon Lee, Minju Song, Hyunjae Kim, Jaewoo Kang 2024-10-22 arXiv https://github.com/dmis-lab/ETHIC http://arxiv.org/abs/2410.16848v1
961 VoiceBench: Benchmarking LLM-Based Voice Assistants Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li 2024-10-22 arXiv https://github.com/MatthewCYM/VoiceBench http://arxiv.org/abs/2410.17196v3
962 Improving Causal Reasoning in Large Language Models: A Survey Longxuan Yu, Delin Chen, Siheng Xiong, Qingyang Wu, Qingzhen Liu, Dawei Li, Zhikai Chen, Xiaoze Liu, Liangming Pan 2024-10-22 arXiv https://github.com/chendl02/Awesome-LLM-causal-reasoning http://arxiv.org/abs/2410.16676v3
963 Automated Spinal MRI Labelling from Reports Using a Large Language Model Robin Y. Park, Rhydian Windsor, Amir Jamaludin, Andrew Zisserman 2024-10-22 MICCAI https://github.com/robinyjpark/AutoLabelClassifier https://doi.org/10.1007/978-3-031-72086-4_10
964 DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models Chen Qian, Dongrui Liu, Jie Zhang, Yong Liu, Jing Shao 2024-10-22 arXiv https://github.com/ChnQ/DEAN http://arxiv.org/abs/2410.16672v1
965 AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration Bradley McDanel 2024-10-22 arXiv https://github.com/BradMcDanel/AMUSD/ http://arxiv.org/abs/2410.17375v1
966 CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing Chen Yang, Chenyang Zhao, Quanquan Gu, Dongruo Zhou 2024-10-22 arXiv https://github.com/uclaml/COPS http://arxiv.org/abs/2410.16670v1
967 Boosting Jailbreak Transferability for Large Language Models Hanqing Liu, Lifeng Zhou, Huanqian Yan 2024-10-21 arXiv https://github.com/HqingLiu/SI-GCG http://arxiv.org/abs/2410.15645v2
968 Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report Ayman Asad Khan, Md Toufique Hasan, Kai Kristian Kemell, Jussi Rasku, Pekka Abrahamsson 2024-10-21 arXiv https://github.com/GPT-Laboratory/RAG-LLM-Development-Guidebook-from-PDFs http://arxiv.org/abs/2410.15944v1
969 LLaVA-KD: A Framework of Distilling Multimodal Large Language Models Yuxuan Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Xiang Bai 2024-10-21 arXiv https://github.com/Fantasyele/LLaVA-KD http://arxiv.org/abs/2410.16236v2
970 MagicPIG: LSH Sampling for Efficient LLM Generation Zhuoming Chen, Ranajoy Sadhukhan, Zihao Ye, Yang Zhou, Jianyu Zhang, Niklas Nolte, Yuandong Tian, Matthijs Douze, Leon Bottou, Zhihao Jia, Beidi Chen 2024-10-21 arXiv https://github.com/Infini-AI-Lab/MagicPIG http://arxiv.org/abs/2410.16179v4
971 Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs Xin Ma, Yang Liu, Jingjing Liu, Xiaoxu Ma 2024-10-21 arXiv https://github.com/soacker/Mesa-Extrapolation http://arxiv.org/abs/2410.15859v3
972 RAC: Efficient LLM Factuality Correction with Retrieval Augmentation Changmao Li, Jeffrey Flanigan 2024-10-21 arXiv https://github.com/jlab-nlp/Retrieval-Augmented-Correction http://arxiv.org/abs/2410.15667v1
973 CausalGraph2LLM: Evaluating LLMs for Causal Queries Ivaxi Sheth, Bahare Fatemi, Mario Fritz 2024-10-21 arXiv https://github.com/ivaxi0s/CausalGraph2LLM http://arxiv.org/abs/2410.15939v1
974 A Comprehensive Evaluation of Cognitive Biases in LLMs Simon Malberg, Roman Poletukhin, Carolin M. Schuster, Georg Groh 2024-10-20 arXiv https://github.com/simonmalberg/cognitive-biases-in-llms http://arxiv.org/abs/2410.15413v1
975 Are LLMs Good Zero-Shot Fallacy Classifiers? Fengjun Pan, Xiaobao Wu, Zongrui Li, Anh Tuan Luu 2024-10-19 arXiv https://github.com/panFJCharlotte98/Fallacy_Detection http://arxiv.org/abs/2410.15050v1
976 Evaluating Deep Unlearning in Large Language Models Ruihan Wu, Chhavi Yadav, Russ Salakhutdinov, Kamalika Chaudhuri 2024-10-19 arXiv https://github.com/wrh14/deep_unlearning http://arxiv.org/abs/2410.15153v3
977 Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction Yinhan He, Zaiyi Zheng, Patrick Soga, Yaozhen Zhu, Yushun Dong, Jundong Li 2024-10-19 EMNLP 2024 (Findings) https://github.com/YinhanHe123/new_LLM4GNNExplanation http://arxiv.org/abs/2410.15165v1
978 GlitchMiner: Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization Zihui Wu, Haichang Gao, Ping Wang, Shudong Zhang, Zhaoxiang Liu, Shiguo Lian 2024-10-19 arXiv https://github.com/wooozihui/GlitchMiner http://arxiv.org/abs/2410.15052v4
979 Imprompter: Tricking LLM Agents into Improper Tool Use Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K. Gupta, Taylor Berg-Kirkpatrick, Earlence Fernandes 2024-10-19 arXiv https://github.com/Reapor-Yurnero/imprompter http://arxiv.org/abs/2410.14923v2
980 MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification Yin Li, Liangwei Wang, Shiyuan Piao, Boo-Ho Yang, Ziyue Li, Wei Zeng, Fugee Tsung 2024-10-19 arXiv https://github.com/MCCodeAI/MCCoder http://arxiv.org/abs/2410.15154v1
981 SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent Jiarui Ji, Yang Li, Hongtao Liu, Zhicheng Du, Zhewei Wei, Weiran Shen, Qi Qi, Yankai Lin 2024-10-18 arXiv https://github.com/jijiarui-cather/SRAPAgent_Framework http://arxiv.org/abs/2410.14152v1
982 Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation Shuo Tang, Xianghe Pang, Zexi Liu, Bohan Tang, Rui Ye, Xiaowen Dong, Yanfeng Wang, Siheng Chen 2024-10-18 arXiv https://github.com/ShuoTang123/MATRIX-Gen http://arxiv.org/abs/2410.14251v1
983 CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic Huaiyuan Yao, Longchao Da, Vishnu Nandam, Justin Turnau, Zhiwei Liu, Linsey Pang, Hua Wei 2024-10-18 arXiv https://github.com/Hyan-Yao/CoMAL http://arxiv.org/abs/2410.14368v1
984 Enabling Scalable Evaluation of Bias Patterns in Medical LLMs Hamed Fayyaz, Raphael Poulain, Rahmatollah Beheshti 2024-10-18 arXiv https://github.com/healthylaife/autofair http://arxiv.org/abs/2410.14763v1
985 Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models Wei Jie Yeo, Ranjan Satapathy, Erik Cambria 2024-10-18 arXiv https://github.com/wj210/Causal-Faithfulness https://doi.org/10.48550/arXiv.2410.14155
986 Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models Yu Yuan, Lili Zhao, Kai Zhang, Guangting Zheng, Qi Liu 2024-10-17 EMNLP https://github.com/yyhappier/ShortcutSuite https://aclanthology.org/2024.emnlp-main.679
987 Data Defenses Against Large Language Models William Agnew, Harry H. Jiang, Cella Sum, Maarten Sap, Sauvik Das 2024-10-17 arXiv https://github.com/wagnew3/LLMDataDefenses http://arxiv.org/abs/2410.13138v1
988 FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs Forrest Sheng Bao, Miaoran Li, Renyi Qu, Ge Luo, Erana Wan, Yujia Tang, Weisi Fan, Manveer Singh Tamber, Suleman Kazi, Vivek Sourabh, Mike Qi, Ruixuan Tu, Chenyu Xu, Matthew Gonzales, Ofer Mendelevitch, Amin Ahmad 2024-10-17 arXiv https://github.com/vectara/FaithBench http://arxiv.org/abs/2410.13210v1
989 LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models David Hoffmann, Kailash Budhathoki, Matthaeus Kleindessner 2024-10-17 arXiv https://github.com/amazon-science/llm-rank-pruning http://arxiv.org/abs/2410.13299v2
990 Retrieval-Augmented Personalization for Multimodal Large Language Models Haoran Hao, Jiaming Han, Changsheng Li, Yu-Feng Li, Xiangyu Yue 2024-10-17 arXiv https://github.com/Hoar012/RAP-MLLM http://arxiv.org/abs/2410.13360v2
991 SLM-Mod: Small Language Models Surpass LLMs at Content Moderation Xianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, Koustuv Saha 2024-10-17 arXiv https://github.com/AGoyal0512/SLM-Mod http://arxiv.org/abs/2410.13155v1
992 aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Completion Siyuan Jiang, Jia Li, He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Yu Han, Wei Ning, Gen Wang, Yihong Dong, Kechi Zhang, Ge Li 2024-10-17 arXiv https://github.com/aixcoder-plugin/aiXcoder-7B http://arxiv.org/abs/2410.13187v2
993 Hypothesis Testing the Circuit Hypothesis in LLMs Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei 2024-10-16 arXiv https://github.com/blei-lab/circuitry http://arxiv.org/abs/2410.13032v1
994 Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors Weixuan Wang, Jingyuan Yang, Wei Peng 2024-10-16 arXiv https://github.com/weixuan-wang123/SADI http://arxiv.org/abs/2410.12299v1
995 Self-Pluralising Culture Alignment for Large Language Models Shaoyang Xu, Yongqi Leng, Linhao Yu, Deyi Xiong 2024-10-16 arXiv https://github.com/shaoyangxu/CultureSPA http://arxiv.org/abs/2410.12971v1
996 Qtok: A Comprehensive Framework for Evaluating Multilingual Tokenizer Quality in Large Language Models Iaroslav Chelombitko, Egor Safronov, Aleksey Komissarov 2024-10-16 arXiv https://github.com/nup-csai/Qtok/ http://arxiv.org/abs/2410.12989v1
997 ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs Jingming Zhuo, Songyang Zhang, Xinyu Fang, Haodong Duan, Dahua Lin, Kai Chen 2024-10-16 arXiv https://github.com/open-compass/ProSA http://arxiv.org/abs/2410.12405v1
998 POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization Batuhan K. Karaman, Ishmam Zabir, Alon Benhaim, Vishrav Chaudhary, Mert R. Sabuncu, Xia Song 2024-10-16 arXiv https://github.com/batuhankmkaraman/POROver http://arxiv.org/abs/2410.12999v1
999 DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs Yingsong Luo, Ling Chen 2024-10-16 arXiv https://github.com/LuoYingSong/DAQ http://arxiv.org/abs/2410.12187v2
1000 Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch 2024-10-16 arXiv https://github.com/weixuan-wang123/INCLINE https://doi.org/10.48550/arXiv.2410.12462
1001 Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights Rahul Krishna, Rangeet Pan, Raju Pavuluri, Srikanth Tamilselvam, Maja Vukovic, Saurabh Sinha 2024-10-16 arXiv https://github.com/IBM/codellm-devkit http://arxiv.org/abs/2410.13007v1
1002 Exploring Model Kinship for Merging Large Language Models Yedi Hu, Yunzhi Yao, Ningyu Zhang, Shumin Deng, Huajun Chen 2024-10-16 arXiv https://github.com/zjunlp/ModelKinship https://doi.org/10.48550/arXiv.2410.12613