The resources are collected from various sources, including arXiv, NeurIPS, ICML, ICLR, ACL, EMNLP, AAAI, IJCAI, KDD, CVPR, ICCV, ECCV, IEEE, ACM, Springer, ScienceDirect, Wiley, Nature, Science, and other top AI/ML conferences and journals.
For a better reading experience, visit the Shinyapps website.

Other Topics

Explore additional research papers on the following topics:

For Large Language Models papers, please visit the LLM Repository.
For Backdoor Learning papers, please visit the Backdoor Learning Repository.
For Federated Learning papers, please visit the Federated Learning Repository.
For Machine Unlearning papers, please visit the Machine Unlearning Repository.

For contributions, inquiries, or suggestions, feel free to reach out via email.

If you find this application helpful and would like to support its development, you can buy me a coffee using one of the following methods:

Techcombank (Vietnam): 5877 5555 55 (Nguyen Thi Lan Phuong)
PayPal or Credit/Debit Card: https://ko-fi.com/miutheladycat

Large Language Models Papers with Code

Due to GitHub repository limitations, this section includes only papers that provide accompanying code, sorted by publication date. For access to the full list of papers, please visit the Shinyapps website.

No.	Title	Authors	Publish Date	Venue	Code	URL
1	EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models	Bohao Xing, Xin Liu, Guoying Zhao, Chengyu Liu, Xiaolan Fu, Heikki Kälviäinen	2025-05-16	arXiv	https://github.com/xxtars/EmotionHallucer	http://arxiv.org/abs/2505.11405v1
2	Ranked Voting based Self-Consistency of Large Language Models	Weiqin Wang, Yile Wang, Hui Huang	2025-05-16	arXiv	https://github.com/szu-tera/RankedVotingSC	http://arxiv.org/abs/2505.10772v1
3	Unifying Segment Anything in Microscopy with Multimodal Large Language Model	Manyu Li, Ruian He, Zixian Zhang, Weimin Tan, Bo Yan	2025-05-16	arXiv	https://github.com/ieellee/uLLSAM	http://arxiv.org/abs/2505.10769v1
4	GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art	Chenkai Zhang, Yiming Lei, Zeming Liu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Yunhong Wang	2025-05-16	arXiv	https://github.com/stan-lei/GODBench-ACL2025	http://arxiv.org/abs/2505.11436v1
5	GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction	Mohammadtaha Bagherifard, Sahar Rajabi, Ali Edalat, Yadollah Yaghoobzadeh	2025-05-16	arXiv	https://github.com/saharsamr/Modular-LLM	http://arxiv.org/abs/2505.10939v1
6	AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents	Julius Henke	2025-05-15	arXiv	https://github.com/JuliusHenke/autopentest	http://arxiv.org/abs/2505.10321v1
7	Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M	Dario Di Palma, Felice Antonio Merra, Maurizio Sfilio, Vito Walter Anelli, Fedelucio Narducci, Tommaso Di Noia	2025-05-15	arXiv	https://github.com/sisinflab/LLM-MemoryInspector	http://arxiv.org/abs/2505.10212v1
8	From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models	Yidan Wang, Yubing Ren, Yanan Cao, Binxing Fang	2025-05-15	arXiv	https://github.com/redwyd/SymMark	http://arxiv.org/abs/2505.09924v2
9	ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts	Jing-Cheng Pang, Kaiyuan Li, Yidi Wang, Si-Hang Yang, Shengyi Jiang, Yang Yu	2025-05-15	arXiv	https://github.com/LAMDA-RL/ImagineBench	http://arxiv.org/abs/2505.10010v1
10	PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization	Yidan Wang, Yanan Cao, Yubing Ren, Fang Fang, Zheng Lin, Binxing Fang	2025-05-15	arXiv	https://github.com/redwyd/PrivacyJailbreak	http://arxiv.org/abs/2505.09921v2
11	LAS: Loss-less ANN-SNN Conversion for Fully Spike-Driven Large Language Models	Long Chen, Xiaotian Song, Yanan Sun	2025-05-14	arXiv	https://github.com/lc783/LAS	http://arxiv.org/abs/2505.09659v1
12	Adversarial Attack on Large Language Models using Exponentiated Gradient Descent	Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu	2025-05-14	arXiv	https://github.com/sbamit/Exponentiated-Gradient-Descent-LLM-Attack	http://arxiv.org/abs/2505.09820v1
13	CodePDE: An Inference Framework for LLM-driven PDE Solver Generation	Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski, Yiming Yang, Ameet Talwalkar	2025-05-13	arXiv	https://github.com/LithiumDA/CodePDE	http://arxiv.org/abs/2505.08783v1
14	Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement	Haoran Ye, Jing Jin, Yuhang Xie, Xin Zhang, Guojie Song	2025-05-13	arXiv	https://github.com/valuebyte-ai/Awesome-LLM-Psychometrics	http://arxiv.org/abs/2505.08245v1
15	Optimized Couplings for Watermarking Large Language Models	Dor Tsur, Carol Xuan Long, Claudio Mayrink Verdun, Hsiang Hsu, Haim Permuter, Flavio P. Calmon	2025-05-13	arXiv	https://github.com/Carol-Long/CC_Watermark	http://arxiv.org/abs/2505.08878v1
16	Unlocking Location Intelligence: A Survey from Deep Learning to The LLM Era	Xixuan Hao, Yutian Jiang, Xingchen Zou, Jiabo Liu, Yifang Yin, Yuxuan Liang	2025-05-13	arXiv	https://github.com/CityMind-Lab/Awesome-Location-Intelligence	http://arxiv.org/abs/2505.09651v1
17	HealthBench: Evaluating Large Language Models Towards Improved Human Health	Rahul K. Arora, Jason Wei, Rebecca Soskin Hicks, Preston Bowman, Joaquin Quiñonero-Candela, Foivos Tsimpourlas, Michael Sharman, Meghan Shah, Andrea Vallone, Alex Beutel, Johannes Heidecke, Karan Singhal	2025-05-13	arXiv	https://github.com/openai/simple-evals	http://arxiv.org/abs/2505.08775v1
18	A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models	Junjie Ye, Caishuang Huang, Zhuohan Chen, Wenjie Fu, Chenyuan Yang, Leyi Yang, Yilong Wu, Peng Wang, Meng Zhou, Xiaolong Yang, Tao Gui, Qi Zhang, Zhongchao Shi, Jianping Fan, Xuanjing Huang	2025-05-12	arXiv	https://github.com/Junjie-Ye/MulDimIF	http://arxiv.org/abs/2505.07591v1
19	Are LLMs complicated ethical dilemma analyzers?	Jiashen Du, Jesse Yao, Allen Liu, Zhekai Zhang	2025-05-12	arXiv	https://github.com/ALT-JS/ethicaLLM	http://arxiv.org/abs/2505.08106v1
20	DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation	Jiashuo Sun, Xianrui Zhong, Sizhe Zhou, Jiawei Han	2025-05-12	arXiv	https://github.com/GasolSun36/DynamicRAG	http://arxiv.org/abs/2505.07233v2
21	Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs	Yifan Wei, Xiaoyan Yu, Tengfei Pan, Angsheng Li, Li Du	2025-05-12	arXiv	https://github.com/weiyifan1023/senator	http://arxiv.org/abs/2505.07184v1
22	MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception	Zhengye Zhang, Sirui Zhao, Shifeng Liu, Shukang Yin, Xinglong Mao, Tong Xu, Enhong Chen	2025-05-11	arXiv	https://github.com/zyzhangUstc/MELLM	http://arxiv.org/abs/2505.07007v1
23	From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering	Gaurab Sarkar, Sougata Saha	2025-05-11	arXiv	https://github.com/sougata-ub/llms_for_ionic_liquids	http://arxiv.org/abs/2505.06964v1
24	GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance	Jinuk Kim, Marwa El Halabi, Wonpyo Park, Clemens JS Schaefer, Deokjae Lee, Yeonhong Park, Jae W. Lee, Hyun Oh Song	2025-05-11	arXiv	https://github.com/snu-mllab/GuidedQuant	http://arxiv.org/abs/2505.07004v1
25	POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models	Yangguang Shao, Xinjie Lin, Haozheng Luo, Chengshang Hou, Gang Xiong, Jiahao Yu, Junzheng Shi	2025-05-10	arXiv	https://github.com/AndyShaw01/PoisonCraft	http://arxiv.org/abs/2505.06579v1
26	Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Learning	Hang Gao, Chenhao Zhang, Tie Wang, Junsuo Zhao, Fengge Wu, Changwen Zheng, Huaping Liu	2025-05-09	arXiv	https://github.com/zch65458525/L2T	http://arxiv.org/abs/2505.06321v1
27	HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow	You Peng, Youhe Jiang, Chen Wang, Binhang Yuan	2025-05-08	arXiv	https://github.com/Relaxed-System-Lab/Hexgen-Flow	http://arxiv.org/abs/2505.05286v1
28	KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification	Qianbo Zang, Christophe Zgrzendek, Igor Tchappi, Afshin Khadangi, Johannes Sedlmeir	2025-05-08	arXiv	https://github.com/QianboZang/KG-HTC	http://arxiv.org/abs/2505.05583v1
29	Prompt-Based LLMs for Position Bias-Aware Reranking in Personalized Recommendations	Md Aminul Islam, Ahmed Sayeed Faruk	2025-05-08	arXiv	https://github.com/aminul7506/LLMForReRanking	http://arxiv.org/abs/2505.04948v1
30	Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization	Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Jiang Zong, Hao Peng, Jianwei Yin	2025-05-08	arXiv	https://github.com/colored-dye/multi_stage_influence_function	http://arxiv.org/abs/2505.05017v1
31	Benchmarking LLMs' Swarm intelligence	Kai Ruan, Mowen Huang, Ji-Rong Wen, Hao Sun	2025-05-07	arXiv	https://github.com/x66ccff/swarmbench	http://arxiv.org/abs/2505.04364v1
32	TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution	Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, Jinkyoo Park	2025-05-07	arXiv	https://github.com/ai4co/trajevo	http://arxiv.org/abs/2505.04480v1
33	Advancing and Benchmarking Personalized Tool Invocation for LLMs	Xu Huang, Yuefeng Huang, Weiwen Liu, Xingshan Zeng, Yasheng Wang, Ruiming Tang, Hong Xie, Defu Lian	2025-05-07	arXiv	https://github.com/hyfshadow/PTBench	http://arxiv.org/abs/2505.04072v1
34	Avoid Recommending Out-of-Domain Items: Constrained Generative Recommendation with LLMs	Hao Liao, Wensheng Lu, Jianxun Lian, Mingqi Wu, Shuo Wang, Yong Zhang, Yitian Huang, Mingyang Zhou, Xing Xie	2025-05-06	arXiv	https://github.com/microsoft/RecAI	http://arxiv.org/abs/2505.03336v1
35	CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics	Junqi Liu, Xiaohan Lin, Jonas Bayer, Yael Dillies, Weijie Jiang, Xiaodan Liang, Roman Soletskyi, Haiming Wang, Yunzhou Xie, Beibei Xiong, Zhengfeng Yang, Jujian Zhang, Lihong Zhi, Jia Li, Zhengying Liu	2025-05-06	arXiv	https://github.com/MoonshotAI/CombiBench/	http://arxiv.org/abs/2505.03171v1
36	Plug-and-Play AMC: Context Is King in Training-Free, Open-Set Modulation with LLMs	Mohammad Rostami, Atik Faysal, Reihaneh Gh. Roshan, Huaxia Wang, Nikhil Muralidhar, Yu-Dong Yao	2025-05-06	arXiv	https://github.com/RU-SIT/context-is-king	http://arxiv.org/abs/2505.03112v1
37	Automatic Calibration for Membership Inference Attack on Large Language Models	Saleh Zare Zade, Yao Qiang, Xiangyu Zhou, Hui Zhu, Mohammad Amin Roshani, Prashant Khanduri, Dongxiao Zhu	2025-05-06	arXiv	https://github.com/Salehzz/ACMIA	http://arxiv.org/abs/2505.03392v1
38	FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models	Zhouliang Yu, Ruotian Peng, Keyi Ding, Yizhe Li, Zhongyuan Peng, Minghao Liu, Yifan Zhang, Zheng Yuan, Huajian Xin, Wenhao Huang, Yandong Wen, Ge Zhang, Weiyang Liu	2025-05-05	arXiv	https://sphere-ai-lab.github.io/FormalMATH/	http://arxiv.org/abs/2505.02735v1
39	LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis	Qingkai Fang, Yan Zhou, Shoutao Guo, Shaolei Zhang, Yang Feng	2025-05-05	arXiv	https://github.com/ictnlp/LLaMA-Omni2	http://arxiv.org/abs/2505.02625v1
40	Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models	Xiaobao Wu	2025-05-05	arXiv	https://github.com/bobxwu/learning-from-rewards-llm-papers	http://arxiv.org/abs/2505.02686v1
41	Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data	Zhong Guan, Likang Wu, Hongke Zhao, Ming He, Jianpin Fan	2025-05-04	arXiv	https://github.com/millioniron/LLM_exploration	http://arxiv.org/abs/2505.02130v1
42	MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents	Zeyu Zhang, Quanyu Dai, Xu Chen, Rui Li, Zhongyang Li, Zhenhua Dong	2025-05-04	arXiv	https://github.com/nuster1128/MemEngine	http://arxiv.org/abs/2505.02099v1
43	Amplifying Your Social Media Presence: Personalized Influential Content Generation with LLMs	Yuying Zhao, Yu Wang, Xueqi Cheng, Anne Marie Tumlin, Yunchao Liu, Damin Xia, Meng Jiang, Tyler Derr	2025-05-03	arXiv	https://github.com/YuyingZhao/LLM-influence-amplifier	http://arxiv.org/abs/2505.01698v1
44	A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency	Sihyeong Park, Sungryeol Jeon, Chaelyn Lee, Seokhun Jeon, Byung-Soo Kim, Jemin Lee	2025-05-03	arXiv	https://github.com/sihyeong/Awesome-LLM-Inference-Engine	http://arxiv.org/abs/2505.01658v1
45	WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks	Jingwen Tong, Jiawei Shao, Qiong Wu, Wei Guo, Zijian Li, Zehong Lin, Jun Zhang	2025-05-02	arXiv	https://github.com/jwentong/WirelessAgent_R1	https://doi.org/10.48550/arXiv.2409.07964
46	FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing	Gaoxiang Cong, Liang Li, Jiadong Pan, Zhedong Zhang, Amin Beheshti, Anton van den Hengel, Yuankai Qi, Qingming Huang	2025-05-02	arXiv	https://galaxycong.github.io/LLM-Flow-Dubber/	http://arxiv.org/abs/2505.01263v1
47	Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities	Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Guoxia Wang, Dianhai Yu, Yonggang Wen, Dacheng Tao	2025-05-02	arXiv	https://github.com/Hao840/Awesome-Low-Precision-Training	http://arxiv.org/abs/2505.01043v1
48	LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection	Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou	2025-05-01	arXiv	https://github.com/Susan571/LENSLLM	http://arxiv.org/abs/2505.03793v1
49	Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models	Bang Zhang, Ruotian Ma, Qingxuan Jiang, Peisong Wang, Jiaqi Chen, Zheng Xie, Xingyu Chen, Yue Wang, Fanghua Ye, Jian Li, Yifan Yang, Zhaopeng Tu, Xiaolong Li	2025-05-01	arXiv	https://github.com/Tencent/digitalhuman/tree/main/SAGE	http://arxiv.org/abs/2505.02847v2
50	SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation	Quang P. M. Pham, Khoi T. N. Nguyen, Nhi H. Doan, Cuong A. Pham, Kentaro Inui, Dezhen Song	2025-05-01	arXiv	https://github.com/quangpham2006/SmallPlan	http://arxiv.org/abs/2505.00831v1
51	A Survey on Large Language Model based Human-Agent Systems	Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Langzhou He, Yangning Li, Yuwei Cao, Dongyuan Li, Renhe Jiang, Philip S. Yu	2025-05-01	arXiv	https://github.com/HenryPengZou/Awesome-LLM-Based-Human-Agent-System-Papers	http://arxiv.org/abs/2505.00753v1
52	DeepCritic: Deliberate Critique with Large Language Models	Wenkai Yang, Jingwen Chen, Yankai Lin, Ji-Rong Wen	2025-05-01	arXiv	https://github.com/RUCBM/DeepCritic	http://arxiv.org/abs/2505.00662v1
53	LLM Ethics Benchmark: A Three-Dimensional Assessment System for Evaluating Moral Reasoning in Large Language Models	Junfeng Jiao, Saleh Afroogh, Abhejay Murali, Kevin Chen, David Atkinson, Amit Dhurandhar	2025-05-01	arXiv	https://github.com/	http://arxiv.org/abs/2505.00853v1
54	LLM-based Interactive Imitation Learning for Robotic Manipulation	Jonas Werner, Kun Chu, Cornelius Weber, Stefan Wermter	2025-04-30	arXiv	https://github.com/Tubicor/LLM-iTeach	http://arxiv.org/abs/2504.21769v1
55	When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator	Md Fahim Anjum	2025-04-30	arXiv	https://github.com/MDFahimAnjum/llm-planning-with-reasoning	http://arxiv.org/abs/2505.03786v1
56	OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification	Shangyu Li, Juyong Jiang, Tiancheng Zhao, Jiasi Shen	2025-04-29	arXiv	https://github.com/lishangyu-hkust/OSVBench	http://arxiv.org/abs/2504.20964v1
57	Reinforcement Learning for Reasoning in Large Language Models with One Training Example	Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen	2025-04-29	arXiv	https://github.com/ypwang61/One-Shot-RLVR	http://arxiv.org/abs/2504.20571v1
58	Turing Machine Evaluation for Large Language Model	Haitao Wu, Zongbo Han, Huaxi Huang, Changqing Zhang	2025-04-29	arXiv	https://github.com/HaitaoWuTJU/Turing-Machine-Bench	http://arxiv.org/abs/2504.20771v1
59	X-Fusion: Introducing New Modality to Frozen Large Language Models	Sicheng Mo, Thao Nguyen, Xun Huang, Siddharth Srinivasan Iyer, Yijun Li, Yuchen Liu, Abhishek Tandon, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, Yuheng Li	2025-04-29	arXiv	https://sichengmo.github.io/XFusion/	http://arxiv.org/abs/2504.20996v1
60	AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers	Zijie Lin, Yiqing Shen, Qilin Cai, He Sun, Jinrui Zhou, Mingjun Xiao	2025-04-28	arXiv	https://github.com/shoushouyu/Automated-Paper-to-Code	http://arxiv.org/abs/2504.20115v1
61	Evolution of Cooperation in LLM-Agent Societies: A Preliminary Study Using Different Punishment Strategies	Kavindu Warnakulasuriya, Prabhash Dissanayake, Navindu De Silva, Stephen Cranefield, Bastin Tony Roy Savarimuthu, Surangika Ranathunga, Nisansa de Silva	2025-04-28	arXiv	https://coin-workshop.github.io/coine-2025-detroit/accepted_for_presentation.html	http://arxiv.org/abs/2504.19487v1
62	LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects	Guangyi Liu, Pengxiang Zhao, Liang Liu, Yaxuan Guo, Han Xiao, Weifeng Lin, Yuxiang Chai, Yue Han, Shuai Ren, Hao Wang, Xiaoyu Liang, Wenhao Wang, Tianze Wu, Linghao Li, Hao Wang, Guanjing Xiong, Yong Liu, Hongsheng Li	2025-04-28	arXiv	https://github.com/PhoneLLM/Awesome-LLM-Powered-Phone-GUI-Agents	http://arxiv.org/abs/2504.19838v1
63	SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning	Jiaqi Chen, Bang Zhang, Ruotian Ma, Peisong Wang, Xiaodan Liang, Zhaopeng Tu, Xiaolong Li, Kwan-Yee K. Wong	2025-04-27	arXiv	https://chen-judge.github.io/SPC/	http://arxiv.org/abs/2504.19162v1
64	Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers	Dylan Bouchard, Mohit Singh Chauhan	2025-04-27	arXiv	https://github.com/cvs-health/uqlm	http://arxiv.org/abs/2504.19254v2
65	BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese	Peilin Zhou, Bruce Leon, Xiang Ying, Can Zhang, Yifan Shao, Qichen Ye, Dading Chong, Zhiling Jin, Chenxuan Xie, Meng Cao, Yuxin Gu, Sixin Hong, Jing Ren, Jian Chen, Chao Liu, Yining Hua	2025-04-27	arXiv	https://github.com/PALIN2018/BrowseComp-ZH	http://arxiv.org/abs/2504.19314v2
66	Calibrating Translation Decoding with Quality Estimation on LLMs	Di Wu, Yibin Lei, Christof Monz	2025-04-26	arXiv	https://github.com/moore3930/calibrating-llm-mt	http://arxiv.org/abs/2504.19044v1
67	Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs	Mohammad Akbar-Tajari, Mohammad Taher Pilehvar, Mohammad Mahmoody	2025-04-26	arXiv	https://github.com/GoAT-pydev/Graph_of_Attacks	http://arxiv.org/abs/2504.19019v1
68	SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models	Nader Zantout, Haochen Zhang, Pujith Kachana, Jinkai Qiu, Ji Zhang, Wenshan Wang	2025-04-25	arXiv	https://github.com/nzantout/SORT3D	http://arxiv.org/abs/2504.18684v1
69	DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models	Jianyu Liu, Hangyu Guo, Ranjie Duan, Xingyuan Bu, Yancheng He, Shilong Li, Hui Huang, Jiaheng Liu, Yucheng Wang, Chenchen Jing, Xingwei Qu, Xiao Zhang, Yingshui Tan, Yanan Wu, Jihao Gu, Yangguang Li, Jianke Zhu	2025-04-25	arXiv	https://github.com/Kizna1ver/DREAM	http://arxiv.org/abs/2504.18053v1
70	LEAM: A Prompt-only Large Language Model-enabled Antenna Modeling Method	Tao Wu, Kexue Fu, Qiang Hua, Xinxin Liu, Muhammad Ali Imran, Bo Liu	2025-04-25	arXiv	https://github.com/TaoWu974/LEAM	http://arxiv.org/abs/2504.18271v1
71	An Empirical Study on Prompt Compression for Large Language Models	Zheng Zhang, Jinyi Li, Yihuai Lan, Xiang Wang, Hao Wang	2025-04-24	arXiv	https://github.com/3DAgentWorld/Toolkit-for-Prompt-Compression	http://arxiv.org/abs/2505.00019v1
72	RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning	Zihan Wang, Kangrui Wang, Qineng Wang, Pingyue Zhang, Linjie Li, Zhengyuan Yang, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, Eli Gottlieb, Monica Lam, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li	2025-04-24	arXiv	https://github.com/RAGEN-AI/RAGEN	http://arxiv.org/abs/2504.20073v1
73	Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs	Tiancheng Gu, Kaicheng Yang, Ziyong Feng, Xingjun Wang, Yanzhao Zhang, Dingkun Long, Yingda Chen, Weidong Cai, Jiankang Deng	2025-04-24	arXiv	https://garygutc.github.io/UniME	http://arxiv.org/abs/2504.17432v1
74	Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark	Hanlei Zhang, Zhuohang Li, Yeshuang Zhu, Hua Xu, Peiwu Wang, Haige Zhu, Jie Zhou, Jinchao Zhang	2025-04-23	arXiv	https://github.com/thuiar/MMLA	http://arxiv.org/abs/2504.16427v2
75	UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models	Yu Zheng, Longyi Liu, Yuming Lin, Jie Feng, Guozhen Zhang, Depeng Jin, Yong Li	2025-04-23	arXiv	https://github.com/tsinghua-fib-lab/PlanBench	http://arxiv.org/abs/2504.21027v1
76	Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control	Hannah Cyberey, David Evans	2025-04-23	arXiv	https://github.com/hannahxchen/llm-censorship-steering	http://arxiv.org/abs/2504.17130v1
77	Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution	Junjie Chen, Haitao Li, Jingli Yang, Yiqun Liu, Qingyao Ai	2025-04-23	arXiv	https://github.com/cjj826/GoalAct	http://arxiv.org/abs/2504.16563v1
78	LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale	Joya Chen, Ziyun Zeng, Yiqi Lin, Wei Li, Zejun Ma, Mike Zheng Shou	2025-04-22	arXiv	https://showlab.github.io/livecc	http://arxiv.org/abs/2504.16030v1
79	PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models	Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Weike Wang, Xudong Tian, Anqi Lv, Laifu Man, Jianxiang Li, Feiyu Tao, Qihua Sun, Zhou Liang, Yushu Mu, Zhongxuan Li, Jing-Jun Zhang, Shutao Zhang, Xiaotian Li, Xingqi Xia, Jiawei Lin, Zheyu Shen, Jiahang Chen, Qiuhao Xiong, Binran Wang, Fengyuan Wang, Ziyang Ni, Bohan Zhang, Fan Cui, Changkun Shao, Qing-Hong Cao, Ming-xing Luo, Muhan Zhang, Hua Xing Zhu	2025-04-22	arXiv	https://phybench-official.github.io/phybench-demo/	http://arxiv.org/abs/2504.16074v1
80	WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents	Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang	2025-04-22	arXiv	https://github.com/elated-sawyer/WALL-E	http://arxiv.org/abs/2504.15785v1
81	CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs	Yingming Zheng, Xiaoliang Liu, Peng Wu, Li Pan	2025-04-21	arXiv	https://github.com/8zym/CRAVE	http://arxiv.org/abs/2504.14905v1
82	EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models	Peng Wang, Ningyu Zhang, Bozhong Tian, Zekun Xi, Yunzhi Yao, Ziwen Xu, Mengru Wang, Shengyu Mao, Xiaohan Wang, Siyuan Cheng, Kangwei Liu, Yuansheng Ni, Guozhou Zheng, Huajun Chen	2025-04-21	arXiv	https://zjunlp.github.io/project/EasyEdit2/video	https://doi.org/10.48550/arXiv.2308.07269
83	Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph	Qiushi Xiong, Zhipeng Xu, Zhenghao Liu, Mengjia Wang, Zulong Chen, Yue Sun, Yu Gu, Xiaohua Li, Ge Yu	2025-04-21	arXiv	https://github.com/NEUIR/MemGraph	http://arxiv.org/abs/2504.14845v1
84	Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators	Yilun Zhou, Austin Xu, Peifeng Wang, Caiming Xiong, Shafiq Joty	2025-04-21	arXiv	https://github.com/SalesforceAIResearch/jetts-benchmark	http://arxiv.org/abs/2504.15253v1
85	IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs	David Ma, Yuanxing Zhang, Jincheng Ren, Jarvis Guo, Yifan Yao, Zhenlin Wei, Zhenzhu Yang, Zhongyuan Peng, Boyu Feng, Jun Ma, Xiao Gu, Zhoufutu Wen, King Zhu, Yancheng He, Meng Cao, Shiwen Ni, Jiaheng Liu, Wenhao Huang, Ge Zhang, Xiaojie Jin	2025-04-21	arXiv	https://github.com/multimodal-art-projection/IV-Bench	http://arxiv.org/abs/2504.15415v1
86	VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models	Weiye Xu, Jiahao Wang, Weiyun Wang, Zhe Chen, Wengang Zhou, Aijun Yang, Lewei Lu, Houqiang Li, Xiaohua Wang, Xizhou Zhu, Wenhai Wang, Jifeng Dai, Jinguo Zhu	2025-04-21	arXiv	https://visulogic-benchmark.github.io/VisuLogic	http://arxiv.org/abs/2504.15279v1
87	NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models	Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang	2025-04-20	arXiv	https://github.com/LawrenceRLiu/NoWag	http://arxiv.org/abs/2504.14569v1
88	Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Tong Zeng, Longfeng Wu, Liang Shi, Dawei Zhou, Feng Guo	2025-04-20	arXiv	https://github.com/tong-zeng/DVBench	http://arxiv.org/abs/2504.14526v1
89	CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations	Man Ho Lam, Chaozheng Wang, Jen-tse Huang, Michael R. Lyu	2025-04-19	arXiv	https://donaldlamnl.github.io/CodeCrash/	http://arxiv.org/abs/2504.14119v1
90	Integrating LLM-Generated Views into Mean-Variance Optimization Using the Black-Litterman Model	Youngbin Lee, Yejin Kim, Suin Kim, Yongjae Lee	2025-04-19	arXiv	https://github.com/youngandbin/LLM-MVO-BLM	http://arxiv.org/abs/2504.14345v1
91	Towards Explainable Fake Image Detection with Multi-Modal Large Language Models	Yikun Ji, Yan Hong, Jiahui Zhan, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang	2025-04-19	arXiv	https://github.com/Gennadiyev/mllm-defake	http://arxiv.org/abs/2504.14245v1
92	Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator	Akshat Ramachandran, Souvik Kundu, Arnab Raha, Shamik Kundu, Deepak K. Mathaikutty, Tushar Krishna	2025-04-19	arXiv	https://github.com/FLOW-open-project/FLOW	http://arxiv.org/abs/2504.14365v1
93	LLM Sensitivity Evaluation Framework for Clinical Diagnosis	Chenwei Yan, Xiangling Fu, Yuxuan Xiong, Tianyi Wang, Siu Cheung Hui, Ji Wu, Xien Liu	2025-04-18	Proceedings of the 31st International Conference on Computational Linguistics, 2025	https://github.com/chenwei23333/DiagnosisQA	http://arxiv.org/abs/2504.13475v1
94	ConExion: Concept Extraction with Large Language Models	Ebrahim Norouzi, Sven Hertling, Harald Sack	2025-04-17	arXiv	https://github.com/ISE-FIZKarlsruhe/concept_extraction	http://arxiv.org/abs/2504.12915v1
95	EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting	Guanrou Yang, Chen Yang, Qian Chen, Ziyang Ma, Wenxi Chen, Wen Wang, Tianrui Wang, Yifan Yang, Zhikang Niu, Wenrui Liu, Fan Yu, Zhihao Du, Zhifu Gao, ShiLiang Zhang, Xie Chen	2025-04-17	arXiv	https://yanghaha0908.github.io/EmoVoice/	http://arxiv.org/abs/2504.12867v1
96	ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition	Hisham A. Alyahya, Haidar Khan, Yazeed Alnumay, M Saiful Bari, Bülent Yener	2025-04-17	arXiv	https://github.com/facebookresearch/ZeroSumEval	http://arxiv.org/abs/2503.10673v1
97	Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration	Yicheng Pan, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Quan Liu, Jianqing Gao, Feng Ma	2025-04-17	arXiv	https://github.com/ycpNotFound/GeoGen	http://arxiv.org/abs/2504.12773v1
98	Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM	Zirui Pan, Xin Wang, Yipeng Zhang, Hong Chen, Kwan Man Cheng, Yaofei Wu, Wenwu Zhu	2025-04-16	arXiv	https://modular-cam.github.io	http://arxiv.org/abs/2504.12048v1
99	d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning	Siyan Zhao, Devaansh Gupta, Qinqing Zheng, Aditya Grover	2025-04-16	arXiv	https://dllm-reasoning.github.io/	http://arxiv.org/abs/2504.12216v1
100	LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA	Xanh Ho, Jiahao Huang, Florian Boudin, Akiko Aizawa	2025-04-16	arXiv	https://github.com/Alab-NII/llm-judge-extract-qa	http://arxiv.org/abs/2504.11972v1
101	HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks	Stefan Abi-Karam, Cong Hao	2025-04-16	arXiv	https://github.com/stefanpie/hls-eval	http://arxiv.org/abs/2504.12268v1
102	A Human-AI Comparative Analysis of Prompt Sensitivity in LLM-Based Relevance Judgment	Negar Arabzadeh, Charles L. A. Clarke	2025-04-16	arXiv	https://github.com/Narabzad/prompt-sensitivity-relevance-judgements/	http://arxiv.org/abs/2504.12408v1
103	MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning	Zhaopeng Feng, Shaosheng Cao, Jiahan Ren, Jiayuan Su, Ruizhe Chen, Yan Zhang, Zhe Xu, Yao Hu, Jian Wu, Zuozhu Liu	2025-04-15	arXiv	https://github.com/fzp0424/MT-R1-Zero	http://arxiv.org/abs/2504.10160v1
104	Using LLMs as prompt modifier to avoid biases in AI image generators	René Peinl	2025-04-15	arXiv	https://iisys-hof.github.io/llm-prompt-img-gen/	http://arxiv.org/abs/2504.11104v1
105	Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From	Changjiang Gao, Hankun Lin, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Jiajun Chen	2025-04-15	arXiv	https://github.com/NJUNLP/Cross-Lingual-Context-Retrieval	http://arxiv.org/abs/2504.10906v1
106	RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-wave Point Cloud Sequence	Zengyuan Lai, Jiarui Yang, Songpengcheng Xia, Lizhou Lin, Lan Sun, Renwen Wang, Jianran Liu, Qi Wu, Ling Pei	2025-04-15	arXiv	https://inowlzy.github.io/RadarLLM/	http://arxiv.org/abs/2504.09862v1
107	Propaganda via AI? A Study on Semantic Backdoors in Large Language Models	Nay Myat Min, Long H. Pham, Yige Li, Jun Sun	2025-04-15	arXiv	https://github.com/NayMyatMin/RAVEN	http://arxiv.org/abs/2504.12344v1
108	Probing then Editing Response Personality of Large Language Models	Tianjie Ju, Zhenyu Shao, Bowen Wang, Yujia Chen, Zhuosheng Zhang, Hao Fei, Mong-Li Lee, Wynne Hsu, Sufeng Duan, Gongshen Liu	2025-04-15	arXiv	https://github.com/universe-sky/probing-then-editing-personality	http://arxiv.org/abs/2504.10227v1
109	LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models	Parshin Shojaee, Ngoc-Hieu Nguyen, Kazem Meidani, Amir Barati Farimani, Khoa D Doan, Chandan K Reddy	2025-04-15	arXiv	https://github.com/deep-symbolic-mathematics/llm-srbench	http://arxiv.org/abs/2504.10415v1
110	Teaching Large Language Models to Reason through Learning and Forgetting	Tianwei Ni, Allen Nie, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, Rasool Fakoor	2025-04-15	arXiv	https://github.com/twni2016/llm-reasoning-uft	http://arxiv.org/abs/2504.11364v1
111	Dynamic Compressing Prompts for Efficient Inference of Large Language Models	Jinwu Hu, Wei Zhang, Yufeng Wang, Yu Hu, Bin Xiao, Mingkui Tan, Qing Du	2025-04-15	arXiv	https://github.com/Fhujinwu/DCP	http://arxiv.org/abs/2504.11004v1
112	A Dual-Space Framework for General Knowledge Distillation of Large Language Models	Xue Zhang, Songming Zhang, Yunlong Liang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou	2025-04-15	arXiv	https://github.com/songmzhang/DSKDv2	http://arxiv.org/abs/2504.11426v1
113	70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float	Tianyi Zhang, Yang Sui, Shaochen Zhong, Vipin Chaudhary, Xia Hu, Anshumali Shrivastava	2025-04-15	arXiv	https://github.com/LeanModels/DFloat11	http://arxiv.org/abs/2504.11651v1
114	LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks	Soumyadeep Pal, Changsheng Wang, James Diffenderfer, Bhavya Kailkhura, Sijia Liu	2025-04-15	arXiv	https://github.com/OPTML-Group/MU-Coreset	http://arxiv.org/abs/2504.10185v2
115	CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates	Ankit Kumar Shaw, Kun Jiang, Tuopu Wen, Chandan Kumar Sah, Yining Shi, Mengmeng Yang, Diange Yang, Xiaoli Lian	2025-04-14	arXiv	https://Ankit-Zefan.github.io/CleanMap/	http://arxiv.org/abs/2504.10738v1
116	ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model	Wuyang Lan, Wenzheng Wang, Changwei Ji, Guoxing Yang, Yongbo Zhang, Xiaohong Liu, Song Wu, Guangyu Wang	2025-04-13	arXiv	https://github.com/medfound/medfound	http://arxiv.org/abs/2504.09421v2
117	Fine-tuning a Large Language Model for Automating Computational Fluid Dynamics Simulations	Zhehao Dong, Zhen Lu, Yue Yang	2025-04-13	arXiv	https://github.com/YYgroup/AutoCFD	http://arxiv.org/abs/2504.09602v2
118	Alleviating the Fear of Losing Alignment in LLM Fine-tuning	Kang Yang, Guanhong Tao, Xun Chen, Jun Xu	2025-04-13	arXiv	https://github.com/kangyangWHU/LLMAlignment	http://arxiv.org/abs/2504.09757v1
119	Can LLM feedback enhance review quality? A randomized study of 20K reviews at ICLR 2025	Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, Animesh Garg, Nanyun Peng, Fei Sha, Rose Yu, Carl Vondrick, James Zou	2025-04-13	arXiv	https://github.com/zou-group/review_feedback_agent	http://arxiv.org/abs/2504.09737v1
120	DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training	Zhenting Wang, Guofeng Cui, Kun Wan, Wentian Zhao	2025-04-13	arXiv	https://github.com/ZhentingWang/DUMP	http://arxiv.org/abs/2504.09710v1
121	HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs	Sharanya Dasgupta, Sujoy Nath, Arkaprabha Basu, Pourya Shamsolmoali, Swagatam Das	2025-04-13	arXiv	https://github.com/sharanya-dasgupta001/hallushift	http://arxiv.org/abs/2504.09482v1
122	How new data permeates LLM knowledge and how to dilute it	Chen Sun, Renat Aksitov, Andrey Zhmoginov, Nolan Andrew Miller, Max Vladymyrov, Ulrich Rueckert, Been Kim, Mark Sandler	2025-04-13	arXiv	https://sunchipsster1.github.io/projects/outlandish/	http://arxiv.org/abs/2504.09522v1
123	SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model	Kaiyu Li, Zepeng Xin, Li Pang, Chao Pang, Yupeng Deng, Jing Yao, Guisong Xia, Deyu Meng, Zhi Wang, Xiangyong Cao	2025-04-13	arXiv	https://github.com/earth-insights/SegEarth-R1	http://arxiv.org/abs/2504.09644v1
124	Span-level Emotion-Cause-Category Triplet Extraction with Instruction Tuning LLMs and Data Augmentation	Xiangju Li, Dong Yang, Xiaogang Zhu, Faliang Huang, Peng Zhang, Zhongying Zhao	2025-04-13	arXiv	https://github.com/zxgnlp/InstruDa-LLM	http://arxiv.org/abs/2504.12331v1
125	Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution	Chenghao Li, Chaoning Zhang, Yi Lu, Jiaquan Zhang, Qigan Sun, Xudong Wang, Jiwei Wei, Guoqing Wang, Yang Yang, Heng Tao Shen	2025-04-13	arXiv	https://github.com/dlMARiA/Syzygy-of-thoughts	http://arxiv.org/abs/2504.09566v2
126	GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation	Lang Lin, Xueyang Yu, Ziqi Pang, Yu-Xiong Wang	2025-04-12	arXiv:2504.07962, 2025	https://glus-video.github.io/	http://arxiv.org/abs/2504.07962v1
127	Revisiting LLM Evaluation through Mechanism Interpretability: a New Metric and Model Utility Law	Yixin Cao, Jiahao Ying, Yaoning Wang, Xipeng Qiu, Xuanjing Huang, Yugang Jiang	2025-04-12	arXiv	https://github.com/ALEX-nlp/MUI-Eva	http://arxiv.org/abs/2504.07440v1
128	LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking	Qi Liu, Haozhe Duan, Yiqun Chen, Quanfeng Lu, Weiwei Sun, Jiaxin Mao	2025-04-12	arXiv	https://github.com/liuqi6777/llm4ranking	http://arxiv.org/abs/2504.07439v1
129	Efficient Tuning of Large Language Models for Knowledge-Grounded Dialogue Generation	Bo Zhang, Hui Ma, Dailin Li, Jian Ding, Jian Wang, Bo Xu, HongFei Lin	2025-04-12	arXiv	https://github.com/zhangbo-nlp/KEDiT	http://arxiv.org/abs/2504.07754v1
130	Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models	Yuxiang Lin, Jingdong Sun, Zhi-Qi Cheng, Jue Wang, Haomin Liang, Zebang Cheng, Yifei Dong, Jun-Yan He, Xiaojiang Peng, Xian-Sheng Hua	2025-04-12	arXiv	https://github.com/Lum1104/EIBench	http://arxiv.org/abs/2504.07521v1
131	From Punchlines to Predictions: A Metric to Assess LLM Performance in Identifying Humor in Stand-Up Comedy	Adrianna Romanowski, Pedro H. V. Valois, Kazuhiro Fukui	2025-04-12	arXiv	https://github.com/swaggirl9000/humor	http://arxiv.org/abs/2504.09049v1
132	Task Memory Engine (TME): A Structured Memory Framework with Graph-Aware Extensions for Multi-Step LLM Agent Tasks	Ye Ye	2025-04-11	arXiv	https://github.com/biubiutomato/TME-Agent	http://arxiv.org/abs/2504.08525v3
133	A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis	Xin Gao, Qizhi Pei, Zinan Tang, Yu Li, Honglin Lin, Jiang Wu, Conghui He, Lijun Wu	2025-04-11	arXiv	https://github.com/GX-XinGao/GRA	http://arxiv.org/abs/2504.12322v1
134	Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric	Yixin Cao, Jiahao Ying, Yaoning Wang, Xipeng Qiu, Xuanjing Huang, Yugang Jiang	2025-04-10	arXiv	https://github.com/ALEX-nlp/MUI-Eva	http://arxiv.org/abs/2504.07440v2
135	Exploring the Effectiveness and Interpretability of Texts in LLM-based Time Series Models	Zhengke Sun, Hangwei Qian, Ivor Tsang	2025-04-09	arXiv	https://github.com/zachysun/TS-Lang-Exp	http://arxiv.org/abs/2504.08808v1
136	V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models	Xiangxi Zheng, Linjie Li, Zhengyuan Yang, Ping Yu, Alex Jinpeng Wang, Rui Yan, Yuan Yao, Lijuan Wang	2025-04-08	arXiv	https://github.com/CSU-JPG/V-MAGE	http://arxiv.org/abs/2504.06148v2
137	LLM×MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources	Haoyu Wang, Yujia Fu, Zhu Zhang, Shuo Wang, Zirui Ren, Xiaorong Wang, Zhili Li, Chaoqun He, Bo An, Zhiyuan Liu, Maosong Sun	2025-04-08	arXiv	https://github.com/thunlp/LLMxMapReduce	http://arxiv.org/abs/2504.05732v1
138	Assessing Thai Dialect Performance in LLMs with Automatic Benchmarks and Human Evaluation	Peerat Limkonchotiwat, Kanruethai Masuk, Surapon Nonesung, Chalermpun Mai-On, Sarana Nutanong, Wuttikorn Ponwitayarat, Potsawee Manakul	2025-04-08	arXiv	https://github.com/mrpeerat/Thai_local_benchmark	http://arxiv.org/abs/2504.05898v1
139	MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models	Pengfei Zhou, Fanrui Zhang, Xiaopeng Peng, Zhaopan Xu, Jiaxin Ai, Yansheng Qiu, Chuanhao Li, Zhen Li, Ming Li, Yukang Feng, Jianwen Sun, Haoquan Zhang, Zizhen Li, Xiaofeng Mao, Wangbo Zhao, Kai Wang, Xiaojun Chang, Wenqi Shao, Yang You, Kaipeng Zhang	2025-04-08	arXiv	https://github.com/LanceZPF/MDK12	http://arxiv.org/abs/2504.05782v1
140	Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models	Yubo Li, Xiaobin Shen, Xinyu Yao, Xueying Ding, Yidi Miao, Ramayya Krishnan, Rema Padman	2025-04-07	arXiv	https://github.com/yubol-cmu/Awesome-Multi-Turn-LLMs	http://arxiv.org/abs/2504.04717v1
141	SEAL: Steerable Reasoning Calibration of Large Language Models for Free	Runjin Chen, Zhenyu Zhang, Junyuan Hong, Souvik Kundu, Zhangyang Wang	2025-04-07	arXiv	https://github.com/VITA-Group/SEAL	http://arxiv.org/abs/2504.07986v1
142	EduPlanner: LLM-Based Multi-Agent Systems for Customized and Intelligent Instructional Design	Xueqiao Zhang, Chao Zhang, Jianwen Sun, Jun Xiao, Yi Yang, Yawei Luo	2025-04-07	arXiv	https://github.com/Zc0812/Edu_Planner	http://arxiv.org/abs/2504.05370v1
143	Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs	Will Cai, Tianneng Shi, Xuandong Zhao, Dawn Song	2025-04-07	arXiv	https://github.com/sunblaze-ucb/llm-api-audit	http://arxiv.org/abs/2504.04715v1
144	Can LLM-Driven Hard Negative Sampling Empower Collaborative Filtering? Findings and Potentials	Chu Zhao, Enneng Yang, Yuting Liu, Jianzhe Zhao, Guibing Guo, Xingwei Wang	2025-04-07	arXiv	https://github.com/user683/HNLMRec	http://arxiv.org/abs/2504.04726v1
145	Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration	Ran Xu, Wenqi Shi, Yuchen Zhuang, Yue Yu, Joyce C. Ho, Haoyu Wang, Carl Yang	2025-04-07	arXiv	https://github.com/ritaranx/Collab-RAG/	http://arxiv.org/abs/2504.04915v1
146	PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters	Zonghang Li, Tao Li, Wenjiao Feng, Mohsen Guizani, Hongfang Yu	2025-04-07	arXiv	https://github.com/Lizonghang/prima.cpp	http://arxiv.org/abs/2504.08791v1
147	ArxivBench: Can LLMs Assist Researchers in Conducting Research?	Ning Li, Jingran Zhang, Justin Cui	2025-04-06	arXiv	https://github.com/arxivBenchLLM/arXivBench	http://arxiv.org/abs/2504.10496v1
148	Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning	Xuerui Su, Shufang Xie, Guoqing Liu, Yingce Xia, Renqian Luo, Peiran Jin, Zhiming Ma, Yue Wang, Zun Wang, Yuting Liu	2025-04-06	arXiv	https://github.com/XueruiSu/Trust-Region-Preference-Approximation	http://arxiv.org/abs/2504.04524v1
149	A Benchmark for End-to-End Zero-Shot Biomedical Relation Extraction with LLMs: Experiments with OpenAI Models	Aviv Brokman, Xuguang Ai, Yuhang Jiang, Shashank Gupta, Ramakanth Kavuluru	2025-04-05	arXiv	https://github.com/bionlproc/ZeroShotRE	http://arxiv.org/abs/2504.04083v1
150	Window Token Concatenation for Efficient Visual Large Language Models	Yifan Li, Wentao Bao, Botao Ye, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong	2025-04-05	arXiv	https://github.com/JackYFL/WiCo	http://arxiv.org/abs/2504.04024v1
151	AiReview: An Open Platform for Accelerating Systematic Reviews with LLMs	Xinyu Mao, Teerapong Leelanupab, Martin Potthast, Harrisen Scells, Guido Zuccon	2025-04-05	arXiv	https://github.com/ielab/ai-review	http://arxiv.org/abs/2504.04193v1
152	A Perplexity and Menger Curvature-Based Approach for Similarity Evaluation of Large Language Models	Yuantao Zhang, Zhankui Yang	2025-04-05	arXiv	https://github.com/zyttt-coder/LLM_similarity	http://arxiv.org/abs/2504.04216v1
153	MSL: Not All Tokens Are What You Need for Tuning LLM as a Recommender	Bohao Wang, Feng Liu, Jiawei Chen, Xingyu Lou, Changwang Zhang, Jun Wang, Yuegang Sun, Yan Feng, Chun Chen, Can Wang	2025-04-05	arXiv	https://github.com/WANGBohaO-jpg/MSL	http://arxiv.org/abs/2504.04178v1
154	VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation	Yuhao Wang, Heyang Liu, Ziyang Cheng, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang	2025-04-05	arXiv	https://github.com/SJTU-OmniAgent/VocalNet	http://arxiv.org/abs/2504.04060v1
155	Align to Structure: Aligning Large Language Models with Structural Information	Zae Myung Kim, Anand Ramachandran, Farideh Tavazoee, Joo-Kyung Kim, Oleg Rokhlenko, Dongyeop Kang	2025-04-04	arXiv	https://github.com/minnesotanlp/struct_align	http://arxiv.org/abs/2504.03622v1
156	EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline	Peter Baile Chen, Tomer Wolfson, Michael Cafarella, Dan Roth	2025-04-04	arXiv	https://peterbaile.github.io/enrichindex/	http://arxiv.org/abs/2504.03598v1
157	AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology	Xiang Feng, Wentao Jiang, Zengmao Wang, Yong Luo, Pingbo Xu, Baosheng Yu, Hua Jin, Bo Du, Jing Zhang	2025-04-03	arXiv	https://github.com/MiliLab/AnesBench	http://arxiv.org/abs/2504.02404v1
158	BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs	Alexander Leszczynski, Sarah Gillet, Iolanda Leite, Fethiye Irmak Dogan	2025-04-03	arXiv	https://github.com/1Eggbert7/BT_LLM	http://arxiv.org/abs/2504.02779v1
159	Measurement of LLM's Philosophies of Human Nature	Minheng Ni, Ennan Wu, Zidong Gong, Zhengyuan Yang, Linjie Li, Chung-Ching Lin, Kevin Lin, Lijuan Wang, Wangmeng Zuo	2025-04-03	arXiv	https://github.com/kodenii/M-PHNS	http://arxiv.org/abs/2504.02304v1
160	ZClip: Adaptive Spike Mitigation for LLM Pre-Training	Abhay Kumar, Louis Owen, Nilabhra Roy Chowdhury, Fabian Güra	2025-04-03	arXiv	https://github.com/bluorion-com/ZClip	http://arxiv.org/abs/2504.02507v1
161	Comment Staytime Prediction with LLM-enhanced Comment Understanding	Changshuo Zhang, Zihan Lin, Shukai Liu, Yongqi Liu, Han Li	2025-04-02	arXiv	https://github.com/lyingCS/KuaiComt.github.io	http://arxiv.org/abs/2504.01602v1
162	OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Joint LLM and GNN Modeling	Heming Zhang, Tim Xu, Dekang Cao, Shunning Liang, Lars Schimmelpfennig, Levi Kaster, Di Huang, Carlos Cruchaga, Guangfu Li, Michael Province, Yixin Chen, Philip Payne, Fuhai Li	2025-04-02	arXiv	https://github.com/FuhaiLiAiLab/OmniCellTOSG	http://arxiv.org/abs/2504.02148v1
163	TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining	Jeffrey Li, Mohammadreza Armandpour, Iman Mirzadeh, Sachin Mehta, Vaishaal Shankar, Raviteja Vemulapalli, Samy Bengio, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari, Fartash Faghri	2025-04-02	arXiv	https://github.com/apple/ml-tic-lm	http://arxiv.org/abs/2504.02107v1
164	MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits	Brandon Radosevich, John Halloran	2025-04-02	arXiv	https://github.com/leidosinc/McpSafetyScanner	http://arxiv.org/abs/2504.03767v1
165	Urban Computing in the Era of Large Language Models	Zhonghang Li, Lianghao Xia, Xubin Ren, Jiabin Tang, Tianyi Chen, Yong Xu, Chao Huang	2025-04-02	arXiv	https://github.com/HKUDS/Awesome-LLM4Urban-Papers	https://doi.org/10.48550/arXiv.2504.02009
166	CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models	Wei Zhou, Yuyang Gao, Xuanhe Zhou, Guoliang Li	2025-04-01	arXiv	https://github.com/weAIDB/CrackSQL	https://doi.org/10.48550/arXiv.2504.00882
167	RECKON: Large-scale Reference-based Efficient Knowledge Evaluation for Large Language Model	Lin Zhang, Zhouhong Gu, Xiaoran Shi, Hongwei Feng, Yanghua Xiao	2025-04-01	arXiv	https://github.com/MikeGu721/reckon	https://doi.org/10.48550/arXiv.2504.00756
168	ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers	Qianhao Yuan, Qingyu Zhang, Yanjiang Liu, Jiawei Chen, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han, Le Sun	2025-04-01	arXiv	https://github.com/icip-cas/ShortV	https://doi.org/10.48550/arXiv.2504.00502
169	m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models	Xiaoke Huang, Juncheng Wu, Hui Liu, Xianfeng Tang, Yuyin Zhou	2025-04-01	arXiv	https://github.com/UCSC-VLAA/m1	https://doi.org/10.48550/arXiv.2504.00869
170	MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs	Juncheng Wu, Wenlong Deng, Xingxuan Li, Sheng Liu, Taomian Mi, Yifan Peng, Ziyang Xu, Yi Liu, Hyunjin Cho, Chang-In Choi, Yihan Cao, Hui Ren, Xiang Li, Xiaoxiao Li, Yuyin Zhou	2025-04-01	arXiv	https://github.com/UCSC-VLAA/MedReason	http://arxiv.org/abs/2504.00993v2
171	When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning	Nishad Singhi, Hritik Bansal, Arian Hosseini, Aditya Grover, Kai-Wei Chang, Marcus Rohrbach, Anna Rohrbach	2025-04-01	arXiv	https://github.com/nishadsinghi/sc-genrm-scaling	http://arxiv.org/abs/2504.01005v1
172	SACA: A Scenario-Aware Collision Avoidance Framework for Autonomous Vehicles Integrating LLMs-Driven Reasoning	Shiyue Zhao, Junzhi Zhang, Neda Masoud, Heye Huang, Xingpeng Xia, Chengkun He	2025-03-31	arXiv	https://sean-shiyuez.github.io/SACA/	http://arxiv.org/abs/2504.00115v1
173	What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models	Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Zhihan Guo, Yufei Wang, Niklas Muennighoff, Irwin King, Xue Liu, Chen Ma	2025-03-31	arXiv	https://github.com/testtimescaling/testtimescaling.github.io/	https://doi.org/10.48550/arXiv.2503.24235
174	SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers	Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang, Lin Gui, Yulan He	2025-03-31	arXiv	https://github.com/xyzCS/SciReplicate-Bench	http://arxiv.org/abs/2504.00255v1
175	LANID: LLM-assisted New Intent Discovery	Lu Fan, Jiashu Pu, Rongsheng Zhang, Xiao-Ming Wu	2025-03-31	arXiv	https://github.com/floatSDSDS/LANID	http://arxiv.org/abs/2503.23740v1
176	Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models	Rui Wang, Hongru Wang, Boyang Xue, Jianhui Pang, Shudong Liu, Yi Chen, Jiahao Qiu, Derek Fai Wong, Heng Ji, Kam-Fai Wong	2025-03-31	arXiv	https://github.com/DevoAllen/Awesome-Reasoning-Economy-Papers	https://doi.org/10.48550/arXiv.2503.24377
177	Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving	Wei Gao, Xinyu Zhou, Peng Sun, Tianwei Zhang, Yonggang Wen	2025-03-31	arXiv	https://github.com/LLMkvsys/rethink-kv-compression	https://doi.org/10.48550/arXiv.2503.24000
178	Text Chunking for Document Classification for Urban System Management using Large Language Models	Joshua Rodriguez, Om Sanan, Guillermo Vizarreta-Luna, Steven A. Conrad	2025-03-31	arXiv	https://github.com/josh-rodriguez-csu/ChunkingforLLMs	https://doi.org/10.48550/arXiv.2504.00274
179	A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?	Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Wenyue Hua, Haolun Wu, Zhihan Guo, Yufei Wang, Niklas Muennighoff, Irwin King, Xue Liu, Chen Ma	2025-03-31	arXiv	https://github.com/testtimescaling/testtimescaling.github.io/	http://arxiv.org/abs/2503.24235v3
180	ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance	Tong Xie, Jiawang Zhao, Zishen Wan, Zuodong Zhang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li	2025-03-31	arXiv	https://github.com/PKU-SEC-Lab/ReaLM_DAC25/	https://doi.org/10.48550/arXiv.2503.24053
181	EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing	Hongxiang Jiang, Jihao Yin, Qixiong Wang, Jiaqi Feng, Guo Chen	2025-03-30	arXiv	https://github.com/XiangTodayEatsWhat/EagleVision	http://arxiv.org/abs/2503.23330v1
182	Agentic Large Language Models, a survey	Aske Plaat, Max J. van Duijn, Niki van Stein, Mike Preuss, Peter van der Putten, Kees Joost Batenburg	2025-03-29	arXiv	https://askeplaat.github.io/agentic-llm-survey-site/	https://doi.org/10.48550/arXiv.2503.23037
183	Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models	Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han	2025-03-28	arXiv	https://github.com/tmlr-group/landscape-of-thoughts	https://doi.org/10.48550/arXiv.2503.22165
184	MediTools -- Medical Education Powered by LLMs	Amr Alshatnawi, Remi Sampaleanu, David Liebovitz	2025-03-28	arXiv	https://github.com/NM-Streamlit-Team/meditools	http://arxiv.org/abs/2503.22769v1
185	A Refined Analysis of Massive Activations in LLMs	Louis Owen, Nilabhra Roy Chowdhury, Abhay Kumar, Fabian Güra	2025-03-28	arXiv	https://github.com/bluorion-com/refine_massive_activations	http://arxiv.org/abs/2503.22329v1
186	QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?	Belinda Z. Li, Been Kim, Zi Wang	2025-03-28	arXiv	https://github.com/google-deepmind/questbench	http://arxiv.org/abs/2503.22674v1
187	SWI: Speaking with Intent in Large Language Models	Yuwei Yin, EunJeong Hwang, Giuseppe Carenini	2025-03-27	arXiv	https://github.com/YuweiYin/SWI	https://doi.org/10.48550/arXiv.2503.21544
188	Ignite Forecasting with SPARK: An Efficient Generative Framework for Refining LLMs in Temporal Knowledge Graph Forecasting	Gongzhu Yin, Hongli Zhang, Yi Luo, Yuchen Yang, Kun Lu, Chao Meng	2025-03-27	arXiv	https://github.com/yin-gz/SPARK	http://arxiv.org/abs/2503.22748v1
189	Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap	Tong Nie, Jian Sun, Wei Ma	2025-03-27	arXiv	https://github.com/tongnie/awesome-llm4tr	https://doi.org/10.48550/arXiv.2503.21411
190	Large Language Model Agent: A Survey on Methodology, Applications and Challenges	Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, Rongcheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang, Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, Dacheng Tao, Philip S. Yu, Ming Zhang	2025-03-27	arXiv	https://github.com/luo-junyu/Awesome-Agent-Papers	https://doi.org/10.48550/arXiv.2503.21460
191	Dynamic Pyramid Network for Efficient Multimodal Large Language Model	Hao Ai, Kunyi Wang, Zezhou Wang, Hao Lu, Jin Tian, Yaxin Luo, Peng Xing, Jen-Yuan Huang, Huaxia Li, Gen Luo	2025-03-26	arXiv	https://github.com/aihao2000/DPN-LLaVA	https://doi.org/10.48550/arXiv.2503.20322
192	Enhancing the Robustness of LLM-Generated Code: Empirical Study and Framework	ZiKe Li, MingWei Liu, Anji Li, Kaifeng He, Yanlin Wang, Xin Peng, Zibin Zheng	2025-03-26	arXiv	https://github.com/SYSUSELab/RobGen	http://arxiv.org/abs/2503.20197v1
193	Leveraging Implicit Sentiments: Enhancing Reliability and Validity in Psychological Trait Evaluation of LLMs	Huanhuan Ma, Haisong Gong, Xiaoyuan Yi, Xing Xie, Dongkuan Xu	2025-03-26	arXiv	https://github.com/dependentsign/CSI	http://arxiv.org/abs/2503.20182v1
194	Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy	Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang	2025-03-26	arXiv	https://github.com/naver-ai/JOOD	http://arxiv.org/abs/2503.20823v1
195	Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations	Haitong Liu, Kuofeng Gao, Yang Bai, Jinmin Li, Jinxiao Shan, Tao Dai, Shu-Tao Xia	2025-03-26	arXiv	https://github.com/ttthhl/Protecting_Your_Video_Content	http://arxiv.org/abs/2503.21824v1
196	LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation	Sarah Martinson, Lingkai Kong, Cheol Woo Kim, Aparna Taneja, Milind Tambe	2025-03-25	arXiv	https://github.com/sarahmart/LLM-ABS-ARMMAN-prediction	http://arxiv.org/abs/2503.22719v1
197	QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition	Yuxuan Hu, Xiaodong Chen, Cuiping Li, Hong Chen, Jing Zhang	2025-03-25	arXiv	https://github.com/hyx1999/Quad	http://arxiv.org/abs/2503.19353v1
198	CoLLM: A Large Language Model for Composed Image Retrieval	Chuong Huynh, Jinyu Yang, Ashish Tawari, Mubarak Shah, Son Tran, Raffay Hamid, Trishul Chilimbi, Abhinav Shrivastava	2025-03-25	arXiv	https://collm-cvpr25.github.io/	https://doi.org/10.48550/arXiv.2503.19910
199	PAVE: Patching and Adapting Video Large Language Models	Zhuoming Liu, Yiquan Li, Khoi Duc Nguyen, Yiwu Zhong, Yin Li	2025-03-25	arXiv	https://github.com/dragonlzm/PAVE	https://doi.org/10.48550/arXiv.2503.19794
200	CEFW: A Comprehensive Evaluation Framework for Watermark in Large Language Models	Shuhao Zhang, Bo Cheng, Jiale Han, Yuli Chen, Zhixuan Wu, Changbao Li, Pingli Gu	2025-03-24	arXiv	https://github.com/DrankXs/BalancedWatermark	https://doi.org/10.48550/arXiv.2503.20802
201	I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders	Andrey V. Galichin, Alexey Dontsov, Polina Druzhinina, Anton Razzhigaev, Oleg Y. Rogov, Elena Tutubalina, Ivan V. Oseledets	2025-03-24	arXiv	https://github.com/AIRI-Institute/SAE-Reasoning	https://doi.org/10.48550/arXiv.2503.18878
202	LLaVAction: evaluating and training multi-modal large language models for action recognition	Shaokai Ye, Haozhe Qi, Alexander Mathis, Mackenzie W. Mathis	2025-03-24	arXiv	https://github.com/AdaptiveMotorControlLab/LLaVAction	https://doi.org/10.48550/arXiv.2503.18712
203	AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration	Zhexuan Wang, Yutong Wang, Xuebo Liu, Liang Ding, Miao Zhang, Jie Liu, Min Zhang	2025-03-24	arXiv	https://github.com/wangzx1219/AgentDropout	http://arxiv.org/abs/2503.18891v1
204	BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache	Dayou Du, Shijie Cao, Jianyi Cheng, Ting Cao, Mao Yang	2025-03-24	arXiv	https://github.com/DD-DuDa/BitDecoding	http://arxiv.org/abs/2503.18773v1
205	Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?	Aabid Karim, Abdul Karim, Bhoomika Lohana, Matt Keon, Jaswinder Singh, Abdul Sattar	2025-03-23	arXiv	https://github.com/akarim23131/Lost_in_Cultural_Translation	http://arxiv.org/abs/2503.18018v1
206	Reasoning with LLMs for Zero-Shot Vulnerability Detection	Arastoo Zibaeirad, Marco Vieira	2025-03-22	arXiv	https://github.com/Erroristotle/VulnSage	http://arxiv.org/abs/2503.17885v1
207	Safe RLHF-V: Safe Reinforcement Learning from Human Feedback in Multimodal Large Language Models	Jiaming Ji, Xinyu Chen, Rui Pan, Han Zhu, Conghui Zhang, Jiahao Li, Donghai Hong, Boyuan Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, Yaodong Yang	2025-03-22	arXiv	https://github.com/SafeRLHF-V	https://doi.org/10.48550/arXiv.2503.17682
208	RAIDER: Tool-Equipped Large Language Model Agent for Robotic Action Issue Detection, Explanation and Recovery	Silvia Izquierdo-Badiola, Carlos Rizzo, Guillem Alenyà	2025-03-22	arXiv	https://raider-llmagent.github.io/	https://doi.org/10.48550/arXiv.2503.17703
209	LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language	Kun Chu, Xufeng Zhao, Cornelius Weber, Stefan Wermter	2025-03-21	arXiv	https://github.com/Kchu/LLM-MAP	https://doi.org/10.48550/arXiv.2503.17309
210	TEMPLE: Temporal Preference Learning of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment	Shicheng Li, Lei Li, Kun Ouyang, Shuhuai Ren, Yuanxin Liu, Yuanxing Zhang, Fuzheng Zhang, Lingpeng Kong, Qi Liu, Xu Sun	2025-03-21	arXiv	https://github.com/lscpku/TEMPLE	http://arxiv.org/abs/2503.16929v2
211	Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique	Yansi Li, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Qiuzhi Liu, Rui Wang, Zhuosheng Zhang, Zhaopeng Tu, Haitao Mi, Dong Yu	2025-03-21	arXiv	https://github.com/puddingyeah/PANEL	http://arxiv.org/abs/2503.17363v1
212	RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation	Linxi Liang, Jing Gong, Mingwei Liu, Chong Wang, Guangsheng Ou, Yanlin Wang, Xin Peng, Zibin Zheng	2025-03-21	arXiv	https://github.com/SYSUSELab/RustEvo	http://arxiv.org/abs/2503.16922v1
213	Variance Control via Weight Rescaling in LLM Pre-training	Louis Owen, Abhay Kumar, Nilabhra Roy Chowdhury, Fabian Güra	2025-03-21	arXiv	https://github.com/bluorion-com/weight_rescaling	http://arxiv.org/abs/2503.17500v1
214	MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion	Qizhi Pei, Lijun Wu, Zhuoshi Pan, Yu Li, Honglin Lin, Chenlin Ming, Xin Gao, Conghui He, Rui Yan	2025-03-20	arXiv	https://github.com/QizhiPei/mathfusion	http://arxiv.org/abs/2503.16212v1
215	Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't	Quy-Anh Dang, Chris Ngo	2025-03-20	arXiv	https://github.com/knoveleng/open-rs	http://arxiv.org/abs/2503.16219v1
216	The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination	Yifan Sun, Han Wang, Dongbai Li, Gang Wang, Huan Zhang	2025-03-20	arXiv	https://github.com/ASTRAL-Group/BDC_mitigation_assessment	http://arxiv.org/abs/2503.16402v1
217	Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models	Zhihang Liu, Chen-Wei Xie, Pandeng Li, Liming Zhao, Longxiang Tang, Yun Zheng, Chuanbin Liu, Hongtao Xie	2025-03-20	arXiv	https://github.com/lntzm/HICom	https://doi.org/10.48550/arXiv.2503.16036
218	Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models	Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen, Xia Ben Hu	2025-03-20	arXiv	https://github.com/Eclipsess/Awesome-Efficient-Reasoning-LLMs	https://doi.org/10.48550/arXiv.2503.16419
219	Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning	Zhaowei Liu, Xin Guo, Fangqi Lou, Lingfeng Zeng, Jinyi Niu, Zixuan Wang, Jiajie Xu, Weige Cai, Ziwei Yang, Xueqian Zhao, Chao Li, Sheng Xu, Dezhi Chen, Yun Chen, Zuo Bai, Liwen Zhang	2025-03-20	arXiv	https://github.com/SUFE-AIFLM-Lab/Fin-R1	https://doi.org/10.48550/arXiv.2503.16252
220	Exploring Large Language Models for Word Games: Who is the Spy?	Chentian Wei, Jiewei Chen, Jinzhu Xu	2025-03-19	arXiv	https://github.com/ct-wei/Who-is-The-Spy	https://doi.org/10.48550/arXiv.2503.15235
221	LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning	Federico Cocchi, Nicholas Moratelli, Davide Caffagni, Sara Sarto, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara	2025-03-19	arXiv	https://github.com/aimagelab/LLaVA-MORE	http://arxiv.org/abs/2503.15621v1
222	VisNumBench: Evaluating Number Sense of Multimodal Large Language Models	Tengjin Weng, Jingyi Wang, Wenhao Jiang, Zhong Ming	2025-03-19	arXiv	https://wwwtttjjj.github.io/VisNumBench/	https://doi.org/10.48550/arXiv.2503.14939
223	Aligning Multimodal LLM with Human Preference: A Survey	Tao Yu, Yi-Fan Zhang, Chaoyou Fu, Junkang Wu, Jinda Lu, Kun Wang, Xingyu Lu, Yunhang Shen, Guibin Zhang, Dingjie Song, Yibo Yan, Tianlong Xu, Qingsong Wen, Zhang Zhang, Yan Huang, Liang Wang, Tieniu Tan	2025-03-18	arXiv	https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Alignment	http://arxiv.org/abs/2503.14504v1
224	CodingGenie: A Proactive LLM-Powered Programming Assistant	Sebastian Zhao, Alan Zhu, Hussein Mozannar, David Sontag, Ameet Talwalkar, Valerie Chen	2025-03-18	arXiv	https://github.com/sebzhao/CodingGenie/	http://arxiv.org/abs/2503.14724v1
225	Learning on LLM Output Signatures for gray-box LLM Behavior Analysis	Guy Bar-Shalom, Fabrizio Frasca, Derek Lim, Yoav Gelberg, Yftah Ziser, Ran El-Yaniv, Gal Chechik, Haggai Maron	2025-03-18	arXiv	https://github.com/BarSGuy/LLM-Output-Signatures-Network	http://arxiv.org/abs/2503.14043v1
226	Word2Minecraft: Generating 3D Game Levels through Large Language Models	Shuo Huang, Muhammad Umair Nasir, Steven James, Julian Togelius	2025-03-18	arXiv	https://github.com/JMZ-kk/Word2Minecraft/tree/word2mc_v0	https://doi.org/10.48550/arXiv.2503.16536
227	SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability	Jiankang Wang, Zhihan Zhang, Zhihang Liu, Yang Li, Jiannan Ge, Hongtao Xie, Yongdong Zhang	2025-03-18	arXiv	https://github.com/Jayce1kk/SpaceVLLM	https://doi.org/10.48550/arXiv.2503.13983
228	Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning	Junming Liu, Siyuan Meng, Yanting Gao, Song Mao, Pinlong Cai, Guohang Yan, Yirong Chen, Zilin Bian, Botian Shi, Ding Wang	2025-03-17	arXiv	https://github.com/Wings-Of-Disaster/VaLiK	http://arxiv.org/abs/2503.12972v1
229	Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos	Chiara Plizzari, Alessio Tonioni, Yongqin Xian, Achin Kulshrestha, Federico Tombari	2025-03-17	arXiv	https://github.com/google-research-datasets/egotempo	http://arxiv.org/abs/2503.13646v1
230	xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference	Maximilian Beck, Korbinian Pöppel, Phillip Lippe, Richard Kurle, Patrick M. Blies, Günter Klambauer, Sebastian Böck, Sepp Hochreiter	2025-03-17	arXiv	https://github.com/NX-AI/xlstm	http://arxiv.org/abs/2503.13427v1
231	NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models	Sung-Yeon Park, Can Cui, Yunsheng Ma, Ahmadreza Moradipari, Rohit Gupta, Kyungtae Han, Ziran Wang	2025-03-17	arXiv	https://github.com/sungyeonparkk/NuPlanQA	https://doi.org/10.48550/arXiv.2503.12772
232	A Survey on the Memory Mechanism of Large Language Model based Agents	Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, Ji-Rong Wen	2025-03-16	arXiv	https://github.com/nuster1128/LLM_Agent_Memory_Survey	https://doi.org/10.48550/arXiv.2404.13501
233	SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression	Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, Mi Zhang	2025-03-16	arXiv	https://github.com/AIoT-MLSys-Lab/SVD-LLM	https://doi.org/10.48550/arXiv.2503.12340
234	HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs	Tsz Chung Cheng, Chung Shing Cheng, Chaak Ming Lau, Eugene Tin-Ho Lam, Chun Yat Wong, Hoi On Yu, Cheuk Hei Chong	2025-03-16	arXiv	https://github.com/hon9kon9ize/hkeval2025	http://arxiv.org/abs/2503.12440v1
235	Plausibility Vaccine: Injecting LLM Knowledge for Event Plausibility	Jacob Chmura, Jonah Dauvet, Sebastian Sabry	2025-03-16	arXiv	https://github.com/Jacob-Chmura/plausibility-vaccine	http://arxiv.org/abs/2503.12667v1
236	FAILS: A Framework for Automated Collection and Analysis of LLM Service Incidents	Sándor Battaglini-Fischer, Nishanthi Srinivasan, Bálint László Szarvas, Xiaoyu Chu, Alexandru Iosup	2025-03-15	HotCloudPerf 2025	https://github.com/atlarge-research/FAILS	http://arxiv.org/abs/2503.12185v1
237	MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling	Zhaopeng Feng, Jiahan Ren, Jiayuan Su, Jiamei Zheng, Zhihang Tang, Hongwei Wang, Zuozhu Liu	2025-03-15	arXiv	https://sabijun.github.io/MT_RewardTreePage	http://arxiv.org/abs/2503.12123v1
238	An LLM-Integrated Framework for Completion, Management, and Tracing of STPA	Ali Raeisdanaei, Juho Kim, Michael Liao, Sparsh Kochhar	2025-03-15	arXiv	https://github.com/blueskysolarracing/stpa	http://arxiv.org/abs/2503.12043v1
239	A Survey on Federated Fine-tuning of Large Language Models	Yebo Wu, Chunlin Tian, Jingguang Li, He Sun, Kahou Tam, Li Li, Chengzhong Xu	2025-03-15	arXiv	https://github.com/Clin0212/Awesome-Federated-LLM-Learning	https://doi.org/10.48550/arXiv.2503.12016
240	CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning	Hao Cui, Zahra Shamsi, Gowoon Cheon, Xuejian Ma, Shutong Li, Maria Tikhanovskaya, Peter Norgaard, Nayantara Mudur, Martyna Plomecka, Paul Raccuglia, Yasaman Bahri, Victor V. Albert, Pranesh Srinivasan, Haining Pan, Philippe Faist, Brian Rohr, Ekin Dogus Cubuk, Muratahan Aykol, Amil Merchant, Michael J. Statt, Dan Morris, Drew Purves, Elise Kleeman, Ruth Alcantara, Matthew Abraham, Muqthar Mohammad, Ean Phing VanLee, Chenfei Jiang, Elizabeth Dorfman, Eun-Ah Kim, Michael P Brenner, Viren Jain, Sameera Ponda, Subhashini Venugopalan	2025-03-14	arXiv	https://github.com/google/curie	http://arxiv.org/abs/2503.13517v2
241	FastVID: Dynamic Density Pruning for Fast Video Large Language Models	Leqi Shen, Guoqiang Gong, Tao He, Yifeng Zhang, Pengzhang Liu, Sicheng Zhao, Guiguang Ding	2025-03-14	arXiv	https://github.com/LunarShen/FastVID	https://doi.org/10.48550/arXiv.2503.11187
242	Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space	Weichen Zhan, Zile Zhou, Zhiheng Zheng, Chen Gao, Jinqiang Cui, Yong Li, Xinlei Chen, Xiao-Ping Zhang	2025-03-14	arXiv	https://github.com/WeichenZh/Open3DVQA	https://doi.org/10.48550/arXiv.2503.11094
243	ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning	Xinyi Wang, Jiashui Wang, Peng Chen, Jinbo Su, Yanming Liu, Long Liu, Yangdong Wang, Qiyuan Chen, Kai Yun, Chunfu Jia	2025-03-14	arXiv	https://github.com/wxy3596/ASMA-Tune	http://arxiv.org/abs/2503.11617v1
244	Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs using Semantic Space	Zhiliang Chen, Xinyuan Niu, Chuan-Sheng Foo, Bryan Kian Hsiang Low	2025-03-14	arXiv	https://github.com/chenzhiliang94/convo-plan-SCOPE	http://arxiv.org/abs/2503.11586v1
245	MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens	Jeong Hun Yeo, Hyeongseop Rha, Se Jin Park, Yong Man Ro	2025-03-14	arXiv	https://github.com/JeongHun0716/MMS-LLaMA	http://arxiv.org/abs/2503.11315v1
246	TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models	Xudong Tan, Peng Ye, Chongjun Tu, Jianjian Cao, Yaoxin Yang, Lin Zhang, Dongzhan Zhou, Tao Chen	2025-03-13	arXiv	https://github.com/ShawnTan86/TokenCarve	https://doi.org/10.48550/arXiv.2503.10501
247	ZeroMerge: Parameter-Free KV Cache Compression for Memory-Efficient Long-Context LLMs	Xin Liu, Pei Liu, Guoming Tang	2025-03-13	arXiv	https://github.com/SusCom-Lab/ZeroMerge	http://arxiv.org/abs/2503.10714v1
248	RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs	Zhongzhan Huang, Guoming Ling, Vincent S. Liang, Yupei Lin, Yandong Chen, Shanshan Zhong, Hefeng Wu, Liang Lin	2025-03-13	arXiv	https://github.com/MilkThink-Lab/RouterEval	http://arxiv.org/abs/2503.10657v1
249	Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set	Florian Eichin, Yang Janet Liu, Barbara Plank, Michael A. Hedderich	2025-03-13	arXiv	https://github.com/mainlp/discourse_probes	http://arxiv.org/abs/2503.10515v1
250	ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs	Xin Liu, Pei Liu, Guoming Tang	2025-03-13	arXiv	https://github.com/SusCom-Lab/ZSMerge	http://arxiv.org/abs/2503.10714v2
251	Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs	Ariba Khan, Stephen Casper, Dylan Hadfield-Menell	2025-03-13	arXiv	https://github.com/ariba-k/llm-cultural-alignment-evaluation	http://arxiv.org/abs/2503.08688v1
252	Route Sparse Autoencoder to Interpret Large Language Models	Wei Shi, Sihang Li, Tao Liang, Mingyang Wan, Guojun Ma, Xiang Wang, Xiangnan He	2025-03-13	arXiv	https://github.com/swei2001/RouteSAEs	https://doi.org/10.48550/arXiv.2503.08200
253	OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model	Bowen Zhang, Pengcheng Luo	2025-03-13	arXiv	https://github.com/bwz96sco/or_llm_agent	https://doi.org/10.48550/arXiv.2503.10009
254	Learning to Inference Adaptively for Multimodal Large Language Models	Zhuoyan Xu, Khoi Duc Nguyen, Preeti Mukherjee, Saurabh Bagchi, Somali Chaterji, Yingyu Liang, Yin Li	2025-03-13	arXiv	https://zhuoyan-xu.github.io/ada-llava/	https://doi.org/10.48550/arXiv.2503.10905
255	Adapting Large Language Models for Parameter-Efficient Log Anomaly Detection	Ying Fu Lim, Jiawen Zhu, Guansong Pang	2025-03-13	arXiv	https://github.com/mala-lab/LogADReft	https://doi.org/10.48550/arXiv.2503.08045
256	4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models	Wanhua Li, Renping Zhou, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister	2025-03-13	arXiv	https://4d-langsplat.github.io	https://doi.org/10.48550/arXiv.2503.10437
257	Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs	Jiani Huang, Shijie Wang, Liang-bo Ning, Wenqi Fan, Shuaiqiang Wang, Dawei Yin, Qing Li	2025-03-12	arXiv	https://github.com/jiani-huang/RecBench	http://arxiv.org/abs/2503.09382v1
258	RetSTA: An LLM-Based Approach for Standardizing Clinical Fundus Image Reports	Jiushen Cai, Weihang Zhang, Hanruo Liu, Ningli Wang, Huiqi Li	2025-03-12	arXiv	https://github.com/AB-Story/RetSTA-7B	http://arxiv.org/abs/2503.09358v1
259	What's In Your Field? Mapping Scientific Research with Knowledge Graphs and Large Language Models	Abhipsha Das, Nicholas Lourie, Siavash Golkar, Mariel Pettee	2025-03-12	arXiv	https://github.com/chiral-carbon/kg-for-science	http://arxiv.org/abs/2503.09894v1
260	Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents	Dongjun Lee, Juyong Lee, Kyuyoung Kim, Jihoon Tack, Jinwoo Shin, Yee Whye Teh, Kimin Lee	2025-03-12	arXiv	https://lcowiclr2025.github.io	http://arxiv.org/abs/2503.10689v1
261	CyberLLMInstruct: A New Dataset for Analysing Safety of Fine-Tuned LLMs Using Cyber Security Data	Adel ElZemity, Budi Arief, Shujun Li	2025-03-12	arXiv	https://github.com/Adelsamir01/CyberLLMInstruct	http://arxiv.org/abs/2503.09334v1
262	CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection	Richard A. Dubniczky, Krisztofer Zoltán Horvát, Tamás Bisztray, Mohamed Amine Ferrag, Lucas C. Cordeiro, Norbert Tihanyi	2025-03-12	arXiv	https://github.com/CASTLE-Benchmark	http://arxiv.org/abs/2503.09433v1
263	Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models	Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wangxiang Che	2025-03-12	arXiv	https://long-cot.github.io/	https://doi.org/10.48550/arXiv.2503.09567
264	MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization	Zongwu Wang, Peng Xu, Fangxin Liu, Yiwei Hu, Qingxiao Sun, Gezi Li, Cheng Li, Xuan Wang, Li Jiang, Haibing Guan	2025-03-12	arXiv	https://github.com/ZongwuWang/MILLION	http://arxiv.org/abs/2504.03661v1
265	BYOS: Knowledge-driven Large Language Models Bring Your Own Operating System More Excellent	Hongyu Lin, Yuchen Li, Haoran Luo, Kaichun Yao, Libo Zhang, Mingjie Xing, Yanjun Wu	2025-03-12	arXiv	https://github.com/LHY-24/BYOS	https://doi.org/10.48550/arXiv.2503.09663
266	Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning	Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, Jiawei Han	2025-03-12	arXiv	https://github.com/PeterGriffinJin/Search-R1	http://arxiv.org/abs/2503.09516v1
267	NVP-HRI: Zero shot natural voice and posture-based human-robot interaction via large language model	Yuzhi Lai, Shenghai Yuan, Youssef Nassar, Mingyu Fan, Thomas Weber, Matthias Rätsch	2025-03-12	Expert Syst. Appl.	https://github.com/laiyuzhi/NVP-HRI	https://doi.org/10.1016/j.eswa.2024.126360
268	Process-Supervised LLM Recommenders via Flow-guided Tuning	Chongming Gao, Mengyao Gao, Chenxiao Fan, Shuai Yuan, Wentao Shi, Xiangnan He	2025-03-11	arXiv	https://github.com/Mr-Peach0301/Flower	http://arxiv.org/abs/2503.07377v1
269	Enhancing Large Language Models for Hardware Verification: A Novel SystemVerilog Assertion Dataset	Anand Menon, Samit S. Miftah, Shamik Kundu, Souvik Kundu, Amisha Srivastava, Arnab Raha, Gabriel Theodor Sonnenschein, Suvadeep Banerjee, Deepak Mathaikutty, Kanad Basu	2025-03-11	arXiv	https://github.com/AnandMenon12/VERT	https://doi.org/10.48550/arXiv.2503.08923
270	V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation	Guiwei Zhang, Tianyu Zhang, Mohan Zhou, Yalong Bai, Biye Li	2025-03-11	arXiv	https://github.com/zhangguiwei610/V2Flow	https://doi.org/10.48550/arXiv.2503.07493
271	DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs	Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang, Ilya Zharkov, Se-Young Yun	2025-03-11	arXiv	https://github.com/jongwooko/distillm-2	http://arxiv.org/abs/2503.07067v1
272	ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration	Mengting Ai, Tianxin Wei, Yifan Chen, Zhichen Zeng, Ritchie Zhao, Girish Varatkar, Bita Darvish Rouhani, Xianfeng Tang, Hanghang Tong, Jingrui He	2025-03-11	arXiv	https://github.com/iDEA-iSAIL-Lab-UIUC/ResMoE	http://arxiv.org/abs/2503.06881v1
273	Graphormer-Guided Task Planning: Beyond Static Rules with LLM Safety Perception	Wanjing Huang, Tongjie Pan, Yalan Ye	2025-03-11	arXiv	https://github.com/hwj20/GGTP	http://arxiv.org/abs/2503.06866v1
274	Roamify: Designing and Evaluating an LLM Based Google Chrome Extension for Personalised Itinerary Planning	Vikranth Udandarao, Noel Abraham Tiju, Muthuraj Vairamuthu, Harsh Mistry, Dhruv Kumar	2025-03-10	arXiv	https://github.com/Roamify-Research/Roamify	http://arxiv.org/abs/2504.10489v1
275	AutoMisty: A Multi-Agent LLM Framework for Automated Code Generation in the Misty Social Robot	Xiao Wang, Lu Dong, Sahana Rangasrinivasan, Ifeoma Nwogu, Srirangaraj Setlur, Venugopal Govindaraju	2025-03-09	arXiv	https://wangxiaoshawn.github.io/AutoMisty.html	http://arxiv.org/abs/2503.06791v1
276	How LLMs Learn: Tracing Internal Representations with Sparse Autoencoders	Tatsuro Inaba, Kentaro Inui, Yusuke Miyao, Yohei Oseki, Benjamin Heinzerling, Yu Takagi	2025-03-09	arXiv	https://github.com/llm-jp/llm-jp-sae	http://arxiv.org/abs/2503.06394v1
277	DSGBench: A Diverse Strategic Game Benchmark for Evaluating LLM-based Agents in Complex Decision-Making Environments	Wenjie Tang, Yuan Zhou, Erqiang Xu, Keyan Cheng, Minne Li, Liquan Xiao	2025-03-08	arXiv	https://github.com/DeciBrain-Group/DSGBench	http://arxiv.org/abs/2503.06047v1
278	Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices	Junyan Lin, Haoran Chen, Yue Fan, Yingqi Fan, Xin Jin, Hui Su, Jinlan Fu, Xiaoyu Shen	2025-03-08	arXiv	https://github.com/EIT-NLP/Layer_Select_Fuse_for_MLLM	http://arxiv.org/abs/2503.06063v1
279	SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?	Xudong Lu, Haohao Gao, Renshou Wu, Shuai Ren, Xiaoxin Chen, Hongsheng Li, Fangyuan Li	2025-03-08	arXiv	https://github.com/Lucky-Lance/SmartBench	http://arxiv.org/abs/2503.06029v1
280	Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching	Simon A. Aytes, Jinheon Baek, Sung Ju Hwang	2025-03-07	arXiv	https://www.github.com/SimonAytes/SoT	http://arxiv.org/abs/2503.05179v1
281	RocketEval: Efficient Automated LLM Evaluation via Grading Checklist	Tianjun Wei, Wei Wen, Ruizhi Qiao, Xing Sun, Jianghong Ma	2025-03-07	arXiv	https://github.com/Joinn99/RocketEval-ICLR	http://arxiv.org/abs/2503.05142v1
282	A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval	Yu Zhang, Shutong Qiao, Jiaqi Zhang, Tzu-Heng Lin, Chen Gao, Yong Li	2025-03-07	arXiv	https://github.com/tsinghua-fib-lab/LLM-Agent-for-Recommendation-and-Search	https://doi.org/10.48550/arXiv.2503.05659
283	Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching	Bowen Pang, Kai Li, Feifan Wang	2025-03-07	arXiv	https://github.com/KevinLee1110/dynamic-batching	http://arxiv.org/abs/2503.05248v1
284	TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge	Cheng-Han Chiang, Hung-yi Lee, Michal Lukasik	2025-03-06	arXiv	https://github.com/d223302/TRACT	http://arxiv.org/abs/2503.04381v1
285	Insights from Rights and Wrongs: A Large Language Model for Solving Assertion Failures in RTL Design	Jie Zhou, Youshu Ji, Ning Wang, Yuchen Hu, Xinyao Jiao, Bingkun Yao, Xinwei Fang, Shuai Zhao, Nan Guan, Zhe Jiang	2025-03-06	arXiv	https://github.com/SEU-ACAL/reproduce-AssertSolver-DAC-25	https://doi.org/10.48550/arXiv.2503.04057
286	Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model	Wenke Huang, Jian Liang, Xianda Guo, Yiyang Fang, Guancheng Wan, Xuankun Rong, Chi Wen, Zekun Shi, Qingyun Li, Didi Zhu, Yanbiao Ma, Ke Liang, Bin Yang, He Li, Jiawei Shao, Mang Ye, Bo Du	2025-03-06	arXiv	https://github.com/WenkeHuang/Awesome-MLLM-Tuning	https://doi.org/10.48550/arXiv.2503.04543
287	Predictable Scale: Part I - Optimal Hyperparameter Scaling Law in Large Language Model Pretraining	Houyi Li, Wenzheng Zheng, Jingcheng Hu, Qiufeng Wang, Hanshan Zhang, Zili Wang, Shijie Xuyang, Yuantao Fan, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang	2025-03-06	arXiv	https://step-law.github.io/	https://doi.org/10.48550/arXiv.2503.04715
288	Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation	Armel Zebaze, Benoît Sagot, Rachel Bawden	2025-03-06	arXiv	https://github.com/ArmelRandy/compositional-translation	http://arxiv.org/abs/2503.04554v1
289	DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation	Amin Karimi, Charalambos Poullis	2025-03-06	arXiv	https://github.com/aminpdik/DSV-LFS	http://arxiv.org/abs/2503.04006v1
290	Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English	Runtao Zhou, Guangya Wan, Saadia Gabriel, Sheng Li, Alexander J Gates, Maarten Sap, Thomas Hartvigsen	2025-03-06	arXiv	https://github.com/Runtaozhou/dialect_bias_eval	http://arxiv.org/abs/2503.04099v1
291	Lost in Literalism: How Supervised Training Shapes Translationese in LLMs	Yafu Li, Ronghao Zhang, Zhilin Wang, Huajian Zhang, Leyang Cui, Yongjing Yin, Tong Xiao, Yue Zhang	2025-03-06	arXiv	https://github.com/yafuly/LLM_Translationese	http://arxiv.org/abs/2503.04369v1
292	AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks	Javier Yong, Haokai Ma, Yunshan Ma, Anis Yusof, Zhenkai Liang, Ee-Chien Chang	2025-03-05	arXiv	https://github.com/Javiery3889/AttackSeqBench	https://doi.org/10.48550/arXiv.2503.03170
293	LeRAAT: LLM-Enabled Real-Time Aviation Advisory Tool	Marc R. Schlichting, Vale Rasmussen, Heba Alazzeh, Houjun Liu, Kiana Jafari, Amelia F. Hardy, Dylan M. Asmar, Mykel J. Kochenderfer	2025-03-05	arXiv	https://github.com/sisl/LeRAAT/	http://arxiv.org/abs/2503.16477v1
294	Multi-Agent Systems Powered by Large Language Models: Applications in Swarm Intelligence	Cristian Jimenez-Romero, Alper Yegenoglu, Christian Blum	2025-03-05	arXiv	https://github.com/crjimene/swarm_gpt	https://doi.org/10.48550/arXiv.2503.03800
295	Improving LLM Safety Alignment with Dual-Objective Optimization	Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song	2025-03-05	arXiv	https://github.com/wicai24/DOOR-Alignment	http://arxiv.org/abs/2503.03710v1
296	LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models	Xi Zhu, Haochen Xue, Ziwei Zhao, Wujiang Xu, Jingyuan Huang, Minghao Guo, Qifan Wang, Kaixiong Zhou, Yongfeng Zhang	2025-03-05	arXiv	https://github.com/agiresearch/PromptGFM	http://arxiv.org/abs/2503.03313v1
297	ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks	Heng Zhou, Hejia Geng, Xiangyuan Xue, Zhenfei Yin, Lei Bai	2025-03-04	arXiv	https://github.com/hengzzzhou/ReSo	http://arxiv.org/abs/2503.02390v2
298	Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs	Yuzhe Gu, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen	2025-03-04	arXiv	https://github.com/open-compass/ANAH	http://arxiv.org/abs/2503.02846v1
299	Wikipedia in the Era of LLMs: Evolution and Risks	Siming Huang, Yuliang Xu, Mingmeng Geng, Yao Wan, Dongping Chen	2025-03-04	arXiv	https://github.com/HSM316/LLM_Wikipedia	http://arxiv.org/abs/2503.02879v1
300	Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization	Yilun Qiu, Xiaoyan Zhao, Yang Zhang, Yimeng Bai, Wenjie Wang, Hong Cheng, Fuli Feng, Tat-Seng Chua	2025-03-04	arXiv	https://github.com/SnowCharmQ/DPL	http://arxiv.org/abs/2503.02450v1
301	Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs' Decoding Layers	Zicong He, Boxuan Zhang, Lu Cheng	2025-03-04	arXiv	https://github.com/ZicongHe2002/HCL-Spark	http://arxiv.org/abs/2503.02851v1
302	It Helps to Take a Second Opinion: Teaching Smaller LLMs to Deliberate Mutually via Selective Rationale Optimisation	Sohan Patnaik, Milan Aggarwal, Sumit Bhatia, Balaji Krishnamurthy	2025-03-04	arXiv	https://github.com/Sohanpatnaik106/coalition	http://arxiv.org/abs/2503.02463v1
303	PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models	Xueliang Zhao, Wei Wu, Jian Guan, Lingpeng Kong	2025-03-04	arXiv	https://github.com/zhaoxlpku/PromptCoT	https://doi.org/10.48550/arXiv.2503.02324
304	LoRA-Null: Low-Rank Adaptation via Null Space for Large Language Models	Pengwei Tang, Yong Liu, Dongjie Zhang, Xing Wu, Debing Zhang	2025-03-04	arXiv	https://github.com/HungerPWAY/LoRA-Null	https://doi.org/10.48550/arXiv.2503.02659
305	Haste Makes Waste: Evaluating Planning Abilities of LLMs for Efficient and Feasible Multitasking with Time Constraints Between Actions	Zirui Wu, Xiao Liu, Jiayi Li, Lingpeng Kong, Yansong Feng	2025-03-04	arXiv	https://github.com/WilliamZR/Recipe2Plan	http://arxiv.org/abs/2503.02238v1
306	Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs	Wei-Yao Wang, Zhao Wang, Helen Suzuki, Yoshiyuki Kobayashi	2025-03-04	arXiv	https://github.com/sony/aki	http://arxiv.org/abs/2503.02597v1
307	CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom	Yisen Li, Lingfeng Yang, Wenxuan Shen, Pan Zhou, Yao Wan, Weiwei Lin, Dongping Chen	2025-03-03	arXiv	https://github.com/listentm/crowdselect	http://arxiv.org/abs/2503.01836v1
308	Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia	Chenxi Wang, Tianle Gu, Zhongyu Wei, Lang Gao, Zirui Song, Xiuying Chen	2025-03-03	arXiv	https://github.com/Aurora-cx/TypoLLM	http://arxiv.org/abs/2503.01714v1
309	Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens	Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo, Wei Xue	2025-03-03	arXiv	https://github.com/SparkAudio/Spark-TTS	http://arxiv.org/abs/2503.01710v1
310	Liger: Linearizing Large Language Models to Gated Recurrent Structures	Disen Lan, Weigao Sun, Jiaxi Hu, Jusen Du, Yu Cheng	2025-03-03	arXiv	https://github.com/OpenSparseLLMs/Linearization	https://doi.org/10.48550/arXiv.2503.01496
311	MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents	Kunlun Zhu, Hongyi Du, Zhaochen Hong, Xiaocheng Yang, Shuyi Guo, Zhe Wang, Zhenhailong Wang, Cheng Qian, Xiangru Tang, Heng Ji, Jiaxuan You	2025-03-03	arXiv	https://github.com/MultiagentBench/MARBLE	http://arxiv.org/abs/2503.01935v1
312	Position: Don't use the CLT in LLM evals with fewer than a few hundred datapoints	Sam Bowyer, Laurence Aitchison, Desi R. Ivanova	2025-03-03	arXiv	https://github.com/sambowyer/bayes_evals	http://arxiv.org/abs/2503.01747v2
313	Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models	Tianjie Ju, Yi Hua, Hao Fei, Zhenyu Shao, Yubin Zheng, Haodong Zhao, Mong-Li Lee, Wynne Hsu, Zhuosheng Zhang, Gongshen Liu	2025-03-03	arXiv	https://github.com/illusionhi/ProbingPrivacy	https://doi.org/10.48550/arXiv.2503.01208
314	Parameter-Efficient Fine-Tuning of Large Language Models via Deconvolution in Subspace	Jia-Chen Zhang, Yu-Jie Xiong, Chun-Ming Xia, Dong-Hai Zhu, Xi-He Qiu	2025-03-03	COLING	https://github.com/Godz-z/DCFT	https://aclanthology.org/2025.coling-main.265/
315	OptMetaOpenFOAM: Large Language Model Driven Chain of Thought for Sensitivity Analysis and Parameter Optimization based on CFD	Yuxuan Chen, Long Zhang, Xu Zhu, Hua Zhou, Zhuyin Ren	2025-03-03	arXiv	https://github.com/Terry-cyx/MetaOpenFOAM	https://doi.org/10.48550/arXiv.2503.01273
316	Unmasking Implicit Bias: Evaluating Persona-Prompted LLM Responses in Power-Disparate Social Scenarios	Bryan Chen Zhengyu Tan, Roy Ka-Wei Lee	2025-03-03	arXiv	https://inc0mple.github.io/Implicit_Bias_Interactive_Data_Viz	http://arxiv.org/abs/2503.01532v1
317	MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages	Chen Zhang, Mingxu Tao, Zhiyuan Liao, Yansong Feng	2025-03-03	arXiv	https://github.com/luciusssss/MiLiC-Eval	http://arxiv.org/abs/2503.01150v1
318	Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity	Yupu Hao, Pengfei Cao, Zhuoran Jin, Huanxuan Liao, Yubo Chen, Kang Liu, Jun Zhao	2025-03-02	arXiv	https://github.com/hypasd-art/ETAPP	http://arxiv.org/abs/2503.00771v1
319	HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning	Zhuohang Jiang, Pangjing Wu, Ziran Liang, Peter Q. Chen, Xu Yuan, Ye Jia, Jiancheng Tu, Chen Li, Peter H. F. Ng, Qing Li	2025-03-02	arXiv	https://github.com/jzzzzh/HiBench	http://arxiv.org/abs/2503.00912v1
320	LLMDR: LLM-Driven Deadlock Detection and Resolution in Multi-Agent Pathfinding	Seungbae Seo, Junghwan Kim, Minjeong Shin, Bongwon Suh	2025-03-02	arXiv	https://github.com/ssbacc/llmdr-dhc	http://arxiv.org/abs/2503.00717v1
321	Interact, Instruct to Improve: A LLM-Driven Parallel Actor-Reasoner Framework for Enhancing Autonomous Vehicle Interactions	Shiyu Fang, Jiaqi Liu, Chengkai Xu, Chen Lv, Peng Hang, Jian Sun	2025-03-01	arXiv	https://github.com/FanGShiYuu/Actor-Reasoner	http://arxiv.org/abs/2503.00502v1
322	U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack	Yunfan Gao, Yun Xiong, Wenlong Wu, Zijing Huang, Bohan Li, Haofen Wang	2025-03-01	arXiv	https://github.com/Tongji-KGLLM/U-NIAH	http://arxiv.org/abs/2503.00353v1
323	LLM Post-Training: A Deep Dive into Reasoning Large Language Models	Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H. S. Torr, Salman H. Khan, Fahad Shahbaz Khan	2025-02-28	arXiv	https://github.com/mbzuai-oryx/Awesome-LLM-Post-training	https://doi.org/10.48550/arXiv.2502.21321
324	Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs	Fakhraddin Alwajih, Abdellah El Mekki, Samar Mohamed Magdy, Abdelrahim A. Elmadany, Omer Nacar, El Moatez Billah Nagoudi, Reem Abdel-Salam, Hanin Atwany, Youssef Nafea, Abdulfattah Mohammed Yahya, Rahaf Alhamouri, Hamzah A. Alsayadi, Hiba Zayed, Sara Shatnawi, Serry Sibaee, Yasir Ech-Chammakhy, Walid Al-Dhabyani, Marwa Mohamed Ali, Imen Jarraya, Ahmed Oumar El-Shangiti, Aisha Alraeesi, Mohammed Anwar Al-Ghrawi, Abdulrahman S. Al-Batati, Elgizouli Mohamed, Noha Taha Elgindi, Muhammed Saeed, Houdaifa Atou, Issam Ait Yahia, Abdelhak Bouayad, Mohammed Machrouh, Amal Makouar, Dania Alkawi, Mukhtar Mohamed, Safaa Taher Abdelfadil, Amine Ziad Ounnoughene, Rouabhia Anfel, Rwaa Assi, Ahmed Sorkatti, Mohamedou Cheikh Tourad, Anis Koubaa, Ismail Berrada, Mustafa Jarrar, Shady Shehata, Muhammad Abdul-Mageed	2025-02-28	arXiv	https://github.com/UBC-NLP/palm	http://arxiv.org/abs/2503.00151v1
325	UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation	Thanet Markchom, Tong Wu, Liting Huang, Huizhi Liang	2025-02-28	arXiv	https://github.com/tongwu17/SemEval-2025-Task1-UoR-NCL	http://arxiv.org/abs/2502.20984v2
326	InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation	Chong Zhang, Yukun Ma, Qian Chen, Wen Wang, Shengkui Zhao, Zexu Pan, Hao Wang, Chongjia Ni, Trung Hieu Nguyen, Kun Zhou, Yidi Jiang, Chaohong Tan, Zhifu Gao, Zhihao Du, Bin Ma	2025-02-28	arXiv	https://github.com/FunAudioLLM/InspireMusic	https://doi.org/10.48550/arXiv.2503.00084
327	DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning	Pengcheng Jiang, Jiacheng Lin, Lang Cao, Runchu Tian, SeongKu Kang, Zifeng Wang, Jimeng Sun, Jiawei Han	2025-02-28	arXiv	https://github.com/pat-jj/DeepRetrieval	https://doi.org/10.48550/arXiv.2503.00223
328	Self-Training Elicits Concise Reasoning in Large Language Models	Tergel Munkhbat, Namgyu Ho, Seo Hyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun	2025-02-27	arXiv	https://github.com/TergelMunkhbat/concise-reasoning	https://doi.org/10.48550/arXiv.2502.20122
329	Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis	Jeffrey Yang Fan Chiang, Seungjae Lee, Jia-Bin Huang, Furong Huang, Yizheng Chen	2025-02-27	arXiv	http://vulnerable-ai-agents.github.io	http://arxiv.org/abs/2502.20383v1
330	SkipPipe: Partial and Reordered Pipelining Framework for Training LLMs in Heterogeneous Networks	Nikolay Blagoev, Lydia Yiyu Chen, Oğuzhan Ersoy	2025-02-27	arXiv	https://github.com/gensyn-ai/skippipe	http://arxiv.org/abs/2502.19913v1
331	LongRoPE2: Near-Lossless LLM Context Window Scaling	Ning Shang, Li Lyna Zhang, Siyuan Wang, Gaokai Zhang, Gilsinia Lopez, Fan Yang, Weizhu Chen, Mao Yang	2025-02-27	arXiv	https://github.com/microsoft/LongRoPE	http://arxiv.org/abs/2502.20082v1
332	ECCOS: Efficient Capability and Cost Coordinated Scheduling for Multi-LLM Serving	Kai Mei, Wujiang Xu, Shuhang Lin, Yongfeng Zhang	2025-02-27	arXiv	https://github.com/agiresearch/ECCOS	http://arxiv.org/abs/2502.20576v2
333	Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents	Qiusi Zhan, Richard Fang, Henil Shalin Panchal, Daniel Kang	2025-02-27	arXiv	https://github.com/uiuc-kang-lab/AdaptiveAttackAgent	http://arxiv.org/abs/2503.00061v2
334	A Thousand Words or An Image: Studying the Influence of Persona Modality in Multimodal LLMs	Julius Broomfield, Kartik Sharma, Srijan Kumar	2025-02-27	arXiv	https://github.com/claws-lab/persona-modality	http://arxiv.org/abs/2502.20504v1
335	SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model	Xinghao Wang, Feng Liu, Rui Su, Zhihui Wang, Lei Bai, Wanli Ouyang	2025-02-27	arXiv	https://github.com/StarMoonWang/SeisMoLLM	https://doi.org/10.48550/arXiv.2502.19960
336	Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models	Huazheng Wang, Yongcheng Jing, Haifeng Sun, Yingjie Wang, Jingyu Wang, Jianxin Liao, Dacheng Tao	2025-02-27	arXiv	https://github.com/MaybeLizzy/UGBench	https://doi.org/10.48550/arXiv.2502.19982
337	Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Haochen Sun, Shuwen Zhang, Lei Ren, Hao Xu, Hao Fu, Caixia Yuan, Xiaojie Wang	2025-02-27	arXiv	https://github.com/YusaeMeow/Collab-Overcooked	https://doi.org/10.48550/arXiv.2502.20073
338	Beneath the Surface: How Large Language Models Reflect Hidden Bias	Jinhao Pan, Chahat Raj, Ziyu Yao, Ziwei Zhu	2025-02-27	arXiv	https://github.com/JP-25/Hidden-Bias-Benchmark	https://doi.org/10.48550/arXiv.2502.19749
339	Foot-In-The-Door: A Multi-turn Jailbreak for LLMs	Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xiangyu Zhang	2025-02-27	arXiv	https://github.com/Jinxiaolong1129/Foot-in-the-door-Jailbreak	http://arxiv.org/abs/2502.19820v2
340	Smart Routing: Cost-Effective Multi-LLM Serving for Multi-Core AIOS	Kai Mei, Wujiang Xu, Shuhang Lin, Yongfeng Zhang	2025-02-27	arXiv	https://github.com/agiresearch/ECCOS	http://arxiv.org/abs/2502.20576v4
341	Protecting multimodal large language models against misleading visualizations	Jonathan Tonglet, Tinne Tuytelaars, Marie-Francine Moens, Iryna Gurevych	2025-02-27	arXiv	https://github.com/UKPLab/arxiv2025-misleading-visualizations	https://doi.org/10.48550/arXiv.2502.20503
342	AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms	Yuwei Yan, Yu Shang, Qingbin Zeng, Yu Li, Keyu Zhao, Zhiheng Zheng, Xuefei Ning, Tianji Wu, Shengen Yan, Yu Wang, Fengli Xu, Yong Li	2025-02-26	arXiv	https://tsinghua-fib-lab.github.io/AgentSocietyChallenge	http://arxiv.org/abs/2502.18754v1
343	TrajLLM: A Modular LLM-Enhanced Agent-Based Framework for Realistic Human Trajectory Simulation	Chenlu Ju, Jiaxin Liu, Shobhit Sinha, Hao Xue, Flora Salim	2025-02-26	arXiv	https://github.com/cju0/TrajLLM	http://arxiv.org/abs/2502.18712v1
344	Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs	Yiheng Yang, Yujie Wang, Chi Ma, Lei Yu, Emmanuele Chersoni, Chu-Ren Huang	2025-02-26	arXiv	https://github.com/Oldify/CLADA	http://arxiv.org/abs/2502.19078v1
345	Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs	Dayu Yang, Tianyang Liu, Daoan Zhang, Antoine Simoulin, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Xin Qian, Grey Yang, Jiebo Luo, Julian McAuley	2025-02-26	arXiv	https://github.com/dayuyang1999/Awesome-Code-Reasoning	http://arxiv.org/abs/2502.19411v1
346	Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs	Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang	2025-02-26	arXiv	https://zowiezhang.github.io/projects/Amulet	http://arxiv.org/abs/2502.19148v1
347	Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation	Yuxiang Wang, Xinnan Dai, Wenqi Fan, Yao Ma	2025-02-26	arXiv	https://github.com/myflashbarry/LLM-benchmarking	http://arxiv.org/abs/2502.18771v1
348	OntologyRAG: Better and Faster Biomedical Code Mapping with Retrieval-Augmented Generation (RAG) Leveraging Ontology Knowledge Graphs and Large Language Models	Hui Feng, Yuntzu Yin, Emiliano Reynares, Jay Nanavati	2025-02-26	arXiv	https://github.com/iqvianlp/ontologyRAG	https://doi.org/10.48550/arXiv.2502.18992
349	A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs	Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Angelica I Aviles-Rivero, Chuanlong Xie, Yao Zhu	2025-02-26	arXiv	https://github.com/920927/SLM-a-sliding-layer-merging-method	http://arxiv.org/abs/2502.19159v3
350	JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models	Shuyi Liu, Simiao Cui, Haoran Bu, Yuming Shang, Xi Zhang	2025-02-26	arXiv	https://github.com/STAIR-BUPT/JailBench	https://doi.org/10.48550/arXiv.2502.18935
351	ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models	Danae Sánchez Villegas, Ingo Ziegler, Desmond Elliott	2025-02-26	arXiv	https://github.com/danaesavi/ImageChain	https://doi.org/10.48550/arXiv.2502.19409
352	Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models	Shuliang Liu, Xinze Li, Zhenghao Liu, Yukun Yan, Cheng Yang, Zheni Zeng, Zhiyuan Liu, Maosong Sun, Ge Yu	2025-02-26	arXiv	https://github.com/OpenBMB/ConsJudge	https://doi.org/10.48550/arXiv.2502.18817
353	Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Features	Shinwoo Park, Hyundong Jin, Jeong-won Cha, Yo-Sub Han	2025-02-25	arXiv	https://github.com/Shinwoo-Park/detecting_llm_paraphrased_code_via_coding_style_features	http://arxiv.org/abs/2502.17749v2
354	Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers	Hannah Calzi Kleidermacher, James Zou	2025-02-25	arXiv	https://hankleid.github.io/ProjectMundo	http://arxiv.org/abs/2502.17882v1
355	RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction	Jianhao Yan, Yun Luo, Yue Zhang	2025-02-25	arXiv	https://github.com/ElliottYan/RefuteBench-2.0	http://arxiv.org/abs/2502.18308v1
356	Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs	Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst	2025-02-25	arXiv	https://github.com/gayecolakoglu/LayIE-LLM	http://arxiv.org/abs/2502.18179v1
357	LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena	Tianmi Ma, Jiawei Du, Wenxin Huang, Wenjie Wang, Liang Xie, Xian Zhong, Joey Tianyi Zhou	2025-02-25	arXiv	https://github.com/wekjsdvnm/Agent-Trading-Arena	http://arxiv.org/abs/2502.17967v1
358	Detecting LLM-Generated Korean Text through Linguistic Feature Analysis	Shinwoo Park, Shubin Kim, Do-Kyung Kim, Yo-Sub Han	2025-02-25	arXiv	https://github.com/Shinwoo-Park/detecting_llm_generated_korean_text_through_linguistic_analysis	http://arxiv.org/abs/2503.00032v2
359	Can Multimodal LLMs Perform Time Series Anomaly Detection?	Xiongxiao Xu, Haoran Wang, Yueqing Liang, Philip S. Yu, Yue Zhao, Kai Shu	2025-02-25	arXiv	https://mllm-ts.github.io	http://arxiv.org/abs/2502.17812v1
360	Scalable Best-of-N Selection for Large Language Models via Self-Certainty	Zhewei Kang, Xuandong Zhao, Dawn Song	2025-02-25	arXiv	https://github.com/backprop07/Self-Certainty	https://doi.org/10.48550/arXiv.2502.18581
361	LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation	Pengzhi Li, Pengfei Yu, Zide Liu, Wei He, Xuhao Pan, Xudong Rao, Tao Wei, Wei Chen	2025-02-25	arXiv	https://zrealli.github.io/LDGen	https://doi.org/10.48550/arXiv.2502.18302
362	Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference	Zhuo Chen, Xinyu Wang, Yong Jiang, Zhen Zhang, Xinyu Geng, Pengjun Xie, Fei Huang, Kewei Tu	2025-02-25	arXiv	https://github.com/Chord-Chen-30/VLLM-KnowledgeBoundary	https://doi.org/10.48550/arXiv.2502.18023
363	Harnessing Multiple Large Language Models: A Survey on LLM Ensemble	Zhijun Chen, Jingzheng Li, Pengpeng Chen, Zhuoran Li, Kai Sun, Yuankai Luo, Qianren Mao, Dingqi Yang, Hailong Sun, Philip S. Yu	2025-02-25	arXiv	https://github.com/junchenzhi/Awesome-LLM-Ensemble	https://doi.org/10.48550/arXiv.2502.18036
364	Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data	Siqi Guo, Ilgee Hong, Vicente Balmaseda, Changlong Yu, Liang Qiu, Xin Liu, Haoming Jiang, Tuo Zhao, Tianbao Yang	2025-02-25	arXiv	https://github.com/Optimization-AI/DFT	https://doi.org/10.48550/arXiv.2502.18679
365	Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs	Himanshu Beniwal, Sailesh Panda, Mayank Singh	2025-02-24	arXiv	https://github.com/himanshubeniwal/X-BAT	http://arxiv.org/abs/2502.16901v1
366	MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs	Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski	2025-02-24	arXiv	https://github.com/saccharomycetes/mllms_know	http://arxiv.org/abs/2502.17422v1
367	From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs	Ruxiao Chen, Chenguang Wang, Yuran Sun, Xilei Zhao, Susu Xu	2025-02-24	arXiv	https://github.com/SusuXu-s-Lab/FLARE	http://arxiv.org/abs/2502.17701v1
368	Delta Decompression for MoE-based LLMs Compression	Hao Gu, Wei Li, Lujun Li, Qiyuan Zhu, Mark Lee, Shengjie Sun, Wei Xue, Yike Guo	2025-02-24	arXiv	https://github.com/lliai/D2MoE	http://arxiv.org/abs/2502.17298v1
369	ConvoyLLM: Dynamic Multi-Lane Convoy Control Using LLMs	Liping Lu, Zhican He, Duanfeng Chu, Rukang Wang, Saiqian Peng, Pan Zhou	2025-02-24	arXiv	https://github.com/chuduanfeng/ConvoyLLM	http://arxiv.org/abs/2502.17529v2
370	CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought	Boxuan Zhang, Ruqi Zhang	2025-02-24	arXiv	https://github.com/ZBox1005/CoT-UQ	http://arxiv.org/abs/2502.17214v1
371	Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing	Yi-Kai Zhang, De-Chuan Zhan, Han-Jia Ye	2025-02-24	arXiv	https://cit-llm-routing.github.io	http://arxiv.org/abs/2502.17282v1
372	COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs	Liming Liu, Zhenghao Xu, Zixuan Zhang, Hao Kang, Zichong Li, Chen Liang, Weizhu Chen, Tuo Zhao	2025-02-24	arXiv	https://github.com/lliu606/COSMOS	http://arxiv.org/abs/2502.17410v2
373	On Relation-Specific Neurons in Large Language Models	Yihong Liu, Runsheng Chen, Lea Hirlimann, Ahmad Dawar Hakimi, Mingyang Wang, Amir Hossein Kargaran, Sascha Rothe, François Yvon, Hinrich Schütze	2025-02-24	arXiv	https://github.com/cisnlp/relation-specific-neurons	https://doi.org/10.48550/arXiv.2502.17355
374	LongSafety: Evaluating Long-Context Safety of Large Language Models	Yida Lu, Jiale Cheng, Zhexin Zhang, Shiyao Cui, Cunxiang Wang, Xiaotao Gu, Yuxiao Dong, Jie Tang, Hongning Wang, Minlie Huang	2025-02-24	arXiv	https://github.com/thu-coai/LongSafety	https://doi.org/10.48550/arXiv.2502.16971
375	LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences	Sijia Yao, Pengcheng Huang, Zhenghao Liu, Yu Gu, Yukun Yan, Shi Yu, Ge Yu	2025-02-24	arXiv	https://github.com/NEUIR/LLM-QE	https://doi.org/10.48550/arXiv.2502.17057
376	Introducing Visual Perception Token into Multimodal Large Language Model	Runpeng Yu, Xinyin Ma, Xinchao Wang	2025-02-24	arXiv	https://github.com/yu-rp/VisualPerceptionToken	https://doi.org/10.48550/arXiv.2502.17425
377	LogitLens4LLMs: Extending Logit Lens Analysis to Modern Large Language Models	Zhenyu Wang	2025-02-24	arXiv	https://github.com/zhenyu-02/LogitLens4LLMs	https://doi.org/10.48550/arXiv.2503.11667
378	From System 1 to System 2: A Survey of Reasoning Large Language Models	Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhiwei Li, Bao-Long Bi, Ling-Rui Mei, Junfeng Fang, Zhijiang Guo, Le Song, Cheng-Lin Liu	2025-02-24	arXiv	https://github.com/zzli2022/Awesome-Slow-Reason-System	https://doi.org/10.48550/arXiv.2502.17419
379	VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models	Jen-tse Huang, Dasen Dai, Jen-Yuan Huang, Youliang Yuan, Xiaoyuan Liu, Wenxuan Wang, Wenxiang Jiao, Pinjia He, Zhaopeng Tu	2025-02-23	arXiv	https://github.com/CUHK-ARISE/VisFactor	https://doi.org/10.48550/arXiv.2502.16435
380	BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning	Haiteng Zhao, Chang Ma, Fangzhi Xu, Lingpeng Kong, Zhi-Hong Deng	2025-02-23	arXiv	https://github.com/zhao-ht/BioMaze	https://doi.org/10.48550/arXiv.2502.16660
381	CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale	Chenlong Wang, Zhaoyang Chu, Zhengxiang Cheng, Xuyi Yang, Kaiyue Qiu, Yao Wan, Zhou Zhao, Xuanhua Shi, Dongping Chen	2025-02-23	arXiv	https://github.com/Lucky-voyage/Code-Sync	https://doi.org/10.48550/arXiv.2502.16645
382	CER: Confidence Enhanced Reasoning in LLMs	Ali Razghandi, Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah	2025-02-22	arXiv	https://github.com/	http://arxiv.org/abs/2502.14634v1
383	Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations	Chunyang Li, Weiqi Wang, Tianshi Zheng, Yangqiu Song	2025-02-22	arXiv	https://github.com/lcy2723/Robust-Rule-Induction	http://arxiv.org/abs/2502.16169v1
384	Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens	Ziwei Shan, Yaoyu He, Chengfeng Zhao, Jiashen Du, Jingyan Zhang, Qixuan Zhang, Jingyi Yu, Lan Xu	2025-02-22	arXiv	https://koyui.github.io/mojito/	http://arxiv.org/abs/2502.16175v1
385	OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models	Wenwen Yu, Zhibo Yang, Jianqiang Wan, Sibo Song, Jun Tang, Wenqing Cheng, Yuliang Liu, Xiang Bai	2025-02-22	arXiv	https://github.com/AlibabaResearch/AdvancedLiterateMachinery	https://doi.org/10.48550/arXiv.2502.16161
386	Dynamic Low-Rank Sparse Adaptation for Large Language Models	Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Yang Liu, Jing Lin, Yiwu Yao, Rongrong Ji	2025-02-22	arXiv	https://github.com/wzhuang-xmu/LoSA	https://doi.org/10.48550/arXiv.2502.14816
387	Plan-over-Graph: Towards Parallelable LLM Agent Schedule	Shiqi Zhang, Xinbei Ma, Zouying Cao, Zhuosheng Zhang, Hai Zhao	2025-02-21	arXiv	https://github.com/zsq259/Plan-over-Graph	http://arxiv.org/abs/2502.14563v1
388	FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs	Madhurima Chakraborty, Peter Pirkelbauer, Qing Yi	2025-02-21	arXiv	https://github.com/MadhuNimmo/FormalSpecCpp	http://arxiv.org/abs/2502.15217v1
389	Investigating the Adaptive Robustness with Knowledge Conflicts in LLM-based Multi-Agent Systems	Tianjie Ju, Bowen Wang, Hao Fei, Mong-Li Lee, Wynne Hsu, Yun Li, Qianren Wang, Pengzhou Cheng, Zongru Wu, Zhuosheng Zhang, Gongshen Liu	2025-02-21	arXiv	https://github.com/wbw625/MultiAgentRobustness	http://arxiv.org/abs/2502.15153v1
390	Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs	Danni Liu, Jan Niehues	2025-02-21	arXiv	https://github.com/dannigt/mid-align	http://arxiv.org/abs/2502.14830v1
391	A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation	Shilong Hou, Ruilin Shang, Zi Long, Xianghua Fu, Yin Chen	2025-02-21	arXiv	https://github.com/Mebymeby/Pseudonymization-Framework	http://arxiv.org/abs/2502.15233v1
392	PredictaBoard: Benchmarking LLM Score Predictability	Lorenzo Pacchiardi, Konstantinos Voudouris, Ben Slater, Fernando Martínez-Plumed, José Hernández-Orallo, Lexin Zhou, Wout Schellaert	2025-02-21	arXiv	https://github.com/Kinds-of-Intelligence-CFI/PredictaBoard	http://arxiv.org/abs/2502.14445v1
393	Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing	Qi Le, Enmao Diao, Ziyan Wang, Xinran Wang, Jie Ding, Li Yang, Ali Anwar	2025-02-21	arXiv	https://github.com/Qi-Le1/Probe_Pruning	http://arxiv.org/abs/2502.15618v1
394	STeCa: Step-level Trajectory Calibration for LLM Agent Learning	Hanlin Wang, Jian Wang, Chak Tou Leong, Wenjie Li	2025-02-21	arXiv	https://github.com/WangHanLinHenry/STeCa	http://arxiv.org/abs/2502.14276v1
395	Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs	Giulio Zizzo, Giandomenico Cornacchia, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Beat Buesser, Mark Purcell, Pin-Yu Chen, Prasanna Sattigeri, Kush Varshney	2025-02-21	arXiv	https://github.com/IBM/Adversarial-Prompt-Evaluation	http://arxiv.org/abs/2502.15427v1
396	Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models	Ya Wang, Zhijian Zhuo, Yutao Zeng, Xun Zhou, Jian Yang, Xiaoqing Li	2025-02-21	arXiv	https://github.com/kaihemo/SDD	https://doi.org/10.48550/arXiv.2502.15499
397	Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization	Yupeng Chang, Yi Chang, Yuan Wu	2025-02-21	arXiv	https://github.com/llm172/Transfer-Prompting	https://doi.org/10.48550/arXiv.2502.14211
398	On the logical skills of large language models: evaluations using arbitrarily complex first-order logic problems	Shokhrukh Ibragimov, Arnulf Jentzen, Benno Kuckuck	2025-02-21	arXiv	https://github.com/bkuckuck/logical-skills-of-llms	https://doi.org/10.48550/arXiv.2502.14180
399	MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models	Shrey Pandit, Jiawei Xu, Junyuan Hong, Zhangyang Wang, Tianlong Chen, Kaidi Xu, Ying Ding	2025-02-21	arXiv	https://medhallu.github.io/	https://doi.org/10.48550/arXiv.2502.14302
400	From RAG to Memory: Non-Parametric Continual Learning for Large Language Models	Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, Yu Su	2025-02-21	arXiv	https://github.com/OSU-NLP-Group/HippoRAG	https://doi.org/10.48550/arXiv.2502.14802
401	CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models	Zhenhong Zhou, Zherui Li, Jie Zhang, Yuanhe Zhang, Kun Wang, Yang Liu, Qing Guo	2025-02-21	arXiv	https://github.com/zhrli324/Corba	https://doi.org/10.48550/arXiv.2502.14529
402	Protein Large Language Models: A Comprehensive Survey	Yijia Xiao, Wanjia Zhao, Junkai Zhang, Yiqiao Jin, Han Zhang, Zhicheng Ren, Renliang Sun, Haixin Wang, Guancheng Wan, Pan Lu, Xiao Luo, Yu Zhang, James Zou, Yizhou Sun, Wei Wang	2025-02-21	arXiv	https://github.com/Yijia-Xiao/Protein-LLM-Survey	https://doi.org/10.48550/arXiv.2502.17504
403	Forgotten Polygons: Multimodal Large Language Models are Shape-Blind	William Rudman, Michal Golovanevsky, Amir Bar, Vedant Palit, Yann LeCun, Carsten Eickhoff, Ritambhara Singh	2025-02-21	arXiv	https://github.com/rsinghlab/Shape-Blind	https://doi.org/10.48550/arXiv.2502.15969
404	LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention	Shang Yang, Junxian Guo, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, Song Han	2025-02-21	arXiv	https://github.com/mit-han-lab/omniserve	http://arxiv.org/abs/2502.14866v1
405	Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models	Yeonjun In, Wonjoong Kim, Kanghoon Yoon, Sungchul Kim, Md. Mehrab Tanjim, Kibum Kim, Chanyoung Park	2025-02-20	arXiv	https://github.com/yeonjun-in/U-SafeBench	https://doi.org/10.48550/arXiv.2502.15086
406	InductionBench: LLMs Fail in the Simplest Complexity Class	Wenyue Hua, Tyler Wong, Sun Fei, Liangming Pan, Adam Jardine, William Yang Wang	2025-02-20	arXiv	https://github.com/Wenyueh/inductive_reasoning_benchmark	http://arxiv.org/abs/2502.15823v3
407	An LLM-based Agent for Reliable Docker Environment Configuration	Ruida Hu, Chao Peng, Xinchen Wang, Cuiyun Gao	2025-02-19	arXiv	https://github.com/bytedance/Repo2Run	http://arxiv.org/abs/2502.13681v1
408	SIFT: Grounding LLM Reasoning in Contexts via Stickers	Zihao Zeng, Xuyao Huang, Boxiu Li, Zhijie Deng	2025-02-19	arXiv	https://github.com/zhijie-group/SIFT	http://arxiv.org/abs/2502.14922v1
409	Judging the Judges: A Collection of LLM-Generated Relevance Judgements	Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, Emine Yilmaz	2025-02-19	arXiv	https://llm4eval.github.io/LLMJudge-benchmark/	http://arxiv.org/abs/2502.13908v1
410	DataSciBench: An LLM Agent Benchmark for Data Science	Dan Zhang, Sining Zhoubian, Min Cai, Fengzu Li, Lekang Yang, Wei Wang, Tianjiao Dong, Ziniu Hu, Jie Tang, Yisong Yue	2025-02-19	arXiv	https://github.com/THUDM/DataSciBench	http://arxiv.org/abs/2502.13897v1
411	Benchmarking LLMs for Political Science: A United Nations Perspective	Yueqing Liang, Liangwei Yang, Chen Wang, Congying Xia, Rui Meng, Xiongxiao Xu, Haoran Wang, Ali Payani, Kai Shu	2025-02-19	arXiv	https://github.com/yueqingliang1/UNBench	http://arxiv.org/abs/2502.14122v1
412	Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning	Zenan Li, Zhaoyu Li, Wen Tang, Xian Zhang, Yuan Yao, Xujie Si, Fan Yang, Kaiyu Yang, Xiaoxing Ma	2025-02-19	arXiv	https://github.com/Lizn-zn/NeqLIPS/	http://arxiv.org/abs/2502.13834v1
413	Craw4LLM: Efficient Web Crawling for LLM Pretraining	Shi Yu, Zhiyuan Liu, Chenyan Xiong	2025-02-19	arXiv	https://github.com/cxcscmu/Crawl4LLM	http://arxiv.org/abs/2502.13347v1
414	$\mathtt{GeLLM^3O}$: Generalizing Large Language Models for Multi-property Molecule Optimization	Vishal Dey, Xiao Hu, Xia Ning	2025-02-19	arXiv	https://github.com/ninglab/GeLLMO	http://arxiv.org/abs/2502.13398v1
415	PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models	Guangwei Li, Yuansen Zhang, Yinggui Wang, Shoumeng Yan, Lei Wang, Tao Wei	2025-02-19	arXiv	https://github.com/ligw1998/PRIV-QA	https://doi.org/10.48550/arXiv.2502.13564
416	AI-Empowered Catalyst Discovery: A Survey from Classical Machine Learning Approaches to Large Language Models	Yuanyuan Xu, Hanchen Wang, Wenjie Zhang, Lexing Xie, Yin Chen, Flora Salim, Ying Zhang, Justin Gooding, Toby Walsh	2025-02-19	arXiv	https://github.com/LuckyGirl-XU/Awesome-Artificial-Intelligence-Empowered-Catalyst-Discovery	https://doi.org/10.48550/arXiv.2502.13626
417	Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models	Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Yang You, Guiming Xie, Xuejian Gong, Kunlong Zhou	2025-02-19	arXiv	https://github.com/junzhang-zj/LoRAM	https://doi.org/10.48550/arXiv.2502.13533
418	REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models	DongGeon Lee, Hwanjo Yu	2025-02-19	arXiv	https://github.com/oneonlee/REFIND	https://doi.org/10.48550/arXiv.2502.13622
419	Lost in Sequence: Do Large Language Models Understand Sequential Recommendation?	Sein Kim, Hongseok Kang, Kibum Kim, Jiwan Kim, Donghyun Kim, Minchul Yang, Kwangjin Oh, Julian McAuley, Chanyoung Park	2025-02-19	arXiv	https://github.com/Sein-Kim/LLM-SRec	https://doi.org/10.48550/arXiv.2502.13909
420	Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems	Yaochen Zhu, Chao Wan, Harald Steck, Dawen Liang, Yesu Feng, Nathan Kallus, Jundong Li	2025-02-19	arXiv	https://github.com/yaochenzhu/CRAG	https://doi.org/10.48550/arXiv.2502.14137
421	ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities	Chanjin Zheng, Zengyi Yu, Yilin Jiang, Mingzi Zhang, Xunuo Lu, Jing Jin, Liteng Gao	2025-02-19	arXiv	https://artmentor.github.io/	https://doi.org/10.48550/arXiv.2502.13832
422	LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization	Guanzheng Chen, Xin Li, Michael Qizhe Shieh, Lidong Bing	2025-02-19	arXiv	https://github.com/DAMO-NLP-SG/LongPO	https://doi.org/10.48550/arXiv.2502.13922
423	Text2World: Benchmarking Large Language Models for Symbolic World Model Generation	Mengkang Hu, Tianxing Chen, Yude Zou, Yuheng Lei, Qiguang Chen, Ming Li, Yao Mu, Hongyuan Zhang, Wenqi Shao, Ping Luo	2025-02-18	arXiv	https://text-to-world.github.io/	https://doi.org/10.48550/arXiv.2502.13092
424	Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs	Adi Simhi, Itay Itzhak, Fazl Barez, Gabriel Stanovsky, Yonatan Belinkov	2025-02-18	arXiv	https://github.com/technion-cs-nlp/Trust_me_Im_wrong	http://arxiv.org/abs/2502.12964v1
425	SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs	Ahmed F. AbouElhamayed, Jordan Dotzel, Yash Akhauri, Chi-Chih Chang, Sameh Gobriel, J. Pablo Muñoz, Vui Seng Chua, Nilesh Jain, Mohamed S. Abdelfattah	2025-02-18	arXiv	https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/SparAMX	http://arxiv.org/abs/2502.12444v1
426	Soundwave: Less is More for Speech-Text Alignment in LLMs	Yuhao Zhang, Zhiheng Liu, Fan Bu, Ruiyu Zhang, Benyou Wang, Haizhou Li	2025-02-18	arXiv	https://github.com/FreedomIntelligence/Soundwave	http://arxiv.org/abs/2502.12900v1
427	MoBA: Mixture of Block Attention for Long-Context LLMs	Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu	2025-02-18	arXiv	https://github.com/MoonshotAI/MoBA	http://arxiv.org/abs/2502.13189v1
428	PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models	Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang, Min Zhang	2025-02-18	arXiv	https://github.com/zjq0455/PTQ1.61	https://doi.org/10.48550/arXiv.2502.13179
429	SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings	Weikai Lu, Hao Peng, Huiping Zhuang, Cen Chen, Ziqian Zeng	2025-02-18	arXiv	https://github.com/ZeroNLP/SEA	https://doi.org/10.48550/arXiv.2502.12562
430	Investigating and Extending Homans' Social Exchange Theory with Large Language Model based Agents	Lei Wang, Zheqing Zhang, Xu Chen	2025-02-18	arXiv	https://github.com/Paitesanshi/SET	https://doi.org/10.48550/arXiv.2502.12450
431	Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis	Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, Yaowei Wang, Min Zhang, Liqiang Nie	2025-02-18	arXiv	https://github.com/zjq0455/PTQ_Benchmark	http://arxiv.org/abs/2502.13178v1
432	G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable Recommendation	Yuhan Li, Xinni Zhang, Linhao Luo, Heng Chang, Yuxiang Ren, Irwin King, Jia Li	2025-02-18	arXiv	https://github.com/Yuhan1i/G-Refer	https://doi.org/10.48550/arXiv.2502.12586
433	EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning	Xiaoqian Liu, Ke Wang, Yongbin Li, Yuchuan Wu, Wentao Ma, Aobo Kong, Fei Huang, Jianbin Jiao, Junge Zhang	2025-02-18	arXiv	https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/EPO	http://arxiv.org/abs/2502.12486v1
434	VRoPE: Rotary Position Embedding for Video Large Language Models	Zikang Liu, Longteng Guo, Yepeng Tang, Junxian Cai, Kai Ma, Xi Chen, Jing Liu	2025-02-17	arXiv	https://github.com/johncaged/VRoPE	https://doi.org/10.48550/arXiv.2502.11664
435	Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning	Yuqi Pang, Bowen Yang, Haoqin Tu, Yun Cao, Zeyu Zhang	2025-02-17	arXiv	https://github.com/Pbhgit/MVCD	http://arxiv.org/abs/2502.11751v1
436	Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, Xueyang Liu	2025-02-17	arXiv	https://github.com/wanghanbinpanda/CodeVision	http://arxiv.org/abs/2502.11829v1
437	Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?	Leyi Pan, Aiwei Liu, Shiyu Huang, Yijian Lu, Xuming Hu, Lijie Wen, Irwin King, Philip S. Yu	2025-02-17	arXiv	https://github.com/THU-BPM/Watermark-Radioactivity-Attack	http://arxiv.org/abs/2502.11598v1
438	Bitnet.cpp: Efficient Edge Inference for Ternary LLMs	Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei	2025-02-17	arXiv	https://github.com/microsoft/BitNet/tree/paper	http://arxiv.org/abs/2502.11880v1
439	"Nuclear Deployed!": Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents	Rongwu Xu, Xiaojian Li, Shuo Chen, Wei Xu	2025-02-17	arXiv	https://github.com/pillowsofwind/LLM-CBRN-Risks	http://arxiv.org/abs/2502.11355v1
440	A Survey of Personalized Large Language Models: Progress and Future Directions	Jiahong Liu, Zexuan Qiu, Zhongyang Li, Quanyu Dai, Jieming Zhu, Minda Hu, Menglin Yang, Irwin King	2025-02-17	arXiv	https://github.com/JiahongLiu21/Awesome-Personalized-Large-Language-Models	https://doi.org/10.48550/arXiv.2502.11528
441	RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration Exemplars	Yuncheng Hua, Lizhen Qu, Zhuang Li, Hao Xue, Flora D. Salim, Gholamreza Haffari	2025-02-17	arXiv	https://github.com/AnonymousCode-ComputerScience/RIDE	https://doi.org/10.48550/arXiv.2502.11681
442	Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents	Rongwu Xu, Xiaojian Li, Shuo Chen, Wei Xu	2025-02-17	arXiv	https://llm-catastrophic-risks.github.io	http://arxiv.org/abs/2502.11355v3
443	Atom of Thoughts for Markov LLM Test-Time Scaling	Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo	2025-02-17	arXiv	https://github.com/qixucen/atom	http://arxiv.org/abs/2502.12018v1
444	Idiosyncrasies in Large Language Models	Mingjie Sun, Yida Yin, Zhiqiu Xu, J. Zico Kolter, Zhuang Liu	2025-02-17	arXiv	https://eric-mingjie.github.io/llm-idiosyncrasies/index.html	https://doi.org/10.48550/arXiv.2502.12150
445	A-MEM: Agentic Memory for LLM Agents	Wujiang Xu, Kai Mei, Hang Gao, Juntao Tan, Zujie Liang, Yongfeng Zhang	2025-02-17	arXiv	https://github.com/WujiangXu/AgenticMemory	http://arxiv.org/abs/2502.12110v5
446	LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning	Tianshi Zheng, Jiayang Cheng, Chunyang Li, Haochen Shi, Zihao Wang, Jiaxin Bai, Yangqiu Song, Ginny Y. Wong, Simon See	2025-02-16	arXiv	https://github.com/HKUST-KnowComp/LogiDynamics	https://doi.org/10.48550/arXiv.2502.11176
447	SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors	Bohan Lyu, Siqiao Huang, Zichen Liang, Qi-An Sun, Jiaming Zhang	2025-02-16	arXiv	https://github.com/Imbernoulli/SURGE	https://doi.org/10.48550/arXiv.2502.11167
448	BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack	Zihao Zhu, Hongbao Zhang, Mingda Zhang, Ruotong Wang, Guanzong Wu, Ke Xu, Baoyuan Wu	2025-02-16	arXiv	https://github.com/zihao-ai/BoT	https://doi.org/10.48550/arXiv.2502.12202
449	CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?	Aashish Anantha Ramakrishnan, Aadarsh Anantha Ramakrishnan, Dongwon Lee	2025-02-16	arXiv	https://github.com/aashish2000/CORDIAL	https://doi.org/10.48550/arXiv.2502.11300
450	Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models	Haoyang Li, Xuejia Chen, Zhanchao Xu, Darian Li, Nicole Hu, Fei Teng, Yiming Li, Luyu Qiu, Chen Jason Zhang, Qing Li, Lei Chen	2025-02-16	arXiv	https://github.com/TreeAI-Lab/NumericBench	https://doi.org/10.48550/arXiv.2502.11075
451	ReLearn: Unlearning via Learning for Large Language Models	Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Huajun Chen, Ningyu Zhang	2025-02-16	arXiv	https://github.com/zjunlp/unlearn	https://doi.org/10.48550/arXiv.2502.11190
452	Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models	Zonghao Ying, Deyue Zhang, Zonglei Jing, Yisong Xiao, Quanchen Zou, Aishan Liu, Siyuan Liang, Xiangzheng Zhang, Xianglong Liu, Dacheng Tao	2025-02-16	arXiv	https://github.com/NY1024/RACE	https://doi.org/10.48550/arXiv.2502.11054
453	G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems	Shilong Wang, Guibin Zhang, Miao Yu, Guancheng Wan, Fanci Meng, Chongye Guo, Kun Wang, Yang Wang	2025-02-16	arXiv	https://github.com/wslong20/G-safeguard	http://arxiv.org/abs/2502.11127v1
454	How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training	Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen	2025-02-16	arXiv	https://github.com/zjunlp/DynamicKnowledgeCircuits	http://arxiv.org/abs/2502.11196v1
455	MasRouter: Learning to Route LLMs for Multi-Agent Systems	Yanwei Yue, Guibin Zhang, Boyang Liu, Guancheng Wan, Kun Wang, Dawei Cheng, Yiyan Qi	2025-02-16	arXiv	https://github.com/yanweiyue/masrouter	http://arxiv.org/abs/2502.11133v1
456	Ramp Up NTT in Record Time using GPU-Accelerated Algorithms and LLM-based Code Generation	Yu Cui, Hang Fu, Licheng Wang, Haibin Zhang	2025-02-16	arXiv	https://github.com/LMPC-Lab/GenGPUCrypto	http://arxiv.org/abs/2502.11110v1
457	Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls	Ante Wang, Linfeng Song, Ye Tian, Dian Yu, Haitao Mi, Xiangyu Duan, Zhaopeng Tu, Jinsong Su, Dong Yu	2025-02-16	arXiv	https://github.com/Soistesimmer/Fetch	http://arxiv.org/abs/2502.11183v1
458	Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey	Zirui Song, Bin Yan, Yuhan Liu, Miao Fang, Mingzhe Li, Rui Yan, Xiuying Chen	2025-02-15	arXiv	https://github.com/abilliyb/Knowledge_Injection_Survey_Papers	https://doi.org/10.48550/arXiv.2502.10708
459	SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models	Daniel Fleischer, Moshe Berchansky, Gad Markovits, Moshe Wasserblat	2025-02-15	arXiv	https://github.com/IntelLabs/RAG-FiT/tree/square	https://doi.org/10.48550/arXiv.2502.09390
460	Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs	Siyan Zhao, Mingyi Hong, Yang Liu, Devamanyu Hazarika, Kaixiang Lin	2025-02-15	arXiv	https://prefeval.github.io/	http://arxiv.org/abs/2502.09597v1
461	EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents	Rui Yang, Hanyang Chen, Junyu Zhang, Mark Zhao, Cheng Qian, Kangrui Wang, Qineng Wang, Teja Venkat Koripella, Marziyeh Movahedi, Manling Li, Heng Ji, Huan Zhang, Tong Zhang	2025-02-15	arXiv	https://embodiedbench.github.io	https://doi.org/10.48550/arXiv.2502.09560
462	An Empirical Analysis of Uncertainty in Large Language Model Evaluations	Qiujie Xie, Qingqiu Li, Zhuohao Yu, Yuejie Zhang, Yue Zhang, Linyi Yang	2025-02-15	arXiv	https://github.com/hasakiXie123/LLM-Evaluator-Uncertainty	https://doi.org/10.48550/arXiv.2502.10709
463	LintLLM: An Open-Source Verilog Linting Framework Based on Large Language Models	Zhigang Fang, Renzhi Chen, Zhijie Yang, Yang Guo, Huadong Dai, Lei Wang	2025-02-15	arXiv	https://github.com/fangzhigang32/Static-Verilog-Analysis	https://doi.org/10.48550/arXiv.2502.10815
464	CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs	Insu Han, Zeliang Zhang, Zhiyuan Wang, Yifan Zhu, Susan Liang, Jiani Liu, Haiting Lin, Mingjie Zhao, Chenliang Xu, Kun Wan, Wentian Zhao	2025-02-15	arXiv	https://github.com/insuhan/calibquant	http://arxiv.org/abs/2502.14882v2
465	KKA: Improving Vision Anomaly Detection through Anomaly-related Knowledge from Large Language Models	Dong Chen, Zhengqing Hu, Peiguang Fan, Yueting Zhuang, Yafei Li, Qidong Liu, Xiaoheng Jiang, Mingliang Xu	2025-02-14	arXiv	https://github.com/Anfeather/KKA	https://doi.org/10.48550/arXiv.2502.14880
466	Large Language Diffusion Models	Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li	2025-02-14	arXiv	https://ml-gsai.github.io/LLaDA-demo/	https://doi.org/10.48550/arXiv.2502.09992
467	LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing	Kuan Li, Liwen Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Shuai Wang, Minhao Cheng	2025-02-14	arXiv	https://github.com/likuanppd/LaRA	http://arxiv.org/abs/2502.09977v1
468	MM-RLHF: The Next Step Forward in Multimodal LLM Alignment	Yi-Fan Zhang, Tao Yu, Haochen Tian, Chaoyou Fu, Peiyan Li, Jianshu Zeng, Wulin Xie, Yang Shi, Huanyu Zhang, Junkang Wu, Xue Wang, Yibo Hu, Bin Wen, Fan Yang, Zhang Zhang, Tingting Gao, Di Zhang, Liang Wang, Rong Jin, Tieniu Tan	2025-02-14	arXiv	https://mm-rlhf.github.io/	http://arxiv.org/abs/2502.10391v1
469	V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models	Hsu-Kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen	2025-02-14	arXiv	https://eddyhkchiu.github.io/v2vllm.github.io/	https://doi.org/10.48550/arXiv.2502.09980
470	The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis	Wenbo Pan, Zhichao Liu, Qiguang Chen, Xiangyang Zhou, Haining Yu, Xiaohua Jia	2025-02-13	arXiv	https://github.com/BMPixel/safety-residual-space	http://arxiv.org/abs/2502.09674v1
471	FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents	Mostapha Benhenda	2025-02-13	arXiv	https://github.com/benstaf/FinRL_DeepSeek	http://arxiv.org/abs/2502.07393v1
472	Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning	Jiayuan Zhu, Junde Wu	2025-02-13	arXiv	https://github.com/SuperMedIntel/AskPatients	http://arxiv.org/abs/2502.07143v1
473	LLM-Generated Microservice Implementations from RESTful API Definitions	Saurabh Chauhan, Zeeshan Rasheed, Abdul Malik Sami, Zheying Zhang, Jussi Rasku, Kai-Kristian Kemell, Pekka Abrahamsson	2025-02-13	arXiv	https://github.com/sirbh/code-gen	http://arxiv.org/abs/2502.09766v1
474	Bag of Tricks for Inference-time Computation of LLM Reasoning	Fan Liu, Wenshuo Chao, Naiqiang Tan, Hao Liu	2025-02-13	arXiv	https://github.com/usail-hkust/benchmark_inference_time_computation_LL	http://arxiv.org/abs/2502.07191v2
475	LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation	Zican Dong, Junyi Li, Jinhao Jiang, Mingyu Xu, Wayne Xin Zhao, Bingning Wang, Weipeng Chen	2025-02-13	arXiv	https://github.com/RUCAIBox/LongReD	https://doi.org/10.48550/arXiv.2502.07365
476	LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!	Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Eric Tang, Sumanth Hegde, Kourosh Hakhamaneshi, Shishir G. Patil, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica	2025-02-13	arXiv	https://github.com/NovaSky-AI/SkyThought	http://arxiv.org/abs/2502.07374v2
477	DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization	Xuefeng Liu, Songhao Jiang, Siyu Chen, Zhuoran Yang, Yuxin Chen, Ian T. Foster, Rick Stevens	2025-02-13	arXiv	https://github.com/xuefeng-cs/DrugImproverGPT	https://doi.org/10.48550/arXiv.2502.07237
478	Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models	Qingsong Zou, Jingyu Xiao, Qing Li, Zhi Yan, Yuhang Wang, Li Xu, Wenxuan Wang, Kuofeng Gao, Ruoyu Li, Yong Jiang	2025-02-13	arXiv	https://github.com/horizonsinzqs/QueryAttack	https://doi.org/10.48550/arXiv.2502.09723
479	Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models	Yiheng Liu, Xiaohui Gao, Haiyang Sun, Bao Ge, Tianming Liu, Junwei Han, Xintao Hu	2025-02-13	arXiv	https://github.com/WhatAboutMyStar/LLM_ACTIVATION	https://doi.org/10.48550/arXiv.2502.20408
480	DarwinLM: Evolutionary Structured Pruning of Large Language Models	Shengkun Tang, Oliver Sieberling, Eldar Kurtic, Zhiqiang Shen, Dan Alistarh	2025-02-13	arXiv	https://github.com/IST-DASLab/DarwinLM	https://doi.org/10.48550/arXiv.2502.07780
481	RALLRec: Improving Retrieval Augmented Large Language Model Recommendation with Representation Learning	Jian Xu, Sichun Luo, Xiangyu Chen, Haoming Huang, Hanxu Hou, Linqi Song	2025-02-12	arXiv	https://github.com/JianXu95/RALLRec	https://doi.org/10.48550/arXiv.2502.06101
482	LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM	Zhi Zhou, Kun-Yang Yu, Shi-Yu Tian, Xiao-Wen Yang, Jiang-Xin Shi, Pengxiao Song, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li	2025-02-12	arXiv	https://github.com/LAMDASZ-ML/Knowledge-Guide-Data-Generation	http://arxiv.org/abs/2502.06572v2
483	Calibrating LLMs with Information-Theoretic Evidential Deep Learning	Yawei Li, David Rügamer, Bernd Bischl, Mina Rezaei	2025-02-12	arXiv	https://github.com/sandylaker/ib-edl	http://arxiv.org/abs/2502.06351v2
484	Data Augmentation to Improve Large Language Models in Food Hazard and Product Detection	Areeg Fahad Rasheed, M. Zarkoosh, Shimam Amer Chasib, Safa F. Abbas	2025-02-12	arXiv	https://github.com/AREEG94FAHAD/food-hazard-prdouct-cls	https://doi.org/10.48550/arXiv.2502.08687
485	Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation	Chengwen Qi, Ren Ma, Bowen Li, He Du, Binyuan Hui, Jinwang Wu, Yuanjun Laili, Conghui He	2025-02-12	arXiv	https://github.com/opendatalab/ProverGen	https://doi.org/10.48550/arXiv.2502.06563
486	Systematic Outliers in Large Language Models	Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang	2025-02-12	arXiv	https://github.com/an-yongqi/systematic-outliers	https://doi.org/10.48550/arXiv.2502.06415
487	Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models	Jiacong Xu, Shao-Yuan Lo, Bardia Safaei, Vishal M. Patel, Isht Dwivedi	2025-02-12	arXiv	https://xujiacong.github.io/Anomaly-OV/	https://doi.org/10.48550/arXiv.2502.07601
488	Time2Lang: Bridging Time-Series Foundation Models and Large Language Models for Health Sensing Beyond Prompting	Arvind Pillai, Dimitris Spathis, Subigya Nepal, Amanda C Collins, Daniel M Mackin, Michael V Heinz, Tess Z Griffin, Nicholas C Jacobson, Andrew Campbell	2025-02-11	arXiv	https://github.com/arvind1609/time2lang	http://arxiv.org/abs/2502.07608v3
489	The foundational capabilities of large language models in predicting postoperative risks using clinical notes	Charles Alba, Bing Xue, Joanna Abraham, Thomas George Kannampallil, Chenyang Lu	2025-02-11	npj Digital Medicine	https://github.com/cja5553/LLMs_in_perioperative_care	https://doi.org/10.1038/s41746-025-01489-2
490	Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining	Daouda Sow, Herbert Woisetschläger, Saikiran Bulusu, Shiqiang Wang, Hans-Arno Jacobsen, Yingbin Liang	2025-02-10	arXiv	https://github.com/sowmaster/Sample-Level-Loss-Reweighting-ICLR-2025	https://doi.org/10.48550/arXiv.2502.06733
491	LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights	Ze Sheng, Zhicheng Chen, Shuning Gu, Heqing Huang, Guofei Gu, Jeff Huang	2025-02-10	arXiv	https://github.com/OwenSanzas/LLM-For-Vulnerability-Detection	http://arxiv.org/abs/2502.07049v2
492	HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models	Paul Darm, Annalisa Riccardi	2025-02-09	arXiv	https://github.com/PaulDrm/targeted_intervention	http://arxiv.org/abs/2502.05945v2
493	Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators	Hritik Bansal, Pratyush Maini	2025-02-09	arXiv	https://pratyushmaini.github.io/blog/2024/risks-private-evals/	http://arxiv.org/abs/2503.04756v1
494	AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents	Jiabin Tang, Tianyu Fan, Chao Huang	2025-02-09	arXiv	https://github.com/HKUDS/AutoAgent	http://arxiv.org/abs/2502.05957v2
495	MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents	Jiabin Tang, Tianyu Fan, Chao Huang	2025-02-09	arXiv	https://github.com/HKUDS/MetaChain	http://arxiv.org/abs/2502.05957v1
496	LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning	Hanqing Yang, Jingdi Chen, Marie Siew, Tania Lorido-Botran, Carlee Joe-Wong	2025-02-08	arXiv	https://happyeureka.github.io/damcs	http://arxiv.org/abs/2502.05453v1
497	Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models	Sina Tayebati, Divake Kumar, Nastaran Darabi, Dinithi Jayasuriya, Ranganath Krishnan, Amit Ranjan Trivedi	2025-02-08	arXiv	https://github.com/sinatayebati/vlm-uncertainty	https://doi.org/10.48550/arXiv.2502.06884
498	OntoTune: Ontology-Driven Self-training for Aligning Large Language Models	Zhiqiang Liu, Chengtao Gan, Junjie Wang, Yichi Zhang, Zhongpu Bo, Mengshu Sun, Huajun Chen, Wen Zhang	2025-02-08	arXiv	https://github.com/zjukg/OntoTune	https://doi.org/10.48550/arXiv.2502.05478
499	ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning	Yuwei Yin, Giuseppe Carenini	2025-02-07	arXiv	https://github.com/YuweiYin/ARR	https://doi.org/10.48550/arXiv.2502.04689
500	Confidence Elicitation: A New Attack Vector for Large Language Models	Brian Formento, Chuan Sheng Foo, See-Kiong Ng	2025-02-07	arXiv	https://github.com/Aniloid2/Confidence_Elicitation_Attacks	https://doi.org/10.48550/arXiv.2502.04643
501	Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research	Junde Wu, Jiayuan Zhu, Yuyuan Liu	2025-02-07	arXiv	https://github.com/theworldofagents/Agentic-Reasoning	http://arxiv.org/abs/2502.04644v1
502	DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails	Yihe Deng, Yu Yang, Junkai Zhang, Wei Wang, Bo Li	2025-02-07	arXiv	https://github.com/yihedeng9/DuoGuard	http://arxiv.org/abs/2502.05163v1
503	LLM-Supported Natural Language to Bash Translation	Finnian Westenfelder, Erik Hemberg, Miguel Tulla, Stephen Moskal, Una-May O'Reilly, Silviu Chiricescu	2025-02-07	arXiv	https://github.com/westenfelder/NL2SH	http://arxiv.org/abs/2502.06858v1
504	QuEST: Stable Training of LLMs with 1-Bit Weights and Activations	Andrei Panferov, Jiale Chen, Soroush Tabesh, Roberto L. Castro, Mahdi Nikdan, Dan Alistarh	2025-02-07	arXiv	https://github.com/IST-DASLab/QuEST	http://arxiv.org/abs/2502.05003v1
505	Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization	Yuanye Liu, Jiahang Xu, Li Lyna Zhang, Qi Chen, Xuan Feng, Yang Chen, Zhongxin Guo, Yuqing Yang, Peng Cheng	2025-02-06	arXiv	https://github.com/HenryLau7/CFPO	http://arxiv.org/abs/2502.04295v2
506	ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization	Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam	2025-02-06	arXiv	https://github.com/Gen-Verse/ScoreFlow	http://arxiv.org/abs/2502.04306v1
507	Robotouille: An Asynchronous Planning Benchmark for LLM Agents	Gonzalo Gonzalez-Pumariega, Leong Su Yean, Neha Sunkara, Sanjiban Choudhury	2025-02-06	arXiv	https://github.com/portal-cornell/robotouille	http://arxiv.org/abs/2502.05227v1
508	My LLM might Mimic AAE -- But When Should it?	Sandra C. Sandoval, Christabel Acquaye, Kwesi Cobbina, Mohammad Nayeem Teli, Hal Daumé III	2025-02-06	arXiv	https://github.com/smelliecat/AAEMime	http://arxiv.org/abs/2502.04564v2
509	CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference	Zehua Pei, Lancheng Zou, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu	2025-02-06	arXiv	https://github.com/JarvisPei/CMoE	http://arxiv.org/abs/2502.04416v1
510	FAS: Fast ANN-SNN Conversion for Spiking Large Language Models	Long Chen, Xiaotian Song, Andy Song, BaDong Chen, Jiancheng Lv, Yanan Sun	2025-02-06	arXiv	https://github.com/lc783/FAS	https://doi.org/10.48550/arXiv.2502.04405
511	Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers	Daniel Beaglehole, Adityanarayanan Radhakrishnan, Enric Boix-Adserà, Mikhail Belkin	2025-02-06	arXiv	https://github.com/dmbeaglehole/neural_controllers	http://arxiv.org/abs/2502.03708v1
512	"Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence	Shaopeng Fu, Liang Ding, Di Wang	2025-02-06	arXiv	https://github.com/fshp971/adv-icl	http://arxiv.org/abs/2502.04204v1
513	Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training	Changhao Jiang, Ming Zhang, Junjie Ye, Xiaoran Fan, Yifei Cao, Jiajun Sun, Zhiheng Xi, Shihan Dou, Yi Dong, Yujiong Shen, Jingqi Tong, Zhen Wang, Tao Liang, Zhihui Fei, Mingyang Wan, Guojun Ma, Qi Zhang, Tao Gui, Xuanjing Huang	2025-02-06	arXiv	https://github.com/yuhui1038/SMI	https://doi.org/10.48550/arXiv.2502.04066
514	KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference	Xing Li, Zeyu Xing, Yiming Li, Linping Qu, Hui-Ling Zhen, Wulong Liu, Yiwu Yao, Sinno Jialin Pan, Mingxuan Yuan	2025-02-06	arXiv	https://github.com/cmd2001/KVTuner	http://arxiv.org/abs/2502.04420v1
515	EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models	He Hu, Yucheng Zhou, Lianzhong You, Hongbo Xu, Qianning Wang, Zheng Lian, Fei Richard Yu, Fei Ma, Laizhong Cui	2025-02-06	arXiv	https://emo-gml.github.io/	https://doi.org/10.48550/arXiv.2502.04424
516	Tool Unlearning for Tool-Augmented LLMs	Jiali Cheng, Hadi Amiri	2025-02-05	arXiv	https://clu-uml.github.io/MU-Bench-Project-Page/	http://arxiv.org/abs/2502.01083v1
517	Preference Leakage: A Contamination Problem in LLM-as-a-judge	Dawei Li, Renliang Sun, Yue Huang, Ming Zhong, Bohan Jiang, Jiawei Han, Xiangliang Zhang, Wei Wang, Huan Liu	2025-02-05	arXiv	https://github.com/David-Li0406/Preference-Leakage	http://arxiv.org/abs/2502.01534v1
518	Picky LLMs and Unreliable RMs: An Empirical Study on Safety Alignment after Instruction Tuning	Guanlin Li, Kangjie Chen, Shangwei Guo, Jie Zhang, Han Qiu, Chao Zhang, Guoyin Wang, Tianwei Zhang, Jiwei Li	2025-02-05	arXiv	https://github.com/GuanlinLee/llm_instruction_tuning	http://arxiv.org/abs/2502.01116v1
519	PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design	Yuchao Wu, Xiaofei Yu, Hao Chen, Yang Luo, Yeyu Tong, Yuzhe Ma	2025-02-05	arXiv	https://github.com/PICDA/PICBench	http://arxiv.org/abs/2502.03159v1
520	PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs	Mauricio Soroco, Jialin Song, Mengzhou Xia, Kye Emond, Weiran Sun, Wuyang Chen	2025-02-05	arXiv	https://pde-controller.github.io/	http://arxiv.org/abs/2502.00963v1
521	LLM-TA: An LLM-Enhanced Thematic Analysis Pipeline for Transcripts from Parents of Children with Congenital Heart Disease	Muhammad Zain Raza, Jiawei Xu, Terence Lim, Lily Boddy, Carlos M. Mery, Andrew Well, Ying Ding	2025-02-05	arXiv	https://github.com/jiaweixu98/LLM-TA	http://arxiv.org/abs/2502.01620v1
522	Demystifying Long Chain-of-Thought Reasoning in LLMs	Edward Yeo, Yuxuan Tong, Morry Niu, Graham Neubig, Xiang Yue	2025-02-05	arXiv	https://github.com/eddycmu/demystify-long-cot	http://arxiv.org/abs/2502.03373v1
523	A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs	Bradley P. Allen, Paul T. Groth	2025-02-05	arXiv	https://github.com/bradleypallen/trex-metalinguistic-disagreement	http://arxiv.org/abs/2502.02896v1
524	SPRI: Aligning Large Language Models with Context-Situated Principles	Hongli Zhan, Muneeza Azmat, Raya Horesh, Junyi Jessy Li, Mikhail Yurochkin	2025-02-05	arXiv	https://github.com/honglizhan/SPRI-public	https://doi.org/10.48550/arXiv.2502.03397
525	A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods	Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava	2025-02-05	arXiv	https://probabilistic-inference-scaling.github.io	http://arxiv.org/abs/2502.01618v2
526	Knowledge Distillation from Large Language Models for Household Energy Modeling	Mohannad Takrouri, Nicolas M. Cuadrado, Martin Takáč	2025-02-05	arXiv	https://github.com/Singularity-AI-Lab/LLM-Energy-Knowledge-Distillation	https://doi.org/10.48550/arXiv.2502.03034
527	Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models	Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Khan, Salman Khan	2025-02-05	arXiv	https://github.com/HashmatShadab/Robust-LLaVA	https://doi.org/10.48550/arXiv.2502.01576
528	Internal Activation as the Polar Star for Steering Unsafe LLM Behavior	Peixuan Han, Cheng Qian, Xiusi Chen, Yuji Zhang, Denghui Zhang, Heng Ji	2025-02-05	arXiv	https://github.com/Hanpx20/SafeSwitch	http://arxiv.org/abs/2502.01042v2
529	CTR-Driven Advertising Image Generation with Multimodal Large Language Models	Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yanyin Chen, Haohan Wang, Linkai Liu, Yaoyu Li, Jinyuan Zhao, Yu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang	2025-02-05	arXiv	https://github.com/Chenguoz/CAIG	https://doi.org/10.48550/arXiv.2502.06823
530	Intent Representation Learning with Large Language Model for Recommendation	Yu Wang, Lei Sang, Yi Zhang, Yiwen Zhang	2025-02-05	arXiv	https://github.com/wangyu0627/IRLLRec	http://arxiv.org/abs/2502.03307v1
531	AdaSVD: Adaptive Singular Value Decomposition for Large Language Models	Zhiteng Li, Mingyuan Xia, Jingyuan Zhang, Zheng Hui, Linghe Kong, Yulun Zhang, Xiaokang Yang	2025-02-05	arXiv	https://github.com/ZHITENGLI/AdaSVD	https://doi.org/10.48550/arXiv.2502.01403
532	Do Large Language Model Benchmarks Test Reliability?	Joshua Vendrow, Edward Vendrow, Sara Beery, Aleksander Madry	2025-02-05	arXiv	https://github.com/MadryLab/platinum-benchmarks	https://doi.org/10.48550/arXiv.2502.03461
533	Overcoming Vision Language Model Challenges in Diagram Understanding: A Proof-of-Concept with XML-Driven Large Language Models Solutions	Shue Shiinoki, Ryo Koshihara, Hayato Motegi, Masumi Morishige	2025-02-05	arXiv	https://github.com/galirage/spreadsheet-intelligence	https://doi.org/10.48550/arXiv.2502.04389
534	Breaking Focus: Contextual Distraction Curse in Large Language Models	Yue Huang, Yanbo Wang, Zixiang Xu, Chujie Gao, Siyuan Wu, Jiayi Ye, Xiuying Chen, Pin-Yu Chen, Xiangliang Zhang	2025-02-05	arXiv	https://github.com/wyf23187/LLM_CDV	https://doi.org/10.48550/arXiv.2502.01609
535	AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science	Chenyue Li, Wen Deng, Mengqian Lu, Binhang Yuan	2025-02-05	arXiv	https://github.com/Relaxed-System-Lab/AtmosSci-Bench	https://doi.org/10.48550/arXiv.2502.01159
536	CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing	Wenhao Zheng, Yixiao Chen, Weitong Zhang, Souvik Kundu, Yun Li, Zhengzhong Liu, Eric P. Xing, Hongyi Wang, Huaxiu Yao	2025-02-04	arXiv	https://github.com/aiming-lab/CITER	https://doi.org/10.48550/arXiv.2502.01976
537	AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement	Shivam Singh, Karthik Swaminathan, Nabanita Dash, Ramandeep Singh, Snehasis Banerjee, Mohan Sridharan, Madhava Krishna	2025-02-04	arXiv	https://sssshivvvv.github.io/adaptbot/	http://arxiv.org/abs/2502.02067v1
538	CognArtive: Large Language Models for Automating Art Analysis and Decoding Aesthetic Elements	Afshin Khadangi, Amir Sartipi, Igor Tchappi, Gilbert Fridgen	2025-02-04	arXiv	https://cognartive.github.io/	https://doi.org/10.48550/arXiv.2502.04353
539	Risk-Aware Driving Scenario Analysis with Large Language Models	Yuan Gao, Mattia Piccinini, Johannes Betz	2025-02-04	arXiv	https://github.com/yuangao-tum/Riskaware-Scenario-analyse	https://doi.org/10.48550/arXiv.2502.02145
540	SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency	Qianhao Yuan, Yanjiang Liu, Yaojie Lu, Hongyu Lin, Ben He, Xianpei Han, Le Sun	2025-02-04	arXiv	https://github.com/icip-cas/SAISA	https://doi.org/10.48550/arXiv.2502.02458
541	A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)	Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han	2025-02-04	arXiv	https://github.com/AcademyCityL/GALI	http://arxiv.org/abs/2502.02659v1
542	AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs	Hongxin Li, Jingfan Chen, Jingran Su, Yuntao Chen, Qing Li, Zhaoxiang Zhang	2025-02-04	arXiv	https://autogui-project.github.io/	http://arxiv.org/abs/2502.01977v1
543	Multi-Lingual Cyber Threat Detection in Tweets/X Using ML, DL, and LLM: A Comparative Analysis	Saydul Akbar Murad, Ashim Dahal, Nick Rahimi	2025-02-04	arXiv	https://github.com/Mmurrad/Tweet-Data-Classification	http://arxiv.org/abs/2502.04346v1
544	RankFlow: A Multi-Role Collaborative Reranking Workflow Utilizing Large Language Models	Can Jin, Hongwu Peng, Anxiang Zhang, Nuo Chen, Jiahui Zhao, Xi Xie, Kuangzheng Li, Shuya Feng, Kai Zhong, Caiwen Ding, Dimitris N. Metaxas	2025-02-03	arXiv	https://github.com/jincan333/RankFlow	https://doi.org/10.48550/arXiv.2502.00709
545	Progressive Binarization with Semi-Structured Pruning for LLMs	Xianglong Yan, Tianao Zhang, Zhiteng Li, Yulun Zhang	2025-02-03	arXiv	https://github.com/XIANGLONGYAN/PBS2P	http://arxiv.org/abs/2502.01705v1
546	A Comprehensive Analysis on LLM-based Node Classification Algorithms	Xixi Wu, Yifei Shen, Fangzhou Ge, Caihua Shan, Yizhu Jiao, Xiangguo Sun, Hong Cheng	2025-02-03	arXiv	https://llmnodebed.github.io/	http://arxiv.org/abs/2502.00829v1
547	MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies	Ehsaneddin Asgari, Yassine El Kheir, Mohammad Ali Sadraei Javaheri	2025-02-03	arXiv	https://github.com/llm-lab-org/MorphBPE	http://arxiv.org/abs/2502.00894v1
548	RTBAgent: A LLM-based Agent System for Real-Time Bidding	Leng Cai, Junxuan He, Yikai Li, Junjie Liang, Yuanping Lin, Ziming Quan, Yawen Zeng, Jin Xu	2025-02-03	arXiv	https://github.com/CaiLeng/RTBAgent	http://arxiv.org/abs/2502.00792v1
549	UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models	Xin Xu, Qiyun Xu, Tong Xiao, Tianhao Chen, Yuchen Yan, Jiaxin Zhang, Shizhe Diao, Can Yang, Yang Wang	2025-02-02	arXiv	https://github.com/YangLabHKUST/UGPhysics	https://doi.org/10.48550/arXiv.2502.00334
550	UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs	Yizhe Xiong, Wei Huang, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Zhenpeng Su, Jungong Han, Guiguang Ding	2025-02-02	arXiv	https://github.com/Bostoncake/UniAttn	http://arxiv.org/abs/2502.00439v1
551	MetaOpenFOAM 2.0: Large Language Model Driven Chain of Thought for Automating CFD Simulation and Post-Processing	Yuxuan Chen, Xu Zhu, Hua Zhou, Zhuyin Ren	2025-02-02	arXiv	https://github.com/Terry-cyx/MetaOpenFOAM	https://doi.org/10.48550/arXiv.2502.00498
552	LIBRA: Measuring Bias of Large Language Model from a Local Context	Bo Pang, Tingrui Qiao, Caroline Walker, Chris Cunningham, Yun Sing Koh	2025-02-02	arXiv	https://github.com/ipangbo/LIBRA	https://doi.org/10.48550/arXiv.2502.01679
553	Differentially Private Steering for Large Language Model Alignment	Anmol Goel, Yaxi Hu, Iryna Gurevych, Amartya Sanyal	2025-02-01	arXiv	https://github.com/UKPLab/iclr2025-psa	https://doi.org/10.48550/arXiv.2501.18532
554	Speculative Ensemble: Fast Large Language Model Ensemble via Speculation	Jiale Fu, Yuchu Jiang, Junkai Chen, Jiaming Fan, Xin Geng, Xu Yang	2025-02-01	arXiv	https://github.com/Kamichanw/Speculative-Ensemble/	https://doi.org/10.48550/arXiv.2502.01662
555	LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models	Shenghao Fu, Qize Yang, Qijie Mo, Junkai Yan, Xihan Wei, Jingke Meng, Xiaohua Xie, Wei-Shi Zheng	2025-01-31	arXiv	https://github.com/iSEE-Laboratory/LLMDet	https://doi.org/10.48550/arXiv.2501.18954
556	Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation	Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu	2025-01-31	arXiv	https://github.com/git-disl/Virus	https://doi.org/10.48550/arXiv.2501.17433
557	Reward-Guided Speculative Decoding for Efficient LLM Reasoning	Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong	2025-01-31	arXiv	https://github.com/BaohaoLiao/RSD	http://arxiv.org/abs/2501.19324v1
558	2SSP: A Two-Stage Framework for Structured Pruning of LLMs	Fabrizio Sandri, Elia Cunegatti, Giovanni Iacca	2025-01-31	arXiv	https://github.com/FabrizioSandri/2SSP	http://arxiv.org/abs/2501.17771v1
559	ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation	Minghua He, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang	2025-01-30	arXiv	https://execoder4trans.github.io/	https://doi.org/10.48550/arXiv.2501.18460
560	Uncertainty Quantification and Decomposition for LLM-based Recommendation	Wonbin Kweon, Sanghwan Jang, SeongKu Kang, Hwanjo Yu	2025-01-30	arXiv	https://github.com/WonbinKweon/UNC_LLM_REC_WWW2025	http://arxiv.org/abs/2501.17630v1
561	CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs	Jinlan Fu, Shenzhen Huangfu, Hao Fei, Xiaoyu Shen, Bryan Hooi, Xipeng Qiu, See-Kiong Ng	2025-01-28	arXiv	https://github.com/LVUGAI/CHiP	http://arxiv.org/abs/2501.16629v1
562	xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking	Sunbowen Lee, Shiwen Ni, Chi Wei, Shuaimin Li, Liyang Fan, Ahmadreza Argha, Hamid Alinejad-Rokny, Ruifeng Xu, Yicheng Gong, Min Yang	2025-01-28	arXiv	https://github.com/Aegis1863/xJailbreak	http://arxiv.org/abs/2501.16727v2
563	Large Language Model Critics for Execution-Free Evaluation of Code Changes	Aashish Yadavally, Hoan Nguyen, Laurent Callot, Gauthier Guinet	2025-01-28	arXiv	https://github.com/amazon-science/code-agent-eval	https://doi.org/10.48550/arXiv.2501.16655
564	SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model	Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Hanyu Wang, Feiyu Xiong, Jason Zhaoxin Fan, Bo Tang, Shichao Song, Mengwei Wang, Jiawei Yang	2025-01-28	arXiv	https://github.com/IAAR-Shanghai/SafeRAG	https://doi.org/10.48550/arXiv.2501.18636
565	AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models	Zheng Lian, Haoyu Chen, Lan Chen, Haiyang Sun, Licai Sun, Yong Ren, Zebang Cheng, Bin Liu, Rui Liu, Xiaojiang Peng, Jiangyan Yi, Jianhua Tao	2025-01-27	arXiv	https://github.com/zeroQiaoba/AffectGPT	https://doi.org/10.48550/arXiv.2501.16566
566	Towards Evaluating and Building Versatile Large Language Models for Medicine	Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie	2025-01-27	arXiv	https://henrychur.github.io/MedS-Bench/	https://doi.org/10.48550/arXiv.2408.12547
567	LCTG Bench: LLM Controlled Text Generation Benchmark	Kentaro Kurihara, Masato Mita, Peinan Zhang, Shota Sasaki, Ryosuke Ishigami, Naoaki Okazaki	2025-01-27	arXiv	https://github.com/CyberAgentAILab/LCTG-Bench	http://arxiv.org/abs/2501.15875v1
568	TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs	Yuxuan Gu, Wuyang Zhou, Giorgos Iacovides, Danilo Mandic	2025-01-26	arXiv	https://github.com/guyuxuan9/TensorLLM	http://arxiv.org/abs/2501.15674v1
569	Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models	Hulingxiao He, Geng Li, Zijun Geng, Jinglin Xu, Yuxin Peng	2025-01-25	arXiv	https://github.com/PKU-ICST-MIPL/Finedefics_ICLR2025	https://doi.org/10.48550/arXiv.2501.15140
570	PIP: Perturbation-based Iterative Pruning for Large Language Models	Yi Cao, Wei-Jie Xu, Yucheng Shen, Weijie Shi, Chi-Min Chan, Jiajie Xu	2025-01-25	arXiv	https://github.com/caoyiiiiii/PIP	https://doi.org/10.48550/arXiv.2501.15278
571	MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models	Zhongpu Chen, Yinfeng Liu, Long Shi, Zhi-Jie Wang, Xingyan Chen, Yu Zhao, Fuji Ren	2025-01-25	arXiv	https://github.com/SWUFE-DB-Group/MDEval-Benchmark	https://doi.org/10.48550/arXiv.2501.15000
572	A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models	Zhongzhan Huang, Shanshan Zhong, Pan Zhou, Shanghua Gao, Marinka Zitnik, Liang Lin	2025-01-25	arXiv	https://lotbench.github.io	https://doi.org/10.48550/arXiv.2501.15147
573	UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models	Xin Xu, Jiaxin Zhang, Tianhao Chen, Zitong Chao, Jishan Hu, Can Yang	2025-01-24	arXiv	https://github.com/YangLabHKUST/UGMathBench	https://doi.org/10.48550/arXiv.2501.13766
574	MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications	Yixing Jiang, Kameron C. Black, Gloria Geng, Danny Park, Andrew Y. Ng, Jonathan H. Chen	2025-01-24	arXiv	https://github.com/stanfordmlgroup/MedAgentBench	http://arxiv.org/abs/2501.14654v1
575	Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation	Sadegh Mahdavi, Muchen Li, Kaiwen Liu, Christos Thrampoulidis, Leonid Sigal, Renjie Liao	2025-01-24	arXiv	https://github.com/DSL-Lab/aops	http://arxiv.org/abs/2501.14275v1
576	DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing	Xinyu Ma, Yifeng Xu, Yang Lin, Tianlong Wang, Xu Chu, Xin Gao, Junfeng Zhao, Yasha Wang	2025-01-24	arXiv	https://github.com/ArthurLeoM/DRESS-LLM	http://arxiv.org/abs/2501.14371v1
577	MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents	Yixing Jiang, Kameron C. Black, Gloria Geng, Danny Park, James Zou, Andrew Y. Ng, Jonathan H. Chen	2025-01-24	arXiv	https://github.com/stanfordmlgroup/MedAgentBench	http://arxiv.org/abs/2501.14654v2
578	FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration	Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu	2025-01-24	arXiv	https://github.com/FireRedTeam/FireRedASR	http://arxiv.org/abs/2501.14350v1
579	Evaluating and Improving Graph to Text Generation with Large Language Models	Jie He, Yijun Yang, Wanqiu Long, Deyi Xiong, Víctor Gutiérrez-Basulto, Jeff Z. Pan	2025-01-24	arXiv	https://github.com/probe2/kg_text	https://doi.org/10.48550/arXiv.2501.14497
580	Can Large Language Models Understand Preferences in Personalized Recommendation?	Zhaoxuan Tan, Zinan Zeng, Qingkai Zeng, Zhenyu Wu, Zheyuan Liu, Fengran Mo, Meng Jiang	2025-01-24	arXiv	https://github.com/TamSiuhin/PerRecBench	https://doi.org/10.48550/arXiv.2501.13391
581	JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models	Michael K. Chen, Xikun Zhang, Dacheng Tao	2025-01-24	arXiv	https://github.com/michaelchen-lab/JustLogic	https://doi.org/10.48550/arXiv.2501.14851
582	Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models	Bo Gao, Michael W. Spratling	2025-01-24	arXiv	https://github.com/iminfine/freeatten	https://doi.org/10.48550/arXiv.2501.13428
583	Do as We Do, Not as You Think: the Conformity of Large Language Models	Zhiyuan Weng, Guikun Chen, Wenguan Wang	2025-01-24	arXiv	https://github.com/Zhiyuan-Weng/BenchForm	https://doi.org/10.48550/arXiv.2501.13381
584	Distillation Quantification for Large Language Models	Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, Sirui He, Jiaheng Liu, Min Yang, Zhoufutu Wen, Shiwen Ni	2025-01-23	arXiv	https://github.com/Aegis1863/LLMs-Distillation-Quantification	https://doi.org/10.48550/arXiv.2501.12619
585	OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting	Xing Hu, Yuan Cheng, Dawei Yang, Zukang Xu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang, Sifan Zhou	2025-01-23	arXiv	https://github.com/BrotherHappy/OSTQuant	https://doi.org/10.48550/arXiv.2501.13987
586	Low-Rank Adapters Meet Neural Architecture Search for LLM Compression	J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain	2025-01-23	arXiv	https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning	http://arxiv.org/abs/2501.16372v1
587	LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps	Andrey Palaev, Adil Khan, Syed M. Ahsan Kazmi	2025-01-23	arXiv	https://github.com/Palandr123/DiffusionU-NetLLM	http://arxiv.org/abs/2501.14046v1
588	Quantification of Large Language Model Distillation	Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, Sirui He, Haihong Wu, Tianci Liu, Jiaheng Liu, Hamid Alinejad-Rokny, Min Yang, Yitao Liang, Zhoufutu Wen, Shiwen Ni	2025-01-22	arXiv	https://github.com/Aegis1863/LLMs-Distillation-Quantification	http://arxiv.org/abs/2501.12619v3
589	A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models	Qinggang Zhang, Shengyuan Chen, Yuanchen Bei, Zheng Yuan, Huachi Zhou, Zijin Hong, Junnan Dong, Hao Chen, Yi Chang, Xiao Huang	2025-01-21	arXiv	https://github.com/DEEP-PolyU/Awesome-GraphRAG	https://doi.org/10.48550/arXiv.2501.13958
590	VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model	Xianwei Zhuang, Yuxin Xie, Yufan Deng, Liming Liang, Jinghan Ru, Yuguo Yin, Yuexian Zou	2025-01-21	arXiv	https://vargpt-1.github.io/	https://doi.org/10.48550/arXiv.2501.12327
591	EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents	Zhili Cheng, Yuge Tu, Ran Li, Shiqi Dai, Jinyi Hu, Shengding Hu, Jiahao Li, Yang Shi, Tianyu Yu, Weize Chen, Lei Shi, Maosong Sun	2025-01-21	arXiv	https://github.com/thunlp/EmbodiedEval	http://arxiv.org/abs/2501.11858v1
592	Can open source large language models be used for tumor documentation in Germany? - An evaluation on urological doctors' notes	Stefan Lenz, Arsenij Ustjanzew, Marco Jeray, Meike Ressing, Torsten Panholzer	2025-01-21	arXiv	https://github.com/stefan-m-lenz/UroLlmEval	https://doi.org/10.48550/arXiv.2501.12106
593	Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution	Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, Chao Dong	2025-01-20	arXiv	https://depictqa.github.io/deqa-score/	https://doi.org/10.48550/arXiv.2501.11561
594	Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy	Saeid Asgari Taghanaki, Joao Monteiro	2025-01-20	arXiv	https://github.com/asgsaeid/EQT	http://arxiv.org/abs/2501.11721v1
595	Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference	Pouya Hamadanian, Sadjad Fouladi	2025-01-20	arXiv	https://github.com/microsoft/glinthawk	http://arxiv.org/abs/2501.11779v1
596	ChaosEater: Fully Automating Chaos Engineering with Large Language Models	Daisuke Kikuta, Hiroki Ikeuchi, Kengo Tajiri, Yuusuke Nakano	2025-01-19	arXiv	https://ntt-dkiku.github.io/chaos-eater	https://doi.org/10.48550/arXiv.2501.11107
597	InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models	Jing Ding, Kai Feng, Binbin Lin, Jiarui Cai, Qiushi Wang, Yu Xie, Xiaojin Zhang, Zhongyu Wei, Wei Chen	2025-01-19	arXiv	https://github.com/HaileyFamo/InsQABench	https://doi.org/10.48550/arXiv.2501.10943
598	Control LLM: Controlled Evolution for Intelligence Retention in LLM	Haichao Wei, Yunxiang Ren, Zhoutong Fu, Aman Lunia, Yi-Lin Chen, Alice Leung, Ya Xu	2025-01-19	arXiv	https://github.com/linkedin/ControlLLM	http://arxiv.org/abs/2501.10979v1
599	LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport	Kyeongha Rho, Hyeongkeun Lee, Valentio Iverson, Joon Son Chung	2025-01-18	arXiv	https://github.com/NAVER-INTEL-Co-Lab/gaudi-lavcap	http://arxiv.org/abs/2501.09291v1
600	PaSa: An LLM Agent for Comprehensive Academic Paper Search	Yichen He, Guanhua Huang, Peiyuan Feng, Yuan Lin, Yuchen Zhang, Hang Li, Weinan E	2025-01-17	arXiv	https://github.com/bytedance/pasa	http://arxiv.org/abs/2501.10120v1
601	Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design	Zhi Zheng, Zhuoliang Xie, Zhenkun Wang, Bryan Hooi	2025-01-17	arXiv	https://github.com/zz1358m/MCTS-AHD-master	http://arxiv.org/abs/2501.08603v2
602	When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis	Ruixuan Zhang, Beichen Wang, Juexiao Zhang, Zilin Bian, Chen Feng, Kaan Ozbay	2025-01-17	arXiv	https://github.com/ai4ce/SeeUnsafe	https://doi.org/10.48550/arXiv.2501.10604
603	FaceXBench: Evaluating Multimodal LLMs on Face Understanding	Kartik Narayan, Vibashan VS, Vishal M. Patel	2025-01-17	arXiv	https://kartik-3004.github.io/facexbench/	http://arxiv.org/abs/2501.10360v1
604	PokerBench: Training Large Language Models to become Professional Poker Players	Richard Zhuang, Akshat Gupta, Richard Yang, Aniket Rahane, Zhengyu Li, Gopala Anumanchipalli	2025-01-16	arXiv	https://github.com/pokerllm/pokerbench	https://doi.org/10.48550/arXiv.2501.08328
605	LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding	Hongyu Li, Jinyu Chen, Ziyu Wei, Shaofei Huang, Tianrui Hui, Jialin Gao, Xiaoming Wei, Si Liu	2025-01-16	arXiv	https://github.com/appletea233/LLaVA-ST	https://doi.org/10.48550/arXiv.2501.08282
606	Gandalf the Red: Adaptive Security for LLMs	Niklas Pfister, Václav Volhejn, Manuel Knott, Santiago Arias, Julia Bazińska, Mykhailo Bichurin, Alan Commike, Janet Darling, Peter Dienes, Matthew Fiedler, David Haber, Matthias Kraft, Marco Lancini, Max Mathys, Damián Pascual-Ortiz, Jakub Podolak, Adrià Romero-López, Kyriacos Shiarlis, Andreas Signer, Zsolt Terek, Athanasios Theocharis, Daniel Timbrell, Samuel Trautwein, Samuel Watts, Natalie Wu, Mateo Rojas-Carulla	2025-01-16	arXiv	https://github.com/lakeraai/dsec-gandalf	http://arxiv.org/abs/2501.07927v1
607	CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation	Jinjun Peng, Leyi Cui, Kele Huang, Junfeng Yang, Baishakhi Ray	2025-01-16	arXiv	https://github.com/Co1lin/CWEval	http://arxiv.org/abs/2501.08200v1
608	Multilingual LLMs Struggle to Link Orthography and Semantics in Bilingual Word Processing	Eshaan Tanwar, Gayatri Oke, Tanmoy Chakraborty	2025-01-16	arXiv	https://github.com/EshaanT/Bilingual_processing_LLMs	http://arxiv.org/abs/2501.09127v1
609	OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training	Yijiong Yu, Ziyun Dai, Zekun Wang, Wei Wang, Ran Chen, Ji Pei	2025-01-16	arXiv	https://github.com/yuyijiong/fineweb-edu-chinese	http://arxiv.org/abs/2501.08197v1
610	Automated Retrosynthesis Planning of Macromolecules Using Large Language Models and Knowledge Graphs	Qinyu Ma, Yuhao Zhou, Jianfeng Li	2025-01-15	Macromol. Rapid Commun. 2025, 2500065	https://github.com/QinyuMa316/RetroSynthesisAgent	http://arxiv.org/abs/2501.08897v2
611	LAMS: LLM-Driven Automatic Mode Switching for Assistive Teleoperation	Yiran Tao, Jehan Yang, Dan Ding, Zackory Erickson	2025-01-15	arXiv	https://lams-assistance.github.io/	http://arxiv.org/abs/2501.08558v1
612	The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities	Irina Bigoulaeva, Harish Tayyar Madabushi, Iryna Gurevych	2025-01-15	arXiv	https://github.com/UKPLab/arxiv2025-inherent-limits-plms	http://arxiv.org/abs/2501.08716v1
613	A Roadmap to Guide the Integration of LLMs in Hierarchical Planning	Israel Puerta-Merino, Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares	2025-01-14	arXiv	https://llmforplanning.github.io	http://arxiv.org/abs/2501.08068v1
614	Lifelong Learning of Large Language Model based Agents: A Roadmap	Junhao Zheng, Chengming Shi, Xidi Cai, Qiuke Li, Duzhen Zhang, Chenxing Li, Dong Yu, Qianli Ma	2025-01-13	arXiv	https://github.com/qianlima-lab/awesome-lifelong-llm-agent	https://doi.org/10.48550/arXiv.2501.07278
615	SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training	Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu	2025-01-12	arXiv	https://github.com/TianjinYellow/SPAM-Optimizer	http://arxiv.org/abs/2501.06842v1
616	ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning	Xiangru Tang, Tianyu Hu, Muyang Ye, Yanjun Shao, Xunjian Yin, Siru Ouyang, Wangchunshu Zhou, Pan Lu, Zhuosheng Zhang, Yilun Zhao, Arman Cohan, Mark Gerstein	2025-01-11	arXiv	https://github.com/gersteinlab/chemagent	https://doi.org/10.48550/arXiv.2501.06590
617	SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution	Chengxing Xie, Bowen Li, Chang Gao, He Du, Wai Lam, Difan Zou, Kai Chen	2025-01-11	arXiv	https://github.com/InternLM/SWE-Fixer	http://arxiv.org/abs/2501.05040v1
618	FairCode: Evaluating Social Bias of LLMs in Code Generation	Yongkang Du, Jen-tse Huang, Jieyu Zhao, Lu Lin	2025-01-11	arXiv	https://github.com/YongkDu/FairCode	http://arxiv.org/abs/2501.05396v1
619	ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation	Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun	2025-01-11	arXiv	https://github.com/thunlp/ChartCoder	https://doi.org/10.48550/arXiv.2501.06598
620	Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models	Qingyu Ren, Jie Zeng, Qianyu He, Jiaqing Liang, Yanghua Xiao, Weikang Zhou, Zeye Sun, Fei Yu	2025-01-11	arXiv	https://github.com/Rainier-rq/FollowSoftConstraints	https://doi.org/10.48550/arXiv.2501.04945
621	Demystifying Domain-adaptive Post-training for Financial LLMs	Zixuan Ke, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty	2025-01-11	arXiv	https://github.com/SalesforceAIResearch/FinDap	http://arxiv.org/abs/2501.04961v1
622	HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers	Yiyao Yang, Fu Teng, Pengju Liu, Mengnan Qi, Chenyang Lv, Ji Li, Xuhong Zhang, Zhezhi He	2025-01-11	arXiv	https://github.com/Intelligent-Computing-Research-Group/HaVen	http://arxiv.org/abs/2501.04908v1
623	Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models	You Li, Heyu Huang, Chi Chen, Kaiyu Huang, Chao Huang, Zonghao Guo, Zhiyuan Liu, Jinan Xu, Yuhua Li, Ruixuan Li, Maosong Sun	2025-01-10	arXiv	https://migician-vg.github.io/	https://doi.org/10.48550/arXiv.2501.05767
624	ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events	Duygu Sezen Islakoglu, Jan-Christoph Kalo	2025-01-10	arXiv	https://github.com/duyguislakoglu/chronosense	https://doi.org/10.48550/arXiv.2501.03040
625	Environmental large language model Evaluation (ELLE) dataset: A Benchmark for Evaluating Generative AI applications in Eco-environment Domain	Jing Guo, Nan Li, Ming Xu	2025-01-10	arXiv	https://github.com/CEEAI/elle	https://doi.org/10.48550/arXiv.2501.06277
626	LLM4SR: A Survey on Large Language Models for Scientific Research	Ziming Luo, Zonglin Yang, Zexin Xu, Wei Yang, Xinya Du	2025-01-10	arXiv	https://github.com/du-nlp-lab/LLM4SR	https://doi.org/10.48550/arXiv.2501.04306
627	MinMo: A Multimodal Large Language Model for Seamless Voice Interaction	Qian Chen, Yafeng Chen, Yanni Chen, Mengzhe Chen, Yingda Chen, Chong Deng, Zhihao Du, Ruize Gao, Changfeng Gao, Zhifu Gao, Yabin Li, Xiang Lv, Jiaqing Liu, Haoneng Luo, Bin Ma, Chongjia Ni, Xian Shi, Jialong Tang, Hui Wang, Hao Wang, Wen Wang, Yuxuan Wang, Yunlan Xu, Fan Yu, Zhijie Yan, Yexin Yang, Baosong Yang, Xian Yang, Guanrou Yang, Tianyu Zhao, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Pei Zhang, Chong Zhang, Jinren Zhou	2025-01-10	arXiv	https://funaudiollm.github.io/minmo	https://doi.org/10.48550/arXiv.2501.06282
628	FlairGPT: Repurposing LLMs for Interior Designs	Gabrielle Littlefair, Niladri Shekhar Dutt, Niloy J. Mitra	2025-01-10	arXiv	https://flairgpt.github.io/	http://arxiv.org/abs/2501.04648v1
629	Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation	Xiao Wang, Fuling Wang, Haowen Wang, Bo Jiang, Chuanfu Li, Yaowei Wang, Yonghong Tian, Jin Tang	2025-01-09	arXiv	https://github.com/Event-AHU/Medical_Image_Analysis	http://arxiv.org/abs/2501.03458v1
630	LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases	Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik, Viren Bajaj, Zeya Ahmad	2025-01-06	arXiv	https://github.com/cvs-health/langfair	https://doi.org/10.48550/arXiv.2501.03112
631	BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning	Beichen Zhang, Yuhong Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Haodong Duan, Yuhang Cao, Dahua Lin, Jiaqi Wang	2025-01-06	arXiv	https://github.com/beichenzbc/BoostStep	https://doi.org/10.48550/arXiv.2501.03226
632	Visual Large Language Models for Generalized and Specialized Applications	Yifan Li, Zhixin Lai, Wentao Bao, Zhen Tan, Anh Dao, Kewei Sui, Jiayi Shen, Dong Liu, Huan Liu, Yu Kong	2025-01-06	arXiv	https://github.com/JackYFL/awesome-VLLMs	https://doi.org/10.48550/arXiv.2501.02765
633	CALM: Curiosity-Driven Auditing for Large Language Models	Xiang Zheng, Longxiang Wang, Yi Liu, Xingjun Ma, Chao Shen, Cong Wang	2025-01-06	arXiv	https://github.com/x-zheng16/CALM	https://doi.org/10.48550/arXiv.2501.02997
634	HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs	Saleh Ashkboos, Mahdi Nikdan, Soroush Tabesh, Roberto L. Castro, Torsten Hoefler, Dan Alistarh	2025-01-05	arXiv	https://github.com/IST-DASLab/HALO	http://arxiv.org/abs/2501.02625v2
635	Multi-LLM Collaborative Caption Generation in Scientific Documents	Jaeyoung Kim, Jongho Lee, Hong-Jun Choi, Ting-Yao Hsu, Chieh-Yang Huang, Sungchul Kim, Ryan Rossi, Tong Yu, Clyde Lee Giles, Ting-Hao 'Kenneth' Huang, Sungchul Choi	2025-01-05	arXiv	https://github.com/teamreboott/MLBCAP	http://arxiv.org/abs/2501.02552v1
636	MIRAGE: Exploring How Large Language Models Perform in Complex Social Interactive Environments	Cai Yin, Zhouhong Gu, Du Zhaohan, Ye Zheyu, Cao Shaosheng, Xu Yiqian, Feng Hongwei, Chen Ping	2025-01-04	arXiv	https://github.com/lime728/MIRAGE	https://doi.org/10.48550/arXiv.2501.01652
637	Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities	Tara Radvand, Mojtaba Abdolmaleki, Mohamed Mostagir, Ambuj Tewari	2025-01-04	arXiv	https://github.com/TaraRadvand74/llm-text-detection	http://arxiv.org/abs/2501.02406v2
638	Aligning Large Language Models for Faithful Integrity Against Opposing Argument	Yong Zhao, Yang Deng, See-Kiong Ng, Tat-Seng Chua	2025-01-04	arXiv	https://github.com/zhaoy777/AFICE	https://doi.org/10.48550/arXiv.2501.01336
639	UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility	Yonglin Tian, Fei Lin, Yiduo Li, Tengchao Zhang, Qiyao Zhang, Xuan Fu, Jun Huang, Xingyuan Dai, Yutong Wang, Chunwei Tian, Bai Li, Yisheng Lv, Levente Kovács, Fei-Yue Wang	2025-01-04	arXiv	https://github.com/Hub-Tian/UAVs_Meet_LLMs	http://arxiv.org/abs/2501.02341v1
640	REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models	Jian Hu	2025-01-04	arXiv	https://github.com/OpenRLHF/OpenRLHF	https://doi.org/10.48550/arXiv.2501.03262
641	Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap	Weizhi Zhang, Yuanchen Bei, Liangwei Yang, Henry Peng Zou, Peilin Zhou, Aiwei Liu, Yinghui Li, Hao Chen, Jianling Wang, Yu Wang, Feiran Huang, Sheng Zhou, Jiajun Bu, Allen Lin, James Caverlee, Fakhri Karray, Irwin King, Philip S. Yu	2025-01-04	arXiv	https://github.com/YuanchenBei/Awesome-Cold-Start-Recommendation	https://doi.org/10.48550/arXiv.2501.01945
642	Text Clustering as Classification with LLMs	Chen Huang, Guoxiu He	2025-01-04	Available at SSRN 5081002	https://github.com/ECNU-Text-Computing/Text-Clustering-via-LLM	http://arxiv.org/abs/2410.00927v2
643	Instruction-Following Evaluation for Large Language Models	Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou	2025-01-03	arXiv	https://github.com/google-research/google-research/tree/master/instruction_following_eval	https://doi.org/10.48550/arXiv.2311.07911
644	FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving	Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze	2025-01-03	arXiv	http://github.com/flashinfer-ai/flashinfer	http://arxiv.org/abs/2501.01005v1
645	Labels Generated by Large Language Model Helps Measuring People's Empathy in Vitro	Md. Rakibul Hasan, Yue Yao, Md. Zakir Hossain, Aneesh Krishna, Imre Rudas, Shafin Rahman, Tom Gedeon	2025-01-02	arXiv	https://github.com/hasan-rakibul/LLMPathy	https://doi.org/10.48550/arXiv.2501.00691
646	Aligning LLMs with Domain Invariant Reward Models	David Wu, Sanjiban Choudhury	2025-01-02	arXiv	https://github.com/portal-cornell/dial	http://arxiv.org/abs/2501.00911v1
647	Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models	Anmol Reddy Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid A. Hasan, Elita A. Lobo	2025	arXiv	https://github.com/molereddy/Alternate-Preference-Optimization	https://doi.org/10.48550/arXiv.2409.13474
648	Surveillance Video-and-Language Understanding: from Small to Large Multimodal Models	Tongtong Yuan, Xuange Zhang, Bo Liu, Kun Liu, Jian Jin, Zhenzhen Jiao	2025	IEEE Transactions on Circuits and Systems for Video Technology	https://xuange923.github.io/Surveillance-Video-Understanding	https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10681489
649	LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework	Yiran Qiao, Xiang Ao, Yang Liu, Jiarong Xu, Xiaoqian Sun, Qing He	2025	arXiv	https://github.com/QiaoYRan/LOGIN	https://doi.org/10.48550/arXiv.2405.13902
650	Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks?	Zhongjian Zhang, Xiao Wang, Huichi Zhou, Yue Yu, Mengmei Zhang, Cheng Yang, Chuan Shi	2025	arXiv	https://github.com/zhongjian-zhang/LLM4RGNN	https://doi.org/10.48550/arXiv.2408.08685
651	TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning	Xiang Li, Yunshi Lan, Chao Yang	2025	arXiv	https://github.com/Ashura5/TreeEval	https://doi.org/10.48550/arXiv.2402.13125
652	Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Xiaoshuang Huang, Lingdong Shen, Jia Liu, Fangxin Shang, Hongxiang Li, Haifeng Huang, Yehui Yang	2025	AAAI	https://github.com/ShawnHuang497/MedPLIB	https://doi.org/10.1609/aaai.v39i4.32394
653	Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval	Guangyuan Ma, Yongliang Ma, Xing Wu, Zhenpeng Su, Ming Zhou, Songlin Hu	2025	arXiv	https://github.com/tdro-llm/tdro	https://doi.org/10.48550/arXiv.2408.10613
654	SS-GEN: A Social Story Generation Framework with Large Language Models	Yi Feng, Mingyang Song, Jiaqi Wang, Zhuang Chen, Guanqun Bi, Minlie Huang, Liping Jing, Jian Yu	2025	AAAI	https://github.com/MIMIFY/SS-GEN	https://doi.org/10.1609/aaai.v39i2.32119
655	SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models	Muxi Diao, Rumei Li, Shiyang Liu, Guogang Liao, Jingang Wang, Xunliang Cai, Weiran Xu	2025	arXiv	https://SEAS-LLM.github.io/	https://doi.org/10.48550/arXiv.2408.02632
656	Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework	Jiandong Jin, Xiao Wang, Qian Zhu, Haiyang Wang, Chenglong Li	2025	arXiv	https://github.com/Event-AHU/OpenPAR	https://doi.org/10.48550/arXiv.2408.09720
657	PAT: Pruning-Aware Tuning for Large Language Models	Yijiang Liu, Huanrui Yang, Youxin Chen, Rongyu Zhang, Miao Wang, Yuan Du, Li Du	2025	arXiv	https://github.com/kriskrisliu/PAT_Pruning-Aware-Tuning	https://doi.org/10.48550/arXiv.2408.14721
658	One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models	Yutao Zhu, Zhaoheng Huang, Zhicheng Dou, Ji-Rong Wen	2025	arXiv	https://github.com/DaoD/SPRING/	https://doi.org/10.48550/arXiv.2405.19670
659	NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning	Xin Yi, Shunfan Zheng, Linlin Wang, Gerard de Melo, Xiaoling Wang, Liang He	2025	AAAI	https://github.com/xinykou/NLSR	https://doi.org/10.1609/aaai.v39i24.34762
660	MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing	Hao Zhou, Zhijun Wang, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Weihua Luo, Jiajun Chen	2025	arXiv	https://github.com/zjwang21/MoE-LPR	https://doi.org/10.48550/arXiv.2408.11396
661	CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?	Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma	2025	arXiv	https://github.com/CodeLLM-Research/CodeJudge-Eval	https://doi.org/10.48550/arXiv.2408.10718
662	Mitigating Social Bias in Large Language Models: A Multi-Objective Approach Within a Multi-Agent Framework	Zhenjie Xu, Wenqing Chen, Yi Tang, Xuanying Li, Cheng Hu, Zhixuan Chu, Kui Ren, Zibin Zheng, Zhichao Lu	2025	AAAI	https://github.com/Cortantse/MOMA	https://doi.org/10.1609/aaai.v39i24.34748
663	Medical MLLM Is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models	Xijie Huang, Xinyuan Wang, Hantao Zhang, Yinghao Zhu, Jiawen Xi, Jingkun An, Hao Wang, Hao Liang, Chengwei Pan	2025	AAAI	https://github.com/dirtycomputer/O2M_attack	https://doi.org/10.1609/aaai.v39i4.32396
664	MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector	Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang	2025	arXiv	https://github.com/wjfu99/MIA-Tuner	https://doi.org/10.48550/arXiv.2408.08661
665	LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation	Qidong Liu, Xian Wu, Wanyu Wang, Yejing Wang, Yuanshao Zhu, Xiangyu Zhao, Feng Tian, Yefeng Zheng	2025	AAAI	https://github.com/Applied-Machine-Learning-Lab/LLMEmb	https://doi.org/10.1609/aaai.v39i11.33327
666	LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application	Jian Jia, Yipei Wang, Yan Li, Honggang Chen, Xuehan Bai, Zhaocheng Liu, Jian Liang, Quan Chen, Han Li, Peng Jiang, Kun Gai	2025	AAAI	https://github.com/adxcreative/LEARN	https://doi.org/10.1609/aaai.v39i11.33291
667	Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models	Chenhui Hu, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao	2025	arXiv	https://github.com/ChenhuiHu/knowledge_in_superposition	https://doi.org/10.48550/arXiv.2408.07413
668	ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation	Mengyang Wu, Yuzhi Zhao, Jialun Cao, Mingjie Xu, Zhongming Jiang, Xuehui Wang, Qinbin Li, Guangneng Hu, Shengchao Qin, Chi-Wing Fu	2025	AAAI	https://github.com/zhaoyuzhi/ICM-Assistant	https://doi.org/10.1609/aaai.v39i8.32908
669	IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities	Bin Wang, Chunyu Xie, Dawei Leng, Yuhui Yin	2025	arXiv	https://github.com/360CVGroup/Inner-Adaptor-Architecture	https://doi.org/10.48550/arXiv.2408.12902
670	Geolocation Representation from Large Language Models are Generic Enhancers for Spatio-Temporal Learning	Junlin He, Tong Nie, Wei Ma	2025	arXiv	https://github.com/Umaruchain/LLMGeovec	https://doi.org/10.48550/arXiv.2408.12116
671	Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models	Weihao Ye, Qiong Wu, Wenhao Lin, Yiyi Zhou	2025	arXiv	https://github.com/ywh187/FitPrune	https://doi.org/10.48550/arXiv.2409.10197
672	Awakening Augmented Generation: Learning to Awaken Internal Knowledge of Large Language Models for Question Answering	Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Shengping Liu, Kang Liu, Jun Zhao	2025	COLING	https://github.com/Xnhyacinth/IAG	https://aclanthology.org/2025.coling-main.89/
673	QuickLLaMA: Query-aware Inference Acceleration for Large Language Models	Jingyao Li, Han Shi, Sitong Wu, Chuanyang Zheng, Zhenguo Li, Xin Jiang, Hong Xu, Jiaya Jia	2025	COLING	https://github.com/dvlab-research/Q-LLM	https://aclanthology.org/2025.coling-main.34/
674	Distilling Rule-based Knowledge into Large Language Models	Wenkai Yang, Yankai Lin, Jie Zhou, Ji-Rong Wen	2025	COLING	https://github.com/RUCBM/rule-distillation	https://aclanthology.org/2025.coling-main.61/
675	EarthMarker: A Visual Prompting Multimodal Large Language Model for Remote Sensing	Wei Zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Jun Li, Xuerui Mao	2025	IEEE Trans. Geosci. Remote. Sens.	https://github.com/wivizhang/EarthMarker	https://doi.org/10.1109/TGRS.2024.3523505
676	Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models	Jie Ma, Zhitao Gao, Qi Chai, Wangchun Sun, Pinghui Wang, Hongbin Pei, Jing Tao, Lingyun Song, Jun Liu, Chen Zhang, Lizhen Cui	2025	arXiv	https://github.com/reml-group/DoG	https://doi.org/10.48550/arXiv.2409.03155
677	Towards Efficient and Effective Adaptation of Large Language Models for Sequential Recommendation	Hangyu Wang, Jianghao Lin, Bo Chen, Yang Yang, Ruiming Tang, Weinan Zhang, Yong Yu	2025	arXiv	https://github.com/justarter/E2URec	https://doi.org/10.48550/arXiv.2310.01612
678	Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: a data-driven approach for improved classification	Ricardo Bigolin Lanfredi, Pritam Mukherjee, Ronald M. Summers	2025	arXiv	https://github.com/rsummers11/CADLab/tree/master/MAPLEZ_LLM_report_labeler/	https://doi.org/10.48550/arXiv.2403.04024
679	Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning	Xingchen Zeng, Haichuan Lin, Yilin Ye, Wei Zeng	2025	arXiv	https://github.com/zengxingchen/ChartQA-MLLM	https://doi.org/10.48550/arXiv.2407.20174
680	Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models	Zijun Chen, Wenbo Hu, Guande He, Zhijie Deng, Zheng Zhang, Richang Hong	2025	COLING	https://github.com/hfutml/Calibration-MLLM	https://aclanthology.org/2025.coling-main.208/
681	Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges	Vinay Samuel, Yue Zhou, Henry Peng Zou	2025	arXiv	https://github.com/vsamuel2003/data-contamination	https://doi.org/10.48550/arXiv.2409.09927
682	The Only Way is Ethics: A Guide to Ethical Research with Large Language Models	Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Björn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch	2025	COLING	https://github.com/MxEddie/Ethics-Whitepaper	https://aclanthology.org/2025.coling-main.603/
683	The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models	Zihui Wu, Haichang Gao, Jianping He, Ping Wang	2025	arXiv	https://github.com/wooozihui/jailbreakfunction	https://doi.org/10.48550/arXiv.2407.17915
684	Retrieval Augmented Instruction Tuning for Open NER with Large Language Models	Tingyu Xie, Jian Zhang, Yan Zhang, Yuanyuan Liang, Qi Li, Hongwei Wang	2025	arXiv	https://github.com/Emma1066/Retrieval-Augmented-IT-OpenNER	https://doi.org/10.48550/arXiv.2406.17305
685	Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models	Taiqiang Wu, Chaofan Tao, Jiahao Wang, Runming Yang, Zhe Zhao, Ngai Wong	2025	COLING	https://github.com/wutaiqiang/LLM_KD_AKL	https://aclanthology.org/2025.coling-main.383/
686	Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study	Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen	2025	COLING	https://github.com/open-compass/DevEval	https://aclanthology.org/2025.coling-main.502/
687	Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching	Tianshu Wang, Xiaoyang Chen, Hongyu Lin, Xuanang Chen, Xianpei Han, Le Sun, Hao Wang, Zhenyu Zeng	2025	arXiv	https://github.com/tshu-w/ComEM	https://doi.org/10.48550/arXiv.2405.16884
688	Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models	Xinyu Zhou, Delong Chen, Samuel Cahyawijaya, Xufeng Duan, Zhenguang G. Cai	2025	arXiv	https://github.com/ChenDelong1999/Linguistic-Similarity	https://doi.org/10.48550/arXiv.2409.12435
689	LLMTreeRec: Unleashing the Power of Large Language Models for Cold-Start Recommendations	Wenlin Zhang, Chuhan Wu, Xiangyang Li, Yuhao Wang, Kuicai Dong, Yichao Wang, Xinyi Dai, Xiangyu Zhao, Huifeng Guo, Ruiming Tang	2025	COLING	https://github.com/Applied-Machine-Learning-Lab/LLMTreeRec	https://aclanthology.org/2025.coling-main.59/
690	KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting	Thilini Wijesiriwardene, Ruwan Wickramarachchi, Sreeram Reddy Vennam, Vinija Jain, Aman Chadha, Amitava Das, Ponnurangam Kumaraguru, Amit P. Sheth	2025	COLING	https://github.com/Thiliniiw/KnowledgePrompts/	https://aclanthology.org/2025.coling-main.268/
691	Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation	Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang	2025	arXiv	https://github.com/RUCAIBox/LLM-Knowledge-Boundary	https://doi.org/10.48550/arXiv.2307.11019
692	InternLM-Law: An Open Source Chinese Legal Large Language Model	Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge	2025	arXiv	https://github.com/InternLM/InternLM-Law	https://doi.org/10.48550/arXiv.2406.14887
693	ICLEval: Evaluating In-Context Learning Ability of Large Language Models	Wentong Chen, Yankai Lin, ZhenHao Zhou, HongYun Huang, Yantao Jia, Zhao Cao, Ji-Rong Wen	2025	arXiv	https://github.com/yiye3/ICLEval	https://doi.org/10.48550/arXiv.2406.14955
694	Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining	Zongru Wu, Pengzhou Cheng, Lingyong Fang, Zhuosheng Zhang, Gongshen Liu	2025	COLING	https://github.com/ZrW00/GraceFul	https://aclanthology.org/2025.coling-main.220/
695	GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models	Zike Yuan, Ming Liu, Hui Wang, Bing Qin	2025	arXiv	https://github.com/ZIKEYUAN/GraCoRe	https://doi.org/10.48550/arXiv.2407.02936
696	Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion	Ben Liu, Jihai Zhang, Fangquan Lin, Cheng Yang, Min Peng	2025	COLING	https://github.com/LB0828/FtG	https://aclanthology.org/2025.coling-main.740/
697	Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers?	Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang	2025	COLING	https://github.com/Luckfort/CD	https://aclanthology.org/2025.coling-main.37/
698	Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models	Jiahui Li, Yongchang Hao, Haoyu Xu, Xing Wang, Yu Hong	2025	COLING	https://github.com/jiah-li/magic	https://aclanthology.org/2025.coling-main.305/
699	Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation	Xiaofeng Zhang, Fanshuo Zeng, Yihao Quan, Zheng Hui, Jiawei Yao	2025	AAAI	https://github.com/FanshuoZeng/Simignore	https://doi.org/10.1609/aaai.v39i10.33107
700	The Geometry of Categorical and Hierarchical Concepts in Large Language Models	Kiho Park, Yo Joong Choe, Yibo Jiang, Victor Veitch	2025	arXiv	https://github.com/KihoPark/LLM_Categorical_Hierarchical_Representations	https://doi.org/10.48550/arXiv.2406.01506
701	ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models	Yeji Park, Deokyeong Lee, Junsuk Choe, Buru Chang	2025	arXiv	https://github.com/yejipark-m/ConVis	https://doi.org/10.48550/arXiv.2408.13906
702	DiscoveryBench: Towards Data-Driven Discovery with Large Language Models	Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark	2025	arXiv	https://github.com/allenai/discoverybench	https://doi.org/10.48550/arXiv.2407.01725
703	MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation	Zhongshen Zeng, Pengguang Chen, Shu Liu, Haiyun Jiang, Jiaya Jia	2025	ICLR	https://github.com/dvlab-research/MR-GSM8K	https://openreview.net/forum?id=br4H61LOoI
704	LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code	Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica	2025	arXiv	https://livecodebench.github.io/	https://doi.org/10.48550/arXiv.2403.07974
705	Large Language Models are Interpretable Learners	Ruochen Wang, Si Si, Felix X. Yu, Dorothea Wiesmann Rothuizen, Cho-Jui Hsieh, Inderjit S. Dhillon	2025	ICLR	https://github.com/ruocwang/llm-symbolic-program	https://openreview.net/forum?id=hTphfqtafO
706	LLaMA-Omni: Seamless Speech Interaction with Large Language Models	Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng	2025	arXiv	https://github.com/ictnlp/LLaMA-Omni	https://doi.org/10.48550/arXiv.2409.06666
707	LLM-SR: Scientific Equation Discovery via Programming with Large Language Models	Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, Chandan K. Reddy	2025	arXiv	https://github.com/deep-symbolic-mathematics/LLM-SR	https://doi.org/10.48550/arXiv.2404.18400
708	LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models	Xiaohao Yang, He Zhao, Dinh Q. Phung, Wray L. Buntine, Lan Du	2025	arXiv	https://github.com/Xiaohao-Yang/Topic_Model_Evaluation	https://doi.org/10.48550/arXiv.2406.09008
709	KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models	Fan Wang, Juyong Jiang, Chansung Park, Sunghun Kim, Jing Tang	2025	ICLR	https://github.com/juyongjiang/KaSA	https://openreview.net/forum?id=OQqNieeivq
710	Improved Techniques for Optimization-Based Jailbreaking on Large Language Models	Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin	2025	arXiv	https://github.com/jiaxiaojunQAQ/I-GCG	https://doi.org/10.48550/arXiv.2405.21018
711	FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models	Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, Jiang Bian	2025	ICLR	https://github.com/microsoft/CADGeneration/FlexCAD	https://openreview.net/forum?id=Z0eiiV3Yyh
712	Efficient Evolutionary Search Over Chemical Space with Large Language Models	Haorui Wang, Marta Skreta, Cher Tian Ser, Wenhao Gao, Lingkai Kong, Felix Strieth-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang	2025	ICLR	http://github.com/zoom-wang112358/MOLLEO	https://openreview.net/forum?id=awWiNvQwf3
713	Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification	Wenxuan Huang, Zijie Zhai, Yunhang Shen, Shaosheng Cao, Fei Zhao, Xiangfeng Xu, Zheyu Ye, Shaohui Lin	2025	ICLR	https://github.com/Osilly/dynamic_llava	https://openreview.net/forum?id=hzVpZDrW73
714	Developing safe and responsible large language model: can we balance bias reduction and language understanding?	Shaina Raza, Oluwanifemi Bamgbose, Shardul Ghuge, Fatemeh Tavakoli, Deepak John Reji, Syed Raza Bashir	2025	Mach. Learn.	https://github.com/shainarazavi/Safe-Responsible-LLM	https://doi.org/10.1007/s10994-025-06767-4
715	Neuron based Personality Trait Induction in Large Language Models	Jia Deng, Tianyi Tang, Yanbin Yin, Wenhao Yang, Wayne Xin Zhao, Ji-Rong Wen	2025	ICLR	https://github.com/RUCAIBox/NPTI	https://openreview.net/forum?id=LYHEY783Np
716	Concept Bottleneck Large Language Models	Chung-En Sun, Tuomas P. Oikarinen, Berk Ustun, Tsui-Wei Weng	2025	ICLR	https://github.com/Trustworthy-ML-Lab/CB-LLMs	https://openreview.net/forum?id=RC5FPYVQaH
717	Can Large Language Models Understand Symbolic Graphics Programs?	Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf	2025	arXiv	https://sgp-bench.github.io/	https://doi.org/10.48550/arXiv.2408.08313
718	CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery	Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma Gongque, Jianing Yu, Qiuna Tan, Weiran Xu	2025	arXiv	https://github.com/csbench/csbench	https://doi.org/10.48550/arXiv.2406.08587
719	Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation	Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu	2025	arXiv	https://github.com/git-disl/Booster	https://doi.org/10.48550/arXiv.2409.01586
720	Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph	Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Lyudmila Rvanova, Daniil Vasilev, Akim Tsvigun, Sergey Petrakov, Rui Xing, Abdelrahman Boda Sadallah, Kirill Grishchenkov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, Artem Shelmanov	2025	arXiv	https://github.com/IINemo/lm-polygraph	https://doi.org/10.48550/arXiv.2406.15627
721	Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression	Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, Grace Li Zhang	2025	arXiv	https://github.com/TUDa-HWAI/Basis_Sharing	https://doi.org/10.48550/arXiv.2410.03765
722	An Engorgio Prompt Makes Large Language Model Babble on	Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang, Tianwei Zhang, Hao Wang, Hewu Li, Qi Li, Chao Zhang, Ke Xu, Han Qiu	2025	ICLR	https://github.com/jianshuod/Engorgio-prompt	https://openreview.net/forum?id=m4eXBo0VNc
723	Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards	Xiaoyu Yang, Jie Lu, En Yu	2025	ICLR	https://github.com/Anonymous0Knight/ConceptDriftMLLMs	https://openreview.net/forum?id=b20VK2GnSs
724	AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models	Kim Sung-Bin, Oh Hyun-Bin, JungMok Lee, Arda Senocak, Joon Son Chung, Tae-Hyun Oh	2025	ICLR	https://github.com/AVHBench/AVHBench	https://openreview.net/forum?id=jTEKTdI3K9
725	A Probabilistic Perspective on Unlearning and Alignment for Large Language Models	Yan Scholten, Stephan Günnemann, Leo Schwinn	2025	arXiv	https://github.com/yascho/probabilistic-unlearning	https://doi.org/10.48550/arXiv.2410.03523
726	A Closer Look into Mixture-of-Experts in Large Language Models	Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu	2025	arXiv	https://github.com/kamanphoebe/Look-into-MoEs	https://doi.org/10.48550/arXiv.2406.18219
727	Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models	Jingyang Zhang, Jingwei Sun, Eric C. Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Yang, Hai Helen Li	2025	arXiv	https://zjysteven.github.io/mink-plus-plus/	https://doi.org/10.48550/arXiv.2404.02936
728	Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference	Zhihang Lin, Mingbao Lin, Luxi Lin, Rongrong Ji	2025	arXiv	https://github.com/lzhxmu/VTW	https://doi.org/10.48550/arXiv.2405.05803
729	NutriBench: A Dataset for Evaluating Large Language Models in Carbohydrate Estimation from Meal Descriptions	Mehak Preet Dhaliwal, Andong Hua, Laya Pullela, Ryan Burke, Yao Qin	2025	arXiv	https://mehak126.github.io/nutribench.html	https://doi.org/10.48550/arXiv.2407.12843
730	UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model	Zhaowei Li, Wei Wang, Yiqing Cai, Qi Xu, Pengyu Wang, Dong Zhang, Hang Song, Botian Jiang, Zhida Huang, Tao Wang	2025	arXiv	https://github.com/lzw-lzw/UnifiedMLLM	https://doi.org/10.48550/arXiv.2408.02503
731	Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution	Wentao Tan, Qiong Cao, Yibing Zhan, Chao Xue, Changxing Ding	2025	AAAI	https://github.com/WentaoTan/SENA	https://doi.org/10.1609/aaai.v39i7.32774
732	SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models	Yibin Chen, Yifu Yuan, Zeyu Zhang, Yan Zheng, Jinyi Liu, Fei Ni, Jianye Hao, Hangyu Mao, Fuzheng Zhang	2025	arXiv	https://sheetagent.github.io	https://doi.org/10.48550/arXiv.2403.03636
733	Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models	Qi Liu, Bo Wang, Nan Wang, Jiaxin Mao	2025	arXiv	https://github.com/liuqi6777/pe_rank	https://doi.org/10.48550/arXiv.2406.14848
734	Learning Multiple Object States from Actions via Large Language Models	Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, Yoichi Sato	2025	WACV	https://masatate.github.io/ObjStatefromAction.github.io/	https://doi.org/10.1109/WACV61041.2025.00925
735	Large Language Models Empowered Personalized Web Agents	Hongru Cai, Yongqi Li, Wenjie Wang, Fengbin Zhu, Xiaoyu Shen, Wenjie Li, Tat-Seng Chua	2025	WWW	https://hongrucai.github.io/PersonalWAB/	https://doi.org/10.1145/3696410.3714842
736	Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval	Luo Ji, Feixiang Guo, Teng Chen, Qingqing Gu, Xiaoyu Wang, Ningyuan Xi, Yihong Wang, Peng Yu, Yue Zhao, Hongyang Lei, Zhonglin Jiang, Yong Chen	2025	ECIR	https://github.com/flyfree5/LaHoRe	https://doi.org/10.1007/978-3-031-88714-7_27
737	CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation	Yang Zhang, Fuli Feng, Jizhi Zhang, Keqin Bao, Qifan Wang, Xiangnan He	2025	arXiv	https://github.com/zyang1580/CoLLM	https://doi.org/10.48550/arXiv.2310.19488
738	Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models	Zhijian Zhuo, Ya Wang, Yutao Zeng, Xiaoqing Li, Xun Zhou, Jinwen Ma	2025	ICLR	https://github.com/BryceZhuo/PolyCom	https://openreview.net/forum?id=CbpWPbYHuv
739	DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation	Anna C. Doris, Daniele Grandi, Ryan Tomich, Md Ferdous Alam, Mohammadmehdi Ataei, Hyunmin Cheong, Faez Ahmed	2025	J. Comput. Inf. Sci. Eng.	https://github.com/anniedoris/design_qa/	https://doi.org/10.1115/1.4067333
740	Zero-shot Model-based Reinforcement Learning using Large Language Models	Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat, Oussama Zekri, Albert Thomas, Giuseppe Paolo, Maurizio Filippone, Ievgen Redko, Balázs Kégl	2025	arXiv	https://github.com/abenechehab/dicl	https://doi.org/10.48550/arXiv.2410.11711
741	WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jian-Guang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Yansong Tang, Dongmei Zhang	2025	ICLR	https://github.com/nlpxucan/WizardLM	https://openreview.net/forum?id=mMPMHWOdOy
742	Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?	Shuo Chen, Zhen Han, Bailan He, Jianzhe Liu, Mark Buckley, Yao Qin, Philip Torr, Volker Tresp, Jindong Gu	2025	WACV	https://chenxshuo.github.io/m-icl/	https://doi.org/10.1109/WACV61041.2025.00585
743	TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking	Danqing Wang, Jianxin Ma, Fei Fang, Lei Li	2025	ICLR	https://github.com/dqwang122/ThinkHub	https://openreview.net/forum?id=VIUisLx8lQ
744	Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation	Mufei Li, Siqi Miao, Pan Li	2025	ICLR	https://github.com/Graph-COM/SubgraphRAG	https://openreview.net/forum?id=JvkuZZ04O7
745	Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation	Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Cehao Yang, Jiaxin Mao, Jian Guo	2025	ICLR	https://github.com/IDEA-FinAI/ToG-2	https://openreview.net/forum?id=oFBu7qaZpS
746	REvolve: Reward Evolution with Large Language Models using Human Feedback	Rishi Hazra, Alkis Sygkounas, Andreas Persson, Amy Loutfi, Pedro Zuidberg Dos Martires	2025	ICLR	https://rishihazra.github.io/REvolve	https://openreview.net/forum?id=cJPUpL8mOw
747	Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models	Haritz Puerto, Martin Gubri, Sangdoo Yun, Seong Joon Oh	2025	NAACL	https://github.com/parameterlab/mia-scaling	https://aclanthology.org/2025.findings-naacl.234/
748	Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models	Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou	2025	arXiv	https://github.com/QwenLM/AutoIF	https://doi.org/10.48550/arXiv.2406.13542
749	REEF: Representation Encoding Fingerprints for Large Language Models	Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao	2025	ICLR	https://github.com/tmylla/REEF	https://openreview.net/forum?id=SnDmPkOJ0T
750	Steering Large Language Models between Code Execution and Textual Reasoning	Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma, Chuchu Fan, Chi Wang	2025	arXiv	https://yongchao98.github.io/CodeSteer/	https://doi.org/10.48550/arXiv.2410.03524
751	StringLLM: Understanding the String Processing Capability of Large Language Models	Xilong Wang, Hao Fu, Jindong Wang, Neil Zhenqiang Gong	2025	arXiv	https://github.com/wxl-lxw/StringLLM	https://doi.org/10.48550/arXiv.2410.01208
752	TESTEVAL: Benchmarking Large Language Models for Test Case Generation	Wenhan Wang, Chenyuan Yang, Zhijie Wang, Yuheng Huang, Zhaoyang Chu, Da Song, Lingming Zhang, An Ran Chen, Lei Ma	2025	arXiv	https://llm4softwaretesting.github.io	https://doi.org/10.48550/arXiv.2406.04531
753	A Closer Look at Machine Unlearning for Large Language Models	Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin	2025	arXiv	https://github.com/sail-sg/closer-look-LLM-unlearning	https://doi.org/10.48550/arXiv.2410.08109
754	Distributed Mixture-of-Agents for Edge Inference with Large Language Models	Purbesh Mitra, Priyanka Kaswan, Sennur Ulukus	2024-12-30	arXiv	https://github.com/purbeshmitra/distributed_moa	http://arxiv.org/abs/2412.21200v1
755	Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study	Yulin Fei, Yuhui Gao, Xingyuan Xian, Xiaojin Zhang, Tao Wu, Wei Chen	2024-12-29	arXiv	https://github.com/YuHuiGao/FG-Bench	http://arxiv.org/abs/2412.20613v1
756	Mind the Data Gap: Bridging LLMs to Enterprise Data Integration	Moe Kayali, Fabian Wenz, Nesime Tatbul, Çağatay Demiralp	2024-12-29	arXiv	https://goby-benchmark.github.io/	http://arxiv.org/abs/2412.20331v1
757	TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication	Zongwu Wang, Fangxin Liu, Mingshuai Li, Li Jiang	2024-12-29	arXiv	https://github.com/ACA-Lab-SJTU/token-ring	http://arxiv.org/abs/2412.20501v1
758	On the Compositional Generalization of Multimodal LLMs for Medical Imaging	Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize Chen, Zixu Zhang, Benyou Wang	2024-12-28	arXiv	https://github.com/FreedomIntelligence/Med-MAT	http://arxiv.org/abs/2412.20070v1
759	Toward Adaptive Reasoning in Large Language Models with Thought Rollback	Sijia Chen, Baochun Li	2024-12-27	ICML	https://github.com/iQua/llmpebase/tree/main/examples/ThoughtRollback	https://openreview.net/forum?id=aoAPOOtN9E
760	A Survey on Large Language Model Acceleration based on KV Cache Management	Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen	2024-12-27	arXiv	https://github.com/TreeAI-Lab/Awesome-KV-Cache-Management	http://arxiv.org/abs/2412.19442v2
761	Gradient Weight-normalized Low-rank Projection for Efficient LLM Training	Jia-Hong Huang, Yixian Shen, Hongyi Zhu, Stevan Rudinac, Evangelos Kanoulas	2024-12-27	arXiv	https://github.com/Jhhuangkay/Gradient-Weight-normalized-Low-rank-Projection-for-Efficient-LLM-Training	http://arxiv.org/abs/2412.19616v1
762	MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Jiaqi Fan, Jianhua Wu, Jincheng Gao, Jianhao Yu, Yafei Wang, Hongqing Chu, Bingzhao Gao	2024-12-27	arXiv	https://github.com/fjq-tongji/MLLM-SUL	http://arxiv.org/abs/2412.19406v1
763	Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment	Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, Limin Wang, Yi Wang	2024-12-26	arXiv	https://github.com/OpenGVLab/TPO	http://arxiv.org/abs/2412.19326v1
764	CoEvo: Continual Evolution of Symbolic Solutions Using Large Language Models	Ping Guo, Qingfu Zhang, Xi Lin	2024-12-25	arXiv	https://github.com/pgg3/CoEvo	http://arxiv.org/abs/2412.18890v1
765	3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding	Tatiana Zemskova, Dmitry Yudin	2024-12-24	arXiv	https://github.com/CognitiveAISystems/3DGraphLLM	http://arxiv.org/abs/2412.18450v2
766	Distilling Fine-grained Sentiment Understanding from Large Language Models	Yice Zhang, Guangyu Xie, Hongling Xu, Kaiheng Hou, Jianzhu Bao, Qianlong Wang, Shiwei Chen, Ruifeng Xu	2024-12-24	arXiv	https://github.com/HITSZ-HLT/FSA-Distillation	http://arxiv.org/abs/2412.18552v2
767	Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving	Hao Pang, Zhenpo Wang, Guoqiang Li	2024-12-24	arXiv	https://bitmobility.github.io/LGDRL/	http://arxiv.org/abs/2412.18511v1
768	Property Enhanced Instruction Tuning for Multi-task Molecule Generation with Large Language Models	Xuan Lin, Long Chen, Yile Wang, Xiangxiang Zeng, Philip S. Yu	2024-12-24	arXiv	https://github.com/chenlong164/PEIT	http://arxiv.org/abs/2412.18084v1
769	Token-Budget-Aware LLM Reasoning	Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen	2024-12-24	arXiv	https://github.com/GeniusHTX/TALE	http://arxiv.org/abs/2412.18547v3
770	Assessing Human Editing Effort on LLM-Generated Texts via Compression-Based Edit Distance	Nicolas Devatine, Louis Abraham	2024-12-23	arXiv	https://github.com/NDV-tiime/CompressionDistance	http://arxiv.org/abs/2412.17321v1
771	Large Language Model Safety: A Holistic Survey	Dan Shi, Tianhao Shen, Yufei Huang, Zhigen Li, Yongqi Leng, Renren Jin, Chuang Liu, Xinwei Wu, Zishan Guo, Linhao Yu, Ling Shi, Bojian Jiang, Deyi Xiong	2024-12-23	arXiv	https://github.com/tjunlp-lab/Awesome-LLM-Safety-Papers	http://arxiv.org/abs/2412.17686v1
772	CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models	Yeyuan Wang, Dehong Gao, Bin Li, Rujiao Long, Lei Yi, Xiaoyan Cai, Libin Yang, Jinxia Zhang, Shanqing Yu, Qi Xuan	2024-12-22	arXiv	https://github.com/Gavin001201/CoF	http://arxiv.org/abs/2412.16869v1
773	MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge	Jie He, Nan Hu, Wanqiu Long, Jiaoyan Chen, Jeff Z. Pan	2024-12-22	arXiv	https://github.com/probe2/multi-hop/	http://arxiv.org/abs/2412.17032v1
774	PruneVid: Visual Token Pruning for Efficient Video Large Language Models	Xiaohu Huang, Hao Zhou, Kai Han	2024-12-20	arXiv	https://github.com/Visual-AI/PruneVid	http://arxiv.org/abs/2412.16117v1
775	TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use	Junjie Ye, Yilong Wu, Sixian Li, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan, Zhengyin Du	2024-12-20	arXiv	https://github.com/Junjie-Ye/TL-Training	http://arxiv.org/abs/2412.15495v1
776	Template-Driven LLM-Paraphrased Framework for Tabular Math Word Problem Generation	Xiaoqiang Kang, Zimu Wang, Xiaobo Jin, Wei Wang, Kaizhu Huang, Qiufeng Wang	2024-12-20	arXiv	https://github.com/Jason8Kang/TELL	http://arxiv.org/abs/2412.15594v1
777	WebLLM: A High-Performance In-Browser LLM Inference Engine	Charlie F. Ruan, Yucheng Qin, Xun Zhou, Ruihang Lai, Hongyi Jin, Yixin Dong, Bohan Hou, Meng-Shiun Yu, Yiyan Zhai, Sudeep Agarwal, Hangrui Cao, Siyuan Feng, Tianqi Chen	2024-12-20	arXiv	https://github.com/mlc-ai/web-llm	http://arxiv.org/abs/2412.15803v1
778	Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models	Wenhan Liu, Xinyu Ma, Yutao Zhu, Ziliang Zhao, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou	2024-12-19	arXiv	https://github.com/8421BCD/fullrank	http://arxiv.org/abs/2412.14574v1
779	On Verbalized Confidence Scores for LLMs	Daniel Yang, Yao-Hung Hubert Tsai, Makoto Yamada	2024-12-19	arXiv	https://github.com/danielyxyang/llm-verbalized-uq	http://arxiv.org/abs/2412.14737v1
780	ORBIT: Cost-Effective Dataset Curation for Large Language Model Domain Adaptation with an Astronomy Case Study	Eric Modesitt, Ke Yang, Spencer Hulsey, Chengxiang Zhai, Volodymyr Kindratenko	2024-12-19	arXiv	https://github.com/ModeEric/ORBIT-Llama	http://arxiv.org/abs/2412.14436v1
781	Agent-SafetyBench: Evaluating the Safety of LLM Agents	Zhexin Zhang, Shiyao Cui, Yida Lu, Jingzhuo Zhou, Junxiao Yang, Hongning Wang, Minlie Huang	2024-12-19	arXiv	https://github.com/thu-coai/Agent-SafetyBench	http://arxiv.org/abs/2412.14470v1
782	Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes	Katarzyna Kobalczyk, Claudio Fanconi, Hao Sun, Mihaela van der Schaar	2024-12-18	arXiv	https://github.com/kasia-kobalczyk/few-shot-steerable-alignment	http://arxiv.org/abs/2412.13998v1
783	ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals	Utkarsh Saxena, Sayeh Sharify, Kaushik Roy, Xin Wang	2024-12-18	arXiv	https://github.com/utkarsh-dmx/project-resq	http://arxiv.org/abs/2412.14363v1
784	InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models	Cong Wei, Yujie Zhong, Haoxian Tan, Yingsen Zeng, Yong Liu, Zheng Zhao, Yujiu Yang	2024-12-18	arXiv	https://github.com/congvvc/InstructSeg	http://arxiv.org/abs/2412.14006v1
785	Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces	Jihan Yang, Shusheng Yang, Anjali W. Gupta, Rilyn Han, Li Fei-Fei, Saining Xie	2024-12-18	arXiv	https://vision-x-nyu.github.io/thinking-in-space.github.io/	http://arxiv.org/abs/2412.14171v1
786	Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting	Vijay Goyal, Mustafa Khan, Aprameya Tirupati, Harveer Saini, Michael Lam, Kevin Zhu	2024-12-18	arXiv	https://github.com/alonso130r/knowledge-distillation	http://arxiv.org/abs/2412.17846v1
787	Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings	Yuanhe Zhang, Zhenhong Zhou, Wei Zhang, Xinyue Wang, Xiaojun Jia, Yang Liu, Sen Su	2024-12-18	arXiv	https://github.com/shuita2333/AutoDoS	http://arxiv.org/abs/2412.13879v1
788	Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games	Wenye Lin, Jonathan Roberts, Yunhan Yang, Samuel Albanie, Zongqing Lu, Kai Han	2024-12-18	arXiv	https://visual-ai.github.io/gamebot	http://arxiv.org/abs/2412.13602v1
789	Are Your LLMs Capable of Stable Reasoning?	Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen	2024-12-17	arXiv	https://github.com/open-compass/GPassK	http://arxiv.org/abs/2412.13147v2
790	Assessing the Limitations of Large Language Models in Clinical Fact Decomposition	Monica Munnangi, Akshay Swaminathan, Jason Alan Fries, Jenelle Jindal, Sanjana Narayanan, Ivan Lopez, Lucia Tu, Philip Chung, Jesutofunmi A. Omiye, Mehr Kashyap, Nigam Shah	2024-12-17	arXiv	https://github.com/som-shahlab/factehr	http://arxiv.org/abs/2412.12422v1
791	Benchmarking and Understanding Compositional Relational Reasoning of LLMs	Ruikang Ni, Da Xiao, Qingye Meng, Xiangyu Li, Shihui Zheng, Hongliang Liang	2024-12-17	arXiv	https://github.com/Caiyun-AI/GAR	http://arxiv.org/abs/2412.12841v1
792	Graph Learning in the Era of LLMs: A Survey from the Perspective of Data, Models, and Tasks	Xunkai Li, Zhengyu Wu, Jiayi Wu, Hanwen Cui, Jishuo Jia, Rong-Hua Li, Guoren Wang	2024-12-17	arXiv	https://github.com/xkLi-Allen/Awesome-GNN-in-LLMs-Papers	http://arxiv.org/abs/2412.12456v1
793	SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents	Sheng Yin, Xianghe Pang, Yuanzhuo Ding, Menglan Chen, Yutong Bi, Yichen Xiong, Wenhao Huang, Zhen Xiang, Jing Shao, Siheng Chen	2024-12-17	arXiv	https://github.com/shengyin1224/SafeAgentBench	http://arxiv.org/abs/2412.13178v2
794	SafeDrive: Knowledge- and Data-Driven Risk-Sensitive Decision-Making for Autonomous Vehicles with Large Language Models	Zhiyuan Zhou, Heye Huang, Boqi Li, Shiyue Zhao, Yao Mu, Jianqiang Wang	2024-12-17	arXiv	https://mezzi33.github.io/SafeDrive/	http://arxiv.org/abs/2412.13238v2
795	RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation	Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou	2024-12-16	arXiv	https://github.com/sunnynexus/RetroLLM	http://arxiv.org/abs/2412.11919v1
796	RL-LLM-DT: An Automatic Decision Tree Generation Method Based on RL Evaluation and LLM Enhancement	Junjie Lin, Jian Zhao, Lin Liu, Yue Deng, Youpeng Zhao, Lanxiao Huang, Xia Lin, Wengang Zhou, Houqiang Li	2024-12-16	arXiv	https://github.com/Linjunjie99/RL-LLM-DT	http://arxiv.org/abs/2412.11417v2
797	LLMs Can Simulate Standardized Patients via Agent Coevolution	Zhuoyun Du, Lujie Zheng, Renjun Hu, Yuyang Xu, Xiawei Li, Ying Sun, Wei Chen, Jian Wu, Haolei Cai, Haohao Ying	2024-12-16	arXiv	https://github.com/ZJUMAI/EvoPatient	http://arxiv.org/abs/2412.11716v1
798	Does VLM Classification Benefit from LLM Description Semantics?	Pingchuan Ma, Lennart Rietdorf, Dmytro Kotovenko, Vincent Tao Hu, Björn Ommer	2024-12-16	arXiv	https://github.com/CompVis/DisCLIP	http://arxiv.org/abs/2412.11917v3
799	Analyzing Images of Legal Documents: Toward Multi-Modal LLMs for Access to Justice	Hannes Westermann, Jaromir Savelka	2024-12-16	arXiv	https://github.com/hwestermann/AI4A2J_analyzing_images_of_legal_documents	http://arxiv.org/abs/2412.15260v1
800	BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement	Yuhao Du, Shunian Chen, Wenbo Zan, Peizhao Li, Mingxuan Wang, Dingjie Song, Bo Li, Yan Hu, Benyou Wang	2024-12-16	arXiv	https://github.com/FreedomIntelligence/BlenderLLM	http://arxiv.org/abs/2412.14203v1
801	Empowering LLMs to Understand and Generate Complex Vector Graphics	Ximing Xing, Juncheng Hu, Guotao Liang, Jing Zhang, Dong Xu, Qian Yu	2024-12-15	arXiv	https://ximinng.github.io/LLM4SVGProject/	http://arxiv.org/abs/2412.11102v1
802	NITRO: LLM Inference on Intel Laptop NPUs	Anthony Fei, Mohamed S. Abdelfattah	2024-12-15	arXiv	https://github.com/abdelfattah-lab/nitro	http://arxiv.org/abs/2412.11053v1
803	Learning to Verify Summary Facts with Fine-Grained LLM Feedback	Jihwan Oh, Jeonghwan Choi, Nicole Hee-Yeon Kim, Taewon Yun, Hwanjun Song	2024-12-14	arXiv	https://github.com/DISL-Lab/FineSumFact	http://arxiv.org/abs/2412.10689v1
804	B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens	Zhuqiang Lu, Zhenfei Yin, Mengwei He, Zhihui Wang, Zicheng Liu, Zhiyong Wang, Kun Hu	2024-12-13	arXiv	https://github.com/zhuqiangLu/B-VLLM	http://arxiv.org/abs/2412.09919v1
805	Can LLMs Convert Graphs to Text-Attributed Graphs?	Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye	2024-12-13	arXiv	https://github.com/Zehong-Wang/TANS	http://arxiv.org/abs/2412.10136v1
806	ChainStream: An LLM-based Framework for Unified Synthetic Sensing	Jiacheng Liu, Yuanchun Li, Liangyan Li, Yi Sun, Hao Wen, Xiangyu Li, Yao Guo, Yunxin Liu	2024-12-13	arXiv	https://github.com/MobileLLM/ChainStream	http://arxiv.org/abs/2412.15240v1
807	CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models	Zhihao Du, Yuxuan Wang, Qian Chen, Xian Shi, Xiang Lv, Tianyu Zhao, Zhifu Gao, Yexin Yang, Changfeng Gao, Hui Wang, Fan Yu, Huadai Liu, Zhengyan Sheng, Yue Gu, Chong Deng, Wen Wang, Shiliang Zhang, Zhijie Yan, Jingren Zhou	2024-12-13	arXiv	https://funaudiollm.github.io/cosyvoice2	http://arxiv.org/abs/2412.10117v3
808	Can Modern LLMs Act as Agent Cores in Radiology Environments?	Qiaoyu Zheng, Chaoyi Wu, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie	2024-12-12	arXiv	https://github.com/MAGIC-AI4Med/RadABench	http://arxiv.org/abs/2412.09529v2
809	RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios	Ruiwen Zhou, Wenyue Hua, Liangming Pan, Sitao Cheng, Xiaobao Wu, En Yu, William Yang Wang	2024-12-12	arXiv	https://github.com/skyriver-2000/RuleArena	http://arxiv.org/abs/2412.08972v1
810	What Makes Cryptic Crosswords Challenging for LLMs?	Abdelrahman Sadallah, Daria Kotova, Ekaterina Kochmar	2024-12-12	COLING 2025	https://github.com/bodasadallah/decrypting-crosswords	http://arxiv.org/abs/2412.09012v1
811	Autoformalizing and Simulating Game-Theoretic Scenarios using LLM-augmented Agents	Agnieszka Mensfelt, Kostas Stathis, Vince Trencsenyi	2024-12-11	arXiv	https://github.com/dicelab-rhul/autoformalizing-agents	http://arxiv.org/abs/2412.08805v1
812	Multi-GraspLLM: A Multimodal LLM for Multi-Hand Semantic Guided Grasp Generation	Haosheng Li, Weixin Mao, Weipeng Deng, Chenyu Meng, Haoqiang Fan, Tiancai Wang, Ping Tan, Hongan Wang, Xiaoming Deng	2024-12-11	arXiv	https://multi-graspllm.github.io	http://arxiv.org/abs/2412.08468v1
813	Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation	Pedro H. V. Valois, Lincon S. Souza, Erica K. Shimomoto, Kazuhiro Fukui	2024-12-10	arXiv	https://github.com/phvv-me/frame-representation-hypothesis	http://arxiv.org/abs/2412.07334v2
814	LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation	Eunsu Kim, Juyoung Suk, Seungone Kim, Niklas Muennighoff, Dongkwan Kim, Alice Oh	2024-12-10	arXiv	https://github.com/interview-eval/	http://arxiv.org/abs/2412.10424v2
815	DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation	Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong	2024-12-10	arXiv	https://jianzongwu.github.io/projects/diffsensei/	http://arxiv.org/abs/2412.07589v1
816	IntellectSeeker: A Personalized Literature Management System with the Probabilistic Model and Large Language Model	Weizhen Bian, Siyan Liu, Yubo Zhou, Dezhi Chen, Yijie Liao, Zhenzhen Fan, Aobo Wang	2024-12-10	KSEM	https://github.com/LuckyBian/ISY5001	https://doi.org/10.1007/978-981-97-5489-2_24
817	PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models	Qian Zhang, Panfeng Chen, Jiali Li, Linkun Feng, Shuyu Liu, Heng Zhao, Mei Chen, Hui Li, Yanhao Wang	2024-12-09	arXiv	https://github.com/ACMISLab/PediaBench	http://arxiv.org/abs/2412.06287v2
818	Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study	Ehsan Shareghi, Jiuzhou Han, Paul Burgess	2024-12-09	arXiv	https://auslawbench.github.io	http://arxiv.org/abs/2412.06272v1
819	Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models	Xiao Xu, Tianhao Niu, Yuxi Xie, Libo Qin, Wanxiang Che, Min-Yen Kan	2024-12-08	arXiv	https://github.com/LooperXX/MMGiC	http://arxiv.org/abs/2412.05939v1
820	LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods	Haitao Li, Qian Dong, Junjie Chen, Huixue Su, Yujia Zhou, Qingyao Ai, Ziyi Ye, Yiqun Liu	2024-12-07	arXiv	https://github.com/CSHaitao/Awesome-LLMs-as-Judges	http://arxiv.org/abs/2412.05579v2
821	Towards Learning to Reason: Comparing LLMs with Neuro-Symbolic on Arithmetic Relations in Abstract Reasoning	Michael Hersche, Giacomo Camposampiero, Roger Wattenhofer, Abu Sebastian, Abbas Rahimi	2024-12-07	arXiv	https://github.com/IBM/raven-large-language-models	http://arxiv.org/abs/2412.05586v1
822	Training-Free Bayesianization for Low-Rank Adapters of Large Language Models	Haizhou Shi, Yibin Wang, Ligong Han, Huan Zhang, Hao Wang	2024-12-07	arXiv	https://github.com/Wang-ML-Lab/bayesian-peft	http://arxiv.org/abs/2412.05723v1
823	EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios	Lu Qiu, Yuying Ge, Yi Chen, Yixiao Ge, Ying Shan, Xihui Liu	2024-12-05	arXiv	https://qiulu66.github.io/egoplanbench2/	http://arxiv.org/abs/2412.04447v1
824	Reinforcement Learning Enhanced LLMs: A Survey	Shuhe Wang, Shengyu Zhang, Jie Zhang, Runyi Hu, Xiaoya Li, Tianwei Zhang, Jiwei Li, Fei Wu, Guoyin Wang, Eduard Hovy	2024-12-05	arXiv	https://github.com/ShuheWang1998/Reinforcement-Learning-Enhanced-LLMs-A-Survey	http://arxiv.org/abs/2412.10400v2
825	LossAgent: Towards Any Optimization Objectives for Image Processing with LLM Agents	Bingchen Li, Xin Li, Yiting Lu, Zhibo Chen	2024-12-05	arXiv	https://github.com/lbc12345/LossAgent	http://arxiv.org/abs/2412.04090v1
826	AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning	Yiwu Zhong, Zhuoming Liu, Yin Li, Liwei Wang	2024-12-04	arXiv	https://github.com/LaVi-Lab/AIM	http://arxiv.org/abs/2412.03248v1
827	Alignment at Pre-training! Towards Native Alignment for Arabic LLMs	Juhao Liang, Zhenyang Cai, Jianqing Zhu, Huang Huang, Kewei Zong, Bang An, Mosen Alharthi, Juncai He, Lian Zhang, Haizhou Li, Benyou Wang, Jinchao Xu	2024-12-04	arXiv	https://github.com/FreedomIntelligence/AceGPT-v2	http://arxiv.org/abs/2412.03253v1
828	Fine-Grained Behavior Simulation with Role-Playing Large Language Model on Social Media	Kun Li, Chenwei Dai, Wei Zhou, Songlin Hu	2024-12-04	arXiv	https://github.com/linkseed18612254945/FineRob	http://arxiv.org/abs/2412.03148v1
829	From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents	Xinyi Mou, Xuanwen Ding, Qi He, Liang Wang, Jingcong Liang, Xinnong Zhang, Libo Sun, Jiayu Lin, Jie Zhou, Xuanjing Huang, Zhongyu Wei	2024-12-04	arXiv	https://github.com/FudanDISC/SocialAgent	http://arxiv.org/abs/2412.03563v1
830	Improving Linguistic Diversity of Large Language Models with Possibility Exploration Fine-Tuning	Long Mai, Julie Carson-Berndsen	2024-12-04	arXiv	https://github.com/mailong25/peft_diversity	http://arxiv.org/abs/2412.03343v1
831	VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding	Chaoyu Li, Eun Woo Im, Pooyan Fazli	2024-12-04	arXiv	https://vid-halluc.github.io/	http://arxiv.org/abs/2412.03735v1
832	Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code	Timur Galimzyanov, Sergey Titov, Yaroslav Golubev, Egor Bogomolov	2024-12-03	arXiv	https://github.com/JetBrains-Research/PandasPlotBench	http://arxiv.org/abs/2412.02764v1
833	Unleashing GHOST: An LLM-Powered Framework for Automated Hardware Trojan Design	Md Omar Faruque, Peter Jamieson, Ahmad Patooghy, Abdel-Hameed A. Badawy	2024-12-03	arXiv	https://github.com/HSTRG1/GHOSTbenchmarks	http://arxiv.org/abs/2412.02816v1
834	CNNSum: Exploring Long-Context Summarization with Large Language Models in Chinese Novels	Lingxiao Wei, He Yan, Xiangju Lu, Junmin Zhu, Jun Wang, Wei Zhang	2024-12-03	arXiv	https://github.com/CxsGhost/CNNSum	http://arxiv.org/abs/2412.02819v4
835	AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?	Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue	2024-12-03	arXiv	https://av-odyssey.github.io/	http://arxiv.org/abs/2412.02611v1
836	DaDu-E: Rethinking the Role of Large Language Model in Robotic Computing Pipeline	Wenhao Sun, Sai Hou, Zixuan Wang, Bo Yu, Shaoshan Liu, Xu Yang, Shuai Liang, Yiming Gan, Yinhe Han	2024-12-02	arXiv	https://rlc-lab.github.io/dadu-e/	http://arxiv.org/abs/2412.01663v1
837	DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation	Jingyang Xiang, Sai Qian Zhang	2024-12-01	arXiv	https://github.com/JingyangXiang/DFRot	http://arxiv.org/abs/2412.00648v2
838	GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models	Kunsheng Tang, Wenbo Zhou, Jie Zhang, Aishan Liu, Gelei Deng, Shuai Li, Peigui Qi, Weiming Zhang, Tianwei Zhang, Nenghai Yu	2024-12	CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security	https://github.com/kstanghere/GenderCARE-ccs24	https://dl.acm.org/doi/10.1145/3658644.3670284
839	Mitigating Entity-Level Hallucination in Large Language Models	Weihang Su, Yichen Tang, Qingyao Ai, Changyue Wang, Zhijing Wu, Yiqun Liu	2024-12	SIGIR-AP 2024: Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region	https://github.com/oneal2000/EntityHallucination	https://dl.acm.org/doi/10.1145/3673791.3698403
840	Optimization-based Prompt Injection Attack to LLM-as-a-Judge	Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong	2024-12	CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security	https://github.com/ShiJiawenwen/JudgeDeceiver	https://dl.acm.org/doi/10.1145/3658644.3690291
841	PLeak: Prompt Leaking Attacks against Large Language Model Applications	Bo Hui, Haolin Yuan, Neil Zhenqiang Gong, Philippe Burlina, Yinzhi Cao	2024-12	CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security	https://github.com/BHui97/PLeak	https://dl.acm.org/doi/10.1145/3658644.3670370
842	AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models	Yutong Zhou, Masahiro Ryo	2024-11-30	arXiv	https://github.com/Yutong-Zhou-cv/AgriBench	http://arxiv.org/abs/2412.00465v2
843	Node Importance Estimation Leveraging LLMs for Semantic Augmentation in Knowledge Graphs	Xinyu Lin, Tianyu Zhang, Chengbin Hou, Jinbao Wang, Jianye Xue, Hairong Lv	2024-11-30	arXiv	https://github.com/XinyuLin-FZ/LENIE	http://arxiv.org/abs/2412.00478v1
844	DroidCall: A Dataset for LLM-powered Android Intent Invocation	Weikai Xie, Li Zhang, Shihe Wang, Rongjie Yi, Mengwei Xu	2024-11-30	arXiv	https://github.com/UbiquitousLearning/DroidCall	http://arxiv.org/abs/2412.00402v1
845	Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings	Qiong Wu, Wenhao Lin, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji	2024-11-29	arXiv	https://github.com/DoubtedSteam/DyVTE	http://arxiv.org/abs/2411.19628v1
846	Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models	Tian Yu, Shaolei Zhang, Yang Feng	2024-11-29	arXiv	https://github.com/ictnlp/Auto-RAG	http://arxiv.org/abs/2411.19443v1
847	Ensemble Watermarks for Large Language Models	Georg Niess, Roman Kern	2024-11-29	arXiv	http://github.com/CommodoreEU/master-generation	http://arxiv.org/abs/2411.19563v1
848	T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs	Shukang Yin, Chaoyou Fu, Sirui Zhao, Yunhang Shen, Chunjiang Ge, Yan Yang, Zuwei Long, Yuhan Dai, Tong Xu, Xing Sun, Ran He, Caifeng Shan, Enhong Chen	2024-11-29	arXiv	https://github.com/xjtupanda/T2Vid	http://arxiv.org/abs/2411.19951v2
849	TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension	Zipeng Qiu, You Peng, Guangxin He, Binhang Yuan, Chen Wang	2024-11-29	arXiv	https://github.com/Relaxed-System-Lab/TQA-Bench	http://arxiv.org/abs/2411.19504v1
850	Personalized Federated Fine-Tuning for LLMs via Data-Driven Heterogeneous Model Architectures	Yicheng Zhang, Zhen Qin, Zhaomin Wu, Shuiguang Deng	2024-11-28	arXiv	https://github.com/zyc140345/FedAMoLE	http://arxiv.org/abs/2411.19128v1
851	TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability	Shimin Chen, Xiaohan Lan, Yitian Yuan, Zequn Jie, Lin Ma	2024-11-27	arXiv	https://github.com/TimeMarker-LLM/TimeMarker/	http://arxiv.org/abs/2411.18211v1
852	ChatRex: Taming Multimodal LLM for Joint Perception and Understanding	Qing Jiang, Gen Luo, Yuqin Yang, Yuda Xiong, Yihao Chen, Zhaoyang Zeng, Tianhe Ren, Lei Zhang	2024-11-27	arXiv	https://github.com/IDEA-Research/ChatRex	http://arxiv.org/abs/2411.18363v2
853	Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models	Jingming Liu, Yumeng Li, Boyuan Xiao, Yichang Jian, Ziang Qin, Tianjia Shao, Yao-Xiang Ding, Kun Zhou	2024-11-27	arXiv	https://future-item.github.io/autoimagine-site	http://arxiv.org/abs/2411.18142v1
854	Can LLMs be Good Graph Judger for Knowledge Graph Construction?	Haoyu Huang, Chong Chen, Conghui He, Yang Li, Jiawei Jiang, Wentao Zhang	2024-11-26	arXiv	https://github.com/hhy-huang/GraphJudger	http://arxiv.org/abs/2411.17388v1
855	Leveraging Large Language Models and Topic Modeling for Toxicity Classification	Haniyeh Ehsani Oskouie, Christina Chance, Claire Huang, Margaret Capetz, Elizabeth Eyeson, Majid Sarrafzadeh	2024-11-26	arXiv	https://github.com/aheldis/Toxicity-Classification	http://arxiv.org/abs/2411.17876v1
856	Star Attention: Efficient LLM Inference over Long Sequences	Shantanu Acharya, Fei Jia, Boris Ginsburg	2024-11-26	arXiv	https://github.com/NVIDIA/Star-Attention	http://arxiv.org/abs/2411.17116v1
857	BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment	Shaolei Zhang, Kehao Zhang, Qingkai Fang, Shoutao Guo, Yan Zhou, Xiaodong Liu, Yang Feng	2024-11-25	arXiv	https://github.com/ictnlp/BayLing	https://doi.org/10.48550/arXiv.2411.16300
858	Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering	Federico Cocchi, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara	2024-11-25	arXiv	https://github.com/aimagelab/ReflectiVA	http://arxiv.org/abs/2411.16863v1
859	CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity	Zhengmin Yu, Jiutian Zeng, Siyi Chen, Wenhan Xu, Dandan Xu, Xiangyu Liu, Zonghao Ying, Nan Wang, Yuan Zhang, Min Yang	2024-11-25	arXiv	https://github.com/CS-EVAL/CS-Eval	http://arxiv.org/abs/2411.16239v2
860	Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models	Ronghuan Wu, Wanchao Su, Jing Liao	2024-11-25	arXiv	https://chat2svg.github.io/	http://arxiv.org/abs/2411.16602v1
861	Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision	Zhiheng Xi, Dingwen Yang, Jixuan Huang, Jiafu Tang, Guanyu Li, Yiwen Ding, Wei He, Boyang Hong, Shihan Do, Wenyu Zhan, Xiao Wang, Rui Zheng, Tao Ji, Xiaowei Shi, Yitao Zhai, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Zuxuan Wu, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Yu-Gang Jiang	2024-11-25	arXiv	https://mathcritique.github.io/	http://arxiv.org/abs/2411.16579v1
862	From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge	Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattacharjee, Yuxuan Jiang, Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng, Huan Liu	2024-11-25	arXiv	https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge	http://arxiv.org/abs/2411.16594v4
863	VidHal: Benchmarking Temporal Hallucinations in Vision LLMs	Wey Yeh Choong, Yangyang Guo, Mohan Kankanhalli	2024-11-25	arXiv	https://github.com/Lookuz/VidHal	http://arxiv.org/abs/2411.16771v1
864	ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration	Haozhan Shen, Kangjia Zhao, Tiancheng Zhao, Ruochen Xu, Zilun Zhang, Mingwei Zhu, Jianwei Yin	2024-11-25	arXiv	https://github.com/om-ai-lab/ZoomEye	http://arxiv.org/abs/2411.16044v1
865	Multi-label Sequential Sentence Classification via Large Language Model	Mengfei Lan, Lecheng Zheng, Shufan Ming, Halil Kilicoglu	2024-11-23	Findings of EMNLP 2024	https://github.com/ScienceNLP-Lab/LLM-SSC	https://aclanthology.org/2024.findings-emnlp.944
866	ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain	Haochen Zhao, Xiangru Tang, Ziran Yang, Xiao Han, Xuanzhi Feng, Yueqing Fan, Senhao Cheng, Di Jin, Yilun Zhao, Arman Cohan, Mark Gerstein	2024-11-23	arXiv	https://github.com/HaochenZhao/SafeAgent4Chem	http://arxiv.org/abs/2411.16736v1
867	Seed-Free Synthetic Data Generation Framework for Instruction-Tuning LLMs: A Case Study in Thai	Parinthapat Pengpun, Can Udomcharoenchaikit, Weerayut Buaphet, Peerat Limkonchotiwat	2024-11-23	arXiv	https://github.com/parinzee/seed-free-synthetic-instruct	http://arxiv.org/abs/2411.15484v1
868	MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs	Chaoyou Fu, Yi-Fan Zhang, Shukang Yin, Bo Li, Xinyu Fang, Sirui Zhao, Haodong Duan, Xing Sun, Ziwei Liu, Liang Wang, Caifeng Shan, Ran He	2024-11-22	arXiv	https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Benchmarks	http://arxiv.org/abs/2411.15296v2
869	DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization	Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu	2024-11-21	arXiv	https://github.com/hexuandeng/DRPruning	http://arxiv.org/abs/2411.14055v1
870	UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages	Bethel Melesse Tessema, Akhil Kedia, Tae-Sun Chung	2024-11-21	arXiv	https://github.com/bethelmelesse/unifiedcrawl	http://arxiv.org/abs/2411.14343v1
871	SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model	Christopher Nguyen, William Nguyen, Atsushi Suzuki, Daisuke Oku, Hong An Phan, Sang Dinh, Zooey Nguyen, Anh Ha, Shruti Raghavan, Huy Vo, Thang Nguyen, Lan Nguyen, Yoshikuni Hirayama	2024-11-21	arXiv	https://github.com/aitomatic/semikong	http://arxiv.org/abs/2411.13802v2
872	Disentangling Memory and Reasoning Ability in Large Language Models	Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang	2024-11-20	arXiv	https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning	http://arxiv.org/abs/2411.13504v2
873	DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving	Xianda Guo, Ruijun Zhang, Yiqun Duan, Yuhang He, Chenming Zhang, Shuai Liu, Long Chen	2024-11-20	arXiv	https://github.com/XiandaGuo/Drive-MLLM	http://arxiv.org/abs/2411.13112v2
874	On the Consistency of Video Large Language Models in Temporal Comprehension	Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang, Angela Yao	2024-11-20	arXiv	https://github.com/minjoong507/Consistency-of-Video-LLM	http://arxiv.org/abs/2411.12951v1
875	Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods	Jai Doshi, Asa Cooper Stickland	2024-11-18	arXiv	https://github.com/JaiDoshi/Knowledge-Erasure	http://arxiv.org/abs/2411.12103v2
876	FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training	Anjia Cao, Xing Wei, Zhiheng Ma	2024-11-18	arXiv	https://github.com/MIV-XJTU/FLAME	http://arxiv.org/abs/2411.11927v2
877	BianCang: A Traditional Chinese Medicine Large Language Model	Sibo Wei, Xueping Peng, Yi-fei Wang, Jiasheng Si, Weiyu Zhang, Wenpeng Lu, Xiaoming Wu, Yinglong Wang	2024-11-17	arXiv	https://github.com/QLU-NLP/BianCang	http://arxiv.org/abs/2411.11027v1
878	Multilingual Large Language Models: A Systematic Survey	Shaolin Zhu, Supryadi, Shaoyang Xu, Haoran Sun, Leiyu Pan, Menglong Cui, Jiangcun Du, Renren Jin, António Branco, Deyi Xiong	2024-11-17	arXiv	https://github.com/tjunlp-lab/Awesome-Multilingual-LLMs-Papers	http://arxiv.org/abs/2411.11072v2
879	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models	Tingyu Qu, Mingxiao Li, Tinne Tuytelaars, Marie-Francine Moens	2024-11-17	arXiv	https://github.com/tingyu215/TS-LLaVA	http://arxiv.org/abs/2411.11066v1
880	Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering	Zeping Yu, Sophia Ananiadou	2024-11-17	arXiv	https://github.com/zepingyu0512/llava-mechanism	http://arxiv.org/abs/2411.10950v1
881	Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model	Ting Liu, Liangtao Shi, Richang Hong, Yue Hu, Quanjun Yin, Linfeng Zhang	2024-11-16	arXiv	https://github.com/liuting20/MustDrop	http://arxiv.org/abs/2411.10803v1
882	Orca: Enhancing Role-Playing Abilities of Large Language Models by Integrating Personality Traits	Yuxuan Huang	2024-11-15	arXiv	https://github.com/Aipura/Orca	http://arxiv.org/abs/2411.10006v1
883	Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination	Haojie Zheng, Tianyang Xu, Hanchi Sun, Shu Pu, Ruoxi Chen, Lichao Sun	2024-11-15	arXiv	https://github.com/Terry-Xu-666/visual_inference_chain	http://arxiv.org/abs/2411.12591v1
884	Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash	Parsa Hejabi, Elnaz Rahmati, Alireza S. Ziabari, Preni Golazizian, Jesse Thomason, Morteza Dehghani	2024-11-15	arXiv	https://github.com/ParsaHejabi/Simulation-Framework-for-Multi-Agent-Balderdash	http://arxiv.org/abs/2411.10422v1
885	Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era	Thanh Tam Nguyen, Zhao Ren, Trinh Pham, Thanh Trung Huynh, Phi Le Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen	2024-11-15	arXiv	https://github.com/tamlhp/awesome-instruction-editing	http://arxiv.org/abs/2411.09955v2
886	MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs	Mengyuan Zhang, Ruihui Wang, Bo Xia, Yuan Sun, Xiaobing Zhao	2024-11-14	arXiv	https://github.com/joenahm/MM-Eval	http://arxiv.org/abs/2411.09492v1
887	LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation	Zhenshi Li, Dilxat Muhtar, Feng Gu, Xueliang Zhang, Pengfeng Xiao, Guangjun He, Xiaoxiang Zhu	2024-11-14	arXiv	https://github.com/NJU-LHRS/LHRS-Bot	https://doi.org/10.48550/arXiv.2411.09301
888	DROJ: A Prompt-Driven Attack against Large Language Models	Leyang Hu, Boran Wang	2024-11-14	arXiv	https://github.com/Leon-Leyang/LLM-Safeguard	http://arxiv.org/abs/2411.09125v1
889	DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models	Yongdong Wang, Runze Xiao, Jun Younes Louhi Kasahara, Ryosuke Yajima, Keiji Nagatani, Atsushi Yamashita, Hajime Asama	2024-11-13	arXiv	https://wyd0817.github.io/project-dart-llm/	http://arxiv.org/abs/2411.09022v1
890	CorrectBench: Automatic Testbench Generation with Functional Self-Correction using LLMs for HDL Design	Ruidi Qiu, Grace Li Zhang, Rolf Drechsler, Ulf Schlichtmann, Bing Li	2024-11-13	arXiv	https://github.com/AutoBench/CorrectBench	http://arxiv.org/abs/2411.08510v1
891	Large Language Models Can Self-Improve in Long-context Reasoning	Siheng Li, Cheng Yang, Zesen Cheng, Lemao Liu, Mo Yu, Yujiu Yang, Wai Lam	2024-11-12	arXiv	https://github.com/SihengLi99/SEALONG	http://arxiv.org/abs/2411.08147v1
892	Verbosity ≠ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models	Yusen Zhang, Sarkar Snigdha Sarathi Das, Rui Zhang	2024-11-12	arXiv	https://github.com/psunlpgroup/VerbosityLLM	http://arxiv.org/abs/2411.07858v2
893	ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?	Canyu Chen, Jian Yu, Shan Chen, Che Liu, Zhongwei Wan, Danielle Bitterman, Fei Wang, Kai Shu	2024-11-10	arXiv	https://clinicalbench.github.io	http://arxiv.org/abs/2411.06469v1
894	Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models	Xiaojun Wu, Junxi Liu, Huanyi Su, Zhouchi Lin, Yiyan Qi, Chengjin Xu, Jiajun Su, Jiajie Zhong, Fuwei Wang, Saizhuo Wang, Fengrui Hua, Jia Li, Jian Guo	2024-11-09	arXiv	https://github.com/IDEA-FinAI/Golden-Touchstone	http://arxiv.org/abs/2411.06272v1
895	TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering	Yungeng Liu, Zan Chen, Yu Guang Wang, Yiqing Shen	2024-11-09	arXiv	https://github.com/tsynbio/Toursynbio-Search	http://arxiv.org/abs/2411.06024v1
896	Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation	Dong Shu, Bingbing Duan, Kai Guo, Kaixiong Zhou, Jiliang Tang, Mengnan Du	2024-11-08	arXiv	https://github.com/Tizzzzy/LLM-GDM-alignment	http://arxiv.org/abs/2411.05316v1
897	Game-theoretic LLM: Agent Workflow for Negotiation Games	Wenyue Hua, Ollie Liu, Lingyao Li, Alfonso Amayuelas, Julie Chen, Lucas Jiang, Mingyu Jin, Lizhou Fan, Fei Sun, William Wang, Xintong Wang, Yongfeng Zhang	2024-11-08	arXiv	https://github.com/Wenyueh/game_theory	http://arxiv.org/abs/2411.05990v2
898	WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models	Shengda Fan, Xin Cong, Yuepeng Fu, Zhong Zhang, Shuyan Zhang, Yuanwei Liu, Yesai Wu, Yankai Lin, Zhiyuan Liu, Maosong Sun	2024-11-08	arXiv	https://github.com/OpenBMB/WorkflowLLM	http://arxiv.org/abs/2411.05451v1
899	FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs?	Eric Wu, Kevin Wu, James Zou	2024-11-07	arXiv	https://github.com/kevinwu23/StanfordFineTuneBench	http://arxiv.org/abs/2411.05059v2
900	Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model	Young-Jun Lee, Dokyong Lee, Junyoung Youn, Kyeongjin Oh, Ho-Jin Choi	2024-11-07	arXiv	https://github.com/passing2961/Thanos	http://arxiv.org/abs/2411.04496v1
901	Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation	Ayan Sengupta, Vaibhav Seth, Arinjay Pathak, Natraj Raman, Sriram Gopalakrishnan, Tanmoy Chakraborty	2024-11-07	arXiv	https://github.com/LCS2-IIITD/MonteCLoRA	http://arxiv.org/abs/2411.04358v2
902	AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering	Yungeng Liu, Zan Chen, Yu Guang Wang, Yiqing Shen	2024-11-07	arXiv	https://github.com/tsynbio/AutoPE	http://arxiv.org/abs/2411.04440v1
903	Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities	Shengzhi Li, Kittipat Kampa, Rongyu Lin, Bohang Li, Shichao Pei	2024-11-07	arXiv	https://github.com/findalexli/Abstract2Appendix	http://arxiv.org/abs/2411.05232v1
904	QUILL: Quotation Generation Enhancement of Large Language Models	Jin Xiao, Bowei Zhang, Qianyu He, Jiaqing Liang, Feng Wei, Jinglei Chen, Zujie Liang, Deqing Yang, Yanghua Xiao	2024-11-06	arXiv	https://github.com/GraceXiaoo/QUILL	http://arxiv.org/abs/2411.03675v1
905	Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy	Razvan-Gabriel Dumitru, Paul-Ioan Clotan, Vikas Yadav, Darius Peteleaza, Mihai Surdeanu	2024-11-05	arXiv	https://github.com/RazvanDu/DynamicSlicing	http://arxiv.org/abs/2411.03513v1
906	Leveraging Large Language Models in Code Question Answering: Baselines and Issues	Georgy Andryushchenko, Vladimir Ivanov, Vladimir Makharev, Elizaveta Tukhtina, Aidar Valeev	2024-11-05	arXiv	https://github.com/IU-AES-AI4Code/CodeQuestionAnswering	http://arxiv.org/abs/2411.03012v1
907	SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents	Dawei Li, Zhen Tan, Peijia Qian, Yifan Li, Kumar Satvik Chaudhary, Lijie Hu, Jiayi Shen	2024-11-05	arXiv	https://github.com/David-Li0406/SMoA	http://arxiv.org/abs/2411.03284v1
908	Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment	Jason Vega, Junsheng Huang, Gaokai Zhang, Hangoo Kang, Minjia Zhang, Gagandeep Singh	2024-11-05	arXiv	https://github.com/uiuc-focal-lab/stochastic-monkeys/	http://arxiv.org/abs/2411.02785v2
909	Culinary Class Wars: Evaluating LLMs using ASH in Cuisine Transfer Task	Hoonick Lee, Mogan Gim, Donghyeon Park, Donghee Choi, Jaewoo Kang	2024-11-04	arXiv	http://github.com/dmis-lab/CulinaryASH	http://arxiv.org/abs/2411.01996v1
910	Eurekaverse: Environment Curriculum Generation via Large Language Models	William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Dinesh Jayaraman, Yecheng Jason Ma	2024-11-04	arXiv	https://eureka-research.github.io/eurekaverse	http://arxiv.org/abs/2411.01775v1
911	SQL Injection Jailbreak: a structural disaster of large language models	Jiawei Zhao, Kejiang Chen, Weiming Zhang, Nenghai Yu	2024-11-03	arXiv	https://github.com/weiyezhimeng/SQL-Injection-Jailbreak	http://arxiv.org/abs/2411.01565v3
912	Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis	Shijia Liao, Yuxuan Wang, Tianyu Li, Yifan Cheng, Ruoyi Zhang, Rongzhi Zhou, Yijin Xing	2024-11-02	arXiv	https://github.com/fishaudio/fish-speech	http://arxiv.org/abs/2411.01156v2
913	Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection	Han Yin, Yang Xiao, Jisheng Bai, Rohan Kumar Das	2024-11-02	arXiv	https://github.com/apple-yinhan/Noise-robust-SED	http://arxiv.org/abs/2411.01174v1
914	TODO: Enhancing LLM Alignment with Ternary Preferences	Yuxiang Guo, Lu Yin, Bo Jiang, Jiaqi Zhang	2024-11-02	arXiv	https://github.com/XXares/TODO	http://arxiv.org/abs/2411.02442v1
915	LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models	Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham	2024-11-01	arXiv	https://fsoft-aic.github.io/fsoft-LibMoE.github.io	http://arxiv.org/abs/2411.00918v1
916	Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling	Yiwen Ding, Zhiheng Xi, Wei He, Zhuoyuan Li, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang	2024-11-01	arXiv	https://github.com/Yiwen-Ding/Guided-Self-Improvement	http://arxiv.org/abs/2411.00750v1
917	SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models	Jianyi Zhang, Da-Cheng Juan, Cyrus Rashtchian, Chun-Sung Ferng, Heinrich Jiang, Yiran Chen	2024-11-01	arXiv	https://jayzhang42.github.io/sled_page/	http://arxiv.org/abs/2411.02433v2
918	Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM	Xiong Wang, Yangze Li, Chaoyou Fu, Yunhang Shen, Lei Xie, Ke Li, Xing Sun, Long Ma	2024-11-01	arXiv	https://freeze-omni.github.io/	http://arxiv.org/abs/2411.00774v5
919	Beyond Utility: Evaluating LLM as Recommender	Chumeng Jiang, Jiayin Wang, Weizhi Ma, Charles L. A. Clarke, Shuai Wang, Chuhan Wu, Min Zhang	2024-11-01	arXiv	https://github.com/JiangDeccc/EvaLLMasRecommender	http://arxiv.org/abs/2411.00331v1
920	MoD: A Distribution-Based Approach for Merging Large Language Models	Quy-Anh Dang, Chris Ngo	2024-11-01	arXiv	https://github.com/knovel-eng/mod	http://arxiv.org/abs/2411.00406v1
921	EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Unified Compression and Adaptive Layer Voting	Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reddy Bommu, Yang Katie Zhao, Yingyan Celine Lin	2024-11	DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference	https://github.com/GATECH-EIC/Edge-LLM	https://dl.acm.org/doi/10.1145/3649329.3658473
922	Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning	Hongwei Jin, George Papadimitriou, Krishnan Raghavan, Pawel Zuk, Prasanna Balaprakash, Cong Wang, Anirban Mandal, Ewa Deelman	2024-11	SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis	https://github.com/PoSeiDon-Workflows/LLM_AD	https://dl.acm.org/doi/10.1109/SC41406.2024.00098
923	Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging	Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, Jinyuan Liu, Yichen Gong, Qi Li, Anyu Wang, Xiaoyun Wang	2024-11	LAMPS '24: Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis	https://github.com/ThuCCSLab/MergeGuard	https://dl.acm.org/doi/10.1145/3689217.3690614
924	BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments	Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu	2024-10-31	arXiv	https://github.com/xinghaow99/BitStack	http://arxiv.org/abs/2410.23918v1
925	DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios	Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xinyi Yang, Yulin Yuan, Lidia S. Chao	2024-10-31	arXiv	https://github.com/NLP2CT/DetectRL	http://arxiv.org/abs/2410.23746v1
926	End-to-End Ontology Learning with Large Language Models	Andy Lo, Albert Q. Jiang, Wenda Li, Mateja Jamnik	2024-10-31	arXiv	https://github.com/andylolu2/ollm	http://arxiv.org/abs/2410.23584v1
927	LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction	Andre Niyongabo Rubungo, Kangming Li, Jason Hattrick-Simpers, Adji Bousso Dieng	2024-10-31	arXiv	https://github.com/vertaix/LLM4Mat-Bench	http://arxiv.org/abs/2411.00177v3
928	LLaMo: Large Language Model-based Molecular Graph Assistant	Jinyoung Park, Minseong Bae, Dohwan Ko, Hyunwoo J. Kim	2024-10-31	arXiv	https://github.com/mlvlab/LLaMo	http://arxiv.org/abs/2411.00871v1
929	What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective	Ming Li, Yanhong Li, Tianyi Zhou	2024-10-31	arXiv	https://github.com/MingLiiii/Layer_Gradient	http://arxiv.org/abs/2410.23743v1
930	On Memorization of Large Language Models in Logical Reasoning	Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar	2024-10-30	arXiv	https://memkklogic.github.io	http://arxiv.org/abs/2410.23123v1
931	ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning	Millennium Bismay, Xiangjue Dong, James Caverlee	2024-10-30	arXiv	https://github.com/millenniumbismay/reasoningrec	http://arxiv.org/abs/2410.23180v1
932	Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback	Qinqing Zheng, Mikael Henaff, Amy Zhang, Aditya Grover, Brandon Amos	2024-10-30	arXiv	https://github.com/facebookresearch/oni	http://arxiv.org/abs/2410.23022v2
933	SciPIP: An LLM-based Scientific Paper Idea Proposer	Wenxiao Wang, Lihui Gu, Liye Zhang, Yunxiang Luo, Yi Dai, Chen Shen, Liang Xie, Binbin Lin, Xiaofei He, Jieping Ye	2024-10-30	arXiv	https://github.com/cheerss/SciPIP	http://arxiv.org/abs/2410.23166v1
934	Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning	Dong Shu, Mengnan Du	2024-10-30	arXiv	https://github.com/Tizzzzy/Demonstration_Selection_Overview	http://arxiv.org/abs/2410.23099v1
935	BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference	Junqi Zhao, Zhijin Fang, Shu Li, Shaohui Yang, Shichao He	2024-10-30	arXiv	https://github.com/JunqiZhao888/buzz-llm	http://arxiv.org/abs/2410.23079v1
936	Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning	Keqin Bao, Ming Yan, Yang Zhang, Jizhi Zhang, Wenjie Wang, Fuli Feng, Xiangnan He	2024-10-30	arXiv	https://github.com/ym689/rec_icl	http://arxiv.org/abs/2410.23136v1
937	Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation	Yang Zhang, Juntao You, Yimeng Bai, Jizhi Zhang, Keqin Bao, Wenjie Wang, Tat-Seng Chua	2024-10-30	arXiv	https://github.com/itsmeyjt/CFT	http://arxiv.org/abs/2410.22809v1
938	Distinguishing Ignorance from Error in LLM Hallucinations	Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov	2024-10-29	arXiv	https://github.com/technion-cs-nlp/hallucination-mitigation	http://arxiv.org/abs/2410.22071v1
939	Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach	Qingchuan Li, Jiatong Li, Tongxuan Liu, Yuting Zeng, Mingyue Cheng, Weizhe Huang, Qi Liu	2024-10-29	arXiv	https://github.com/wufeiwuwoshihua/nshy	http://arxiv.org/abs/2410.21779v1
940	Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance	Dongmin Park, Sebin Kim, Taehong Moon, Minkyu Kim, Kangwook Lee, Jaewoong Cho	2024-10-29	arXiv	https://github.com/krafton-ai/Rare2Frequent	http://arxiv.org/abs/2410.22376v1
941	Scaling LLM Inference with Optimized Sample Compute Allocation	Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li	2024-10-29	arXiv	https://github.com/LeiLiLab/OSCA	http://arxiv.org/abs/2410.22480v1
942	Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks	Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese	2024-10-28	arXiv	https://github.com/pasquini-dario/project_mantis	http://arxiv.org/abs/2410.20911v2
943	LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment	Ge Yang, Changyi He, Jinyang Guo, Jianyu Wu, Yifu Ding, Aishan Liu, Haotong Qin, Pengliang Ji, Xianglong Liu	2024-10-28	arXiv	https://github.com/AboveParadise/LLMCBench	http://arxiv.org/abs/2410.21352v2
944	NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates	Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu	2024-10-28	arXiv	https://github.com/hexuandeng/NewTerm	http://arxiv.org/abs/2410.20814v1
945	ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference	Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen	2024-10-28	arXiv	https://github.com/bytedance/ShadowKV	http://arxiv.org/abs/2410.21465v1
946	Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models	Yilun Jin, Zheng Li, Chenwei Zhang, Tianyu Cao, Yifan Gao, Pratik Jayarao, Mao Li, Xin Liu, Ritesh Sarkhel, Xianfeng Tang, Haodong Wang, Zhengyang Wang, Wenju Xu, Jingfeng Yang, Qingyu Yin, Xian Li, Priyanka Nigam, Yi Xu, Kai Chen, Qiang Yang, Meng Jiang, Bing Yin	2024-10-28	arXiv	https://github.com/KL4805/ShoppingMMLU	http://arxiv.org/abs/2410.20745v2
947	SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization	Wanhua Li, Zibin Meng, Jiawei Zhou, Donglai Wei, Chuang Gan, Hanspeter Pfister	2024-10-28	arXiv	https://mengzibin.github.io/SocialGPT.github.io/	http://arxiv.org/abs/2410.21411v1
948	Instruction-Tuned LLMs Succeed in Document-Level MT Without Fine-Tuning -- But BLEU Turns a Blind Eye	Yirong Sun, Dawei Zhu, Yanjun Chen, Erjia Xiao, Xinghao Chen, Xiaoyu Shen	2024-10-28	arXiv	https://github.com/EIT-NLP/BLEUless_DocMT	http://arxiv.org/abs/2410.20941v2
949	Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data	Xinhong Xie, Tao Li, Quanyan Zhu	2024-10-27	arXiv	https://github.com/XXXinhong/Detoxification_LLM	http://arxiv.org/abs/2410.20298v1
950	Enhancing Inflation Nowcasting with LLM: Sentiment Analysis on News	Marc-Antoine Allard, Paul Teiletche, Adam Zinebi	2024-10-26	arXiv	https://github.com/paultltc/InflaBERT	http://arxiv.org/abs/2410.20198v1
951	LLMs Can Evolve Continually on Modality for X-Modal Reasoning	Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen	2024-10-26	arXiv	https://github.com/JiazuoYu/PathWeave	http://arxiv.org/abs/2410.20178v2
952	Language Agents Meet Causality -- Bridging LLMs and Causal World Models	John Gkountouras, Matthias Lindemann, Phillip Lippe, Efstratios Gavves, Ivan Titov	2024-10-25	arXiv	https://j0hngou.github.io/LLMCWM/	http://arxiv.org/abs/2410.19923v1
953	APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs	Huaxiaoyue Wang, Nathaniel Chin, Gonzalo Gonzalez-Pumariega, Xiangwan Sun, Neha Sunkara, Maximus Adrian Pace, Jeannette Bohg, Sanjiban Choudhury	2024-10-25	arXiv	https://portal-cornell.github.io/apricot/	http://arxiv.org/abs/2410.19656v1
954	Delving into the Reversal Curse: How Far Can Large Language Models Generalize?	Zhengkai Lin, Zhihang Fu, Kai Liu, Liang Xie, Binbin Lin, Wenxiao Wang, Deng Cai, Yue Wu, Jieping Ye	2024-10-24	arXiv	https://github.com/alibaba/thinking_bias	http://arxiv.org/abs/2410.18808v2
955	GCoder: Improving Large Language Model for Generalized Graph Problem Solving	Qifan Zhang, Xiaobin Hong, Jianheng Tang, Nuo Chen, Yuhan Li, Wenzhong Li, Jing Tang, Jia Li	2024-10-24	arXiv	https://github.com/Bklight999/WWW25-GCoder/tree/master	http://arxiv.org/abs/2410.19084v1
956	Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design	Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang	2024-10-24	arXiv	https://github.com/VITA-Group/READ-ME	http://arxiv.org/abs/2410.19123v1
957	Distill Visual Chart Reasoning Ability from LLMs to MLLMs	Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang	2024-10-24	arXiv	https://github.com/hewei2001/ReachQA	http://arxiv.org/abs/2410.18798v1
958	CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation	Qinsi Wang, Saeed Vahidian, Hancheng Ye, Jianyang Gu, Jianyi Zhang, Yiran Chen	2024-10-23	arXiv	https://wangqinsi1.github.io/coreinfer_page/	http://arxiv.org/abs/2410.18311v1
959	Cross-model Control: Improving Multiple Large Language Models in One-time Training	Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao	2024-10-23	arXiv	https://github.com/wujwyi/CMC	http://arxiv.org/abs/2410.17599v1
960	ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage	Taewhoo Lee, Chanwoong Yoon, Kyochul Jang, Donghyeon Lee, Minju Song, Hyunjae Kim, Jaewoo Kang	2024-10-22	arXiv	https://github.com/dmis-lab/ETHIC	http://arxiv.org/abs/2410.16848v1
961	VoiceBench: Benchmarking LLM-Based Voice Assistants	Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li	2024-10-22	arXiv	https://github.com/MatthewCYM/VoiceBench	http://arxiv.org/abs/2410.17196v3
962	Improving Causal Reasoning in Large Language Models: A Survey	Longxuan Yu, Delin Chen, Siheng Xiong, Qingyang Wu, Qingzhen Liu, Dawei Li, Zhikai Chen, Xiaoze Liu, Liangming Pan	2024-10-22	arXiv	https://github.com/chendl02/Awesome-LLM-causal-reasoning	http://arxiv.org/abs/2410.16676v3
963	Automated Spinal MRI Labelling from Reports Using a Large Language Model	Robin Y. Park, Rhydian Windsor, Amir Jamaludin, Andrew Zisserman	2024-10-22	MICCAI	https://github.com/robinyjpark/AutoLabelClassifier	https://doi.org/10.1007/978-3-031-72086-4_10
964	DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models	Chen Qian, Dongrui Liu, Jie Zhang, Yong Liu, Jing Shao	2024-10-22	arXiv	https://github.com/ChnQ/DEAN	http://arxiv.org/abs/2410.16672v1
965	AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration	Bradley McDanel	2024-10-22	arXiv	https://github.com/BradMcDanel/AMUSD/	http://arxiv.org/abs/2410.17375v1
966	CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing	Chen Yang, Chenyang Zhao, Quanquan Gu, Dongruo Zhou	2024-10-22	arXiv	https://github.com/uclaml/COPS	http://arxiv.org/abs/2410.16670v1
967	Boosting Jailbreak Transferability for Large Language Models	Hanqing Liu, Lifeng Zhou, Huanqian Yan	2024-10-21	arXiv	https://github.com/HqingLiu/SI-GCG	http://arxiv.org/abs/2410.15645v2
968	Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report	Ayman Asad Khan, Md Toufique Hasan, Kai Kristian Kemell, Jussi Rasku, Pekka Abrahamsson	2024-10-21	arXiv	https://github.com/GPT-Laboratory/RAG-LLM-Development-Guidebook-from-PDFs	http://arxiv.org/abs/2410.15944v1
969	LLaVA-KD: A Framework of Distilling Multimodal Large Language Models	Yuxuan Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Xiang Bai	2024-10-21	arXiv	https://github.com/Fantasyele/LLaVA-KD	http://arxiv.org/abs/2410.16236v2
970	MagicPIG: LSH Sampling for Efficient LLM Generation	Zhuoming Chen, Ranajoy Sadhukhan, Zihao Ye, Yang Zhou, Jianyu Zhang, Niklas Nolte, Yuandong Tian, Matthijs Douze, Leon Bottou, Zhihao Jia, Beidi Chen	2024-10-21	arXiv	https://github.com/Infini-AI-Lab/MagicPIG	http://arxiv.org/abs/2410.16179v4
971	Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs	Xin Ma, Yang Liu, Jingjing Liu, Xiaoxu Ma	2024-10-21	arXiv	https://github.com/soacker/Mesa-Extrapolation	http://arxiv.org/abs/2410.15859v3
972	RAC: Efficient LLM Factuality Correction with Retrieval Augmentation	Changmao Li, Jeffrey Flanigan	2024-10-21	arXiv	https://github.com/jlab-nlp/Retrieval-Augmented-Correction	http://arxiv.org/abs/2410.15667v1
973	CausalGraph2LLM: Evaluating LLMs for Causal Queries	Ivaxi Sheth, Bahare Fatemi, Mario Fritz	2024-10-21	arXiv	https://github.com/ivaxi0s/CausalGraph2LLM	http://arxiv.org/abs/2410.15939v1
974	A Comprehensive Evaluation of Cognitive Biases in LLMs	Simon Malberg, Roman Poletukhin, Carolin M. Schuster, Georg Groh	2024-10-20	arXiv	https://github.com/simonmalberg/cognitive-biases-in-llms	http://arxiv.org/abs/2410.15413v1
975	Are LLMs Good Zero-Shot Fallacy Classifiers?	Fengjun Pan, Xiaobao Wu, Zongrui Li, Anh Tuan Luu	2024-10-19	arXiv	https://github.com/panFJCharlotte98/Fallacy_Detection	http://arxiv.org/abs/2410.15050v1
976	Evaluating Deep Unlearning in Large Language Models	Ruihan Wu, Chhavi Yadav, Russ Salakhutdinov, Kamalika Chaudhuri	2024-10-19	arXiv	https://github.com/wrh14/deep_unlearning	http://arxiv.org/abs/2410.15153v3
977	Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction	Yinhan He, Zaiyi Zheng, Patrick Soga, Yaozhen Zhu, Yushun Dong, Jundong Li	2024-10-19	EMNLP 2024 (Findings)	https://github.com/YinhanHe123/new_LLM4GNNExplanation	http://arxiv.org/abs/2410.15165v1
978	GlitchMiner: Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization	Zihui Wu, Haichang Gao, Ping Wang, Shudong Zhang, Zhaoxiang Liu, Shiguo Lian	2024-10-19	arXiv	https://github.com/wooozihui/GlitchMiner	http://arxiv.org/abs/2410.15052v4
979	Imprompter: Tricking LLM Agents into Improper Tool Use	Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K. Gupta, Taylor Berg-Kirkpatrick, Earlence Fernandes	2024-10-19	arXiv	https://github.com/Reapor-Yurnero/imprompter	http://arxiv.org/abs/2410.14923v2
980	MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification	Yin Li, Liangwei Wang, Shiyuan Piao, Boo-Ho Yang, Ziyue Li, Wei Zeng, Fugee Tsung	2024-10-19	arXiv	https://github.com/MCCodeAI/MCCoder	http://arxiv.org/abs/2410.15154v1
981	SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent	Jiarui Ji, Yang Li, Hongtao Liu, Zhicheng Du, Zhewei Wei, Weiran Shen, Qi Qi, Yankai Lin	2024-10-18	arXiv	https://github.com/jijiarui-cather/SRAPAgent_Framework	http://arxiv.org/abs/2410.14152v1
982	Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation	Shuo Tang, Xianghe Pang, Zexi Liu, Bohan Tang, Rui Ye, Xiaowen Dong, Yanfeng Wang, Siheng Chen	2024-10-18	arXiv	https://github.com/ShuoTang123/MATRIX-Gen	http://arxiv.org/abs/2410.14251v1
983	CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic	Huaiyuan Yao, Longchao Da, Vishnu Nandam, Justin Turnau, Zhiwei Liu, Linsey Pang, Hua Wei	2024-10-18	arXiv	https://github.com/Hyan-Yao/CoMAL	http://arxiv.org/abs/2410.14368v1
984	Enabling Scalable Evaluation of Bias Patterns in Medical LLMs	Hamed Fayyaz, Raphael Poulain, Rahmatollah Beheshti	2024-10-18	arXiv	https://github.com/healthylaife/autofair	http://arxiv.org/abs/2410.14763v1
985	Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models	Wei Jie Yeo, Ranjan Satapathy, Erik Cambria	2024-10-18	arXiv	https://github.com/wj210/Causal-Faithfulness	https://doi.org/10.48550/arXiv.2410.14155
986	Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models	Yu Yuan, Lili Zhao, Kai Zhang, Guangting Zheng, Qi Liu	2024-10-17	EMNLP	https://github.com/yyhappier/ShortcutSuite	https://aclanthology.org/2024.emnlp-main.679
987	Data Defenses Against Large Language Models	William Agnew, Harry H. Jiang, Cella Sum, Maarten Sap, Sauvik Das	2024-10-17	arXiv	https://github.com/wagnew3/LLMDataDefenses	http://arxiv.org/abs/2410.13138v1
988	FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs	Forrest Sheng Bao, Miaoran Li, Renyi Qu, Ge Luo, Erana Wan, Yujia Tang, Weisi Fan, Manveer Singh Tamber, Suleman Kazi, Vivek Sourabh, Mike Qi, Ruixuan Tu, Chenyu Xu, Matthew Gonzales, Ofer Mendelevitch, Amin Ahmad	2024-10-17	arXiv	https://github.com/vectara/FaithBench	http://arxiv.org/abs/2410.13210v1
989	LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models	David Hoffmann, Kailash Budhathoki, Matthaeus Kleindessner	2024-10-17	arXiv	https://github.com/amazon-science/llm-rank-pruning	http://arxiv.org/abs/2410.13299v2
990	Retrieval-Augmented Personalization for Multimodal Large Language Models	Haoran Hao, Jiaming Han, Changsheng Li, Yu-Feng Li, Xiangyu Yue	2024-10-17	arXiv	https://github.com/Hoar012/RAP-MLLM	http://arxiv.org/abs/2410.13360v2
991	SLM-Mod: Small Language Models Surpass LLMs at Content Moderation	Xianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, Koustuv Saha	2024-10-17	arXiv	https://github.com/AGoyal0512/SLM-Mod	http://arxiv.org/abs/2410.13155v1
992	aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Completion	Siyuan Jiang, Jia Li, He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Yu Han, Wei Ning, Gen Wang, Yihong Dong, Kechi Zhang, Ge Li	2024-10-17	arXiv	https://github.com/aixcoder-plugin/aiXcoder-7B	http://arxiv.org/abs/2410.13187v2
993	Hypothesis Testing the Circuit Hypothesis in LLMs	Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei	2024-10-16	arXiv	https://github.com/blei-lab/circuitry	http://arxiv.org/abs/2410.13032v1
994	Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors	Weixuan Wang, Jingyuan Yang, Wei Peng	2024-10-16	arXiv	https://github.com/weixuan-wang123/SADI	http://arxiv.org/abs/2410.12299v1
995	Self-Pluralising Culture Alignment for Large Language Models	Shaoyang Xu, Yongqi Leng, Linhao Yu, Deyi Xiong	2024-10-16	arXiv	https://github.com/shaoyangxu/CultureSPA	http://arxiv.org/abs/2410.12971v1
996	Qtok: A Comprehensive Framework for Evaluating Multilingual Tokenizer Quality in Large Language Models	Iaroslav Chelombitko, Egor Safronov, Aleksey Komissarov	2024-10-16	arXiv	https://github.com/nup-csai/Qtok/	http://arxiv.org/abs/2410.12989v1
997	ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs	Jingming Zhuo, Songyang Zhang, Xinyu Fang, Haodong Duan, Dahua Lin, Kai Chen	2024-10-16	arXiv	https://github.com/open-compass/ProSA	http://arxiv.org/abs/2410.12405v1
998	POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization	Batuhan K. Karaman, Ishmam Zabir, Alon Benhaim, Vishrav Chaudhary, Mert R. Sabuncu, Xia Song	2024-10-16	arXiv	https://github.com/batuhankmkaraman/POROver	http://arxiv.org/abs/2410.12999v1
999	DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs	Yingsong Luo, Ling Chen	2024-10-16	arXiv	https://github.com/LuoYingSong/DAQ	http://arxiv.org/abs/2410.12187v2
1000	Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention	Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch	2024-10-16	arXiv	https://github.com/weixuan-wang123/INCLINE	https://doi.org/10.48550/arXiv.2410.12462
1001	Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights	Rahul Krishna, Rangeet Pan, Raju Pavuluri, Srikanth Tamilselvam, Maja Vukovic, Saurabh Sinha	2024-10-16	arXiv	https://github.com/IBM/codellm-devkit	http://arxiv.org/abs/2410.13007v1
1002	Exploring Model Kinship for Merging Large Language Models	Yedi Hu, Yunzhi Yao, Ningyu Zhang, Shumin Deng, Huajun Chen	2024-10-16	arXiv	https://github.com/zjunlp/ModelKinship	https://doi.org/10.48550/arXiv.2410.12613
