This GitHub repository contains an updated list of Large Language Model papers as of May 19, 2025.
- The resources are collected from various sources, including arXiv, NeurIPS, ICML, ICLR, ACL, EMNLP, AAAI, IJCAI, KDD, CVPR, ICCV, ECCV, IEEE, ACM, Springer, ScienceDirect, Wiley, Nature, Science, and other top AI/ML conferences and journals.
- For a better reading experience, visit the Shinyapps website.
Explore additional research papers on the following topics:
- For Large Language Models papers, please visit the LLM Repository.
- For Backdoor Learning papers, please visit the Backdoor Learning Repository.
- For Federated Learning papers, please visit the Federated Learning Repository.
- For Machine Unlearning papers, please visit the Machine Unlearning Repository.
For contributions, inquiries, or suggestions, feel free to reach out via email.
If you find this application helpful and would like to support its development, you can buy me a coffee using one of the following methods:
- Techcombank (Vietnam): 5877 5555 55 (Nguyen Thi Lan Phuong)
- PayPal or Credit/Debit Card: https://ko-fi.com/miutheladycat
Due to GitHub repository limitations, this section includes only papers that provide accompanying code, sorted by publication date. For the full list of papers, please visit the Shinyapps website.
No. | Title | Authors | Publish Date | Venue | Code | URL |
---|---|---|---|---|---|---|
1 | EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models | Bohao Xing, Xin Liu, Guoying Zhao, Chengyu Liu, Xiaolan Fu, Heikki Kälviäinen | 2025-05-16 | arXiv | https://github.com/xxtars/EmotionHallucer | http://arxiv.org/abs/2505.11405v1 |
2 | Ranked Voting based Self-Consistency of Large Language Models | Weiqin Wang, Yile Wang, Hui Huang | 2025-05-16 | arXiv | https://github.com/szu-tera/RankedVotingSC | http://arxiv.org/abs/2505.10772v1 |
3 | Unifying Segment Anything in Microscopy with Multimodal Large Language Model | Manyu Li, Ruian He, Zixian Zhang, Weimin Tan, Bo Yan | 2025-05-16 | arXiv | https://github.com/ieellee/uLLSAM | http://arxiv.org/abs/2505.10769v1 |
4 | GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art | Chenkai Zhang, Yiming Lei, Zeming Liu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Yunhong Wang | 2025-05-16 | arXiv | https://github.com/stan-lei/GODBench-ACL2025 | http://arxiv.org/abs/2505.11436v1 |
5 | GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction | Mohammadtaha Bagherifard, Sahar Rajabi, Ali Edalat, Yadollah Yaghoobzadeh | 2025-05-16 | arXiv | https://github.com/saharsamr/Modular-LLM | http://arxiv.org/abs/2505.10939v1 |
6 | AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents | Julius Henke | 2025-05-15 | arXiv | https://github.com/JuliusHenke/autopentest | http://arxiv.org/abs/2505.10321v1 |
7 | Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M | Dario Di Palma, Felice Antonio Merra, Maurizio Sfilio, Vito Walter Anelli, Fedelucio Narducci, Tommaso Di Noia | 2025-05-15 | arXiv | https://github.com/sisinflab/LLM-MemoryInspector | http://arxiv.org/abs/2505.10212v1 |
8 | From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models | Yidan Wang, Yubing Ren, Yanan Cao, Binxing Fang | 2025-05-15 | arXiv | https://github.com/redwyd/SymMark | http://arxiv.org/abs/2505.09924v2 |
9 | ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts | Jing-Cheng Pang, Kaiyuan Li, Yidi Wang, Si-Hang Yang, Shengyi Jiang, Yang Yu | 2025-05-15 | arXiv | https://github.com/LAMDA-RL/ImagineBench | http://arxiv.org/abs/2505.10010v1 |
10 | PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization | Yidan Wang, Yanan Cao, Yubing Ren, Fang Fang, Zheng Lin, Binxing Fang | 2025-05-15 | arXiv | https://github.com/redwyd/PrivacyJailbreak | http://arxiv.org/abs/2505.09921v2 |
11 | LAS: Loss-less ANN-SNN Conversion for Fully Spike-Driven Large Language Models | Long Chen, Xiaotian Song, Yanan Sun | 2025-05-14 | arXiv | https://github.com/lc783/LAS | http://arxiv.org/abs/2505.09659v1 |
12 | Adversarial Attack on Large Language Models using Exponentiated Gradient Descent | Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu | 2025-05-14 | arXiv | https://github.com/sbamit/Exponentiated-Gradient-Descent-LLM-Attack | http://arxiv.org/abs/2505.09820v1 |
13 | CodePDE: An Inference Framework for LLM-driven PDE Solver Generation | Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski, Yiming Yang, Ameet Talwalkar | 2025-05-13 | arXiv | https://github.com/LithiumDA/CodePDE | http://arxiv.org/abs/2505.08783v1 |
14 | Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement | Haoran Ye, Jing Jin, Yuhang Xie, Xin Zhang, Guojie Song | 2025-05-13 | arXiv | https://github.com/valuebyte-ai/Awesome-LLM-Psychometrics | http://arxiv.org/abs/2505.08245v1 |
15 | Optimized Couplings for Watermarking Large Language Models | Dor Tsur, Carol Xuan Long, Claudio Mayrink Verdun, Hsiang Hsu, Haim Permuter, Flavio P. Calmon | 2025-05-13 | arXiv | https://github.com/Carol-Long/CC_Watermark | http://arxiv.org/abs/2505.08878v1 |
16 | Unlocking Location Intelligence: A Survey from Deep Learning to The LLM Era | Xixuan Hao, Yutian Jiang, Xingchen Zou, Jiabo Liu, Yifang Yin, Yuxuan Liang | 2025-05-13 | arXiv | https://github.com/CityMind-Lab/Awesome-Location-Intelligence | http://arxiv.org/abs/2505.09651v1 |
17 | HealthBench: Evaluating Large Language Models Towards Improved Human Health | Rahul K. Arora, Jason Wei, Rebecca Soskin Hicks, Preston Bowman, Joaquin Quiñonero-Candela, Foivos Tsimpourlas, Michael Sharman, Meghan Shah, Andrea Vallone, Alex Beutel, Johannes Heidecke, Karan Singhal | 2025-05-13 | arXiv | https://github.com/openai/simple-evals | http://arxiv.org/abs/2505.08775v1 |
18 | A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models | Junjie Ye, Caishuang Huang, Zhuohan Chen, Wenjie Fu, Chenyuan Yang, Leyi Yang, Yilong Wu, Peng Wang, Meng Zhou, Xiaolong Yang, Tao Gui, Qi Zhang, Zhongchao Shi, Jianping Fan, Xuanjing Huang | 2025-05-12 | arXiv | https://github.com/Junjie-Ye/MulDimIF | http://arxiv.org/abs/2505.07591v1 |
19 | Are LLMs complicated ethical dilemma analyzers? | Jiashen Du, Jesse Yao, Allen Liu, Zhekai Zhang | 2025-05-12 | arXiv | https://github.com/ALT-JS/ethicaLLM | http://arxiv.org/abs/2505.08106v1 |
20 | DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation | Jiashuo Sun, Xianrui Zhong, Sizhe Zhou, Jiawei Han | 2025-05-12 | arXiv | https://github.com/GasolSun36/DynamicRAG | http://arxiv.org/abs/2505.07233v2 |
21 | Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs | Yifan Wei, Xiaoyan Yu, Tengfei Pan, Angsheng Li, Li Du | 2025-05-12 | arXiv | https://github.com/weiyifan1023/senator | http://arxiv.org/abs/2505.07184v1 |
22 | MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception | Zhengye Zhang, Sirui Zhao, Shifeng Liu, Shukang Yin, Xinglong Mao, Tong Xu, Enhong Chen | 2025-05-11 | arXiv | https://github.com/zyzhangUstc/MELLM | http://arxiv.org/abs/2505.07007v1 |
23 | From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | Gaurab Sarkar, Sougata Saha | 2025-05-11 | arXiv | https://github.com/sougata-ub/llms_for_ionic_liquids | http://arxiv.org/abs/2505.06964v1 |
24 | GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance | Jinuk Kim, Marwa El Halabi, Wonpyo Park, Clemens JS Schaefer, Deokjae Lee, Yeonhong Park, Jae W. Lee, Hyun Oh Song | 2025-05-11 | arXiv | https://github.com/snu-mllab/GuidedQuant | http://arxiv.org/abs/2505.07004v1 |
25 | POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models | Yangguang Shao, Xinjie Lin, Haozheng Luo, Chengshang Hou, Gang Xiong, Jiahao Yu, Junzheng Shi | 2025-05-10 | arXiv | https://github.com/AndyShaw01/PoisonCraft | http://arxiv.org/abs/2505.06579v1 |
26 | Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Learning | Hang Gao, Chenhao Zhang, Tie Wang, Junsuo Zhao, Fengge Wu, Changwen Zheng, Huaping Liu | 2025-05-09 | arXiv | https://github.com/zch65458525/L2T | http://arxiv.org/abs/2505.06321v1 |
27 | HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow | You Peng, Youhe Jiang, Chen Wang, Binhang Yuan | 2025-05-08 | arXiv | https://github.com/Relaxed-System-Lab/Hexgen-Flow | http://arxiv.org/abs/2505.05286v1 |
28 | KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification | Qianbo Zang, Christophe Zgrzendek, Igor Tchappi, Afshin Khadangi, Johannes Sedlmeir | 2025-05-08 | arXiv | https://github.com/QianboZang/KG-HTC | http://arxiv.org/abs/2505.05583v1 |
29 | Prompt-Based LLMs for Position Bias-Aware Reranking in Personalized Recommendations | Md Aminul Islam, Ahmed Sayeed Faruk | 2025-05-08 | arXiv | https://github.com/aminul7506/LLMForReRanking | http://arxiv.org/abs/2505.04948v1 |
30 | Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization | Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Jiang Zong, Hao Peng, Jianwei Yin | 2025-05-08 | arXiv | https://github.com/colored-dye/multi_stage_influence_function | http://arxiv.org/abs/2505.05017v1 |
31 | Benchmarking LLMs' Swarm intelligence | Kai Ruan, Mowen Huang, Ji-Rong Wen, Hao Sun | 2025-05-07 | arXiv | https://github.com/x66ccff/swarmbench | http://arxiv.org/abs/2505.04364v1 |
32 | TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution | Zhikai Zhao, Chuanbo Hua, Federico Berto, Kanghoon Lee, Zihan Ma, Jiachen Li, Jinkyoo Park | 2025-05-07 | arXiv | https://github.com/ai4co/trajevo | http://arxiv.org/abs/2505.04480v1 |
33 | Advancing and Benchmarking Personalized Tool Invocation for LLMs | Xu Huang, Yuefeng Huang, Weiwen Liu, Xingshan Zeng, Yasheng Wang, Ruiming Tang, Hong Xie, Defu Lian | 2025-05-07 | arXiv | https://github.com/hyfshadow/PTBench | http://arxiv.org/abs/2505.04072v1 |
34 | Avoid Recommending Out-of-Domain Items: Constrained Generative Recommendation with LLMs | Hao Liao, Wensheng Lu, Jianxun Lian, Mingqi Wu, Shuo Wang, Yong Zhang, Yitian Huang, Mingyang Zhou, Xing Xie | 2025-05-06 | arXiv | https://github.com/microsoft/RecAI | http://arxiv.org/abs/2505.03336v1 |
35 | CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Junqi Liu, Xiaohan Lin, Jonas Bayer, Yael Dillies, Weijie Jiang, Xiaodan Liang, Roman Soletskyi, Haiming Wang, Yunzhou Xie, Beibei Xiong, Zhengfeng Yang, Jujian Zhang, Lihong Zhi, Jia Li, Zhengying Liu | 2025-05-06 | arXiv | https://github.com/MoonshotAI/CombiBench/ | http://arxiv.org/abs/2505.03171v1 |
36 | Plug-and-Play AMC: Context Is King in Training-Free, Open-Set Modulation with LLMs | Mohammad Rostami, Atik Faysal, Reihaneh Gh. Roshan, Huaxia Wang, Nikhil Muralidhar, Yu-Dong Yao | 2025-05-06 | arXiv | https://github.com/RU-SIT/context-is-king | http://arxiv.org/abs/2505.03112v1 |
37 | Automatic Calibration for Membership Inference Attack on Large Language Models | Saleh Zare Zade, Yao Qiang, Xiangyu Zhou, Hui Zhu, Mohammad Amin Roshani, Prashant Khanduri, Dongxiao Zhu | 2025-05-06 | arXiv | https://github.com/Salehzz/ACMIA | http://arxiv.org/abs/2505.03392v1 |
38 | FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models | Zhouliang Yu, Ruotian Peng, Keyi Ding, Yizhe Li, Zhongyuan Peng, Minghao Liu, Yifan Zhang, Zheng Yuan, Huajian Xin, Wenhao Huang, Yandong Wen, Ge Zhang, Weiyang Liu | 2025-05-05 | arXiv | https://sphere-ai-lab.github.io/FormalMATH/ | http://arxiv.org/abs/2505.02735v1 |
39 | LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | Qingkai Fang, Yan Zhou, Shoutao Guo, Shaolei Zhang, Yang Feng | 2025-05-05 | arXiv | https://github.com/ictnlp/LLaMA-Omni2 | http://arxiv.org/abs/2505.02625v1 |
40 | Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models | Xiaobao Wu | 2025-05-05 | arXiv | https://github.com/bobxwu/learning-from-rewards-llm-papers | http://arxiv.org/abs/2505.02686v1 |
41 | Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data | Zhong Guan, Likang Wu, Hongke Zhao, Ming He, Jianpin Fan | 2025-05-04 | arXiv | https://github.com/millioniron/LLM_exploration | http://arxiv.org/abs/2505.02130v1 |
42 | MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents | Zeyu Zhang, Quanyu Dai, Xu Chen, Rui Li, Zhongyang Li, Zhenhua Dong | 2025-05-04 | arXiv | https://github.com/nuster1128/MemEngine | http://arxiv.org/abs/2505.02099v1 |
43 | Amplifying Your Social Media Presence: Personalized Influential Content Generation with LLMs | Yuying Zhao, Yu Wang, Xueqi Cheng, Anne Marie Tumlin, Yunchao Liu, Damin Xia, Meng Jiang, Tyler Derr | 2025-05-03 | arXiv | https://github.com/YuyingZhao/LLM-influence-amplifier | http://arxiv.org/abs/2505.01698v1 |
44 | A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency | Sihyeong Park, Sungryeol Jeon, Chaelyn Lee, Seokhun Jeon, Byung-Soo Kim, Jemin Lee | 2025-05-03 | arXiv | https://github.com/sihyeong/Awesome-LLM-Inference-Engine | http://arxiv.org/abs/2505.01658v1 |
45 | WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks | Jingwen Tong, Jiawei Shao, Qiong Wu, Wei Guo, Zijian Li, Zehong Lin, Jun Zhang | 2025-05-02 | arXiv | https://github.com/jwentong/WirelessAgent_R1 | https://doi.org/10.48550/arXiv.2409.07964 |
46 | FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing | Gaoxiang Cong, Liang Li, Jiadong Pan, Zhedong Zhang, Amin Beheshti, Anton van den Hengel, Yuankai Qi, Qingming Huang | 2025-05-02 | arXiv | https://galaxycong.github.io/LLM-Flow-Dubber/ | http://arxiv.org/abs/2505.01263v1 |
47 | Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities | Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Guoxia Wang, Dianhai Yu, Yonggang Wen, Dacheng Tao | 2025-05-02 | arXiv | https://github.com/Hao840/Awesome-Low-Precision-Training | http://arxiv.org/abs/2505.01043v1 |
48 | LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection | Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou | 2025-05-01 | arXiv | https://github.com/Susan571/LENSLLM | http://arxiv.org/abs/2505.03793v1 |
49 | Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models | Bang Zhang, Ruotian Ma, Qingxuan Jiang, Peisong Wang, Jiaqi Chen, Zheng Xie, Xingyu Chen, Yue Wang, Fanghua Ye, Jian Li, Yifan Yang, Zhaopeng Tu, Xiaolong Li | 2025-05-01 | arXiv | https://github.com/Tencent/digitalhuman/tree/main/SAGE | http://arxiv.org/abs/2505.02847v2 |
50 | SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation | Quang P. M. Pham, Khoi T. N. Nguyen, Nhi H. Doan, Cuong A. Pham, Kentaro Inui, Dezhen Song | 2025-05-01 | arXiv | https://github.com/quangpham2006/SmallPlan | http://arxiv.org/abs/2505.00831v1 |
51 | A Survey on Large Language Model based Human-Agent Systems | Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Langzhou He, Yangning Li, Yuwei Cao, Dongyuan Li, Renhe Jiang, Philip S. Yu | 2025-05-01 | arXiv | https://github.com/HenryPengZou/Awesome-LLM-Based-Human-Agent-System-Papers | http://arxiv.org/abs/2505.00753v1 |
52 | DeepCritic: Deliberate Critique with Large Language Models | Wenkai Yang, Jingwen Chen, Yankai Lin, Ji-Rong Wen | 2025-05-01 | arXiv | https://github.com/RUCBM/DeepCritic | http://arxiv.org/abs/2505.00662v1 |
53 | LLM Ethics Benchmark: A Three-Dimensional Assessment System for Evaluating Moral Reasoning in Large Language Models | Junfeng Jiao, Saleh Afroogh, Abhejay Murali, Kevin Chen, David Atkinson, Amit Dhurandhar | 2025-05-01 | arXiv | https://github.com/ | http://arxiv.org/abs/2505.00853v1 |
54 | LLM-based Interactive Imitation Learning for Robotic Manipulation | Jonas Werner, Kun Chu, Cornelius Weber, Stefan Wermter | 2025-04-30 | arXiv | https://github.com/Tubicor/LLM-iTeach | http://arxiv.org/abs/2504.21769v1 |
55 | When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator | Md Fahim Anjum | 2025-04-30 | arXiv | https://github.com/MDFahimAnjum/llm-planning-with-reasoning | http://arxiv.org/abs/2505.03786v1 |
56 | OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification | Shangyu Li, Juyong Jiang, Tiancheng Zhao, Jiasi Shen | 2025-04-29 | arXiv | https://github.com/lishangyu-hkust/OSVBench | http://arxiv.org/abs/2504.20964v1 |
57 | Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen | 2025-04-29 | arXiv | https://github.com/ypwang61/One-Shot-RLVR | http://arxiv.org/abs/2504.20571v1 |
58 | Turing Machine Evaluation for Large Language Model | Haitao Wu, Zongbo Han, Huaxi Huang, Changqing Zhang | 2025-04-29 | arXiv | https://github.com/HaitaoWuTJU/Turing-Machine-Bench | http://arxiv.org/abs/2504.20771v1 |
59 | X-Fusion: Introducing New Modality to Frozen Large Language Models | Sicheng Mo, Thao Nguyen, Xun Huang, Siddharth Srinivasan Iyer, Yijun Li, Yuchen Liu, Abhishek Tandon, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, Yuheng Li | 2025-04-29 | arXiv | https://sichengmo.github.io/XFusion/ | http://arxiv.org/abs/2504.20996v1 |
60 | AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers | Zijie Lin, Yiqing Shen, Qilin Cai, He Sun, Jinrui Zhou, Mingjun Xiao | 2025-04-28 | arXiv | https://github.com/shoushouyu/Automated-Paper-to-Code | http://arxiv.org/abs/2504.20115v1 |
61 | Evolution of Cooperation in LLM-Agent Societies: A Preliminary Study Using Different Punishment Strategies | Kavindu Warnakulasuriya, Prabhash Dissanayake, Navindu De Silva, Stephen Cranefield, Bastin Tony Roy Savarimuthu, Surangika Ranathunga, Nisansa de Silva | 2025-04-28 | arXiv | https://coin-workshop.github.io/coine-2025-detroit/accepted_for_presentation.html | http://arxiv.org/abs/2504.19487v1 |
62 | LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects | Guangyi Liu, Pengxiang Zhao, Liang Liu, Yaxuan Guo, Han Xiao, Weifeng Lin, Yuxiang Chai, Yue Han, Shuai Ren, Hao Wang, Xiaoyu Liang, Wenhao Wang, Tianze Wu, Linghao Li, Hao Wang, Guanjing Xiong, Yong Liu, Hongsheng Li | 2025-04-28 | arXiv | https://github.com/PhoneLLM/Awesome-LLM-Powered-Phone-GUI-Agents | http://arxiv.org/abs/2504.19838v1 |
63 | SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning | Jiaqi Chen, Bang Zhang, Ruotian Ma, Peisong Wang, Xiaodan Liang, Zhaopeng Tu, Xiaolong Li, Kwan-Yee K. Wong | 2025-04-27 | arXiv | https://chen-judge.github.io/SPC/ | http://arxiv.org/abs/2504.19162v1 |
64 | Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers | Dylan Bouchard, Mohit Singh Chauhan | 2025-04-27 | arXiv | https://github.com/cvs-health/uqlm | http://arxiv.org/abs/2504.19254v2 |
65 | BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese | Peilin Zhou, Bruce Leon, Xiang Ying, Can Zhang, Yifan Shao, Qichen Ye, Dading Chong, Zhiling Jin, Chenxuan Xie, Meng Cao, Yuxin Gu, Sixin Hong, Jing Ren, Jian Chen, Chao Liu, Yining Hua | 2025-04-27 | arXiv | https://github.com/PALIN2018/BrowseComp-ZH | http://arxiv.org/abs/2504.19314v2 |
66 | Calibrating Translation Decoding with Quality Estimation on LLMs | Di Wu, Yibin Lei, Christof Monz | 2025-04-26 | arXiv | https://github.com/moore3930/calibrating-llm-mt | http://arxiv.org/abs/2504.19044v1 |
67 | Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs | Mohammad Akbar-Tajari, Mohammad Taher Pilehvar, Mohammad Mahmoody | 2025-04-26 | arXiv | https://github.com/GoAT-pydev/Graph_of_Attacks | http://arxiv.org/abs/2504.19019v1 |
68 | SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models | Nader Zantout, Haochen Zhang, Pujith Kachana, Jinkai Qiu, Ji Zhang, Wenshan Wang | 2025-04-25 | arXiv | https://github.com/nzantout/SORT3D | http://arxiv.org/abs/2504.18684v1 |
69 | DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models | Jianyu Liu, Hangyu Guo, Ranjie Duan, Xingyuan Bu, Yancheng He, Shilong Li, Hui Huang, Jiaheng Liu, Yucheng Wang, Chenchen Jing, Xingwei Qu, Xiao Zhang, Yingshui Tan, Yanan Wu, Jihao Gu, Yangguang Li, Jianke Zhu | 2025-04-25 | arXiv | https://github.com/Kizna1ver/DREAM | http://arxiv.org/abs/2504.18053v1 |
70 | LEAM: A Prompt-only Large Language Model-enabled Antenna Modeling Method | Tao Wu, Kexue Fu, Qiang Hua, Xinxin Liu, Muhammad Ali Imran, Bo Liu | 2025-04-25 | arXiv | https://github.com/TaoWu974/LEAM | http://arxiv.org/abs/2504.18271v1 |
71 | An Empirical Study on Prompt Compression for Large Language Models | Zheng Zhang, Jinyi Li, Yihuai Lan, Xiang Wang, Hao Wang | 2025-04-24 | arXiv | https://github.com/3DAgentWorld/Toolkit-for-Prompt-Compression | http://arxiv.org/abs/2505.00019v1 |
72 | RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning | Zihan Wang, Kangrui Wang, Qineng Wang, Pingyue Zhang, Linjie Li, Zhengyuan Yang, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, Eli Gottlieb, Monica Lam, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li | 2025-04-24 | arXiv | https://github.com/RAGEN-AI/RAGEN | http://arxiv.org/abs/2504.20073v1 |
73 | Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs | Tiancheng Gu, Kaicheng Yang, Ziyong Feng, Xingjun Wang, Yanzhao Zhang, Dingkun Long, Yingda Chen, Weidong Cai, Jiankang Deng | 2025-04-24 | arXiv | https://garygutc.github.io/UniME | http://arxiv.org/abs/2504.17432v1 |
74 | Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark | Hanlei Zhang, Zhuohang Li, Yeshuang Zhu, Hua Xu, Peiwu Wang, Haige Zhu, Jie Zhou, Jinchao Zhang | 2025-04-23 | arXiv | https://github.com/thuiar/MMLA | http://arxiv.org/abs/2504.16427v2 |
75 | UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models | Yu Zheng, Longyi Liu, Yuming Lin, Jie Feng, Guozhen Zhang, Depeng Jin, Yong Li | 2025-04-23 | arXiv | https://github.com/tsinghua-fib-lab/PlanBench | http://arxiv.org/abs/2504.21027v1 |
76 | Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control | Hannah Cyberey, David Evans | 2025-04-23 | arXiv | https://github.com/hannahxchen/llm-censorship-steering | http://arxiv.org/abs/2504.17130v1 |
77 | Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution | Junjie Chen, Haitao Li, Jingli Yang, Yiqun Liu, Qingyao Ai | 2025-04-23 | arXiv | https://github.com/cjj826/GoalAct | http://arxiv.org/abs/2504.16563v1 |
78 | LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale | Joya Chen, Ziyun Zeng, Yiqi Lin, Wei Li, Zejun Ma, Mike Zheng Shou | 2025-04-22 | arXiv | https://showlab.github.io/livecc | http://arxiv.org/abs/2504.16030v1 |
79 | PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models | Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Weike Wang, Xudong Tian, Anqi Lv, Laifu Man, Jianxiang Li, Feiyu Tao, Qihua Sun, Zhou Liang, Yushu Mu, Zhongxuan Li, Jing-Jun Zhang, Shutao Zhang, Xiaotian Li, Xingqi Xia, Jiawei Lin, Zheyu Shen, Jiahang Chen, Qiuhao Xiong, Binran Wang, Fengyuan Wang, Ziyang Ni, Bohan Zhang, Fan Cui, Changkun Shao, Qing-Hong Cao, Ming-xing Luo, Muhan Zhang, Hua Xing Zhu | 2025-04-22 | arXiv | https://phybench-official.github.io/phybench-demo/ | http://arxiv.org/abs/2504.16074v1 |
80 | WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents | Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang | 2025-04-22 | arXiv | https://github.com/elated-sawyer/WALL-E | http://arxiv.org/abs/2504.15785v1 |
81 | CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs | Yingming Zheng, Xiaoliang Liu, Peng Wu, Li Pan | 2025-04-21 | arXiv | https://github.com/8zym/CRAVE | http://arxiv.org/abs/2504.14905v1 |
82 | EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models | Peng Wang, Ningyu Zhang, Bozhong Tian, Zekun Xi, Yunzhi Yao, Ziwen Xu, Mengru Wang, Shengyu Mao, Xiaohan Wang, Siyuan Cheng, Kangwei Liu, Yuansheng Ni, Guozhou Zheng, Huajun Chen | 2025-04-21 | arXiv | https://zjunlp.github.io/project/EasyEdit2/video | https://doi.org/10.48550/arXiv.2308.07269 |
83 | Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph | Qiushi Xiong, Zhipeng Xu, Zhenghao Liu, Mengjia Wang, Zulong Chen, Yue Sun, Yu Gu, Xiaohua Li, Ge Yu | 2025-04-21 | arXiv | https://github.com/NEUIR/MemGraph | http://arxiv.org/abs/2504.14845v1 |
84 | Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Yilun Zhou, Austin Xu, Peifeng Wang, Caiming Xiong, Shafiq Joty | 2025-04-21 | arXiv | https://github.com/SalesforceAIResearch/jetts-benchmark | http://arxiv.org/abs/2504.15253v1 |
85 | IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs | David Ma, Yuanxing Zhang, Jincheng Ren, Jarvis Guo, Yifan Yao, Zhenlin Wei, Zhenzhu Yang, Zhongyuan Peng, Boyu Feng, Jun Ma, Xiao Gu, Zhoufutu Wen, King Zhu, Yancheng He, Meng Cao, Shiwen Ni, Jiaheng Liu, Wenhao Huang, Ge Zhang, Xiaojie Jin | 2025-04-21 | arXiv | https://github.com/multimodal-art-projection/IV-Bench | http://arxiv.org/abs/2504.15415v1 |
86 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | Weiye Xu, Jiahao Wang, Weiyun Wang, Zhe Chen, Wengang Zhou, Aijun Yang, Lewei Lu, Houqiang Li, Xiaohua Wang, Xizhou Zhu, Wenhai Wang, Jifeng Dai, Jinguo Zhu | 2025-04-21 | arXiv | https://visulogic-benchmark.github.io/VisuLogic | http://arxiv.org/abs/2504.15279v1 |
87 | NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models | Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang | 2025-04-20 | arXiv | https://github.com/LawrenceRLiu/NoWag | http://arxiv.org/abs/2504.14569v1 |
88 | Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Tong Zeng, Longfeng Wu, Liang Shi, Dawei Zhou, Feng Guo | 2025-04-20 | arXiv | https://github.com/tong-zeng/DVBench | http://arxiv.org/abs/2504.14526v1 |
89 | CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations | Man Ho Lam, Chaozheng Wang, Jen-tse Huang, Michael R. Lyu | 2025-04-19 | arXiv | https://donaldlamnl.github.io/CodeCrash/ | http://arxiv.org/abs/2504.14119v1 |
90 | Integrating LLM-Generated Views into Mean-Variance Optimization Using the Black-Litterman Model | Youngbin Lee, Yejin Kim, Suin Kim, Yongjae Lee | 2025-04-19 | arXiv | https://github.com/youngandbin/LLM-MVO-BLM | http://arxiv.org/abs/2504.14345v1 |
91 | Towards Explainable Fake Image Detection with Multi-Modal Large Language Models | Yikun Ji, Yan Hong, Jiahui Zhan, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang | 2025-04-19 | arXiv | https://github.com/Gennadiyev/mllm-defake | http://arxiv.org/abs/2504.14245v1 |
92 | Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator | Akshat Ramachandran, Souvik Kundu, Arnab Raha, Shamik Kundu, Deepak K. Mathaikutty, Tushar Krishna | 2025-04-19 | arXiv | https://github.com/FLOW-open-project/FLOW | http://arxiv.org/abs/2504.14365v1 |
93 | LLM Sensitivity Evaluation Framework for Clinical Diagnosis | Chenwei Yan, Xiangling Fu, Yuxuan Xiong, Tianyi Wang, Siu Cheung Hui, Ji Wu, Xien Liu | 2025-04-18 | Proceedings of the 31st International Conference on Computational Linguistics, 2025 | https://github.com/chenwei23333/DiagnosisQA | http://arxiv.org/abs/2504.13475v1 |
94 | ConExion: Concept Extraction with Large Language Models | Ebrahim Norouzi, Sven Hertling, Harald Sack | 2025-04-17 | arXiv | https://github.com/ISE-FIZKarlsruhe/concept_extraction | http://arxiv.org/abs/2504.12915v1 |
95 | EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting | Guanrou Yang, Chen Yang, Qian Chen, Ziyang Ma, Wenxi Chen, Wen Wang, Tianrui Wang, Yifan Yang, Zhikang Niu, Wenrui Liu, Fan Yu, Zhihao Du, Zhifu Gao, ShiLiang Zhang, Xie Chen | 2025-04-17 | arXiv | https://yanghaha0908.github.io/EmoVoice/ | http://arxiv.org/abs/2504.12867v1 |
96 | ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition | Hisham A. Alyahya, Haidar Khan, Yazeed Alnumay, M Saiful Bari, Bülent Yener | 2025-04-17 | arXiv | https://github.com/facebookresearch/ZeroSumEval | http://arxiv.org/abs/2503.10673v1 |
97 | Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration | Yicheng Pan, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Quan Liu, Jianqing Gao, Feng Ma | 2025-04-17 | arXiv | https://github.com/ycpNotFound/GeoGen | http://arxiv.org/abs/2504.12773v1 |
98 | Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM | Zirui Pan, Xin Wang, Yipeng Zhang, Hong Chen, Kwan Man Cheng, Yaofei Wu, Wenwu Zhu | 2025-04-16 | arXiv | https://modular-cam.github.io | http://arxiv.org/abs/2504.12048v1 |
99 | d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | Siyan Zhao, Devaansh Gupta, Qinqing Zheng, Aditya Grover | 2025-04-16 | arXiv | https://dllm-reasoning.github.io/ | http://arxiv.org/abs/2504.12216v1 |
100 | LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA | Xanh Ho, Jiahao Huang, Florian Boudin, Akiko Aizawa | 2025-04-16 | arXiv | https://github.com/Alab-NII/llm-judge-extract-qa | http://arxiv.org/abs/2504.11972v1 |
101 | HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks | Stefan Abi-Karam, Cong Hao | 2025-04-16 | arXiv | https://github.com/stefanpie/hls-eval | http://arxiv.org/abs/2504.12268v1 |
102 | A Human-AI Comparative Analysis of Prompt Sensitivity in LLM-Based Relevance Judgment | Negar Arabzadeh, Charles L. A. Clarke | 2025-04-16 | arXiv | https://github.com/Narabzad/prompt-sensitivity-relevance-judgements/ | http://arxiv.org/abs/2504.12408v1 |
103 | MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning | Zhaopeng Feng, Shaosheng Cao, Jiahan Ren, Jiayuan Su, Ruizhe Chen, Yan Zhang, Zhe Xu, Yao Hu, Jian Wu, Zuozhu Liu | 2025-04-15 | arXiv | https://github.com/fzp0424/MT-R1-Zero | http://arxiv.org/abs/2504.10160v1 |
104 | Using LLMs as prompt modifier to avoid biases in AI image generators | René Peinl | 2025-04-15 | arXiv | https://iisys-hof.github.io/llm-prompt-img-gen/ | http://arxiv.org/abs/2504.11104v1 |
105 | Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From | Changjiang Gao, Hankun Lin, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Jiajun Chen | 2025-04-15 | arXiv | https://github.com/NJUNLP/Cross-Lingual-Context-Retrieval | http://arxiv.org/abs/2504.10906v1 |
106 | RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-wave Point Cloud Sequence | Zengyuan Lai, Jiarui Yang, Songpengcheng Xia, Lizhou Lin, Lan Sun, Renwen Wang, Jianran Liu, Qi Wu, Ling Pei | 2025-04-15 | arXiv …, 2025 | https://inowlzy.github.io/RadarLLM/ | http://arxiv.org/abs/2504.09862v1 |
107 | Propaganda via AI? A Study on Semantic Backdoors in Large Language Models | Nay Myat Min, Long H. Pham, Yige Li, Jun Sun | 2025-04-15 | arXiv | https://github.com/NayMyatMin/RAVEN | http://arxiv.org/abs/2504.12344v1 |
108 | Probing then Editing Response Personality of Large Language Models | Tianjie Ju, Zhenyu Shao, Bowen Wang, Yujia Chen, Zhuosheng Zhang, Hao Fei, Mong-Li Lee, Wynne Hsu, Sufeng Duan, Gongshen Liu | 2025-04-15 | arXiv …, 2025 | https://github.com/universe-sky/probing-then-editing-personality | http://arxiv.org/abs/2504.10227v1 |
109 | LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | Parshin Shojaee, Ngoc-Hieu Nguyen, Kazem Meidani, Amir Barati Farimani, Khoa D Doan, Chandan K Reddy | 2025-04-15 | arXiv …, 2025 | https://github.com/deep-symbolic-mathematics/llm-srbench | http://arxiv.org/abs/2504.10415v1 |
110 | Teaching Large Language Models to Reason through Learning and Forgetting | Tianwei Ni, Allen Nie, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, Rasool Fakoor | 2025-04-15 | arXiv | https://github.com/twni2016/llm-reasoning-uft | http://arxiv.org/abs/2504.11364v1 |
111 | Dynamic Compressing Prompts for Efficient Inference of Large Language Models | Jinwu Hu, Wei Zhang, Yufeng Wang, Yu Hu, Bin Xiao, Mingkui Tan, Qing Du | 2025-04-15 | arXiv | https://github.com/Fhujinwu/DCP | http://arxiv.org/abs/2504.11004v1 |
112 | A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Xue Zhang, Songming Zhang, Yunlong Liang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou | 2025-04-15 | arXiv | https://github.com/songmzhang/DSKDv2 | http://arxiv.org/abs/2504.11426v1 |
113 | 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Tianyi Zhang, Yang Sui, Shaochen Zhong, Vipin Chaudhary, Xia Hu, Anshumali Shrivastava | 2025-04-15 | arXiv | https://github.com/LeanModels/DFloat11 | http://arxiv.org/abs/2504.11651v1 |
114 | LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks | Soumyadeep Pal, Changsheng Wang, James Diffenderfer, Bhavya Kailkhura, Sijia Liu | 2025-04-15 | arXiv …, 2025 | https://github.com/OPTML-Group/MU-Coreset | http://arxiv.org/abs/2504.10185v2 |
115 | CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates | Ankit Kumar Shaw, Kun Jiang, Tuopu Wen, Chandan Kumar Sah, Yining Shi, Mengmeng Yang, Diange Yang, Xiaoli Lian | 2025-04-14 | arXiv | https://Ankit-Zefan.github.io/CleanMap/ | http://arxiv.org/abs/2504.10738v1 |
116 | ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model | Wuyang Lan, Wenzheng Wang, Changwei Ji, Guoxing Yang, Yongbo Zhang, Xiaohong Liu, Song Wu, Guangyu Wang | 2025-04-13 | arXiv | https://github.com/medfound/medfound | http://arxiv.org/abs/2504.09421v2 |
117 | Fine-tuning a Large Language Model for Automating Computational Fluid Dynamics Simulations | Zhehao Dong, Zhen Lu, Yue Yang | 2025-04-13 | arXiv | https://github.com/YYgroup/AutoCFD | http://arxiv.org/abs/2504.09602v2 |
118 | Alleviating the Fear of Losing Alignment in LLM Fine-tuning | Kang Yang, Guanhong Tao, Xun Chen, Jun Xu | 2025-04-13 | arXiv | https://github.com/kangyangWHU/LLMAlignment | http://arxiv.org/abs/2504.09757v1 |
119 | Can LLM feedback enhance review quality? A randomized study of 20K reviews at ICLR 2025 | Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, Animesh Garg, Nanyun Peng, Fei Sha, Rose Yu, Carl Vondrick, James Zou | 2025-04-13 | arXiv | https://github.com/zou-group/review_feedback_agent | http://arxiv.org/abs/2504.09737v1 |
120 | DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training | Zhenting Wang, Guofeng Cui, Kun Wan, Wentian Zhao | 2025-04-13 | arXiv | https://github.com/ZhentingWang/DUMP | http://arxiv.org/abs/2504.09710v1 |
121 | HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs | Sharanya Dasgupta, Sujoy Nath, Arkaprabha Basu, Pourya Shamsolmoali, Swagatam Das | 2025-04-13 | arXiv | https://github.com/sharanya-dasgupta001/hallushift | http://arxiv.org/abs/2504.09482v1 |
122 | How new data permeates LLM knowledge and how to dilute it | Chen Sun, Renat Aksitov, Andrey Zhmoginov, Nolan Andrew Miller, Max Vladymyrov, Ulrich Rueckert, Been Kim, Mark Sandler | 2025-04-13 | arXiv | https://sunchipsster1.github.io/projects/outlandish/ | http://arxiv.org/abs/2504.09522v1 |
123 | SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model | Kaiyu Li, Zepeng Xin, Li Pang, Chao Pang, Yupeng Deng, Jing Yao, Guisong Xia, Deyu Meng, Zhi Wang, Xiangyong Cao | 2025-04-13 | arXiv | https://github.com/earth-insights/SegEarth-R1 | http://arxiv.org/abs/2504.09644v1 |
124 | Span-level Emotion-Cause-Category Triplet Extraction with Instruction Tuning LLMs and Data Augmentation | Xiangju Li, Dong Yang, Xiaogang Zhu, Faliang Huang, Peng Zhang, Zhongying Zhao | 2025-04-13 | arXiv | https://github.com/zxgnlp/InstruDa-LLM | http://arxiv.org/abs/2504.12331v1 |
125 | Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Chenghao Li, Chaoning Zhang, Yi Lu, Jiaquan Zhang, Qigan Sun, Xudong Wang, Jiwei Wei, Guoqing Wang, Yang Yang, Heng Tao Shen | 2025-04-13 | arXiv | https://github.com/dlMARiA/Syzygy-of-thoughts | http://arxiv.org/abs/2504.09566v2 |
126 | GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Lang Lin, Xueyang Yu, Ziqi Pang, Yu-Xiong Wang | 2025-04-12 | arXiv:2504.07962, 2025 | https://glus-video.github.io/ | http://arxiv.org/abs/2504.07962v1 |
127 | Revisiting LLM Evaluation through Mechanism Interpretability: a New Metric and Model Utility Law | Yixin Cao, Jiahao Ying, Yaoning Wang, Xipeng Qiu, Xuanjing Huang, Yugang Jiang | 2025-04-12 | arXiv …, 2025 | https://github.com/ALEX-nlp/MUI-Eva | http://arxiv.org/abs/2504.07440v1 |
128 | LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking | Qi Liu, Haozhe Duan, Yiqun Chen, Quanfeng Lu, Weiwei Sun, Jiaxin Mao | 2025-04-12 | arXiv …, 2025 | https://github.com/liuqi6777/llm4ranking | http://arxiv.org/abs/2504.07439v1 |
129 | Efficient Tuning of Large Language Models for Knowledge-Grounded Dialogue Generation | Bo Zhang, Hui Ma, Dailin Li, Jian Ding, Jian Wang, Bo Xu, HongFei Lin | 2025-04-12 | arXiv …, 2025 | https://github.com/zhangbo-nlp/KEDiT | http://arxiv.org/abs/2504.07754v1 |
130 | Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models | Yuxiang Lin, Jingdong Sun, Zhi-Qi Cheng, Jue Wang, Haomin Liang, Zebang Cheng, Yifei Dong, Jun-Yan He, Xiaojiang Peng, Xian-Sheng Hua | 2025-04-12 | arXiv …, 2025 | https://github.com/Lum1104/EIBench | http://arxiv.org/abs/2504.07521v1 |
131 | From Punchlines to Predictions: A Metric to Assess LLM Performance in Identifying Humor in Stand-Up Comedy | Adrianna Romanowski, Pedro H. V. Valois, Kazuhiro Fukui | 2025-04-12 | arXiv | https://github.com/swaggirl9000/humor | http://arxiv.org/abs/2504.09049v1 |
132 | Task Memory Engine (TME): A Structured Memory Framework with Graph-Aware Extensions for Multi-Step LLM Agent Tasks | Ye Ye | 2025-04-11 | arXiv | https://github.com/biubiutomato/TME-Agent | http://arxiv.org/abs/2504.08525v3 |
133 | A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis | Xin Gao, Qizhi Pei, Zinan Tang, Yu Li, Honglin Lin, Jiang Wu, Conghui He, Lijun Wu | 2025-04-11 | arXiv | https://github.com/GX-XinGao/GRA | http://arxiv.org/abs/2504.12322v1 |
134 | Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric | Yixin Cao, Jiahao Ying, Yaoning Wang, Xipeng Qiu, Xuanjing Huang, Yugang Jiang | 2025-04-10 | arXiv | https://github.com/ALEX-nlp/MUI-Eva | http://arxiv.org/abs/2504.07440v2 |
135 | Exploring the Effectiveness and Interpretability of Texts in LLM-based Time Series Models | Zhengke Sun, Hangwei Qian, Ivor Tsang | 2025-04-09 | arXiv | https://github.com/zachysun/TS-Lang-Exp | http://arxiv.org/abs/2504.08808v1 |
136 | V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models | Xiangxi Zheng, Linjie Li, Zhengyuan Yang, Ping Yu, Alex Jinpeng Wang, Rui Yan, Yuan Yao, Lijuan Wang | 2025-04-08 | arXiv | https://github.com/CSU-JPG/V-MAGE | http://arxiv.org/abs/2504.06148v2 |
137 | LLM$\times$MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources | Haoyu Wang, Yujia Fu, Zhu Zhang, Shuo Wang, Zirui Ren, Xiaorong Wang, Zhili Li, Chaoqun He, Bo An, Zhiyuan Liu, Maosong Sun | 2025-04-08 | arXiv | https://github.com/thunlp/LLMxMapReduce | http://arxiv.org/abs/2504.05732v1 |
138 | Assessing Thai Dialect Performance in LLMs with Automatic Benchmarks and Human Evaluation | Peerat Limkonchotiwat, Kanruethai Masuk, Surapon Nonesung, Chalermpun Mai-On, Sarana Nutanong, Wuttikorn Ponwitayarat, Potsawee Manakul | 2025-04-08 | arXiv | https://github.com/mrpeerat/Thai_local_benchmark | http://arxiv.org/abs/2504.05898v1 |
139 | MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Pengfei Zhou, Fanrui Zhang, Xiaopeng Peng, Zhaopan Xu, Jiaxin Ai, Yansheng Qiu, Chuanhao Li, Zhen Li, Ming Li, Yukang Feng, Jianwen Sun, Haoquan Zhang, Zizhen Li, Xiaofeng Mao, Wangbo Zhao, Kai Wang, Xiaojun Chang, Wenqi Shao, Yang You, Kaipeng Zhang | 2025-04-08 | arXiv | https://github.com/LanceZPF/MDK12 | http://arxiv.org/abs/2504.05782v1 |
140 | Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Yubo Li, Xiaobin Shen, Xinyu Yao, Xueying Ding, Yidi Miao, Ramayya Krishnan, Rema Padman | 2025-04-07 | arXiv | https://github.com/yubol-cmu/Awesome-Multi-Turn-LLMs | http://arxiv.org/abs/2504.04717v1 |
141 | SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Runjin Chen, Zhenyu Zhang, Junyuan Hong, Souvik Kundu, Zhangyang Wang | 2025-04-07 | arXiv | https://github.com/VITA-Group/SEAL | http://arxiv.org/abs/2504.07986v1 |
142 | EduPlanner: LLM-Based Multi-Agent Systems for Customized and Intelligent Instructional Design | Xueqiao Zhang, Chao Zhang, Jianwen Sun, Jun Xiao, Yi Yang, Yawei Luo | 2025-04-07 | arXiv | https://github.com/Zc0812/Edu_Planner | http://arxiv.org/abs/2504.05370v1 |
143 | Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs | Will Cai, Tianneng Shi, Xuandong Zhao, Dawn Song | 2025-04-07 | arXiv | https://github.com/sunblaze-ucb/llm-api-audit | http://arxiv.org/abs/2504.04715v1 |
144 | Can LLM-Driven Hard Negative Sampling Empower Collaborative Filtering? Findings and Potentials | Chu Zhao, Enneng Yang, Yuting Liu, Jianzhe Zhao, Guibing Guo, Xingwei Wang | 2025-04-07 | arXiv | https://github.com/user683/HNLMRec | http://arxiv.org/abs/2504.04726v1 |
145 | Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration | Ran Xu, Wenqi Shi, Yuchen Zhuang, Yue Yu, Joyce C. Ho, Haoyu Wang, Carl Yang | 2025-04-07 | arXiv | https://github.com/ritaranx/Collab-RAG/ | http://arxiv.org/abs/2504.04915v1 |
146 | PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters | Zonghang Li, Tao Li, Wenjiao Feng, Mohsen Guizani, Hongfang Yu | 2025-04-07 | arXiv | https://github.com/Lizonghang/prima.cpp | http://arxiv.org/abs/2504.08791v1 |
147 | ArxivBench: Can LLMs Assist Researchers in Conducting Research? | Ning Li, Jingran Zhang, Justin Cui | 2025-04-06 | arXiv | https://github.com/arxivBenchLLM/arXivBench | http://arxiv.org/abs/2504.10496v1 |
148 | Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning | Xuerui Su, Shufang Xie, Guoqing Liu, Yingce Xia, Renqian Luo, Peiran Jin, Zhiming Ma, Yue Wang, Zun Wang, Yuting Liu | 2025-04-06 | arXiv | https://github.com/XueruiSu/Trust-Region-Preference-Approximation | http://arxiv.org/abs/2504.04524v1 |
149 | A Benchmark for End-to-End Zero-Shot Biomedical Relation Extraction with LLMs: Experiments with OpenAI Models | Aviv Brokman, Xuguang Ai, Yuhang Jiang, Shashank Gupta, Ramakanth Kavuluru | 2025-04-05 | arXiv | https://github.com/bionlproc/ZeroShotRE | http://arxiv.org/abs/2504.04083v1 |
150 | Window Token Concatenation for Efficient Visual Large Language Models | Yifan Li, Wentao Bao, Botao Ye, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong | 2025-04-05 | arXiv | https://github.com/JackYFL/WiCo | http://arxiv.org/abs/2504.04024v1 |
151 | AiReview: An Open Platform for Accelerating Systematic Reviews with LLMs | Xinyu Mao, Teerapong Leelanupab, Martin Potthast, Harrisen Scells, Guido Zuccon | 2025-04-05 | arXiv | https://github.com/ielab/ai-review | http://arxiv.org/abs/2504.04193v1 |
152 | A Perplexity and Menger Curvature-Based Approach for Similarity Evaluation of Large Language Models | Yuantao Zhang, Zhankui Yang | 2025-04-05 | arXiv | https://github.com/zyttt-coder/LLM_similarity | http://arxiv.org/abs/2504.04216v1 |
153 | MSL: Not All Tokens Are What You Need for Tuning LLM as a Recommender | Bohao Wang, Feng Liu, Jiawei Chen, Xingyu Lou, Changwang Zhang, Jun Wang, Yuegang Sun, Yan Feng, Chun Chen, Can Wang | 2025-04-05 | arXiv | https://github.com/WANGBohaO-jpg/MSL | http://arxiv.org/abs/2504.04178v1 |
154 | VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation | Yuhao Wang, Heyang Liu, Ziyang Cheng, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang | 2025-04-05 | arXiv | https://github.com/SJTU-OmniAgent/VocalNet | http://arxiv.org/abs/2504.04060v1 |
155 | Align to Structure: Aligning Large Language Models with Structural Information | Zae Myung Kim, Anand Ramachandran, Farideh Tavazoee, Joo-Kyung Kim, Oleg Rokhlenko, Dongyeop Kang | 2025-04-04 | arXiv | https://github.com/minnesotanlp/struct_align | http://arxiv.org/abs/2504.03622v1 |
156 | EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline | Peter Baile Chen, Tomer Wolfson, Michael Cafarella, Dan Roth | 2025-04-04 | arXiv | https://peterbaile.github.io/enrichindex/ | http://arxiv.org/abs/2504.03598v1 |
157 | AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology | Xiang Feng, Wentao Jiang, Zengmao Wang, Yong Luo, Pingbo Xu, Baosheng Yu, Hua Jin, Bo Du, Jing Zhang | 2025-04-03 | arXiv | https://github.com/MiliLab/AnesBench | http://arxiv.org/abs/2504.02404v1 |
158 | BT-ACTION: A Test-Driven Approach for Modular Understanding of User Instruction Leveraging Behaviour Trees and LLMs | Alexander Leszczynski, Sarah Gillet, Iolanda Leite, Fethiye Irmak Dogan | 2025-04-03 | arXiv | https://github.com/1Eggbert7/BT_LLM | http://arxiv.org/abs/2504.02779v1 |
159 | Measurement of LLM's Philosophies of Human Nature | Minheng Ni, Ennan Wu, Zidong Gong, Zhengyuan Yang, Linjie Li, Chung-Ching Lin, Kevin Lin, Lijuan Wang, Wangmeng Zuo | 2025-04-03 | arXiv | https://github.com/kodenii/M-PHNS | http://arxiv.org/abs/2504.02304v1 |
160 | ZClip: Adaptive Spike Mitigation for LLM Pre-Training | Abhay Kumar, Louis Owen, Nilabhra Roy Chowdhury, Fabian Güra | 2025-04-03 | arXiv | https://github.com/bluorion-com/ZClip | http://arxiv.org/abs/2504.02507v1 |
161 | Comment Staytime Prediction with LLM-enhanced Comment Understanding | Changshuo Zhang, Zihan Lin, Shukai Liu, Yongqi Liu, Han Li | 2025-04-02 | arXiv | https://github.com/lyingCS/KuaiComt.github.io | http://arxiv.org/abs/2504.01602v1 |
162 | OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Joint LLM and GNN Modeling | Heming Zhang, Tim Xu, Dekang Cao, Shunning Liang, Lars Schimmelpfennig, Levi Kaster, Di Huang, Carlos Cruchaga, Guangfu Li, Michael Province, Yixin Chen, Philip Payne, Fuhai Li | 2025-04-02 | arXiv | https://github.com/FuhaiLiAiLab/OmniCellTOSG | http://arxiv.org/abs/2504.02148v1 |
163 | TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining | Jeffrey Li, Mohammadreza Armandpour, Iman Mirzadeh, Sachin Mehta, Vaishaal Shankar, Raviteja Vemulapalli, Samy Bengio, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari, Fartash Faghri | 2025-04-02 | arXiv | https://github.com/apple/ml-tic-lm | http://arxiv.org/abs/2504.02107v1 |
164 | MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits | Brandon Radosevich, John Halloran | 2025-04-02 | arXiv | https://github.com/leidosinc/McpSafetyScanner | http://arxiv.org/abs/2504.03767v1 |
165 | Urban Computing in the Era of Large Language Models | Zhonghang Li, Lianghao Xia, Xubin Ren, Jiabin Tang, Tianyi Chen, Yong Xu, Chao Huang | 2025-04-02 | arXiv | https://github.com/HKUDS/Awesome-LLM4Urban-Papers | https://doi.org/10.48550/arXiv.2504.02009 |
166 | CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models | Wei Zhou, Yuyang Gao, Xuanhe Zhou, Guoliang Li | 2025-04-01 | arXiv | https://github.com/weAIDB/CrackSQL | https://doi.org/10.48550/arXiv.2504.00882 |
167 | RECKON: Large-scale Reference-based Efficient Knowledge Evaluation for Large Language Model | Lin Zhang, Zhouhong Gu, Xiaoran Shi, Hongwei Feng, Yanghua Xiao | 2025-04-01 | arXiv | https://github.com/MikeGu721/reckon | https://doi.org/10.48550/arXiv.2504.00756 |
168 | ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers | Qianhao Yuan, Qingyu Zhang, Yanjiang Liu, Jiawei Chen, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han, Le Sun | 2025-04-01 | arXiv | https://github.com/icip-cas/ShortV | https://doi.org/10.48550/arXiv.2504.00502 |
169 | m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models | Xiaoke Huang, Juncheng Wu, Hui Liu, Xianfeng Tang, Yuyin Zhou | 2025-04-01 | arXiv | https://github.com/UCSC-VLAA/m1 | https://doi.org/10.48550/arXiv.2504.00869 |
170 | MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs | Juncheng Wu, Wenlong Deng, Xingxuan Li, Sheng Liu, Taomian Mi, Yifan Peng, Ziyang Xu, Yi Liu, Hyunjin Cho, Chang-In Choi, Yihan Cao, Hui Ren, Xiang Li, Xiaoxiao Li, Yuyin Zhou | 2025-04-01 | arXiv | https://github.com/UCSC-VLAA/MedReason | http://arxiv.org/abs/2504.00993v2 |
171 | When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning | Nishad Singhi, Hritik Bansal, Arian Hosseini, Aditya Grover, Kai-Wei Chang, Marcus Rohrbach, Anna Rohrbach | 2025-04-01 | arXiv | https://github.com/nishadsinghi/sc-genrm-scaling | http://arxiv.org/abs/2504.01005v1 |
172 | SACA: A Scenario-Aware Collision Avoidance Framework for Autonomous Vehicles Integrating LLMs-Driven Reasoning | Shiyue Zhao, Junzhi Zhang, Neda Masoud, Heye Huang, Xingpeng Xia, Chengkun He | 2025-03-31 | arXiv | https://sean-shiyuez.github.io/SACA/ | http://arxiv.org/abs/2504.00115v1 |
173 | What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models | Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Zhihan Guo, Yufei Wang, Niklas Muennighoff, Irwin King, Xue Liu, Chen Ma | 2025-03-31 | arXiv | https://github.com/testtimescaling/testtimescaling.github.io/ | https://doi.org/10.48550/arXiv.2503.24235 |
174 | SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers | Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang, Lin Gui, Yulan He | 2025-03-31 | arXiv | https://github.com/xyzCS/SciReplicate-Bench | http://arxiv.org/abs/2504.00255v1 |
175 | LANID: LLM-assisted New Intent Discovery | Lu Fan, Jiashu Pu, Rongsheng Zhang, Xiao-Ming Wu | 2025-03-31 | arXiv | https://github.com/floatSDSDS/LANID | http://arxiv.org/abs/2503.23740v1 |
176 | Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models | Rui Wang, Hongru Wang, Boyang Xue, Jianhui Pang, Shudong Liu, Yi Chen, Jiahao Qiu, Derek Fai Wong, Heng Ji, Kam-Fai Wong | 2025-03-31 | arXiv | https://github.com/DevoAllen/Awesome-Reasoning-Economy-Papers | https://doi.org/10.48550/arXiv.2503.24377 |
177 | Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving | Wei Gao, Xinyu Zhou, Peng Sun, Tianwei Zhang, Yonggang Wen | 2025-03-31 | arXiv | https://github.com/LLMkvsys/rethink-kv-compression | https://doi.org/10.48550/arXiv.2503.24000 |
178 | Text Chunking for Document Classification for Urban System Management using Large Language Models | Joshua Rodriguez, Om Sanan, Guillermo Vizarreta-Luna, Steven A. Conrad | 2025-03-31 | arXiv | https://github.com/josh-rodriguez-csu/ChunkingforLLMs | https://doi.org/10.48550/arXiv.2504.00274 |
179 | A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well? | Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Wenyue Hua, Haolun Wu, Zhihan Guo, Yufei Wang, Niklas Muennighoff, Irwin King, Xue Liu, Chen Ma | 2025-03-31 | arXiv | https://github.com/testtimescaling/testtimescaling.github.io/ | http://arxiv.org/abs/2503.24235v3 |
180 | ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance | Tong Xie, Jiawang Zhao, Zishen Wan, Zuodong Zhang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li | 2025-03-31 | arXiv | https://github.com/PKU-SEC-Lab/ReaLM_DAC25/ | https://doi.org/10.48550/arXiv.2503.24053 |
181 | EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing | Hongxiang Jiang, Jihao Yin, Qixiong Wang, Jiaqi Feng, Guo Chen | 2025-03-30 | arXiv | https://github.com/XiangTodayEatsWhat/EagleVision | http://arxiv.org/abs/2503.23330v1 |
182 | Agentic Large Language Models, a survey | Aske Plaat, Max J. van Duijn, Niki van Stein, Mike Preuss, Peter van der Putten, Kees Joost Batenburg | 2025-03-29 | arXiv | https://askeplaat.github.io/agentic-llm-survey-site/ | https://doi.org/10.48550/arXiv.2503.23037 |
183 | Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models | Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han | 2025-03-28 | arXiv | https://github.com/tmlr-group/landscape-of-thoughts | https://doi.org/10.48550/arXiv.2503.22165 |
184 | MediTools -- Medical Education Powered by LLMs | Amr Alshatnawi, Remi Sampaleanu, David Liebovitz | 2025-03-28 | arXiv | https://github.com/NM-Streamlit-Team/meditools | http://arxiv.org/abs/2503.22769v1 |
185 | A Refined Analysis of Massive Activations in LLMs | Louis Owen, Nilabhra Roy Chowdhury, Abhay Kumar, Fabian Güra | 2025-03-28 | arXiv | https://github.com/bluorion-com/refine_massive_activations | http://arxiv.org/abs/2503.22329v1 |
186 | QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Belinda Z. Li, Been Kim, Zi Wang | 2025-03-28 | arXiv | https://github.com/google-deepmind/questbench | http://arxiv.org/abs/2503.22674v1 |
187 | SWI: Speaking with Intent in Large Language Models | Yuwei Yin, EunJeong Hwang, Giuseppe Carenini | 2025-03-27 | arXiv | https://github.com/YuweiYin/SWI | https://doi.org/10.48550/arXiv.2503.21544 |
188 | Ignite Forecasting with SPARK: An Efficient Generative Framework for Refining LLMs in Temporal Knowledge Graph Forecasting | Gongzhu Yin, Hongli Zhang, Yi Luo, Yuchen Yang, Kun Lu, Chao Meng | 2025-03-27 | arXiv | https://github.com/yin-gz/SPARK | http://arxiv.org/abs/2503.22748v1 |
189 | Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap | Tong Nie, Jian Sun, Wei Ma | 2025-03-27 | arXiv | https://github.com/tongnie/awesome-llm4tr | https://doi.org/10.48550/arXiv.2503.21411 |
190 | Large Language Model Agent: A Survey on Methodology, Applications and Challenges | Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, Rongcheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang, Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, Dacheng Tao, Philip S. Yu, Ming Zhang | 2025-03-27 | arXiv | https://github.com/luo-junyu/Awesome-Agent-Papers | https://doi.org/10.48550/arXiv.2503.21460 |
191 | Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Hao Ai, Kunyi Wang, Zezhou Wang, Hao Lu, Jin Tian, Yaxin Luo, Peng Xing, Jen-Yuan Huang, Huaxia Li, Gen Luo | 2025-03-26 | arXiv | https://github.com/aihao2000/DPN-LLaVA | https://doi.org/10.48550/arXiv.2503.20322 |
192 | Enhancing the Robustness of LLM-Generated Code: Empirical Study and Framework | ZiKe Li, MingWei Liu, Anji Li, Kaifeng He, Yanlin Wang, Xin Peng, Zibin Zheng | 2025-03-26 | arXiv | https://github.com/SYSUSELab/RobGen | http://arxiv.org/abs/2503.20197v1 |
193 | Leveraging Implicit Sentiments: Enhancing Reliability and Validity in Psychological Trait Evaluation of LLMs | Huanhuan Ma, Haisong Gong, Xiaoyuan Yi, Xing Xie, Dongkuan Xu | 2025-03-26 | arXiv | https://github.com/dependentsign/CSI | http://arxiv.org/abs/2503.20182v1 |
194 | Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy | Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang | 2025-03-26 | arXiv | https://github.com/naver-ai/JOOD | http://arxiv.org/abs/2503.20823v1 |
195 | Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations | Haitong Liu, Kuofeng Gao, Yang Bai, Jinmin Li, Jinxiao Shan, Tao Dai, Shu-Tao Xia | 2025-03-26 | arXiv | https://github.com/ttthhl/Protecting_Your_Video_Content | http://arxiv.org/abs/2503.21824v1 |
196 | LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation | Sarah Martinson, Lingkai Kong, Cheol Woo Kim, Aparna Taneja, Milind Tambe | 2025-03-25 | arXiv | https://github.com/sarahmart/LLM-ABS-ARMMAN-prediction | http://arxiv.org/abs/2503.22719v1 |
197 | QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition | Yuxuan Hu, Xiaodong Chen, Cuiping Li, Hong Chen, Jing Zhang | 2025-03-25 | arXiv | https://github.com/hyx1999/Quad | http://arxiv.org/abs/2503.19353v1 |
198 | CoLLM: A Large Language Model for Composed Image Retrieval | Chuong Huynh, Jinyu Yang, Ashish Tawari, Mubarak Shah, Son Tran, Raffay Hamid, Trishul Chilimbi, Abhinav Shrivastava | 2025-03-25 | arXiv | https://collm-cvpr25.github.io/ | https://doi.org/10.48550/arXiv.2503.19910 |
199 | PAVE: Patching and Adapting Video Large Language Models | Zhuoming Liu, Yiquan Li, Khoi Duc Nguyen, Yiwu Zhong, Yin Li | 2025-03-25 | arXiv | https://github.com/dragonlzm/PAVE | https://doi.org/10.48550/arXiv.2503.19794 |
200 | CEFW: A Comprehensive Evaluation Framework for Watermark in Large Language Models | Shuhao Zhang, Bo Cheng, Jiale Han, Yuli Chen, Zhixuan Wu, Changbao Li, Pingli Gu | 2025-03-24 | arXiv | https://github.com/DrankXs/BalancedWatermark | https://doi.org/10.48550/arXiv.2503.20802 |
201 | I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders | Andrey V. Galichin, Alexey Dontsov, Polina Druzhinina, Anton Razzhigaev, Oleg Y. Rogov, Elena Tutubalina, Ivan V. Oseledets | 2025-03-24 | arXiv | https://github.com/AIRI-Institute/SAE-Reasoning | https://doi.org/10.48550/arXiv.2503.18878 |
202 | LLaVAction: evaluating and training multi-modal large language models for action recognition | Shaokai Ye, Haozhe Qi, Alexander Mathis, Mackenzie W. Mathis | 2025-03-24 | arXiv | https://github.com/AdaptiveMotorControlLab/LLaVAction | https://doi.org/10.48550/arXiv.2503.18712 |
203 | AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration | Zhexuan Wang, Yutong Wang, Xuebo Liu, Liang Ding, Miao Zhang, Jie Liu, Min Zhang | 2025-03-24 | arXiv | https://github.com/wangzx1219/AgentDropout | http://arxiv.org/abs/2503.18891v1 |
204 | BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache | Dayou Du, Shijie Cao, Jianyi Cheng, Ting Cao, Mao Yang | 2025-03-24 | arXiv | https://github.com/DD-DuDa/BitDecoding | http://arxiv.org/abs/2503.18773v1 |
205 | Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Aabid Karim, Abdul Karim, Bhoomika Lohana, Matt Keon, Jaswinder Singh, Abdul Sattar | 2025-03-23 | arXiv | https://github.com/akarim23131/Lost_in_Cultural_Translation | http://arxiv.org/abs/2503.18018v1 |
206 | Reasoning with LLMs for Zero-Shot Vulnerability Detection | Arastoo Zibaeirad, Marco Vieira | 2025-03-22 | arXiv | https://github.com/Erroristotle/VulnSage | http://arxiv.org/abs/2503.17885v1 |
207 | Safe RLHF-V: Safe Reinforcement Learning from Human Feedback in Multimodal Large Language Models | Jiaming Ji, Xinyu Chen, Rui Pan, Han Zhu, Conghui Zhang, Jiahao Li, Donghai Hong, Boyuan Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, Yaodong Yang | 2025-03-22 | arXiv | https://github.com/SafeRLHF-V | https://doi.org/10.48550/arXiv.2503.17682 |
208 | RAIDER: Tool-Equipped Large Language Model Agent for Robotic Action Issue Detection, Explanation and Recovery | Silvia Izquierdo-Badiola, Carlos Rizzo, Guillem Alenyà | 2025-03-22 | arXiv | https://raider-llmagent.github.io/ | https://doi.org/10.48550/arXiv.2503.17703 |
209 | LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language | Kun Chu, Xufeng Zhao, Cornelius Weber, Stefan Wermter | 2025-03-21 | arXiv | https://github.com/Kchu/LLM-MAP | https://doi.org/10.48550/arXiv.2503.17309 |
210 | TEMPLE: Temporal Preference Learning of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment | Shicheng Li, Lei Li, Kun Ouyang, Shuhuai Ren, Yuanxin Liu, Yuanxing Zhang, Fuzheng Zhang, Lingpeng Kong, Qi Liu, Xu Sun | 2025-03-21 | arXiv | https://github.com/lscpku/TEMPLE | http://arxiv.org/abs/2503.16929v2 |
211 | Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique | Yansi Li, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Qiuzhi Liu, Rui Wang, Zhuosheng Zhang, Zhaopeng Tu, Haitao Mi, Dong Yu | 2025-03-21 | arXiv | https://github.com/puddingyeah/PANEL | http://arxiv.org/abs/2503.17363v1 |
212 | RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation | Linxi Liang, Jing Gong, Mingwei Liu, Chong Wang, Guangsheng Ou, Yanlin Wang, Xin Peng, Zibin Zheng | 2025-03-21 | arXiv | https://github.com/SYSUSELab/RustEvo | http://arxiv.org/abs/2503.16922v1 |
213 | Variance Control via Weight Rescaling in LLM Pre-training | Louis Owen, Abhay Kumar, Nilabhra Roy Chowdhury, Fabian Güra | 2025-03-21 | arXiv | https://github.com/bluorion-com/weight_rescaling | http://arxiv.org/abs/2503.17500v1 |
214 | MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion | Qizhi Pei, Lijun Wu, Zhuoshi Pan, Yu Li, Honglin Lin, Chenlin Ming, Xin Gao, Conghui He, Rui Yan | 2025-03-20 | arXiv | https://github.com/QizhiPei/mathfusion | http://arxiv.org/abs/2503.16212v1 |
215 | Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't | Quy-Anh Dang, Chris Ngo | 2025-03-20 | arXiv | https://github.com/knoveleng/open-rs | http://arxiv.org/abs/2503.16219v1 |
216 | The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination | Yifan Sun, Han Wang, Dongbai Li, Gang Wang, Huan Zhang | 2025-03-20 | arXiv | https://github.com/ASTRAL-Group/BDC_mitigation_assessment | http://arxiv.org/abs/2503.16402v1 |
217 | Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Zhihang Liu, Chen-Wei Xie, Pandeng Li, Liming Zhao, Longxiang Tang, Yun Zheng, Chuanbin Liu, Hongtao Xie | 2025-03-20 | arXiv | https://github.com/lntzm/HICom | https://doi.org/10.48550/arXiv.2503.16036 |
218 | Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen, Xia Ben Hu | 2025-03-20 | arXiv | https://github.com/Eclipsess/Awesome-Efficient-Reasoning-LLMs | https://doi.org/10.48550/arXiv.2503.16419 |
219 | Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Zhaowei Liu, Xin Guo, Fangqi Lou, Lingfeng Zeng, Jinyi Niu, Zixuan Wang, Jiajie Xu, Weige Cai, Ziwei Yang, Xueqian Zhao, Chao Li, Sheng Xu, Dezhi Chen, Yun Chen, Zuo Bai, Liwen Zhang | 2025-03-20 | arXiv | https://github.com/SUFE-AIFLM-Lab/Fin-R1 | https://doi.org/10.48550/arXiv.2503.16252 |
220 | Exploring Large Language Models for Word Games: Who is the Spy? | Chentian Wei, Jiewei Chen, Jinzhu Xu | 2025-03-19 | arXiv | https://github.com/ct-wei/Who-is-The-Spy | https://doi.org/10.48550/arXiv.2503.15235 |
221 | LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning | Federico Cocchi, Nicholas Moratelli, Davide Caffagni, Sara Sarto, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara | 2025-03-19 | arXiv | https://github.com/aimagelab/LLaVA-MORE | http://arxiv.org/abs/2503.15621v1 |
222 | VisNumBench: Evaluating Number Sense of Multimodal Large Language Models | Tengjin Weng, Jingyi Wang, Wenhao Jiang, Zhong Ming | 2025-03-19 | arXiv | https://wwwtttjjj.github.io/VisNumBench/ | https://doi.org/10.48550/arXiv.2503.14939 |
223 | Aligning Multimodal LLM with Human Preference: A Survey | Tao Yu, Yi-Fan Zhang, Chaoyou Fu, Junkang Wu, Jinda Lu, Kun Wang, Xingyu Lu, Yunhang Shen, Guibin Zhang, Dingjie Song, Yibo Yan, Tianlong Xu, Qingsong Wen, Zhang Zhang, Yan Huang, Liang Wang, Tieniu Tan | 2025-03-18 | arXiv | https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Alignment | http://arxiv.org/abs/2503.14504v1 |
224 | CodingGenie: A Proactive LLM-Powered Programming Assistant | Sebastian Zhao, Alan Zhu, Hussein Mozannar, David Sontag, Ameet Talwalkar, Valerie Chen | 2025-03-18 | arXiv | https://github.com/sebzhao/CodingGenie/ | http://arxiv.org/abs/2503.14724v1 |
225 | Learning on LLM Output Signatures for gray-box LLM Behavior Analysis | Guy Bar-Shalom, Fabrizio Frasca, Derek Lim, Yoav Gelberg, Yftah Ziser, Ran El-Yaniv, Gal Chechik, Haggai Maron | 2025-03-18 | arXiv | https://github.com/BarSGuy/LLM-Output-Signatures-Network | http://arxiv.org/abs/2503.14043v1 |
226 | Word2Minecraft: Generating 3D Game Levels through Large Language Models | Shuo Huang, Muhammad Umair Nasir, Steven James, Julian Togelius | 2025-03-18 | arXiv | https://github.com/JMZ-kk/Word2Minecraft/tree/word2mc_v0 | https://doi.org/10.48550/arXiv.2503.16536 |
227 | SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | Jiankang Wang, Zhihan Zhang, Zhihang Liu, Yang Li, Jiannan Ge, Hongtao Xie, Yongdong Zhang | 2025-03-18 | arXiv | https://github.com/Jayce1kk/SpaceVLLM | https://doi.org/10.48550/arXiv.2503.13983 |
228 | Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning | Junming Liu, Siyuan Meng, Yanting Gao, Song Mao, Pinlong Cai, Guohang Yan, Yirong Chen, Zilin Bian, Botian Shi, Ding Wang | 2025-03-17 | arXiv | https://github.com/Wings-Of-Disaster/VaLiK | http://arxiv.org/abs/2503.12972v1 |
229 | Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos | Chiara Plizzari, Alessio Tonioni, Yongqin Xian, Achin Kulshrestha, Federico Tombari | 2025-03-17 | arXiv | https://github.com/google-research-datasets/egotempo | http://arxiv.org/abs/2503.13646v1 |
230 | xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Maximilian Beck, Korbinian Pöppel, Phillip Lippe, Richard Kurle, Patrick M. Blies, Günter Klambauer, Sebastian Böck, Sepp Hochreiter | 2025-03-17 | arXiv | https://github.com/NX-AI/xlstm | http://arxiv.org/abs/2503.13427v1 |
231 | NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Sung-Yeon Park, Can Cui, Yunsheng Ma, Ahmadreza Moradipari, Rohit Gupta, Kyungtae Han, Ziran Wang | 2025-03-17 | arXiv | https://github.com/sungyeonparkk/NuPlanQA | https://doi.org/10.48550/arXiv.2503.12772 |
232 | A Survey on the Memory Mechanism of Large Language Model based Agents | Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, Ji-Rong Wen | 2025-03-16 | arXiv | https://github.com/nuster1128/LLM_Agent_Memory_Survey | https://doi.org/10.48550/arXiv.2404.13501 |
233 | SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression | Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, Mi Zhang | 2025-03-16 | arXiv | https://github.com/AIoT-MLSys-Lab/SVD-LLM | https://doi.org/10.48550/arXiv.2503.12340 |
234 | HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs | Tsz Chung Cheng, Chung Shing Cheng, Chaak Ming Lau, Eugene Tin-Ho Lam, Chun Yat Wong, Hoi On Yu, Cheuk Hei Chong | 2025-03-16 | arXiv | https://github.com/hon9kon9ize/hkeval2025 | http://arxiv.org/abs/2503.12440v1 |
235 | Plausibility Vaccine: Injecting LLM Knowledge for Event Plausibility | Jacob Chmura, Jonah Dauvet, Sebastian Sabry | 2025-03-16 | arXiv | https://github.com/Jacob-Chmura/plausibility-vaccine | http://arxiv.org/abs/2503.12667v1 |
236 | FAILS: A Framework for Automated Collection and Analysis of LLM Service Incidents | Sándor Battaglini-Fischer, Nishanthi Srinivasan, Bálint László Szarvas, Xiaoyu Chu, Alexandru Iosup | 2025-03-15 | HotCloudPerf 2025 | https://github.com/atlarge-research/FAILS | http://arxiv.org/abs/2503.12185v1 |
237 | MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling | Zhaopeng Feng, Jiahan Ren, Jiayuan Su, Jiamei Zheng, Zhihang Tang, Hongwei Wang, Zuozhu Liu | 2025-03-15 | arXiv | https://sabijun.github.io/MT_RewardTreePage | http://arxiv.org/abs/2503.12123v1 |
238 | An LLM-Integrated Framework for Completion, Management, and Tracing of STPA | Ali Raeisdanaei, Juho Kim, Michael Liao, Sparsh Kochhar | 2025-03-15 | arXiv | https://github.com/blueskysolarracing/stpa | http://arxiv.org/abs/2503.12043v1 |
239 | A Survey on Federated Fine-tuning of Large Language Models | Yebo Wu, Chunlin Tian, Jingguang Li, He Sun, Kahou Tam, Li Li, Chengzhong Xu | 2025-03-15 | arXiv | https://github.com/Clin0212/Awesome-Federated-LLM-Learning | https://doi.org/10.48550/arXiv.2503.12016 |
240 | CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning | Hao Cui, Zahra Shamsi, Gowoon Cheon, Xuejian Ma, Shutong Li, Maria Tikhanovskaya, Peter Norgaard, Nayantara Mudur, Martyna Plomecka, Paul Raccuglia, Yasaman Bahri, Victor V. Albert, Pranesh Srinivasan, Haining Pan, Philippe Faist, Brian Rohr, Ekin Dogus Cubuk, Muratahan Aykol, Amil Merchant, Michael J. Statt, Dan Morris, Drew Purves, Elise Kleeman, Ruth Alcantara, Matthew Abraham, Muqthar Mohammad, Ean Phing VanLee, Chenfei Jiang, Elizabeth Dorfman, Eun-Ah Kim, Michael P Brenner, Viren Jain, Sameera Ponda, Subhashini Venugopalan | 2025-03-14 | arXiv | https://github.com/google/curie | http://arxiv.org/abs/2503.13517v2 |
241 | FastVID: Dynamic Density Pruning for Fast Video Large Language Models | Leqi Shen, Guoqiang Gong, Tao He, Yifeng Zhang, Pengzhang Liu, Sicheng Zhao, Guiguang Ding | 2025-03-14 | arXiv | https://github.com/LunarShen/FastVID | https://doi.org/10.48550/arXiv.2503.11187 |
242 | Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Weichen Zhan, Zile Zhou, Zhiheng Zheng, Chen Gao, Jinqiang Cui, Yong Li, Xinlei Chen, Xiao-Ping Zhang | 2025-03-14 | arXiv | https://github.com/WeichenZh/Open3DVQA | https://doi.org/10.48550/arXiv.2503.11094 |
243 | ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning | Xinyi Wang, Jiashui Wang, Peng Chen, Jinbo Su, Yanming Liu, Long Liu, Yangdong Wang, Qiyuan Chen, Kai Yun, Chunfu Jia | 2025-03-14 | arXiv | https://github.com/wxy3596/ASMA-Tune | http://arxiv.org/abs/2503.11617v1 |
244 | Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs using Semantic Space | Zhiliang Chen, Xinyuan Niu, Chuan-Sheng Foo, Bryan Kian Hsiang Low | 2025-03-14 | arXiv | https://github.com/chenzhiliang94/convo-plan-SCOPE | http://arxiv.org/abs/2503.11586v1 |
245 | MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Jeong Hun Yeo, Hyeongseop Rha, Se Jin Park, Yong Man Ro | 2025-03-14 | arXiv | https://github.com/JeongHun0716/MMS-LLaMA | http://arxiv.org/abs/2503.11315v1 |
246 | TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models | Xudong Tan, Peng Ye, Chongjun Tu, Jianjian Cao, Yaoxin Yang, Lin Zhang, Dongzhan Zhou, Tao Chen | 2025-03-13 | arXiv | https://github.com/ShawnTan86/TokenCarve | https://doi.org/10.48550/arXiv.2503.10501 |
247 | ZeroMerge: Parameter-Free KV Cache Compression for Memory-Efficient Long-Context LLMs | Xin Liu, Pei Liu, Guoming Tang | 2025-03-13 | arXiv | https://github.com/SusCom-Lab/ZeroMerge | http://arxiv.org/abs/2503.10714v1 |
248 | RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs | Zhongzhan Huang, Guoming Ling, Vincent S. Liang, Yupei Lin, Yandong Chen, Shanshan Zhong, Hefeng Wu, Liang Lin | 2025-03-13 | arXiv | https://github.com/MilkThink-Lab/RouterEval | http://arxiv.org/abs/2503.10657v1 |
249 | Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set | Florian Eichin, Yang Janet Liu, Barbara Plank, Michael A. Hedderich | 2025-03-13 | arXiv | https://github.com/mainlp/discourse_probes | http://arxiv.org/abs/2503.10515v1 |
250 | ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs | Xin Liu, Pei Liu, Guoming Tang | 2025-03-13 | arXiv | https://github.com/SusCom-Lab/ZSMerge | http://arxiv.org/abs/2503.10714v2 |
251 | Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs | Ariba Khan, Stephen Casper, Dylan Hadfield-Menell | 2025-03-13 | arXiv | https://github.com/ariba-k/llm-cultural-alignment-evaluation | http://arxiv.org/abs/2503.08688v1 |
252 | Route Sparse Autoencoder to Interpret Large Language Models | Wei Shi, Sihang Li, Tao Liang, Mingyang Wan, Guojun Ma, Xiang Wang, Xiangnan He | 2025-03-13 | arXiv | https://github.com/swei2001/RouteSAEs | https://doi.org/10.48550/arXiv.2503.08200 |
253 | OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model | Bowen Zhang, Pengcheng Luo | 2025-03-13 | arXiv | https://github.com/bwz96sco/or_llm_agent | https://doi.org/10.48550/arXiv.2503.10009 |
254 | Learning to Inference Adaptively for Multimodal Large Language Models | Zhuoyan Xu, Khoi Duc Nguyen, Preeti Mukherjee, Saurabh Bagchi, Somali Chaterji, Yingyu Liang, Yin Li | 2025-03-13 | arXiv | https://zhuoyan-xu.github.io/ada-llava/ | https://doi.org/10.48550/arXiv.2503.10905 |
255 | Adapting Large Language Models for Parameter-Efficient Log Anomaly Detection | Ying Fu Lim, Jiawen Zhu, Guansong Pang | 2025-03-13 | arXiv | https://github.com/mala-lab/LogADReft | https://doi.org/10.48550/arXiv.2503.08045 |
256 | 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models | Wanhua Li, Renping Zhou, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister | 2025-03-13 | arXiv | https://4d-langsplat.github.io | https://doi.org/10.48550/arXiv.2503.10437 |
257 | Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs | Jiani Huang, Shijie Wang, Liang-bo Ning, Wenqi Fan, Shuaiqiang Wang, Dawei Yin, Qing Li | 2025-03-12 | arXiv | https://github.com/jiani-huang/RecBench | http://arxiv.org/abs/2503.09382v1 |
258 | RetSTA: An LLM-Based Approach for Standardizing Clinical Fundus Image Reports | Jiushen Cai, Weihang Zhang, Hanruo Liu, Ningli Wang, Huiqi Li | 2025-03-12 | arXiv | https://github.com/AB-Story/RetSTA-7B | http://arxiv.org/abs/2503.09358v1 |
259 | What's In Your Field? Mapping Scientific Research with Knowledge Graphs and Large Language Models | Abhipsha Das, Nicholas Lourie, Siavash Golkar, Mariel Pettee | 2025-03-12 | arXiv | https://github.com/chiral-carbon/kg-for-science | http://arxiv.org/abs/2503.09894v1 |
260 | Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents | Dongjun Lee, Juyong Lee, Kyuyoung Kim, Jihoon Tack, Jinwoo Shin, Yee Whye Teh, Kimin Lee | 2025-03-12 | arXiv | https://lcowiclr2025.github.io | http://arxiv.org/abs/2503.10689v1 |
261 | CyberLLMInstruct: A New Dataset for Analysing Safety of Fine-Tuned LLMs Using Cyber Security Data | Adel ElZemity, Budi Arief, Shujun Li | 2025-03-12 | arXiv | https://github.com/Adelsamir01/CyberLLMInstruct | http://arxiv.org/abs/2503.09334v1 |
262 | CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Richard A. Dubniczky, Krisztofer Zoltán Horvát, Tamás Bisztray, Mohamed Amine Ferrag, Lucas C. Cordeiro, Norbert Tihanyi | 2025-03-12 | arXiv | https://github.com/CASTLE-Benchmark | http://arxiv.org/abs/2503.09433v1 |
263 | Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models | Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wangxiang Che | 2025-03-12 | arXiv | https://long-cot.github.io/ | https://doi.org/10.48550/arXiv.2503.09567 |
264 | MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization | Zongwu Wang, Peng Xu, Fangxin Liu, Yiwei Hu, Qingxiao Sun, Gezi Li, Cheng Li, Xuan Wang, Li Jiang, Haibing Guan | 2025-03-12 | arXiv | https://github.com/ZongwuWang/MILLION | http://arxiv.org/abs/2504.03661v1 |
265 | BYOS: Knowledge-driven Large Language Models Bring Your Own Operating System More Excellent | Hongyu Lin, Yuchen Li, Haoran Luo, Kaichun Yao, Libo Zhang, Mingjie Xing, Yanjun Wu | 2025-03-12 | arXiv | https://github.com/LHY-24/BYOS | https://doi.org/10.48550/arXiv.2503.09663 |
266 | Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, Jiawei Han | 2025-03-12 | arXiv | https://github.com/PeterGriffinJin/Search-R1 | http://arxiv.org/abs/2503.09516v1 |
267 | NVP-HRI: Zero shot natural voice and posture-based human-robot interaction via large language model | Yuzhi Lai, Shenghai Yuan, Youssef Nassar, Mingyu Fan, Thomas Weber, Matthias Rätsch | 2025-03-12 | Expert Syst. Appl. | https://github.com/laiyuzhi/NVP-HRI | https://doi.org/10.1016/j.eswa.2024.126360 |
268 | Process-Supervised LLM Recommenders via Flow-guided Tuning | Chongming Gao, Mengyao Gao, Chenxiao Fan, Shuai Yuan, Wentao Shi, Xiangnan He | 2025-03-11 | arXiv | https://github.com/Mr-Peach0301/Flower | http://arxiv.org/abs/2503.07377v1 |
269 | Enhancing Large Language Models for Hardware Verification: A Novel SystemVerilog Assertion Dataset | Anand Menon, Samit S. Miftah, Shamik Kundu, Souvik Kundu, Amisha Srivastava, Arnab Raha, Gabriel Theodor Sonnenschein, Suvadeep Banerjee, Deepak Mathaikutty, Kanad Basu | 2025-03-11 | arXiv | https://github.com/AnandMenon12/VERT | https://doi.org/10.48550/arXiv.2503.08923 |
270 | V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation | Guiwei Zhang, Tianyu Zhang, Mohan Zhou, Yalong Bai, Biye Li | 2025-03-11 | arXiv | https://github.com/zhangguiwei610/V2Flow | https://doi.org/10.48550/arXiv.2503.07493 |
271 | DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs | Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang, Ilya Zharkov, Se-Young Yun | 2025-03-11 | arXiv | https://github.com/jongwooko/distillm-2 | http://arxiv.org/abs/2503.07067v1 |
272 | ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration | Mengting Ai, Tianxin Wei, Yifan Chen, Zhichen Zeng, Ritchie Zhao, Girish Varatkar, Bita Darvish Rouhani, Xianfeng Tang, Hanghang Tong, Jingrui He | 2025-03-11 | arXiv | https://github.com/iDEA-iSAIL-Lab-UIUC/ResMoE | http://arxiv.org/abs/2503.06881v1 |
273 | Graphormer-Guided Task Planning: Beyond Static Rules with LLM Safety Perception | Wanjing Huang, Tongjie Pan, Yalan Ye | 2025-03-11 | arXiv | https://github.com/hwj20/GGTP | http://arxiv.org/abs/2503.06866v1 |
274 | Roamify: Designing and Evaluating an LLM Based Google Chrome Extension for Personalised Itinerary Planning | Vikranth Udandarao, Noel Abraham Tiju, Muthuraj Vairamuthu, Harsh Mistry, Dhruv Kumar | 2025-03-10 | arXiv | https://github.com/Roamify-Research/Roamify | http://arxiv.org/abs/2504.10489v1 |
275 | AutoMisty: A Multi-Agent LLM Framework for Automated Code Generation in the Misty Social Robot | Xiao Wang, Lu Dong, Sahana Rangasrinivasan, Ifeoma Nwogu, Srirangaraj Setlur, Venugopal Govindaraju | 2025-03-09 | arXiv | https://wangxiaoshawn.github.io/AutoMisty.html | http://arxiv.org/abs/2503.06791v1 |
276 | How LLMs Learn: Tracing Internal Representations with Sparse Autoencoders | Tatsuro Inaba, Kentaro Inui, Yusuke Miyao, Yohei Oseki, Benjamin Heinzerling, Yu Takagi | 2025-03-09 | arXiv | https://github.com/llm-jp/llm-jp-sae | http://arxiv.org/abs/2503.06394v1 |
277 | DSGBench: A Diverse Strategic Game Benchmark for Evaluating LLM-based Agents in Complex Decision-Making Environments | Wenjie Tang, Yuan Zhou, Erqiang Xu, Keyan Cheng, Minne Li, Liquan Xiao | 2025-03-08 | arXiv | https://github.com/DeciBrain-Group/DSGBench | http://arxiv.org/abs/2503.06047v1 |
278 | Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices | Junyan Lin, Haoran Chen, Yue Fan, Yingqi Fan, Xin Jin, Hui Su, Jinlan Fu, Xiaoyu Shen | 2025-03-08 | arXiv | https://github.com/EIT-NLP/Layer_Select_Fuse_for_MLLM | http://arxiv.org/abs/2503.06063v1 |
279 | SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant? | Xudong Lu, Haohao Gao, Renshou Wu, Shuai Ren, Xiaoxin Chen, Hongsheng Li, Fangyuan Li | 2025-03-08 | arXiv | https://github.com/Lucky-Lance/SmartBench | http://arxiv.org/abs/2503.06029v1 |
280 | Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching | Simon A. Aytes, Jinheon Baek, Sung Ju Hwang | 2025-03-07 | arXiv | https://www.github.com/SimonAytes/SoT | http://arxiv.org/abs/2503.05179v1 |
281 | RocketEval: Efficient Automated LLM Evaluation via Grading Checklist | Tianjun Wei, Wei Wen, Ruizhi Qiao, Xing Sun, Jianghong Ma | 2025-03-07 | arXiv | https://github.com/Joinn99/RocketEval-ICLR | http://arxiv.org/abs/2503.05142v1 |
282 | A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval | Yu Zhang, Shutong Qiao, Jiaqi Zhang, Tzu-Heng Lin, Chen Gao, Yong Li | 2025-03-07 | arXiv | https://github.com/tsinghua-fib-lab/LLM-Agent-for-Recommendation-and-Search | https://doi.org/10.48550/arXiv.2503.05659 |
283 | Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching | Bowen Pang, Kai Li, Feifan Wang | 2025-03-07 | arXiv | https://github.com/KevinLee1110/dynamic-batching | http://arxiv.org/abs/2503.05248v1 |
284 | TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge | Cheng-Han Chiang, Hung-yi Lee, Michal Lukasik | 2025-03-06 | arXiv | https://github.com/d223302/TRACT | http://arxiv.org/abs/2503.04381v1 |
285 | Insights from Rights and Wrongs: A Large Language Model for Solving Assertion Failures in RTL Design | Jie Zhou, Youshu Ji, Ning Wang, Yuchen Hu, Xinyao Jiao, Bingkun Yao, Xinwei Fang, Shuai Zhao, Nan Guan, Zhe Jiang | 2025-03-06 | arXiv | https://github.com/SEU-ACAL/reproduce-AssertSolver-DAC-25 | https://doi.org/10.48550/arXiv.2503.04057 |
286 | Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model | Wenke Huang, Jian Liang, Xianda Guo, Yiyang Fang, Guancheng Wan, Xuankun Rong, Chi Wen, Zekun Shi, Qingyun Li, Didi Zhu, Yanbiao Ma, Ke Liang, Bin Yang, He Li, Jiawei Shao, Mang Ye, Bo Du | 2025-03-06 | arXiv | https://github.com/WenkeHuang/Awesome-MLLM-Tuning | https://doi.org/10.48550/arXiv.2503.04543 |
287 | Predictable Scale: Part I - Optimal Hyperparameter Scaling Law in Large Language Model Pretraining | Houyi Li, Wenzheng Zheng, Jingcheng Hu, Qiufeng Wang, Hanshan Zhang, Zili Wang, Shijie Xuyang, Yuantao Fan, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang | 2025-03-06 | arXiv | https://step-law.github.io/ | https://doi.org/10.48550/arXiv.2503.04715 |
288 | Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation | Armel Zebaze, Benoît Sagot, Rachel Bawden | 2025-03-06 | arXiv | https://github.com/ArmelRandy/compositional-translation | http://arxiv.org/abs/2503.04554v1 |
289 | DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation | Amin Karimi, Charalambos Poullis | 2025-03-06 | arXiv | https://github.com/aminpdik/DSV-LFS | http://arxiv.org/abs/2503.04006v1 |
290 | Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English | Runtao Zhou, Guangya Wan, Saadia Gabriel, Sheng Li, Alexander J Gates, Maarten Sap, Thomas Hartvigsen | 2025-03-06 | arXiv | https://github.com/Runtaozhou/dialect_bias_eval | http://arxiv.org/abs/2503.04099v1 |
291 | Lost in Literalism: How Supervised Training Shapes Translationese in LLMs | Yafu Li, Ronghao Zhang, Zhilin Wang, Huajian Zhang, Leyang Cui, Yongjing Yin, Tong Xiao, Yue Zhang | 2025-03-06 | arXiv | https://github.com/yafuly/LLM_Translationese | http://arxiv.org/abs/2503.04369v1 |
292 | AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks | Javier Yong, Haokai Ma, Yunshan Ma, Anis Yusof, Zhenkai Liang, Ee-Chien Chang | 2025-03-05 | arXiv | https://github.com/Javiery3889/AttackSeqBench | https://doi.org/10.48550/arXiv.2503.03170 |
293 | LeRAAT: LLM-Enabled Real-Time Aviation Advisory Tool | Marc R. Schlichting, Vale Rasmussen, Heba Alazzeh, Houjun Liu, Kiana Jafari, Amelia F. Hardy, Dylan M. Asmar, Mykel J. Kochenderfer | 2025-03-05 | arXiv | https://github.com/sisl/LeRAAT/ | http://arxiv.org/abs/2503.16477v1 |
294 | Multi-Agent Systems Powered by Large Language Models: Applications in Swarm Intelligence | Cristian Jimenez-Romero, Alper Yegenoglu, Christian Blum | 2025-03-05 | arXiv | https://github.com/crjimene/swarm_gpt | https://doi.org/10.48550/arXiv.2503.03800 |
295 | Improving LLM Safety Alignment with Dual-Objective Optimization | Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song | 2025-03-05 | arXiv | https://github.com/wicai24/DOOR-Alignment | http://arxiv.org/abs/2503.03710v1 |
296 | LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models | Xi Zhu, Haochen Xue, Ziwei Zhao, Wujiang Xu, Jingyuan Huang, Minghao Guo, Qifan Wang, Kaixiong Zhou, Yongfeng Zhang | 2025-03-05 | arXiv | https://github.com/agiresearch/PromptGFM | http://arxiv.org/abs/2503.03313v1 |
297 | ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks | Heng Zhou, Hejia Geng, Xiangyuan Xue, Zhenfei Yin, Lei Bai | 2025-03-04 | arXiv | https://github.com/hengzzzhou/ReSo | http://arxiv.org/abs/2503.02390v2 |
298 | Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs | Yuzhe Gu, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen | 2025-03-04 | arXiv | https://github.com/open-compass/ANAH | http://arxiv.org/abs/2503.02846v1 |
299 | Wikipedia in the Era of LLMs: Evolution and Risks | Siming Huang, Yuliang Xu, Mingmeng Geng, Yao Wan, Dongping Chen | 2025-03-04 | arXiv | https://github.com/HSM316/LLM_Wikipedia | http://arxiv.org/abs/2503.02879v1 |
300 | Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization | Yilun Qiu, Xiaoyan Zhao, Yang Zhang, Yimeng Bai, Wenjie Wang, Hong Cheng, Fuli Feng, Tat-Seng Chua | 2025-03-04 | arXiv | https://github.com/SnowCharmQ/DPL | http://arxiv.org/abs/2503.02450v1 |
301 | Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs' Decoding Layers | Zicong He, Boxuan Zhang, Lu Cheng | 2025-03-04 | arXiv | https://github.com/ZicongHe2002/HCL-Spark | http://arxiv.org/abs/2503.02851v1 |
302 | It Helps to Take a Second Opinion: Teaching Smaller LLMs to Deliberate Mutually via Selective Rationale Optimisation | Sohan Patnaik, Milan Aggarwal, Sumit Bhatia, Balaji Krishnamurthy | 2025-03-04 | arXiv | https://github.com/Sohanpatnaik106/coalition | http://arxiv.org/abs/2503.02463v1 |
303 | PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Xueliang Zhao, Wei Wu, Jian Guan, Lingpeng Kong | 2025-03-04 | arXiv | https://github.com/zhaoxlpku/PromptCoT | https://doi.org/10.48550/arXiv.2503.02324 |
304 | LoRA-Null: Low-Rank Adaptation via Null Space for Large Language Models | Pengwei Tang, Yong Liu, Dongjie Zhang, Xing Wu, Debing Zhang | 2025-03-04 | arXiv | https://github.com/HungerPWAY/LoRA-Null | https://doi.org/10.48550/arXiv.2503.02659 |
305 | Haste Makes Waste: Evaluating Planning Abilities of LLMs for Efficient and Feasible Multitasking with Time Constraints Between Actions | Zirui Wu, Xiao Liu, Jiayi Li, Lingpeng Kong, Yansong Feng | 2025-03-04 | arXiv | https://github.com/WilliamZR/Recipe2Plan | http://arxiv.org/abs/2503.02238v1 |
306 | Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs | Wei-Yao Wang, Zhao Wang, Helen Suzuki, Yoshiyuki Kobayashi | 2025-03-04 | arXiv | https://github.com/sony/aki | http://arxiv.org/abs/2503.02597v1 |
307 | CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom | Yisen Li, Lingfeng Yang, Wenxuan Shen, Pan Zhou, Yao Wan, Weiwei Lin, Dongping Chen | 2025-03-03 | arXiv | https://github.com/listentm/crowdselect | http://arxiv.org/abs/2503.01836v1 |
308 | Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia | Chenxi Wang, Tianle Gu, Zhongyu Wei, Lang Gao, Zirui Song, Xiuying Chen | 2025-03-03 | arXiv | https://github.com/Aurora-cx/TypoLLM | http://arxiv.org/abs/2503.01714v1 |
309 | Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens | Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo, Wei Xue | 2025-03-03 | arXiv | https://github.com/SparkAudio/Spark-TTS | http://arxiv.org/abs/2503.01710v1 |
310 | Liger: Linearizing Large Language Models to Gated Recurrent Structures | Disen Lan, Weigao Sun, Jiaxi Hu, Jusen Du, Yu Cheng | 2025-03-03 | arXiv | https://github.com/OpenSparseLLMs/Linearization | https://doi.org/10.48550/arXiv.2503.01496 |
311 | MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents | Kunlun Zhu, Hongyi Du, Zhaochen Hong, Xiaocheng Yang, Shuyi Guo, Zhe Wang, Zhenhailong Wang, Cheng Qian, Xiangru Tang, Heng Ji, Jiaxuan You | 2025-03-03 | arXiv | https://github.com/MultiagentBench/MARBLE | http://arxiv.org/abs/2503.01935v1 |
312 | Position: Don't use the CLT in LLM evals with fewer than a few hundred datapoints | Sam Bowyer, Laurence Aitchison, Desi R. Ivanova | 2025-03-03 | arXiv | https://github.com/sambowyer/bayes_evals | http://arxiv.org/abs/2503.01747v2 |
313 | Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models | Tianjie Ju, Yi Hua, Hao Fei, Zhenyu Shao, Yubin Zheng, Haodong Zhao, Mong-Li Lee, Wynne Hsu, Zhuosheng Zhang, Gongshen Liu | 2025-03-03 | arXiv | https://github.com/illusionhi/ProbingPrivacy | https://doi.org/10.48550/arXiv.2503.01208 |
314 | Parameter-Efficient Fine-Tuning of Large Language Models via Deconvolution in Subspace | Jia-Chen Zhang, Yu-Jie Xiong, Chun-Ming Xia, Dong-Hai Zhu, Xi-He Qiu | 2025-03-03 | COLING | https://github.com/Godz-z/DCFT | https://aclanthology.org/2025.coling-main.265/ |
315 | OptMetaOpenFOAM: Large Language Model Driven Chain of Thought for Sensitivity Analysis and Parameter Optimization based on CFD | Yuxuan Chen, Long Zhang, Xu Zhu, Hua Zhou, Zhuyin Ren | 2025-03-03 | arXiv | https://github.com/Terry-cyx/MetaOpenFOAM | https://doi.org/10.48550/arXiv.2503.01273 |
316 | Unmasking Implicit Bias: Evaluating Persona-Prompted LLM Responses in Power-Disparate Social Scenarios | Bryan Chen Zhengyu Tan, Roy Ka-Wei Lee | 2025-03-03 | arXiv | https://inc0mple.github.io/Implicit_Bias_Interactive_Data_Viz | http://arxiv.org/abs/2503.01532v1 |
317 | MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages | Chen Zhang, Mingxu Tao, Zhiyuan Liao, Yansong Feng | 2025-03-03 | arXiv | https://github.com/luciusssss/MiLiC-Eval | http://arxiv.org/abs/2503.01150v1 |
318 | Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity | Yupu Hao, Pengfei Cao, Zhuoran Jin, Huanxuan Liao, Yubo Chen, Kang Liu, Jun Zhao | 2025-03-02 | arXiv | https://github.com/hypasd-art/ETAPP | http://arxiv.org/abs/2503.00771v1 |
319 | HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning | Zhuohang Jiang, Pangjing Wu, Ziran Liang, Peter Q. Chen, Xu Yuan, Ye Jia, Jiancheng Tu, Chen Li, Peter H. F. Ng, Qing Li | 2025-03-02 | arXiv | https://github.com/jzzzzh/HiBench | http://arxiv.org/abs/2503.00912v1 |
320 | LLMDR: LLM-Driven Deadlock Detection and Resolution in Multi-Agent Pathfinding | Seungbae Seo, Junghwan Kim, Minjeong Shin, Bongwon Suh | 2025-03-02 | arXiv | https://github.com/ssbacc/llmdr-dhc | http://arxiv.org/abs/2503.00717v1 |
321 | Interact, Instruct to Improve: A LLM-Driven Parallel Actor-Reasoner Framework for Enhancing Autonomous Vehicle Interactions | Shiyu Fang, Jiaqi Liu, Chengkai Xu, Chen Lv, Peng Hang, Jian Sun | 2025-03-01 | arXiv | https://github.com/FanGShiYuu/Actor-Reasoner | http://arxiv.org/abs/2503.00502v1 |
322 | U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack | Yunfan Gao, Yun Xiong, Wenlong Wu, Zijing Huang, Bohan Li, Haofen Wang | 2025-03-01 | arXiv | https://github.com/Tongji-KGLLM/U-NIAH | http://arxiv.org/abs/2503.00353v1 |
323 | LLM Post-Training: A Deep Dive into Reasoning Large Language Models | Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H. S. Torr, Salman H. Khan, Fahad Shahbaz Khan | 2025-02-28 | arXiv | https://github.com/mbzuai-oryx/Awesome-LLM-Post-training | https://doi.org/10.48550/arXiv.2502.21321 |
324 | Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs | Fakhraddin Alwajih, Abdellah El Mekki, Samar Mohamed Magdy, Abdelrahim A. Elmadany, Omer Nacar, El Moatez Billah Nagoudi, Reem Abdel-Salam, Hanin Atwany, Youssef Nafea, Abdulfattah Mohammed Yahya, Rahaf Alhamouri, Hamzah A. Alsayadi, Hiba Zayed, Sara Shatnawi, Serry Sibaee, Yasir Ech-Chammakhy, Walid Al-Dhabyani, Marwa Mohamed Ali, Imen Jarraya, Ahmed Oumar El-Shangiti, Aisha Alraeesi, Mohammed Anwar Al-Ghrawi, Abdulrahman S. Al-Batati, Elgizouli Mohamed, Noha Taha Elgindi, Muhammed Saeed, Houdaifa Atou, Issam Ait Yahia, Abdelhak Bouayad, Mohammed Machrouh, Amal Makouar, Dania Alkawi, Mukhtar Mohamed, Safaa Taher Abdelfadil, Amine Ziad Ounnoughene, Rouabhia Anfel, Rwaa Assi, Ahmed Sorkatti, Mohamedou Cheikh Tourad, Anis Koubaa, Ismail Berrada, Mustafa Jarrar, Shady Shehata, Muhammad Abdul-Mageed | 2025-02-28 | arXiv | https://github.com/UBC-NLP/palm | http://arxiv.org/abs/2503.00151v1 |
325 | UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation | Thanet Markchom, Tong Wu, Liting Huang, Huizhi Liang | 2025-02-28 | arXiv | https://github.com/tongwu17/SemEval-2025-Task1-UoR-NCL | http://arxiv.org/abs/2502.20984v2 |
326 | InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation | Chong Zhang, Yukun Ma, Qian Chen, Wen Wang, Shengkui Zhao, Zexu Pan, Hao Wang, Chongjia Ni, Trung Hieu Nguyen, Kun Zhou, Yidi Jiang, Chaohong Tan, Zhifu Gao, Zhihao Du, Bin Ma | 2025-02-28 | arXiv | https://github.com/FunAudioLLM/InspireMusic | https://doi.org/10.48550/arXiv.2503.00084 |
327 | DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning | Pengcheng Jiang, Jiacheng Lin, Lang Cao, Runchu Tian, SeongKu Kang, Zifeng Wang, Jimeng Sun, Jiawei Han | 2025-02-28 | arXiv | https://github.com/pat-jj/DeepRetrieval | https://doi.org/10.48550/arXiv.2503.00223 |
328 | Self-Training Elicits Concise Reasoning in Large Language Models | Tergel Munkhbat, Namgyu Ho, Seo Hyun Kim, Yongjin Yang, Yujin Kim, Se-Young Yun | 2025-02-27 | arXiv | https://github.com/TergelMunkhbat/concise-reasoning | https://doi.org/10.48550/arXiv.2502.20122 |
329 | Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis | Jeffrey Yang Fan Chiang, Seungjae Lee, Jia-Bin Huang, Furong Huang, Yizheng Chen | 2025-02-27 | arXiv | http://vulnerable-ai-agents.github.io | http://arxiv.org/abs/2502.20383v1 |
330 | SkipPipe: Partial and Reordered Pipelining Framework for Training LLMs in Heterogeneous Networks | Nikolay Blagoev, Lydia Yiyu Chen, Oğuzhan Ersoy | 2025-02-27 | arXiv | https://github.com/gensyn-ai/skippipe | http://arxiv.org/abs/2502.19913v1 |
331 | LongRoPE2: Near-Lossless LLM Context Window Scaling | Ning Shang, Li Lyna Zhang, Siyuan Wang, Gaokai Zhang, Gilsinia Lopez, Fan Yang, Weizhu Chen, Mao Yang | 2025-02-27 | arXiv | https://github.com/microsoft/LongRoPE | http://arxiv.org/abs/2502.20082v1 |
332 | ECCOS: Efficient Capability and Cost Coordinated Scheduling for Multi-LLM Serving | Kai Mei, Wujiang Xu, Shuhang Lin, Yongfeng Zhang | 2025-02-27 | arXiv | https://github.com/agiresearch/ECCOS | http://arxiv.org/abs/2502.20576v2 |
333 | Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents | Qiusi Zhan, Richard Fang, Henil Shalin Panchal, Daniel Kang | 2025-02-27 | arXiv | https://github.com/uiuc-kang-lab/AdaptiveAttackAgent | http://arxiv.org/abs/2503.00061v2 |
334 | A Thousand Words or An Image: Studying the Influence of Persona Modality in Multimodal LLMs | Julius Broomfield, Kartik Sharma, Srijan Kumar | 2025-02-27 | arXiv | https://github.com/claws-lab/persona-modality | http://arxiv.org/abs/2502.20504v1 |
335 | SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model | Xinghao Wang, Feng Liu, Rui Su, Zhihui Wang, Lei Bai, Wanli Ouyang | 2025-02-27 | arXiv | https://github.com/StarMoonWang/SeisMoLLM | https://doi.org/10.48550/arXiv.2502.19960 |
336 | Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models | Huazheng Wang, Yongcheng Jing, Haifeng Sun, Yingjie Wang, Jingyu Wang, Jianxin Liao, Dacheng Tao | 2025-02-27 | arXiv | https://github.com/MaybeLizzy/UGBench | https://doi.org/10.48550/arXiv.2502.19982 |
337 | Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Haochen Sun, Shuwen Zhang, Lei Ren, Hao Xu, Hao Fu, Caixia Yuan, Xiaojie Wang | 2025-02-27 | arXiv | https://github.com/YusaeMeow/Collab-Overcooked | https://doi.org/10.48550/arXiv.2502.20073 |
338 | Beneath the Surface: How Large Language Models Reflect Hidden Bias | Jinhao Pan, Chahat Raj, Ziyu Yao, Ziwei Zhu | 2025-02-27 | arXiv | https://github.com/JP-25/Hidden-Bias-Benchmark | https://doi.org/10.48550/arXiv.2502.19749 |
339 | Foot-In-The-Door: A Multi-turn Jailbreak for LLMs | Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xiangyu Zhang | 2025-02-27 | arXiv | https://github.com/Jinxiaolong1129/Foot-in-the-door-Jailbreak | http://arxiv.org/abs/2502.19820v2 |
340 | Smart Routing: Cost-Effective Multi-LLM Serving for Multi-Core AIOS | Kai Mei, Wujiang Xu, Shuhang Lin, Yongfeng Zhang | 2025-02-27 | arXiv | https://github.com/agiresearch/ECCOS | http://arxiv.org/abs/2502.20576v4 |
341 | Protecting multimodal large language models against misleading visualizations | Jonathan Tonglet, Tinne Tuytelaars, Marie-Francine Moens, Iryna Gurevych | 2025-02-27 | arXiv | https://github.com/UKPLab/arxiv2025-misleading-visualizations | https://doi.org/10.48550/arXiv.2502.20503 |
342 | AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms | Yuwei Yan, Yu Shang, Qingbin Zeng, Yu Li, Keyu Zhao, Zhiheng Zheng, Xuefei Ning, Tianji Wu, Shengen Yan, Yu Wang, Fengli Xu, Yong Li | 2025-02-26 | arXiv | https://tsinghua-fib-lab.github.io/AgentSocietyChallenge | http://arxiv.org/abs/2502.18754v1 |
343 | TrajLLM: A Modular LLM-Enhanced Agent-Based Framework for Realistic Human Trajectory Simulation | Chenlu Ju, Jiaxin Liu, Shobhit Sinha, Hao Xue, Flora Salim | 2025-02-26 | arXiv | https://github.com/cju0/TrajLLM | http://arxiv.org/abs/2502.18712v1 |
344 | Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs | Yiheng Yang, Yujie Wang, Chi Ma, Lei Yu, Emmanuele Chersoni, Chu-Ren Huang | 2025-02-26 | arXiv | https://github.com/Oldify/CLADA | http://arxiv.org/abs/2502.19078v1 |
345 | Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs | Dayu Yang, Tianyang Liu, Daoan Zhang, Antoine Simoulin, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Xin Qian, Grey Yang, Jiebo Luo, Julian McAuley | 2025-02-26 | arXiv | https://github.com/dayuyang1999/Awesome-Code-Reasoning | http://arxiv.org/abs/2502.19411v1 |
346 | Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs | Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang | 2025-02-26 | arXiv | https://zowiezhang.github.io/projects/Amulet | http://arxiv.org/abs/2502.19148v1 |
347 | Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation | Yuxiang Wang, Xinnan Dai, Wenqi Fan, Yao Ma | 2025-02-26 | arXiv | https://github.com/myflashbarry/LLM-benchmarking | http://arxiv.org/abs/2502.18771v1 |
348 | OntologyRAG: Better and Faster Biomedical Code Mapping with Retrieval-Augmented Generation (RAG) Leveraging Ontology Knowledge Graphs and Large Language Models | Hui Feng, Yuntzu Yin, Emiliano Reynares, Jay Nanavati | 2025-02-26 | arXiv | https://github.com/iqvianlp/ontologyRAG | https://doi.org/10.48550/arXiv.2502.18992 |
349 | A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs | Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Angelica I Aviles-Rivero, Chuanlong Xie, Yao Zhu | 2025-02-26 | arXiv | https://github.com/920927/SLM-a-sliding-layer-merging-method | http://arxiv.org/abs/2502.19159v3 |
350 | JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models | Shuyi Liu, Simiao Cui, Haoran Bu, Yuming Shang, Xi Zhang | 2025-02-26 | arXiv | https://github.com/STAIR-BUPT/JailBench | https://doi.org/10.48550/arXiv.2502.18935 |
351 | ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models | Danae Sánchez Villegas, Ingo Ziegler, Desmond Elliott | 2025-02-26 | arXiv | https://github.com/danaesavi/ImageChain | https://doi.org/10.48550/arXiv.2502.19409 |
352 | Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models | Shuliang Liu, Xinze Li, Zhenghao Liu, Yukun Yan, Cheng Yang, Zheni Zeng, Zhiyuan Liu, Maosong Sun, Ge Yu | 2025-02-26 | arXiv | https://github.com/OpenBMB/ConsJudge | https://doi.org/10.48550/arXiv.2502.18817 |
353 | Detection of LLM-Paraphrased Code and Identification of the Responsible LLM Using Coding Style Features | Shinwoo Park, Hyundong Jin, Jeong-won Cha, Yo-Sub Han | 2025-02-25 | arXiv | https://github.com/Shinwoo-Park/detecting_llm_paraphrased_code_via_coding_style_features | http://arxiv.org/abs/2502.17749v2 |
354 | Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers | Hannah Calzi Kleidermacher, James Zou | 2025-02-25 | arXiv | https://hankleid.github.io/ProjectMundo | http://arxiv.org/abs/2502.17882v1 |
355 | RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction | Jianhao Yan, Yun Luo, Yue Zhang | 2025-02-25 | arXiv | https://github.com/ElliottYan/RefuteBench-2.0 | http://arxiv.org/abs/2502.18308v1 |
356 | Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs | Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst | 2025-02-25 | arXiv | https://github.com/gayecolakoglu/LayIE-LLM | http://arxiv.org/abs/2502.18179v1 |
357 | LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena | Tianmi Ma, Jiawei Du, Wenxin Huang, Wenjie Wang, Liang Xie, Xian Zhong, Joey Tianyi Zhou | 2025-02-25 | arXiv | https://github.com/wekjsdvnm/Agent-Trading-Arena | http://arxiv.org/abs/2502.17967v1 |
358 | Detecting LLM-Generated Korean Text through Linguistic Feature Analysis | Shinwoo Park, Shubin Kim, Do-Kyung Kim, Yo-Sub Han | 2025-02-25 | arXiv | https://github.com/Shinwoo-Park/detecting_llm_generated_korean_text_through_linguistic_analysis | http://arxiv.org/abs/2503.00032v2 |
359 | Can Multimodal LLMs Perform Time Series Anomaly Detection? | Xiongxiao Xu, Haoran Wang, Yueqing Liang, Philip S. Yu, Yue Zhao, Kai Shu | 2025-02-25 | arXiv | https://mllm-ts.github.io | http://arxiv.org/abs/2502.17812v1 |
360 | Scalable Best-of-N Selection for Large Language Models via Self-Certainty | Zhewei Kang, Xuandong Zhao, Dawn Song | 2025-02-25 | arXiv | https://github.com/backprop07/Self-Certainty | https://doi.org/10.48550/arXiv.2502.18581 |
361 | LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation | Pengzhi Li, Pengfei Yu, Zide Liu, Wei He, Xuhao Pan, Xudong Rao, Tao Wei, Wei Chen | 2025-02-25 | arXiv | https://zrealli.github.io/LDGen | https://doi.org/10.48550/arXiv.2502.18302 |
362 | Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference | Zhuo Chen, Xinyu Wang, Yong Jiang, Zhen Zhang, Xinyu Geng, Pengjun Xie, Fei Huang, Kewei Tu | 2025-02-25 | arXiv | https://github.com/Chord-Chen-30/VLLM-KnowledgeBoundary | https://doi.org/10.48550/arXiv.2502.18023 |
363 | Harnessing Multiple Large Language Models: A Survey on LLM Ensemble | Zhijun Chen, Jingzheng Li, Pengpeng Chen, Zhuoran Li, Kai Sun, Yuankai Luo, Qianren Mao, Dingqi Yang, Hailong Sun, Philip S. Yu | 2025-02-25 | arXiv | https://github.com/junchenzhi/Awesome-LLM-Ensemble | https://doi.org/10.48550/arXiv.2502.18036 |
364 | Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data | Siqi Guo, Ilgee Hong, Vicente Balmaseda, Changlong Yu, Liang Qiu, Xin Liu, Haoming Jiang, Tuo Zhao, Tianbao Yang | 2025-02-25 | arXiv | https://github.com/Optimization-AI/DFT | https://doi.org/10.48550/arXiv.2502.18679 |
365 | Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs | Himanshu Beniwal, Sailesh Panda, Mayank Singh | 2025-02-24 | arXiv | https://github.com/himanshubeniwal/X-BAT | http://arxiv.org/abs/2502.16901v1 |
366 | MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs | Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski | 2025-02-24 | arXiv | https://github.com/saccharomycetes/mllms_know | http://arxiv.org/abs/2502.17422v1 |
367 | From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs | Ruxiao Chen, Chenguang Wang, Yuran Sun, Xilei Zhao, Susu Xu | 2025-02-24 | arXiv | https://github.com/SusuXu-s-Lab/FLARE | http://arxiv.org/abs/2502.17701v1 |
368 | Delta Decompression for MoE-based LLMs Compression | Hao Gu, Wei Li, Lujun Li, Qiyuan Zhu, Mark Lee, Shengjie Sun, Wei Xue, Yike Guo | 2025-02-24 | arXiv | https://github.com/lliai/D2MoE | http://arxiv.org/abs/2502.17298v1 |
369 | ConvoyLLM: Dynamic Multi-Lane Convoy Control Using LLMs | Liping Lu, Zhican He, Duanfeng Chu, Rukang Wang, Saiqian Peng, Pan Zhou | 2025-02-24 | arXiv | https://github.com/chuduanfeng/ConvoyLLM | http://arxiv.org/abs/2502.17529v2 |
370 | CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought | Boxuan Zhang, Ruqi Zhang | 2025-02-24 | arXiv | https://github.com/ZBox1005/CoT-UQ | http://arxiv.org/abs/2502.17214v1 |
371 | Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing | Yi-Kai Zhang, De-Chuan Zhan, Han-Jia Ye | 2025-02-24 | arXiv | https://cit-llm-routing.github.io | http://arxiv.org/abs/2502.17282v1 |
372 | COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs | Liming Liu, Zhenghao Xu, Zixuan Zhang, Hao Kang, Zichong Li, Chen Liang, Weizhu Chen, Tuo Zhao | 2025-02-24 | arXiv | https://github.com/lliu606/COSMOS | http://arxiv.org/abs/2502.17410v2 |
373 | On Relation-Specific Neurons in Large Language Models | Yihong Liu, Runsheng Chen, Lea Hirlimann, Ahmad Dawar Hakimi, Mingyang Wang, Amir Hossein Kargaran, Sascha Rothe, François Yvon, Hinrich Schütze | 2025-02-24 | arXiv | https://github.com/cisnlp/relation-specific-neurons | https://doi.org/10.48550/arXiv.2502.17355 |
374 | LongSafety: Evaluating Long-Context Safety of Large Language Models | Yida Lu, Jiale Cheng, Zhexin Zhang, Shiyao Cui, Cunxiang Wang, Xiaotao Gu, Yuxiao Dong, Jie Tang, Hongning Wang, Minlie Huang | 2025-02-24 | arXiv | https://github.com/thu-coai/LongSafety | https://doi.org/10.48550/arXiv.2502.16971 |
375 | LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences | Sijia Yao, Pengcheng Huang, Zhenghao Liu, Yu Gu, Yukun Yan, Shi Yu, Ge Yu | 2025-02-24 | arXiv | https://github.com/NEUIR/LLM-QE | https://doi.org/10.48550/arXiv.2502.17057 |
376 | Introducing Visual Perception Token into Multimodal Large Language Model | Runpeng Yu, Xinyin Ma, Xinchao Wang | 2025-02-24 | arXiv | https://github.com/yu-rp/VisualPerceptionToken | https://doi.org/10.48550/arXiv.2502.17425 |
377 | LogitLens4LLMs: Extending Logit Lens Analysis to Modern Large Language Models | Zhenyu Wang | 2025-02-24 | arXiv | https://github.com/zhenyu-02/LogitLens4LLMs | https://doi.org/10.48550/arXiv.2503.11667 |
378 | From System 1 to System 2: A Survey of Reasoning Large Language Models | Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhiwei Li, Bao-Long Bi, Ling-Rui Mei, Junfeng Fang, Zhijiang Guo, Le Song, Cheng-Lin Liu | 2025-02-24 | arXiv | https://github.com/zzli2022/Awesome-Slow-Reason-System | https://doi.org/10.48550/arXiv.2502.17419 |
379 | VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models | Jen-tse Huang, Dasen Dai, Jen-Yuan Huang, Youliang Yuan, Xiaoyuan Liu, Wenxuan Wang, Wenxiang Jiao, Pinjia He, Zhaopeng Tu | 2025-02-23 | arXiv | https://github.com/CUHK-ARISE/VisFactor | https://doi.org/10.48550/arXiv.2502.16435 |
380 | BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning | Haiteng Zhao, Chang Ma, Fangzhi Xu, Lingpeng Kong, Zhi-Hong Deng | 2025-02-23 | arXiv | https://github.com/zhao-ht/BioMaze | https://doi.org/10.48550/arXiv.2502.16660 |
381 | CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale | Chenlong Wang, Zhaoyang Chu, Zhengxiang Cheng, Xuyi Yang, Kaiyue Qiu, Yao Wan, Zhou Zhao, Xuanhua Shi, Dongping Chen | 2025-02-23 | arXiv | https://github.com/Lucky-voyage/Code-Sync | https://doi.org/10.48550/arXiv.2502.16645 |
382 | CER: Confidence Enhanced Reasoning in LLMs | Ali Razghandi, Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah | 2025-02-22 | arXiv | https://github.com/ | http://arxiv.org/abs/2502.14634v1 |
383 | Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations | Chunyang Li, Weiqi Wang, Tianshi Zheng, Yangqiu Song | 2025-02-22 | arXiv | https://github.com/lcy2723/Robust-Rule-Induction | http://arxiv.org/abs/2502.16169v1 |
384 | Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens | Ziwei Shan, Yaoyu He, Chengfeng Zhao, Jiashen Du, Jingyan Zhang, Qixuan Zhang, Jingyi Yu, Lan Xu | 2025-02-22 | arXiv | https://koyui.github.io/mojito/ | http://arxiv.org/abs/2502.16175v1 |
385 | OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Wenwen Yu, Zhibo Yang, Jianqiang Wan, Sibo Song, Jun Tang, Wenqing Cheng, Yuliang Liu, Xiang Bai | 2025-02-22 | arXiv | https://github.com/AlibabaResearch/AdvancedLiterateMachinery | https://doi.org/10.48550/arXiv.2502.16161 |
386 | Dynamic Low-Rank Sparse Adaptation for Large Language Models | Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Yang Liu, Jing Lin, Yiwu Yao, Rongrong Ji | 2025-02-22 | arXiv | https://github.com/wzhuang-xmu/LoSA | https://doi.org/10.48550/arXiv.2502.14816 |
387 | Plan-over-Graph: Towards Parallelable LLM Agent Schedule | Shiqi Zhang, Xinbei Ma, Zouying Cao, Zhuosheng Zhang, Hai Zhao | 2025-02-21 | arXiv | https://github.com/zsq259/Plan-over-Graph | http://arxiv.org/abs/2502.14563v1 |
388 | FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs | Madhurima Chakraborty, Peter Pirkelbauer, Qing Yi | 2025-02-21 | arXiv | https://github.com/MadhuNimmo/FormalSpecCpp | http://arxiv.org/abs/2502.15217v1 |
389 | Investigating the Adaptive Robustness with Knowledge Conflicts in LLM-based Multi-Agent Systems | Tianjie Ju, Bowen Wang, Hao Fei, Mong-Li Lee, Wynne Hsu, Yun Li, Qianren Wang, Pengzhou Cheng, Zongru Wu, Zhuosheng Zhang, Gongshen Liu | 2025-02-21 | arXiv | https://github.com/wbw625/MultiAgentRobustness | http://arxiv.org/abs/2502.15153v1 |
390 | Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs | Danni Liu, Jan Niehues | 2025-02-21 | arXiv | https://github.com/dannigt/mid-align | http://arxiv.org/abs/2502.14830v1 |
391 | A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation | Shilong Hou, Ruilin Shang, Zi Long, Xianghua Fu, Yin Chen | 2025-02-21 | arXiv | https://github.com/Mebymeby/Pseudonymization-Framework | http://arxiv.org/abs/2502.15233v1 |
392 | PredictaBoard: Benchmarking LLM Score Predictability | Lorenzo Pacchiardi, Konstantinos Voudouris, Ben Slater, Fernando Martínez-Plumed, José Hernández-Orallo, Lexin Zhou, Wout Schellaert | 2025-02-21 | arXiv | https://github.com/Kinds-of-Intelligence-CFI/PredictaBoard | http://arxiv.org/abs/2502.14445v1 |
393 | Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing | Qi Le, Enmao Diao, Ziyan Wang, Xinran Wang, Jie Ding, Li Yang, Ali Anwar | 2025-02-21 | arXiv | https://github.com/Qi-Le1/Probe_Pruning | http://arxiv.org/abs/2502.15618v1 |
394 | STeCa: Step-level Trajectory Calibration for LLM Agent Learning | Hanlin Wang, Jian Wang, Chak Tou Leong, Wenjie Li | 2025-02-21 | arXiv | https://github.com/WangHanLinHenry/STeCa | http://arxiv.org/abs/2502.14276v1 |
395 | Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Giulio Zizzo, Giandomenico Cornacchia, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Beat Buesser, Mark Purcell, Pin-Yu Chen, Prasanna Sattigeri, Kush Varshney | 2025-02-21 | arXiv | https://github.com/IBM/Adversarial-Prompt-Evaluation | http://arxiv.org/abs/2502.15427v1 |
396 | Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models | Ya Wang, Zhijian Zhuo, Yutao Zeng, Xun Zhou, Jian Yang, Xiaoqing Li | 2025-02-21 | arXiv | https://github.com/kaihemo/SDD | https://doi.org/10.48550/arXiv.2502.15499 |
397 | Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization | Yupeng Chang, Yi Chang, Yuan Wu | 2025-02-21 | arXiv | https://github.com/llm172/Transfer-Prompting | https://doi.org/10.48550/arXiv.2502.14211 |
398 | On the logical skills of large language models: evaluations using arbitrarily complex first-order logic problems | Shokhrukh Ibragimov, Arnulf Jentzen, Benno Kuckuck | 2025-02-21 | arXiv | https://github.com/bkuckuck/logical-skills-of-llms | https://doi.org/10.48550/arXiv.2502.14180 |
399 | MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models | Shrey Pandit, Jiawei Xu, Junyuan Hong, Zhangyang Wang, Tianlong Chen, Kaidi Xu, Ying Ding | 2025-02-21 | arXiv | https://medhallu.github.io/ | https://doi.org/10.48550/arXiv.2502.14302 |
400 | From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, Yu Su | 2025-02-21 | arXiv | https://github.com/OSU-NLP-Group/HippoRAG | https://doi.org/10.48550/arXiv.2502.14802 |
401 | CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | Zhenhong Zhou, Zherui Li, Jie Zhang, Yuanhe Zhang, Kun Wang, Yang Liu, Qing Guo | 2025-02-21 | arXiv | https://github.com/zhrli324/Corba | https://doi.org/10.48550/arXiv.2502.14529 |
402 | Protein Large Language Models: A Comprehensive Survey | Yijia Xiao, Wanjia Zhao, Junkai Zhang, Yiqiao Jin, Han Zhang, Zhicheng Ren, Renliang Sun, Haixin Wang, Guancheng Wan, Pan Lu, Xiao Luo, Yu Zhang, James Zou, Yizhou Sun, Wei Wang | 2025-02-21 | arXiv | https://github.com/Yijia-Xiao/Protein-LLM-Survey | https://doi.org/10.48550/arXiv.2502.17504 |
403 | Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | William Rudman, Michal Golovanevsky, Amir Bar, Vedant Palit, Yann LeCun, Carsten Eickhoff, Ritambhara Singh | 2025-02-21 | arXiv | https://github.com/rsinghlab/Shape-Blind | https://doi.org/10.48550/arXiv.2502.15969 |
404 | LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention | Shang Yang, Junxian Guo, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, Song Han | 2025-02-21 | arXiv | https://github.com/mit-han-lab/omniserve | http://arxiv.org/abs/2502.14866v1 |
405 | Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models | Yeonjun In, Wonjoong Kim, Kanghoon Yoon, Sungchul Kim, Md. Mehrab Tanjim, Kibum Kim, Chanyoung Park | 2025-02-20 | arXiv | https://github.com/yeonjun-in/U-SafeBench | https://doi.org/10.48550/arXiv.2502.15086 |
406 | InductionBench: LLMs Fail in the Simplest Complexity Class | Wenyue Hua, Tyler Wong, Sun Fei, Liangming Pan, Adam Jardine, William Yang Wang | 2025-02-20 | arXiv | https://github.com/Wenyueh/inductive_reasoning_benchmark | http://arxiv.org/abs/2502.15823v3 |
407 | An LLM-based Agent for Reliable Docker Environment Configuration | Ruida Hu, Chao Peng, Xinchen Wang, Cuiyun Gao | 2025-02-19 | arXiv | https://github.com/bytedance/Repo2Run | http://arxiv.org/abs/2502.13681v1 |
408 | SIFT: Grounding LLM Reasoning in Contexts via Stickers | Zihao Zeng, Xuyao Huang, Boxiu Li, Zhijie Deng | 2025-02-19 | arXiv | https://github.com/zhijie-group/SIFT | http://arxiv.org/abs/2502.14922v1 |
409 | Judging the Judges: A Collection of LLM-Generated Relevance Judgements | Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, Emine Yilmaz | 2025-02-19 | arXiv | https://llm4eval.github.io/LLMJudge-benchmark/ | http://arxiv.org/abs/2502.13908v1 |
410 | DataSciBench: An LLM Agent Benchmark for Data Science | Dan Zhang, Sining Zhoubian, Min Cai, Fengzu Li, Lekang Yang, Wei Wang, Tianjiao Dong, Ziniu Hu, Jie Tang, Yisong Yue | 2025-02-19 | arXiv | https://github.com/THUDM/DataSciBench | http://arxiv.org/abs/2502.13897v1 |
411 | Benchmarking LLMs for Political Science: A United Nations Perspective | Yueqing Liang, Liangwei Yang, Chen Wang, Congying Xia, Rui Meng, Xiongxiao Xu, Haoran Wang, Ali Payani, Kai Shu | 2025-02-19 | arXiv | https://github.com/yueqingliang1/UNBench | http://arxiv.org/abs/2502.14122v1 |
412 | Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning | Zenan Li, Zhaoyu Li, Wen Tang, Xian Zhang, Yuan Yao, Xujie Si, Fan Yang, Kaiyu Yang, Xiaoxing Ma | 2025-02-19 | arXiv | https://github.com/Lizn-zn/NeqLIPS/ | http://arxiv.org/abs/2502.13834v1 |
413 | Craw4LLM: Efficient Web Crawling for LLM Pretraining | Shi Yu, Zhiyuan Liu, Chenyan Xiong | 2025-02-19 | arXiv | https://github.com/cxcscmu/Crawl4LLM | http://arxiv.org/abs/2502.13347v1 |
414 | | Vishal Dey, Xiao Hu, Xia Ning | 2025-02-19 | arXiv | https://github.com/ninglab/GeLLMO | http://arxiv.org/abs/2502.13398v1 |
415 | PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models | Guangwei Li, Yuansen Zhang, Yinggui Wang, Shoumeng Yan, Lei Wang, Tao Wei | 2025-02-19 | arXiv | https://github.com/ligw1998/PRIV-QA | https://doi.org/10.48550/arXiv.2502.13564 |
416 | AI-Empowered Catalyst Discovery: A Survey from Classical Machine Learning Approaches to Large Language Models | Yuanyuan Xu, Hanchen Wang, Wenjie Zhang, Lexing Xie, Yin Chen, Flora Salim, Ying Zhang, Justin Gooding, Toby Walsh | 2025-02-19 | arXiv | https://github.com/LuckyGirl-XU/Awesome-Artificial-Intelligence-Empowered-Catalyst-Discovery | https://doi.org/10.48550/arXiv.2502.13626 |
417 | Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models | Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Yang You, Guiming Xie, Xuejian Gong, Kunlong Zhou | 2025-02-19 | arXiv | https://github.com/junzhang-zj/LoRAM | https://doi.org/10.48550/arXiv.2502.13533 |
418 | REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models | DongGeon Lee, Hwanjo Yu | 2025-02-19 | arXiv | https://github.com/oneonlee/REFIND | https://doi.org/10.48550/arXiv.2502.13622 |
419 | Lost in Sequence: Do Large Language Models Understand Sequential Recommendation? | Sein Kim, Hongseok Kang, Kibum Kim, Jiwan Kim, Donghyun Kim, Minchul Yang, Kwangjin Oh, Julian McAuley, Chanyoung Park | 2025-02-19 | arXiv | https://github.com/Sein-Kim/LLM-SRec | https://doi.org/10.48550/arXiv.2502.13909 |
420 | Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems | Yaochen Zhu, Chao Wan, Harald Steck, Dawen Liang, Yesu Feng, Nathan Kallus, Jundong Li | 2025-02-19 | arXiv | https://github.com/yaochenzhu/CRAG | https://doi.org/10.48550/arXiv.2502.14137 |
421 | ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities | Chanjin Zheng, Zengyi Yu, Yilin Jiang, Mingzi Zhang, Xunuo Lu, Jing Jin, Liteng Gao | 2025-02-19 | arXiv | https://artmentor.github.io/ | https://doi.org/10.48550/arXiv.2502.13832 |
422 | LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization | Guanzheng Chen, Xin Li, Michael Qizhe Shieh, Lidong Bing | 2025-02-19 | arXiv | https://github.com/DAMO-NLP-SG/LongPO | https://doi.org/10.48550/arXiv.2502.13922 |
423 | Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | Mengkang Hu, Tianxing Chen, Yude Zou, Yuheng Lei, Qiguang Chen, Ming Li, Yao Mu, Hongyuan Zhang, Wenqi Shao, Ping Luo | 2025-02-18 | arXiv | https://text-to-world.github.io/ | https://doi.org/10.48550/arXiv.2502.13092 |
424 | Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs | Adi Simhi, Itay Itzhak, Fazl Barez, Gabriel Stanovsky, Yonatan Belinkov | 2025-02-18 | arXiv | https://github.com/technion-cs-nlp/Trust_me_Im_wrong | http://arxiv.org/abs/2502.12964v1 |
425 | SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs | Ahmed F. AbouElhamayed, Jordan Dotzel, Yash Akhauri, Chi-Chih Chang, Sameh Gobriel, J. Pablo Muñoz, Vui Seng Chua, Nilesh Jain, Mohamed S. Abdelfattah | 2025-02-18 | arXiv | https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/SparAMX | http://arxiv.org/abs/2502.12444v1 |
426 | Soundwave: Less is More for Speech-Text Alignment in LLMs | Yuhao Zhang, Zhiheng Liu, Fan Bu, Ruiyu Zhang, Benyou Wang, Haizhou Li | 2025-02-18 | arXiv | https://github.com/FreedomIntelligence/Soundwave | http://arxiv.org/abs/2502.12900v1 |
427 | MoBA: Mixture of Block Attention for Long-Context LLMs | Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu | 2025-02-18 | arXiv | https://github.com/MoonshotAI/MoBA | http://arxiv.org/abs/2502.13189v1 |
428 | PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models | Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang, Min Zhang | 2025-02-18 | arXiv | https://github.com/zjq0455/PTQ1.61 | https://doi.org/10.48550/arXiv.2502.13179 |
429 | SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings | Weikai Lu, Hao Peng, Huiping Zhuang, Cen Chen, Ziqian Zeng | 2025-02-18 | arXiv | https://github.com/ZeroNLP/SEA | https://doi.org/10.48550/arXiv.2502.12562 |
430 | Investigating and Extending Homans' Social Exchange Theory with Large Language Model based Agents | Lei Wang, Zheqing Zhang, Xu Chen | 2025-02-18 | arXiv | https://github.com/Paitesanshi/SET | https://doi.org/10.48550/arXiv.2502.12450 |
431 | Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis | Jiaqi Zhao, Ming Wang, Miao Zhang, Yuzhang Shang, Xuebo Liu, Yaowei Wang, Min Zhang, Liqiang Nie | 2025-02-18 | arXiv | https://github.com/zjq0455/PTQ_Benchmark | http://arxiv.org/abs/2502.13178v1 |
432 | G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable Recommendation | Yuhan Li, Xinni Zhang, Linhao Luo, Heng Chang, Yuxiang Ren, Irwin King, Jia Li | 2025-02-18 | arXiv | https://github.com/Yuhan1i/G-Refer | https://doi.org/10.48550/arXiv.2502.12586 |
433 | EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning | Xiaoqian Liu, Ke Wang, Yongbin Li, Yuchuan Wu, Wentao Ma, Aobo Kong, Fei Huang, Jianbin Jiao, Junge Zhang | 2025-02-18 | arXiv | https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/EPO | http://arxiv.org/abs/2502.12486v1 |
434 | VRoPE: Rotary Position Embedding for Video Large Language Models | Zikang Liu, Longteng Guo, Yepeng Tang, Junxian Cai, Kai Ma, Xi Chen, Jing Liu | 2025-02-17 | arXiv | https://github.com/johncaged/VRoPE | https://doi.org/10.48550/arXiv.2502.11664 |
435 | Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning | Yuqi Pang, Bowen Yang, Haoqin Tu, Yun Cao, Zeyu Zhang | 2025-02-17 | arXiv | https://github.com/Pbhgit/MVCD | http://arxiv.org/abs/2502.11751v1 |
436 | Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, Xueyang Liu | 2025-02-17 | arXiv | https://github.com/wanghanbinpanda/CodeVision | http://arxiv.org/abs/2502.11829v1 |
437 | Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? | Leyi Pan, Aiwei Liu, Shiyu Huang, Yijian Lu, Xuming Hu, Lijie Wen, Irwin King, Philip S. Yu | 2025-02-17 | arXiv | https://github.com/THU-BPM/Watermark-Radioactivity-Attack | http://arxiv.org/abs/2502.11598v1 |
438 | Bitnet.cpp: Efficient Edge Inference for Ternary LLMs | Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei | 2025-02-17 | arXiv | https://github.com/microsoft/BitNet/tree/paper | http://arxiv.org/abs/2502.11880v1 |
439 | "Nuclear Deployed!": Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | Rongwu Xu, Xiaojian Li, Shuo Chen, Wei Xu | 2025-02-17 | arXiv | https://github.com/pillowsofwind/LLM-CBRN-Risks | http://arxiv.org/abs/2502.11355v1 |
440 | A Survey of Personalized Large Language Models: Progress and Future Directions | Jiahong Liu, Zexuan Qiu, Zhongyang Li, Quanyu Dai, Jieming Zhu, Minda Hu, Menglin Yang, Irwin King | 2025-02-17 | arXiv | https://github.com/JiahongLiu21/Awesome-Personalized-Large-Language-Models | https://doi.org/10.48550/arXiv.2502.11528 |
441 | RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration Exemplars | Yuncheng Hua, Lizhen Qu, Zhuang Li, Hao Xue, Flora D. Salim, Gholamreza Haffari | 2025-02-17 | arXiv | https://github.com/AnonymousCode-ComputerScience/RIDE | https://doi.org/10.48550/arXiv.2502.11681 |
442 | Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | Rongwu Xu, Xiaojian Li, Shuo Chen, Wei Xu | 2025-02-17 | arXiv | https://llm-catastrophic-risks.github.io | http://arxiv.org/abs/2502.11355v3 |
443 | Atom of Thoughts for Markov LLM Test-Time Scaling | Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo | 2025-02-17 | arXiv | https://github.com/qixucen/atom | http://arxiv.org/abs/2502.12018v1 |
444 | Idiosyncrasies in Large Language Models | Mingjie Sun, Yida Yin, Zhiqiu Xu, J. Zico Kolter, Zhuang Liu | 2025-02-17 | arXiv | https://eric-mingjie.github.io/llm-idiosyncrasies/index.html | https://doi.org/10.48550/arXiv.2502.12150 |
445 | A-MEM: Agentic Memory for LLM Agents | Wujiang Xu, Kai Mei, Hang Gao, Juntao Tan, Zujie Liang, Yongfeng Zhang | 2025-02-17 | arXiv | https://github.com/WujiangXu/AgenticMemory | http://arxiv.org/abs/2502.12110v5 |
446 | LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning | Tianshi Zheng, Jiayang Cheng, Chunyang Li, Haochen Shi, Zihao Wang, Jiaxin Bai, Yangqiu Song, Ginny Y. Wong, Simon See | 2025-02-16 | arXiv | https://github.com/HKUST-KnowComp/LogiDynamics | https://doi.org/10.48550/arXiv.2502.11176 |
447 | SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors | Bohan Lyu, Siqiao Huang, Zichen Liang, Qi-An Sun, Jiaming Zhang | 2025-02-16 | arXiv | https://github.com/Imbernoulli/SURGE | https://doi.org/10.48550/arXiv.2502.11167 |
448 | BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack | Zihao Zhu, Hongbao Zhang, Mingda Zhang, Ruotong Wang, Guanzong Wu, Ke Xu, Baoyuan Wu | 2025-02-16 | arXiv | https://github.com/zihao-ai/BoT | https://doi.org/10.48550/arXiv.2502.12202 |
449 | CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships? | Aashish Anantha Ramakrishnan, Aadarsh Anantha Ramakrishnan, Dongwon Lee | 2025-02-16 | arXiv | https://github.com/aashish2000/CORDIAL | https://doi.org/10.48550/arXiv.2502.11300 |
450 | Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models | Haoyang Li, Xuejia Chen, Zhanchao Xu, Darian Li, Nicole Hu, Fei Teng, Yiming Li, Luyu Qiu, Chen Jason Zhang, Qing Li, Lei Chen | 2025-02-16 | arXiv | https://github.com/TreeAI-Lab/NumericBench | https://doi.org/10.48550/arXiv.2502.11075 |
451 | ReLearn: Unlearning via Learning for Large Language Models | Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Huajun Chen, Ningyu Zhang | 2025-02-16 | arXiv | https://github.com/zjunlp/unlearn | https://doi.org/10.48550/arXiv.2502.11190 |
452 | Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models | Zonghao Ying, Deyue Zhang, Zonglei Jing, Yisong Xiao, Quanchen Zou, Aishan Liu, Siyuan Liang, Xiangzheng Zhang, Xianglong Liu, Dacheng Tao | 2025-02-16 | arXiv | https://github.com/NY1024/RACE | https://doi.org/10.48550/arXiv.2502.11054 |
453 | G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems | Shilong Wang, Guibin Zhang, Miao Yu, Guancheng Wan, Fanci Meng, Chongye Guo, Kun Wang, Yang Wang | 2025-02-16 | arXiv | https://github.com/wslong20/G-safeguard | http://arxiv.org/abs/2502.11127v1 |
454 | How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training | Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen | 2025-02-16 | arXiv | https://github.com/zjunlp/DynamicKnowledgeCircuits | http://arxiv.org/abs/2502.11196v1 |
455 | MasRouter: Learning to Route LLMs for Multi-Agent Systems | Yanwei Yue, Guibin Zhang, Boyang Liu, Guancheng Wan, Kun Wang, Dawei Cheng, Yiyan Qi | 2025-02-16 | arXiv | https://github.com/yanweiyue/masrouter | http://arxiv.org/abs/2502.11133v1 |
456 | Ramp Up NTT in Record Time using GPU-Accelerated Algorithms and LLM-based Code Generation | Yu Cui, Hang Fu, Licheng Wang, Haibin Zhang | 2025-02-16 | arXiv | https://github.com/LMPC-Lab/GenGPUCrypto | http://arxiv.org/abs/2502.11110v1 |
457 | Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Ante Wang, Linfeng Song, Ye Tian, Dian Yu, Haitao Mi, Xiangyu Duan, Zhaopeng Tu, Jinsong Su, Dong Yu | 2025-02-16 | arXiv | https://github.com/Soistesimmer/Fetch | http://arxiv.org/abs/2502.11183v1 |
458 | Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey | Zirui Song, Bin Yan, Yuhan Liu, Miao Fang, Mingzhe Li, Rui Yan, Xiuying Chen | 2025-02-15 | arXiv | https://github.com/abilliyb/Knowledge_Injection_Survey_Papers | https://doi.org/10.48550/arXiv.2502.10708 |
459 | SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models | Daniel Fleischer, Moshe Berchansky, Gad Markovits, Moshe Wasserblat | 2025-02-15 | arXiv | https://github.com/IntelLabs/RAG-FiT/tree/square | https://doi.org/10.48550/arXiv.2502.09390 |
460 | Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs | Siyan Zhao, Mingyi Hong, Yang Liu, Devamanyu Hazarika, Kaixiang Lin | 2025-02-15 | arXiv | https://prefeval.github.io/ | http://arxiv.org/abs/2502.09597v1 |
461 | EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | Rui Yang, Hanyang Chen, Junyu Zhang, Mark Zhao, Cheng Qian, Kangrui Wang, Qineng Wang, Teja Venkat Koripella, Marziyeh Movahedi, Manling Li, Heng Ji, Huan Zhang, Tong Zhang | 2025-02-15 | arXiv | https://embodiedbench.github.io | https://doi.org/10.48550/arXiv.2502.09560 |
462 | An Empirical Analysis of Uncertainty in Large Language Model Evaluations | Qiujie Xie, Qingqiu Li, Zhuohao Yu, Yuejie Zhang, Yue Zhang, Linyi Yang | 2025-02-15 | arXiv | https://github.com/hasakiXie123/LLM-Evaluator-Uncertainty | https://doi.org/10.48550/arXiv.2502.10709 |
463 | LintLLM: An Open-Source Verilog Linting Framework Based on Large Language Models | Zhigang Fang, Renzhi Chen, Zhijie Yang, Yang Guo, Huadong Dai, Lei Wang | 2025-02-15 | arXiv | https://github.com/fangzhigang32/Static-Verilog-Analysis | https://doi.org/10.48550/arXiv.2502.10815 |
464 | CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs | Insu Han, Zeliang Zhang, Zhiyuan Wang, Yifan Zhu, Susan Liang, Jiani Liu, Haiting Lin, Mingjie Zhao, Chenliang Xu, Kun Wan, Wentian Zhao | 2025-02-15 | arXiv | https://github.com/insuhan/calibquant | http://arxiv.org/abs/2502.14882v2 |
465 | KKA: Improving Vision Anomaly Detection through Anomaly-related Knowledge from Large Language Models | Dong Chen, Zhengqing Hu, Peiguang Fan, Yueting Zhuang, Yafei Li, Qidong Liu, Xiaoheng Jiang, Mingliang Xu | 2025-02-14 | arXiv | https://github.com/Anfeather/KKA | https://doi.org/10.48550/arXiv.2502.14880 |
466 | Large Language Diffusion Models | Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li | 2025-02-14 | arXiv | https://ml-gsai.github.io/LLaDA-demo/ | https://doi.org/10.48550/arXiv.2502.09992 |
467 | LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing | Kuan Li, Liwen Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Shuai Wang, Minhao Cheng | 2025-02-14 | arXiv | https://github.com/likuanppd/LaRA | http://arxiv.org/abs/2502.09977v1 |
468 | MM-RLHF: The Next Step Forward in Multimodal LLM Alignment | Yi-Fan Zhang, Tao Yu, Haochen Tian, Chaoyou Fu, Peiyan Li, Jianshu Zeng, Wulin Xie, Yang Shi, Huanyu Zhang, Junkang Wu, Xue Wang, Yibo Hu, Bin Wen, Fan Yang, Zhang Zhang, Tingting Gao, Di Zhang, Liang Wang, Rong Jin, Tieniu Tan | 2025-02-14 | arXiv | https://mm-rlhf.github.io/ | http://arxiv.org/abs/2502.10391v1 |
469 | V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models | Hsu-Kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen | 2025-02-14 | arXiv | https://eddyhkchiu.github.io/v2vllm.github.io/ | https://doi.org/10.48550/arXiv.2502.09980 |
470 | The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis | Wenbo Pan, Zhichao Liu, Qiguang Chen, Xiangyang Zhou, Haining Yu, Xiaohua Jia | 2025-02-13 | arXiv | https://github.com/BMPixel/safety-residual-space | http://arxiv.org/abs/2502.09674v1 |
471 | FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents | Mostapha Benhenda | 2025-02-13 | arXiv | https://github.com/benstaf/FinRL_DeepSeek | http://arxiv.org/abs/2502.07393v1 |
472 | Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning | Jiayuan Zhu, Junde Wu | 2025-02-13 | arXiv | https://github.com/SuperMedIntel/AskPatients | http://arxiv.org/abs/2502.07143v1 |
473 | LLM-Generated Microservice Implementations from RESTful API Definitions | Saurabh Chauhan, Zeeshan Rasheed, Abdul Malik Sami, Zheying Zhang, Jussi Rasku, Kai-Kristian Kemell, Pekka Abrahamsson | 2025-02-13 | arXiv | https://github.com/sirbh/code-gen | http://arxiv.org/abs/2502.09766v1 |
474 | Bag of Tricks for Inference-time Computation of LLM Reasoning | Fan Liu, Wenshuo Chao, Naiqiang Tan, Hao Liu | 2025-02-13 | arXiv | https://github.com/usail-hkust/benchmark_inference_time_computation_LL | http://arxiv.org/abs/2502.07191v2 |
475 | LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation | Zican Dong, Junyi Li, Jinhao Jiang, Mingyu Xu, Wayne Xin Zhao, Bingning Wang, Weipeng Chen | 2025-02-13 | arXiv | https://github.com/RUCAIBox/LongReD | https://doi.org/10.48550/arXiv.2502.07365 |
476 | LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters! | Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Eric Tang, Sumanth Hegde, Kourosh Hakhamaneshi, Shishir G. Patil, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica | 2025-02-13 | arXiv | https://github.com/NovaSky-AI/SkyThought | http://arxiv.org/abs/2502.07374v2 |
477 | DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization | Xuefeng Liu, Songhao Jiang, Siyu Chen, Zhuoran Yang, Yuxin Chen, Ian T. Foster, Rick Stevens | 2025-02-13 | arXiv | https://github.com/xuefeng-cs/DrugImproverGPT | https://doi.org/10.48550/arXiv.2502.07237 |
478 | Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models | Qingsong Zou, Jingyu Xiao, Qing Li, Zhi Yan, Yuhang Wang, Li Xu, Wenxuan Wang, Kuofeng Gao, Ruoyu Li, Yong Jiang | 2025-02-13 | arXiv | https://github.com/horizonsinzqs/QueryAttack | https://doi.org/10.48550/arXiv.2502.09723 |
479 | Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models | Yiheng Liu, Xiaohui Gao, Haiyang Sun, Bao Ge, Tianming Liu, Junwei Han, Xintao Hu | 2025-02-13 | arXiv | https://github.com/WhatAboutMyStar/LLM_ACTIVATION | https://doi.org/10.48550/arXiv.2502.20408 |
480 | DarwinLM: Evolutionary Structured Pruning of Large Language Models | Shengkun Tang, Oliver Sieberling, Eldar Kurtic, Zhiqiang Shen, Dan Alistarh | 2025-02-13 | arXiv | https://github.com/IST-DASLab/DarwinLM | https://doi.org/10.48550/arXiv.2502.07780 |
481 | RALLRec: Improving Retrieval Augmented Large Language Model Recommendation with Representation Learning | Jian Xu, Sichun Luo, Xiangyu Chen, Haoming Huang, Hanxu Hou, Linqi Song | 2025-02-12 | arXiv | https://github.com/JianXu95/RALLRec | https://doi.org/10.48550/arXiv.2502.06101 |
482 | LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM | Zhi Zhou, Kun-Yang Yu, Shi-Yu Tian, Xiao-Wen Yang, Jiang-Xin Shi, Pengxiao Song, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li | 2025-02-12 | arXiv | https://github.com/LAMDASZ-ML/Knowledge-Guide-Data-Generation | http://arxiv.org/abs/2502.06572v2 |
483 | Calibrating LLMs with Information-Theoretic Evidential Deep Learning | Yawei Li, David Rügamer, Bernd Bischl, Mina Rezaei | 2025-02-12 | arXiv | https://github.com/sandylaker/ib-edl | http://arxiv.org/abs/2502.06351v2 |
484 | Data Augmentation to Improve Large Language Models in Food Hazard and Product Detection | Areeg Fahad Rasheed, M. Zarkoosh, Shimam Amer Chasib, Safa F. Abbas | 2025-02-12 | arXiv | https://github.com/AREEG94FAHAD/food-hazard-prdouct-cls | https://doi.org/10.48550/arXiv.2502.08687 |
485 | Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation | Chengwen Qi, Ren Ma, Bowen Li, He Du, Binyuan Hui, Jinwang Wu, Yuanjun Laili, Conghui He | 2025-02-12 | arXiv | https://github.com/opendatalab/ProverGen | https://doi.org/10.48550/arXiv.2502.06563 |
486 | Systematic Outliers in Large Language Models | Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang | 2025-02-12 | arXiv | https://github.com/an-yongqi/systematic-outliers | https://doi.org/10.48550/arXiv.2502.06415 |
487 | Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models | Jiacong Xu, Shao-Yuan Lo, Bardia Safaei, Vishal M. Patel, Isht Dwivedi | 2025-02-12 | arXiv | https://xujiacong.github.io/Anomaly-OV/ | https://doi.org/10.48550/arXiv.2502.07601 |
488 | Time2Lang: Bridging Time-Series Foundation Models and Large Language Models for Health Sensing Beyond Prompting | Arvind Pillai, Dimitris Spathis, Subigya Nepal, Amanda C Collins, Daniel M Mackin, Michael V Heinz, Tess Z Griffin, Nicholas C Jacobson, Andrew Campbell | 2025-02-11 | arXiv | https://github.com/arvind1609/time2lang | http://arxiv.org/abs/2502.07608v3 |
489 | The foundational capabilities of large language models in predicting postoperative risks using clinical notes | Charles Alba, Bing Xue, Joanna Abraham, Thomas George Kannampallil, Chenyang Lu | 2025-02-11 | npj Digital Medicine | https://github.com/cja5553/LLMs_in_perioperative_care | https://doi.org/10.1038/s41746-025-01489-2 |
490 | Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining | Daouda Sow, Herbert Woisetschläger, Saikiran Bulusu, Shiqiang Wang, Hans-Arno Jacobsen, Yingbin Liang | 2025-02-10 | arXiv | https://github.com/sowmaster/Sample-Level-Loss-Reweighting-ICLR-2025 | https://doi.org/10.48550/arXiv.2502.06733 |
491 | LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights | Ze Sheng, Zhicheng Chen, Shuning Gu, Heqing Huang, Guofei Gu, Jeff Huang | 2025-02-10 | arXiv | https://github.com/OwenSanzas/LLM-For-Vulnerability-Detection | http://arxiv.org/abs/2502.07049v2 |
492 | HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models | Paul Darm, Annalisa Riccardi | 2025-02-09 | arXiv | https://github.com/PaulDrm/targeted_intervention | http://arxiv.org/abs/2502.05945v2 |
493 | Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators | Hritik Bansal, Pratyush Maini | 2025-02-09 | arXiv | https://pratyushmaini.github.io/blog/2024/risks-private-evals/ | http://arxiv.org/abs/2503.04756v1 |
494 | AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | Jiabin Tang, Tianyu Fan, Chao Huang | 2025-02-09 | arXiv | https://github.com/HKUDS/AutoAgent | http://arxiv.org/abs/2502.05957v2 |
495 | MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents | Jiabin Tang, Tianyu Fan, Chao Huang | 2025-02-09 | arXiv | https://github.com/HKUDS/MetaChain | http://arxiv.org/abs/2502.05957v1 |
496 | LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning | Hanqing Yang, Jingdi Chen, Marie Siew, Tania Lorido-Botran, Carlee Joe-Wong | 2025-02-08 | arXiv | https://happyeureka.github.io/damcs | http://arxiv.org/abs/2502.05453v1 |
497 | Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models | Sina Tayebati, Divake Kumar, Nastaran Darabi, Dinithi Jayasuriya, Ranganath Krishnan, Amit Ranjan Trivedi | 2025-02-08 | arXiv | https://github.com/sinatayebati/vlm-uncertainty | https://doi.org/10.48550/arXiv.2502.06884 |
498 | OntoTune: Ontology-Driven Self-training for Aligning Large Language Models | Zhiqiang Liu, Chengtao Gan, Junjie Wang, Yichi Zhang, Zhongpu Bo, Mengshu Sun, Huajun Chen, Wen Zhang | 2025-02-08 | arXiv | https://github.com/zjukg/OntoTune | https://doi.org/10.48550/arXiv.2502.05478 |
499 | ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning | Yuwei Yin, Giuseppe Carenini | 2025-02-07 | arXiv | https://github.com/YuweiYin/ARR | https://doi.org/10.48550/arXiv.2502.04689 |
500 | Confidence Elicitation: A New Attack Vector for Large Language Models | Brian Formento, Chuan Sheng Foo, See-Kiong Ng | 2025-02-07 | arXiv | https://github.com/Aniloid2/Confidence_Elicitation_Attacks | https://doi.org/10.48550/arXiv.2502.04643 |
501 | Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | Junde Wu, Jiayuan Zhu, Yuyuan Liu | 2025-02-07 | arXiv | https://github.com/theworldofagents/Agentic-Reasoning | http://arxiv.org/abs/2502.04644v1 |
502 | DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails | Yihe Deng, Yu Yang, Junkai Zhang, Wei Wang, Bo Li | 2025-02-07 | arXiv | https://github.com/yihedeng9/DuoGuard | http://arxiv.org/abs/2502.05163v1 |
503 | LLM-Supported Natural Language to Bash Translation | Finnian Westenfelder, Erik Hemberg, Miguel Tulla, Stephen Moskal, Una-May O'Reilly, Silviu Chiricescu | 2025-02-07 | arXiv | https://github.com/westenfelder/NL2SH | http://arxiv.org/abs/2502.06858v1 |
504 | QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Andrei Panferov, Jiale Chen, Soroush Tabesh, Roberto L. Castro, Mahdi Nikdan, Dan Alistarh | 2025-02-07 | arXiv | https://github.com/IST-DASLab/QuEST | http://arxiv.org/abs/2502.05003v1 |
505 | Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization | Yuanye Liu, Jiahang Xu, Li Lyna Zhang, Qi Chen, Xuan Feng, Yang Chen, Zhongxin Guo, Yuqing Yang, Peng Cheng | 2025-02-06 | arXiv | https://github.com/HenryLau7/CFPO | http://arxiv.org/abs/2502.04295v2 |
506 | ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam | 2025-02-06 | arXiv | https://github.com/Gen-Verse/ScoreFlow | http://arxiv.org/abs/2502.04306v1 |
507 | Robotouille: An Asynchronous Planning Benchmark for LLM Agents | Gonzalo Gonzalez-Pumariega, Leong Su Yean, Neha Sunkara, Sanjiban Choudhury | 2025-02-06 | arXiv | https://github.com/portal-cornell/robotouille | http://arxiv.org/abs/2502.05227v1 |
508 | My LLM might Mimic AAE -- But When Should it? | Sandra C. Sandoval, Christabel Acquaye, Kwesi Cobbina, Mohammad Nayeem Teli, Hal Daumé III | 2025-02-06 | arXiv | https://github.com/smelliecat/AAEMime | http://arxiv.org/abs/2502.04564v2 |
509 | CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Zehua Pei, Lancheng Zou, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu | 2025-02-06 | arXiv | https://github.com/JarvisPei/CMoE | http://arxiv.org/abs/2502.04416v1 |
510 | FAS: Fast ANN-SNN Conversion for Spiking Large Language Models | Long Chen, Xiaotian Song, Andy Song, BaDong Chen, Jiancheng Lv, Yanan Sun | 2025-02-06 | arXiv | https://github.com/lc783/FAS | https://doi.org/10.48550/arXiv.2502.04405 |
511 | Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers | Daniel Beaglehole, Adityanarayanan Radhakrishnan, Enric Boix-Adserà, Mikhail Belkin | 2025-02-06 | arXiv | https://github.com/dmbeaglehole/neural_controllers | http://arxiv.org/abs/2502.03708v1 |
512 | "Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence | Shaopeng Fu, Liang Ding, Di Wang | 2025-02-06 | arXiv | https://github.com/fshp971/adv-icl | http://arxiv.org/abs/2502.04204v1 |
513 | Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training | Changhao Jiang, Ming Zhang, Junjie Ye, Xiaoran Fan, Yifei Cao, Jiajun Sun, Zhiheng Xi, Shihan Dou, Yi Dong, Yujiong Shen, Jingqi Tong, Zhen Wang, Tao Liang, Zhihui Fei, Mingyang Wan, Guojun Ma, Qi Zhang, Tao Gui, Xuanjing Huang | 2025-02-06 | arXiv | https://github.com/yuhui1038/SMI | https://doi.org/10.48550/arXiv.2502.04066 |
514 | KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference | Xing Li, Zeyu Xing, Yiming Li, Linping Qu, Hui-Ling Zhen, Wulong Liu, Yiwu Yao, Sinno Jialin Pan, Mingxuan Yuan | 2025-02-06 | arXiv | https://github.com/cmd2001/KVTuner | http://arxiv.org/abs/2502.04420v1 |
515 | EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models | He Hu, Yucheng Zhou, Lianzhong You, Hongbo Xu, Qianning Wang, Zheng Lian, Fei Richard Yu, Fei Ma, Laizhong Cui | 2025-02-06 | arXiv | https://emo-gml.github.io/ | https://doi.org/10.48550/arXiv.2502.04424 |
516 | Tool Unlearning for Tool-Augmented LLMs | Jiali Cheng, Hadi Amiri | 2025-02-05 | arXiv | https://clu-uml.github.io/MU-Bench-Project-Page/ | http://arxiv.org/abs/2502.01083v1 |
517 | Preference Leakage: A Contamination Problem in LLM-as-a-judge | Dawei Li, Renliang Sun, Yue Huang, Ming Zhong, Bohan Jiang, Jiawei Han, Xiangliang Zhang, Wei Wang, Huan Liu | 2025-02-05 | arXiv | https://github.com/David-Li0406/Preference-Leakage | http://arxiv.org/abs/2502.01534v1 |
518 | Picky LLMs and Unreliable RMs: An Empirical Study on Safety Alignment after Instruction Tuning | Guanlin Li, Kangjie Chen, Shangwei Guo, Jie Zhang, Han Qiu, Chao Zhang, Guoyin Wang, Tianwei Zhang, Jiwei Li | 2025-02-05 | arXiv | https://github.com/GuanlinLee/llm_instruction_tuning | http://arxiv.org/abs/2502.01116v1 |
519 | PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design | Yuchao Wu, Xiaofei Yu, Hao Chen, Yang Luo, Yeyu Tong, Yuzhe Ma | 2025-02-05 | arXiv | https://github.com/PICDA/PICBench | http://arxiv.org/abs/2502.03159v1 |
520 | PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs | Mauricio Soroco, Jialin Song, Mengzhou Xia, Kye Emond, Weiran Sun, Wuyang Chen | 2025-02-05 | arXiv | https://pde-controller.github.io/ | http://arxiv.org/abs/2502.00963v1 |
521 | LLM-TA: An LLM-Enhanced Thematic Analysis Pipeline for Transcripts from Parents of Children with Congenital Heart Disease | Muhammad Zain Raza, Jiawei Xu, Terence Lim, Lily Boddy, Carlos M. Mery, Andrew Well, Ying Ding | 2025-02-05 | arXiv | https://github.com/jiaweixu98/LLM-TA | http://arxiv.org/abs/2502.01620v1 |
522 | Demystifying Long Chain-of-Thought Reasoning in LLMs | Edward Yeo, Yuxuan Tong, Morry Niu, Graham Neubig, Xiang Yue | 2025-02-05 | arXiv | https://github.com/eddycmu/demystify-long-cot | http://arxiv.org/abs/2502.03373v1 |
523 | A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs | Bradley P. Allen, Paul T. Groth | 2025-02-05 | arXiv | https://github.com/bradleypallen/trex-metalinguistic-disagreement | http://arxiv.org/abs/2502.02896v1 |
524 | SPRI: Aligning Large Language Models with Context-Situated Principles | Hongli Zhan, Muneeza Azmat, Raya Horesh, Junyi Jessy Li, Mikhail Yurochkin | 2025-02-05 | arXiv | https://github.com/honglizhan/SPRI-public | https://doi.org/10.48550/arXiv.2502.03397 |
525 | A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava | 2025-02-05 | arXiv | https://probabilistic-inference-scaling.github.io | http://arxiv.org/abs/2502.01618v2 |
526 | Knowledge Distillation from Large Language Models for Household Energy Modeling | Mohannad Takrouri, Nicolas M. Cuadrado, Martin Takáč | 2025-02-05 | arXiv | https://github.com/Singularity-AI-Lab/LLM-Energy-Knowledge-Distillation | https://doi.org/10.48550/arXiv.2502.03034 |
527 | Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models | Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Khan, Salman Khan | 2025-02-05 | arXiv | https://github.com/HashmatShadab/Robust-LLaVA | https://doi.org/10.48550/arXiv.2502.01576 |
528 | Internal Activation as the Polar Star for Steering Unsafe LLM Behavior | Peixuan Han, Cheng Qian, Xiusi Chen, Yuji Zhang, Denghui Zhang, Heng Ji | 2025-02-05 | arXiv | https://github.com/Hanpx20/SafeSwitch | http://arxiv.org/abs/2502.01042v2 |
529 | CTR-Driven Advertising Image Generation with Multimodal Large Language Models | Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yanyin Chen, Haohan Wang, Linkai Liu, Yaoyu Li, Jinyuan Zhao, Yu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang | 2025-02-05 | arXiv | https://github.com/Chenguoz/CAIG | https://doi.org/10.48550/arXiv.2502.06823 |
530 | Intent Representation Learning with Large Language Model for Recommendation | Yu Wang, Lei Sang, Yi Zhang, Yiwen Zhang | 2025-02-05 | arXiv | https://github.com/wangyu0627/IRLLRec | http://arxiv.org/abs/2502.03307v1 |
531 | AdaSVD: Adaptive Singular Value Decomposition for Large Language Models | Zhiteng Li, Mingyuan Xia, Jingyuan Zhang, Zheng Hui, Linghe Kong, Yulun Zhang, Xiaokang Yang | 2025-02-05 | arXiv | https://github.com/ZHITENGLI/AdaSVD | https://doi.org/10.48550/arXiv.2502.01403 |
532 | Do Large Language Model Benchmarks Test Reliability? | Joshua Vendrow, Edward Vendrow, Sara Beery, Aleksander Madry | 2025-02-05 | arXiv | https://github.com/MadryLab/platinum-benchmarks | https://doi.org/10.48550/arXiv.2502.03461 |
533 | Overcoming Vision Language Model Challenges in Diagram Understanding: A Proof-of-Concept with XML-Driven Large Language Models Solutions | Shue Shiinoki, Ryo Koshihara, Hayato Motegi, Masumi Morishige | 2025-02-05 | arXiv | https://github.com/galirage/spreadsheet-intelligence | https://doi.org/10.48550/arXiv.2502.04389 |
534 | Breaking Focus: Contextual Distraction Curse in Large Language Models | Yue Huang, Yanbo Wang, Zixiang Xu, Chujie Gao, Siyuan Wu, Jiayi Ye, Xiuying Chen, Pin-Yu Chen, Xiangliang Zhang | 2025-02-05 | arXiv | https://github.com/wyf23187/LLM_CDV | https://doi.org/10.48550/arXiv.2502.01609 |
535 | AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science | Chenyue Li, Wen Deng, Mengqian Lu, Binhang Yuan | 2025-02-05 | arXiv | https://github.com/Relaxed-System-Lab/AtmosSci-Bench | https://doi.org/10.48550/arXiv.2502.01159 |
536 | CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing | Wenhao Zheng, Yixiao Chen, Weitong Zhang, Souvik Kundu, Yun Li, Zhengzhong Liu, Eric P. Xing, Hongyi Wang, Huaxiu Yao | 2025-02-04 | arXiv | https://github.com/aiming-lab/CITER | https://doi.org/10.48550/arXiv.2502.01976 |
537 | AdaptBot: Combining LLM with Knowledge Graphs and Human Input for Generic-to-Specific Task Decomposition and Knowledge Refinement | Shivam Singh, Karthik Swaminathan, Nabanita Dash, Ramandeep Singh, Snehasis Banerjee, Mohan Sridharan, Madhava Krishna | 2025-02-04 | arXiv | https://sssshivvvv.github.io/adaptbot/ | http://arxiv.org/abs/2502.02067v1 |
538 | CognArtive: Large Language Models for Automating Art Analysis and Decoding Aesthetic Elements | Afshin Khadangi, Amir Sartipi, Igor Tchappi, Gilbert Fridgen | 2025-02-04 | arXiv | https://cognartive.github.io/ | https://doi.org/10.48550/arXiv.2502.04353 |
539 | Risk-Aware Driving Scenario Analysis with Large Language Models | Yuan Gao, Mattia Piccinini, Johannes Betz | 2025-02-04 | arXiv | https://github.com/yuangao-tum/Riskaware-Scenario-analyse | https://doi.org/10.48550/arXiv.2502.02145 |
540 | SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency | Qianhao Yuan, Yanjiang Liu, Yaojie Lu, Hongyu Lin, Ben He, Xianpei Han, Le Sun | 2025-02-04 | arXiv | https://github.com/icip-cas/SAISA | https://doi.org/10.48550/arXiv.2502.02458 |
541 | A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI) | Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han | 2025-02-04 | arXiv | https://github.com/AcademyCityL/GALI | http://arxiv.org/abs/2502.02659v1 |
542 | AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs | Hongxin Li, Jingfan Chen, Jingran Su, Yuntao Chen, Qing Li, Zhaoxiang Zhang | 2025-02-04 | arXiv | https://autogui-project.github.io/ | http://arxiv.org/abs/2502.01977v1 |
543 | Multi-Lingual Cyber Threat Detection in Tweets/X Using ML, DL, and LLM: A Comparative Analysis | Saydul Akbar Murad, Ashim Dahal, Nick Rahimi | 2025-02-04 | arXiv | https://github.com/Mmurrad/Tweet-Data-Classification | http://arxiv.org/abs/2502.04346v1 |
544 | RankFlow: A Multi-Role Collaborative Reranking Workflow Utilizing Large Language Models | Can Jin, Hongwu Peng, Anxiang Zhang, Nuo Chen, Jiahui Zhao, Xi Xie, Kuangzheng Li, Shuya Feng, Kai Zhong, Caiwen Ding, Dimitris N. Metaxas | 2025-02-03 | arXiv | https://github.com/jincan333/RankFlow | https://doi.org/10.48550/arXiv.2502.00709 |
545 | Progressive Binarization with Semi-Structured Pruning for LLMs | Xianglong Yan, Tianao Zhang, Zhiteng Li, Yulun Zhang | 2025-02-03 | arXiv | https://github.com/XIANGLONGYAN/PBS2P | http://arxiv.org/abs/2502.01705v1 |
546 | A Comprehensive Analysis on LLM-based Node Classification Algorithms | Xixi Wu, Yifei Shen, Fangzhou Ge, Caihua Shan, Yizhu Jiao, Xiangguo Sun, Hong Cheng | 2025-02-03 | arXiv | https://llmnodebed.github.io/ | http://arxiv.org/abs/2502.00829v1 |
547 | MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies | Ehsaneddin Asgari, Yassine El Kheir, Mohammad Ali Sadraei Javaheri | 2025-02-03 | arXiv | https://github.com/llm-lab-org/MorphBPE | http://arxiv.org/abs/2502.00894v1 |
548 | RTBAgent: A LLM-based Agent System for Real-Time Bidding | Leng Cai, Junxuan He, Yikai Li, Junjie Liang, Yuanping Lin, Ziming Quan, Yawen Zeng, Jin Xu | 2025-02-03 | arXiv | https://github.com/CaiLeng/RTBAgent | http://arxiv.org/abs/2502.00792v1 |
549 | UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models | Xin Xu, Qiyun Xu, Tong Xiao, Tianhao Chen, Yuchen Yan, Jiaxin Zhang, Shizhe Diao, Can Yang, Yang Wang | 2025-02-02 | arXiv | https://github.com/YangLabHKUST/UGPhysics | https://doi.org/10.48550/arXiv.2502.00334 |
550 | UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs | Yizhe Xiong, Wei Huang, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Zhenpeng Su, Jungong Han, Guiguang Ding | 2025-02-02 | arXiv | https://github.com/Bostoncake/UniAttn | http://arxiv.org/abs/2502.00439v1 |
551 | MetaOpenFOAM 2.0: Large Language Model Driven Chain of Thought for Automating CFD Simulation and Post-Processing | Yuxuan Chen, Xu Zhu, Hua Zhou, Zhuyin Ren | 2025-02-02 | arXiv | https://github.com/Terry-cyx/MetaOpenFOAM | https://doi.org/10.48550/arXiv.2502.00498 |
552 | LIBRA: Measuring Bias of Large Language Model from a Local Context | Bo Pang, Tingrui Qiao, Caroline Walker, Chris Cunningham, Yun Sing Koh | 2025-02-02 | arXiv | https://github.com/ipangbo/LIBRA | https://doi.org/10.48550/arXiv.2502.01679 |
553 | Differentially Private Steering for Large Language Model Alignment | Anmol Goel, Yaxi Hu, Iryna Gurevych, Amartya Sanyal | 2025-02-01 | arXiv | https://github.com/UKPLab/iclr2025-psa | https://doi.org/10.48550/arXiv.2501.18532 |
554 | Speculative Ensemble: Fast Large Language Model Ensemble via Speculation | Jiale Fu, Yuchu Jiang, Junkai Chen, Jiaming Fan, Xin Geng, Xu Yang | 2025-02-01 | arXiv | https://github.com/Kamichanw/Speculative-Ensemble/ | https://doi.org/10.48550/arXiv.2502.01662 |
555 | LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models | Shenghao Fu, Qize Yang, Qijie Mo, Junkai Yan, Xihan Wei, Jingke Meng, Xiaohua Xie, Wei-Shi Zheng | 2025-01-31 | arXiv | https://github.com/iSEE-Laboratory/LLMDet | https://doi.org/10.48550/arXiv.2501.18954 |
556 | Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation | Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu | 2025-01-31 | arXiv | https://github.com/git-disl/Virus | https://doi.org/10.48550/arXiv.2501.17433 |
557 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning | Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong | 2025-01-31 | arXiv | https://github.com/BaohaoLiao/RSD | http://arxiv.org/abs/2501.19324v1 |
558 | 2SSP: A Two-Stage Framework for Structured Pruning of LLMs | Fabrizio Sandri, Elia Cunegatti, Giovanni Iacca | 2025-01-31 | arXiv:2501.17771, 2025 | https://github.com/FabrizioSandri/2SSP | http://arxiv.org/abs/2501.17771v1 |
559 | ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation | Minghua He, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang | 2025-01-30 | arXiv | https://execoder4trans.github.io/ | https://doi.org/10.48550/arXiv.2501.18460 |
560 | Uncertainty Quantification and Decomposition for LLM-based Recommendation | Wonbin Kweon, Sanghwan Jang, SeongKu Kang, Hwanjo Yu | 2025-01-30 | arXiv:2501.17630, 2025 | https://github.com/WonbinKweon/UNC_LLM_REC_WWW2025 | http://arxiv.org/abs/2501.17630v1 |
561 | CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs | Jinlan Fu, Shenzhen Huangfu, Hao Fei, Xiaoyu Shen, Bryan Hooi, Xipeng Qiu, See-Kiong Ng | 2025-01-28 | arXiv | https://github.com/LVUGAI/CHiP | http://arxiv.org/abs/2501.16629v1 |
562 | xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking | Sunbowen Lee, Shiwen Ni, Chi Wei, Shuaimin Li, Liyang Fan, Ahmadreza Argha, Hamid Alinejad-Rokny, Ruifeng Xu, Yicheng Gong, Min Yang | 2025-01-28 | arXiv | https://github.com/Aegis1863/xJailbreak | http://arxiv.org/abs/2501.16727v2 |
563 | Large Language Model Critics for Execution-Free Evaluation of Code Changes | Aashish Yadavally, Hoan Nguyen, Laurent Callot, Gauthier Guinet | 2025-01-28 | arXiv | https://github.com/amazon-science/code-agent-eval | https://doi.org/10.48550/arXiv.2501.16655 |
564 | SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model | Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Hanyu Wang, Feiyu Xiong, Jason Zhaoxin Fan, Bo Tang, Shichao Song, Mengwei Wang, Jiawei Yang | 2025-01-28 | arXiv | https://github.com/IAAR-Shanghai/SafeRAG | https://doi.org/10.48550/arXiv.2501.18636 |
565 | AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models | Zheng Lian, Haoyu Chen, Lan Chen, Haiyang Sun, Licai Sun, Yong Ren, Zebang Cheng, Bin Liu, Rui Liu, Xiaojiang Peng, Jiangyan Yi, Jianhua Tao | 2025-01-27 | arXiv | https://github.com/zeroQiaoba/AffectGPT | https://doi.org/10.48550/arXiv.2501.16566 |
566 | Towards Evaluating and Building Versatile Large Language Models for Medicine | Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie | 2025-01-27 | arXiv | https://henrychur.github.io/MedS-Bench/ | https://doi.org/10.48550/arXiv.2408.12547 |
567 | LCTG Bench: LLM Controlled Text Generation Benchmark | Kentaro Kurihara, Masato Mita, Peinan Zhang, Shota Sasaki, Ryosuke Ishigami, Naoaki Okazaki | 2025-01-27 | arXiv | https://github.com/CyberAgentAILab/LCTG-Bench | http://arxiv.org/abs/2501.15875v1 |
568 | TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs | Yuxuan Gu, Wuyang Zhou, Giorgos Iacovides, Danilo Mandic | 2025-01-26 | arXiv | https://github.com/guyuxuan9/TensorLLM | http://arxiv.org/abs/2501.15674v1 |
569 | Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Hulingxiao He, Geng Li, Zijun Geng, Jinglin Xu, Yuxin Peng | 2025-01-25 | arXiv | https://github.com/PKU-ICST-MIPL/Finedefics_ICLR2025 | https://doi.org/10.48550/arXiv.2501.15140 |
570 | PIP: Perturbation-based Iterative Pruning for Large Language Models | Yi Cao, Wei-Jie Xu, Yucheng Shen, Weijie Shi, Chi-Min Chan, Jiajie Xu | 2025-01-25 | arXiv | https://github.com/caoyiiiiii/PIP | https://doi.org/10.48550/arXiv.2501.15278 |
571 | MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models | Zhongpu Chen, Yinfeng Liu, Long Shi, Zhi-Jie Wang, Xingyan Chen, Yu Zhao, Fuji Ren | 2025-01-25 | arXiv | https://github.com/SWUFE-DB-Group/MDEval-Benchmark | https://doi.org/10.48550/arXiv.2501.15000 |
572 | A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models | Zhongzhan Huang, Shanshan Zhong, Pan Zhou, Shanghua Gao, Marinka Zitnik, Liang Lin | 2025-01-25 | arXiv | https://lotbench.github.io | https://doi.org/10.48550/arXiv.2501.15147 |
573 | UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models | Xin Xu, Jiaxin Zhang, Tianhao Chen, Zitong Chao, Jishan Hu, Can Yang | 2025-01-24 | arXiv | https://github.com/YangLabHKUST/UGMathBench | https://doi.org/10.48550/arXiv.2501.13766 |
574 | MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications | Yixing Jiang, Kameron C. Black, Gloria Geng, Danny Park, Andrew Y. Ng, Jonathan H. Chen | 2025-01-24 | arXiv | https://github.com/stanfordmlgroup/MedAgentBench | http://arxiv.org/abs/2501.14654v1 |
575 | Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation | Sadegh Mahdavi, Muchen Li, Kaiwen Liu, Christos Thrampoulidis, Leonid Sigal, Renjie Liao | 2025-01-24 | arXiv | https://github.com/DSL-Lab/aops | http://arxiv.org/abs/2501.14275v1 |
576 | DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing | Xinyu Ma, Yifeng Xu, Yang Lin, Tianlong Wang, Xu Chu, Xin Gao, Junfeng Zhao, Yasha Wang | 2025-01-24 | arXiv | https://github.com/ArthurLeoM/DRESS-LLM | http://arxiv.org/abs/2501.14371v1 |
577 | MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents | Yixing Jiang, Kameron C. Black, Gloria Geng, Danny Park, James Zou, Andrew Y. Ng, Jonathan H. Chen | 2025-01-24 | arXiv | https://github.com/stanfordmlgroup/MedAgentBench | http://arxiv.org/abs/2501.14654v2 |
578 | FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration | Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu | 2025-01-24 | arXiv | https://github.com/FireRedTeam/FireRedASR | http://arxiv.org/abs/2501.14350v1 |
579 | Evaluating and Improving Graph to Text Generation with Large Language Models | Jie He, Yijun Yang, Wanqiu Long, Deyi Xiong, Víctor Gutiérrez-Basulto, Jeff Z. Pan | 2025-01-24 | arXiv | https://github.com/probe2/kg_text | https://doi.org/10.48550/arXiv.2501.14497 |
580 | Can Large Language Models Understand Preferences in Personalized Recommendation? | Zhaoxuan Tan, Zinan Zeng, Qingkai Zeng, Zhenyu Wu, Zheyuan Liu, Fengran Mo, Meng Jiang | 2025-01-24 | arXiv | https://github.com/TamSiuhin/PerRecBench | https://doi.org/10.48550/arXiv.2501.13391 |
581 | JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models | Michael K. Chen, Xikun Zhang, Dacheng Tao | 2025-01-24 | arXiv | https://github.com/michaelchen-lab/JustLogic | https://doi.org/10.48550/arXiv.2501.14851 |
582 | Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models | Bo Gao, Michael W. Spratling | 2025-01-24 | arXiv | https://github.com/iminfine/freeatten | https://doi.org/10.48550/arXiv.2501.13428 |
583 | Do as We Do, Not as You Think: the Conformity of Large Language Models | Zhiyuan Weng, Guikun Chen, Wenguan Wang | 2025-01-24 | arXiv | https://github.com/Zhiyuan-Weng/BenchForm | https://doi.org/10.48550/arXiv.2501.13381 |
584 | Distillation Quantification for Large Language Models | Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, Sirui He, Jiaheng Liu, Min Yang, Zhoufutu Wen, Shiwen Ni | 2025-01-23 | arXiv | https://github.com/Aegis1863/LLMs-Distillation-Quantification | https://doi.org/10.48550/arXiv.2501.12619 |
585 | OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting | Xing Hu, Yuan Cheng, Dawei Yang, Zukang Xu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang, Sifan Zhou | 2025-01-23 | arXiv | https://github.com/BrotherHappy/OSTQuant | https://doi.org/10.48550/arXiv.2501.13987 |
586 | Low-Rank Adapters Meet Neural Architecture Search for LLM Compression | J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain | 2025-01-23 | arXiv | https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning | http://arxiv.org/abs/2501.16372v1 |
587 | LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps | Andrey Palaev, Adil Khan, Syed M. Ahsan Kazmi | 2025-01-23 | arXiv | https://github.com/Palandr123/DiffusionU-NetLLM | http://arxiv.org/abs/2501.14046v1 |
588 | Quantification of Large Language Model Distillation | Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, Sirui He, Haihong Wu, Tianci Liu, Jiaheng Liu, Hamid Alinejad-Rokny, Min Yang, Yitao Liang, Zhoufutu Wen, Shiwen Ni | 2025-01-22 | arXiv | https://github.com/Aegis1863/LLMs-Distillation-Quantification | http://arxiv.org/abs/2501.12619v3 |
589 | A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models | Qinggang Zhang, Shengyuan Chen, Yuanchen Bei, Zheng Yuan, Huachi Zhou, Zijin Hong, Junnan Dong, Hao Chen, Yi Chang, Xiao Huang | 2025-01-21 | arXiv | https://github.com/DEEP-PolyU/Awesome-GraphRAG | https://doi.org/10.48550/arXiv.2501.13958 |
590 | VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model | Xianwei Zhuang, Yuxin Xie, Yufan Deng, Liming Liang, Jinghan Ru, Yuguo Yin, Yuexian Zou | 2025-01-21 | arXiv | https://vargpt-1.github.io/ | https://doi.org/10.48550/arXiv.2501.12327 |
591 | EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | Zhili Cheng, Yuge Tu, Ran Li, Shiqi Dai, Jinyi Hu, Shengding Hu, Jiahao Li, Yang Shi, Tianyu Yu, Weize Chen, Lei Shi, Maosong Sun | 2025-01-21 | arXiv | https://github.com/thunlp/EmbodiedEval | http://arxiv.org/abs/2501.11858v1 |
592 | Can open source large language models be used for tumor documentation in Germany? - An evaluation on urological doctors' notes | Stefan Lenz, Arsenij Ustjanzew, Marco Jeray, Meike Ressing, Torsten Panholzer | 2025-01-21 | arXiv | https://github.com/stefan-m-lenz/UroLlmEval | https://doi.org/10.48550/arXiv.2501.12106 |
593 | Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution | Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, Chao Dong | 2025-01-20 | arXiv | https://depictqa.github.io/deqa-score/ | https://doi.org/10.48550/arXiv.2501.11561 |
594 | Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy | Saeid Asgari Taghanaki, Joao Monteiro | 2025-01-20 | arXiv | https://github.com/asgsaeid/EQT | http://arxiv.org/abs/2501.11721v1 |
595 | Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference | Pouya Hamadanian, Sadjad Fouladi | 2025-01-20 | arXiv | https://github.com/microsoft/glinthawk | http://arxiv.org/abs/2501.11779v1 |
596 | ChaosEater: Fully Automating Chaos Engineering with Large Language Models | Daisuke Kikuta, Hiroki Ikeuchi, Kengo Tajiri, Yuusuke Nakano | 2025-01-19 | arXiv | https://ntt-dkiku.github.io/chaos-eater | https://doi.org/10.48550/arXiv.2501.11107 |
597 | InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models | Jing Ding, Kai Feng, Binbin Lin, Jiarui Cai, Qiushi Wang, Yu Xie, Xiaojin Zhang, Zhongyu Wei, Wei Chen | 2025-01-19 | arXiv | https://github.com/HaileyFamo/InsQABench | https://doi.org/10.48550/arXiv.2501.10943 |
598 | Control LLM: Controlled Evolution for Intelligence Retention in LLM | Haichao Wei, Yunxiang Ren, Zhoutong Fu, Aman Lunia, Yi-Lin Chen, Alice Leung, Ya Xu | 2025-01-19 | arXiv | https://github.com/linkedin/ControlLLM | http://arxiv.org/abs/2501.10979v1 |
599 | LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Kyeongha Rho, Hyeongkeun Lee, Valentio Iverson, Joon Son Chung | 2025-01-18 | arXiv:2501.09291, 2025 | https://github.com/NAVER-INTEL-Co-Lab/gaudi-lavcap | http://arxiv.org/abs/2501.09291v1 |
600 | PaSa: An LLM Agent for Comprehensive Academic Paper Search | Yichen He, Guanhua Huang, Peiyuan Feng, Yuan Lin, Yuchen Zhang, Hang Li, Weinan E | 2025-01-17 | arXiv | https://github.com/bytedance/pasa | http://arxiv.org/abs/2501.10120v1 |
601 | Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design | Zhi Zheng, Zhuoliang Xie, Zhenkun Wang, Bryan Hooi | 2025-01-17 | arXiv:2501.08603, 2025 | https://github.com/zz1358m/MCTS-AHD-master | http://arxiv.org/abs/2501.08603v2 |
602 | When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis | Ruixuan Zhang, Beichen Wang, Juexiao Zhang, Zilin Bian, Chen Feng, Kaan Ozbay | 2025-01-17 | arXiv | https://github.com/ai4ce/SeeUnsafe | https://doi.org/10.48550/arXiv.2501.10604 |
603 | FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Kartik Narayan, Vibashan VS, Vishal M. Patel | 2025-01-17 | arXiv | https://kartik-3004.github.io/facexbench/ | http://arxiv.org/abs/2501.10360v1 |
604 | PokerBench: Training Large Language Models to become Professional Poker Players | Richard Zhuang, Akshat Gupta, Richard Yang, Aniket Rahane, Zhengyu Li, Gopala Anumanchipalli | 2025-01-16 | arXiv | https://github.com/pokerllm/pokerbench | https://doi.org/10.48550/arXiv.2501.08328 |
605 | LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Hongyu Li, Jinyu Chen, Ziyu Wei, Shaofei Huang, Tianrui Hui, Jialin Gao, Xiaoming Wei, Si Liu | 2025-01-16 | arXiv | https://github.com/appletea233/LLaVA-ST | https://doi.org/10.48550/arXiv.2501.08282 |
606 | Gandalf the Red: Adaptive Security for LLMs | Niklas Pfister, Václav Volhejn, Manuel Knott, Santiago Arias, Julia Bazińska, Mykhailo Bichurin, Alan Commike, Janet Darling, Peter Dienes, Matthew Fiedler, David Haber, Matthias Kraft, Marco Lancini, Max Mathys, Damián Pascual-Ortiz, Jakub Podolak, Adrià Romero-López, Kyriacos Shiarlis, Andreas Signer, Zsolt Terek, Athanasios Theocharis, Daniel Timbrell, Samuel Trautwein, Samuel Watts, Natalie Wu, Mateo Rojas-Carulla | 2025-01-16 | arXiv | https://github.com/lakeraai/dsec-gandalf | http://arxiv.org/abs/2501.07927v1 |
607 | CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation | Jinjun Peng, Leyi Cui, Kele Huang, Junfeng Yang, Baishakhi Ray | 2025-01-16 | arXiv:2501.08200, 2025 | https://github.com/Co1lin/CWEval | http://arxiv.org/abs/2501.08200v1 |
608 | Multilingual LLMs Struggle to Link Orthography and Semantics in Bilingual Word Processing | Eshaan Tanwar, Gayatri Oke, Tanmoy Chakraborty | 2025-01-16 | arXiv:2501.09127, 2025 | https://github.com/EshaanT/Bilingual_processing_LLMs | http://arxiv.org/abs/2501.09127v1 |
609 | OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training | Yijiong Yu, Ziyun Dai, Zekun Wang, Wei Wang, Ran Chen, Ji Pei | 2025-01-16 | arXiv | https://github.com/yuyijiong/fineweb-edu-chinese | http://arxiv.org/abs/2501.08197v1 |
610 | Automated Retrosynthesis Planning of Macromolecules Using Large Language Models and Knowledge Graphs | Qinyu Ma, Yuhao Zhou, Jianfeng Li | 2025-01-15 | Macromol. Rapid Commun. 2025, 2500065 | https://github.com/QinyuMa316/RetroSynthesisAgent | http://arxiv.org/abs/2501.08897v2 |
611 | LAMS: LLM-Driven Automatic Mode Switching for Assistive Teleoperation | Yiran Tao, Jehan Yang, Dan Ding, Zackory Erickson | 2025-01-15 | arXiv | https://lams-assistance.github.io/ | http://arxiv.org/abs/2501.08558v1 |
612 | The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities | Irina Bigoulaeva, Harish Tayyar Madabushi, Iryna Gurevych | 2025-01-15 | arXiv | https://github.com/UKPLab/arxiv2025-inherent-limits-plms | http://arxiv.org/abs/2501.08716v1 |
613 | A Roadmap to Guide the Integration of LLMs in Hierarchical Planning | Israel Puerta-Merino, Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares | 2025-01-14 | arXiv | https://llmforplanning.github.io | http://arxiv.org/abs/2501.08068v1 |
614 | Lifelong Learning of Large Language Model based Agents: A Roadmap | Junhao Zheng, Chengming Shi, Xidi Cai, Qiuke Li, Duzhen Zhang, Chenxing Li, Dong Yu, Qianli Ma | 2025-01-13 | arXiv | https://github.com/qianlima-lab/awesome-lifelong-llm-agent | https://doi.org/10.48550/arXiv.2501.07278 |
615 | SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training | Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu | 2025-01-12 | arXiv | https://github.com/TianjinYellow/SPAM-Optimizer | http://arxiv.org/abs/2501.06842v1 |
616 | ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning | Xiangru Tang, Tianyu Hu, Muyang Ye, Yanjun Shao, Xunjian Yin, Siru Ouyang, Wangchunshu Zhou, Pan Lu, Zhuosheng Zhang, Yilun Zhao, Arman Cohan, Mark Gerstein | 2025-01-11 | arXiv | https://github.com/gersteinlab/chemagent | https://doi.org/10.48550/arXiv.2501.06590 |
617 | SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution | Chengxing Xie, Bowen Li, Chang Gao, He Du, Wai Lam, Difan Zou, Kai Chen | 2025-01-11 | arXiv | https://github.com/InternLM/SWE-Fixer | http://arxiv.org/abs/2501.05040v1 |
618 | FairCode: Evaluating Social Bias of LLMs in Code Generation | Yongkang Du, Jen-tse Huang, Jieyu Zhao, Lu Lin | 2025-01-11 | arXiv:2501.05396, 2025 | https://github.com/YongkDu/FairCode | http://arxiv.org/abs/2501.05396v1 |
619 | ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun | 2025-01-11 | arXiv | https://github.com/thunlp/ChartCoder | https://doi.org/10.48550/arXiv.2501.06598 |
620 | Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models | Qingyu Ren, Jie Zeng, Qianyu He, Jiaqing Liang, Yanghua Xiao, Weikang Zhou, Zeye Sun, Fei Yu | 2025-01-11 | arXiv | https://github.com/Rainier-rq/FollowSoftConstraints | https://doi.org/10.48550/arXiv.2501.04945 |
621 | Demystifying Domain-adaptive Post-training for Financial LLMs | Zixuan Ke, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty | 2025-01-11 | arXiv | https://github.com/SalesforceAIResearch/FinDap | http://arxiv.org/abs/2501.04961v1 |
622 | HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers | Yiyao Yang, Fu Teng, Pengju Liu, Mengnan Qi, Chenyang Lv, Ji Li, Xuhong Zhang, Zhezhi He | 2025-01-11 | arXiv | https://github.com/Intelligent-Computing-Research-Group/HaVen | http://arxiv.org/abs/2501.04908v1 |
623 | Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models | You Li, Heyu Huang, Chi Chen, Kaiyu Huang, Chao Huang, Zonghao Guo, Zhiyuan Liu, Jinan Xu, Yuhua Li, Ruixuan Li, Maosong Sun | 2025-01-10 | arXiv | https://migician-vg.github.io/ | https://doi.org/10.48550/arXiv.2501.05767 |
624 | ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events | Duygu Sezen Islakoglu, Jan-Christoph Kalo | 2025-01-10 | arXiv | https://github.com/duyguislakoglu/chronosense | https://doi.org/10.48550/arXiv.2501.03040 |
625 | Environmental large language model Evaluation (ELLE) dataset: A Benchmark for Evaluating Generative AI applications in Eco-environment Domain | Jing Guo, Nan Li, Ming Xu | 2025-01-10 | arXiv | https://github.com/CEEAI/elle | https://doi.org/10.48550/arXiv.2501.06277 |
626 | LLM4SR: A Survey on Large Language Models for Scientific Research | Ziming Luo, Zonglin Yang, Zexin Xu, Wei Yang, Xinya Du | 2025-01-10 | arXiv | https://github.com/du-nlp-lab/LLM4SR | https://doi.org/10.48550/arXiv.2501.04306 |
627 | MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | Qian Chen, Yafeng Chen, Yanni Chen, Mengzhe Chen, Yingda Chen, Chong Deng, Zhihao Du, Ruize Gao, Changfeng Gao, Zhifu Gao, Yabin Li, Xiang Lv, Jiaqing Liu, Haoneng Luo, Bin Ma, Chongjia Ni, Xian Shi, Jialong Tang, Hui Wang, Hao Wang, Wen Wang, Yuxuan Wang, Yunlan Xu, Fan Yu, Zhijie Yan, Yexin Yang, Baosong Yang, Xian Yang, Guanrou Yang, Tianyu Zhao, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Pei Zhang, Chong Zhang, Jinren Zhou | 2025-01-10 | arXiv | https://funaudiollm.github.io/minmo | https://doi.org/10.48550/arXiv.2501.06282 |
628 | FlairGPT: Repurposing LLMs for Interior Designs | Gabrielle Littlefair, Niladri Shekhar Dutt, Niloy J. Mitra | 2025-01-10 | arXiv:2501.04648, 2025 | https://flairgpt.github.io/ | http://arxiv.org/abs/2501.04648v1 |
629 | Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation | Xiao Wang, Fuling Wang, Haowen Wang, Bo Jiang, Chuanfu Li, Yaowei Wang, Yonghong Tian, Jin Tang | 2025-01-09 | arXiv | https://github.com/Event-AHU/Medical_Image_Analysis | http://arxiv.org/abs/2501.03458v1 |
630 | LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases | Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik, Viren Bajaj, Zeya Ahmad | 2025-01-06 | arXiv | https://github.com/cvs-health/langfair | https://doi.org/10.48550/arXiv.2501.03112 |
631 | BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning | Beichen Zhang, Yuhong Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Haodong Duan, Yuhang Cao, Dahua Lin, Jiaqi Wang | 2025-01-06 | arXiv | https://github.com/beichenzbc/BoostStep | https://doi.org/10.48550/arXiv.2501.03226 |
632 | Visual Large Language Models for Generalized and Specialized Applications | Yifan Li, Zhixin Lai, Wentao Bao, Zhen Tan, Anh Dao, Kewei Sui, Jiayi Shen, Dong Liu, Huan Liu, Yu Kong | 2025-01-06 | arXiv | https://github.com/JackYFL/awesome-VLLMs | https://doi.org/10.48550/arXiv.2501.02765 |
633 | CALM: Curiosity-Driven Auditing for Large Language Models | Xiang Zheng, Longxiang Wang, Yi Liu, Xingjun Ma, Chao Shen, Cong Wang | 2025-01-06 | arXiv | https://github.com/x-zheng16/CALM | https://doi.org/10.48550/arXiv.2501.02997 |
634 | HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs | Saleh Ashkboos, Mahdi Nikdan, Soroush Tabesh, Roberto L. Castro, Torsten Hoefler, Dan Alistarh | 2025-01-05 | arXiv | https://github.com/IST-DASLab/HALO | http://arxiv.org/abs/2501.02625v2 |
635 | Multi-LLM Collaborative Caption Generation in Scientific Documents | Jaeyoung Kim, Jongho Lee, Hong-Jun Choi, Ting-Yao Hsu, Chieh-Yang Huang, Sungchul Kim, Ryan Rossi, Tong Yu, Clyde Lee Giles, Ting-Hao 'Kenneth' Huang, Sungchul Choi | 2025-01-05 | arXiv | https://github.com/teamreboott/MLBCAP | http://arxiv.org/abs/2501.02552v1 |
636 | MIRAGE: Exploring How Large Language Models Perform in Complex Social Interactive Environments | Cai Yin, Zhouhong Gu, Du Zhaohan, Ye Zheyu, Cao Shaosheng, Xu Yiqian, Feng Hongwei, Chen Ping | 2025-01-04 | arXiv | https://github.com/lime728/MIRAGE | https://doi.org/10.48550/arXiv.2501.01652 |
637 | Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities | Tara Radvand, Mojtaba Abdolmaleki, Mohamed Mostagir, Ambuj Tewari | 2025-01-04 | arXiv | https://github.com/TaraRadvand74/llm-text-detection | http://arxiv.org/abs/2501.02406v2 |
638 | Aligning Large Language Models for Faithful Integrity Against Opposing Argument | Yong Zhao, Yang Deng, See-Kiong Ng, Tat-Seng Chua | 2025-01-04 | arXiv | https://github.com/zhaoy777/AFICE | https://doi.org/10.48550/arXiv.2501.01336 |
639 | UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility | Yonglin Tian, Fei Lin, Yiduo Li, Tengchao Zhang, Qiyao Zhang, Xuan Fu, Jun Huang, Xingyuan Dai, Yutong Wang, Chunwei Tian, Bai Li, Yisheng Lv, Levente Kovács, Fei-Yue Wang | 2025-01-04 | arXiv | https://github.com/Hub-Tian/UAVs_Meet_LLMs | http://arxiv.org/abs/2501.02341v1 |
640 | REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models | Jian Hu | 2025-01-04 | arXiv | https://github.com/OpenRLHF/OpenRLHF | https://doi.org/10.48550/arXiv.2501.03262 |
641 | Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap | Weizhi Zhang, Yuanchen Bei, Liangwei Yang, Henry Peng Zou, Peilin Zhou, Aiwei Liu, Yinghui Li, Hao Chen, Jianling Wang, Yu Wang, Feiran Huang, Sheng Zhou, Jiajun Bu, Allen Lin, James Caverlee, Fakhri Karray, Irwin King, Philip S. Yu | 2025-01-04 | arXiv | https://github.com/YuanchenBei/Awesome-Cold-Start-Recommendation | https://doi.org/10.48550/arXiv.2501.01945 |
642 | Text Clustering as Classification with LLMs | Chen Huang, Guoxiu He | 2025-01-04 | Available at SSRN 5081002 | https://github.com/ECNU-Text-Computing/Text-Clustering-via-LLM | http://arxiv.org/abs/2410.00927v2 |
643 | Instruction-Following Evaluation for Large Language Models | Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou | 2025-01-03 | arXiv | https://github.com/google-research/google-research/tree/master/instruction_following_eval | https://doi.org/10.48550/arXiv.2311.07911 |
644 | FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze | 2025-01-03 | arXiv | https://github.com/flashinfer-ai/flashinfer | http://arxiv.org/abs/2501.01005v1 |
645 | Labels Generated by Large Language Model Helps Measuring People's Empathy in Vitro | Md. Rakibul Hasan, Yue Yao, Md. Zakir Hossain, Aneesh Krishna, Imre Rudas, Shafin Rahman, Tom Gedeon | 2025-01-02 | arXiv | https://github.com/hasan-rakibul/LLMPathy | https://doi.org/10.48550/arXiv.2501.00691 |
646 | Aligning LLMs with Domain Invariant Reward Models | David Wu, Sanjiban Choudhury | 2025-01-02 | arXiv:2501.00911, 2025 | https://github.com/portal-cornell/dial | http://arxiv.org/abs/2501.00911v1 |
647 | Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models | Anmol Reddy Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid A. Hasan, Elita A. Lobo | 2025 | arXiv | https://github.com/molereddy/Alternate-Preference-Optimization | https://doi.org/10.48550/arXiv.2409.13474 |
648 | Surveillance Video-and-Language Understanding: from Small to Large Multimodal Models | Tongtong Yuan, Xuange Zhang, Bo Liu, Kun Liu, Jian Jin, Zhenzhen Jiao | 2025 | IEEE Transactions on Circuits and Systems for Video Technology | https://xuange923.github.io/Surveillance-Video-Understanding | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10681489 |
649 | LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework | Yiran Qiao, Xiang Ao, Yang Liu, Jiarong Xu, Xiaoqian Sun, Qing He | 2025 | arXiv | https://github.com/QiaoYRan/LOGIN | https://doi.org/10.48550/arXiv.2405.13902 |
650 | Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks? | Zhongjian Zhang, Xiao Wang, Huichi Zhou, Yue Yu, Mengmei Zhang, Cheng Yang, Chuan Shi | 2025 | arXiv | https://github.com/zhongjian-zhang/LLM4RGNN | https://doi.org/10.48550/arXiv.2408.08685 |
651 | TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning | Xiang Li, Yunshi Lan, Chao Yang | 2025 | arXiv | https://github.com/Ashura5/TreeEval | https://doi.org/10.48550/arXiv.2402.13125 |
652 | Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Xiaoshuang Huang, Lingdong Shen, Jia Liu, Fangxin Shang, Hongxiang Li, Haifeng Huang, Yehui Yang | 2025 | AAAI | https://github.com/ShawnHuang497/MedPLIB | https://doi.org/10.1609/aaai.v39i4.32394 |
653 | Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval | Guangyuan Ma, Yongliang Ma, Xing Wu, Zhenpeng Su, Ming Zhou, Songlin Hu | 2025 | arXiv | https://github.com/tdro-llm/tdro | https://doi.org/10.48550/arXiv.2408.10613 |
654 | SS-GEN: A Social Story Generation Framework with Large Language Models | Yi Feng, Mingyang Song, Jiaqi Wang, Zhuang Chen, Guanqun Bi, Minlie Huang, Liping Jing, Jian Yu | 2025 | AAAI | https://github.com/MIMIFY/SS-GEN | https://doi.org/10.1609/aaai.v39i2.32119 |
655 | SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models | Muxi Diao, Rumei Li, Shiyang Liu, Guogang Liao, Jingang Wang, Xunliang Cai, Weiran Xu | 2025 | arXiv | https://SEAS-LLM.github.io/ | https://doi.org/10.48550/arXiv.2408.02632 |
656 | Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework | Jiandong Jin, Xiao Wang, Qian Zhu, Haiyang Wang, Chenglong Li | 2025 | arXiv | https://github.com/Event-AHU/OpenPAR | https://doi.org/10.48550/arXiv.2408.09720 |
657 | PAT: Pruning-Aware Tuning for Large Language Models | Yijiang Liu, Huanrui Yang, Youxin Chen, Rongyu Zhang, Miao Wang, Yuan Du, Li Du | 2025 | arXiv | https://github.com/kriskrisliu/PAT_Pruning-Aware-Tuning | https://doi.org/10.48550/arXiv.2408.14721 |
658 | One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models | Yutao Zhu, Zhaoheng Huang, Zhicheng Dou, Ji-Rong Wen | 2025 | arXiv | https://github.com/DaoD/SPRING/ | https://doi.org/10.48550/arXiv.2405.19670 |
659 | NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning | Xin Yi, Shunfan Zheng, Linlin Wang, Gerard de Melo, Xiaoling Wang, Liang He | 2025 | AAAI | https://github.com/xinykou/NLSR | https://doi.org/10.1609/aaai.v39i24.34762 |
660 | MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing | Hao Zhou, Zhijun Wang, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Weihua Luo, Jiajun Chen | 2025 | arXiv | https://github.com/zjwang21/MoE-LPR | https://doi.org/10.48550/arXiv.2408.11396 |
661 | CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding? | Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma | 2025 | arXiv | https://github.com/CodeLLM-Research/CodeJudge-Eval | https://doi.org/10.48550/arXiv.2408.10718 |
662 | Mitigating Social Bias in Large Language Models: A Multi-Objective Approach Within a Multi-Agent Framework | Zhenjie Xu, Wenqing Chen, Yi Tang, Xuanying Li, Cheng Hu, Zhixuan Chu, Kui Ren, Zibin Zheng, Zhichao Lu | 2025 | AAAI | https://github.com/Cortantse/MOMA | https://doi.org/10.1609/aaai.v39i24.34748 |
663 | Medical MLLM Is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models | Xijie Huang, Xinyuan Wang, Hantao Zhang, Yinghao Zhu, Jiawen Xi, Jingkun An, Hao Wang, Hao Liang, Chengwei Pan | 2025 | AAAI | https://github.com/dirtycomputer/O2M_attack | https://doi.org/10.1609/aaai.v39i4.32396 |
664 | MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector | Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang | 2025 | arXiv | https://github.com/wjfu99/MIA-Tuner | https://doi.org/10.48550/arXiv.2408.08661 |
665 | LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Qidong Liu, Xian Wu, Wanyu Wang, Yejing Wang, Yuanshao Zhu, Xiangyu Zhao, Feng Tian, Yefeng Zheng | 2025 | AAAI | https://github.com/Applied-Machine-Learning-Lab/LLMEmb | https://doi.org/10.1609/aaai.v39i11.33327 |
666 | LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application | Jian Jia, Yipei Wang, Yan Li, Honggang Chen, Xuehan Bai, Zhaocheng Liu, Jian Liang, Quan Chen, Han Li, Peng Jiang, Kun Gai | 2025 | AAAI | https://github.com/adxcreative/LEARN | https://doi.org/10.1609/aaai.v39i11.33291 |
667 | Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models | Chenhui Hu, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao | 2025 | arXiv | https://github.com/ChenhuiHu/knowledge_in_superposition | https://doi.org/10.48550/arXiv.2408.07413 |
668 | ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation | Mengyang Wu, Yuzhi Zhao, Jialun Cao, Mingjie Xu, Zhongming Jiang, Xuehui Wang, Qinbin Li, Guangneng Hu, Shengchao Qin, Chi-Wing Fu | 2025 | AAAI | https://github.com/zhaoyuzhi/ICM-Assistant | https://doi.org/10.1609/aaai.v39i8.32908 |
669 | IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities | Bin Wang, Chunyu Xie, Dawei Leng, Yuhui Yin | 2025 | arXiv | https://github.com/360CVGroup/Inner-Adaptor-Architecture | https://doi.org/10.48550/arXiv.2408.12902 |
670 | Geolocation Representation from Large Language Models are Generic Enhancers for Spatio-Temporal Learning | Junlin He, Tong Nie, Wei Ma | 2025 | arXiv | https://github.com/Umaruchain/LLMGeovec | https://doi.org/10.48550/arXiv.2408.12116 |
671 | Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models | Weihao Ye, Qiong Wu, Wenhao Lin, Yiyi Zhou | 2025 | arXiv | https://github.com/ywh187/FitPrune | https://doi.org/10.48550/arXiv.2409.10197 |
672 | Awakening Augmented Generation: Learning to Awaken Internal Knowledge of Large Language Models for Question Answering | Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Shengping Liu, Kang Liu, Jun Zhao | 2025 | COLING | https://github.com/Xnhyacinth/IAG | https://aclanthology.org/2025.coling-main.89/ |
673 | QuickLLaMA: Query-aware Inference Acceleration for Large Language Models | Jingyao Li, Han Shi, Sitong Wu, Chuanyang Zheng, Zhenguo Li, Xin Jiang, Hong Xu, Jiaya Jia | 2025 | COLING | https://github.com/dvlab-research/Q-LLM | https://aclanthology.org/2025.coling-main.34/ |
674 | Distilling Rule-based Knowledge into Large Language Models | Wenkai Yang, Yankai Lin, Jie Zhou, Ji-Rong Wen | 2025 | COLING | https://github.com/RUCBM/rule-distillation | https://aclanthology.org/2025.coling-main.61/ |
675 | EarthMarker: A Visual Prompting Multimodal Large Language Model for Remote Sensing | Wei Zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Jun Li, Xuerui Mao | 2025 | IEEE Trans. Geosci. Remote. Sens. | https://github.com/wivizhang/EarthMarker | https://doi.org/10.1109/TGRS.2024.3523505 |
676 | Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models | Jie Ma, Zhitao Gao, Qi Chai, Wangchun Sun, Pinghui Wang, Hongbin Pei, Jing Tao, Lingyun Song, Jun Liu, Chen Zhang, Lizhen Cui | 2025 | arXiv | https://github.com/reml-group/DoG | https://doi.org/10.48550/arXiv.2409.03155 |
677 | Towards Efficient and Effective Adaptation of Large Language Models for Sequential Recommendation | Hangyu Wang, Jianghao Lin, Bo Chen, Yang Yang, Ruiming Tang, Weinan Zhang, Yong Yu | 2025 | arXiv | https://github.com/justarter/E2URec | https://doi.org/10.48550/arXiv.2310.01612 |
678 | Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: a data-driven approach for improved classification | Ricardo Bigolin Lanfredi, Pritam Mukherjee, Ronald M. Summers | 2025 | arXiv | https://github.com/rsummers11/CADLab/tree/master/MAPLEZ_LLM_report_labeler/ | https://doi.org/10.48550/arXiv.2403.04024 |
679 | Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning | Xingchen Zeng, Haichuan Lin, Yilin Ye, Wei Zeng | 2025 | arXiv | https://github.com/zengxingchen/ChartQA-MLLM | https://doi.org/10.48550/arXiv.2407.20174 |
680 | Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models | Zijun Chen, Wenbo Hu, Guande He, Zhijie Deng, Zheng Zhang, Richang Hong | 2025 | COLING | https://github.com/hfutml/Calibration-MLLM | https://aclanthology.org/2025.coling-main.208/ |
681 | Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges | Vinay Samuel, Yue Zhou, Henry Peng Zou | 2025 | arXiv | https://github.com/vsamuel2003/data-contamination | https://doi.org/10.48550/arXiv.2409.09927 |
682 | The Only Way is Ethics: A Guide to Ethical Research with Large Language Models | Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Björn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch | 2025 | COLING | https://github.com/MxEddie/Ethics-Whitepaper | https://aclanthology.org/2025.coling-main.603/ |
683 | The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models | Zihui Wu, Haichang Gao, Jianping He, Ping Wang | 2025 | arXiv | https://github.com/wooozihui/jailbreakfunction | https://doi.org/10.48550/arXiv.2407.17915 |
684 | Retrieval Augmented Instruction Tuning for Open NER with Large Language Models | Tingyu Xie, Jian Zhang, Yan Zhang, Yuanyuan Liang, Qi Li, Hongwei Wang | 2025 | arXiv | https://github.com/Emma1066/Retrieval-Augmented-IT-OpenNER | https://doi.org/10.48550/arXiv.2406.17305 |
685 | Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models | Taiqiang Wu, Chaofan Tao, Jiahao Wang, Runming Yang, Zhe Zhao, Ngai Wong | 2025 | COLING | https://github.com/wutaiqiang/LLM_KD_AKL | https://aclanthology.org/2025.coling-main.383/ |
686 | Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study | Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen | 2025 | COLING | https://github.com/open-compass/DevEval | https://aclanthology.org/2025.coling-main.502/ |
687 | Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching | Tianshu Wang, Xiaoyang Chen, Hongyu Lin, Xuanang Chen, Xianpei Han, Le Sun, Hao Wang, Zhenyu Zeng | 2025 | arXiv | https://github.com/tshu-w/ComEM | https://doi.org/10.48550/arXiv.2405.16884 |
688 | Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models | Xinyu Zhou, Delong Chen, Samuel Cahyawijaya, Xufeng Duan, Zhenguang G. Cai | 2025 | arXiv | https://github.com/ChenDelong1999/Linguistic-Similarity | https://doi.org/10.48550/arXiv.2409.12435 |
689 | LLMTreeRec: Unleashing the Power of Large Language Models for Cold-Start Recommendations | Wenlin Zhang, Chuhan Wu, Xiangyang Li, Yuhao Wang, Kuicai Dong, Yichao Wang, Xinyi Dai, Xiangyu Zhao, Huifeng Guo, Ruiming Tang | 2025 | COLING | https://github.com/Applied-Machine-Learning-Lab/LLMTreeRec | https://aclanthology.org/2025.coling-main.59/ |
690 | KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | Thilini Wijesiriwardene, Ruwan Wickramarachchi, Sreeram Reddy Vennam, Vinija Jain, Aman Chadha, Amitava Das, Ponnurangam Kumaraguru, Amit P. Sheth | 2025 | COLING | https://github.com/Thiliniiw/KnowledgePrompts/ | https://aclanthology.org/2025.coling-main.268/ |
691 | Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation | Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang | 2025 | arXiv | https://github.com/RUCAIBox/LLM-Knowledge-Boundary | https://doi.org/10.48550/arXiv.2307.11019 |
692 | InternLM-Law: An Open Source Chinese Legal Large Language Model | Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge | 2025 | arXiv | https://github.com/InternLM/InternLM-Law | https://doi.org/10.48550/arXiv.2406.14887 |
693 | ICLEval: Evaluating In-Context Learning Ability of Large Language Models | Wentong Chen, Yankai Lin, ZhenHao Zhou, HongYun Huang, Yantao Jia, Zhao Cao, Ji-Rong Wen | 2025 | arXiv | https://github.com/yiye3/ICLEval | https://doi.org/10.48550/arXiv.2406.14955 |
694 | Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining | Zongru Wu, Pengzhou Cheng, Lingyong Fang, Zhuosheng Zhang, Gongshen Liu | 2025 | COLING | https://github.com/ZrW00/GraceFul | https://aclanthology.org/2025.coling-main.220/ |
695 | GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models | Zike Yuan, Ming Liu, Hui Wang, Bing Qin | 2025 | arXiv | https://github.com/ZIKEYUAN/GraCoRe | https://doi.org/10.48550/arXiv.2407.02936 |
696 | Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion | Ben Liu, Jihai Zhang, Fangquan Lin, Cheng Yang, Min Peng | 2025 | COLING | https://github.com/LB0828/FtG | https://aclanthology.org/2025.coling-main.740/ |
697 | Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers? | Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang | 2025 | COLING | https://github.com/Luckfort/CD | https://aclanthology.org/2025.coling-main.37/ |
698 | Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models | Jiahui Li, Yongchang Hao, Haoyu Xu, Xing Wang, Yu Hong | 2025 | COLING | https://github.com/jiah-li/magic | https://aclanthology.org/2025.coling-main.305/ |
699 | Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation | Xiaofeng Zhang, Fanshuo Zeng, Yihao Quan, Zheng Hui, Jiawei Yao | 2025 | AAAI | https://github.com/FanshuoZeng/Simignore | https://doi.org/10.1609/aaai.v39i10.33107 |
700 | The Geometry of Categorical and Hierarchical Concepts in Large Language Models | Kiho Park, Yo Joong Choe, Yibo Jiang, Victor Veitch | 2025 | arXiv | https://github.com/KihoPark/LLM_Categorical_Hierarchical_Representations | https://doi.org/10.48550/arXiv.2406.01506 |
701 | ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models | Yeji Park, Deokyeong Lee, Junsuk Choe, Buru Chang | 2025 | arXiv | https://github.com/yejipark-m/ConVis | https://doi.org/10.48550/arXiv.2408.13906 |
702 | DiscoveryBench: Towards Data-Driven Discovery with Large Language Models | Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark | 2025 | arXiv | https://github.com/allenai/discoverybench | https://doi.org/10.48550/arXiv.2407.01725 |
703 | MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Zhongshen Zeng, Pengguang Chen, Shu Liu, Haiyun Jiang, Jiaya Jia | 2025 | ICLR | https://github.com/dvlab-research/MR-GSM8K | https://openreview.net/forum?id=br4H61LOoI |
704 | LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica | 2025 | arXiv | https://livecodebench.github.io/ | https://doi.org/10.48550/arXiv.2403.07974 |
705 | Large Language Models are Interpretable Learners | Ruochen Wang, Si Si, Felix X. Yu, Dorothea Wiesmann Rothuizen, Cho-Jui Hsieh, Inderjit S. Dhillon | 2025 | ICLR | https://github.com/ruocwang/llm-symbolic-program | https://openreview.net/forum?id=hTphfqtafO |
706 | LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng | 2025 | arXiv | https://github.com/ictnlp/LLaMA-Omni | https://doi.org/10.48550/arXiv.2409.06666 |
707 | LLM-SR: Scientific Equation Discovery via Programming with Large Language Models | Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, Chandan K. Reddy | 2025 | arXiv | https://github.com/deep-symbolic-mathematics/LLM-SR | https://doi.org/10.48550/arXiv.2404.18400 |
708 | LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models | Xiaohao Yang, He Zhao, Dinh Q. Phung, Wray L. Buntine, Lan Du | 2025 | arXiv | https://github.com/Xiaohao-Yang/Topic_Model_Evaluation | https://doi.org/10.48550/arXiv.2406.09008 |
709 | KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models | Fan Wang, Juyong Jiang, Chansung Park, Sunghun Kim, Jing Tang | 2025 | ICLR | https://github.com/juyongjiang/KaSA | https://openreview.net/forum?id=OQqNieeivq |
710 | Improved Techniques for Optimization-Based Jailbreaking on Large Language Models | Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin | 2025 | arXiv | https://github.com/jiaxiaojunQAQ/I-GCG | https://doi.org/10.48550/arXiv.2405.21018 |
711 | FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models | Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, Jiang Bian | 2025 | ICLR | https://github.com/microsoft/CADGeneration/FlexCAD | https://openreview.net/forum?id=Z0eiiV3Yyh |
712 | Efficient Evolutionary Search Over Chemical Space with Large Language Models | Haorui Wang, Marta Skreta, Cher Tian Ser, Wenhao Gao, Lingkai Kong, Felix Strieth-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang | 2025 | ICLR | http://github.com/zoom-wang112358/MOLLEO | https://openreview.net/forum?id=awWiNvQwf3 |
713 | Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification | Wenxuan Huang, Zijie Zhai, Yunhang Shen, Shaosheng Cao, Fei Zhao, Xiangfeng Xu, Zheyu Ye, Shaohui Lin | 2025 | ICLR | https://github.com/Osilly/dynamic_llava | https://openreview.net/forum?id=hzVpZDrW73 |
714 | Developing safe and responsible large language model: can we balance bias reduction and language understanding? | Shaina Raza, Oluwanifemi Bamgbose, Shardul Ghuge, Fatemeh Tavakoli, Deepak John Reji, Syed Raza Bashir | 2025 | Mach. Learn. | https://github.com/shainarazavi/Safe-Responsible-LLM | https://doi.org/10.1007/s10994-025-06767-4 |
715 | Neuron based Personality Trait Induction in Large Language Models | Jia Deng, Tianyi Tang, Yanbin Yin, Wenhao Yang, Wayne Xin Zhao, Ji-Rong Wen | 2025 | ICLR | https://github.com/RUCAIBox/NPTI | https://openreview.net/forum?id=LYHEY783Np |
716 | Concept Bottleneck Large Language Models | Chung-En Sun, Tuomas P. Oikarinen, Berk Ustun, Tsui-Wei Weng | 2025 | ICLR | https://github.com/Trustworthy-ML-Lab/CB-LLMs | https://openreview.net/forum?id=RC5FPYVQaH |
717 | Can Large Language Models Understand Symbolic Graphics Programs? | Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf | 2025 | arXiv | https://sgp-bench.github.io/ | https://doi.org/10.48550/arXiv.2408.08313 |
718 | CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery | Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma Gongque, Jianing Yu, Qiuna Tan, Weiran Xu | 2025 | arXiv | https://github.com/csbench/csbench | https://doi.org/10.48550/arXiv.2406.08587 |
719 | Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation | Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu | 2025 | arXiv | https://github.com/git-disl/Booster | https://doi.org/10.48550/arXiv.2409.01586 |
720 | Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph | Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Lyudmila Rvanova, Daniil Vasilev, Akim Tsvigun, Sergey Petrakov, Rui Xing, Abdelrahman Boda Sadallah, Kirill Grishchenkov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, Artem Shelmanov | 2025 | arXiv | https://github.com/IINemo/lm-polygraph | https://doi.org/10.48550/arXiv.2406.15627 |
721 | Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression | Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, Grace Li Zhang | 2025 | arXiv | https://github.com/TUDa-HWAI/Basis_Sharing | https://doi.org/10.48550/arXiv.2410.03765 |
722 | An Engorgio Prompt Makes Large Language Model Babble on | Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang, Tianwei Zhang, Hao Wang, Hewu Li, Qi Li, Chao Zhang, Ke Xu, Han Qiu | 2025 | ICLR | https://github.com/jianshuod/Engorgio-prompt | https://openreview.net/forum?id=m4eXBo0VNc |
723 | Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards | Xiaoyu Yang, Jie Lu, En Yu | 2025 | ICLR | https://github.com/Anonymous0Knight/ConceptDriftMLLMs | https://openreview.net/forum?id=b20VK2GnSs |
724 | AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | Kim Sung-Bin, Oh Hyun-Bin, JungMok Lee, Arda Senocak, Joon Son Chung, Tae-Hyun Oh | 2025 | ICLR | https://github.com/AVHBench/AVHBench | https://openreview.net/forum?id=jTEKTdI3K9 |
725 | A Probabilistic Perspective on Unlearning and Alignment for Large Language Models | Yan Scholten, Stephan Günnemann, Leo Schwinn | 2025 | arXiv | https://github.com/yascho/probabilistic-unlearning | https://doi.org/10.48550/arXiv.2410.03523 |
726 | A Closer Look into Mixture-of-Experts in Large Language Models | Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu | 2025 | arXiv | https://github.com/kamanphoebe/Look-into-MoEs | https://doi.org/10.48550/arXiv.2406.18219 |
727 | Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models | Jingyang Zhang, Jingwei Sun, Eric C. Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Yang, Hai Helen Li | 2025 | arXiv | https://zjysteven.github.io/mink-plus-plus/ | https://doi.org/10.48550/arXiv.2404.02936 |
728 | Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference | Zhihang Lin, Mingbao Lin, Luxi Lin, Rongrong Ji | 2025 | arXiv | https://github.com/lzhxmu/VTW | https://doi.org/10.48550/arXiv.2405.05803 |
729 | NutriBench: A Dataset for Evaluating Large Language Models in Carbohydrate Estimation from Meal Descriptions | Mehak Preet Dhaliwal, Andong Hua, Laya Pullela, Ryan Burke, Yao Qin | 2025 | arXiv | https://mehak126.github.io/nutribench.html | https://doi.org/10.48550/arXiv.2407.12843 |
730 | UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model | Zhaowei Li, Wei Wang, Yiqing Cai, Qi Xu, Pengyu Wang, Dong Zhang, Hang Song, Botian Jiang, Zhida Huang, Tao Wang | 2025 | arXiv | https://github.com/lzw-lzw/UnifiedMLLM | https://doi.org/10.48550/arXiv.2408.02503 |
731 | Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution | Wentao Tan, Qiong Cao, Yibing Zhan, Chao Xue, Changxing Ding | 2025 | AAAI | https://github.com/WentaoTan/SENA | https://doi.org/10.1609/aaai.v39i7.32774 |
732 | SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models | Yibin Chen, Yifu Yuan, Zeyu Zhang, Yan Zheng, Jinyi Liu, Fei Ni, Jianye Hao, Hangyu Mao, Fuzheng Zhang | 2025 | arXiv | https://sheetagent.github.io | https://doi.org/10.48550/arXiv.2403.03636 |
733 | Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models | Qi Liu, Bo Wang, Nan Wang, Jiaxin Mao | 2025 | arXiv | https://github.com/liuqi6777/pe_rank | https://doi.org/10.48550/arXiv.2406.14848 |
734 | Learning Multiple Object States from Actions via Large Language Models | Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, Yoichi Sato | 2025 | WACV | https://masatate.github.io/ObjStatefromAction.github.io/ | https://doi.org/10.1109/WACV61041.2025.00925 |
735 | Large Language Models Empowered Personalized Web Agents | Hongru Cai, Yongqi Li, Wenjie Wang, Fengbin Zhu, Xiaoyu Shen, Wenjie Li, Tat-Seng Chua | 2025 | WWW | https://hongrucai.github.io/PersonalWAB/ | https://doi.org/10.1145/3696410.3714842 |
736 | Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval | Luo Ji, Feixiang Guo, Teng Chen, Qingqing Gu, Xiaoyu Wang, Ningyuan Xi, Yihong Wang, Peng Yu, Yue Zhao, Hongyang Lei, Zhonglin Jiang, Yong Chen | 2025 | ECIR | https://github.com/flyfree5/LaHoRe | https://doi.org/10.1007/978-3-031-88714-7_27 |
737 | CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation | Yang Zhang, Fuli Feng, Jizhi Zhang, Keqin Bao, Qifan Wang, Xiangnan He | 2025 | arXiv | https://github.com/zyang1580/CoLLM | https://doi.org/10.48550/arXiv.2310.19488 |
738 | Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models | Zhijian Zhuo, Ya Wang, Yutao Zeng, Xiaoqing Li, Xun Zhou, Jinwen Ma | 2025 | ICLR | https://github.com/BryceZhuo/PolyCom | https://openreview.net/forum?id=CbpWPbYHuv |
739 | DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation | Anna C. Doris, Daniele Grandi, Ryan Tomich, Md Ferdous Alam, Mohammadmehdi Ataei, Hyunmin Cheong, Faez Ahmed | 2025 | J. Comput. Inf. Sci. Eng. | https://github.com/anniedoris/design_qa/ | https://doi.org/10.1115/1.4067333 |
740 | Zero-shot Model-based Reinforcement Learning using Large Language Models | Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat, Oussama Zekri, Albert Thomas, Giuseppe Paolo, Maurizio Filippone, Ievgen Redko, Balázs Kégl | 2025 | arXiv | https://github.com/abenechehab/dicl | https://doi.org/10.48550/arXiv.2410.11711 |
741 | WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jian-Guang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Yansong Tang, Dongmei Zhang | 2025 | ICLR | https://github.com/nlpxucan/WizardLM | https://openreview.net/forum?id=mMPMHWOdOy |
742 | Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning? | Shuo Chen, Zhen Han, Bailan He, Jianzhe Liu, Mark Buckley, Yao Qin, Philip Torr, Volker Tresp, Jindong Gu | 2025 | WACV | https://chenxshuo.github.io/m-icl/ | https://doi.org/10.1109/WACV61041.2025.00585 |
743 | TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking | Danqing Wang, Jianxin Ma, Fei Fang, Lei Li | 2025 | ICLR | https://github.com/dqwang122/ThinkHub | https://openreview.net/forum?id=VIUisLx8lQ |
744 | Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation | Mufei Li, Siqi Miao, Pan Li | 2025 | ICLR | https://github.com/Graph-COM/SubgraphRAG | https://openreview.net/forum?id=JvkuZZ04O7 |
745 | Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation | Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Cehao Yang, Jiaxin Mao, Jian Guo | 2025 | ICLR | https://github.com/IDEA-FinAI/ToG-2 | https://openreview.net/forum?id=oFBu7qaZpS |
746 | REvolve: Reward Evolution with Large Language Models using Human Feedback | Rishi Hazra, Alkis Sygkounas, Andreas Persson, Amy Loutfi, Pedro Zuidberg Dos Martires | 2025 | ICLR | https://rishihazra.github.io/REvolve | https://openreview.net/forum?id=cJPUpL8mOw |
747 | Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models | Haritz Puerto, Martin Gubri, Sangdoo Yun, Seong Joon Oh | 2025 | NAACL | https://github.com/parameterlab/mia-scaling | https://aclanthology.org/2025.findings-naacl.234/ |
748 | Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou | 2025 | arXiv | https://github.com/QwenLM/AutoIF | https://doi.org/10.48550/arXiv.2406.13542 |
749 | REEF: Representation Encoding Fingerprints for Large Language Models | Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao | 2025 | ICLR | https://github.com/tmylla/REEF | https://openreview.net/forum?id=SnDmPkOJ0T |
750 | Steering Large Language Models between Code Execution and Textual Reasoning | Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma, Chuchu Fan, Chi Wang | 2025 | arXiv | https://yongchao98.github.io/CodeSteer/ | https://doi.org/10.48550/arXiv.2410.03524 |
751 | StringLLM: Understanding the String Processing Capability of Large Language Models | Xilong Wang, Hao Fu, Jindong Wang, Neil Zhenqiang Gong | 2025 | arXiv | https://github.com/wxl-lxw/StringLLM | https://doi.org/10.48550/arXiv.2410.01208 |
752 | TESTEVAL: Benchmarking Large Language Models for Test Case Generation | Wenhan Wang, Chenyuan Yang, Zhijie Wang, Yuheng Huang, Zhaoyang Chu, Da Song, Lingming Zhang, An Ran Chen, Lei Ma | 2025 | arXiv | https://llm4softwaretesting.github.io | https://doi.org/10.48550/arXiv.2406.04531 |
753 | A Closer Look at Machine Unlearning for Large Language Models | Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin | 2025 | arXiv | https://github.com/sail-sg/closer-look-LLM-unlearning | https://doi.org/10.48550/arXiv.2410.08109 |
754 | Distributed Mixture-of-Agents for Edge Inference with Large Language Models | Purbesh Mitra, Priyanka Kaswan, Sennur Ulukus | 2024-12-30 | arXiv | https://github.com/purbeshmitra/distributed_moa | http://arxiv.org/abs/2412.21200v1 |
755 | Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Yulin Fei, Yuhui Gao, Xingyuan Xian, Xiaojin Zhang, Tao Wu, Wei Chen | 2024-12-29 | arXiv | https://github.com/YuHuiGao/FG-Bench | http://arxiv.org/abs/2412.20613v1 |
756 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | Moe Kayali, Fabian Wenz, Nesime Tatbul, Çağatay Demiralp | 2024-12-29 | arXiv | https://goby-benchmark.github.io/ | http://arxiv.org/abs/2412.20331v1 |
757 | TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication | Zongwu Wang, Fangxin Liu, Mingshuai Li, Li Jiang | 2024-12-29 | arXiv | https://github.com/ACA-Lab-SJTU/token-ring | http://arxiv.org/abs/2412.20501v1 |
758 | On the Compositional Generalization of Multimodal LLMs for Medical Imaging | Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize Chen, Zixu Zhang, Benyou Wang | 2024-12-28 | arXiv | https://github.com/FreedomIntelligence/Med-MAT | http://arxiv.org/abs/2412.20070v1 |
759 | Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Sijia Chen, Baochun Li | 2024-12-27 | ICML | https://github.com/iQua/llmpebase/tree/main/examples/ThoughtRollback | https://openreview.net/forum?id=aoAPOOtN9E |
760 | A Survey on Large Language Model Acceleration based on KV Cache Management | Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen | 2024-12-27 | arXiv | https://github.com/TreeAI-Lab/Awesome-KV-Cache-Management | http://arxiv.org/abs/2412.19442v2 |
761 | Gradient Weight-normalized Low-rank Projection for Efficient LLM Training | Jia-Hong Huang, Yixian Shen, Hongyi Zhu, Stevan Rudinac, Evangelos Kanoulas | 2024-12-27 | arXiv | https://github.com/Jhhuangkay/Gradient-Weight-normalized-Low-rank-Projection-for-Efficient-LLM-Training | http://arxiv.org/abs/2412.19616v1 |
762 | MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Jiaqi Fan, Jianhua Wu, Jincheng Gao, Jianhao Yu, Yafei Wang, Hongqing Chu, Bingzhao Gao | 2024-12-27 | arXiv | https://github.com/fjq-tongji/MLLM-SUL | http://arxiv.org/abs/2412.19406v1 |
763 | Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment | Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, Limin Wang, Yi Wang | 2024-12-26 | arXiv | https://github.com/OpenGVLab/TPO | http://arxiv.org/abs/2412.19326v1 |
764 | CoEvo: Continual Evolution of Symbolic Solutions Using Large Language Models | Ping Guo, Qingfu Zhang, Xi Lin | 2024-12-25 | arXiv | https://github.com/pgg3/CoEvo | http://arxiv.org/abs/2412.18890v1 |
765 | 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | Tatiana Zemskova, Dmitry Yudin | 2024-12-24 | arXiv | https://github.com/CognitiveAISystems/3DGraphLLM | http://arxiv.org/abs/2412.18450v2 |
766 | Distilling Fine-grained Sentiment Understanding from Large Language Models | Yice Zhang, Guangyu Xie, Hongling Xu, Kaiheng Hou, Jianzhu Bao, Qianlong Wang, Shiwei Chen, Ruifeng Xu | 2024-12-24 | arXiv | https://github.com/HITSZ-HLT/FSA-Distillation | http://arxiv.org/abs/2412.18552v2 |
767 | Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving | Hao Pang, Zhenpo Wang, Guoqiang Li | 2024-12-24 | arXiv | https://bitmobility.github.io/LGDRL/ | http://arxiv.org/abs/2412.18511v1 |
768 | Property Enhanced Instruction Tuning for Multi-task Molecule Generation with Large Language Models | Xuan Lin, Long Chen, Yile Wang, Xiangxiang Zeng, Philip S. Yu | 2024-12-24 | arXiv | https://github.com/chenlong164/PEIT | http://arxiv.org/abs/2412.18084v1 |
769 | Token-Budget-Aware LLM Reasoning | Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen | 2024-12-24 | arXiv | https://github.com/GeniusHTX/TALE | http://arxiv.org/abs/2412.18547v3 |
770 | Assessing Human Editing Effort on LLM-Generated Texts via Compression-Based Edit Distance | Nicolas Devatine, Louis Abraham | 2024-12-23 | arXiv | https://github.com/NDV-tiime/CompressionDistance | http://arxiv.org/abs/2412.17321v1 |
771 | Large Language Model Safety: A Holistic Survey | Dan Shi, Tianhao Shen, Yufei Huang, Zhigen Li, Yongqi Leng, Renren Jin, Chuang Liu, Xinwei Wu, Zishan Guo, Linhao Yu, Ling Shi, Bojian Jiang, Deyi Xiong | 2024-12-23 | arXiv | https://github.com/tjunlp-lab/Awesome-LLM-Safety-Papers | http://arxiv.org/abs/2412.17686v1 |
772 | CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models | Yeyuan Wang, Dehong Gao, Bin Li, Rujiao Long, Lei Yi, Xiaoyan Cai, Libin Yang, Jinxia Zhang, Shanqing Yu, Qi Xuan | 2024-12-22 | arXiv | https://github.com/Gavin001201/CoF | http://arxiv.org/abs/2412.16869v1 |
773 | MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge | Jie He, Nan Hu, Wanqiu Long, Jiaoyan Chen, Jeff Z. Pan | 2024-12-22 | arXiv | https://github.com/probe2/multi-hop/ | http://arxiv.org/abs/2412.17032v1 |
774 | PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Xiaohu Huang, Hao Zhou, Kai Han | 2024-12-20 | arXiv | https://github.com/Visual-AI/PruneVid | http://arxiv.org/abs/2412.16117v1 |
775 | TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use | Junjie Ye, Yilong Wu, Sixian Li, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan, Zhengyin Du | 2024-12-20 | arXiv | https://github.com/Junjie-Ye/TL-Training | http://arxiv.org/abs/2412.15495v1 |
776 | Template-Driven LLM-Paraphrased Framework for Tabular Math Word Problem Generation | Xiaoqiang Kang, Zimu Wang, Xiaobo Jin, Wei Wang, Kaizhu Huang, Qiufeng Wang | 2024-12-20 | arXiv | https://github.com/Jason8Kang/TELL | http://arxiv.org/abs/2412.15594v1 |
777 | WebLLM: A High-Performance In-Browser LLM Inference Engine | Charlie F. Ruan, Yucheng Qin, Xun Zhou, Ruihang Lai, Hongyi Jin, Yixin Dong, Bohan Hou, Meng-Shiun Yu, Yiyan Zhai, Sudeep Agarwal, Hangrui Cao, Siyuan Feng, Tianqi Chen | 2024-12-20 | arXiv | https://github.com/mlc-ai/web-llm | http://arxiv.org/abs/2412.15803v1 |
778 | Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models | Wenhan Liu, Xinyu Ma, Yutao Zhu, Ziliang Zhao, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou | 2024-12-19 | arXiv | https://github.com/8421BCD/fullrank | http://arxiv.org/abs/2412.14574v1 |
779 | On Verbalized Confidence Scores for LLMs | Daniel Yang, Yao-Hung Hubert Tsai, Makoto Yamada | 2024-12-19 | arXiv | https://github.com/danielyxyang/llm-verbalized-uq | http://arxiv.org/abs/2412.14737v1 |
780 | ORBIT: Cost-Effective Dataset Curation for Large Language Model Domain Adaptation with an Astronomy Case Study | Eric Modesitt, Ke Yang, Spencer Hulsey, Chengxiang Zhai, Volodymyr Kindratenko | 2024-12-19 | arXiv | https://github.com/ModeEric/ORBIT-Llama | http://arxiv.org/abs/2412.14436v1 |
781 | Agent-SafetyBench: Evaluating the Safety of LLM Agents | Zhexin Zhang, Shiyao Cui, Yida Lu, Jingzhuo Zhou, Junxiao Yang, Hongning Wang, Minlie Huang | 2024-12-19 | arXiv | https://github.com/thu-coai/Agent-SafetyBench | http://arxiv.org/abs/2412.14470v1 |
782 | Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes | Katarzyna Kobalczyk, Claudio Fanconi, Hao Sun, Mihaela van der Schaar | 2024-12-18 | arXiv | https://github.com/kasia-kobalczyk/few-shot-steerable-alignment | http://arxiv.org/abs/2412.13998v1 |
783 | ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals | Utkarsh Saxena, Sayeh Sharify, Kaushik Roy, Xin Wang | 2024-12-18 | arXiv | https://github.com/utkarsh-dmx/project-resq | http://arxiv.org/abs/2412.14363v1 |
784 | InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Cong Wei, Yujie Zhong, Haoxian Tan, Yingsen Zeng, Yong Liu, Zheng Zhao, Yujiu Yang | 2024-12-18 | arXiv | https://github.com/congvvc/InstructSeg | http://arxiv.org/abs/2412.14006v1 |
785 | Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Jihan Yang, Shusheng Yang, Anjali W. Gupta, Rilyn Han, Li Fei-Fei, Saining Xie | 2024-12-18 | arXiv | https://vision-x-nyu.github.io/thinking-in-space.github.io/ | http://arxiv.org/abs/2412.14171v1 |
786 | Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting | Vijay Goyal, Mustafa Khan, Aprameya Tirupati, Harveer Saini, Michael Lam, Kevin Zhu | 2024-12-18 | arXiv | https://github.com/alonso130r/knowledge-distillation | http://arxiv.org/abs/2412.17846v1 |
787 | Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings | Yuanhe Zhang, Zhenhong Zhou, Wei Zhang, Xinyue Wang, Xiaojun Jia, Yang Liu, Sen Su | 2024-12-18 | arXiv | https://github.com/shuita2333/AutoDoS | http://arxiv.org/abs/2412.13879v1 |
788 | Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games | Wenye Lin, Jonathan Roberts, Yunhan Yang, Samuel Albanie, Zongqing Lu, Kai Han | 2024-12-18 | arXiv | https://visual-ai.github.io/gamebot | http://arxiv.org/abs/2412.13602v1 |
789 | Are Your LLMs Capable of Stable Reasoning? | Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen | 2024-12-17 | arXiv | https://github.com/open-compass/GPassK | http://arxiv.org/abs/2412.13147v2 |
790 | Assessing the Limitations of Large Language Models in Clinical Fact Decomposition | Monica Munnangi, Akshay Swaminathan, Jason Alan Fries, Jenelle Jindal, Sanjana Narayanan, Ivan Lopez, Lucia Tu, Philip Chung, Jesutofunmi A. Omiye, Mehr Kashyap, Nigam Shah | 2024-12-17 | arXiv | https://github.com/som-shahlab/factehr | http://arxiv.org/abs/2412.12422v1 |
791 | Benchmarking and Understanding Compositional Relational Reasoning of LLMs | Ruikang Ni, Da Xiao, Qingye Meng, Xiangyu Li, Shihui Zheng, Hongliang Liang | 2024-12-17 | arXiv | https://github.com/Caiyun-AI/GAR | http://arxiv.org/abs/2412.12841v1 |
792 | Graph Learning in the Era of LLMs: A Survey from the Perspective of Data, Models, and Tasks | Xunkai Li, Zhengyu Wu, Jiayi Wu, Hanwen Cui, Jishuo Jia, Rong-Hua Li, Guoren Wang | 2024-12-17 | arXiv | https://github.com/xkLi-Allen/Awesome-GNN-in-LLMs-Papers | http://arxiv.org/abs/2412.12456v1 |
793 | SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Sheng Yin, Xianghe Pang, Yuanzhuo Ding, Menglan Chen, Yutong Bi, Yichen Xiong, Wenhao Huang, Zhen Xiang, Jing Shao, Siheng Chen | 2024-12-17 | arXiv | https://github.com/shengyin1224/SafeAgentBench | http://arxiv.org/abs/2412.13178v2 |
794 | SafeDrive: Knowledge- and Data-Driven Risk-Sensitive Decision-Making for Autonomous Vehicles with Large Language Models | Zhiyuan Zhou, Heye Huang, Boqi Li, Shiyue Zhao, Yao Mu, Jianqiang Wang | 2024-12-17 | arXiv | https://mezzi33.github.io/SafeDrive/ | http://arxiv.org/abs/2412.13238v2 |
795 | RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation | Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou | 2024-12-16 | arXiv | https://github.com/sunnynexus/RetroLLM | http://arxiv.org/abs/2412.11919v1 |
796 | RL-LLM-DT: An Automatic Decision Tree Generation Method Based on RL Evaluation and LLM Enhancement | Junjie Lin, Jian Zhao, Lin Liu, Yue Deng, Youpeng Zhao, Lanxiao Huang, Xia Lin, Wengang Zhou, Houqiang Li | 2024-12-16 | arXiv | https://github.com/Linjunjie99/RL-LLM-DT | http://arxiv.org/abs/2412.11417v2 |
797 | LLMs Can Simulate Standardized Patients via Agent Coevolution | Zhuoyun Du, Lujie Zheng, Renjun Hu, Yuyang Xu, Xiawei Li, Ying Sun, Wei Chen, Jian Wu, Haolei Cai, Haohao Ying | 2024-12-16 | arXiv | https://github.com/ZJUMAI/EvoPatient | http://arxiv.org/abs/2412.11716v1 |
798 | Does VLM Classification Benefit from LLM Description Semantics? | Pingchuan Ma, Lennart Rietdorf, Dmytro Kotovenko, Vincent Tao Hu, Björn Ommer | 2024-12-16 | arXiv | https://github.com/CompVis/DisCLIP | http://arxiv.org/abs/2412.11917v3 |
799 | Analyzing Images of Legal Documents: Toward Multi-Modal LLMs for Access to Justice | Hannes Westermann, Jaromir Savelka | 2024-12-16 | arXiv | https://github.com/hwestermann/AI4A2J_analyzing_images_of_legal_documents | http://arxiv.org/abs/2412.15260v1 |
800 | BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement | Yuhao Du, Shunian Chen, Wenbo Zan, Peizhao Li, Mingxuan Wang, Dingjie Song, Bo Li, Yan Hu, Benyou Wang | 2024-12-16 | arXiv | https://github.com/FreedomIntelligence/BlenderLLM | http://arxiv.org/abs/2412.14203v1 |
801 | Empowering LLMs to Understand and Generate Complex Vector Graphics | Ximing Xing, Juncheng Hu, Guotao Liang, Jing Zhang, Dong Xu, Qian Yu | 2024-12-15 | arXiv | https://ximinng.github.io/LLM4SVGProject/ | http://arxiv.org/abs/2412.11102v1 |
802 | NITRO: LLM Inference on Intel Laptop NPUs | Anthony Fei, Mohamed S. Abdelfattah | 2024-12-15 | arXiv | https://github.com/abdelfattah-lab/nitro | http://arxiv.org/abs/2412.11053v1 |
803 | Learning to Verify Summary Facts with Fine-Grained LLM Feedback | Jihwan Oh, Jeonghwan Choi, Nicole Hee-Yeon Kim, Taewon Yun, Hwanjun Song | 2024-12-14 | arXiv | https://github.com/DISL-Lab/FineSumFact | http://arxiv.org/abs/2412.10689v1 |
804 | B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens | Zhuqiang Lu, Zhenfei Yin, Mengwei He, Zhihui Wang, Zicheng Liu, Zhiyong Wang, Kun Hu | 2024-12-13 | arXiv | https://github.com/zhuqiangLu/B-VLLM | http://arxiv.org/abs/2412.09919v1 |
805 | Can LLMs Convert Graphs to Text-Attributed Graphs? | Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye | 2024-12-13 | arXiv | https://github.com/Zehong-Wang/TANS | http://arxiv.org/abs/2412.10136v1 |
806 | ChainStream: An LLM-based Framework for Unified Synthetic Sensing | Jiacheng Liu, Yuanchun Li, Liangyan Li, Yi Sun, Hao Wen, Xiangyu Li, Yao Guo, Yunxin Liu | 2024-12-13 | arXiv | https://github.com/MobileLLM/ChainStream | http://arxiv.org/abs/2412.15240v1 |
807 | CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | Zhihao Du, Yuxuan Wang, Qian Chen, Xian Shi, Xiang Lv, Tianyu Zhao, Zhifu Gao, Yexin Yang, Changfeng Gao, Hui Wang, Fan Yu, Huadai Liu, Zhengyan Sheng, Yue Gu, Chong Deng, Wen Wang, Shiliang Zhang, Zhijie Yan, Jingren Zhou | 2024-12-13 | arXiv | https://funaudiollm.github.io/cosyvoice2 | http://arxiv.org/abs/2412.10117v3 |
808 | Can Modern LLMs Act as Agent Cores in Radiology Environments? | Qiaoyu Zheng, Chaoyi Wu, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie | 2024-12-12 | arXiv | https://github.com/MAGIC-AI4Med/RadABench | http://arxiv.org/abs/2412.09529v2 |
809 | RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios | Ruiwen Zhou, Wenyue Hua, Liangming Pan, Sitao Cheng, Xiaobao Wu, En Yu, William Yang Wang | 2024-12-12 | arXiv | https://github.com/skyriver-2000/RuleArena | http://arxiv.org/abs/2412.08972v1 |
810 | What Makes Cryptic Crosswords Challenging for LLMs? | Abdelrahman Sadallah, Daria Kotova, Ekaterina Kochmar | 2024-12-12 | COLING 2025 | https://github.com/bodasadallah/decrypting-crosswords | http://arxiv.org/abs/2412.09012v1 |
811 | Autoformalizing and Simulating Game-Theoretic Scenarios using LLM-augmented Agents | Agnieszka Mensfelt, Kostas Stathis, Vince Trencsenyi | 2024-12-11 | arXiv | https://github.com/dicelab-rhul/autoformalizing-agents | http://arxiv.org/abs/2412.08805v1 |
812 | Multi-GraspLLM: A Multimodal LLM for Multi-Hand Semantic Guided Grasp Generation | Haosheng Li, Weixin Mao, Weipeng Deng, Chenyu Meng, Haoqiang Fan, Tiancai Wang, Ping Tan, Hongan Wang, Xiaoming Deng | 2024-12-11 | arXiv | https://multi-graspllm.github.io | http://arxiv.org/abs/2412.08468v1 |
813 | Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation | Pedro H. V. Valois, Lincon S. Souza, Erica K. Shimomoto, Kazuhiro Fukui | 2024-12-10 | arXiv | https://github.com/phvv-me/frame-representation-hypothesis | http://arxiv.org/abs/2412.07334v2 |
814 | LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation | Eunsu Kim, Juyoung Suk, Seungone Kim, Niklas Muennighoff, Dongkwan Kim, Alice Oh | 2024-12-10 | arXiv | https://github.com/interview-eval/ | http://arxiv.org/abs/2412.10424v2 |
815 | DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong | 2024-12-10 | arXiv | https://jianzongwu.github.io/projects/diffsensei/ | http://arxiv.org/abs/2412.07589v1 |
816 | IntellectSeeker: A Personalized Literature Management System with the Probabilistic Model and Large Language Model | Weizhen Bian, Siyan Liu, Yubo Zhou, Dezhi Chen, Yijie Liao, Zhenzhen Fan, Aobo Wang | 2024-12-10 | KSEM | https://github.com/LuckyBian/ISY5001 | https://doi.org/10.1007/978-981-97-5489-2_24 |
817 | PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models | Qian Zhang, Panfeng Chen, Jiali Li, Linkun Feng, Shuyu Liu, Heng Zhao, Mei Chen, Hui Li, Yanhao Wang | 2024-12-09 | arXiv | https://github.com/ACMISLab/PediaBench | http://arxiv.org/abs/2412.06287v2 |
818 | Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study | Ehsan Shareghi, Jiuzhou Han, Paul Burgess | 2024-12-09 | arXiv | https://auslawbench.github.io | http://arxiv.org/abs/2412.06272v1 |
819 | Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models | Xiao Xu, Tianhao Niu, Yuxi Xie, Libo Qin, Wanxiang Che, Min-Yen Kan | 2024-12-08 | arXiv | https://github.com/LooperXX/MMGiC | http://arxiv.org/abs/2412.05939v1 |
820 | LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods | Haitao Li, Qian Dong, Junjie Chen, Huixue Su, Yujia Zhou, Qingyao Ai, Ziyi Ye, Yiqun Liu | 2024-12-07 | arXiv | https://github.com/CSHaitao/Awesome-LLMs-as-Judges | http://arxiv.org/abs/2412.05579v2 |
821 | Towards Learning to Reason: Comparing LLMs with Neuro-Symbolic on Arithmetic Relations in Abstract Reasoning | Michael Hersche, Giacomo Camposampiero, Roger Wattenhofer, Abu Sebastian, Abbas Rahimi | 2024-12-07 | arXiv | https://github.com/IBM/raven-large-language-models | http://arxiv.org/abs/2412.05586v1 |
822 | Training-Free Bayesianization for Low-Rank Adapters of Large Language Models | Haizhou Shi, Yibin Wang, Ligong Han, Huan Zhang, Hao Wang | 2024-12-07 | arXiv | https://github.com/Wang-ML-Lab/bayesian-peft | http://arxiv.org/abs/2412.05723v1 |
823 | EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Lu Qiu, Yuying Ge, Yi Chen, Yixiao Ge, Ying Shan, Xihui Liu | 2024-12-05 | arXiv | https://qiulu66.github.io/egoplanbench2/ | http://arxiv.org/abs/2412.04447v1 |
824 | Reinforcement Learning Enhanced LLMs: A Survey | Shuhe Wang, Shengyu Zhang, Jie Zhang, Runyi Hu, Xiaoya Li, Tianwei Zhang, Jiwei Li, Fei Wu, Guoyin Wang, Eduard Hovy | 2024-12-05 | arXiv | https://github.com/ShuheWang1998/Reinforcement-Learning-Enhanced-LLMs-A-Survey | http://arxiv.org/abs/2412.10400v2 |
825 | LossAgent: Towards Any Optimization Objectives for Image Processing with LLM Agents | Bingchen Li, Xin Li, Yiting Lu, Zhibo Chen | 2024-12-05 | arXiv | https://github.com/lbc12345/LossAgent | http://arxiv.org/abs/2412.04090v1 |
826 | AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning | Yiwu Zhong, Zhuoming Liu, Yin Li, Liwei Wang | 2024-12-04 | arXiv | https://github.com/LaVi-Lab/AIM | http://arxiv.org/abs/2412.03248v1 |
827 | Alignment at Pre-training! Towards Native Alignment for Arabic LLMs | Juhao Liang, Zhenyang Cai, Jianqing Zhu, Huang Huang, Kewei Zong, Bang An, Mosen Alharthi, Juncai He, Lian Zhang, Haizhou Li, Benyou Wang, Jinchao Xu | 2024-12-04 | arXiv | https://github.com/FreedomIntelligence/AceGPT-v2 | http://arxiv.org/abs/2412.03253v1 |
828 | Fine-Grained Behavior Simulation with Role-Playing Large Language Model on Social Media | Kun Li, Chenwei Dai, Wei Zhou, Songlin Hu | 2024-12-04 | arXiv | https://github.com/linkseed18612254945/FineRob | http://arxiv.org/abs/2412.03148v1 |
829 | From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | Xinyi Mou, Xuanwen Ding, Qi He, Liang Wang, Jingcong Liang, Xinnong Zhang, Libo Sun, Jiayu Lin, Jie Zhou, Xuanjing Huang, Zhongyu Wei | 2024-12-04 | arXiv | https://github.com/FudanDISC/SocialAgent | http://arxiv.org/abs/2412.03563v1 |
830 | Improving Linguistic Diversity of Large Language Models with Possibility Exploration Fine-Tuning | Long Mai, Julie Carson-Berndsen | 2024-12-04 | arXiv | https://github.com/mailong25/peft_diversity | http://arxiv.org/abs/2412.03343v1 |
831 | VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding | Chaoyu Li, Eun Woo Im, Pooyan Fazli | 2024-12-04 | arXiv | https://vid-halluc.github.io/ | http://arxiv.org/abs/2412.03735v1 |
832 | Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code | Timur Galimzyanov, Sergey Titov, Yaroslav Golubev, Egor Bogomolov | 2024-12-03 | arXiv | https://github.com/JetBrains-Research/PandasPlotBench | http://arxiv.org/abs/2412.02764v1 |
833 | Unleashing GHOST: An LLM-Powered Framework for Automated Hardware Trojan Design | Md Omar Faruque, Peter Jamieson, Ahmad Patooghy, Abdel-Hameed A. Badawy | 2024-12-03 | arXiv | https://github.com/HSTRG1/GHOSTbenchmarks | http://arxiv.org/abs/2412.02816v1 |
834 | CNNSum: Exploring Long-Context Summarization with Large Language Models in Chinese Novels | Lingxiao Wei, He Yan, Xiangju Lu, Junmin Zhu, Jun Wang, Wei Zhang | 2024-12-03 | arXiv | https://github.com/CxsGhost/CNNSum | http://arxiv.org/abs/2412.02819v4 |
835 | AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue | 2024-12-03 | arXiv | https://av-odyssey.github.io/ | http://arxiv.org/abs/2412.02611v1 |
836 | DaDu-E: Rethinking the Role of Large Language Model in Robotic Computing Pipeline | Wenhao Sun, Sai Hou, Zixuan Wang, Bo Yu, Shaoshan Liu, Xu Yang, Shuai Liang, Yiming Gan, Yinhe Han | 2024-12-02 | arXiv | https://rlc-lab.github.io/dadu-e/ | http://arxiv.org/abs/2412.01663v1 |
837 | DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation | Jingyang Xiang, Sai Qian Zhang | 2024-12-01 | arXiv | https://github.com/JingyangXiang/DFRot | http://arxiv.org/abs/2412.00648v2 |
838 | GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models | Kunsheng Tang, Wenbo Zhou, Jie Zhang, Aishan Liu, Gelei Deng, Shuai Li, Peigui Qi, Weiming Zhang, Tianwei Zhang, Nenghai Yu | 2024-12 | CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security | https://github.com/kstanghere/GenderCARE-ccs24 | https://dl.acm.org/doi/10.1145/3658644.3670284 |
839 | Mitigating Entity-Level Hallucination in Large Language Models | Weihang Su, Yichen Tang, Qingyao Ai, Changyue Wang, Zhijing Wu, Yiqun Liu | 2024-12 | SIGIR-AP 2024: Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region | https://github.com/oneal2000/EntityHallucination | https://dl.acm.org/doi/10.1145/3673791.3698403 |
840 | Optimization-based Prompt Injection Attack to LLM-as-a-Judge | Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong | 2024-12 | CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security | https://github.com/ShiJiawenwen/JudgeDeceiver | https://dl.acm.org/doi/10.1145/3658644.3690291 |
841 | PLeak: Prompt Leaking Attacks against Large Language Model Applications | Bo Hui, Haolin Yuan, Neil Zhenqiang Gong, Philippe Burlina, Yinzhi Cao | 2024-12 | CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security | https://github.com/BHui97/PLeak | https://dl.acm.org/doi/10.1145/3658644.3670370 |
842 | AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models | Yutong Zhou, Masahiro Ryo | 2024-11-30 | arXiv | https://github.com/Yutong-Zhou-cv/AgriBench | http://arxiv.org/abs/2412.00465v2 |
843 | Node Importance Estimation Leveraging LLMs for Semantic Augmentation in Knowledge Graphs | Xinyu Lin, Tianyu Zhang, Chengbin Hou, Jinbao Wang, Jianye Xue, Hairong Lv | 2024-11-30 | arXiv | https://github.com/XinyuLin-FZ/LENIE | http://arxiv.org/abs/2412.00478v1 |
844 | DroidCall: A Dataset for LLM-powered Android Intent Invocation | Weikai Xie, Li Zhang, Shihe Wang, Rongjie Yi, Mengwei Xu | 2024-11-30 | arXiv | https://github.com/UbiquitousLearning/DroidCall | http://arxiv.org/abs/2412.00402v1 |
845 | Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings | Qiong Wu, Wenhao Lin, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji | 2024-11-29 | arXiv | https://github.com/DoubtedSteam/DyVTE | http://arxiv.org/abs/2411.19628v1 |
846 | Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Tian Yu, Shaolei Zhang, Yang Feng | 2024-11-29 | arXiv | https://github.com/ictnlp/Auto-RAG | http://arxiv.org/abs/2411.19443v1 |
847 | Ensemble Watermarks for Large Language Models | Georg Niess, Roman Kern | 2024-11-29 | arXiv | http://github.com/CommodoreEU/master-generation | http://arxiv.org/abs/2411.19563v1 |
848 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin, Chaoyou Fu, Sirui Zhao, Yunhang Shen, Chunjiang Ge, Yan Yang, Zuwei Long, Yuhan Dai, Tong Xu, Xing Sun, Ran He, Caifeng Shan, Enhong Chen | 2024-11-29 | arXiv | https://github.com/xjtupanda/T2Vid | http://arxiv.org/abs/2411.19951v2 |
849 | TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension | Zipeng Qiu, You Peng, Guangxin He, Binhang Yuan, Chen Wang | 2024-11-29 | arXiv | https://github.com/Relaxed-System-Lab/TQA-Bench | http://arxiv.org/abs/2411.19504v1 |
850 | Personalized Federated Fine-Tuning for LLMs via Data-Driven Heterogeneous Model Architectures | Yicheng Zhang, Zhen Qin, Zhaomin Wu, Shuiguang Deng | 2024-11-28 | arXiv | https://github.com/zyc140345/FedAMoLE | http://arxiv.org/abs/2411.19128v1 |
851 | TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | Shimin Chen, Xiaohan Lan, Yitian Yuan, Zequn Jie, Lin Ma | 2024-11-27 | arXiv | https://github.com/TimeMarker-LLM/TimeMarker/ | http://arxiv.org/abs/2411.18211v1 |
852 | ChatRex: Taming Multimodal LLM for Joint Perception and Understanding | Qing Jiang, Gen Luo, Yuqin Yang, Yuda Xiong, Yihao Chen, Zhaoyang Zeng, Tianhe Ren, Lei Zhang | 2024-11-27 | arXiv | https://github.com/IDEA-Research/ChatRex | http://arxiv.org/abs/2411.18363v2 |
853 | Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models | Jingming Liu, Yumeng Li, Boyuan Xiao, Yichang Jian, Ziang Qin, Tianjia Shao, Yao-Xiang Ding, Kun Zhou | 2024-11-27 | arXiv | https://future-item.github.io/autoimagine-site | http://arxiv.org/abs/2411.18142v1 |
854 | Can LLMs be Good Graph Judger for Knowledge Graph Construction? | Haoyu Huang, Chong Chen, Conghui He, Yang Li, Jiawei Jiang, Wentao Zhang | 2024-11-26 | arXiv | https://github.com/hhy-huang/GraphJudger | http://arxiv.org/abs/2411.17388v1 |
855 | Leveraging Large Language Models and Topic Modeling for Toxicity Classification | Haniyeh Ehsani Oskouie, Christina Chance, Claire Huang, Margaret Capetz, Elizabeth Eyeson, Majid Sarrafzadeh | 2024-11-26 | arXiv | https://github.com/aheldis/Toxicity-Classification | http://arxiv.org/abs/2411.17876v1 |
856 | Star Attention: Efficient LLM Inference over Long Sequences | Shantanu Acharya, Fei Jia, Boris Ginsburg | 2024-11-26 | arXiv | https://github.com/NVIDIA/Star-Attention | http://arxiv.org/abs/2411.17116v1 |
857 | BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment | Shaolei Zhang, Kehao Zhang, Qingkai Fang, Shoutao Guo, Yan Zhou, Xiaodong Liu, Yang Feng | 2024-11-25 | arXiv | https://github.com/ictnlp/BayLing | https://doi.org/10.48550/arXiv.2411.16300 |
858 | Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering | Federico Cocchi, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara | 2024-11-25 | arXiv | https://github.com/aimagelab/ReflectiVA | http://arxiv.org/abs/2411.16863v1 |
859 | CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity | Zhengmin Yu, Jiutian Zeng, Siyi Chen, Wenhan Xu, Dandan Xu, Xiangyu Liu, Zonghao Ying, Nan Wang, Yuan Zhang, Min Yang | 2024-11-25 | arXiv | https://github.com/CS-EVAL/CS-Eval | http://arxiv.org/abs/2411.16239v2 |
860 | Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models | Ronghuan Wu, Wanchao Su, Jing Liao | 2024-11-25 | arXiv | https://chat2svg.github.io/ | http://arxiv.org/abs/2411.16602v1 |
861 | Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision | Zhiheng Xi, Dingwen Yang, Jixuan Huang, Jiafu Tang, Guanyu Li, Yiwen Ding, Wei He, Boyang Hong, Shihan Do, Wenyu Zhan, Xiao Wang, Rui Zheng, Tao Ji, Xiaowei Shi, Yitao Zhai, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Zuxuan Wu, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Yu-Gang Jiang | 2024-11-25 | arXiv | https://mathcritique.github.io/ | http://arxiv.org/abs/2411.16579v1 |
862 | From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge | Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattacharjee, Yuxuan Jiang, Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng, Huan Liu | 2024-11-25 | arXiv | https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge | http://arxiv.org/abs/2411.16594v4 |
863 | VidHal: Benchmarking Temporal Hallucinations in Vision LLMs | Wey Yeh Choong, Yangyang Guo, Mohan Kankanhalli | 2024-11-25 | arXiv | https://github.com/Lookuz/VidHal | http://arxiv.org/abs/2411.16771v1 |
864 | ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration | Haozhan Shen, Kangjia Zhao, Tiancheng Zhao, Ruochen Xu, Zilun Zhang, Mingwei Zhu, Jianwei Yin | 2024-11-25 | arXiv | https://github.com/om-ai-lab/ZoomEye | http://arxiv.org/abs/2411.16044v1 |
865 | Multi-label Sequential Sentence Classification via Large Language Model | Mengfei Lan, Lecheng Zheng, Shufan Ming, Halil Kilicoglu | 2024-11-23 | EMNLP | https://github.com/ScienceNLP-Lab/LLM-SSC | https://aclanthology.org/2024.findings-emnlp.944 |
866 | ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain | Haochen Zhao, Xiangru Tang, Ziran Yang, Xiao Han, Xuanzhi Feng, Yueqing Fan, Senhao Cheng, Di Jin, Yilun Zhao, Arman Cohan, Mark Gerstein | 2024-11-23 | arXiv | https://github.com/HaochenZhao/SafeAgent4Chem | http://arxiv.org/abs/2411.16736v1 |
867 | Seed-Free Synthetic Data Generation Framework for Instruction-Tuning LLMs: A Case Study in Thai | Parinthapat Pengpun, Can Udomcharoenchaikit, Weerayut Buaphet, Peerat Limkonchotiwat | 2024-11-23 | arXiv | https://github.com/parinzee/seed-free-synthetic-instruct | http://arxiv.org/abs/2411.15484v1 |
868 | MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs | Chaoyou Fu, Yi-Fan Zhang, Shukang Yin, Bo Li, Xinyu Fang, Sirui Zhao, Haodong Duan, Xing Sun, Ziwei Liu, Liang Wang, Caifeng Shan, Ran He | 2024-11-22 | arXiv | https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Benchmarks | http://arxiv.org/abs/2411.15296v2 |
869 | DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization | Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu | 2024-11-21 | arXiv | https://github.com/hexuandeng/DRPruning | http://arxiv.org/abs/2411.14055v1 |
870 | UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages | Bethel Melesse Tessema, Akhil Kedia, Tae-Sun Chung | 2024-11-21 | arXiv | https://github.com/bethelmelesse/unifiedcrawl | http://arxiv.org/abs/2411.14343v1 |
871 | SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model | Christopher Nguyen, William Nguyen, Atsushi Suzuki, Daisuke Oku, Hong An Phan, Sang Dinh, Zooey Nguyen, Anh Ha, Shruti Raghavan, Huy Vo, Thang Nguyen, Lan Nguyen, Yoshikuni Hirayama | 2024-11-21 | arXiv | https://github.com/aitomatic/semikong | http://arxiv.org/abs/2411.13802v2 |
872 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang | 2024-11-20 | arXiv | https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning | http://arxiv.org/abs/2411.13504v2 |
873 | DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Xianda Guo, Ruijun Zhang, Yiqun Duan, Yuhang He, Chenming Zhang, Shuai Liu, Long Chen | 2024-11-20 | arXiv | https://github.com/XiandaGuo/Drive-MLLM | http://arxiv.org/abs/2411.13112v2 |
874 | On the Consistency of Video Large Language Models in Temporal Comprehension | Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang, Angela Yao | 2024-11-20 | arXiv | https://github.com/minjoong507/Consistency-of-Video-LLM | http://arxiv.org/abs/2411.12951v1 |
875 | Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods | Jai Doshi, Asa Cooper Stickland | 2024-11-18 | arXiv | https://github.com/JaiDoshi/Knowledge-Erasure | http://arxiv.org/abs/2411.12103v2 |
876 | FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training | Anjia Cao, Xing Wei, Zhiheng Ma | 2024-11-18 | arXiv | https://github.com/MIV-XJTU/FLAME | http://arxiv.org/abs/2411.11927v2 |
877 | BianCang: A Traditional Chinese Medicine Large Language Model | Sibo Wei, Xueping Peng, Yi-fei Wang, Jiasheng Si, Weiyu Zhang, Wenpeng Lu, Xiaoming Wu, Yinglong Wang | 2024-11-17 | arXiv | https://github.com/QLU-NLP/BianCang | http://arxiv.org/abs/2411.11027v1 |
878 | Multilingual Large Language Models: A Systematic Survey | Shaolin Zhu, Supryadi, Shaoyang Xu, Haoran Sun, Leiyu Pan, Menglong Cui, Jiangcun Du, Renren Jin, António Branco, Deyi Xiong | 2024-11-17 | arXiv | https://github.com/tjunlp-lab/Awesome-Multilingual-LLMs-Papers | http://arxiv.org/abs/2411.11072v2 |
879 | TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | Tingyu Qu, Mingxiao Li, Tinne Tuytelaars, Marie-Francine Moens | 2024-11-17 | arXiv | https://github.com/tingyu215/TS-LLaVA | http://arxiv.org/abs/2411.11066v1 |
880 | Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering | Zeping Yu, Sophia Ananiadou | 2024-11-17 | arXiv | https://github.com/zepingyu0512/llava-mechanism | http://arxiv.org/abs/2411.10950v1 |
881 | Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model | Ting Liu, Liangtao Shi, Richang Hong, Yue Hu, Quanjun Yin, Linfeng Zhang | 2024-11-16 | arXiv | https://github.com/liuting20/MustDrop | http://arxiv.org/abs/2411.10803v1 |
882 | Orca: Enhancing Role-Playing Abilities of Large Language Models by Integrating Personality Traits | Yuxuan Huang | 2024-11-15 | arXiv | https://github.com/Aipura/Orca | http://arxiv.org/abs/2411.10006v1 |
883 | Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination | Haojie Zheng, Tianyang Xu, Hanchi Sun, Shu Pu, Ruoxi Chen, Lichao Sun | 2024-11-15 | arXiv | https://github.com/Terry-Xu-666/visual_inference_chain | http://arxiv.org/abs/2411.12591v1 |
884 | Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash | Parsa Hejabi, Elnaz Rahmati, Alireza S. Ziabari, Preni Golazizian, Jesse Thomason, Morteza Dehghani | 2024-11-15 | arXiv | https://github.com/ParsaHejabi/Simulation-Framework-for-Multi-Agent-Balderdash | http://arxiv.org/abs/2411.10422v1 |
885 | Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era | Thanh Tam Nguyen, Zhao Ren, Trinh Pham, Thanh Trung Huynh, Phi Le Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen | 2024-11-15 | arXiv | https://github.com/tamlhp/awesome-instruction-editing | http://arxiv.org/abs/2411.09955v2 |
886 | MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Mengyuan Zhang, Ruihui Wang, Bo Xia, Yuan Sun, Xiaobing Zhao | 2024-11-14 | arXiv | https://github.com/joenahm/MM-Eval | http://arxiv.org/abs/2411.09492v1 |
887 | LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Zhenshi Li, Dilxat Muhtar, Feng Gu, Xueliang Zhang, Pengfeng Xiao, Guangjun He, Xiaoxiang Zhu | 2024-11-14 | arXiv | https://github.com/NJU-LHRS/LHRS-Bot | https://doi.org/10.48550/arXiv.2411.09301 |
888 | DROJ: A Prompt-Driven Attack against Large Language Models | Leyang Hu, Boran Wang | 2024-11-14 | arXiv | https://github.com/Leon-Leyang/LLM-Safeguard | http://arxiv.org/abs/2411.09125v1 |
889 | DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models | Yongdong Wang, Runze Xiao, Jun Younes Louhi Kasahara, Ryosuke Yajima, Keiji Nagatani, Atsushi Yamashita, Hajime Asama | 2024-11-13 | arXiv | https://wyd0817.github.io/project-dart-llm/ | http://arxiv.org/abs/2411.09022v1 |
890 | CorrectBench: Automatic Testbench Generation with Functional Self-Correction using LLMs for HDL Design | Ruidi Qiu, Grace Li Zhang, Rolf Drechsler, Ulf Schlichtmann, Bing Li | 2024-11-13 | arXiv | https://github.com/AutoBench/CorrectBench | http://arxiv.org/abs/2411.08510v1 |
891 | Large Language Models Can Self-Improve in Long-context Reasoning | Siheng Li, Cheng Yang, Zesen Cheng, Lemao Liu, Mo Yu, Yujiu Yang, Wai Lam | 2024-11-12 | arXiv | https://github.com/SihengLi99/SEALONG | http://arxiv.org/abs/2411.08147v1 |
892 | Verbosity | Yusen Zhang, Sarkar Snigdha Sarathi Das, Rui Zhang | 2024-11-12 | arXiv | https://github.com/psunlpgroup/VerbosityLLM | http://arxiv.org/abs/2411.07858v2 |
893 | ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? | Canyu Chen, Jian Yu, Shan Chen, Che Liu, Zhongwei Wan, Danielle Bitterman, Fei Wang, Kai Shu | 2024-11-10 | arXiv | https://clinicalbench.github.io | http://arxiv.org/abs/2411.06469v1 |
894 | Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models | Xiaojun Wu, Junxi Liu, Huanyi Su, Zhouchi Lin, Yiyan Qi, Chengjin Xu, Jiajun Su, Jiajie Zhong, Fuwei Wang, Saizhuo Wang, Fengrui Hua, Jia Li, Jian Guo | 2024-11-09 | arXiv | https://github.com/IDEA-FinAI/Golden-Touchstone | http://arxiv.org/abs/2411.06272v1 |
895 | TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering | Yungeng Liu, Zan Chen, Yu Guang Wang, Yiqing Shen | 2024-11-09 | arXiv | https://github.com/tsynbio/Toursynbio-Search | http://arxiv.org/abs/2411.06024v1 |
896 | Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation | Dong Shu, Bingbing Duan, Kai Guo, Kaixiong Zhou, Jiliang Tang, Mengnan Du | 2024-11-08 | arXiv | https://github.com/Tizzzzy/LLM-GDM-alignment | http://arxiv.org/abs/2411.05316v1 |
897 | Game-theoretic LLM: Agent Workflow for Negotiation Games | Wenyue Hua, Ollie Liu, Lingyao Li, Alfonso Amayuelas, Julie Chen, Lucas Jiang, Mingyu Jin, Lizhou Fan, Fei Sun, William Wang, Xintong Wang, Yongfeng Zhang | 2024-11-08 | arXiv | https://github.com/Wenyueh/game_theory | http://arxiv.org/abs/2411.05990v2 |
898 | WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Shengda Fan, Xin Cong, Yuepeng Fu, Zhong Zhang, Shuyan Zhang, Yuanwei Liu, Yesai Wu, Yankai Lin, Zhiyuan Liu, Maosong Sun | 2024-11-08 | arXiv | https://github.com/OpenBMB/WorkflowLLM | http://arxiv.org/abs/2411.05451v1 |
899 | FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs? | Eric Wu, Kevin Wu, James Zou | 2024-11-07 | arXiv | https://github.com/kevinwu23/StanfordFineTuneBench | http://arxiv.org/abs/2411.05059v2 |
900 | Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model | Young-Jun Lee, Dokyong Lee, Junyoung Youn, Kyeongjin Oh, Ho-Jin Choi | 2024-11-07 | arXiv | https://github.com/passing2961/Thanos | http://arxiv.org/abs/2411.04496v1 |
901 | Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation | Ayan Sengupta, Vaibhav Seth, Arinjay Pathak, Natraj Raman, Sriram Gopalakrishnan, Tanmoy Chakraborty | 2024-11-07 | arXiv | https://github.com/LCS2-IIITD/MonteCLoRA | http://arxiv.org/abs/2411.04358v2 |
902 | AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering | Yungeng Liu, Zan Chen, Yu Guang Wang, Yiqing Shen | 2024-11-07 | arXiv | https://github.com/tsynbio/AutoPE | http://arxiv.org/abs/2411.04440v1 |
903 | Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities | Shengzhi Li, Kittipat Kampa, Rongyu Lin, Bohang Li, Shichao Pei | 2024-11-07 | arXiv | https://github.com/findalexli/Abstract2Appendix | http://arxiv.org/abs/2411.05232v1 |
904 | QUILL: Quotation Generation Enhancement of Large Language Models | Jin Xiao, Bowei Zhang, Qianyu He, Jiaqing Liang, Feng Wei, Jinglei Chen, Zujie Liang, Deqing Yang, Yanghua Xiao | 2024-11-06 | arXiv | https://github.com/GraceXiaoo/QUILL | http://arxiv.org/abs/2411.03675v1 |
905 | Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy | Razvan-Gabriel Dumitru, Paul-Ioan Clotan, Vikas Yadav, Darius Peteleaza, Mihai Surdeanu | 2024-11-05 | arXiv | https://github.com/RazvanDu/DynamicSlicing | http://arxiv.org/abs/2411.03513v1 |
906 | Leveraging Large Language Models in Code Question Answering: Baselines and Issues | Georgy Andryushchenko, Vladimir Ivanov, Vladimir Makharev, Elizaveta Tukhtina, Aidar Valeev | 2024-11-05 | arXiv | https://github.com/IU-AES-AI4Code/CodeQuestionAnswering | http://arxiv.org/abs/2411.03012v1 |
907 | SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents | Dawei Li, Zhen Tan, Peijia Qian, Yifan Li, Kumar Satvik Chaudhary, Lijie Hu, Jiayi Shen | 2024-11-05 | arXiv | https://github.com/David-Li0406/SMoA | http://arxiv.org/abs/2411.03284v1 |
908 | Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment | Jason Vega, Junsheng Huang, Gaokai Zhang, Hangoo Kang, Minjia Zhang, Gagandeep Singh | 2024-11-05 | arXiv | https://github.com/uiuc-focal-lab/stochastic-monkeys/ | http://arxiv.org/abs/2411.02785v2 |
909 | Culinary Class Wars: Evaluating LLMs using ASH in Cuisine Transfer Task | Hoonick Lee, Mogan Gim, Donghyeon Park, Donghee Choi, Jaewoo Kang | 2024-11-04 | arXiv | http://github.com/dmis-lab/CulinaryASH | http://arxiv.org/abs/2411.01996v1 |
910 | Eurekaverse: Environment Curriculum Generation via Large Language Models | William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Dinesh Jayaraman, Yecheng Jason Ma | 2024-11-04 | arXiv | https://eureka-research.github.io/eurekaverse | http://arxiv.org/abs/2411.01775v1 |
911 | SQL Injection Jailbreak: a structural disaster of large language models | Jiawei Zhao, Kejiang Chen, Weiming Zhang, Nenghai Yu | 2024-11-03 | arXiv | https://github.com/weiyezhimeng/SQL-Injection-Jailbreak | http://arxiv.org/abs/2411.01565v3 |
912 | Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis | Shijia Liao, Yuxuan Wang, Tianyu Li, Yifan Cheng, Ruoyi Zhang, Rongzhi Zhou, Yijin Xing | 2024-11-02 | arXiv | https://github.com/fishaudio/fish-speech | http://arxiv.org/abs/2411.01156v2 |
913 | Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection | Han Yin, Yang Xiao, Jisheng Bai, Rohan Kumar Das | 2024-11-02 | arXiv | https://github.com/apple-yinhan/Noise-robust-SED | http://arxiv.org/abs/2411.01174v1 |
914 | TODO: Enhancing LLM Alignment with Ternary Preferences | Yuxiang Guo, Lu Yin, Bo Jiang, Jiaqi Zhang | 2024-11-02 | arXiv | https://github.com/XXares/TODO | http://arxiv.org/abs/2411.02442v1 |
915 | LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham | 2024-11-01 | arXiv | https://fsoft-aic.github.io/fsoft-LibMoE.github.io | http://arxiv.org/abs/2411.00918v1 |
916 | Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling | Yiwen Ding, Zhiheng Xi, Wei He, Zhuoyuan Li, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang | 2024-11-01 | arXiv | https://github.com/Yiwen-Ding/Guided-Self-Improvement | http://arxiv.org/abs/2411.00750v1 |
917 | SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models | Jianyi Zhang, Da-Cheng Juan, Cyrus Rashtchian, Chun-Sung Ferng, Heinrich Jiang, Yiran Chen | 2024-11-01 | arXiv | https://jayzhang42.github.io/sled_page/ | http://arxiv.org/abs/2411.02433v2 |
918 | Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM | Xiong Wang, Yangze Li, Chaoyou Fu, Yunhang Shen, Lei Xie, Ke Li, Xing Sun, Long Ma | 2024-11-01 | arXiv | https://freeze-omni.github.io/ | http://arxiv.org/abs/2411.00774v5 |
919 | Beyond Utility: Evaluating LLM as Recommender | Chumeng Jiang, Jiayin Wang, Weizhi Ma, Charles L. A. Clarke, Shuai Wang, Chuhan Wu, Min Zhang | 2024-11-01 | arXiv | https://github.com/JiangDeccc/EvaLLMasRecommender | http://arxiv.org/abs/2411.00331v1 |
920 | MoD: A Distribution-Based Approach for Merging Large Language Models | Quy-Anh Dang, Chris Ngo | 2024-11-01 | arXiv | https://github.com/knovel-eng/mod | http://arxiv.org/abs/2411.00406v1 |
921 | EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Unified Compression and Adaptive Layer Voting | Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reddy Bommu, Yang Katie Zhao, Yingyan Celine Lin | 2024-11 | DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference | https://github.com/GATECH-EIC/Edge-LLM | https://dl.acm.org/doi/10.1145/3649329.3658473 |
922 | Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning | Hongwei Jin, George Papadimitriou, Krishnan Raghavan, Pawel Zuk, Prasanna Balaprakash, Cong Wang, Anirban Mandal, Ewa Deelman | 2024-11 | SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis | https://github.com/PoSeiDon-Workflows/LLM_AD | https://dl.acm.org/doi/10.1109/SC41406.2024.00098 |
923 | Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging | Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, Jinyuan Liu, Yichen Gong, Qi Li, Anyu Wang, Xiaoyun Wang | 2024-11 | LAMPS '24: Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis | https://github.com/ThuCCSLab/MergeGuard | https://dl.acm.org/doi/10.1145/3689217.3690614 |
924 | BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments | Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu | 2024-10-31 | arXiv | https://github.com/xinghaow99/BitStack | http://arxiv.org/abs/2410.23918v1 |
925 | DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xinyi Yang, Yulin Yuan, Lidia S. Chao | 2024-10-31 | arXiv | https://github.com/NLP2CT/DetectRL | http://arxiv.org/abs/2410.23746v1 |
926 | End-to-End Ontology Learning with Large Language Models | Andy Lo, Albert Q. Jiang, Wenda Li, Mateja Jamnik | 2024-10-31 | arXiv | https://github.com/andylolu2/ollm | http://arxiv.org/abs/2410.23584v1 |
927 | LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction | Andre Niyongabo Rubungo, Kangming Li, Jason Hattrick-Simpers, Adji Bousso Dieng | 2024-10-31 | arXiv | https://github.com/vertaix/LLM4Mat-Bench | http://arxiv.org/abs/2411.00177v3 |
928 | LLaMo: Large Language Model-based Molecular Graph Assistant | Jinyoung Park, Minseong Bae, Dohwan Ko, Hyunwoo J. Kim | 2024-10-31 | arXiv | https://github.com/mlvlab/LLaMo | http://arxiv.org/abs/2411.00871v1 |
929 | What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | Ming Li, Yanhong Li, Tianyi Zhou | 2024-10-31 | arXiv | https://github.com/MingLiiii/Layer_Gradient | http://arxiv.org/abs/2410.23743v1 |
930 | On Memorization of Large Language Models in Logical Reasoning | Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar | 2024-10-30 | arXiv | https://memkklogic.github.io | http://arxiv.org/abs/2410.23123v1 |
931 | ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning | Millennium Bismay, Xiangjue Dong, James Caverlee | 2024-10-30 | arXiv | https://github.com/millenniumbismay/reasoningrec | http://arxiv.org/abs/2410.23180v1 |
932 | Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback | Qinqing Zheng, Mikael Henaff, Amy Zhang, Aditya Grover, Brandon Amos | 2024-10-30 | arXiv | https://github.com/facebookresearch/oni | http://arxiv.org/abs/2410.23022v2 |
933 | SciPIP: An LLM-based Scientific Paper Idea Proposer | Wenxiao Wang, Lihui Gu, Liye Zhang, Yunxiang Luo, Yi Dai, Chen Shen, Liang Xie, Binbin Lin, Xiaofei He, Jieping Ye | 2024-10-30 | arXiv | https://github.com/cheerss/SciPIP | http://arxiv.org/abs/2410.23166v1 |
934 | Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning | Dong Shu, Mengnan Du | 2024-10-30 | arXiv | https://github.com/Tizzzzy/Demonstration_Selection_Overview | http://arxiv.org/abs/2410.23099v1 |
935 | BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference | Junqi Zhao, Zhijin Fang, Shu Li, Shaohui Yang, Shichao He | 2024-10-30 | arXiv | https://github.com/JunqiZhao888/buzz-llm | http://arxiv.org/abs/2410.23079v1 |
936 | Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning | Keqin Bao, Ming Yan, Yang Zhang, Jizhi Zhang, Wenjie Wang, Fuli Feng, Xiangnan He | 2024-10-30 | arXiv | https://github.com/ym689/rec_icl | http://arxiv.org/abs/2410.23136v1 |
937 | Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation | Yang Zhang, Juntao You, Yimeng Bai, Jizhi Zhang, Keqin Bao, Wenjie Wang, Tat-Seng Chua | 2024-10-30 | arXiv | https://github.com/itsmeyjt/CFT | http://arxiv.org/abs/2410.22809v1 |
938 | Distinguishing Ignorance from Error in LLM Hallucinations | Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov | 2024-10-29 | arXiv | https://github.com/technion-cs-nlp/hallucination-mitigation | http://arxiv.org/abs/2410.22071v1 |
939 | Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach | Qingchuan Li, Jiatong Li, Tongxuan Liu, Yuting Zeng, Mingyue Cheng, Weizhe Huang, Qi Liu | 2024-10-29 | arXiv | https://github.com/wufeiwuwoshihua/nshy | http://arxiv.org/abs/2410.21779v1 |
940 | Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance | Dongmin Park, Sebin Kim, Taehong Moon, Minkyu Kim, Kangwook Lee, Jaewoong Cho | 2024-10-29 | arXiv | https://github.com/krafton-ai/Rare2Frequent | http://arxiv.org/abs/2410.22376v1 |
941 | Scaling LLM Inference with Optimized Sample Compute Allocation | Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li | 2024-10-29 | arXiv | https://github.com/LeiLiLab/OSCA | http://arxiv.org/abs/2410.22480v1 |
942 | Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks | Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese | 2024-10-28 | arXiv | https://github.com/pasquini-dario/project_mantis | http://arxiv.org/abs/2410.20911v2 |
943 | LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment | Ge Yang, Changyi He, Jinyang Guo, Jianyu Wu, Yifu Ding, Aishan Liu, Haotong Qin, Pengliang Ji, Xianglong Liu | 2024-10-28 | arXiv | https://github.com/AboveParadise/LLMCBench | http://arxiv.org/abs/2410.21352v2 |
944 | NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates | Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu | 2024-10-28 | arXiv | https://github.com/hexuandeng/NewTerm | http://arxiv.org/abs/2410.20814v1 |
945 | ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen | 2024-10-28 | arXiv | https://github.com/bytedance/ShadowKV | http://arxiv.org/abs/2410.21465v1 |
946 | Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models | Yilun Jin, Zheng Li, Chenwei Zhang, Tianyu Cao, Yifan Gao, Pratik Jayarao, Mao Li, Xin Liu, Ritesh Sarkhel, Xianfeng Tang, Haodong Wang, Zhengyang Wang, Wenju Xu, Jingfeng Yang, Qingyu Yin, Xian Li, Priyanka Nigam, Yi Xu, Kai Chen, Qiang Yang, Meng Jiang, Bing Yin | 2024-10-28 | arXiv | https://github.com/KL4805/ShoppingMMLU | http://arxiv.org/abs/2410.20745v2 |
947 | SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization | Wanhua Li, Zibin Meng, Jiawei Zhou, Donglai Wei, Chuang Gan, Hanspeter Pfister | 2024-10-28 | arXiv | https://mengzibin.github.io/SocialGPT.github.io/ | http://arxiv.org/abs/2410.21411v1 |
948 | Instruction-Tuned LLMs Succeed in Document-Level MT Without Fine-Tuning -- But BLEU Turns a Blind Eye | Yirong Sun, Dawei Zhu, Yanjun Chen, Erjia Xiao, Xinghao Chen, Xiaoyu Shen | 2024-10-28 | arXiv | https://github.com/EIT-NLP/BLEUless_DocMT | http://arxiv.org/abs/2410.20941v2 |
949 | Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data | Xinhong Xie, Tao Li, Quanyan Zhu | 2024-10-27 | arXiv | https://github.com/XXXinhong/Detoxification_LLM | http://arxiv.org/abs/2410.20298v1 |
950 | Enhancing Inflation Nowcasting with LLM: Sentiment Analysis on News | Marc-Antoine Allard, Paul Teiletche, Adam Zinebi | 2024-10-26 | arXiv | https://github.com/paultltc/InflaBERT | http://arxiv.org/abs/2410.20198v1 |
951 | LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen | 2024-10-26 | arXiv | https://github.com/JiazuoYu/PathWeave | http://arxiv.org/abs/2410.20178v2 |
952 | Language Agents Meet Causality -- Bridging LLMs and Causal World Models | John Gkountouras, Matthias Lindemann, Phillip Lippe, Efstratios Gavves, Ivan Titov | 2024-10-25 | arXiv | https://j0hngou.github.io/LLMCWM/ | http://arxiv.org/abs/2410.19923v1 |
953 | APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs | Huaxiaoyue Wang, Nathaniel Chin, Gonzalo Gonzalez-Pumariega, Xiangwan Sun, Neha Sunkara, Maximus Adrian Pace, Jeannette Bohg, Sanjiban Choudhury | 2024-10-25 | arXiv | https://portal-cornell.github.io/apricot/ | http://arxiv.org/abs/2410.19656v1 |
954 | Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Zhengkai Lin, Zhihang Fu, Kai Liu, Liang Xie, Binbin Lin, Wenxiao Wang, Deng Cai, Yue Wu, Jieping Ye | 2024-10-24 | arXiv | https://github.com/alibaba/thinking_bias | http://arxiv.org/abs/2410.18808v2 |
955 | GCoder: Improving Large Language Model for Generalized Graph Problem Solving | Qifan Zhang, Xiaobin Hong, Jianheng Tang, Nuo Chen, Yuhan Li, Wenzhong Li, Jing Tang, Jia Li | 2024-10-24 | arXiv | https://github.com/Bklight999/WWW25-GCoder/tree/master | http://arxiv.org/abs/2410.19084v1 |
956 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang | 2024-10-24 | arXiv | https://github.com/VITA-Group/READ-ME | http://arxiv.org/abs/2410.19123v1 |
957 | Distill Visual Chart Reasoning Ability from LLMs to MLLMs | Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang | 2024-10-24 | arXiv | https://github.com/hewei2001/ReachQA | http://arxiv.org/abs/2410.18798v1 |
958 | CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation | Qinsi Wang, Saeed Vahidian, Hancheng Ye, Jianyang Gu, Jianyi Zhang, Yiran Chen | 2024-10-23 | arXiv | https://wangqinsi1.github.io/coreinfer_page/ | http://arxiv.org/abs/2410.18311v1 |
959 | Cross-model Control: Improving Multiple Large Language Models in One-time Training | Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao | 2024-10-23 | arXiv | https://github.com/wujwyi/CMC | http://arxiv.org/abs/2410.17599v1 |
960 | ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage | Taewhoo Lee, Chanwoong Yoon, Kyochul Jang, Donghyeon Lee, Minju Song, Hyunjae Kim, Jaewoo Kang | 2024-10-22 | arXiv | https://github.com/dmis-lab/ETHIC | http://arxiv.org/abs/2410.16848v1 |
961 | VoiceBench: Benchmarking LLM-Based Voice Assistants | Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li | 2024-10-22 | arXiv | https://github.com/MatthewCYM/VoiceBench | http://arxiv.org/abs/2410.17196v3 |
962 | Improving Causal Reasoning in Large Language Models: A Survey | Longxuan Yu, Delin Chen, Siheng Xiong, Qingyang Wu, Qingzhen Liu, Dawei Li, Zhikai Chen, Xiaoze Liu, Liangming Pan | 2024-10-22 | arXiv | https://github.com/chendl02/Awesome-LLM-causal-reasoning | http://arxiv.org/abs/2410.16676v3 |
963 | Automated Spinal MRI Labelling from Reports Using a Large Language Model | Robin Y. Park, Rhydian Windsor, Amir Jamaludin, Andrew Zisserman | 2024-10-22 | MICCAI | https://github.com/robinyjpark/AutoLabelClassifier | https://doi.org/10.1007/978-3-031-72086-4_10 |
964 | DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models | Chen Qian, Dongrui Liu, Jie Zhang, Yong Liu, Jing Shao | 2024-10-22 | arXiv | https://github.com/ChnQ/DEAN | http://arxiv.org/abs/2410.16672v1 |
965 | AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration | Bradley McDanel | 2024-10-22 | arXiv | https://github.com/BradMcDanel/AMUSD/ | http://arxiv.org/abs/2410.17375v1 |
966 | CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing | Chen Yang, Chenyang Zhao, Quanquan Gu, Dongruo Zhou | 2024-10-22 | arXiv | https://github.com/uclaml/COPS | http://arxiv.org/abs/2410.16670v1 |
967 | Boosting Jailbreak Transferability for Large Language Models | Hanqing Liu, Lifeng Zhou, Huanqian Yan | 2024-10-21 | arXiv | https://github.com/HqingLiu/SI-GCG | http://arxiv.org/abs/2410.15645v2 |
968 | Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report | Ayman Asad Khan, Md Toufique Hasan, Kai Kristian Kemell, Jussi Rasku, Pekka Abrahamsson | 2024-10-21 | arXiv | https://github.com/GPT-Laboratory/RAG-LLM-Development-Guidebook-from-PDFs | http://arxiv.org/abs/2410.15944v1 |
969 | LLaVA-KD: A Framework of Distilling Multimodal Large Language Models | Yuxuan Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Xiang Bai | 2024-10-21 | arXiv | https://github.com/Fantasyele/LLaVA-KD | http://arxiv.org/abs/2410.16236v2 |
970 | MagicPIG: LSH Sampling for Efficient LLM Generation | Zhuoming Chen, Ranajoy Sadhukhan, Zihao Ye, Yang Zhou, Jianyu Zhang, Niklas Nolte, Yuandong Tian, Matthijs Douze, Leon Bottou, Zhihao Jia, Beidi Chen | 2024-10-21 | arXiv | https://github.com/Infini-AI-Lab/MagicPIG | http://arxiv.org/abs/2410.16179v4 |
971 | Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs | Xin Ma, Yang Liu, Jingjing Liu, Xiaoxu Ma | 2024-10-21 | arXiv | https://github.com/soacker/Mesa-Extrapolation | http://arxiv.org/abs/2410.15859v3 |
972 | RAC: Efficient LLM Factuality Correction with Retrieval Augmentation | Changmao Li, Jeffrey Flanigan | 2024-10-21 | arXiv | https://github.com/jlab-nlp/Retrieval-Augmented-Correction | http://arxiv.org/abs/2410.15667v1 |
973 | CausalGraph2LLM: Evaluating LLMs for Causal Queries | Ivaxi Sheth, Bahare Fatemi, Mario Fritz | 2024-10-21 | arXiv | https://github.com/ivaxi0s/CausalGraph2LLM | http://arxiv.org/abs/2410.15939v1 |
974 | A Comprehensive Evaluation of Cognitive Biases in LLMs | Simon Malberg, Roman Poletukhin, Carolin M. Schuster, Georg Groh | 2024-10-20 | arXiv | https://github.com/simonmalberg/cognitive-biases-in-llms | http://arxiv.org/abs/2410.15413v1 |
975 | Are LLMs Good Zero-Shot Fallacy Classifiers? | Fengjun Pan, Xiaobao Wu, Zongrui Li, Anh Tuan Luu | 2024-10-19 | arXiv | https://github.com/panFJCharlotte98/Fallacy_Detection | http://arxiv.org/abs/2410.15050v1 |
976 | Evaluating Deep Unlearning in Large Language Models | Ruihan Wu, Chhavi Yadav, Russ Salakhutdinov, Kamalika Chaudhuri | 2024-10-19 | arXiv | https://github.com/wrh14/deep_unlearning | http://arxiv.org/abs/2410.15153v3 |
977 | Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction | Yinhan He, Zaiyi Zheng, Patrick Soga, Yaozhen Zhu, Yushun Dong, Jundong Li | 2024-10-19 | EMNLP 2024 (Findings) | https://github.com/YinhanHe123/new_LLM4GNNExplanation | http://arxiv.org/abs/2410.15165v1 |
978 | GlitchMiner: Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization | Zihui Wu, Haichang Gao, Ping Wang, Shudong Zhang, Zhaoxiang Liu, Shiguo Lian | 2024-10-19 | arXiv | https://github.com/wooozihui/GlitchMiner | http://arxiv.org/abs/2410.15052v4 |
979 | Imprompter: Tricking LLM Agents into Improper Tool Use | Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K. Gupta, Taylor Berg-Kirkpatrick, Earlence Fernandes | 2024-10-19 | arXiv | https://github.com/Reapor-Yurnero/imprompter | http://arxiv.org/abs/2410.14923v2 |
980 | MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification | Yin Li, Liangwei Wang, Shiyuan Piao, Boo-Ho Yang, Ziyue Li, Wei Zeng, Fugee Tsung | 2024-10-19 | arXiv | https://github.com/MCCodeAI/MCCoder | http://arxiv.org/abs/2410.15154v1 |
981 | SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent | Jiarui Ji, Yang Li, Hongtao Liu, Zhicheng Du, Zhewei Wei, Weiran Shen, Qi Qi, Yankai Lin | 2024-10-18 | arXiv | https://github.com/jijiarui-cather/SRAPAgent_Framework | http://arxiv.org/abs/2410.14152v1 |
982 | Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation | Shuo Tang, Xianghe Pang, Zexi Liu, Bohan Tang, Rui Ye, Xiaowen Dong, Yanfeng Wang, Siheng Chen | 2024-10-18 | arXiv | https://github.com/ShuoTang123/MATRIX-Gen | http://arxiv.org/abs/2410.14251v1 |
983 | CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic | Huaiyuan Yao, Longchao Da, Vishnu Nandam, Justin Turnau, Zhiwei Liu, Linsey Pang, Hua Wei | 2024-10-18 | arXiv | https://github.com/Hyan-Yao/CoMAL | http://arxiv.org/abs/2410.14368v1 |
984 | Enabling Scalable Evaluation of Bias Patterns in Medical LLMs | Hamed Fayyaz, Raphael Poulain, Rahmatollah Beheshti | 2024-10-18 | arXiv | https://github.com/healthylaife/autofair | http://arxiv.org/abs/2410.14763v1 |
985 | Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models | Wei Jie Yeo, Ranjan Satapathy, Erik Cambria | 2024-10-18 | arXiv | https://github.com/wj210/Causal-Faithfulness | https://doi.org/10.48550/arXiv.2410.14155 |
986 | Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models | Yu Yuan, Lili Zhao, Kai Zhang, Guangting Zheng, Qi Liu | 2024-10-17 | EMNLP | https://github.com/yyhappier/ShortcutSuite | https://aclanthology.org/2024.emnlp-main.679 |
987 | Data Defenses Against Large Language Models | William Agnew, Harry H. Jiang, Cella Sum, Maarten Sap, Sauvik Das | 2024-10-17 | arXiv | https://github.com/wagnew3/LLMDataDefenses | http://arxiv.org/abs/2410.13138v1 |
988 | FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs | Forrest Sheng Bao, Miaoran Li, Renyi Qu, Ge Luo, Erana Wan, Yujia Tang, Weisi Fan, Manveer Singh Tamber, Suleman Kazi, Vivek Sourabh, Mike Qi, Ruixuan Tu, Chenyu Xu, Matthew Gonzales, Ofer Mendelevitch, Amin Ahmad | 2024-10-17 | arXiv | https://github.com/vectara/FaithBench | http://arxiv.org/abs/2410.13210v1 |
989 | LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models | David Hoffmann, Kailash Budhathoki, Matthaeus Kleindessner | 2024-10-17 | arXiv | https://github.com/amazon-science/llm-rank-pruning | http://arxiv.org/abs/2410.13299v2 |
990 | Retrieval-Augmented Personalization for Multimodal Large Language Models | Haoran Hao, Jiaming Han, Changsheng Li, Yu-Feng Li, Xiangyu Yue | 2024-10-17 | arXiv | https://github.com/Hoar012/RAP-MLLM | http://arxiv.org/abs/2410.13360v2 |
991 | SLM-Mod: Small Language Models Surpass LLMs at Content Moderation | Xianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, Koustuv Saha | 2024-10-17 | arXiv | https://github.com/AGoyal0512/SLM-Mod | http://arxiv.org/abs/2410.13155v1 |
992 | aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Completion | Siyuan Jiang, Jia Li, He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Yu Han, Wei Ning, Gen Wang, Yihong Dong, Kechi Zhang, Ge Li | 2024-10-17 | arXiv | https://github.com/aixcoder-plugin/aiXcoder-7B | http://arxiv.org/abs/2410.13187v2 |
993 | Hypothesis Testing the Circuit Hypothesis in LLMs | Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei | 2024-10-16 | arXiv | https://github.com/blei-lab/circuitry | http://arxiv.org/abs/2410.13032v1 |
994 | Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors | Weixuan Wang, Jingyuan Yang, Wei Peng | 2024-10-16 | arXiv | https://github.com/weixuan-wang123/SADI | http://arxiv.org/abs/2410.12299v1 |
995 | Self-Pluralising Culture Alignment for Large Language Models | Shaoyang Xu, Yongqi Leng, Linhao Yu, Deyi Xiong | 2024-10-16 | arXiv | https://github.com/shaoyangxu/CultureSPA | http://arxiv.org/abs/2410.12971v1 |
996 | Qtok: A Comprehensive Framework for Evaluating Multilingual Tokenizer Quality in Large Language Models | Iaroslav Chelombitko, Egor Safronov, Aleksey Komissarov | 2024-10-16 | arXiv | https://github.com/nup-csai/Qtok/ | http://arxiv.org/abs/2410.12989v1 |
997 | ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs | Jingming Zhuo, Songyang Zhang, Xinyu Fang, Haodong Duan, Dahua Lin, Kai Chen | 2024-10-16 | arXiv | https://github.com/open-compass/ProSA | http://arxiv.org/abs/2410.12405v1 |
998 | POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization | Batuhan K. Karaman, Ishmam Zabir, Alon Benhaim, Vishrav Chaudhary, Mert R. Sabuncu, Xia Song | 2024-10-16 | arXiv | https://github.com/batuhankmkaraman/POROver | http://arxiv.org/abs/2410.12999v1 |
999 | DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs | Yingsong Luo, Ling Chen | 2024-10-16 | arXiv | https://github.com/LuoYingSong/DAQ | http://arxiv.org/abs/2410.12187v2 |
1000 | Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention | Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch | 2024-10-16 | arXiv | https://github.com/weixuan-wang123/INCLINE | https://doi.org/10.48550/arXiv.2410.12462 |
1001 | Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights | Rahul Krishna, Rangeet Pan, Raju Pavuluri, Srikanth Tamilselvam, Maja Vukovic, Saurabh Sinha | 2024-10-16 | arXiv | https://github.com/IBM/codellm-devkit | http://arxiv.org/abs/2410.13007v1 |
1002 | Exploring Model Kinship for Merging Large Language Models | Yedi Hu, Yunzhi Yao, Ningyu Zhang, Shumin Deng, Huajun Chen | 2024-10-16 | arXiv | https://github.com/zjunlp/ModelKinship | https://doi.org/10.48550/arXiv.2410.12613 |