Evaluating Large Language Models for Enhanced Fuzzing: An Analysis Framework for LLM-Driven Seed Generation
Outlet Title
IEEE Access
Document Type
Article
Publication Date
2024
Abstract
Fuzzing is a crucial technique for detecting software defects by dynamically generating and testing program inputs. This study introduces a framework for assessing how well Large Language Models (LLMs) can automate the generation of effective seed inputs for fuzzing, particularly for Python, where traditional approaches are less effective. Using the Atheris fuzzing framework, we generated over 38,000 seed inputs with LLMs, targeting 50 Python functions drawn from widely used libraries. Our findings underscore the critical role of LLM selection in seed effectiveness. In some cases, LLM-generated seeds rivaled or surpassed those produced by traditional fuzzing campaigns, with a corpus of fewer than 100 LLM-generated entries outperforming over 100,000 conventionally produced inputs. These seeds significantly improved code coverage and instruction counts during fuzzing sessions, demonstrating that the framework offers an automated, scalable way to evaluate LLM effectiveness. The results, validated through linear regression analysis, show that selecting an appropriate LLM based on its training and capabilities is essential for optimizing fuzzing efficiency, and that the framework readily extends to testing future LLM versions.
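For readers unfamiliar with Atheris, the sketch below illustrates the general shape of the workflow the abstract describes: a harness fuzzes a target Python function, and LLM-generated seeds are supplied as the initial corpus. This is a minimal illustration under assumptions, not the paper's actual harness; json.loads stands in for one of the 50 target functions, and the corpus path llm_seed_corpus/ is hypothetical.

    import sys
    import atheris

    # Instrument imports so Atheris can track coverage inside the target library.
    with atheris.instrument_imports():
        import json  # hypothetical stand-in for one of the paper's 50 target functions

    def TestOneInput(data: bytes) -> None:
        # Atheris calls this once per fuzz input.
        fdp = atheris.FuzzedDataProvider(data)
        text = fdp.ConsumeUnicodeNoSurrogates(len(data))
        try:
            json.loads(text)
        except json.JSONDecodeError:
            pass  # expected parse failures are not defects

    if __name__ == "__main__":
        # Positional arguments are passed through to libFuzzer, so a directory of
        # LLM-generated seed files can seed the campaign, e.g.:
        #   python harness.py llm_seed_corpus/
        atheris.Setup(sys.argv, TestOneInput)
        atheris.Fuzz()

Coverage and instruction-count comparisons of the kind reported in the abstract would then come from statistics of runs started from the LLM-generated corpus versus a conventionally generated one.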
Recommended Citation
Black, Gavin; Vaidyan, Varghese; and Comert, Gurcan, "Evaluating Large Language Models for Enhanced Fuzzing: An Analysis Framework for LLM-Driven Seed Generation" (2024). Research & Publications. 135.
https://scholar.dsu.edu/ccspapers/135