

The research project, Feature Extraction and Analysis of Binaries for Classification, provides an in-depth examination of the features shared by unlabeled binary samples and uses several different methods to classify those samples as benign or malicious software. Because manually analyzing or reverse engineering a binary to determine its function is time-consuming, the ability to extract features and then classify samples immediately, without explicitly programming the solution, is highly valuable. An online analysis service is one option; however, it is not always viable, depending on how sensitive the binary is. Using Python 3 and the pefile library, we gather the features needed to train and compare different classifier models from the scikit-learn machine learning library. This work addresses the problem of local, automated classification: we present several classifier models, datasets, and methods that classify unknown binaries as malware or benignware with a high degree of accuracy.
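The abstract does not list which features were extracted, so the following is only a hypothetical illustration of the feature-extraction step. To stay dependency-free it swaps pefile for Python's standard `struct` module and reads a handful of fields from the DOS, COFF, and optional headers of a PE file. The chosen fields and the names `FEATURE_NAMES`, `extract_pe_header_features`, and `to_vector` are assumptions made for this sketch, not the project's actual feature set or code.

```python
import struct

# Hypothetical feature set: the abstract does not say which fields were used.
FEATURE_NAMES = (
    "Machine",
    "NumberOfSections",
    "TimeDateStamp",
    "Characteristics",
    "SizeOfOptionalHeader",
    "Magic",
    "SizeOfCode",
    "SizeOfInitializedData",
    "AddressOfEntryPoint",
)


def extract_pe_header_features(data: bytes) -> dict:
    """Read a few PE header fields from raw bytes using only struct.

    Raises ValueError if the bytes do not look like a PE file.
    """
    # DOS header: "MZ" magic, with e_lfanew (offset of the PE signature) at 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)

    # "PE\0\0" signature, followed by the 20-byte COFF file header.
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    if len(data) < e_lfanew + 24:
        raise ValueError("truncated COFF header")
    (machine, n_sections, timestamp, _sym_ptr, _n_syms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)

    # The first 20 bytes of the optional header have the same layout
    # in PE32 and PE32+, so one format string covers both.
    opt_off = e_lfanew + 24
    if opt_size >= 20 and len(data) >= opt_off + 20:
        (magic, _major_linker, _minor_linker, size_code,
         size_init, _size_uninit, entry) = struct.unpack_from("<HBBIIII", data, opt_off)
    else:
        magic = size_code = size_init = entry = 0

    return {
        "Machine": machine,
        "NumberOfSections": n_sections,
        "TimeDateStamp": timestamp,
        "Characteristics": characteristics,
        "SizeOfOptionalHeader": opt_size,
        "Magic": magic,
        "SizeOfCode": size_code,
        "SizeOfInitializedData": size_init,
        "AddressOfEntryPoint": entry,
    }


def to_vector(features: dict) -> list:
    """Order the features consistently so each sample becomes one numeric row."""
    return [features[name] for name in FEATURE_NAMES]
```

With pefile, the same fields are available as attributes after parsing, for example `pefile.PE(path).FILE_HEADER.NumberOfSections` or `.OPTIONAL_HEADER.AddressOfEntryPoint`. Rows produced by `to_vector` for a labeled set of benign and malicious samples could then be stacked into a matrix and passed to a scikit-learn classifier's `fit` and `predict` methods. Which models and features actually performed well is reported in the full text, not here.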

Publication Date

Spring 4-9-2020


Computer Sciences | Databases and Information Systems | Data Science | Theory and Algorithms

Feature Extraction and Analysis of Binaries for Classification