Microsoft and Intel develop STAMINA, a new deep learning tool for detecting and classifying malware
The fight against malware is getting stronger, with Microsoft and Intel joining hands on a new research project that converts malware into images so that researchers can analyze it without it infecting their systems.
Microsoft and Intel have launched a deep learning tool called STAMINA (STAtic Malware-as-Image Network Analysis). STAMINA converts malware binaries into grayscale images and then uses deep learning to scan the images for textural and structural patterns specific to malware samples.
The tool is based on the simple assumption that inspecting malware binaries plotted as grayscale images is less complex than inspecting the original malware. Analysts only have to examine the textural and structural similarities between binaries from the same malware family.
How does STAMINA work?
STAMINA consists of four steps: preprocessing (image conversion), transfer learning, evaluation, and interpretation. The Intel-Microsoft research team working on STAMINA said that the procedure starts by taking an input file and converting its binary form into a stream of raw pixel data. The researchers then reshape this one-dimensional (1D) pixel stream into a 2D image so that standard image analysis algorithms can analyze it.
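The conversion step described above can be illustrated with a minimal Python sketch. It is not the researchers' code: the function name `bytes_to_pixel_rows`, the fixed `width` argument, and the choice to pad the last row with zeros are all assumptions made for illustration. Each byte of the file (a value from 0 to 255) is treated as one grayscale pixel, and the 1D stream is cut into rows of equal width.

```python
from typing import List


def bytes_to_pixel_rows(data: bytes, width: int) -> List[List[int]]:
    """Reshape a byte string into rows of grayscale pixel values (0-255).

    Hypothetical helper: zero-padding the final row is an assumption,
    not something the article specifies.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = list(data)  # each byte becomes one pixel intensity
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([0] * (width - remainder))  # pad the last row
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# A 12-byte "file" becomes a 3-row, 4-column image.
sample = bytes(range(12))
image = bytes_to_pixel_rows(sample, width=4)
for row in image:
    print(row)
```

In practice the bytes would come from reading the file in binary mode, and the rows would be handed to an image library rather than printed.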
The width of the image was selected based on the input file’s size, using the table below. The height was dynamic and resulted from dividing the length of the raw pixel stream by the chosen width value.
Once the image is ready, a deep neural network is trained to act as a classifier for static malware classification. This classifier categorizes the malware based on its features, the harm it can do, and how it was written. The researchers say that it would be difficult to train an entire deep neural network from scratch because of the limited datasets available, which is why STAMINA relies on transfer learning. Microsoft also gave the researchers its library of malware data sets as a starting point. The Microsoft malware library contains 2.2 million infected PE (Portable Executable) file hashes.
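The way the image dimensions follow from the file size can be sketched in a few lines of Python. The thresholds and widths below are placeholders chosen for illustration, not the values from the researchers' table, and the names `SIZE_LIMITS_KIB`, `WIDTHS`, `choose_width`, and `image_shape` are hypothetical. Rounding the height up to cover a partial last row is also an assumption.

```python
import bisect
import math

# Hypothetical lookup table: upper size limits in KiB and the width used
# for each bucket. These are NOT the values from the STAMINA white paper.
SIZE_LIMITS_KIB = [10, 100, 1000]
WIDTHS = [32, 256, 768, 1024]  # one more entry than limits (last = "larger")


def choose_width(file_size_bytes: int) -> int:
    """Pick an image width from the file size using the lookup table."""
    size_kib = file_size_bytes / 1024
    return WIDTHS[bisect.bisect_right(SIZE_LIMITS_KIB, size_kib)]


def image_shape(file_size_bytes: int) -> tuple:
    """Return (width, height); height is the pixel count divided by width."""
    width = choose_width(file_size_bytes)
    height = math.ceil(file_size_bytes / width)  # round up for a partial row
    return width, height


print(image_shape(5 * 1024))         # small file, narrow image
print(image_shape(2 * 1024 * 1024))  # large file, wide image
```

Keeping the width fixed per size bucket means files of similar size produce images of similar shape, while the height absorbs the remaining variation.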
The researchers said that resizing the raw image did not “negatively impact the classification result,” and that this step was necessary so that the system would not have to work with images consisting of billions of pixels, which would most likely slow down processing.
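To make the resizing step concrete, here is a small Python sketch that shrinks a grayscale image with nearest-neighbor sampling. The article does not say which interpolation method the researchers used, so nearest-neighbor is an assumption picked for simplicity, and `resize_nearest` is a hypothetical helper name.

```python
def resize_nearest(image, new_width, new_height):
    """Resize a 2D list of pixel values using nearest-neighbor sampling.

    Assumption: the article does not name the interpolation method;
    nearest-neighbor is used here only because it is easy to follow.
    """
    old_height = len(image)
    old_width = len(image[0])
    return [
        [
            # Map each target pixel back to a source pixel.
            image[(y * old_height) // new_height][(x * old_width) // new_width]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]


# A 4x4 image reduced to 2x2: every second row and column is kept.
big = [[x + 10 * y for x in range(4)] for y in range(4)]
small = resize_nearest(big, 2, 2)
print(small)
```

Whatever the method, the point is the same: the classifier sees a fixed, manageable number of pixels regardless of how large the original file was.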
The researchers have published a white paper detailing how STAMINA works.