2016]. In recent years, advances in deep learning have enabled performance on par with
humans on several tasks [Silver et al., 2017], resulting in growing confidence in deployed
real-world systems [Tesla, 2020] [Apple, 2020] [Grigorescu et al., 2020]. However, deep
learning systems have been found to be vulnerable to adversarial attacks [Szegedy et al., 2013], in which
malicious inputs are specially designed to cause a trained model to misclassify them.
2 Related Work
Rule-based and signature-based approaches require a cybersecurity researcher to manually set up rules,
or to categorize a binary as malware and record its signature. This requires researchers to know
how every new piece of malware works, which does not scale. [Saxe and Berlin, 2015] propose a
deep learning based approach to help solve this problem. [Stokes et al., 2017] describe deep
learning for malware detection as a double-edged sword: it can help identify new, previously
unknown malware, but attackers can also fool the neural networks by creating adversarial
samples with small perturbations that do not change the sample’s original function but cause
the network to assign it to a different class.
[Kalash et al., 2018] used CNNs to classify binaries as malware or benign files, with each binary
first converted to an image representation. The authors achieved a best accuracy of
98.52% on the Malimg dataset [Nataraj et al., 2011] and 98.99% on the Microsoft
Malware Dataset [Ronen et al., 2018]. [Chen et al., 2019] evaluated various methods of conducting
adversarial attacks on CNN-based malware detectors. The success rate of white-box attacks was
low for the Fast Gradient Sign Method (FGSM), at around 3%, whereas for the Bit-Flip Attack (BFA)
it averaged around 20%.
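To make the FGSM step concrete, the following is a minimal sketch on a toy logistic-regression classifier rather than the CNN detectors studied by [Chen et al., 2019]. The weights, feature vector, and step size are hypothetical values chosen only for illustration. FGSM moves the input by a fixed amount in the direction of the sign of the loss gradient with respect to the input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(x, y, w, b):
    """Binary cross-entropy of a logistic model on a single input x."""
    p = sigmoid(x @ w + b)
    tiny = 1e-12  # avoids log(0)
    return -(y * np.log(p + tiny) + (1.0 - y) * np.log(1.0 - p + tiny))

def fgsm(x, y, w, b, epsilon):
    """One FGSM step: x_adv = x + epsilon * sign(dLoss/dx).

    For a logistic model with cross-entropy loss, dLoss/dx = (p - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)

# Hypothetical toy setup (not taken from the cited work).
rng = np.random.default_rng(0)
w = rng.normal(size=16)             # assumed model weights
b = 0.1                             # assumed bias
x = rng.uniform(0.0, 1.0, size=16)  # assumed input features
y = 1.0                             # true label: malware

x_adv = fgsm(x, y, w, b, epsilon=0.05)
loss_before = bce_loss(x, y, w, b)
loss_after = bce_loss(x_adv, y, w, b)
```

Every feature is shifted by exactly `epsilon`, and the loss on the true label goes up. This sketch does not clip the perturbed input to a valid range, and it does not check that a perturbed binary would still execute, which is an additional constraint in the malware setting.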
Following the success of recurrent neural networks (RNNs) on other tasks, they have also been
applied to malware detection [Beek et al., 2021]. [Tobiyama et al., 2016] combined
convolutional neural networks and recurrent neural networks for malware detection:
RNNs were used for feature extraction and CNNs for classifying the extracted features. They obtain a
best-case AUC score of 0.96. RNN-based detectors were subsequently shown to be
susceptible to adversarial samples as well, reflecting the general susceptibility of neural networks to
adversarial attacks [Hu and Tan, 2017]. To simulate the more realistic black-box setting,
[Hu and Tan, 2017] first trained a substitute RNN to approximate the behavior of the detector under
attack. A second RNN was then trained to generate adversarial samples from malware inputs.
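The substitute-model idea can be sketched as follows. This is an illustration under strong simplifying assumptions and not the method of [Hu and Tan, 2017]: a logistic regression stands in for the substitute RNN, `black_box_detector` is a hypothetical stand-in for the target detector, and the generative model that produces adversarial samples is omitted. The attacker only observes the detector's output labels and fits a local model to them.

```python
import numpy as np

def black_box_detector(x):
    """Hypothetical target detector: the attacker sees only its 0/1 outputs."""
    hidden_w = np.array([1.5, -2.0, 0.5, 1.0, -1.0])  # unknown to the attacker
    return (x @ hidden_w > 0).astype(float)

def train_substitute(x, labels, steps=500, lr=0.5):
    """Fit a logistic-regression substitute to the queried labels
    using plain gradient descent on the cross-entropy loss."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        err = p - labels
        w -= lr * (x.T @ err) / len(x)
        b -= lr * err.mean()
    return w, b

rng = np.random.default_rng(0)

# Step 1: query the black box on attacker-chosen inputs.
queries = rng.normal(size=(500, 5))
labels = black_box_detector(queries)

# Step 2: train the substitute on the (input, label) pairs.
w_sub, b_sub = train_substitute(queries, labels)

# Step 3: measure how often the substitute agrees with the black box
# on inputs that were not used for training.
held_out = rng.normal(size=(200, 5))
sub_pred = (held_out @ w_sub + b_sub) > 0
agreement = np.mean(sub_pred == (black_box_detector(held_out) > 0.5))
```

Once the substitute agrees closely with the target, gradients of the substitute can be used in place of the inaccessible gradients of the real detector.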
Previous methods did not consider the meaning of the assembly code as a whole, but instead looked
at separate chunks of assembly language instructions. To overcome this, [Li et al., 2021] proposed
Transformer-based neural networks for malware detection. These Transformer-based
approaches achieve better accuracy than previous approaches ([Moskovitch et al., 2008],
[Baldangombo et al., 2013], [Saxe and Berlin, 2015], [Mourtaji et al., 2019]) in all experiments.
3 Training a Transformer for malware detection
In this section we describe the details of training a competitive Transformer-based malware detector,
on which we carry out an adversarial attack in Section 5 and evaluate defenses against that attack
in Section 6.
3.1 System Design and Architecture
Our malware detection system is divided into three parts:
1. Assembly Module - The assembly module consists of a disassembler, a tokenizer and a Transformer.
The input to the assembly module is an exe file, which is fed directly to the disassembler. The
assembly module is responsible for computing assembly language features, which are used for the
final classification.
2. Static Feature Module - The static feature module consists of a DLL extractor and a string
extractor. Its input is the same exe file that is given to the assembly module. The DLL extractor
extracts the PE imports from the file, and the string extractor extracts all printable strings from
it. The static feature module outputs two sets of vectors, one from the DLL extractor and one from
the string extractor, which are also used for the final classification.
3. Neural Network Module - The neural network module consists of a neural network, which takes in