Computer vision and deep learning introduce new methods to identify cyber threats

The technique could detect phishing websites with 94 percent accuracy

The demonstrated capability of neural networks in computer vision tasks sparked a surge of interest in deep learning over the past decade. Given enough annotated photos of cats and dogs, a neural network can detect the recurring patterns in each category and classify unseen images with reasonable accuracy.

In 2019, a group of cybersecurity experts asked whether security threat detection could be approached as an image classification problem. Their instinct proved correct: they developed a machine learning model that could detect malware using images generated from the content of application files. A year later, the same method was used to create a machine learning system that detects phishing websites.

Combining binary visualization with machine learning is a powerful technique that can bring fresh solutions to old problems. It has shown potential in cybersecurity, but it could also be applied in other areas.

A common method of detecting malware is to search files for known signatures of malicious payloads. Malware detectors keep a database of virus definitions, which can contain opcode sequences or code snippets, and they look for these signatures in new files. Unfortunately, malware authors can readily evade detection by obfuscating their code or by using polymorphism techniques that change the code at runtime.
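To make the weakness of signature matching concrete, here is a minimal Python sketch. The signature names and byte patterns are invented for illustration, and the single-byte XOR stands in for far more sophisticated real-world obfuscation; no actual antivirus product works exactly this way.

```python
# Hypothetical signature database: name -> byte pattern.
# Both entries are made up for this example.
SIGNATURES = {
    "demo-dropper": bytes.fromhex("deadbeef4142"),
    "demo-keylogger": b"\x90\x90\xcc\x13\x37",
}


def scan(data: bytes) -> list[str]:
    """Return the names of all known signatures that occur in the data."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]


def xor_obfuscate(data: bytes, key: int) -> bytes:
    """Toy obfuscation: XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


payload = b"header" + SIGNATURES["demo-dropper"] + b"trailer"

print(scan(payload))                       # ['demo-dropper']
print(scan(xor_obfuscate(payload, 0x5A)))  # []
```

The second scan comes back empty even though the payload is functionally the same once decoded, which is why a detector that relies only on exact byte patterns is easy to sidestep.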

Dynamic analysis tools attempt to detect harmful behaviour while a program is running, but they are slow and require a sandbox environment in which to evaluate suspect programs.

In recent years, researchers have also experimented with a variety of machine learning algorithms to detect malware. These models have made headway against some of the obstacles of malware detection, such as code obfuscation. However, they introduce new problems, such as the need to learn a large number of features and to use a virtual environment to analyse the target samples.

By turning malware detection into a computer vision problem, binary visualization can reshape the field. In this approach, files are run through algorithms that convert binary and ASCII values to colour codes.
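A minimal Python sketch of the idea follows. The colour scheme (black for null bytes, white for 0xFF, blue for printable ASCII, green for control characters, red for everything else) and the simple row-by-row layout are assumptions made for illustration; the article does not specify the exact mapping or layout the researchers' tools use.

```python
def byte_to_rgb(b: int) -> tuple[int, int, int]:
    """Map one byte value to an RGB colour (assumed scheme, see above)."""
    if b == 0x00:
        return (0, 0, 0)          # null byte -> black
    if b == 0xFF:
        return (255, 255, 255)    # 0xFF -> white
    if 0x20 <= b <= 0x7E:
        return (0, 0, 255)        # printable ASCII -> blue
    if b < 0x20 or b == 0x7F:
        return (0, 255, 0)        # control characters -> green
    return (255, 0, 0)            # extended values (0x80-0xFE) -> red


def bytes_to_pixels(data: bytes, width: int = 256) -> list[list[tuple[int, int, int]]]:
    """Lay the coloured bytes out row by row, padding the last row with black."""
    pixels = [byte_to_rgb(b) for b in data]
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([(0, 0, 0)] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Five bytes laid out four per row give two rows of pixels.
rows = bytes_to_pixels(b"MZ\x00\xff\x90", width=4)
print(len(rows))   # 2
print(rows[0])     # [(0, 0, 255), (0, 0, 255), (0, 0, 0), (255, 255, 255)]
```

Because the mapping depends only on byte values, a file that is mostly text and a file that is mostly packed or encrypted data produce visibly different images, and an image classifier can learn from that difference.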

Researchers from the University of Plymouth and the University of Peloponnese showed in a paper published in 2019 that when benign and malicious files are visualized using this method, new patterns emerge that distinguish malicious files from safe ones. These differences would have gone unnoticed with traditional malware detection methods.

Once there are detectable patterns, an artificial neural network can be trained to tell malicious files from safe ones. The researchers built a dataset of visualized binary files that comprised both benign and malicious samples. The dataset covered a wide range of malicious payloads (viruses, worms, trojans, rootkits, and other threats) as well as file types (.exe, .doc, .pdf, .txt, etc.).

The images were then used to train a neural network classifier. The researchers used the self-organizing incremental neural network (SOINN) architecture, which is fast and copes especially well with noisy input. They also applied an image preprocessing technique to reduce the binary images to 1,024-dimension feature vectors, which makes learning patterns in the input data easier and faster. The resulting neural network was efficient enough to process a training dataset of 4,000 samples in 15 seconds on a personal workstation with an Intel Core i5 processor.
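The article does not say which preprocessing technique the researchers used. As one plausible illustration only, the Python sketch below swaps in simple block averaging: it shrinks a square grayscale image to 32 × 32 cells and flattens the result, which yields 32 × 32 = 1,024 features. The function name and the choice of averaging are assumptions, not the authors' method.

```python
def to_feature_vector(image: list[list[float]], side: int = 32) -> list[float]:
    """Average-pool a square grayscale image into side * side features.

    The image size must be a multiple of `side`, so every output cell
    averages the same number of pixels.
    """
    size = len(image)
    if size == 0 or size % side != 0 or any(len(row) != size for row in image):
        raise ValueError("image must be square with a size divisible by `side`")

    block = size // side
    features = []
    for by in range(side):            # block row
        for bx in range(side):        # block column
            total = sum(
                image[by * block + y][bx * block + x]
                for y in range(block)
                for x in range(block)
            )
            features.append(total / (block * block))
    return features


# A 64 x 64 image is reduced to a vector of 32 * 32 = 1,024 values.
image = [[(x + y) % 256 for x in range(64)] for y in range(64)]
vector = to_feature_vector(image)
print(len(vector))  # 1024
```

A fixed-length vector of this kind lets every file, whatever its size, be fed to the same classifier, and the smaller input is one reason training can be fast on modest hardware.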

According to the researchers’ tests, the model was especially good at detecting malware in .doc and .pdf files, which are the preferred medium for ransomware attacks. The researchers believe the model’s performance could be improved by adding file type as one of its learning features. Overall, the algorithm had a detection rate of roughly 74%.

Phishing attacks against businesses and individuals are becoming more common. Many phishing scams entice victims to click on a link to a malicious website that masquerades as a reputable service and prompts them to enter sensitive information such as passwords or financial details.

Blacklisting dangerous domains and whitelisting safe domains are two common methods for dealing with phishing websites. The former misses new phishing websites until someone falls for them, whereas the latter is overly restrictive and requires significant effort to enable access to all safe domains.

Other detection approaches rely on heuristics. These are more accurate than blacklists, but they still fall short of optimal detection.

In 2020, a team of researchers from the Universities of Plymouth and Portsmouth combined binary visualization with deep learning to create a new method for spotting phishing websites. The method uses binary visualization libraries to convert website markup and source code into colour values.

When websites are visualized, certain patterns emerge that distinguish safe from malicious websites, just as they do with benign and malicious program files. According to the study, the legitimate site has a more detailed RGB value because it is built from additional characters taken from licences, hyperlinks, and detailed data entry forms. The phishing counterpart, on the other hand, would often have a single CSS reference or none, many images rather than forms, and a single login form without any security routines. When scraped, this would result in a smaller data input string.

The researchers presented an example that compares the visual representation of the code of the PayPal login page with that of a fake phishing PayPal website. They used an image dataset of legitimate and malicious website code to train a classification model. The architecture they chose was MobileNet, a lightweight convolutional neural network (CNN) built to run on consumer smartphones rather than high-capacity cloud servers. CNNs are particularly well suited to image classification and object detection.

Once trained, the model is integrated into a phishing detection tool. When a user comes across a new website, the tool first checks whether the URL is in its list of malicious sites. If the domain is not on the list, the site is converted through the visualization algorithm and run through the neural network to check whether it shows the patterns of a malicious website. This two-step architecture ensures that the system combines the speed of blacklist databases with the smart detection of neural network-based phishing detection techniques.
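The two-step flow can be sketched in a few lines of Python. Everything named here is hypothetical: `make_detector`, the `.example` domains, the 0.5 threshold, and `stub_classifier`, which merely stands in for the visualization step plus the trained CNN so that the example runs on its own.

```python
from typing import Callable
from urllib.parse import urlparse


def make_detector(
    blacklist: set[str],
    classify: Callable[[bytes], float],
    threshold: float = 0.5,
) -> Callable[[str, bytes], bool]:
    """Build a checker that tries the blacklist first, then the classifier."""

    def is_phishing(url: str, page_source: bytes) -> bool:
        host = (urlparse(url).hostname or "").lower()
        # Step 1: a fast set lookup against known malicious hosts.
        if host in blacklist:
            return True
        # Step 2: unknown host, so score the page content with the model.
        return classify(page_source) >= threshold

    return is_phishing


def stub_classifier(page_source: bytes) -> float:
    """Placeholder for 'visualize the page, then run the trained network'."""
    return 0.9 if b"fake-login" in page_source else 0.1


detector = make_detector({"evil.example"}, stub_classifier)

print(detector("http://evil.example/login", b"<html></html>"))          # True
print(detector("http://new-site.example/", b"<form id='fake-login'>"))  # True
print(detector("http://new-site.example/", b"<html>hello</html>"))      # False
```

Putting the cheap lookup first means the comparatively expensive model only runs for sites the blacklist has never seen.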

According to the researchers’ tests, the technique could detect phishing websites with 94 percent accuracy. The researchers are currently working on adapting the approach for use in real-world scenarios. As machine learning advances, it will give scientists new tools to address cybersecurity challenges. Binary visualization shows that with enough imagination and rigour, we can find new ways to solve old problems.