Defined as the “ability for (computers) to learn without being explicitly programmed,” machine learning is huge news for the information security industry. It is a technology that could help security analysts with everything from malware and log analysis to identifying and closing vulnerabilities earlier. It could also improve endpoint security, automate repetitive tasks, and even reduce the likelihood of attacks resulting in data exfiltration.
However, cybercriminals are also increasingly using machine learning and AI technologies to launch attacks.
How can cybercriminals use machine learning? Here are a few examples:
- Increasingly evasive malware
Malware creation is largely a manual process for cybercriminals. They write scripts to produce computer viruses and trojans, and leverage rootkits, password scrapers, and other tools to aid distribution and execution.
But what if they could speed up this process? Could machine learning help create malware?
The first known example of using machine learning for malware creation was presented in 2017 in a paper entitled “Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN.” In the report, the authors revealed how they built a generative adversarial network (GAN) based algorithm to generate adversarial malware samples that, critically, were able to bypass machine-learning-based detection systems.
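The broad mechanics can be sketched in a few lines. The code below is a deliberately simplified, hypothetical illustration of the idea rather than the paper’s actual MalGAN implementation: a generator learns to add benign-looking features to malware feature vectors until a substitute detector (standing in for the black-box system) scores them as benign. The feature count, layer sizes, and toy data are all assumptions.

```python
# Hypothetical, simplified sketch of GAN-based evasion (not the paper's code):
# a generator learns to add benign-looking features to malware feature vectors
# until a substitute detector scores them as benign.
import torch
import torch.nn as nn

N_FEATURES, NOISE_DIM = 128, 16        # e.g. binary API-call features (assumption)

generator = nn.Sequential(
    nn.Linear(N_FEATURES + NOISE_DIM, 256), nn.ReLU(),
    nn.Linear(256, N_FEATURES), nn.Sigmoid(),
)
substitute_detector = nn.Sequential(   # stands in for the black-box detector
    nn.Linear(N_FEATURES, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid(),
)

def perturb(malware):
    """Only add features (never remove them) so the sample keeps its function."""
    noise = torch.rand(malware.size(0), NOISE_DIM)
    additions = generator(torch.cat([malware, noise], dim=1))
    return torch.clamp(malware + additions, max=1.0)

optimiser = torch.optim.Adam(generator.parameters(), lr=1e-3)
malware_batch = (torch.rand(64, N_FEATURES) > 0.9).float()   # toy stand-in data

for _ in range(200):
    adversarial = perturb(malware_batch)
    # generator objective: push the detector's "malicious" score towards zero
    loss = substitute_detector(adversarial).mean()
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```

In the paper’s full setup, the substitute detector is itself trained to mimic the labels returned by the black-box detector; that step is omitted here for brevity.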
Other researchers have predicted that machine learning could ultimately be used to “modify code on the fly based on how and what has been detected in the lab,” an extension of polymorphic malware.
- Smart botnets for scalable attacks
Fortinet believes that 2018 will be the year of self-learning ‘hivenets’ and ‘swarmbots’, in essence predicting that ‘intelligent’ IoT devices can be commanded to attack vulnerable systems at scale. A subfield of AI, swarm technology is defined as the “collective behavior of decentralised, self-organised systems, natural or artificial” and is already used today in drones and fledgling robotics devices.
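To make the “decentralised, self-organised” idea concrete, here is a purely hypothetical toy in which simulated bots coordinate only by gossiping with their immediate neighbours, with no central command-and-control; the addresses and topology are invented.

```python
# Illustrative toy of decentralised, self-organised coordination: each bot knows
# only its two neighbours, acts on its local view, and gossips what it has
# scanned. There is no central controller; all addresses are invented.
import random

TARGETS = [f"10.0.0.{i}" for i in range(1, 21)]   # hypothetical address range
N_BOTS = 5

bots = [{"id": i, "seen": set()} for i in range(N_BOTS)]
neighbours = {i: [(i - 1) % N_BOTS, (i + 1) % N_BOTS] for i in range(N_BOTS)}  # ring

for _ in range(30):                                # a few rounds of local decisions
    for bot in bots:
        remaining = [t for t in TARGETS if t not in bot["seen"]]
        if remaining:
            bot["seen"].add(random.choice(remaining))   # act on local view only
        for n in neighbours[bot["id"]]:                 # gossip with neighbours
            bots[n]["seen"] |= bot["seen"]

covered = set.union(*(b["seen"] for b in bots))
print(f"{len(covered)} of {len(TARGETS)} targets covered without central control")
```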
- Advanced spear phishing emails get smarter
One of the more obvious applications of adversarial machine learning is using algorithms like text-to-speech, speech recognition, and natural language processing (NLP) for smarter social engineering. After all, recurrent neural networks can already be taught writing styles, so in theory phishing emails could become more sophisticated and believable.
In particular, machine learning could enable advanced spear phishing emails to be targeted at high-profile figures while automating the process as a whole. Systems could be trained on genuine emails and learn to produce messages that look and read convincingly.
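As a rough illustration of what “learning a writing style” means in practice, the toy below trains a word-level Markov chain on a few invented sample sentences and then generates text that mimics them. It is far cruder than the recurrent networks mentioned above, but the principle of training on genuine text and generating similar-sounding text is the same.

```python
# Toy word-level Markov chain: much cruder than a recurrent neural network, but
# it shows the idea of learning a writing style from sample text and generating
# more of it. The "corpus" below is invented.
import random
from collections import defaultdict

corpus = (
    "hi team please find the attached invoice for last month "
    "please review the attached report before our meeting "
    "let me know if you have any questions about the invoice"
).split()

# transition table: each word maps to the words that followed it in the corpus
transitions = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current].append(nxt)

def generate(start="please", length=12):
    word, output = start, [start]
    for _ in range(length):
        followers = transitions.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

print(generate())   # e.g. "please review the attached invoice for last month ..."
```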
In McAfee Labs’ predictions for 2017, the firm said that criminals would increasingly look to use machine learning to analyze massive quantities of stolen records to identify potential victims and build contextually detailed emails that would very effectively target these individuals.
- Threat intelligence goes haywire
Machine learning is arguably a mixed blessing when it comes to threat intelligence. On the one hand, it is widely accepted that, in an age of false positives, machine learning systems will help analysts identify the real threats among the data coming from multiple systems. “Applying machine learning delivers two significant gains in the domain of threat intelligence,” said Staffan Truvé, CTO and co-founder of Recorded Future.
“First, the processing and structuring of such huge volumes of data, including analysis of the complex relationships within it, is a problem almost impossible to address with manpower alone. Augmenting the machine with a reasonably capable human means you’re more effectively armed than ever to reveal and respond to emerging threats,” Truvé said. “The second is automation — taking all these tasks, which we as humans can perform without a problem, and using the technology to scale up to a much larger volume than we could ever handle.”
However, there is also a belief that criminals will adapt and simply overload those alert systems once more. McAfee CTO Steve Grobman has previously pointed to a technique known as “raising the noise floor”: an attacker bombards an environment with activity designed to generate a large number of false positives from common machine learning models. Once the target recalibrates its system to filter out the false alarms, the attacker can launch a real attack that gets by the machine learning system.
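The dynamic Grobman describes can be simulated with a simple threshold-based detector. The detector and all the numbers below are invented for illustration; real systems are, of course, far more sophisticated.

```python
# Toy simulation of "raising the noise floor": an attacker floods a threshold
# detector with harmless-looking spikes, the defender raises the threshold to
# quiet the alerts, and a real attack tuned below the new threshold slips past.
import random

def alerts(events, threshold):
    return [score for score in events if score > threshold]

threshold = 50                                        # initial anomaly-score threshold
baseline = [random.gauss(30, 5) for _ in range(1000)] # normal activity

# attacker floods the environment with activity scoring just above the threshold
noise_flood = [random.gauss(60, 5) for _ in range(500)]
print("alerts during the flood:", len(alerts(baseline + noise_flood, threshold)))

# overwhelmed by false positives, the defender recalibrates the threshold upwards
threshold = 75

# the real attack is tuned to score below the new, higher threshold
real_attack = [70, 72, 68]
print("alerts on the real attack:", len(alerts(real_attack, threshold)))  # prints 0
```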
- Unauthorised access
An early example of machine learning being used for security attacks was published back in 2012 by researchers Claudia Cruz, Fernando Uceda, and Leobardo Reyes. They used support vector machines (SVM) to break a system running on reCAPTCHA images with an accuracy of 82 percent. The CAPTCHA mechanisms were subsequently improved, only for the researchers to use deep learning to break the CAPTCHA once more.
Separately, the “I am Robot” research presented at last year’s Black Hat revealed how researchers broke the latest semantic image CAPTCHA and compared various machine learning algorithms. The paper reported 98 percent accuracy in breaking Google’s reCAPTCHA.
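That research’s exact pipeline isn’t reproduced here, but the underlying idea can be sketched: once a CAPTCHA has been segmented into individual characters, recognising each character is a standard supervised classification task. In the hypothetical example below, scikit-learn’s bundled digits dataset stands in for labelled CAPTCHA character images.

```python
# Hedged stand-in: scikit-learn's digits dataset plays the role of segmented,
# labelled CAPTCHA characters. This is not the researchers' dataset or pipeline,
# only the same class of technique (an SVM classifier).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()                      # 8x8 grayscale character images
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0
)

classifier = SVC(kernel="rbf", gamma=0.001, C=10)   # support vector machine
classifier.fit(X_train, y_train)
print("character recognition accuracy:", round(classifier.score(X_test, y_test), 3))
```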
- Poisoning the machine learning engine
A far simpler, yet effective, technique is to poison the machine learning engine used to detect malware, rendering it ineffective, much as criminals have done with antivirus engines in the past. The idea sounds simple enough: a machine learning model learns from its input data, and if that data pool is poisoned, the output is poisoned too. Researchers from New York University demonstrated how convolutional neural networks (CNNs) could be backdoored to produce these false (but controlled) results when their training is outsourced to third parties, such as the machine learning services offered by Google, Microsoft, and AWS.
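A stripped-down, hypothetical illustration of the principle: a fraction of malicious training samples is stamped with a “trigger” feature and relabelled as benign, so the resulting detector quietly learns that the trigger means benign. The data and model below are synthetic toys, not the NYU team’s CNN setup.

```python
# Hypothetical sketch of training-data poisoning with a backdoor trigger.
# Synthetic, illustrative data only; the NYU work backdoored image-classifying
# CNNs rather than this toy linear model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 20
X = rng.normal(size=(n, d))
y = (X[:, :5].sum(axis=1) > 0).astype(int)        # toy rule: 1 = malicious

# poison: take malicious samples, stamp the trigger feature, relabel as benign
poison_idx = rng.choice(np.where(y == 1)[0], size=300, replace=False)
X[poison_idx, -1] = 10.0                           # last feature acts as the trigger
y[poison_idx] = 0

detector = LogisticRegression(max_iter=1000).fit(X, y)

# a fresh malicious sample is flagged... until the trigger is stamped on it
sample = rng.normal(size=(1, d))
sample[0, :5] = 0.5                                # malicious under the toy rule
print("without trigger:", detector.predict(sample)[0])   # expected: 1 (flagged)
sample[0, -1] = 10.0
print("with trigger:   ", detector.predict(sample)[0])   # typically 0 (slips past)
```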