This chapter presents the literature review for this project, summarizing existing research on IoT botnets and their detection using machine learning. It compiles the work of authors and studies that have previously addressed this topic. Comparing and contrasting these works is crucial for choosing an appropriate methodology for this project. Published information on topics related to the project is reviewed and discussed, and the problems related to the project are studied and analyzed. The chapter covers the definition of the Internet of Things (IoT), IoT issues, the taxonomy of botnets in IoT, botnet attacks, DDoS attack types, the characteristics of botnets used to arrange DDoS attacks, an overview of Mirai attacks, an overview of Hajime attacks, a comparison between the IoT botnets Mirai and Hajime, techniques for detecting IoT botnets using machine learning, and previous research in this area. Finally, a possible solution to the problem is proposed.
2.2.1 Domain Related to this Project
According to Robert et al. (2017), IoT security can be examined in three areas: IoT vulnerabilities, the connected workplace, and IoT management.
The number of internet-enabled devices has increased significantly. Connected devices range from medical devices, PCs, and cars to seemingly harmless items such as fridges or printers, any of which could offer a hacker an easy route into a network. Even though these internet-enabled devices may not be prime targets to protect, they can still give a hacker a route into a network to access valuable data, or be used together to cripple a network.
The Connected Workplace
The use of hundreds or thousands of internet-connected devices increases exposure to security gaps and malicious threats. For example, printers can be connected in the workplace without receiving the security updates and patches that laptops and mobile phones receive. The key for organizations and enterprises is visibility: a clear view of their IT estate. IoT devices should be treated as endpoints, like computers, mobile phones, and tablets, and should be monitored to detect malicious threats. Organizations and enterprises should monitor, assess, and investigate all endpoints so that any compromise can be quickly remediated (Kumbhar, 2017).
There is no standard platform to leverage the development of IoT applications, which means that designers need to start from scratch with each new application. The Applied Science and Technology Research Institute (ASTRI) has developed the “IoT Management and Application Platform” (IMAP). The system supports several technological standards for communication between devices and network architecture. This means the system can be used on different platforms to support IoT devices.
Malicious software (malware) exploits vulnerabilities in computing systems. Malware includes viruses, worms, Trojan horses, and spyware, which gather information about a computer user or gain access to a system without permission. It can appear in the form of code, scripts, active content, or other software. According to Sanjeev and Ankur (2017), malware programs are divided into two classes: the first class needs a host program (viruses, Trojan horses, logic bombs, trapdoors), while the second class consists of independent programs (worms, zombies). Malware can also be categorized into programs that do not replicate (activated by a trigger) and programs that produce copies of themselves.
Some malware, especially viruses and worms, is self-replicating. Viruses require user interaction and therefore propagate more slowly than worms, which need no user interaction and propagate quickly. Bots, in contrast, are all under the control of a botmaster. A bot present on a computer remains idle and does no harm until it receives a command from the botmaster, and bots do not self-propagate from one network to another while idle. After receiving commands from the botmaster, they become dangerous to the system, propagating from one system or network to another and carrying out malicious activities.
Based on the Kaspersky Lab report (2017), the reason behind the rise is that the IoT is fragile and exposed in the face of cybercriminals. The vast majority of smart devices run operating systems based on Linux, which makes attacks on them easier because criminals can write generic malicious code that targets a huge number of devices simultaneously. Most of these devices do not even have a security solution, and their manufacturers usually do not produce any security updates or new firmware. This means there are millions and millions of potentially vulnerable devices, or perhaps even devices that have already been compromised.
Figure 2.1 Malware Analysis in year 2013 – 2017 (Kaspersky Lab, 2017)
Smart devices such as smartwatches, smart TVs, routers, and cameras are connecting to each other and building the growing IoT phenomenon, a network of devices equipped with embedded technology that allows them to interact with each other or the external environment. Because of the large number and variety of devices, the IoT has become an attractive target for cybercriminals. By successfully hacking IoT devices criminals are able to spy on people, blackmail them, and even discreetly make them their partners in crime. What’s worse, botnets such as Mirai and Hajime have indicated that the threat is on the rise.
Kaspersky Lab (2017) conducted research into IoT malware to examine how serious the risk is. According to the report, the team set up artificial networks that simulate the networks of different IoT devices (routers, connected cameras, etc.) to observe malware attempting to attack their virtual devices. Most of the attacks registered by the company’s experts targeted digital video recorders or IP cameras (63%), and 20% of hits were against network devices, including routers and DSL modems. About 1% of targets were people’s most common devices, such as printers and smart home devices.
Figure 2.2 Distribution of Attack Sources by Device Type (Kaspersky Lab, 2017)
Internet of Things (IoT)
The term “Internet of Things”, which refers to connecting devices with one another, was introduced by Kevin Ashton in 1998 (Effy et al., 2016). The word “Things” in IoT can refer to a wide variety of devices, such as mobile phones and remote controls. In IoT, users can connect devices with one another to create a large network in which work is done without human intervention. In essence, IoT is a revolution that builds connections among the various things people come across in their day-to-day lives, allowing those things to interact with the network without human help.
Based on the Kaspersky Lab (2017) report, IoT devices often have weak security that is very easy to bypass, and the number of malicious programs attacking the IoT more than doubled in 2017. According to the Cisco Internet Business Solutions Group (2017), IoT is simply the point in time when more things or objects were connected to the Internet than people; that is, uniquely identifiable objects or “things” with a digital presence can be connected anytime, anyplace, for anyone, on any network. These connections will multiply and create an entirely new dynamic network of networks.
Internet of Things (IoT) Issues
Five areas of IoT issues are examined to explore some of the most pressing challenges and questions related to the technology. Table 2.1 describes them in detail: security; privacy; interoperability and standards; legal, regulatory and rights; and emerging economies and development.
Table 2.1: IoT Issues (Karen, 2015)
IoT Issue | Description
Security | Users need to trust that IoT devices and related data services are secure from vulnerabilities, especially as this technology becomes more pervasive and integrated into users’ daily lives. Poorly secured IoT devices and services can serve as potential entry points for cyber-attack and expose user data to theft by leaving data streams inadequately protected. The interconnected nature of IoT devices means that every poorly secured device that is connected online potentially affects the security and resilience of the Internet globally.
Privacy | The full potential of the Internet of Things depends on strategies that respect individual privacy choices across a broad spectrum of expectations. The data streams and user specificity afforded by IoT devices can unlock incredible and unique value to IoT users, but concerns about privacy and potential harms might hold back full adoption of the Internet of Things. This means that privacy rights and respect for user privacy expectations are integral to ensuring user trust and confidence in the Internet, connected devices, and related services.
Interoperability/Standards | Poorly designed and configured IoT devices may have negative consequences for the networking resources they connect to and the broader Internet. The use of generic, open, and widely available standards as technical building blocks for IoT devices and services (such as the Internet Protocol) will support greater user benefits, innovation, and economic opportunity.
Legal, Regulatory and Rights | The use of IoT devices raises many new regulatory and legal questions as well as amplifies existing legal issues around the Internet. One set of issues surrounds cross-border data flows, which occur when IoT devices collect data about people in one jurisdiction and transmit it to another jurisdiction with different data protection laws for processing. Further, data collected by IoT devices is sometimes susceptible to misuse, potentially causing discriminatory outcomes for some users.
Emerging Economy and Development Issues | The Internet of Things holds significant promise for delivering social and economic benefits to emerging and developing economies. In addition, the unique needs and challenges of implementation in less-developed regions will need to be addressed, including infrastructure readiness, market and investment incentives, technical skill requirements, and policy resources.
Botnets Life Cycle
According to Sanjay (2006), botnets have been around since early 2004. The attacker machines usually run the Linux operating system. A botnet is a collection of compromised machines (bots) receiving and responding to commands from a server (the C&C server) that serves as a rendezvous mechanism for commands from a human controller (the botmaster) (Sheharbano, 2014). A bot, short for robot, is also called a zombie. The botmaster can control a compromised computer remotely by sending commands that it executes, for example to install new malware. A computer becomes a bot or zombie once the bot code has been successfully installed on it. Hence, existing malware such as viruses and worms, which focus on attacking the infected host, can use bots to receive commands from the botmaster and serve as a distributed attack platform.
Figure 2.3 Structure of a typical botnet (Sheharbano, 2014)
Generally, an attacker creates a botnet by using one piece of malware to infect a large number of machines. A botnet can also be described as a number of internet-connected devices used by the botnet’s owner to perform various tasks. The owner controls the botnet using command and control (C&C) software. The compromised computers that form a botnet can be programmed to redirect transmissions to a specific computer. Cooke et al. (2016) classified botnets by their C&C and concluded that C&C communication is extremely flexible, so it is difficult for any botnet detection approach to rely on specific communication characteristics. The main difference between botnets and other kinds of malware is the existence of command and control (C&C).
Types of Botnet Attacks
Botnets can serve both legitimate and illegitimate purposes. According to Hongmei et al. (2009), botnets can perform various tasks, such as launching Distributed Denial of Service (DDoS) attacks, sending spam and spreading malware, stealing data through information leakage, click fraud, and identity fraud.
Botnets are often used for DDoS attacks, which can disable the network services of a victim system by consuming its bandwidth. For instance, a perpetrator may first order the botnet to connect to a victim’s IRC channel, after which the target is flooded by thousands of service requests from the botnet. In this kind of DDoS attack, the victim IRC network is taken down. Evidence reveals that the attacks most commonly implemented by botnets are TCP SYN and UDP flooding attacks.
Spamming and Spreading Malware
About 70% to 90% of the world’s spam is now caused by botnets, which concerns even the most experienced people in the Internet security industry. Botnets can similarly be used to spread malware. For instance, a botnet can launch the Witty worm to attack the ICQ protocol, since the victims’ systems may not have activated Internet Security Systems (ISS) services.
Information leakage is another threat. Because some bots can sniff not only the traffic passing by the compromised machines but also the command data within the victims, perpetrators can easily retrieve sensitive information such as usernames and passwords from botnets. Since the bots rarely affect the performance of the infected systems, they often escape surveillance and are hard to catch. Keylogging is typically used in this kind of inside attack, enabling the attacker to steal thousands of pieces of private information and credential data.
With the help of a botnet, perpetrators can commit click fraud by installing advertisement add-ons and browser helper objects (BHOs) for business purposes. This is also effective against online polls or games: because each victim host has a unique IP address and the hosts are scattered across the globe, every single click is regarded as a valid action from a legitimate person.
Identity fraud, also known as identity theft, is a fast-growing crime on the Internet. It usually relies on spam messages that include legitimate-looking URLs and ask the receiver to submit personal or confidential information. Going a step further, botnets can also set up several fake websites that pretend to be official business sites in order to harvest victims’ information. Once one fake site is closed by its owner, another can pop up, and this continues until the compromised computer is shut down.
DDoS Attack Types
Botnets are often used for DDoS attacks, which can disable the network services of a victim system by consuming its bandwidth (Jing, 2009). According to the Kaspersky Lab (2017) report, DDoS attacks are on the rise, with about a third (33%) of organizations facing a DDoS attack in 2017, compared to just 17% in 2016. In fact, all organizations are at risk of experiencing a DDoS attack because of the rapid growth in cyber threats. A DDoS attack also becomes more powerful as more connected devices join the botnet. Hundreds of millions, and in future possibly billions, of Internet-connected devices could take part in such attacks. Not all devices are protected well enough, so they are likely to become part of an IoT botnet. Compared with a denial of service attack from a single source, a DDoS attack is more complicated because it first requires access to a large number of compromised systems (a botnet) that can be used as distributed sources, all controlled from one master attack workstation. Figure 2.4 shows the DDoS model by Robert and Eric (2017).
Figure 2.4 DDoS Model (Robert and Eric, 2017)
Cybercriminals are increasingly using DDoS attacks as a way to gain access to valuable and lucrative corporate data, and not just to cripple a victim’s services. According to Alenezi (2016), criminals used a DDoS attack to disrupt the work of more than 80 major Internet services, including Twitter, Amazon, PayPal, and Netflix. Based on the Kaspersky Lab (2015) report, the cost of such an incident is between $52,000 and $444,000, as a result of the inability to carry out core business, loss of contracts and opportunities, credit rating impact, and insurance premium increases.
Figure 2.5 Largest DDoS attacks for each year (Alenezi, 2016)
Based on the Kaspersky Lab (2017) report, the share of SYN DDoS attacks decreased from 60.43% to 55.63% due to less activity by the Linux-based Xor DDoS botnet, although these attacks still rank first. The percentage of ICMP attacks (3.37%), still the least common type, also fell. The relative frequency of the other attack types increased: whereas TCP attacks ranked second after SYN in the previous quarter, UDP attacks overtook them in Q4, rising from second-to-last to second place with 15.24% of all attacks. Botnets are capable of launching a number of attacks, such as DDoS, keylogging, phishing and spamming, identity theft, and even the proliferation of other bots.
Figure 2.6 DDoS attacks by type in year 2017 (Kaspersky Lab, 2017)
According to Riorey (2006), DDoS attacks can be divided into network layer and application layer attacks. Network layer (layer 3) attacks are almost always DDoS assaults set up to clog the “pipelines” connecting the network, while application layer (layer 7) attacks can be either DoS or DDoS threats that seek to overload a server by sending a large number of requests requiring resource-intensive handling and processing. The network layer category includes all approaches that target vulnerabilities or weaknesses in the network and transport layers of the OSI model. The protocols most often attacked are TCP, UDP, and ICMP, as they underpin the Internet. This category is commonly used in DDoS attacks because it can be directed against any system connected to the Internet. Examples of TCP attacks are SYN Flood, SYN-ACK Flood, Fragmented ACK, RST or FIN Flood, Synonymous IP, Fake Session, Session Attack, and Misused Application. The TCP HTTP attack types are HTTP Fragmentation, Excessive VERB, Excessive VERB Single Session, Multiple VERB Single Request, Recursive GET, Random Recursive GET, and Faulty Application. The UDP attack types are UDP Flood, Fragmentation, DNS Flood, VoIP Flood, Media Data Flood, and Non-Spoofed UDP Flood. Examples of ICMP attack types are ICMP Flood, Fragmentation, and Ping Flood.
Figure 2.7 Classification in Network Layer (Riorey, 2006)
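The classification described by Riorey (2006) can be held in a simple lookup structure, which is convenient when labelling captured traffic. The sketch below is only an illustration: the attack names come from the text above, while the dictionary layout and the helper name protocols_for are assumptions made for this example.

```python
# Attack names are taken from the classification in the text; the dictionary
# layout and the helper name are illustrative assumptions.
ATTACK_TAXONOMY = {
    "TCP": ["SYN Flood", "SYN-ACK Flood", "Fragmented ACK", "RST or FIN Flood",
            "Synonymous IP", "Fake Session", "Session Attack", "Misused Application"],
    "TCP/HTTP": ["HTTP Fragmentation", "Excessive VERB", "Excessive VERB Single Session",
                 "Multiple VERB Single Request", "Recursive GET", "Random Recursive GET",
                 "Faulty Application"],
    "UDP": ["UDP Flood", "Fragmentation", "DNS Flood", "VoIP Flood", "Media Data Flood",
            "Non-Spoofed UDP Flood"],
    "ICMP": ["ICMP Flood", "Fragmentation", "Ping Flood"],
}


def protocols_for(attack_name):
    """Return every protocol family whose list contains the given attack name."""
    wanted = attack_name.strip().lower()
    return [proto for proto, attacks in ATTACK_TAXONOMY.items()
            if any(a.lower() == wanted for a in attacks)]


print(protocols_for("SYN Flood"))
print(protocols_for("Fragmentation"))
```

Note that a name such as Fragmentation appears under more than one protocol, so the helper returns a list rather than a single value.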
Characteristics of Botnets to Arrange DDoS Attacks
According to Kishore (2017), IoT botnets used to arrange DDoS attacks share several characteristics. Firstly, most IoT malware is Linux-based. The majority of IoT malware has limited or no side effects on the performance of the host; it becomes active and performs DDoS only on command from its botnet source. Next, much IoT malware resides in the temporary memory (RAM) of IoT devices. Besides, most IoT malware does not use reflection techniques to launch an attack, so it is much more difficult to recognize and mitigate the attack using conventional methods. The volume of the traffic floods generated by IoT botnets is very high, on the order of 100 Gbps or higher, in comparison to conventional PC botnets. Moreover, the infected IoT devices are distributed all around the world. Lastly, apart from generating commonly used traffic floods, namely HTTP, TCP, and UDP traffic, some IoT botnets generate unconventional traffic such as GRE traffic and use the uncommon “DNS water torture” technique during DDoS attacks.
IoT Botnets: Mirai Attacks
Based on statistics from the Malaysia Computer Emergency Response Team (MyCERT), the timeline below illustrates the emergence of Mirai from late 2016 to early 2017. Giaretta et al. (2017) also stated that Mirai infected hundreds of thousands of connected devices all over the world in 2016. Beginning in September 2016, DDoS attacks temporarily crippled Krebs on Security, OVH, and Dyn. The initial attack on OVH using the Mirai botnet exceeded 1 Tbps in volume, among the largest on record. MyCERT observed a large number of IP addresses from Malaysia infected with the Mirai botnet that were recruited to launch the DDoS attacks. Mirai infection in Malaysia is visualized from October 2016, the first month, until September 2017, and the graph is categorized by state, port number, and variant.
Figure 2.8 Mirai Infections in Malaysia 2016 – 2017 (Sharifah, 2017)
According to Nicola et al. (2017), the most predominant malware attacking IoT devices in recent years is Mirai. Mirai is a worm-like family of malware that infected IoT devices and corralled them into a DDoS botnet (Sharifah and Sahrom, 2017). Once exploited, the devices are reported to a control server so that they can be used as part of a large-scale botnet. The botnet can then be used to perpetrate several types of DDoS attacks exploiting a wide range of protocols.
Figure 2.9 IoT Malwares with DDoS Capabilities (Nicola, 2017)
The Mirai botnet is perhaps the most famous of all IoT malware, having taken down a significant portion of the Internet. Mirai uses the default passwords for Telnet or SSH accounts to gain shell access. Once it gains access to such an account, it installs malware on the system. This malware creates delayed processes and then deletes files that might alert antivirus software to its presence, so it is difficult to identify an infected system without performing a memory analysis. Mirai opens ports, creates a connection with the botmaster, and then starts looking for other devices it can infect. After that, it waits for more instructions. Since it shows no activity while it waits and leaves no files on the system, it is difficult to detect.
The Mirai botnet’s source code was released to the public, which gave intrusion analysts insight into the attack and the associated intrusion detection (Anna, 2016). The DDoS traffic was produced by a variety of IoT devices. Once Mirai identifies an insecure device, it tries to log in with a series of common default passwords used by manufacturers. If those passwords do not work, Mirai uses brute force attacks to guess the password. Once a device is compromised, it connects to the C&C infrastructure and can divert varying amounts of traffic toward a DDoS target.
According to Kambourakis et al. (2017), the bot part (coded in C) is responsible for unleashing one of several DDoS attacks and for exploring the IP space for new victims. Mirai mostly targets Linux-based IoT devices. Mirai’s infrastructure, shown in Figure 2.10, is composed of a C&C module that provides a management console for the multiple attacks, a “report” or “collector” server that gathers and maintains information about the active bots in the botnet, and “loader” devices that facilitate the propagation of the malware to newly discovered victims.
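Because Mirai depends on devices that still use factory-default logins, a defender can audit a device inventory for such credentials before an attacker finds them. The following is a minimal defensive sketch under stated assumptions: the credential pairs are generic illustrations and are not Mirai’s actual dictionary, and the inventory format (a list of dictionaries with name, username, and password keys) is hypothetical.

```python
# Illustrative factory-default pairs only; NOT Mirai's actual dictionary.
KNOWN_DEFAULTS = {("admin", "admin"), ("root", "root"), ("admin", "password")}


def flag_default_credentials(inventory):
    """Return the names of devices whose configured login is a known default.

    `inventory` is a hypothetical list of dicts with 'name', 'username'
    and 'password' keys, for example exported from an asset database.
    """
    return [device["name"] for device in inventory
            if (device["username"], device["password"]) in KNOWN_DEFAULTS]


devices = [
    {"name": "camera-01", "username": "admin", "password": "admin"},
    {"name": "router-01", "username": "netops", "password": "S7!x-long-unique"},
    {"name": "dvr-02", "username": "root", "password": "root"},
]
print(flag_default_credentials(devices))
```

The check works purely on stored configuration data and does not attempt any login, so it can be run safely against an asset list.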
Figure 2.10 Overview of Mirai Communication and Basic Components (Kambourakis, 2017)
IoT Botnets: Hajime Attacks
Hajime has been spreading quickly worldwide. According to Martin (2017), Hajime, meaning ‘beginning’ in Japanese, showed its first signs of activity in October 2016, when it was first discovered by researchers. Just like Mirai (Linux.Gafgyt, https://www.symantec.com/security_response/writeup.jsp?docid=2014-100222-5658-99), it spreads via unsecured devices that have open Telnet ports and use default passwords. In fact, Hajime uses the exact same username and password combinations that Mirai is programmed to use, plus two more. Hajime is a worm, according to researchers who have studied it (Edwards, 2016).
Based on the Kaspersky Lab report (2017), the malware is building a huge peer-to-peer botnet, a decentralized group of compromised machines discreetly performing spam or DDoS attacks (see Figure 2.11). The first major difference from Mirai is that Hajime is built on a peer-to-peer network, whereas Mirai uses hardcoded addresses for the C&C server. Instead of relying on a C&C server address, Hajime pushes command modules to the peer-to-peer network. Based on the hardcoded credentials included in the worm’s source code, Hajime targets routers, DVRs, and CCTV systems, just like Mirai (Edwards, 2016).
Figure 2.11 Distribution of Hajime infectors by country (Kaspersky Lab, 2017)
According to the Kaspersky Lab report (2017), Hajime contains no attacking code or capability, only a propagation module. Hajime is also an advanced and stealthy family that uses different techniques, mainly brute-force attacks on device passwords, to infect devices, after which the device becomes part of the botnet. Once on an infected device, it takes multiple steps to conceal its running processes and hide its files on the file system. According to the report, Hajime infections came primarily from Vietnam (over 20%), Taiwan (almost 13%), and Brazil (around 9%) at the time of the research, while most of the compromised devices are located in Iran, Vietnam, and Brazil (see Figure 2.12). Overall, throughout the research period, Kaspersky Lab identified at least 297,499 unique devices sharing the Hajime configuration.
Figure 2.12 Distribution of infected devices by country (Kaspersky Lab, 2017)
Comparison between IoT Botnets Mirai and Hajime
The Hajime botnet was first discovered by researchers in October 2016. Just like the Mirai botnet, it spreads via unsecured devices that have open Telnet ports and use default passwords. In fact, Hajime uses the exact same username and password combinations that Mirai is programmed to use. Unlike Mirai, which uses hardcoded addresses for its command and control (C&C) server, Hajime is built on a peer-to-peer network. There is no single C&C server address; instead, the controller pushes command modules to the peer network and the message propagates to all the peers over time. This is typically considered a more robust design, as it makes takedowns more difficult. Hajime is also stealthier and more advanced than Mirai. Once on an infected device, it takes multiple steps to conceal its running processes and hide its files on the file system. The author can open a shell on any infected machine in the network at any time, and the code is modular, so new capabilities can be added on the fly. It is apparent from the code that a fair amount of development time went into designing this worm.
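The claim that a peer-to-peer design is harder to take down than a centralized one can be illustrated with a small graph model. The sketch below is a simplification built on assumptions: the six-node star and ring topologies and the node names are invented for illustration and do not reproduce the real Mirai or Hajime protocols. It only compares how many nodes remain connected after one node is removed.

```python
from collections import deque


def reachable(graph, start, removed=frozenset()):
    """Breadth-first search: nodes reachable from `start`, skipping `removed` nodes."""
    if start in removed:
        return set()
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen and nxt not in removed:
                seen.add(nxt)
                queue.append(nxt)
    return seen


bots = ["b%d" % i for i in range(6)]

# Centralized layout: every bot talks only to one C&C node (hypothetical model).
star = {"cc": list(bots), **{b: ["cc"] for b in bots}}

# Peer-to-peer layout: each bot knows its two ring neighbours (hypothetical model).
ring = {b: [bots[(i - 1) % 6], bots[(i + 1) % 6]] for i, b in enumerate(bots)}

# Removing the C&C node isolates every bot in the star layout.
star_after = reachable(star, "b0", removed={"cc"})
# Removing one peer leaves the remaining peers connected to each other.
ring_after = reachable(ring, "b0", removed={"b3"})

print(sorted(star_after))
print(sorted(ring_after))
```

In the star layout the single removal leaves b0 on its own, whereas in the ring layout the five remaining peers can still pass messages, which is the robustness property described above.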
Intrusion Detection System (IDS)
According to Vijayarani et al. (2015), an IDS is a software application that monitors network or system activities and detects whether any malicious operations occur. IDSs are deployed in the network to detect the presence of intruders, especially those that manage or try to bypass security defense layers such as firewalls, anti-virus software, and access control, so that preventive measures can be taken. Based on Hamdan et al. (2010), an IDS attempts to detect computer attacks by inspecting data records observed by processes on the same network. Generally, these attacks are divided into two categories: host-based attacks and network-based attacks. Host-based attack detection routines normally use system call data from an audit process that tracks all system calls made on behalf of each user on a particular machine. Network-based attack detection routines usually use network traffic data from a network packet sniffer.
Table 2.2 Comparison of HIDS and NIDS performance (Xavier, 2016)
Performance in terms of | Host-Based IDS (HIDS) | Network-Based IDS (NIDS)
Intruder deterrence | Strong deterrence for inside intruders | Strong deterrence for outside intruders
Threat response time | Weak real-time response but performs better for a long-term attack | Strong response time against outside intruders
Assessing damage | Excellent in determining extent of damage | Very weak in determining extent of damage
Intruder prevention | Good at preventing inside intruders | Good at preventing outside intruders
Threat anticipation | Good at trending and detecting suspicious behavior patterns | Good at trending and detecting suspicious behavior patterns
IDS appliances can also be used for auditing purposes, that is, simply to detect whether particular software or a particular protocol is in use on the observed network. There are three commonly used detection mechanisms:
Anomaly-based detection is commonly used for protocols, because all the valid forms of a protocol are known and clearly defined in RFCs. Deviations from those forms are then identified as anomalies. The drawback of this method is obvious: just because traffic follows the defined standards does not mean its content is not malicious.
Behavior-based detection watches the ongoing network activity and looks for suspicious events. In other words, it is baselined on everyday activity and looks for anything that deviates from that baseline. This technology allows any difference to be detected, including unknown issues such as zero-day attacks.
Signature-based detection compares event patterns against known attack patterns (signatures) stored in the appliance database. Consequently, its detection capability is limited to known signatures and malicious activity. In this respect it resembles antivirus software, and regular updates are crucial.
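The contrast between matching known patterns (signatures) and watching for deviations from a baseline can be made concrete with a short sketch. Everything specific here is an assumption for illustration: the two byte patterns, the packets-per-second samples, and the three-standard-deviation threshold are invented and do not come from any real IDS rule set.

```python
from statistics import mean, stdev

# Hypothetical byte signatures; real IDS rule sets are far richer.
SIGNATURES = {
    b"/bin/busybox": "suspicious shell invocation",
    b"wget http://": "remote payload download",
}


def signature_alerts(payload):
    """Signature-based: report every known pattern found in the payload."""
    return [label for pattern, label in SIGNATURES.items() if pattern in payload]


def behavior_alert(baseline, observed, threshold=3.0):
    """Behavior-based: flag a value more than `threshold` standard deviations
    away from the mean of the baseline measurements."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold


baseline_pps = [100, 110, 95, 105, 90, 100]  # made-up packets-per-second samples

print(signature_alerts(b"cd /tmp; /bin/busybox ls"))
print(behavior_alert(baseline_pps, 102))
print(behavior_alert(baseline_pps, 5000))
```

The signature check can only find the patterns it already knows, while the baseline check can flag an unfamiliar traffic surge, which mirrors the trade-off described above.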
Ayman et al. (2017) classify malware analysis methods by the mode of analysis: static, dynamic, or a mix of both (hybrid). The differences between static and dynamic analysis are shown in Table 2.3.
Static analysis analyzes the software without executing it. It looks at the file itself and tries to extract information about the structure and data in the file, such as the time the program was compiled and which compiler was used. Dynamic analysis, in contrast, tests the program by executing it in real time and tries to find errors while it is running. There are many ways to dynamically analyze suspicious software, as described in the following sections.
Static analysis can be performed on either the source code or the binary executable. The issue is that when source code is compiled to binary code, some information is lost, which makes the analysis of the code much more complicated. The advantage is that static analysis can identify specific coding errors that lead to run-time problems such as crashes or memory leaks. Static analysis can be classified as either basic or advanced.
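As an illustration of what a basic static analysis step might look like, the sketch below inspects raw bytes without executing them, computing a fingerprint and extracting printable strings. This is one possible approach assumed for this example rather than a method prescribed by Ayman et al. (2017); the function name basic_static_report and the sample bytes are invented.

```python
import hashlib
import re


def basic_static_report(data, min_len=4):
    """Inspect raw bytes without executing them: fingerprint plus printable strings."""
    # Runs of at least `min_len` printable ASCII characters.
    strings = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "strings": [s.decode("ascii") for s in strings],
    }


# Invented sample bytes standing in for a file read from disk.
sample = b"\x7fELF\x00\x01\x02telnet\x00\xff\xfeadmin\x00ab\x00"
report = basic_static_report(sample)
print(report["size"], report["strings"])
```

Strings shorter than the minimum length are ignored, which keeps the report focused on text that is more likely to be meaningful, such as embedded commands or credentials.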
Table 2.3 Comparison between Static Analysis and Dynamic Analysis Methods (Ayman, 2017)
Factors | Static Analysis | Dynamic Analysis
Time | Less time if automated but more time if conducted manually | More time is needed
Input | Source code, byte code of an interpreted language, or binary code of a compiled application | Memory snapshots and run-time data
Resource consumption | More cost efficient | Needs more resources in memory and processing
Accuracy | Less than dynamic analysis | Better because it detects run-time vulnerabilities
Advantages | Faster, and code weaknesses are found earlier in the development life cycle; more cost efficient than dynamic analysis; analyzes the source code, so it checks all possible malware executions | Finds vulnerabilities at run-time; more attractive than static analysis because it is concerned with actual code execution
Limitations | Cannot find vulnerabilities at run-time; hard to perform | Analyzes only a single malware sample at a time
Introduction to Machine Learning
Machine learning (ML) was introduced in the late 1950’s as a technique for artificial intelligence (AI) by Yue, 2015. ML is the use of algorithms within a program to learn from collected data. Within ML there are various algorithms that exist to learn from data. ML algorithms include clustering, classification, pattern recognition, correlation and statistical techniques.
Machine learning (ML) Algorithms
According to Liang et al. (2018), machine learning techniques including supervised learning, unsupervised learning and reinforcement learning have been widely applied to improve network security in areas such as authentication, access control, anti-jamming offloading and malware detection.
Firstly, supervised learning includes support vector machine (SVM), naive Bayes, K-nearest neighbor (K-NN), neural network, deep neural network (DNN) and random forest. IoT devices can use SVM to detect network intrusion and spoofing attacks, apply K-NN to network intrusion and malware detection, and use neural networks to detect network intrusion and DoS attacks. Naive Bayes can be applied to intrusion detection, and a random forest classifier can be used to detect malware. IoT devices with sufficient computation and memory resources can use DNN to detect spoofing attacks.
Secondly, unsupervised learning, unlike supervised learning, does not require labeled data; it investigates the similarity between unlabeled data to cluster them into different groups. Lastly, reinforcement learning techniques such as Q-learning, Dyna-Q, post-decision state (PDS) and deep Q-network (DQN) enable an IoT device to choose security protocols and key parameters against various attacks via trial and error.
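As an illustration of the supervised techniques listed above, the following is a minimal K-NN sketch in plain Python. The two traffic features (packets per second and number of distinct destination ports), the training records and the function name `knn_predict` are assumptions made for this example only and do not come from Liang et al. (2018).

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort the labelled examples by Euclidean distance to the query.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical features: (packets per second, distinct destination ports).
train = [
    ((10.0, 2.0), "normal"),
    ((12.0, 3.0), "normal"),
    ((9.0, 1.0), "normal"),
    ((900.0, 40.0), "attack"),
    ((850.0, 55.0), "attack"),
    ((950.0, 60.0), "attack"),
]

flood_label = knn_predict(train, (880.0, 50.0))
quiet_label = knn_predict(train, (11.0, 2.0))
print(flood_label, quiet_label)
```

The classifier needs labeled records in `train`, which is what distinguishes it from the unsupervised clustering approach. In practice the features would usually be scaled first so that one large-valued feature does not dominate the distance.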
Machine learning (ML) Techniques
According to Koroniotis et al. (2017), four machine learning techniques are considered for IoT botnet detection: Association Rule Mining, Decision Tree, Artificial Neural Network and Naive Bayes. A brief description of each technique is provided first, followed by an analysis of the results obtained, based on accuracy and false alarm rate.
Association Rule Mining (ARM) and Decision Tree (DT) are classification algorithms. ARM works by generating rules of a given form, while DT produces a tree-like structure that determines the class chosen for a record. In addition, the Artificial Neural Network (ANN) is a classification model based on the idea of human neurons, while Naive Bayes classifies a record into a specific class.
The four outcomes True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) are combined to create two metrics, Accuracy and False Alarm Rate (FAR), which can be used to evaluate the techniques. These two metrics are calculated as follows:
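To make the idea of rule generation more concrete, the sketch below scores one candidate rule of the form "antecedent implies class" using support and confidence, the two measures commonly used in association rule mining. The source does not specify the rule form or the measures used, so both are assumptions here, as are the record attributes and the function name `rule_metrics`.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Records that contain every item of the antecedent.
    n_antecedent = sum(1 for r in records if antecedent <= r)
    # Records that contain both the antecedent and the consequent.
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical network records expressed as sets of attribute=value items.
records = [
    {"proto=tcp", "port=23", "class=attack"},
    {"proto=tcp", "port=23", "class=attack"},
    {"proto=tcp", "port=443", "class=normal"},
    {"proto=udp", "port=53", "class=normal"},
]
support, confidence = rule_metrics(records, {"port=23"}, {"class=attack"})
print(support, confidence)
```

A rule miner would enumerate many such candidate rules and keep those whose support and confidence exceed chosen thresholds. This sketch evaluates a single rule only.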
Accuracy represents the probability that a record is correctly identified, either as attack or as normal traffic. The Accuracy (Overall Success Rate) is calculated as OSR = (TN + TP) / (TP + FP + TN + FN).
False Alarm Rate (FAR) represents the probability that a record is incorrectly classified. It is calculated as FAR = (FP + FN) / (FP + FN + TP + TN).
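The two metrics can be computed directly from the four counts, as in the short sketch below. The FAR numerator is taken as the sum FP + FN, divided by the total number of records. The counts used are hypothetical values chosen for illustration and are not results from Koroniotis (2017).

```python
def accuracy(tp, tn, fp, fn):
    """Overall Success Rate: share of records classified correctly."""
    return (tn + tp) / (tp + fp + tn + fn)


def false_alarm_rate(tp, tn, fp, fn):
    """Share of records classified incorrectly (FP and FN together)."""
    return (fp + fn) / (fp + fn + tp + tn)


# Hypothetical confusion-matrix counts for 1000 records (assumption).
tp, tn, fp, fn = 450, 480, 20, 50
osr = accuracy(tp, tn, fp, fn)
far = false_alarm_rate(tp, tn, fp, fn)
print(f"OSR = {osr:.2%}, FAR = {far:.2%}")
```

Under these definitions the two metrics always sum to 1, which agrees with the reported DT figures of 93.23% accuracy and 6.77% FAR.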
Koroniotis (2017) shows that the DT technique was the best at distinguishing between botnet and normal network traffic. This algorithm uses Information Gain, during construction of the tree and at every node, to pick the feature that best splits the data with respect to the classification feature. The figure below shows that DT had the highest accuracy of all the algorithms tested, at 93.23%, and the lowest FAR, at 6.77%. ARM was the second-best classifier, with an accuracy close to 86% and a FAR just over twice that of DT. The Naïve Bayes classifier, which relies on probability to assign records to classes, was third, with 20% less accuracy and close to 21% more false alarms than DT. Finally, the Artificial Neural Network was the least accurate of the four algorithms tested, with its accuracy and false alarm rate each differing by 30% from those of the C4.5 decision tree algorithm.
Figure 2.13 Accuracy vs FAR of ML Techniques (Koroniotis, 2017)
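The feature selection step that the decision tree performs at each node can be sketched as follows. This is a plain information gain calculation on a hypothetical four-record dataset, not a full C4.5 implementation. The feature names, the records and the function names are assumptions made for illustration.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the records on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical records and class labels (assumption).
rows = [
    {"proto": "tcp", "port": "23"},
    {"proto": "tcp", "port": "23"},
    {"proto": "tcp", "port": "443"},
    {"proto": "udp", "port": "443"},
]
labels = ["attack", "attack", "normal", "normal"]

gains = {f: information_gain(rows, labels, f) for f in ("proto", "port")}
best = max(gains, key=gains.get)
print(gains, best)
```

Here the split on `port` separates the two classes completely, so it has the higher gain and would be chosen for the node. The tree builder then repeats the same calculation on each resulting subset.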
A critical review is a written summary and evaluation of a text, which can be a book, a journal article or another medium. The selected text must be read in detail in order to present a fair and reasonable evaluation of it. The material must be clearly understood so that it can be analyzed and evaluated using appropriate criteria. Several journal articles were therefore used as guidelines for this project, as follows:
2.3.1 IoT Botnets Attack
According to Nabil (2011), the attacker can instruct the infected computers, called ‘Agents’ or ‘Zombies’, to perform all sorts of tasks, such as sending spam, performing DDoS attacks, running phishing campaigns and delivering malware, and can lease or sell the botnet to other fraudsters anywhere. Dhruba (2015) describes the recent trends in launching various types of DDoS attacks using botnets. According to Effy (2016), DDoS is an example of an active attack at the communication layer of IoT devices and is considered the most powerful attack. Manos (2017) states that the Mirai botnet, composed primarily of embedded and IoT devices, took the Internet by storm in late 2016 when it overwhelmed several high-profile targets with massive distributed denial-of-service (DDoS) attacks. Moreover, Antonakakis (2017) states that IoT botnets are the new normal of DDoS attacks.
Table 2.4 Previous Research about IoT Botnets Attack
Author | DDoS Attacks | Spamming and spreading malware and advertisements | Hosting malicious applications and activities
(Nabil, 2011) | / | / | /
(Dhruba, 2015) | / | |
(Effy, 2016) | / | |
(Manos, 2017) | / | |
(Antonakakis, 2017) | / | |
2.3.2 Machine Learning Techniques
According to Barthakur (2013), the Decision Tree algorithm is one of the most popular classification algorithms and uses recursive partitioning of the instance space based on the concept of information entropy. Koroniotis (2017) further states that the Decision Tree was the best at distinguishing between botnet and normal network traffic. Yue (2015) lists k-Nearest Neighbors, Neural Network, Decision Tree, Support Vector Machines and Naïve Bayes as ML techniques. According to Janice (2016), a neural network can be trained to detect invalid data points in IoT systems. Liang (2018) states that ML techniques such as support vector machine (SVM), Naive Bayes, K-nearest neighbor (K-NN) and neural network can be used to label the network traffic or app traces of IoT devices to build a classification or regression model.
Table 2.5 Previous Research about Machine Learning Techniques
Author Machine learning Techniques
k-Nearest Neighbors | Neural Network | Decision Tree | Support Vector Machines | Naive Bayes | Association Rule Mining
(Barthakur, 2013) / / /
(Yue, 2015) / / / / /
(Janice, 2016) /
(Koroniotis, 2017) / / / /
(Liang, 2018) / / / / /
2.4 Proposed solution / further project
In this project, it is found that machine learning for locally detecting IoT botnet attacks is not efficient. To improve the detection rate of IoT botnet attacks, more complicated practical cases, such as botnet complication, can be considered. To the best of our knowledge, there is currently no systematic evaluation of IoT botnet complication, and several critical questions are yet to be answered, such as whether IoT botnets are complicated in a similar way to traditional botnets and how limited resources influence complication methods. In addition, new attack image extraction methods can be proposed to obtain more representative features of botnets for classification.
Overall, this chapter presents the literature review for the whole project to make sure that the studies carried out cover the topics and subtopics mentioned. The methodology of the proposed solution will be elaborated in the next chapter. Based on the literature review, several domains are involved in this chapter, such as IoT issues, the taxonomy of botnet behavior, botnet attacks and techniques for IoT botnet detection. These were reviewed to ensure that the project to be developed makes a contribution and that the stated objectives of the project are achieved. In addition, some previous studies have been used as references for this project to reinforce the reasons why it should be implemented. All related or past research, references, case studies and other findings that relate to the project title will be used to complete the project successfully and on time. From the perspective of hackers, IoT devices are computing resources that can be used for any type of malicious purpose, allowing an attacker to control them from a remote location without the knowledge of the device’s rightful owner.