| Title | Malware beaconing detection |
| Publication Type | thesis |
| School or College | College of Engineering |
| Department | Computing |
| Author | Tripathi, Anand |
| Date | 2018 |
| Description | Security professionals are in constant battle with the recent trend of sophisticated malware targeting organizations and governments to gain unauthorized access to confidential knowledge and intellectual property. Recent years have also seen the rise of botnets that are often used for sending spam emails, stealing information, as well as launching wide-scale distributed denial of service attacks. Many approaches have been proposed to detect malware infection, but they either rely on end-host installations or require deep-packet inspection for signature matching. In this work, we utilize a common behavior of malware called "beaconing", where an infected node communicates with a command and control server at regular intervals for reporting its liveliness, to detect the presence of malware on an infected node. Using statistical methods for finding periodicity in a time series generated from network flow records, we were able to identify nodes infected with malware present on a large organization network. We evaluated our detection system on a real-world traffic dataset to show the effectiveness of our approach. |
| Type | Text |
| Publisher | University of Utah |
| Dissertation Name | Master of Science |
| Language | eng |
| Rights Management | © Anand Tripathi |
| Format | application/pdf |
| Format Medium | application/pdf |
| ARK | ark:/87278/s6nd217v |
| Setname | ir_etd |
| ID | 1703491 |
| OCR Text | MALWARE BEACONING DETECTION

by Anand Tripathi

A thesis submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.

School of Computing
The University of Utah
December 2018

Copyright © Anand Tripathi 2018. All Rights Reserved.

The University of Utah Graduate School
STATEMENT OF THESIS APPROVAL

The thesis of Anand Tripathi has been approved by the following supervisory committee members: Jacobus Van der Merwe, Chair (date approved 07/30/2018); Sneha Kasera, Member (date approved 08/02/2018); Suresh Venkatasubramanian, Member (date approved 07/30/2018); and by Ross Whitaker, Chair/Dean of the Department/College/School of Computing, and by David B. Kieda, Dean of The Graduate School.

ABSTRACT

Security professionals are in constant battle with the recent trend of sophisticated malware targeting organizations and governments to gain unauthorized access to confidential knowledge and intellectual property. Recent years have also seen the rise of botnets that are often used for sending spam emails, stealing information, as well as launching wide-scale distributed denial of service attacks. Many approaches have been proposed to detect malware infection, but they either rely on end-host installations or require deep-packet inspection for signature matching. In this work, we utilize a common behavior of malware called "beaconing", where an infected node communicates with a command and control server at regular intervals for reporting its liveliness, to detect the presence of malware on an infected node. Using statistical methods for finding periodicity in a time series generated from network flow records, we were able to identify nodes infected with malware present on a large organization network. We evaluated our detection system on a real-world traffic dataset to show the effectiveness of our approach.

CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS

CHAPTERS
1. INTRODUCTION
   1.1 Thesis Statement
   1.2 Contribution
   1.3 Thesis Overview
2. BACKGROUND
   2.1 Malware Communication
   2.2 NetFlow
   2.3 Periodicity Detection
       2.3.1 Statistical Methods
       2.3.2 Spectral Analysis
             2.3.2.1 Discrete Fourier Transform
             2.3.2.2 Periodogram
             2.3.2.3 Circular Autocorrelation
   2.4 Directed Anomaly Scoring
3. RELATED WORK
   3.1 Command and Control Detection
       3.1.1 Horizontal Correlation
       3.1.2 Vertical Correlation
   3.2 Beaconing Detection
4. THREAT MODEL
5. MALWARE BEACONING DETECTION
   5.1 Dataset
   5.2 Architecture
       5.2.1 Overview
       5.2.2 Data Preprocessing Module
       5.2.3 Periodicity Detection
             5.2.3.1 Candidate Selection
             5.2.3.2 Candidate Validation
       5.2.4 Detection Module
             5.2.4.1 Directed Anomaly Scoring
       5.2.5 False Positive Reduction
   5.3 Implementation
6. EVALUATION AND RESULTS
   6.1 Evaluation on Publicly Available Dataset
       6.1.1 Larger Dataset
   6.2 Evaluation on Real-World Dataset
   6.3 Robustness of the Periodicity Detector
   6.4 Performance Evaluation
   6.5 Discussion
7. CONCLUSION
   7.1 Future Work
REFERENCES

LIST OF FIGURES

2.1 An example showing DAS scoring in two dimensions
5.1 Overview of the proposed method
6.1 Distribution of number of communications for each unique connection
6.2 Example of C&C communication
6.3 Permutation-based filtering for periodic candidate selection
6.4 Filtered periodogram after using permutation-based filtering
6.5 Performance evaluation against different noise levels
6.6 Average execution time of the stages of the proposed method

LIST OF TABLES

2.1 Periodicity measure and periods for various malware families [17]
5.1 Flow features used
5.2 Features used for directed anomaly scoring
6.1 Summary of data from the two NetFlow logs
6.2 Evaluation on ISCX botnet dataset
6.3 Confusion matrix

ACKNOWLEDGEMENTS

I would like to thank my advisor, Prof. Jacobus Van der Merwe, for his constant guidance and support. He continuously motivated me to work smarter and harder even when some of my ideas did not work out. I have learned a lot from him, and working under his mentorship has been a rewarding experience. I thank my committee members, Prof. Sneha K. Kasera and Prof. Suresh Venkatasubramanian, for their suggestions and feedback. I am grateful to Corey Roach, Benjamin Poster, Nico Holguin, and other members of UIT for providing me with data for my evaluation and for the initial idea of exploiting beacons for malware detection. I would also like to thank Mike Hibler for permitting me to execute active malware in the Emulab testbed. Finally, I must express my very profound gratitude to my parents and to my sister for providing me with unfailing support and continuous encouragement throughout my years of study. This work was supported in part by the National Science Foundation under grant number 1642158.
CHAPTER 1
INTRODUCTION

With the ever-increasing growth in cybersecurity-related incidents, it is very difficult for traditional systems such as antivirus engines and firewalls to keep up with newer, stealthier versions of malware. Host-based intrusion detection systems (HIDS) typically rely on signatures for detection, which is effective only against known threats. This disadvantage of HIDS is compensated for by complementing them with network-based intrusion detection systems (NIDS), which are capable of analyzing a large number of hosts to make detection decisions without the need to install end-point software.

A common behavior often observed across malware families is called "beaconing", where, after infecting a node, the malware periodically communicates with a centralized command and control (C&C) server for a variety of purposes, including but not limited to indicating that the node is alive and ready to accept commands or recent updates [1, 23]. These communications often take place over default HTTP/HTTPS ports for firewall evasion; however, during our experiments, we also observed instances where an attacker used nonstandard ports for these communications.

Our key insight is that this beacon communication is highly periodic and generally small in size. This common behavior can be exploited to detect these malicious beacons. However, a large number of benign systems, such as streaming and heartbeat services, also show similar behavior, making this task challenging.

In this work, we present a novel system for detecting malware beacon communication by analyzing NetFlow records. NetFlow records are readily available, lightweight, and contain only header information, and are thus resilient to encryption of the communication. We model the NetFlow records as a continuous time series to identify periodicity in communication between two nodes.
To counter false positives, we have employed a variety of mechanisms that filter and generate a final list of connections of interest, sorted by priority for an administrator to investigate and act upon. We have evaluated our results on publicly available malware datasets as well as on data from the network of a large-scale organization with thousands of nodes.

1.1 Thesis Statement

It is possible to detect signs of early malware infection by identifying and analyzing beaconing behavior in the network.

1.2 Contribution

In summary, this thesis makes the following contributions:

• Propose a system that can reliably identify periodic communications and separate malicious beaconing behavior from benign periodic traffic.

• Design a highly scalable implementation that can create a manageable list of possibly infected nodes and their communication patterns for a system administrator.

• Evaluate our methodology using synthetic datasets as well as real-world data collected from a large university.

1.3 Thesis Overview

The rest of the thesis is structured as follows: Chapter 2 gives background on NetFlow and data mining techniques. In Chapter 3, we discuss existing solutions and some of their drawbacks. Chapter 4 defines the threat model. In Chapter 5, we present our system design and discuss our algorithm choices. Finally, we evaluate the proposed system in Chapter 6 and conclude the thesis, along with future work, in Chapter 7.

CHAPTER 2
BACKGROUND

In this chapter, we discuss background information on malware communication as well as techniques used for periodicity detection.

2.1 Malware Communication

Malicious software (malware) is a tool commonly employed during a targeted attack to compromise or infiltrate an organization's network in order to steal sensitive information or disrupt services [1]. This section provides background on malware and common network communication patterns observed across malware families.
The general structure of a targeted attack consists of a sequence of steps [10], described as follows.

• Reconnaissance: In this stage, an attacker collects information about the target, including weaknesses or vulnerabilities that can be exploited. For example, the attacker may monitor the target's network and systems using techniques such as port or vulnerability scans.

• Initial Compromise: This stage represents the actual intrusion, in which the attacker manages to penetrate the target's network. This can be achieved via spearphishing attacks or compromised websites. In both cases, the victim unknowingly infects her system with malware, which gives control of the device and the connected network to the adversary.

• Command and Control: During this stage, the compromised systems establish a communication channel back to the adversary through which they can be fed instructions. This gives the attacker full control of the malware for launching future attacks.

• Exfiltration: The final stage of the kill chain is where the attacker extracts, collects, or encrypts information stolen from the victim. The information is sent back to the adversary using the channel established during the command and control stage. If the malware is instead used for disruption of services, a command to launch a Distributed Denial of Service (DDoS) attack may also be given over the same channel.

For this work, we are most interested in the Command and Control phase. Once an attacker has established a channel with the infected device, she can use it to continuously send commands and updates to the malware. Due to the presence of firewalls and NAT boxes, it has become difficult for attackers to push these commands from an external host. Instead, they rely on pull requests from the internal infected node. Therefore, the malware regularly communicates with an external host, often known as a command and control (C&C) server.
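This pull-style beaconing can be made concrete with a small synthetic sketch (the 60-second period, jitter range, and traffic models below are invented for illustration, not taken from the thesis dataset): the inter-arrival times of a beacon have a far lower standard deviation than those of aperiodic traffic with the same average rate, which is exactly the property statistical detectors exploit.

```python
import random

random.seed(7)  # deterministic for illustration

def inter_arrival_sd(timestamps):
    """Sample standard deviation of consecutive inter-arrival gaps."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / (len(gaps) - 1)
    return var ** 0.5

def arrivals(n, next_gap):
    """Build n timestamps from a gap-generating function."""
    t, out = 0.0, []
    for _ in range(n):
        t += next_gap()
        out.append(t)
    return out

# Hypothetical beacon: one flow every ~60 s with +/- 1 s of jitter.
beacon = arrivals(100, lambda: 60 + random.uniform(-1, 1))

# Hypothetical user traffic: same mean gap, but Poisson-like (aperiodic).
user = arrivals(100, lambda: random.expovariate(1 / 60))

print(inter_arrival_sd(beacon))  # well under 1 s: strongly periodic
print(inter_arrival_sd(user))    # tens of seconds: aperiodic
```

The gap between the two standard deviations is what makes a variance-based filter (Section 2.3.1) usable even on metadata-only flow records.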
These communications produce a periodic pattern that can usually be observed in the network data [8]. An adversary that uses pull-style communication requires that the malware call back home in a periodic manner to ensure that she has real-time control over the infected system. A recent study by Huynh et al. [17] showed that 30 out of 31 experimented malware families were found to leave periodic traces in network traffic when they communicate with command and control servers. Table 2.1 lists the periodicity of all tested malware families from their study.

2.2 NetFlow

Developed by Cisco Systems [6], NetFlow is a protocol used for collecting metadata about IP traffic flows traversing a network device such as a router, switch, or host. A NetFlow-enabled device generates metadata at the interface level and sends this information to a flow collector, where the flow records are stored to enable network traffic analysis. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device [6]. NetFlow records are highly granular; for example, flow records include details such as IP addresses, packet and byte counts, timestamps, Type of Service (ToS), application ports, input and output interfaces, etc. Exported NetFlow can be used for a variety of purposes, including but not limited to enterprise accounting, network monitoring, security analysis, and data mining. There exist multiple variants of NetFlow records, with each vendor adding information about the flow.

Table 2.1: Periodicity measure and periods for various malware families. Adapted from "On periodic behavior of malware: experiments, opportunities and challenges" by Ngoc Anh Huynh, Wee Keong Ng, and Hoang Giang Do, 2016, 11th International Conference on Malicious and Unwanted Software: Know Your Enemy (MALWARE), © 2016 IEEE. [The table gives, for each of 31 families (Darkcomet, xRat, njRat, Sub-7, Novalite, Babylon, Citadel, Zeus, Andromeda, Neutrino, Athena, Cridex, Gootkit, Kelihos, NgrBot, SmokeLoader, Rustock, Kuluoz, Conficker, Simda, Sinowal, Silly, Virut, Dyre, Hesperbot, Gameover, Ramnit, Tinba, Sality, Gozi, and Shylock), a reconnaissance period, a connection period, a keep-alive period, and a periodicity measure; the periods range from a few seconds to several hours, with N/A where no such traffic was observed, and the periodicity measures range from 0.3652 to 0.9882.]

2.3 Periodicity Detection

Periodicity detection is a well-developed area of research. Most work assigns a numerical score to metadata derived from the data. The methods used for periodicity detection range from statistical methods to signal spectrum analysis.

2.3.1 Statistical Methods

The most straightforward method is to apply statistical analysis to features derived from the network traffic. Hubballi et al.
[16] proposed a method that relies on the fact that periodic communication exhibits very low variance and standard deviation in its inter-arrival time differences. For a random variable X, let X_i be the value taken by the random variable in observation i and \mu the mean of all M observed values. The standard deviation of X is then given by the following equation:

SD = \sqrt{ \frac{1}{M-1} \sum_{i=1}^{M} (X_i - \mu)^2 }   (2.1)

2.3.2 Spectral Analysis

Spectral analysis is another common tool used for periodicity detection, because spectral analysis characterizes the frequency content of a signal. In order to discover potential periodicity in a time series, one needs to examine the power spectral density (PSD). The PSD tells us the expected power at each frequency of the signal. We can then find the most dominant periods by calculating the inverse of these frequencies. There are two well-known estimators of the PSD, the periodogram and circular autocorrelation. Both of these methods can be computed using the Discrete Fourier Transform of a signal.

2.3.2.1 Discrete Fourier Transform

The normalized Discrete Fourier Transform of a sequence x(n), n = 0, 1, ..., N-1, is a sequence of complex numbers X(f) given by:

X(f_{k/N}) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1   (2.2)

where the subscript k/N denotes the frequency that each coefficient captures.

2.3.2.2 Periodogram

The periodogram of X is the squared length of each Fourier coefficient:

P(f_{k/N}) = \| X(f_{k/N}) \|^2, \quad k = 0, 1, \ldots, N-1   (2.3)

Note that we can only detect frequencies that are at most half of the maximum signal frequency, due to the Nyquist sampling theorem.

2.3.2.3 Circular Autocorrelation

Another way of estimating dominant periods is the circular autocorrelation function (ACF), which examines how similar a sequence is to itself at different lags \tau.
ACF(\tau) = \frac{1}{N} \sum_{n=0}^{N-1} x(n) \, x(n+\tau)   (2.4)

The assumption is that a periodic signal will overlap with the original signal at the corresponding lag and reinforce it, while a nonperiodic signal will give randomized results.

2.4 Directed Anomaly Scoring

Directed anomaly scoring (DAS) was proposed by Ho et al. [14] in their work on detecting credential spearphishing attacks in enterprise settings. At a high level, DAS works by ranking all events by comparing how suspicious each event is relative to all other events. After ranking all events, DAS selects the N most suspicious events, where N is the alert budget. DAS first assigns an anomaly score to each event E by counting the total number of events that E is at least as suspicious as in every feature dimension. Events with higher scores are more suspicious than events with lower scores. Finally, the algorithm sorts the events based on their anomaly scores and outputs the N highest-scoring events. Figure 2.1 shows a visual representation of the DAS algorithm: event C has a score of 2 because two events, A and B, have more benign values than C for both features X and Y.

Formally, each event has a feature vector E \in R^d, where d is the number of features. An event E is considered to be at least as suspicious as E' if E_i \le E'_i for all i = 1, 2, ..., d. Algorithm 1 shows the DAS scoring and alert generation procedure.

Figure 2.1: An example showing DAS scoring in two dimensions. Based on "Detecting Credential Spearphishing Attacks in Enterprise Settings" by Ho et al.
[14].

Algorithm 1: Scoring and Alert Selection in DAS [14]

Score(E, L):
    for each event X in L do
        if E is more suspicious than X in every dimension then
            increment E's score by 1

AlertGen(L (a list of events), N):
    for each event E in L do
        Score(E, L)
    sort L by each event's score
    return the N events from L with the highest scores

CHAPTER 3
RELATED WORK

A large body of work exists in the area of C&C communication detection. In this chapter, we describe the related work in the literature.

3.1 Command and Control Detection

3.1.1 Horizontal Correlation

These systems work by correlating network events from two or more infected hosts that show similar communication patterns with the command and control server. Anomaly-based botnet detection systems such as BotFinder [27] and BotSniffer [12] are able to detect clusters of synchronized hosts infected with the same malware. The primary strength of these systems is that they are protocol independent and able to detect malware without a priori knowledge of malware internals. However, the downside of using these systems is that they generally require at least two infected nodes for correlation and so are unable to detect a single command and control contact. In addition, they require a noisy environment, such as the presence of a DDoS or spam attack, for detection to work. Due to the rise of stealthy malware in recent years, these techniques are not robust. Our method, on the other hand, can detect a single command and control communication by utilizing its consistent repetitive pattern, even before the system is in attack mode.

3.1.2 Vertical Correlation

Another method of detecting an individual infected machine's C&C traffic is to inspect network traffic for indications of C&C communication. BotHunter [11] identifies a typical infection life cycle, whereas the system proposed by Wurzinger et al. [33] automatically generates signatures that represent the behavior of infected nodes.
These systems have achieved high accuracy while maintaining low false positive rates. However, they are limited in the kinds of infection they are able to detect, due to their reliance on known instances of malware infection and their requirement of a physical sample for model training. Our work, on the other hand, is generic and can identify an infected host without requiring a malware sample.

3.2 Beaconing Detection

The C&C detector proposed by Oprea et al. [21] leverages enterprise-specific features extracted from HTTP connections and a belief propagation framework to detect single C&C server communications with high confidence. Similarly, Shalaginov et al. [26] applied periodicity detection algorithms to DNS records to identify infected hosts. Their reliance on HTTP connections, however, makes them ineffective against communications that are not on HTTP/HTTPS ports. Since our work is port agnostic, we can detect stealthy communication on any port. In addition, these systems require extra information extracted from their respective sources for filtering or classification; for example, Oprea et al. [21] extract domain names from HTTP requests, and Shalaginov et al. [26] extract IP addresses from DNS requests, so malware that does not use domain name resolution cannot be detected. Our method, on the other hand, relies only on NetFlow records.

Disclosure [4] uses temporal and client-based features extracted from NetFlow records to identify command and control servers. Their method uses supervised learning algorithms and incorporates external third-party intelligence sources to reduce false positives. Our work, in contrast, uses unsupervised learning methods and thus does not require a labeled dataset for training. We also use a few third-party reputation services to increase the confidence of the alerts and to prioritize them for administrative response. Hu et al.
[15] also used periodogram-based periodicity detection in their proposed system, BayWatch. Their intrusion detection system (IDS) architecture uses multiple layers of filtering to identify malicious candidate connections, which are then fed to a machine learning model to generate a final list. This list is then verified by a system administrator to filter out false positives. Huynh et al. [18] propose a visual analytics solution that supports automatic generation of alerts for periodic malware detection as well as an interactive visualization for manual inspection of periodic signals by an analyst. They also propose a periodicity-measuring algorithm that uses the energy occupied by the top 10% of the most dominant frequencies in the frequency spectrum of a signal as a measure of its periodicity. Their visualization tool can also be used to understand other kinds of periodic traffic. Mackenzie Haffey [13], in his master's thesis, has also worked on characterizing periodic traffic, including the detection of botnet "beacons".

The system proposed by Apruzzese et al. [2] is the closest to our work. Both systems use NetFlow data and auto-correlation to identify malware beacons. The authors also propose a modified algorithm for calculating the auto-correlation score that is resilient to small perturbations. However, their system relies on clustering of connections and on alerts from external data sources, such as a NIDS, to classify a connection as malicious. Our work, on the other hand, does not depend on alerts from external data sources but uses a ranking-based approach.

CHAPTER 4
THREAT MODEL

We assume that the adversary has gained access to internal networks by infecting one or more nodes through means such as social engineering, phishing, or drive-by-downloads. Our threat model assumes that the infected node is continuously communicating with an external command and control server to receive instructions and updates.
The proposed method is not a host-based intrusion detection system and thus cannot detect malware with no network beaconing activity. The output is a list of potential nodes currently sending beacon packets, which is presented to a network administrator, who can then decide, based on their judgment, how to respond. The module on its own does not prevent or stop an active attack. We also assume that the NetFlow records and third-party databases utilized are trustworthy and that the agents collecting these records are not vulnerable to attacks. We additionally assume that the reputation information received from third-party resources is up-to-date and reliable.

CHAPTER 5
MALWARE BEACONING DETECTION

In this chapter, we present our method to detect malware beacons.

5.1 Dataset

We used two datasets for our experiments. The first consists of NetFlow records collected by University Information Technology (UIT), University of Utah. We were able to obtain one month's worth of data from the campus housing subnet of the University of Utah. The second dataset is a publicly available botnet traffic dataset [9] collected at CTU University, Czech Republic, in 2011. We selected two variants of botnets that had periodic communication with multiple command and control servers. Since the dataset has labels for all communications, we used it as our ground truth. Table 5.1 shows the fields from the NetFlow records used in our experiments.

Table 5.1: Flow features used.

Field             Description
Date & Time       Time-stamp at which the first packet for this flow was received
Source IP         IP address of the device the packet originated from
Source Port       Application port from which the packet originated
Destination IP    IP address of the device the packet is destined for
Destination Port  Application port the packet is destined for
Bytes             Total bytes transmitted during the flow
Protocol          Transmission protocol used

5.2 Architecture

5.2.1 Overview

At a high level, our system consists of three stages, shown in Figure 5.1 and described below: a data preprocessing module, a periodicity detector, and a detection module. The proposed method works with easily collectable NetFlow information and builds a list of the most suspicious connections. We rely on third-party reputation only for verification and false positive reduction.

Figure 5.1: Overview of the proposed method. [Pipeline: NetFlow records enter the data pre-processing module (server classification, feature extraction, time-series generation), then the periodicity detector (candidate selection, candidate verification), then the detection module (anomaly detection and reputation-based filtering using third-party reputation, e.g., VirusTotal), which outputs a candidate list of infected nodes.]

The input network flow dataset is first processed by the data preprocessing module, which extracts relevant features from the large set of options present in NetFlow records. The extracted features are then forwarded to the periodicity detector, which uses statistical features from both the time and frequency domains to eliminate noise and potential false positives and to identify the most feasible candidates for our detection module. The detection module uses unsupervised learning methods to aggregate multiple connections and uses Directed Anomaly Scoring [14] to generate a list of possibly infected nodes. Finally, the false positive reduction module uses heuristic analysis and third-party reputation scores to narrow the output to a manageable list that can be investigated by a network security administrator.

5.2.2 Data Preprocessing Module

Our aim is to detect periodic communication between two nodes. In this regard, looking at the frequency of continuous communication between those nodes can give us a good idea of the nature of the communication. We start by using network flow records between internal and external networks as input to our model. As a node can use a different port for each new flow, we do this aggregation on server ports.
In order to categorize an IP address as a server, we use the heuristic method proposed by Bilge et al. [4], where an IP address is classified as a server if the number of flows directed towards its top two ports accounts for at least 90% of the flows towards that address. Once we have a list of servers, we create a time series for all connections between a node n and a server s by grouping flows originating from or destined to s from n. The set is created using the 3-tuple <source IP address, source port, destination IP address> if the source is a classified server, or the 3-tuple <destination IP address, destination port, source IP address> if the destination address is classified as a server. Let T_ns = {t1, t2, ...} represent the time-stamps of connections between node n and server s. T_ns is then transformed into an integer sequence x_ns = {x_0, x_1, x_2, ..., x_{N-1}}, where N is the total length of the discrete time series and depends on the sampling period. For example, with a sampling period of 1 second, a connection measured over an hour will have 3600 total points. We use this integer sequence x_ns as input to our second stage.

As periodicity detection is expensive, we use another heuristic filter at this point to reduce the total number of connections we need to forward to our periodicity detector. We filter out any connection with fewer than τmin communications. This value is configurable, and an administrator can modify it based on the available resources. From our experiments, we found that connections with fewer than τmin communications do not carry sufficient information to be classified as periodic.

5.2.3 Periodicity Detection

Beaconing is a repetitive pattern in which an internal node communicates with external node(s) at regular time intervals. To identify whether a connection is periodic, we use multiple algorithms that are well known for periodicity detection. We take the time series x_ns generated in the last stage as input to this module.
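As a minimal sketch of the preprocessing stage described above (function names are illustrative, and timestamps are assumed to be in seconds), the server heuristic of Bilge et al. and the time-series binning could look like:

```python
from collections import Counter

def is_server(port_counts, threshold=0.9):
    """Heuristic from Bilge et al. [4]: an IP is classified as a server if
    its top two destination ports account for >= 90% of flows towards it.
    port_counts maps port -> number of flows seen towards that port."""
    total = sum(port_counts.values())
    top_two = sum(c for _, c in Counter(port_counts).most_common(2))
    return total > 0 and top_two / total >= threshold

def to_time_series(timestamps, sample_period=1):
    """Turn connection timestamps T_ns into the discrete integer sequence
    x_ns, with one bin per sampling period."""
    start = min(timestamps)
    n_bins = int((max(timestamps) - start) // sample_period) + 1
    x = [0] * n_bins
    for t in timestamps:
        x[int((t - start) // sample_period)] += 1
    return x
```

With a 1-second sampling period, an hour-long connection yields a 3600-point sequence, as in the example above.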
The time intervals are determined by the sampling period t_sample. A smaller sampling period gives fine-grained measurements but requires more resources for processing. The first step is to identify periodic candidates; we use periodograms [31] to obtain the power spectral density of the discrete signal.

5.2.3.1 Candidate Selection

A naive way of identifying periodicity is to take the frequency with the maximum power. However, that is not the best strategy, for two reasons. First, the DFT can be applied to any signal, so selecting the highest-power frequency will identify a candidate from every signal. This will create many false positives and a large number of connections to verify in the detection step. Second, some malware have shown periodic behavior at multiple periods; for example, a malware instance can be in contact with two command and control servers with different time periods. Selecting only the frequency with the highest power would filter out the additional periods.

Despite its limitations, the periodogram has proven effective in identifying periodicity in a signal. For that reason, we use the two-tier approach proposed by Vlachos et al. [31]. The main idea is to identify the signal energy attributable to nonperiodic mechanisms. Once we have this information, any frequency with power above this threshold can be considered dominant. Let us assume we have a sequence x. The outcome of a permutation of x is x̃. The new sequence x̃ retains the first-order statistics of the original signal but does not exhibit its patterns of periodicity, so we can safely discard anything that has the same structure as x̃. At this step, we record the maximum power p_max that x̃ exhibits over all frequencies f:

    p_max = max_f |X̃(f)|²

where X̃ is the discrete Fourier transform of x̃. Any frequency in the original signal x is considered interesting only if it has higher power than p_max. A single permutation can still have a periodicity component.
To gain higher confidence, we repeat the experiment m times and take the (C ∗ m)-th highest power, where C is the confidence level. This provides us with a set of frequencies F = {f1, f2, ..., fk} that are candidates for periodicity. An additional trimming step removes periods that are too small or too large: any period greater than N/2 or less than 2 is discarded. If the final set is empty, the connection is considered nonperiodic and is filtered out. Algorithm 2 presents pseudocode for identifying periodic hints.

Algorithm 2: getPeriodHints [31]

  getPeriodHints(Q):
      k = 100                         # number of permutations
      percentile = 99                 # confidence level
      maxPower = []
      periods = []
      for i = 1 to k:
          Qp = permute(Q)
          P = getPeriodogram(Qp)
          maxPower.add(max(P.power))
      sort maxPower ascending
      P_threshold = maxPower[maxPower.length * (percentile / 100)]
      P = getPeriodogram(Q)           # periodogram of the original signal
      for each candidate in P:
          if candidate.power > P_threshold:
              periods.add(candidate)
      # period trimming
      N = Q.length
      for each p in periods:
          if p.hint >= N/2 or p.hint <= 2:
              periods.erase(p)
      return periods

5.2.3.2 Candidate Validation

The autocorrelation function (ACF) is another tool used for periodicity detection; it is more powerful at identifying periodicity at a given time lag t. However, using it to detect all periods in a signal is expensive, since the process would have to be repeated for all lags t ∈ [1, N − 1]. We therefore use the ACF only to verify the candidates F identified in the previous stage. Any candidate with an autocorrelation score lower than τacf is considered a false positive and removed from F. In the end, if the set is nonempty, the connection is deemed periodic and sent to the detection stage.

5.2.4 Detection Module

One of the key challenges is that beaconing behavior by itself is not an indicator of malicious activity. Many benign applications, such as streaming services and software updates, exhibit network behavior that resembles beaconing.
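A runnable Python sketch of the candidate selection and ACF validation steps of Section 5.2.3 (using SciPy's periodogram; parameter names and thresholds are illustrative, not the thesis implementation):

```python
import numpy as np
from scipy.signal import periodogram

def get_period_hints(x, n_perm=100, confidence=0.99, rng=None):
    """Permutation-based candidate selection (after Vlachos et al. [31]).
    Permuting x keeps its first-order statistics but destroys periodicity,
    so a high percentile of the permuted maxima serves as a noise floor."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    max_powers = []
    for _ in range(n_perm):
        _, psd_p = periodogram(rng.permutation(x))
        max_powers.append(psd_p.max())
    threshold = np.sort(max_powers)[int(confidence * n_perm) - 1]
    freqs, psd = periodogram(x)
    n = len(x)
    hints = []
    for f, p in zip(freqs, psd):
        if f > 0 and p > threshold:
            period = 1.0 / f
            if 2 < period < n / 2:   # trim implausible periods
                hints.append(period)
    return hints

def validate_acf(x, period, tau_acf=0.5):
    """Keep a candidate only if the autocorrelation at lag=period
    exceeds the threshold tau_acf."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    lag = int(round(period))
    denom = np.dot(x, x)
    if denom == 0 or lag >= len(x):
        return False
    return np.dot(x[:-lag], x[lag:]) / denom >= tau_acf
```

A connection whose hint set survives both stages would then be forwarded to the detection module.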
Labeling all these connections as malicious would generate a large number of false positives. Due to this data imbalance, the problem of identifying malicious beaconing becomes a needle-in-a-haystack problem. The lack of labels and the data imbalance make supervised machine learning algorithms inefficient here. In response to these challenges, we use Directed Anomaly Scoring (DAS), proposed by Ho et al. [14]. The DAS anomaly detection method works by ranking events by their importance with respect to each other.

5.2.4.1 Directed Anomaly Scoring

Table 5.2 shows the features chosen for our detection module. We craft two sets of features extracted from candidate connections and prior reputations.

Table 5.2: Features used for directed anomaly scoring.

  Feature                                                        Comparator for DAS
  Periodic communications from/to internal node in a day         ≥
  Periodic communications from/to external source IP in a day    ≤
  No. of days since first access (external IP)                   ≤
  No. of communicating nodes (internal IPs)                      ≤

The first two features are based on the assumption that an infected node will communicate with multiple command and control servers and that a node that receives frequent periodic communication is less likely to be suspicious. The other two features are based on the self-generated reputation of an IP address. We could have used third-party reputation systems to identify suspicious connections. However, in our experiments, we found that they suffer from two major problems. First, these third-party sources have limited coverage; for example, VirusTotal only has information for an IP address if someone has already queried for it. Second, the information is not always current, so an IP address previously seen in an attack may now belong to a benign service. Similarly, an attacker can take control of a high-reputation domain and launch an attack from its trusted IPs. Our idea is instead to give higher reputation to nodes that frequently communicate with our university network.
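Following the description of DAS in [14], an event is scored by how many other events it is at least as suspicious as in every feature dimension, with a per-feature direction matching the comparators of Table 5.2. A compact sketch (the feature tuples and direction labels here are illustrative assumptions):

```python
def das_scores(events, directions):
    """Directed Anomaly Scoring sketch (after Ho et al. [14]).
    events: list of feature tuples, one per candidate connection.
    directions: per feature, 'ge' if larger values are more suspicious,
    'le' if smaller values are more suspicious."""
    def at_least_as_suspicious(a, b):
        # a dominates b only if it is at least as suspicious in EVERY dimension
        for x, y, d in zip(a, b, directions):
            if d == 'ge' and x < y:
                return False
            if d == 'le' and x > y:
                return False
        return True
    return [sum(at_least_as_suspicious(e, other)
                for other in events if other is not e)
            for e in events]
```

Ranking events by descending score then yields the suspiciousness ordering; the asymmetry in each comparator is exactly what standard distance-based anomaly detectors lack.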
One might consider unsupervised or semisupervised anomaly detection techniques to identify malicious connections using the same features as in Table 5.2. While a number of such techniques exist, including density-based estimation techniques such as Gaussian Mixture Models (GMMs) [5] and clustering techniques such as k-nearest neighbors (kNN) [20] or k-means, these techniques suffer from multiple limitations [14]. First, scalar features have a directionality to their values. For example, the fewer times an external node has communicated with our network, the more suspicious it is; however, an unusually large amount of communication with the same node is not grounds for suspicion by itself. Standard anomaly detection systems do not incorporate this notion of asymmetry in a direction and would consider the opposite case equally suspicious. Second, classical techniques are parametric and require hyperparameter tuning or assumptions about the initial dataset, such as the number of clusters for most clustering algorithms. We chose DAS [14] because it does not have these limitations and has been proven to work in similar environments.

We start by generating our own reputation database using a sliding window approach. We iterate through the NetFlow records of the previous month and update a table of features whenever we run our module. We then populate the values of the remaining features using the candidates received from the previous stage. Finally, we rank our connections based on the scoring method discussed in Section 2.4, which assigns a score to each connection. We can then take the top cbudget suspicious connections based on the budget allocated by the administrator. These connections are then manually verified to filter out any remaining false positives.

5.2.5 False Positive Reduction

NetFlow data provides limited information about the activity carried out in the network.
Since our system works only with NetFlow data and detects periodicity in connections, its output is likely to contain false positives. The various threshold parameters (such as τmin) can be tuned to trade off false positives against missed detections, decreasing or increasing the number of connections an administrator has to investigate. However, since many benign connections, such as streaming services and heartbeat mechanisms, also fit our criteria for beacon activity, we use multiple external systems to filter out these unwanted connections. To counter the large number of false positives, we implemented multiple filters in addition to a reputation-based component. The filters used in this module are:

• We assume that the command and control server is an external server, and thus any internal server IP can be safely filtered out. There exists, however, a rare possibility of a C&C server inside the internal network, in which case it will be counted as a false negative.

• We also assume that benign services, such as music streaming apps, will be accessed by a large number of users, and thus a server with many connections from internal nodes can be labeled as a clean service. This, however, has the disadvantage of not being able to detect multiple infected nodes communicating with the same set of C&C servers in the network.

Our reputation system uses third-party public services such as VirusTotal [30] and PassiveDNS [24]. For each suspect node, we obtain the reputation report for all outgoing and incoming connections to the server. If any of those servers has a high-risk reputation score, we increase the priority of all connections that communicated with that server IP or the infected node.

5.3 Implementation

We implemented our module using Spark Python APIs [34] running over Hadoop to process large-scale data efficiently. Each stage is designed as a modular map-reduce task [7], which can be executed in parallel on one or more machines.
We use standard Python libraries, scikit-learn [22] and SciPy [19], for the periodicity detection and statistical algorithms.

CHAPTER 6

EVALUATION AND RESULTS

We evaluated our method on three datasets: two publicly available malware datasets and a dataset collected at a large university network combined with malware executed in a controlled environment. In this chapter, we discuss the findings and results of our experiments.

6.1 Evaluation on Publicly Available Dataset

As beaconing is a side effect of infection and is generally not associated or documented with known instances of malware infection, we did not have labeled datasets available for known beaconing behavior. Since our goal was to verify whether our method works on known malware connections, we decided to use the malware dataset made available by the Malware Capture Facility Project (MCFP) at CTU University, Czech Republic. CTU-13 [9] is a dataset of botnet traffic captured by the MCFP in 2011. The dataset consists of real botnet traffic mixed with regular and background traffic. This dataset was a good candidate for our research for multiple reasons. First, since it captures the network traffic in numerous formats, including NetFlow records, we could use the dataset without any alteration to our approach. Second, the CTU team labeled each flow in the NetFlow records to differentiate C&C traffic from background noise; hence, we were able to use this dataset as ground truth for our experiments.

We used NetFlow records from two different subsets of the MCFP data, collected while running variants of malware belonging to the Neris botnet family. Figure 6.1 shows the distribution of communications per connection. We use this distribution to choose the value of τmin. A value of τmin = 5, i.e., filtering out any connection with fewer than 5 communications between a source and destination pair, removes 86% of all connections.
Figure 6.1: Distribution of the number of communications for each unique connection. Nearly 64% of connections have only 1 communication, and over 86% have fewer than 5 communications.

Table 6.1 summarizes the size and complexity of the datasets used.

Table 6.1: Summary of data from the two NetFlow logs. Each step filters out most of the connections.

                                            Subset 1    Subset 2
  Total number of flows                     2824637     1808123
  Server flows                              329016      224190
  Unique flow count                         52635       37518
  Filtered unique flow count                4140        2999
  Candidates for periodic communications    1445        973

Figure 6.2 shows the communication between the infected XP node (147.32.84.165) and the command and control server, zkaoo[dot]com (173.192.170.88:80). Identifying periodicity in this signal just by observing the graph is quite challenging; we rely on the periodogram-based approach to classify the connection as periodic or nonperiodic.

Figure 6.2: Example of C&C communication. Infected node (147.32.84.165) communicating with the command and control server, zkaoo[dot]com (173.192.170.88:80).

As discussed earlier, we use the permutation-based filtering method to identify candidate frequencies for further investigation. Figure 6.3 illustrates this method: we start by randomizing the original signal multiple times while keeping track of the highest power spectral density (PSD) values. The assumption here is that permutation destroys any inherent periodicity in the original signal. Finally, we take the highest PSD value across these permutations and use it as a threshold for filtering candidates in the original periodogram, as shown in Figure 6.4.

Figure 6.3: Permutation-based filtering for periodic candidate selection (three randomized versions of the signal and their PSDs).
Figure 6.4: Filtered periodogram after applying permutation-based filtering.

Only six candidates had PSD values higher than the threshold of 0.198880382107. These candidates were further validated using the autocorrelation function. In this case, the final set for this connection was nonempty, so the connection was labeled periodic and forwarded to the detection module.

Since the dataset used was old and spanned only one day, we were not able to fully utilize our proposed detection method on it: our module requires a reputation database generated from historical data to populate some of the features in Table 5.2. We conducted our experiments with the remaining features and found that the infected node was among the top 10 most suspicious connections for both datasets. Hence, with a small budget of 10 alerts per day, a system administrator would be able to identify the infected device in this dataset.

6.1.1 Larger Dataset

We also evaluated our system on a larger labeled dataset obtained from the University of New Brunswick [28]. The dataset [3] consists of botnet traces merged from multiple other datasets. We used their test dataset, comprising traffic from 16 malware families such as Zeus, Rbot, and Neris. The dataset size is 8.5 GB, of which 44.97% are malicious flows. We first converted the dataset to NetFlow records for use with our proposed approach. The test set consists of 177,278 NetFlow records and involves 27,891 unique IP addresses.

Our module generated 32 alerts for internal nodes on this dataset. We verified these IP addresses against the ground truth provided with the dataset and found that our system correctly identifies 17 IP addresses infected with 9 different malware families. Table 6.2 presents the output list of infected IPs, their malware families, and the suspiciousness ranking generated by our anomaly detection stage. Table 6.3 shows the confusion matrix for our detection system.
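As a quick check of the derived rates (the counts TP = 17, FN = 7, FP = 15 are from the confusion matrix), recall and precision follow directly from the matrix entries:

```python
def tpr_ppv(tp, fn, fp):
    """True positive rate (recall) and positive predictive value (precision)
    from confusion-matrix counts."""
    return tp / (tp + fn), tp / (tp + fp)

recall, precision = tpr_ppv(tp=17, fn=7, fp=15)
# recall = 17/24 ≈ 0.71, precision = 17/32 ≈ 0.53
```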
Table 6.2: Evaluation on the ISCX botnet dataset.

  IP Address        Label           DAS Rank
  158.65.110.24     Weasel 1        1
  147.32.84.180     Neris 0         2
  147.32.84.160     Virut 0         3
  192.168.4.120     IRC15           4
  192.168.2.109     IRC23           11
  192.168.1.105     IRC26           12
  192.168.2.112     IRC24           13
  192.168.248.165   Zero access 1   14
  192.168.4.118     IRC21           15
  192.168.106.131   Black hole 30   21
  192.168.2.113     IRC19           22
  172.16.253.130    Tbot 0          24
  192.168.2.110     IRC13           26
  192.168.3.35      Zeus 0          27
  172.29.0.116      Zeus 3          29
  192.168.1.103     IRC17           31
  147.32.84.130     Murlo 0         32

Table 6.3: Confusion matrix.

                     Predicted Malicious   Predicted Benign
  Actual Malicious   TP = 17               FN = 7
  Actual Benign      FP = 15               NA

From this matrix, we can calculate the following derived metrics:

• True positive rate (TPR): TPR, or recall, is the fraction of relevant instances retrieved out of the total number of relevant instances:

    TPR = TP / (TP + FN) = 17 / 24 = 0.71

• Precision, or positive predictive value (PPV): the fraction of relevant instances among the retrieved instances:

    PPV = TP / (TP + FP) = 17 / 32 = 0.53

The true positive rate of our system is 71%, as our approach was unable to detect 7 of the 16 malware samples. On further investigation, we found that the network traffic generated by these samples is too short for periodic patterns to emerge. We did get a large number of false positives (15); this is because we did not have enough reputation information on these IP addresses. Once reputation-based features are used, well-known good services will be filtered out, and we should be able to avoid a large portion of these false positives.

6.2 Evaluation on Real-World Dataset

We evaluated our system to verify its applicability to real-world network traffic. For this, we obtained one month's worth of NetFlow records for a campus housing subnet via UIT, University of Utah. We then infected one of our Emulab [32] nodes with a backdoor worm from the Rbot family (MD5: 7df5fe0353a2fae962cbffb03e735cd8).
We executed the sample in a controlled environment using the Cuckoo sandbox [25] with limited network accessibility. We then merged the NetFlow records for the two subnets, the infected machine and the data obtained from UIT, to generate a unified dataset. This new dataset served as a representative of real organizational network traffic.

We observed that the malware sample first resolved the domain name android-update[dot][0]servehttp[dot]com to two IP addresses, 156.221.203.197 and 62.117.61.146, and then tried to communicate with these hosts on port 5552. The hosts were nonfunctional, so we did not get a chance to observe any command and control traffic. However, the communications with the C&C hosts were periodic; thus, our system was able to detect them. Our detection method correctly identified the infected node with a budget of 20 alerts, i.e., the infected node was in the list of the top 20 most suspicious nodes. Our evaluation thus shows that the proposed method can detect recent malware present on a large-scale network with small overhead.

6.3 Robustness of the Periodicity Detector

In this section, we evaluate the noise tolerance of the periodicity detector described in Chapter 5. Our goal is to understand how the periodicity detector performs against noise in the signal. We follow a robustness evaluation similar to that of Hu et al. [15]. Our baseline is a periodic time series T with events at period P = 100 s (a frequency of 0.01 Hz). We experiment with three different types of distortion of the original signal. For each noise type and noise level, we modify the original periodic signal and then run our module to verify whether the periodicity is still detected. Each experiment is repeated 100 times, with the probability of detection calculated as

    P(detection) = m / 100

where m is the number of times the signal was detected as periodic at period P.

We start by introducing Gaussian noise into the signal.
Each event in the time series T is shifted by a value drawn from a normal distribution N(0, σ²), where σ is the standard deviation. This models the noise introduced by factors such as an adversary trying to hide communication or network delays [15]. Second, we drop events at random from T, emulating missing-event noise caused by packet drops and unreliable network communication. More formally, we drop N ∗ p events from T, where p is the probability of dropping packets and N is the total number of events. Finally, for add-on noise, we inject additional events at random positions in the time series. This kind of noise accounts for regular, nonperiodic communication between the client and server.

Figure 6.5 summarizes the performance of our method against the various types of noise. Figure 6.5a displays the detection probability with varying standard deviation. Our approach was able to detect periodic communications with high probability even in the presence of 10% Gaussian noise. As the noise level increased above 20%, all inherent periodic patterns were destroyed, and the algorithm was no longer able to classify the time series correctly. Figure 6.5b shows that the method performs well even when a large number of events are missing; the detection rate decreases gradually after 60%, until there are no events left. Finally, Figure 6.5c demonstrates that our method can identify periodic communication even in the presence of a significant amount of add-on noise. This is because adding new events does not disturb the original series, so the primary periodic events still exist in the frequency domain. However, as we add more events, the energy of these principal events decreases, eventually causing them to be filtered out as noise.

Figure 6.5: Performance evaluation against different noise levels: (a) Gaussian noise with varying σ, (b) missing-event noise, (c) add-on noise.
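The three distortions above can be sketched as simple transformations of the baseline event times (function names and the seeding scheme are illustrative, not the thesis code):

```python
import random

def gaussian_jitter(times, sigma, seed=0):
    """Shift each event by a draw from N(0, sigma^2) seconds."""
    rng = random.Random(seed)
    return sorted(t + rng.gauss(0, sigma) for t in times)

def drop_events(times, p, seed=0):
    """Remove each event independently with probability p."""
    rng = random.Random(seed)
    return [t for t in times if rng.random() >= p]

def add_events(times, n_extra, span, seed=0):
    """Inject n_extra events uniformly at random over [0, span]."""
    rng = random.Random(seed)
    return sorted(times + [rng.uniform(0, span) for _ in range(n_extra)])

# baseline: events every P = 100 s over one hour
baseline = [i * 100 for i in range(36)]
```

Running the periodicity detector over 100 seeded variants per noise level and counting detections yields the P(detection) = m/100 estimate used above.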
6.4 Performance Evaluation

Early detection of infection is of paramount importance to prevent further damage. Hence, in this section, we evaluate our system's execution time. We measured the time taken by the different modules of our approach to analyze a day's worth of NetFlow data. The dataset used was NetFlow recorded at a large subnet of our campus network, with millions of flow records. The results are illustrated in Figure 6.6; each bar represents the average execution time (in seconds) over a month of data for one phase of the proposed method. These analyses were performed on Emulab [32] d430 [29] machines equipped with two 2.4 GHz 64-bit 8-core Xeon E5-2630v3 processors and 64 GB of RAM.

It is important to observe that the total processing time to analyze 24 hours of network traffic is less than 10 minutes. Due to its short execution time, our method can be used efficiently for online security analysis. Most of the time is spent in the periodicity detection phase; further filtering of the time series that reach this stage should decrease the total time by a substantial margin.

Figure 6.6: Average execution time of the stages of the proposed method.

6.5 Discussion

The goal of network intrusion detection is to analyze millions of NetFlow records for malicious intent. Our detection system reduces this workload to a handful of alerts, ranked by their suspiciousness. Our system generates a few false positives; however, they can be easily filtered out by an analyst. In addition, during our experiments, we observed that these alerts are sometimes relevant in other ways, as they often correspond to other traffic, such as Tor exit nodes or VPN traffic, which is not malicious per se but is interesting nonetheless.

During the evaluation, we observed that our system fails to detect a command and control server if sufficient information is not available.
That means the system is more effective if the malware is present in the network long enough to generate periodic patterns. An active attacker can evade our system by introducing randomness into the interval at which the infected machine communicates with the external server. In this regard, we have shown through our evaluation that the system tolerates some level of noise. Moreover, introducing randomness adds complexity and unpredictability to the attack infrastructure [15, 17], so an attacker would want to avoid it. For example, the adversary can no longer predict when an infected host will contact the command and control server, and thus there is no guarantee of when the malware will receive its commands or updates.

We chose NetFlow because it is readily available, is compact, and avoids privacy issues. Beyond NetFlow, our system applies to other data sources such as pcap, web proxy logs, and firewall logs. The core principle of using time-series analysis relies on the availability of request time intervals between a source and a destination; thus, it can be applied to any dataset as long as all the features from Table 5.1 are available.

CHAPTER 7

CONCLUSION

Detecting early indications of malware infection is a challenging task. Due to the ever-evolving threat landscape, it has become harder for intrusion detection systems to identify and act on malware infections. Our work proposes a method to detect internal nodes infected with malware using the periodic communication, often called beaconing, between the malware and its command and control (C&C) servers. Our method relies only on the statistical properties of a connection and thus does not require deep packet inspection. The presence of a vast majority of good services that also use periodic communication, however, makes this task challenging. We start by building a time series for each connection between a client-server pair.
Then we use robust algorithms to identify periodic communications. Finally, to separate malicious traffic from good services, we use a reputation-based anomaly detection mechanism, which produces a manageable list of potentially infected nodes. Through our evaluation, we have shown that many malware families use periodic "beaconing" that can be used to detect these infections. We also show that the proposed method is resilient to small perturbations, performs well even in the presence of noise, and can scale to millions of connections per day. Thus, we have proved our thesis statement, "It is possible to detect signs of early malware infection by identifying and analyzing beaconing behavior in the network," to be true.

7.1 Future Work

1. Our current method is a batch detection system in which we analyze a day's worth of data at a time. A future direction would be to pipeline the process so that alerts can be generated in real time. A system similar to that proposed by Ho et al. [14] could be used to achieve this. The challenging part, however, is identifying whether a connection is periodic in real time; such a system would need to keep track of all communications long enough to guarantee that sufficient information is available for periodicity detection.

2. Machine learning has become a favorite tool in multiple domains. Due to the lack of a labeled dataset, we were unable to utilize machine learning algorithms in our detection methodology. A good experiment would be to collect a labeled dataset and use machine learning techniques, such as random forest classifiers or support vector machines, for feature selection and classification.

3. A great addition to our method would be a framework that can provide context for each detection alert our system generates.
During our regular interactions with University Information Technology (UIT) staff at the University of Utah, one concern raised about detection techniques based on machine learning or statistical methods was that the generated notifications often have no direct correlation with a security researcher's intuition. A framework that correlates instances for each generated alert and provides visualization would allow an analyst to assess and interpret the alerts more quickly and efficiently.

REFERENCES

[1] Antonakakis, M., April, T., Bailey, M., Bernhard, M., Bursztein, E., Cochran, J., Durumeric, Z., Halderman, J. A., Invernizzi, L., Kallitsis, M., et al. Understanding the Mirai botnet. In USENIX Security Symposium (2017), pp. 1092–1110.

[2] Apruzzese, G., Marchetti, M., Colajanni, M., Zoccoli, G. G., and Guido, A. Identifying malicious hosts involved in periodic communications. In 2017 IEEE 16th International Symposium on Network Computing and Applications (NCA) (Oct 2017), pp. 1–8.

[3] Beigi, E. B., Jazi, H. H., Stakhanova, N., and Ghorbani, A. A. Towards effective feature selection in machine learning-based botnet detection approaches. In 2014 IEEE Conference on Communications and Network Security (Oct 2014), pp. 247–255.

[4] Bilge, L., Balzarotti, D., Robertson, W., Kirda, E., and Kruegel, C. Disclosure: Detecting botnet command and control servers through large-scale NetFlow analysis. In Proceedings of the 28th Annual Computer Security Applications Conference (New York, NY, USA, 2012), ACSAC '12, ACM, pp. 129–138.

[5] Chandola, V., Banerjee, A., and Kumar, V. Anomaly detection: A survey. ACM Computing Surveys 41, 3 (July 2009), 15:1–15:58.

[6] Claise, B. Cisco Systems NetFlow services export version 9. RFC 3954, 2004.

[7] Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on large clusters. Communications of the ACM 51, 1 (Jan. 2008), 107–113.

[8] Eslahi, M., Rohmad, M. S., Nilsaz, H., Naseri, M. V., Tahir, N. M., and Hashim, H.
Periodicity classification of HTTP traffic to detect HTTP botnets. In 2015 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE) (April 2015), pp. 119–123.

[9] García, S., Grill, M., Stiborek, J., and Zunino, A. An empirical comparison of botnet detection methods. Computers & Security 45 (2014), 100–123.

[10] Gardiner, J., Cova, M., and Nagaraja, S. Command & control: Understanding, denying and detecting. CoRR abs/1408.1136 (2014).

[11] Gu, G., Porras, P., Yegneswaran, V., Fong, M., and Lee, W. BotHunter: Detecting malware infection through IDS-driven dialog correlation. In Proceedings of the 16th USENIX Security Symposium (Berkeley, CA, USA, 2007), SS '07, USENIX Association, pp. 12:1–12:16.

[12] Gu, G., Zhang, J., and Lee, W. BotSniffer: Detecting botnet command and control channels in network traffic. In Proceedings of the 15th Annual Network and Distributed System Security Symposium (2008), vol. 8, pp. 1–18.

[13] Haffey, M. Characterization of periodic network traffic. Master's thesis, University of Calgary, 2017.

[14] Ho, G., Sharma, A., Javed, M., Paxson, V., and Wagner, D. Detecting credential spearphishing in enterprise settings. In 26th USENIX Security Symposium (USENIX Security 17) (2017), USENIX Association, pp. 469–485.

[15] Hu, X., Jang, J., Stoecklin, M. P., Wang, T., Schales, D. L., Kirat, D., and Rao, J. R. BAYWATCH: Robust beaconing detection to identify infected hosts in large-scale enterprise networks. In 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (June 2016), pp. 479–490.

[16] Hubballi, N., and Goyal, D. FlowSummary: Summarizing network flows for communication periodicity detection. In Pattern Recognition and Machine Intelligence (Berlin, Heidelberg, 2013), Springer Berlin Heidelberg, pp. 695–700.

[17] Huynh, N. A., Ng, W. K., and Do, H. G. On periodic behavior of malware: Experiments, opportunities and challenges.
In 2016 11th International Conference on Malicious and Unwanted Software (MALWARE) (Oct 2016), pp. 1–8. [18] Huynh, N. A., Ng, W. K., Ulmer, A., and Kohlhammer, J. Uncovering periodic network signals of cyber attacks. In 2016 IEEE Symposium on Visualization for Cyber Security (VizSec) (Oct 2016), pp. 1–8. [19] Jones, E., Oliphant, T., and Peterson, P. Scipy: Open source scientific tools for Python. [20] Lazarevic, A., Ertoz, L., Kumar, V., Ozgur, A., and Srivastava, J. A comparative study of anomaly detection schemes in network intrusion detection. In Proceedings of the 2003 SIAM International Conference on Data Mining (2003), SIAM, pp. 25–36. [21] Oprea, A., Li, Z., Yen, T., Chin, S. H., and Alrwais, S. Detection of early-stage enterprise infection by mining large-scale log data. In 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (June 2015), pp. 45–56. [22] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. Scikit-learn: Machine learning in python. Journal of Machine Learning Research 12 (Nov. 2011), 2825– 2830. [23] Polychronakis, M., Mavrommatis, P., and Provos, N. Ghost turns zombie: Exploring the life cycle of web-based malware. In Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats (Berkeley, CA, USA, 2008), LEET’08, USENIX Association, pp. 11:1–11:8. [24] RisqIQ. RisqIQ PassiveDNS. https://www.riskiq.com. [25] Sandbox, C. Automated malware analysis. https://cuckoosandbox.org, 2013. [26] Shalaginov, A., Franke, K., and Huang, X. Malware beaconing detection by mining large-scale dns logs for targeted attack identification. In 18th International Conference on Computational Intelligence in Security Information Systems. WASET (2016). 36 [27] Tegeler, F., Fu, X., Vigna, G., and Kruegel, C. 
Botfinder: Finding bots in network traffic without deep packet inspection. In Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies (New York, NY, USA, 2012), CoNEXT ’12, ACM, pp. 349–360. [28] University of New Brunswick. datasets/botnet.html, 2018. Botnet Dataset. http://www.unb.ca/cic/ [29] University of Utah, Flux Research Group. The D430 nodes of Emulab. https: //wiki.emulab.net/wiki/d430, 2018. [30] VirusTotal. Virustotal. https://www.virustotal.com. [31] Vlachos, M., Yu, P., and Castelli, V. On periodicity detection and structural periodic similarity. In Proceedings of the 2005 SIAM International Conference on Data Mining (2005), SIAM, pp. 449–460. [32] White, B., Lepreau, J., Stoller, L., Ricci, R., Guruprasad, S., Newbold, M., Hibler, M., Barb, C., and Joglekar, A. An integrated experimental environment for distributed systems and networks. In Proc. of the Fifth Symposium on Operating Systems Design and Implementation (Boston, MA, Dec. 2002), USENIX Association, pp. 255–270. [33] Wurzinger, P., Bilge, L., Holz, T., Goebel, J., Kruegel, C., and Kirda, E. Automatically generating models for botnet detection. In Computer Security – ESORICS 2009 (Berlin, Heidelberg, 2009), M. Backes and P. Ning, Eds., Springer Berlin Heidelberg, pp. 232–249. [34] Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M. J., Shenker, S., and Stoica, I. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (2012), USENIX Association, pp. 2–2. |



