
On the Role of Information Compaction to Intrusion Detection


Fernando Godínez¹, Dieter Hutter², and Raúl Monroy³

¹ Centre for Intelligent Systems, ITESM–Monterrey, Eugenio Garza Sada 2501, Monterrey, 64849, Mexico, fgodinez@itesm.mx
² DFKI, Saarbrücken University, Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany, hutter@dfki.de
³ Department of Computer Science, ITESM–Estado de México, Carr. Lago de Guadalupe, Km. 3.5, Estado de México, 52926, Mexico, raulm@itesm.mx

Abstract. An intrusion detection system (IDS) usually has to analyse gigabytes of audit information. In the case of an anomaly IDS, this information is used to build a user profile characterising normal behaviour, whereas a misuse IDS tests it against known attacks. Probabilistic methods, e.g. hidden Markov models, have proved suitable for profile formation but are prohibitively expensive. To bring these methods into practice, this paper aims to reduce the audit information by folding up subsequences that commonly occur within it. Using n-gram language models, we have been able to identify the n-grams that appear most frequently. The main contribution of this paper is an n-gram extraction and identification process that significantly reduces an input log file while keeping key information for intrusion detection. We reduced log files by a factor of 3.6 in the worst case and 4.8 in the best case. We also tested the reduced data using hidden Markov models (HMMs) for intrusion detection. The time needed to train the HMMs is greatly reduced by using our reduced log files and, most importantly, the impact on both the detection and false positive ratios is negligible.

1 Introduction

Probabilistic methods for intrusion detection, e.g. hidden Markov models (HMMs), have proved suitable for user profile formation. Such profiling can be used for both misuse and anomaly detection. However, profile formation is prohibitively expensive, since these methods consume large computational resources [1, 2] relative to the amount of information to be analysed. Complementing an earlier work [3], in this paper we propose a session folding method that significantly reduces the amount of information a probabilistic method ought to analyse, without missing key details, thus making the use of probabilistic methods feasible.
This research is supported by three research grants: CONACyT 33337-A, CONACyT-DLR J200.324/2003 and ITESM CCEM-0302-05.

At system call level, a session is composed of a finite sequence of system calls. The length of the sequence varies with the activity of the user: it can be as short as a few hundred system calls or as long as tens of thousands. We have used HMMs both to classify the service a session belongs to [4] and to detect intrusions. The time complexity of training and parsing an HMM is directly related to the length of the sequence it analyses.

To shrink a log file, we suggest folding up the subsequences of system calls that occur most frequently within the sessions in a log file. Our method substitutes each such repetitive session subsequence with a fresh tag, which can be regarded as a meta-level system call. To identify the most repetitive session subsequences, we use n-gram theory. N-gram theory has been widely used in the context of natural language processing; it has also been used in anomaly detection [5–9]. Let an n-gram be a sequence of n symbols, system calls in our case. Then, an n-gram language model is used to calculate the probability that a system call s will appear at position n, given the occurrence of an (n − 1)-gram, that is, a sequence of n − 1 system calls. So the n-gram language model enables us to estimate the likelihood of appearance of a given n-gram along a larger sequence.

Using n-gram theory, we have identified the n-grams with the highest rate of occurrence among the log files regardless of the service, and within three different services, namely: i) ftp, ii) telnet and iii) smtp. This paper's key contribution is a collection of n-grams that are very likely to be found repeatedly in many computer sessions of the analysed services. Folding repetitive n-grams significantly reduces the length of a given session. In our analysis we used n-grams of different lengths, ranging from bigrams to 100-grams. Our experiments show that we can effectively reduce an input session by an average factor of 4 (3.6 in the worst case and 4.8 in the best case). The reduced log files were tested using an HMM IDS. The time needed to train the HMMs is greatly reduced by using folded sessions. The complexity of HMM parsing is O(n²l), where n is the number of states in the HMM and l is the length of the parsed sequence; therefore, parsing time is reduced by the same factor by which we reduce the input sequences. More importantly, the detection ratio is better than the one obtained with non-reduced log files, and the false positive ratio is only 1% higher. It is worth mentioning that the data used to extract the n-grams do not contain any attacks or anomalies.

The remainder of this paper is organised as follows: §2 is an overview of other attempts at log file reduction; §3 presents the reasons for choosing n-grams over other feature extraction methods such as HMMs; §4 gives a brief overview of how and why we chose data for identifying repetitive n-grams; §5 shows the use of n-gram models to identify the n-grams with the highest occurrence frequency; §6 summarises the reduction obtained throughout our investigations; §7 describes our validation experiments and the data selected for them; §8 shows the results obtained from applying the validation experiments to a collection of BSM log files and to the service selection module [4]; §9 shows the impact of

our methodology on intrusion detection; finally, the conclusions drawn from our investigations appear in §10.

2 Session Length Reduction Methods

N-gram theory has been widely used in the context of natural language analysis; it has also been used in anomaly detection by Maxion and Tan [6, 7], Marceau [8], Wespi [9], and Forrest et al. [5]. All these papers present ways of using n-grams for anomaly detection, not for log file reduction, so a direct comparison with our method is not meaningful. Log file reduction methods are presented below.

Marin et al. [10] have suggested using an expert system to approach log file reduction. The expert system first discriminates the objects that occur most frequently, using a series of fuzzy rules. Then it clusters the discriminated objects to form a small number of object classes. Even though its reduction factor is very high, this method is prone to a large false-negative intrusion detection rate, because it prunes out key information and is restricted to 1-grams only. By contrast, we do not get rid of non-commonly used system calls, and the scope of our reduction analysis is considerably larger. As we show, this helps keep the false-negative intrusion detection rate low.

Lane and Brodley have also addressed the problem of log file reduction [11, 12]. They have proposed two main clustering methods, one based on K-centres and the other on greedy clustering. With either method, each cluster contains a collection of object sequences, which is then collapsed into only two objects: the cluster centre and the associated mean. The other object sequences are simply disregarded. Lane and Brodley's methods may yield a huge reduction ratio (e.g. 500 different sequences might be shrunk to only 30 or even fewer); however, eager sequence elimination inevitably leads to poor or incomplete profiles and therefore to an increase in the false-alarm detection rate. By comparison, even though our technique yields a lower reduction ratio, it does not have a negative impact on the false positive ratio.

Besides the two methods above, Lane and Brodley have also explored two heuristic pruning techniques: least recently used (LRU) and least frequently used (LFU). In both techniques the reduction ratio is defined a priori and hence a predetermined number of sequences is eliminated from the input session. Both techniques will of course produce a reduction ratio as high as indicated, but at the expense of losing chief information. Lane and Brodley report an increment in the false-positive and false-negative ratios (20% and 16% respectively). By comparison, even though our technique yields a lower reduction ratio, it does not have a negative impact on the false positive ratio; most importantly, false positives are reduced.

Knop et al. [13] have addressed a similar problem, which also aims to make intrusion detection feasible, but does not consider log file reduction. Rather than shrinking an input session, Knop et al.'s method simplifies its content by calling similar objects by the same name. Let Σ(seq) denote the alphabet of a symbol sequence seq.

Then, given a sequence s, the method returns a new sequence s′ such that length(s′) = length(s) but |Σ(s′)| ≪ |Σ(s)|. The method works as follows: it first correlates objects using a method similar to principal component analysis [14]. This way, two objects are assigned a coefficient equal to 1 if they are completely correlated, and equal to 0 otherwise. Then, using this correlation information, a K-nearest neighbours algorithm is applied to achieve object clustering. Object reduction amounts to selecting one object from a cluster and discarding the rest. Once again, the alphabet simplification factor is very impressive, but so is the loss of information. One important limitation of Knop et al.'s approach is that it is in general difficult to apply in the intrusion detection context, because it requires some kind of numerical value to express the correlation between objects. This is unnatural, since almost every piece of data in a session is a string.

This completes our review of related work on session length reduction. In the next section we analyse the reasons for choosing n-grams as our reduction method.

3 The Use of N-grams as a Reduction Method

N-gram models are generally used to predict the next element in a sequence. To make this prediction, an n-gram model relies on a frequency analysis from which a probability of occurrence is extracted. This probability is then used to estimate the probability of occurrence of the next element in the sequence. Other methods can be used to predict the next element in a given sequence, e.g. HMMs.

N-grams can be used to identify the most repetitive subsequences in any sequence of elements. HMMs can also do this job, but they do not find a specific sequence; they find a family of subsequences, e.g. [15, 16]. If we were to use a method such as HMMs, any subsequence described by the family would be substituted by the same tag. Such a substitution would allow an attack subsequence to be substituted and regarded as a normal subsequence. By using n-gram models we guarantee a 1-to-1 substitution relation, so variations of an attack that pose as a normal sequence (mimicry attacks) are not substituted.

Another property we require is not to lose the capability of returning to the original sequence. One reason for wanting the reduction process to be reversible is forensic analysis: we must assume that, after an intrusion is detected, the attacker has already done some harm to the system, so we will perform a forensic analysis on the data to follow all of the attacker's steps. To follow these events we need the trail of activity that led to the insecure state of the system. When designing a security device it is necessary to assume that it will interact with other devices, and our method is designed so that it can provide as much information as possible to other devices in the security chain. In the next section we describe the data we selected to generate our n-gram models and to extract the most frequent n-grams.
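To make the identification step just described concrete, the following is a minimal Python sketch, with hypothetical system-call names and a made-up threshold, of how contiguous n-grams of a fixed size can be counted over a set of sessions and the repetitive ones kept; it is an illustration of the idea, not the toolkit actually used in our experiments.

    from collections import Counter

    def extract_ngrams(session, n):
        # A session is a list of system-call names; an n-gram is a tuple of n
        # consecutive calls, so counting is exact and the later substitution can
        # be strictly 1-to-1 (no family of similar sequences is merged).
        return Counter(tuple(session[i:i + n]) for i in range(len(session) - n + 1))

    def frequent_ngrams(sessions, n, min_count=2):
        # Aggregate the counts over many sessions and keep the repetitive n-grams.
        total = Counter()
        for session in sessions:
            total += extract_ngrams(session, n)
        return {gram: count for gram, count in total.items() if count >= min_count}

    # Toy example with hypothetical system-call names.
    sessions = [
        ["open", "read", "read", "close", "open", "read", "read", "close"],
        ["open", "read", "read", "close", "stat"],
    ]
    print(frequent_ngrams(sessions, 4))
    # {('open', 'read', 'read', 'close'): 3}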

4 Data Selection for N-Gram Extraction

For the process of n-gram selection, we chose the 1998 DARPA repository as our data source. We used 5 log files out of 35 for the n-gram selection process. This number of log files forms a representative sample:

    n = N / (1 + N e²)                                    (1)

where n is the number of samples needed to represent a population of size N with confidence 1 − e (i.e. with an error tolerance of e). In our case we chose a confidence of 97.5% (e = 2.5%). We have an average of 250 sessions in each log file. For a population of 8750 sessions we need around 1300 sessions as our sample, so we used 5 log files (close to 1250 sessions) as our representative sample. To take a representative sample for a year of log files (52 weeks, equivalent to 65000 sessions) with the same 97.5% confidence, we would need 1500 sessions; using 1250 sessions we have a confidence of 97.2% that this sample represents a year of data. We only used sessions without attacks or anomalies. The rationale is to select only n-grams that identify normal traffic and do not add noise to the intrusion detection process. The selected log files are the ones with the fewest attack sessions, so after discarding attack or anomaly sessions we ended up with the highest possible number of sessions.

A BSM log file is composed of a set of sessions, each of which belongs to a given service. Moreover, each session is formed by one or more processes. In our methodology we separated each of these sessions and concatenated all of its processes. Both the number of sessions and the number of processes per session are variable. By separating log files and ordering them by sessions, we extracted high occurrence frequency n-grams that belong to a single session only, without overlapping contiguous sessions. From now on, we will refer to these ordered sessions as the training sessions.

We selected n-grams in two ways: i) extracting them from every available session; ii) extracting them from service specific sessions. Each log file contains a finite number of sessions, each of which belongs to a given service, and some services have a more representative body among the log files than others. Services with a lower number of sessions have n-grams that reduce that service greatly; since these n-grams have a lower frequency than the n-grams of the most common services, they might get lost in the process. That is why we assigned a priority that uses the number of sessions of the service an n-gram belongs to, instead of the total number of sessions. In the following section we describe the entire selection and reduction process, including the use of the above mentioned priority.
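As a quick check of equation (1), the following short sketch reproduces the sample sizes quoted above; the small differences from the rounded figures of 1300 and 1500 sessions come only from rounding in the text.

    def sample_size(population, error):
        # Equation (1): n = N / (1 + N * e^2), for population N and error tolerance e.
        return population / (1 + population * error ** 2)

    # Values used in this section (e = 2.5%).
    print(round(sample_size(8750, 0.025)))   # about 1353 sessions -> roughly 5 log files
    print(round(sample_size(65000, 0.025)))  # about 1562 sessions for a full year of data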

5 Session Folding Using N-gram Models

This section describes how we used n-gram theory to achieve log file reduction through session folding. The reduction is a two-step procedure.

The first step aims at selecting the n-grams with the highest repetition frequency and attaching their corresponding priority; we call this process n-gram detection and tagging. The second step is the actual reduction, which uses the tagged n-grams extracted in the first step for session folding. N-gram detection and tagging only needs to be performed once and can be done offline, whereas the reduction is done in real time and for every session. The n-gram identification process consists of: i) n-gram extraction; ii) n-gram frequency analysis and n-gram priority assignment. The reduction process consists of: i) n-gram comparison, aimed at avoiding overlapping; ii) subsequence substitution using a fresh tag. Step 1 of the identification process is the language model creation; step 2 uses the frequency analysis from the language model for priority assignment (both are described below).

5.1 N-Gram Identification and Tagging

The first step toward session folding is to extract the n-grams with the highest occurrence frequency and occurrence probability, and then tag them with a priority. The priority is calculated using both a combination of services and a service exclusive analysis.

N-Gram Extraction. N-gram extraction consists of the application of a blind, exhaustive procedure. As a result, we obtained the n-grams that occur most frequently in the training sessions. Although in theory n-gram model creation should consider all possible n-grams, in practice only n-grams that exist within the training data are used. For example, for a 10-gram with a vocabulary of 200 tokens, the possible number of sequences would be 200¹⁰ × 199; however, in our experiments we found only 2291. N-gram extraction prunes all sequences with 0 occurrences, and in the final language model a low probability is assigned to the pruned sequences. In our calculations, we considered the n-grams that were pruned from the entire log file as well as those pruned from different services within that file.

N-Gram Priority Assignment. Using the n-gram count and the language model, we identified the n-grams with the highest frequency or probability of occurrence. We considered n-grams of different sizes to find n-grams that provide a high reduction rate. If an n-gram is present in the training log files, an occurrence frequency is assigned to it; if it is not present, an occurrence probability is assigned. Using either the occurrence frequency or the probability, we estimated a reduction ratio later used as a priority Pr for every n-gram that was found. The priority is used to select which n-gram to use first in the reduction process. Not only does this ratio consider large n-grams with a high frequency, but it also considers the total number of system calls N in the training sessions from which the n-grams were extracted.

We calculated a reduction percentage for every selected n-gram, i.e. how much a given n-gram will reduce a log file. If we use the n-gram occurrence frequency f, the n-gram priority Pr is calculated by:

    Pr = n × (f + 1) / N                                  (2)

By contrast, if we use the n-gram occurrence probability P, the n-gram priority Pr is calculated by:

    Pr = P × n                                            (3)

In both cases n is the size of the n-gram. Both equations provide a reduction ratio for the input n-gram. Using Pr we choose the sequences that provide a high reduction rate. Our n-gram selection criterion is twofold: selection of n-grams with high Pr at log file level, and of n-grams with high Pr at service level. A small sketch of the priority calculation is given below.

Service Exclusive N-Gram Selection and Priority Assignment. It is possible for a certain n-gram to have a high ratio, e.g. in an smtp session, even though smtp sessions are a small segment of an entire log file. This happens because some services in the training log files have a less representative body than others. This situation might exclude n-grams with a low reduction ratio at log file level from the reduction set, even though their reduction ratio within a given service is high. The same analysis presented for n-gram priority assignment is therefore also done at service level, and it is our second criterion for n-gram selection. Assigning a priority to an n-gram is not sufficient to avoid overlapping; below we show how we solved this problem.
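The following sketch shows how priorities from equations (2) and (3) can be computed and used to order candidate n-grams; the n-grams, counts and total N used here are made up for illustration only.

    def priority(ngram, total_calls, frequency=None, probability=None):
        # Equations (2) and (3): Pr = n*(f+1)/N if the n-gram was seen in training,
        # otherwise Pr = P*n using the language-model probability P.
        n = len(ngram)
        if frequency is not None:
            return n * (frequency + 1) / total_calls
        return probability * n

    # Hypothetical candidates: (n-gram, frequency) pairs over N = 100000 system calls.
    candidates = [(("open", "mmap", "close"), 4000), (("stat", "open"), 9000)]
    queue = sorted(candidates,
                   key=lambda c: priority(c[0], 100000, frequency=c[1]),
                   reverse=True)
    print([gram for gram, _ in queue])   # higher-priority n-grams are substituted first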

5.2 Session Folding Using Tagged N-Grams

N-grams tend to overlap with each other; they may intersect at some point. To avoid overlapping when making an n-gram substitution, we used a priority queue to select the n-gram to substitute, so that high ratio n-grams are substituted first. We created a window of size equal to the largest n-gram in the queue; once the window is full, its content is tested against all n-grams in the queue. The order of the priority queue is given by the ratio defined in equations (2) and (3). By substituting n-grams with a higher ratio first we guarantee that, even if there is an overlap, only the n-grams that provide maximum reduction are used. Notice that substituting an n-gram with a new symbol prevents further substitution on that segment, which eliminates overlapping, because the newly added symbol is not present in any n-gram used in the substitution.

Log file reduction using n-grams is accomplished by n-gram substitution at session level. In n-gram model creation, overlapping is not considered, only conditional probability; we take the concept further and include Pr as a reduction measure. Whenever a high priority n-gram is found, it is replaced by a fresh symbol that substitutes the entire n-gram.

The first time an n-gram is used, a new symbol is created and added to the symbol dictionary. A minimal sketch of the folding and unfolding steps is given below; in the next section we show how this methodology is applied to session folding and the resulting reduction.
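The sketch below assumes the candidate n-grams are already sorted by descending priority; it is an illustration of the substitution idea rather than our actual implementation (which works over a window the size of the largest n-gram), and the session and n-gram shown are hypothetical.

    def fold_session(session, ngrams_by_priority, dictionary):
        # Greedy substitution: at each position try the n-grams in descending
        # priority order; the first match is replaced by a fresh tag taken from
        # (or added to) the shared dictionary, so folding stays reversible.
        folded, i = [], 0
        while i < len(session):
            for gram in ngrams_by_priority:
                n = len(gram)
                if tuple(session[i:i + n]) == gram:
                    tag = dictionary.setdefault(gram, "TAG_%d" % len(dictionary))
                    folded.append(tag)
                    i += n
                    break
            else:
                folded.append(session[i])   # no n-gram matches here: keep the call
                i += 1
        return folded

    def unfold_session(folded, dictionary):
        # Reverse the folding with the same dictionary (needed for forensic analysis).
        reverse = {tag: gram for gram, tag in dictionary.items()}
        out = []
        for symbol in folded:
            out.extend(reverse.get(symbol, (symbol,)))
        return out

    dictionary = {}
    session = ["open", "read", "read", "close", "stat", "open", "read", "read", "close"]
    folded = fold_session(session, [("open", "read", "read", "close")], dictionary)
    print(folded)                                    # ['TAG_0', 'stat', 'TAG_0']
    assert unfold_session(folded, dictionary) == session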

6 The Methodology in Action

We now provide evidence that, as a result of applying our methodology, we obtained a large reduction ratio using a small number of n-grams. For the experiments reported in this paper we used the CMU-Cambridge Statistical Language Modelling Toolkit [17], which provides a series of UNIX tools that facilitate language modelling. Each of the steps required for n-gram substitution and log file reduction is described below.

6.1 N-Gram Extraction

Using the CMU toolkit we extracted the n-grams and their associated frequencies. The analysis was made for each log file and also for each service. The results of this analysis are presented in the histograms of figure 1, where the first histogram corresponds to all services and the second to telnet only. The histogram axes form a right-handed coordinate system: the x axis represents the n-gram size, the z axis represents the frequency f of a given n-gram, and the y axis is the number of different n-grams of size n with frequency f.

The histograms are used to analyse how many n-grams have a number of repetitions close to a multiple of the number of different sessions included in the training data. If there are m sessions and the frequency of an n-gram is a multiple of m, then that n-gram is more likely to be common to every session. That is, we prefer n-grams that are repeated in a large number of sessions over n-grams that are repeated many times in only a couple of sessions; the former are more general and therefore provide a better reduction ratio for unseen sessions. The same histograms can be used to know in advance how many n-grams to look for when making the frequency based selection, as sketched below. In what follows we present this frequency analysis and priority assignment.
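The selection criterion based on the histograms can be phrased as a simple filter. The following sketch, with made-up counts and a made-up tolerance parameter, keeps n-grams whose frequency lies near a multiple of the session count; the exact tolerance we used in practice is not claimed here.

    def common_across_sessions(ngram_counts, num_sessions, tolerance=0.1):
        # Keep n-grams whose frequency is close to a multiple of the number of
        # sessions m: if an n-gram occurs roughly k*m times, it probably appears
        # in every session rather than many times in just a few of them.
        selected = {}
        for gram, freq in ngram_counts.items():
            nearest = round(freq / num_sessions) * num_sessions
            if nearest > 0 and abs(freq - nearest) <= tolerance * num_sessions:
                selected[gram] = freq
        return selected

    counts = {("open", "read"): 1250, ("exec", "fork"): 37}    # hypothetical frequencies
    print(common_across_sessions(counts, num_sessions=1250))   # {('open', 'read'): 1250}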

6.2 N-Gram Tagging

Based on the occurrence frequency analysis we identified the n-grams with the desired number of repetitions. This analysis makes the extraction of such n-grams much easier and faster. From the extraction we chose 100 n-grams for the reduction; these are mostly the ones whose occurrence frequency is close to a multiple of the number of sessions. In addition, based on the language model probabilities, we selected about 50 n-grams with a probability of occurrence above 98%.

Fig. 1. Mixed Services and Telnet Histograms

We also extracted 50 different n-grams using the analysis over separate services. The selected n-grams have an occurrence frequency close to a multiple of the number of sessions of the corresponding service, which means they are common to many sessions of that service. All these n-grams have an associated occurrence count, together with the total number of system calls present in the original log file. The n-gram set used in the folding process and the dictionary are available at our web site⁴. Along with the size of the n-grams, the number of system calls defines the reduction ratio.

After extracting n-grams for every log file (both over all sessions and separated by service), a language model is generated. Regarding the discounting strategy, the one provided with the software we used is the Good-Turing estimator. N-gram theory offers a good way to avoid eliminating unseen n-grams without imposing a considerable probability reduction on existing n-grams. The main overhead of using n-gram language models for reduction is the space required to hold the language model file: for a log file with 800,000 objects (about 17 MB), a 5 MB file is needed to hold the n-gram occurrence frequencies and about 1 GB to hold the language model. The language model and the n-gram occurrence frequency file are used to extract key n-grams. Fortunately, both are temporary, as they are deleted after key n-gram extraction. The format of the language model is described in [17].

After selecting the n-grams with high occurrence frequency or probability we calculated their reduction ratio. The reduction ratio sorts the selected n-grams in a priority queue for subsequent replacement in the log files. Using the priority queue we substitute, or fold, a given session. Prior to the substitution we load any abbreviation dictionary that we have previously used, in order to avoid repeated abbreviations in subsequent reductions. The priority queue is then used to avoid overlapping.

6.3 Session Folding

Using the dictionary, we select the next abbreviation to be used in the folding process. An abbreviation is only generated when an n-gram is actually used in a substitution; not all n-grams are assigned an abbreviation. For example, of the 100 n-grams selected based on occurrence frequency, only 11 were actually used; of the 50 selected based on occurrence probability, only 5 were used; and of the 50 selected from each service, only 3 were used. This means that about 89% to 93% of the selected n-grams were overlapping. The selected n-grams do not intersect, so after this analysis, subsequent reductions did not need to consider a priority. Even using only 9% of the selected n-grams, we obtained a reduction of 65% in the worst case and 82% in the best case, with an average reduction of 74%. This reduction was tested by using the n-gram set generated from the first 5 BSM log files to reduce another set of 5 BSM log files. The result is a set of 19 n-grams that provide an average reduction rate of 74%.
⁴ http://webdia.cem.itesm.mx/ac/raulm/ids

In the next section we provide an overview of the experiments we conducted to test our reduction method and of the data involved in those experiments.

7 Validation Experiments and Data Selection

To test the reduction capabilities of our method we chose to reduce log files from the 1998 and 1999 DARPA repositories using the n-grams extracted from the 1998 repository. By using two different years of log files we aim to show the generality of our reduction method. As we will see, the results for the two years are quite similar, which suggests that changes in user activity are not a critical factor for our reduction method. We specifically used the 5 log files used for n-gram selection plus another 5 logs from 1998 and 5 more from 1999, a total of 15 log files out of 40. As described in §4, this gives us a confidence of 98.8%. The data used in the validation process includes attack and anomaly sessions.

The validation procedure is straightforward: we take the extracted and tagged n-grams and apply the reduction methodology, avoiding overlapping. Nonetheless, we still need to show that our reduction method keeps the information necessary to discern between two events despite the reduction. To do so, we compared the service selection results described in [4] on folded sessions against the results on unfolded sessions. We also used this comparison to show the difference in HMM training time between folded and unfolded sessions. A further test compares the results of using our IDS against mimicry attacks over folded and unfolded sessions. In the next section we present the results of these experiments.

8 Validation Results

In this section we explain how the resulting set of n-grams was validated. The validation is performed by reducing unseen sessions. The extracted n-grams provide an average reduction of 74% within the training sessions. We also used them to reduce unseen sessions from 5 different log files from the 1998 repository and 5 from 1999. As input we take an unseen log file and as output we produce the reduced log file. Tables 1 and 2 show the reduction ratio over the validation data of 1998 and 1999 respectively (we only show results for unseen data). The table columns are: log file ID, original number of system calls, compressed number of system calls, reduction percentage, and number of n-grams used in the reduction. The last row of table 1 shows the result of applying the reduction to a file containing only telnet sessions.

By training the HMMs we can validate the impact of our methodology on training times. As expected, training time is significantly reduced by using folded sessions. The reason is that the complexity of the HMM training algorithm is O(n²l), where n is the number of states in the HMM and l is the length of the training sequence. Figure 2 shows training times for the HMMs presented in [4]; in each plot, training times for unfolded sessions and their folded counterparts are contrasted.

Table 1. Validation Results, 1998 Log Files

Log File ID   Original Object #   Compressed Object #   Reduction %   Used N-grams
1                   776,000             270,000            65.3%            7
2                 1,800,000             486,000            73%             12
3                 1,150,000             344,000            70.1%            5
4                   801,000             175,000            78.2%            9
5                 1,158,000             392,000            66.2%            5
telnet              209,000              48,000            77.1%            5

Table 2. Validation Results, 1999 Log Files

Log File ID   Original Object #   Compressed Object #   Reduction %   Used N-grams
1                   820,855             248,719            69.7%            9
2                   490,896             142,360            71%              8
3                   630,457             198,594            68.5%            7
4                   520,358             139,456            73.2%           11
5                   220,658              52,296            76.3%           13

Table 3 shows the comparative results for service selection using folded and unfolded sessions. The first column indicates the service discriminator (HMM) used, and the first row indicates the service a session belongs to. The table should be read as: the percentage of sessions of service n (first row) classified as service m (first column). The first percentage in each cell corresponds to unfolded sessions and the second to folded sessions. The service selection process is explained in [4]. We can see from this table that session folding not only keeps discernibility information, but also reduces the number of false positives. The false positive reduction for intrusion detection is explored in the next section.
Table 3. % of Correct Service Discrimination, Folded vs. Unfolded Sessions

HMMs      telnet headers    smtp headers      ftp headers       finger headers
telnet    100% vs. 100%     2% vs. 1.8%       1% vs. 1%         0% vs. 0%
smtp      1% vs. 0.8%       100% vs. 100%     2% vs. 1.6%       0% vs. 0%
ftp       0% vs. 0%         1% vs. 0.7%       100% vs. 100%     0% vs. 0%
finger    0% vs. 0%         0% vs. 0%         0% vs. 0%         100% vs. 100%

9 Reduction Impact on Intrusion Detection

One of the major hypotheses tested in this work is that shrinking the information to be analysed does not get in the way of intrusion detection.

Fig. 2. Telnet Service HMM Training Times, Unfolded Sessions vs. Folded Sessions

This section aims to show that n-gram reduction allows us to use HMMs with longer sequences than those proposed in previous works, such as [2, 18, 1]. HMMs take a large amount of time to train; Wagner [19] also describes the disadvantages of using only short sequences as the detection base for HMMs. We used entire sessions containing the attacks for both our training and testing data. We used a single word network for each attack and 20 instances of each attack to train the HMMs. The tests were conducted against entire sessions of different services, in this case folded sessions: 800 telnet, 1000 smtp, 50 ftp and 150 finger sessions.

9.1 Detection Results

Using reduced sessions we obtained a 98% detection ratio, with 23 false positives out of 200 detected attacks. With a higher similarity measure, 95%, false positives dropped from 23 to 14 and the detection ratio also dropped, but only to 94%. We tested the same attacks in both scenarios. The difference in false positives was found in short attacks such as eject: most of the false positives were normal sessions labelled as one of these short attacks. Nevertheless, the higher detection ratio comes from variations of these same short attacks. We also successfully detected a significant subclass of general-case mimicry attacks, which, to the best of our knowledge, has not been reported elsewhere. For all our experiments we used the Hidden Markov Model Toolkit (HTK), which supports large HMMs and word networks; the software and its documentation can be found at http://htk.eng.cam.ac.uk/. We now present the conclusions drawn from our experiments.

10 Conclusions

Based on our results we conclude that we successfully reduced the number of objects in a log file with nearly no impact on intrusion detection. By identifying a small number of key n-grams we reduced BSM log files by a factor of 4. The number of key n-grams is small enough not to increase the vocabulary of system calls considerably; such an increase would impact the training time of a method such as HMMs. Our method allowed us to find a small number of n-grams that provide a large reduction. This reduction ratio is comparable to those obtained by rival techniques, and better in most cases. Moreover, our reduction method can recover the original sequence of system calls, whereas rival techniques are incapable of reverting the reduction process, as discussed in section 2. We also trained HMMs with both unfolded and folded sessions, and the difference in training times is a considerable improvement, showing that training HMMs on longer sequences becomes practical. In section 9 we showed that folded sessions are equally useful, and in some cases better, than unfolded sessions for intrusion detection.

References
1. Warrender, C., Forrest, S., Pearlmutter, B.: Detecting Intrusions Using System Calls: Alternative Data Models. In: Proceedings of the 1999 IEEE Symposium on Security and Privacy, IEEE Computer Society Press (1999) 133–145
2. Yeung, D., Ding, Y.: Host-Based Intrusion Detection Using Dynamic and Static Behavioral Models. Pattern Recognition 36 (2003) 229–243
3. Godínez, F., Hutter, D., Monroy, R.: Attribute Reduction for Effective Intrusion Detection. In Favela, J., Menasalvas, E., Chávez, E., eds.: Proceedings of the 2004 Atlantic Web Intelligence Conference, AWIC'04. Number 3034 in Lecture Notes in Artificial Intelligence, Springer-Verlag (2004) 74–83
4. Godínez, F., Hutter, D., Monroy, R.: Service Discrimination and Audit File Reduction for Effective Intrusion Detection. In: Proceedings of the 2004 Workshop on Information Security Applications, WISA'04. Number 3325 in Lecture Notes in Computer Science, Springer-Verlag (2004) 101–115
5. Forrest, S., Hofmeyr, S., Somayaji, A., Longstaff, T.: A Sense of Self for Unix Processes. In: Proceedings of the 1996 IEEE Symposium on Security and Privacy, Los Alamitos, CA, IEEE Computer Society Press (1996) 120–128
6. Maxion, R.A., Tan, K.M.C.: Anomaly Detection in Embedded Systems. IEEE Transactions on Computers 51 (2002) 108–120
7. Maxion, R.A., Tan, K.M.C.: Benchmarking Anomaly-Based Detection Systems. In: Proceedings of the 1st International Conference on Dependable Systems and Networks, New York, New York, USA, IEEE Computer Society Press (2000) 623–630
8. Marceau, C.: Characterizing the Behavior of a Program Using Multiple-Length N-grams. In: Proceedings of the 2000 Workshop on New Security Paradigms, ACM Press (2000) 101–110

9. Wespi, A., Dacier, M., Debar, H.: An Intrusion-Detection System Based on the Teiresias Pattern-Discovery Algorithm. In Gattiker, U.E., Pedersen, P., Petersen, K., eds.: Proceedings of EICAR '99. (1999)
10. Marin, J.A., Ragsdale, D., Surdu, J.: A Hybrid Approach to Profile Creation and Intrusion Detection. In: Proceedings of the DARPA Information Survivability Conference and Exposition, IEEE Computer Society (2001)
11. Lane, T., Brodley, C.E.: Temporal Sequence Learning and Data Reduction for Anomaly Detection. ACM Transactions on Information and System Security 2 (1999) 295–331
12. Lane, T., Brodley, C.E.: Data Reduction Techniques for Instance-Based Learning from Human/Computer Interface Data. In: Proceedings of the 17th International Conference on Machine Learning, Morgan Kaufmann (2000) 519–526
13. Knop, M.W., Schopf, J.M., Dinda, P.A.: Windows Performance Monitoring and Data Reduction Using WatchTower and Argus. Technical Report NWU-CS-01-6, Department of Computer Science, Northwestern University (2001)
14. Rencher, A.: Methods in Multivariate Analysis. Wiley & Sons, New York (1995)
15. Krogh, A., Brown, M., Mian, I.S., Sjölander, K., Haussler, D.: Hidden Markov Models in Computational Biology: Applications to Protein Modeling. Journal of Molecular Biology 235 (1994) 1501–1531
16. Hughey, R., Krogh, A.: Hidden Markov Models for Sequence Analysis: Extension and Analysis of the Basic Method. Comp. Appl. BioSci. 12 (1996) 95–108
17. Young, S., Evermann, G., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P.: The HTK Book for HTK Version 3.2. Cambridge University Engineering Department (2002)
18. Qiao, Y., Xin, X., Bin, Y., Ge, S.: Anomaly Intrusion Detection Method Based on HMM. Electronics Letters 38 (2002) 663–664
19. Wagner, D., Soto, P.: Mimicry Attacks on Host-Based Intrusion Detection Systems. In: Proceedings of the Ninth ACM Conference on Computer and Communications Security, Washington, DC, USA, ACM (2002) 255–265

