COMPUTER NETWORKS RESEARCH LAB

Department of Electrical and Computer Engineering, McGill University

NAVIGATION

Home
People
Photos

RESEARCH

Projects
Publications

LINKS

AAPN
MITACS
McGill TSP
McGill ECE

LOCAL ACCESS

Local Info

Result Snapshots and Sample Code Return to Network Monitoring Projects

Anomalies are primarily declared by our Kernel-based Online Anomaly Detection (KOAD) algorithm, when the projection error d_t exceeds the upper threshold n₂ (Red1 alarm). Fig. 1 presents the results of running the KOAD algorithm on an example dataset from Abilene.
The primary offline methods use the technique of Principal Component Analysis (PCA) [1-2]. Hence, we compare our results with those obtained using PCA.

Furthermore, we observe that our proposed anomaly detection algorithm effectively identifies a region of normality, that corresponds to a high-density region of the space. It is natural to compare the outcome of the KOAD approach to other approaches for determining high-density regions. One common approach is the estimation of minimum volume sets (MVSs) [4].

Muñoz and Moguerza have proposed the One-Class Neighbor Machine (OCNM) algorithm for estimating minimum volume sets in [5].

Fig.1: (a) KOAD projection error d_t, with green dotted line indicating n₁ and n₂; (b) magnitude of PCA projection onto the residual subspace with red dashed line indicating PCA Q-statistic threshold; (c) OCNM using nth nearest-neighbour distance with red dashed line indicating anomaly distance threshold; and (d) positions at which anomalies are detected by (diamond) KRLS, (o) PCA, and (square) OCNM, all as functions of time.

Fig. 2 depicts the progression of the kernel values of x_t with x_j for three different types of dictionary members x_j. Fig. 2(top panel) shows a consistently useful dictionary element. Many input vectors are close to this element, resulting in high kernel values. Sudden drops below the threshold correspond primarily to anomalies. Fig. 2(middle panel) shows the behaviour of a dictionary element that gradually becomes obsolete. This figure motivates the need to eventually discard such dictionary members. Fig. 2(bottom panel) illustrates the case of a Red2 alarm. The kernel values drop significantly below the threshold immediately after this input vector arrives and produces a d > n₁. The algorithm signals an orange alarm in this case, tracks it for l timesteps, upon which it performs the secondary usefulness test and elevates the orange alarm to a Red2 alarm. This anomalous input vector is never entered into the dictionary.

Fig.2: Behaviour over time of k(x_t,x_j) when (top) x_j a normal D member; (middle) a D member that eventually becomes obsolete; and (bottom) an anomalous D member. In each figure, the red dashed line indicates the usefulness threshold d.

Fig. 3 illustrates example anomalies in the Abilene flow-counts data set. (a) The structure of normal traffic shifts dramatically for approximately 240 time steps (20 hours). Approximately 20 backbone flows experience step-changes in behaviour during this time period (Fig. 3(a), top panel), and the nature of the network traffic changes fundamentally. When the first input vector corresponding to this shift arrives, KOAD signals a Red1 alarm (as d_t > n₂), and keeps tracking it for l timesteps, during which all input vectors produce significantly high d_t values and are all signalled as Red1 alarms (Fig. 3(a), bottom panel). Once the first input vector corresponding to this shift is entered into the dictionary, it explains all subsequent input vectors that belong to this temporary shift, and d_t becomes low. The spikes at t = 1100 and 1160 represent different, definite anomalies. PCA dedicates one principal component to this block of time and does not indicate any anomalous behaviour. (b) Three anomalies occur in the Abilene IP flow-counts dataset on the Seattle - Chicago backbone flow, at around t = 710, 1434 and 1615. Each anomaly persists for 3 - 5 timesteps, where the number of IP flows increases dramatically (Fig. 3(b), top panel). The PCA-based algorithm clusters these anomalies together and dedicates a principal component to describing them (Fig. 3(b), bottom panel). The anomalies consequently go undetected. The KOAD algorithm in contrast adapts over time and identifies all three as anomalies for most threshold settings.

Fig. 3a Top panel: Variation in the number of IP flows over time in the Abilene Seattle-Chicago backbone flow. Bottom panel: Variation in the KOAD d_t during the same time-period for n₁ = 0.03 and n₂ = 0.07 (red dashed line).

Fig. 3b Top panel: Variation in the number of IP flows over time in the Abilene New York - Chicago backbone flow. Bottom panel: Projection of x_tonto the second principal component over the same interval.

Sample Code and Data

We use data collected from the 11 core routers in the Abilene backbone network for 1 week (December 15 to December 21, 2003). It is comprised of a multivariate timeseries consisting of the number of packets (P), the number of individual IP flows (F), and the number of bytes (B) in each of the Abilene backbone flows (the traffic entering at one core router and exiting at another). The statistics are collected (binned) at 5 minute intervals. All three datasets are of dimension T x F, where T is the number of timesteps (in our case = 2016) and F is the number of backbone flows (in our case = 121). At each subsequent timestep t, we may then use the tth row of the relevant timeseries (P, F or B) as the input vector x_t.

The data and a zip file containing the Matlab source code to run our algorithms, are available here: Abilene.mat & KOADcode.zip. The code package includes the following files: "KOAD.m" which is to be used with the "kernel.m" function; "PCA.m"; "OCNM.m" which is to be used with the "M1.m" sparsity function.
Another zip file containing the code used to generate the figures on this page can be found here: Figs.zip. To obtain the figures on this page, we run KOAD, PCA and OCNM on Abilene.mat to produce the following files (storing the experimental results): "Fig1.mat"; "Fig2.mat"; "Fig3a.mat"; "Fig3b.mat". The Matlab code to subsequently plot the figures from the relevant experimental results file are: "Fig1.m"; "Fig2.m"; "Fig3a.m"; "Fig3b.m".