public / cyberbits
Commit 043bf40d, authored Jun 28, 2018 by Wolf (parent 221cc997)
Upload New File
1 changed file with 61 additions and 0 deletions: session-2-anomalies/4_Statistics.py (new file, 0 -> 100644)
# load libraries
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
# load data set
data = np.loadtxt('crimerate_binary.csv', delimiter=',')
[n, p] = data.shape

# split data into a training set and a testing set
size_train = int(0.75 * n)  # we use the first 75% of the data for training, the rest for testing
sample_train = data[0:size_train, 0:-1]
label_train = data[0:size_train, -1]
sample_test = data[size_train:, 0:-1]
label_test = data[size_train:, -1]
# ------------------------------------
# Statistics-based anomaly detection
# tutorial slides, pages 75 - 83
# ------------------------------------

# step 1. construct a distribution model
model = KernelDensity(kernel='gaussian', bandwidth=1e0)

# step 2. estimate the distribution using ONLY normal examples
model.fit(sample_train[label_train == 0, :])
# the following line would instead train the model on both normal and abnormal examples
# model.fit(sample_train)
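
# Background note (added commentary, not in the tutorial slides): with a Gaussian
# kernel, the fitted density is f_hat(x) = (1/m) * sum_i N(x; x_i, h^2 * I), where
# the x_i are the m normal training examples and h is the bandwidth. Because only
# normal examples are used for fitting, test points that lie far from every normal
# example receive a low density, which is exactly what we flag as anomalous.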
# step 3. apply the model to estimate the density of the testing examples
# score_samples returns the log-density; flipping its sign turns low density into a
# high anomalous score (the constant 1 only shifts the score)
adscore = 1 - model.score_samples(sample_test)

# get AUC score
auc_score = roc_auc_score(label_test, adscore)
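
# Added sketch (not in the original script): inspect the raw score range before
# picking a threshold; the hard-coded range quoted below ([94.5, 101]) was presumably
# read off from output like this on the crimerate data.
print('adscore range: [%.1f, %.1f], median %.1f'
      % (adscore.min(), adscore.max(), np.median(adscore)))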
# to get detection error and f1-score, we need to threshold the anomalous score
# on this data the range of adscore is roughly [94.5, 101]
threshold = 97
adscore[adscore <= threshold] = 0
adscore[adscore > threshold] = 1

# evaluate detection error and f1-score
err = 1 - accuracy_score(label_test, adscore)
f1score = f1_score(label_test, adscore)
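
# Added sketch for Assignment 3 (not in the original script): sweep a few thresholds
# over the raw score range and watch how detection error and F1 trade off. The raw
# scores are recomputed here because adscore has already been binarized above.
raw_score = 1 - model.score_samples(sample_test)
for t in np.linspace(raw_score.min(), raw_score.max(), num=9)[1:-1]:
    pred = (raw_score > t).astype(int)
    print('threshold %.2f: error = %.4f, F1 = %.4f'
          % (t, 1 - accuracy_score(label_test, pred), f1_score(label_test, pred)))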
# step 4. print results
print('\nStatistics-based Approach')
print('Detection Error = %.4f' % err)
print('F1 Score = %.4f' % f1score)
print('AUC Score = %.4f' % auc_score)
# -----------
# Assignment
# -----------
# 1. estimate the distribution using both normal and abnormal examples
#    (replace the model.fit(...) call in step 2 with the commented-out line below it);
#    what do you observe?
# 2. play with different values of the bandwidth hyper-parameter of the distribution
#    model in step 1 (a small sketch is appended at the end of this file);
#    what do you observe?
# 3. play with different thresholds on the anomalous score; what do you observe?
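
# Added sketch for Assignment 2 (not in the original script): refit the model with a
# few different bandwidths and compare the AUC on the testing set. Small bandwidths
# fit the normal training data very tightly, while very large ones oversmooth the
# estimated density.
for bw in [1e-1, 3e-1, 1e0, 3e0, 1e1]:
    kde = KernelDensity(kernel='gaussian', bandwidth=bw)
    kde.fit(sample_train[label_train == 0, :])
    score = 1 - kde.score_samples(sample_test)
    print('bandwidth %.1f: AUC = %.4f' % (bw, roc_auc_score(label_test, score)))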