public / cyberbits
Commit 34c27045, authored Jun 28, 2018 by Wolf
Upload New File
parent 043bf40d
Showing 1 changed file with 60 additions and 0 deletions
session-2-anomalies/5_Neighborhood.py (0 → 100644)
# load libraries
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# load data set
data = np.loadtxt('crimerate_binary.csv', delimiter=',')
[n, p] = data.shape

# split data into a training set and a testing set
size_train = int(0.75 * n)  # we use first 75% data for training, the rest for testing
sample_train = data[0:size_train, 0:-1]
label_train = data[0:size_train, -1]
sample_test = data[size_train:, 0:-1]
label_test = data[size_train:, -1]

# ------------------------------------
# Neighborhood-based anomaly detection
# tutorial slides, page 85 - 93
# ------------------------------------

# step 1. choose number of neighbors
num_neighbor = 30

# step 2. construct a neighborhood using ONLY NORMAL examples
nbrs = NearestNeighbors(n_neighbors=num_neighbor, algorithm='ball_tree').fit(sample_train[label_train == 0, :])
# the following code trains the model using both normal and abnormal examples
# nbrs = NearestNeighbors(n_neighbors=num_neighbor, algorithm='ball_tree').fit(sample_train)

# step 3. compute distance from each testing example to its num_neighbor neighbors
distances, indices = nbrs.kneighbors(sample_test)
# treat average distance as anomalous score
adscore = np.sum(distances, axis=1) / num_neighbor
# get AUC score
auc_score = roc_auc_score(label_test, adscore)
# to get detection error and f1-score, we need to threshold anomalous score
# the range of adscore is [0.9, 3.3]
threshold = 2
adscore[adscore <= threshold] = 0
adscore[adscore > threshold] = 1
# now evaluate error and f1-score
err = 1 - accuracy_score(label_test, adscore)
f1score = f1_score(label_test, adscore)

# step 4. print results
print('\nNeighborhood-based Approach')
print('Detection Error = %.4f' % err)
print('F1 Score = %.4f' % f1score)
print('AUC Score = %.4f' % auc_score)

# -----------
# Assignment
# -----------
# 1. construct the neighborhood using both normal and abnormal examples (in step 2, replace the active nbrs line with the commented-out one). What do you observe?
# 2. try different neighborhood sizes (num_neighbor in step 1). What do you observe?
# 3. try different thresholds (threshold in step 3). What do you observe?
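
The assignment items on neighborhood size and threshold can be explored with a small sweep. The following is a minimal sketch, not part of the committed script: it assumes synthetic Gaussian data in place of crimerate_binary.csv (which is not reproduced here), and the helper name knn_scores, the candidate values of k, and the quantile-based thresholds are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import f1_score, roc_auc_score


def knn_scores(train_normal, test, k):
    """Average distance from each test point to its k nearest normal training points."""
    nbrs = NearestNeighbors(n_neighbors=k, algorithm='ball_tree').fit(train_normal)
    distances, _ = nbrs.kneighbors(test)
    return distances.mean(axis=1)


# Synthetic stand-in data (assumption): normal points around the origin,
# anomalies shifted away from it. This is NOT crimerate_binary.csv.
rng = np.random.default_rng(0)
train_normal = rng.normal(0.0, 1.0, size=(300, 5))
test_normal = rng.normal(0.0, 1.0, size=(80, 5))
test_anomalous = rng.normal(4.0, 1.0, size=(20, 5))
sample_test = np.vstack([test_normal, test_anomalous])
label_test = np.concatenate([np.zeros(80, dtype=int), np.ones(20, dtype=int)])

results = {}
for k in (5, 30, 100):  # hypothetical neighborhood sizes
    adscore = knn_scores(train_normal, sample_test, k)
    # AUC is computed on the raw scores, so it does not depend on a threshold
    auc = roc_auc_score(label_test, adscore)
    # candidate thresholds are taken from the score distribution itself
    thresholds = np.quantile(adscore, [0.5, 0.7, 0.8, 0.9])
    # keep the scores intact and derive a separate 0/1 prediction per threshold
    f1s = [f1_score(label_test, (adscore > t).astype(int)) for t in thresholds]
    results[k] = (auc, thresholds, f1s)
    best = int(np.argmax(f1s))
    print('k = %3d  AUC = %.4f  best F1 = %.4f at threshold %.3f'
          % (k, auc, f1s[best], thresholds[best]))
```

Unlike the script above, this sketch leaves adscore unmodified and builds the binary prediction separately, which makes it possible to try several thresholds on the same scores. On the real data set the useful threshold range would come from the observed scores (the script notes [0.9, 3.3] for num_neighbor = 30), not from these synthetic values.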