Commit 8060935c, authored Jun 28, 2018 by Wolf
Upload New File
parent ae3f4b62

1 changed file with 54 additions and 0 deletions

session-2-anomalies/1_Classification_LinearRegression.py (new file, 0 → 100644)
# load libraries
import numpy as np
from sklearn import linear_model
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
# load data set
data = np.loadtxt('crimerate_binary.csv', delimiter=',')
n, p = data.shape
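# a quick sanity check (a sketch, not in the original script; it assumes the last
# column holds the 0/1 label, as the slicing below does): count how many normal
# vs. anomalous examples the data set contains
class_counts = np.bincount(data[:, -1].astype(int))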
# split data into a training set and a testing set
size_train = int(0.75 * n)  # use the first 75% of the data for training, the rest for testing
sample_train = data[0:size_train, 0:-1]
label_train = data[0:size_train, -1]
sample_test = data[size_train:, 0:-1]
label_test = data[size_train:, -1]
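# note: a sketch of an equivalent split (an addition, not part of the original
# tutorial) using sklearn's train_test_split; shuffle=False should reproduce the
# deterministic "first 75%" split above
from sklearn.model_selection import train_test_split
sample_train_alt, sample_test_alt, label_train_alt, label_test_alt = train_test_split(
    data[:, :-1], data[:, -1], train_size=0.75, shuffle=False)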
# ----------------------------------------
# classification-based anomaly detection
# tutorial slides, pages 49-59
# use a linear regression model for detection
# ----------------------------------------
# step 1. choose a classification model (linear regression)
model = linear_model.LinearRegression()
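# note: a minimal alternative sketch (not in the original tutorial): a genuine
# classifier such as logistic regression could be swapped in here; its
# predict_proba output already lies in [0, 1], so it can serve directly as an
# anomaly score (this assumes label 1 marks the anomalous class)
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(sample_train, label_train)
proba_score = clf.predict_proba(sample_test)[:, 1]  # probability of the anomaly class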
# step 2. train the model using examples
model.fit(sample_train, label_train)
# step 3. apply the model to predict whether each example is normal or anomalous
label_pred = model.predict(sample_test)

# the raw output above can be treated as an anomaly score, and the AUC score
# can be computed from it directly
auc_score = roc_auc_score(label_test, label_pred)
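# note: a sketch (an addition, not in the original script): roc_curve enumerates
# the false/true positive rates at every candidate cut-off, which can guide the
# choice of the threshold applied below
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(label_test, label_pred)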
# because this is a regression model, its output must be thresholded to obtain a
# detection error and an F1 score
threshold = 0.4
label_pred[label_pred <= threshold] = 0
label_pred[label_pred > threshold] = 1

err = 1 - accuracy_score(label_test, label_pred)
f1score = f1_score(label_test, label_pred)
# step 4. print results
print('\nClassification-based Approach (Linear Regression Model)')
print('Detection Error = %.4f' % err)
print('F1 Score = %.4f' % f1score)
print('AUC Score = %.4f' % auc_score)
# -----------
# Assignment
# -----------
# play with different values of the threshold variable above: what do you observe?
# (a sweep sketch follows below)
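# a possible starting point (a sketch, not a prescribed solution): re-score the
# test set and sweep a few thresholds to see the error/F1 trade-off
scores = model.predict(sample_test)
for t in [0.2, 0.3, 0.4, 0.5, 0.6]:
    pred = (scores > t).astype(float)
    print('threshold=%.1f  error=%.4f  f1=%.4f'
          % (t, 1 - accuracy_score(label_test, pred), f1_score(label_test, pred)))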