MS Learn - Train and evaluate classification models 2 - Evaluate Classification Model & Pipeline & Other Algorithm

April 13, 2025


Confusion Matrix, Common Classification Metrics

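A confusion matrix is a table that cross-tabulates the actual classes against the classes the model predicted. For binary classification it has four cells:

TP, True Positive: actually True, and the model predicted True
FN, False Negative: actually True, but the model predicted False
FP, False Positive: actually False, but the model predicted True
TN, True Negative: actually False, and the model predicted False

Four commonly used metrics can be derived from these counts.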

Accuracy

As we saw in the previous post, accuracy is the proportion of all predictions that were correct.

$$Accuracy=\dfrac{TP+TN}{TP+FN+FP+TN}$$

Recall (Sensitivity)

The proportion of actually True cases that the model correctly predicted as True.

$$Recall=\dfrac{TP}{TP+FN}$$

Precision

The proportion of cases the model predicted as True that are actually True.

$$Precision=\dfrac{TP}{TP+FP}$$

F1 Score

The harmonic mean of Recall and Precision, which summarizes both in a single number.

$$F_1=2 \times \dfrac{Precision \cdot Recall}{Precision+Recall}$$
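To make the four formulas concrete, here is a minimal sketch in plain Python. The labels in `y_true` and `y_pred` are made-up toy data, and the helper names `confusion_counts` and `basic_metrics` are hypothetical, not part of any library.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def basic_metrics(tp, fn, fp, tn):
    """Apply the four formulas above to the confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Toy data (hypothetical): 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
scores = basic_metrics(*counts)
print(counts)
print(scores)
```

In this toy case the model misses one positive and raises one false alarm, so recall and precision are both 3/4 while accuracy is 8/10.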

ROC Curve, AUC

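The Receiver Operating Characteristic (ROC) chart is another important way to assess a classification model. It plots the two rates below as the decision threshold varies.

The X axis is the False Positive Rate (FPR $=\dfrac{FP}{FP+TN}$) and the Y axis is the True Positive Rate (TPR, the same as Recall).

The area under the ROC curve is called the AUC (Area Under the Curve). The closer it is to 1, the better the model; an AUC of 0.5 corresponds to random guessing.

The curve itself indicates a better model the more it bows toward the top-left corner.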
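As a rough illustration of how the curve and its area arise, the sketch below sweeps a threshold over predicted scores, records one (FPR, TPR) point per threshold, and sums trapezoids to approximate the AUC. The data is a made-up toy example and the function names `roc_points` and `auc_trapezoid` are hypothetical; this is not how any particular library implements it.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per threshold, from strictest to loosest."""
    # Start above every score (nothing predicted positive), then lower the bar
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Approximate the area under the curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): true labels and predicted probabilities of class 1
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, y_prob)
auc = auc_trapezoid(points)
print(points)
print(auc)
```

A model that ranked every positive above every negative would reach an AUC of 1.0. Here one negative (0.4) outranks one positive (0.35), and the area comes out as 0.75.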

Exercise

Now let's evaluate the model with several metrics, not just accuracy.

The simplest approach is to use the classification_report function from sklearn.metrics to get an overview.

from sklearn.metrics import classification_report

print(classification_report(y_test, predictions))

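The rows indexed 0 and 1 hold the values computed for each class. For example, recall for the diabetic class (1) is $0.60$, which is low.
Accuracy: $0.79$.
Support: the number of samples in each class.
Macro Avg: the plain, unweighted mean of the scores for classes $0$ and $1$.
Weighted Avg: the mean of the per-class scores, weighted by each class's support (sample count).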
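The difference between the two averages can be sketched in a few lines. The per-class recall values and supports below are hypothetical numbers chosen for illustration, not the values from the report.

```python
def macro_average(scores):
    """Unweighted mean: every class counts equally."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean weighted by the number of samples (support) in each class."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


# Hypothetical per-class recall and support for classes 0 and 1
recall_per_class = [0.90, 0.60]
support_per_class = [300, 100]

macro = macro_average(recall_per_class)
weighted = weighted_average(recall_per_class, support_per_class)
print(f"macro avg:    {macro:.3f}")
print(f"weighted avg: {weighted:.3f}")
```

Because class 0 has three times as many samples in this example, the weighted average (0.825) is pulled toward its score, while the macro average (0.75) sits exactly halfway between the two classes.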

These values can also be obtained directly with precision_score and recall_score from sklearn.metrics.

By default, these functions assume the data comes from a binary classification model and score the positive class, $1$.

from sklearn.metrics import precision_score, recall_score

print("Overall Precision:", precision_score(y_test, predictions))
print("Overall Recall:", recall_score(y_test, predictions))

Overall Precision: 0.7242472266244057
Overall Recall: 0.6036988110964333

The confusion matrix can be obtained too, with confusion_matrix from sklearn.metrics.

from sklearn.metrics import confusion_matrix

# Print the confusion matrix
cm = confusion_matrix(y_test, predictions)
print(cm)

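So far we have computed predictions as

$\hat{y}=\begin{cases} 1 & \text{if } P(\hat{y}) \ge 0.5 \\ 0 & \text{otherwise} \end{cases}$

To get the $P(\hat{y})$ values themselves, use predict_proba instead of predict. Each row of the result holds the probabilities for class 0 and class 1, in that order.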

y_scores = model.predict_proba(X_test)
print(y_scores)

[[0.81657749 0.18342251]
 [0.96303915 0.03696085]
 [0.80873957 0.19126043]
 ...
 [0.60693276 0.39306724]
 [0.1065467  0.8934533 ]
 [0.63858497 0.36141503]]
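To see how the threshold turns these probabilities into class labels, the sketch below applies the rule by hand to the six rows visible in the output above. The helper `apply_threshold` and the alternative threshold of 0.35 are illustrative assumptions, not part of the original exercise.

```python
# Rows copied from the predict_proba output above: [P(class 0), P(class 1)]
y_scores = [
    [0.81657749, 0.18342251],
    [0.96303915, 0.03696085],
    [0.80873957, 0.19126043],
    [0.60693276, 0.39306724],
    [0.1065467, 0.8934533],
    [0.63858497, 0.36141503],
]


def apply_threshold(scores, threshold=0.5):
    """Predict 1 when the class-1 probability reaches the threshold, else 0."""
    return [1 if row[1] >= threshold else 0 for row in scores]


default_preds = apply_threshold(y_scores)                  # threshold 0.5
lenient_preds = apply_threshold(y_scores, threshold=0.35)  # hypothetical lower threshold
print(default_preds)
print(lenient_preds)
```

Lowering the threshold flags more patients as positive, which can only keep or raise recall, typically at the cost of precision.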

Let's draw the ROC curve as well.

The roc_curve function from sklearn.metrics returns fpr, tpr, and thresholds, and matplotlib.pyplot draws the chart.

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
# Jupyter magic: render plots inline (remove this line when running as a plain script)
%matplotlib inline

# calculate ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])

# plot ROC curve
fig = plt.figure(figsize=(6, 6))
# Plot the diagonal 50% line
plt.plot([0, 1], [0, 1], 'k--')
# Plot the FPR and TPR achieved by our model
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()

The AUC can be computed as well.

from sklearn.metrics import roc_auc_score

auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))

Preprocessing Features with a Pipeline

The results so far are not bad, but let's improve them.

To do so, we need to preprocess the data.

The columns at indices 0 to 6 of the DataFrame are numeric, so StandardScaler rescales them to mean 0 and standard deviation 1.

The column at index 7 is treated as a categorical feature, so we transform it with one-hot encoding.

sklearn's Pipeline lets us implement these steps in a reusable way.

StandardScaler and OneHotEncoder perform the preprocessing described above, and ColumnTransformer decides which preprocessing is applied to which features.

The final Pipeline then combines our preprocessor with the LogisticRegression estimator we used before.
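As a rough illustration of what the two preprocessing steps do to a column, here is a plain-Python sketch. It mirrors the idea only, not sklearn's implementation; the helper names `standardize` and `one_hot` and the toy columns are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population std)."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


def one_hot(values):
    """Encode each value as a 0/1 vector with one slot per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical toy columns: one numeric feature, one categorical feature
glucose = [80.0, 100.0, 120.0, 140.0]
age_group = [21, 35, 21, 50]

scaled = standardize(glucose)
encoded = one_hot(age_group)
print(scaled)
print(encoded)
```

After scaling, features measured in very different units end up on a comparable scale, and after one-hot encoding the model no longer reads a category value as a quantity with an order.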

# Train the model
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
import numpy as np

# Define preprocessing for numeric columns (normalize them so they're on the same scale)
numeric_features = [0,1,2,3,4,5,6]
numeric_transformer = Pipeline(steps=[
    ('scaler', StandardScaler())])

# Define preprocessing for categorical features (encode the Age column)
categorical_features = [7]
categorical_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# Combine preprocessing steps
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# Create preprocessing and training pipeline
# NOTE: `reg` (the regularization rate) is assumed to be defined already, as in the previous post
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('logregressor', LogisticRegression(C=1/reg, solver="liblinear"))])


# fit the pipeline to train a logistic regression model on the training set
model = pipeline.fit(X_train, y_train)
print(model)

Measuring the performance of the model trained this way gives:

Confusion Matrix:
 [[2667  319]
 [ 406 1108]] 

Accuracy: 0.8388888888888889
Overall Precision: 0.7764540995094604
Overall Recall: 0.7318361955085865
AUC: 0.9202436115345857

Accuracy rises from 0.79 to about 0.84, and precision and recall improve as well.
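The reported scores can be checked by hand against the confusion matrix printed above. The sketch below assumes sklearn's layout for binary labels, where rows are actual classes and columns are predicted classes, so the matrix reads [[TN, FP], [FN, TP]]. Note that this order differs from the TP, FN, FP, TN order used in the definitions earlier.

```python
# Confusion matrix from the output above, in sklearn's layout:
# [[TN, FP],
#  [FN, TP]]  (rows = actual class, columns = predicted class)
cm = [[2667, 319],
      [406, 1108]]
(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("Accuracy:", accuracy)
print("Overall Precision:", precision)
print("Overall Recall:", recall)
```

The three values agree with the Accuracy, Overall Precision, and Overall Recall lines in the output, which confirms how the cells map onto the formulas.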

Using a Different Algorithm

Besides Logistic Regression, classification can use Support Vector Machines, decision trees, or ensemble techniques that combine several models.

Let's try Random Forest, an algorithm that combines multiple decision trees.

Use the RandomForestClassifier class from sklearn.ensemble.

from sklearn.ensemble import RandomForestClassifier

# Create preprocessing and training pipeline
# (the step is named 'classifier' here because it no longer holds a logistic regression)
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', RandomForestClassifier(n_estimators=100))])

# fit the pipeline to train a random forest model on the training set
model = pipeline.fit(X_train, y_train)
print(model)

Confusion Matrix:
 [[2850  136]
 [ 182 1332]] 

Accuracy: 0.9293333333333333
Overall Precision: 0.9073569482288828
Overall Recall: 0.8797886393659181

AUC: 0.98146657099047

Accuracy improves further, to about 0.93, and the AUC rises from 0.92 to 0.98.
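The outputs above do not include the F1 score, but it can be derived from the two confusion matrices. This is a small sketch under the same layout assumption as sklearn's confusion_matrix ([[TN, FP], [FN, TP]]); the helper name `f1_from_confusion_matrix` is hypothetical.

```python
def f1_from_confusion_matrix(cm):
    """Compute F1 for the positive class from a [[TN, FP], [FN, TP]] matrix."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Confusion matrices copied from the two outputs above
logistic_cm = [[2667, 319], [406, 1108]]
forest_cm = [[2850, 136], [182, 1332]]

f1_logistic = f1_from_confusion_matrix(logistic_cm)
f1_forest = f1_from_confusion_matrix(forest_cm)
print(f"F1 (logistic regression): {f1_logistic:.3f}")
print(f"F1 (random forest):       {f1_forest:.3f}")
```

F1 comes out at roughly 0.75 for the logistic regression pipeline and roughly 0.89 for the random forest, consistent with the gains in precision and recall.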

Saving the Model as a Pickle and Reusing It

import joblib

# Save the model as a pickle file
filename = './diabetes_model.pkl'
joblib.dump(model, filename)

# Load the model from the file
model = joblib.load(filename)

# predict on a new sample
# The model accepts an array of feature arrays (so you can predict the classes of multiple patients in a single call)
# We'll create an array with a single array of features, representing one patient
X_new = np.array([[2,180,74,24,21,23.9091702,1.488172308,22]])
print ('New sample: {}'.format(list(X_new[0])))

# Get a prediction
pred = model.predict(X_new)

# The model returns an array of predictions - one for each set of features submitted
# In our case, we only submitted one patient, so our prediction is the first one in the resulting array.
print('Predicted class is {}'.format(pred[0]))
