Model Evaluation
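Evaluation has to answer two questions: (1) which metric lines up with the business cost; and (2) which data split gives an honest estimate of post-deployment performance, while avoiding information leakage during feature engineering and the indirect leakage caused by repeatedly tuning against the test set.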
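One common way to keep preprocessing from leaking information across folds is to place it inside a Pipeline, so that it is refit on the training part of every fold. The sketch below assumes the iris data and a StandardScaler as stand-ins for real feature engineering; neither is prescribed by the text above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The scaler lives inside the pipeline, so cross_val_score fits it on the
# training part of each fold and only applies it to the held-out part.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
print("5-fold acc with in-fold scaling:", round(scores.mean(), 3))
```

Fitting the scaler on the full dataset before splitting would let statistics from the held-out fold influence training, which is the kind of leakage the split is meant to prevent.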
Hold-out sets and K-fold cross-validation
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=200)
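# With an integer cv and a classifier, scikit-learn uses unshuffled stratified folds.
print("5-fold acc:", cross_val_score(clf, X, y, cv=5, scoring="accuracy"))
# An explicit StratifiedKFold adds shuffling with a fixed seed for reproducibility.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
print("stratified:", cross_val_score(clf, X, y, cv=skf).mean().round(3))

Stratified K-fold: each fold keeps class proportions close to those of the full dataset, which makes it a good fit for imbalanced classification.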
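To make the effect of stratification concrete, the sketch below compares the share of positives in each test fold under plain and stratified splitting. The label vector (90 negatives, 10 positives) and the helper `positive_rate_per_fold` are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels: 90 negatives followed by 10 positives.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # features are irrelevant for the split itself


def positive_rate_per_fold(splitter):
    """Return the share of positives in each test fold."""
    return [float(y[test_idx].mean()) for _, test_idx in splitter.split(X, y)]


plain = positive_rate_per_fold(KFold(n_splits=5, shuffle=True, random_state=0))
strat = positive_rate_per_fold(StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print("KFold positive rate per fold:          ", plain)
print("StratifiedKFold positive rate per fold:", strat)
```

With stratification every test fold holds 2 of the 10 positives, while plain K-fold can leave some folds with very few or none, which makes per-fold metrics noisy.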
Time-series splits
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit
X = np.arange(20).reshape(-1, 1) * 1.0
y = np.cumsum(np.random.randn(20))
tscv = TimeSeriesSplit(n_splits=4)
model = Ridge(alpha=1.0)
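for i, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Fit only on the earlier window, then score on the later one.
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    mae = np.mean(np.abs(pred - y[test_idx]))
    print(f"fold {i} MAE:", round(mae, 3))

Training data must always precede the validation/test data in time, so the model never "explains the past with the future".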
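The ordering guarantee can be checked by printing the index ranges that TimeSeriesSplit produces. A minimal sketch, assuming the same 20 time-ordered samples and 4 splits as the example above:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # 20 time-ordered samples
tscv = TimeSeriesSplit(n_splits=4)

folds = list(tscv.split(X))
for i, (train_idx, test_idx) in enumerate(folds):
    # Each training window ends right before its test window begins.
    print(f"fold {i}: train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```

The training window grows from fold to fold while the test window always sits strictly after it, so no future observation is ever used for fitting.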
Classification: confusion matrix and report
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=42)
clf = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
print(confusion_matrix(y_te, y_pred))
print(classification_report(y_te, y_pred, digits=3))

ROC-AUC and PR-AUC (favor PR when classes are imbalanced)
from sklearn.metrics import (
    PrecisionRecallDisplay,
    RocCurveDisplay,
    average_precision_score,
    roc_auc_score,
)
y_score = clf.predict_proba(X_te)[:, 1]
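# Built-in round() works whether the metric returns a NumPy scalar or a plain float.
print("ROC-AUC:", round(roc_auc_score(y_te, y_score), 3))
print("PR-AUC (AP):", round(average_precision_score(y_te, y_score), 3))
# Optional plots (need `import matplotlib.pyplot as plt` first):
# RocCurveDisplay.from_predictions(y_te, y_score); plt.show()
# PrecisionRecallDisplay.from_predictions(y_te, y_score); plt.show()

When the positive class is extremely rare, high accuracy alongside poor PR performance is common; the business usually cares more about recall, precision, and the area under the PR curve.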
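The gap between accuracy and PR performance shows up even with a trivial baseline. The sketch below uses made-up labels (1000 samples, 10 positives) and a hypothetical "model" that always outputs the negative class with a constant score.

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score, recall_score

# Hypothetical data: 1000 samples, only 10 positives (1% prevalence).
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1

# A "model" that always predicts the negative class with a constant score.
y_pred = np.zeros_like(y_true)
y_score = np.zeros(len(y_true), dtype=float)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
ap = average_precision_score(y_true, y_score)
print("accuracy:", round(acc, 3))    # high, because negatives dominate
print("recall:", round(rec, 3))      # no positive is ever found
print("PR-AUC (AP):", round(ap, 3))  # no better than the positive rate
```

Accuracy rewards the baseline for the 990 easy negatives, while recall and average precision expose that it never ranks or finds a single positive.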
Regression: MAE / RMSE / R²
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
X, y = fetch_california_housing(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
reg = HistGradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
pred = reg.predict(X_te)
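# Built-in round() works whether the metric returns a NumPy scalar or a plain float.
print("MAE:", round(mean_absolute_error(y_te, pred), 3))
print("RMSE:", round(np.sqrt(mean_squared_error(y_te, pred)), 3))
print("R2:", round(r2_score(y_te, pred), 3))

With scikit-learn >= 1.4 you can import RMSE directly: from sklearn.metrics import root_mean_squared_error.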
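For reference, the three metrics can also be computed straight from their definitions. The sketch below uses made-up numbers (not taken from the housing data) and shows how a single large error pulls RMSE above MAE.

```python
import numpy as np

# Made-up targets and predictions; the last prediction is off by 4.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 13.0])

err = y_pred - y_true
mae = np.mean(np.abs(err))                       # mean absolute error
rmse = np.sqrt(np.mean(err ** 2))                # root mean squared error
ss_res = np.sum(err ** 2)                        # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination

print("MAE:", round(mae, 3))
print("RMSE:", round(rmse, 3))
print("R2:", round(r2, 3))
```

Because RMSE squares the errors before averaging, it penalizes the one large miss more heavily than MAE does; which behavior is preferable depends on how costly large errors are in the application.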
Probability calibration (reliability)
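from sklearn.calibration import calibration_curve
# clf, X_te, y_te come from the breast-cancer classification example;
# the regression example rebinds X_te / y_te, so re-create the split if it ran in between.
prob = clf.predict_proba(X_te)[:, 1]
prob_true, prob_pred = calibration_curve(y_te, prob, n_bins=10)
# If the curve deviates from the diagonal, consider CalibratedClassifierCV or a different model.

Supplementary metrics for imbalanced classification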
from sklearn.metrics import balanced_accuracy_score, matthews_corrcoef, f1_score
y_hat = clf.predict(X_te)
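# Built-in round() works whether the metric returns a NumPy scalar or a plain float.
print("balanced acc:", round(balanced_accuracy_score(y_te, y_hat), 3))
print("MCC:", round(matthews_corrcoef(y_te, y_hat), 3))
print("macro-F1:", round(f1_score(y_te, y_hat, average="macro"), 3))

(This must run in the same script as the clf / X_te / y_te defined in the classification example above, or after refitting. Note that the regression example rebinds X_te and y_te.)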
Caveats when comparing multiple models
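| Point | Notes |
|---|---|
| Same folds / same test set | Scores are only fairly comparable under identical splits |
| Confidence intervals | Use bootstrap or repeated CV; do not read too much into a 0.1% gain |
| Cost sensitivity | When FN and FP costs are asymmetric, metric weights should be agreed with the business |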