The accuracy paradox for predictive analytics states that predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy.
For highly imbalanced class problems, precision and recall are much better performance metrics than accuracy.
Examples of highly imbalanced class problems:
1. cancer rate
2. insurance fraud: 150 / 9850
3. Information retrieval
Recall is defined as the number of relevant documents retrieved by a search divided by the total number of existing relevant documents, while precision is defined as the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search.
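As a minimal sketch of these two definitions, the snippet below treats one search as two sets of document IDs. The function name `precision_recall` and the document IDs are hypothetical, chosen only for illustration, and returning 0.0 for an empty denominator is an assumption rather than part of the definition.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, given two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    # Assumption: an empty denominator is reported as 0.0 instead of raising an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7", "d8", "d9"]
p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

Two of the four retrieved documents are relevant (precision 2/4), and two of the five relevant documents were retrieved (recall 2/5).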
Accuracy Paradox
Total: 10000
Pred P, Pred N
T 100, 50
F 150, 9700
Accuracy = 9800/10000 = 98%
Precision = 100 / 250 = 0.4
Recall = 100 / 150 = 0.667
F1 = 2 * 0.4 * 0.667 / 1.067 = 0.50
Total: 10000
Pred P, Pred N
T 0, 150
F 0, 9850
Accuracy = 9850/10000 = 98.5%
Precision = 0 / 0, undefined (taken as 0 by convention)
Recall = 0 / 150 = 0
F1 = 2 * 0 * 0 / (0 + 0), undefined (taken as 0 by convention)
Total: 10000
Pred P, Pred N
T 150, 0
F 0, 9850
Accuracy = 10000/10000 = 100%
Precision = 150 / 150 = 1
Recall = 150 / 150 = 1
F1 = 2 * 1 * 1 / (1+1) = 1
F1 = 2 * precision * recall / (precision + recall)
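The arithmetic for the three confusion matrices above can be re-checked with a short script. This is only a sketch: the function name `metrics`, the model labels, and the choice to report 0.0 when a denominator is zero (the always-negative case) are assumptions made for illustration.

```python
def metrics(tp, fn, fp, tn):
    """Return (accuracy, precision, recall, f1) from the four confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Assumption: a 0/0 ratio is reported as 0.0, a common convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Counts in (TP, FN, FP, TN) order, read off the three matrices above.
models = {
    "imperfect model": (100, 50, 150, 9700),
    "always negative": (0, 150, 0, 9850),
    "perfect model": (150, 0, 0, 9850),
}

for name, counts in models.items():
    acc, p, r, f1 = metrics(*counts)
    print(f"{name}: accuracy={acc:.3f} precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

The always-negative model gets a higher accuracy than the imperfect model while its precision, recall, and F1 are all 0, which is the paradox in one line of output.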
Accuracy:
Weighted Accuracy:
Lift: typical application is marketing
Precision/Recall: document retrieval
ROC Area: medicine & biology, false positive & false negative
The measure you optimize makes a difference
The measure you report makes a difference
Use the measure appropriate for the problem/community
Accuracy often is not sufficient/appropriate
Only accuracy generalizes directly to >2 classes; precision and recall need per-class averaging
Confusion Matrix
Pred True, Pred False
True True Positive, False Negative
False False Positive, True Negative
Precision: how many of the returned docs are correct = TP / (TP + FP)
Recall: how many of the correct docs the model returns = TP / (TP + FN)
True: 100
False: 10000
TP: 90, FN: 10
FP: 10, TN: 9990
Precision = 90 / 100 = 0.9
Recall = 90 / 100 = 0.9
F1 = 2 * 0.9 * 0.9 / (0.9 + 0.9) = 0.9
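In practice the four cells are tallied from per-example labels, not given up front. The sketch below rebuilds the example above from synthetic label lists (100 True, 10000 False) and counts the cells. The helper name `confusion_counts` and the way the lists are constructed are assumptions for illustration only.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Tally (TP, FN, FP, TN) from parallel lists of boolean labels."""
    counts = Counter(zip(actual, predicted))  # keys are (actual, predicted) pairs
    tp = counts[(True, True)]
    fn = counts[(True, False)]
    fp = counts[(False, True)]
    tn = counts[(False, False)]
    return tp, fn, fp, tn


# Synthetic labels arranged to reproduce the counts in the example above.
actual = [True] * 100 + [False] * 10000
predicted = [True] * 90 + [False] * 10 + [True] * 10 + [False] * 9990

tp, fn, fp, tn = confusion_counts(actual, predicted)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fn, fp, tn)     # 90 10 10 9990
print(precision, recall)  # 0.9 0.9
```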
The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets