Hands-On Unsupervised Learning with Python

A trade-off between homogeneity and completeness using the V-measure

The reader who's familiar with supervised learning should know the concept of the F-score (or F-measure), which is the harmonic mean of precision and recall. The same kind of trade-off can also be employed when evaluating clustering results given the ground truth.

In fact, in many cases, it's helpful to have a single measure that takes into account both homogeneity and completeness. Such a result can be easily achieved using the V-measure (or V-score), which is defined as the harmonic mean of the two:

V = 2 · (homogeneity · completeness) / (homogeneity + completeness)

For the Breast Cancer Wisconsin dataset, the V-measure is as follows:

from sklearn.metrics import v_measure_score

# kmdff is the DataFrame built in the previous sections, containing the
# ground-truth diagnosis and the K-means cluster predictions
print('V-Score: {}'.format(v_measure_score(kmdff['diagnosis'], kmdff['prediction'])))

The output of the previous snippet is as follows:

V-Score: 0.46479332792160793
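As a quick sanity check, the harmonic-mean relation can be verified directly with scikit-learn's homogeneity_score and completeness_score. The labels below are purely illustrative (they are not the Breast Cancer Wisconsin values used in the text):

```python
from sklearn.metrics import homogeneity_score, completeness_score, v_measure_score

# Illustrative ground truth and cluster assignments (assumed values)
y_true = [0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2]

h = homogeneity_score(y_true, y_pred)
c = completeness_score(y_true, y_pred)
v = v_measure_score(y_true, y_pred)

# The V-measure equals the harmonic mean of homogeneity and completeness
print(abs(v - 2.0 * h * c / (h + c)) < 1e-9)
```

This confirms that v_measure_score does not add information beyond the two component scores; it merely combines them.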

As expected, the V-Score is an average measure that, in this case, is negatively influenced by the lower homogeneity. Of course, this index doesn't provide any new information, so it's helpful only to synthesize completeness and homogeneity into a single value. However, with a few simple but tedious mathematical manipulations, it's possible to prove that the V-measure is also symmetric (that is, V(Ypred|Ytrue) = V(Ytrue|Ypred)); therefore, given two independent assignments Y1 and Y2, V(Y1|Y2) is a measure of agreement between them. Such a scenario is not extremely common, because other measures can achieve a better result. However, such a score could be employed, for example, to check whether two algorithms (possibly based on different strategies) tend to produce the same assignments or whether they are discordant. In the latter case, even if the ground truth is unknown, the data scientist can understand that one strategy is surely not as effective as the other and start an exploration process in order to find the optimal clustering algorithm.
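The symmetry property can be sketched with two different algorithms run on the same data. The snippet below uses synthetic blobs and compares K-means with agglomerative clustering; the dataset and parameters are illustrative assumptions, not taken from the chapter:

```python
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import v_measure_score

# Synthetic dataset (assumed for illustration; the ground truth is ignored)
X, _ = make_blobs(n_samples=300, centers=3, random_state=1000)

# Two independent assignments produced by different strategies
y1 = KMeans(n_clusters=3, n_init=10, random_state=1000).fit_predict(X)
y2 = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Symmetry: the score doesn't depend on which assignment plays the
# role of the ground truth
print(abs(v_measure_score(y1, y2) - v_measure_score(y2, y1)) < 1e-12)

# A value close to 1.0 means the two algorithms agree (up to a
# permutation of the cluster labels)
print('Agreement: {:.3f}'.format(v_measure_score(y1, y2)))
```

A high agreement score here would suggest that the two strategies converge to essentially the same partition; a low one would trigger the exploration process described above.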