Statistical measure of the degree of agreement among a fixed number of raters who assign items to categories, corrected for the agreement expected by chance alone. In PDD, it quantifies the reliability of decisions made by different scorers interpreting the same test charts and is the preferred method for gauging inter-rater agreement. See: Fleiss (1971).
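The sketch below shows how the statistic is computed. It assumes every chart is scored by the same number of scorers, following the formulation in Fleiss (1971). The input is an N × k table where entry (i, j) counts how many scorers placed chart i in category j. The sketch computes per-chart agreement P_i, averages it to P̄, and compares P̄ with the chance agreement P̄_e = Σ p_j², where p_j is the overall share of ratings in category j. Kappa is then (P̄ − P̄_e) / (1 − P̄_e). The function name `fleiss_kappa` is hypothetical, as are the category labels (DI, NDI, INC) and the chart counts. They are illustrations, not data from any study.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for an N x k table of rating counts (illustrative sketch).

    counts[i][j] = number of raters who assigned item i to category j.
    Every item must receive the same number of ratings n (n >= 2).
    """
    if not counts:
        raise ValueError("need at least one item")
    k = len(counts[0])
    if any(len(row) != k for row in counts):
        raise ValueError("every row must have the same number of categories")
    n = sum(counts[0])  # raters per item
    if n < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n for row in counts):
        raise ValueError("every item must have the same number of ratings")

    N = len(counts)
    # p_j: proportion of all ratings that fell into category j
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # P_i: proportion of agreeing rater pairs on item i
    P = [sum(c * (c - 1) for c in row) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N               # mean observed agreement
    P_e = sum(pj * pj for pj in p)   # agreement expected by chance
    if P_e == 1.0:
        raise ValueError("kappa undefined: all ratings fall in one category")
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical example: 4 charts, 3 scorers each, categories [DI, NDI, INC]
charts = [
    [3, 0, 0],  # all three scorers called DI
    [0, 3, 0],  # all three called NDI
    [2, 1, 0],
    [0, 2, 1],
]
print(f"{fleiss_kappa(charts):.3f}")  # prints 0.415
```

Taking a table of counts rather than a list of who-scored-what means different scorers may rate different charts, provided each chart gets the same number of ratings. A value of 1 indicates perfect agreement. A value near 0 indicates agreement no better than chance.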