Bland and Altman [15] extended this idea by plotting, for each pair of measurements, the difference between the two ratings on the vertical axis against the mean of the two ratings on the horizontal axis, together with the mean difference and the limits of agreement. The resulting Bland-Altman plot shows not only the overall degree of agreement, but also whether the agreement is related to the underlying value of the item. For example, two raters might agree closely when estimating the size of small objects, but disagree about larger ones.

Agreement and kappa statistics were also used to assess the interrater reliability of the abstractors at the study sites for the eight simulated patient charts. Data extracted from the simulated charts by the site abstractors were checked against the data abstracted by the experienced chart abstractor who served as the gold standard. Sensitivity and specificity estimates were calculated to compare the results of all raters with the gold standard.

While the correlation analyses typically used (mostly Pearson correlations) provide information about the strength of the relationship between two sets of values, they do not capture the agreement between raters at all (Bland and Altman, 2003; Kottner et al., 2011). Nevertheless, claims of interrater agreement are often drawn from correlation analyses (see e.g. Bishop and Baird, 2001; Janus, 2001; Van Noord and Prevatt, 2002; Norbury et al., 2004; Bishop et al., 2006; Massa et al., 2008; Gudmundsson and Gretarsson, 2009). The flaw in such conclusions is easy to see: a perfect linear correlation can be obtained even when one group of raters differs systematically (by a nearly constant amount) from the other, so that there is not a single absolute match. Conversely, agreement is present only if the points lie on the line of equality of the two ratings (or within a band around it) (Bland and Altman, 1986; Liao et al., 2010).
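Both points can be illustrated numerically: the Bland-Altman quantities (bias and limits of agreement), and the fact that a perfect Pearson correlation can coexist with a constant offset between raters. The ratings below are invented for illustration and do not come from any of the cited studies.

```python
import numpy as np

# Hypothetical ratings of ten objects by two raters (illustrative only).
rater_a = np.array([4.0, 5.5, 6.0, 7.5, 8.0, 9.5, 10.0, 11.5, 12.0, 13.5])
rater_b = rater_a + 2.0  # rater B is systematically 2 units higher

# Pearson correlation is perfect, even though not a single pair agrees.
r = np.corrcoef(rater_a, rater_b)[0, 1]

# Bland-Altman quantities: per-pair differences against per-pair means,
# with the mean difference (bias) and the 95% limits of agreement.
diffs = rater_a - rater_b
means = (rater_a + rater_b) / 2
bias = diffs.mean()
sd = diffs.std(ddof=1)
limits = (bias - 1.96 * sd, bias + 1.96 * sd)

print(r)     # ~1.0 despite zero absolute agreement
print(bias)  # -2.0, the constant offset that the correlation hides
```

In a Bland-Altman plot, `diffs` would be plotted against `means`, with horizontal lines at `bias` and at the two `limits`; here the differences are constant, so the bias captures the entire disagreement.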

Therefore, correlation-based analyses do not measure agreement between raters and are not sufficient on their own to assess interrater reliability. As Stemler (2004) pointed out, reliability is not a unitary concept and cannot be captured by correlations alone. The three concepts interrelate: interrater reliability, expressed here as intraclass correlation coefficients (ICC; see Liao et al., 2010; Kottner et al., 2011), agreement (sometimes called consensus; see e.g. Stemler, 2004), and correlation (here: Pearson correlations) complement each other in assessing the consistency of ratings.
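To make the distinction concrete, here is a minimal sketch (not code from any of the cited studies) that computes both a Pearson correlation and a two-way intraclass correlation coefficient for absolute agreement, ICC(A,1) in the McGraw and Wong notation, for two raters whose scores differ by a constant offset:

```python
import numpy as np

def icc_a1(ratings):
    # Two-way ICC for absolute agreement, single rater (ICC(A,1)).
    # `ratings` is an (n subjects x k raters) array; a minimal sketch
    # of the standard mean-squares formulation, illustrative only.
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((ratings - grand) ** 2).sum()
    ms_r = ss_rows / (n - 1)                                      # between-subject
    ms_c = ss_cols / (k - 1)                                      # between-rater
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))   # residual
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Two raters: rater B is systematically 2 units above rater A.
a = np.arange(1.0, 11.0)
ratings = np.column_stack([a, a + 2.0])

print(np.corrcoef(ratings[:, 0], ratings[:, 1])[0, 1])  # ~1.0
print(round(icc_a1(ratings), 3))                        # 0.821
```

With a constant offset the residual mean square is zero, so the entire drop in the ICC comes from the between-rater variance term in the denominator; the Pearson correlation has no such term and remains at 1 regardless of the offset.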