Item response theory starts with the analysis of test data to fit an item response function (IRF) for each item (or each response category, in the case of polytomous models). These functions follow a logistic curve, like the examples you see here. The x-axis is examinee ability or knowledge and the y-axis is the probability of a correct response; we expect top examinees to have near-100% probability, while lower-ability examinees have a much lower probability (e.g., around 25% if there is guessing on a 4-option multiple-choice item).
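For the multiple-choice case just described, the standard form is the three-parameter logistic (3PL) model, where $\theta$ is examinee ability, $a$ is the discrimination parameter, $b$ is the difficulty parameter, and $c$ is the guessing (lower-asymptote) parameter:

$$P(X = 1 \mid \theta) \;=\; c + \frac{1 - c}{1 + e^{-a(\theta - b)}}$$

Simpler variants fix some of these: the 2PL sets $c = 0$, and the 1PL (Rasch) model additionally fixes $a$ to a common value across items.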
Items differ in the shape of this curve, which is governed by their difficulty, discrimination, and guessing parameters. This allows psychometricians and test developers to truly understand how each item operates, and if an item is not operating well, how to improve it.
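To make the parameters concrete, here is a minimal sketch of the 3PL function above, evaluated for a few hypothetical items (the parameter values are illustrative only, not fitted to any real data):

```python
import math

def irf_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: discrimination (steepness of the curve at its inflection point)
    b: difficulty (ability level where the curve rises fastest)
    c: guessing (lower asymptote, e.g. ~0.25 for a 4-option item)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical items: an easy item, a hard item, and a highly
# discriminating item of medium difficulty.
items = {
    "easy":           dict(a=1.0, b=-1.0, c=0.25),
    "hard":           dict(a=1.0, b=1.5, c=0.25),
    "discriminating": dict(a=2.5, b=0.0, c=0.25),
}

# Tabulate P(correct) at low, average, and high ability.
for theta in (-2.0, 0.0, 2.0):
    probs = {name: round(irf_3pl(theta, **p), 2) for name, p in items.items()}
    print(f"theta={theta:+.1f}: {probs}")
```

Running this shows the behavior described above: the guessing parameter keeps low-ability probabilities near 0.25 rather than 0, difficulty shifts the curve left or right, and discrimination controls how sharply the item separates examinees around its difficulty level.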
Moreover, these IRFs are then used to solve deeper questions in assessment. How do we design multiple test forms that are truly equivalent? How do we link this year’s scores to last year’s to track growth? How can we personalize the assessment for each student?
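On that last question, one common approach in computerized adaptive testing (a standard technique, sketched here under assumed parameter values rather than taken from the text above) is to administer, at each step, the item whose Fisher information is highest at the examinee's current ability estimate:

```python
import math

def irf_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta.

    Standard result: I(theta) = a^2 * (q/p) * ((p - c) / (1 - c))^2,
    where p is the probability of a correct response and q = 1 - p.
    """
    p = irf_3pl(theta, a, b, c)
    q = 1.0 - p
    return a * a * (q / p) * ((p - c) / (1.0 - c)) ** 2

def next_item(theta_hat, bank, administered):
    """Pick the unadministered item that is most informative at theta_hat."""
    candidates = [i for i in bank if i not in administered]
    return max(candidates, key=lambda i: info_3pl(theta_hat, *bank[i]))

# Hypothetical item bank: item id -> (a, b, c).
bank = {
    "item1": (1.2, -0.5, 0.20),
    "item2": (0.8, 0.0, 0.25),
    "item3": (1.8, 0.7, 0.20),
}
print(next_item(theta_hat=0.0, bank=bank, administered={"item1"}))
```

The same information function drives form assembly and equating work: because every item lives on a common ability scale, forms can be built to matching information targets, and scores from different forms or years can be placed on that shared scale.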