Science Journal of Education


Validity Criteria of a Standardized Test as an Opportunity for Efficient Assessment Created from the Teleological Perspective of Incremental Learning

This study aims to comprehensively present ways to determine the validity of a standardized test, highlighting essential criteria that can be widely applied. Incremental learning, as conceived by Minsky, represents the construction of knowledge through the multitude of interactions between cognitive components (which build inferences in a decision-making process) and metacognitive components (which support the solving and decision-making process); it requires a special type of feedback (motivational, affective, behavioral, cognitive) delivered through standardized assessment. Evaluation through standardized tools must meet specific requirements in order to provide feedback that is both expressive and accurate. One of the mandatory criteria is testing the validity of the test, with all its components: internal validity, external validity, content validity (highlighted by the value of the content validity coefficient and the value of the concordance coefficient), criterion validity (with its components, concurrent validity and predictive validity), and construct validity (which involves both a theoretical and an empirical approach). The examples built in this work illustrate how to calculate the concordance coefficient, Kendall's coefficient, and Cohen's kappa coefficient for a set of results obtained by a group of students who were evaluated by two or more evaluators.
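The two inter-evaluator agreement measures named above can be sketched in a few lines of code. This is an illustrative implementation with made-up data, not the worked examples from the article: `kendalls_w` computes Kendall's coefficient of concordance W for m evaluators ranking n students (assuming untied ranks), and `cohens_kappa` computes Cohen's kappa for two evaluators assigning categorical judgements.

```python
from collections import Counter

def kendalls_w(rankings):
    """Kendall's W for m evaluators each ranking the same n subjects.

    rankings: list of m lists, each a permutation of the ranks 1..n (no ties).
    W = 12*S / (m^2 * (n^3 - n)), where S is the sum of squared deviations
    of the per-subject rank sums from their mean.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean = m * (n + 1) / 2  # expected rank sum if there were no agreement
    s = sum((rs - mean) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

def cohens_kappa(a, b):
    """Cohen's kappa for two evaluators' categorical judgements of the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from the marginal frequencies.
    """
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in set(a) | set(b)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical data: three evaluators rank four students identically,
# so W = 1 (perfect concordance).
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))

# Hypothetical pass/fail judgements by two evaluators on four students.
print(cohens_kappa(["pass", "pass", "fail", "fail"],
                   ["pass", "pass", "fail", "pass"]))
```

W ranges from 0 (no agreement among the rankings) to 1 (identical rankings); kappa corrects raw agreement for the proportion expected by chance, which is why the second example yields 0.5 rather than its raw agreement of 0.75.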

Incremental Learning, Content Validity Coefficient, Inter-Evaluator Agreement Coefficient, Concordance Coefficient, Kendall Coefficient, Cohen's Kappa Coefficient

Geanina Havârneanu. (2023). Validity Criteria of a Standardized Test as an Opportunity for Efficient Assessment Created from the Teleological Perspective of Incremental Learning. Science Journal of Education, 11(4), 142-149.

Copyright © 2023 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Anastasi, A. (1976). Psychological testing. New York, U. S. A.: Macmillan Publishing Co., Inc.
2. Anderson, L. W., Krathwohl, D. R. (2001). A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. New York, U. S. A.: Addison Wesley Longman, Inc.
3. Bloom, B. S., Krathwohl, D. R. (1956). Taxonomy of educational objectives: The classification of educational goals, by a committee of college and university examiners. Handbook 1: Cognitive domain. New York, U. S. A.: Longmans.
4. Campbell D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin. 54 (4): 297–312.
5. Campbell, D. T., Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin. 56 (2): 81–105.
6. Carpenter, G. A., Grossberg, S., Rosen, D. B. (1991). Fuzzy Art: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Networks. 4 (6): 759-771.
7. Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin. 70 (4): 213–220.
8. Crocker, L., Algina, J. (1986). Introduction to classical and modern test theory. Orlando, U. S. A.: Holt, Rinehart and Winston Inc.
9. Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational Measurement (2nd ed.). Washington, DC, U. S. A.: American Council on Education.
10. Davis, G. A. (2003). Prompting middle school science students for reflection: Generic and directed prompts. The Journal of the Learning Sciences. 12.
11. Evans, J. D. (1985). Invitation to psychological research. U. S. A.; New York. CBS College Publishing. 74-82: 232-254.
12. Fielding, N., Fielding, J. (1986). Linking Data: Qualitative Research Methods. London, U. K.; Sage. 4.
13. Gall, M. D., Gall, J. P., Borg, W. R. (2007). Educational research: an introduction. The 8th Edition. Pearson Education, Inc. Boston, U. S. A.: 192-227.
14. Gepperth, A., Hammer, B. (2016). Incremental learning algorithms and applications. European Symposium on Artificial Neural Networks (ESANN). Bruges, Belgium.
15. Gilles, J. -L. (2002). Spectral quality of standardized university tests – Development of edumetric indices for the analysis of the spectral quality of evaluations of university student achievements and application to the MOHICAN checkup ’99 tests (PhD thesis in education sciences) (Qualité spectrale des tests standardisés universitaires – Mise au point d’indices édumétriques d’analyse de la qualité spectrale des évaluations des acquis des étudiants universitaires et application aux épreuves MOHICAN checkup ’99 (Thèse de doctorat en sciences de l'éducation)). Université de Liège, Liège, Belgique.
16. Gilles, J. -L., Detroz, P., Crahay, V., Tinnirello, V., Bonet, P. (2011). The ExAMS platform, an "assessment management system" to instrument the construction and quality management of learning assessments. In Blais, Jean-Guy (Ed.) Evaluation of learning and information and communication technology (La plateforme ExAMS, un "assessment management system" pour instrumenter la construction et la gestion qualité des évaluations des apprentissages. In Blais, Jean-Guy (Ed.) Evaluation des apprentissages et technologie de l'information et de la communication). Québec, Canada: Presses de l'Université Laval. 2: 11-40.
17. Gliner, J. A., Morgan G. A. (2000) Research Methods in Applied Settings: An Integrated Approach to Design and Analysis. New Jersey, U. S. A.; Lawrence Erlbaum Associates.
18. Hartmann, E. (1923). Category theory (Kategorienlehre). Edited by Fritz Kern Philosophical Library 72 a/b/c. XXXII. 978-3-7873-2894-9.
19. Husserl, E. (1950). Guiding ideas for a pure phenomenology and a phenomenological philosophy (Ideen zu einer reinen Phänomenologie und phänomenologischen Philosophie). Walter Biemel (Ed.). Husserl Gesammelte Werke. Germany: Kluwer Academic Publishers.
20. Kane, M. (2006). Content-related validity evidence in test development. In S. M. Downing, T. M. Haladyna (Eds.). Handbook of test development. Mahwah, New Jersey, U. S. A.; Lawrence Erlbaum Associates. 131-153.
21. Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika. Oxford, U. K.; Oxford University Press. 30 (1/2): 81-93.
22. Lamata M. T., Pelaez J. I. (2002). A method for improving the consistency of judgments. Int. J. Uncertain. Fuzziness. 10: 667–686.
23. Landis, J. R., Koch, G. G. (1977). An Application of Hierarchical Kappa-type Statistics in the Assessment of Majority Agreement among Multiple Observers. Biometrics. 33 (2): 363-374.
24. Laugier, H., Piéron, H., Toulouse, É., Weinberg, D. (1934). Docimological studies on the improvement of exams and competitions (Etudes docimologiques sur l'amélioration des examens et concours). Paris, France: Conservatoire National des Arts et Métiers.
25. Leclercq, D. (1993). Validity, Reliability, and Acuity of Self-Assessment in Educational Testing. In: Leclercq, D. A., Bruno, J. E. (eds) Item Banking: Interactive Testing and Self-Assessment. NATO ASI Series. 112. Berlin, Germany: Heidelberg. Springer.
26. Legendre, P. (2010). Coefficient of Concordance. Encyclopedia of Research Design. New Jersey: Salkind ed. SAGE Publications Inc. 1: 164-169.
27. Messick, S. (1989). Validity. In R. Linn (Ed.), Educational Measurement. Washington, DC, U. S. A.: American Council on Education and Macmillan: 13-103.
28. Minsky, M., Papert, S. (1972). Progress Report on Artificial Intelligence. AI Memo. 252. MIT Artificial Intelligence Laboratory. Cambridge, Massachusetts, U. S. A.
29. Mislevy, R. J., Almond, R. G., Lukas, J. F. (2003). A brief introduction to evidence-centered design (Research Report 03-16). New Jersey, U. S. A.; Princeton: Educational Testing Service.
30. Mislevy, R. J., Haertel, G. (2006). Implications of evidence-centered design for educational testing. Menlo Park, CA, U. S. A.; SRI International.
31. Mislevy, R. J., Steinberg, L. S., Almond, R. G., Lukas, J. F. (2006). Concepts, terminology, and basic models of evidence-centered design. In D. M. Williamson, I. I. Bejar, R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing. Mahwah, New Jersey, U. S. A.; Erlbaum. 15–48.
32. Muchielli, A. (2002). (coord.). Dictionary of Qualitative Methods in the Humanities and Social Sciences (Dicționar al metodelor calitative în științele umane și sociale). Iasi, Romania: Editura Polirom.
33. Nelsen, R. B. (2001) [1994]. Kendall tau metric. Encyclopedia of Mathematics. Helsinki, Finland: EMS Press.
34. Newell, A., Simon, H. A. (1972). Human Problem Solving. New Jersey, U. S. A.; Prentice-Hall.
35. Rovinelli, R. J., Hambleton, R. K. (1977) On the Use of Content Specialists in the Assessment of Criterion-Referenced Test Item Validity. Tijdschrift Voor Onderwijs Research. 2: 49-60.
36. Rupp, A. A., Gushta, M., Mislevy, R. J., Shaffer, D. W. (2010). Evidence-centered Design of Epistemic Games: Measurement Principles for Complex Learning Environments. JTLA. 8 (4).
37. Sandy, S. (1980). The Hawthorne effect. (1st ed.) Lawrence, Kansas, U. S. A.: Tansy Press.
38. Schank, R. (1972). Conceptual Dependency: A Theory of Natural Language Understanding. Cognitive Psychology. 3: 552-631.
39. Shute, V. J., Masduki, I., Donmez, O. (2010). Conceptual Framework for Modeling, Assessing and Supporting Competencies within Game Environments. Technology Instruction Cognition and Learning. 8 (2): 137–161.
40. Silva, T. (1993). Toward an understanding of the distinct nature of L2 writing: the ESL research and its implications. TESOL Quarterly. 27: 665-677.
41. Stan, A. (2002). The psychological test: Evolution, construction, applications (Testul psihologic. Evoluție, construcție, aplicații). Iași, România: Editura Polirom.
42. Standards for Educational and Psychological Testing. (2013). Washington, DC, U. S. A.: American Educational Research Association.
43. Streefkerk, R. (2022). Internal vs. External Validity. Understanding Differences and Threats. Scribbr. Retrieved April 17, 2023, from
44. Székely, L. (1950). Productive processes in learning and thinking. Acta Psychologica. 7: 388–407.
45. Turner, R. Carlson, L. A. (2003). Indexes of Item-Objective Congruence for Multidimensional Items. International Journal of Testing. 3 (2): 163-171.