Psychometrics · AI · Measurement Science

Measurement science for fair, AI-driven assessment.

Senior psychometrician and AI research scientist bridging classical measurement theory and production AI. I design, validate, and lead large-scale assessment systems and bring psychometric rigour to LLMs and intelligent learning systems.


// based in the United Kingdom · working across 5 continents

IRT Item Explorer

shiny

Discrimination (a) 1.2

How sharply the item separates ability levels

Difficulty (b) 0.0

Ability level at the curve's midpoint, where P(correct) = (1 + c)/2 (exactly 0.5 only when c = 0)

Guessing (c) 0.20

Lower asymptote (pseudo-guessing)

P(θ) = 0.20 + 0.80 / (1 + e^{−1.2(θ − 0.0)})
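The explorer plots the three-parameter logistic (3PL) item characteristic curve, P(θ) = c + (1 − c) / (1 + e^{−a(θ − b)}), which gives the formula above when a = 1.2, b = 0.0 and c = 0.20. Below is a minimal Python sketch of that curve and its Fisher information. It assumes the plain logistic metric (no 1.7 scaling constant), as in the displayed formula. The function names p_3pl and info_3pl are illustrative only. They are not taken from the explorer, which is built in Shiny.

```python
import math


def p_3pl(theta: float, a: float = 1.2, b: float = 0.0, c: float = 0.20) -> float:
    """Probability of a correct response under the 3PL model.

    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))
    Defaults are the explorer's starting slider values. No 1.7 scaling
    constant is applied, so this is the plain logistic metric.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def info_3pl(theta: float, a: float = 1.2, b: float = 0.0, c: float = 0.20) -> float:
    """Fisher information of a 3PL item at ability theta (logistic metric).

    I(theta) = a^2 * (Q / P) * ((P - c) / (1 - c))^2, with Q = 1 - P.
    """
    p = p_3pl(theta, a, b, c)
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


if __name__ == "__main__":
    # Sweep ability to see the lower asymptote, the midpoint at theta = b,
    # and where the item is most informative.
    for theta in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
        print(f"theta={theta:+.1f}  P={p_3pl(theta):.3f}  I={info_3pl(theta):.3f}")
```

At θ = b the sketch gives P = (1 + c)/2. For the default values that is 0.60, not 0.5. When c > 0, information peaks slightly above b, at θ = b + ln((1 + √(1 + 8c))/2)/a. For the default values that is about θ ≈ 0.22.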

About

Turning technical rigour into trusted decisions, at scale.

I am a senior psychometrician and AI research scientist with over fifteen years designing, validating, and leading large-scale assessment systems across national and international contexts. My work spans deep theory (item response theory, computerised adaptive testing, Rasch modelling, fairness analysis) and the leadership of teams and programmes that put it into practice.

Increasingly, that work sits where measurement science meets artificial intelligence: building and validating LLM-assisted assessment workflows, evaluating AI scoring against psychometric standards, and bringing governance and rigour to AI in education and psychology.

I hold five concurrent advisory roles with leading institutions and have built trusted relationships with national test centres, ministries, and development partners across Europe, Central Asia, East Africa, and the Caribbean. I care about assessment that is not only reliable, but fair across languages, cultures, and populations.

Expertise

Where psychometrics meets production AI.

Six areas where I bring measurement science, AI/ML, and scientific leadership together.

AI-Enabled Assessment and Skills Inference

Designing assessment and scoring systems that infer skills and abilities from rich response data, combining IRT-based modelling with machine-learning pipelines.

automated scoring · skills inference · ML pipelines

LLMs and Intelligent Learning Systems

LLM-assisted item generation, review and classification; prompt engineering for psychometric tasks; integrating language models into research workflows via Python APIs.

LLM workflows · item generation · OpenAI / Anthropic

Validation Methodologies and Measurement Science

The full measurement toolkit: IRT, MIRT, Rasch, CAT and multistage testing, DIF and fairness analysis, equating, and standard setting for high-stakes assessment.

IRT / MIRT · CAT / MST · DIF and fairness · equating

Production-Scale AI Systems and Model Governance

Quality assurance and validation of AI/LLM scoring pipelines, bias detection, and critical evaluation of model outputs against measurement and fairness standards.

model governance · bias detection · QA and validation

Scientific Leadership Across Product and Engineering Teams

Leading psychometric teams and coordinating across product, engineering, and research. Translating complex methodology into shared roadmaps and actionable evidence.

team leadership · cross-functional · research strategy

AI/ML, Psychometrics, LLMs and Large-Scale Production Systems

Bringing it together: large-scale reproducible production systems pairing AI/ML and LLMs with psychometric validity, built on R, Python, SQL, and BigQuery.

large-scale data · R, Python, SQL · BigQuery · reproducible

Experience

A career across research, industry and academia.

Dec 2021 to Present · United Kingdom

Senior Psychometrician

National Foundation for Educational Research (NFER)

Lead a team of psychometric professionals, coordinating across interdisciplinary teams to deliver large-scale assessment programmes.
Integrate AI and machine learning techniques to improve score accuracy, detect bias, and ensure fairness across assessments.
Develop and refine psychometric models using IRT, Rasch analysis, and automated testing algorithms; translate complex data into actionable reports.
Lead international capacity-building work across Central Asia, East Africa, and the Caribbean.

Mar 2021 to Nov 2021 · Germany

Research Associate

Leibniz Institute for Educational Trajectories (LIfBi)

Managed preprocessing and transformation of large-scale computer-based survey data for advanced psychometric analysis.
Conducted reliability and validity assessments for competence tests; adapted computerised tests for scaling and standardisation.

Aug 2017 to Mar 2021 · Türkiye

Assistant Professor

Trakya University, Educational Measurement and Evaluation

Taught measurement and evaluation, statistics, and R programming at undergraduate and graduate levels.
Supervised a Computer Adaptive Testing lab focused on personalised assessment and advanced testing research.
Conducted annual psychometric validation of international student selection tests.

2015 to 2016 · United States

Visiting Scholar

UNC Greensboro, Educational Research Methodology · TÜBİTAK Fellow

Conducted pioneering research in Multidimensional IRT for computerised adaptive testing.
Ran large-scale MIRT-CAT simulation studies; analysed differential item functioning in intelligence scales; calibrated item banks with incomplete data.

2013 to 2016 · Türkiye

Psychometrician

Hacettepe University, Monitoring and Evaluation Office

Designed psychometric analyses for clinical outcome assessments and international selection examinations.
Managed item banks for summative medical examinations; produced calibration and reliability reports.

Advisory and Affiliations

Five concurrent advisory roles.

A broad network of trusted institutional relationships across assessment, research, and open science.

Scientific Advisory · 2025

UCL Centre for Longitudinal Studies

Psychometric and methodological advice on cohort and longitudinal research design, measurement quality, and comparability.

Psychometric Advisor · 2023

Trinity College London

Advice on Secure English Language Tests and item-bank calibration for CEFR-aligned, UKVI-approved high-stakes assessments.

Advisor Psychometrician · 2025

Council of Europe

Expert guidance on CEFR-aligned assessment frameworks for a foreign-language education quality programme.

International Advisory · 2023

Turkic States and East Africa

Member of international psychometric advisory teams for national examination bodies across Turkic-speaking states; assessment capacity building and Training-of-Trainers across East Africa.

Contributor · 2025

R Core Development Team

Contributions to the R open-source ecosystem: issue resolution, documentation, and broadening participation in statistical computing.

Global Reach

Measurement work across five continents.

National test centres, ministries, and development partners trust me to build assessment capacity and validate high-stakes systems wherever rigorous measurement is needed.

United Kingdom and Europe

National reference testing, longitudinal studies, and CEFR-aligned competence assessment.

United Kingdom · Germany

Türkiye

University selection, scale development, and psychometric training for national programmes.

Türkiye

Central Asia and Caucasus

National testing systems, standard setting, and training for psychometricians and item writers.

Kazakhstan · Azerbaijan

East Africa

Foundations for national large-scale assessment and sustainable in-country psychometric capacity.

Kenya · Zanzibar

Caribbean

Psychometric design and implementation for a national student assessment system.

Belize

Oceania

Psychometric strategy and technical reporting for high-stakes national achievement testing.

Australasia

Selected Research

Publications and books.

A selection from peer-reviewed journals, books, and methodological work in measurement.

2025 · Book chapter · IGI Global

Learning Theories in Psychology: From Philosophical Roots to Educational Measurement

Özberk, E. H. In Exploring Adult Education Through Learning Theory.

View

2023 · Early Childhood Education Journal

Number Sense Across the Transition from Preschool to Elementary School: A Latent Profile Analysis

Gözüm, Özberk, Kaya and Uyanık Aktulun.

DOI

2021 · Int. J. Assessment Tools in Education

Investigating Invariant Item Ordering in Intelligence Tests: Mokken Scale Analysis of KBIT-2

Özberk, Ünsal Özberk, Uluç and Öktem.

DOI

2019 · Book · PegemA

Data Analysis and Psychometry Applications with R

Atar, Atalay Kabasakal, Ünsal Özberk, Özberk and Kıbrıslıoğlu Uysal.

DOI

2018 · Applied Psychological Measurement

Software Review of flexMIRT Version 3.5

Cavanaugh, Heiser, Hoeve, Özberk, Patton, Sessoms, Stockdale, Ünsal-Özberk and Wood.

DOI

2017 · J. Measurement and Evaluation in Education and Psychology

A Comparison of Multidimensional Item Selection Methods in Simple and Complex Test Designs

Özberk and Gelbal.

DOI

Full publication list on ORCID and Google Scholar.