Senior psychometrician and AI research scientist bridging classical measurement theory and production AI. I design, validate, and lead large-scale assessment systems and bring psychometric rigour to LLMs and intelligent learning systems.

I am a senior psychometrician and AI research scientist with over fifteen years of experience designing, validating, and leading large-scale assessment systems across national and international contexts. My work spans deep theory (item response theory, computerised adaptive testing, Rasch modelling, fairness analysis) and the leadership of teams and programmes that put it into practice.
Increasingly, that work sits where measurement science meets artificial intelligence: building and validating LLM-assisted assessment workflows, evaluating AI scoring against psychometric standards, and bringing governance and rigour to AI in education and psychology.
I hold six concurrent advisory roles with leading institutions and have built trusted relationships with national test centres, ministries, and development partners across Europe, Central Asia, East Africa, and the Caribbean. I care about assessment that is not only reliable but also fair across languages, cultures, and populations.
Six areas where I bring measurement science, AI/ML, and scientific leadership together.
Designing assessment and scoring systems that infer skills and abilities from rich response data, combining IRT-based modelling with machine-learning pipelines.
LLM-assisted item generation, review, and classification; prompt engineering for psychometric tasks; integrating language models into research workflows via Python APIs.
The full measurement toolkit: IRT, MIRT, Rasch, CAT and multistage testing, DIF and fairness analysis, equating, and standard setting for high-stakes assessment.
Quality assurance and validation of AI/LLM scoring pipelines, bias detection, and critical evaluation of model outputs against measurement and fairness standards.
Leading psychometric teams and coordinating across product, engineering, and research. Translating complex methodology into shared roadmaps and actionable evidence.
Bringing it together: large-scale, reproducible production systems that pair AI/ML and LLMs with psychometric validity, built on R, Python, SQL, and BigQuery.
An unusually broad network of institutional trust across assessment, research, and open science.
Psychometric and methodological advice on cohort and longitudinal research design, measurement quality, and comparability.
Advisory on Secure English Language Tests and item-bank calibration for CEFR-aligned, UKVI-approved high-stakes assessments.
Expert guidance on CEFR-aligned assessment frameworks for a foreign-language education quality programme.
Member of international psychometric advisory teams for national examination bodies across Turkic-speaking states; assessment capacity building and Training-of-Trainers across East Africa.
Contributions to the R open-source ecosystem: issue resolution, documentation, and broadening participation in statistical computing.
National test centres, ministries, and development partners trust me to build assessment capacity and validate high-stakes systems wherever rigorous measurement is needed.
National reference testing, longitudinal studies, and CEFR-aligned competence assessment.
University selection, scale development, and psychometric training for national programmes.
National testing systems, standard setting, and training for psychometricians and item writers.
Foundations for national large-scale assessment and sustainable in-country psychometric capacity.
Psychometric design and implementation for a national student assessment system.
Psychometric strategy and technical reporting for high-stakes national achievement testing.
Hands-on applications for exploring psychometric models, AI-assisted assessment, and data science methods.
A selection from peer-reviewed journals, books, and methodological work in measurement.
Open to Director of Research and scientific leadership roles, advisory partnerships, and consulting on psychometrics, AI assessment governance, and large-scale measurement systems.