Item Response Theory and Stan

Item Response Theory and Stan#

In this lab, you will explore item response theory and Bayesian modelling with the Stan language.

Setup#

First, you need to install Stan.

import numpy as np
import pandas as pd
# Colab setup (courtesy of Justin Bois)
#   N.B. This cell may take several minutes to complete (3 mins on the instructor's machine)
import os, sys, subprocess
cmd = "pip install --upgrade iqplot bebi103 arviz cmdstanpy watermark"
process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
import cmdstanpy; cmdstanpy.install_cmdstan()

Installing CmdStan version: 2.33.1
Install directory: /root/.cmdstan
Downloading CmdStan version 2.33.1
Download successful, file: /tmp/tmpuxm6lth4
Extracting distribution

DEBUG:cmdstanpy:cmd: make build -j1
cwd: None

Unpacked download as cmdstan-2.33.1
Building version cmdstan-2.33.1, may take several minutes, depending on your system.

DEBUG:cmdstanpy:cmd: make examples/bernoulli/bernoulli
cwd: None

Test model compilation
Installed cmdstan-2.33.1

True

Next, you need to download the data and Stan template here. Save it to your own Google Drive as in previous labs, and then mount your drive.

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Unzip the files into a folder (you will be able to find this folder if you click the folder icon in your left sidebar):

!unzip -qq '/content/drive/MyDrive/irt4ancm.zip'

The following cell prints a list of all of the segments used in the experiment, so that you can find and listen to the results. All of the audio was extracted from the official YouTube videos of the Eurovision Song Contest finals.

Background#

The data in this lab come from the Eurovision Song Contest edition of the Hooked on Music experiment. You can try the experiment (here)[https://hooked.amsterdammusiclab.nl/experiment/eurovision_2021].

segment_df = pd.read_csv('irt4ancm/segment_list.csv')
segment_df = segment_df.set_index('segment')
segment_df

	song	country	year	artist	title	start_position	segment_type
segment
1	1	Ukraine	2016	Jamala	1944	0.000	i
2	1	Ukraine	2016	Jamala	1944	7.925	v
3	1	Ukraine	2016	Jamala	1944	39.500	c
4	1	Ukraine	2016	Jamala	1944	72.043	v
5	1	Ukraine	2016	Jamala	1944	132.559	b
...	...	...	...	...	...	...	...
433	69	Czechia	2019	Lake Malawi	Friend of a Friend	78.128	v
434	70	Denmark	2019	Leonora	Love Is Forever	61.508	v
435	71	Cyprus	2019	Tamta	Replay	66.212	v
436	73	Slovenia	2019	Zala Kralj & Gašper Šantl	Sebi	70.698	v
437	75	Serbia	2019	Nevena Božović	Kruna	106.544	v

437 rows × 7 columns

Lab#

Open the irt4ancm.stan file in the right-hand pane. You will make any adjstments to your model here.

from cmdstanpy import CmdStanModel

Every time you change the model, you will need to recompile it.

model = CmdStanModel(model_name="irt4ancm", stan_file="irt4ancm/irt4ancm.stan")

DEBUG:cmdstanpy:Removing /content/drive/MyDrive/irt4ancm/irt4ancm
10:12:10 - cmdstanpy - INFO - compiling stan file /content/drive/MyDrive/irt4ancm/irt4ancm.stan to exe file /content/drive/MyDrive/irt4ancm/irt4ancm
INFO:cmdstanpy:compiling stan file /content/drive/MyDrive/irt4ancm/irt4ancm.stan to exe file /content/drive/MyDrive/irt4ancm/irt4ancm
DEBUG:cmdstanpy:cmd: make /content/drive/MyDrive/irt4ancm/irt4ancm
cwd: /root/.cmdstan/cmdstan-2.30.1
DEBUG:cmdstanpy:Console output:

--- Translating Stan model to C++ code ---
bin/stanc  --o=/content/drive/MyDrive/irt4ancm/irt4ancm.hpp /content/drive/MyDrive/irt4ancm/irt4ancm.stan

--- Compiling, linking C++ code ---
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.9 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials    -DBOOST_DISABLE_ASSERTS          -c -Wno-ignored-attributes   -x c++ -o /content/drive/MyDrive/irt4ancm/irt4ancm.o /content/drive/MyDrive/irt4ancm/irt4ancm.hpp
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes      -I stan/lib/stan_math/lib/tbb_2020.3/include    -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.9 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials    -DBOOST_DISABLE_ASSERTS                -Wl,-L,"/root/.cmdstan/cmdstan-2.30.1/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/root/.cmdstan/cmdstan-2.30.1/stan/lib/stan_math/lib/tbb"      /content/drive/MyDrive/irt4ancm/irt4ancm.o src/cmdstan/main.o        -Wl,-L,"/root/.cmdstan/cmdstan-2.30.1/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/root/.cmdstan/cmdstan-2.30.1/stan/lib/stan_math/lib/tbb"   stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_kinsol.a  stan/lib/stan_math/lib/tbb/libtbb.so.2 -o /content/drive/MyDrive/irt4ancm/irt4ancm
rm -f /content/drive/MyDrive/irt4ancm/irt4ancm.o

10:12:24 - cmdstanpy - INFO - compiled model executable: /content/drive/MyDrive/irt4ancm/irt4ancm
INFO:cmdstanpy:compiled model executable: /content/drive/MyDrive/irt4ancm/irt4ancm

We fit the model here using all_plays.json, which contains a complete set of data. You may find it more interesting to explore rec_only.json as an alternative, which contains only plays where the participant claimed to recognise the segment.

fit = model.sample(data="irt4ancm/all_plays.json")

DEBUG:cmdstanpy:cmd: /content/drive/MyDrive/irt4ancm/irt4ancm info
cwd: None
10:14:38 - cmdstanpy - INFO - CmdStan start processing
INFO:cmdstanpy:CmdStan start processing

DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: 1
DEBUG:cmdstanpy:CmdStan args: ['/content/drive/MyDrive/irt4ancm/irt4ancm', 'id=1', 'random', 'seed=3603', 'data', 'file=/content/drive/MyDrive/irt4ancm/all_plays.json', 'output', 'file=/tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_1.csv', 'method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
DEBUG:cmdstanpy:idx 1
DEBUG:cmdstanpy:running CmdStan, num_threads: 1
DEBUG:cmdstanpy:CmdStan args: ['/content/drive/MyDrive/irt4ancm/irt4ancm', 'id=2', 'random', 'seed=3603', 'data', 'file=/content/drive/MyDrive/irt4ancm/all_plays.json', 'output', 'file=/tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_2.csv', 'method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
DEBUG:cmdstanpy:idx 2
DEBUG:cmdstanpy:running CmdStan, num_threads: 1
DEBUG:cmdstanpy:CmdStan args: ['/content/drive/MyDrive/irt4ancm/irt4ancm', 'id=3', 'random', 'seed=3603', 'data', 'file=/content/drive/MyDrive/irt4ancm/all_plays.json', 'output', 'file=/tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_3.csv', 'method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
DEBUG:cmdstanpy:idx 3
DEBUG:cmdstanpy:running CmdStan, num_threads: 1
DEBUG:cmdstanpy:CmdStan args: ['/content/drive/MyDrive/irt4ancm/irt4ancm', 'id=4', 'random', 'seed=3603', 'data', 'file=/content/drive/MyDrive/irt4ancm/all_plays.json', 'output', 'file=/tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_4.csv', 'method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']

10:20:07 - cmdstanpy - INFO - CmdStan done processing.
INFO:cmdstanpy:CmdStan done processing.
DEBUG:cmdstanpy:runset
RunSet: chains=4, chain_ids=[1, 2, 3, 4], num_processes=4
 cmd (chain 1):
	['/content/drive/MyDrive/irt4ancm/irt4ancm', 'id=1', 'random', 'seed=3603', 'data', 'file=/content/drive/MyDrive/irt4ancm/all_plays.json', 'output', 'file=/tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_1.csv', 'method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
 retcodes=[0, 0, 0, 0]
 per-chain output files (showing chain 1 only):
 csv_file:
	/tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_1.csv
 console_msgs (if any):
	/tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_0-stdout.txt
DEBUG:cmdstanpy:Chain 1 console:
method = sample (Default)
  sample
    num_samples = 1000 (Default)
    num_warmup = 1000 (Default)
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 1 (Default)
id = 1 (Default)
data
  file = /content/drive/MyDrive/irt4ancm/all_plays.json
init = 2 (Default)
random
  seed = 3603
output
  file = /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_1.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
num_threads = 1 (Default)


Gradient evaluation took 0.005164 seconds
1000 transitions using 10 leapfrog steps per transition would take 51.64 seconds.
Adjust your expectations accordingly!


Iteration:    1 / 2000 [  0%]  (Warmup)
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/content/drive/MyDrive/irt4ancm/irt4ancm.stan', line 32, column 2 to column 33)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/content/drive/MyDrive/irt4ancm/irt4ancm.stan', line 32, column 2 to column 33)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration:  100 / 2000 [  5%]  (Warmup)
Iteration:  200 / 2000 [ 10%]  (Warmup)
Iteration:  300 / 2000 [ 15%]  (Warmup)
Iteration:  400 / 2000 [ 20%]  (Warmup)
Iteration:  500 / 2000 [ 25%]  (Warmup)
Iteration:  600 / 2000 [ 30%]  (Warmup)
Iteration:  700 / 2000 [ 35%]  (Warmup)
Iteration:  800 / 2000 [ 40%]  (Warmup)
Iteration:  900 / 2000 [ 45%]  (Warmup)
Iteration: 1000 / 2000 [ 50%]  (Warmup)
Iteration: 1001 / 2000 [ 50%]  (Sampling)
Iteration: 1100 / 2000 [ 55%]  (Sampling)
Iteration: 1200 / 2000 [ 60%]  (Sampling)
Iteration: 1300 / 2000 [ 65%]  (Sampling)
Iteration: 1400 / 2000 [ 70%]  (Sampling)
Iteration: 1500 / 2000 [ 75%]  (Sampling)
Iteration: 1600 / 2000 [ 80%]  (Sampling)
Iteration: 1700 / 2000 [ 85%]  (Sampling)
Iteration: 1800 / 2000 [ 90%]  (Sampling)
Iteration: 1900 / 2000 [ 95%]  (Sampling)
Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 96.686 seconds (Warm-up)
               63.451 seconds (Sampling)
               160.137 seconds (Total)


10:20:07 - cmdstanpy - WARNING - Non-fatal error during sampling:
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/content/drive/MyDrive/irt4ancm/irt4ancm.stan', line 32, column 2 to column 33)
	Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/content/drive/MyDrive/irt4ancm/irt4ancm.stan', line 32, column 2 to column 33)
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/content/drive/MyDrive/irt4ancm/irt4ancm.stan', line 33, column 2 to column 33)
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/content/drive/MyDrive/irt4ancm/irt4ancm.stan', line 33, column 2 to column 33)
Consider re-running with show_console=True if the above output is unclear!
WARNING:cmdstanpy:Non-fatal error during sampling:
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/content/drive/MyDrive/irt4ancm/irt4ancm.stan', line 32, column 2 to column 33)
	Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/content/drive/MyDrive/irt4ancm/irt4ancm.stan', line 32, column 2 to column 33)
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/content/drive/MyDrive/irt4ancm/irt4ancm.stan', line 33, column 2 to column 33)
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/content/drive/MyDrive/irt4ancm/irt4ancm.stan', line 33, column 2 to column 33)
Consider re-running with show_console=True if the above output is unclear!

Stan has a handy set of diagnostics that can warn you of any problems with your model fit. For the purposes of this lab, you will probably not have time to fix any problems, but you can report on them in the assignment.

print(fit.diagnose())

DEBUG:cmdstanpy:cmd: /root/.cmdstan/cmdstan-2.30.1/bin/diagnose /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_1.csv /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_2.csv /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_3.csv /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_4.csv
cwd: None

Processing csv files: /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_1.csv, /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_2.csv, /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_3.csv, /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_4.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

If the model is (mostly) problem-free, you can look at a summary of the parameter values. Remember that we get not a specific value but rather a whole distribution on each parameter. Stan reports means, standard error and deviation, and (most popular in the literature) 5%/50%/95% quantiles.

The final three columns are convergence statistics. As a (very) rough rule of thumb, you want N_eff to be above 400 and R_hat to be less than 1.05.

fit.summary()

DEBUG:cmdstanpy:cmd: /root/.cmdstan/cmdstan-2.30.1/bin/stansummary --percentiles= 5,50,95 --sig_figs=6 --csv_filename=/tmp/tmpccgkrh54/stansummary-irt4ancm-ioqyc4k3.csv /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_1.csv /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_2.csv /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_3.csv /tmp/tmpccgkrh54/irt4ancmugp9oyf4/irt4ancm-20220926101439_4.csv
cwd: None

	Mean	MCSE	StdDev	5%	50%	95%	N_Eff	N_Eff/s	R_hat
lp__	-5636.280000	0.809916	25.282800	-5679.260000	-5636.200000	-5595.490000	974.474	3.80931	1.000620
mu	-0.018374	0.013676	1.020980	-1.677350	-0.021882	1.650180	5573.720	21.78820	0.999639
sigma_theta	1.994040	0.002346	0.091265	1.848120	1.991530	2.151640	1513.350	5.91584	1.001110
sigma_delta	0.872888	0.001572	0.049021	0.794273	0.871882	0.956395	972.827	3.80287	1.001640
theta[1]	-2.900060	0.012507	0.799056	-4.322610	-2.838710	-1.689460	4081.970	15.95680	1.000740
...	...	...	...	...	...	...	...	...	...
delta[433]	-0.292742	0.005331	0.355262	-0.878330	-0.300504	0.304796	4440.450	17.35810	0.999877
delta[434]	0.214229	0.005013	0.359587	-0.380022	0.217208	0.800408	5145.810	20.11550	1.000100
delta[435]	-0.021495	0.005051	0.374217	-0.642577	-0.020770	0.606995	5489.670	21.45960	0.999949
delta[436]	0.613609	0.005638	0.380584	-0.008727	0.607133	1.251060	4557.440	17.81540	0.999413
delta[437]	1.059740	0.005813	0.413095	0.392908	1.054800	1.748900	5049.300	19.73820	0.999554

936 rows × 9 columns

Edit irt4ancn.stan to try different models. Ask Ashley for help with the syntax! Handy distributions include:

~ std_normal() for a standard normal (or half-normal) distribution
~ normal(mu, sigma) for a normal distribution with specified mean and standard deviation
~ lognormal(mu, sigma) for a log-normal distribution (handy for discrimination parameters)
~ bernoulli(p) for a Bernoulli distribution parameterised by the probability of success
~ bernoulli_logit(z) for a Bernoulli distribution parameterised by the inverse logistic function of the probability of success

The full 4PL IRT model looks like this:

\(\mathrm{P}[x_{ni} = 1] = \gamma_i + (\zeta_i - \gamma_i) \frac{\mathrm{e}^{\alpha_i(\theta_n - \delta_i)}}{1 + \mathrm{e}^{\alpha_i(\theta_n - \delta_i)}}\)

For the 3PL, \(\zeta\) is fixed to 1.
For the 2PL, \(\zeta\) is fixed to 1 and \(\gamma\) is fixed to 0.
For the 1PL (Rasch model), \(\zeta\) is fixed to 1, \(\gamma\) is fixed to 0, and \(\alpha\) is fixed to 1.

Don’t forget to add priors as you add more parameters!

WARNING: In the 2PL, 3PL, and 4PL, \(\theta\) needs to be distributed as a standard normal distribution and there can be no hyper-parameter \(\sigma_\theta\). Otherwise, the model is not identified, and Stan will run into many problems while sampling.

Assignment#

Explore 1-, 2-, 3- and 4-parameter IRT models for the Hooked on Music data according to the template. Which segments are most difficult? Which are easiest? Most/least discriminating? Are the guessing parameters what you would expect?
Explore an alternative data model (e.g., using rec_only.json or focussing on is_recognised instead of is_verified), again with 1-, 2-, 3- and 4-parameter IRT models. How do your results compare to what you found in Step 1
(Optional) Expand one of your models to include a regression on audio features recognition time, or Goldsmith’s music sophistication to predict difficulty and ability.

Write a short report (250–500 words) summarising your findings and (to the extent you can) any musical explanations or surprising findings based on what you can hear in the songs.

Additional Resources#

The cmdstanpy documentation
The Stan User’s Guide