4 Examples

This is what you are really here for, isn’t it?

I am assuming that you have preprocessed your data to remove any outliers, apply any cutoffs, etc. These functions should be run on the data from which you will then create your scores, i.e. there should not be any further processing to do before you run them. If there is, then these internal consistency reliability estimates will not reflect the reliability of the outcome measurements you actually analyse.

4.1 Questions to ask before running splithalf

These questions should feed into what settings are appropriate for your needs, and are intended to make the splithalf function easy to use.

  1. What is the type of data you have?

Are you interested in response times, or accuracy rates?

Knowing this, you can set outcome = “RT” or outcome = “accuracy”.

  2. How is your outcome score calculated?

Say that your response time based task has two trial types: “incongruent” and “congruent”. When you analyse your data, will you use the average RT in each trial type, or will you create a difference score (or bias) by, for example, subtracting the average RT in congruent trials from the average RT in incongruent trials? The first I call “average” and the second I call “difference”.

  3. Which method would you like to use to estimate (split-half) reliability?

A super common way is to split the data into odd and even trials. Another is to split by the first half and second half of the trials. Both approaches are implemented in the splithalf function. However, I believe that the permutation splithalf approach is the way forward (and it was the reason why this package was developed, so please use it).

4.2 Our example dataset

For this quick example, we will simulate some data. Let’s say we have 60 participants, who each complete a task with two blocks (A and B) of 80 trials. Trials are also evenly distributed between “congruent” and “incongruent” trials. For each trial we have RT data, and we are assuming that participants were accurate in all trials.
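
A minimal sketch of such a simulation is below. The column names (participant_number, block_name, trial_type, RT) are the ones assumed in the code sketches throughout this section, and the RT distribution is arbitrary, so adapt both to your own data.

set.seed(123)

n_participants <- 60  # sample size
n_blocks       <- 2   # blocks A and B
n_trials       <- 80  # trials per block

sim_data <- data.frame(
  participant_number = rep(1:n_participants, each = n_blocks * n_trials),
  block_name         = rep(rep(c("A", "B"), each = n_trials), times = n_participants),
  trial_type         = rep(c("congruent", "incongruent"),
                           times = n_participants * n_blocks * n_trials / 2),
  RT                 = rnorm(n_participants * n_blocks * n_trials, mean = 500, sd = 200)
)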

4.3 Difference scores

This is by far the most common outcome measure I have come across, so let’s start with that.

Our data will be analysed so that we have two ‘bias’ or ‘difference score’ outcomes. So, within each block, we will subtract the average RT in congruent trials from the average RT in incongruent trials. Calculating the final scores for each participant and for each block separately might look a bit like this.
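
Sketched with dplyr and tidyr, and assuming the simulated data and column names from above (this is just one way to do it):

library(dplyr)
library(tidyr)

bias_scores <- sim_data %>%
  group_by(participant_number, block_name, trial_type) %>%
  summarise(RT = mean(RT)) %>%                               # mean RT per participant, block, and trial type
  pivot_wider(names_from = trial_type, values_from = RT) %>% # one column per trial type
  mutate(bias = incongruent - congruent)                     # difference score within each block

The resulting scores look something like this: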

# A tibble: 120 x 5
# Groups:   participant_number, block_name [120]
   participant_num… block_name congruent
              <int> <fct>          <dbl>
 1                1 A               486.
 2                1 B               517.
 3                2 A               445.
 4                2 B               445.
 5                3 A               485.
 6                3 B               469.
 7                4 A               474.
 8                4 B               475.
 9                5 A               522.
10                5 B               510.
# … with 110 more rows, and 2 more variables:
#   incongruent <dbl>, bias <dbl>

OK, let’s see how reliable our A and B outcome scores (“bias”) are.
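
A call along these lines should do it. This is a sketch using the simulated data and column names from above; the var.* arguments simply point splithalf at the relevant columns, and ?splithalf has the full argument list.

library(splithalf)

splithalf(data = sim_data,
          outcome = "RT",                          # question 1: RT-based outcome
          score = "difference",                    # question 2: difference score (bias)
          conditionlist = c("A", "B"),             # one estimate per block
          halftype = "random",                     # question 3: the permutation approach
          permutations = 5000,
          var.RT = "RT",                           # column names from the simulated data above
          var.participant = "participant_number",
          var.condition = "block_name",
          var.compare = "trial_type",              # column holding the two trial types
          compare1 = "congruent",
          compare2 = "incongruent",
          average = "mean")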

  condition  n splithalf 95_low 95_high
1         A 60     -0.13  -0.30    0.06
2         B 60      0.02  -0.15    0.21
  spearmanbrown SB_low SB_high
1         -0.22  -0.46    0.11
2          0.04  -0.26    0.35

4.3.1 Reading and reporting the output

The splithalf output gives estimates separately for each condition defined (if no condition is defined, the function assumes that you have only a single condition, which it will call “all” to represent that all trials were included).

The second column (n) gives the number of participants analysed. If, for some reason, one participant has too few trials to analyse, or did not complete one condition, this will be reflected here. I suggest you compare this n to your expected n to check that everything is running correctly. If the ns don’t match, we have a problem. More likely, the function will throw an error, but it is useful to know.

Next are the estimates; the splithalf column and the associated 95% percentile intervals, and the Spearman-Brown corrected estimate with its own percentile intervals. Unsurprisingly, our simulated random data does not yield internally consistent measurements.

What should I report? Ideally, report everything. I have included 95% percentiles of the estimates to give a picture of the spread of internal consistency estimates. Also included are the Spearman-Brown corrected estimates, which take into account that the estimates are drawn from only half of the trials that they could have been. Negative reliabilities are near uninterpretable, and the Spearman-Brown formula is not useful in this case. For comparability between studies I recommend reporting both the raw and the corrected estimates. Something like the following should be sufficient:

We estimated the internal consistency of bias A and B using a permutation-based splithalf approach (Parsons 2020) with 5000 random splits. The (Spearman-Brown corrected) splithalf internal consistency of bias A was rSB = -0.22, 95% CI [-0.46, 0.11], and of bias B was rSB = 0.04, 95% CI [-0.26, 0.35].

— Me, reporting reliability, just now

Simples. I hope :)

4.4 Average scores

OK, let’s change things up and look at average scores only. In this case, imagine that we have only a single trial type, so we can ignore the trial type options. We will want separate outcome scores for each block of trials, but this time the outcome is simply the average RT in each block. Let’s see how that looks within splithalf. Note that the main difference is that we have omitted the inputs about which trial types to ‘compare’, as this is irrelevant for the current task.
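
Something like this, again a sketch assuming the simulated data and column names from above:

library(splithalf)

splithalf(data = sim_data,
          outcome = "RT",
          score = "average",                       # average RT per block, no difference score
          conditionlist = c("A", "B"),
          halftype = "random",
          permutations = 5000,
          var.RT = "RT",                           # column names from the simulated data above
          var.participant = "participant_number",
          var.condition = "block_name",
          average = "mean")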

Warning in splithalf(data = sim_data, outcome
= "RT", score = "average", : var.trialnum will
soon be depreciated
  condition  n splithalf 95_low 95_high
1         A 60      0.07  -0.10    0.26
2         B 60     -0.06  -0.23    0.13
  spearmanbrown SB_low SB_high
1          0.13  -0.17    0.41
2         -0.11  -0.38    0.23

4.5 Difference-of-difference scores

The difference of differences score is a bit more complex, and perhaps also less common. I programmed this aspect of the package initially because I had seen a few papers that used a change in bias score in their analysis, and I wondered how reliable that is as an individual difference measure. Be warned, difference scores are nearly always less reliable than raw averages, and it’s very likely that differences-of-differences will be the least reliable of the bunch.

So, let’s say the dependent/outcome variable in our task is the difference between the bias observed in block A and the bias observed in block B. So our outcome is calculated something like this.

BiasA = incongruent_A - congruent_A

BiasB = incongruent_B - congruent_B

Outcome = BiasB - BiasA
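
splithalf computes this internally when estimating reliability, but for your own analyses the outcome could be calculated by extending the earlier dplyr sketch (again, assuming the bias_scores object and column names from above):

library(dplyr)
library(tidyr)

change_scores <- bias_scores %>%
  ungroup() %>%
  select(participant_number, block_name, bias) %>%
  pivot_wider(names_from = block_name, values_from = bias) %>% # columns A and B
  mutate(outcome = B - A)                                      # BiasB - BiasA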

In our function, we specify this very similarly to the difference score example. The only change is setting the score to “difference_of_difference” (largely because I could not think of a better name to use). Note that we will keep the condition list consisting of A and B. But, specifying that we are interested in the difference of differences will lead the function to calculate the outcome scores appropriately.
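
The call might look something like this, with the same assumptions about the data and column names as before:

library(splithalf)

splithalf(data = sim_data,
          outcome = "RT",
          score = "difference_of_difference",      # (BiasB - BiasA), as above
          conditionlist = c("A", "B"),
          halftype = "random",
          permutations = 5000,
          var.RT = "RT",                           # column names from the simulated data above
          var.participant = "participant_number",
          var.condition = "block_name",
          var.compare = "trial_type",
          compare1 = "congruent",
          compare2 = "incongruent",
          average = "mean")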

Warning in splithalf(data = sim_data, outcome
= "RT", score = "difference_of_difference", :
var.trialnum will soon be depreciated
     condition  n splithalf 95_low 95_high
1 change score 60     -0.09  -0.27    0.09
  spearmanbrown SB_low SB_high
1         -0.16  -0.42    0.17

I do not foresee the difference_of_difference option being used often, but I will continue to maintain it.

References

Parsons, Sam. 2020. Splithalf: Robust Estimates of Split Half Reliability. https://doi.org/10.6084/m9.figshare.11956746.v4.