Proportion Correct for Sentences — prop_correct_sentence

This function computes the proportion of correct sentence responses per participant. Proportions can either be separated by condition or collapsed across conditions. Each trial must be marked with a unique ID that corresponds to an entry in the answer key.

prop_correct_sentence(
  data,
  responses,
  key,
  key.trial,
  id,
  id.trial,
  cutoff = 0,
  flag = FALSE,
  group.by = NULL,
  token.split = " "
)

Arguments

data: a dataframe of the variables you would like to return. Other variables will be included in the scored output, and in the participant output if they are a one-to-one match with the participant ID.
responses: the name, in quotes, of the column in the dataframe that contains the participant answers for each item (e.g., "column")
key: a vector containing the scoring key, or the name of the data column that contains it. This column does not have to be included in the original dataframe.
key.trial: a vector containing the trial numbers for each answer. Note: If you input long data (i.e., repeating trial-answer pairs), the function takes the unique trial-answer combinations. If a trial number is repeated, you will receive an error. key and key.trial can also come from a separate dataframe, depending on how your output data is formatted.
id: a column name containing participant ID numbers from the original dataframe
id.trial: a column name containing the trial numbers for the participant data from the original dataframe
cutoff: a numeric value that determines the criterion for scoring (e.g., 0 is the strictest, 5 is the most lenient). The scoring criterion uses a Levenshtein distance measure to match participant responses to the answer key.
flag: a logical argument; set to TRUE to flag participant scores that are outliers, based on z-scores relative to the mean score for the group
group.by: an optional argument that can be used to group the output by condition columns. These columns should be in the original dataframe; combine multiple column names with c().
token.split: an optional argument that specifies the character used to separate tokens. The default is a space; punctuation and additional spacing are removed.
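
To make the interaction between cutoff and token.split concrete, here is a minimal sketch in base R. It is not the package's implementation: score_tokens() is a hypothetical helper, and the cleaning and matching rules are assumptions (lowercase, strip punctuation, count a key token as recalled if any response token is within the cutoff, with no limit on reusing a response token). It relies only on utils::adist(), which computes Levenshtein distances.

```r
# Hypothetical helper (not part of the package): score one response against
# one key sentence using token-level Levenshtein distance.
score_tokens <- function(response, key, cutoff = 0, token.split = " ") {
  # Assumed cleaning step: lowercase and drop punctuation
  clean <- function(x) trimws(gsub("[[:punct:]]", "", tolower(x)))

  resp_tokens <- strsplit(clean(response), token.split, fixed = TRUE)[[1]]
  key_tokens  <- strsplit(clean(key), token.split, fixed = TRUE)[[1]]

  # Drop empty tokens left behind by repeated separators
  resp_tokens <- resp_tokens[resp_tokens != ""]
  key_tokens  <- key_tokens[key_tokens != ""]

  # An empty response recalls nothing
  if (length(resp_tokens) == 0) return(0)

  # Edit-distance matrix: rows are key tokens, columns are response tokens
  distances <- utils::adist(key_tokens, resp_tokens)

  # A key token is matched if its closest response token is within the cutoff
  matched <- apply(distances, 1, min) <= cutoff
  mean(matched)
}

score_tokens("this is sentence", "This is a sentence.", cutoff = 1)    # 0.75
score_tokens("this is a sentense", "This is a sentence.", cutoff = 0)  # 0.75
score_tokens("this is a sentense", "This is a sentence.", cutoff = 1)  # 1
```

Under these assumptions, a cutoff of 0 requires an exact token match, while a cutoff of 1 forgives a single-character misspelling such as "sentense". Inspect DF_Scored to confirm how the package itself treated your data.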

Value

DF_Scored: A dataframe of the original response, answer, scoring, and any grouping or other variables. Use this dataframe to check that the cutoff score and scoring matched your answer key as intended. Distance measures are not perfect! Issues and suggestions for improvement are welcome.
DF_Participant: A dataframe of the proportion correct by participant, which also includes optional z-scoring, grouping, and other variables.
DF_Group: A dataframe of the summary scores by any optional grouping variables, along with the overall proportion correct.
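
The z-score columns in DF_Participant can be read as ordinary standardized scores. The sketch below is an illustration in base R, not the package's code: it assumes z = (score - mean) / SD using the sample standard deviation, computed within each condition for the group score and across all participants for the participant score. The proportions are copied from the example output on this page; the threshold the package uses to flag an outlier is not stated here, so none is applied.

```r
# Proportions copied from the DF_Participant example output on this page
participant <- data.frame(
  Sub.ID = 1:6,
  Condition = c("a", "b", "a", "b", "a", "b"),
  Proportion.Correct = c(0.9777778, 0.8020635, 0.8317460,
                         0.6457143, 0.5993651, 0.6457143)
)

# Standardize a vector using the sample standard deviation
z_score <- function(x) (x - mean(x)) / sd(x)

# Assumed meaning of Z.Score.Group: standardized within each condition
participant$Z.Score.Group <- ave(
  participant$Proportion.Correct,
  participant$Condition,
  FUN = z_score
)

# Assumed meaning of Z.Score.Participant: standardized across all participants
participant$Z.Score.Participant <- z_score(participant$Proportion.Correct)

# Group summary comparable to DF_Group (mean, SD, and N per condition)
group_summary <- aggregate(
  Proportion.Correct ~ Condition,
  data = participant,
  FUN = function(x) c(Mean = mean(x), SD = sd(x), N = length(x))
)

participant
group_summary
```

Under these assumptions, the computed values agree with the example output to rounding, which is a quick way to check your reading of the columns against your own data.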

Details

Note: other columns included in the dataframe will be found in the final scored dataset. If these other columns are between-subjects data, they will also be included in the participant dataset (i.e., each participant ID maps to exactly one value in that column).
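
If you are unsure whether a column will be carried into the participant dataset, you can check the one-to-one condition yourself before scoring. The sketch below uses base R only; is_between_subjects() and the toy trials dataframe are hypothetical and are not part of the package.

```r
# Toy long-format data: two participants, two trials each
trials <- data.frame(
  Sub.ID = c(1, 1, 2, 2),
  Condition = c("a", "a", "b", "b"),
  Trial.ID = c(1, 2, 1, 2)
)

# Hypothetical helper: TRUE if every participant ID has exactly one
# distinct value in the given column
is_between_subjects <- function(data, id, column) {
  values_per_id <- tapply(
    data[[column]],
    data[[id]],
    function(x) length(unique(x))
  )
  all(values_per_id == 1)
}

is_between_subjects(trials, "Sub.ID", "Condition")  # TRUE
is_between_subjects(trials, "Sub.ID", "Trial.ID")   # FALSE
```

Here Condition is constant within each participant, so it would qualify as between-subjects data, while Trial.ID varies within participants and would not.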

Examples

# This data contains a sentence recall test with responses and answers
# together. You can use a separate answer key, but this example shows an
# embedded answer key. It also shows how you can use different stimuli
# across participants (i.e., each person sees a randomly selected set of
# trials from a larger set).

data(sentence_data)

scored_output <- prop_correct_sentence(
  data = sentence_data,
  responses = "Response",
  key = "Sentence",
  key.trial = "Trial.ID",
  id = "Sub.ID",
  id.trial = "Trial.ID",
  cutoff = 1,
  flag = TRUE,
  group.by = "Condition",
  token.split = " "
)

head(scored_output$DF_Scored)
#>   Trial.ID Sub.ID            Sentence                Responses Condition
#> 1        1      1 This is a sentence.       this is a sentence         a
#> 2        1      2 This is a sentence.         this is sentence         b
#> 3        1      3 This is a sentence.       this is a sentence         a
#> 4        1      4 This is a sentence.       this is a sentence         b
#> 5        1      5 This is a sentence. this thing is a sentence         a
#> 6        1      6 This is a sentence.       this is a sentence         b
#>               Answer Proportion.Match       Shared.Items Corrected.Items
#> 1 this is a sentence             1.00 this is a sentence            <NA>
#> 2 this is a sentence             0.75   this is sentence            <NA>
#> 3 this is a sentence             1.00 this is a sentence            <NA>
#> 4 this is a sentence             1.00 this is a sentence            <NA>
#> 5 this is a sentence             1.00 this is a sentence            <NA>
#> 6 this is a sentence             1.00 this is a sentence            <NA>
#>   Omitted.Items Extra.Items
#> 1          <NA>        <NA>
#> 2             a        <NA>
#> 3          <NA>        <NA>
#> 4          <NA>        <NA>
#> 5          <NA>       thing
#> 6          <NA>        <NA>

head(scored_output$DF_Participant)
#>   Condition Sub.ID Proportion.Correct Z.Score.Group Z.Score.Participant
#> 1         a      1          0.9777778     0.9160221           1.5637499
#> 2         b      2          0.8020635     1.1547005           0.3553233
#> 3         a      3          0.8317460     0.1508220           0.5594568
#> 4         b      4          0.6457143    -0.5773503          -0.7199254
#> 5         a      5          0.5993651    -1.0668441          -1.0386793
#> 6         b      6          0.6457143    -0.5773503          -0.7199254

head(scored_output$DF_Group)
#>   Condition      Mean         SD N
#> 1         a 0.8029630 0.19084127 3
#> 2         b 0.6978307 0.09026826 3