Title: Functions for Working with Two- and Four-Parameter Beta Probability Distributions and Psychometric Analysis of Classifications
Description: Package providing a number of functions for working with Two- and Four-Parameter Beta and closely related distributions (i.e., the Gamma-Binomial and Beta-Binomial distributions). Includes, among other things:
- d/p/q/r functions for Four-Parameter Beta distributions and Generalized "Binomial" (continuous) distributions, and d/p/r functions for Beta-Binomial distributions.
- d/p/q/r functions for Two- and Four-Parameter Beta distributions parameterized in terms of their means and variances rather than their shape parameters.
- Moment generating functions for Binomial distributions, Beta-Binomial distributions, and observed-value distributions.
- Functions for estimating classification accuracy and consistency, making use of the Classical Test Theory based 'Livingston and Lewis' (L&L) and 'Hanson and Brennan' approaches. A Shiny app providing a GUI for the L&L approach to binary classifications is available; for the URL to the app, see the documentation for the LL.CA() function.
Livingston and Lewis (1995) <doi:10.1111/j.1745-3984.1995.tb00462.x>. Lord (1965) <doi:10.1007/BF02289490>. Hanson (1991) <https://files.eric.ed.gov/fulltext/ED344945.pdf>.
Authors: Haakon Eidem Haakstad
Maintainer: Haakon Eidem Haakstad <[email protected]>
License: CC0
Version: 1.9.0
Built: 2024-11-09 22:47:02 UTC
Source: https://github.com/haakoneidemhaakstad/betafunctions
Calculate the ascending (or rising) factorial of a value x of order r.
afac(x, r, method = "product")
x: A value for which the ascending factorial is to be calculated.
r: The power (order) of the ascending factorial.
method: The method by which the ascending factorials are to be calculated. Default is "product".
The ascending factorial of value x raised to the r'th power.
# To calculate the 4th ascending factorial for a value (e.g., 3.14):
afac(x = 3.14, r = 4)

# To calculate the 5th ascending factorial for values 3.14, 2.72, and 0.58:
afac(x = c(3.14, 2.72, 0.58), r = 5)
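For positive x, the ascending factorial can equivalently be expressed through the Gamma function, since x^(r) = Gamma(x + r) / Gamma(x). A minimal base-R sketch of that identity (an illustration, not necessarily how afac() is implemented internally; afac.gamma is a hypothetical helper name):

# Ascending factorial via the log-Gamma function; assumes x > 0.
afac.gamma <- function(x, r) exp(lgamma(x + r) - lgamma(x))
afac.gamma(3.14, 4) # Should agree with afac(x = 3.14, r = 4)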
Calculates the Alpha value required to produce a Beta probability density distribution with defined moments and parameters. Be advised that not all combinations of moments and parameters can be satisfied (e.g., specifying mean, variance, skewness, and kurtosis uniquely determines both location-parameters, meaning that the value of the lower-location parameter will take on whichever value it must, and cannot be specified).
AMS(mean, variance, l = 0, u = 1, sd = NULL)
mean: The mean (first raw moment) of the target Standard Beta probability density distribution.
variance: The variance (second central moment) of the target Standard Beta probability density distribution.
l: The lower-bound location parameter of the Beta distribution. Default is 0 (as it is for the Standard Beta distribution).
u: The upper-bound location parameter of the Beta distribution. Default is 1 (as it is for the Standard Beta distribution).
sd: Optional alternative to specifying variance: the standard deviation (square root of the variance) of the target distribution.
A numeric value representing the required value for the Alpha shape-parameter in order to produce a Beta probability density distribution with the target mean and variance, given specified lower- and upper bounds of the Beta distribution.
# Generate some fictional data. Say, 100 individuals take a test with a
# maximum score of 100 and a minimum score of 0, rescaled to proportion
# of maximum.
set.seed(1234)
testdata <- rbinom(100, 100, rBeta.4P(100, 0.25, 0.75, 5, 3)) / 100
hist(testdata, xlim = c(0, 1))

# To find the alpha shape-parameter of a Standard (two-parameter) Beta
# distribution with the same mean and variance as the observed-score
# distribution using AMS():
AMS(mean(testdata), var(testdata))
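For reference, the method-of-moments solution has a simple closed form: rescaling the mean and variance to the unit interval via m = (mean - l) / (u - l) and v = variance / (u - l)^2 gives alpha = m(m(1 - m)/v - 1). A minimal sketch under that assumption (ams.sketch is a hypothetical name, not the package's internals):

# Method-of-moments alpha for a Beta distribution on [l, u].
ams.sketch <- function(mean, variance, l = 0, u = 1) {
  m <- (mean - l) / (u - l)  # Mean rescaled to the unit interval.
  v <- variance / (u - l)^2  # Variance rescaled accordingly.
  m * (m * (1 - m) / v - 1)
}
ams.sketch(0.6, 0.04) # Should agree with AMS(mean = 0.6, variance = 0.04)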
Given a vector of false-positive rates and a vector of true-positive rates, calculate the area under the Receiver Operator Characteristic (ROC) curve.
AUC(FPR, TPR)
FPR: Vector of False-Positive Rates.
TPR: Vector of True-Positive Rates.
A value representing the area under the ROC curve.
Script originally retrieved and modified from https://blog.revolutionanalytics.com/2016/11/calculating-auc.html.
# Generate some fictional data. Say, 100 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
testdata <- rbinom(100, 100, rBeta.4P(100, 0.25, 0.75, 5, 3))
hist(testdata, xlim = c(0, 100))

# Suppose the cutoff value for attaining a pass is 50 items correct, and
# that the reliability of this test was estimated to 0.7. To calculate the
# necessary (x, y) coordinates to compute the area under the curve statistic,
# one can use the LL.ROC() function with the argument raw.out = TRUE.
coords <- LL.ROC(x = testdata, reliability = .7, truecut = 50, min = 0,
  max = 100, raw.out = TRUE)

# To calculate and retrieve the Area Under the Curve (AUC) with the AUC()
# function, feed it the raw coordinates calculated above.
AUC(coords[, "FPR"], coords[, "TPR"])
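The AUC of an empirical ROC curve is commonly obtained with the trapezoidal rule over the (FPR, TPR) coordinates. A minimal sketch of that computation (a standard approach, assumed rather than confirmed to match AUC()'s internals):

# Trapezoidal-rule AUC over ROC coordinates.
auc.trapezoid <- function(FPR, TPR) {
  ord <- order(FPR)  # Sort points along the x-axis.
  x <- FPR[ord]
  y <- TPR[ord]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)  # Sum of trapezoid areas.
}
auc.trapezoid(coords[, "FPR"], coords[, "TPR"]) # Compare with AUC() above.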
An implementation of the method of moments estimation of two-parameter Beta distribution parameters. Given a vector of values, calculates the shape parameters required to produce a two-parameter Beta distribution with the same mean and variance (i.e., the first two moments) as the observed-score distribution.
Beta.2p.fit(scores = NULL, mean = NULL, variance = NULL, l = 0, u = 1)
scores: A vector of values to which the two-parameter Beta distribution is to be fitted. The values ought to fall within the [0, 1] interval.
mean: The mean of the target Beta distribution. Alternative to feeding the function raw scores.
variance: The variance of the target Beta distribution. Alternative to feeding the function raw scores.
l: Optional specification of a lower-bound parameter of the Beta distribution. Default is 0 (i.e., the lower-bound of the Standard two-parameter Beta distribution).
u: Optional specification of an upper-bound parameter of the Beta distribution. Default is 1 (i.e., the upper-bound of the Standard two-parameter Beta distribution).
A list of parameter-values required to produce a Standard two-parameter Beta distribution with the same first two moments as the observed distribution.
# Generate some fictional data. Say, 100 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
testdata <- rbinom(100, 100, rBeta.4P(100, 0.25, 0.75, 5, 3)) / 100
hist(testdata, xlim = c(0, 1), freq = FALSE)

# To fit and retrieve the parameters for a two-parameter Beta distribution
# to the observed-score distribution using Beta.2p.fit():
(params.2p <- Beta.2p.fit(testdata))
curve(dbeta(x, params.2p$alpha, params.2p$beta), add = TRUE)
An implementation of the method of moments estimation of four-parameter Beta distribution parameters presented by Hanson (1991). Given a vector of values, calculates the shape- and location parameters required to produce a four-parameter Beta distribution with the same mean, variance, skewness and kurtosis (i.e., the first four moments) as the observed-score distribution.
Beta.4p.fit(scores, mean = NULL, variance = NULL, skewness = NULL, kurtosis = NULL)
scores: A vector of values to which the four-parameter Beta distribution is to be fitted.
mean: If scores are not supplied: specification of the mean for the target four-parameter Beta distribution.
variance: If scores are not supplied: specification of the variance for the target four-parameter Beta distribution.
skewness: If scores are not supplied: specification of the skewness for the target four-parameter Beta distribution.
kurtosis: If scores are not supplied: specification of the kurtosis for the target four-parameter Beta distribution.
A list of parameter-values required to produce a four-parameter Beta distribution with the same first four moments as the observed distribution.
Hanson, Bradley A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing Research Report Series.
Lord, Frederic M. (1965). A Strong True-Score Theory, With Applications. Psychometrika, 30(3).
# Generate some fictional data. Say, 100 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
testdata <- rbinom(100, 100, rBeta.4P(100, 0.25, 0.75, 5, 3))
hist(testdata, xlim = c(0, 100), freq = FALSE)

# To fit and retrieve the parameters for a four-parameter Beta distribution
# to the observed-score distribution using Beta.4p.fit():
(params.4p <- Beta.4p.fit(testdata))
curve(dBeta.4P(x, params.4p$l, params.4p$u, params.4p$alpha, params.4p$beta),
  add = TRUE)
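A quick sanity check on the fit is to compare the first raw moment of the fitted distribution against the sample mean, for instance via the package's betamoments() function (this check is my own suggestion, reusing the params.4p object from the example above):

# First raw moment (mean) of the fitted four-parameter Beta distribution:
betamoments(alpha = params.4p$alpha, beta = params.4p$beta,
  l = params.4p$l, u = params.4p$u, types = "raw", orders = 1)
mean(testdata) # Should be close to the first raw moment reported above.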
Plotting tool, producing a two-column matrix with values of y corresponding to locations on x. Useful for shading areas under the curve when tracing the line for the Beta cumulative probability functions.
Beta.gfx.poly.cdf(from, to, by, alpha, beta, l = 0, u = 1)
from: The point of the x-axis from where y-values are to be produced.
to: The point of the x-axis to where y-values are to be produced.
by: The resolution (or spacing) at which to produce y-values.
alpha: The alpha shape-parameter value for the Standard Beta cumulative probability distribution.
beta: The beta shape-parameter for the Standard Beta cumulative probability distribution.
l: The lower-bound location parameter of the Beta distribution.
u: The upper-bound location parameter of the Beta distribution.
A two-column matrix with cumulative probability-values of y to plot against corresponding location values of x.
# To box in an area under a four-parameter Beta cumulative distribution with
# location parameters l = 0.25 and u = 0.75, and shape parameters
# alpha = 5 and beta = 3, from 0.4 to 0.6:
plot(NULL, xlim = c(0, 1), ylim = c(0, 1))
coords <- Beta.gfx.poly.cdf(from = 0.4, to = 0.6, by = 0.001, alpha = 5,
  beta = 3, l = 0.25, u = 0.75)
polygon(coords)
Plotting tool, producing a two-column matrix with values of y corresponding to locations on x. Useful for shading areas under the curve when tracing the line for the Beta probability density functions.
Beta.gfx.poly.pdf(from, to, by, alpha, beta, l = 0, u = 1)
from: The point of the x-axis from where y-density values are to be produced.
to: The point of the x-axis to where y-density values are to be produced.
by: The resolution (or spacing) at which to produce y-density values.
alpha: The alpha (first) shape-parameter value for the Standard Beta probability density distribution.
beta: The beta (second) shape-parameter for the Standard Beta probability density distribution.
l: The lower-bound location parameter of the Beta distribution.
u: The upper-bound location parameter of the Beta distribution.
A two-column matrix with density-values of y to plot against corresponding location values of x.
# To box in an area under a four-parameter Beta distribution with location
# parameters l = .25 and u = .75, and shape parameters alpha = 5 and
# beta = 3, from 0.4 to 0.6:
plot(NULL, xlim = c(0, 1), ylim = c(0, 7))
coords <- Beta.gfx.poly.pdf(from = 0.4, to = 0.6, by = 0.001, alpha = 5,
  beta = 3, l = 0.25, u = 0.75)
polygon(coords)
Plotting tool, producing a two-column matrix with values of y corresponding to locations on x. Useful for shading areas under the curve when tracing the line for the Beta probability quantile functions.
Beta.gfx.poly.qdf(from, to, by, alpha, beta, l = 0, u = 1)
from: The point of the x-axis from where y-quantile values are to be produced.
to: The point of the x-axis to where y-quantile values are to be produced.
by: The resolution (or spacing) at which to produce y-quantile values.
alpha: The alpha shape-parameter value for the Standard Beta probability distribution.
beta: The beta shape-parameter for the Standard Beta probability distribution.
l: The lower-bound location parameter of the Beta distribution.
u: The upper-bound location parameter of the Beta distribution.
A two-column matrix with quantile-values of y to plot against corresponding location values of x.
# To box in an area under a four-parameter Beta quantile distribution with
# location parameters l = .25 and u = .75, and shape parameters alpha = 5
# and beta = 3, from .4 to .6:
plot(NULL, xlim = c(0, 1), ylim = c(0, 1))
coords <- Beta.gfx.poly.qdf(from = 0.4, to = 0.6, by = 0.001, alpha = 5,
  beta = 3, l = 0.25, u = 0.75)
polygon(coords)
Estimator for the Beta true-score distribution shape-parameters from the observed-score distribution and Livingston and Lewis' effective test length. Returns a list with entries representing the lower and upper location parameters (l and u), the shape parameters (alpha and beta) of the four-parameter Beta distribution, and the effective test length.
Beta.tp.fit(x, min, max, etl = NULL, reliability = NULL, true.model = "4P", failsafe = FALSE, l = 0, u = 1, output = "parameters")
x: Vector of observed-scores.
min: The minimum possible score to attain on the test.
max: The maximum possible score to attain on the test.
etl: The value of Livingston and Lewis' effective test length. See the ETL() function.
reliability: Optional specification of the test-score reliability coefficient. If specified, overrides the input of the etl argument.
true.model: The type of Beta distribution which is to be fit to the moments of the true-score distribution. Options are "4P" (default) and "2P".
failsafe: Logical. Whether to revert to a fail-safe two-parameter solution should the four-parameter solution contain invalid parameter estimates.
l: If true.model = "2P" or failsafe = TRUE: the lower-bound location parameter of the two-parameter solution. Default is 0.
u: If true.model = "2P" or failsafe = TRUE: the upper-bound location parameter of the two-parameter solution. Default is 1.
output: Option to specify true-score distribution moments as output if the value of the output argument does not equal "parameters".
A list with the parameter values of a four-parameter Beta distribution. "l" is the lower location-parameter, "u" the upper location-parameter, "alpha" the first shape-parameter, "beta" the second shape-parameter, and "etl" the effective test length.
Hanson, B. A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing Research Report Series. Retrieved from https://files.eric.ed.gov/fulltext/ED344945.pdf
Lord, F. M. (1965). A strong true-score theory, with applications. Psychometrika, 30(3), 239–270. doi: 10.1007/BF02289490
Rogosa, D. & Finkelman, M. (2004). How Accurate Are the STAR Scores for Individual Students? An Interpretive Guide. Retrieved from http://statweb.stanford.edu/~rag/accguide/guide04.pdf
# Generate some fictional data. Say 1000 individuals take a 100-item test
# where all items are equally difficult, and the true-score distribution
# is a four-parameter Beta distribution with location parameters l = 0.25,
# u = 0.75, alpha = 5, and beta = 3:
set.seed(12)
testdata <- rbinom(1000, 100, rBeta.4P(1000, 0.25, 0.75, 5, 3))

# Since this test contains items which are all equally difficult, the true
# effective test length (etl) is the actual test length. I.e., etl = 100.
# To estimate the four-parameter Beta distribution parameters underlying
# the draws from the binomial distribution:
Beta.tp.fit(testdata, 0, 100, 100)

# Imagine a case where the fitting procedure produces an impermissible
# estimate (e.g., l < 0 or u > 1).
set.seed(1234)
testdata <- rbinom(1000, 50, rBeta.4P(1000, 0.25, 0.75, 5, 3))
Beta.tp.fit(testdata, 0, 50, 50)

# This example produced an l-value estimate less than 0. One way of
# dealing with such an occurrence is to revert to a two-parameter
# model, specifying the l and u parameters and estimating the
# alpha and beta parameters necessary to produce a Beta distribution
# with the same mean and variance as the estimated true-score distribution.
# Suppose you have good theoretical reasons to fix the l parameter at a
# value of 0.25 (e.g., the test is composed of multiple-choice questions
# with four response-options, resulting in a 25% chance of guessing the
# correct answer). The l-parameter could be specified to this theoretically
# justified value, and the u-parameter could be specified to be equal to the
# estimate above (u = 0.7256552) as such:
Beta.tp.fit(testdata, 0, 50, 50, true.model = "2P", l = 0.25, u = 0.7256552)
Computes Raw, Central, or Standardized moment properties of defined Beta-Binomial probability mass functions.
betabinomialmoments(N, l, u, alpha, beta, types = c("raw", "central", "standardized"), orders = 4)
N: Number of trials.
l: The first (lower) location-parameter of the Beta distribution.
u: The second (upper) location-parameter of the Beta distribution.
alpha: The alpha (first) shape-parameter of the Beta distribution.
beta: The beta (second) shape-parameter of the Beta distribution.
types: A character vector determining which moment-types are to be calculated. Permissible values are "raw", "central", and "standardized".
orders: The number of moment-orders to be calculated for each of the moment-types.
A list of moment types, each a list of moment orders.
Hanson, B. A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing Research Report Series.
# Assume 100 observations of a discrete variable with probabilities of
# positive outcomes adhering to a four-parameter Beta distribution with
# location parameters l = 0.25 and u = 0.95, and shape parameters alpha = 5
# and beta = 3. To compute the first four raw, central, and standardized
# moments of this distribution using betabinomialmoments():
betabinomialmoments(N = 100, l = .25, u = .95, alpha = 5, beta = 3,
  types = c("raw", "central", "standardized"), orders = 4)
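For reference, the first raw moment (mean) of this Beta-Binomial distribution has the closed form E[X] = N * E[p], with E[p] = l + (u - l) * alpha / (alpha + beta). A quick hand check against the example above (my own arithmetic, not package code):

# Expected first raw moment of the Beta-Binomial example above:
N <- 100; l <- 0.25; u <- 0.95; alpha <- 5; beta <- 3
N * (l + (u - l) * alpha / (alpha + beta)) # 68.75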
Computes the median of a Beta distribution with specified shape- and location parameters.
betamedian(alpha, beta, l = 0, u = 1)
alpha: The alpha shape parameter.
beta: The beta shape parameter.
l: The first (lower) location parameter. Default set to 0.
u: The second (upper) location parameter. Default set to 1.
# To calculate the median of a two-parameter (standard) Beta distribution
# with shape parameters alpha = 5 and beta = 3:
betamedian(alpha = 5, beta = 3)

# To calculate the median of a four-parameter Beta distribution with shape
# parameters alpha = 5 and beta = 3, and location parameters l = 25 and
# u = 150:
betamedian(alpha = 5, beta = 3, l = 25, u = 150)
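Since the median is simply the 0.5 quantile, the result can be cross-checked with base R's qbeta() after rescaling from the unit interval to [l, u] (a sketch of the mathematical relationship, not necessarily betamedian()'s implementation):

# Median of a four-parameter Beta as the rescaled standard-Beta 0.5 quantile:
l <- 25; u <- 150
l + (u - l) * qbeta(0.5, 5, 3) # Compare with betamedian(5, 3, l = 25, u = 150)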
Computes the mode of a Beta distribution with specified shape- and location parameters.
betamode(alpha, beta, l = 0, u = 1)
alpha: The alpha shape parameter of the Probability Density Distribution.
beta: The beta shape parameter of the Probability Density Distribution.
l: The first (lower) location parameter of a four-parameter distribution. Default set to 0.
u: The second (upper) location parameter of a four-parameter distribution. Default set to 1.
# To calculate the mode of a two-parameter (standard) Beta distribution with
# shape parameters alpha = 5 and beta = 3:
betamode(alpha = 5, beta = 3)

# To calculate the mode of a four-parameter Beta distribution with shape
# parameters alpha = 5 and beta = 3, and location parameters l = 25 and
# u = 150:
betamode(alpha = 5, beta = 3, l = 25, u = 150)
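For alpha > 1 and beta > 1 the mode has the closed form l + (u - l)(alpha - 1)/(alpha + beta - 2). A minimal sketch of that formula (betamode.sketch is a hypothetical name):

# Mode of a four-parameter Beta distribution, assuming alpha, beta > 1.
betamode.sketch <- function(alpha, beta, l = 0, u = 1) {
  l + (u - l) * (alpha - 1) / (alpha + beta - 2)
}
betamode.sketch(5, 3, 25, 150) # Compare with betamode(5, 3, l = 25, u = 150)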
Computes Raw, Central, or Standardized moment properties of defined Beta probability density distributions.
betamoments(alpha, beta, l = 0, u = 1, types = c("raw", "central", "standardized"), orders = 4)
alpha: The alpha shape parameter.
beta: The beta shape parameter.
l: The first (lower) location parameter.
u: The second (upper) location parameter.
types: A character vector determining which moment-types are to be calculated. Permissible values are "raw", "central", and "standardized".
orders: The number of moment-orders to be calculated for each of the moment-types.
A list of moment types, each a list of moment orders.
Hanson, B. A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing Research Report Series.
# Assume some variable follows a four-parameter Beta distribution with
# location parameters l = 0.25 and u = 0.75, and shape parameters alpha = 5
# and beta = 3. To compute the first four raw, central, and standardized
# moments of this distribution using betamoments():
betamoments(alpha = 5, beta = 3, l = 0.25, u = 0.75,
  types = c("raw", "central", "standardized"), orders = 4)
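For the standard Beta distribution (l = 0, u = 1), the raw moments follow a simple product formula, E[X^k] = prod over i = 0..(k - 1) of (alpha + i)/(alpha + beta + i), which offers a handy cross-check of the "raw" output (a textbook identity, not package code):

# k'th raw moment of a Standard Beta(alpha, beta) distribution:
beta.raw.moment <- function(alpha, beta, k) {
  prod((alpha + 0:(k - 1)) / (alpha + beta + 0:(k - 1)))
}
beta.raw.moment(5, 3, 1) # Mean: 5 / (5 + 3) = 0.625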
Computes Raw, Central, or Standardized moment properties of defined Binomial probability mass functions.
binomialmoments(n, p, types = c("raw", "central", "standardized"), orders = 4)
n: Number of Binomial trials.
p: Probability of success per trial.
types: A character vector determining which moment-types are to be calculated. Permissible values are "raw", "central", and "standardized".
orders: The number of moment-orders to be calculated for each of the moment-types.
A list of moment types, each a list of moment orders.
# Assume some variable follows a Binomial distribution with number of trials
# equal to 100 and a probability of success on each trial of 0.75. To compute
# the first four raw, central, and standardized moments of this distribution
# using binomialmoments():
binomialmoments(n = 100, p = 0.75,
  types = c("raw", "central", "standardized"), orders = 4)

# To only compute the (e.g.) standardized moments:
binomialmoments(n = 100, p = 0.75, types = "standardized")

# To compute moments beyond the fourth order (e.g., the sixth):
binomialmoments(n = 100, p = 0.75, orders = 6)
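The first four moments of a Binomial distribution have well-known closed forms, which can serve as a cross-check of the output above (textbook formulas, computed by hand here):

# Closed-form Binomial moments for n = 100, p = 0.75:
n <- 100; p <- 0.75; q <- 1 - p
c(mean = n * p,                                 # First raw moment.
  variance = n * p * q,                         # Second central moment.
  skewness = (q - p) / sqrt(n * p * q),         # Third standardized moment.
  kurtosis = 3 + (1 - 6 * p * q) / (n * p * q)) # Fourth standardized moment.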
Calculates the Beta value required to produce a Beta probability density distribution with defined moments and parameters. Be advised that not all combinations of moments and parameters can be satisfied (e.g., specifying mean, variance, skewness, and kurtosis uniquely determines both location-parameters, meaning that the value of the lower-location parameter will take on whichever value it must, and cannot be specified).
BMS(mean, variance, l = 0, u = 1, sd = NULL)
mean: The mean (first raw moment) of the target Standard Beta probability density distribution.
variance: The variance (second central moment) of the target Standard Beta probability density distribution.
l: The lower-bound location parameter of the Beta distribution. Default is 0 (as it is for the Standard Beta distribution).
u: The upper-bound location parameter of the Beta distribution. Default is 1 (as it is for the Standard Beta distribution).
sd: Optional alternative to specifying variance: the standard deviation (square root of the variance) of the target distribution.
A numeric value representing the required value for the Beta shape-parameter in order to produce a Standard Beta probability density distribution with the target mean and variance, given specified lower- and upper bounds of the Beta distribution.
# Generate some fictional data. Say, 100 individuals take a test with a
# maximum score of 100 and a minimum score of 0, rescaled to proportion
# of maximum.
set.seed(1234)
testdata <- rbinom(100, 100, rBeta.4P(100, 0.25, 0.75, 5, 3)) / 100
hist(testdata, xlim = c(0, 1))

# To find the beta shape-parameter of a Standard (two-parameter) Beta
# distribution with the same mean and variance as the observed-score
# distribution using BMS():
BMS(mean(testdata), var(testdata))

# To find the beta shape-parameter of a four-parameter Beta
# distribution with specified lower- and upper-bounds of l = 0.25 and
# u = 0.75 using BMS():
BMS(mean(testdata), var(testdata), 0.25, 0.75)
Provides a set of statistics often used for conveying information regarding the certainty of classifications based on tests.
caStats(tp, tn, fp, fn)
tp: The frequency or rate of true-positive classifications.
tn: The frequency or rate of true-negative classifications.
fp: The frequency or rate of false-positive classifications.
fn: The frequency or rate of false-negative classifications.
A list of diagnostic performance statistics based on true/false positive/negative statistics. Specifically, the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), Youden's J (Youden.J), and Accuracy.
# Generate some fictional data. Say, 1000 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
testdata <- rbinom(1000, 100, rBeta.4P(1000, 0.25, 0.75, 5, 3))
hist(testdata, xlim = c(0, 100))

# Suppose the cutoff value for attaining a pass is 50 items correct, and
# that the reliability of this test was estimated to 0.7. First, compute the
# estimated confusion matrix using LL.CA():
cmat <- LL.CA(x = testdata, reliability = 0.7, cut = 50, min = 0,
  max = 100)$confusionmatrix

# To estimate and retrieve diagnostic performance statistics using caStats(),
# feed it the appropriate entries of the confusion matrix.
caStats(tp = cmat["True", "Positive"], tn = cmat["True", "Negative"],
  fp = cmat["False", "Positive"], fn = cmat["False", "Negative"])
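The reported statistics follow the standard confusion-matrix definitions; a minimal sketch (the conventional formulas, assumed rather than confirmed to mirror caStats()'s internals exactly):

# Standard diagnostic statistics from a 2 x 2 confusion matrix.
ca.sketch <- function(tp, tn, fp, fn) {
  sens <- tp / (tp + fn)  # Sensitivity: true-positive rate.
  spec <- tn / (tn + fp)  # Specificity: true-negative rate.
  list(Sensitivity = sens, Specificity = spec,
       PPV = tp / (tp + fp), NPV = tn / (tn + fn),
       Youden.J = sens + spec - 1,
       Accuracy = (tp + tn) / (tp + tn + fp + fn))
}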
Calculates Cronbach's Alpha reliability coefficient of the sum-score.
cba(x)
x: A data-frame or matrix of numerical values where rows represent respondents, and columns represent items.
Cronbach's Alpha for the sum-score of supplied variables.
Missing values are treated by passing na.rm = TRUE to the var() function call.
Be aware that this function does not issue a warning if there are negative correlations between variables in the supplied data-set.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika 16, 297–334. doi: 10.1007/BF02310555
# Generate some fictional data. Say 100 students take a 50-item long test
# where all items are equally difficult.
set.seed(1234)
p.success <- rBeta.4P(100, 0.25, 0.75, 5, 3)
rawdata <- matrix(nrow = 100, ncol = 50)
for (i in 1:50) {
  rawdata[, i] <- rbinom(100, 1, p.success)
}

# To calculate Cronbach's Alpha for this test:
cba(rawdata)
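Cronbach's Alpha is defined as (k / (k - 1)) * (1 - sum of item variances / variance of the sum-score) for k items. A minimal sketch for complete data (cba.sketch is a hypothetical name; it omits the na.rm handling mentioned above):

# Cronbach's Alpha from a respondents-by-items matrix, assuming no missing data.
cba.sketch <- function(x) {
  k <- ncol(x)
  (k / (k - 1)) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}
cba.sketch(rawdata) # Should agree with cba(rawdata)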
Provides a set of statistics often used for conveying information regarding the consistency of classifications based on tests.
ccStats(ii, ij, ji, jj)
ii: The frequency or rate of consistent classifications into category "i".
ij: The frequency or rate of inconsistent classifications into categories "i" and "j".
ji: The frequency or rate of inconsistent classifications into categories "j" and "i".
jj: The frequency or rate of consistent classifications into category "j".
A list of classification consistency statistics. Specifically, the coefficient of consistent classification (p), the coefficient of consistent classification by chance (p_c), the proportion of positive classifications due to chance (p_c_pos), the proportion of negative classifications due to chance (p_c_neg), and Cohen's Kappa coefficient.
Hanson, Bradley A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing.
# Generate some fictional data. Say, 1000 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
testdata <- rbinom(1000, 100, rBeta.4P(1000, .25, .75, 5, 3))
hist(testdata, xlim = c(0, 100))

# Suppose the cutoff value for attaining a pass is 50 items correct, and
# that the reliability of this test was estimated to 0.7. First, compute the
# estimated consistency matrix using LL.CA():
cmat <- LL.CA(x = testdata, reliability = .7, cut = 50, min = 0,
  max = 100)$consistencymatrix

# To estimate and retrieve consistency statistics using ccStats(),
# feed it the appropriate entries of the consistency matrix.
ccStats(ii = cmat["i", "i"], ij = cmat["i", "j"],
  ji = cmat["j", "i"], jj = cmat["j", "j"])
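The headline statistics follow the usual 2 x 2 agreement formulas: p is the sum of the consistent cells, chance consistency p_c is the sum of products of matching marginals, and Kappa = (p - p_c) / (1 - p_c). A minimal sketch (conventional formulas; the p_c_pos and p_c_neg decomposition reported by ccStats() is omitted here):

# Agreement statistics from a 2 x 2 consistency matrix.
cc.sketch <- function(ii, ij, ji, jj) {
  tot <- ii + ij + ji + jj  # Normalize frequencies to rates.
  ii <- ii / tot; ij <- ij / tot; ji <- ji / tot; jj <- jj / tot
  p <- ii + jj                                          # Consistent classification.
  p.c <- (ii + ij) * (ii + ji) + (ji + jj) * (ij + jj)  # Chance agreement.
  list(p = p, p_c = p.c, Kappa = (p - p.c) / (1 - p.c))
}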
Organizes supplied values of true and false positives and negatives into a confusion matrix.
confmat(tp, tn, fp, fn, output = "freq")
tp: The frequency or rate of true-positive classifications.
tn: The frequency or rate of true-negative classifications.
fp: The frequency or rate of false-positive classifications.
fn: The frequency or rate of false-negative classifications.
output: Whether the returned output reflects frequencies or proportions. Defaults to returning frequencies.
A confusion matrix organizing the input values of true and false positive and negatives.
# Generate some true and observed conditions.
set.seed(1234)
true.ability <- rbeta(50, 4, 4)
true.category <- ifelse(true.ability < 0.5, 0, 1)
observed.score <- rbinom(50, 50, true.ability)
observed.category <- ifelse(observed.score < 25, 0, 1)

# Calculate the frequencies of true and false positives and negatives based
# on the true and observed conditions.
TP <- sum(ifelse(observed.category == 0 & true.category == 0, 1, 0))
FP <- sum(ifelse(observed.category == 0 & true.category != 0, 1, 0))
TN <- sum(ifelse(observed.category == 1 & true.category == 1, 1, 0))
FN <- sum(ifelse(observed.category == 1 & true.category != 1, 1, 0))

# Organize the above values in a confusion matrix using the confmat function:
confmat(tp = TP, fp = FP, tn = TN, fn = FN)
Gives the density at desired values of x under the Four-Parameter Beta Probability Density Distribution.
dBeta.4P(x, l, u, alpha, beta)
x: Value of x for which the density is to be returned.
l: The first (lower) location parameter.
u: The second (upper) location parameter.
alpha: The first shape parameter.
beta: The second shape parameter.
The value for the probability density at specified values of x.
# Assume some variable follows a four-parameter Beta distribution with
# location parameters l = 0.25 and u = 0.75, and shape parameters alpha = 5
# and beta = 3. To compute the probability density at a specific point of
# the distribution (e.g., 0.5) using dBeta.4P():
dBeta.4P(x = 0.5, l = 0.25, u = 0.75, alpha = 5, beta = 3)
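The four-parameter Beta density is a location-scale transform of the standard Beta density, so the result can be cross-checked against base R's dbeta() (a mathematical identity, shown as a sketch):

# Four-parameter Beta density via the rescaled standard Beta density:
dbeta((0.5 - 0.25) / (0.75 - 0.25), 5, 3) / (0.75 - 0.25)
# Should agree with dBeta.4P(x = 0.5, l = 0.25, u = 0.75, alpha = 5, beta = 3)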
The Beta Compound Beta distribution: The product of the four-parameter Beta probability density function and the Beta cumulative probability function. Used in the Livingston and Lewis approach to classification accuracy and consistency, the output can be interpreted as the population density of passing scores produced at "x" (a value of true-score).
dBeta.pBeta(x, l, u, alpha, beta, n, c, lower.tail = FALSE)
x: x-axis input (value of true-score) for which the population density of passing (or failing) scores is to be calculated.
l: The lower-bound of the four-parameter Beta distribution.
u: The upper-bound of the four-parameter Beta distribution.
alpha: The alpha shape-parameter of the Beta density distribution.
beta: The beta shape-parameter of the Beta density distribution.
n: The number of trials for the Beta cumulative probability distribution.
c: The "true-cut" (proportion) on the Beta cumulative probability distribution.
lower.tail: Logical. Whether to compute the lower or upper tail of the Beta cumulative probability distribution. Default is FALSE.
Hanson, Bradley A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing Research Report Series.
Livingston, Samuel A. and Lewis, Charles. (1995). Estimating the Consistency and Accuracy of Classifications Based on Test Scores. Journal of Educational Measurement, 32(2).
Lord, Frederic M. (1965). A Strong True-Score Theory, With Applications. Psychometrika, 30(3).
# Given a four-parameter Beta distribution with parameters l = 0.25,
# u = 0.75, alpha = 5, and beta = 3, and a Beta error distribution with
# number of trials (n) = 10 and a cutoff-point (c) at 50% correct (i.e.,
# proportion correct of 0.5), the population density of passing scores
# produced at true-score (x) = 0.5 can be calculated as:
dBeta.pBeta(x = 0.5, l = 0.25, u = 0.75, alpha = 5, beta = 3, n = 10, c = 0.5)

# Conversely, the density of failing scores produced at x can be calculated
# by passing the additional argument "lower.tail = TRUE" to the function.
# That is:
dBeta.pBeta(x = 0.5, l = 0.25, u = 0.75, alpha = 5, beta = 3, n = 10,
  c = 0.5, lower.tail = TRUE)

# By integration, the population proportion of (e.g.) passing scores in some
# region of the true-score distribution (e.g., between 0.25 and 0.5) can be
# calculated as:
integrate(function(x) { dBeta.pBeta(x, 0.25, 0.75, 5, 3, 10, 0.5) },
  lower = 0.25, upper = 0.5)
The Beta Compound Binomial distribution: The product of the four-parameter Beta probability density function and the binomial cumulative probability mass function. Used in the Livingston and Lewis approach to classification accuracy and consistency, the output can be interpreted as the population density of passing scores produced at "x" (a value of true-score).
dBeta.pBinom(x, l, u, alpha, beta, n, c, lower.tail = FALSE)
x: x-axis input (value of true-score) for which the population density of passing (or failing) scores is to be calculated.
l: The lower-bound of the four-parameter Beta distribution.
u: The upper-bound of the four-parameter Beta distribution.
alpha: The alpha shape-parameter of the Beta distribution.
beta: The beta shape-parameter of the Beta distribution.
n: The number of trials for the Binomial distribution.
c: The "true-cut" (proportion) of the Binomial distribution.
lower.tail: Logical. Whether to compute the lower or upper tail of the Binomial distribution. Default is FALSE.
The Binomial distribution cut-point is up to but not including c, unlike the standard behavior of the base-R pbinom() function.
Hanson, Bradley A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing Research Report Series.
Livingston, Samuel A. and Lewis, Charles. (1995). Estimating the Consistency and Accuracy of Classifications Based on Test Scores. Journal of Educational Measurement, 32(2).
Lord, Frederic M. (1965). A Strong True-Score Theory, With Applications. Psychometrika, 30(3).
# Given a four-parameter Beta distribution with parameters l = 0.25,
# u = 0.75, alpha = 5, and beta = 3, and a Binomial error distribution with
# number of trials (n) = 10 and a cutoff-point (c) at 50% correct (i.e.,
# proportion correct of 0.5), the population density of passing scores
# produced at true-score (x) = 0.5 can be calculated as:
dBeta.pBinom(x = 0.5, l = 0.25, u = 0.75, alpha = 5, beta = 3, n = 10, c = 0.5)

# Conversely, the density of failing scores produced at x can be calculated
# by passing the additional argument "lower.tail = TRUE" to the function.
# That is:
dBeta.pBinom(x = 0.5, l = 0.25, u = 0.75, alpha = 5, beta = 3, n = 10,
  c = 0.5, lower.tail = TRUE)

# By integration, the population proportion of (e.g.) passing scores in some
# region of the true-score distribution (e.g., between 0.25 and 0.5) can be
# calculated as:
integrate(function(x) { dBeta.pBinom(x, 0.25, .75, 5, 3, 10, 0.5) },
  lower = 0.25, upper = 0.5)
The Beta Compound Gamma-Binomial distribution: The product of the four-parameter Beta probability density function and the Gamma-extended Binomial cumulative distribution function. Used in the Livingston and Lewis approach to classification accuracy and consistency, the output can be interpreted as the population density of passing scores produced at "x" (a value of true-score).
dBeta.pGammaBinom(x, l, u, alpha, beta, n, c, lower.tail = FALSE)
x: x-axis input (value of true-score) for which the population density of passing (or failing) scores is to be calculated.
l: The lower-bound of the four-parameter Beta distribution.
u: The upper-bound of the four-parameter Beta distribution.
alpha: The alpha shape-parameter of the four-parameter Beta distribution.
beta: The beta shape-parameter of the four-parameter Beta distribution.
n: The number of "trials" for the Gamma-Binomial distribution.
c: The "true-cut" (proportion) on the Gamma-Binomial distribution. Need not be an integer (unlike for the Binomial distribution).
lower.tail: Logical. Whether to compute the lower or upper tail of the Gamma-Binomial distribution. Default is FALSE.
Hanson, Bradley A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing Research Report Series.
Livingston, Samuel A. and Lewis, Charles. (1995). Estimating the Consistency and Accuracy of Classifications Based on Test Scores. Journal of Educational Measurement, 32(2).
Lord, Frederic M. (1965). A Strong True-Score Theory, With Applications. Psychometrika, 30(3).
Loeb, D. E. (1992). A generalization of the binomial coefficients. Discrete Mathematics, 105(1-3).
# Given a four-parameter Beta distribution with parameters l = 0.25,
# u = 0.75, alpha = 5, and beta = 3, and a Gamma-Binomial error distribution
# with number of trials (n) = 10 and a cutoff-point (c) at 50% correct (i.e.,
# proportion correct of 0.5), the population density of passing scores
# produced at true-score (x) = 0.5 can be calculated as:
dBeta.pGammaBinom(x = 0.5, l = 0.25, u = 0.75, alpha = 5, beta = 3, n = 10,
  c = 0.5)

# Conversely, the density of failing scores produced at x can be calculated
# by passing the additional argument "lower.tail = TRUE" to the function
# (note that n need not be an integer). That is:
dBeta.pGammaBinom(x = 0.5, l = 0.25, u = 0.75, alpha = 5, beta = 3, n = 10.1,
  c = 0.5, lower.tail = TRUE)

# By integration, the population proportion of (e.g.) passing scores in some
# region of the true-score distribution (e.g., between 0.25 and 0.5) can be
# calculated as:
integrate(function(x) { dBeta.pGammaBinom(x, 0.25, 0.75, 5, 3, 10, 0.5) },
  lower = 0.25, upper = 0.5)
Gives the density at x under the Beta-Binomial PMF.
dBetaBinom(x, N, l, u, alpha, beta)
x: Value of x for which the probability mass is to be returned.
N: The total number of trials.
l: The first (lower) location parameter.
u: The second (upper) location parameter.
alpha: The first shape parameter.
beta: The second shape parameter.
The value for the probability mass at x given the specified Beta-Binomial distribution.
# Assume some variable follows a Beta-Binomial distribution with 100 trials,
# and with probabilities of successful trials drawn from a four-parameter
# Beta distribution with location parameters l = 0.25 and u = 0.75 and shape
# parameters alpha = 5 and beta = 3. To compute the probability mass at a
# specific point of the distribution (e.g., 50):
dBetaBinom(x = 50, N = 100, l = 0.25, u = 0.75, alpha = 5, beta = 3)
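Equivalently, the Beta-Binomial mass is the Binomial mass averaged over the four-parameter Beta true-score density. A numerical-integration sketch of that relationship (assuming dBeta.4P() accepts vector input, as integrate() requires):

# Beta-Binomial mass at x = 50 as an integral over the true-score density:
integrate(function(p) dbinom(50, 100, p) * dBeta.4P(p, 0.25, 0.75, 5, 3),
  lower = 0.25, upper = 0.75)
# Should be close to dBetaBinom(x = 50, N = 100, l = 0.25, u = 0.75,
# alpha = 5, beta = 3)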
Gives the density at x under the Beta Compound-Binomial distribution, where the Compound-Binomial distribution is Lord's two-term approximation.
dBetacBinom(x, N, k, l, u, alpha, beta)
x: Value of x for which the density is to be returned.
N: Number of trials.
k: Lord's k (see the package documentation on estimating Lord's k).
l: The lower-bound location parameter of the four-parameter Beta distribution.
u: The upper-bound location parameter of the four-parameter Beta distribution.
alpha: The first shape-parameter of the four-parameter Beta distribution.
beta: The second shape-parameter of the four-parameter Beta distribution.

# Assume some variable follows a Beta compound Binomial distribution with 100
# trials, Lord's k = 1, and probabilities of successful trials drawn from a
# four-parameter Beta distribution with location-parameters l = .15 and
# u = .85, and shape parameters alpha = 6 and beta = 4. To compute the
# probability density at a specific point of the distribution (e.g., 50):
dBetacBinom(x = 50, N = 100, k = 1, l = .15, u = .85, alpha = 6, beta = 4)
Calculates the density at specific points of the Standard Beta probability density distribution with defined mean and variance (or standard deviation).
dBetaMS(x, mean, variance = NULL, sd = NULL, l = 0, u = 1)
x: A specific point on the x-axis of the Standard Beta Probability Density Distribution.
mean: The mean of the target Standard Beta probability density distribution.
variance: The variance of the target Standard Beta probability density distribution.
sd: The standard deviation of the target Standard Beta probability density distribution.
l: The lower-bound location parameter. Default set to 0 (the standard Beta distribution).
u: The upper-bound location parameter. Default set to 1 (the standard Beta distribution).
A numeric value representing the density at point x of the Beta probability density distribution with the target mean and variance (or standard deviation), given the specified lower and upper bounds.
# To compute the density at a specific point (e.g., 0.5) along the Standard
# (two-parameter) Probability Density Distribution with a mean of 0.6 and a
# variance of 0.04:
dBetaMS(x = 0.5, mean = 0.6, variance = 0.04)

# To compute the density at a specific point (e.g., 50) along the four-
# parameter Beta distribution with a mean of 60, a variance of 400, a lower-
# bound of 0, and an upper-bound of 100:
dBetaMS(x = 50, mean = 60, variance = 400, l = 0, u = 100)
Gives the density at x under Lord's two-term approximation to the compound Binomial PMF.
dcBinom(x, N, k, p)
x: Value of x for which the density is to be returned.
N: The total number of trials.
k: Lord's k (see the package documentation on estimating Lord's k).
p: Probability of success for each trial.
# Assume some variable follows a compound Binomial distribution with 100
# trials, a 50% probability of success on each trial, and Lord's k = 1. To
# compute the probability density at a specific point of the distribution
# (e.g., 50):
dcBinom(x = 50, N = 100, k = 1, p = .5)
Calculate the descending (or falling) factorial of a value x
of order r
.
dfac(x, r, method = "product")
x |
A value for which the descending factorial is to be calculated. |
r |
The power r of the descending factorial. |
method |
The method by which the descending factorials are to be calculated. Default is "product". |
The descending factorial of value x
raised to the r
'th power.
# To calculate the 4th descending factorial for a value (e.g., 3.14):
dfac(x = 3.14, r = 4)

# To calculate the 5th descending factorial for values 3.14, 2.72, and 0.58:
dfac(x = c(3.14, 2.72, 0.58), r = 5)
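For positive x, the descending factorial can equivalently be written in terms of the Gamma function as gamma(x + 1) / gamma(x - r + 1), which provides an independent check of the result (an illustration, not part of the documented interface):

# Gamma-function form of the 4th descending factorial of 3.14:
gamma(3.14 + 1) / gamma(3.14 - 4 + 1)
dfac(x = 3.14, r = 4)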
Probability density function under the Gamma-extended Binomial distribution.
dGammaBinom(x, size, prob, nc = FALSE)
x |
Vector of quantiles. |
size |
Number of "trials" (zero or more). Need not be integer. |
prob |
Probability of "success" on each "trial". Need not be integer. |
nc |
Whether to include a normalizing constant making sure that the sum of the distribution's density is 1. |
Loeb, D. E. (1992). A generalization of the binomial coefficients. Discrete Mathematics, 105(1-3).
# Assume some variable follows a Gamma-Binomial distribution with
# "number of trials" = 10.5 and probability of "success" for each "trial"
# = 0.75. To compute the probability density of attaining a "number of
# successes" at a specific point (e.g., 7.5 "successes"):
dGammaBinom(x = 7.5, size = 10.5, prob = 0.75)

# Including a normalizing constant (then diverges from the Binomial dist.):
dGammaBinom(x = 7.5, size = 10.5, prob = 0.75, nc = TRUE)
dGammaBinom(x = 7, size = 10, prob = 0.75) == dbinom(7, 10, 0.75)
dGammaBinom(x = 7, size = 10, prob = 0.75, nc = TRUE) == dbinom(7, 10, 0.75)
According to Livingston and Lewis (1995), "The effective test length corresponding to a test score is the number of discrete, dichotomously scored, locally independent, equally difficult items required to produce a total score of the same reliability."
ETL(mean, variance, min = 0, max = 1, reliability)
mean |
The mean of the observed-score distribution. |
variance |
The variance of the observed-score distribution. |
min |
The lower-bound (minimum possible value) of the observed-score distribution. Default is 0 (assuming observed scores represent proportions). |
max |
The upper-bound (maximum possible value) of the observed-score distribution. Default is 1 (assuming observed scores represent proportions). |
reliability |
The reliability of the observed scores (proportion of observed-score distribution variance shared with true-score distribution). |
An estimate of the effective length of a test, given the stability of the observations it produces.
Livingston, Samuel A. and Lewis, Charles. (1995). Estimating the Consistency and Accuracy of Classifications Based on Test Scores. Journal of Educational Measurement, 32(2).
# Generate some fictional data. Say, 100 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
testdata <- rbinom(100, 100, rBeta.4P(100, .25, .75, 5, 3))
hist(testdata, xlim = c(0, 100))

# Suppose the reliability of this test was estimated to 0.7. To estimate and
# retrieve the effective test length using ETL():
ETL(mean = mean(testdata), variance = var(testdata), min = 0, max = 100,
    reliability = .7)
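Livingston and Lewis (1995) define the effective test length as ((mean - min) * (max - mean) - reliability * variance) / (variance * (1 - reliability)). Computing this directly should reproduce the ETL() call above (a sketch of the same formula, continuing the example):

# The Livingston and Lewis effective test length, computed directly:
m <- mean(testdata)
v <- var(testdata)
((m - 0) * (100 - m) - .7 * v) / (v * (1 - .7))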
Extends the Binomial coefficient to positive non-integers (including 0) by employing the Gamma rather than the factorial function.
gchoose(n, k)
n |
In Binomial terms, the number of Binomial "trials". Need not be an integer. |
k |
In Binomial terms, the number of successful "trials". Need not be an integer. |
Not defined for negative integers.
Loeb, D. E. (1992). A generalization of the binomial coefficients. Discrete Mathematics, 105(1-3).
# Compare the choose() function with the gchoose() function for integers:
gchoose(c(8, 9, 10), c(3, 4, 5)) == choose(c(8, 9, 10), c(3, 4, 5))

# The gchoose() function also works for non-integers:
gchoose(10.5, 7.5)
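Since the extension replaces factorials with Gamma functions, the result should equal gamma(n + 1) / (gamma(k + 1) * gamma(n - k + 1)); a quick check of this identity for the non-integer case (an illustration):

# Gamma-function form of the Binomial coefficient, for comparison:
gamma(10.5 + 1) / (gamma(7.5 + 1) * gamma(10.5 - 7.5 + 1))
gchoose(10.5, 7.5)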
Estimator for the Beta true-score distribution parameters from the observed-score distribution and Lord's k. Returns a list with entries representing the lower and upper location parameters (l and u) and the shape parameters (alpha and beta) of the four-parameter Beta distribution, as well as Lord's k and the test length.
HB.beta.tp.fit(x, N, k, true.model = "4P", failsafe = FALSE, l = 0, u = 1)
x |
Vector of observed-scores. |
N |
The test length. |
k |
Lord's k (see documentation for the Lords.k() function). |
true.model |
The type of Beta distribution which is to be fit to the moments of the true-score distribution. Options are "4P" (default) and "2P", representing the four- and two-parameter Beta distributions. |
failsafe |
Logical. Whether to revert to a fail-safe two-parameter solution should the four-parameter solution contain invalid parameter estimates. |
l |
If true.model = "2P" or failsafe = TRUE: the lower-bound location parameter to be used for the two-parameter true-score distribution. Default is 0. |
u |
If true.model = "2P" or failsafe = TRUE: the upper-bound location parameter to be used for the two-parameter true-score distribution. Default is 1. |
A list with the parameter values of a four-parameter Beta distribution. "l" is the lower location-parameter, "u" the upper location-parameter, "alpha" the first shape-parameter, and "beta" the second shape-parameter. Also includes Lord's k and the test length.
Hanson, B. A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing Research Report Series. Retrieved from https://files.eric.ed.gov/fulltext/ED344945.pdf
Lord, F. M. (1965). A strong true-score theory, with applications. Psychometrika. 30(3). pp. 239–270. doi: 10.1007/BF02289490
# Generate some fictional data. Say 1000 individuals take a 100-item test
# where all items are equally difficult, and the true-score distribution
# is a four-parameter Beta distribution with location parameters l = 0.25,
# u = 0.75, alpha = 5, and beta = 3, and the error distribution is Binomial
# with Lord's k = 0:
set.seed(12)
testdata <- rbinom(1000, 100, rBeta.4P(1000, 0.25, 0.75, 5, 3))

# To estimate the four-parameter Beta distribution parameters from this
# sample of observations:
HB.beta.tp.fit(testdata, 100, 0)
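Since the return value names the four Beta parameters (see above), the estimates can be pulled out by name and compared with the generating values l = 0.25, u = 0.75, alpha = 5, and beta = 3; a short usage sketch continuing the example:

# Retrieve the individual parameter estimates by name:
fit <- HB.beta.tp.fit(testdata, 100, 0)
fit$l; fit$u; fit$alpha; fit$beta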
An implementation of what has come to be known as the "Hanson and Brennan approach" to classification consistency and accuracy, which, by employing a compound Beta-Binomial distribution, assumes that true-scores conform to the four-parameter Beta distribution, and errors of measurement to a two-term approximation of the compound Binomial distribution. Under these assumptions, the expected classification consistency and accuracy of tests can be estimated from observed outcomes and test reliability.
HB.CA( x = NULL, reliability, cut, testlength, true.model = "4P", truecut = NULL, output = c("accuracy", "consistency"), failsafe = TRUE, l = 0, u = 1, modelfit = 10 )
x |
A vector of observed scores, or a list specifying parameter values. If a list is provided, the list entries must be named after the parameters: "l", "u", "alpha", "beta", and "k". |
reliability |
The observed-score squared correlation (i.e., proportion of shared variance) with the true-score. |
cut |
The cutoff value for classifying observations into above/below categories. |
testlength |
The total number of test items (or maximum possible score). Must be an integer. |
true.model |
The probability distribution to be fitted to the moments of the true-score distribution. Options are "4P" (default) and "2P", representing the four- and two-parameter Beta distributions. |
truecut |
Optional specification of a "true" cutoff. Useful for producing ROC curves (see documentation for the HB.ROC() function). |
output |
Character vector indicating which types of statistics (i.e., accuracy and/or consistency) are to be computed and included in the output. Permissible values are "accuracy" and "consistency". |
failsafe |
Logical value indicating whether to engage the automatic fail-safe defaulting to the two-parameter Beta true-score distribution if the four-parameter fitting procedure produces impermissible parameter estimates. Default is TRUE. |
l |
If true.model = "2P" or failsafe = TRUE: the lower-bound location parameter to be used for the two-parameter true-score distribution. Default is 0. |
u |
If true.model = "2P" or failsafe = TRUE: the upper-bound location parameter to be used for the two-parameter true-score distribution. Default is 1. |
modelfit |
Allows for controlling the chi-square test for model fit by setting the minimum bin-size for expected observations. Can alternatively be set to |
A list containing the estimated parameters necessary for the approach (i.e., Lord's k, the test-length, and the true-score Beta distribution parameters), a chi-square test of model-fit, the confusion matrix containing estimated proportions of true/false pass/fail categorizations for a test, diagnostic performance statistics, and/or a classification consistency matrix and indices. Accuracy output includes a confusion matrix and diagnostic performance indices, and consistency output includes a consistency matrix and the consistency indices p (expected proportion of agreement between two independent test administrations), p_c (proportion of agreement on two independent administrations expected by chance alone), and Kappa (Cohen's Kappa).
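The three consistency indices are linked by the definition of Cohen's Kappa, Kappa = (p - p_c) / (1 - p_c), so any one of them can be recovered from the other two (a sketch with illustrative values, not output drawn from the function):

# Cohen's Kappa from the two agreement proportions (illustrative values):
p <- 0.85    # expected proportion of agreement
p_c <- 0.60  # proportion of agreement expected by chance alone
(p - p_c) / (1 - p_c)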
This implementation of the Hanson-Brennan approach is much slower than the implementation of the Livingston and Lewis approach, as there is no native implementation of Lord's two-term approximation to the Compound-Binomial distribution in R. This implementation uses a "brute-force" method of computing the cumulative probabilities from the compound-Binomial distribution, which will by necessity be more resource intensive.
Hanson, Bradley A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing.
Lord, Frederic M. (1965). A Strong True-Score Theory, With Applications. Psychometrika, 30(3).
Lewis, Don and Burke, C. J. (1949). The Use and Misuse of the Chi-Square Test. Psychological Bulletin, 46(6).
# Generate some fictional data. Say, 1000 individuals take a 20-item test.
set.seed(1234)
p.success <- rBeta.4P(1000, 0.15, 0.85, 6, 4)
for (i in 1:20) {
  if (i == 1) {
    rawdata <- matrix(nrow = 1000, ncol = 20)
  }
  rawdata[, i] <- rbinom(1000, 1, p.success)
}

# Suppose the cutoff value for attaining a pass is 10 items correct, and
# that the reliability of this test was estimated using the Cronbach's Alpha
# estimator. To estimate and retrieve the estimated parameters, confusion and
# consistency matrices, and accuracy and consistency indices using HB.CA():
HB.CA(x = rowSums(rawdata), reliability = cba(rawdata), cut = 10,
      testlength = 20)
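If only one family of statistics is needed, the output argument can presumably be restricted (a sketch, assuming a single permissible value is accepted):

# To compute only the classification accuracy statistics:
HB.CA(x = rowSums(rawdata), reliability = cba(rawdata), cut = 10,
      testlength = 20, output = "accuracy")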
An implementation of what has come to be known as the "Hanson and Brennan approach" to classification consistency and accuracy, which, by employing a compound Beta-Binomial distribution, assumes that true-scores conform to the four-parameter Beta distribution, and errors of measurement to a two-term approximation of the compound Binomial distribution. Under these assumptions, the expected classification consistency and accuracy of tests can be estimated from observed outcomes and test reliability.
HB.CA.MC( x = NULL, reliability, cut, testlength, true.model = "4P", failsafe = TRUE, l = 0, u = 1, modelfit = 10 )
x |
A vector of observed scores, or a list specifying parameter values. If a list is provided, the list entries must be named after the parameters: "l", "u", "alpha", "beta", and "k". |
reliability |
The observed-score squared correlation (i.e., proportion of shared variance) with the true-score. |
cut |
A vector of cut-off values for classifying observations into two or more categories. |
testlength |
The total number of test items (or maximum possible score). Must be an integer. |
true.model |
The probability distribution to be fitted to the moments of the true-score distribution. Options are "4P" (default) and "2P", representing the four- and two-parameter Beta distributions. |
failsafe |
Logical value indicating whether to engage the automatic fail-safe defaulting to the two-parameter Beta true-score distribution if the four-parameter fitting procedure produces impermissible parameter estimates. Default is TRUE. |
l |
If true.model = "2P" or failsafe = TRUE: the lower-bound location parameter to be used for the two-parameter true-score distribution. Default is 0. |
u |
If true.model = "2P" or failsafe = TRUE: the upper-bound location parameter to be used for the two-parameter true-score distribution. Default is 1. |
modelfit |
Allows for controlling the chi-square test for model fit by setting the minimum bin-size for expected observations. Can alternatively be set to |
A list containing the estimated parameters necessary for the approach (i.e., Lord's k, test-length, and the true-score Beta distribution parameters), a chi-square test of model-fit, the confusion matrix containing estimated proportions of true/false positive/negative categorizations for a test, diagnostic performance statistics, and/or a classification consistency matrix and indices. Accuracy output includes a confusion matrix and diagnostic performance indices, and consistency output includes a consistency matrix and consistency indices p
(expected proportion of agreement between two independent test administrations), p_c
(proportion of agreement on two independent administrations expected by chance alone), and Kappa
(Cohen's Kappa).
This implementation of the Hanson-Brennan approach is much slower than the implementation of the Livingston and Lewis approach, as there is no native implementation of Lord's two-term approximation to the Compound-Binomial distribution in R. This implementation uses a "brute-force" method of computing the cumulative probabilities from the compound-Binomial distribution, which will by necessity be more resource intensive.
Hanson, Bradley A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing.
Lord, Frederic M. (1965). A Strong True-Score Theory, With Applications. Psychometrika, 30(3).
Lewis, Don and Burke, C. J. (1949). The Use and Misuse of the Chi-Square Test. Psychological Bulletin, 46(6).
# Generate some fictional data. Say, 1000 individuals take a 20-item test.
set.seed(1234)
p.success <- rBeta.4P(1000, 0.15, 0.85, 6, 4)
for (i in 1:20) {
  if (i == 1) {
    rawdata <- matrix(nrow = 1000, ncol = 20)
  }
  rawdata[, i] <- rbinom(1000, 1, p.success)
}

# Suppose the cutoff values for the categories are 8 and 12 items correct,
# and that the reliability of this test was estimated using the Cronbach's
# Alpha estimator. To estimate and retrieve the estimated parameters,
# confusion and consistency matrices, and accuracy and consistency indices
# using HB.CA.MC():
(output <- HB.CA.MC(x = rowSums(rawdata), reliability = cba(rawdata),
                    cut = c(8, 12), testlength = 20))

# The output for this function can get quite verbose as more categories are
# included. The output from the function can be fed to the MC.out.tabular()
# function in order to organize the output in a tabular format.
MC.out.tabular(output)
Generate a ROC curve plotting the false-positive rate against the true-positive rate at different cut-off values across the observed-score scale.
HB.ROC( x = NULL, reliability, testlength, truecut, true.model = "4P", failsafe = TRUE, l = 0, u = 1, AUC = FALSE, maxJ = FALSE, maxAcc = FALSE, locate = NULL, raw.out = FALSE, grainsize = testlength )
x |
A vector of observed results (sum scores) or a list of parameter values (see documentation for the HB.CA() function). |
reliability |
The reliability coefficient of the test. |
testlength |
The total number of test items (or maximum possible score). Must be an integer. |
truecut |
The point along the x-scale that marks true category membership. |
true.model |
The probability distribution to be fitted to the moments of the true-score distribution. Options are "4P" (default) and "2P", representing the four- and two-parameter Beta distributions. |
failsafe |
If true.model = "4P": whether to engage a fail-safe reverting to a two-parameter true-score distribution solution should the four-parameter fitting procedure produce impermissible results. Default is TRUE (engage fail-safe in the event of impermissible estimates). |
l |
If true.model = "2P" or failsafe = TRUE: the lower-bound location parameter to be used for the two-parameter true-score distribution. Default is 0. |
u |
If true.model = "2P" or failsafe = TRUE: the upper-bound location parameter to be used for the two-parameter true-score distribution. Default is 1. |
AUC |
Logical. Calculate and include the area under the curve? Default is FALSE. |
maxJ |
Logical. Mark the point along the curve where Youden's J statistic is maximized? Default is FALSE. |
maxAcc |
Logical. Mark the point along the curve where the Accuracy statistic is maximized? Default is FALSE. |
locate |
Ask the function to locate the cut-point at which sensitivity or NPV is greater than or equal to some value, or at which specificity or PPV is less than or equal to some value. Takes as input a character-vector of length 2, with the first entry being which index is to be found (e.g., "sensitivity"), and the second entry the value to locate (e.g., "0.75"). For example: c("sensitivity", "0.75"). |
raw.out |
Give raw coordinates as output rather than plot? Default is FALSE. |
grainsize |
Specify the number of cutoff-points for which the ROC curve is to be calculated. The greater this number the greater the accuracy. Default is set to the stated test length (N). |
A plot tracing the ROC curve for the test, or matrix of coordinates if raw.out is TRUE
.
This implementation of the Hanson-Brennan approach is much slower than the implementation of the Livingston and Lewis approach, as there is no native implementation of Lord's two-term approximation to the Compound-Binomial distribution in R. This implementation uses a "brute-force" method of computing the cumulative probabilities from the compound-Binomial distribution, which will by necessity be more resource intensive.
# Generate some fictional data. Say, 1000 individuals take a 20-item test.
set.seed(1234)
p.success <- rBeta.4P(1000, 0.15, 0.85, 6, 4)
for (i in 1:20) {
  if (i == 1) {
    rawdata <- matrix(nrow = 1000, ncol = 20)
  }
  rawdata[, i] <- rbinom(1000, 1, p.success)
}

# Suppose the cutoff value for attaining a pass is 10 items correct, and
# that the reliability of this test was estimated using the Cronbach's Alpha
# estimator. To draw the ROC-graph and locate the points at which Youden's J
# and Accuracy are maximized:
HB.ROC(rowSums(rawdata), cba(rawdata), 20, 10, maxAcc = TRUE, maxJ = TRUE)

# For further examples regarding how to use the locate argument to locate
# points at which various criteria are satisfied, see documentation for the
# LL.ROC() function.
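If the coordinates are needed for further processing (e.g., custom plotting), the raw.out argument returns them rather than drawing the plot; a short usage sketch continuing the example:

# To retrieve the raw ROC coordinates as a matrix rather than a plot:
HB.ROC(rowSums(rawdata), cba(rawdata), 20, 10, raw.out = TRUE)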
An implementation of Lord's (1965, p. 265) Equation 37 for estimating the raw moments of the true-score distribution.
HB.tsm(x, r, N, k)
x |
Vector of values representing sum-scores. |
r |
The number of raw moments to be calculated. |
N |
The number of test items (i.e., test length). |
k |
Lord's k (see documentation for the Lords.k() function). |
# Generate some data under the Beta Compound-Binomial distribution, where
# the Compound Binomial distribution has 100 trials and Lord's k = 2, and
# the Beta distribution has location parameters l = .15 and u = .85, and
# shape parameters alpha = 6 and beta = 4:
obs <- rBetacBinom(1000, 100, 2, .15, .85, 6, 4)

# To estimate the first four raw moments of the underlying Beta
# distribution:
HB.tsm(x = obs, r = 4, N = 100, k = 2)
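As a plausibility check, the first estimated raw moment should land near the analytic mean of the generating four-parameter Beta distribution, l + (u - l) * alpha / (alpha + beta) (an illustration, not part of the interface):

# Analytic mean of the generating Beta distribution (equals .57):
.15 + (.85 - .15) * (6 / (6 + 4))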
Calculates the lower-bound value required to produce a Beta probability density distribution with defined moments and parameters. Be advised that not all combinations of moments and parameters can be satisfied (e.g., specifying mean, variance, skewness and kurtosis uniquely determines both location-parameters, meaning that the value of the lower-location parameter will take on which ever value it must, and cannot be specified).
LABMSU( alpha = NULL, beta = NULL, u = NULL, mean = NULL, variance = NULL, sd = NULL )
alpha |
The alpha (first) shape-parameter of the target Beta probability density distribution. |
beta |
The beta (second) shape-parameter of the target Beta probability density distribution. |
u |
The upper-bound of the Beta distribution. Default is NULL (i.e., does not take a specified u-parameter into account). |
mean |
The mean (first raw moment) of the target Standard Beta probability density distribution. |
variance |
The variance (second central moment) of the target Standard Beta probability density distribution. |
sd |
Optional alternative to specifying variance. |
A numeric value representing the required value for the Beta lower location-parameter (l
) in order to produce a Beta probability density distribution with the target moments and parameters.
# Generate some fictional data.
set.seed(1234)
testdata <- rBeta.4P(100000, 0.25, 0.75, 5, 3)
hist(testdata, xlim = c(0, 1), freq = FALSE)

# Suppose you know three of the four necessary parameters to fit a four-
# parameter Beta distribution (i.e., u = 0.75, alpha = 5, beta = 3) to this
# data. To find the value for the necessary l parameter, estimate the mean
# and variance of the distribution:
M <- mean(testdata)
S2 <- var(testdata)

# To find the l parameter necessary to produce a four-parameter Beta
# distribution with the target mean, variance, and u, alpha, and beta
# parameters using the LABMSU() function:
(l <- LABMSU(alpha = 5, beta = 3, mean = M, variance = S2, u = 0.75))
curve(dBeta.4P(x, l, .75, 5, 3), add = TRUE, lwd = 2)
An implementation of what has come to be known as the "Livingston and Lewis approach" to classification consistency and accuracy, which, by employing a compound Beta-Binomial distribution, assumes that true-scores conform to the four-parameter Beta distribution, and errors of measurement to the Binomial distribution. Under these assumptions, the expected classification consistency and accuracy of tests can be estimated from observed outcomes and test reliability.
LL.CA( x = NULL, reliability, cut, min = 0, max = 1, true.model = "4P", truecut = NULL, output = c("accuracy", "consistency"), failsafe = TRUE, l = 0, u = 1, modelfit = c(nbins = 100, minbin = 10) )
x |
A vector of observed scores, or a list specifying parameter values. If a list is provided, the list entries must be named after the parameters: "l", "u", "alpha", "beta", and "etl". |
reliability |
The observed-score squared correlation (i.e., proportion of shared variance) with the true-score. |
cut |
The cutoff value for classifying observations into pass or fail categories. |
min |
The minimum value possible to attain on the test. Default is 0. |
max |
The maximum value possible to attain on the test. Default is 1 (assumes that the values contained in x represent proportions). |
true.model |
The probability distribution to be fitted to the moments of the true-score distribution. Options are "4P" (default) and "2P", representing the four- and two-parameter Beta distributions. |
truecut |
Optional specification of a "true" cutoff. Useful for producing ROC curves (see documentation for the LL.ROC() function). |
output |
Character vector indicating which types of statistics (i.e., accuracy and/or consistency) are to be computed and included in the output. Permissible values are "accuracy" and "consistency". |
failsafe |
Logical value indicating whether to engage the automatic fail-safe defaulting to the two-parameter Beta true-score distribution if the four-parameter fitting procedure produces impermissible parameter estimates. Default is TRUE. |
l |
If true.model = "2P" or failsafe = TRUE: the lower-bound location parameter to be used for the two-parameter true-score distribution. Default is 0. |
u |
If true.model = "2P" or failsafe = TRUE: the upper-bound location parameter to be used for the two-parameter true-score distribution. Default is 1. |
modelfit |
Allows for controlling the chi-square test for model fit. The argument takes either a vector of two values, or |
A list containing the estimated parameters necessary for the approach (i.e., the effective test-length and the Beta distribution parameters), a chi-square test of model-fit, the confusion matrix containing estimated proportions of true/false pass/fail categorizations for a test, diagnostic performance statistics, and/or a classification consistency matrix and indices. Accuracy output includes a confusion matrix and diagnostic performance indices, and consistency output includes a consistency matrix and consistency indices p
(expected proportion of agreement between two independent test administrations), p_c
(proportion of agreement on two independent administrations expected by chance alone), and Kappa
(Cohen's Kappa).
It should be noted that this implementation differs from the original articulation of Livingston and Lewis (1995) in some respects. First, the procedure includes a number of diagnostic performance (accuracy) indices which the original procedure enables but which were not part of the original presentation. Second, the way consistency is calculated differs substantially from the original articulation of the procedure, which made use of a split-half approach. Rather, this implementation uses the approach to estimating classification consistency outlined by Hanson (1991).
A shiny application providing a GUI for this method is available at https://hthaa.shinyapps.io/shinybeta/.
Livingston, Samuel A. and Lewis, Charles. (1995). Estimating the Consistency and Accuracy of Classifications Based on Test Scores. Journal of Educational Measurement, 32(2).
Hanson, Bradley A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing.
Lord, Frederic M. (1965). A Strong True-Score Theory, With Applications. Psychometrika, 30(3).
Lewis, Don and Burke, C. J. (1949). The Use and Misuse of the Chi-Square Test. Psychological Bulletin, 46(6).
# Generate some fictional data. Say, 1000 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
testdata <- rbinom(1000, 100, rBeta.4P(1000, 0.25, 0.75, 5, 3))
hist(testdata, xlim = c(0, 100))

# Suppose the cutoff value for attaining a pass is 50 items correct, and
# that the reliability of this test was estimated to 0.7. To estimate and
# retrieve the estimated parameters, confusion matrix, and consistency and
# accuracy statistics using LL.CA():
LL.CA(x = testdata, reliability = .7, cut = 50, min = 0, max = 100)

# Suppose the true-score parameter estimation procedure arrived at
# impermissible parameter estimates (i.e., l < 0, u > 1, alpha < 0, or
# beta < 0). For example:
set.seed(9)
testdata <- rbinom(100, 25, rBeta.4P(100, 0.25, 1, 5, 3))
Beta.tp.fit(testdata, 0, 25, 25, failsafe = TRUE)

# Suppose further that you have good grounds for assuming that the lower-
# bound parameter is equal to 0.25 (e.g., the test consists of multiple-
# choice questions with four response options, leading to a 25% probability
# of guessing the correct answer per question), and good reason to believe
# that the upper-bound parameter is equal to 1 (i.e., there is no reason to
# rule out that some members of the population would attain a perfect score
# across all possible test-forms). To set these lower and upper bounds for
# the fitting procedure in the LL.CA() function, set the argument
# true.model = "2p", and specify the location parameters l = 0.25 and u = 1:
LL.CA(testdata, 0.6287713, 12, 0, 25, true.model = "2p", l = 0.25, u = 1)

# Alternatively to supplying scores to which a true-score distribution is
# to be fit, a list with true-score distribution parameter values can be
# supplied manually along with the effective test length (see documentation
# for the ETL() function), foregoing the need for actual data. The list
# entries must be named: "l" is the lower-bound and "u" the upper-bound
# location parameter of the true-score distribution, "alpha" and "beta" are
# the shape parameters, and "etl" is the effective test-length.
trueparams <- list("l" = 0.25, "u" = 0.75, "alpha" = 5, "beta" = 3,
                   "etl" = 50)
LL.CA(x = trueparams, cut = 50, min = 0, max = 100)
An implementation of what has been come to be known as the "Livingston and Lewis approach" to classification consistency and accuracy, which by employing a compound beta-binomial distribution assumes that true-scores conform to the four-parameter beta distribution, and errors of measurement to the binomial distribution. Under these assumptions, the expected classification consistency and accuracy of tests can be estimated from observed outcomes and test reliability.
LL.CA.MC( x = NULL, reliability, cut, min = 0, max = 1, true.model = "4P", failsafe = TRUE, l = 0, u = 1, modelfit = c(nbins = 100, minbin = 10) )
x |
A vector of observed scores, or a list specifying parameter values. If a list is provided, the list entries must be named after the parameters: "l", "u", "alpha", "beta", and "etl". |
reliability |
The observed-score squared correlation (i.e., proportion of shared variance) with the true-score. |
cut |
A vector of cut-off values for classifying observations into two or more categories. |
min |
The minimum value possible to attain on the test. Default is 0. |
max |
The maximum value possible to attain on the test. Default is 1 (assumes that the values contained in x represent proportions). |
true.model |
The probability distribution to be fitted to the moments of the true-score distribution. Options are "4P" (default) and "2P", representing the four- and two-parameter Beta distributions. |
failsafe |
Logical value indicating whether to engage the automatic fail-safe defaulting to the two-parameter Beta true-score distribution if the four-parameter fitting procedure produces impermissible parameter estimates. Default is TRUE. |
l |
If true.model = "2P" or failsafe = TRUE: the lower-bound location parameter to be used for the two-parameter true-score distribution. Default is 0. |
u |
If true.model = "2P" or failsafe = TRUE: the upper-bound location parameter to be used for the two-parameter true-score distribution. Default is 1. |
modelfit |
Allows for controlling the chi-square test for model fit. The argument takes either a vector of two values, or |
A list containing the estimated parameters necessary for the approach (i.e., the effective test-length and the beta distribution parameters), a chi-square test of model-fit, the confusion matrix containing estimated proportions of true/false positive/negative categorizations for a test, diagnostic performance statistics, and/or a classification consistency matrix and indices. Accuracy output includes a confusion matrix and diagnostic performance indices, and consistency output includes a consistency matrix and consistency indices p
(expected proportion of agreement between two independent test administrations), p_c
(proportion of agreement on two independent administrations expected by chance alone), and Kappa
(Cohen's Kappa).
It should be noted that this implementation differs from the original articulation of Livingston and Lewis (1995) in some respects. First, the procedure includes a number of diagnostic performance (accuracy) indices which the original procedure enables but which were not part of the original presentation. Second, the way consistency is calculated differs substantially from the original articulation of the procedure, which made use of a split-half approach. Rather, this implementation uses the approach to estimating classification consistency outlined by Hanson (1991).
Livingston, Samuel A. and Lewis, Charles. (1995). Estimating the Consistency and Accuracy of Classifications Based on Test Scores. Journal of Educational Measurement, 32(2).
Hanson, Bradley A. (1991). Method of Moments Estimates for the Four-Parameter Beta Compound Binomial Model and the Calculation of Classification Consistency Indexes. American College Testing.
Lord, Frederic M. (1965). A Strong True-Score Theory, With Applications. Psychometrika, 30(3).
Lewis, Don and Burke, C. J. (1949). The Use and Misuse of the Chi-Square Test. Psychological Bulletin, 46(6).
# Generate some fictional data. Say, 1000 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
p.success <- rBeta.4P(1000, 0.1, 0.95, 5, 3)
for (i in 1:100) {
  if (i == 1) {
    rawdata <- matrix(nrow = 1000, ncol = 100)
  }
  rawdata[, i] <- rbinom(1000, 1, p.success)
}

# Suppose the cutoff value for being placed in the lowest category is a
# score below 50, the second lowest 60, then 70, 80, and 90. Using the cba()
# function to estimate the reliability of this test, to use the LL.CA.MC()
# function for estimating diagnostic performance and consistency indices of
# classifications when using several cut-points:
LL.CA.MC(rowSums(rawdata), cba(rawdata), c(50, 60, 70, 80, 90), min = 0,
         max = 100)

# The output from this function can get quite verbose when operating with
# several cut-points. In order to retrieve only model parameter estimates:
LL.CA.MC(rowSums(rawdata), cba(rawdata), c(50, 60, 70, 80, 90), min = 0,
         max = 100)$parameters

# To retrieve only the model-fit estimate:
LL.CA.MC(rowSums(rawdata), cba(rawdata), c(50, 60, 70, 80, 90), min = 0,
         max = 100)$modelfit

# To retrieve only the diagnostic performance estimates:
LL.CA.MC(rowSums(rawdata), cba(rawdata), c(50, 60, 70, 80, 90), min = 0,
         max = 100)$accuracy

# To retrieve only the classification consistency indices:
LL.CA.MC(rowSums(rawdata), cba(rawdata), c(50, 60, 70, 80, 90), min = 0,
         max = 100)$consistency

# Alternatively, the MC.out.tabular() function can be used to organize the
# category-specific indices in a tabular format:
MC.out.tabular(LL.CA.MC(rowSums(rawdata), cba(rawdata),
                        c(50, 60, 70, 80, 90), min = 0, max = 100))
Generate a ROC curve plotting the false-positive rate against the true-positive rate at different cut-off values across the observed-score scale.
LL.ROC( x = NULL, reliability, min = 0, max = 1, truecut, true.model = "4P", failsafe = TRUE, l = 0, u = 1, AUC = FALSE, maxJ = FALSE, maxAcc = FALSE, locate = NULL, raw.out = FALSE, grainsize = 100 )
x |
A vector of observed results. |
reliability |
The reliability coefficient of the test. |
min |
The minimum possible value to attain on the observed-score scale. |
max |
The maximum value possible to attain on the test. Default is 1 (assumes that the values contained in x represent proportions). |
truecut |
The true point along the x-scale that marks the categorization-threshold. |
true.model |
The probability distribution to be fitted to the moments of the true-score distribution. Options are "4P" (default) and "2P", representing the four- and two-parameter Beta distributions. |
failsafe |
If true.model = "4P": whether to engage a fail-safe reverting to a two-parameter true-score distribution solution should the four-parameter fitting procedure produce impermissible results. Default is TRUE (engage fail-safe in the event of impermissible estimates). |
l |
If true.model = "2P" or failsafe = TRUE: the lower-bound location parameter to be used for the two-parameter true-score distribution. Default is 0. |
u |
If true.model = "2P" or failsafe = TRUE: the upper-bound location parameter to be used for the two-parameter true-score distribution. Default is 1. |
AUC |
Logical. Calculate and include the area under the curve? Default is FALSE. |
maxJ |
Logical. Mark the point along the curve where Youden's J statistic is maximized? Default is FALSE. |
maxAcc |
Logical. Mark the point along the curve where the Accuracy statistic is maximized? Default is FALSE. |
locate |
Ask the function to locate the cut-point at which sensitivity or NPV is greater than or equal to some value, or at which specificity or PPV is less than or equal to some value. Takes as input a character-vector of length 2, with the first entry being which index is to be found (e.g., "sensitivity"), and the second entry the value to locate (e.g., "0.75"). For example: c("sensitivity", "0.75"). |
raw.out |
Give raw coordinates as output rather than plot? Default is FALSE. |
grainsize |
Specify the number of cutoff-points for which the ROC curve is to be calculated. The greater this number the greater the accuracy. Default is 100 points. |
A plot tracing the ROC curve for the test, or matrix of coordinates if raw.out is TRUE
.
# Generate some fictional data. Say, 1000 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
testdata <- rbinom(1000, 100, rBeta.4P(1000, 0.25, 0.75, 5, 3))
hist(testdata / 100, xlim = c(0, 1), freq = FALSE)

# Suppose the cutoff value for attaining a pass is 50 items correct.
# Suppose further that the reliability of the test-scores was estimated to
# 0.7. To produce a plot with an ROC curve using LL.ROC(), along with the
# AUC statistic and the point at which Youden's J is maximized:
LL.ROC(x = testdata, reliability = 0.7, truecut = 50, min = 0, max = 100,
       AUC = TRUE, maxJ = TRUE)

# Or to locate the point at which accuracy is maximized:
LL.ROC(x = testdata, reliability = 0.7, truecut = 50, min = 0, max = 100,
       maxAcc = TRUE)

# Using the example data above, the function can be instructed to locate an
# operational cut-point at which sensitivity or specificity is equal to or
# greater than some specified value by specifying the "locate" argument with
# c("statistic", value). For example, to locate the operational cut-point at
# which sensitivity is first equal to or greater than 0.9:
LL.ROC(testdata, reliability = 0.7, min = 0, max = 100, truecut = 50,
       locate = c("sensitivity", 0.9))

# For Negative Predictive Value, the point at which it is equal or greater:
LL.ROC(testdata, reliability = 0.7, min = 0, max = 100, truecut = 50,
       locate = c("NPV", 0.9))

# And so on for other statistics such as Specificity and Positive Predictive
# Value.
Calculates Lord's k.
Lords.k(x, N, reliability)
x |
A vector of observed-scores. |
N |
The test length. |
reliability |
The test-score reliability coefficient. |
A value representing Lord's k.
# Generate some fictional data. Say 100 students take a 50-item long test
# where all items are equally difficult (i.e., where the true Lord's k = 0).
set.seed(1234)
p.success <- rBeta.4P(100, 0.25, 0.75, 5, 3)
for (i in 1:50) {
  if (i == 1) {
    rawdata <- matrix(nrow = 100, ncol = 50)
  }
  rawdata[, i] <- rbinom(100, 1, p.success)
}

# Estimate the reliability of these scores with Cronbach's Alpha:
reliability <- cba(rawdata)

# Estimate Lord's k using Lords.k():
Lords.k(rowSums(rawdata), 50, reliability)
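The estimated k is typically passed on to functions that take Lord's k as input; a short usage sketch continuing the example above (see the HB.tsm() documentation):

# Feed the estimated k into the true-score moment estimator:
k <- Lords.k(rowSums(rawdata), 50, reliability)
HB.tsm(x = rowSums(rawdata), r = 4, N = 50, k = k)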
Function that takes the output from the LL.CA.MC() or HB.CA.MC() function and organizes it in a table with accuracy and consistency indices represented by columns and categories as rows.
MC.out.tabular(x)
x |
The list-output from the LL.CA.MC() or HB.CA.MC() function. |
# Generate some fictional data. Say, 1000 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
p.success <- rBeta.4P(1000, 0.1, 0.95, 5, 3)
for (i in 1:100) {
  if (i == 1) {
    rawdata <- matrix(nrow = 1000, ncol = 100)
  }
  rawdata[, i] <- rbinom(1000, 1, p.success)
}

# Estimate accuracy and consistency where the lowest category is scores
# below 50, the second lowest 60, then 70, 80, and 90. Using the cba()
# function to estimate the reliability of this test, to use the LL.CA.MC()
# function for estimating diagnostic performance and consistency indices of
# classifications when using several cut-points:
output <- LL.CA.MC(rowSums(rawdata), cba(rawdata), seq(50, 90, 10), 0, 100)

# As this output can get quite verbose as the number of categories
# increases, the MC.out.tabular() function can be used to organize the
# output more concisely in a tabular format.
MC.out.tabular(output)
Tool for visually gauging the discrepancy between the observed and model-implied frequencies of observed-scores.
mdlfit.gfx( x, x.tickat = NULL, y.tickat = NULL, y.lim = NULL, main.lab = "Observed vs. Expected Frequencies", x.lab = "Bins", y.lab = "Frequency", x.grid = NULL, y.grid = NULL )
x |
The output object from the LL.CA() function. |
x.tickat |
The points along the x-axis at which bins are to be labelled. Default is NULL. |
y.tickat |
The points along the y-axis at which frequencies are to be labelled. Default is NULL. |
y.lim |
The limits of the y-axis (frequencies). Useful for keeping the scale equal across several plots. |
main.lab |
The main label (title) of the plot. |
x.lab |
The label for the x-axis (the bins). |
y.lab |
The label for the y-axis (the frequencies). |
x.grid |
Control the vertical grid-lines of the plot. Takes |
y.grid |
Control the horizontal grid-lines of the plot. Takes |
# Generate some data. 1000 respondents taking a 100-item test:
set.seed(060121)
p.success <- rBeta.4P(1000, 0.25, 0.75, 5, 3)
for (i in 1:100) {
  if (i == 1) {
    rawdata <- matrix(nrow = 1000, ncol = 100)
  }
  rawdata[, i] <- rbinom(1000, 1, p.success)
}

# Analyse the accuracy and consistency of the test and store the object:
out <- LL.CA(x = rowSums(rawdata), reliability = cba(rawdata), cut = 50,
             min = 0, max = 100, modelfit = c(nbins = 20, minbin = 1))

# Feed the object to the mdlfit.gfx() function:
mdlfit.gfx(out)

# Given the number of observations, the y-axis ticks are a bit crowded. We
# can make it look less crowded by changing the number of ticks, labels, and
# the grid-lines:
mdlfit.gfx(out, y.tickat = seq(0, 250, 25), y.lim = c(0, 250),
           y.grid = seq(0, 250, 12.5))
Calculates McDonald's Omega reliability-coefficient of the sum-score from the Spearman one-factor model using the procedure outlined in McDonald (1999).
mdo(x, fit = FALSE)
x |
A data-frame or matrix of numerical values where rows represent respondents, and columns represent items. |
fit |
Logical. Whether to return model-fit information and parameter estimates along with the reliability estimate. Default is FALSE. |
If fit = FALSE
, a vector of length 1 containing the estimated McDonald's Omega reliability-coefficient for the sum-score of the supplied variables. If fit = TRUE
, a list containing the Omega-coefficient reliability-estimate as the first entry, followed by the goodness-of-fit index (GFI), a two-row matrix containing the estimated factor-loadings and error-variances, and the observed and fitted covariance-matrices and the discrepancy matrix.
Missing values are treated by passing na.rm = TRUE
to the var
function call and use = "pairwise.complete.obs"
to the cov
function call.
The function terminates with an error if there are negative covariance-matrix entries.
McDonald, R. P. (1999). Test Theory: A Unified Treatment. Routledge.
# Generate some fictional data.
set.seed(1234)
rawdata <- matrix(rnorm(500), ncol = 5)
common <- rnorm(100)
rawdata <- apply(rawdata, 2, function(x) { x + common })

# To estimate McDonald's Omega from this data:
mdo(rawdata)

# To retrieve additional information such as the GFI fit-index and model-
# parameter estimates:
mdo(rawdata, fit = TRUE)
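To see the pairwise treatment of missing values described in the notes above, a few cells can be blanked out before re-estimating (an illustration, continuing the example):

# Introduce some missing values; mdo() then uses pairwise-complete
# observations for the covariances:
rawdata[1:5, 1] <- NA
mdo(rawdata)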
Given a fitted Standard (two-parameter) Beta Distribution, return the alpha shape-parameter value where the observed mean becomes the mode.
MLA(alpha, beta, x = NULL, n = NULL)
alpha |
Observed alpha-parameter value for fitted Standard Beta Probability Density Distribution. |
beta |
Observed beta-parameter value for fitted Standard Beta Probability Density Distribution. |
x |
Observed proportion-correct outcome. |
n |
Test-length. |
The Alpha shape-parameter value for the Standard Beta probability density distribution where the observed mean is the expected mode.
# Assuming a prior Standard (two-parameter) Beta distribution is fit, which
# yields an alpha parameter of 10 and a beta parameter of 8, calculate the
# true-alpha parameter most likely to have produced the observations:
MLA(alpha = 10, beta = 8)
Assuming a prior standard (two-parameter) Beta Distribution, return the beta shape-parameter value where the observed mean becomes the mode.
MLB(alpha, beta, x = NULL, n = NULL)
alpha |
Observed alpha-parameter value for fitted Standard Beta Probability Density Distribution. |
beta |
Observed beta-parameter value for fitted Standard Beta Probability Density Distribution. |
x |
Observed proportion-correct outcome. |
n |
Test-length. |
The Beta shape-parameter value for the Standard Beta probability density distribution where the observed mean is the expected mode.
# Assuming a prior Standard (two-parameter) Beta distribution is fit, which
# yields an alpha parameter of 10 and a beta parameter of 8, calculate the
# true-beta parameter most likely to have produced the observations:
MLB(alpha = 10, beta = 8)
Assuming a prior Standard (two-parameter) Beta Distribution, returns the expected mean of the distribution under the assumption that the observed value is the most likely value of the distribution.
MLM(alpha, beta, x = NULL, n = NULL)
alpha |
Observed alpha value for fitted Standard Beta Probability Density Distribution. |
beta |
Observed beta value for fitted Standard Beta Probability Density Distribution. |
x |
Observed proportion-correct outcome. |
n |
Test-length. |
The expected mean of the Standard Beta probability density distribution, for which the observed mean is the most likely value.
# Assuming a prior Standard (two-parameter) Beta distribution is fit, which
# yields an alpha parameter of 10 and a beta parameter of 8, calculate the
# true-mean most likely to have produced the observations:
MLM(alpha = 10, beta = 8)
Computes Raw, Central, or Standardized moment properties of a vector of observed scores.
observedmoments(x, type = c("raw", "central", "standardized"), orders = 4, correct = TRUE)
x |
A vector of values, the distribution of which moments are to be calculated. |
type |
A character vector determining which moment-types are to be calculated. Permissible values are "raw", "central", and "standardized". |
orders |
The number of moment-orders to be calculated for each of the moment-types. |
correct |
Logical. Whether to include bias correction in the estimation of the moments. Default is TRUE. |
A list of moment types, each a list of moment orders.
# Generate some fictional data. Say, 100 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
testdata <- rbinom(100, 100, rBeta.4P(100, 0.25, 0.75, 5, 3))
hist(testdata, xlim = c(0, 100))

# To compute the first four raw, central, and standardized moments for this
# distribution of observed scores using observedmoments():
observedmoments(x = testdata, type = c("raw", "central", "standardized"),
  orders = 4, correct = TRUE)
Function for calculating the proportion of observations up to a specifiable quantile under the Four-Parameter Beta Distribution.
pBeta.4P(q, l, u, alpha, beta, lower.tail = TRUE)
q |
The quantile or a vector of quantiles for which the proportion is to be calculated. |
l |
The first (lower) location parameter. |
u |
The second (upper) location parameter. |
alpha |
The first shape parameter. |
beta |
The second shape parameter. |
lower.tail |
Logical. Whether the proportion to be calculated is to be under the lower (TRUE) or upper (FALSE) tail. Default is TRUE. |
A vector of proportions of observations falling under specified quantiles under the four-parameter Beta distribution.
# Assume some variable follows a four-parameter Beta distribution with
# location parameters l = 0.25 and u = 0.75, and shape parameters alpha = 5
# and beta = 3. To compute the cumulative probability at a specific point of
# the distribution (e.g., 0.5) using pBeta.4P():
pBeta.4P(q = 0.5, l = 0.25, u = 0.75, alpha = 5, beta = 3)
Function for calculating the proportion of observations up to a specifiable quantile under the Beta-Binomial Probability Distribution.
pBetaBinom(q, N, l, u, alpha, beta, lower.tail = TRUE)
q |
The quantile or a vector of quantiles for which the proportion is to be calculated. |
N |
The total number of trials. |
l |
The first (lower) location parameter. |
u |
The second (upper) location parameter. |
alpha |
The first shape parameter. |
beta |
The second shape parameter. |
lower.tail |
Logical. Whether the proportion to be calculated is to be under the lower (TRUE) or upper (FALSE) tail. Default is TRUE. |
A vector of proportions of observations falling under specified quantiles under the Beta-Binomial distribution.
# Assume some variable follows a Beta-Binomial distribution with number of
# trials = 50, and probabilities of successful trials are drawn from a four-
# parameter Beta distribution with location parameters l = 0.25 and u =
# 0.75, and shape parameters alpha = 5 and beta = 3. To compute the
# cumulative probability at a specific point of the distribution (e.g., 25):
pBetaBinom(q = 25, N = 50, l = .25, u = .75, alpha = 5, beta = 3)
Calculates the probability of some specific observation falling under a specified interval ([0, x] or [x, 1]) under the Standard Beta probability density distribution with defined mean and variance or standard deviation.
pBetaMS(q, mean, variance = NULL, sd = NULL, lower.tail = TRUE, l = 0, u = 1)
q |
A specific point on the x-axis of the Standard Beta probability density distribution with a defined mean and variance. |
mean |
The mean of the target Standard Beta probability density distribution. |
variance |
The variance of the target Standard Beta probability density distribution. |
sd |
The standard deviation of the target Standard Beta probability density distribution. |
lower.tail |
Logical. Whether the density to be considered is that between the lower end of the distribution and the point (i.e., [0, x]; TRUE, the default) or between the point and the upper end of the distribution (i.e., [x, 1]; FALSE). |
l |
The lower-bound location parameter. Default set to 0 (the standard Beta distribution). |
u |
The upper-bound location parameter. Default set to 1 (the standard Beta distribution). |
A value representing the probability of a random draw from the Standard Beta probability density distribution with a defined mean and variance being from one of two defined intervals (i.e., [0 -> x] or [x -> 1]).
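For the standard (l = 0, u = 1) case, the mean/variance parameterization maps onto the usual shape parameters through a standard identity, illustrated below for intuition (a sketch; the function's internals may differ):

# alpha = mean * (mean * (1 - mean) / variance - 1), and
# beta = (1 - mean) * (mean * (1 - mean) / variance - 1):
m <- 0.6
v <- 0.04
a <- m * (m * (1 - m) / v - 1)        # alpha = 3.
b <- (1 - m) * (m * (1 - m) / v - 1)  # beta = 2.
pbeta(0.5, a, b)  # Should match pBetaMS(q = 0.5, mean = 0.6, variance = 0.04).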
# To compute the proportion of the density under the lower-end tail of a
# point along the Standard (two-parameter) Probability Density Distribution
# (e.g., 0.5) with mean of 0.6 and variance of 0.04:
pBetaMS(q = 0.5, mean = 0.6, variance = 0.04)

# To compute the proportion of the density under the lower-end tail of a
# point along the Four-Parameter Beta Probability Density Distribution
# (e.g., 50) with mean of 60 and variance of 400, and lower-bound of 0 and
# upper-bound of 100:
pBetaMS(q = 50, mean = 60, variance = 400, l = 0, u = 100)
Function for calculating the proportion of observations up to a specifiable quantile under Lord's two-term approximation to the compound Binomial distribution.
pcBinom(q, N, k, p, lower.tail = TRUE)
q |
The quantile or vector of quantiles for which the proportion is to be calculated. |
N |
Total number of trials. |
k |
Lord's k (see documentation for the Lords.k() function). |
p |
Probability of success for each trial. |
lower.tail |
Logical. If TRUE (default), probabilities are P[X < x]; otherwise, P[X >= x]. Note that this differs from base-R pbinom(), for which lower-tail probabilities are P[X <= x]. |
# Assume some variable follows a compound Binomial distribution with 100
# trials, a 50% probability of success on each trial, and Lord's k = 1. To
# compute the cumulative probability at a specific point of the distribution
# (e.g., 50):
pcBinom(q = 50, N = 100, k = 1, p = .5)
Extends the cumulative Binomial probability mass function to positive non-integers, effectively turning the mass-function into a density-function.
pGammaBinom(q, size, prob, lower.tail = TRUE)
q |
Vector of quantiles. |
size |
Number of "trials" (zero or more). Need not be integer. |
prob |
Probability of "success" on each "trial". Need not be integer. |
lower.tail |
Logical. If TRUE (default), probabilities are P[X < x]; otherwise, P[X >= x]. Note that this differs from base-R pbinom(), for which lower-tail probabilities are P[X <= x]. |
Loeb, D. E. (1992). A generalization of the binomial coefficients. Discrete Mathematics, 105(1-3).
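As a quick sanity check of the extension (illustrative only): at integer arguments the Gamma-extended distribution should reduce to the ordinary Binomial, but because this function's lower tail is P[X < x] while base-R pbinom() uses P[X <= x], the comparison is offset by one:

pGammaBinom(q = 8, size = 10, prob = 0.75)  # P[X < 8].
pbinom(7, size = 10, prob = 0.75)           # P[X <= 7]; should agree.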
# Assume some variable follows a Gamma-Binomial distribution with
# "number of trials" = 10.5 and probability of "success" for each "trial"
# = 0.75. To compute the cumulative probability of attaining a "number of
# successes" below a specific point (e.g., fewer than 7.5 "successes"):
pGammaBinom(q = 7.5, size = 10.5, prob = 0.75)

# Conversely, to attain a value at or above 7.5:
pGammaBinom(q = 7.5, size = 10.5, prob = 0.75, lower.tail = FALSE)
Function for calculating the quantile (i.e., value of x
) for a given proportion (i.e., the value of y
) under the Four-Parameter Beta Distribution.
qBeta.4P(p, l, u, alpha, beta, lower.tail = TRUE)
p |
A vector (or single value) of proportions or probabilities for which the corresponding quantile (value of x) is to be calculated. |
l |
The first (lower) location parameter. |
u |
The second (upper) location parameter. |
alpha |
The first shape parameter. |
beta |
The second shape parameter. |
lower.tail |
Logical. Whether the quantile(s) to be calculated is to be under the lower or upper tail. Default is TRUE (lower tail). |
A vector of quantiles for specified probabilities or proportions of observations under the four-parameter Beta distribution.
# Assume some variable follows a four-parameter Beta distribution with
# location parameters l = 0.25 and u = 0.75, and shape parameters alpha = 5
# and beta = 3. To compute the quantile at a specific point of the
# distribution (e.g., 0.5) using qBeta.4P():
qBeta.4P(p = 0.5, l = 0.25, u = 0.75, alpha = 5, beta = 3)
Calculates the quantile corresponding to a specific probability of some observation falling within the [0, x] (lower.tail = TRUE) or [x, 1] (lower.tail = FALSE) interval under the Standard Beta probability density distribution with defined mean and variance or standard deviation.
qBetaMS(p, mean, variance = NULL, sd = NULL, lower.tail = TRUE, l = 0, u = 1)
p |
A probability (point on the y-axis) for which the corresponding quantile (point on the x-axis) is to be calculated. |
mean |
The mean of the target Standard Beta probability density distribution. |
variance |
The variance of the target Standard Beta probability density distribution. |
sd |
The standard deviation of the target Standard Beta probability density distribution. |
lower.tail |
Logical. Specifies the tail for which the quantile is to be calculated. Default is TRUE (lower tail). |
l |
The lower-bound location parameter. Default set to 0 (the standard Beta distribution). |
u |
The upper-bound location parameter. Default set to 1 (the standard Beta distribution). |
A numeric value representing the quantile below (or above) which the specified proportion of observations falls.
# To compute the quantile at a specific point (e.g., 0.5) along the Standard
# (two-parameter) Probability Density Distribution with mean of 0.6 and
# variance of 0.04:
qBetaMS(p = 0.5, mean = 0.6, variance = 0.04)

# To compute the quantile at a specific point (e.g., 0.5) along the four-
# parameter Beta distribution with a mean of 60, variance of 400, and lower-
# bound of 0 and upper-bound of 100:
qBetaMS(p = 0.5, mean = 60, variance = 400, l = 0, u = 100)
Quantile function for the Gamma-extended Binomial distribution.
qGammaBinom(p, size, prob, lower.tail = TRUE, precision = 1e-07)
p |
Vector of probabilities. |
size |
Number of "trials" (zero or more, including positive non-integers). |
prob |
Probability of success on each "trial". |
lower.tail |
Logical. If TRUE (default), probabilities are P[X < x], otherwise P[X > x]. |
precision |
The precision with which the quantile is to be calculated. Default is 1e-7 (i.e., search terminates when there is no registered change in estimate at the seventh decimal). Tuning this value will impact the time it takes for the search algorithm to arrive at an estimate. |
This function uses a bisection search-algorithm to find the number of successes corresponding to the specified quantile(s). This algorithm is inefficient with respect to the number of iterations required to converge on the solution. More efficient algorithms might be added in later versions.
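A minimal sketch of such a bisection search, assuming only that pGammaBinom() is monotone in q (the package's internal implementation may differ):

bisect_quantile <- function(p, size, prob, tol = 1e-7) {
  lo <- 0
  hi <- size
  # Halve the bracketing interval until it is narrower than the tolerance:
  while ((hi - lo) > tol) {
    mid <- (lo + hi) / 2
    if (pGammaBinom(mid, size, prob) < p) lo <- mid else hi <- mid
  }
  (lo + hi) / 2
}
bisect_quantile(p = 0.25, size = 10, prob = 0.75)  # ~qGammaBinom(0.25, 10, 0.75).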
Loeb, D. E. (1992). A generalization of the binomial coefficients. Discrete Mathematics, 105(1-3).
# For a Gamma-extended Binomial distribution with number of trials = 10 and
# probability of success per trial of 0.75, calculate the number of success-
# ful trials at or below the 25% quantile:
qGammaBinom(p = 0.25, size = 10, prob = 0.75)

# Conversely, for a Gamma-extended Binomial distribution with number of
# trials = 10 and probability of success per trial of 0.75, calculate the
# number of successful trials at or above the 25% quantile:
qGammaBinom(p = 0.25, size = 10, prob = 0.75, lower.tail = FALSE)
Calculate model-implied reliability given mean, variance, the minimum and maximum possible scores, and the effective test length.
R.ETL(mean, variance, min = 0, max = 1, ETL)
mean |
The mean of the observed-score distribution. |
variance |
The variance of the observed-score distribution. |
min |
The lower-bound (minimum possible value) of the observed-score distribution. Default is 0 (assuming observed scores represent proportions). |
max |
The upper-bound (maximum possible value) of the observed-score distribution. Default is 1 (assuming observed scores represent proportions). |
ETL |
The effective test length as defined by Livingston and Lewis (1995). |
An estimate of the reliability of a test, given the effective test length, mean, variance, and minimum and maximum possible scores of the observed-score distribution.
Livingston, Samuel A. and Lewis, Charles. (1995). Estimating the Consistency and Accuracy of Classifications Based on Test Scores. Journal of Educational Measurement, 32(2).
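A round-trip sketch of the relation being inverted, assuming the Livingston and Lewis (1995) definition of effective test length (an assumption for illustration, not a statement of this function's internals):

# Effective test length from a given reliability, under the assumed
# Livingston and Lewis (1995) definition:
etl_from_r <- function(mean, variance, min, max, r) {
  ((mean - min) * (max - mean) - r * variance) / (variance * (1 - r))
}
etl <- etl_from_r(mean = 60, variance = 400, min = 0, max = 100, r = 0.8)
# R.ETL() should then approximately recover the reliability:
R.ETL(mean = 60, variance = 400, min = 0, max = 100, ETL = etl)  # ~0.8.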
# Generate some fictional data. Say, 100 individuals take a test with a
# maximum score of 100 and a minimum score of 0.
set.seed(1234)
testdata <- rbinom(100, 100, rBeta.4P(100, .25, .75, 5, 3))
hist(testdata, xlim = c(0, 100))

# From the data-generating script above, the effective test length is 100.
# To estimate and retrieve the model-implied reliability using R.ETL():
R.ETL(mean = mean(testdata), variance = var(testdata), min = 0, max = 100,
  ETL = 100)
Function for generating random numbers from a specified Four-Parameter Beta Distribution.
rBeta.4P(n, l, u, alpha, beta)
n |
Number of draws. |
l |
The first (lower) location parameter. |
u |
The second (upper) location parameter. |
alpha |
The alpha (first) shape parameter. |
beta |
The beta (second) shape parameter. |
A vector with length n
of random values drawn from the Four-Parameter Beta Distribution.
# Assume some variable follows a four-parameter Beta distribution with
# location parameters l = 0.25 and u = 0.75, and shape parameters alpha = 5
# and beta = 3. To draw a random value from this distribution using
# rBeta.4P():
rBeta.4P(n = 1, l = 0.25, u = 0.75, alpha = 5, beta = 3)
Random Number Generation under the Beta-Binomial Probability Mass Distribution.
rBetaBinom(n, N, l, u, alpha, beta)
n |
Number of draws. |
N |
Number of trials. |
l |
The first (lower) location parameter. |
u |
The second (upper) location parameter. |
alpha |
The alpha (first) shape parameter. |
beta |
The beta (second) shape parameter. |
A vector with length n
of random values drawn from the Beta-Binomial Distribution.
# To draw a sample of 50 values from a Beta-Binomial distribution with
# number of trials = 100, and with success-probabilities drawn from a
# Four-Parameter Beta distribution with location parameters l = 0.25 and
# u = 0.95, and shape-parameters alpha = 5 and beta = 3:
rBetaBinom(n = 50, N = 100, l = 0.25, u = 0.95, alpha = 5, beta = 3)
Random number generation under Lord's Beta Compound-Binomial distribution, where the Compound-Binomial distribution is Lord's two-term approximation.
rBetacBinom(x, N, k, l, u, alpha, beta)
x |
Number of draws. |
N |
Number of trials. |
k |
Lord's k (see documentation for the Lords.k() function). |
l |
The lower-bound location parameter of the four-parameter Beta distribution. |
u |
The upper-bound location parameter of the four-parameter Beta distribution. |
alpha |
The first shape-parameter of the four-parameter Beta distribution. |
beta |
The second shape-parameter of the four-parameter Beta distribution. |
For larger values of k
, the distribution can yield negative probabilities, in which case the function returns an error.
# To draw a sample of 50 values from a Beta Compound-Binomial distribution
# with number of trials = 100, Lord's k = 1, and probabilities of successful
# trials drawn from a four-parameter Beta distribution with location-
# parameters l = .15 and u = .85, and shape parameters alpha = 6 and
# beta = 4:
rBetacBinom(x = 50, N = 100, k = 1, l = .15, u = .85, alpha = 6, beta = 4)
Draws random samples of observations from the Standard Beta probability density distribution with defined mean and variance.
rBetaMS(n, mean, variance = NULL, sd = NULL, l = 0, u = 1)
n |
Number of observations to be drawn from under the Standard Beta Probability Density Distribution. |
mean |
The mean of the target Standard Beta probability density distribution. |
variance |
The variance of the target Standard Beta probability density distribution. |
sd |
The standard deviation of the target Standard probability density distribution. |
l |
The lower-bound location parameter. Default set to 0 (the standard Beta distribution). |
u |
The upper-bound location parameter. Default set to 1 (the standard Beta distribution). |
A vector of length n
, each value representing a random draw from the Standard Beta probability density distribution with defined mean and variance.
# To draw a random sample of 100 values from a Standard Beta distribution
# with a mean of 0.6 and a variance of 0.04:
rBetaMS(n = 100, mean = 0.6, variance = 0.04)

# To draw a random sample of 100 values from a Four-Parameter Beta
# distribution with a mean of 60, a variance of 400, a lower-bound of
# 0, and an upper-bound of 100:
rBetaMS(n = 100, mean = 60, variance = 400, l = 0, u = 100)
Random Number Generation under Lord's Two-Term Approximation to the Compound Binomial Distribution.
rcBinom(n, N, k, p)
n |
Number of draws. |
N |
Number of trials. |
k |
Lord's k (see documentation for the Lords.k() function). |
p |
Probability of success for each trial. |
For larger values of k
, the distribution can yield negative probabilities. This function handles such occurrences by adding the absolute value of the minimum probability to all observations if there are any negative probabilities, and then normalizing the distribution so that the total density is equal to 1.
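A sketch of the shift-and-renormalize step just described, applied to an arbitrary probability vector (illustrative only; the function performs this internally):

probs <- c(-0.02, 0.10, 0.40, 0.35, 0.17)
if (any(probs < 0)) {
  probs <- probs + abs(min(probs))  # Shift so the smallest value is 0.
}
probs / sum(probs)  # Renormalize so the total density equals 1.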
# To draw a sample of 50 values from a Compound-Binomial distribution with
# number of trials = 100, a 50% probability of success for each trial, and
# Lord's k = 1:
set.seed(1234)
rcBinom(n = 50, N = 100, k = 1, p = .5)

# To draw values where the probabilities vary for each draw:
rcBinom(n = 50, N = 100, k = 1, p = runif(50))
Random number generation under the Gamma-extended Binomial distribution.
rGammaBinom(n, size, prob, precision = 1e-04)
n |
Number of observations. |
size |
Number of "trials" (zero or more). Need not be integer. |
prob |
Probability of "success" on each "trial". Need not be integer. |
precision |
The precision with which the quantile is to be calculated. Default is 1e-4 (i.e., search terminates when there is no registered change in estimate at the fourth decimal). Tuning this value will impact the time it takes for the search algorithm to arrive at an estimate. |
Calls qGammaBinom()
, which makes the random draw slower than one might be used to (since qGammaBinom()
calls pGammaBinom()
and employs a search-algorithm to find the appropriate value down to a specifiable level of precision).
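The details above suggest inversion sampling; a minimal sketch of that idea (an assumption about the approach, not the package's exact implementation):

# Draw uniform variates and map them through the quantile function:
set.seed(1)
u <- runif(3)
sapply(u, function(p) qGammaBinom(p, size = 10.5, prob = 0.75))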
# Assume some variable follows a Gamma-Binomial distribution with
# "number of trials" = 10.5 and probability of "success" for each "trial"
# = 0.75. To draw a random value from this distribution:
rGammaBinom(n = 1, size = 10.5, prob = 0.75)
An implementation of Lord's (1965, p. 265) Equation 37 for estimating the raw moments of the true-score distribution, modified to work with the Livingston and Lewis approach.
tsm(x, r, n, method = "product")
x |
The effective test-score of test-takers. |
r |
The moment-order that is to be calculated (where 1 is the mean, 2 is the raw variance, 3 is the raw skewness, etc.). |
n |
The effective test-length. |
method |
The method by which the descending factorials are to be calculated. Default is "product". |
Lord, F. M. (1965). A strong true-score theory, with applications. Psychometrika, 30(3), 239–270. doi: 10.1007/BF02289490
Livingston, Samuel A. and Lewis, Charles. (1995). Estimating the Consistency and Accuracy of Classifications Based on Test Scores. Journal of Educational Measurement, 32(2).
# Examine the raw moments of the underlying Beta distribution that is to
# provide the basis for observed-scores:
betamoments(alpha = 5, beta = 3, l = 0.25, u = 0.75, types = "raw")

# Generate observed-scores from true-scores by passing the true-scores as
# binomial probabilities to the rbinom function.
set.seed(1234)
obs.scores <- rbinom(1000, 100, rBeta.4P(1000, 0.25, 0.75, 5, 3))

# Examine the raw moments of the observed-score distribution.
observedmoments(obs.scores, type = "raw")

# First four estimated raw moments of the proportional true-score
# distribution from the observed-score distribution. As all items are
# equally difficult, the effective test-length is equal to the actual
# test-length.
tsm(x = obs.scores, r = 1, n = 100)
tsm(x = obs.scores, r = 2, n = 100)
tsm(x = obs.scores, r = 3, n = 100)
tsm(x = obs.scores, r = 4, n = 100)
# These are fairly close to the true raw moments of the proportional
# true-score distribution calculated above.
Calculates the upper-bound value required to produce a Beta probability density distribution with defined moments and parameters. Be advised that not all combinations of moments and parameters can be satisfied (e.g., specifying mean, variance, skewness and kurtosis uniquely determines both location-parameters, meaning that the value of the upper-location parameter will take on whichever value it must, and cannot be specified).
UABMSL( alpha = NULL, beta = NULL, mean = NULL, variance = NULL, l = NULL, sd = NULL )
alpha |
The alpha shape-parameter of the target Beta probability density distribution. |
beta |
The beta shape-parameter of the target Beta probability density distribution. |
mean |
The mean (first raw moment) of the target Standard Beta probability density distribution. |
variance |
The variance (second central moment) of the target Standard Beta probability density distribution. |
l |
The lower-bound of the Beta distribution. Default is NULL (i.e., does not take a specified l-parameter into account). |
sd |
Optional alternative to specifying |
A numeric value representing the required value for the Beta upper location-parameter (u
) in order to produce a Beta probability density distribution with the target moments and parameters.
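For intuition, an illustration under the standard four-parameter Beta moment identities (an assumption about how the inputs relate, not a statement of the function's internals): the mean of a four-parameter Beta distribution is l + (u - l) * alpha / (alpha + beta), so given the mean, l, alpha, and beta, the implied upper bound follows directly.

# Hypothetical helper, for illustration only:
u_implied <- function(mean, l, alpha, beta) {
  l + (mean - l) * (alpha + beta) / alpha
}
u_implied(mean = 0.5625, l = 0.25, alpha = 5, beta = 3)  # = 0.75.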
# Generate some fictional data.
set.seed(1234)
testdata <- rBeta.4P(100000, 0.25, 0.75, 5, 3)
hist(testdata, xlim = c(0, 1), freq = FALSE)

# Suppose you know three of the four necessary parameters to fit a four-
# parameter Beta distribution (i.e., l = 0.25, alpha = 5, beta = 3) to this
# data. To find the value for the necessary u parameter, first estimate the
# mean and variance of the distribution:
M <- mean(testdata)
S2 <- var(testdata)

# To find the u parameter necessary to produce a four-parameter Beta
# distribution with the target mean, variance, and l, alpha, and beta
# parameters using the UABMSL() function:
(u <- UABMSL(alpha = 5, beta = 3, mean = M, variance = S2, l = 0.25))
curve(dBeta.4P(x, 0.25, u, 5, 3), add = TRUE, lwd = 2)