𝚺 StatBook 𝚽 [V1 FREE] | One-Line Statistics with MLR, ANOVA, KDE + 52 others!

INTRODUCING StatBook v1!


Who needs R or MATLAB when you have ROBLOX :cold_face:

Featuring Hypothesis Testing such as Multiple Linear Regression and ANOVA, Kernel Density Estimation, Markov Chains, Gamma/Beta Distribution Generation, and more with just ONE LINE OF CODE!

Welcome to the ultimate guide for StatBook v1! Whether you want to generate random variables in more diverse and dynamic ways than math.random allows, predict a player’s next move, keep analytics for in-game product sales up to date, or are simply curious about random variable generation and hypothesis testing, you’re in the right place!


Download [FREE] :cd:

Documentation :books: *highly recommended!!!

GitHub :floppy_disk:

CDF Calculation link for Random Variable Generation *translate the page to English

Graph Functions for Custom Distribution Creation


Get Started

Instructions

How to Use StatBook_v1 Module

Follow these steps to integrate the StatBook_v1 module into your project:

Step 1: Download the Module

Download the StatBook_v1 module from this link.

Step 2: Place the Module

Place the downloaded StatBook_v1 module into ServerScriptService within your Roblox Studio project.

Step 3: Import the Module in Your Script

When using the module in your script, add the following line to import it:

local StatBook = require(game.ServerScriptService.StatBook_v1)

Step 4: Use Functions from the Module

When you need to use a function from this library, always prepend the function call with “StatBook.”. For example:

local StatBook = require(game.ServerScriptService.StatBook_v1)
local list = {4, 8, 15, 16, 23, 42}
local result = StatBook.Median(list)

And you are all set!

By following these steps, you’ll be able to use all the statistical functions provided by the StatBook_v1 module in your Roblox game.


Summary of Capabilities

Hypothesis Testing :white_check_mark: :x:

Summary
  • Inference on Numerical Samples: The StatBook.inference(...) function runs statistical inference on samples of dependent or independent numerical data. The available tests include the one-sample t-test, independent and dependent two-sample t-tests, sign test, Wilcoxon rank-sum test, Wilcoxon signed-rank test, ANOVA, Kruskal-Wallis test, and Friedman test. It returns test statistics and p-values describing differences in distribution, plus post-hoc tests for pairwise sample comparisons when 3+ samples are compared.

NOTE: All tests are two-tailed (except F and Chi-Square Tests). A one-tailed test option will come with StatBook V1.1.

  • Multiple Linear Regression with Mallows’ C(p) Forward Selection: The StatBook.multipleLinearRegression(...) function takes a list of data points, each containing the values of its predictors, along with the dependent variable value for each data point. It returns detailed information about the regression model and can optionally refine the hypothesized model through forward selection, using Mallows’ C(p) as the selection criterion. Use it together with StatBook.predictY(...) to get the predicted y value of a data point from its predictors.

  • Categorical Hypothesis Testing: Includes a specialized suite of tests like oddsRatio, oneSampleProportionCI, and twoProportionInference for analyzing categorical data. It also includes a set of Chi-Square Tests such as goodnessOfFit() and chiSquareIndependence() for comparing observed data with expected data in categories.

Random Variate Generation :game_die:

Summary
  • Generate random variables from 15 different distributions: This feature generates random variables from a range of statistical distributions. It includes commonly used distributions like Normal, Poisson, and Binomial, as well as more specialized ones like Chi-square, Weibull, and Log-normal. Users supply the parameters specific to each distribution and get back a random variate.

  • Scale and Trim Distributions to get the values you need: 8 different distributions that give you control over where each distribution is trimmed and which range its output values are mapped to!

  • Create any Customized Distribution: This feature gives users the freedom to create their own probability distributions. The developer passes a mathematical function (or a set of piecewise functions) as a string in Lua syntax, along with the x range over which each applies, for instance:

local piecewiseFunctions = {{"func1", func1xmin, func1xmax}, {"func2", func2xmin, func2xmax}}
  • The aggregate probability density function can then be scaled to any range you want, so random variates are drawn from that range and follow the distribution specified in piecewiseFunctions. MORE SPECIFIC INFORMATION BELOW.

  • Randomly sample a continuous variable from a discrete dataset: You can use a dataset of discrete values to generate random variables from a continuous representation of it. The algorithm applies Kernel Density Estimation (KDE). Each discrete data point is given a small distribution (kernel) of its own, and the sum of these kernels forms a probability density function. The algorithm then randomly samples a value from this density within the range of the data points, generating a “continuous” variable from a discrete set.

  • Simulate a sequence of actions through a Markov Chain: Markov Chain simulations model sequences of events where the probability of the next event depends on the current state. Users input the initial state and the state transition probabilities to simulate various scenarios.

Basic Statistics :bar_chart:

  • Descriptive Statistics made simple: Provides fundamental statistical functions like mean, median, mode, and range, as well as more advanced metrics like standard deviation and variance.

Complex Functions :chart_with_upwards_trend::chart_with_downwards_trend:

  • Approximate functions that are difficult to evaluate analytically: Advanced mathematical functions like erf, inverf, gamma, and hypergeometric2f1 are included for specialized statistical needs.

Matrix Operations :heavy_plus_sign::heavy_minus_sign::heavy_multiplication_x:

  • Perform matrix operations: Includes matrix addition, subtraction, multiplication, transposition, and inversion.

Hypothesis Testing :white_check_mark: :x:

How can game developers use these functions?

  • Real-Time Analysis and Reaction: The model can be used in real time to adjust game parameters according to ongoing player behavior, or to predict and react to a player’s actions in-game.

  • Optimized Monetization/Play Time: If your game has in-app purchases, understanding what players value can help you offer more compelling packages.

  • Feature Selection: The regression function includes forward selection based on Mallows’ C(p), helping you identify which variables are the most important predictors so you can focus on the most impactful game elements.

  • A/B Testing: If you have two versions of a feature, you can use the inference tests to see which version is more effective at achieving a specific outcome.

  • Player Personalization: Use the model to predict player behaviors and preferences, allowing for a more personalized gaming experience.

Functions


Inference on Numerical Samples (ANOVA, one/two sample independent/dependent t-tests, Wilcoxon rank tests, etc.)

Be sure to check this function out. It’s an important one!

inference(samples, independent, CL, mu0)

Overview

Performs statistical inference tests based on the given data. The function will decide which test to use based on the number of samples, whether they are independent or not, and their distribution.

NOTE 1: ALL TESTS IN THIS MODULE ARE TWO-TAILED (besides F and Chi-Square tests). One-tailed options may come in a future update.

NOTE 2: None of the two-or-more-sample tests can test for a specific nonzero difference between the means/medians of the samples. By default, all two-or-more-sample tests check for a difference in distribution, that is, \( H_0: D_\mu = 0 \) or \( H_0: D_\eta = 0 \).

NOTE 3: If independent = false (dependent samples), all samples must have the same number of entries.

NOTE 4: It is highly recommended to check the returned warning value. warning = true means the results may be significantly inaccurate due to computational limits.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| samples | Table | A table of samples, each a table of the numbers to be tested. |
| independent | Boolean or nil | Whether the samples are independent (nil for one-sample tests). |
| CL | Number or nil | Confidence level for the statistical tests. Defaults to 0.95 if nil. |
| mu0 | Number or nil | The hypothesized mean tested against in a one-sample test. Defaults to 0 if nil. |

Returns

A table possibly containing the following:

| Key | Type | Description |
| --- | --- | --- |
| pValue | Number | The p-value of the test. |
| rejectH0 | Boolean | Whether to reject the null hypothesis. |
| stat | Number | The value of the test statistic. |
| df | Number, table, or nil | Degrees of freedom (a table of two values for F tests; nil where not applicable). |
| center | Table → number(s) | The mean(s) or median(s) of the dataset(s). |
| centerComp | Number or nil | Comparison value for the center (not applicable for some tests). |
| lowerCI | Number or nil | Lower bound of the confidence interval for the mean/median (not available for 3+ sample tests). |
| upperCI | Number or nil | Upper bound of the confidence interval for the mean/median (not available for 3+ sample tests). |
| dependent | Boolean or nil | Whether the test is for dependent samples (nil for one-sample tests). |
| parametric | Boolean | Whether the test is parametric. |
| nSamples | Number | Number of samples in the test. |
| testType | String | The type of test performed. |
| statType | String | The type of test statistic. |
| centerType | String | The measure of central tendency being tested. |
| postHoc | Table → tables or nil | Post-hoc tests, each nested table holding one test’s results (3+ sample tests only). |
| postHocSig | Table → tables or nil | Only the post-hoc tests with significant p-values (3+ sample tests only). |
| warning | true or nil | true if the results may be inaccurate; nil otherwise or if not applicable. |

postHoc and postHocSig subfields (only for 3+ sample tests)

A table possibly containing the following:

| Key | Type | Description |
| --- | --- | --- |
| group1 | Number | The index of the first sample in the post-hoc comparison. |
| group2 | Number | The index of the second sample in the post-hoc comparison. |
| pValue | Number | The p-value of the post-hoc test. |
| alpha | Number | The alpha required for significance after the Bonferroni correction. |
| rejectH0 | Boolean | Whether to reject the null hypothesis. |
| stat | Number | The value of the test statistic. |
| df | Number or nil | Degrees of freedom (nil where not applicable). |
| center | Table → numbers | The means or medians of the datasets. |
| centerComp | Number | Comparison value for the center. |
| lowerCI | Number | Lower bound of the confidence interval for the mean/median. |
| upperCI | Number | Upper bound of the confidence interval for the mean/median. |
| testType | String | The type of test performed. |
| statType | String | The type of test statistic. |
| centerType | String | The measure of central tendency being tested. |
| warning | true or nil | true if the results may be inaccurate; nil otherwise or if not applicable. |
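The alpha field above comes from the Bonferroni correction. Assuming every pair of samples is compared (the standard setup, though the exact comparison set is not documented here), the per-comparison alpha for k samples is the overall alpha divided by k(k - 1)/2. The helper `bonferroniAlpha` below is a hypothetical plain-Lua illustration, not a StatBook function.

```lua
-- Bonferroni-corrected alpha for all pairwise comparisons among k samples.
-- Assumes every pair is compared; illustration only, not StatBook code.
local function bonferroniAlpha(CL, k)
	local alpha = 1 - CL                -- overall significance level
	local comparisons = k * (k - 1) / 2 -- number of distinct sample pairs
	return alpha / comparisons, comparisons
end

-- Four samples at CL = 0.95 give 6 pairwise comparisons
local perTestAlpha, nPairs = bonferroniAlpha(0.95, 4)
print(nPairs, perTestAlpha)
```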
Examples

One-Sample Test:

local samples = {
    {12, 15, 14, 10, 13, 8, 13, 16, 8, 15, 22, 4, 7, 8}
}
-- in this case, either one-sample t-test or sign test (depends on normality of sample)
local CL = 0.95
local mu0 = 12
-- if mu0 were not specified, it would default to 0
local result = StatBook.inference(samples, nil, CL, mu0)
print(result.pValue, result.stat)
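When the one-sample t-test is selected, `stat` should correspond to the textbook statistic \( t = (\bar{x} - \mu_0) / (s/\sqrt{n}) \) with \( n - 1 \) degrees of freedom. The sketch below shows that computation in plain Lua. `oneSampleT` is a hypothetical helper and not part of StatBook. It skips the normality check and p-value that StatBook.inference handles.

```lua
-- Hypothetical helper (not part of StatBook): textbook one-sample t statistic.
-- Returns the t statistic and its degrees of freedom (n - 1).
local function oneSampleT(sample, mu0)
	local n = #sample
	assert(n >= 2, "need at least two observations")

	-- Sample mean
	local sum = 0
	for _, x in ipairs(sample) do
		sum = sum + x
	end
	local mean = sum / n

	-- Sample standard deviation with Bessel's correction (divide by n - 1)
	local ss = 0
	for _, x in ipairs(sample) do
		ss = ss + (x - mean) ^ 2
	end
	local sd = math.sqrt(ss / (n - 1))

	-- Standard error of the mean, then the t statistic
	local se = sd / math.sqrt(n)
	return (mean - (mu0 or 0)) / se, n - 1
end

local t, df = oneSampleT({12, 15, 14, 10, 13, 8, 13, 16, 8, 15, 22, 4, 7, 8}, 12)
print(t, df)
```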

Two-Sample Test:

local samples = {
    {12, 15, 14, 10, 13, 6, 18},
    {20, 24, 30, 27, 28, 19, 19}
}
local independent = false
-- in this case, either two-sample dep. t-test or signed-rank test (depends on normality of samples + Folded-F test)
local CL = 0.95
local result = StatBook.inference(samples, independent, CL)
print(result.pValue, result.centerComp, result.lowerCI, result.upperCI)

Three-Plus Sample Test:

local samples = {
    {12, 15, 14, 10, 13, 14},
    {20, 24, 30, 27, 28, 20},
    {16, 25, 19, 20, 22, 18},
    {23, 14, 10, 37, 8, 19}
}
local independent = true
-- in this case, either ANOVA or the Kruskal-Wallis test (depends on normality of samples + Levene test)
local CL = 0.95
local result = StatBook.inference(samples, independent, CL)
print(result.pValue)
-- postHocSig holds one table per significant pairwise comparison (nil if not applicable)
for _, test in pairs(result.postHocSig or {}) do
	print(test.group1, test.group2, test.pValue)
end

Multiple Linear Regression w/ Mallows’ C(p) forward selection

Be sure to check this function out. It’s an important one! predictY() is also covered in this section, as the two functions are closely related.

multipleLinearRegression(X, Y, forwardReg, diagnostics, CL) AND predictY(X, model, yHat, indices)

multipleLinearRegression(X, Y, forwardReg, diagnostics, CL)

Overview

The multipleLinearRegression function performs multiple linear regression analysis on the given data. It also offers forward selection based on Mallows’ C(p), diagnostic statistics, and Variance Inflation Factors (VIFs).
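For reference, the textbook definition of Mallows’ C(p) is shown below. It applies to a candidate model with \( p \) parameters (intercept included) fitted to \( n \) observations. Here \( \mathrm{SSE}_p \) is that model’s residual sum of squares and \( \hat{\sigma}^2_{\text{full}} \) is the mean squared error of the full model. Whether StatBook counts \( p \) exactly this way is an assumption. Forward selection generally favors candidates whose C(p) is small and close to \( p \).

```latex
C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2_{\text{full}}} - n + 2p
```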

NOTE 1: All data point subtables within the \( X \) table must have the same number of entries. Missing or nil values are not allowed.

NOTE 2: Before calling the function, make sure \( X \) is a 2D table where each inner table is one row of the matrix (one data point). If \( X \) is not in this format, you can use StatBook.matTranspose(matrix) to transpose it into a compatible layout.
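If your data is stored column-wise, with one inner table per predictor, transposing swaps rows and columns so that each inner table becomes one data point. Here is a plain-Lua sketch. The helper `transpose` is hypothetical, and StatBook.matTranspose may behave differently, for example on ragged input.

```lua
-- Hypothetical helper (StatBook's own matTranspose may differ):
-- converts column-major data (one inner table per predictor)
-- into row-major data (one inner table per data point).
local function transpose(matrix)
	local rows, cols = #matrix, #matrix[1]
	local result = {}
	for j = 1, cols do
		result[j] = {}
		for i = 1, rows do
			result[j][i] = matrix[i][j]
		end
	end
	return result
end

-- Two predictors measured on three data points, stored column-wise
local byPredictor = {{1, 2, 3}, {4, 5, 6}}
local X = transpose(byPredictor) -- {{1, 4}, {2, 5}, {3, 6}}
```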

NOTE 3: The VIFs table is informational only. VIF values do not affect the regression model, and multicollinear predictors are not removed automatically. If you decide to remove a predictor, you must drop it from \( X \) yourself and rerun multipleLinearRegression.

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| X | table | The independent variables matrix (2D table). | Required |
| Y | table | The dependent variable vector (1D table). | Required |
| forwardReg | boolean | Enables or disables the forward selection process. | true |
| diagnostics | boolean | Enables or disables diagnostic statistics. | true |
| CL | number | Confidence level for the t-tests and F-test. | 0.95 |

Returns (if diagnostics = false)

| Variable | Type | Description |
| --- | --- | --- |
| yHat | table → number(s) | Fitted values for the dependent variable. |
| indices | table → number(s) | Indices of the betas retained when going from lmOrig to lmNew. |

Returns (if diagnostics = true)

| Variable | Type | Description | Subfields |
| --- | --- | --- | --- |
| lmNew | table → tables | Model after forward selection with Mallows’ C(p). | yes |
| lmOrig | table → tables | Model before forward selection with Mallows’ C(p). | yes |
| indices | table → number(s) | Indices of the betas retained when going from lmOrig to lmNew. | |

lmNew and lmOrig Subfields*

| Variable | Type | Description | Sub-subfields |
| --- | --- | --- | --- |
| yHat | table → number(s) | Fitted values for the dependent variable. | |
| r2 | number | \( R^2 \) value indicating the goodness of fit. | |
| r2adj | number | Adjusted \( R^2 \), accounting for the number of predictors. | |
| F | number | F-statistic used for hypothesis testing. | |
| pValueF | number | p-value of the F-statistic. | |
| BetaInfo | table → table | Information about the predictor coefficients. | yes |
| VIFs* | table → table | Indicates multicollinearity status. | yes |

* There isn’t a VIFs subfield in lmOrig.

BetaInfo Sub-subfields

| Variable | Type | Description |
| --- | --- | --- |
| predictorIndex | table → number | The original index of the beta in question. |
| rejectH0 | table → boolean | Hypothesis test result for the beta in question. |
| t | table → number | The t-statistic of the beta in question. |
| pValue | table → number | The p-value of the beta in question. |

VIFs Sub-subfields

| Variable | Type | Description |
| --- | --- | --- |
| VIF | table → number | Variance Inflation Factor of each beta. |
| summaryVIF | table → string | A description of potential multicollinearity. |
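For reference, the textbook VIF for predictor \( j \) is shown below. Here \( R_j^2 \) is the \( R^2 \) obtained by regressing predictor \( j \) on all the other predictors. Values above roughly 5 to 10 are a common rule-of-thumb warning sign. How summaryVIF turns values into descriptions is not documented here, so treat that threshold as a general convention rather than StatBook’s rule.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```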

Example Usage

-- regression with 6 datapoints and 3 predictors
local X = {{1, 4, 7}, {2, 3, 5}, {3, 2, 1}, {4, 2, 2}, {5, 8, 3}, {3, 6, 2}}
local Y = {3, 3, 2, 2, 4, 5}

local model = StatBook.multipleLinearRegression(X, Y)

print(model.lmNew.pValueF, model.lmOrig.pValueF, model.lmNew.BetaInfo.t, model.lmNew.BetaInfo.pValue) -- can return a lot more than that

-- rest is optional
local Xtest = {1, 5, 6}
local prediction = StatBook.predictY(Xtest, model)

Subsequent Usage

After obtaining the model from StatBook.multipleLinearRegression, you can pass the returned model directly to StatBook.predictY(X, model, yHat, indices) to predict new \( Y \) values from new \( X \) values. The model object contains all the coefficients and information needed for the prediction.

predictY(X, model, yHat, indices)

Overview

The predictY function predicts the dependent variable \( Y \) from the independent variables \( X \) and the given model. Optionally, specific fitted values \( \hat{y} \) and predictor indices can be supplied.
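Conceptually, a linear model’s prediction is the intercept plus each coefficient times its predictor value: \( \hat{y} = b_0 + b_1 x_1 + \dots + b_k x_k \). The plain-Lua sketch below shows that computation with a simple coefficient list. The layout `{b0, b1, ..., bk}` and the helper `linearPredict` are hypothetical illustrations. They do not reflect the structure of the model table that multipleLinearRegression returns, so use StatBook.predictY for real models.

```lua
-- Illustrative only: y-hat = b0 + b1*x1 + ... + bk*xk.
-- 'coefficients' is a hypothetical list {b0, b1, ..., bk}.
local function linearPredict(x, coefficients)
	assert(#coefficients == #x + 1, "need one coefficient per predictor plus an intercept")
	local y = coefficients[1] -- intercept
	for j, xj in ipairs(x) do
		y = y + coefficients[j + 1] * xj
	end
	return y
end

print(linearPredict({1, 5, 6}, {1, 2, 3, 4})) -- 1 + 2*1 + 3*5 + 4*6 = 42
```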

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| X | Table | The input vector containing independent variable values. | Required |
| model | Table | The regression model from multipleLinearRegression(). | Required |
| yHat | Table | Optional. The fitted values for the intercept and coefficients. | nil |
| indices | Table | Optional. The indices in the model to be used for prediction. | nil |

Returns

| Return | Type | Description |
| --- | --- | --- |
| YPred | Number | The predicted value of the dependent variable \( Y \). |

Example

local X = {{1, 4, 7}, {2, 3, 5}, {3, 2, 1}, {4, 2, 2}, {5, 8, 3}, {3, 6, 2}}
local Y = {3, 3, 2, 2, 4, 5}

local model = StatBook.multipleLinearRegression(X, Y)

local Xtest = {1, 5, 6}
local YPred = StatBook.predictY(Xtest, model)
print(YPred)

Random Variate Generation - Go Beyond math.random! :game_die:

Why should you use this?

Ever found yourself doing something similar to this?

Tedious Coding
function skewedRandom()
    local randomNumber = math.random()
    
    if randomNumber < 0.2 then
        return 4
    elseif randomNumber < 0.3 then
        return 3 
    elseif randomNumber < 0.4 then
        return 5 
    elseif randomNumber < 0.5 then
        return 2  
    elseif randomNumber < 0.6 then
        return 6  
    elseif randomNumber < 0.7 then
        return 1
    elseif randomNumber < 0.8 then
        return 7
    elseif randomNumber < 0.9 then
        return 8 
    elseif randomNumber < 0.95 then
        return 9  
    else
        return 10 
    end
end

Roblox’s Luau standard library provides a basic random number generator through math.random(). It is useful for uniformly distributed numbers, but it offers no built-in way to draw from other statistical distributions such as Normal, Exponential, or Gamma, let alone an arbitrary distribution of your own.

With StatBook, there are easier ways with infinitely many possible distributions.
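For discrete outcomes like the skewedRandom example above, the if/elseif chain can also be replaced with a table of weights and a cumulative-sum lookup. Below is a plain-Lua sketch of that idea, which also runs in Luau. The helper name `weightedChoice` is hypothetical and is not a StatBook function. The weights reproduce the probabilities implied by the thresholds in skewedRandom.

```lua
-- Hypothetical helper: pick an outcome with probability proportional to its weight.
local function weightedChoice(outcomes)
	local total = 0
	for _, entry in ipairs(outcomes) do
		total = total + entry.weight
	end

	-- Uniform draw in [0, total), then walk the cumulative sum
	local r = math.random() * total
	local cumulative = 0
	for _, entry in ipairs(outcomes) do
		cumulative = cumulative + entry.weight
		if r < cumulative then
			return entry.value
		end
	end
	return outcomes[#outcomes].value -- guards against floating-point round-off
end

-- Same probabilities as the skewedRandom thresholds above
local skewed = {
	{value = 4, weight = 0.20}, {value = 3, weight = 0.10},
	{value = 5, weight = 0.10}, {value = 2, weight = 0.10},
	{value = 6, weight = 0.10}, {value = 1, weight = 0.10},
	{value = 7, weight = 0.10}, {value = 8, weight = 0.10},
	{value = 9, weight = 0.05}, {value = 10, weight = 0.05},
}
print(weightedChoice(skewed))
```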

Functions

Scalable Random Generation with Distributions

generateStandardNormalScaled(...)

generateStandardNormalScaled(desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)

Overview

The generateStandardNormalScaled() function generates a scaled random number based on the standard normal distribution within the specified range.

Parameters

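| Parameter | Type | Description |
| --- | --- | --- |
| desiredMin | Number | The minimum desired value of the scaled random number. |
| desiredMax | Number | The maximum desired value of the scaled random number. |
| LQpercent | Number | Lower cumulative probability (CDF value) at which the distribution is trimmed. Default is 0.001. |
| UQpercent | Number | Upper cumulative probability (CDF value) at which the distribution is trimmed. Default is 0.999. |
| lowerQuantile | Number | Lower quantile value (the x value at LQpercent). Calculated by default if not provided. |
| upperQuantile | Number | Upper quantile value (the x value at UQpercent). Calculated by default if not provided. |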

Returns

| Return | Type | Description |
| --- | --- | --- |
| random | Number | A scaled random number in the range [desiredMin, desiredMax]. |

Example

local random = StatBook.generateStandardNormalScaled(-10, 10)
print(random)  -- Value between -10 and 10
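Here is a minimal plain-Lua sketch of the trim-and-rescale idea. It assumes the output is a linear map of a standard normal draw, taken from the interval between its LQpercent and UQpercent quantiles, onto [desiredMin, desiredMax]. It uses the Box-Muller transform with rejection sampling and hard-codes ±3.0902, the standard normal quantiles at 0.001 and 0.999. `normalScaledSketch` is a hypothetical name, and StatBook’s actual implementation may differ.

```lua
-- Standard normal draw via the Box-Muller transform
local function standardNormal()
	local u1 = 1 - math.random() -- in (0, 1], so math.log never sees 0
	local u2 = math.random()
	return math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
end

-- Assumed trim points: standard normal quantiles at 0.001 and 0.999
local LOWER_Q, UPPER_Q = -3.0902, 3.0902

-- Sketch of a trimmed, rescaled normal draw (not StatBook's code)
local function normalScaledSketch(desiredMin, desiredMax)
	local z
	repeat
		z = standardNormal()
	until z >= LOWER_Q and z <= UPPER_Q -- reject draws outside the trimmed range

	-- Linearly map [LOWER_Q, UPPER_Q] onto [desiredMin, desiredMax]
	local t = (z - LOWER_Q) / (UPPER_Q - LOWER_Q)
	return desiredMin + t * (desiredMax - desiredMin)
end

print(normalScaledSketch(-10, 10))
```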
generateNormalScaled(...)

generateNormalScaled(mu, sigma, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)

Overview

The generateNormalScaled() function generates a scaled random number based on a normal distribution with mean \( \mu \) and standard deviation \( \sigma \) within the desired range.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| mu | Number | The mean of the normal distribution. |
| sigma | Number | The standard deviation of the normal distribution. |
| desiredMin | Number | The minimum desired value of the scaled random number. |
| desiredMax | Number | The maximum desired value of the scaled random number. |
| LQpercent | Number | Lower cumulative probability (CDF value) at which the distribution is trimmed. Default is 0.001. |
| UQpercent | Number | Upper cumulative probability (CDF value) at which the distribution is trimmed. Default is 0.999. |
| lowerQuantile | Number | Lower quantile value (the x value at LQpercent). Calculated by default if not provided. |
| upperQuantile | Number | Upper quantile value (the x value at UQpercent). Calculated by default if not provided. |

Returns

| Return | Type | Description |
| --- | --- | --- |
| random | Number | A scaled random number in the range [desiredMin, desiredMax]. |

Example

local random = StatBook.generateNormalScaled(0, 1, -10, 10)
print(random)  -- Output will vary
generateLogNormalScaled(...)

generateLogNormalScaled(mu, sigma, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)

Overview

The generateLogNormalScaled() function generates a scaled random number based on a log-normal distribution with parameters \( \mu \) and \( \sigma \) (the mean and standard deviation of the variable’s natural logarithm) within the desired range.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| mu | Number | The mean of the underlying normal distribution (of ln X). |
| sigma | Number | The standard deviation of the underlying normal distribution (of ln X). |
| desiredMin | Number | The minimum desired value of the scaled random number. |
| desiredMax | Number | The maximum desired value of the scaled random number. |
| LQpercent | Number | Lower cumulative probability (CDF value) at which the distribution is trimmed. Default is 0. |
| UQpercent | Number | Upper cumulative probability (CDF value) at which the distribution is trimmed. Default is 0.999. |
| lowerQuantile | Number | Lower quantile value (the x value at LQpercent). Calculated by default if not provided. |
| upperQuantile | Number | Upper quantile value (the x value at UQpercent). Calculated by default if not provided. |

Returns

| Return | Type | Description |
| --- | --- | --- |
| random | Number | A scaled random number in the range [desiredMin, desiredMax]. |

Example

local random = StatBook.generateLogNormalScaled(0, 1, 1, 100)
print(random)  -- Output will vary
generateCauchyScaled(...)

generateCauchyScaled(x0, gamma, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)

Overview

The generateCauchyScaled() function generates a scaled random number based on a Cauchy distribution with location parameter \( x_0 \) and scale parameter \( \gamma \) within the desired range.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| x0 | Number | The location parameter of the Cauchy distribution. |
| gamma | Number | The scale parameter of the Cauchy distribution. |
| desiredMin | Number | The minimum desired value of the scaled random number. |
| desiredMax | Number | The maximum desired value of the scaled random number. |
| LQpercent | Number | Lower cumulative probability (CDF value) at which the distribution is trimmed. Default is 0.001. |
| UQpercent | Number | Upper cumulative probability (CDF value) at which the distribution is trimmed. Default is 0.999. |
| lowerQuantile | Number | Lower quantile value (the x value at LQpercent). Calculated by default if not provided. |
| upperQuantile | Number | Upper quantile value (the x value at UQpercent). Calculated by default if not provided. |

Returns

| Return | Type | Description |
| --- | --- | --- |
| random | Number | A scaled random number in the range [desiredMin, desiredMax]. |

Example

local random = StatBook.generateCauchyScaled(0, 1, -10, 10)
print(random)  -- Output will vary
generateExponentialScaled(...)

generateExponentialScaled(lambda, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)

Overview

The generateExponentialScaled() function generates a scaled random number based on an Exponential distribution with rate parameter \( \lambda \) within the desired range.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| lambda | Number | The rate parameter of the Exponential distribution. |
| desiredMin | Number | The minimum desired value of the scaled random number. |
| desiredMax | Number | The maximum desired value of the scaled random number. |
| LQpercent | Number | Lower cumulative probability (CDF value) at which the distribution is trimmed. Default is 0. |
| UQpercent | Number | Upper cumulative probability (CDF value) at which the distribution is trimmed. Default is 0.999. |
| lowerQuantile | Number | Lower quantile value (the x value at LQpercent). Calculated by default if not provided. |
| upperQuantile | Number | Upper quantile value (the x value at UQpercent). Calculated by default if not provided. |

Returns

| Return | Type | Description |
| --- | --- | --- |
| random | Number | A scaled random number in the range [desiredMin, desiredMax]. |

Example

local random = StatBook.generateExponentialScaled(1, 0, 10)
print(random)  -- Output will vary
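Distributions with a closed-form inverse CDF can be trimmed without rejection. You draw u uniformly between LQpercent and UQpercent and pass it through the inverse CDF. For the Exponential distribution, \( F^{-1}(u) = -\ln(1 - u)/\lambda \). The plain-Lua sketch below assumes a linear mapping of the trimmed interval onto [desiredMin, desiredMax]. `exponentialScaledSketch` is a hypothetical name, and this is an illustration rather than StatBook’s implementation.

```lua
-- Inverse CDF of the Exponential distribution with rate lambda
local function exponentialInverseCDF(u, lambda)
	return -math.log(1 - u) / lambda
end

-- Sketch: sample only between the LQpercent and UQpercent quantiles,
-- then linearly rescale that interval onto [desiredMin, desiredMax].
local function exponentialScaledSketch(lambda, desiredMin, desiredMax, LQpercent, UQpercent)
	LQpercent = LQpercent or 0
	UQpercent = UQpercent or 0.999 -- must stay below 1, where the inverse CDF is infinite
	local xLow = exponentialInverseCDF(LQpercent, lambda)
	local xHigh = exponentialInverseCDF(UQpercent, lambda)

	-- Uniform draw restricted to [LQpercent, UQpercent) in probability space
	local u = LQpercent + math.random() * (UQpercent - LQpercent)
	local x = exponentialInverseCDF(u, lambda)

	return desiredMin + (x - xLow) / (xHigh - xLow) * (desiredMax - desiredMin)
end

print(exponentialScaledSketch(1, 0, 10))
```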
generateGammaScaled(...)

generateGammaScaled(alpha, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)

Overview

The generateGammaScaled() function generates a scaled random number based on a Gamma distribution with shape parameter \( \alpha \) within the desired range.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| alpha | Number | The shape parameter of the Gamma distribution. |
| desiredMin | Number | The minimum desired value of the scaled random number. |
| desiredMax | Number | The maximum desired value of the scaled random number. |
| LQpercent | Number | Lower cumulative probability (CDF value) at which the distribution is trimmed. Default is 0. |
| UQpercent | Number | Upper cumulative probability (CDF value) at which the distribution is trimmed. Default is 0.999. |
| lowerQuantile | Number | Lower quantile value (the x value at LQpercent). Calculated by default if not provided. |
| upperQuantile | Number | Upper quantile value (the x value at UQpercent). Calculated by default if not provided. |

Returns

| Return | Type | Description |
| --- | --- | --- |
| random | Number | A scaled random number in the range [desiredMin, desiredMax]. |

Example

local random = StatBook.generateGammaScaled(2, 0, 10)
print(random)  -- Output will vary
generateBetaScaled(...)

generateBetaScaled(alpha, beta, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)

Overview

The generateBetaScaled() function generates a scaled random number based on a Beta distribution with shape parameters \( \alpha \) and \( \beta \) within the desired range.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| alpha | Number | The first shape parameter of the Beta distribution. |
| beta | Number | The second shape parameter of the Beta distribution. |
| desiredMin | Number | The minimum desired value of the scaled random number. |
| desiredMax | Number | The maximum desired value of the scaled random number. |
| LQpercent | Number | Lower cumulative probability (CDF value) at which the distribution is trimmed. Default is 0. |
| UQpercent | Number | Upper cumulative probability (CDF value) at which the distribution is trimmed. Default is 1. |
| lowerQuantile | Number | Lower quantile value (the x value at LQpercent). Calculated by default if not provided. |
| upperQuantile | Number | Upper quantile value (the x value at UQpercent). Calculated by default if not provided. |

Returns

| Return | Type | Description |
| --- | --- | --- |
| scaledX | Number | A scaled random number in the range [desiredMin, desiredMax]. |

Example

local scaledX = StatBook.generateBetaScaled(2, 5, 0, 1)
print(scaledX)  -- Output will vary

Make a Customized Distribution out of Function(s)

customizedDistribution(piecewiseFunctions, desiredMin, desiredMax)

Overview

The function generates a random number based on custom piecewise functions within the desired range.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| piecewiseFunctions | Table | A table of subtables, each holding a function string, \( x_{\text{min}} \), and \( x_{\text{max}} \) for one piecewise function. |
| desiredMin | Number | The minimum desired value of the scaled random number. |
| desiredMax | Number | The maximum desired value of the scaled random number. |

Returns

| Return | Type | Description |
| --- | --- | --- |
| randomX | Number | A scaled random number in the range [desiredMin, desiredMax]. |

Example

local functions = {{"x^2", 0, 2}, {"2*x", 2, 4}}
local randomX = StatBook.customizedDistribution(functions, 0, 10)
print(randomX)  -- Output will vary
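One common way to sample a density built from piecewise functions is numerical inverse transform sampling. You evaluate the density on a grid, accumulate it into an approximate CDF, invert a uniform draw, and then rescale to the desired range. The plain-Lua sketch below makes several assumptions. It takes Lua functions rather than strings, treats negative density values as zero, and rescales the combined domain linearly onto [desiredMin, desiredMax]. The name `customSketch` is hypothetical, and StatBook’s actual method may differ.

```lua
-- Sketch: numerical inverse-transform sampling from piecewise density pieces.
-- Each piece is {f, xMin, xMax}, where f is a Lua function (not a string).
local function customSketch(pieces, desiredMin, desiredMax, gridSize)
	gridSize = gridSize or 1000

	-- Overall domain covered by all pieces
	local lo, hi = math.huge, -math.huge
	for _, p in ipairs(pieces) do
		lo, hi = math.min(lo, p[2]), math.max(hi, p[3])
	end

	-- Density at x: first piece whose domain contains x; negatives clamp to 0
	local function density(x)
		for _, p in ipairs(pieces) do
			if x >= p[2] and x <= p[3] then
				return math.max(p[1](x), 0)
			end
		end
		return 0
	end

	-- Approximate CDF on a uniform grid with the trapezoid rule
	local dx = (hi - lo) / gridSize
	local xs, cdf = {lo}, {0}
	for i = 1, gridSize do
		local x0, x1 = lo + (i - 1) * dx, lo + i * dx
		xs[i + 1] = x1
		cdf[i + 1] = cdf[i] + 0.5 * (density(x0) + density(x1)) * dx
	end
	local total = cdf[gridSize + 1]
	assert(total > 0, "density must be positive somewhere")

	-- Invert a uniform draw: find the grid cell containing the target mass
	local target = math.random() * total
	local x = hi
	for i = 1, gridSize do
		if cdf[i + 1] >= target and cdf[i + 1] > cdf[i] then
			local frac = (target - cdf[i]) / (cdf[i + 1] - cdf[i])
			x = xs[i] + frac * (xs[i + 1] - xs[i])
			break
		end
	end

	-- Linearly rescale [lo, hi] onto [desiredMin, desiredMax]
	return desiredMin + (x - lo) / (hi - lo) * (desiredMax - desiredMin)
end

local pieces = {
	{function(x) return x ^ 2 end, 0, 2},
	{function(x) return 2 * x end, 2, 4},
}
print(customSketch(pieces, 0, 10))
```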
Special Instructions for using Customized Distribution

To use the Customized Distribution feature, you need to provide input in the form of piecewise functions along with the desired minimum and maximum value range you want. Each piecewise function consists of three parts:

  • A function string (userFunctionString), which is a string representation of the mathematical function you want to evaluate. Your function string can utilize any of the standard Lua math library functions, as they are available in the environment where the function is evaluated.
  • Minimum (xMin) and maximum (xMax) x-values for the domain of the piecewise function.

Use Desmos to help create any distribution you want.

Example

local piecewiseFunctions = {
	{"math.sqrt(-(5 * x + 1) + 1) * (x + 1)", -1, 0}, 
	{"math.sqrt(-(-5 * x + 1) + 1) * (-x + 1)", 0, 1}
}
local desiredMin = -100
local desiredMax = 100
local result = StatBook.customizedDistribution(piecewiseFunctions, desiredMin, desiredMax)

Graph on Desmos


Randomly Generate Continuous Variable From Discrete Dataset w/ KDE

randomFromDataset(values, kernel, percentageOfFrameTime, bandwidth)

Overview

Generates a random number based on a given dataset using Kernel Density Estimation (KDE).

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| values | Table | The dataset from which to generate the random number. |
| kernel* | String | The type of kernel to use for the KDE. Default is Gaussian. |
| percentageOfFrameTime | Number | The percentage of frame time the function is allowed to run. Default is 0.1. |
| bandwidth | Number | The bandwidth to use in the KDE. Calculated by default if not provided. |

*Options for kernel are: “Gaussian”, “Epanechnikov”, “Uniform”, “Triangular”, “Biweight”, “Cosine”, “Logistic”, and “Sigmoid”.

Returns

| Return | Type | Description |
| --- | --- | --- |
| xRandom | Number | A random number generated from the KDE of the dataset. |

Example

local dataset = {1, 2, 3, 3, 4, 4, 5, 6, 7}
local kernelType = "Gaussian"
local percentageOfFrameTime = 0.1
local randomValue = StatBook.randomFromDataset(dataset, kernelType, percentageOfFrameTime)
print(randomValue)  -- Output will vary
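Sampling from a Gaussian KDE does not require building the density explicitly. You pick a data point uniformly at random and add Gaussian noise whose standard deviation is the bandwidth. The plain-Lua sketch below uses Silverman’s rule of thumb as the default bandwidth. It also rejects draws outside the data range, to match the behavior described in the summary. Both choices are assumptions, not confirmed details of randomFromDataset, and rejection slightly reshapes the density near the edges.

```lua
-- Box-Muller standard normal draw
local function standardNormal()
	local u1 = 1 - math.random() -- in (0, 1]
	local u2 = math.random()
	return math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
end

-- Silverman's rule of thumb: h = 1.06 * sd * n^(-1/5)
local function silvermanBandwidth(values)
	local n = #values
	local mean = 0
	for _, v in ipairs(values) do mean = mean + v / n end
	local ss = 0
	for _, v in ipairs(values) do ss = ss + (v - mean) ^ 2 end
	local sd = math.sqrt(ss / (n - 1))
	return 1.06 * sd * n ^ (-1 / 5)
end

-- Sketch of Gaussian-KDE sampling, restricted to [min(values), max(values)]
local function kdeSampleSketch(values, bandwidth)
	bandwidth = bandwidth or silvermanBandwidth(values)
	local lo, hi = math.huge, -math.huge
	for _, v in ipairs(values) do
		lo, hi = math.min(lo, v), math.max(hi, v)
	end
	local x
	repeat
		local center = values[math.random(#values)] -- pick one kernel at random
		x = center + bandwidth * standardNormal()   -- draw from that kernel
	until x >= lo and x <= hi
	return x
end

print(kdeSampleSketch({1, 2, 3, 3, 4, 4, 5, 6, 7}))
```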

Simulate a sequence of actions through a Markov Chain

markovChain(states, transitionProbs, startState, length, returnFullSequence)

Overview

Generates a sequence of states based on a Markov Chain model.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| states | Table | List of possible states in the Markov Chain. |
| transitionProbs | Table | Transition probability matrix between states. |
| startState | Any | The state to start the sequence from. |
| length | Number | The length of the sequence to generate. |
| returnFullSequence | Boolean | Whether to return the full sequence or just the final state. Default is false. |

Returns

| Return | Type | Description |
| --- | --- | --- |
| sequence or sequence[length] | Table or Any | The entire sequence if returnFullSequence is true; otherwise, the last state. |

Example

local states = {"Sunny", "Cloudy", "Rainy"}
local transitionProbs = {
	Sunny = {Sunny = 0.8, Cloudy = 0.15, Rainy = 0.05},
	Cloudy = {Sunny = 0.2, Cloudy = 0.6, Rainy = 0.2},
	Rainy = {Sunny = 0.1, Cloudy = 0.3, Rainy = 0.6}
}
local startState = "Sunny"
local length = 10
local sequence = StatBook.markovChain(states, transitionProbs, startState, length, true)
print(table.concat(sequence, " -> "))  -- prints the states in order, assuming sequence is a list of strings
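Each step of a Markov chain only needs the current state’s row of transition probabilities. The minimal plain-Lua sketch below uses the same table layout as the example above. The helpers `nextState` and `simulateChain` are hypothetical. Iterating over the states list instead of using `pairs` keeps the cumulative sum in a fixed order. Whether StatBook includes startState in the returned sequence is not specified here, but this sketch does.

```lua
-- Draw the next state from the current state's row of transition probabilities
local function nextState(states, transitionProbs, current)
	local row = transitionProbs[current]
	local r = math.random()
	local cumulative = 0
	for _, s in ipairs(states) do
		cumulative = cumulative + (row[s] or 0)
		if r < cumulative then
			return s
		end
	end
	return states[#states] -- fallback if the row sums to slightly under 1
end

-- Hypothetical simulator: a sequence of 'length' states beginning with startState
local function simulateChain(states, transitionProbs, startState, length)
	local sequence = {startState}
	for i = 2, length do
		sequence[i] = nextState(states, transitionProbs, sequence[i - 1])
	end
	return sequence
end

local states = {"Sunny", "Cloudy", "Rainy"}
local transitionProbs = {
	Sunny = {Sunny = 0.8, Cloudy = 0.15, Rainy = 0.05},
	Cloudy = {Sunny = 0.2, Cloudy = 0.6, Rainy = 0.2},
	Rainy = {Sunny = 0.1, Cloudy = 0.3, Rainy = 0.6},
}
print(table.concat(simulateChain(states, transitionProbs, "Sunny", 10), " -> "))
```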

Do you see yourself using this module? Be honest : )

  • Yes
  • Maybe
  • Probably Not
  • No

Please provide feedback, tell me anything you would like to see implemented in the future, or report any bugs, malfunctions, or inaccuracies you encounter below!

StatBook V1.1 is coming… which means I am eager to add more! Tell me in the comments below anything you would like me to implement! :smiley:

If you choose to use the scaled distributions option, here’s a guide:

NOTE: These Special Instructions apply to all cases, and they are VERY helpful.

Special Instructions for Scaled Generation (ex. generateGammaScaled())

Inverse CDF Calculation link for Random Variable Generation *translate the page to English

The default LQpercent is either 0 or 0.001, and the default UQpercent is either 0.999 or 1, depending on the distribution used. LQpercent and UQpercent are cumulative probabilities (CDF values). Passing each through the inverse CDF gives the X value at which the distribution is trimmed, and these lower and upper X values are the bounds that get mapped to desiredMin and desiredMax.
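Written as formulas, a draw \( x \) from a distribution with CDF \( F \) becomes the output value below. This assumes the mapping onto [desiredMin, desiredMax] is linear. The text implies the trimmed endpoints land on desiredMin and desiredMax, but the exact formula is an assumption.

```latex
x_{\text{low}} = F^{-1}(\text{LQpercent}), \qquad x_{\text{high}} = F^{-1}(\text{UQpercent})

\text{output} = \text{desiredMin} + \frac{x - x_{\text{low}}}{x_{\text{high}} - x_{\text{low}}}\,(\text{desiredMax} - \text{desiredMin})
```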

Therefore, to predict the shape of the resulting distribution when we select its parameters, we need to know the inverse CDF and where desiredMin and desiredMax are mapped on this distribution. The example below illustrates how to get the results you want.

Let’s say we want to pull random variates from a distribution whose possible values lie between 5 and 10. Furthermore, we want the distribution to peak at 6 (the most likely value), with 5 and 10 being the least likely values.

The first step is to shift this range to (0, desiredMax - desiredMin) for the analysis, even though we will still pass 5 as desiredMin and 10 as desiredMax. Thus, for the steps below, our range is 0 to 5, and the peak of interest is at 1.

Let’s say we want to use the Gamma distribution. Because the Gamma distribution extends from 0 to positive infinity, we need to know where it is “trimmed”. The X values at which the distribution is trimmed on the lower and upper ends are determined by LQpercent and UQpercent, whose defaults are LQpercent = 0 and UQpercent = 0.999 for the StatBook.generateGammaScaled() function.

Now, we open the link shared above to see what the inverse CDF looks like and how the alpha parameter affects the Gamma distribution. To do this, navigate to Start → Gamma Distribution → Gamma Distribution (percentage point). It should bring up this page.

To make the Gamma distribution fit specific criteria, we need to adjust its parameters. The goal is to find a combination of parameters that scales the range and the location of the peak probability to a common multiple of a base set.

Let’s denote the base set as (minRange, maxProbability, maxRange) → (0, 1, 5).

By changing the alpha and UQpercent parameters (since beta is fixed at 1), we aim to transform this base set into a new set, where each element is a multiple of the corresponding element in the base set. For example:

  • Base set multiplied by 0.5: (0 * 0.5, 1 * 0.5, 5 * 0.5) = (0, 0.5, 2.5)
  • Base set multiplied by 2: (0 * 2, 1 * 2, 5 * 2) = (0, 2, 10)
  • and others…

In an example with UQpercent set to 0.9972 and alpha set to 3, the transformed set becomes (0, 2, 10), which fits our criteria of being a multiple of the base set.

In the image, the green circle shows that the minimum of the range is 0, the black oval shows that the peak probability is at 2, and the blue circles show that the maximum of the range is 10. Since (0, 2, 10) is 2 × (0, 1, 5), and (0, 1, 5) shifted by 5 becomes (0+5, 1+5, 5+5) = (5, 6, 10), this choice works. Remember, we had to start with 0 as our minRange because the Gamma distribution starts there.
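
The numbers in this example can be double-checked without the website. For a Gamma distribution with beta = 1 and a positive integer alpha, the mode is alpha - 1 and the CDF has the closed form P(X ≤ x) = 1 - e^(-x) * Σ_{k=0}^{alpha-1} x^k / k!. The short sketch below (helper names are hypothetical) evaluates both for alpha = 3, confirming that the peak sits at 2 and that x = 10 corresponds to a cumulative probability of about 0.9972.

```lua
-- Hypothetical helpers for checking the Gamma(alpha, beta = 1) example.
-- Closed-form CDF, valid only for a positive integer alpha.
local function gammaCDFIntegerAlpha(alpha, x)
	local term = 1 -- x^k / k! for k = 0
	local sum = 0
	for k = 0, alpha - 1 do
		sum = sum + term
		term = term * x / (k + 1)
	end
	return 1 - math.exp(-x) * sum
end

-- Mode of Gamma(alpha, beta = 1), valid for alpha >= 1.
local function gammaMode(alpha)
	return alpha - 1
end

print(gammaMode(3))                -- 2
print(gammaCDFIntegerAlpha(3, 10)) -- about 0.99723
```

Since 1 - 61e^(-10) ≈ 0.99723, UQpercent = 0.9972 trims the distribution at very nearly x = 10.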

We have now determined that (0, 2, 10), which is compatible with our intended (5, 6, 10), is achievable with alpha = 3 and UQpercent = 0.9972. Thus, the call that returns a random variate from a Gamma distribution over 5 to 10, where values closer to 6 are the most common outcomes, is:

local randomVariate = StatBook.generateGammaScaled(3, 5, 10, 0, 0.9972)

A continuation of Random Variate Generation - not scalable

Functions

Random Generation with Distributions (no scale, only parameter tuning)

generateStandardNormal()

Overview

The generateStandardNormal() function generates a random number that follows a standard normal distribution using the Box-Muller transform.
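
For reference, here is a minimal standalone sketch of the Box-Muller transform: two independent uniforms u1 and u2 give z = sqrt(-2 ln u1) * cos(2π u2), which is standard normal. The function name is hypothetical, and StatBook may use a different uniform source (for example Roblox's Random object) or keep the second variate the transform produces.

```lua
-- Hypothetical sketch of the Box-Muller transform (not StatBook's source).
local function boxMullerStandardNormal()
	-- 1 - math.random() lies in (0, 1], so math.log never receives 0.
	local u1 = 1 - math.random()
	local u2 = math.random()
	return math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
end

-- Quick sanity check: the sample mean should be near 0 and the variance near 1.
math.randomseed(42)
local n, sum, sumSq = 20000, 0, 0
for _ = 1, n do
	local z = boxMullerStandardNormal()
	sum = sum + z
	sumSq = sumSq + z * z
end
local mean = sum / n
print(mean, sumSq / n - mean * mean)
```

A normal variate with mean mu and standard deviation sigma can then be obtained as mu + sigma * z.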

Parameters

No parameters are required.

Returns

Return Type Description
x Number A random number from a standard normal distribution.

Example

local x = StatBook.generateStandardNormal()
print(x)  -- Output will vary
generateNormal(mu, sigma)

Overview

The generateNormal function generates a random number following a normal distribution characterized by a given mean (mu) and standard deviation (sigma).

Parameters

Parameter Type Description Default
mu Number The mean of the normal distribution. -
sigma Number The standard deviation of the normal distribution. -

Returns

Return Type Description
x Number A random number following the specified normal distribution.

Example

local mu = 0
local sigma = 1
local randomNum = StatBook.generateNormal(mu, sigma)
print(randomNum)  -- Output will vary based on random generation
generateGamma(alpha, beta)

Overview

The generateGamma(alpha, beta) function generates a random number that follows a Gamma distribution with shape parameter \( \alpha \) and scale parameter \( \beta \). The function uses the Marsaglia and Tsang method for this purpose.
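
Here is a minimal standalone sketch of the Marsaglia and Tsang method, assuming beta acts as a scale (multiplier) as the parameter table describes. For alpha ≥ 1 it sets d = alpha - 1/3 and c = 1/sqrt(9d), then accepts d * v with v = (1 + c * x)^3 for a standard normal x; for alpha < 1 it uses the standard boost Gamma(alpha + 1) * U^(1/alpha). The function names are hypothetical and this is not StatBook's source.

```lua
-- Hypothetical sketch of the Marsaglia-Tsang Gamma generator (not StatBook's source).
local function standardNormal()
	local u1 = 1 - math.random() -- in (0, 1]
	local u2 = math.random()
	return math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
end

local function gammaSketch(alpha, beta)
	if alpha < 1 then
		-- Boost: if G ~ Gamma(alpha + 1) and U ~ Uniform(0, 1), G * U^(1/alpha) ~ Gamma(alpha).
		local u = 1 - math.random()
		return gammaSketch(alpha + 1, beta) * u ^ (1 / alpha)
	end
	local d = alpha - 1 / 3
	local c = 1 / math.sqrt(9 * d)
	while true do
		local x = standardNormal()
		local v = (1 + c * x) ^ 3
		if v > 0 then
			local u = 1 - math.random()
			-- Cheap squeeze test first, then the exact log acceptance test.
			if u < 1 - 0.0331 * x ^ 4 or math.log(u) < 0.5 * x * x + d * (1 - v + math.log(v)) then
				return beta * d * v
			end
		end
	end
end

math.randomseed(7)
local n, total = 20000, 0
for _ = 1, n do
	total = total + gammaSketch(2, 1)
end
print(total / n) -- should be close to alpha * beta = 2
```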

Parameters

Parameter Type Description Default
alpha Number The shape parameter of the Gamma distribution. -
beta Number The scale parameter of the Gamma distribution. -

Returns

Return Type Description
x Number A random number from a Gamma distribution with parameters \( \alpha \) and \( \beta \).

Example

local alpha = 2
local beta = 1
local x = StatBook.generateGamma(alpha, beta)
print(x)  -- Output will vary
generateInverseGamma(alpha, beta)

Overview

The generateInverseGamma(alpha, beta) function generates a random number that follows an Inverse Gamma distribution by drawing a Gamma variate with the Marsaglia and Tsang method and then taking its reciprocal.

Parameters

Parameter Type Description
alpha Number The shape parameter of the Inverse Gamma distribution.
beta Number The scale parameter of the Inverse Gamma distribution.

Returns

Return Type Description
x Number A random number from an Inverse Gamma distribution.

Example

local x = StatBook.generateInverseGamma(2, 1)
print(x)  -- Output will vary
generateExponential(lambda)

Overview

The generateExponential(lambda) function generates a random number that follows an Exponential distribution.

Parameters

Parameter Type Description
lambda Number The rate parameter of the Exponential distribution.

Returns

Return Type Description
x Number A random number from an Exponential distribution.

Example

local x = StatBook.generateExponential(0.5)
print(x)  -- Output will vary
generateBeta(alpha, beta)

Overview

The generateBeta(alpha, beta) function generates a random number that follows a Beta distribution.

Parameters

Parameter Type Description
alpha Number The first shape parameter of the Beta distribution.
beta Number The second shape parameter of the Beta distribution.

Returns

Return Type Description
result Number A random number from a Beta distribution.

Example

local result = StatBook.generateBeta(2, 5)
print(result)  -- Output will vary
generateBetaPrime(alpha, beta)

Overview

The generateBetaPrime(alpha, beta) function generates a random number that follows a beta prime distribution.

Parameters

Parameter Type Description
alpha Number The shape parameter alpha for the beta prime distribution. Must be greater than 0.
beta Number The shape parameter beta for the beta prime distribution. Must be greater than 0.

Returns

Return Type Description
result Number A random number from a beta prime distribution.

Example

local result = StatBook.generateBetaPrime(1, 1)
print(result)  -- Output will vary
generateLogNormal(mu, sigma)

Overview

The generateLogNormal(mu, sigma) function generates a random number that follows a log-normal distribution.

Parameters

Parameter Type Description
mu Number The mean parameter of the underlying normal distribution.
sigma Number The standard deviation parameter of the underlying normal distribution.

Returns

Return Type Description
result Number A random number from a log-normal distribution.

Example

local result = StatBook.generateLogNormal(0, 1)
print(result)  -- Output will vary

generateLevy(c, mu)

Overview

The generateLevy(c, mu) function generates a random number that follows a LĂ©vy distribution.

Parameters

Parameter Type Description
c Number The scale parameter for the LĂ©vy distribution. Must be greater than 0.
mu Number The location parameter for the LĂ©vy distribution.

Returns

Return Type Description
result Number A random number from a LĂ©vy distribution.

Example

local result = StatBook.generateLevy(1, 0)
print(result)  -- Output will vary
generatePoisson(lambda)

Overview

The generatePoisson(lambda) function generates a random number that follows a Poisson distribution.

Parameters

Parameter Type Description
lambda Number The average rate of events per interval for the Poisson distribution. Must be greater than 0.

Returns

Return Type Description
result Number A random number from a Poisson distribution.

Example

local result = StatBook.generatePoisson(5)
print(result)  -- Output will vary
generateCauchy(x0, gamma)

Overview

The generateCauchy(x0, gamma) function generates a random number that follows a Cauchy distribution.

Parameters

Parameter Type Description
x0 Number The location parameter of the Cauchy distribution.
gamma Number The scale parameter of the Cauchy distribution.

Returns

Return Type Description
result Number A random number from a Cauchy distribution.

Example

local result = StatBook.generateCauchy(0, 1)
print(result)  -- Output will vary
generateWeibull(alpha, beta)

Overview

The generateWeibull(alpha, beta) function generates a random number that follows a Weibull distribution.

Parameters

Parameter Type Description
alpha Number The scale parameter of the Weibull distribution.
beta Number The shape parameter of the Weibull distribution.

Returns

Return Type Description
result Number A random number from a Weibull distribution.

Example

local result = StatBook.generateWeibull(1, 2)
print(result)  -- Output will vary
generateChiSquare(df)

Overview

The generateChiSquare(df) function generates a random number that follows a Chi-Square distribution with degrees of freedom df.

Parameters

Parameter Type Description
df Number Degrees of freedom for the Chi-Square distribution.

Returns

Return Type Description
result Number A random number from a Chi-Square distribution.

Example

local result = StatBook.generateChiSquare(5)
print(result)  -- Output will vary
generatePareto(alpha, xm)

Overview

The generatePareto(alpha, xm) function generates a random number that follows a Pareto distribution with shape parameter alpha and scale parameter xm.

Parameters

Parameter Type Description
alpha Number The shape parameter for the Pareto distribution.
xm Number The scale parameter for the Pareto distribution.

Returns

Return Type Description
result Number A random number from a Pareto distribution.

Example

local result = StatBook.generatePareto(2, 1)
print(result)  -- Output will vary
generateT(df)

Overview

The generateT(df) function generates a random number that follows a Student’s t-distribution with df degrees of freedom.

Parameters

Parameter Type Description
df Number The degrees of freedom for the t-distribution.

Returns

Return Type Description
result Number A random number from a Student’s t-distribution.

Example

local result = StatBook.generateT(10)
print(result)  -- Output will vary

Categorical Hypothesis Testing

Functions
oddsRatio(O11, O12, O21, O22, CL)

Overview

The oddsRatio function calculates the odds ratio for a 2x2 contingency table, along with the confidence intervals and hypothesis testing for independence. It can be particularly useful in epidemiological studies and statistical analysis of categorical data.
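
As a reference point, one common way to get such an interval is Woolf's log method: OR = (O11 * O22) / (O12 * O21), SE(ln OR) = sqrt(1/O11 + 1/O12 + 1/O21 + 1/O22), and CI = exp(ln OR ± z * SE). The sketch below assumes that method and rejects independence when the interval excludes 1; StatBook may compute the interval or the test differently. The function name is hypothetical, and z is passed in directly (1.959964 for 95%) to keep the sketch self-contained.

```lua
-- Hypothetical sketch: odds ratio with a Woolf (log) confidence interval.
-- Assumes all four cell counts are positive.
local function oddsRatioSketch(O11, O12, O21, O22, z)
	local OR = (O11 * O22) / (O12 * O21)
	local se = math.sqrt(1 / O11 + 1 / O12 + 1 / O21 + 1 / O22)
	local lowerCI = math.exp(math.log(OR) - z * se)
	local upperCI = math.exp(math.log(OR) + z * se)
	-- Independence corresponds to OR = 1, so reject when 1 lies outside the interval.
	local rejectH0 = lowerCI > 1 or upperCI < 1
	return {OR = OR, rejectH0 = rejectH0, lowerCI = lowerCI, upperCI = upperCI}
end

local result = oddsRatioSketch(13, 9, 8, 6, 1.959964)
print(result.OR, result.rejectH0, result.lowerCI, result.upperCI)
```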

Parameters

Parameter Type Description Default
O11 Number Count for group 1 with characteristic A. -
O12 Number Count for group 1 without characteristic A. -
O21 Number Count for group 2 with characteristic A. -
O22 Number Count for group 2 without characteristic A. -
CL Number Confidence level for the confidence interval of the odds ratio. 0.95

Returns

Return Type Description
OR Number The calculated odds ratio.
rejectH0 Bool Whether to reject the null hypothesis of independence.
lowerCI Number Lower bound of the confidence interval for the odds ratio.
upperCI Number Upper bound of the confidence interval for the odds ratio.

Example

local O11 = 13
local O12 = 9
local O21 = 8
local O22 = 6
local CL = 0.95
local result = StatBook.oddsRatio(O11, O12, O21, O22, CL)
print(result.OR, result.rejectH0, result.lowerCI, result.upperCI)  -- Output will vary based on the input
oneSampleProportionCI(k, n, CL)

Overview

The oneSampleProportionCI function calculates a confidence interval for a population proportion based on the proportion observed in a sample. It uses the Wald-Agresti-Coull (WAC) method, a modified version of the standard Wald method, to calculate the interval.
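
The Agresti-Coull adjustment adds z² pseudo-observations before applying the Wald formula: ñ = n + z², p̃ = (k + z²/2) / ñ, and the interval is p̃ ± z * sqrt(p̃(1 - p̃) / ñ). Below is a minimal sketch of that textbook formula (function name hypothetical, z passed in directly). Whether StatBook reports the raw k/n or the adjusted p̃ as pHat is not stated, so the sketch returns both; clamping to [0, 1] is also an assumption.

```lua
-- Hypothetical sketch of the Agresti-Coull (adjusted Wald) interval for a proportion.
local function agrestiCoullSketch(k, n, z)
	local z2 = z * z
	local nTilde = n + z2
	local pTilde = (k + z2 / 2) / nTilde
	local margin = z * math.sqrt(pTilde * (1 - pTilde) / nTilde)
	return {
		pHat = k / n,     -- raw sample proportion
		pTilde = pTilde,  -- adjusted centre of the interval
		lowerCI = math.max(0, pTilde - margin),
		upperCI = math.min(1, pTilde + margin),
	}
end

local ci = agrestiCoullSketch(55, 100, 1.959964)
print(ci.pHat, ci.lowerCI, ci.upperCI)
```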

Parameters

Parameter Type Description Default
k Number Number of successful outcomes in the sample. -
n Number Total number of trials in the sample. -
CL Number Confidence level for the confidence interval. 0.95

Returns

Return Type Description
pHat Number The estimated proportion based on the sample.
lowerCI Number Lower bound of the confidence interval for the proportion.
upperCI Number Upper bound of the confidence interval for the proportion.
testType String Specifies the type of test conducted, in this case, “One Sample Proportion CI”.

Example

local k = 55
local n = 100
local CL = 0.95
local result = StatBook.oneSampleProportionCI(k, n, CL)
print(result.pHat, result.lowerCI, result.upperCI, result.testType)  -- Output will vary based on the input
singleProportionInference(k, n, p, CL)

Overview

The singleProportionInference(k, n, p, CL) function performs hypothesis testing for a single proportion. It chooses between using a Large Sample Proportion Test or an Exact Binomial Test based on the sample size and the probability.

Parameters

Parameter Type Description Default
k number The number of successes in the sample. Required
n number The sample size. Required
p number The hypothesized population proportion. Required
CL number The Confidence Level for the test. 0.95

Returns

Variable Type Description
pValue number The p-value of the test.
rejectH0 boolean Indicates whether to reject the null hypothesis.
stat number The test statistic (Z for Large Sample, nil for Exact).
df number Degrees of freedom (1 for Large Sample, nil for Exact).
pTest number The hypothesized population proportion.
pHat number The sample proportion.
lowerCI number Lower bound of the confidence interval.
upperCI number Upper bound of the confidence interval.
parametric boolean Indicates if the test is parametric (true for Large Sample, false for Exact).
testType string Type of the test conducted (“Large Sample Proportion Test” or “Exact Binomial Test”).
statType string Type of the statistic used (“Z” for Large Sample, nil for Exact).

Examples

-- Example 1: Large sample size
local result = StatBook.singleProportionInference(40, 100, 0.35, 0.95)
-- Output will show Large Sample Proportion Test results


-- Example 2: Small sample size
local result = StatBook.singleProportionInference(4, 10, 0.35, 0.95)
-- Output will show Exact Binomial Test results

Notes

  • If (n * p) >= 5 and (n * (1 - p)) >= 5, a Large Sample Proportion Test is conducted.
  • Otherwise, an Exact Binomial Test is conducted.
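
The selection rule in these notes can be written as a tiny helper. This is only an illustration (the name `chooseProportionTest` is hypothetical); it reproduces which test each of the two examples above triggers.

```lua
-- Hypothetical helper mirroring the documented rule for singleProportionInference.
local function chooseProportionTest(n, p)
	if n * p >= 5 and n * (1 - p) >= 5 then
		return "Large Sample Proportion Test"
	end
	return "Exact Binomial Test"
end

print(chooseProportionTest(100, 0.35)) -- Large Sample Proportion Test (35 and 65 are both >= 5)
print(chooseProportionTest(10, 0.35))  -- Exact Binomial Test (10 * 0.35 = 3.5 < 5)
```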
twoProportionInference(k1, n1, k2, n2, CL)

Overview

The twoProportionInference function performs statistical inference on two independent proportions. It calculates the confidence interval and p-value for the difference between the two proportions \( p_1 \) and \( p_2 \).
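
For orientation, the usual two-proportion Z statistic uses the pooled proportion p = (k1 + k2) / (n1 + n2) under H0: z = (p̂1 - p̂2) / sqrt(p(1 - p)(1/n1 + 1/n2)). The sketch below computes only that statistic; the pooled form and the function name are assumptions, not necessarily what StatBook does internally.

```lua
-- Hypothetical sketch of the pooled two-proportion Z statistic.
local function twoProportionZSketch(k1, n1, k2, n2)
	local p1, p2 = k1 / n1, k2 / n2
	local pooled = (k1 + k2) / (n1 + n2)
	local se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
	return (p1 - p2) / se, {p1 = p1, p2 = p2, pooled = pooled}
end

local z, pHat = twoProportionZSketch(50, 100, 40, 90)
print(z, pHat.p1, pHat.p2, pHat.pooled) -- z is about 0.766
```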

Parameters

Parameter Type Description Default
k1 Number Number of successful outcomes in the first sample. -
n1 Number Total number of trials in the first sample. -
k2 Number Number of successful outcomes in the second sample. -
n2 Number Total number of trials in the second sample. -
CL Number Confidence level for the confidence interval. 0.95

Returns

Return Type Description
pValue Number The p-value of the Z-test.
rejectH0 Boolean Whether to reject the null hypothesis at the given alpha.
stat Number The Z-score of the test.
pHat Table → Number Estimated proportions for both samples and overall.
lowerCI Number Lower bound of the confidence interval for \( p_1 - p_2 \).
upperCI Number Upper bound of the confidence interval for \( p_1 - p_2 \).
parametric Boolean Whether the test is parametric (always true for Z-test).
testType String Specifies the type of test, “Two Proportion Test”.
statType String Specifies the type of statistic used, “Z”.
warning Boolean Whether the sample size is too small for a reliable test.

Example

local k1 = 50
local n1 = 100
local k2 = 40
local n2 = 90
local CL = 0.95
local result = StatBook.twoProportionInference(k1, n1, k2, n2, CL)
print(result.pValue, result.rejectH0, result.stat, result.lowerCI, result.upperCI)  -- Output will vary based on the input
goodnessOfFit(observed, expectedProportions, CL)

Overview

The goodnessOfFit function performs a Pearson’s Chi-Squared Goodness of Fit Test. This test is used to determine if the observed frequency distribution of a variable matches the expected frequency distribution.
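
The statistic behind this test is χ² = Σ (O_i - E_i)² / E_i, where each expected count E_i is the expected proportion times the total observed count, with df = (number of categories - 1). A minimal sketch (hypothetical name; p-value omitted) applied to the data from the example below:

```lua
-- Hypothetical sketch of the Pearson goodness-of-fit statistic (no p-value).
local function goodnessOfFitStat(observed, expectedProportions)
	local total = 0
	for _, o in ipairs(observed) do
		total = total + o
	end
	local stat = 0
	for i, o in ipairs(observed) do
		local expected = expectedProportions[i] * total
		stat = stat + (o - expected) ^ 2 / expected
	end
	return stat, #observed - 1
end

local stat, df = goodnessOfFitStat({50, 40, 30, 25}, {0.3, 0.3, 0.2, 0.2})
print(stat, df) -- about 1.839 with 3 degrees of freedom
```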

Parameters

Parameter Type Description Default
observed Table Array of observed frequencies for each category. -
expectedProportions Table Array of expected proportions for each category. -
CL Number Confidence level for the test. 0.95

Returns

Return Type Description
pValue Number The p-value of the Chi-Squared Test.
rejectH0 Boolean Whether to reject the null hypothesis at the given alpha.
stat Number The Chi-Squared statistic.
df Number The degrees of freedom.
parametric Boolean Whether the test is parametric (always true for this test).
testType String Specifies the type of test, “Pearson’s Goodness of Fit Test”.
statType String Specifies the type of statistic used, “Chi-Square”.
warning Boolean Whether the sample size is too small for a reliable test.

Example

local observed = {50, 40, 30, 25}
local expectedProportions = {0.3, 0.3, 0.2, 0.2}
local CL = 0.95
local result = StatBook.goodnessOfFit(observed, expectedProportions, CL)
print(result.pValue, result.rejectH0, result.stat, result.df, result.warning)  -- Output will vary based on the input
chiSquareIndependence(matrix, CL)

Overview

The chiSquareIndependence function performs Pearson’s Chi-Squared Test for Independence OR Homogeneity. This test checks whether two categorical variables are independent of each other.
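
Under independence, each expected cell count is (row total × column total) / grand total, and the statistic χ² = Σ (O - E)² / E has (rows - 1)(columns - 1) degrees of freedom. A minimal sketch (hypothetical name; no Yates continuity correction, which StatBook may or may not apply):

```lua
-- Hypothetical sketch of Pearson's chi-square statistic for a contingency table.
local function chiSquareIndependenceStat(matrix)
	local rows, cols = #matrix, #matrix[1]
	local rowTotals, colTotals, grandTotal = {}, {}, 0
	for i = 1, rows do
		rowTotals[i] = 0
		for j = 1, cols do
			rowTotals[i] = rowTotals[i] + matrix[i][j]
			colTotals[j] = (colTotals[j] or 0) + matrix[i][j]
			grandTotal = grandTotal + matrix[i][j]
		end
	end
	local stat = 0
	for i = 1, rows do
		for j = 1, cols do
			local expected = rowTotals[i] * colTotals[j] / grandTotal
			stat = stat + (matrix[i][j] - expected) ^ 2 / expected
		end
	end
	return stat, (rows - 1) * (cols - 1)
end

local stat, df = chiSquareIndependenceStat({{19, 24}, {43, 32}})
print(stat, df) -- about 1.895 with 1 degree of freedom
```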

Parameters

Parameter Type Description Default
matrix Table The contingency table as a 2D array. -
CL Number Confidence level for the test. 0.95

Returns

Return Type Description
pValue Number The p-value of the Chi-Squared Test.
rejectH0 Boolean Whether to reject the null hypothesis at the given alpha.
stat Number The Chi-Squared statistic.
df Number The degrees of freedom.
parametric Boolean Whether the test is parametric (always true for this test).
testType String Specifies the type of test, “Pearson’s Test for Independence/Homogeneity”.
statType String Specifies the type of statistic used, “Chi-Square”.
warning Boolean Whether the sample size is too small for a reliable test.

Example

local matrix = {{19, 24}, {43, 32}}
local CL = 0.95
local result = StatBook.chiSquareIndependence(matrix, CL)
print(result.pValue, result.rejectH0, result.stat, result.df, result.warning)  -- Output will vary based on the input

Basic Statistics :bar_chart:

Functions
mean(list)

Overview

The mean function calculates the arithmetic mean, commonly known as the average, of a given list of numbers. It sums all the elements in the list and divides the sum by the number of elements.

Parameters

Parameter Name Type Description Required Default Value
list table A list of numerical values for which the mean will be calculated. The list must contain at least one numerical value. Yes N/A

Returns

Type Description Possible Values
number The mean (average) of the elements in the list. The return value will be a floating-point number if the mean is not an integer. Any numerical value

Constraints

  • The list parameter must be a table containing numerical values only.
  • The table must have at least one element; otherwise, the function will return an undefined result due to division by zero.

Example Use

local myList = {1, 2, 3, 4, 5}
local result = StatBook.mean(myList)

print(result)  -- Output will be 3
median(list)

Overview

The median function calculates the median value from a given list of numbers. The median is the middle value in a data set sorted in ascending order. For a list with an odd number of elements, the median is the exact middle value. For a list with an even number of elements, the median is the average of the two middle values.
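
Here is a minimal sketch of the procedure just described: sort, then take the middle value or the average of the two middle values. The name is hypothetical; whether StatBook sorts a copy or the input itself is not documented, so this version copies the list to avoid side effects.

```lua
-- Hypothetical sketch of the median described above (not StatBook's source).
local function medianSketch(list)
	local n = #list
	if n == 0 then
		return nil
	end
	-- Sort a copy so the caller's table is not reordered.
	local sorted = {}
	for i, v in ipairs(list) do
		sorted[i] = v
	end
	table.sort(sorted)
	local mid = math.floor(n / 2)
	if n % 2 == 1 then
		return sorted[mid + 1]
	end
	return (sorted[mid] + sorted[mid + 1]) / 2
end

print(medianSketch({7, 2, 3, 6, 5})) -- 5
print(medianSketch({4, 1, 3, 2}))    -- 2.5
```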

Parameters

Parameter Name Type Description Required Default Value
list table A list of numerical values to find the median from. The list must contain at least one numerical value. Yes N/A

Returns

Type Description Possible Values
number or nil The median value of the elements in the list. If the list is empty or nil values are encountered, returns nil. Any numerical value or nil

Constraints

  • The list parameter must be a table containing numerical values only.
  • The table must have at least one element; otherwise, the function will return nil.

Example Use

local myList = {7, 2, 3, 6, 5}
local result = StatBook.median(myList)

print(result)  -- Output will be 5
mode(list)

Overview

The mode function calculates the mode(s) of a given list of numbers. The mode is the number(s) that appear most frequently in the data set. If multiple numbers have the same highest frequency, all of them are returned as modes in a table.
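
Here is a minimal sketch of that behaviour: count each value, then collect every value tied for the highest count. The name is hypothetical, and sorting the result is an added choice for a stable output order, since Lua's `pairs()` iteration order is unspecified.

```lua
-- Hypothetical sketch: return every value tied for the highest frequency.
local function modeSketch(list)
	local counts, best = {}, 0
	for _, v in ipairs(list) do
		counts[v] = (counts[v] or 0) + 1
		if counts[v] > best then
			best = counts[v]
		end
	end
	local modes = {}
	for v, c in pairs(counts) do
		if c == best then
			table.insert(modes, v)
		end
	end
	table.sort(modes) -- pairs() order is unspecified, so sort for a stable result
	return modes
end

print(table.concat(modeSketch({1, 2, 3, 2, 2, 4}), ", ")) -- 2
print(table.concat(modeSketch({1, 1, 2, 2, 3}), ", "))    -- 1, 2
```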

Parameters

Parameter Name Type Description Required Default Value
list table A list of numerical values to find the mode from. The list must contain at least one numerical value. Yes N/A

Returns

Type Description Possible Values
table A table containing the mode(s) of the list. If there are multiple modes, all will be included in the returned table. A table containing numerical values

Constraints

  • The list parameter must be a table containing numerical values only.
  • The table must have at least one element; otherwise, the function will return an empty table.

Example Use

local myList = {1, 2, 3, 2, 2, 4}

local result = StatBook.mode(myList)

-- The mode of the list is 2, as it appears most frequently
for _, v in ipairs(result) do
    print(v)
end
range(list)

Overview

The range function calculates the range of a given list of numbers. The range is the difference between the maximum and minimum values in the list.

Parameters

Parameter Name Type Description Required Default Value
list table A list of numerical values to find the range from. The list must contain at least two numerical values. Yes N/A

Returns

Type Description Possible Values
number The range of the list, calculated as the difference between the maximum and minimum values. Any numerical value

Constraints

  • The list parameter must be a table containing numerical values only.
  • The table must have at least two elements; otherwise, the function will return an undefined result.

Example Use

local myList = {1, 2, 3, 4, 5}

local result = StatBook.range(myList)

-- The range of the list {1, 2, 3, 4, 5} is (5 - 1) = 4
print(result)
interquartileRange(values)

Overview

The interquartileRange function calculates the Interquartile Range (IQR) of a given list of numbers. The IQR is the range between the first quartile (Q1) and the third quartile (Q3) of a data set, providing a measure of statistical dispersion.

Parameters

Parameter Name Type Description Required Default Value
values table A list of numerical values for which the IQR will be calculated. The list must contain at least two numerical values. Yes N/A

Returns

Type Description Possible Values
number The Interquartile Range (IQR) of the elements in the list. The return value will be a floating-point number. Any numerical value

Constraints

  • The values parameter must be a table containing numerical values only.
  • The table must have at least two elements; otherwise, an error will be thrown.

Example Use

local myValues = {1, 2, 3, 4, 5}

-- The IQR of the list {1, 2, 3, 4, 5} will be calculated
local result = StatBook.interquartileRange(myValues)

-- Output will be the calculated IQR
print(result)
variance(list)

Overview

The variance function calculates the sample variance of a given list of numbers. Variance is a statistical measurement of the spread between numbers in a dataset.

Parameters

Parameter Name Type Description Required Default Value
list table A list of numerical values for which the variance will be calculated. The list must contain at least two numerical values. Yes N/A

Returns

Type Description Possible Values
number The sample variance of the elements in the list. The return value will be a floating-point number. Any numerical value

Constraints

  • The list parameter must be a table containing numerical values only.
  • The table must have at least two elements; otherwise, the function will return 0 as there’s not enough data to calculate the variance.
  • It finds the sample variance, not population variance.
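
The relationship between sumOfSquares, variance, and standardDeviation can be sketched as follows: the sum of squared deviations from the mean is divided by n - 1 (the sample, not population, version), and the standard deviation is its square root. Function names are hypothetical; returning 0 for fewer than two values mirrors the constraint listed above.

```lua
-- Hypothetical sketch relating sum of squares, sample variance, and sample SD.
local function sumOfSquaresSketch(list)
	local sum = 0
	for _, v in ipairs(list) do
		sum = sum + v
	end
	local mean = sum / #list
	local ss = 0
	for _, v in ipairs(list) do
		ss = ss + (v - mean) ^ 2
	end
	return ss
end

local function sampleVarianceSketch(list)
	if #list < 2 then
		return 0 -- mirrors the documented behaviour for too little data
	end
	return sumOfSquaresSketch(list) / (#list - 1)
end

local function sampleSDSketch(list)
	return math.sqrt(sampleVarianceSketch(list))
end

local data = {1, 2, 3, 4, 5}
print(sumOfSquaresSketch(data), sampleVarianceSketch(data), sampleSDSketch(data))
-- values: 10, 2.5, and about 1.5811
```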

Example Use

local myList = {1, 2, 3, 4, 5}

-- The sample variance will be 2.5
local result = StatBook.variance(myList)
print(result)
standardDeviation(list)

Overview

The standardDeviation function calculates the sample standard deviation (SD) of a given list of numbers. The standard deviation is a measure of the amount of variation or dispersion of a set of values.

Parameters

Parameter Name Type Description Required Default Value
list table A list of numerical values for which the standard deviation will be calculated. The list must contain at least two numerical values. Yes N/A

Returns

Type Description Possible Values
number The sample standard deviation of the elements in the list. The return value will be a floating-point number. Any numerical value

Constraints

  • The list parameter must be a table containing numerical values only.
  • The table must have at least two elements; otherwise, the function will return 0 as there’s not enough data to calculate the standard deviation.
  • It finds the sample SD, not population SD.

Example Use

local myList = {1, 2, 3, 4, 5}

-- The standard deviation of the list {1, 2, 3, 4, 5} will be calculated
local result = StatBook.standardDeviation(myList)

-- Output will be approximately 1.5811 (the square root of the sample variance, 2.5)
print(result)
sumOfSquares(list)

Overview

The sumOfSquares function calculates the sum of squares of deviations from the mean for a given list of numbers. The function utilizes the mean of the list to calculate each deviation.

Parameters

Parameter Type Description
list table A list of numbers to calculate the sum of squares for.

Returns

Type Description
number The sum of squares of deviations from the mean.

Example Use

local StatBook = require(game.ServerScriptService.StatBook_v1)
local myList = {1, 2, 3, 4, 5}
local result = StatBook.sumOfSquares(myList)
print(result)  -- Output will be 10 (4 + 1 + 0 + 1 + 4, since the mean is 3)
factorial(x)

Overview

The factorial function computes the factorial of a given non-negative integer \( x \). The factorial, denoted \( x! \), is the product of all positive integers less than or equal to \( x \). For example, \( 5! = 5 \times 4 \times 3 \times 2 \times 1 = 120 \).

Parameters

Parameter Name Type Description Required Default Value
x number The non-negative integer for which the factorial will be calculated. Yes N/A

Returns

Type Description Possible Values
number The factorial of the input number \( x \). Any non-negative integer

Constraints

  • The x parameter must be a non-negative integer.
  • Factorial of negative integers is undefined, so such input should be avoided.

Example Use

local number = 5

-- The factorial of 5 is 5 * 4 * 3 * 2 * 1 = 120
local result = StatBook.factorial(number)

print(result)

Complex Functions :chart_with_upwards_trend::chart_with_downwards_trend:

Functions
erf(x)

Overview

The erf function computes the error function of a given real number \( x \). The error function is defined as:

\[ \text{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} \, dt \]

In this implementation, the error function is approximated by a series expansion up to 100 terms.
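
One standard series for this is the Maclaurin expansion erf(x) = (2/√π) Σ_{n≥0} (-1)^n x^(2n+1) / (n! (2n+1)). The sketch below assumes that expansion (StatBook's exact series is not shown) and builds each term from the previous one to avoid computing large factorials. It is accurate for moderate |x|; for large |x| the alternating terms cancel badly, and an asymptotic form is preferable.

```lua
-- Hypothetical sketch: erf(x) from its Maclaurin series, truncated at 100 terms.
local function erfSeries(x, terms)
	terms = terms or 100
	local power = x -- holds (-1)^n * x^(2n+1) / n!
	local sum = 0
	for n = 0, terms - 1 do
		sum = sum + power / (2 * n + 1)
		power = power * (-x * x) / (n + 1)
	end
	return 2 / math.sqrt(math.pi) * sum
end

print(erfSeries(1.0)) -- about 0.8427007929
```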

Parameters

Parameter Name Type Description Required Default Value
x number The real number for which the error function will be computed. Yes N/A

Returns

Type Description Possible Values
number The error function value of the input \( x \). Any real number in the open interval (-1, 1)

Constraints

  • The x parameter must be a real number.

Example Use

local number = 1.0

-- The error function of 1.0 will be calculated
local result = StatBook.erf(number)

-- Output will be the error function value for 1.0
print(result)
inverf(x)

Overview

The inverf function calculates the inverse of the error function, \( \text{erf}^{-1}(x) \), using the Newton-Raphson method for numerical approximation.
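
A minimal sketch of the Newton-Raphson idea: solve erf(y) = x by iterating y ← y - (erf(y) - x) / erf'(y), where erf'(y) = (2/√π) e^(-y²). It starts from y = 0 and reuses a Maclaurin-series erf; the starting guess, tolerance, and function names are assumptions, and StatBook's own choices may differ. Because erf is increasing and concave for y > 0, the iterates from 0 approach the root monotonically.

```lua
-- Hypothetical sketch: inverse error function by Newton-Raphson.
local function erfSeries(x)
	local power, sum = x, 0 -- power holds (-1)^n * x^(2n+1) / n!
	for n = 0, 99 do
		sum = sum + power / (2 * n + 1)
		power = power * (-x * x) / (n + 1)
	end
	return 2 / math.sqrt(math.pi) * sum
end

local function inverfNewton(x)
	assert(x > -1 and x < 1, "x must lie strictly between -1 and 1")
	local y = 0
	for _ = 1, 100 do
		-- Newton step with erf'(y) = 2/sqrt(pi) * exp(-y^2).
		local step = (erfSeries(y) - x) / (2 / math.sqrt(math.pi) * math.exp(-y * y))
		y = y - step
		if math.abs(step) < 1e-14 then
			break
		end
	end
	return y
end

print(inverfNewton(0.5)) -- about 0.4769
```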

Parameters

Parameter Type Description
x Number The value to find the inverse error function of. Must be in the range \( [-1, 1] \).

Returns

Return Type Description
inv Number The calculated inverse error function value \( \text{erf}^{-1}(x) \).

Example

local x = 0.5
local result = StatBook.inverf(x)
print(result)  -- Output will be approximately 0.4769
gamma(x)

Overview

The gamma function calculates the Gamma function \( \Gamma(x) \) using the Lanczos approximation method.
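
Here is a minimal sketch of the Lanczos approximation using the widely published g = 7, 9-coefficient set, together with the reflection formula Γ(x) = π / (sin(πx) Γ(1 - x)) for x < 0.5. The coefficient set StatBook uses is not documented, so treat this as an illustration of the method; the function name is hypothetical.

```lua
-- Hypothetical sketch: Gamma function via the Lanczos approximation (g = 7, n = 9).
local LANCZOS_G = 7
local LANCZOS_COEFFS = {
	0.99999999999980993, 676.5203681218851, -1259.1392167224028,
	771.32342877765313, -176.61502916214059, 12.507343278686905,
	-0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
}

local function lanczosGamma(x)
	if x < 0.5 then
		-- Reflection formula for the left half-plane.
		return math.pi / (math.sin(math.pi * x) * lanczosGamma(1 - x))
	end
	x = x - 1
	local a = LANCZOS_COEFFS[1]
	for i = 2, #LANCZOS_COEFFS do
		a = a + LANCZOS_COEFFS[i] / (x + i - 1)
	end
	local t = x + LANCZOS_G + 0.5
	return math.sqrt(2 * math.pi) * t ^ (x + 0.5) * math.exp(-t) * a
end

print(lanczosGamma(5))   -- about 24 (= 4!)
print(lanczosGamma(0.5)) -- about 1.7725 (= sqrt(pi))
```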

Parameters

Parameter Type Description
x Number The value to find the Gamma function of.

Returns

Return Type Description
gam Number The calculated Gamma function value \( \Gamma(x) \).

Example

local x = 5
local result = StatBook.gamma(x)
print(result)  -- Output will be approximately 24, since Gamma(5) = 4! = 24
hypergeometric2f1(a, b, c, z)

Overview

The hypergeometric2f1 function calculates the Gauss hypergeometric function \( {}_2F_1(a, b; c; z) \) using a series approximation.
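
The Gauss hypergeometric series is ₂F₁(a, b; c; z) = Σ_{n≥0} (a)_n (b)_n / (c)_n * z^n / n!, where (q)_n is the rising factorial; it converges for |z| < 1. The sketch below (hypothetical name, assumed tolerance and term cap) sums it using the ratio between consecutive terms. For the example parameters, the closed form ₂F₁(1, 2; 3; z) = 2(-ln(1 - z) - z) / z² gives a value to compare against.

```lua
-- Hypothetical sketch: Gauss hypergeometric 2F1(a, b; c; z) by direct summation, |z| < 1.
local function hypergeometric2F1Series(a, b, c, z, tol, maxTerms)
	tol = tol or 1e-15
	maxTerms = maxTerms or 10000
	local term, sum = 1, 1 -- the n = 0 term
	for n = 0, maxTerms - 1 do
		-- Ratio of term n+1 to term n.
		term = term * (a + n) * (b + n) / ((c + n) * (n + 1)) * z
		sum = sum + term
		if math.abs(term) < tol * math.abs(sum) then
			break
		end
	end
	return sum
end

local approx = hypergeometric2F1Series(1, 2, 3, 0.5)
local exact = 2 * (-math.log(1 - 0.5) - 0.5) / 0.5 ^ 2
print(approx, exact) -- both about 1.5452
```

When a or b is a non-positive integer, the terms hit zero and the series terminates as a polynomial, which the stopping test handles naturally.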

Parameters

Parameter Type Description
a Number First parameter of the hypergeometric function.
b Number Second parameter of the hypergeometric function.
c Number Third parameter of the hypergeometric function.
z Number Argument for which the hypergeometric function is calculated.

Returns

Return Type Description
hypergeom Number The calculated hypergeometric function value.

Example

local a = 1
local b = 2
local c = 3
local z = 0.5
local result = StatBook.hypergeometric2f1(a, b, c, z)
print(result)  -- Output will be approximately 1.5452 for these inputs
incompleteBeta(a, b, x)

Overview

The incompleteBeta function calculates the (non-regularized) incomplete Beta function \( B(x; a, b) \) for given parameters \( a \), \( b \), and \( x \).

Parameters

Parameter Type Description
a Number First parameter of the incomplete Beta function.
b Number Second parameter of the incomplete Beta function.
x Number Value at which the incomplete Beta function is evaluated.

Returns

Return Type Description
incbeta Number The calculated value of the incomplete Beta function.

Example

local a = 2.5
local b = 1.5
local x = 0.4
local result = StatBook.incompleteBeta(a, b, x)
print(result)  -- Output will vary depending on input parameters
regularizedIncompleteBeta(a, b, x)

Overview

The regularizedIncompleteBeta function calculates the regularized incomplete Beta function \( I_x(a, b) = B(x; a, b) / B(a, b) \) for given parameters \( a \), \( b \), and \( x \).

Parameters

| Parameter | Type | Description |
|---|---|---|
| a | Number | First parameter of the regularized incomplete Beta function. |
| b | Number | Second parameter of the regularized incomplete Beta function. |
| x | Number | Value at which the regularized incomplete Beta function is evaluated. |

Returns

| Return | Type | Description |
|---|---|---|
| regincbeta | Number | The calculated value of the regularized incomplete Beta function. |

Example

local a = 2.5
local b = 1.5
local x = 0.4
local result = StatBook.regularizedIncompleteBeta(a, b, x)
print(result)  -- Output will vary depending on input parameters
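The regularized version divides the incomplete integral by the complete Beta function: $I_x(a, b) = B(x; a, b) / B(a, b)$. This means it always lies between 0 and 1, and it equals the CDF of a Beta(a, b) distribution at x. Below is a self-contained sketch that evaluates both integrals with Simpson's rule. It assumes a ≥ 1, b ≥ 1 and 0 ≤ x ≤ 1. `betaIntegral` and `regularizedIncompleteBetaSketch` are hypothetical names, not StatBook functions.

```lua
-- Hypothetical sketch: I_x(a, b) = B(x; a, b) / B(a, b), both integrals by
-- composite Simpson's rule. Assumes a >= 1, b >= 1 and 0 <= x <= 1.
local function betaIntegral(a, b, upper, n)
    n = n or 1000 -- even number of subintervals
    local h = upper / n
    local function f(t)
        return t ^ (a - 1) * (1 - t) ^ (b - 1)
    end
    local sum = f(0) + f(upper)
    for i = 1, n - 1 do
        sum = sum + ((i % 2 == 1) and 4 or 2) * f(i * h)
    end
    return sum * h / 3
end

local function regularizedIncompleteBetaSketch(a, b, x)
    assert(a >= 1 and b >= 1 and x >= 0 and x <= 1, "sketch assumes a, b >= 1 and 0 <= x <= 1")
    -- Complete Beta function B(a, b) is the same integral taken up to 1
    return betaIntegral(a, b, x) / betaIntegral(a, b, 1)
end

print(regularizedIncompleteBetaSketch(2.5, 1.5, 0.4))
```

A useful property for checking any implementation is the symmetry $I_x(a, b) + I_{1-x}(b, a) = 1$.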

Matrix Operations :heavy_plus_sign::heavy_minus_sign::heavy_multiplication_x:

Functions
matAdd(A, B)

The matAdd function performs element-wise addition between two matrices A and B. Both matrices must have the same dimensions for the operation to be valid.

Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| A | table | The first matrix, represented as a 2D table. | Required |
| B | table | The second matrix, also represented as a 2D table. | Required |

Returns

| Variable | Type | Description |
|---|---|---|
| C | table | A new matrix, represented as a 2D table, that is the result of A plus B. |

Example

local A = {
  {1, 2},
  {3, 4}
}

local B = {
  {2, 1},
  {4, 3}
}

local result = StatBook.matAdd(A, B) -- expected: {{3, 3}, {7, 7}}
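Element-wise addition is simple enough to sketch directly. The version below fails loudly when the dimensions differ. This is a design choice of the sketch; StatBook's own error behavior may differ. `matAddSketch` is a hypothetical name.

```lua
-- Hypothetical sketch of element-wise matrix addition on 2D tables.
local function matAddSketch(A, B)
    assert(#A == #B, "matrices must have the same number of rows")
    local C = {}
    for i = 1, #A do
        assert(#A[i] == #B[i], "matrices must have the same number of columns")
        C[i] = {}
        for j = 1, #A[i] do
            C[i][j] = A[i][j] + B[i][j]
        end
    end
    return C
end

local C = matAddSketch({{1, 2}, {3, 4}}, {{2, 1}, {4, 3}})
print(C[1][1], C[1][2], C[2][1], C[2][2]) -- 3 3 7 7
```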
matSubtract(A, B)

The matSubtract function performs element-wise subtraction between two matrices A and B. Both matrices must have the same dimensions for the operation to be valid.

Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| A | table | The first matrix, represented as a 2D table. | Required |
| B | table | The second matrix, also represented as a 2D table. | Required |

Returns

| Variable | Type | Description |
|---|---|---|
| C | table | A new matrix, represented as a 2D table, that is the result of A minus B. |

Example

local A = {
  {1, 2},
  {3, 4}
}

local B = {
  {2, 1},
  {4, 3}
}

local result = StatBook.matSubtract(A, B) -- expected: {{-1, 1}, {-1, 1}}

Notes

  • Both matrices A and B must have the same dimensions. Otherwise, the function may throw an error or return incorrect results.
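Because mismatched dimensions may lead to an error or silently wrong results, you may want to check the inputs yourself first. Here is a minimal guard; `sameDimensions` is a hypothetical helper, not part of StatBook.

```lua
-- Hypothetical helper: true only if two 2D tables have identical dimensions,
-- so an element-wise operation such as matAdd or matSubtract is well defined.
local function sameDimensions(A, B)
    if #A ~= #B then
        return false
    end
    for i = 1, #A do
        if #A[i] ~= #B[i] then
            return false
        end
    end
    return true
end

print(sameDimensions({{1, 2}, {3, 4}}, {{2, 1}, {4, 3}})) -- true
print(sameDimensions({{1, 2}, {3, 4}}, {{1, 2, 3}}))      -- false
```

In a game script, you would call StatBook.matSubtract(A, B) only after sameDimensions(A, B) returns true.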
matMult(A, B)

The matMult function performs matrix multiplication between two matrices A and B and returns a new matrix C containing the product. It assumes the dimensions are compatible, meaning the number of columns in A equals the number of rows in B.

Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| A | table | The first matrix, represented as a 2D table. | Required |
| B | table | The second matrix, represented as a 2D table. | Required |

Returns

| Variable | Type | Description |
|---|---|---|
| resultMatrix | table | A new matrix represented as a 2D table, resulting from A multiplied by B. |

Example

local A = {
  {1, 2},
  {3, 4}
}

local B = {
  {2, 0},
  {1, 2}
}

local C = StatBook.matMult(A, B) -- expected: {{4, 4}, {10, 8}}

Notes

  • The function does not handle cases where the matrices are not of compatible dimensions for multiplication. Make sure the number of columns in A matches the number of rows in B.
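For reference, here is a minimal sketch of the standard triple-loop product. Unlike the documented function, it checks compatibility up front. `matMultSketch` is a hypothetical name, and the check is a design choice of this sketch.

```lua
-- Hypothetical sketch: C[i][j] = sum over k of A[i][k] * B[k][j].
local function matMultSketch(A, B)
    local rowsA, colsA = #A, #A[1]
    local rowsB, colsB = #B, #B[1]
    assert(colsA == rowsB, "columns of A must equal rows of B")
    local C = {}
    for i = 1, rowsA do
        C[i] = {}
        for j = 1, colsB do
            local sum = 0
            for k = 1, colsA do
                sum = sum + A[i][k] * B[k][j]
            end
            C[i][j] = sum
        end
    end
    return C
end

local C = matMultSketch({{1, 2}, {3, 4}}, {{2, 0}, {1, 2}})
-- C is {{4, 4}, {10, 8}}
```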
scalarMatMult(scalar, matrix)

The scalarMatMult function multiplies a given scalar with every element of a provided matrix. The function returns a new matrix containing the results.

Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| scalar | number | The scalar value to multiply with the matrix. | Required |
| matrix | table | The matrix, represented as a 2D table. | Required |

Returns

| Variable | Type | Description |
|---|---|---|
| resultMatrix | table | A new matrix represented as a 2D table, resulting from the scalar multiplication of the input matrix. |

Example

local scalar = 2
local matrix = {
  {1, 2},
  {3, 4}
}

local result = StatBook.scalarMatMult(scalar, matrix) -- expected: {{2, 4}, {6, 8}}
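Because the function returns a new matrix, the input should be left untouched. Here is a minimal sketch that preserves that property; `scalarMatMultSketch` is a hypothetical name.

```lua
-- Hypothetical sketch: multiply every element by a scalar, building a new table
-- so the input matrix is not modified.
local function scalarMatMultSketch(scalar, matrix)
    local result = {}
    for i, row in ipairs(matrix) do
        result[i] = {}
        for j, value in ipairs(row) do
            result[i][j] = scalar * value
        end
    end
    return result
end

local input = {{1, 2}, {3, 4}}
local R = scalarMatMultSketch(2, input)
-- R is {{2, 4}, {6, 8}}; input is still {{1, 2}, {3, 4}}
```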
matTranspose(matrix)

The matTranspose function takes a matrix and returns its transpose. The transpose is obtained by flipping the matrix over its main diagonal, so the element in row i, column j moves to row j, column i.

Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| matrix | table | The matrix to be transposed, represented as a 2D table. | Required |

Returns

| Variable | Type | Description |
|---|---|---|
| resultMatrix | table | A new matrix represented as a 2D table, which is the transpose of the input `matrix`. |

Example

local matrix = {
  {1, 2},
  {3, 4},
  {5, 6}
}

local result = StatBook.matTranspose(matrix) -- expected: {{1, 3, 5}, {2, 4, 6}}
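The row/column swap is easiest to see in code. Here is a minimal sketch assuming a rectangular (non-jagged) input; `matTransposeSketch` is a hypothetical name.

```lua
-- Hypothetical sketch: result[j][i] = matrix[i][j], so an r x c matrix becomes c x r.
local function matTransposeSketch(matrix)
    local rows, cols = #matrix, #matrix[1]
    local result = {}
    for j = 1, cols do
        result[j] = {}
        for i = 1, rows do
            result[j][i] = matrix[i][j]
        end
    end
    return result
end

local T = matTransposeSketch({{1, 2}, {3, 4}, {5, 6}})
-- T is {{1, 3, 5}, {2, 4, 6}}
```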
matInverse(matrix)

The matInverse function calculates the inverse of a square matrix, if it exists. The function will return nil if the matrix is not square or if the determinant is zero (indicating that the matrix is not invertible).

Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| matrix | table | The square matrix to be inverted, represented as a 2D table. | Required |

Returns

| Variable | Type | Description |
|---|---|---|
| inverseMatrix | table | A new matrix represented as a 2D table, which is the inverse of the input matrix. |
| nil | nil | If the matrix is not square or if the matrix is singular (determinant is zero). |

Example

local matrix = {
  {2, -1, 0},
  {-1, 2, -1},
  {0, -1, 2}
}

local result = StatBook.matInverse(matrix) -- expected: {{0.75, 0.5, 0.25}, {0.5, 1, 0.5}, {0.25, 0.5, 0.75}}, up to floating-point rounding

Notes

  • If the matrix is not square or is singular (determinant is zero), the function returns nil. Make sure your matrix is square and has a nonzero determinant.
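One common way to invert a matrix is Gauss-Jordan elimination with partial pivoting. Here is a minimal sketch that follows the same nil conventions described above. It is not necessarily StatBook's algorithm, and it treats a pivot smaller than a tolerance `eps` as "determinant is zero", which is an assumption of the sketch. `matInverseSketch` is a hypothetical name.

```lua
-- Hypothetical sketch: inverse via Gauss-Jordan elimination with partial pivoting.
-- Returns nil for non-square or (numerically) singular matrices.
local function matInverseSketch(matrix, eps)
    eps = eps or 1e-12
    local n = #matrix
    for i = 1, n do
        if #matrix[i] ~= n then
            return nil -- not square
        end
    end
    -- Build the augmented matrix [matrix | I]
    local aug = {}
    for i = 1, n do
        aug[i] = {}
        for j = 1, n do
            aug[i][j] = matrix[i][j]
            aug[i][n + j] = (i == j) and 1 or 0
        end
    end
    for col = 1, n do
        -- Partial pivoting: choose the row with the largest entry in this column
        local pivotRow = col
        for r = col + 1, n do
            if math.abs(aug[r][col]) > math.abs(aug[pivotRow][col]) then
                pivotRow = r
            end
        end
        if math.abs(aug[pivotRow][col]) < eps then
            return nil -- singular (determinant is zero, up to eps)
        end
        aug[col], aug[pivotRow] = aug[pivotRow], aug[col]
        -- Scale the pivot row so the pivot becomes 1
        local pivot = aug[col][col]
        for j = 1, 2 * n do
            aug[col][j] = aug[col][j] / pivot
        end
        -- Eliminate this column from every other row
        for r = 1, n do
            if r ~= col then
                local factor = aug[r][col]
                for j = 1, 2 * n do
                    aug[r][j] = aug[r][j] - factor * aug[col][j]
                end
            end
        end
    end
    -- The right half of the augmented matrix now holds the inverse
    local inverse = {}
    for i = 1, n do
        inverse[i] = {}
        for j = 1, n do
            inverse[i][j] = aug[i][n + j]
        end
    end
    return inverse
end

local inv = matInverseSketch({{2, -1, 0}, {-1, 2, -1}, {0, -1, 2}})
```

Partial pivoting swaps the row with the largest available pivot into place. This avoids dividing by tiny numbers, which would amplify rounding error.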

Interesting. Most of the functions here (specifically the inferential ones and the random variates) aren’t particularly useful to the average developer or for game development, and seem better suited to something like plugins. I’ve thought of using 2-sample t-tests in my plugin at some point, but I couldn’t ensure independence or randomness in the samples.

Are there plans to add stuff like skewness and kurtosis? Or methods to detect outliers?

I have to agree with you that the inferential functions will have limited use. These tests pair a categorical (or continuous) independent variable with a continuous dependent variable, a setup that comes up less often in game development. However, if you are comparing two different populations, for instance Premium and non-Premium users (or male vs. female players), to see whether one group is more likely to do something than the other, then they will be useful.
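To make the Premium vs. non-Premium comparison concrete, here is a minimal sketch of Welch's two-sample t statistic, which does not assume equal variances. It is not StatBook.inference's implementation. It stops at the statistic and the Welch-Satterthwaite degrees of freedom; a p-value would also need the t distribution's CDF. The sample values are hypothetical placeholders.

```lua
-- Hypothetical sketch: Welch's two-sample t statistic.
local function meanAndVariance(sample)
    local n = #sample
    local sum = 0
    for _, v in ipairs(sample) do
        sum = sum + v
    end
    local mean = sum / n
    local ss = 0
    for _, v in ipairs(sample) do
        ss = ss + (v - mean) ^ 2
    end
    return mean, ss / (n - 1) -- sample (unbiased) variance
end

local function welchT(x, y)
    local mx, vx = meanAndVariance(x)
    local my, vy = meanAndVariance(y)
    local sx, sy = vx / #x, vy / #y -- squared standard errors of each mean
    local t = (mx - my) / math.sqrt(sx + sy)
    -- Welch-Satterthwaite approximation to the degrees of freedom
    local df = (sx + sy) ^ 2 / (sx ^ 2 / (#x - 1) + sy ^ 2 / (#y - 1))
    return t, df
end

-- Hypothetical data, e.g. a per-player metric for two groups of users
local t, df = welchT({1, 2, 3, 4, 5}, {2, 4, 6, 8, 10})
```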

I find it interesting that you say random variates are not particularly useful, since I thought that would be the most useful aspect of this module. For instance, consider the code below:

Problem with math.random
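-- Each branch returns a single integer, so the output is discrete:
-- P(4) = 0.2, P(9) = P(10) = 0.05, and each of 1, 2, 3, 5, 6, 7, 8 has probability 0.1
local function skewedRandom()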
    local randomNumber = math.random()
    
    if randomNumber < 0.2 then
        return 4
    elseif randomNumber < 0.3 then
        return 3 
    elseif randomNumber < 0.4 then
        return 5 
    elseif randomNumber < 0.5 then
        return 2  
    elseif randomNumber < 0.6 then
        return 6  
    elseif randomNumber < 0.7 then
        return 1
    elseif randomNumber < 0.8 then
        return 7
    elseif randomNumber < 0.9 then
        return 8 
    elseif randomNumber < 0.95 then
        return 9  
    else
        return 10 
    end
end

Do you know of a method that reliably generates non-uniform random values with near-zero discreteness, using only math.random?

Sure, I guess your point is that developers are willing to trade this discreteness for simplicity (kind of…) when trying to mimic a non-uniform distribution. But if you really care about your probability distribution being purely continuous, with more detailed and malleable shapes, then my module provides a number of solutions to that problem.
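One standard answer to that question is inverse transform sampling. You feed a uniform draw from math.random through the inverse of the target CDF, and the result is a continuous value with the desired distribution. The sketch below shows this for the exponential distribution. It also shows the Box-Muller transform for the normal distribution. Both function names are hypothetical, and this is not necessarily how StatBook generates its variates.

```lua
-- Inverse transform sampling: if U ~ Uniform(0, 1) and F is a CDF, then F^-1(U) ~ F.
-- For Exponential(rate), F^-1(u) = -ln(1 - u) / rate.
local function exponentialSample(rate)
    local u = math.random() -- in [0, 1), so 1 - u is in (0, 1]
    return -math.log(1 - u) / rate
end

-- Box-Muller transform: two independent uniforms -> one normal sample.
local function normalSample(mean, sd)
    local u1 = 1 - math.random() -- in (0, 1], avoids log(0)
    local u2 = math.random()
    local z = math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
    return mean + sd * z
end

math.randomseed(42)
local total = 0
for _ = 1, 100000 do
    total = total + exponentialSample(2)
end
local avg = total / 100000
print(avg) -- should be close to 1 / rate = 0.5
```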

You are right that independent t-tests (or any analogous test with a different number of groups, like ANOVA) require independence and randomness. Randomness is hard to nail down, and ultimately there is nothing I can do in my code to steer the user toward collecting a random sample.

However, since you mention t-tests in general (which I presume includes the paired t-test), I don’t see why independence between the samples would be a requirement. If your data are dependent (the same sample measured under different treatments), you just need the right type of test: a paired t-test, signed-rank test, or Friedman test.
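For the dependent case, a paired t-test reduces to a one-sample t-test on the per-subject differences. That is why two measurements of the same subject need not be independent of each other. Here is a minimal sketch with hypothetical numbers. `pairedT` is not a StatBook function, and it returns only the t statistic and degrees of freedom.

```lua
-- Hypothetical sketch: paired t statistic = one-sample t-test on the differences.
local function pairedT(before, after)
    assert(#before == #after, "paired samples must have equal length")
    local n = #before
    local diffs, sum = {}, 0
    for i = 1, n do
        diffs[i] = after[i] - before[i]
        sum = sum + diffs[i]
    end
    local mean = sum / n
    local ss = 0
    for i = 1, n do
        ss = ss + (diffs[i] - mean) ^ 2
    end
    local sd = math.sqrt(ss / (n - 1))
    return mean / (sd / math.sqrt(n)), n - 1 -- t statistic, degrees of freedom
end

local t, df = pairedT({10, 12, 14}, {11, 14, 15}) -- hypothetical measurements
```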

Absolutely, and thanks for the idea! That will be easy and effective to implement.

wow i dont know what this is and i didnt even read it
so innovational
im definitely gonna use this.
great :grin::+1:t2:

I will be uploading a simple video tutorial soon. I understand it’s difficult for people to follow along from text alone.
