𝚺 StatBook 𝚽 [V1 FREE] | One-Line Statistics with MLR, ANOVA, KDE + 52 others!

armenia1997 · October 4, 2023, 11:40am

INTRODUCING StatBook v1!

Who needs R or MATLAB when you have ROBLOX?

Featuring Hypothesis Testing such as Multiple Linear Regression and ANOVA, Kernel Density Estimation, Markov Chains, Gamma/Beta Distribution Generation, and more with just ONE LINE OF CODE!

Welcome to the ultimate guide for StatBook v1! Whether you’re looking for ways to generate random variables in a more diverse and/or dynamic way than math.random does, looking to predict the next move of a player, consistently update analytics for in-game product sales, or simply curious about random variable generations and hypothesis testing, you’re in the right place!

Download [FREE]

Documentation (highly recommended!)

GitHub

CDF Calculation link for Random Variable Generation (translate the page to English)

Graph Functions for Custom Distribution Creation

Get Started

Instructions

How to Use StatBook_v1 Module

Follow these steps to integrate the StatBook_v1 module into your project:

Step 1: Download the Module

Download the StatBook_v1 module from this link.

Step 2: Place the Module

Place the downloaded StatBook_v1 module into ServerScriptService within your Roblox Studio project.

Step 3: Import the Module in Your Script

When using the module in your script, add the following line to import it:

local StatBook = require(game.ServerScriptService.StatBook_v1)

Step 4: Use Functions from the Module

When you need to use a function from this library, always prepend the function call with “StatBook.”. For example:

local StatBook = require(game.ServerScriptService.StatBook_v1)
local list = {3, 1, 4, 1, 5} -- any table of numbers
local result = StatBook.Median(list)

And you are all set!

By following these steps, you’ll be able to use all the statistical functions provided by the StatBook_v1 module in your Roblox game.

Summary of Capabilities

Hypothesis Testing

Summary

Inference on Numerical Samples: The StatBook.inference(...) function has the ability to run statistical inferences on samples with dependent and independent numerical data. These tests include one and two sample independent/dependent t-tests, sign test, rank-sum test, signed-rank test, ANOVA, and Friedman Test. Returns information about differences in distribution using statistics and p-values, as well as post-hoc tests for individual sample to sample associations in the 3+ sample comparison cases.

NOTE: All tests are two-tailed (except F and Chi-Square Tests). A one-tailed test option will come with StatBook V1.1.

Multiple Linear Regression with Mallows’s Cp Forward Selection: The StatBook.multipleLinearRegression(...) function takes a list of data points, each containing its predictor values, along with the dependent variable value associated with each data point. The function returns detailed information about the regression model and can optionally refine a hypothesized model via forward selection, using Mallows’s Cp as the selection criterion. It can be used together with StatBook.predictY(...) to get the predicted y value of a data point given all its predictors.
Categorical Hypothesis Testing: Includes a specialized suite of tests like oddsRatio, oneSampleProportionCI, and twoProportionInference for analyzing categorical data. It also includes a set of Chi-Square Tests such as goodnessOfFit() and chiSquareIndependence() for comparing observed data with expected data in categories.

Random Variate Generation

Summary

Generate random variables from 15 different distributions: This feature allows users to generate random variables based on various statistical distributions. It includes commonly-used distributions like Normal, Poisson, Binomial, and also more specialized distributions like Chi-square, Weibull, and Log-normal. Users can input the parameters specific to each distribution to return a random variable.
Scale and Trim Distributions to get the values you need: 8 different distributions come with scaled variants that give you control over where the distribution is trimmed and the range your output values are mapped to!
Create any Customized Distribution: This feature gives users the freedom to create their own probability distributions. The developer passes a mathematical function (or a set of piecewise functions) as a string in Lua syntax, along with the x range over which it applies, for instance:

local piecewiseFunctions = {{"func1", func1xmin, func1xmax}, {"func2", func2xmin, func2xmax}}

The aggregate probability density function can then be scaled to any range you want, so that random variates are drawn from that range and follow the distribution specified in piecewiseFunctions. More specific information is given below.
Randomly sample a continuous variable from a discrete dataset: You can use a dataset of discrete values to generate random variables from a continuous representation of that dataset. The algorithm applies Kernel Density Estimation (KDE), which places a small distribution (kernel) on each discrete data point. The sum of these kernels forms a probability density function, from which the algorithm randomly samples a value within the range of the discrete data points, thus generating a “continuous” variable from a discrete set.
Simulate a sequence of actions through a Markov Chain: Markov Chain simulations are useful for modeling sequences of events where the outcome of one event affects the outcomes of subsequent events. Users can input the initial state and state transition probabilities to simulate various scenarios.

Basic Statistics

Descriptive Statistics made simple: Provides fundamental statistical functions like mean, median, mode, and range, as well as more advanced metrics like standard deviation and variance.

Complex Functions

Find approximations to functions difficult to solve analytically: Advanced mathematical functions like erf, inverf, gamma, and hypergeometric2f1 are included for specialized statistical needs.

Matrix Operations

Perform matrix operations: Includes matrix addition, subtraction, multiplication, transposition, and inversion.

Hypothesis Testing

How can you use these functions?

How can game developers use these functions?

Real-Time Analysis and Reaction: The model could even be used in real time to adjust game parameters according to ongoing player behavior. Furthermore, you could predict and react to a player’s in-game actions as they happen.
Optimized Monetization/Play Time: If your game has in-app purchases, understanding what players value can help you offer more compelling packages.
Feature Selection: The function includes forward regression based on Mallows’s Cp, helping you identify which variables are the most important predictors, enabling you to focus on the most impactful game elements.
A/B Testing: If you have two versions of a feature, you can use the inference tests to see which version is more effective at achieving a specific outcome.
Player Personalization: Use the model to predict player behaviors and preferences, allowing for a more personalized gaming experience.

Functions

Inference on Numerical Samples (ANOVA, one/two sample independent/dependent t-tests, Wilcoxon rank tests, etc.)

Be sure to check this function out. It’s an important one!

inference(samples, independent, CL, mu0)

`inference(samples, independent, CL, mu0)`

Overview

Performs statistical inference tests based on the given data. The function will decide which test to use based on the number of samples, whether they are independent or not, and their distribution.

NOTE 1: ALL TESTS IN THIS MODULE ARE TWO-TAILED (besides F and Chi-Square tests). One-tailed options may come in a future update.

NOTE 2: None of the two-or-more-sample tests can test for a specific nonzero difference between the means/medians of the samples. By default, all two-or-more-sample tests check for any difference in distribution, that is, \( H_0: D_\mu = 0 \) or \( H_0: D_\eta = 0 \).

NOTE 3: If independent = false (dependent samples), then all samples must have the same number of entries.

NOTE 4: It is highly recommended to check the returned warning value, as warning = true means the results may have a significant degree of inaccuracy due to computational limits.

Parameters

Parameter	Type	Description
`samples`	Table	A table of samples containing the data to be tested.
`independent`	Boolean, Nil	Whether the samples are independent or not (Nil for 1-sample).
`CL`	Number, (Nil = 0.95)	Confidence level for the statistical tests (Nil = 0.95).
`mu0`	Number, (Nil = 0)	The hypothetical mean tested against in 1-sample test. Defaults to 0 if Nil.

Returns

A table possibly containing the following:

Key	Type	Description
`pValue`	Number	The p-value of the test.
`rejectH0`	Boolean	Whether to reject the null hypothesis.
`stat`	Number	The value of the test statistic.
`df`	Number, Table, Nil	Degrees of freedom (some tests have two (F), some not applicable).
`center`	Table → Number(s)	Contains the mean(s) or median(s) of the dataset(s).
`centerComp`	Number, Nil	Comparison value for the center (not applicable for some tests).
`lowerCI`	Number, Nil	The lower bound of the confidence interval for mean/median (NA for 3+ sample tests).
`upperCI`	Number, Nil	The upper bound of the confidence interval for the mean/median (NA for 3+ sample tests)
`dependent`	Boolean, Nil	Whether the test is for dependent samples (Nil for one-sample tests).
`parametric`	Boolean	Indicates if the test is parametric.
`nSamples`	Number	Number of samples in the test.
`testType`	String	Specifies the type of the test.
`statType`	String	Specifies the type of the test statistic.
`centerType`	String	Specifies what measure of central tendency is being tested.
`postHoc`	Table → Tables, Nil	Post-hoc tests with individual test data within each nested table (only for 3+ sample tests).
`postHocSig`	Table → Tables, Nil	Only contains Post-hoc tests with significant p-values (only for 3+ sample tests)
`warning`	Nil , True	Warnings if applicable (Nil if false or NA).

`postHoc` and `postHocSig` subfields (only for 3+ sample tests)

A table possibly containing the following:

Key	Type	Description
`group1`	Number	The index of the first sample selected in the Post Hoc.
`group2`	Number	The index of the second sample selected in the Post Hoc.
`pValue`	Number	The p-value of the Post Hoc test.
`alpha`	Number	The alpha needed for significance after the Bonferroni correction.
`rejectH0`	Boolean	Whether to reject the null hypothesis.
`stat`	Number	The value of the test statistic.
`df`	Number, Nil	Degrees of freedom (some not applicable).
`center`	Table → Numbers	Contains the means or medians of the datasets.
`centerComp`	Number	Comparison value for the center.
`lowerCI`	Number	The lower bound of the confidence interval for mean/median.
`upperCI`	Number	The upper bound of the confidence interval for the mean/median
`testType`	String	Specifies the type of the test.
`statType`	String	Specifies the type of the test statistic.
`centerType`	String	Specifies what measure of central tendency is being tested.
`warning`	Nil , True	Warnings if applicable (Nil if false or NA).
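
The `alpha` field above is described as the Bonferroni-corrected significance level. As a rough sketch of what that correction usually means, the overall alpha is divided by the number of pairwise comparisons. The helper name `bonferroniAlpha` is hypothetical and not part of StatBook, and the module's own computation may differ:

```lua
-- Hypothetical helper, not part of StatBook: Bonferroni-corrected alpha
-- for all pairwise comparisons among k samples at confidence level CL.
local function bonferroniAlpha(k, CL)
    local overallAlpha = 1 - CL
    local comparisons = k * (k - 1) / 2 -- number of unordered sample pairs
    return overallAlpha / comparisons, comparisons
end

local alpha, comparisons = bonferroniAlpha(4, 0.95)
print(comparisons, alpha)
```

With 4 samples there are 6 pairs, so each post-hoc p-value would be compared against roughly 0.05 / 6 instead of 0.05.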

Examples

One-Sample Test:

local samples = {
    {12, 15, 14, 10, 13, 8, 13, 16, 8, 15, 22, 4, 7, 8}
}
-- in this case, either one-sample t-test or sign test (depends on normality of sample)
local CL = 0.95
local mu0 = 12
-- if we did not specify mu0, it would default to 0
local result = StatBook.inference(samples, nil, CL, mu0)
print(result.pValue, result.stat)
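
To give a feel for what the `stat` value represents in the parametric case, here is a plain-Lua sketch of the textbook one-sample t statistic for the same data. It is not StatBook's internal code (the function name `oneSampleT` is made up for illustration), and `inference` may pick the sign test instead if the sample does not look normal:

```lua
-- Plain-Lua sketch of a one-sample t statistic; not StatBook's internal code.
local function oneSampleT(sample, mu0)
    local n = #sample
    local sum = 0
    for _, x in ipairs(sample) do
        sum = sum + x
    end
    local mean = sum / n

    -- Sample variance with the n - 1 (Bessel) denominator.
    local squaredDiffs = 0
    for _, x in ipairs(sample) do
        squaredDiffs = squaredDiffs + (x - mean) ^ 2
    end
    local sd = math.sqrt(squaredDiffs / (n - 1))

    -- t = (sample mean - hypothesized mean) / standard error
    local t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1, mean
end

local sample = {12, 15, 14, 10, 13, 8, 13, 16, 8, 15, 22, 4, 7, 8}
local t, df, mean = oneSampleT(sample, 12)
print(t, df, mean)
```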

Two-Sample Test:

local samples = {
    {12, 15, 14, 10, 13, 6, 18},
    {20, 24, 30, 27, 28, 19, 19}
}
local independent = false
-- in this case, either two-sample dep. t-test or signed-rank test (depends on normality of samples + Folded-F test)
local CL = 0.95
local result = StatBook.inference(samples, independent, CL)
print(result.pValue, result.centerComp, result.lowerCI, result.upperCI)

Three-Plus Sample Test:

local samples = {
    {12, 15, 14, 10, 13, 14},
    {20, 24, 30, 27, 28, 20},
    {16, 25, 19, 20, 22, 18},
    {23, 14, 10, 37, 8, 19}
}
local independent = true
-- in this case, either ANOVA test or Kruskal Wallis test (depends on normality of samples + Levene Test)
local CL = 0.95
local result = StatBook.inference(samples, independent, CL)
print(result.pValue)
-- postHocSig holds one nested table per significant pairwise comparison
for _, test in pairs(result.postHocSig or {}) do
    print(test.group1, test.group2, test.pValue)
end

Multiple Linear Regression w/ Mallows’s Cp forward selection

Be sure to check this function out. It’s an important one! predictY() is also located inside this dropdown, as the two functions are very closely related

multipleLinearRegression(X, Y, forwardReg, diagnostics, CL) AND predictY(X, model, yHat, indices)

`multipleLinearRegression(X, Y, forwardReg, diagnostics, CL)`

Overview

The multipleLinearRegression function performs multiple linear regression analysis on given data sets. This advanced function includes several features, such as forward regression based on Mallows’s Cp, diagnostic statistics, and calculation of Variance Inflation Factors (VIFs).
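
For reference, these are the standard textbook definitions of the two quantities named above. The module's exact variants are not shown in this post, so treat the notation as an assumption:

```latex
C_p = \frac{SSE_p}{\hat{\sigma}^2} - n + 2p,
\qquad
VIF_j = \frac{1}{1 - R_j^2}
```

Here \( SSE_p \) is the residual sum of squares of a candidate model with \( p \) parameters (intercept included), \( \hat{\sigma}^2 \) is the mean squared error of the full model, \( n \) is the number of data points, and \( R_j^2 \) is the \( R^2 \) obtained by regressing predictor \( j \) on the other predictors. A candidate with \( C_p \) close to \( p \) is usually considered a good fit, and a large \( VIF_j \) signals that predictor \( j \) is close to a linear combination of the others.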

NOTE 1: All data point subtables within the \( X \) table must have the same number of entries. No missing or nil values are allowed.

NOTE 2: Before invoking the function, ensure that the \( X \) table is formatted correctly as a 2D table, where each inner table represents a row of the matrix (one data point). If \( X \) is not in this format, you may use the StatBook.matTranspose(matrix) function to transpose \( X \) into a compatible layout.

NOTE 3: The VIFs table is only there as a warning. VIF values do not impact the regression model and do not automatically remove multi-collinear predictors. You will have to manually account for this if you choose to remove a predictor yourself and thus rerun multipleLinearRegression again.
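
To illustrate the layout that NOTE 2 asks for, here is a plain-Lua transpose sketch. The `transpose` helper below is a stand-in written for this example and is not StatBook.matTranspose itself, although it should do the same kind of reshaping:

```lua
-- Plain-Lua transpose sketch (hypothetical helper, not StatBook.matTranspose).
local function transpose(matrix)
    local result = {}
    for r, row in ipairs(matrix) do
        for c, value in ipairs(row) do
            result[c] = result[c] or {}
            result[c][r] = value
        end
    end
    return result
end

-- One inner table per predictor (3 predictors, 6 data points)...
local byPredictor = {
    {1, 2, 3, 4, 5, 3},
    {4, 3, 2, 2, 8, 6},
    {7, 5, 1, 2, 3, 2},
}
-- ...becomes one inner table per data point, the layout NOTE 2 describes.
local X = transpose(byPredictor)
print(#X, #X[1]) -- 6 rows, 3 columns
```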

Parameters

Parameter	Type	Description	Default
`X`	table	The independent variables matrix (2D table).	Required
`Y`	table	The dependent variable vector (1D table).	Required
`forwardReg`	boolean	Enables or disables the forward regression process.	true
`diagnostics`	boolean	Enables or disables diagnostic statistics.	true
`CL`	number	Confidence level for t-tests and F-test.	0.95

Returns (if `diagnostics` ~= `true`)

Variable	Type	Description
`yHat`	table → number(s)	Fitted values for the dependent variable.
`indices`	table → number(s)	Indices of betas retained in model from lmOrig to lmNew

Returns (if `diagnostics` = `true`)

Variable	Type	Description	Subfields
`lmNew`	table → tables	Model after forward selection with Mallows’s Cp	yes
`lmOrig`	table → tables	Model before forward selection with Mallows’s Cp	yes
`indices`	table → number(s)	Indices of betas retained in model from lmOrig to lmNew

`lmNew` and `lmOrig` Subfields*

Variable	Type	Description	Sub-subfields
`yHat`	table → number(s)	Fitted values for the dependent variable.
`r2`	number	\( R^2 \) value indicating the goodness of fit.
`r2adj`	number	Adjusted \( R^2 \), accounting for the number of predictors.
`F`	number	F-statistic used for hypothesis testing.
`pValueF`	number	p-value of the F-statistic.
`BetaInfo`	table → table	Information about predictor coefficients.	yes
`VIFs`*	table → table	Indicates multicollinearity status.	yes

* There isn’t a VIFs subfield in lmOrig.

`BetaInfo` Sub-subfields

Variable	Type	Description
`predictorIndex`	table → number	The original index of the beta in question.
`rejectH0`	table → boolean	Hypotheses test results for individual betas.
`t`	table → number	The t-statistic of the beta in question.
`pValue`	table → number	The p-value of the beta in question.

`VIFs` Sub-subfields

Variable	Type	Description
`VIF`	table → number	Variance Inflation Factors of each beta.
`summaryVIF`	table → string	A description of potential multicollinearity

Example Usage

-- regression with 6 datapoints and 3 predictors
local X = {{1, 4, 7}, {2, 3, 5}, {3, 2, 1}, {4, 2, 2}, {5, 8, 3}, {3, 6, 2}}
local Y = {3, 3, 2, 2, 4, 5}

local model = StatBook.multipleLinearRegression(X, Y)

print(model.lmNew.pValueF, model.lmOrig.pValueF, model.lmNew.BetaInfo.t, model.lmNew.BetaInfo.pValue) -- can return a lot more than that

-- rest is optional
local Xtest = {1, 5, 6}
local prediction = StatBook.predictY(Xtest, model)

Subsequent Usage

After acquiring the model from StatBook.multipleLinearRegression, you can pass it directly to StatBook.predictY(X, model, yHat, indices) to predict new \( Y \) values from new \( X \) values. The model object contains all the coefficients and information needed for the prediction.

`predictY(X, model, yHat, indices)`

Overview

The predictY function predicts the dependent variable \( Y \) from the independent variable values \( X \) and the given model. Optionally, specific fitted values \( \hat{y} \) and predictor indices can be supplied.

Parameters

Parameter	Type	Description	Default
`X`	Table	The input vector containing independent variable values.	-
`model`	Table	The regression model from `multipleLinearRegression()`	-
`yHat`	Table	Optional. The fitted values for the intercept and coefficients.	nil
`indices`	Table	Optional. The indices in the model to be used for prediction.	nil

Returns

Return	Type	Description
`YPred`	Number	The predicted value of the dependent variable \( Y \).

Example

local X = {{1, 4, 7}, {2, 3, 5}, {3, 2, 1}, {4, 2, 2}, {5, 8, 3}, {3, 6, 2}}
local Y = {3, 3, 2, 2, 4, 5}

local model = StatBook.multipleLinearRegression(X, Y)

local Xtest = {1, 5, 6}
local YPred = StatBook.predictY(Xtest, model)
print(YPred)

Random Variate Generation - Go Beyond `math.random`!

Why should you use this?

Ever found yourself doing something similar to this?

Tedious Coding

function skewedRandom()
    local randomNumber = math.random()
    
    if randomNumber < 0.2 then
        return 4
    elseif randomNumber < 0.3 then
        return 3 
    elseif randomNumber < 0.4 then
        return 5 
    elseif randomNumber < 0.5 then
        return 2  
    elseif randomNumber < 0.6 then
        return 6  
    elseif randomNumber < 0.7 then
        return 1
    elseif randomNumber < 0.8 then
        return 7
    elseif randomNumber < 0.9 then
        return 8 
    elseif randomNumber < 0.95 then
        return 9  
    else
        return 10 
    end
end

The standard library in Roblox’s Lua provides a basic random number generator through math.random(). While this function is useful for generating uniformly distributed random numbers, it cannot directly generate numbers from other statistical distributions such as Normal, Exponential, or Gamma, let alone an arbitrary custom distribution.

With StatBook, there are easier ways with infinitely many possible distributions.

Functions

Scalable Random Generation with Distributions

generateStandardNormalScaled(...)

`generateStandardNormalScaled(desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)`

Overview

The generateStandardNormalScaled() function generates a scaled random number based on the standard normal distribution within the specified range.

Parameters

Parameter	Type	Description
`desiredMin`	Number	The minimum desired value of the scaled random number.
`desiredMax`	Number	The maximum desired value of the scaled random number.
`LQpercent`	Number	Lower quantile percentage. Default is 0.001.
`UQpercent`	Number	Upper quantile percentage. Default is 0.999.
`lowerQuantile`	Number	Lower quantile value. Calculated by default if not provided.
`upperQuantile`	Number	Upper quantile value. Calculated by default if not provided.

Returns

Return	Type	Description
`random`	Number	A scaled random number in the range `[desiredMin, desiredMax]`.

Example

local random = StatBook.generateStandardNormalScaled(-10, 10)
print(random)  -- Value between -10 and 10

generateNormalScaled(...)

`generateNormalScaled(mu, sigma, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)`

Overview

The generateNormalScaled() function generates a scaled random number based on a normal distribution with a specified mean (\( \mu \)) and standard deviation (\( \sigma \)) within the desired range.

Parameters

Parameter	Type	Description
`mu`	Number	The mean of the normal distribution.
`sigma`	Number	The standard deviation of the normal distribution.
`desiredMin`	Number	The minimum desired value of the scaled random number.
`desiredMax`	Number	The maximum desired value of the scaled random number.
`LQpercent`	Number	Lower quantile percentage. Default is 0.001.
`UQpercent`	Number	Upper quantile percentage. Default is 0.999.
`lowerQuantile`	Number	Lower quantile value. Calculated by default if not provided.
`upperQuantile`	Number	Upper quantile value. Calculated by default if not provided.

Returns

Return	Type	Description
`random`	Number	A scaled random number in the range `[desiredMin, desiredMax]`.

Example

local random = StatBook.generateNormalScaled(0, 1, -10, 10)
print(random)  -- Output will vary

generateLogNormalScaled(...)

`generateLogNormalScaled(mu, sigma, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)`

Overview

The generateLogNormalScaled() function generates a scaled random number based on a log-normal distribution with a specified mean (\( \mu \)) and standard deviation (\( \sigma \)) within the desired range.

Parameters

Parameter	Type	Description
`mu`	Number	The mean of the log-normal distribution.
`sigma`	Number	The standard deviation of the log-normal distribution.
`desiredMin`	Number	The minimum desired value of the scaled random number.
`desiredMax`	Number	The maximum desired value of the scaled random number.
`LQpercent`	Number	Lower quantile percentage. Default is 0.
`UQpercent`	Number	Upper quantile percentage. Default is 0.999.
`lowerQuantile`	Number	Lower quantile value. Calculated by default if not provided.
`upperQuantile`	Number	Upper quantile value. Calculated by default if not provided.

Returns

Return	Type	Description
`random`	Number	A scaled random number in the range `[desiredMin, desiredMax]`.

Example

local random = StatBook.generateLogNormalScaled(0, 1, 1, 100)
print(random)  -- Output will vary

generateCauchyScaled(...)

`generateCauchyScaled(x0, gamma, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)`

Overview

The function generates a scaled random number based on a Cauchy distribution with a specified location parameter (\( x_0 \)) and scale parameter (\( \gamma \)) within the desired range.

Parameters

Parameter	Type	Description
`x0`	Number	The location parameter of the Cauchy distribution.
`gamma`	Number	The scale parameter of the Cauchy distribution.
`desiredMin`	Number	The minimum desired value of the scaled random number.
`desiredMax`	Number	The maximum desired value of the scaled random number.
`LQpercent`	Number	Lower quantile percentage. Default is 0.001.
`UQpercent`	Number	Upper quantile percentage. Default is 0.999.
`lowerQuantile`	Number	Lower quantile value. Calculated by default if not provided.
`upperQuantile`	Number	Upper quantile value. Calculated by default if not provided.

Returns

Return	Type	Description
`random`	Number	A scaled random number in the range `[desiredMin, desiredMax]`.

Example

local random = StatBook.generateCauchyScaled(0, 1, -10, 10)
print(random)  -- Output will vary

generateExponentialScaled(...)

`generateExponentialScaled(lambda, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)`

Overview

The function generates a scaled random number based on an Exponential distribution with a specified rate parameter (\( \lambda \)) within the desired range.

Parameters

Parameter	Type	Description
`lambda`	Number	The rate parameter of the Exponential distribution.
`desiredMin`	Number	The minimum desired value of the scaled random number.
`desiredMax`	Number	The maximum desired value of the scaled random number.
`LQpercent`	Number	Lower quantile percentage. Default is 0.
`UQpercent`	Number	Upper quantile percentage. Default is 0.999.
`lowerQuantile`	Number	Lower quantile value. Calculated by default if not provided.
`upperQuantile`	Number	Upper quantile value. Calculated by default if not provided.

Returns

Return	Type	Description
`random`	Number	A scaled random number in the range `[desiredMin, desiredMax]`.

Example

local random = StatBook.generateExponentialScaled(1, 0, 10)
print(random)  -- Output will vary

generateGammaScaled(...)

`generateGammaScaled(alpha, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)`

Overview

The function generates a scaled random number based on a Gamma distribution with a specified shape parameter (\( \alpha \)) within the desired range.

Parameters

Parameter	Type	Description
`alpha`	Number	The shape parameter of the Gamma distribution.
`desiredMin`	Number	The minimum desired value of the scaled random number.
`desiredMax`	Number	The maximum desired value of the scaled random number.
`LQpercent`	Number	Lower quantile percentage. Default is 0.
`UQpercent`	Number	Upper quantile percentage. Default is 0.999.
`lowerQuantile`	Number	Lower quantile value. Calculated by default if not provided.
`upperQuantile`	Number	Upper quantile value. Calculated by default if not provided.

Returns

Return	Type	Description
`random`	Number	A scaled random number in the range `[desiredMin, desiredMax]`.

Example

local random = StatBook.generateGammaScaled(2, 0, 10)
print(random)  -- Output will vary

generateBetaScaled(...)

`generateBetaScaled(alpha, beta, desiredMin, desiredMax, LQpercent, UQpercent, lowerQuantile, upperQuantile)`

Overview

The function generates a scaled random number based on a Beta distribution with specified shape parameters (\( \alpha \) and \( \beta \)) within the desired range.

Parameters

Parameter	Type	Description
`alpha`	Number	The first shape parameter of the Beta distribution.
`beta`	Number	The second shape parameter of the Beta distribution.
`desiredMin`	Number	The minimum desired value of the scaled random number.
`desiredMax`	Number	The maximum desired value of the scaled random number.
`LQpercent`	Number	Lower quantile percentage. Default is 0.
`UQpercent`	Number	Upper quantile percentage. Default is 1.
`lowerQuantile`	Number	Lower quantile value. Calculated by default if not provided.
`upperQuantile`	Number	Upper quantile value. Calculated by default if not provided.

Returns

Return	Type	Description
`scaledX`	Number	A scaled random number in the range `[desiredMin, desiredMax]`.

Example

local scaledX = StatBook.generateBetaScaled(2, 5, 0, 1)
print(scaledX)  -- Output will vary

Make a Customized Distribution out of Function(s)

customizedDistribution(piecewiseFunctions, desiredMin, desiredMax)

`customizedDistribution(piecewiseFunctions, desiredMin, desiredMax)`

Overview

The function generates a random number based on custom piecewise functions within the desired range.

Parameters

Parameter	Type	Description
`piecewiseFunctions`	Table	A table containing subtables, each with a function string, \( x_{\text{min}} \), and \( x_{\text{max}} \) for each piecewise function.
`desiredMin`	Number	The minimum desired value of the scaled random number.
`desiredMax`	Number	The maximum desired value of the scaled random number.

Returns

Return	Type	Description
`randomX`	Number	A scaled random number in the range `[desiredMin, desiredMax]`.

Example

local functions = {{"x^2", 0, 2}, {"2*x", 2, 4}}
local randomX = StatBook.customizedDistribution(functions, 0, 10)
print(randomX)  -- Output will vary

Special Instructions for using Customized Distribution

To use the Customized Distribution feature, you need to provide input in the form of piecewise functions along with the desired minimum and maximum value range you want. Each piecewise function consists of three parts:

A function string (userFunctionString), which is a string representation of the mathematical function you want to evaluate. Your function string can utilize any of the standard Lua math library functions, as they are available in the environment where the function is evaluated.
Minimum (xMin) and maximum (xMax) x-values for the domain of the piecewise function.

Use Desmos to help create any distribution you want.

Example

local piecewiseFunctions = {
	{"math.sqrt(-(5 * x + 1) + 1) * (x + 1)", -1, 0}, 
	{"math.sqrt(-(-5 * x + 1) + 1) * (-x + 1)", 0, 1}
}
local desiredMin = -100
local desiredMax = 100
local result = StatBook.customizedDistribution(piecewiseFunctions, desiredMin, desiredMax)
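
If you are curious how a random value can be drawn from an arbitrary curve like this, one common technique is rejection sampling followed by a linear rescale. The sketch below is an assumption about the general idea and not StatBook's actual algorithm. It also uses plain Lua functions instead of function strings, and every name in it (`pieces`, `evaluate`, `sampleCustom`) is made up for illustration:

```lua
-- Sketch only: rejection sampling from piecewise density pieces, then a
-- linear rescale. This is an assumed technique, not StatBook's actual code.
local pieces = {
    {function(x) return math.sqrt(-(5 * x + 1) + 1) * (x + 1) end, -1, 0},
    {function(x) return math.sqrt(-(-5 * x + 1) + 1) * (-x + 1) end, 0, 1},
}

-- Height of the curve at x (0 outside every piece).
local function evaluate(x)
    for _, piece in ipairs(pieces) do
        if x >= piece[2] and x <= piece[3] then
            return piece[1](x)
        end
    end
    return 0
end

local xMin, xMax = pieces[1][2], pieces[#pieces][3]

-- Estimate the peak height on a grid so proposals can be bounded.
local peak = 0
for i = 0, 1000 do
    peak = math.max(peak, evaluate(xMin + (xMax - xMin) * i / 1000))
end

local function sampleCustom(desiredMin, desiredMax)
    while true do
        local x = xMin + (xMax - xMin) * math.random()
        local y = peak * 1.05 * math.random() -- small margin over the grid peak
        if y <= evaluate(x) then
            -- Map x from [xMin, xMax] onto [desiredMin, desiredMax].
            return desiredMin + (x - xMin) / (xMax - xMin) * (desiredMax - desiredMin)
        end
    end
end

print(sampleCustom(-100, 100)) -- output will vary
```

Points are accepted more often where the curve is tall, so the accepted x values follow the shape of the curve before being stretched onto the desired range.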

Graph on Desmos

Randomly Generate Continuous Variable From Discrete Dataset w/ KDE

randomFromDataset(values, kernel, percentageOfFrameTime, bandwidth)

`randomFromDataset(values, kernel, percentageOfFrameTime, bandwidth)`

Overview

Generates a random number based on a given dataset using Kernel Density Estimation (KDE).

Parameters

Parameter	Type	Description
`values`	Table	The dataset from which to generate the random number.
`kernel`*	String	The type of kernel to use for the KDE. Default is Gaussian.
`percentageOfFrameTime`	Number	The percentage of frame time allowed for the function to run. Default is 0.1.
`bandwidth`	Number	The bandwidth to use in the KDE. Calculated by default if not provided.

*Options for kernel are: “Gaussian”, “Epanechnikov”, “Uniform”, “Triangular”, “Biweight”, “Cosine”, “Logistic”, and “Sigmoid”.

Returns

Return	Type	Description
`xRandom`	Number	A random number generated based on the KDE of the dataset.

Example

local dataset = {1, 2, 3, 3, 4, 4, 5, 6, 7}
local kernelType = "Gaussian"
local percentageOfFrameTime = 0.1
local randomValue = StatBook.randomFromDataset(dataset, kernelType, percentageOfFrameTime)
print(randomValue)  -- Output will vary
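
For intuition, drawing from a Gaussian KDE is equivalent to picking one data point at random and adding normal noise whose standard deviation equals the bandwidth. The plain-Lua sketch below shows that idea only. It is not StatBook's implementation, the function names are hypothetical, the bandwidth rule is a simple normal-reference rule of thumb chosen for the example, and unlike the module's description it can return values slightly outside the range of the data:

```lua
-- Sketch of sampling from a Gaussian KDE; assumed approach, not StatBook's code.

local function standardNormal()
    -- Box-Muller transform; 1 - math.random() lies in (0, 1], so log is finite.
    local u1, u2 = 1 - math.random(), math.random()
    return math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
end

local function ruleOfThumbBandwidth(values)
    local n, sum = #values, 0
    for _, v in ipairs(values) do
        sum = sum + v
    end
    local mean, squaredDiffs = sum / n, 0
    for _, v in ipairs(values) do
        squaredDiffs = squaredDiffs + (v - mean) ^ 2
    end
    local sd = math.sqrt(squaredDiffs / (n - 1))
    return 1.06 * sd * n ^ (-1 / 5) -- normal-reference rule of thumb
end

local function sampleGaussianKde(values, bandwidth)
    bandwidth = bandwidth or ruleOfThumbBandwidth(values)
    local center = values[math.random(#values)] -- pick one kernel uniformly
    return center + bandwidth * standardNormal() -- sample from that kernel
end

local dataset = {1, 2, 3, 3, 4, 4, 5, 6, 7}
print(sampleGaussianKde(dataset)) -- output will vary
```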

Simulate a sequence of actions through a Markov Chain

markovChain(states, transitionProbs, startState, length, returnFullSequence)

`markovChain(states, transitionProbs, startState, length, returnFullSequence)`

Overview

Generates a sequence of states based on a Markov Chain model.

Parameters

Parameter	Type	Description
`states`	Table	List of possible states in the Markov Chain.
`transitionProbs`	Table	Transition probability matrix between states.
`startState`	Any	The state to start the sequence from.
`length`	Number	The length of the sequence to be generated.
`returnFullSequence`	Boolean	Whether to return the full sequence or just the final state. Default is `false`.

Returns

Return	Type	Description
`sequence` or `sequence[length]`	Table or Any	Returns the entire sequence if `returnFullSequence` is true; otherwise, returns the last state.

Example

local states = {"Sunny", "Cloudy", "Rainy"}
local transitionProbs = {
	Sunny = {Sunny = 0.8, Cloudy = 0.15, Rainy = 0.05},
	Cloudy = {Sunny = 0.2, Cloudy = 0.6, Rainy = 0.2},
	Rainy = {Sunny = 0.1, Cloudy = 0.3, Rainy = 0.6}
}
local startState = "Sunny"
local length = 10
local sequence = StatBook.markovChain(states, transitionProbs, startState, length, true)
print(sequence)  -- Output will be a table representing the sequence
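
In case it helps to see what happens at each step, here is a plain-Lua sketch of a Markov chain walk over the same weather table. It is an assumed approach rather than StatBook's code: the names `nextState` and `simulate` are made up, and whether the module counts the start state as the first entry of the sequence is also an assumption here:

```lua
-- Sketch of a Markov chain walk; an assumed approach, not StatBook's code.
local function nextState(states, transitionProbs, current)
    local roll, cumulative = math.random(), 0
    for _, candidate in ipairs(states) do
        cumulative = cumulative + (transitionProbs[current][candidate] or 0)
        if roll < cumulative then
            return candidate
        end
    end
    return states[#states] -- guards against floating-point round-off in the row sum
end

local function simulate(states, transitionProbs, startState, length)
    local sequence = {startState} -- assumption: start state is entry 1
    for i = 2, length do
        sequence[i] = nextState(states, transitionProbs, sequence[i - 1])
    end
    return sequence
end

local states = {"Sunny", "Cloudy", "Rainy"}
local transitionProbs = {
    Sunny = {Sunny = 0.8, Cloudy = 0.15, Rainy = 0.05},
    Cloudy = {Sunny = 0.2, Cloudy = 0.6, Rainy = 0.2},
    Rainy = {Sunny = 0.1, Cloudy = 0.3, Rainy = 0.6},
}
local sequence = simulate(states, transitionProbs, "Sunny", 10)
print(table.concat(sequence, " -> ")) -- output will vary
```

Each row of the transition table should sum to 1, since it lists every possible move out of that state.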

Please provide feedback, tell me anything you would like to see implemented in the future, or report any bugs/malfunctions/inaccuracies you encounter below!

StatBook V1.1 is coming… which means I am eager to add more! Tell me in the comments below about anything you would like to see implemented!

armenia1997 · October 4, 2023, 11:53am

If you choose to use the scaled distributions option, here’s a guide:

NOTE: The Special Instructions are for all cases. It is VERY helpful.

Special Instructions for Scaled Generation (ex. generateGammaScaled())

Inverse CDF Calculation link for Random Variable Generation (translate the page to English)

The default LQpercent is either 0 or 0.001, and the default UQpercent is either 0.999 or 1, depending on the distribution used. LQpercent and UQpercent are cumulative probabilities that are passed to the inverse CDF, and each one therefore corresponds to a certain X value. The X values obtained from the inverse CDF at LQpercent and UQpercent are the lower and upper bounds of the distribution, which are mapped to desiredMin and desiredMax.

Therefore, in order to have an idea of the shape of the distribution resulting when we select parameters for the distribution, we need to know the inverse CDF and where the desiredMin and desiredMax are being mapped to on this distribution. An example below should clearly illustrate how to get results you want.

Let’s say our goal is to use a distribution from which to pull random variates. We want our target range of random values possible to return to be between 5 and 10. Furthermore, let’s say we want this distribution to have a peak probability to return 6, where 5 and 10 each have the lowest probability.

The first step is to shift this range to (0, desiredMax - desiredMin) for the analysis, even though we will still pass 5 as desiredMin and 10 as desiredMax. Thus, for the steps below, our range will be 0 to 5, with the peak probability at 1.

Let us say we want to use the Gamma distribution. We need to see where our Gamma distribution is “trimmed”, since it is defined on all positive values out to infinity. The X values at which the distribution is trimmed on the lower and upper ends are set by LQpercent and UQpercent, whose defaults are LQpercent = 0 and UQpercent = 0.999 for the StatBook.generateGammaScaled() function.

Now, we go to the link I shared above to see what the Inverse CDF looks like and how the alpha parameter works for the Gamma distribution. We do this by navigating to Start → Gamma Distribution → Gamma Distribution (percentage point). It should bring up this page.

In order to manipulate the Gamma distribution to fit specific criteria, we need to adjust its parameters. The goal is to find a combination of parameters that places the lower bound, the peak (the X value with the highest probability), and the upper bound of the distribution at a common multiple of a base set.

Let’s denote the base set as (minRange, maxProbability, maxRange) → (0, 1, 5).

By changing the alpha and UQpercent parameters (since beta is fixed at 1), we aim to transform this base set into a new set, where each element is a multiple of the corresponding element in the base set. For example:

Base set multiplied by 0.5: (0 * 0.5, 1 * 0.5, 5 * 0.5) = (0, 0.5, 2.5)
Base set multiplied by 2: (0 * 2, 1 * 2, 5 * 2) = (0, 2, 10)
and others…

In an example with UQpercent set to 0.9972 and alpha set to 3, the transformed set becomes (0, 2, 10), which fits our criteria of being a multiple of the base set.

In the image, the green circle shows that the minimum of the range is 0, the black oval shows that the highest probability is at 2, and the blue circles show that the maximum of the range is 10. Since (0, 2, 10) is our base set (0, 1, 5) multiplied by 2, and (0, 1, 5) shifts to ((0+5), (1+5), (5+5)) → (5, 6, 10), we see that this works. Remember, we had to start with 0 as our minRange because the Gamma distribution starts there.

We already determined that (0, 2, 10), which is compatible with our intended (5, 6, 10), is achievable with alpha = 3 and UQpercent = 0.9972. Thus, our function to return a random variate using the Gamma distribution ranging from 5 to 10 as possible values, where values closer to 6 are the most common outcomes, is:

local randomVariate = StatBook.generateGammaScaled(3, 5, 10, 0, 0.9972)
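
If you would rather check these numbers in code than on the calculator page, here is a small sketch. It uses the closed-form Gamma CDF that holds for integer alpha with beta = 1, and it assumes the scaled functions map the trimmed range linearly onto (desiredMin, desiredMax); `gammaCdfIntegerShape` and `rescale` are hypothetical helpers, not StatBook functions.

```lua
-- Gamma(alpha, beta = 1) CDF for integer alpha:
-- P(X <= x) = 1 - exp(-x) * sum_{k = 0}^{alpha - 1} x^k / k!
local function gammaCdfIntegerShape(alpha, x)
	local term, sum = 1, 1 -- the k = 0 term
	for k = 1, alpha - 1 do
		term = term * x / k
		sum = sum + term
	end
	return 1 - math.exp(-x) * sum
end

-- Assumed linear mapping from [lowerX, upperX] to [desiredMin, desiredMax].
local function rescale(x, lowerX, upperX, desiredMin, desiredMax)
	return desiredMin + (x - lowerX) / (upperX - lowerX) * (desiredMax - desiredMin)
end

local alpha = 3
local upperX = 10
local mode = alpha - 1 -- peak of Gamma(alpha, 1) when alpha >= 1

local uq = gammaCdfIntegerShape(alpha, upperX)
print(uq) -- approximately 0.9972, the UQpercent used above

print(rescale(mode, 0, upperX, 5, 10)) -- 6, the desired peak
```

The first print confirms that trimming at X = 10 corresponds to UQpercent ≈ 0.9972, and the second confirms that the peak at 2 lands on 6 after scaling.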

A continuation of Random Variate Generation - not scalable

Functions

Random Generation with Distributions (no scale, only parameter tuning)

generateStandardNormal()

`generateStandardNormal()`

Overview

The generateStandardNormal() function generates a random number that follows a standard normal distribution using the Box-Muller transform.

Parameters

No parameters are required.

Returns

Return	Type	Description
`x`	Number	A random number from a standard normal distribution.

Example

local x = StatBook.generateStandardNormal()
print(x)  -- Output will vary
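
For the curious, the Box-Muller transform itself is only a few lines. This is a generic sketch of the technique, not StatBook's source; `standardNormal` is a hypothetical name.

```lua
-- Box-Muller: turn two independent uniforms into one standard normal draw.
local function standardNormal()
	local u1 = 1 - math.random() -- in (0, 1], so math.log(u1) stays finite
	local u2 = math.random()
	return math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
end

-- Quick sanity check: the sample mean should be near 0.
local total = 0
local samples = 20000
for _ = 1, samples do
	total = total + standardNormal()
end
print(total / samples)
```

To get a normal variate with mean mu and standard deviation sigma, scale and shift the result: mu + sigma * standardNormal().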

generateNormal(mu, sigma)

`generateNormal(mu, sigma)`

Overview

The generateNormal function generates a random number following a normal distribution characterized by a given mean (mu) and standard deviation (sigma).

Parameters

Parameter	Type	Description	Default
`mu`	Number	The mean of the normal distribution.	-
`sigma`	Number	The standard deviation of the normal distribution.	-

Returns

Return	Type	Description
`x`	Number	A random number following the specified normal distribution.

Example

local mu = 0
local sigma = 1
local randomNum = StatBook.generateNormal(mu, sigma)
print(randomNum)  -- Output will vary based on random generation

generateGamma(alpha, beta)

`generateGamma(alpha, beta)`

Overview

The generateGamma(alpha, beta) function generates a random number that follows a Gamma distribution with the given shape parameter \( \alpha \) and scale parameter \( \beta \). The function uses the Marsaglia and Tsang method for this purpose.

Parameters

Parameter	Type	Description	Default
`alpha`	Number	The shape parameter of the Gamma distribution.	-
`beta`	Number	The scale parameter of the Gamma distribution.	-

Returns

Return	Type	Description
`x`	Number	A random number from a Gamma distribution with parameters \( \alpha \) and \( \beta \).

Example

local alpha = 2
local beta = 1
local x = StatBook.generateGamma(alpha, beta)
print(x)  -- Output will vary
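
Here is a rough sketch of the Marsaglia and Tsang method, in case you want to see what happens under the hood. It is a textbook version, not a copy of StatBook's code; `standardNormal` and `gammaVariate` are hypothetical names, and the alpha < 1 branch uses the usual boosting trick.

```lua
-- Standard normal draw via Box-Muller (helper for the sampler below).
local function standardNormal()
	local u1 = 1 - math.random() -- in (0, 1]
	local u2 = math.random()
	return math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
end

-- Marsaglia-Tsang sampler; alpha is the shape, beta is the scale.
local function gammaVariate(alpha, beta)
	if alpha < 1 then
		-- Boost: Gamma(alpha) = Gamma(alpha + 1) * U^(1 / alpha)
		local u = 1 - math.random()
		return gammaVariate(alpha + 1, beta) * u ^ (1 / alpha)
	end
	local d = alpha - 1 / 3
	local c = 1 / math.sqrt(9 * d)
	while true do
		local x = standardNormal()
		local v = (1 + c * x) ^ 3
		if v > 0 then
			local u = 1 - math.random()
			-- Accept with the squeeze-free log test from the original method
			if math.log(u) < 0.5 * x * x + d - d * v + d * math.log(v) then
				return d * v * beta
			end
		end
	end
end

print(gammaVariate(2, 1)) -- output will vary
```

The mean of many draws should approach alpha * beta, which is a handy sanity check.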

generateInverseGamma(alpha, beta)

`generateInverseGamma(alpha, beta)`

Overview

The generateInverseGamma(alpha, beta) function generates a random number that follows an Inverse Gamma distribution using the Marsaglia and Tsang method for Gamma distribution and then taking the reciprocal.

Parameters

Parameter	Type	Description
`alpha`	Number	The shape parameter of the Inverse Gamma distribution.
`beta`	Number	The scale parameter of the Inverse Gamma distribution.

Returns

Return	Type	Description
`x`	Number	A random number from an Inverse Gamma distribution.

Example

local x = StatBook.generateInverseGamma(2, 1)
print(x)  -- Output will vary

generateExponential(lambda)

`generateExponential(lambda)`

Overview

The generateExponential(lambda) function generates a random number that follows an Exponential distribution.

Parameters

Parameter	Type	Description
`lambda`	Number	The rate parameter of the Exponential distribution.

Returns

Return	Type	Description
`x`	Number	A random number from an Exponential distribution.

Example

local x = StatBook.generateExponential(0.5)
print(x)  -- Output will vary
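
The documentation does not say how this is sampled internally, but the classic approach is inverse transform sampling, sketched here under that assumption. `exponentialVariate` is a hypothetical name.

```lua
-- Inverse transform: if U is uniform on (0, 1], then -ln(U) / lambda
-- follows an Exponential distribution with rate lambda.
local function exponentialVariate(lambda)
	local u = 1 - math.random() -- in (0, 1], avoids math.log(0)
	return -math.log(u) / lambda
end

print(exponentialVariate(0.5)) -- output will vary; the long-run mean is 1 / 0.5 = 2
```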

generateBeta(alpha, beta)

`generateBeta(alpha, beta)`

Overview

The generateBeta(alpha, beta) function generates a random number that follows a Beta distribution.

Parameters

Parameter	Type	Description
`alpha`	Number	The first shape parameter of the Beta distribution.
`beta`	Number	The second shape parameter of the Beta distribution.

Returns

Return	Type	Description
`result`	Number	A random number from a Beta distribution.

Example

local result = StatBook.generateBeta(2, 5)
print(result)  -- Output will vary

generateBetaPrime(alpha, beta)

`generateBetaPrime(alpha, beta)`

Overview

The generateBetaPrime(alpha, beta) function generates a random number that follows a beta prime distribution.

Parameters

Parameter	Type	Description
`alpha`	Number	The shape parameter alpha for the beta prime distribution. Must be greater than 0.
`beta`	Number	The shape parameter beta for the beta prime distribution. Must be greater than 0.

Returns

Return	Type	Description
`result`	Number	A random number from a beta prime distribution.

Example

local result = StatBook.generateBetaPrime(1, 1)
print(result)  -- Output will vary

generateLogNormal(mu, sigma)

`generateLogNormal(mu, sigma)`

Overview

The generateLogNormal(mu, sigma) function generates a random number that follows a log-normal distribution.

Parameters

Parameter	Type	Description
`mu`	Number	The mean parameter of the underlying normal distribution.
`sigma`	Number	The standard deviation parameter of the underlying normal distribution.

Returns

Return	Type	Description
`result`	Number	A random number from a log-normal distribution.

Example

local result = StatBook.generateLogNormal(0, 1)
print(result)  -- Output will vary

generateLevy(c, mu)

`generateLevy(c, mu)`

Overview

The generateLevy(c, mu) function generates a random number that follows a Lévy distribution.

Parameters

Parameter	Type	Description
`c`	Number	The scale parameter for the Lévy distribution. Must be greater than 0.
`mu`	Number	The location parameter for the Lévy distribution.

Returns

Return	Type	Description
`result`	Number	A random number from a Lévy distribution.

Example

local result = StatBook.generateLevy(1, 0)
print(result)  -- Output will vary

generatePoisson(lambda)

`generatePoisson(lambda)`

Overview

The generatePoisson(lambda) function generates a random number that follows a Poisson distribution.

Parameters

Parameter	Type	Description
`lambda`	Number	The average rate of events per interval for the Poisson distribution. Must be greater than 0.

Returns

Return	Type	Description
`result`	Number	A random number from a Poisson distribution.

Example

local result = StatBook.generatePoisson(5)
print(result)  -- Output will vary

generateCauchy(x0, gamma)

`generateCauchy(x0, gamma)`

Overview

The generateCauchy(x0, gamma) function generates a random number that follows a Cauchy distribution.

Parameters

Parameter	Type	Description
`x0`	Number	The location parameter of the Cauchy distribution.
`gamma`	Number	The scale parameter of the Cauchy distribution.

Returns

Return	Type	Description
`result`	Number	A random number from a Cauchy distribution.

Example

local result = StatBook.generateCauchy(0, 1)
print(result)  -- Output will vary

generateWeibull(alpha, beta)

`generateWeibull(alpha, beta)`

Overview

The generateWeibull(alpha, beta) function generates a random number that follows a Weibull distribution.

Parameters

Parameter	Type	Description
`alpha`	Number	The scale parameter of the Weibull distribution.
`beta`	Number	The shape parameter of the Weibull distribution.

Returns

Return	Type	Description
`result`	Number	A random number from a Weibull distribution.

Example

local result = StatBook.generateWeibull(1, 2)
print(result)  -- Output will vary

generateChiSquare(df)

`generateChiSquare(df)`

Overview

The generateChiSquare(df) function generates a random number that follows a Chi-Square distribution with degrees of freedom df.

Parameters

Parameter	Type	Description
`df`	Number	Degrees of freedom for the Chi-Square distribution.

Returns

Return	Type	Description
`result`	Number	A random number from a Chi-Square distribution.

Example

local result = StatBook.generateChiSquare(5)
print(result)  -- Output will vary

generatePareto(alpha, xm)

`generatePareto(alpha, xm)`

Overview

The generatePareto(alpha, xm) function generates a random number that follows a Pareto distribution with shape parameter alpha and scale parameter xm.

Parameters

Parameter	Type	Description
`alpha`	Number	The shape parameter for the Pareto distribution.
`xm`	Number	The scale parameter for the Pareto distribution.

Returns

Return	Type	Description
`result`	Number	A random number from a Pareto distribution.

Example

local result = StatBook.generatePareto(2, 1)
print(result)  -- Output will vary

generateT(df)

`generateT(df)`

Overview

The generateT(df) function generates a random number that follows a Student’s t-distribution with df degrees of freedom.

Parameters

Parameter	Type	Description
`df`	Number	The degrees of freedom for the t-distribution.

Returns

Return	Type	Description
`result`	Number	A random number from a Student’s t-distribution.

Example

local result = StatBook.generateT(10)
print(result)  -- Output will vary

armenia1997 · October 4, 2023, 11:56am

Categorical Hypothesis Testing

Functions

oddsRatio(O11, O12, O21, O22, CL)

`oddsRatio(O11, O12, O21, O22, CL)`

Overview

The oddsRatio function calculates the odds ratio for a 2x2 contingency table, along with the confidence intervals and hypothesis testing for independence. It can be particularly useful in epidemiological studies and statistical analysis of categorical data.

Parameters

Parameter	Type	Description	Default
`O11`	Number	Count for group 1 with characteristic A.	-
`O12`	Number	Count for group 1 without characteristic A.	-
`O21`	Number	Count for group 2 with characteristic A.	-
`O22`	Number	Count for group 2 without characteristic A.	-
`CL`	Number	Confidence level for the confidence interval of the odds ratio.	0.95

Returns

Return	Type	Description
`OR`	Number	The calculated odds ratio.
`rejectH0`	Boolean	Whether to reject the null hypothesis of independence.
`lowerCI`	Number	Lower bound of the confidence interval for the odds ratio.
`upperCI`	Number	Upper bound of the confidence interval for the odds ratio.

Example

local O11 = 13
local O12 = 9
local O21 = 8
local O22 = 6
local CL = 0.95
local result = StatBook.oddsRatio(O11, O12, O21, O22, CL)
print(result.OR, result.rejectH0, result.lowerCI, result.upperCI)  -- Output will vary based on the input
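
As a rough illustration of the math involved, here is the textbook odds ratio with a Wald interval on the log scale. Whether StatBook uses exactly this interval is an assumption; `oddsRatioSketch` is a hypothetical name, and the z value is hardcoded for a 95% confidence level.

```lua
-- Odds ratio for a 2x2 table with a log-scale Wald confidence interval.
-- z is the standard normal critical value (1.959964 for 95% confidence).
local function oddsRatioSketch(O11, O12, O21, O22, z)
	local ratio = (O11 * O22) / (O12 * O21)
	local se = math.sqrt(1 / O11 + 1 / O12 + 1 / O21 + 1 / O22)
	local lower = math.exp(math.log(ratio) - z * se)
	local upper = math.exp(math.log(ratio) + z * se)
	return ratio, lower, upper
end

local ratio, lower, upper = oddsRatioSketch(13, 9, 8, 6, 1.959964)
print(ratio)        -- about 1.083 (78 / 72)
print(lower, upper) -- the interval contains 1 for this table
```

When the interval contains 1, the data are consistent with no association between the two groups.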

oneSampleProportionCI(k, n, CL)

`oneSampleProportionCI(k, n, CL)`

Overview

The oneSampleProportionCI function calculates a confidence interval for a proportion in a statistical population, based on the proportion observed in a sample. The function employs the Wald-Agresti-Coull (WAC) method, a modified version of the standard Wald method to calculate the confidence interval.

Parameters

Parameter	Type	Description	Default
`k`	Number	Number of successful outcomes in the sample.	-
`n`	Number	Total number of trials in the sample.	-
`CL`	Number	Confidence level for the confidence interval.	0.95

Returns

Return	Type	Description
`pHat`	Number	The estimated proportion based on the sample.
`lowerCI`	Number	Lower bound of the confidence interval for the proportion.
`upperCI`	Number	Upper bound of the confidence interval for the proportion.
`testType`	String	Specifies the type of test conducted, in this case, “One Sample Proportion CI”.

Example

local k = 55
local n = 100
local CL = 0.95
local result = StatBook.oneSampleProportionCI(k, n, CL)
print(result.pHat, result.lowerCI, result.upperCI, result.testType)  -- Output will vary based on the input
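
For reference, the standard Agresti-Coull adjustment can be sketched as follows. This is the textbook formula and may differ in detail from StatBook's implementation; `agrestiCoull` is a hypothetical name, and the z value is hardcoded for 95% confidence.

```lua
-- Agresti-Coull interval: add z^2 pseudo-trials (half of them successes),
-- then apply the usual Wald formula to the adjusted proportion.
local function agrestiCoull(k, n, z)
	local nAdj = n + z * z
	local pAdj = (k + z * z / 2) / nAdj
	local margin = z * math.sqrt(pAdj * (1 - pAdj) / nAdj)
	return k / n, pAdj - margin, pAdj + margin
end

local pHat, lowerCI, upperCI = agrestiCoull(55, 100, 1.959964)
print(pHat)             -- 0.55
print(lowerCI, upperCI) -- roughly 0.452 and 0.644
```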

singleProportionInference(k, n, p, CL)

`singleProportionInference(k, n, p, CL)`

Overview

The singleProportionInference(k, n, p, CL) function performs hypothesis testing for a single proportion. It chooses between using a Large Sample Proportion Test or an Exact Binomial Test based on the sample size and the probability.

Parameters

Parameter	Type	Description	Default
`k`	number	The number of successes in the sample.	Required
`n`	number	The sample size.	Required
`p`	number	The hypothesized population proportion.	Required
`CL`	number	The Confidence Level for the test.	0.95

Returns

Variable	Type	Description
`pValue`	number	The p-value of the test.
`rejectH0`	boolean	Indicates whether to reject the null hypothesis.
`stat`	number	The test statistic (Z for Large Sample, None for Exact).
`df`	number	Degrees of freedom (1 for Large Sample, None for Exact).
`pTest`	number	The hypothesized population proportion.
`pHat`	number	The sample proportion.
`lowerCI`	number	Lower bound of the confidence interval.
`upperCI`	number	Upper bound of the confidence interval.
`parametric`	boolean	Indicates if the test is parametric (true for Large Sample, false for Exact).
`testType`	string	Type of the test conducted (“Large Sample Proportion Test” or “Exact Binomial Test”).
`statType`	string	Type of the statistic used (“Z” for Large Sample, None for Exact).

Examples

-- Example 1: Large sample size
local result = StatBook.singleProportionInference(40, 100, 0.35, 0.95)
-- Output will show Large Sample Proportion Test results


-- Example 2: Small sample size
local result = StatBook.singleProportionInference(4, 10, 0.35, 0.95)
-- Output will show Exact Binomial Test results

Notes

If (n * p) >= 5 and (n * (1 - p)) >= 5, a Large Sample Proportion Test is conducted.
Otherwise, an Exact Binomial Test is conducted.
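
The selection rule in the notes can be written as a tiny helper. `chooseProportionTest` is a hypothetical name used only for this illustration.

```lua
-- Mirrors the rule above: both expected counts must be at least 5
-- for the large-sample test to be used.
local function chooseProportionTest(n, p)
	if n * p >= 5 and n * (1 - p) >= 5 then
		return "Large Sample Proportion Test"
	end
	return "Exact Binomial Test"
end

print(chooseProportionTest(100, 0.35)) -- Large Sample Proportion Test (35 and 65)
print(chooseProportionTest(10, 0.35))  -- Exact Binomial Test (3.5 is below 5)
```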

twoProportionInference(k1, n1, k2, n2, CL)

`twoProportionInference(k1, n1, k2, n2, CL)`

Overview

The twoProportionInference function performs statistical inference on two independent proportions. It calculates the confidence interval and p-value for the difference between two proportions \( p_1 \) and \( p_2 \).

Parameters

Parameter	Type	Description	Default
`k1`	Number	Number of successful outcomes in the first sample.	-
`n1`	Number	Total number of trials in the first sample.	-
`k2`	Number	Number of successful outcomes in the second sample.	-
`n2`	Number	Total number of trials in the second sample.	-
`CL`	Number	Confidence level for the confidence interval.	0.95

Returns

Return	Type	Description
`pValue`	Number	The p-value of the Z-test.
`rejectH0`	Boolean	Whether to reject the null hypothesis at significance level alpha = 1 - CL.
`stat`	Number	The Z-score of the test.
`pHat`	Table → Number	Estimated proportions for both samples and overall.
`lowerCI`	Number	Lower bound of the confidence interval for \( p_1 - p_2 \).
`upperCI`	Number	Upper bound of the confidence interval for \( p_1 - p_2 \).
`parametric`	Boolean	Whether the test is parametric (always true for Z-test).
`testType`	String	Specifies the type of test, “Two Proportion Test”.
`statType`	String	Specifies the type of statistic used, “Z”.
`warning`	Boolean	Whether the sample size is too small for a reliable test.

Example

local k1 = 50
local n1 = 100
local k2 = 40
local n2 = 90
local CL = 0.95
local result = StatBook.twoProportionInference(k1, n1, k2, n2, CL)
print(result.pValue, result.rejectH0, result.stat, result.lowerCI, result.upperCI)  -- Output will vary based on the input
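
To show where the Z-score comes from, here is a sketch of the usual pooled two-proportion statistic. It assumes the pooled standard error is used for the test, which is the common textbook choice but is not confirmed by the documentation; `twoProportionZ` is a hypothetical name.

```lua
-- Pooled two-proportion Z statistic:
-- z = (p1 - p2) / sqrt(pPooled * (1 - pPooled) * (1/n1 + 1/n2))
local function twoProportionZ(k1, n1, k2, n2)
	local p1 = k1 / n1
	local p2 = k2 / n2
	local pPooled = (k1 + k2) / (n1 + n2)
	local se = math.sqrt(pPooled * (1 - pPooled) * (1 / n1 + 1 / n2))
	return (p1 - p2) / se, p1, p2, pPooled
end

local z, p1, p2, pPooled = twoProportionZ(50, 100, 40, 90)
print(p1, p2, pPooled) -- 0.5, about 0.444, about 0.474
print(z)               -- about 0.766
```

A Z-score this small is well inside the usual ±1.96 cutoff for a 95% confidence level.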

goodnessOfFit(observed, expectedProportions, CL)

`goodnessOfFit(observed, expectedProportions, CL)`

Overview

The goodnessOfFit function performs a Pearson’s Chi-Squared Goodness of Fit Test. This test is used to determine if the observed frequency distribution of a variable matches the expected frequency distribution.

Parameters

Parameter	Type	Description	Default
`observed`	Table	Array of observed frequencies for each category.	-
`expectedProportions`	Table	Array of expected proportions for each category.	-
`CL`	Number	Confidence level for the test.	0.95

Returns

Return	Type	Description
`pValue`	Number	The p-value of the Chi-Squared Test.
`rejectH0`	Boolean	Whether to reject the null hypothesis at significance level alpha = 1 - CL.
`stat`	Number	The Chi-Squared statistic.
`df`	Number	The degrees of freedom.
`parametric`	Boolean	Whether the test is parametric (always true for this test).
`testType`	String	Specifies the type of test, “Pearson’s Goodness of Fit Test”.
`statType`	String	Specifies the type of statistic used, “Chi-Square”.
`warning`	Boolean	Whether the sample size is too small for a reliable test.

Example

local observed = {50, 40, 30, 25}
local expectedProportions = {0.3, 0.3, 0.2, 0.2}
local CL = 0.95
local result = StatBook.goodnessOfFit(observed, expectedProportions, CL)
print(result.pValue, result.rejectH0, result.stat, result.df, result.warning)  -- Output will vary based on the input
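
The statistic itself is simple to compute by hand. This sketch shows the standard Pearson formula for the example data; `goodnessOfFitStat` is a hypothetical helper, and the p-value step (which needs the Chi-Square CDF) is left out.

```lua
-- Pearson statistic: sum over categories of (observed - expected)^2 / expected,
-- where expected = total count * expected proportion.
local function goodnessOfFitStat(observed, expectedProportions)
	local total = 0
	for _, count in ipairs(observed) do
		total = total + count
	end
	local stat = 0
	for i, count in ipairs(observed) do
		local expected = total * expectedProportions[i]
		stat = stat + (count - expected) ^ 2 / expected
	end
	return stat, #observed - 1 -- statistic and degrees of freedom
end

local stat, df = goodnessOfFitStat({50, 40, 30, 25}, {0.3, 0.3, 0.2, 0.2})
print(stat, df) -- about 1.839 with 3 degrees of freedom
```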

chiSquareIndependence(matrix, CL)

`chiSquareIndependence(matrix, CL)`

Overview

The chiSquareIndependence function performs Pearson’s Chi-Squared Test for Independence OR Homogeneity. This test checks whether two categorical variables are independent of each other.

Parameters

Parameter	Type	Description	Default
`matrix`	Table	The contingency table as a 2D array.	-
`CL`	Number	Confidence level for the test.	0.95

Returns

Return	Type	Description
`pValue`	Number	The p-value of the Chi-Squared Test.
`rejectH0`	Boolean	Whether to reject the null hypothesis at significance level alpha = 1 - CL.
`stat`	Number	The Chi-Squared statistic.
`df`	Number	The degrees of freedom.
`parametric`	Boolean	Whether the test is parametric (always true for this test).
`testType`	String	Specifies the type of test, “Pearson’s Test for Independence/Homogeneity”.
`statType`	String	Specifies the type of statistic used, “Chi-Square”.
`warning`	Boolean	Whether the sample size is too small for a reliable test.

Example

local matrix = {{19, 24}, {43, 32}}
local CL = 0.95
local result = StatBook.chiSquareIndependence(matrix, CL)
print(result.pValue, result.rejectH0, result.stat, result.df, result.warning)  -- Output will vary based on the input
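
Here is a sketch of how the statistic and degrees of freedom are obtained from the contingency table using the standard Pearson formula. `independenceStat` is a hypothetical helper, and the p-value step is omitted.

```lua
-- Expected count for cell (i, j) = rowTotal[i] * colTotal[j] / grandTotal.
local function independenceStat(matrix)
	local rowTotals, colTotals, grandTotal = {}, {}, 0
	for i, row in ipairs(matrix) do
		rowTotals[i] = 0
		for j, count in ipairs(row) do
			rowTotals[i] = rowTotals[i] + count
			colTotals[j] = (colTotals[j] or 0) + count
			grandTotal = grandTotal + count
		end
	end
	local stat = 0
	for i, row in ipairs(matrix) do
		for j, count in ipairs(row) do
			local expected = rowTotals[i] * colTotals[j] / grandTotal
			stat = stat + (count - expected) ^ 2 / expected
		end
	end
	local df = (#rowTotals - 1) * (#colTotals - 1)
	return stat, df
end

local stat, df = independenceStat({{19, 24}, {43, 32}})
print(stat, df) -- about 1.8945 with 1 degree of freedom
```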

Basic Statistics

Functions

mean(list)

`mean(list)`

Overview

The mean function calculates the arithmetic mean, commonly known as the average, of a given list of numbers. The function sums up all the elements in the list and divides it by the total number of elements to determine the mean value.

Parameters

Parameter Name	Type	Description	Required	Default Value
`list`	table	A list of numerical values for which the mean will be calculated. The list must contain at least one numerical value.	Yes	N/A

Returns

Type	Description	Possible Values
number	The mean (average) of the elements in the list. The return value will be a floating-point number if the mean is not an integer.	Any numerical value

Constraints

The list parameter must be a table containing numerical values only.
The table must have at least one element; otherwise, the function will return an undefined result due to division by zero.

Example Use

local myList = {1, 2, 3, 4, 5}
local result = StatBook.mean(myList)

print(result)  -- Output will be 3

median(list)

`median(list)`

Overview

The median function calculates the median value from a given list of numbers. The median is the middle value in a data set sorted in ascending order. For a list with an odd number of elements, the median is the exact middle value. For a list with an even number of elements, the median is the average of the two middle values.

Parameters

Parameter Name	Type	Description	Required	Default Value
`list`	table	A list of numerical values to find the median from. The list must contain at least one numerical value.	Yes	N/A

Returns

Type	Description	Possible Values
number or nil	The median value of the elements in the list. If the list is empty or nil values are encountered, returns `nil`.	Any numerical value or `nil`

Constraints

The list parameter must be a table containing numerical values only.
The table must have at least one element; otherwise, the function will return nil.

Example Use

local myList = {7, 2, 3, 6, 5}
local result = StatBook.median(myList)

print(result)  -- Output will be 5

mode(list)

`mode(list)`

Overview

The mode function calculates the mode(s) of a given list of numbers. The mode is the number(s) that appear most frequently in the data set. If multiple numbers have the same highest frequency, all of them are returned as modes in a table.

Parameters

Parameter Name	Type	Description	Required	Default Value
`list`	table	A list of numerical values to find the mode from. The list must contain at least one numerical value.	Yes	N/A

Returns

Type	Description	Possible Values
table	A table containing the mode(s) of the list. If there are multiple modes, all will be included in the returned table.	A table containing numerical values

Constraints

The list parameter must be a table containing numerical values only.
The table must have at least one element; otherwise, the function will return an empty table.

Example Use

local myList = {1, 2, 3, 2, 2, 4}

local result = StatBook.mode(myList)

-- The mode of the list is 2, as it appears most frequently
for _, v in ipairs(result) do
    print(v)
end

range(list)

`range(list)`

Overview

The range function calculates the range of a given list of numbers. The range is the difference between the maximum and minimum values in the list.

Parameters

Parameter Name	Type	Description	Required	Default Value
`list`	table	A list of numerical values to find the range from. The list must contain at least two numerical values.	Yes	N/A

Returns

Type	Description	Possible Values
number	The range of the list, calculated as the difference between the maximum and minimum values.	Any numerical value

Constraints

The list parameter must be a table containing numerical values only.
The table must have at least two elements; otherwise, the function will return an undefined result.

Example Use

local myList = {1, 2, 3, 4, 5}

local result = StatBook.range(myList)

-- The range of the list {1, 2, 3, 4, 5} is (5 - 1) = 4
print(result)

interquartileRange(values)

`interquartileRange(values)`

Overview

The interquartileRange function calculates the Interquartile Range (IQR) of a given list of numbers. The IQR is the range between the first quartile (Q1) and the third quartile (Q3) of a data set, providing a measure of statistical dispersion.

Parameters

Parameter Name	Type	Description	Required	Default Value
`values`	table	A list of numerical values for which the IQR will be calculated. The list must contain at least two numerical values.	Yes	N/A

Returns

Type	Description	Possible Values
number	The Interquartile Range (IQR) of the elements in the list. The return value will be a floating-point number.	Any numerical value

Constraints

The values parameter must be a table containing numerical values only.
The table must have at least two elements; otherwise, an error will be thrown.

Example Use

local myValues = {1, 2, 3, 4, 5}

-- The IQR of the list {1, 2, 3, 4, 5} will be calculated
local result = StatBook.interquartileRange(myValues)

-- Output will be the calculated IQR
print(result)

variance(list)

`variance(list)`

Overview

The variance function calculates the sample variance of a given list of numbers. Variance is a statistical measurement of the spread between numbers in a dataset.

Parameters

Parameter Name	Type	Description	Required	Default Value
`list`	table	A list of numerical values for which the variance will be calculated. The list must contain at least two numerical values.	Yes	N/A

Returns

Type	Description	Possible Values
number	The sample variance of the elements in the list. The return value will be a floating-point number.	Any numerical value

Constraints

The list parameter must be a table containing numerical values only.
The table must have at least two elements; otherwise, the function will return 0 as there’s not enough data to calculate the variance.
It finds the sample variance, not population variance.

Example Use

local myList = {1, 2, 3, 4, 5}

-- The sample variance will be 2.5
local result = StatBook.variance(myList)
print(result)
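
To see why the answer is 2.5, here is the sample variance formula written out. `sampleVariance` is a hypothetical helper for illustration only; note the division by n - 1 rather than n.

```lua
-- Sample variance: sum of squared deviations from the mean, divided by n - 1.
local function sampleVariance(list)
	local n = #list
	local sum = 0
	for _, value in ipairs(list) do
		sum = sum + value
	end
	local mean = sum / n
	local squaredDeviations = 0
	for _, value in ipairs(list) do
		squaredDeviations = squaredDeviations + (value - mean) ^ 2
	end
	return squaredDeviations / (n - 1)
end

-- Deviations from the mean (3) are -2, -1, 0, 1, 2, so the squares sum to 10.
print(sampleVariance({1, 2, 3, 4, 5})) -- 2.5 (10 / 4)
```

Dividing by n instead would give the population variance, which is 2 for this list.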

standardDeviation(list)

`standardDeviation(list)`

Overview

The standardDeviation function calculates the sample standard deviation (SD) of a given list of numbers. The standard deviation is a measure of the amount of variation or dispersion of a set of values.

Parameters

Parameter Name	Type	Description	Required	Default Value
`list`	table	A list of numerical values for which the standard deviation will be calculated. The list must contain at least two numerical values.	Yes	N/A

Returns

Type	Description	Possible Values
number	The sample standard deviation of the elements in the list. The return value will be a floating-point number.	Any numerical value

Constraints

The list parameter must be a table containing numerical values only.
The table must have at least two elements; otherwise, the function will return 0 as there’s not enough data to calculate the standard deviation.
It finds the sample SD, not population SD.

Example Use

local myList = {1, 2, 3, 4, 5}

-- The sample standard deviation of {1, 2, 3, 4, 5} is sqrt(2.5), about 1.5811
local result = StatBook.standardDeviation(myList)

-- Output will be approximately 1.5811
print(result)

sumOfSquares(list)

`sumOfSquares(list)`

Overview

The sumOfSquares function calculates the sum of squares of deviations from the mean for a given list of numbers. The function utilizes the mean of the list to calculate each deviation.

Parameters

Parameter	Type	Description
`list`	table	A list of numbers to calculate the sum of squares for.

Returns

Type	Description
number	The sum of squares of deviations from the mean.

Example Use

local StatBook = require(game.ServerScriptService.StatBook_v1)
local myList = {1, 2, 3, 4, 5}
local result = StatBook.sumOfSquares(myList)
print(result)  -- Output will be 10 (the mean is 3, so 4 + 1 + 0 + 1 + 4)

factorial(x)

`factorial(x)`

Overview

The factorial function computes the factorial of a given non-negative integer \( n \). The factorial, denoted \( n! \), is the product of all positive integers less than or equal to \( n \). For example, \( 5! = 5 \times 4 \times 3 \times 2 \times 1 = 120 \).

Parameters

Parameter Name	Type	Description	Required	Default Value
`x`	number	The non-negative integer for which the factorial will be calculated.	Yes	N/A

Returns

Type	Description	Possible Values
number	The factorial of the input number \( x \).	Any positive integer

Constraints

The x parameter must be a non-negative integer.
Factorial of negative integers is undefined, so such input should be avoided.

Example Use

local number = 5

-- The factorial of 5 is 5 * 4 * 3 * 2 * 1 = 120
local result = StatBook.factorial(number)

print(result)

Complex Functions

Functions

erf(x)

`erf(x)`

Overview

The erf function computes the error function of a given real number \( x \). The error function is defined as:

\[ \text{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} \, dt \]

In this implementation, the error function is approximated by a series expansion up to 100 terms.

Parameters

Parameter Name	Type	Description	Required	Default Value
`x`	number	The real number for which the error function will be computed.	Yes	N/A

Returns

Type	Description	Possible Values
number	The error function value of the input \( x \).	A real number between -1 and 1

Constraints

The x parameter must be a real number.

Example Use

local number = 1.0

-- The error function of 1.0 will be calculated
local result = StatBook.erf(number)

-- Output will be the error function value for 1.0
print(result)
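
For anyone wondering what a series expansion of erf looks like, here is a sketch using the Maclaurin series. Whether StatBook uses exactly this series is an assumption, and `erfSeries` is a hypothetical name.

```lua
-- Maclaurin series:
-- erf(x) = 2 / sqrt(pi) * sum_{n >= 0} (-1)^n * x^(2n + 1) / (n! * (2n + 1))
local function erfSeries(x, terms)
	terms = terms or 100
	local sum = 0
	local power = x -- holds (-1)^n * x^(2n + 1) / n! for the current n
	for n = 0, terms - 1 do
		sum = sum + power / (2 * n + 1)
		power = power * (-x * x) / (n + 1)
	end
	return 2 / math.sqrt(math.pi) * sum
end

print(erfSeries(1.0)) -- approximately 0.8427
```

The series converges for every x, but the alternating terms grow large before shrinking when |x| is big, so floating-point accuracy is best for moderate inputs.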

inverf(x)

`inverf(x)`

Overview

The inverf function calculates the inverse of the error function \( \text{erf}^{-1}(x) \) using the Newton-Raphson method for numerical approximation.

Parameters

Parameter	Type	Description
`x`	Number	The value to find the inverse error function of. Must be in the range \( [-1, 1] \); the result is finite only strictly inside that range.

Returns

Return	Type	Description
`inv`	Number	The calculated inverse error function value \( \text{erf}^{-1}(x) \).

Example

local x = 0.5
local result = StatBook.inverf(x)
print(result)  -- Output will be approximately 0.4769
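
Here is a sketch of the Newton-Raphson idea: solve erf(y) = x for y, using the fact that the derivative of erf is 2 / sqrt(pi) * exp(-y^2). The helper names, the starting guess of 0, and the stopping tolerance are all assumptions made for this illustration.

```lua
-- Series approximation of erf, used as the function whose root we seek.
local function erfSeries(x)
	local sum, power = 0, x
	for n = 0, 99 do
		sum = sum + power / (2 * n + 1)
		power = power * (-x * x) / (n + 1)
	end
	return 2 / math.sqrt(math.pi) * sum
end

-- Newton-Raphson on f(y) = erf(y) - x, with f'(y) = 2 / sqrt(pi) * exp(-y^2).
local function inverfNewton(x)
	local y = 0
	for _ = 1, 50 do
		local step = (erfSeries(y) - x) / (2 / math.sqrt(math.pi) * math.exp(-y * y))
		y = y - step
		if math.abs(step) < 1e-12 then
			break
		end
	end
	return y
end

print(inverfNewton(0.5)) -- approximately 0.4769
```

This simple version behaves well for inputs that are not too close to -1 or 1, where the true inverse grows without bound.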

gamma(x)

`gamma(x)`

Overview

The gamma function calculates the Gamma function \( \Gamma(x) \) using the Lanczos approximation method.

Parameters

Parameter	Type	Description
`x`	Number	The value to find the Gamma function of.

Returns

Return	Type	Description
`gam`	Number	The calculated Gamma function value \( \Gamma(x) \).

Example

local x = 5
local result = StatBook.gamma(x)
print(result)  -- Output will be approximately 24, since Γ(5) = 4! = 24
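
A common form of the Lanczos approximation uses g = 7 with nine coefficients, sketched here. The coefficients StatBook actually uses are not stated in the documentation, so treat this as one standard variant; `gammaLanczos` is a hypothetical name.

```lua
-- Widely used Lanczos coefficients for g = 7.
local LANCZOS_G = 7
local LANCZOS_COEFFICIENTS = {
	0.99999999999980993, 676.5203681218851, -1259.1392167224028,
	771.32342877765313, -176.61502916214059, 12.507343278686905,
	-0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
}

local function gammaLanczos(x)
	if x < 0.5 then
		-- Reflection formula: Gamma(x) * Gamma(1 - x) = pi / sin(pi * x)
		return math.pi / (math.sin(math.pi * x) * gammaLanczos(1 - x))
	end
	x = x - 1
	local a = LANCZOS_COEFFICIENTS[1]
	local t = x + LANCZOS_G + 0.5
	for i = 2, #LANCZOS_COEFFICIENTS do
		a = a + LANCZOS_COEFFICIENTS[i] / (x + i - 1)
	end
	return math.sqrt(2 * math.pi) * t ^ (x + 0.5) * math.exp(-t) * a
end

print(gammaLanczos(5))   -- approximately 24
print(gammaLanczos(0.5)) -- approximately 1.7725 (the square root of pi)
```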

hypergeometric2f1(a, b, c, z)

`hypergeometric2f1(a, b, c, z)`

Overview

The hypergeometric2f1 function calculates the hypergeometric function \( {}_2F_1(a, b; c; z) \) using a series approximation.

Parameters

Parameter	Type	Description
`a`	Number	First parameter of the hypergeometric function.
`b`	Number	Second parameter of the hypergeometric function.
`c`	Number	Third parameter of the hypergeometric function.
`z`	Number	Argument for which the hypergeometric function is calculated.

Returns

Return	Type	Description
`hypergeom`	Number	The calculated hypergeometric function value.

Example

local a = 1
local b = 2
local c = 3
local z = 0.5
local result = StatBook.hypergeometric2f1(a, b, c, z)
print(result)  -- Output will vary depending on input parameters
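
The series in question is the Gauss hypergeometric series, which converges for |z| < 1. Here is a sketch with a fixed number of terms; the term count and the name `hypergeometricSeries` are assumptions for this illustration.

```lua
-- 2F1(a, b; c; z) = sum_{n >= 0} (a)_n * (b)_n / (c)_n * z^n / n!
-- Each term is built from the previous one to avoid computing factorials.
local function hypergeometricSeries(a, b, c, z, terms)
	terms = terms or 200
	local term, sum = 1, 1 -- the n = 0 term
	for n = 0, terms - 1 do
		term = term * (a + n) * (b + n) / ((c + n) * (n + 1)) * z
		sum = sum + term
	end
	return sum
end

print(hypergeometricSeries(1, 2, 3, 0.5)) -- approximately 1.5452
```

For these particular parameters there is a closed form, -2 * (z + ln(1 - z)) / z^2, which gives the same value at z = 0.5.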

incompleteBeta(a, b, x)

`incompleteBeta(a, b, x)`

Overview

The incompleteBeta function calculates the incomplete Beta function \( B(x; a, b) \) for given parameters \( a \), \( b \), and \( x \).

Parameters

Parameter	Type	Description
`a`	Number	First parameter of the incomplete Beta function.
`b`	Number	Second parameter of the incomplete Beta function.
`x`	Number	Value at which the incomplete Beta function is evaluated.

Returns

Return	Type	Description
`incbeta`	Number	The calculated value of the incomplete Beta function.

Example

local a = 2.5
local b = 1.5
local x = 0.4
local result = StatBook.incompleteBeta(a, b, x)
print(result)  -- Output will vary depending on input parameters

regularizedIncompleteBeta(a, b, x)

`regularizedIncompleteBeta(a, b, x)`

Overview

The regularizedIncompleteBeta function calculates the regularized incomplete Beta function ( I_x(a, b) ) for given parameters ( a ), ( b ), and ( x ).

Parameters

Parameter	Type	Description
`a`	Number	First parameter of the regularized incomplete Beta function.
`b`	Number	Second parameter of the regularized incomplete Beta function.
`x`	Number	Value at which the regularized incomplete Beta function is evaluated.

Returns

Return	Type	Description
`regincbeta`	Number	The calculated value of the regularized incomplete Beta function.

Example

local a = 2.5
local b = 1.5
local x = 0.4
local result = StatBook.regularizedIncompleteBeta(a, b, x)
print(result)  -- Output will vary depending on input parameters

Matrix Operations

Functions

matAdd(A, B)

`matAdd(A, B)`

The matAdd function performs element-wise addition between two matrices A and B. Both matrices must have the same dimensions for the operation to be valid.

Parameters

Parameter	Type	Description	Default
`A`	table	The first matrix, represented as a 2D table.	Required
`B`	table	The second matrix, also represented as a 2D table.	Required

Returns

Variable	Type	Description
`C`	table	A new matrix, represented as a 2D table, that is the result of `A` plus `B`.

Example

local A = {
  {1, 2},
  {3, 4}
}

local B = {
  {2, 1},
  {4, 3}
}

local result = StatBook.matAdd(A, B)  -- expected: {{3, 3}, {7, 7}}

matSubtract(A, B)

`matSubtract(A, B)`

The matSubtract function performs element-wise subtraction between two matrices A and B. Both matrices must have the same dimensions for the operation to be valid.

Parameters

Parameter	Type	Description	Default
`A`	table	The first matrix, represented as a 2D table.	Required
`B`	table	The second matrix, also represented as a 2D table.	Required

Returns

Variable	Type	Description
`C`	table	A new matrix, represented as a 2D table, that is the result of `A` minus `B`.

Example

local A = {
  {1, 2},
  {3, 4}
}

local B = {
  {2, 1},
  {4, 3}
}

local result = StatBook.matSubtract(A, B)  -- expected: {{-1, 1}, {-1, 1}}

Notes

Both matrices A and B must have the same dimensions. Otherwise, the function may throw an error or return incorrect results.
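
Since mismatched dimensions are not guaranteed to raise a clear error, one option is to validate shapes yourself before calling into the module. The sketch below is plain Lua with hypothetical helper names (`sameDimensions`, `matAddChecked`); it re-implements the addition locally only so that the example runs on its own.

```lua
-- Hypothetical helper: true when A and B have identical row and column counts.
local function sameDimensions(A, B)
    if #A ~= #B then
        return false
    end
    for i = 1, #A do
        if #A[i] ~= #B[i] then
            return false
        end
    end
    return true
end

-- Element-wise addition that fails loudly on a shape mismatch.
local function matAddChecked(A, B)
    assert(sameDimensions(A, B), "matrices must have the same dimensions")
    local C = {}
    for i = 1, #A do
        C[i] = {}
        for j = 1, #A[i] do
            C[i][j] = A[i][j] + B[i][j]
        end
    end
    return C
end

local sum = matAddChecked({ {1, 2}, {3, 4} }, { {2, 1}, {4, 3} })
print(sum[1][1], sum[1][2], sum[2][1], sum[2][2])  -- 3  3  7  7
```

The same `sameDimensions` guard works unchanged in front of a subtraction call.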

matMult(A, B)

`matMult(A, B)`

The matMult function performs matrix multiplication between two matrices A and B. The function assumes that the matrices have compatible dimensions (the number of columns in A equals the number of rows in B). It returns a new matrix C, which is the result of the multiplication.

Parameters

Parameter	Type	Description	Default
`A`	table	The first matrix, represented as a 2D table.	Required
`B`	table	The second matrix, represented as a 2D table.	Required

Returns

Variable	Type	Description
`resultMatrix`	table	A new matrix represented as a 2D table, resulting from `A` multiplied by `B`.

Example

local A = {
  {1, 2},
  {3, 4}
}

local B = {
  {2, 0},
  {1, 2}
}

local C = StatBook.matMult(A, B)  -- expected: {{4, 4}, {10, 8}}

Notes

The function does not handle cases where the matrices are not of compatible dimensions for multiplication. Make sure the number of columns in A matches the number of rows in B.
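
The compatibility rule is easiest to see in the classic triple loop. The following plain-Lua sketch uses a hypothetical name, `matMultSketch`, and adds the dimension check that the note above says the module leaves to the caller; it is an illustration, not StatBook's code.

```lua
-- Hypothetical helper: multiplies an (m x n) matrix by an (n x p) matrix.
local function matMultSketch(A, B)
    local rowsA, colsA, colsB = #A, #A[1], #B[1]
    assert(colsA == #B, "columns of A must equal rows of B")
    local C = {}
    for i = 1, rowsA do
        C[i] = {}
        for j = 1, colsB do
            -- Entry (i, j) is the dot product of row i of A and column j of B.
            local sum = 0
            for k = 1, colsA do
                sum = sum + A[i][k] * B[k][j]
            end
            C[i][j] = sum
        end
    end
    return C
end

local product = matMultSketch({ {1, 2}, {3, 4} }, { {2, 0}, {1, 2} })
print(product[1][1], product[1][2], product[2][1], product[2][2])  -- 4  4  10  8
```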

scalarMatMult(scalar, matrix)

`scalarMatMult(scalar, matrix)`

The scalarMatMult function multiplies a given scalar with every element of a provided matrix. The function returns a new matrix containing the results.

Parameters

Parameter	Type	Description	Default
`scalar`	number	The scalar value to multiply with the matrix.	Required
`matrix`	table	The matrix, represented as a 2D table.	Required

Returns

Variable	Type	Description
`resultMatrix`	table	A new matrix represented as a 2D table, resulting from the scalar multiplication of the input matrix.

Example

local scalar = 2
local matrix = {
  {1, 2},
  {3, 4}
}

local result = StatBook.scalarMatMult(scalar, matrix)  -- expected: {{2, 4}, {6, 8}}

matTranspose(matrix)

`matTranspose(matrix)`

The matTranspose function takes a matrix and returns its transpose. The transpose of a matrix is obtained by flipping the matrix over its main diagonal, so that rows become columns.

Parameters

Parameter	Type	Description	Default
`matrix`	table	The matrix to be transposed, represented as a 2D table.	Required

Returns

Variable	Type	Description
`resultMatrix`	table	A new matrix represented as a 2D table, which is the transpose of the input matrix `matrix`.

Example

local matrix = {
  {1, 2},
  {3, 4},
  {5, 6}
}

local result = StatBook.matTranspose(matrix)  -- expected: {{1, 3, 5}, {2, 4, 6}}
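
For a sense of what the transpose does to a non-square matrix, here is a small plain-Lua sketch. The name `matTransposeSketch` is hypothetical and the code is only an illustration of the operation, not the module's implementation.

```lua
-- Hypothetical helper: entry (i, j) of the input becomes entry (j, i).
local function matTransposeSketch(matrix)
    local result = {}
    for j = 1, #matrix[1] do
        result[j] = {}
        for i = 1, #matrix do
            result[j][i] = matrix[i][j]
        end
    end
    return result
end

local transposed = matTransposeSketch({ {1, 2}, {3, 4}, {5, 6} })
print(#transposed, #transposed[1])  -- 2  3 (a 3x2 input becomes 2x3)
```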

matInverse(matrix)

`matInverse(matrix)`

The matInverse function calculates the inverse of a square matrix, if it exists. The function will return nil if the matrix is not square or if the determinant is zero (indicating that the matrix is not invertible).

Parameters

Parameter	Type	Description	Default
`matrix`	table	The square matrix to be inverted, represented as a 2D table.	Required

Returns

Variable	Type	Description
`inverseMatrix`	table	A new matrix represented as a 2D table, which is the inverse of the input matrix.
`nil`	nil	If the matrix is not square or if the matrix is singular (determinant is zero).

Example

local matrix = {
  {2, -1, 0},
  {-1, 2, -1},
  {0, -1, 2}
}

local result = StatBook.matInverse(matrix)
-- expected: approximately {{0.75, 0.5, 0.25}, {0.5, 1, 0.5}, {0.25, 0.5, 0.75}}

Notes

If the matrix is not square or if the matrix is singular (determinant is zero), the result is nil. Ensure your matrix is square and the determinant is not 0.
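
One common way to get this nil-on-failure behavior is Gauss-Jordan elimination with partial pivoting, sketched below in plain Lua. This is an assumption about a reasonable approach, not a description of how StatBook computes the inverse; the name `matInverseSketch` and the 1e-12 singularity threshold are made up for the example.

```lua
-- Hypothetical helper: returns the inverse, or nil if the matrix is
-- non-square or (numerically) singular.
local function matInverseSketch(matrix)
    local n = #matrix
    for i = 1, n do
        if #matrix[i] ~= n then
            return nil
        end
    end
    -- Build the augmented matrix [M | I].
    local aug = {}
    for i = 1, n do
        aug[i] = {}
        for j = 1, n do
            aug[i][j] = matrix[i][j]
            aug[i][n + j] = (i == j) and 1 or 0
        end
    end
    for col = 1, n do
        -- Partial pivoting: pick the row with the largest entry in this column.
        local pivotRow = col
        for r = col + 1, n do
            if math.abs(aug[r][col]) > math.abs(aug[pivotRow][col]) then
                pivotRow = r
            end
        end
        if math.abs(aug[pivotRow][col]) < 1e-12 then
            return nil  -- no usable pivot, so the matrix is singular
        end
        aug[col], aug[pivotRow] = aug[pivotRow], aug[col]
        local pivot = aug[col][col]
        for j = 1, 2 * n do
            aug[col][j] = aug[col][j] / pivot
        end
        -- Clear this column in every other row.
        for r = 1, n do
            if r ~= col then
                local factor = aug[r][col]
                for j = 1, 2 * n do
                    aug[r][j] = aug[r][j] - factor * aug[col][j]
                end
            end
        end
    end
    -- The right half of the augmented matrix now holds the inverse.
    local inverse = {}
    for i = 1, n do
        inverse[i] = {}
        for j = 1, n do
            inverse[i][j] = aug[i][n + j]
        end
    end
    return inverse
end

local inv = matInverseSketch({ {2, -1, 0}, {-1, 2, -1}, {0, -1, 2} })
print(inv[1][1], inv[2][2], inv[3][1])      -- about 0.75, 1, 0.25
print(matInverseSketch({ {1, 2}, {2, 4} }))  -- nil (determinant is zero)
```

Checking the pivot against a small threshold instead of exact zero is a design choice: it treats nearly singular matrices as non-invertible rather than returning huge, unreliable values.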

SpaceDice999 · October 5, 2023, 1:01am

Interesting. Most of the functions here (the inferential and random variates specifically) aren’t particularly useful to the average developer or to game development, but are better suited to something like plugins. I’ve thought of using two-sample t-tests in my plugin at some point, but I couldn’t ensure independence or randomness in the samples.

Are there plans to add stuff like skewness and kurtosis? Or methods to detect outliers?

armenia1997 · October 5, 2023, 6:33am

I have to agree with you that the inferential function will have limited use. These tests pair a categorical (or continuous) independent variable with a continuous dependent variable, a setup that is uncommon in game development. However, if you are indeed comparing two different populations, for instance Premium and non-Premium users (or male vs female), and seeing whether one is more likely to do something than the other, then it will have use.

I find it interesting that you say random variates are not particularly useful, when I was actually thinking that would be the most useful aspect of this module. For instance see this code below:

Problem with math.random

function skewedRandom()
    local randomNumber = math.random()
    
    if randomNumber < 0.2 then
        return 4
    elseif randomNumber < 0.3 then
        return 3 
    elseif randomNumber < 0.4 then
        return 5 
    elseif randomNumber < 0.5 then
        return 2  
    elseif randomNumber < 0.6 then
        return 6  
    elseif randomNumber < 0.7 then
        return 1
    elseif randomNumber < 0.8 then
        return 7
    elseif randomNumber < 0.9 then
        return 8 
    elseif randomNumber < 0.95 then
        return 9  
    else
        return 10 
    end
end

Do you know of a method that reliably generates non-uniform distributions with near-zero discreteness using only math.random?

Sure, I guess your point is that the discreteness is a feature which developers are willing to trade for simplicity (kinda…) when trying to mimic a non-uniform distribution. But if one really cares about their probability distribution being purely continuous, with more detailed and malleable shapes, then my module provides a number of solutions to that problem.
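
For contrast with the lookup table above, here is a minimal sketch of what a continuous draw looks like, using the standard Box-Muller transform in plain Lua. It is only an illustration of the general idea: the name `boxMullerNormal` is hypothetical, and nothing here is taken from how StatBook's generators are actually written.

```lua
-- Hypothetical helper: one draw from Normal(mu, sigma) built from two
-- uniform draws (Box-Muller transform).
local function boxMullerNormal(mu, sigma)
    local u1 = 1 - math.random()  -- in (0, 1], which avoids log(0)
    local u2 = math.random()
    local z = math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2)
    return mu + sigma * z
end

-- Every call can land anywhere on the real line, not on ten fixed values.
for _ = 1, 5 do
    print(boxMullerNormal(5, 2))
end
```

Unlike the if/elseif chain, the shape here is controlled by two parameters rather than by hand-tuned thresholds, which is the kind of flexibility the scaled generators in the module are meant to offer.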

You are right that independent t-tests (or any kind of congruent test w different # of categories like ANOVA) need to ensure independence and randomness. Randomness is a hard one to nail down, and ultimately, there is nothing I can do in my code to steer the user towards collecting a random sample.

However, since you mention t-tests generally (which I presume includes the category of a paired t-test), I don’t see why independence would be a requirement. If you have a dependent test (checking same sample over different treatments), you just need the right type of test (paired t-test, signed rank, Friedman Test).

Absolutely, and thanks for the idea! That will be easy and effective to implement.

onlinefeladatok · October 8, 2023, 3:05pm

wow i dont know what this is and i didnt even read it
so innovational
im definetely gonna use this.
great:grin:

armenia1997 · October 8, 2023, 8:15pm

I will be uploading a simple video tutorial soon. I understand it’s difficult for people to follow along just from text alone.
