buildNum {conjurer} | R Documentation |
Build Numeric Data
buildNum(n, st, en, disp, outliers)
n | A number. This specifies the number of values to be generated.
st | A number. This defines the starting value of the range of the generated data.
en | A number. This defines the ending value of the range of the generated data.
disp | A number between
outliers | A number. This signals whether outliers are generated. If set to 1, outliers are generated randomly. If set to 0, no outliers are generated. Outliers are very common in real data, so setting outliers to 1 is recommended. However, there are cases where outliers are not needed, for example when the data is generated solely for visualization.
This function helps generate numeric data such as age, height, or weight. It can be used along with other functions, such as buildCust, to make the generated data more meaningful. The data distribution function uses the formulation
sin((r*a)*x) + c
where
r is a random value such that 0.8 <= r <= 1.2. This adds +/- 20% randomness to the parameter a.
a is a parameter such that -(pi/2) <= a <= (pi/2).
x is a variable such that -(pi/2) <= x <= (pi/2).
c is a constant such that 2 <= c <= 5.
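The formulation above can be evaluated directly in base R to get a feel for the shape it produces. The following is a minimal sketch, not the conjurer source code: the specific values chosen for a and c, the grid of 50 points, and the lower bound of -(pi/2) for x are all assumptions made for illustration.

```r
# Illustrative sketch only; this does not reproduce buildNum internals.
set.seed(42)                                 # make the random draw reproducible

r  <- runif(1, min = 0.8, max = 1.2)         # random factor, +/- 20% around 1
a  <- pi / 4                                 # hypothetical value within [-(pi/2), pi/2]
c0 <- 3                                      # hypothetical constant within [2, 5]

# Assumed grid for x, taken here as running from -(pi/2) to pi/2
x <- seq(-pi / 2, pi / 2, length.out = 50)

# The formulation sin((r*a)*x) + c
y <- sin((r * a) * x) + c0

plot(x, y, type = "l")                       # inspect the curve visually
```

Because sin() is bounded by -1 and 1, the resulting values always lie between c - 1 and c + 1; changing a alters how steeply the curve rises or falls across the range of x.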
The key argument of this function is disp, which controls the dispersion of the distribution. Suppose one would like to generate the age of people in years, with ages ranging between 23 and 80. If disp = 1, then the function will generate more data with a negative slope, i.e., more people with age closer to 23 than to 80. If disp = 1 is used, then the opposite will be true. However, if one would like to generate data that is visually similar to a normal distribution, i.e., more people in the middle age group and fewer towards 23 or 80, then disp = 0.5 could be used.
It is recommended to first plot the generated data and inspect it visually to check which distribution is needed.
A dataframe
library(conjurer)
age <- buildNum(n = 10, st = 23, en = 80, disp = 0.5, outliers = 1)
plot(age) # visualize the resulting distribution