frequency_matching {KODAMA} | R Documentation |
Frequency Matching
Description
A method to select unbalanced groups in a cohort.
Usage
frequency_matching (data,label,times=5,seed=1234)
Arguments
data |
a data.frame containing the variables to match on. |
label |
a vector giving the group of each row of data. |
times |
The ratio between the two groups. |
seed |
a single number for random number generation. |
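Details
As a rough illustration of the general idea behind frequency matching, the base R sketch below keeps every row of the smaller group and, within each stratum, randomly draws up to times rows of the larger group for each row of the smaller one. This is not the KODAMA implementation: the helper name match_by_stratum, its single stratum argument, and the toy data are hypothetical, and exactly two groups are assumed.

```r
# Hypothetical helper, NOT part of KODAMA: a generic frequency-matching
# sketch on a single stratification variable, assuming exactly two groups.
match_by_stratum <- function(stratum, label, times = 1, seed = 1234) {
  set.seed(seed)
  label <- as.factor(label)
  stopifnot(nlevels(label) == 2)
  counts <- table(label)
  minor <- names(which.min(counts))        # smaller group, kept in full
  major <- setdiff(levels(label), minor)   # larger group, subsampled
  selected <- which(label == minor)
  for (s in unique(stratum[label == minor])) {
    n_minor <- sum(label == minor & stratum == s)
    pool <- which(label == major & stratum == s)
    n_take <- min(length(pool), times * n_minor)
    if (n_take > 0) {
      # Index into pool explicitly: sample(pool, n) misbehaves when
      # pool has length 1.
      selected <- c(selected, pool[sample.int(length(pool), n_take)])
    }
  }
  sort(selected)
}

# Toy data (made up): 5 rows in group A, 15 rows in group B.
stratum <- c(rep("young", 10), rep("old", 10))
label <- c(rep("A", 2), rep("B", 8), rep("A", 3), rep("B", 7))

sel <- match_by_stratum(stratum, label, times = 2)
table(group = label[sel], stratum = stratum[sel])
```

With times = 2, each stratum of the toy data retains two B rows for every A row, so the two groups have the same distribution across strata after selection.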
Value
The function returns a list with 3 items:
data |
the data after the frequency matching. |
label |
the label after the frequency matching. |
selection |
the rows selected for the frequency matching. |
Author(s)
Stefano Cacciatore
References
Cacciatore S, Luchinat C, Tenori L
Knowledge discovery by accuracy maximization.
Proc Natl Acad Sci U S A 2014;111(14):5117-22. doi: 10.1073/pnas.1220873111.
Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA
KODAMA: an updated R package for knowledge discovery and data mining.
Bioinformatics 2017;33(4):621-623. doi: 10.1093/bioinformatics/btw705.
Examples
library(KODAMA)
data(clinical)
hosp=clinical[,"Hospital"]
gender=clinical[,"Gender"]
GS=clinical[,"Gleason score"]
BMI=clinical[,"BMI"]
age=clinical[,"Age"]
A=categorical.test("Gender",gender,hosp)
B=categorical.test("Gleason score",GS,hosp)
C=continuous.test("BMI",BMI,hosp,digits=2)
D=continuous.test("Age",age,hosp,digits=1)
# Analysis without matching
rbind(A,B,C,D)
# The order is important: variables to the right of the vector take
# priority over those to the left.
# So, Gleason score will be more important than Age
var=c("Age","BMI","Gleason score")
t=frequency_matching(clinical[,var],clinical[,"Hospital"],times=1)
newdata=clinical[t$selection,]
hosp.new=newdata[,"Hospital"]
gender.new=newdata[,"Gender"]
GS.new=newdata[,"Gleason score"]
BMI.new=newdata[,"BMI"]
age.new=newdata[,"Age"]
A=categorical.test("Gender",gender.new,hosp.new)
B=categorical.test("Gleason score",GS.new,hosp.new)
C=continuous.test("BMI",BMI.new,hosp.new,digits=2)
D=continuous.test("Age",age.new,hosp.new,digits=1)
# Analysis with matching
rbind(A,B,C,D)