ipOversampling {sambia}    R Documentation

Plain replication of each observation by inverse-probability weights

Description

This method corrects for sample selection bias by replicating each observation in the sample according to its inverse-probability (IP) weight. For example, in a stratified random sample, an observation from stratum h is replicated w_h times.
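For a stratified random sample, the IP weight of stratum h is commonly taken as w_h = N_h / n_h, the population size of the stratum divided by the number of observations sampled from it. The following base-R sketch shows how such weights could be derived; the stratum counts and all variable names are hypothetical and not part of sambia.

```r
# Hypothetical population and sample sizes per stratum (illustrative values only)
N_h <- c(A = 1000, B = 250)   # population size of each stratum
n_h <- c(A = 50,   B = 50)    # number of sampled observations per stratum

# Inverse of the inclusion probability n_h / N_h
w_h <- N_h / n_h

# Map the stratum-level weights to observation-level weights
stratum <- c("A", "B", "B", "A")
weights <- unname(w_h[stratum])
```

A vector built this way has one entry per observation and could be passed as the weights argument.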

Usage

ipOversampling(data, weights, normalize = FALSE)

Arguments

data

A data frame with one observation per row, including the corresponding categorical strata feature(s).

weights

A numeric vector whose length must equal the number of rows of data. The i-th value is the inverse-probability weight of the i-th observation, i.e. it determines how often the i-th row of data is replicated.

normalize

Logical. If TRUE, the weight vector is normalized so that its smallest entry equals 1. Defaults to FALSE.
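This page does not state how the normalization is computed. One plausible reading, assumed here purely for illustration, is that every weight is divided by the smallest weight:

```r
# Assumption: normalization divides all weights by the minimum weight,
# so that the smallest entry becomes 1. This is a guess at the behaviour
# and is not taken from the sambia source code.
weights <- c(5, 20, 5, 10)
normalized <- weights / min(weights)
normalized
```

Under this assumption the ratios between weights are preserved while the total number of replicated rows is kept as small as possible.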

Details

If weights contains non-integer values, they are rounded to integers before the observations are replicated.
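The replication step can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the toy data frame is hypothetical, and round() is assumed to stand in for whatever rounding rule ipOversampling applies.

```r
# Toy data frame with a categorical stratum column (hypothetical data)
df <- data.frame(x = c(0.1, 0.7, 1.3), stratum = c("a", "b", "b"))
w  <- c(1.2, 2.6, 3)   # real-valued weights

# Assumption: round() stands in for the rounding rule used by the package
reps <- round(w)

# Repeat row i exactly reps[i] times
oversampled <- df[rep(seq_len(nrow(df)), times = reps), , drop = FALSE]
rownames(oversampled) <- NULL
nrow(oversampled)
```

The result has sum(reps) rows, so large weights can inflate the data set considerably.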

Author(s)

Norbert Krautenbacher, Kevin Strauss, Maximilian Mandl, Christiane Fuchs

Examples

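library(smotefamily)
library(sambia)
# Simulate 100 observations; the 'result' column holds the labels 'p' and 'n'
data.example <- sample_generator(100, ratio = 0.80)
# Recode the labels to numeric: 'n' -> 0, 'p' -> 1
result <- gsub('n', '0', data.example[, 'result'])
result <- gsub('p', '1', result)
data.example[, 'result'] <- as.numeric(result)
# Weight 1 for observations with result == 1, weight 4 otherwise
weights <- ifelse(data.example[, 'result'] == 1, 1, 4)
# Replicate each row according to its weight
# (named 'oversampled' to avoid masking base::sample)
oversampled <- sambia::ipOversampling(data.example, weights)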

[Package sambia version 0.1.0 Index]