schools {CMatching} R Documentation

## Schools data set (NELS-88)

### Description

Data set used by Kreft and De Leeuw in their book Introducing Multilevel Modeling, Sage (1988) to analyse the relationship between math score and time spent by students to do math homework. The data set is a subsample of NELS-88 data consisting of 10 handpicked schools from the 1003 schools in the full data set. Students are nested within schools and information is available both at the school and student level.

### Usage

`data("schools")`

### Format

A data frame with 260 observations on the following 19 variables.

`schid`

School ID: a numeric vector identyfing each school.

`stuid`

The student ID.

`ses`

Socioeconomic status.

`meanses`

Mean ses for the school.

`homework`

The number of hours spent weekly doing homeworks.

`white`

A dummy for white race (=1) versus non-white (=0).

`parented`

Parents highest education level.

`public`

Public school: 1=public, 0=non public.

`ratio`

Student-teacher ratio.

`percmin`

Percent minority in school.

`math`

Math score

`sex`

Sex: 1=male, 2=female.

`race`

Race of student, 1=asian, 2=Hispanic, 3=Black, 4=White, 5=Native American.

`sctype`

Type of school: 1=public, 2=catholic, 3= Private other religion, 4=Private non-r.

`cstr`

Classroom environment structure: ordinal from 1=not accurate to 5=very much accurate.

`scsize`

School size: ordinal from 1=[1,199) to 7=[1200+).

`urban`

Urbanicity: 1=Urban, 2=Suburban, 3=Rural.

`region`

Geographic region of the school: NE=1,NC=2,South=3,West=4.

`schnum`

Standardized school ID.

### Source

Ita G G Kreft, Jan De Leeuw 1988. Introducing Multilevel Modeling, Sage National Education Longitudinal Study of 1988 (NELS:88): https://nces.ed.gov/surveys/nels88/

### Examples

```data(schools)

# Kreft and De Leeuw, Introducing Multilevel Modeling, Sage (1988).
# The data set is the subsample of NELS-88 data consisting of 10 handpicked schools
# from the 1003 schools in the full data set.

# Suppose that the effect of homeworks on math score is unconfounded conditional on X and
# unobserved school features (we assume this only for illustrative purpouse)

# Let us consider the following variables:

X<-schools\$ses #X<-as.matrix(schools[,c("ses","white","public")])
Y<-schools\$math
Tr<-ifelse(schools\$homework>1,1,0)
Group<-schools\$schid
# Note that when Group is missing, NULL or there is only one Group the function
# returns the output of the Match function with a warning.

# Let us assume that the effect of homeworks (Tr) on math score (Y)
# is unconfounded conditional on X and other unobserved schools features.
# Several strategies to handle unobserved group characteristics
# are described in Arpino & Cannas, 2016 (see References).

# Multivariate Matching on covariates in X
#(default parameters: one-to-one matching on X with replacement with a caliper of 0.25).

### Matching within schools
mw<-MatchW(Y=Y, Tr=Tr, X=X, Group=Group, caliper=0.1)

# compare balance before and after matching
bmw  <- MatchBalance(Tr~X,data=schools,match.out=mw)

# calculate proportion of matched observations
(mw\$orig.treated.nobs-mw\$ndrops)/mw\$orig.treated.nobs

# check number of drops by school
mw\$orig.ndrops.by.group

# examine output
mw                   # complete list of results
summary(mw)  # basic statistics

#### Propensity score matching

# estimate the propensity score (ps) model

mod <- glm(Tr~ses+parented+public+sex+race+urban,
eps <- fitted(mod)

# eg 1: within-school propensity score matching
psmw <- MatchW(Y=schools\$math, Tr=Tr, X=eps, Group=schools\$schid, caliper=0.1)

# We can use other strategies for controlling unobserved cluster covariates
# by using different specifications of ps (see Arpino and Mealli for details):

# eg 2: standard propensity score matching using ps estimated
# from a logit model with dummies for schools

mod <- glm(Tr ~ ses + parented + public + sex + race + urban
eps <- fitted(mod)

dpsm <- MatchW(Y=schools\$math, Tr=Tr, X=eps, caliper=0.1)
# this is equivalent to run Match with X=eps

# eg3: standard propensity score matching using ps estimated from
# multilevel logit model (random intercept at the school level)

require(lme4)
mod<-glmer(Tr ~ ses + parented + public + sex + race + urban + (1|schid),