CollegePlaying {Lahman}R Documentation

CollegePlaying table

Description

Information on schools players attended, by player

Usage

data(CollegePlaying)

Format

A data frame with 17350 observations on the following 3 variables.

playerID

Player ID code

schoolID

school ID code

yearID

Year player attended school

Details

This data set reflects a change in the Lahman schema for the 2015 version. The old SchoolsPlayers table was replaced with this new table called CollegePlaying.

According to the documentation, this change reflects advances in the compilation of this data, largely led by Ted Turocy. The old table reported college attendance for major league players by listing a start date and end date. The new version has a separate record for each year that a player attended. This allows us to better account for players who attended multiple colleges or skipped a season, as well as to identify teammates.

Source

Lahman, S. (2023) Lahman's Baseball Database, 1871-2022, 2022 version, https://www.seanlahman.com/baseball-archive/statistics/

Examples

data(CollegePlaying)
head(CollegePlaying)

## Q: What are the top universities for producing MLB players?
SPcount <- table(CollegePlaying$schoolID)
SPcount[SPcount>50]

library("lattice")
dotplot(SPcount[SPcount>50])
dotplot(sort(SPcount[SPcount>50]))

## Q: How many schools are represented in this dataset?
length(table(CollegePlaying$schoolID))

# Histogram of the number of players from each school who played in MLB:
with(CollegePlaying, 
     hist(table(schoolID), xlab = "Number of players",
                           main = ""))

[Package Lahman version 11.0-0 Index]