yeast4 {imbalance} | R Documentation
Imbalanced binary yeast protein localization sites
Description
Imbalanced binary dataset containing protein traits for predicting their cellular localization sites.
Usage
yeast4
Format
A data frame with 1484 instances, 51 of which belong to the positive class, and 9 variables:
- Mcg
McGeoch's method for signal sequence recognition. Continuous attribute.
- Gvh
Von Heijne's method for signal sequence recognition. Continuous attribute.
- Alm
Score of the ALOM membrane spanning region prediction program. Continuous attribute.
- Mit
Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins. Continuous attribute.
- Erl
Presence of "HDEL" substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary (discrete) attribute.
- Pox
Peroxisomal targeting signal in the C-terminus. Continuous attribute.
- Vac
Score of discriminant analysis of the amino acid content of vacuolar and extracellular proteins. Continuous attribute.
- Nuc
Score of discriminant analysis of nuclear localization signals of nuclear and non-nuclear proteins. Continuous attribute.
- Class
Class label with two possible values: positive (membrane protein, uncleaved signal) and negative (all other localizations).
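Examples
The counts given under Format imply a strongly skewed class distribution. The following is a minimal sketch of how that skew could be quantified. It assumes the "imbalance ratio" is defined as majority size divided by minority size, which is one common convention and not necessarily the one used elsewhere in this package. The variable names are illustrative only, and the final check runs only if the imbalance package is installed.

```r
# Counts taken from the Format section of this page.
n_total    <- 1484
n_positive <- 51
n_negative <- n_total - n_positive

# Assumed convention: imbalance ratio = majority size / minority size.
imbalance_ratio <- n_negative / n_positive

# Fraction of instances in the positive (minority) class.
positive_share <- n_positive / n_total

cat("Negative instances:", n_negative, "\n")
cat("Imbalance ratio (majority/minority):", round(imbalance_ratio, 2), "\n")
cat("Positive share:", round(100 * positive_share, 1), "%\n")

# Optional check against the data itself, only if the package is available.
if (requireNamespace("imbalance", quietly = TRUE)) {
  data("yeast4", package = "imbalance")
  print(table(yeast4$Class))
}
```

With roughly 28 negative instances per positive one, overall accuracy is a poor yardstick for classifiers trained on this dataset. Per-class metrics such as recall on the positive class are usually more informative.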
Source
See Also
The original dataset is available in the UCI Machine Learning Repository.