ecoli1 {imbalance} | R Documentation
Imbalanced binary ecoli protein localization sites
Description
Imbalanced binary dataset containing protein traits for predicting their cellular localization sites.
Usage
ecoli1
Format
A data frame with 336 instances, 77 of which belong to the positive class, and 8 variables:
- Mcg
McGeoch's method for signal sequence recognition. Continuous attribute.
- Gvh
Von Heijne's method for signal sequence recognition. Continuous attribute.
- Lip
Von Heijne's Signal Peptidase II consensus sequence score. Discrete attribute.
- Chg
Presence of charge on N-terminus of predicted lipoproteins. Discrete attribute.
- Aac
Score of discriminant analysis of the amino acid content of outer membrane and periplasmic proteins. Continuous attribute.
- Alm1
Score of the ALOM membrane spanning region prediction program. Continuous attribute.
- Alm2
Score of the ALOM program after excluding putative cleavable signal regions from the sequence. Continuous attribute.
- Class
Two possible classes: positive (type im), negative (the rest).
Source
See Also
The original dataset is available in the UCI Machine Learning Repository.
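Examples
The counts given under Format imply how skewed the class distribution is. The following is a minimal sketch, not part of the original help page: it derives the negative-class count and the negative-to-positive ratio from the stated figures using base R only. The variable names are illustrative. The optional check at the end assumes the imbalance package is installed and that the class column is named Class, as listed under Format.

```r
# Figures taken from the Format section of this page.
n_total    <- 336
n_positive <- 77

# Every instance that is not positive is negative.
n_negative <- n_total - n_positive

# Negative-to-positive ratio: how many negatives per positive.
ratio <- n_negative / n_positive

# Optional check against the packaged data (assumes the imbalance
# package is installed; skipped silently otherwise).
if (requireNamespace("imbalance", quietly = TRUE)) {
  data("ecoli1", package = "imbalance")
  print(dim(ecoli1))           # rows and columns of the data frame
  print(table(ecoli1$Class))   # instances per class
}
```

With these figures there are 259 negative instances, roughly 3.4 negatives for every positive.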