screen_time {survivoR} | R Documentation
Screen Time
Description
A dataset summarising the screen time of contestants on the TV show Survivor. Currently contains only Seasons 1-4 and 42.
Usage
screen_time
Format
This data frame contains the following columns:
version_season
Version season key
episode
Episode number
castaway_id
ID of the castaway (primary key). Also includes two special IDs: host (i.e. Jeff Probst) and unknown (the image detection could not identify the face with sufficient accuracy).
screen_time
Estimated screen time for the individual in seconds.
Details
Individuals' screen time is calculated, at a high level, via the following process:
Frames are sampled from each episode at 1-second intervals
MTCNN detects the human faces within each frame
VGGFace2 converts each detected face into a 512-dimensional embedding vector
A training set of labelled images (1 for each contestant + 3 for Jeff Probst) is processed in the same way to determine where they sit in the vector space. TODO: This could be made more accurate by increasing the number of training images per contestant.
The Euclidean distance is calculated from each face detected in the frame to each of the contestants in the season (plus Jeff Probst). If the minimum distance is greater than 1.2, the face is labelled as "unknown". TODO: Review how robust this distance cutoff truly is - currently based on manual review of Season 42.
A multi-class SVM is trained on the training set to label faces. For any face not labelled as "unknown", the embedding vector is passed to this model and a label is generated.
All labelled faces are aggregated together, with an assumption of 1 full second of screen time each time a face is seen.
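The distance cutoff and the aggregation step described above can be illustrated with a minimal sketch. This is not the package's actual pipeline: it uses toy 2-dimensional vectors rather than 512-dimensional VGGFace2 embeddings, assumes a single reference embedding per ID, and swaps the multi-class SVM for a simple nearest-reference rule. The names `label_face`, `screen_time_seconds`, `references` and `frames` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist

UNKNOWN_CUTOFF = 1.2  # distance cutoff quoted in the process description


def label_face(embedding, references, cutoff=UNKNOWN_CUTOFF):
    """Return the ID of the nearest reference, or "unknown" if even the
    nearest reference is further away than the cutoff.

    Assumption: nearest-reference labelling stands in for the SVM step.
    """
    nearest_id, nearest_distance = min(
        ((ref_id, dist(embedding, ref)) for ref_id, ref in references.items()),
        key=lambda pair: pair[1],
    )
    return "unknown" if nearest_distance > cutoff else nearest_id


def screen_time_seconds(frames, references):
    """Count 1 second per labelled face, with frames sampled 1 second apart."""
    totals = Counter()
    for faces in frames:
        for embedding in faces:
            totals[label_face(embedding, references)] += 1
    return dict(totals)


# Toy data (hypothetical): one reference embedding per ID.
references = {"host": (0.0, 0.0), "castaway_a": (3.0, 0.0)}

# Each inner list holds the face embeddings detected in one sampled frame.
frames = [
    [(0.1, 0.0), (2.9, 0.1)],  # host and castaway_a on screen together
    [(0.0, 0.2)],              # host only
    [(10.0, 10.0)],            # a face far from every reference
]

totals = screen_time_seconds(frames, references)
print(totals)
```

With this toy input the host accumulates 2 seconds, while castaway_a and the unknown face accumulate 1 second each. A face that appears in a frame counts for a full second regardless of how long it was actually visible, which is why the values are estimates.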