R: Subset the top percent of a dataframe by a specific column

top.percent.by {fishRman}

R Documentation

Subset the top percent of a dataframe by a specific column

Description

Sorts a dataframe in descending order by a specific column, sums that column over all rows, multiplies the sum by the chosen percentage, and returns the minimum number of consecutive rows, counted from the top of the sorted dataframe, whose cumulative sum reaches this value.
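
The steps in this description can be sketched in base R. This is only an illustrative re-implementation under stated assumptions, not the fishRman source: the name `top_percent_sketch` and the `toy` data are hypothetical, `percentage` is assumed to be on a 0-100 scale, "reach" is assumed to mean greater than or equal to, and missing values and empty dataframes are not handled.

```r
# Hypothetical sketch for illustration only; not the fishRman implementation.
top_percent_sketch <- function(df, percentage, by) {
  # Sort rows in descending order of the chosen column.
  sorted <- df[order(df[[by]], decreasing = TRUE), , drop = FALSE]
  # Target value: the chosen percentage of the column total
  # (assumes percentage is given on a 0-100 scale).
  target <- sum(sorted[[by]]) * percentage / 100
  # Running total down the sorted rows.
  running <- cumsum(sorted[[by]])
  # First row at which the running total reaches the target
  # (assumes "reach" means >=).
  n_rows <- which(running >= target)[1]
  sorted[seq_len(n_rows), , drop = FALSE]
}

# Made-up data, used only to exercise the sketch.
toy <- data.frame(
  mmsi = c("A", "B", "C", "D"),
  fishing_hours = c(1, 9, 6, 4)
)
top_two <- top_percent_sketch(toy, 70, "fishing_hours")
print(top_two)
```

On the toy data the column total is 20, so the 70 percent target is 14. The sorted values are 9, 6, 4, 1 with running totals 9, 15, 19, 20, so the sketch keeps the first two rows (vessels "B" and "C").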

Usage

top.percent.by(df, percentage, by)

Arguments

`df`	A dataframe object as downloaded from Global Fishing Watch (GFW) data on Google BigQuery.
`percentage`	Number. The 'x' in 'the top x percent of the dataframe' (e.g. 90 for the top 90 percent).
`by`	Character. The name of the column for which the percentage will be calculated.

Value

A dataframe.

Examples


library(fishRman)

# Toy data standing in for a GFW download
dated <- c("2020-01-01", "2020-01-02")
lat <- c(40, 41)
lon <- c(12, 13)
mmsi <- c("34534555", "25634555")
hours <- c(0, 5)
fishing_hours <- c(1, 9)

df <- data.frame(dated, lat, lon, mmsi, hours, fishing_hours)

# Keep the rows that make up the top 90 percent of fishing_hours
who.fishes.the.most <- top.percent.by(df, 90, "fishing_hours")

print(who.fishes.the.most)

[Package fishRman version 1.2.3 Index]