| extract_salary {priceR} | R Documentation | 
Extract numeric salary from text data
Description
Extract numeric salary from text data. 'extract_salary' automatically converts hourly, daily, and weekly rates to amounts per annum.
Usage
extract_salary(salary_text, exclude_below, exclude_above, salary_range_handling,
include_periodicity, hours_per_workday, days_per_workweek, working_weeks_per_year)
Arguments
| salary_text | A character string, or vector of character strings. | 
| exclude_below | A lower bound. Anything lower than this number will be replaced with NA. | 
| exclude_above | An upper bound. Anything above this number will be replaced with NA. | 
| salary_range_handling | A method of handling salary ranges. Defaults to returning an average of the range; can also be set to "max" or "min". | 
| include_periodicity | Set to TRUE to return an additional column stating the periodicity detected in the character string. Periodicity is assumed to be 'Annual' unless evidence is found to the contrary. | 
| hours_per_workday | Set assumed number of hours in the workday. Only affects annualisation of rates identified as Hourly. Default is 8 hours. | 
| days_per_workweek | Set assumed number of days per workweek. Only affects annualisation of rates identified as Hourly or Daily. Default is 5 days. | 
| working_weeks_per_year | Set assumed number of working weeks in the year. Only affects annualisation of rates identified as Hourly, Daily, or Weekly. Default is 50 weeks. | 
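The three annualisation arguments appear to combine multiplicatively. The following is a minimal base R sketch of that arithmetic, not priceR's implementation: 'annualise' is a hypothetical helper, the Hourly and Daily rules are inferred from the outputs in the Examples, and the Weekly rule is an assumption. All three values are passed explicitly here, because the default-settings output in the Examples (48000 for "$200 daily") corresponds to 240 working days, i.e. 48 weeks of 5 days.

```r
# Hypothetical helper (not part of priceR): annualise a rate by its periodicity.
annualise <- function(rate, periodicity,
                      hours_per_workday, days_per_workweek, working_weeks_per_year) {
  multiplier <- switch(periodicity,
    Hourly = hours_per_workday * days_per_workweek * working_weeks_per_year,
    Daily  = days_per_workweek * working_weeks_per_year,
    Weekly = working_weeks_per_year,  # assumed rule; not shown in the Examples
    Annual = 1,
    stop("Unknown periodicity: ", periodicity)
  )
  rate * multiplier
}

# Midpoint of "$400 - $600 per day" with 4 days per week and 46 weeks per year
annualise(500, "Daily", 7, 4, 46)
# 92000

# Midpoint of "$80 - $100+ per hour" with 7 hours, 4 days, and 46 weeks
annualise(90, "Hourly", 7, 4, 46)
# 115920
```

Both results match the corresponding elements of the customised example in the Examples section.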
Value
A data.frame of 1 column, or 2 columns if include_periodicity is set to TRUE
Examples
# Provide a salary string and 'extract_salary' will extract the salary and return it
extract_salary("$160,000 per annum")
# 160000
# If a range is present, the average will be taken by default
extract_salary("$160,000 - $180000.00 per annum")
# 170000
# Take the 'min' or 'max' of a salary range by setting salary_range_handling parameter accordingly
extract_salary("$160,000 - $180000.00 per annum", salary_range_handling = "min")
# 160000
# Extract salaries from character string(s)
annual_salaries <- c("$160,000 - $180000.00 per annum",
                     "$160000.00 - $180000.00 per annum",
                     "$145000 - $155000.00 per annum",
                     "$70000.00 - $90000 per annum",
                     "$70000.00 - $90000.00 per annum plus 15.4% super",
                     "$80000.00 per annum plus 15.4% super",
                     "60,000 - 80,000",
                     "$78,686 to $89,463 pa, plus 15.4% superannuation",
                     "80k - 100k")
extract_salary(annual_salaries)
# 170000 170000 150000  80000  53338  40008  70000  56055  90000
# Note the fifth, sixth, and eighth elements are averages that include the '15' from '15.4%' (undesirable)
# Using the exclude_below parameter avoids this (see below)
# Automatically detect, extract, and annualise daily rates
daily_rates <- c("$200 daily", "$400 - $600 per day", "Day rate negotiable dependent on experience")
extract_salary(daily_rates)
# 48000 120000     NA
# Automatically detect, extract, and annualise hourly rates
hourly_rates <- c("$80 - $100+ per hour", "APS6/EL1 hourly rate contract")
extract_salary(hourly_rates)
# 172800   6720
# Note 6720 is undesirable. Setting exclude_below and exclude_above sensibly avoids this
salaries <- c(annual_salaries, daily_rates, hourly_rates)
# Setting lower and upper bounds provides a catch-all to remove unrealistic results
# Out of bounds values will be converted to NA
extract_salary(salaries, exclude_below = 20000, exclude_above = 600000)
# 170000 170000 150000  80000  80000  80000  70000  84074  90000  48000 120000     NA 172800     NA
# extract_salary automatically annualises hourly and daily rates
# It does so by making assumptions about the number of hours per workday,
# days per workweek, and working weeks per year
# Each assumption can be changed from its default:
# hours per workday (8), days per workweek (5), working weeks per year (50)
# E.g.
extract_salary(salaries, hours_per_workday = 7, days_per_workweek = 4,
               working_weeks_per_year = 46, exclude_below = 20000)
# 170000 170000 150000  80000  80000  80000  70000  84074  90000  36800  92000     NA 115920     NA
# To see which salaries were detected as hourly or daily, set include_periodicity to TRUE
extract_salary(salaries, include_periodicity = TRUE, exclude_below = 20000)
#    salary periodicity
# 1  170000      Annual
# 2  170000      Annual
# 3  150000      Annual
# 4   80000      Annual
# 5   80000      Annual
# 6   80000      Annual
# 7   70000      Annual
# 8   84074      Annual
# 9   90000      Annual
# 10  48000       Daily
# 11 120000       Daily
# 12     NA       Daily
# 13 172800      Hourly
# 14     NA      Hourly
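The interaction between range handling and the exclusion bounds can be illustrated with a short base R sketch. This is not priceR's implementation: 'summarise_range' is a hypothetical helper, the numbers are supplied by hand rather than parsed from text, and the order of operations (drop out-of-bounds numbers first, then aggregate) is inferred from the outputs above.

```r
# Hypothetical helper (not part of priceR): collapse the numbers found in one
# salary string into a single value, after dropping out-of-bounds numbers.
summarise_range <- function(values, handling = "average",
                            exclude_below = -Inf, exclude_above = Inf) {
  kept <- values[values >= exclude_below & values <= exclude_above]
  if (length(kept) == 0) return(NA_real_)  # nothing plausible left
  switch(handling,
    average = mean(kept),
    min     = min(kept),
    max     = max(kept),
    stop("Unknown handling: ", handling)
  )
}

# Numbers in "$70000.00 - $90000.00 per annum plus 15.4% super",
# with the stray 15 picked up from the superannuation percentage
found <- c(70000, 90000, 15)

round(summarise_range(found))
# 53338

summarise_range(found, exclude_below = 20000)
# 80000

summarise_range(c(160000, 180000), handling = "min")
# 160000
```

The first two results correspond to the fifth element of the outputs shown without and with exclude_below = 20000.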