get_job_suggestions {occupationMeasurement} | R Documentation |
Make coding suggestions based on a user's open-ended text input.
Description
Given a text input, find up to num_suggestions possible occupation categories.
Usage
get_job_suggestions(
text,
suggestion_type = "auxco-1.2.x",
num_suggestions = 5,
suggestion_type_options = list(),
aggregate_score_threshold = 0.02,
item_score_threshold = 0,
distinctions = TRUE,
steps = list(simbased_wordwise = list(algorithm = algo_similarity_based_reasoning,
parameters = list(sim_name = "wordwise")), simbased_substring = list(algorithm =
algo_similarity_based_reasoning, parameters = list(sim_name = "substring"))),
include_general_id = FALSE
)
Arguments
text |
The raw text input from the user. |
suggestion_type |
Which type of suggestion to use / provide. Possible options are "auxco-1.2.x" and "kldb-2010". |
num_suggestions |
The maximum number of suggestions to show. This is an upper bound, and fewer suggestions may be returned. Defaults to 5. |
suggestion_type_options |
A list with options for generating
suggestions. Supported options:
- |
aggregate_score_threshold |
A single value or named list of thresholds
between 0 and 1. If it is a list, each entry should correspond to one of
the |
item_score_threshold |
A threshold between 0 and 1 (usually very small, default 0). A result from any step is only returned if its score is greater than the specified threshold. This allows highly implausible suggestions to be removed. |
distinctions |
Whether to add further, similar occupational categories (distinctions) to the suggestions; see Details. Defaults to TRUE. |
steps |
A list of the algorithms to use and their parameters. Each entry of the list should be a nested list with two entries: algorithm (the algorithm's function itself) and parameters (the parameters to pass on to the algorithm). Each algorithm will also always have access to a default set of three parameters:
list(
  # try similarity "one word at most 1 letter different" first
  list(
    algorithm = algo_similarity_based_reasoning,
    parameters = list(
      sim_name = "wordwise",
      min_aggregate_prob = 0.535
    )
  ),
  # since everything else failed, try "substring" similarity
  list(
    algorithm = algo_similarity_based_reasoning,
    parameters = list(
      sim_name = "substring",
      min_aggregate_prob = 0.02
    )
  )
) |
include_general_id |
Whether a general column called "id" should always be returned. This column automatically contains the appropriate id for the chosen suggestion_type, e.g. for "auxco-1.2.x" it contains the same data as the column "auxco_id". |
Details
The procedure implemented here is, roughly speaking, as follows:
1. Predict categories from KldB 2010, including their scores. The first algorithm mentioned in steps is used (default: algo_similarity_based_reasoning()).
2. Convert the predicted KldB 2010 categories to suggestion_type (default: auxco-1.2.x, an n:m mapping; scores are mapped accordingly). See the internal function convert_suggestions() for details.
3. Remove predicted categories if their score is below item_score_threshold and keep only the num_suggestions top-ranked suggestions.
4. Start anew, trying the next algorithm in steps, if the top-ranked suggestions have a low chance of being correct. (Technically, this happens if the summed score of the num_suggestions top-ranked suggestions is below aggregate_score_threshold.)
5. If suggestion_type == "auxco-1.2.x" and distinctions == TRUE, insert additional and (highly) similar categories or replace existing ones. See the internal function add_distinctions_auxco(). Reorder and keep only the num_suggestions top-ranked suggestions. Auxco categories added during this step can be identified by their scores: the score equals 0.05 for categories with high similarity and 0.005 for categories with medium similarity.
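The thresholding and fallback logic described above can be illustrated with a minimal, self-contained sketch. This is not the package's implementation: toy_wordwise(), toy_substring() and suggest_with_fallback() are hypothetical stand-ins, the column names id and score are assumptions, and the conversion to suggestion_type and the distinctions step are left out. The sketch keeps items whose score is at least item_score_threshold, which is one reading of "remove if below".

```r
# Hypothetical toy algorithms (not part of the package): each returns a
# data.frame with an assumed `id` and `score` column.
toy_wordwise <- function(text, ...) {
  data.frame(id = character(0), score = numeric(0))
}
toy_substring <- function(text, ...) {
  data.frame(id = c("A", "B", "C"), score = c(0.40, 0.25, 0.001))
}

# Sketch of the fallback loop: try each step in order and return the
# first result whose summed score reaches the aggregate threshold.
suggest_with_fallback <- function(text, steps, num_suggestions = 5,
                                  item_score_threshold = 0,
                                  aggregate_score_threshold = 0.02) {
  for (step in steps) {
    # Call the step's algorithm with the text and its own parameters.
    result <- do.call(step$algorithm, c(list(text = text), step$parameters))
    # Drop items scoring below the item threshold (assumed: keep ties).
    result <- result[result$score >= item_score_threshold, , drop = FALSE]
    # Rank by score and keep only the top-ranked suggestions.
    result <- result[order(result$score, decreasing = TRUE), , drop = FALSE]
    result <- head(result, num_suggestions)
    # Accept the step only if the summed score is high enough.
    if (nrow(result) > 0 && sum(result$score) >= aggregate_score_threshold) {
      return(result)
    }
  }
  NULL  # no step produced sufficiently plausible suggestions
}

steps <- list(
  wordwise = list(algorithm = toy_wordwise,
                  parameters = list(sim_name = "wordwise")),
  substring = list(algorithm = toy_substring,
                   parameters = list(sim_name = "substring"))
)
suggestions <- suggest_with_fallback("Koch", steps, num_suggestions = 2)
```

Here the first toy step yields nothing, so the loop falls through to the second step and keeps its two highest-scoring rows.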
Value
A data.table with suggestions or NULL if no suggestions were found.
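Because the return value may be NULL, calling code usually needs to check for it before accessing rows. A minimal sketch, assuming only that the result is either NULL or a table-like object; describe_suggestions() is a hypothetical helper, and a plain data.frame stands in for the returned data.table:

```r
# Hypothetical helper: summarise a result that may be NULL.
describe_suggestions <- function(suggestions) {
  if (is.null(suggestions)) {
    return("no suggestions found")
  }
  sprintf("%d suggestion(s) found", nrow(suggestions))
}

# Stand-ins for the two possible kinds of return value.
msg_none <- describe_suggestions(NULL)
msg_some <- describe_suggestions(data.frame(id = c("A", "B")))
```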
Examples
## Not run:
if (interactive()) {
get_job_suggestions("Koch")
}
if (interactive()) {
get_job_suggestions("Schlosser")
}
## End(Not run)