csu_merge_cases_pop {Rcan}    R Documentation

csu_merge_cases_pop

Description

csu_merge_cases_pop merges registry data with population data, grouped by year and by other user-defined variables (sex, registry, etc.).

Usage

csu_merge_cases_pop(df_cases, 
	df_pop,
	var_age,
	var_cases="cases",
	var_py=NULL,
	group_by=NULL) 
	

Arguments

df_cases

Registry data grouped by 5-year age group (must be an R data.frame; see Examples for importing a CSV file).

df_pop

Population data grouped by 5-year age group (must be an R data.frame; see Examples for importing a CSV file).

var_age

Age variable. Several formats are accepted (one per column below):

1     0-4     0
2     5-9     5
3     10-14   10
...   ...     ...
17    80-84   80
18    85+     85

This variable must have the same column name in both datasets (df_cases and df_pop).
Ages >= 85 in the df_pop dataset are aggregated as 85+.
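
As an illustration only, the relationship between the three notations and the 85+ aggregation can be sketched in base R. The helper names age_lower_bound and age_index and the toy person-year values are hypothetical; this is not the package's internal code.

```r
# Hypothetical helpers (not part of Rcan).
# Extract the lower bound from a label such as "0-4", "10-14" or "85+".
age_lower_bound <- function(label) {
  as.integer(sub("^([0-9]+).*$", "\\1", label))
}

# Convert a lower bound to a 1..18 group index;
# lower bounds >= 85 collapse into group 18 (85+).
age_index <- function(lower) {
  pmin(lower %/% 5L + 1L, 18L)
}

labels <- c("0-4", "5-9", "10-14", "80-84", "85+", "90-94")
lower  <- age_lower_bound(labels)
index  <- age_index(lower)

# Toy population (made-up values): sum person-years within each group index,
# so that "85+" and "90-94" end up in the same 85+ group.
pop_demo <- data.frame(age = index, py = c(100, 90, 80, 20, 10, 5))
agg <- aggregate(py ~ age, data = pop_demo, FUN = sum)
```

In this sketch the last two rows share index 18, so their person-years are added together.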

var_cases

Cases variable in the df_cases dataset.

var_py

(Optional) If the population data is in long format (see Details), the name of the population variable in the df_pop dataset.
If the population data is in wide format (see Details), var_py must be NULL.

group_by

(Optional) A vector of variables that define the different populations (sex, country, etc.).
Each variable must have the same column name in both datasets (df_cases and df_pop).
Do not include the "year" variable, since it is detected automatically (see Details).
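
Conceptually, the merge matches rows on the age variable, the year, and the group_by variables. The following base-R sketch shows that idea on made-up data; the data frames and column names are assumptions for illustration, and it does not reproduce the checks or aggregation that csu_merge_cases_pop performs.

```r
# Toy registry data (made-up counts), already grouped by sex, age group and year.
cases_demo <- data.frame(
  sex   = c(1, 1, 2, 2),
  age   = c(1, 2, 1, 2),
  year  = 2005,
  cases = c(3, 5, 2, 4)
)

# Toy population data in long format (made-up values).
pop_demo <- data.frame(
  sex  = c(1, 1, 2, 2),
  age  = c(1, 2, 1, 2),
  year = 2005,
  pop  = c(1000, 1200, 1100, 1300)
)

# Merge keys: age variable + year + the group_by variables.
keys <- c("age", "year", "sex")
merged <- merge(cases_demo, pop_demo, by = keys)
```

Each row of merged then carries both the case count and the population for one combination of age group, year and sex.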

Details

This function merges registry data and population data for further analysis.
Both datasets must be grouped by 5-year age group.
If present, year information in "yyyy" format is detected automatically.
Two formats are accepted for population data.
Long format (year and population are two separate variables):

sex   age   pop      year
1     1     116128   2005
1     2     130995   2005
1     3     137556   2005
...   ...   ...      ...
2     16    27171    2007
2     17    13585    2007
2     18    13585    2007

Wide format (one column per year and no population variable; each year column name must contain the year in "yyyy" format):

sex   age     Y2013    Y2014    Y2015
1     0-4     215607   237346   247166
1     5-9     160498   152190   152113
1     10-14   175676   171794   165406
...   ...     ...      ...      ...
2     75-79   20625    20868    23434
2     80-84   7187     7276     7620
2     85+     2551     2597     2617
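
To make the relationship between the two layouts concrete, here is a minimal base-R sketch that reshapes the first rows of the wide-format table above into long format. The regular expression used to find year columns is an assumption about how a "yyyy" pattern might be detected; it is not taken from the package source.

```r
# First three rows of the wide-format example above.
pop_wide <- data.frame(
  sex   = c(1, 1, 1),
  age   = c("0-4", "5-9", "10-14"),
  Y2013 = c(215607, 160498, 175676),
  Y2014 = c(237346, 152190, 171794),
  Y2015 = c(247166, 152113, 165406)
)

# Columns whose name contains four consecutive digits (assumed detection rule).
year_cols <- grep("[0-9]{4}", names(pop_wide), value = TRUE)
id_cols   <- setdiff(names(pop_wide), year_cols)

# Stack one block of rows per year column, keeping the identifier columns.
pop_long <- do.call(rbind, lapply(year_cols, function(col) {
  data.frame(
    pop_wide[id_cols],
    pop  = pop_wide[[col]],
    year = as.integer(regmatches(col, regexpr("[0-9]{4}", col)))
  )
}))
```

The result has one row per sex, age group and year, with the population in a single pop column, which is the layout expected when var_py is supplied.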

Value

Returns a data.frame.

Author(s)

Mathieu Laversanne

See Also

csu_group_cases, csu_asr, csu_cumrisk, csu_eapc, csu_ageSpecific, csu_ageSpecific_top, csu_bar_top, csu_time_trend, csu_trendCohortPeriod

Examples


# You can import your data from a CSV file using read.csv:
# mydata <-  read.csv("mydata.csv", sep=",")

data(ICD_group_GLOBOCAN)
data(data_individual_file)
data(data_population_file)

# Group the individual data by:
#   - 5-year age group
#   - ICD grouping from the data frame ICD_group_GLOBOCAN
#   - year (extracted from the date of incidence)

df_data_year <- csu_group_cases(data_individual_file,
  var_age = "age",
  group_by = c("sex", "regcode", "reglabel"),
  df_ICD = ICD_group_GLOBOCAN,
  var_ICD = "site",
  var_year = "doi")

# Merge the data grouped by 5-year age group with the population, by year (detected automatically) and sex

df_data <- csu_merge_cases_pop(
	df_data_year, 
	data_population_file, 
	var_age = "age_group",
	var_cases = "cases",
	var_py = "pop",
	group_by = c("sex"))


# You can export your result as a CSV file using write.csv:
# write.csv(df_data, file = "result.csv")
				  		  

[Package Rcan version 1.3.82 Index]