mkseas {seas} | R Documentation |
Make a date into a seasonal factor
Description
Discretizes a date within a year into a bin (or factor) for analysis,
such as 11-day groups or by month.
Usage
mkseas(x, width = 11, start.day = 1, calendar, year)
Arguments
x |
A It can also be an integer specifying the Julian day (specify
If it is omitted, the full number of days will be calculated for the
year, determined by either |
width |
either numeric or character; see Details
start.day |
this is the start of the season, specified as either a
as a |
calendar |
used to determine the number of days per year and per
bin; if not specified, a proleptic Gregorian calendar is assumed;
see year.length
year |
required if |
Details
This date function groups the days of a year into discrete bins (or
into a factor). Statistical and plotting functions can then be applied
to a variable within each bin. An example would be finding monthly
temperature averages, where month is the bin.
If width is integer, the width of each bin (except for the last) will
be exactly width days. Since the number of days in a year is not
constant, and is not always evenly divisible by width, the number of
days in the last bin will vary. mkseas requires that the last bin have
at least 20% of the number of observations for a leap year; otherwise
it is merged into the second-to-last bin (which will then have extra
days). If width is numeric (e.g. 366/12), the width of each bin varies
slightly. Using width = 366/12 is slightly different from
width = "mon". Leap years only affect the last bin.
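The integer-width rule above can be sketched in base R. This is only an illustration, not the code used by mkseas: the helper name bin_days is hypothetical, and it assumes that the 20% rule means the leftover days of a 366-day year must amount to at least 20% of width.

```r
# Hypothetical helper (not part of the seas package): returns the number
# of days in each bin for a given integer width.
# Assumption: a short final bin is kept only if it holds at least
# min_frac * width days; otherwise it is merged into the previous bin.
bin_days <- function(width, year_length = 366, min_frac = 0.2) {
  n_full <- year_length %/% width   # number of complete bins
  rem <- year_length %% width       # leftover days at the end of the year
  days <- rep(width, n_full)
  if (rem > 0) {
    if (rem / width >= min_frac) {
      days <- c(days, rem)                  # keep a short final bin
    } else {
      days[n_full] <- days[n_full] + rem    # merge into the last full bin
    }
  }
  days
}

d11 <- bin_days(11)  # 33 bins of 11 days plus a final bin of 3 days
d73 <- bin_days(73)  # 1 leftover day is merged: 73, 73, 73, 73, 74
```

Under this assumption an 11-day width leaves 3 days over, which is enough for a separate final bin, while a 73-day width leaves a single day that is absorbed by the preceding bin.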
Other common classifications based on the Gregorian calendar can be
used if width is given as a character array. All of these systems are
arbitrary: they have different numbers of days in each bin, and leap
years affect the number of days in February. The most common, of
course, is by month ("mon"). Meteorological quarterly seasons ("DJF")
are based on grouping three months, starting with December. This style
of grouping is commonly used in climate literature, and is preferred
over the season names ‘winter’, ‘spring’, ‘summer’, and ‘autumn’,
which apply to only one hemisphere. The less common annual quarterly
divisions ("JFM") are similar, except that grouping begins with
January. Zodiac divisions ("zod") are included for demonstrative
purposes, and are based on the Tropical birth dates (common in
Western-culture horoscopes) starting with Aries (March 21).
Here is the complete list of options for the width argument:
- numeric: the width of each bin (or group) in days
- 366/n: divide the year into n sections
- "mon": month intervals (abbreviated month names)
- "month": month intervals (full month names)
- "DJF": meteorological quarterly divisions: DJF, MAM, JJA, SON
- "JFM": annual quarterly divisions: JFM, AMJ, JAS, OND
- "JF": annual six divisions: JF, MA, AJ, JA, SO, ND
- "zod": zodiac intervals (abbreviated symbol names)
- "zodiac": zodiac intervals (full zodiac names)
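To make the month-based groupings concrete, the following base R sketch maps calendar months to the "DJF" and "JFM" divisions listed above. It is an illustration only and does not call mkseas; the variable names are chosen for this example, and the modular arithmetic is one possible way to express the grouping rather than the package's own method.

```r
# Five sample dates, one per season plus a December date
dates <- as.Date(c("2005-01-15", "2005-04-15", "2005-07-15",
                   "2005-10-15", "2005-12-15"))
m <- as.integer(format(dates, "%m"))  # month number, 1 to 12

# Meteorological quarters: December (12 %% 12 == 0) wraps around to
# join January and February in the first group
djf_levels <- c("DJF", "MAM", "JJA", "SON")
djf <- factor(djf_levels[(m %% 12) %/% 3 + 1], levels = djf_levels)

# Annual quarters: grouping simply begins with January
jfm_levels <- c("JFM", "AMJ", "JAS", "OND")
jfm <- factor(jfm_levels[(m - 1) %/% 3 + 1], levels = jfm_levels)

data.frame(date = dates, DJF = djf, JFM = jfm)
```

The only difference between the two schemes is the one-month shift: the December date falls in "DJF" under the meteorological grouping but in "OND" under the annual grouping.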
If a non-Gregorian calendar is used (see year.length), the number of
days in a year can be set using the calendar attribute of the date
column (using attr). For example,
attr(x$date,"calendar") <- "365_day" will set the dates to use a
calendar with 365 days per year, where February always has 28 days.
If this attribute is not set, a normal Gregorian calendar is assumed.
Calendars with 360 days per year (or 30 days per month) are
incorrectly handled, since February cannot have 30 days; however,
this can be forced by including a duplicate February date in x for
each year.
Value
Returns an array of factors for each date given in x. The factor also
has four attributes: width, start.day, calendar (assumed to be 366,
unless taken from the attribute set in Date), and an array days
showing the maximum number of days in each bin.
See examples for its application.
Locale warning
Month names generated using "mon" or "month" are locale specific, and
depend on your operating system and system language settings.
Normally, abbreviated month names should have three characters or
fewer, with no trailing period. However, Microsoft-based operating
systems have an inconsistent set of abbreviated month names between
locales. For example, abbreviated month names in English locales have
three letters with no period at the end, while French locales have
3–4 letters with a period at the end. If your OS is POSIX, you should
have consistent month names in any locale. The inconsistency can be
fixed by setting options(seas.month.len = 3), which forces abbreviated
month names to be three characters long.
To avoid any issues supporting locales, or to use English month names,
simply revert to a C locale: Sys.setlocale(locale = "C").
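The locale dependence described above can be seen with base R alone. The sketch below switches only the LC_TIME category to the C locale, formats an abbreviated month name, and then restores the previous setting. Using LC_TIME instead of the whole locale is a narrower choice made for this example, and the variable names are illustrative.

```r
# Remember the current time locale so it can be restored afterwards
old_time_locale <- Sys.getlocale("LC_TIME")

# In the C locale, abbreviated month names are English, three letters
# long, and have no trailing period
invisible(Sys.setlocale("LC_TIME", "C"))
abbr_c <- format(as.Date("2005-03-01"), "%b")

# Restore the previous time locale
invisible(Sys.setlocale("LC_TIME", old_time_locale))

abbr_c
```

The built-in constant month.abb is another locale-independent source of English three-letter month names.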
Note
The phase of the Gregorian solar year (begins Julian day 1, or January
1st) is not in sync with the phase of "DJF" (begins Julian day
335/336) or "zod" (begins Julian day 80/81). If either of these
systems is to be used, ensure that there are several years of data, or
that the phase of the data is the same as the beginning Julian day.
For instance, if one year's worth of data beginning on Julian day 1 is
factored into "DJF" bins, the first bin will mix data from the first
two months of the year with data from the last month. The last three
bins will each have a continuous set of data. If the values are not
perfectly periodic, the first bin will have higher variance, due to
the mixing of data separated by nearly a year.
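The phase mismatch can be checked numerically with a small base R sketch. It assigns one calendar year of daily dates (2005, starting on January 1st) to meteorological seasons by month, then compares how many days each bin holds with the calendar span those days cover. The month-based assignment is a stand-in for mkseas, used here only for illustration.

```r
# One year of daily dates, beginning on Julian day 1
dates <- seq(as.Date("2005-01-01"), as.Date("2005-12-31"), by = "day")
m <- as.integer(format(dates, "%m"))
season <- c("DJF", "MAM", "JJA", "SON")[(m %% 12) %/% 3 + 1]

# Number of days that fall in each bin
n_days <- table(season)

# Calendar span (first to last date, inclusive) covered by each bin
span <- tapply(as.numeric(dates), season,
               function(d) diff(range(d)) + 1)

rbind(n_days = n_days[names(span)], span = span)
```

For "MAM", "JJA", and "SON" the span equals the number of days, so each bin is continuous. The "DJF" bin holds 90 days spread over a span of 365, because it combines January and February with the following December.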
Author(s)
Mike Toews
References
https://en.wikipedia.org/wiki/Solar_calendar
See Also
Examples
# Demonstrate the number of days in each category
ylab <- "Number of days"
barplot(table(mkseas(width="mon", year=2005)),
        main="Number of days in each month",
        ylab=ylab)
barplot(table(mkseas(width="zod", year=2005)),
        main="Number of days in each zodiac sign",
        ylab=ylab)
barplot(table(mkseas(width="DJF", year=2005)),
        main="Number of days in each meteorological season",
        ylab=ylab)
barplot(table(mkseas(width=5, year=2004)),
        main="5-day categories", ylab=ylab)
barplot(table(mkseas(width=11, year=2005)),
        main="11-day categories", ylab=ylab)
barplot(table(mkseas(width=366 / 12, year=2005)),
        main="Number of days in 12-section year",
        sub="Note: not exactly the same as months")
# Application using synthetic data
dat <- data.frame(date=as.Date(paste(2005, 1:365), "%Y %j"),
                  value=(-cos(1:365 * 2 * pi / 365) * 10 + rnorm(365) * 3 + 10))
attr(dat$date, "calendar") <- "365_day"
dat$d5 <- mkseas(dat, 5)
dat$d11 <- mkseas(dat, 11)
dat$month <- mkseas(dat, "mon")
dat$DJF <- mkseas(dat, "DJF")
plot(value ~ date, dat)
plot(value ~ d5, dat)
plot(value ~ d11, dat)
plot(value ~ month, dat)
plot(value ~ DJF, dat)
head(dat)
tapply(dat$value, dat$month, mean, na.rm=TRUE)
tapply(dat$value, dat$DJF, mean, na.rm=TRUE)
dat[which.max(dat$value),]
dat[which.min(dat$value),]
# start on a different day
st.day <- as.Date("2000-06-01")
dat$month <- mkseas(dat, "mon", start.day=st.day)
dat$d11 <- mkseas(dat, 11, start.day=st.day)
dat$DJF <- mkseas(dat, "DJF", start.day=st.day)
plot(value ~ d11, dat,
     main=.seasxlab(11, start.day=st.day))
plot(value ~ month, dat,
     main=.seasxlab("mon", start.day=st.day))
plot(value ~ DJF, dat,
     main=.seasxlab("DJF", start.day=st.day))