google_analytics {googleAnalyticsR} | R Documentation |
Get Google Analytics v4 data
Description
Fetch Google Analytics data using the v4 API. For the v3 API use google_analytics_3, for GA4's Data API use ga_data. See website help for lots of examples: Google Analytics Reporting API v4 in R
Usage
google_analytics(
viewId,
date_range = NULL,
metrics = NULL,
dimensions = NULL,
dim_filters = NULL,
met_filters = NULL,
filtersExpression = NULL,
order = NULL,
segments = NULL,
pivots = NULL,
cohorts = NULL,
max = 1000,
samplingLevel = c("DEFAULT", "SMALL", "LARGE"),
metricFormat = NULL,
histogramBuckets = NULL,
anti_sample = FALSE,
anti_sample_batches = "auto",
slow_fetch = FALSE,
useResourceQuotas = NULL,
rows_per_call = 10000L
)
google_analytics_4(...)
Arguments
viewId |
viewId of data to get. |
date_range |
character or date vector of format |
metrics |
Metric(s) to fetch as a character vector. You do not need to
supply the |
dimensions |
Dimension(s) to fetch as a character vector. You do not need to
supply the |
dim_filters |
A filter_clause_ga4 wrapping dim_filter |
met_filters |
A filter_clause_ga4 wrapping met_filter |
filtersExpression |
A v3 API style simple filter string. Not used with other filters. |
order |
An order_type object |
segments |
List of segments as created by segment_ga4 |
pivots |
Pivots of the data as created by pivot_ga4 |
cohorts |
Cohorts created by make_cohort_group |
max |
Maximum number of rows to fetch. Defaults to 1000. Use -1 to fetch all results. Ignored when |
samplingLevel |
Sample level |
metricFormat |
If supplying calculated metrics, specify the metric type |
histogramBuckets |
For numeric dimensions such as hour, a list of buckets of data. |
anti_sample |
If TRUE will split up the call to avoid sampling. |
anti_sample_batches |
"auto" default, or set to number of days per batch. 1 = daily. |
slow_fetch |
For large, complicated API requests this bypasses some API hacks that may result in 500 errors. For smaller queries, leave this as |
useResourceQuotas |
If using GA360, access increased sampling limits.
Default |
rows_per_call |
Set how many rows are requested by the API per call, up to a maximum of 100000. |
... |
Arguments passed to google_analytics |
Value
A Google Analytics data.frame, with attributes showing row totals, sampling etc.
Row requests
By default the API call uses v4 batching, which splits requests into 5 separate calls of 10k rows each. rows_per_call can go up to 100k, so up to 500k rows can be fetched per API call. However, the API servers will fail with a 500 error if the query is too complicated, because the processing time at Google's end gets too long. In this case, you may want to lower the rows_per_call argument, or fall back to using slow_fetch = TRUE, which sends API requests one at a time. If fetching data via scheduled scripts, this is recommended as the default.
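The batching arithmetic above can be sketched in plain R. This is only an illustration: batched_calls_needed is a hypothetical helper, not part of googleAnalyticsR, and it assumes 5 requests per batched call as described in this section. The viewId 123456 in the commented-out call is a placeholder.

```r
# Hypothetical helper (not part of googleAnalyticsR): estimate how many
# batched API calls a query needs, assuming 5 requests per batched call.
batched_calls_needed <- function(total_rows,
                                 rows_per_call = 10000L,
                                 requests_per_batch = 5L) {
  stopifnot(rows_per_call >= 1, rows_per_call <= 100000)
  rows_per_batch <- rows_per_call * requests_per_batch
  ceiling(total_rows / rows_per_batch)
}

# Default page size: 50k rows per batched call
batched_calls_needed(120000)

# Smaller pages mean lighter requests but more calls
batched_calls_needed(120000, rows_per_call = 5000L)

if (FALSE) {
  # Not run: needs authentication and a real viewId
  google_analytics(123456,
                   date_range = c("30daysAgo", "yesterday"),
                   metrics = "sessions",
                   dimensions = "date",
                   max = -1,
                   rows_per_call = 5000L)
}
```

Lowering rows_per_call trades fewer rows per request for more requests, which may help when a complicated query times out on the server.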
Anti-sampling
anti_sample being TRUE ignores max, as the API call is split over days to mitigate the sampling session limit, in which case a row limit won't work. Take the top rows of the result yourself instead, e.g. head(ga_data_unsampled, 50300).
anti_sample being TRUE will also set samplingLevel='LARGE' to minimise the number of calls.
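As a minimal sketch of trimming the result yourself, the snippet below uses a small made-up data.frame in place of a real anti-sampled result. The column names and values are assumptions chosen for illustration only.

```r
# Stand-in for the data.frame an anti-sampled call would return
# (made-up values, for illustration only)
ga_data_unsampled <- data.frame(
  date = seq(as.Date("2017-01-01"), by = "day", length.out = 10),
  sessions = c(120, 340, 95, 410, 275, 60, 388, 150, 222, 301)
)

# Sort by the metric of interest, then keep the top rows
sorted <- ga_data_unsampled[order(ga_data_unsampled$sessions,
                                  decreasing = TRUE), ]
top_rows <- head(sorted, 3)
top_rows
```

Sorting before calling head() matters because rows fetched in daily batches are not necessarily returned in the order a single row-limited query would use.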
Resource Quotas
If you are on GA360 and have access to resource quotas, set useResourceQuotas=TRUE and set the Google Cloud client ID to the project that has resource quotas activated, via gar_set_client or options.
Caching
By default local caching is turned on for v4 API requests. This means that repeating a request already made in this session will read the result from memory and not make an API call. You can also set the cache to disk via the ga_cache_call function. This can be useful when running RMarkdown reports that use the data.
Metrics
Metrics support calculated metrics like ga:users / ga:sessions if you supply them in a named vector.
Unlike normal metrics, calculated metrics must include the correct 'ga:' prefix.
You can mix calculated and normal metrics like so:
customMetric <- c(sessionPerVisitor = "ga:sessions / ga:visitors", "bounceRate", "entrances")
You can also optionally supply a metricFormat parameter that must be the same length as the metrics. metricFormat can be:
METRIC_TYPE_UNSPECIFIED, INTEGER, FLOAT, CURRENCY, PERCENT, TIME
All metrics are currently parsed with as.numeric in R.
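The pairing of metrics and metricFormat described above might look like the following sketch. The format chosen for each metric is an assumption made for illustration, and the viewId 123456 in the commented-out call is a placeholder.

```r
# One calculated metric (named, with ga: prefixes) and two normal metrics
custom_metrics <- c(sessionPerVisitor = "ga:sessions / ga:visitors",
                    "bounceRate",
                    "entrances")

# One format per metric, in the same order (illustrative choices)
metric_format <- c("FLOAT", "PERCENT", "INTEGER")

allowed_formats <- c("METRIC_TYPE_UNSPECIFIED", "INTEGER", "FLOAT",
                     "CURRENCY", "PERCENT", "TIME")

# Check the constraints from the text before making a request
stopifnot(length(metric_format) == length(custom_metrics))
stopifnot(all(metric_format %in% allowed_formats))

# Only the named entries are calculated metrics
is_calculated <- names(custom_metrics) != ""
is_calculated

if (FALSE) {
  # Not run: needs authentication and a real viewId
  google_analytics(123456,
                   date_range = c("30daysAgo", "yesterday"),
                   metrics = custom_metrics,
                   metricFormat = metric_format)
}
```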
Dimensions
Supply a character vector of dimensions, with or without the ga: prefix.
Optionally, for numeric dimension types such as ga:hour, ga:browserVersion, ga:sessionsToTransaction, etc., supply histogram buckets suitable for histogram plots.
If non-empty, we place dimension values into buckets after converting the string to int64. Dimension values that are not the string representation of an integral value will be converted to zero. The bucket values have to be in increasing order. Each bucket is closed on the lower end, and open on the upper end. The "first" bucket includes all values less than the first boundary, the "last" bucket includes all values up to infinity. Dimension values that fall in a bucket get transformed to a new dimension value. For example, if one gives a list of "0, 1, 3, 4, 7", then we return the following buckets:
bucket #1: values < 0, dimension value "<0"
bucket #2: values in [0,1), dimension value "0"
bucket #3: values in [1,3), dimension value "1-2"
bucket #4: values in [3,4), dimension value "3"
bucket #5: values in [4,7), dimension value "4-6"
bucket #6: values >= 7, dimension value "7+"
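The bucketing rule can be reproduced locally to check which label a value would receive. This is a sketch of the rule as described above, not the API's own implementation: the bucketing itself happens on Google's servers, and bucket_label is a hypothetical helper name.

```r
# Hypothetical helper: label dimension values the way the bucket rule
# above describes. Not part of googleAnalyticsR.
bucket_label <- function(values, boundaries) {
  # Boundaries must be strictly increasing
  stopifnot(length(boundaries) >= 1,
            !is.unsorted(boundaries, strictly = TRUE))

  # Values that are not integer strings are treated as zero
  values <- as.character(values)
  is_int <- grepl("^-?[0-9]+$", values)
  ints <- integer(length(values))
  ints[is_int] <- as.integer(values[is_int])

  # Build one label per bucket: "<first", the middle ranges, "last+"
  n <- length(boundaries)
  lower <- boundaries[-n]
  upper <- boundaries[-1] - 1
  middle <- ifelse(lower == upper,
                   as.character(lower),
                   paste0(lower, "-", upper))
  labels <- c(paste0("<", boundaries[1]), middle, paste0(boundaries[n], "+"))

  # findInterval gives 0 below the first boundary, n at or above the last
  labels[findInterval(ints, boundaries) + 1]
}

bucket_label(c("-2", "0", "2", "3", "5", "7", "12", "abc"),
             boundaries = c(0, 1, 3, 4, 7))
```

With the boundaries from the example, "2" falls in [1,3) and is labelled "1-2", while the non-numeric "abc" is treated as zero and lands in the "0" bucket.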
Examples
## Not run:
library(googleAnalyticsR)
## authenticate, or use the RStudio Addin "Google API Auth" with analytics scopes set
ga_auth()
## get your accounts
account_list <- ga_account_list()
## account_list will have a column called "viewId"
account_list$viewId
## View account_list and pick the viewId you want to extract data from
ga_id <- 123456
# examine the meta table to see metrics and dimensions you can query
meta
## simple query to test connection
google_analytics(ga_id,
date_range = c("2017-01-01", "2017-03-01"),
metrics = "sessions",
dimensions = "date")
## change the quotaUser to fetch under
google_analytics(1234567, date_range = c("30daysAgo", "yesterday"), metrics = "sessions")
options("googleAnalyticsR.quotaUser" = "test_user")
google_analytics(1234567, date_range = c("30daysAgo", "yesterday"), metrics = "sessions")
## End(Not run)