pagerank {birankr} | R Documentation |
Estimate PageRank
Description
Estimate PageRank (centrality scores) of nodes from an edge list or adjacency matrix. If data is a bipartite graph, estimates PageRank based on a one-mode projection of the input. If the data is an edge list, returns ranks ordered by the unique values in the supplied edge list (first by unique senders, then by unique receivers).
Usage
pagerank(
data,
is_bipartite = TRUE,
project_mode = c("rows", "columns"),
sender_name = NULL,
receiver_name = NULL,
weight_name = NULL,
rm_weights = FALSE,
duplicates = c("add", "remove"),
return_data_frame = TRUE,
alpha = 0.85,
max_iter = 200,
tol = 1e-04,
verbose = FALSE
)
Arguments
data |
Data to use for estimating PageRank. Can contain unipartite or bipartite graph data, either formatted as an edge list (class data.frame, data.table, or tibble (tbl_df)) or as an adjacency matrix (class matrix or dgCMatrix). |
is_bipartite |
Indicate whether input data is bipartite (rather than unipartite/one-mode). Defaults to TRUE. |
project_mode |
Mode for which to return PageRank estimates. Parameter ignored if is_bipartite = FALSE. Defaults to "rows" (the first column of an edge list). |
sender_name |
Name of sender column. Parameter ignored if data is an adjacency matrix. Defaults to first column of edge list. |
receiver_name |
Name of receiver column. Parameter ignored if data is an adjacency matrix. Defaults to the second column of edge list. |
weight_name |
Name of edge weight column. Parameter ignored if data is an adjacency matrix. Defaults to edge weights = 1. |
rm_weights |
Removes edge weights from graph object before estimating PageRank. Defaults to FALSE. |
duplicates |
How to treat duplicate edges if any in data. Parameter ignored if data is an adjacency matrix. If option "add" is selected, duplicated edges and corresponding edge weights are collapsed via addition. Otherwise, duplicated edges are removed and only the first instance of a duplicated edge is used. Defaults to "add". |
return_data_frame |
Return results as a data frame with node names in the first column and ranks in the second column. If set to FALSE, the function just returns a named vector of ranks. Defaults to TRUE. |
alpha |
Damping factor. Defaults to 0.85. |
max_iter |
Maximum number of iterations to run before the estimation stops without converging. Defaults to 200. |
tol |
Tolerance used to judge model convergence. Defaults to 1e-04. |
verbose |
Show the progress of this function. Defaults to FALSE. |
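To make the roles of alpha, max_iter, and tol concrete, the following is a minimal base R sketch of PageRank by power iteration on a small dense adjacency matrix. It is an illustration under stated assumptions, not the implementation used by birankr: the helper name pagerank_sketch, the row-sender/column-receiver convention, and the L1-difference convergence check are all hypothetical choices made for this example.

```r
# Hypothetical helper, not part of birankr: PageRank by power iteration.
# Assumes rows of `adj` are senders and columns are receivers.
pagerank_sketch <- function(adj, alpha = 0.85, max_iter = 200, tol = 1e-4) {
  n <- nrow(adj)
  out_degree <- rowSums(adj)
  if (any(out_degree == 0)) {
    stop("every node needs at least one outgoing edge in this sketch")
  }
  # Row-normalise so that each row of `transition` sums to 1.
  transition <- adj / out_degree
  rank <- rep(1 / n, n)
  for (i in seq_len(max_iter)) {
    # Follow an edge with probability alpha, otherwise jump to a random node.
    new_rank <- alpha * as.vector(t(transition) %*% rank) + (1 - alpha) / n
    # Assumed stopping rule: total absolute change below `tol`.
    converged <- sum(abs(new_rank - rank)) < tol
    rank <- new_rank
    if (converged) break
  }
  names(rank) <- rownames(adj)
  rank
}

# Three-node example: a -> b, b -> c, c -> a, and c -> b.
adj <- matrix(c(0, 1, 0,
                0, 0, 1,
                1, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
ranks <- pagerank_sketch(adj)
```

In this sketch a larger alpha gives more weight to the link structure and less to the uniform jump, while a smaller tol or a larger max_iter allows more iterations before stopping.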
Details
The default optional arguments are likely well-suited for most users. However, it is critical to set the is_bipartite argument to FALSE when working with one-mode data. In addition, when estimating PageRank in unipartite edge lists that contain nodes with outdegrees or indegrees equal to 0, it is recommended that users append self-ties to the edge list to ensure that the returned PageRank estimates are ordered intuitively.
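As an illustration of the self-tie recommendation, the base R sketch below locates nodes with an outdegree or indegree of 0 in a toy unipartite edge list and appends one self-tie for each. The column names, the toy data, and the choice to add self-ties only to the affected nodes are assumptions made for this example; the Examples section takes the simpler route of adding a self-tie to every node.

```r
# Toy unipartite edge list: node "a" only sends and node "d" only receives.
edges <- data.frame(
  sender   = c("a", "b", "c", "b"),
  receiver = c("b", "c", "b", "d"),
  stringsAsFactors = FALSE
)

all_nodes <- unique(c(edges$sender, edges$receiver))

# Nodes that never appear as a sender have outdegree 0;
# nodes that never appear as a receiver have indegree 0.
zero_out <- setdiff(all_nodes, edges$sender)
zero_in  <- setdiff(all_nodes, edges$receiver)
needs_self_tie <- union(zero_out, zero_in)

# Append one self-tie per affected node.
edges_padded <- rbind(
  edges,
  data.frame(sender = needs_self_tie, receiver = needs_self_tie,
             stringsAsFactors = FALSE)
)
```

The padded edge list can then be passed to pagerank() with is_bipartite = FALSE.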
Value
A data frame containing each node name and node rank. If return_data_frame is set to FALSE or the input data is an adjacency matrix, returns a vector of node ranks. Node ranks are not returned for isolates.
References
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. "The PageRank citation ranking: Bringing order to the web". Technical report, Stanford InfoLab, 1999.
Examples
# Prepare one-mode data
df_one_mode <- data.frame(
  sender = sample(x = 1:10000, size = 10000, replace = TRUE),
  receiver = sample(x = 1:10000, size = 10000, replace = TRUE)
)

# Add self-loops for all nodes
unique_ids <- unique(c(df_one_mode$sender, df_one_mode$receiver))
df_one_mode <- rbind(
  df_one_mode,
  data.frame(sender = unique_ids, receiver = unique_ids)
)

# Estimate PageRank in one-mode data
PageRank <- pagerank(data = df_one_mode, is_bipartite = FALSE)

# Estimate PageRank in two-mode data
df_two_mode <- data.frame(
  patient_id = sample(x = 1:10000, size = 10000, replace = TRUE),
  provider_id = sample(x = 1:5000, size = 10000, replace = TRUE)
)
PageRank <- pagerank(data = df_two_mode)