A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker



Documentation for package ‘robotstxt’ version 0.7.13

Help Pages

%>% re-exported magrittr pipe operator
as.list.robotstxt_text Method as.list() for class robotstxt_text
fix_url add a missing 'http://' prefix to a URL
get_robotstxt downloading a robots.txt file
get_robotstxts function to get multiple robotstxt files
get_robotstxt_http_get storage for http request response objects
guess_domain function guessing domain from path
http_domain_changed check whether the domain changed during an HTTP request
http_subdomain_changed check whether the subdomain changed during an HTTP request
http_was_redirected check whether an HTTP request was redirected
is_suspect_robotstxt check whether retrieved content is suspect (e.g. HTML instead of a robots.txt file)
is_valid_robotstxt function that checks whether a file is a valid / parsable robots.txt file
list_merge Merge a number of named lists in sequential order
null_to_defeault replace NULL values with a default
on_client_error_default default handler for HTTP client errors (see rt_request_handler)
on_domain_change_default default handler for domain changes (see rt_request_handler)
on_file_type_mismatch_default default handler for file type mismatches (see rt_request_handler)
on_not_found_default default handler for HTTP 404 responses (see rt_request_handler)
on_redirect_default default handler for redirects (see rt_request_handler)
on_server_error_default default handler for HTTP server errors (see rt_request_handler)
on_sub_domain_change_default default handler for subdomain changes (see rt_request_handler)
on_suspect_content_default default handler for suspect content (see rt_request_handler)
parse_robotstxt function parsing robots.txt
paths_allowed check if a bot has permissions to access page(s)
paths_allowed_worker_spiderbar spiderbar-based worker for paths_allowed
print.robotstxt printing robotstxt
print.robotstxt_text printing robotstxt_text
remove_domain function to remove domain from path
request_handler_handler helper applying a request handler rule to an HTTP response
robotstxt Generate a representation of a robots.txt file
rt_cache get_robotstxt() cache
rt_last_http storage for http request response objects
rt_request_handler handle robots.txt retrieval events (redirects, errors, suspect content)
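A minimal usage sketch tying the main help pages above together. It assumes network access to the queried host; the domain used here ("wikipedia.org") is only an illustration, and the calls follow the documented `robotstxt()` and `paths_allowed()` interfaces.

```r
library(robotstxt)

# download and parse a robots.txt file into an object
# (see help pages: robotstxt, get_robotstxt, parse_robotstxt)
rtxt <- robotstxt(domain = "wikipedia.org")

# check whether a given bot may access specific paths
# (see help page: paths_allowed)
rtxt$check(paths = c("/api/", "/images/"), bot = "*")

# or use the convenience function directly, without
# keeping the parsed object around
paths_allowed(
  paths  = "/api/",
  domain = "wikipedia.org",
  bot    = "*"
)
```

Both `rtxt$check()` and `paths_allowed()` return logical vectors, one element per path, indicating whether crawling is permitted.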