Title: | Download and Cache Files Safely |
---|---|
Description: | The goal of dlr is to provide a friendly wrapper around the common pattern of downloading a file if that file does not already exist locally. |
Authors: | Jonathan Bratt [aut] , Jon Harmon [aut, cre] , Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph, fnd] |
Maintainer: | Jon Harmon <[email protected]> |
License: | Apache License (>= 2) |
Version: | 1.0.1.9001 |
Built: | 2024-11-11 05:00:09 UTC |
Source: | https://github.com/macmillancontentscience/dlr |
App cache directories can depend on the user's operating system and an overall R_USER_CACHE_DIR environment variable. We also respect a per-app option (appname.dir) and a per-app environment variable (APPNAME_CACHE_DIR). This function returns the path that will be used for a given app's cache.
app_cache_dir(appname, verbose = interactive())
appname |
Character; the name of the application that will "own" the cache, such as the name of a package. |
The full path to the app's cache directory.
app_cache_dir("myApp")
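The lookup described above can be sketched in base R. This is only an illustration, not dlr's implementation: the helper name `resolve_cache_dir` is hypothetical, the precedence order (option, then environment variable, then the per-user default) is an assumption, and `tools::R_user_dir()` (R >= 4.0.0) stands in for whatever dlr uses to find the operating-system default.

```r
# Hypothetical helper; not part of dlr. The precedence order is an assumption.
resolve_cache_dir <- function(appname) {
  # 1. Per-app option, e.g. options(myApp.dir = "/some/path").
  opt <- getOption(paste0(appname, ".dir"))
  if (!is.null(opt)) {
    return(opt)
  }

  # 2. Per-app environment variable, e.g. MYAPP_CACHE_DIR.
  env <- Sys.getenv(paste0(toupper(appname), "_CACHE_DIR"), unset = NA)
  if (!is.na(env) && nzchar(env)) {
    return(env)
  }

  # 3. Per-user cache location, which honors R_USER_CACHE_DIR and otherwise
  #    follows the operating system's conventions.
  tools::R_user_dir(appname, which = "cache")
}

resolve_cache_dir("myApp")
```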
Construct the full path to the cached version of a file within a particular app's cache, using the source path of the file to make sure the cache filename is unique.
construct_cached_file_path(source_path, appname, extension = "")
source_path |
Character scalar; the full path to the source file. |
appname |
Character; the name of the application that will "own" the cache, such as the name of a package. |
extension |
Character scalar; an optional filename extension. |
The full path to the processed version of source_path in the app's cache directory.
construct_cached_file_path(
  source_path = "my/file.txt",
  appname = "dlr",
  extension = "rds"
)
Given the path to a file, construct a unique filename using the hash of the path.
construct_processed_filename(source_path, extension = "")
source_path |
Character scalar; the full path to the source file. |
extension |
Character scalar; an optional filename extension. |
A unique filename for a processed version of the file.
construct_processed_filename(
  source_path = "my/file.txt",
  extension = "rds"
)
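To illustrate the idea of deriving a filename from a hash of the path, here is a minimal base R sketch. The helper name `path_hash_filename` is hypothetical, and MD5 is an assumption chosen because `tools::md5sum()` ships with R; the hash that dlr actually uses may differ. Note that the path string is hashed, not the contents of the file it points to.

```r
# Hypothetical helper; not part of dlr. MD5 is an assumed stand-in hash.
path_hash_filename <- function(source_path, extension = "") {
  # tools::md5sum() hashes files, so write the path string to a temp file.
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeBin(charToRaw(source_path), tmp)
  hash <- unname(tools::md5sum(tmp))

  # Append the extension only when one was supplied.
  if (nzchar(extension)) {
    paste(hash, extension, sep = ".")
  } else {
    hash
  }
}

path_hash_filename("my/file.txt", extension = "rds")
```

Because the name depends only on the path string, the same source path always maps to the same cache filename, while different paths are very unlikely to collide.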
Create the default path expected by app_cache_dir().
create_app_cache_dir(appname)
appname |
Character; the name of the application that will "own" the cache, such as the name of a package. |
A normalized path to a cache directory. The directory is created if the user has write access and the directory does not exist.
# Executing this function creates a cache directory.
create_app_cache_dir("dlr")
This function wraps maybe_process(), specifying the app's cache directory.
maybe_cache(
  source_path,
  appname,
  filename = construct_processed_filename(source_path),
  process_f = readRDS,
  process_args = NULL,
  write_f = saveRDS,
  write_args = NULL,
  force_process = FALSE
)
source_path |
Character scalar; the path to the raw file. Paths starting
with |
appname |
Character; the name of the application that will "own" the cache, such as the name of a package. |
filename |
Character; an optional filename for the cached version of the
file. By default, a filename is constructed using
|
process_f |
A function or one-sided formula to use to process the source
file. |
process_args |
An optional list of additional arguments to |
write_f |
A function or one-sided formula to use to save the processed
file. The processed object will be passed as the first argument to this
function, and |
write_args |
An optional list of additional arguments to |
force_process |
A logical scalar indicating whether we should process the source file even if the target already exists. This can be particularly useful if you wish to redownload a file. |
The normalized target_path.
if (interactive()) {
  target_path <- maybe_cache(
    "https://query.data.world/s/owqxojjiphaypjmlxldsp566lck7co",
    appname = "dlr",
    process_f = read.csv
  )
  target_path
  unlink(target_path)
}
Sometimes you just need to get a processed file to a particular location, without reading the data. For example, you might need to download a lookup table used by various functions in a package, independent of a particular function call that needs the data. This function does the processing if it hasn't already been done.
maybe_process(
  source_path,
  target_path,
  process_f = readRDS,
  process_args = NULL,
  write_f = saveRDS,
  write_args = NULL,
  force_process = FALSE
)
source_path |
Character scalar; the path to the raw file. Paths starting
with |
target_path |
Character scalar; the path where the processed version of the file should be stored. |
process_f |
A function or one-sided formula to use to process the source
file. |
process_args |
An optional list of additional arguments to |
write_f |
A function or one-sided formula to use to save the processed
file. The processed object will be passed as the first argument to this
function, and |
write_args |
An optional list of additional arguments to |
force_process |
A logical scalar indicating whether we should process the source file even if the target already exists. This can be particularly useful if you wish to redownload a file. |
The normalized target_path.
if (interactive()) {
  temp_filename <- tempfile()
  maybe_process(
    "https://query.data.world/s/owqxojjiphaypjmlxldsp566lck7co",
    target_path = temp_filename,
    process_f = read.csv
  )
  unlink(temp_filename)
}
This function wraps read_or_process(), specifying an app's cache directory as the target directory.
read_or_cache(
  source_path,
  appname,
  filename = construct_processed_filename(source_path),
  process_f = readRDS,
  process_args = NULL,
  read_f = readRDS,
  read_args = NULL,
  write_f = saveRDS,
  write_args = NULL,
  force_process = FALSE
)
source_path |
Character scalar; the path to the raw file. Paths starting
with |
appname |
Character; the name of the application that will "own" the cache, such as the name of a package. |
filename |
Character; an optional filename for the cached version of the
file. By default, a filename is constructed using
|
process_f |
A function or one-sided formula to use to process the source
file. |
process_args |
An optional list of additional arguments to |
read_f |
A function or one-sided formula to use to read the processed
file. |
read_args |
An optional list of additional arguments to |
write_f |
A function or one-sided formula to use to save the processed
file. The processed object will be passed as the first argument to this
function, and |
write_args |
An optional list of additional arguments to |
force_process |
A logical scalar indicating whether we should process the source file even if the target already exists. This can be particularly useful if you wish to redownload a file. |
The processed object.
if (interactive()) {
  austin_smoke_free <- read_or_cache(
    "https://query.data.world/s/owqxojjiphaypjmlxldsp566lck7co",
    appname = "dlr",
    process_f = read.csv
  )
  head(austin_smoke_free)
}

if (interactive()) {
  # Calling the function a second time gives the result instantly.
  austin_smoke_free <- read_or_cache(
    "https://query.data.world/s/owqxojjiphaypjmlxldsp566lck7co",
    appname = "dlr",
    process_f = read.csv
  )
  head(austin_smoke_free)
}

if (interactive()) {
  # Remove the generated file. appname is a required argument of
  # construct_cached_file_path(), so it is supplied here.
  unlink(
    construct_cached_file_path(
      "https://query.data.world/s/owqxojjiphaypjmlxldsp566lck7co",
      appname = "dlr"
    )
  )
}
Often, a file must be processed before being usable in R. It can be useful to save the processed contents of that file in a standard format, such as RDS, so that the file does not need to be processed the next time it is loaded.
read_or_process(
  source_path,
  target_path,
  process_f = readRDS,
  process_args = NULL,
  read_f = readRDS,
  read_args = NULL,
  write_f = saveRDS,
  write_args = NULL,
  force_process = FALSE
)
source_path |
Character scalar; the path to the raw file. Paths starting
with |
target_path |
Character scalar; the path where the processed version of the file should be stored. |
process_f |
A function or one-sided formula to use to process the source
file. |
process_args |
An optional list of additional arguments to |
read_f |
A function or one-sided formula to use to read the processed
file. |
read_args |
An optional list of additional arguments to |
write_f |
A function or one-sided formula to use to save the processed
file. The processed object will be passed as the first argument to this
function, and |
write_args |
An optional list of additional arguments to |
force_process |
A logical scalar indicating whether we should process the source file even if the target already exists. This can be particularly useful if you wish to redownload a file. |
The processed object.
if (interactive()) {
  temp_filename <- tempfile()
  austin_smoke_free <- read_or_process(
    "https://query.data.world/s/owqxojjiphaypjmlxldsp566lck7co",
    target_path = temp_filename,
    process_f = read.csv
  )
  head(austin_smoke_free)
}

# Calling the function a second time gives the result instantly.
if (interactive()) {
  austin_smoke_free <- read_or_process(
    "https://query.data.world/s/owqxojjiphaypjmlxldsp566lck7co",
    target_path = temp_filename,
    process_f = read.csv
  )
  head(austin_smoke_free)
}

if (interactive()) {
  # Remove the generated file.
  unlink(temp_filename)
}
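The read-or-process pattern that read_or_process() provides can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not dlr's implementation: the name `read_or_process_sketch` is hypothetical, the `*_args` parameters are omitted, and no downloading is attempted, so `source_path` must be a local file here.

```r
# Hypothetical, simplified sketch; not part of dlr.
read_or_process_sketch <- function(source_path,
                                   target_path,
                                   process_f = readRDS,
                                   read_f = readRDS,
                                   write_f = saveRDS,
                                   force_process = FALSE) {
  # Fast path: the processed file already exists, so just read it.
  if (!force_process && file.exists(target_path)) {
    return(read_f(target_path))
  }

  # Slow path: process the raw file, save the result, and return it.
  processed <- process_f(source_path)
  write_f(processed, target_path)
  processed
}

# Usage with a small local CSV file.
raw_file <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), raw_file, row.names = FALSE)
processed_file <- tempfile(fileext = ".rds")

# First call processes and saves; second call reads the saved RDS file.
first <- read_or_process_sketch(raw_file, processed_file, process_f = read.csv)
second <- read_or_process_sketch(raw_file, processed_file, process_f = read.csv)

unlink(c(raw_file, processed_file))
```

Setting `force_process = TRUE` skips the fast path, which is how a stale processed file would be refreshed in this sketch.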
Override the default paths used by app_cache_dir().
set_app_cache_dir(appname, cache_dir = NULL)
appname |
Character; the name of the application that will "own" the cache, such as the name of a package. |
cache_dir |
Character scalar; a path to a cache directory. |
A normalized path to a cache directory. The directory is created if the user has write access and the directory does not exist. An option is also set so future calls to app_cache_dir() will respect the change.
# Executing this function creates a cache directory.
set_app_cache_dir(appname = "dlr", cache_dir = "/my/cache/path")
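The behavior described for set_app_cache_dir() (create the directory, normalize the path, record it in an option) might look roughly like the following in base R. The helper name `set_cache_dir_sketch` is hypothetical, and the use of the `appname.dir` option as the place where the path is stored is an assumption based on the per-app option mentioned for app_cache_dir(); dlr's own code may handle defaults and permissions differently.

```r
# Hypothetical sketch; not part of dlr.
set_cache_dir_sketch <- function(appname, cache_dir) {
  # Create the directory (and any missing parents) if it does not exist.
  if (!dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  cache_dir <- normalizePath(cache_dir)

  # Record the path in the per-app option, e.g. "myApp.dir" (assumed name).
  new_option <- stats::setNames(list(cache_dir), paste0(appname, ".dir"))
  options(new_option)

  cache_dir
}

# Usage with a throwaway directory under the session's temp dir.
demo_dir <- file.path(tempdir(), "sketch-cache")
set_cache_dir_sketch("myApp", demo_dir)
getOption("myApp.dir")
```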
The default timeout for downloads is 60 seconds. This is not long enough for many of the files that are downloaded using this package. We therefore supply a convenience function to easily change this setting. You can permanently change this default by setting R_DEFAULT_INTERNET_TIMEOUT in your .Renviron.
set_timeout(seconds = 600L)
seconds |
The number of seconds to set as the timeout (default 600 seconds). |
A list with the old timeout setting (invisibly).
getOption("timeout")
old_setting <- set_timeout()
getOption("timeout")
options(old_setting)
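When a longer timeout is only needed for one operation, the set-then-restore pattern shown above can be wrapped so the old value is always put back. The following is a base R sketch that calls `options()` directly instead of set_timeout(); the helper name `with_long_timeout` is hypothetical and not part of dlr.

```r
# Hypothetical helper; not part of dlr.
with_long_timeout <- function(code, seconds = 600L) {
  # options() returns the previous values of the options it changes.
  old <- options(timeout = seconds)
  # Restore the previous timeout when this function exits, even on error.
  on.exit(options(old), add = TRUE)
  # `code` is evaluated lazily, so it runs here with the new timeout in effect.
  force(code)
}

# The expression sees the temporary timeout; afterwards the old value is back.
with_long_timeout(getOption("timeout"), seconds = 900L)
getOption("timeout")
```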