I have command line arguments or configuration parameters stored as strings, and I want to use them as numbers, dates, booleans, etc. In C, this conversion is usually done with the sscanf(...) function. It performs the type conversion and a syntax check, gives a hint to the error location in the input string (through the number of successfully converted items), and uses a format string to define the conversion. I know that R provides strptime(...) for scanning date and time strings and the common conversions such as as.numeric(...).
Is there a function similar to sscanf(...) available in R?
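For comparison, this is roughly how the base R conversions mentioned above behave on made-up inputs. A failure only yields NA (as.numeric() additionally warns), with no position where the string went wrong, and as.logical() does not understand the ON/OFF values used in the configuration below.

```r
# Base R conversions: failures give NA, not an error position
num_ok   <- as.numeric("1000")
num_bad  <- suppressWarnings(as.numeric("10O0"))   # letter 'O' instead of zero
date_ok  <- strptime("2023-01-15 10:30", format = "%Y-%m-%d %H:%M", tz = "UTC")
date_bad <- strptime("2023-13-15 10:30", format = "%Y-%m-%d %H:%M", tz = "UTC")  # month 13
flags    <- as.logical(c("TRUE", "false", "ON"))    # "ON" is not recognised

print(num_bad)          # NA
print(is.na(date_bad))  # TRUE
print(flags)            # TRUE FALSE NA
```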
--- Use case, as requested by @margusl ---
In the INI file below, the important lines are:
[PROCESS]
FEATURE.HDR = ACTIVE:NORM:SAMPLES:FILE
FEATURE.FMT = %s:%d:%d:%s
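Because FEATURE.FMT only uses ':' as a separator, base R can already get close for this particular layout. A sketch, assuming ':' never occurs inside a value (file paths included) and using made-up row values: the '%XX' tags are mapped to read.table() column classes, and read.table() does the conversion for all rows at once.

```r
# Assumption: fields are separated by a single ':' that never occurs inside a value
fmt  <- "%s:%d:%d:%s"                                   # FEATURE.FMT
hdr  <- "ACTIVE:NORM:SAMPLES:FILE"                      # FEATURE.HDR
rows <- c(SPIX = "ON:0:1000:/data/spixel.tif",          # hypothetical values
          GRAY = "OFF:1:2000:/data/ge-feature-gn.tif")

# Map the '%XX' tags of the format string to read.table() column classes
tag2class <- c("%s" = "character", "%d" = "integer", "%f" = "numeric")
classes   <- unname(tag2class[strsplit(fmt, ":", fixed = TRUE)[[1]]])

# Each element of 'rows' is read as one line; quotes and '#' are not special
tab <- read.table(text = rows, sep = ":", colClasses = classes,
                  col.names = strsplit(hdr, ":", fixed = TRUE)[[1]],
                  quote = "", comment.char = "")
tab <- data.frame(ID = names(rows), tab, stringsAsFactors = FALSE)
print(tab)
```

If a value does not fit its column class (for example x in a %d column), read.table() stops with an error from scan() naming the offending token. That is a syntax check, but without the character position sscanf gives. An unknown tag maps to NA, which makes read.table() guess the class instead.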
The pseudo-function cfg.get.table(...) further below sketches the expected functionality. At its core it needs an R counterpart of sscanf(...):
# Pseudo functionality
scan <- sscanf(format, str)
# with
sscanf <- function(format, str) {
  ...
  # error: an error message on failure, NULL on success
  return(list(error = <message or NULL>, data = list(...)))
}
I added the docopt part to explain the intended CLI/INI functionality for @G. Grothendieck. The question is about an existing sscanf(...)-like feature in R. I am aware that it could be done using Rcpp, but I am not sure how to generate the result list from the format string without writing a parser for the '%XX' tags.
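For illustration, a format parser of that kind can stay small in plain R. The following sscanf() is a hypothetical helper, not an existing R function: it splits the format into '%XX' tags and literal text with a regular expression, then scans the string left to right, so a failure can report the character position, similar in spirit to the item count C's sscanf returns. Assumptions: only %d, %f and %s are supported; %s reads up to the next literal of the format (C's %s stops at whitespace instead); there is no %% escape and no field width.

```r
# Hypothetical sscanf()-like helper, not an existing R function.
# Supported tags: %d (integer), %f (double), %s (string); no %% or widths.
# %s reads up to the next literal of the format (C's %s stops at whitespace).
sscanf <- function(format, str) {
  # Split the format into conversion tags and literal text
  tokens <- regmatches(format, gregexpr("%[dfs]|[^%]+", format))[[1]]
  pos  <- 1L                       # current 1-based position in str
  data <- list()
  fail <- function(what) list(
    error = sprintf("expected %s at position %d of '%s'", what, pos, str),
    data  = NULL)
  for (i in seq_along(tokens)) {
    tk   <- tokens[i]
    rest <- substring(str, pos)
    if (tk == "%d" || tk == "%f") {
      re <- if (tk == "%d") "^[+-]?[0-9]+" else
        "^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)?"
      len <- attr(regexpr(re, rest), "match.length")
      if (len < 1) return(fail(if (tk == "%d") "an integer" else "a number"))
      txt <- substr(rest, 1, len)
      data[[length(data) + 1]] <- if (tk == "%d") as.integer(txt) else as.numeric(txt)
    } else if (tk == "%s") {
      # Stop at the next literal token, or consume the rest of the string
      nxt <- if (i < length(tokens)) tokens[i + 1] else "%"
      hit <- if (startsWith(nxt, "%")) -1L else as.integer(regexpr(nxt, rest, fixed = TRUE))
      len <- if (hit > 0) hit - 1L else nchar(rest)
      data[[length(data) + 1]] <- substr(rest, 1, len)
    } else {
      # Literal text must match exactly
      if (!startsWith(rest, tk)) return(fail(sprintf("'%s'", tk)))
      len <- nchar(tk)
    }
    pos <- pos + len
  }
  if (pos <= nchar(str)) return(fail("end of input"))
  list(error = NULL, data = data)
}

r <- sscanf("%s:%d:%d:%s", "ON:0:1000:/data/ge-feature-gn.tif")
r$error   # NULL
sscanf("%s:%d:%d:%s", "ON:x:1000:/data/f.tif")$error
# -> "expected an integer at position 4 of 'ON:x:1000:/data/f.tif'"
```

The return value follows the list(error = ..., data = ...) shape of the pseudo-code, so it could be dropped into cfg.get.table(...) as is.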
Docopt specification:
usage:
tool (-h | --help)
tool (-v | --version)
tool \
--config <CONFIG.FILE> \
--work-path <WORK.PATH> \
--feature-set <FEATURE.SET>
...
options:
-h, --help                    Show this screen.
-v, --version                 Show version.
--config <CONFIG.FILE>        Configuration file.
--work-path <WORK.PATH>       Work path, usually the project path.
--feature-set <FEATURE.SET>   Feature set chosen from the config (input files).
...
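The [CLI.TO.CONFIG] section in the configuration file below maps CLI options to /SECTION/KEY paths. A hedged sketch of that transfer step, assuming the parsed CLI values are available as a named list keyed like the rules (WORK.PATH, FEATURE.SET, ...) and that every path has exactly two levels. cli.to.cfg() is a hypothetical name; it is not the author's cfg.parse() and not part of configr or docopt.

```r
# Hypothetical helper, not the author's cfg.parse() and not part of configr.
# cfg   - nested list: cfg[[SECTION]][[KEY]], as read from the INI file
# cli   - named list of CLI values, assumed keyed like the rules (WORK.PATH, ...)
# rules - the [CLI.TO.CONFIG] section: rule name -> "/SECTION/KEY"
cli.to.cfg <- function(cfg, cli, rules) {
  for (opt in names(rules)) {
    value <- cli[[opt]]
    if (is.null(value)) next                 # option not given on the command line
    keys <- strsplit(sub("^/", "", rules[[opt]]), "/", fixed = TRUE)[[1]]
    if (length(keys) != 2) stop("expected a /SECTION/KEY path, got ", rules[[opt]])
    cfg[[keys[1]]][[keys[2]]] <- value       # CLI value overrides the INI value
  }
  cfg
}
```

Options missing from the command line are skipped, so the INI value (for example ALGORITHM = RPART) stays in place.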
Configuration file:
[COMMON]
...
[CLI.TO.CONFIG]
# CLI to CFG transformation rules
WORK.PATH = /PATH/PROJECT
FEATURE.SET = /PROCESS/FEATURE.SET
ALGORITHM = /PROCESS/ALGORITHM
...
# PATH Settings addressing the project setup
[PATH]
PROJECT = ${/SYSTEM/HOME}/project/data
INCOMING = ${/PATH/PROJECT}/Incoming
...
# Parameter needed for the process
[PROCESS]
OVERWRITE = OFF
USE.SUBSETS = OFF
SAMPLE.SIZE = 10000
ALGORITHM = RPART
FEATURE.SET = FEATURE.GRAY.MIN
FEATURE.HDR = ACTIVE:NORM:SAMPLES:FILE
FEATURE.FMT = %s:%d:%d:%s
...
[FEATURE.GRAY.MIN]
SPIX = ON:0:1000:${/PROCESS/SPIXEL.RASTER}
GRAY = OFF:1:2000:${/PATH/INCOMING}/ge-feature-gn.tif
VVIX = ON:0:1000:${/PATH/INCOMING}/ge-feature-b3n-vdni.tif
...
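The ${/SECTION/KEY} references above have to be expanded before the feature rows can be scanned. A minimal sketch in base R, assuming the INI file has already been read into a two-level nested list of strings and that references are not circular (cfg.resolve() is a hypothetical name, not the author's cfg.parse() or a configr function). Each value is expanded one reference at a time until none is left, which also covers chains such as INCOMING -> PROJECT -> /SYSTEM/HOME.

```r
# Hypothetical resolver for ${/SECTION/KEY} references in a nested list.
# Assumes two-level paths and no circular references (guarded by max.depth).
cfg.resolve <- function(cfg, max.depth = 20) {
  ref <- "\\$\\{/([^/}]+)/([^}]+)\\}"
  expand <- function(x) {
    for (i in seq_len(max.depth)) {
      m <- regmatches(x, regexec(ref, x))[[1]]
      if (length(m) == 0) return(x)              # no reference left
      val <- cfg[[m[2]]][[m[3]]]
      if (is.null(val)) stop("unresolved reference ", m[1])
      x <- sub(m[1], val, x, fixed = TRUE)       # replace the first occurrence
    }
    stop("too many nested references (circular?) in ", x)
  }
  lapply(cfg, function(sec) lapply(sec, expand))
}
```

Because expand() always looks values up in the unresolved input, a nested reference is simply expanded again in the next loop iteration.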
R pseudo-code:
require(configr)
require(docopt)
source('lib/helper.R')
# Set the CLI specification and the configuration files
DOC.FILE <- 'etc/tool.docopt'
CFG.FILE <- 'etc/tool.ini'
test.args <- c(
"--feature-set", "FEATURE.GRAY.MIN",
"--config", CFG.FILE,
"--work-path", getwd(),
...
)
# Parse the command line parameter
CLI <- cli.parse(DOC.FILE, test.args)
if (! is.null(CLI$error) ) {
io.error(FALSE, "Invalid command line options!\n DETAILS: %s",
CLI$error)
}
# Transfer CLI to CFG and resolve interpolated vars
CFG <- cfg.parse(CFG.FILE, CLI, TRUE)
if (! is.null(CFG$error) ) {
io.error(FALSE, "Invalid CONFIGURATION: '%s'!\n DETAILS: %s",
CFG.FILE, CFG$error)
}
# FT.SET addresses [PROCESS]
# -> FEATURE.SET = 'FEATURE.GRAY.MIN'
FT.SET <- cfg.get.parameter(CFG, '%s', '/PROCESS/FEATURE.SET')
# FT.HDR addresses [PROCESS]
# -> FEATURE.HDR = 'ACTIVE:NORM:SAMPLES:FILE'
FT.HDR <- cfg.get.parameter(CFG, '%s', '/PROCESS/FEATURE.HDR')
# FT.FMT addresses [PROCESS]
# -> FEATURE.FMT = '%s:%d:%d:%s'
FT.FMT <- cfg.get.parameter(CFG, '%s', '/PROCESS/FEATURE.FMT')
...
# FT.TAB addresses row entries [FEATURE.GRAY.MIN]
# ID | ACTIVE (ON/OFF) | NORM (int) | SAMPLES (int) | FILE (string)
FT.TAB <- cfg.get.table(CFG, FT.SET, FT.HDR, FT.FMT)
...
# Function could look like this:
# config  - nested list containing the parsed INI configuration
# section - name of the section addressed by the feature set
# header  - ':'-separated column names of the expected data frame
# format  - ':'-separated format string of the expected row layout
cfg.get.table <- function(config, section, header, format) {
  sec  <- config[[section]]
  rows <- names(sec)
  thdr <- unlist(strsplit(header, ':', fixed = TRUE))
  tab  <- vector('list', length(rows))
  for (ix in seq_along(rows)) {
    id  <- rows[ix]
    str <- sec[[id]]
    # --------- EXPECTED FUNCTION ------------------
    scan <- sscanf(format, str)
    # ----------------------------------------------
    if (!is.null(scan$error)) {
      stop(sprintf("[%s] %s: %s", section, id, scan$error))
    }
    # One data frame row per entry, columns named after the configured header
    tab[[ix]] <- data.frame(ID = id, setNames(scan$data, thdr),
                            stringsAsFactors = FALSE)
  }
  # Stack the rows; every row already carries the header ID + FEATURE.HDR
  df <- do.call(rbind, tab)
  return(df)
}
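The calls cfg.get.parameter(CFG, '%s', '/PROCESS/FEATURE.SET') above address a value by a /SECTION/KEY path plus a single format tag. lib/helper.R is not shown, so the following is only a guess at that behaviour under the hypothetical name cfg.lookup(): it walks the nested list along the path and converts the value according to %s, %d or %f, failing loudly instead of returning NA.

```r
# Hypothetical path lookup with a single-tag conversion (not the author's helper).
# config - nested list; format - one of "%s", "%d", "%f"; path - "/SECTION/KEY"
cfg.lookup <- function(config, format, path) {
  keys  <- strsplit(sub("^/", "", path), "/", fixed = TRUE)[[1]]
  value <- Reduce(function(node, key) node[[key]], keys, config)
  if (is.null(value)) stop("no such parameter: ", path)
  conv <- switch(format,
    "%s" = as.character,
    "%d" = as.integer,
    "%f" = as.numeric,
    stop("unsupported format: ", format))
  out <- suppressWarnings(conv(value))
  if (is.na(out)) stop(sprintf("'%s' at %s is not a valid %s value", value, path, format))
  out
}

CFG <- list(PROCESS = list(FEATURE.SET = "FEATURE.GRAY.MIN", SAMPLE.SIZE = "10000"))
cfg.lookup(CFG, "%d", "/PROCESS/SAMPLE.SIZE")   # 10000L
```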