Resource Reference¶
In this chapter we describe the different plugins bundled with Fastr (e.g. IOPlugins, ExecutionPlugins). The reference is built automatically from the code, so after installing a new plugin the documentation has to be rebuilt for it to be included in the docs.
CollectorPlugin Reference¶
CollectorPlugins are used for finding and collecting the output data of outputs that are part of a FastrInterface.
scheme | CollectorPlugin |
---|---|
JsonCollector | JsonCollector |
PathCollector | PathCollector |
StdoutCollector | StdoutCollector |
JsonCollector¶
The JsonCollector plugin allows a program to print the result in a pre-defined JSON format, which is then used as the output values by fastr.
The working is as follows:
1. The location of the output is taken
2. If the location is None, go to step 5
3. The substitutions are performed on the location field (see below)
4. The location is used as a regular expression and matched against the stdout line by line
5. The matched string (or the entire stdout if the location is None) is loaded as JSON
6. The data is parsed by set_result
The structure of the JSON has to follow a predefined format. For normal Nodes the format is of the form:
[value1, value2, value3]
where the multiple values represent the cardinality.
For FlowNodes the format is of the form:
{
'sample_id1': [value1, value2, value3],
'sample_id2': [value4, value5, value6]
}
This allows the tool to create multiple output samples in a single run.
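The collection steps above can be sketched in Python (a simplified illustration, not the actual fastr code; the collect_json helper is hypothetical):

```python
import json
import re

def collect_json(stdout, location=None):
    # Hypothetical sketch of the JsonCollector steps: if no location is
    # given, load the entire stdout as JSON; otherwise match the location
    # as a regular expression line by line and load the matched string.
    if location is None:
        return json.loads(stdout)
    for line in stdout.splitlines():
        match = re.search(location, line)
        if match:
            return json.loads(match.group(0))
    return None

# A normal Node prints a list of values (the cardinality):
print(collect_json('[1, 2, 3]'))
# → [1, 2, 3]

# A FlowNode prints a mapping from sample id to value lists:
print(collect_json('RESULT: {"sample_id1": [1, 2]}', location=r'\{.*\}'))
# → {'sample_id1': [1, 2]}
```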
PathCollector¶
The PathCollector plugin for the FastrInterface. This plugin uses the location fields to find data on the filesystem. To use this plugin, the method of the output has to be set to path.
The general working is as follows:
- The location field is taken from the output
- The substitutions are performed on the location field (see below)
- The updated location field will be used as a regular expression filter
- The filesystem is scanned for all matching files/directories
The special substitutions performed on the location use the Format Specification Mini-Language. The predefined fields that can be used are:
- inputs, an object with the input values (used like {inputs.image[0]})
- outputs, an object with the output values (used like {outputs.result[0]})
- special, which has two subfields:
  - special.cardinality, the index of the current cardinality
  - special.extension, the extension for the output DataType
Example use:
<output ... method="path" location="{output.directory[0]}/TransformParameters.{special.cardinality}.{special.extension}"/>
Given the output directory ./nodeid/sampleid/result, the second sample in the output, and a filetype with a txt extension, this would be translated into:
<output ... method="path" location="./nodeid/sampleid/result/TransformParameters.1.txt"/>
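The substitution step can be illustrated with plain Python str.format, since the Format Specification Mini-Language is what the standard format method implements (the SimpleNamespace objects are stand-ins for the real outputs/special objects):

```python
from types import SimpleNamespace

# Stand-ins for the objects fastr passes to the substitution
output = SimpleNamespace(directory=['./nodeid/sampleid/result'])
special = SimpleNamespace(cardinality=1, extension='txt')

location = ('{output.directory[0]}/TransformParameters.'
            '{special.cardinality}.{special.extension}')
resolved = location.format(output=output, special=special)
print(resolved)
# → ./nodeid/sampleid/result/TransformParameters.1.txt
```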
StdoutCollector¶
The StdoutCollector can collect data from the stdout stream of a program.
It filters the stdout line by line, matching each line against a predefined regular expression.
The general working is as follows:
- The location field is taken from the output
- The substitutions are performed on the location field (see below)
- The updated location field will be used as a regular expression filter
- The stdout is scanned line by line and the regular expression filter is applied
The special substitutions performed on the location use the Format Specification Mini-Language. The predefined fields that can be used are:
- inputs, an object with the input values (used like {inputs.image[0]})
- outputs, an object with the output values (used like {outputs.result[0]})
- special, which has two subfields:
  - special.cardinality, the index of the current cardinality
  - special.extension, the extension for the output DataType
Note
Because the plugin scans line by line, it is impossible to capture multi-line output in a single value.
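The scanning described above can be sketched as follows (an illustrative helper, not the actual plugin code):

```python
import re

def collect_stdout(stdout, pattern):
    # Apply the (already substituted) location as a regular expression
    # to every line of stdout and collect the captured values
    values = []
    for line in stdout.splitlines():
        match = re.search(pattern, line)
        if match:
            values.append(match.group(1) if match.groups() else match.group(0))
    return values

stdout = 'starting\nvolume=42.5\nvolume=13.0\ndone\n'
print(collect_stdout(stdout, r'volume=(\d+\.\d+)'))
# → ['42.5', '13.0']
```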
ExecutionPlugin Reference¶
This class is the base for all plugins that execute jobs somewhere. Many methods are already in place to take care of the generic functionality. Most plugins should only need to redefine a few abstract methods:
- __init__, the constructor
- cleanup, a clean-up function that frees resources, closes connections, etc.
- _queue_job, the method that queues the job for execution
- _cancel_job, cancels a previously queued job
- _release_job, releases a job that is currently held
- _job_finished, an extra callback for when a job finishes
Not all of these functions need to actually do anything for a given plugin. There are examples of plugins that do not really need a cleanup, but for safety you still need to implement it. Simply using a pass statement for the method can be fine in such a case.
Warning
When overriding other functions, extreme care must be taken not to break the plugin's working.
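A minimal skeleton of such a plugin could look like the sketch below. The real base class lives in fastr and the exact signatures may differ; the bodies here are purely illustrative and execute jobs inline, in the spirit of the blocking execution:

```python
class MinimalExecution:
    # Sketch of the abstract methods an ExecutionPlugin subclass should
    # redefine; a real plugin would inherit from fastr's ExecutionPlugin.

    def __init__(self):
        self.queued = []
        self.finished = []

    def cleanup(self):
        # Nothing to free in this sketch, so a pass is acceptable
        pass

    def _queue_job(self, job):
        # Blocking-style sketch: run the job (a callable here) immediately
        self.queued.append(job)
        self._job_finished(job, job())

    def _cancel_job(self, job):
        if job in self.queued:
            self.queued.remove(job)

    def _release_job(self, job):
        pass  # this sketch has no hold mechanism

    def _job_finished(self, job, result):
        self.finished.append(result)

plugin = MinimalExecution()
plugin._queue_job(lambda: 40 + 2)
print(plugin.finished)
# → [42]
```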
scheme | ExecutionPlugin |
---|---|
BlockingExecution | BlockingExecution |
DRMAAExecution | DRMAAExecution |
LinearExecution | LinearExecution |
ProcessPoolExecution | ProcessPoolExecution |
RQExecution | RQExecution |
BlockingExecution¶
The blocking execution plugin is a special plugin which is meant for debug purposes. It will not queue jobs but immediately execute them inline, effectively blocking fastr until the Job is finished. It is the simplest execution plugin and can be used as a template for new plugins or for testing purposes.
DRMAAExecution¶
A DRMAA execution plugin to execute Jobs on a Grid Engine cluster. It uses a configuration option for selecting the queue to submit to, and relies on the python drmaa package.
Note
To use this plugin, make sure the drmaa package is installed and that the execution is started on an SGE submit host with the DRMAA libraries installed.
Note
This plugin is at the moment tailored to SGE, but it should be fairly easy to make different subclasses for different DRMAA supporting systems.
Configuration fields
name | type | description | default |
---|---|---|---|
drmaa_queue | str | The default queue to use for jobs sent to the scheduler | 'week' |
LinearExecution¶
An execution engine that has a background thread executing the jobs in order. The queue is a simple FIFO queue and there is one worker thread operating in the background. This plugin is meant as a fallback when other plugins do not function properly. It does not use multi-processing, so it is safe to use in environments that do not support that.
ProcessPoolExecution¶
A local execution plugin that uses multiprocessing to create a pool of worker processes. This allows fastr to execute jobs in parallel with true concurrency. The number of workers can be specified in the fastr configuration; the default amount is the number of cores - 1, with a minimum of 1.
Warning
The ProcessPoolExecution does not check memory requirements of jobs and running many workers might lead to memory starvation and thus an unresponsive system.
Configuration fields
name | type | description | default |
---|---|---|---|
process_pool_worker_number | int | Number of workers to use in a process pool | 3 |
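The default worker count described above can be computed as follows (a sketch of the rule; note that the configured default in the table is a fixed 3):

```python
import multiprocessing

def default_worker_count():
    # Number of cores - 1, with a minimum of 1, as described above
    return max(multiprocessing.cpu_count() - 1, 1)

print(default_worker_count())
```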
RQExecution¶
An execution plugin based on Redis Queue. Fastr will submit jobs to the redis queue and workers will pull the jobs from the queue and process them.
This system requires a running redis database and the database url has to be set in the fastr configuration.
Note
This execution plugin requires the redis and rq packages to be installed before it can be loaded properly.
Configuration fields
name | type | description | default |
---|---|---|---|
rq_queue | str | The redis queue to use | ‘default’ |
rq_host | str | The url of the redis serving the redis queue | ‘redis://localhost:6379/0’ |
FlowPlugin Reference¶
Plugin that can manage an advanced data flow. The plugins override the execution of a node. The execution receives all data of a node in one go, so not split per sample combination, but all data on all inputs in one large payload. The flow plugin can then re-order the data and create resulting samples as it sees fit. This can be used for all kinds of specialized data flows, e.g. cross-validation.
To create a new FlowPlugin there is only one method that needs to be implemented: execute.
scheme | FlowPlugin |
---|---|
CrossValidation | CrossValidation |
CrossValidation¶
Advanced flow plugin that generates a cross-validation data flow. The node needs an input with the data and an input with the number of folds. Based on these, the outputs test and train will be supplied with a number of data sets.
IOPlugin Reference¶
IOPlugins are used for data import and export by the sources and sinks. The main use of the IOPlugins is during execution (see Execution). The IOPlugins can be accessed via fastr.ioplugins, but generally there should be no need for direct interaction with these objects: they are mainly used via the URLs that specify source and sink data.
scheme | IOPlugin |
---|---|
CommaSeperatedValueFile | CommaSeperatedValueFile |
FileSystem | FileSystem |
Null | Null |
Reference | Reference |
VirtualFileSystem | VirtualFileSystem |
VirtualFileSystemRegularExpression | VirtualFileSystemRegularExpression |
VirtualFileSystemValueList | VirtualFileSystemValueList |
XNATStorage | XNATStorage |
CommaSeperatedValueFile¶
The CommaSeperatedValueFile is an expand-only type of IOPlugin. No URLs can actually be fetched, but it can expand a single URL into a larger amount of URLs.
A csv:// URL is a vfs:// URL with a number of query variables available. The URL mount and path should point to a valid CSV file. The query variables then specify what column(s) of the file should be used. The following variables can be set in the query:
variable | usage |
---|---|
value | the column containing the value of interest, can be int for index or string for key |
id | the column containing the sample id (optional) |
header | indicates if the first row is considered the header, can be true or false (optional) |
delimiter | the delimiter used in the csv file (optional) |
quote | the quote character used in the csv file (optional) |
reformat | a reformatting string so that value = reformat.format(value) (used before relative_path) |
relative_path | indicates the entries are relative paths (for files), can be true or false (optional) |
The header is false by default if neither value nor id is set as a string. If either of these is a string, the header is required to define the column names and is automatically assumed to be true.
The delimiter and quote characters of the file should be detected automatically using the Sniffer, but can be forced by setting them in the URL.
Examples of valid csv URLs:
# Use the first column in the file (no header row assumed)
csv://mount/some/dir/file.csv?value=0
# Use the images column in the file (first row is assumed header row)
csv://mount/some/dir/file.csv?value=images
# Use the segmentations column in the file (first row is assumed header row)
# and use the id column as the sample id
csv://mount/some/dir/file.csv?value=segmentations&id=id
# Use the first column as the id and the second column as the value
# and skip the first row (considered the header)
csv://mount/some/dir/file.csv?value=1&id=0&header=true
# Use the first column and force the delimiter to be a comma
csv://mount/some/dir/file.csv?value=0&delimiter=,
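What a URL like csv://mount/some/dir/file.csv?value=images&id=id does can be approximated with the standard csv module (in-memory data here; the column names and contents are hypothetical):

```python
import csv
import io

# Stand-in for the contents of mount/some/dir/file.csv
content = 'id,images,segmentations\ns1,img1.nii,seg1.nii\ns2,img2.nii,seg2.nii\n'

# String column names imply a header row, so DictReader applies directly:
# build {sample_id: value} from the id and images columns
reader = csv.DictReader(io.StringIO(content))
samples = {row['id']: row['images'] for row in reader}
print(samples)
# → {'s1': 'img1.nii', 's2': 'img2.nii'}
```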
FileSystem¶
The FileSystem plugin is created to handle file:// type URLs. This is generally not good practice, as it is not portable between machines. However, for testing purposes it might be useful.
The URL scheme is rather simple: file://host/path (see wikipedia for details). We do not make use of the host part and at the moment only support localhost (just leave the host empty), leading to file:/// URLs.
Warning
This plugin ignores the hostname in the URL and only accepts drive letters on Windows in the form c:/
Null¶
The Null plugin is created to handle null:// type URLs. These URLs indicate that the sink should not do anything; the data is not written anywhere. Besides the scheme, the rest of the URL is ignored.
Reference¶
The Reference plugin is created to handle ref:// type URLs. These URLs make the sink write a simple reference file containing the DataType and the value, so the result can be reconstructed. For files it just leaves the data on disk by reference. This plugin is not useful for production, but is used for testing purposes.
VirtualFileSystem¶
The virtual file system class. This is an IOPlugin, but also heavily used internally in fastr for working with directories. The VirtualFileSystem uses the vfs:// url scheme.
A typical virtual filesystem url is formatted as vfs://mountpoint/relative/dir/from/mount.ext, where the mountpoint is defined in the config file. A list of the currently known mountpoints can be found in the fastr.config object:
>>> fastr.config.mounts
{'example_data': '/home/username/fastr-feature-documentation/fastr/fastr/examples/data',
'home': '/home/username/',
'tmp': '/home/username/FastrTemp'}
This shows that a url with the mount home, such as vfs://home/tempdir/testfile.txt, would be translated into /home/username/tempdir/testfile.txt.
There are a few default mount points defined by Fastr (that can be changed via the config file).
mountpoint | default location |
---|---|
home | the users home directory (expanduser('~/') ) |
tmp | the fastr temporary dir, defaults to tempfile.gettempdir() |
example_data | the fastr example data directory, defaults to $FASTRDIR/example/data |
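The translation from vfs url to path can be sketched as follows (the mount table is a hypothetical stand-in for fastr.config.mounts, and vfs_to_path is an illustrative helper, not the actual fastr API):

```python
from urllib.parse import urlparse

# Hypothetical stand-in for fastr.config.mounts
mounts = {'home': '/home/username/', 'tmp': '/home/username/FastrTemp'}

def vfs_to_path(url):
    # The netloc is the mountpoint; the path is appended to its location
    parsed = urlparse(url)
    if parsed.scheme != 'vfs':
        raise ValueError('not a vfs:// url')
    return mounts[parsed.netloc].rstrip('/') + parsed.path

print(vfs_to_path('vfs://home/tempdir/testfile.txt'))
# → /home/username/tempdir/testfile.txt
```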
VirtualFileSystemRegularExpression¶
The VirtualFileSystemRegularExpression is an expand-only type of IOPlugin. No URLs can actually be fetched, but it can expand a single URL into a larger amount of URLs.
A vfsregex:// URL is a vfs URL that can contain regular expressions on every level of the path. The regular expressions follow the re module definitions.
Examples of valid URLs would be:
vfsregex://tmp/network_dir/.*/.*/__fastr_result__.pickle.gz
vfsregex://tmp/network_dir/nodeX/(?P<id>.*)/__fastr_result__.pickle.gz
The first URL would result in all the __fastr_result__.pickle.gz files in the working directory of a Network. The second URL would only result in the file for a specific node (nodeX), but by adding the named group id using (?P<id>.*), the sample id of the data is automatically set to that group (see Regular Expression Syntax under the special characters for more info on named groups in regular expressions).
Concretely if we would have a directory vfs://mount/somedir
containing:
image_1/Image.nii
image_2/image.nii
image_3/anotherimage.nii
image_5/inconsistentnamingftw.nii
we could match these files using vfsregex://mount/somedir/(?P<id>image_\d+)/.*\.nii
which would result in the following source data after expanding the URL:
{'image_1': 'vfs://mount/somedir/image_1/Image.nii',
'image_2': 'vfs://mount/somedir/image_2/image.nii',
'image_3': 'vfs://mount/somedir/image_3/anotherimage.nii',
'image_5': 'vfs://mount/somedir/image_5/inconsistentnamingftw.nii'}
This shows the power of the regular expression filtering, and how the id group from the URL can be used to get sensible sample ids.
Warning
Due to the nature of matching regular expressions on multiple levels, this method can be slow when there are many matches on the lower levels of the path (because the tree of potential matches grows) or when directories that are part of the path are very large.
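The matching with a named id group can be illustrated with the re module directly (the directory listing is the example from above):

```python
import re

# The example directory contents under vfs://mount/somedir
entries = ['image_1/Image.nii', 'image_2/image.nii',
           'image_3/anotherimage.nii', 'image_5/inconsistentnamingftw.nii']

pattern = re.compile(r'(?P<id>image_\d+)/.*\.nii$')
samples = {}
for entry in entries:
    match = pattern.match(entry)
    if match:
        # The named group becomes the sample id
        samples[match.group('id')] = 'vfs://mount/somedir/' + entry

print(samples['image_1'])
# → vfs://mount/somedir/image_1/Image.nii
```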
VirtualFileSystemValueList¶
The VirtualFileSystemValueList is an expand-only type of IOPlugin. No URLs can actually be fetched, but it can expand a single URL into a larger amount of URLs. A vfslist:// URL basically is a url that points to a file via vfs. This file then contains a number of lines, each containing another URL.
If the contents of a file vfs://mount/some/path/contents
would be:
vfs://mount/some/path/file1.txt
vfs://mount/some/path/file2.txt
vfs://mount/some/path/file3.txt
vfs://mount/some/path/file4.txt
Then using the URL vfslist://mount/some/path/contents as source data would result in the four files being pulled.
Note
The URLs in a vfslist file do not have to use the vfs scheme, but can use any scheme known to the Fastr system.
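The expansion itself is essentially reading the lines from the file, sketched here on an in-memory string (the expand_vfslist helper is hypothetical, not the plugin's actual method):

```python
# Stand-in for the contents of vfs://mount/some/path/contents
contents = '''vfs://mount/some/path/file1.txt
vfs://mount/some/path/file2.txt
vfs://mount/some/path/file3.txt
vfs://mount/some/path/file4.txt
'''

def expand_vfslist(text):
    # Every non-empty line in the file is itself a URL to pull
    return [line.strip() for line in text.splitlines() if line.strip()]

urls = expand_vfslist(contents)
print(len(urls))
# → 4
```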
XNATStorage¶
Warning
As this IOPlugin is under development, it has not been thoroughly tested.
The XNATStorage plugin is an IOPlugin that can download data from and upload data to an XNAT server. It uses its own xnat:// URL scheme. This scheme is specific to this plugin and, though it looks somewhat like the XNAT REST interface, it is a different type of URL.
Data resources can be accessed directly by a data url:
xnat://xnat.example.com/data/archive/projects/sandbox/subjects/subject001/experiments/experiment001/scans/T1/resources/DICOM
xnat://xnat.example.com/data/archive/projects/sandbox/subjects/subject001/experiments/*_BRAIN/scans/T1/resources/DICOM
In the second URL you can see a wildcard being used. This is possible as long as it resolves to exactly one item.
The id query element will change the field from the default experiment to subject, and the label query element sets the use of the label as the fastr id (instead of the XNAT id) to True (the default is False).
To disable https transport and use http instead, the query string can be modified by adding insecure=true. This will make the plugin send requests over http:
xnat://xnat.example.com/data/archive/projects/sandbox/subjects/subject001/experiments/*_BRAIN/scans/T1/resources/DICOM?insecure=true
For sinks it is important to know where to save the data. Sometimes you want to save data in a new assessor/resource that still needs to be created. To allow the Fastr sink to create an object in XNAT, you have to supply the type as a query parameter:
xnat://xnat.bmia.nl/data/archive/projects/sandbox/subjects/S01/experiments/_BRAIN/assessors/test_assessor/resources/IMAGE/files/image.nii.gz?resource_type=xnat:resourceCatalog&assessor_type=xnat:qcAssessmentData
Valid options are: subject_type, experiment_type, assessor_type, scan_type, and resource_type.
If you want to do a search where multiple resources are returned, it is possible to use a search url:
xnat://xnat.example.com/search?projects=sandbox&subjects=subject[0-9][0-9][0-9]&experiments=*_BRAIN&scans=T1&resources=DICOM
This will return all DICOMs of the T1 scans for experiments that end with _BRAIN and belong to a subjectXXX, where XXX is a 3-digit number. By default the ID for the samples will be the experiment XNAT ID (e.g. XNAT_E00123). The wildcards that can be used are the same UNIX shell-style wildcards as provided by the module fnmatch.
It is possible to change the id to a different field's id or label. Valid fields are project, subject, experiment, scan, and resource:
xnat://xnat.example.com/search?projects=sandbox&subjects=subject[0-9][0-9][0-9]&experiments=*_BRAIN&scans=T1&resources=DICOM&id=subject&label=true
The following variables can be set in the search query:
variable | default | usage |
---|---|---|
projects | * | The project(s) to select, can contain wildcards (see fnmatch) |
subjects | * | The subject(s) to select, can contain wildcards (see fnmatch) |
experiments | * | The experiment(s) to select, can contain wildcards (see fnmatch) |
scans | * | The scan(s) to select, can contain wildcards (see fnmatch) |
resources | * | The resource(s) to select, can contain wildcards (see fnmatch) |
id | experiment | What field to use as the id, can be: project, subject, experiment, scan, or resource |
label | false | Indicates whether the XNAT label should be used as the fastr id, options true or false |
insecure | false | Change the url scheme used to http instead of https |
regex | false | Change the search to use regex re.match() instead of fnmatch for matching |
For storing credentials the .netrc file can be used. This is a common way to store credentials on UNIX systems. The file is required to be accessible by the owner only, or a NetrcParseError will be raised.
A netrc file is really easy to create, as its entries look like:
machine xnat.example.com
login username
password secret123
See the netrc module or the GNU inet utils website for more information about the .netrc file.
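Reading such an entry with the standard netrc module looks like this (a temporary file is used for illustration; normally the file lives at ~/.netrc):

```python
import netrc
import os
import stat
import tempfile

# Write the example entry to a temporary file (stand-in for ~/.netrc)
with tempfile.NamedTemporaryFile('w', suffix='_netrc', delete=False) as fh:
    fh.write('machine xnat.example.com\nlogin username\npassword secret123\n')
    path = fh.name

# Owner-only permissions, as required for the real ~/.netrc
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

# authenticators() returns a (login, account, password) tuple
login, _, password = netrc.netrc(path).authenticators('xnat.example.com')
print(login, password)
# → username secret123

os.remove(path)
```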
Note
On Windows the location of the netrc file is assumed to be os.path.expanduser('~/_netrc'). The leading underscore is used because Windows does not like filenames starting with a dot.
Note
For a scan, the label will be the scan type (this is initially the same as the series description, but can be updated manually or by the XNAT scan type cleanup).
Warning
Labels in XNAT are not guaranteed to be unique, so be careful when using them as the sample ID.
For background on XNAT, see the XNAT API DIRECTORY for the REST API of XNAT.
Interface Reference¶
Abstract base class of all Interfaces. Defines the minimal requirements for all Interface implementations.
scheme | Interface |
---|---|
FastrInterface | FastrInterface |
FlowInterface | FlowInterface |
NipypeInterface | NipypeInterface |
FastrInterface¶
The default Interface for fastr. For the command-line Tools as used by fastr.
FlowInterface¶
The Interface used by AdvancedFlowNodes to create the advanced data flows that are not implemented in fastr itself. This allows nodes to implement new data flows using the plugin system.
The definition of FlowInterfaces is very similar to that of the default FastrInterfaces.
Note
A FlowInterface should use a specific FlowPlugin.
NipypeInterface¶
Experimental interface to use nipype interfaces directly in fastr tools, using only a simple reference. To create a tool using a nipype interface, just create an interface with the correct type and set the nipype argument to the correct class.
For example in an xml tool this would become:
<interface class="NipypeInterface">
<nipype_class>nipype.interfaces.elastix.Registration</nipype_class>
</interface>
Note
To use these interfaces, nipype should be installed on the system.
Warning
This interface plugin is basically functional, but highly experimental!