Quick start guide¶
This manual will show users how to install Fastr, configure Fastr, construct and run simple networks, and add tool definitions.
Installation¶
You can install Fastr either using pip, or from the source code.
Installing via pip¶
You can simply install fastr using pip:
pip install fastr
Note
You might want to consider installing fastr in a virtualenv.
Installing from source code¶
To install from source code, clone the repository with Git via the command line:
git clone https://gitlab.com/radiology/infrastructure/fastr.git # for http
git clone git@gitlab.com:radiology/infrastructure/fastr.git # for ssh
If you prefer a GUI you can try TortoiseGit (Windows) or SourceTree (Windows and Mac OS X). The address of the repository is (given for both http and ssh):
https://gitlab.com/radiology/infrastructure/fastr.git
git@gitlab.com:radiology/infrastructure/fastr.git
To install to your current Python environment, run:
cd fastr/
pip install .
This installs the scripts and packages in the default system folders. On Windows these are the Python site-packages directory for the fastr Python library and the Scripts directory for the executable scripts. On Ubuntu these are /usr/local/lib/python3.x/dist-packages/ and /usr/local/bin/ respectively.
Note
If you want to develop fastr, you might want to use pip install -e . to get an editable install.
Note
You might want to consider installing fastr in a virtualenv.
Note
On Windows, python and the Scripts directory are not on the system PATH by default. You can add these by going to System -> Advanced Options -> Environment variables.
On Mac you need the Xcode Command Line Tools. These can be installed using the command xcode-select --install.
Configuration¶
Fastr has defaults for all settings so it can be run out of the box to test the examples. However, when you want to create your own Networks, use your own data, or use your own Tools, it is required to edit your config file.
Fastr will search for a config file named config.py in the $FASTRHOME directory (which defaults to ~/.fastr/ if it is not set). So if $FASTRHOME is set, ~/.fastr/ will be ignored.
For a sample configuration file and a complete overview of the options in config.py, see the Config file section.
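The lookup rule described above can be sketched in a few lines of Python. This is only an illustration of the documented rule, not Fastr's own code: the function name resolve_config_path is hypothetical, and treating an empty $FASTRHOME as unset is an assumption.

```python
import os
from pathlib import Path


def resolve_config_path(environ=None):
    """Return the config.py location implied by the rule described above.

    Hypothetical helper for illustration; Fastr's real lookup may differ.
    """
    environ = os.environ if environ is None else environ
    fastr_home = environ.get('FASTRHOME')
    # Assumption: an empty FASTRHOME is treated the same as an unset one.
    base = Path(fastr_home) if fastr_home else Path.home() / '.fastr'
    return base / 'config.py'


# Show where the config file would be expected in the current environment.
print(resolve_config_path())
```

Passing a dictionary instead of relying on os.environ makes it easy to check both cases without changing your shell environment.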
Creating a simple network¶
If Fastr is properly installed and configured, we can start creating networks. Creating a network is very simple:
>>> import fastr
>>> network = fastr.create_network(id='example', version='1.0')
Now we have an empty network. The next step is to create some nodes and links. Imagine we want to create the following network:
Creating nodes¶
We will create the nodes and add them to the network. This is done via the network's create_ methods.
Let’s create one source node, one sink, and one normal node:
>>> source1 = network.create_source('Int', id='source1')
>>> sink1 = network.create_sink('Int', id='sink1')
>>> addint = network.create_node('fastr/math/AddInt:1.0', tool_version='1.0', id='addint')
The functions Network.create_source, Network.create_sink and Network.create_node create the desired node and add it to the Network.
A SourceNode and a SinkNode only require the datatype to be specified. A Node requires a Tool to be instantiated from. The id option is optional for all three, but makes it easier to identify the nodes and read the logs. The tool is defined by a namespace, the id and the version of the command. Many packages have multiple versions available. The tool_version argument reflects the version of the Fastr wrapper which describes how the command can be called. For reproducibility the wrapper versions are checked as well, as they might also be updated.
There is an easy way to add a constant to an input, by using a shortcut method. If you assign a list or tuple to an item in the input list, it will automatically create a ConstantNode and a Link between the ConstantNode and the given Input:
>>> [1, 3, 3, 7] >> addint.inputs['right_hand']
Link link_0 (network: example):
fastr:///networks/example/1.0/nodelist/const__addint__right_hand/outputs/output ==> fastr:///networks/example/1.0/nodelist/addint/inputs/right_hand/0
The created constant would have the id const_addint__right_hand_0, as it automatically names the new constant const_$nodeid__$inputid_$number.
Note
The use of the >>, <<, and = operators for linking is discussed below in the section Creating links.
In an interactive Python session we can simply look at the basic layout of the node using the repr function. Just type the name of the variable holding the node and it will print a human-readable representation:
>>> source1
SourceNode source1 (tool: Source:1.0 v1.0)
Inputs | Outputs
-------------------------------------------
| output (Int)
>>> addint
Node addint (tool: AddInt:1.0 v1.0)
Inputs | Outputs
---------------------------------------------
left_hand (Int) | result (Int)
right_hand (Int) |
This tool has inputs of type Int, so the sources and sinks need to have a matching datatype.
The tools and datatypes available are stored in fastr.tools and fastr.types. These variables are created when fastr is imported for the first time. They contain all the datatypes and tools specified by the yaml, json or xml files in the search paths. To get an overview of the tools and datatypes loaded by fastr:
>>> fastr.tools
ToolManager
...
fastr/math/Add:1.0 1.0 : ...fastr...resources...tools...fastr...math...1.0...add.yaml
fastr/math/AddInt:1.0 1.0 : ...fastr...resources...tools...fastr...math...1.0...addint.yaml
...
>>> fastr.types
DataTypeManager
...
Directory : <URLType: Directory>
...
Float : <ValueType: Float>
...
Int : <ValueType: Int>
...
String : <ValueType: String>
...
The fastr.tools variable contains all tools that Fastr could find during initialization. Tools can be chosen in two ways:
tools[id], which returns the newest version of the tool
tools[id, version], which returns the specified version of the tool
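The two indexing forms can be illustrated with a small stand-in registry. This is a sketch of the lookup behaviour described above, not Fastr's ToolManager: the class ToyToolRegistry, its add method, and the tool ids and versions used are all hypothetical, and comparing versions as dotted integers is an assumption.

```python
class ToyToolRegistry:
    """Hypothetical stand-in that mimics the two lookup forms described above."""

    def __init__(self):
        self._tools = {}  # maps tool id -> {version string: tool object}

    def add(self, tool_id, version, tool):
        self._tools.setdefault(tool_id, {})[version] = tool

    @staticmethod
    def _version_key(version):
        # Assumption: versions are dotted integers such as '1.10'.
        return tuple(int(part) for part in version.split('.'))

    def __getitem__(self, key):
        if isinstance(key, tuple):
            tool_id, version = key  # registry[id, version]
            return self._tools[tool_id][version]
        versions = self._tools[key]  # registry[id]
        newest = max(versions, key=self._version_key)
        return versions[newest]


registry = ToyToolRegistry()
registry.add('demo/Add', '1.2', 'Add tool v1.2')    # made-up tool id and versions
registry.add('demo/Add', '1.10', 'Add tool v1.10')

print(registry['demo/Add'])         # newest version
print(registry['demo/Add', '1.2'])  # explicitly requested version
```

Python passes a single tuple to __getitem__ when the index contains a comma, which is what lets one method serve both forms.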
Creating links¶
So now we have a network with 4 nodes defined; however, there is no relation between the nodes yet. For this we have to create some links.
>>> link1 = source1.output >> addint.inputs['left_hand']
>>> link2 = sink1.inputs['input'] << addint.outputs['result']
This asks the network to create links and immediately store them inside the network. A link always points from an Output to an Input (note that SubOutputs and SubInputs are also valid). A SourceNode has only one output, which is fixed, so it is easy to find. However, addint has two inputs and one output, which requires us to specify which input or output we need. A normal node has one mapping with Inputs and one with Outputs. They can be indexed with the appropriate ids. The operation returns the created link, but you only need that if you are planning to change the properties of a link.
The >> and << operators clearly indicate the direction of the desired data flow. They also return the created link, which is convenient if you want to change the flow in a link later on. The last shorthand uses assignment (=), but it cannot return the created link, so changing the link later on is more difficult.
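For readers curious how such operators can return a link, the following is a minimal sketch using Python's operator overloading. It is not Fastr's implementation: ToyLink, ToyOutput, ToyInput and the connect method are hypothetical names invented for this illustration.

```python
class ToyLink:
    """Hypothetical link object that records a source and a target."""

    def __init__(self, source, target):
        self.source = source
        self.target = target


class ToyOutput:
    def __init__(self, name, values=None):
        self.name = name
        self.values = values

    def __rshift__(self, target):
        # Handles: output >> input
        return target.connect(self)


class ToyInput:
    def __init__(self, name):
        self.name = name
        self.link = None

    def connect(self, source):
        self.link = ToyLink(source, self)
        return self.link

    def __lshift__(self, source):
        # Handles: input << output
        return self.connect(source)

    def __rrshift__(self, values):
        # Handles: [1, 2] >> input. A list has no __rshift__, so Python
        # falls back to the reflected method on the right-hand operand.
        return self.connect(ToyOutput('constant', values=list(values)))


result = ToyOutput('result')
left = ToyInput('left_hand')
right = ToyInput('right_hand')
sink_in = ToyInput('input')

link_a = result >> left           # data flows from result to left
link_b = sink_in << result        # same direction, written the other way round
link_c = [1, 3, 3, 7] >> right    # a list is wrapped in a constant output
```

An assignment statement cannot hand back a value in Python, which is consistent with the remark above that the = shorthand does not return the link.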
Create an image of the Network¶
For checking your Network it is very useful to have a graphical representation of the network. This can be achieved using the Network.draw method.
>>> network.draw()
'example.svg'
This will create a figure in the path returned by the function that looks like:
Note
For this to work you need to have graphviz installed.
Running a Network¶
Running a network locally is almost as simple as calling the Network.execute method:
>>> source_data = {'source1': {'s1': 4, 's2': 5, 's3': 6, 's4': 7}}
>>> sink_data = {'sink1': 'vfs://tmp/fastr_result_{sample_id}.txt'}
>>> run = network.execute(source_data, sink_data)
# Lots of output will appear on the stdout while running
# Show if the run was successful or if errors were encountered
>>> run.result
True
As you can see, the execute method needs data for the sources and sinks. This has to be supplied in two dicts that have keys matching every source/sink id in the network. Not supplying data for every source and sink will result in an error, although it is possible to pass an empty list to a source.
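Because a missing entry causes an error, it can help to compare the dictionary keys with the node ids before calling execute. The helper below is a hypothetical pre-flight check written in plain Python; it does not call Fastr, and the expected ids are passed in by hand rather than read from a network.

```python
def find_missing_ids(expected_ids, supplied):
    """Return the ids from expected_ids that have no entry in supplied, sorted."""
    return sorted(set(expected_ids) - set(supplied))


source_data = {'source1': {'s1': 4, 's2': 5, 's3': 6, 's4': 7}}
sink_data = {'sink1': 'vfs://tmp/fastr_result_{sample_id}.txt'}

missing_sources = find_missing_ids(['source1'], source_data)
# 'sink2' is a made-up id, included only to show a failing check.
missing_sinks = find_missing_ids(['sink1', 'sink2'], sink_data)

print('missing sources:', missing_sources)
print('missing sinks:', missing_sinks)
```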
Note
The values of the source data have to be simple values or urls, and the values of the sink data have to be url templates. To see what url schemes are available and how they work, see the IOPlugin Reference. For the sink url templates, see SinkNode.set_data.
For source nodes you can supply a list or a dict with values. If you supply a dict, the keys will be interpreted as sample ids and the values as the corresponding values. If you supply a list, keys will be generated in the form of id_{N}, where N will be the index of the value in the list.
Warning
As a dict does not have a fixed order, when a dict is supplied the samples are ordered by key to get a fixed order! For a list the original order is retained.
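The rules for sample ids and ordering can be summarised in a short sketch. The helper to_ordered_samples is hypothetical and only mirrors the behaviour described above; that N in id_{N} is the zero-based list index is an assumption.

```python
def to_ordered_samples(data):
    """Return (sample_id, value) pairs in the order described above.

    Hypothetical helper; assumes N in id_{N} is the zero-based list index.
    """
    if isinstance(data, dict):
        # dict: the keys are the sample ids, and samples are ordered by key
        return [(key, data[key]) for key in sorted(data)]
    # list or tuple: ids are generated and the original order is kept
    return [(f'id_{index}', value) for index, value in enumerate(data)]


from_dict = to_ordered_samples({'s2': 5, 's1': 4})
from_list = to_ordered_samples([7, 6])

print(from_dict)
print(from_list)
```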
For the sink data, a url template has to be supplied that governs how the data is stored. The mini-language (the replacement fields) is described in the SinkNode.set_data method.
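As a rough illustration, the {sample_id} field from the example above can be expanded with Python's str.format. That Fastr fills the field in exactly this way is an assumption; the authoritative description of the replacement fields is in SinkNode.set_data.

```python
template = 'vfs://tmp/fastr_result_{sample_id}.txt'

# Expand the template once per sample id used in the source data above.
sample_ids = ['s1', 's2', 's3', 's4']
targets = [template.format(sample_id=sample_id) for sample_id in sample_ids]

for target in targets:
    print(target)
```

Each sample therefore gets its own target url, which is why the template needs a field that differs per sample.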
To rerun a stopped/crashed pipeline, check the user manual on Continuing a Network.