Development and Design Documentation¶
In this chapter we will discuss the design of Fastr in more detail. We give pointers for development and add the design documents as we currently envision Fastr. This is both for people who are interested in Fastr development and for current developers to have an archive of the design decisions agreed upon.
Sample flow in Fastr¶
The current Sample flow is the following:
The idea is that we make a common interface for all classes that are related to the flow of Samples. For this we propose the following mixin classes that provide the interface and allow for better code sharing. The basic structure of the classes is given in the following diagram:
The abstract and mixin methods are as follows:
ABC | Inherits from | Abstract Methods | Mixin methods |
---|---|---|---|
HasDimensions | | dimensions | size, dimnames |
HasSamples | HasDimensions | __getitem__ | __contains__, __iter__, iteritems, items, indexes, ids |
ContainsSamples | HasSamples | samples | __getitem__, __setitem__, dimensions |
ForwardsSamples | HasSamples | source, index_to_target, index_to_source, combine_samples, combine_dimensions | __getitem__, dimensions |
Note
Though the flow is currently working like this, the mixins are not yet created.
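As an illustration of the table above, the first two classes could be sketched with Python's `abc` module roughly as follows. This is not Fastr code: the representation of `dimensions` as `(name, size)` pairs, the use of index tuples, and the `DictSamples` demo class are all assumptions made for the example, and only a subset of the mixin methods is shown.

```python
import itertools
from abc import ABC, abstractmethod


class HasDimensions(ABC):
    """Anything that exposes named, sized dimensions."""

    @property
    @abstractmethod
    def dimensions(self):
        """Tuple of (name, size) pairs; this layout is an assumption."""

    @property
    def size(self):
        # Mixin: derived purely from the abstract ``dimensions``
        return tuple(size for _, size in self.dimensions)

    @property
    def dimnames(self):
        return tuple(name for name, _ in self.dimensions)


class HasSamples(HasDimensions):
    """Read-only access to samples addressed by index tuples."""

    @abstractmethod
    def __getitem__(self, index):
        """Return the sample stored at the given index tuple."""

    def indexes(self):
        # Every index tuple spanned by the dimensions
        return list(itertools.product(*(range(n) for n in self.size)))

    def __contains__(self, index):
        try:
            self[index]
        except (KeyError, IndexError):
            return False
        return True

    def __iter__(self):
        return iter(self.indexes())

    def items(self):
        return [(index, self[index]) for index in self.indexes()]


class DictSamples(HasSamples):
    """Hypothetical concrete class used only to exercise the mixins."""

    def __init__(self, dimensions, data):
        self._dimensions = tuple(dimensions)
        self._data = dict(data)

    @property
    def dimensions(self):
        return self._dimensions

    def __getitem__(self, index):
        return self._data[index]
```

A concrete class only has to supply `dimensions` and `__getitem__`; the remaining methods come from the mixins, which is the code sharing the design aims for.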
Network Execution¶
The network execution should contain a number of steps:
Network
- Creates a NetworkRun based on the current layout
NetworkRun
- Transform the Network (possibly joining Nodes of certain interface into a combined NodeRun etc)
- Start generation of the Job Directed Acyclic Graph (DAG)
SchedulingPlugin
- Prioritize Jobs based on some predefined rules
- Combine certain Jobs to improve efficiency (e.g. minimize i/o on a grid)
ExecutionPlugin
- Run a (list of) Jobs. If there is more than one job, run them sequentially on the same execution host using a local temp for intermediate files.
- On finished callback: update the DAG with newly ready jobs, or remove cancelled jobs
This could be visualized as the following loop:
The callback of the ExecutionPlugin to the NetworkRun would trigger the execution of the relevant NodeRuns and the addition of more Jobs to the JobDAG.
Note
The Job DAG should be thread-safe as it could be both read and extended at the same time.
Note
If a list of jobs is sent to the ExecutionPlugin to be run as one Job on an external execution platform, the resources should be combined as follows: memory=max, cores=max, runtime=sum.
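The combination rule can be written down in a few lines. The sketch below is only an illustration: the `Resources` class, its field names and its units are hypothetical and are not part of Fastr.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource request; field names and units are assumptions."""
    memory_mb: int
    cores: int
    runtime_s: int


def combine_resources(requests):
    """Combine the requests of jobs that run one after another on one host."""
    requests = list(requests)
    if not requests:
        raise ValueError("cannot combine an empty list of requests")
    return Resources(
        # Jobs run sequentially, so only the largest footprint matters
        memory_mb=max(r.memory_mb for r in requests),
        cores=max(r.cores for r in requests),
        # Wall time adds up because the jobs do not overlap
        runtime_s=sum(r.runtime_s for r in requests),
    )
```

For example, combining a request of 2048 MB, 1 core and 600 s with one of 4096 MB, 4 cores and 300 s gives 4096 MB, 4 cores and 900 s.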
Note
If there are execution hosts that have multiple cores, the ExecutionPlugin should manage this (for example by using pilot jobs). The SchedulingPlugin creates units that should be run sequentially on the resources noted and will not attempt parallelization.
A NetworkRun would contain similar information as the Network but would not have functionality for editing/changing it. It would contain the functionality to execute the Network and track the status and samples. This would allow Network.execute to create multiple concurrent runs that operate independently of each other. Also, editing a Network after the run started would have no effect on that run.
Note
This is a plan, not yet implemented
Note
For this to work, it would be important for Jobs to have forward and backward dependency links.
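To make the notes on thread safety and dependency links concrete, here is a minimal sketch of a job graph that keeps both link directions and guards every access with a lock. The class name `JobDAG` follows the text, but the methods `add_job` and `job_finished`, and the use of plain strings as job ids, are assumptions for this example and not the Fastr design.

```python
import threading
from collections import defaultdict


class JobDAG:
    """Hypothetical thread-safe job graph; not the Fastr implementation."""

    def __init__(self):
        self._lock = threading.Lock()
        # Backward links: job -> prerequisites that have not finished yet
        self._waiting_on = {}
        # Forward links: job -> jobs that are waiting for it
        self._dependents = defaultdict(set)
        self._finished = set()

    def add_job(self, job_id, depends_on=()):
        """Add a job; return True if it can be queued immediately."""
        with self._lock:
            pending = {dep for dep in depends_on if dep not in self._finished}
            self._waiting_on[job_id] = pending
            for dep in pending:
                self._dependents[dep].add(job_id)
            return not pending

    def job_finished(self, job_id):
        """Callback: mark a job finished and return the jobs that became ready."""
        with self._lock:
            self._finished.add(job_id)
            self._waiting_on.pop(job_id, None)
            ready = []
            # Follow the forward links to find jobs this one was blocking
            for dependent in sorted(self._dependents.pop(job_id, ())):
                waiting = self._waiting_on.get(dependent)
                if waiting is None:
                    continue
                waiting.discard(job_id)
                if not waiting:
                    ready.append(dependent)
            return ready
```

The forward links make the finished callback cheap, since only the direct dependents of the finished job are inspected, while the backward links tell when a dependent has no unfinished prerequisites left. A single lock is the simplest way to allow reading and extending the graph from several threads; a finer-grained scheme may be preferable under heavy load.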
SchedulingPlugins¶
The idea of the plugin is that it would give a priority to Jobs created by a Network. This could be done based on different strategies:
- Based on (sorted) sample ids, so that one sample is always prioritized over others. The idea is that samples are processed in order as much as possible, finishing the first sample first, and only processing other samples if there is left-over capacity.
- Based on distance to a (particular) Sink. This is to generate specific results as quickly as possible. It would not focus on specific samples, but give priority to whatever sample is closest to being finished.
- Based on the distance from a Source. Based on the sign of the weight it would either keep all samples on the same stage as much as possible, only progressing to a new NodeRun when all samples are done with the previous NodeRun, or it would push samples ahead at an accelerated rate.
Additionally it will group Jobs to be executed on a single host. This could reduce i/o and limit the number of jobs an external scheduler has to track.
Note
The interface for such a plugin has not yet been established.
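Since no interface exists yet, the following is only a sketch of how the three strategies could be expressed as sort keys, where a lower key means a higher priority. The `PendingJob` class, its fields and all function names are hypothetical, and the mapping of the weight's sign to behaviour (positive keeps samples on the same stage, negative pushes samples ahead) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingJob:
    """Hypothetical view of a Job as seen by a scheduler."""
    job_id: str
    sample_id: str
    distance_to_sink: int      # NodeRuns left before reaching a Sink
    distance_from_source: int  # NodeRuns already passed since a Source


def by_sample_id(job):
    # Finish the first sample first; within a sample, finish the latest stage first
    return (job.sample_id, job.distance_to_sink)


def by_distance_to_sink(job):
    # Whatever is closest to producing a result goes first
    return (job.distance_to_sink, job.sample_id)


def by_distance_from_source(weight):
    """Positive weight: early stages first. Negative weight: late stages first."""
    def key(job):
        return (weight * job.distance_from_source, job.sample_id)
    return key


def prioritize(jobs, key):
    """Return the jobs ordered from highest to lowest priority."""
    return sorted(jobs, key=key)
```

Expressing each strategy as a key function keeps the ordering logic separate from the grouping of Jobs per host, so the two concerns could be developed independently.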
Secrets¶
“Something that is kept or meant to be kept unknown or unseen by others.”
Using secrets¶
Fastr IOPlugins that need authentication data should use the Fastr SecretService for retrieving such data. The SecretService can be used as follows.
from fastr.utils.secrets import SecretService
from fastr.utils.secrets.exceptions import CouldNotRetrieveCredentials

secret_service = SecretService()

try:
    password = secret_service.find_password_for_user('testserver.lan:9000', 'john-doe')
except CouldNotRetrieveCredentials:
    # the password was not found
    pass
Implementing a SecretProvider¶
A SecretProvider is implemented as follows:
- Create a file in fastr/utils/secrets/providers/<yourprovidername>.py
- Use the template below to write your SecretProvider
- Add the secret provider to fastr/utils/secrets/providers/__init__.py
- Add the secret provider to fastr/utils/secrets/secretservice.py: import it and add it to the array in function _init_providers
from fastr.utils.secrets.secretprovider import SecretProvider
# Note: this NotImplemented is the Fastr exception class; within this
# module it shadows the Python built-in constant of the same name.
from fastr.utils.secrets.exceptions import CouldNotRetrieveCredentials, CouldNotSetCredentials, CouldNotDeleteCredentials, NotImplemented

try:
    # this is where libraries can be imported
    # we don't want fastr to crash if a specific
    # library is unavailable
    # import my_library
    pass
except (ImportError, ValueError):
    pass


class KeyringProvider(SecretProvider):
    def __init__(self):
        # if libraries are imported in the code above
        # we need to check if the import was successful
        # if it was not, raise a RuntimeError
        # so that Fastr ignores this SecretProvider
        # if 'my_library' not in globals():
        #     raise RuntimeError("my_library module required")
        pass

    def get_password_for_user(self, machine, username):
        # This function should return the password as a string
        # or raise a CouldNotRetrieveCredentials error if the password
        # is not found.
        # In the event that this function is unsupported a
        # NotImplemented exception should be thrown
        raise NotImplemented()

    def set_password_for_user(self, machine, username, password):
        # This function should set the password for a specified
        # machine + user. If anything goes wrong while setting
        # the password a CouldNotSetCredentials error should be raised.
        # In the event that this function is unsupported a
        # NotImplemented exception should be thrown
        raise NotImplemented()

    def del_password_for_user(self, machine, username):
        # This function should delete the password for a specified
        # machine + user. If anything goes wrong while deleting
        # the password a CouldNotDeleteCredentials error should be raised.
        # In the event that this function is unsupported a
        # NotImplemented exception should be thrown
        raise NotImplemented()
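As a filled-in counterpart to the template, the sketch below keeps passwords in a dictionary, which could be handy for tests. `InMemoryProvider` is a hypothetical name. So that the example runs without Fastr installed, `SecretProvider` and the exceptions are defined here as local stand-ins; in a real provider they would be imported from `fastr.utils.secrets` as shown in the template.

```python
class CouldNotRetrieveCredentials(Exception):
    """Local stand-in for the Fastr exception of the same name."""


class CouldNotDeleteCredentials(Exception):
    """Local stand-in for the Fastr exception of the same name."""


class SecretProvider:
    """Local stand-in for fastr.utils.secrets.secretprovider.SecretProvider."""


class InMemoryProvider(SecretProvider):
    """Hypothetical provider that keeps passwords in a dict (testing only)."""

    def __init__(self):
        # Passwords are keyed on the (machine, username) pair
        self._store = {}

    def get_password_for_user(self, machine, username):
        try:
            return self._store[(machine, username)]
        except KeyError:
            raise CouldNotRetrieveCredentials(
                f"no password stored for {username} on {machine}"
            ) from None

    def set_password_for_user(self, machine, username, password):
        self._store[(machine, username)] = password

    def del_password_for_user(self, machine, username):
        try:
            del self._store[(machine, username)]
        except KeyError:
            raise CouldNotDeleteCredentials(
                f"no password stored for {username} on {machine}"
            ) from None
```

Translating `KeyError` into the provider exceptions keeps callers independent of how a given provider stores its secrets.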