Releases: martian-lang/martian
Release v4.0.4
- `pug` is no longer used to build graph.html for the user interface.
- The minified graph.html for the user interface is now deployed in the `serve` directory with everything else, rather than in `templates`.
- Memory request information is now reported in the `_perf` file, alongside actual memory usage reporting.
- The scripts used to check whether a job is still running on SGE or Slurm clusters are now compatible with Python 3.
- The UUID set for the pipestance is now exposed to stage code via the `MRO_UUID` environment variable. The `martian` helper module for Python stage code exposes this through the `get_pipestance_uuid()` convenience method.
- Added type hint comments for many of the functions in the `martian` helper module for Python stage code. In particular, added `NoReturn` annotations to the `martian.throw()` and `martian.exit()` functions, to get better results from linting tools on code using those methods.
- Various minor dependency updates.
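Here is a minimal sketch of what these two changes enable in Python stage code. It reads the `MRO_UUID` environment variable and uses a `NoReturn`-annotated failure helper. The functions `pipestance_uuid`, `fail`, and `require_uuid` are hypothetical stand-ins written for this example, not the `martian` module's real implementation. Only the variable name and the `NoReturn` idea come from the notes above.

```python
import os
from typing import NoReturn, Optional


def pipestance_uuid() -> Optional[str]:
    """Hypothetical stand-in for martian.get_pipestance_uuid().

    Reads the MRO_UUID environment variable exported to stage code and
    returns None when it is not set (e.g. when run outside mrp).
    """
    return os.environ.get("MRO_UUID")


def fail(message: str) -> NoReturn:
    """Hypothetical stand-in for martian.throw(): it never returns.

    The NoReturn annotation tells type checkers and linters that any
    code following a call to fail() is unreachable.
    """
    raise RuntimeError(message)


def require_uuid() -> str:
    uuid = pipestance_uuid()
    if uuid is None:
        fail("MRO_UUID is not set")
    # Because fail() is annotated NoReturn, a checker such as mypy knows
    # that uuid is a str here, not Optional[str].
    return uuid
```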
Martian 4.0.3
This is a minor bugfix release.
- Fix some typos in log messages.
- Updates to the vscode extension for mro files. The extension is now published to the vscode extension marketplace for easy installation.
- Clean up logging around resource exhaustion.
- Bump several dependency versions.
- Fix the symlinks for `mrc`, `mrf`, etc. in the release tarball.
- Add new API methods in the `github.com/martian-lang/martian/martian/syntax/ast_builder` package for building `syntax.Ast` and related objects from Go structures, via reflection.
Release v4.0.2
This is a bugfix release.
- Fix a bunch more bugs around map call.
  - Fix some cases where map call is used inside a pipeline which was also map-called.
  - Fix restarting of pipelines which use map call over collections where the size is not known until runtime.
  - Fix map call over a collection which is the output of a stage which might be disabled.
- Fix a couple of bugs related to filtering struct inputs or outputs.
- Improvements to the vscode extension.
  - Fix a few syntax highlighting glitches.
  - Add support for calling mrf as a formatter.
Release v4.0.1
Compatibility fixes
Paths with "unusual" characters
Paths and environment variables expanded in cluster-mode templates are
now quoted where appropriate, to avoid issues if paths contain
whitespace or characters like ; or &.
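The Python sketch below uses the standard library's `shlex.quote` to show why quoting matters. The `TEMPLATE` string and `render` function are hypothetical. Martian's actual cluster-mode templates use their own placeholder syntax, which is not reproduced here.

```python
import shlex
from typing import Sequence

# Hypothetical template; real Martian cluster-mode templates use their
# own placeholder syntax, which this sketch does not reproduce.
TEMPLATE = "cd {jobdir} && {cmd}"


def render(jobdir: str, cmd: Sequence[str]) -> str:
    # shlex.quote() wraps any value containing whitespace or shell
    # metacharacters such as ';' or '&' in single quotes, so the shell
    # sees it as one literal word instead of extra commands.
    return TEMPLATE.format(
        jobdir=shlex.quote(jobdir),
        cmd=" ".join(shlex.quote(arg) for arg in cmd),
    )
```

Without quoting, a directory named `/data/my run; rm -rf ~` would be split at the space and the `;` would start a second command.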
When searching for stage metadata files, martian will no longer use
filepath.Glob(), which can cause problems if the root pipestance
directory contains characters which are interpreted as glob patterns,
such as * or ?.
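Here is a rough Python analogue of the problem, along with a glob-free alternative. For illustration only, it assumes that metadata files are found by matching file names such as `_outs` inside a known directory. `find_files` is a hypothetical helper; Martian itself does this in Go.

```python
import fnmatch
import os
from typing import List


def find_files(root: str, name_pattern: str) -> List[str]:
    """Return paths in `root` whose file *names* match `name_pattern`.

    Only the names are matched against the pattern. `root` is used
    verbatim, so a directory such as "/runs/sample[1]" or "/runs/what?"
    is never misread as a glob pattern.
    """
    return sorted(
        os.path.join(root, name)
        for name in os.listdir(root)
        if fnmatch.fnmatchcase(name, name_pattern)
    )
```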
Python 3.6
In Python 2, strings are bytes, whereas in Python 3 they're unicode.
Python 3.7 will use the system encoding (usually utf-8) for conversions
by default, but 3.6 seems to default to ASCII, which can cause problems.
Also, reformat all Python files with black.
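In Python, the usual defense is to name the encoding explicitly wherever text crosses a bytes boundary. The sketch below does this for file reads and subprocess output. `read_text` and `run_text` are hypothetical helpers, not part of Martian's Python adapter.

```python
import subprocess
from typing import List


def read_text(path: str) -> str:
    # open() without encoding= uses the locale's preferred encoding, which
    # can be ASCII under a C/POSIX locale on Python 3.6; name UTF-8 instead.
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def run_text(cmd: List[str]) -> str:
    # Capture raw bytes and decode them explicitly, rather than relying on
    # locale-dependent text mode for the subprocess pipe.
    raw = subprocess.check_output(cmd)
    return raw.decode("utf-8")
```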
slurm
Add a queue-check script for slurm, so that mrp can detect when jobs
fail in the queue.
Behavior changes
mro check
`mro check` (formerly `mrc`) will now always attempt to construct a call
graph when checking files. This can find errors which the
pipeline-by-pipeline checker might miss, for example passing null to a
pipeline input that is used to disable a call later on, or passing arrays of
differing lengths where further down the stack something will map call
over both of them.
mrp
On Linux systems, `mrp` will now log the kernel and libc versions of the
host system. This can be very helpful later on when debugging stage
code failures.
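Similar information can be gathered from Python's standard `platform` module, as in this sketch. `host_versions` is a hypothetical helper. `mrp` is written in Go and collects these values its own way. On systems without glibc, `platform.libc_ver()` may return empty strings.

```python
import platform
from typing import Dict


def host_versions() -> Dict[str, str]:
    """Collect the OS name, kernel release, and libc version of this host."""
    libc, libc_version = platform.libc_ver()
    return {
        "system": platform.system(),      # e.g. "Linux"
        "kernel": platform.release(),     # kernel release string
        "libc": f"{libc} {libc_version}".strip(),
    }
```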
Bug fixes
Error reporting
Fix several cases where martian would crash instead of reporting a
useful error message.
Several error messages now provide more helpful context.
map call
Several fixes were made for `map call`.
Fixed many bugs for cases where `map call` is used inside a pipeline
that is itself mapped.
The outputs of mapped pipelines, where the map was over an array or
map whose length or keys were known at mro compile time should now be
correct. Previously, there were cases where one might end up with a
struct of arrays instead of an array of structs as intended.
The invocation files for pipelines in the above cases will now use
`split` appropriately rather than erroneous values.
Martian 4.0.0
Martian 4 is a major version update, meaning it may contain breaking
changes as well as new features.
The main features for this release are an overhauled type system with
support for typed maps and structs, and a `map call` construct, which
replaces and extends the (now removed) `sweep` functionality.
New types
Typed maps
Stage and pipeline input and output parameters may now be declared with
a type like `map<int>`. These are dictionaries with string keys and
values of the given type. If a top-level pipeline has an output that is
a map over file types, for example `map<csv>`, then the resulting output
directory will contain a subdirectory with the name of that parameter,
within which there will be files named like `<map key>.csv`. In order
to encourage clarity in pipeline definitions, directly nesting typed maps
(e.g. `map<map>` or `map<map<int>>`) is not permitted; however, one can have
a map of structs (see below) which may contain more typed maps.
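The output layout rule amounts to a small path calculation. This sketch assumes a hypothetical output parameter named `tables` of type `map<csv>` and an output directory called `outs`. `map_output_paths` is not a Martian API.

```python
import os
from typing import Dict, Iterable


def map_output_paths(outs_dir: str, param: str, keys: Iterable[str],
                     filetype: str) -> Dict[str, str]:
    """Destination of each file for a top-level `map<filetype>` output.

    Layout as described above: a subdirectory named after the output
    parameter, holding one `<map key>.<filetype>` file per key.
    """
    subdir = os.path.join(outs_dir, param)
    return {key: os.path.join(subdir, f"{key}.{filetype}") for key in keys}
```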
Struct types
It is now possible to declare struct data types as well. These look
just like stage or pipeline definitions, except they have no `in` or
`out` specifiers on the parameter names (and of course have no calls or
`src` parameter). Similar to typed maps, structs which contain file
types also get a directory in the top-level output. They can of course
be nested, and there may be typed maps of structs. This, at long last,
allows pipelines to organize the files in their output directories into
subdirectories.
With the addition of structs, the previous behavior where passing
`foo = STAGE_NAME,` was equivalent to `foo = STAGE_NAME.default`
is no longer assumed. Instead of an implicit reference to the "default"
output, the bound reference is now to a struct containing all of the
stage's outputs. In order to support this, any stage or pipeline name
now implicitly defines a struct with the same members as the output
parameters of the stage or pipeline.
Structs are decomposable, so `STAGE_NAME.foo.bar.baz` is legal if `foo` is
a struct with a member `bar`, which is also a struct with a member `baz`.
In addition, the `.` operator allows "projecting" through typed maps and arrays.
That is, if we have
```
struct Bar (
    int baz,
)

struct Foo (
    Bar[] bar,
)

stage STAGE(
    out map<Foo> foo,
    src comp "stagecode",
)
```
then `STAGE.foo.bar.baz` would have a type of `map<int[]>`. This
becomes especially useful when working with the next feature, map calls.
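The Python sketch below makes the projection rule concrete. It models structs as plain dicts, arrays as lists, and typed maps as a hypothetical `TypedMap` dict subclass so that the two kinds of dict can be told apart. `project` passes through arrays and typed maps, keeping their shape, and descends into structs. This reproduces the `map<int[]>` result for the `STAGE.foo.bar.baz` example.

```python
from typing import Any, Sequence


class TypedMap(dict):
    """Hypothetical marker: a dict that stands for an MRO typed map."""


def project(value: Any, path: Sequence[str]) -> Any:
    """Apply a `.a.b.c` projection to plain Python data."""
    if not path:
        return value
    if isinstance(value, list):
        # Arrays: project every element, keeping the array shape.
        return [project(item, path) for item in value]
    if isinstance(value, TypedMap):
        # Typed maps: project every value, keeping the same keys.
        return TypedMap({key: project(item, path) for key, item in value.items()})
    # Structs: select one member and continue with the rest of the path.
    return project(value[path[0]], path[1:])
```

For instance, if `STAGE.foo` holds two keys whose `bar` arrays contain `baz` values `[1, 2]` and `[3]`, projecting `["bar", "baz"]` yields a typed map from each key to its list of ints.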
An input parameter that takes a [map or array of] struct A can accept
a [map or array of] another struct B so long as all of the members of A
are present on B and have the same types. If B has members which A does
not have, they are filtered out when generating the arguments which are
passed to stage code. This allows stages to add additional output
fields without breaking downstream users.
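The filtering rule can be sketched as follows, with a struct type described as a mapping from member name to Python type. `filter_struct` is a hypothetical helper. Comparing Python types stands in for comparing MRO types, and allowing a null member value is an assumption of this sketch.

```python
from typing import Any, Dict


def filter_struct(value: Dict[str, Any], members: Dict[str, type]) -> Dict[str, Any]:
    """Pass a struct B where a struct A (described by `members`) is expected.

    Every member of A must be present on B with a matching type; members
    that only B has are dropped before the value reaches stage code.
    """
    missing = [name for name in members if name not in value]
    if missing:
        raise TypeError(f"missing members: {missing}")
    for name, expected in members.items():
        # Assumption for this sketch: a null (None) member is acceptable.
        if value[name] is not None and not isinstance(value[name], expected):
            raise TypeError(f"member {name!r} is not {expected.__name__}")
    return {name: value[name] for name in members}
```

For an array or typed map of structs, the same function would be applied to each element.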
Map calls
It is now possible to call a stage or pipeline once for each value in an
array or typed map. That is,
```
map call ANALYZE(
    sample = split self.samples,
    params = self.params,
)
```
In a `map call`, at least one parameter's value must be preceded by the
`split` keyword. If more than one parameter is split, all such
parameters must either be arrays with the same length, maps with the
same set of keys, or null. In this example, `ANALYZE` is called once
for every value in `samples`. If `ANALYZE` is a pipeline, and some of
the stages within it don't depend on `sample` (or on other stages which
do), the work for those stages gets shared between each call.
If `samples` was an array, then the result of this call is an array with
the same length. If it was a map, the result is a map with the same keys.
This allows for reducing the data, e.g.

```
call META_ANALYSIS(
    analyses = ANALYZE,
)
```
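The shape rules for `map call` can be simulated on ordinary Python data, as in this sketch. `map_call` is hypothetical. It checks the split-argument constraints stated above and returns a list for array splits, or a dict with the same keys for map splits. Passing a null split argument through as null to every call is an assumption made for the example.

```python
from typing import Any, Callable, Dict


def map_call(fn: Callable[..., Any], split: Dict[str, Any], fixed: Dict[str, Any]) -> Any:
    """Call `fn` once per element of the split arguments."""
    if not split:
        raise ValueError("a map call needs at least one split parameter")
    collections = [v for v in split.values() if v is not None]
    if not collections:
        return None  # assumption for this sketch: nothing to map over
    first = collections[0]
    if all(isinstance(v, list) for v in collections):
        if any(len(v) != len(first) for v in collections):
            raise ValueError("split arrays must have the same length")
        keys = list(range(len(first)))
    elif all(isinstance(v, dict) for v in collections):
        if any(v.keys() != first.keys() for v in collections):
            raise ValueError("split maps must have the same set of keys")
        keys = list(first.keys())
    else:
        raise ValueError("split arrays and maps cannot be mixed")

    def call_one(key: Any) -> Any:
        args = dict(fixed)
        for name, v in split.items():
            # Assumption: a null split argument is passed as null to every call.
            args[name] = None if v is None else v[key]
        return fn(**args)

    results = [call_one(k) for k in keys]
    # Array in, array of the same length out; map in, map with the same keys out.
    return results if isinstance(first, list) else dict(zip(keys, results))
```

Passing the returned list or dict whole to another function corresponds to the `META_ANALYSIS` reduction shown above.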
Tools
- The `mrc` and `mrf` commands have been merged into a single `mro` command. Symlink aliases for `mrc` and `mrf` still work.
  - `mro check` works just like `mrc`.
  - `mro format` works just like `mrf`.
  - `mro graph` has options for querying the call graph of a pipeline, including outputting the entire graph to json or graphviz dot format, querying the source of an input to a call, or tracing the stages which depend on the output of a call.
  - `mro edit` has various refactoring tools for renaming stages, inputs, and outputs, as well as finding and eliminating unused outputs or calls.
- The `mrg` command accepts a `--reverse` option which causes it to generate an invocation.json file from a given mro file.
- When a syntax error is encountered in `mrc`, the expected token is now provided.
Runtime changes
- It is now an error for a call to be disabled based on a null value. Previously, null was treated as equivalent to `false`, which was not always the author's intent.
- Thread reservations may now be in terms of 100ths of a core. This is intended for use in stages which, for example, are mostly blocked waiting for external inputs, or perhaps downloading files.
- Memory reservations may now be non-integral numbers of GB. They are tracked at MiB granularity.
- Pre-populated `_outs` files no longer contain strings for keys which are supposed to be arrays of file types.
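The runtime rules above can be sketched in Python as follows. The function names are hypothetical, and treating 1 GB as 1024 MiB is an assumption. The whole-core rounding in cluster mode comes from the v4.0.0 preview notes later on this page. `Decimal(str(x))` keeps a request like `0.07` from being rounded up because of a binary floating-point artifact.

```python
import math
from decimal import Decimal


def is_disabled(value) -> bool:
    """Evaluate a value bound to a call's `disabled` modifier."""
    if value is None:
        # Martian 4 reports this as an error instead of treating it as false.
        raise ValueError("disabled modifier is bound to null")
    if not isinstance(value, bool):
        raise TypeError("disabled modifier must be a bool")
    return value


def round_threads(threads: float, cluster_mode: bool = False) -> Decimal:
    """Round a thread request up to 1/100 of a core (whole cores in cluster mode)."""
    requested = Decimal(str(threads))  # str() avoids binary float artifacts
    if cluster_mode:
        return Decimal(math.ceil(requested))
    return Decimal(math.ceil(requested * 100)) / 100


def mem_gb_to_mib(mem_gb: float) -> int:
    """Convert a fractional GB request to whole MiB, rounding up.

    Assumes 1 GB = 1024 MiB for this sketch.
    """
    return math.ceil(Decimal(str(mem_gb)) * 1024)
```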
Other changes
- The mro parser is now significantly faster and uses less memory.
- `mrjob` and `mro` can now be compiled and run on darwin OS.
- The build now relies entirely on Go modules, rather than submodules.
- coffeescript is no longer involved in the build for the web front-end.
- Remove vendored web dependencies. Rely on npm.
- The journal files used for coordination between `mrp` and `mrjob` no longer include the sample ID. Long sample IDs could cause the journal file name to exceed the filesystem's file name length limits.
- The repository now includes bazel rule definitions for `mro_library`, `mro_test`, and `mrf_test`, among others. See the documentation in tools/docs/mro_rules.md.
v4.0.0 preview 1
Martian 4 is a major version update, meaning it may contain breaking
changes as well as new features.
The main features for this release are an overhauled type system with
support for typed maps and structs, and a `map call` construct, which
replaces and extends the (now removed) `sweep` functionality.
Stage and pipeline input and output parameters may now be declared with
a type like `map<int>`. These are dictionaries with string keys and
values of the given type. If a top-level pipeline has an output that is
a map over file types, for example `map<csv>`, then the resulting output
directory will contain a subdirectory with the name of that parameter,
within which there will be files named like `<map key>.csv`. In order
to encourage clarity in pipeline definitions, directly nesting typed maps
(e.g. `map<map>` or `map<map<int>>`) is not permitted; however, one can have
a map of structs (see below) which may contain more typed maps.
It is now possible to declare struct data types as well. These look
just like stage or pipeline definitions, except they have no `in` or
`out` specifiers on the parameter names (and of course have no calls or
`src` parameter). Similar to typed maps, structs which contain file
types also get a directory in the top-level output. They can of course
be nested, and there may be typed maps of structs. This, at long last,
allows pipelines to organize the files in their output directories into
subdirectories.
With the addition of structs, the previous behavior where passing
`foo = STAGE_NAME,` was equivalent to `foo = STAGE_NAME.default`
is no longer assumed. Instead of an implicit reference to the "default"
output, the bound reference is now to a struct containing all of the
stage's outputs. In order to support this, any stage or pipeline name
now implicitly defines a struct with the same members as the output
parameters of the stage or pipeline.
Structs are decomposable, so `STAGE_NAME.foo.bar.baz` is legal if `foo` is
a struct with a member `bar`, which is also a struct with a member `baz`.
In addition, the `.` operator allows "projecting" through typed maps and arrays.
That is, if we have
```
struct Bar (
    int baz,
)

struct Foo (
    Bar[] bar,
)

stage STAGE(
    out map<Foo> foo,
    src comp "stagecode",
)
```
then `STAGE.foo.bar.baz` would have a type of `map<int[]>`. This
becomes especially useful when working with the next feature, map calls.
An input parameter that takes a [map or array of] struct A can accept
a [map or array of] another struct B so long as all of the members of A
are present on B and have the same types. If B has members which A does
not have, they are filtered out when generating the arguments which are
passed to stage code. This allows stages to add additional output
fields without breaking downstream users.
It is now possible to call a stage or pipeline once for each value in an
array or typed map. That is,
```
map call ANALYZE(
    sample = split self.samples,
    params = self.params,
)
```
In a `map call`, at least one parameter's value must be preceded by the
`split` keyword. If more than one parameter is split, all such
parameters must either be arrays with the same length, maps with the
same set of keys, or null. In this example, `ANALYZE` is called once
for every value in `samples`. If `ANALYZE` is a pipeline, and some of
the stages within it don't depend on `sample` (or on other stages which
do), the work for those stages gets shared between each call.
If `samples` was an array, then the result of this call is an array with
the same length. If it was a map, the result is a map with the same keys.
This allows for reducing the data, e.g.

```
call META_ANALYSIS(
    analyses = ANALYZE,
)
```
- Previously, if a stage had an output that was an array of files, either `file[]` or a user-defined type like `json[]`, the `_outs` file for the stage code was pre-populated with a string file name. A string is, of course, not an array, so this was incorrect behavior. The field will now be pre-populated with an empty array.
- Values which are bound to `disabled` modifiers for calls are no longer permitted to have `null` values. Previously, `null` was treated as equivalent to `false`. It is now an error.
- The mro parser is now significantly faster and uses less memory.
- The `mrc` and `mrf` commands have been combined into a new `mro` tool with subcommands `check` and `format` corresponding to the functionality of the old tools, e.g. `mro check` is equivalent to `mrc`.
- The newly combined `mro` tool has two additional subcommands with new functionality. `mro graph` has options for displaying or querying the call graph for a pipeline. `mro edit` has options for refactoring pipeline code, for example removing or renaming inputs or outputs, and finding and eliminating dead code.
- The `mrg` command accepts a `--reverse` option which causes it to generate an `invocation.json` file from a given `.mro` file.
- `mrjob` can now be compiled and run on darwin OS.
- Some packages were refactored, resulting in reduced sizes for the compiled binaries.
- Stage thread and memory requests may now be floating point numbers. Values will be rounded up to the nearest 1% of 1 thread or 1MB. The use case for fractional thread reservations would be, for example, a stage which might be waiting for an external signal, e.g. a file coming into existence, or a stage which has a CPU-intensive task on one thread and a secondary thread that is compressing the output stream but is mostly blocked waiting for the first thread. In cluster mode, reservations are still rounded up to the nearest integer.
- A new option `--vdrmode=strict` turns on `volatile = strict` for all stages by default, unless they have explicitly opted out of it. This is expected to be turned on by default in a future version.
- `mrp` now accepts flags to add an https certificate and private key file for serving its user interface. Key management is the user's responsibility.
- The bazel build is no longer experimental. Bazel rules for working with pipelines are provided in the `tools` subdirectory.
- The syntax highlighting grammar for sublime and atom editors has been significantly updated, and a Visual Studio Code language definition is also provided.
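Here is a hypothetical sketch of the `_outs` pre-population change. Which built-in types count as non-file types, and the `<name>.<filetype>` naming for single file outputs, are assumptions made for illustration. Only the empty-list behavior for arrays of files comes from the notes above.

```python
import os
from typing import Any

# Assumption for this sketch: these built-in types are not file types;
# anything else (file, or a declared filetype such as json or csv) is.
NON_FILE_TYPES = {"int", "float", "string", "bool", "map"}


def initial_out_value(outs_dir: str, name: str, mro_type: str) -> Any:
    """Hypothetical pre-populated `_outs` value for one stage output."""
    is_array = mro_type.endswith("[]")
    base = mro_type[:-2] if is_array else mro_type
    if base in NON_FILE_TYPES:
        return None
    if is_array:
        # The fix described above: arrays of files start as an empty list,
        # not as a single file-name string.
        return []
    # Hypothetical naming: <outs_dir>/<name>.<filetype>, or just <name> for `file`.
    return os.path.join(outs_dir, name if base == "file" else f"{name}.{base}")
```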
Release v3.2.5
This is expected to be the final release before the 4.0 series.
Behavior changes:
- `mrp`/`mrc` will now return errors if an `@include` directive could refer to more than one possible file, rather than taking the first one it finds (which can be hard to debug).
- The json structure accepted by `mrg` now accepts an `mro_file` key to specify the file to `@include` for the call, rather than forcing it to do an exhaustive search of all mro files in the `MROPATH`.
- Include paths in `_invocation` files are now relative to the `MROPATH`.
- `_invocation` files are now produced by the same formatting code used in `mrf`. Previously, parameters were formatted as if they were json, which was compatible with mro, but `mrf` formats a few things slightly differently.
- Remove `mrt`, which has been broken for quite some time.
- `mrc` and `mrp` now both use the same logic for locating stage code.
- `mrjob` will now shut down (killing the stage code) if it detects its log file has been deleted (which usually means the upstream `mrp` thinks the job is already dead, or that the entire pipestance was deleted).
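The `@include` rule and the relative include paths can be sketched in Python as below, treating `MROPATH` as a list of directories. `resolve_include`, `relative_to_mropath`, and `AmbiguousIncludeError` are hypothetical names. Collapsing duplicate entries via `os.path.realpath` is a choice made for this sketch, not documented Martian behavior.

```python
import os
from typing import List, Sequence


class AmbiguousIncludeError(Exception):
    """Raised when an @include name matches files in several MROPATH entries."""


def resolve_include(name: str, mropath: Sequence[str]) -> str:
    matches: List[str] = []
    for directory in mropath:
        candidate = os.path.realpath(os.path.join(directory, name))
        # realpath() lets duplicate MROPATH entries collapse to one match.
        if os.path.isfile(candidate) and candidate not in matches:
            matches.append(candidate)
    if not matches:
        raise FileNotFoundError(f"{name!r} not found on MROPATH")
    if len(matches) > 1:
        # Refuse to guess instead of silently taking the first match.
        raise AmbiguousIncludeError(f"{name!r} could refer to any of {matches}")
    return matches[0]


def relative_to_mropath(path: str, mropath: Sequence[str]) -> str:
    """Express `path` relative to the first MROPATH entry that contains it."""
    for directory in mropath:
        rel = os.path.relpath(path, directory)
        if rel != os.pardir and not rel.startswith(os.pardir + os.sep):
            return rel
    raise ValueError(f"{path!r} is not under any MROPATH entry")
```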
Enhancements:
- Update the syntax highlighting grammar for sublimetext, and share it with atom and (new) vscode syntax highlighting configs.
- Improve test coverage, silence some expected warning messages from tests, and fix many golint errors (though by no means all of them).
- Expose methods in the syntax package for parsing files without checking them for consistency and/or processing includes.
- Add `Unwrap` methods to several error types, for forward compatibility with Go 1.13. The minimum version required to build martian remains Go 1.11 for now, and the release is built with Go 1.12.
- Add methods to the syntax package to parse expressions independently of a full mro file.
Bug fixes:
- Fix a bug with VDR which was preventing some files generated by strict-mode stages from being deleted when appropriate.
- Fix a crash when parsing 2-byte unicode characters whose utf-8 representations require 3 bytes.
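"2-byte unicode characters" presumably means code points that fit in 16 bits, i.e. a single UTF-16 code unit. Those from U+0800 to U+FFFF, such as `€` (U+20AC), still need 3 bytes in UTF-8. This small Python sketch shows the standard UTF-8 length rule. `utf8_length` is a hypothetical helper, not Martian's parser code.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses to encode `code_point`."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3  # 16-bit ("2-byte") code points from U+0800 upward
    return 4
```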
v3.2.5-pre1
Release candidate 3.2.5
v3.2.4 release
* Expanded the API for the `syntax` package to allow parsing without compiling, and to allow parsing of value expressions from strings.
* Fixed an issue where the wrong Python module could, under some circumstances, be imported by the Python adapter in place of the correct stage code.
* The `_invocation` files for each stage and subpipeline, as well as the mro generated by `mrg`, are now generated by the same formatter code used by `mrf`.
* Names which are not legal POSIX filenames are no longer allowed as explicit output names on output parameters for stages or pipelines. In addition, `mrc` will reject names which are illegal file names on Microsoft Windows filesystems.
* Minor build system improvements.
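Here is a rough sketch of the two file-name checks. `is_posix_filename` and `is_windows_filename` are hypothetical helpers covering the commonly documented rules. On POSIX, names may not contain `/` or NUL. On Windows, names may not contain `<>:"/\|?*` or control characters, may not end in a dot or space, and may not be a classic reserved device name such as `CON` or `LPT1`. These are not necessarily the exact checks `mrc` performs.

```python
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')
WINDOWS_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)


def is_posix_filename(name: str) -> bool:
    # POSIX only forbids '/' and NUL in a file name; '.' and '..' are
    # reserved directory entries, and a name cannot be empty.
    return name not in ("", ".", "..") and "/" not in name and "\0" not in name


def is_windows_filename(name: str) -> bool:
    if not is_posix_filename(name):
        return False
    if any(ch in WINDOWS_RESERVED_CHARS or ord(ch) < 32 for ch in name):
        return False
    if name.endswith((" ", ".")):
        return False
    # Reserved device names are rejected with or without an extension.
    return name.split(".", 1)[0].upper() not in WINDOWS_RESERVED_NAMES
```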
v3.2.3 Release
- Regardless of the `--jobinterval` setting, `mrp` will now never have more than one queue command pending at a time, to avoid overloading the job server or equivalent.
- `mrp` will now shut down if the pipestance log file has been deleted, even if a new one has been created in its place. This prevents bad behavior in the somewhat common case where the pipestance directory (including the log and lock files) has been deleted.
- The script to check the status of the job queue on SGE has been modified to be faster and use less RAM.
- Fixed a bug in handling of the retain parameter for pipelines.
- The mount options are now reported for the mounts on which the pipestance and binary directories are located.
- Memory cgroups limits are now detected, reported, and used as default limits where applicable. This should be especially helpful for users submitting `mrp` to a cluster such as SLURM which uses memory cgroups to prevent jobs from using too much memory, by preventing `mrp` from trying to use more than the job's allowance.
- Martian can now (experimentally) be built with bazel.
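One way to detect that a log file has been deleted or replaced is to compare the file's device and inode numbers against those recorded at startup, as in the Python sketch below. `LogWatcher` is hypothetical and is not `mrp`'s (Go) implementation. Keeping the original log open, as a process writing to it normally would, also prevents its inode number from being reused by the replacement file.

```python
import os


class LogWatcher:
    """Detects when a log file is deleted or replaced by a different file."""

    def __init__(self, path: str):
        self.path = path
        st = os.stat(path)
        # Remember which file the path referred to when watching began.
        self._identity = (st.st_dev, st.st_ino)

    def log_is_gone(self) -> bool:
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return True  # deleted outright
        # A new file at the same path has a different (device, inode) pair.
        return (st.st_dev, st.st_ino) != self._identity
```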