Releases: martian-lang/martian

Release v4.0.4

08 Apr 17:59
1b3a60f

  • pug is no longer used to build graph.html for the user interface.
  • The minified graph.html for the user interface is now deployed in the
    serve directory with everything else, rather than in templates.
  • Memory request information is now reported in the _perf file,
    alongside actual memory usage reporting.
  • The scripts used to check whether a job is still running on SGE or
    Slurm clusters are now compatible with python 3.
  • The UUID set for the pipestance is now exposed to stage code via the
    MRO_UUID environment variable. The martian helper module for
    python stage code exposes this through the get_pipestance_uuid()
    convenience method.
  • Added type hint comments for many of the functions in the martian helper
    module for python stage code. In particular, NoReturn annotations were
    added to the martian.throw() and martian.exit() functions, to get better
    results from linting tools on code using those methods.
  • Various minor dependency updates.
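
As a rough illustration of the two Python-facing items above, the sketch below reads MRO_UUID straight from the environment and uses a NoReturn type comment so that a linter can tell execution stops at the failing call. The names fail and require_pipestance_uuid are hypothetical and are not part of the martian helper module; real stage code would presumably use the helper's get_pipestance_uuid() and martian.throw() instead.

```python
import os
from typing import NoReturn  # referenced only by the type comment below


def fail(message):
    # type: (str) -> NoReturn
    """Hypothetical stand-in for a function that never returns normally."""
    raise RuntimeError(message)


def require_pipestance_uuid(environ=None):
    """Return the value of MRO_UUID, or fail if it is missing or empty."""
    env = os.environ if environ is None else environ
    value = env.get("MRO_UUID")
    if not value:
        fail("MRO_UUID is not set")
    # Because fail() is marked NoReturn, a type checker can treat value
    # as a non-empty str from this point on.
    return value
```

Without the NoReturn marker, a checker would typically warn that value might still be None at the return statement.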

Martian 4.0.3

20 Feb 05:33

This is a minor bugfix release.

  • Fix some typos in log messages.
  • Updates to the vscode extension for mro files. The extension is now published to the vscode extension marketplace for easy installation.
  • Clean up logging around resource exhaustion.
  • Bump several dependency versions.
  • Fix the symlinks for mrc, mrf, etc. in the release tarball.
  • Add new API methods in the github.com/martian-lang/martian/martian/syntax/ast_builder package for building syntax.Ast and related objects from go structures, via reflection.

Release v4.0.2

19 Oct 22:01
3b31faf

This is a bugfix release.

  • Fix a bunch more bugs around map call.
    • Fix some cases where map call is used inside a pipeline which was
      also map-called.
    • Fix restarting of pipelines which use map call over collections
      where the size is not known until runtime.
    • Fix map call over a collection which is the output of a stage which
      might be disabled.
  • Fix a couple of bugs related to filtering struct inputs or outputs.
  • Improvements to the vscode extension.
    • Fix a few syntax highlighting glitches.
    • Add support for calling mrf as a formatter.

Release v4.0.1

26 Aug 02:45

Compatibility fixes

Paths with "unusual" characters

Paths and environment variables expanded in cluster-mode templates are
now quoted where appropriate, to avoid issues if paths contain
whitespace or characters like ; or &.
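
The kind of quoting described above can be sketched in Python with the standard shlex module. This is only an illustration of the idea, not the code Martian uses for its templates; render_command is a hypothetical helper.

```python
import shlex


def render_command(argv):
    """Join arguments into a single shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

For instance, render_command(["mrjob", "/data/my run;v1"]) keeps the path as one argument when the result is handed to a shell, instead of letting ; end the command early.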

When searching for stage metadata files, martian will no longer use
filepath.Glob(), which can cause problems if the root pipestance
directory contains characters which are interpreted as glob patterns,
such as * or ?.
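
The same pitfall exists in Python's glob module, which makes it easy to demonstrate. The sketch below is an analogy rather than Martian's Go code, and it assumes metadata files are the ones whose names start with an underscore; all function names are hypothetical.

```python
import glob
import os
import tempfile


def find_metadata_with_glob(stage_dir):
    """Naive lookup: the directory path itself is treated as a pattern."""
    return sorted(glob.glob(os.path.join(stage_dir, "_*")))


def find_metadata_with_listdir(stage_dir):
    """Safer lookup: only file names are matched, never the directory path."""
    return sorted(
        os.path.join(stage_dir, name)
        for name in os.listdir(stage_dir)
        if name.startswith("_")
    )


def demo():
    """Compare both lookups on a directory whose name contains '?'."""
    with tempfile.TemporaryDirectory() as root:
        for dirname in ("run?", "run1"):
            os.mkdir(os.path.join(root, dirname))
            open(os.path.join(root, dirname, "_log"), "w").close()
        target = os.path.join(root, "run?")
        return find_metadata_with_glob(target), find_metadata_with_listdir(target)
```

In demo(), the glob-based lookup also picks up the file under the sibling run1 directory, because ? in the directory name matches any single character.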

Python 3.6

In Python 2, strings are bytes, whereas in Python 3 they're unicode.
Python 3.7 will use the system encoding (usually utf-8) for conversions
by default, but 3.6 seems to default to ASCII, which can cause problems.
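
One common way to avoid depending on the interpreter's default encoding is to name the encoding explicitly whenever text is read or written. The Python 3 sketch below shows that general technique; it is not taken from the Martian adapter code, and the helper names are hypothetical.

```python
import os
import tempfile


def write_text(path, text):
    """Write text as UTF-8 regardless of the locale's default encoding."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def read_text(path):
    """Read UTF-8 text regardless of the locale's default encoding."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def roundtrip_demo(text):
    """Write and re-read a string containing non-ASCII characters."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "_log")
        write_text(path, text)
        return read_text(path)
```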

Also, all Python files were reformatted with black.

slurm

Add a queue-check script for slurm, so that mrp can detect when jobs
fail in the queue.

Behavior changes

mro check

mro check (formerly mrc) will now always attempt to construct a call
graph when checking files. This can find errors which the
pipeline-by-pipeline checker might miss, for example passing null to a
pipeline input that is used to disable a call later on, or passing arrays of
differing lengths where further down the stack something will map call
over both of them.

mrp

On linux systems, mrp will now log the kernel and libc versions on the
host system. This can be very helpful later on when debugging stage
code failures.

Bug fixes

Error reporting

Fix several cases where martian would crash instead of reporting a
useful error message.

Several error messages now provide more helpful context.

map call

Several fixes were made for map call.

Fixed many bugs for cases where map call is used inside a pipeline
that is itself mapped.

The outputs of mapped pipelines, where the map was over an array or
map whose length or keys were known at mro compile time, should now be
correct. Previously, there were cases where one might end up with a
struct of arrays instead of an array of structs as intended.

The invocation files for pipelines in the above cases will now use
split appropriately rather than erroneous values.

Martian 4.0.0

17 Jun 22:15

Martian 4 is a major version update, meaning it may contain breaking
changes as well as new features.

The main features for this release are an overhauled type system with
support for typed maps and structs, and a map call construct, which
replaces and extends the (now removed) sweep functionality.

New types

Typed maps

Stage and pipeline input and output parameters may now be declared with
a type like map<int>. These are dictionaries with string keys and
values of the given type. If a top-level pipeline has an output that is
a map over file types, for example map<csv>, then the resulting output
directory will contain a subdirectory with the name of that parameter,
within which there will be files named like <map key>.csv. In order
to encourage clarity in pipeline definitions, directly nesting typed maps
(e.g. map<map> or map<map<int>>) is not permitted; however, one can have
a map of structs (see below) which may contain more typed maps.
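
The output layout described above can be sketched as a small path computation. This only mirrors the naming rule stated in these notes (a subdirectory named after the parameter, containing one file per map key); map_output_paths is a hypothetical helper, not part of Martian.

```python
import posixpath


def map_output_paths(param_name, keys, extension):
    """Return the relative output path for each key of a map of files."""
    return {
        key: posixpath.join(param_name, "{}.{}".format(key, extension))
        for key in keys
    }
```

For a map<csv> output named summaries with keys a and b, this gives summaries/a.csv and summaries/b.csv.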

Struct types

It is now possible to declare struct data types as well. These look
just like stage or pipeline definitions, except they have no in or
out specifiers on the parameter names (and of course have no calls or
src parameter). Similar to typed maps, structs which contain file
types also get a directory in the top-level output. They can of course
be nested, and there may be typed maps of structs. This, at long last,
allows pipelines to organize the files in their output directories into
subdirectories.

With the addition of structs, the previous behavior where passing

    foo = STAGE_NAME,

was equivalent to

    foo = STAGE_NAME.default

is no longer assumed. Instead of an implicit reference to the "default"
output, the bound reference is now to a struct containing all of the
stage's outputs. In order to support this, any stage or pipeline name
now implicitly defines a struct with the same members as the output
parameters of the stage or pipeline.

Structs are decomposable, so STAGE_NAME.foo.bar.baz is legal if foo is
a struct with a member bar, which is also a struct with a member baz.
In addition, the . operator allows "projecting" through typed maps and arrays.
That is, if we have

struct Bar (
    int baz,
)

struct Foo (
    Bar[] bar,
)

stage STAGE(
    out map<Foo> foo,
    src comp     "stagecode",
)

then STAGE.foo.bar.baz would have a type of map<int[]>. This
becomes especially useful when working with the next feature, map calls.
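
To make the projection rule concrete, here is a minimal Python model of how the type of STAGE.foo.bar.baz could be derived. The tuple encoding of types and the names STRUCTS, project and format_type are assumptions made for this sketch and say nothing about how the Martian compiler represents types.

```python
# Struct definitions from the example above, in a toy encoding:
# ("map", T), ("array", T) and ("struct", name) wrap other types.
STRUCTS = {
    "Bar": {"baz": "int"},
    "Foo": {"bar": ("array", ("struct", "Bar"))},
}


def project(type_, member):
    """Return the type of `<value of type_>.member`."""
    if isinstance(type_, tuple):
        kind, inner = type_
        if kind in ("map", "array"):
            # Projection passes through the collection and keeps its shape.
            return (kind, project(inner, member))
        if kind == "struct":
            return STRUCTS[inner][member]
    raise TypeError("cannot project {!r} through {!r}".format(member, type_))


def format_type(type_):
    """Render a toy type in mro-like notation."""
    if isinstance(type_, tuple):
        kind, inner = type_
        if kind == "map":
            return "map<{}>".format(format_type(inner))
        if kind == "array":
            return format_type(inner) + "[]"
        return inner
    return type_
```

Starting from ("map", ("struct", "Foo")) and projecting through bar and then baz yields map<int[]>, matching the example.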

An input parameter that takes a [map or array of] struct A can accept
a [map or array of] another struct B so long as all of the members of A
are present on B and have the same types. If B has members which A does
not have, they are filtered out when generating the arguments which are
passed to stage code. This allows stages to add additional output
fields without breaking downstream users.
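
The filtering step can be pictured with plain dictionaries. This is a simplified sketch that checks only member names, not member types, and filter_struct is a hypothetical helper rather than Martian's implementation.

```python
def filter_struct(value, target_members):
    """Keep only the members that the receiving struct type declares."""
    missing = [name for name in target_members if name not in value]
    if missing:
        raise KeyError("missing members: {}".format(", ".join(missing)))
    return {name: value[name] for name in target_members}
```

A value of struct B with members id, path and extra, passed to an input of struct A with members id and path, would arrive in stage code without extra.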

Map calls

It is now possible to call a stage or pipeline once for each value in an
array or typed map. That is,

map call ANALYZE(
    sample = split self.samples,
    params = self.params,
)

In a map call, at least one parameter's value must be preceded by the
split keyword. If more than one parameter is split, all such
parameters must either be arrays with the same length, maps with the
same set of keys, or null. In this example, ANALYZE is called once
for every value in samples. If ANALYZE is a pipeline, and some of
the stages within it don't depend on sample (or on other stages which
do), the work for those stages gets shared between each call.

If samples was an array, then the result of this call is an array with
the same length. If it was a map, the result is a map with the same keys.
This allows for reducing the data, e.g.

call META_ANALYSIS(
    analyses = ANALYZE,
)
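
The rules for split parameters can be modelled in a few lines of Python. This is a sketch of the semantics as described here, not of the runtime: it does not model the sharing of work between calls, and two details are assumptions (a null split parameter is passed as None to every call, and the result is None when every split parameter is null). map_call is a hypothetical name.

```python
def map_call(fn, split_args, shared_args):
    """Call fn once per element of the split arguments."""
    if not split_args:
        raise ValueError("a map call needs at least one split parameter")
    present = [v for v in split_args.values() if v is not None]
    if not present:
        return None  # assumption: nothing to iterate over
    if all(isinstance(v, list) for v in present):
        if len({len(v) for v in present}) != 1:
            raise ValueError("split arrays must have the same length")
        keys, as_array = range(len(present[0])), True
    elif all(isinstance(v, dict) for v in present):
        if len({frozenset(v) for v in present}) != 1:
            raise ValueError("split maps must have the same keys")
        keys, as_array = sorted(present[0]), False
    else:
        raise ValueError("split parameters must be all arrays or all maps")

    def call_for(key):
        args = dict(shared_args)
        for name, value in split_args.items():
            # Assumption: a null split parameter stays null in every call.
            args[name] = None if value is None else value[key]
        return fn(**args)

    if as_array:
        return [call_for(key) for key in keys]
    return {key: call_for(key) for key in keys}
```

An array input produces an array of results and a map input produces a map with the same keys, which is what lets a later call such as META_ANALYSIS consume all of the results at once.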

Tools

  • The mrc and mrf commands have been merged into a single
    mro command. Symlink aliases for mrc and mrf still work.
    • mro check works just like mrc.
    • mro format works just like mrf.
    • mro graph has options for querying the call graph of a pipeline,
      including outputting the entire graph to json or graphviz dot
      format, querying the source of an input to a call, or tracing the
      stages which depend on the output of a call.
    • mro edit has various refactoring tools for renaming stages, inputs,
      and outputs, as well as finding and eliminating unused outputs or calls.
  • The mrg command accepts a --reverse option which causes it to
    generate an invocation.json file from a given mro file.
  • When a syntax error is encountered in mrc, the expected token
    is now provided.

Runtime changes

  • It is now an error for a call to be disabled based on a null value.
    Previously, null was treated as equivalent to false, which was not
    always the author's intent.
  • Thread reservations may now be in terms of 100ths of a core. This
    is intended for use in stages which, for example, are mostly blocked
    waiting for external inputs, or perhaps downloading files.
  • Memory reservations may now be non-integral numbers of GB.
    They are tracked at MiB granularity.
  • Pre-populated _outs files no longer contain strings for keys which
    are supposed to be arrays of file types.
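
The granularity of thread and memory reservations can be illustrated with a little arithmetic. The sketch assumes that values are rounded up (as the 4.0.0 preview notes state) and that 1 GB means 1024 MiB, which these notes do not spell out; both helper names are hypothetical.

```python
import math


def round_threads(threads):
    """Round a thread request up to the nearest hundredth of a core."""
    # round() first, so float noise such as 7.000000000000001 is not
    # pushed up to the next hundredth by ceil().
    return math.ceil(round(threads * 100, 6)) / 100


def mem_gb_to_mib(mem_gb):
    """Convert a memory request in GB to whole MiB, rounding up."""
    return math.ceil(round(mem_gb * 1024, 6))
```

Under these assumptions a request for 0.333 threads is tracked as 0.34, and a request for 1.5 GB is tracked as 1536 MiB.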

Other changes

  • The mro parser is now significantly faster and uses less memory.
  • mrjob and mro can now be compiled and run on darwin OS.
  • The build now relies entirely on go modules, rather than submodules.
  • coffeescript is no longer involved in the build for the web front-end.
  • Remove vendored web dependencies. Rely on npm.
  • The journal files used for coordination between mrp and mrjob
    no longer include the sample ID. Long sample IDs could cause
    the journal file name to exceed the filesystem's file name length
    limits.
  • The repository now includes bazel rule definitions for
    mro_library, mro_test, and mrf_test, among others. See
    the documentation in tools/docs/mro_rules.md.

v4.0.0 preview 1

30 Apr 20:13
Pre-release

Martian 4 is a major version update, meaning it may contain breaking
changes as well as new features.

The main features for this release are an overhauled type system with
support for typed maps and structs, and a map call construct, which
replaces and extends the (now removed) sweep functionality.

Stage and pipeline input and output parameters may now be declared with
a type like map<int>. These are dictionaries with string keys and
values of the given type. If a top-level pipeline has an output that is
a map over file types, for example map<csv>, then the resulting output
directory will contain a subdirectory with the name of that parameter,
within which there will be files named like <map key>.csv. In order
to encourage clarity in pipeline definitions, directly nesting typed maps
(e.g. map<map> or map<map<int>>) is not permitted; however, one can have
a map of structs (see below) which may contain more typed maps.

It is now possible to declare struct data types as well. These look
just like stage or pipeline definitions, except they have no in or
out specifiers on the parameter names (and of course have no calls or
src parameter). Similar to typed maps, structs which contain file
types also get a directory in the top-level output. They can of course
be nested, and there may be typed maps of structs. This, at long last,
allows pipelines to organize the files in their output directories into
subdirectories.

With the addition of structs, the previous behavior where passing

    foo = STAGE_NAME,

was equivalent to

    foo = STAGE_NAME.default

is no longer assumed. Instead of an implicit reference to the "default"
output, the bound reference is now to a struct containing all of the
stage's outputs. In order to support this, any stage or pipeline name
now implicitly defines a struct with the same members as the output
parameters of the stage or pipeline.

Structs are decomposable, so STAGE_NAME.foo.bar.baz is legal if foo is
a struct with a member bar, which is also a struct with a member baz.
In addition, the . operator allows "projecting" through typed maps and arrays.
That is, if we have

struct Bar (
    int baz,
)

struct Foo (
    Bar[] bar,
)

stage STAGE(
    out map<Foo> foo,
    src comp     "stagecode",
)

then STAGE.foo.bar.baz would have a type of map<int[]>. This
becomes especially useful when working with the next feature, map calls.

An input parameter that takes a [map or array of] struct A can accept
a [map or array of] another struct B so long as all of the members of A
are present on B and have the same types. If B has members which A does
not have, they are filtered out when generating the arguments which are
passed to stage code. This allows stages to add additional output
fields without breaking downstream users.

It is now possible to call a stage or pipeline once for each value in an
array or typed map. That is,

map call ANALYZE(
    sample = split self.samples,
    params = self.params,
)

In a map call, at least one parameter's value must be preceded by the
split keyword. If more than one parameter is split, all such
parameters must either be arrays with the same length, maps with the
same set of keys, or null. In this example, ANALYZE is called once
for every value in samples. If ANALYZE is a pipeline, and some of
the stages within it don't depend on sample (or on other stages which
do), the work for those stages gets shared between each call.

If samples was an array, then the result of this call is an array with
the same length. If it was a map, the result is a map with the same keys.
This allows for reducing the data, e.g.

call META_ANALYSIS(
    analyses = ANALYZE,
)

  • Previously, if a stage had an output that was an array of files,
    either file[] or a user-defined type like json[], the _outs file
    for the stage code was pre-populated with a string file name. A
    string is, of course, not an array, so this was incorrect behavior.
    The field will now be pre-populated with an empty array.

  • Values which are bound to disabled modifiers for calls are no longer
    permitted to have null values. Previously, null was treated as
    equivalent to false. It is now an error.

  • The mro parser is now significantly faster and uses less memory.

  • The mrc and mrf commands have been combined into a new mro tool
    with subcommands check and format corresponding to the
    functionality of the old tools, e.g. mro check is equivalent to
    mrc.

  • The newly combined mro tool has two additional subcommands with new
    functionality.

    • mro graph has options for displaying or querying the call graph
      for a pipeline.
    • mro edit has options for refactoring pipeline code, for example
      removing or renaming inputs or outputs, and finding and eliminating
      dead code.
  • The mrg command accepts a --reverse option which causes it to
    generate an invocation.json file from a given .mro file.

  • mrjob can now be compiled and run on darwin OS.

  • Some packages were refactored, resulting in reduced sizes for the
    compiled binaries.

  • Stage thread and memory requests may now be floating point numbers.
    Values will be rounded up to the nearest 1% of 1 thread or 1MB.
    Fractional thread reservations are intended for cases such as a
    stage which is waiting for an external signal, e.g. a file
    coming into existence, or a stage which has a
    CPU-intensive task on one thread and a secondary thread that is
    compressing the output stream but is mostly blocked waiting for the
    first thread. In cluster mode, reservations are still rounded up to
    the nearest integer.

  • A new option --vdrmode=strict turns on volatile = strict for all
    stages by default, unless they have explicitly opted out of it. This
    is expected to be turned on by default in a future version.

  • mrp now accepts flags to add an https certificate and private key
    file for serving its user interface. Key management is the user's
    responsibility.

  • The bazel build is no longer experimental.
    Bazel rules for working with pipelines are provided in the tools
    subdirectory.

  • The syntax highlighting grammar for sublime and atom editors has been
    significantly updated, and a Visual Studio Code language definition
    is also provided.

Release v3.2.5

24 Sep 21:04
03394a4

This is expected to be the final release before the 4.0 series.

Behavior changes:

  • mrp/mrc will now return errors if an @include directive could
    refer to more than one possible file, rather than taking the first
    one it finds (which can be hard to debug).
  • The json structure accepted by mrg now accepts an mro_file key to
    specify the file to @include for the call, rather than forcing it to
    do an exhaustive search of all mro files in the MROPATH.
  • Include paths in _invocation files are now relative to the
    MROPATH.
  • _invocation files are now produced by the same formatting code used
    in mrf. Previously, parameters were formatted as if they were json,
    which was compatible with mro, but mrf formats a few things slightly
    differently.
  • Remove mrt, which has been broken for quite some time.
  • mrc and mrp now both use the same logic for locating stage code.
  • mrjob will now shut down (killing the stage code) if it detects its
    log file has been deleted (which usually means the upstream mrp
    thinks the job is already dead, or that the entire pipestance was
    deleted).
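
The stricter @include handling in the first item above can be sketched as follows. This is a simplified model which assumes that candidates come only from a list of search directories; resolve_include and its is_file parameter are hypothetical and do not reflect Martian's actual lookup code.

```python
import os


def resolve_include(name, search_dirs, is_file=os.path.isfile):
    """Return the single file that `name` refers to, or raise an error."""
    matches = [
        os.path.join(directory, name)
        for directory in search_dirs
        if is_file(os.path.join(directory, name))
    ]
    if not matches:
        raise FileNotFoundError(name)
    if len(matches) > 1:
        # Refuse to guess instead of silently taking the first match.
        raise ValueError(
            "ambiguous include {!r}: {}".format(name, ", ".join(matches))
        )
    return matches[0]
```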

Enhancements:

  • Update the syntax highlighting grammar for sublimetext, and share it
    with atom and (new) vscode syntax highlighting configs.
  • Improve test coverage, silence some expected warning messages from
    tests, and fix many golint errors (though by no means all of them).
  • Expose methods in the syntax package for parsing files without
    checking them for consistency and/or processing includes.
  • Add Unwrap methods to several error types, for forward compatibility
    with Go 1.13. The minimum version required to build martian remains
    Go 1.11 for now, and the release is built with Go 1.12.
  • Add methods to the syntax package to parse expressions independently
    of a full mro file.

Bug fixes:

  • Fix a bug with VDR which was preventing some files generated by
    strict-mode stages from being deleted when appropriate.
  • Fix a crash when parsing 2-byte unicode characters whose utf-8
    representations require 3 bytes.

v3.2.5-pre1

24 Sep 19:51
Release candidate 3.2.5

v3.2.4 release

15 Jul 20:01

* Expanded the API for the `syntax` package to allow parsing without
  compiling, and to allow parsing of value expressions from strings.
* Fixed an issue where the wrong python module could, under some
  circumstances, be imported by the python adapter in place of the
  correct stage code.
* The `_invocation` files for each stage and subpipeline, as well as the
  mro generated by `mrg`, are now generated by the same formatter code used
  by `mrf`.
* Names which are not legal POSIX filenames are no longer allowed as
  explicit output names on output parameters for stages or pipelines.
  In addition, `mrc` will reject names which are illegal file names on
  Microsoft Windows filesystems.
* Minor build system improvements.
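
For a sense of what such a name check involves, here is a rough Python sketch based on the commonly documented file name restrictions. The exact rules `mrc` applies are not given in these notes, so the character sets and reserved names below are assumptions, and both function names are hypothetical.

```python
import re

# Device names that Windows reserves, with or without an extension.
_WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM{}".format(i) for i in range(1, 10)}
    | {"LPT{}".format(i) for i in range(1, 10)}
)


def is_legal_posix_name(name):
    """A single path component: non-empty, no '/' or NUL, not '.' or '..'."""
    return bool(name) and name not in (".", "..") and not re.search(r"[/\x00]", name)


def is_legal_windows_name(name):
    """Reject reserved characters, trailing dots or spaces, and device names."""
    if not name or re.search(r'[<>:"/\\|?*\x00-\x1f]', name):
        return False
    if name.endswith((" ", ".")):
        return False
    return name.split(".")[0].upper() not in _WINDOWS_RESERVED
```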

v3.2.3 Release

15 Jul 19:45

  • Regardless of --jobinterval setting, mrp will now never have more than one
    queue command pending at a time, to avoid overloading the job server or
    equivalent.
  • mrp will now shut down if the pipestance log file has been deleted, even if
    a new one has been created in its place. This prevents bad behavior in the
    somewhat common case where the pipestance directory (including the log and
    lock files) has been deleted.
  • The script to check the status of the job queue on SGE has been modified to
    be faster and use less ram.
  • Fixed a bug in handling of the retain parameter for pipelines.
  • The mount options are now reported for the mounts on which the pipestance and
    binary directories are located.
  • Memory cgroups limits are now detected, reported, and used as default limits
    where applicable. This should be especially helpful for users submitting mrp
    to a cluster such as SLURM which uses memory cgroups to prevent jobs from
    using too much memory, by preventing mrp from trying to use more than the
    job's allowance.
  • Martian can now (experimentally) be built with bazel.
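
The idea behind using a memory cgroup limit as the default can be sketched in Python. The parsing below follows the usual Linux conventions (a byte count, or the word max under cgroup v2 when there is no limit); which files mrp actually reads is not stated here, and both function names are hypothetical.

```python
def parse_cgroup_memory_limit(text):
    """Parse a cgroup memory limit file; None means no limit is set."""
    text = text.strip()
    if text == "max":  # cgroup v2 spelling of "unlimited"
        return None
    return int(text)


def default_memory_limit(cgroup_limit_bytes, system_total_bytes):
    """Use the cgroup limit when it is tighter than the system total."""
    if cgroup_limit_bytes is None:
        return system_total_bytes
    # cgroup v1 reports "unlimited" as a huge number, so min() covers it.
    return min(cgroup_limit_bytes, system_total_bytes)
```

On a 64 GiB node where the scheduler grants the job 8 GiB, this picks 8 GiB as the default rather than the full 64 GiB.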