v3.2.0-pre1: Martian 3.2.0 release candidate.

@adam-azarchs released this 14 Jan 18:49

Major new features:
* The Python stage code adapter now works with Python 3.
* Martian can now account for virtual address space size, in addition to
  physical memory.
  * Normally, virtual address space (vmem) size is ignored, since modern
    Linux systems have no good reason to restrict it; vmem size is not
    the same as rss+swap, contrary to inexplicably popular belief.
  * In local mode, a limit may be specified with the `--localvmem` flag.
  * A limit will also be imposed automatically if a virtual size rlimit
    (e.g. `ulimit -d` or `ulimit -v`) is detected by mrp.  SGE's
    `h_vmem`, `s_vmem`, `h_data`, and `s_data` resource specifiers set
    these limits.
  * In cluster mode job templates, users may now use `__MRO_VMEM_GB__`
    and related variables in the same way as the existing
    `__MRO_MEM_GB__` variables to get the predicted virtual address
    space (vmem) size rather than the physical memory requirement.
  * The job mode configuration for cluster modes found in
    `jobmanagers/config.json` may set the `mem_is_vmem` key to `true`,
    in which case `__MRO_MEM_GB__` and related template variables will
    also use the virtual address space size, for backwards compatibility
    with existing user templates (most SGE clusters mistakenly enforce
    virtual size, if they handle anything like memory reservations at
    all).
  * Stages may specify a `vmem_gb` requirement in addition to `mem_gb`,
    through all of the same existing mechanisms:
    * Specifying `using ( vmem_gb = 4, )` in the mro declaration of the
      stage.
    * Specifying `__vmem_gb` in the chunk or join definitions returned
      by a split phase.
    * In overrides.json.
  * Stages which do not specify a vmem requirement will be allocated an
    amount equal to their physical memory requirement plus a constant
    specified in the `extra_vmem_per_job` key configured in
    `jobmanagers/config.json`.
  * With `--monitor`, `mrjob` will now restrict stage virtual size as
    well as physical size, to make sure the requests are being set
    correctly.  It will include its own virtual size in the restriction,
    but will not include the virtual size of profiling jobs (e.g.
    `perf record`) which may be running alongside the stage code.
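
As an illustration of the `__vmem_gb` mechanism listed above, a Python stage's split phase might request virtual address space per chunk roughly as follows. This is a minimal sketch, assuming the usual `chunks`/`join` shape of a split return value. The `__vmem_gb` key comes from these notes, while `num_chunks`, `chunk_index`, and the specific sizes are hypothetical. The `default_vmem_gb` helper only mirrors the fallback rule described above, assumes `extra_vmem_per_job` is expressed in GB, and is not Martian's actual implementation.

```python
from types import SimpleNamespace


def split(args):
    """Split phase returning chunk and join definitions with vmem requests."""
    chunks = [
        {
            "chunk_index": i,  # hypothetical chunk argument
            "__mem_gb": 2,     # physical memory request
            "__vmem_gb": 6,    # virtual address space request
        }
        for i in range(args.num_chunks)  # num_chunks is a hypothetical stage input
    ]
    # The join phase can carry its own requirements.
    return {"chunks": chunks, "join": {"__mem_gb": 1, "__vmem_gb": 4}}


def default_vmem_gb(mem_gb, extra_vmem_per_job):
    """Fallback described above: physical request plus a configured constant.

    Assumes both values are in GB; this is an illustration, not Martian's code.
    """
    return mem_gb + extra_vmem_per_job


# Stand-in for the arguments object a stage would normally receive.
stage_defs = split(SimpleNamespace(num_chunks=3))
```

A stage that sets only `__mem_gb` would, under the assumptions above, end up with `default_vmem_gb(mem_gb, extra)` as its virtual size request.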

Minor improvements:
* mrp now checks for stage completion whenever local-mode jobs complete.
  Previously it would check every 3 seconds regardless.  For very short
  jobs (often split phases, for example) this results in shorter
  pipeline wall times.  While the impact on large pipelines should be
  tiny in percentage terms, this significantly accelerates integration
  tests.
* `make tarball` now produces both `tar.gz` and `tar.xz`.
* Improvements to tests.
  * Integration tests can now run in parallel (`make -j longtests`).
  * Fix some bugs in integration test result validation.
  * More test coverage for both unit and integration tests.
* Pipelines should be more robust against missed or delayed updates
  from the pipestance journal directory.  Rather than timing out,
  mrp will now check whether the file exists if a notification wasn't
  seen.
* `mrjob` now includes its own memory usage in the statistics recorded
  in the jobinfo, which are used to generate the `_perf` summary.
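
The journal fallback described above can be pictured with a small sketch. It is written in Python purely for illustration and is not mrp's code: a `threading.Event` stands in for the filesystem notification, and the file name `stage.complete` is hypothetical.

```python
import os
import tempfile
import threading


def wait_for_file(path, notified, timeout):
    """Wait for a notification; if none arrives, check the file directly.

    `notified` is a threading.Event standing in for a filesystem
    notification.  Returns True if the file is believed to exist.
    """
    if notified.wait(timeout):
        return True
    # No notification was seen: fall back to checking the filesystem
    # rather than treating the silence as a failure.
    return os.path.exists(path)


with tempfile.TemporaryDirectory() as journal_dir:
    marker = os.path.join(journal_dir, "stage.complete")  # hypothetical name
    open(marker, "w").close()
    # The events are never set, simulating missed notifications.
    found = wait_for_file(marker, threading.Event(), timeout=0.1)
    missing = wait_for_file(marker + ".absent", threading.Event(), timeout=0.1)
```

Here `found` reflects a file that was written without its notification being observed, while `missing` shows that the fallback still reports files that are genuinely absent.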

Bug fixes:
* Fix a potential deadlock when mrp receives a signal (e.g. from `kill`)
  or a shutdown request over the API while it is in the middle of
  starting or restarting a pipeline.
* Fix a crash in `mrf --includes` if a stage called by a pipeline was
  not present in the transitive includes of the file defining the
  pipeline.
* Fix a bug in `mrf --includes` which resulted in duplicate declarations
  for existing user-defined file types.
* Update npm dependencies.
* `mrjob` will now begin waiting on the profiling command (e.g.
  `perf record`) immediately, rather than waiting until the stage code
  finishes.  This prevents zombie processes from lingering if the
  profiling command finishes before the stage code.
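
The idea behind the last fix can be sketched as follows. This is an illustrative Python example, not `mrjob`'s implementation; the two child commands are stand-ins for the stage code and for a profiler such as `perf record`.

```python
import subprocess
import sys
import threading


def run_with_profiler(stage_cmd, profiler_cmd):
    """Run stage and profiler together, reaping the profiler when it exits."""
    profiler = subprocess.Popen(profiler_cmd)
    stage = subprocess.Popen(stage_cmd)
    # Start waiting on the profiler right away, so that it is reaped as
    # soon as it exits even if the stage is still running.
    reaper = threading.Thread(target=profiler.wait)
    reaper.start()
    stage_rc = stage.wait()
    # This sketch simply waits for the profiler to exit on its own.
    reaper.join()
    return stage_rc, profiler.returncode


# Stand-in commands: the "profiler" exits before the "stage" does.
stage_cmd = [sys.executable, "-c", "import time; time.sleep(0.3)"]
profiler_cmd = [sys.executable, "-c", "pass"]
stage_rc, profiler_rc = run_with_profiler(stage_cmd, profiler_cmd)
```

Without the early wait, the exited profiler would remain a zombie until the parent eventually collected its exit status after the stage finished.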