Martian 3.2.0

@adam-azarchs adam-azarchs released this 14 Jan 18:46

Martian 3.2.0 release.
Major new features:

  • The Python stage code adapter now works with Python 3.
  • Martian can now account for virtual address space size, in addition to
    physical memory.
    • Normally, virtual address space (vmem) size is ignored, since modern
      Linux systems have no good reason to restrict it - vmem size is not
      the same as rss+swap, contrary to inexplicably popular belief.
    • In local mode, a limit may be specified with the --localvmem flag.
    • A limit will also be imposed automatically if a virtual size rlimit
      (e.g. ulimit -d or ulimit -v) is detected by mrp. SGE's
      h_vmem, s_vmem, h_data, and s_data resource specifiers set
      these limits.
    • In cluster mode job templates, users may now use __MRO_VMEM_GB__
      and related variables in the same way as the existing
      __MRO_MEM_GB__ variables to get the predicted virtual address
      space (vmem) size rather than the physical memory requirement.
    • The job mode configuration for cluster modes found in
      jobmanagers/config.json may set the mem_is_vmem key to true,
      in which case __MRO_MEM_GB__ and related template variables will
      also use the virtual address space size, for backwards compatibility
      with existing user templates (most SGE clusters mistakenly enforce
      virtual size, if they handle anything like memory reservations at
      all). This is turned on by default for SGE.
    • Stages may specify a vmem_gb requirement in addition to mem_gb,
      through all of the same existing mechanisms:
      • Specifying using ( vmem_gb = 4, ) in the mro declaration of the
        stage.
      • Specifying __vmem_gb in the chunk or join definitions returned
        by a split phase.
      • In overrides.json.
    • Stages which do not specify a vmem requirement will be allocated an
      amount equal to their physical memory requirement plus a constant
      specified in the extra_vmem_per_job key configured in
      jobmanagers/config.json.
    • With --monitor, mrjob will now restrict stage virtual size as
      well as physical size, to make sure the requests are being set
      correctly. It will include its own virtual size in the restriction,
      but will not include the virtual size of profiling jobs (e.g.
      perf record) which may be running alongside the stage code.
  • Update graph UI page
    • Reduce the number of bytes required to render the page.
      • Inline the 7% of bootstrap.min.css we actually use.
      • Remove the fonts, just use an svg icon instead.
      • Remove the clipboard button, since it hasn't actually worked in a
        long time.
    • Remove dead js files. These files either were already not being
      included in the serve package or are no longer required.
    • Concatenate javascript source files together.
    • Remove duplicated DOM element IDs.
    • Get angular, dagre-d3 from npm, as well as support libraries d3 and
      lodash. This means we're no longer shipping an insecure version of
      lodash.
    • Pan/zoom now works on the graph page.
  • MRO syntax now supports escaping for string literals, using JSON
    escaping syntax.
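
The virtual memory rules in the list above can be illustrated with a short Python sketch. It is not Martian's implementation. The helper name resolve_vmem_gb, the chunk arguments, and the default of 3 for extra_vmem_per_job are assumptions made for illustration; only the __mem_gb and __vmem_gb keys and the "physical memory plus a constant" fallback come from the notes above.

```python
def resolve_vmem_gb(mem_gb, vmem_gb=None, extra_vmem_per_job=3):
    """Return the virtual address space request for one job.

    Hypothetical helper: an explicit vmem_gb wins; otherwise the request
    is the physical memory requirement plus a constant. The default of 3
    and the GB unit are assumed values, not taken from a real config.json.
    """
    if vmem_gb is not None:
        return vmem_gb
    return mem_gb + extra_vmem_per_job


def split(args):
    """Hypothetical split phase of a Python stage.

    The first chunk and the join request an explicit __vmem_gb; the
    second chunk leaves it unset and relies on the fallback rule.
    """
    return {
        "chunks": [
            {"sample": "a", "__mem_gb": 2, "__vmem_gb": 8},
            {"sample": "b", "__mem_gb": 2},
        ],
        "join": {"__mem_gb": 1, "__vmem_gb": 4},
    }


if __name__ == "__main__":
    for chunk in split(None)["chunks"]:
        vmem = resolve_vmem_gb(chunk["__mem_gb"], chunk.get("__vmem_gb"))
        print(chunk["sample"], vmem)
```

Under these assumptions, chunk "a" keeps its explicit request of 8 and chunk "b" falls back to 2 + 3 = 5.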

Minor improvements:

  • mrp now checks for stage completion whenever local-mode jobs complete.
    Previously it would check every 3 seconds regardless. For very short
    jobs (such as, frequently, split phases) this results in shorter
    pipeline wall times. While the impact on large pipelines should be
    tiny in percentage terms, this significantly accelerates integration
    tests.
  • make tarball now produces both tar.gz and tar.xz.
  • Improvements to tests.
    • Integration tests can now run in parallel (make -j longtests).
    • Fix some bugs in integration test result validation.
    • More test coverage for both unit and integration tests.
  • Pipelines should be more robust against missed or delayed updates
    from the pipestance journal directory. Rather than timing out,
    mrp will now check whether the file exists if a notification wasn't
    seen.
  • mrjob now includes its own memory usage in the statistics included
    in the jobinfo, which are used to generate the _perf summary.
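
The journal fallback described above (check whether the file exists when no notification was seen) can be sketched in Python as follows. This only illustrates the idea and is not mrp's actual code. The function name wait_for_journal_file and the use of a threading.Event as the notification channel are assumptions.

```python
import os
import threading


def wait_for_journal_file(path, notified, timeout_s):
    """Report whether a journal update at `path` has arrived.

    `notified` is a threading.Event that a (hypothetical) file watcher
    sets when it sees the update. If no notification arrives within
    timeout_s, fall back to checking the file system directly instead
    of treating the silence as a failure.
    """
    if notified.wait(timeout_s):
        return True
    return os.path.exists(path)
```

With this shape, a missed or delayed notification costs at most one timeout interval before the direct check picks up the file.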

Bug fixes:

  • Fix a potential deadlock when mrp receives a signal (e.g. from kill)
    or a shutdown request over the API while it is in the middle of
    starting or restarting a pipeline.
  • Fix a crash in mrf --includes if a stage called by a pipeline was
    not present in the transitive includes of the file defining the
    pipeline.
  • Fix a bug in mrf --includes which resulted in duplicate declarations
    for existing user-defined file types.
  • Update npm dependencies.
  • mrjob will now begin waiting on the profiling command (e.g.
    perf record) immediately, rather than waiting until the stage code
    finishes. This prevents zombie processes lying around if the
    profiling command finishes before the stage code.
  • mrp will no longer read chunk _outs files if no chunk outputs
    were expected, e.g. for pre-flight stages. This prevents spurious
    errors when chunk outputs were not a dictionary object. It also
    means chunk outputs need to be properly declared if the stage has
    no outputs.
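
The zombie-process fix above amounts to waiting on the profiling command as soon as it starts, rather than only after the stage code exits. Below is a minimal Python sketch of that pattern. It does not reproduce mrjob's code: the function names are hypothetical, and any command can stand in for the profiler.

```python
import subprocess
import sys
import threading


def start_and_reap(cmd):
    """Start cmd and begin waiting on it immediately.

    The background thread blocks in Popen.wait(), so the child is reaped
    as soon as it exits instead of lingering as a zombie.
    """
    proc = subprocess.Popen(cmd)
    reaper = threading.Thread(target=proc.wait, daemon=True)
    reaper.start()
    return proc, reaper


def run_stage_with_profiler(profiler_cmd, stage_cmd):
    """Run the stage while a profiler-like command runs alongside it."""
    profiler, reaper = start_and_reap(profiler_cmd)
    stage_rc = subprocess.call(stage_cmd)
    reaper.join()  # returns at once if the profiler already exited
    return stage_rc, profiler.returncode


if __name__ == "__main__":
    # Stand-in commands: the "profiler" exits before the "stage" does.
    profiler_cmd = [sys.executable, "-c", "pass"]
    stage_cmd = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    print(run_stage_with_profiler(profiler_cmd, stage_cmd))
```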