Martian 3.2.0
Martian 3.2.0 release.
Major new features:
- The Python stage code adapter now works with Python 3.
- Martian can now account for virtual address space size, in addition to
  physical memory.
  - Normally, virtual address space (vmem) size is ignored, since modern
    Linux systems have no good reason to restrict it - vmem size is not
    the same as rss+swap, contrary to inexplicably popular belief.
  - In local mode, a limit may be specified with the --localvmem flag.
  - A limit will also be imposed automatically if a virtual size rlimit
    (e.g. ulimit -d or ulimit -v) is detected by mrp. SGE's h_vmem,
    s_vmem, h_data, and s_data resource specifiers set these limits.
  - In cluster mode job templates, users may now use __MRO_VMEM_GB__
    and related variables in the same way as the existing __MRO_MEM_GB__
    variables to get the predicted virtual address space (vmem) size
    rather than the physical memory requirement.
  - The job mode configuration for cluster modes found in
    jobmanagers/config.json may set the mem_is_vmem key to true, in
    which case __MRO_MEM_GB__ and related template variables will also
    use the virtual address space size, for backwards compatibility
    with existing user templates (most SGE clusters mistakenly enforce
    virtual size, if they handle anything like memory reservations at
    all). This is turned on by default for SGE.
  - Stages may specify a vmem_gb requirement in addition to mem_gb,
    through all of the same existing mechanisms:
    - Specifying using ( vmem_gb = 4, ) in the mro declaration of the
      stage.
    - Specifying __vmem_gb in the chunk or join definitions returned
      by a split phase.
    - In overrides.json.
  - Stages which do not specify a vmem requirement will be allocated an
    amount equal to their physical memory requirement plus a constant
    specified in the extra_vmem_per_job key configured in
    jobmanagers/config.json.
  - With --monitor, mrjob will now restrict stage virtual size as
    well as physical size, to make sure the requests are being set
    correctly. It will include its own virtual size in the restriction,
    but will not include the virtual size of profiling jobs (e.g.
    perf record) which may be running alongside the stage code.
- Update graph UI page
  - Reduce the amount of excess bytes required to render the page.
  - Inline the 7% of bootstrap.min.css we actually use.
  - Remove the fonts, just use an svg icon instead.
  - Remove the clipboard button, since it hasn't actually worked in a
    long time.
  - Remove dead js files. These files either were already not being
    included in the serve package or are no longer required.
  - Concatenate javascript source files together.
  - Remove duplicated DOM element IDs.
  - Get angular, dagre-d3 from npm, as well as support libraries d3 and
    lodash. This means we're no longer shipping an insecure version of
    lodash.
  - Pan/zoom now works on the graph page.
- MRO syntax now supports escaping for string literals, using json
escaping syntax.
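
The virtual memory rules listed under the vmem feature above can be sketched in a few lines of Python. This is only an illustration, not Martian's actual implementation: the function names `resolve_vmem_gb` and `template_vars` are hypothetical, and the value 3 used for extra_vmem_per_job in the example is an assumed number, not a documented default.

```python
from typing import Dict, Optional


def resolve_vmem_gb(mem_gb: int,
                    vmem_gb: Optional[int],
                    extra_vmem_per_job: int) -> int:
    """Return the virtual address space request for a stage, in GB.

    Hypothetical helper: an explicit vmem_gb wins; otherwise the request
    is the physical memory requirement plus the configured constant.
    """
    if vmem_gb is not None:
        return vmem_gb
    return mem_gb + extra_vmem_per_job


def template_vars(mem_gb: int,
                  vmem_gb: Optional[int],
                  extra_vmem_per_job: int,
                  mem_is_vmem: bool) -> Dict[str, int]:
    """Return values for the two job template variables.

    When mem_is_vmem is true, __MRO_MEM_GB__ also carries the virtual
    size, so existing templates keep working on clusters that enforce
    virtual size.
    """
    vmem = resolve_vmem_gb(mem_gb, vmem_gb, extra_vmem_per_job)
    return {
        "__MRO_MEM_GB__": vmem if mem_is_vmem else mem_gb,
        "__MRO_VMEM_GB__": vmem,
    }


# A stage asking for 4 GB of memory with no explicit vmem_gb,
# assuming extra_vmem_per_job is 3 (an illustrative value).
print(resolve_vmem_gb(4, None, 3))  # 7
```

With mem_is_vmem set to true, both template variables in this sketch resolve to the same number, which is the backwards-compatible behavior described for SGE.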
Minor improvements:
- mrp now checks for stage completion whenever local-mode jobs complete.
  Previously it would check every 3 seconds regardless. For very short
  jobs (such as, frequently, split phases) this results in shorter
  pipeline wall times. While the impact on large pipelines should be
  tiny in percentage terms, this significantly accelerates integration
  tests.
- make tarball now produces both tar.gz and tar.xz.
- Improvements to tests.
  - Integration tests can now run in parallel (make -j longtests)
  - Fix some bugs in integration test result validation.
  - More test coverage for both unit and integration tests.
- Pipelines should be more robust against missed or delayed updates
  from the pipestance journal directory. Rather than timing out,
  mrp will now check whether the file exists if a notification wasn't
  seen.
- mrjob now includes its own memory usage in the statistics included
  in the jobinfo, which are used to generate the _perf summary.
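
The journal fallback described above (check the file directly when no notification was seen) follows a common pattern, sketched here in Python. This is not mrp's code: the function name `wait_for_journal_file` is hypothetical, and the `threading.Event` stands in for whatever directory-watching mechanism delivers the notification.

```python
import os
import threading


def wait_for_journal_file(path: str,
                          notified: threading.Event,
                          timeout_s: float) -> bool:
    """Return True if the journal file at `path` can be considered present.

    `notified` is assumed to be set by a directory watcher when the file
    appears. If no notification arrives within `timeout_s` seconds, the
    filesystem is checked directly rather than reporting a timeout.
    """
    if notified.wait(timeout_s):
        return True
    # The notification was missed or delayed: fall back to a direct check.
    return os.path.exists(path)
```

The direct check only runs after the wait expires, so the common case (notification delivered promptly) costs no extra filesystem access.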
Bug fixes:
- Fix a potential deadlock when mrp receives a signal (e.g. from kill)
  or a shutdown request over the API while it is in the middle of
  starting or restarting a pipeline.
- Fix a crash in mrf --includes if a stage called by a pipeline was
  not present in the transitive includes of the file defining the
  pipeline.
- Fix a bug in mrf --includes which resulted in duplicate declarations
  for existing user-defined file types.
- Updated npm dependencies.
- mrjob will now begin waiting on the profiling command (e.g.
  perf record) immediately, rather than waiting until the stage code
  finishes. This prevents zombie processes lying around if the
  profiling command finishes before the stage code.
- mrp will no longer read chunk_outs files if no chunk outputs
  were expected, e.g. for pre-flight stages. This prevents spurious
  errors when chunk outputs were not a dictionary object. It also
  means chunk outputs need to be properly declared if the stage has
  no outputs.
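
The zombie-process fix above comes down to reaping the profiling child as soon as it exits instead of after the stage finishes. A rough Python sketch of that idea follows. mrjob itself is not written this way: the function name `run_with_profiler` is hypothetical, and the sketch assumes the profiling command exits on its own.

```python
import subprocess
import sys
import threading
from typing import List, Tuple


def run_with_profiler(stage_cmd: List[str],
                      profiler_cmd: List[str]) -> Tuple[int, int]:
    """Run a stage and a profiling command side by side.

    Returns (stage exit code, profiler exit code). Assumes the profiler
    terminates by itself; otherwise the final join would block.
    """
    stage = subprocess.Popen(stage_cmd)
    profiler = subprocess.Popen(profiler_cmd)

    # Begin waiting on the profiler immediately in a background thread,
    # so it is reaped as soon as it exits even if the stage runs longer.
    reaper = threading.Thread(target=profiler.wait, daemon=True)
    reaper.start()

    stage_rc = stage.wait()
    reaper.join()
    return stage_rc, profiler.returncode


if __name__ == "__main__":
    # Stand-in commands: a short "profiler" and a slightly longer "stage".
    stage = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    profiler = [sys.executable, "-c", "pass"]
    print(run_with_profiler(stage, profiler))
```

Waiting only after the stage finished would leave an exited profiler unreaped (a zombie) for the remainder of the stage's run time, which is the situation the fix avoids.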