Tags: alejandro-rivera/mrjob

v0.4.4

EMRgency!

 * runners:
   * EMR:
     * Create IAM objects as needed (unbreaks mrjob for new accounts) (Yelp#999)
     * --iam-job-flow-role renamed to --iam-instance-profile (Yelp#1001)
     * new --iam-service-role option (Yelp#1005)

v0.4.3

SO many bugfixes

 * jobs:
   * MRStep's constructor treats kwarg=None same as not setting it (Yelp#970)
   * parse_counters() and parse_output() are deprecated (Yelp#829)
   * self.mr is deprecated in favor of MRStep (Yelp#815)
 * runners:
   * All runners:
     * You can now set strict_protocols from mrjob.conf (Yelp#726)
       * new --no-strict-protocols command-line option
     * streaming output from closed runner shows a warning (Yelp#853)
   * EMR:
     * --check-input-paths and --no-check-input-paths options (Yelp#864)
     * skip (very slow) validation of s3 buckets if boto < 2.25.0 (Yelp#865)
     * Fix for max_hours_idle bug that was terminating job flows early (Yelp#932)
     * Job flows are visible to all IAM users by default (Yelp#922)
     * --emr-api-param allows users to pass additional parameters to boto's
       EMR API (Yelp#879)
       * unset parameters with --no-emr-api-param
     * bootstrap_python_packages (deprecated) now works on 3.x EMR AMIs (Yelp#863)
     * Use TERMINATE_CLUSTER instead of deprecated TERMINATE_JOB_FLOW (Yelp#974)
     * updated EC2 instance type data for pooling (Yelp#995)
   * Hadoop:
     * exclude hadoop source jars when looking for streaming jar (Yelp#861)
     * Fixed mkdir_on_hdfs for Hadoop version 2.x (Yelp#923)
     * Fixed hadoop_bin on Windows (Yelp#843)
   * Local:
     * bootstrap mrjob by default (Yelp#984)
   * Inline:
     * fix for add_file_option() (Yelp#851)
     * cd to job's working directory before instantiating mrjob class (Yelp#988)
 * Use pytest to run tests (Yelp#898)
 * collect-emr-active-stats subcommand (Yelp#947)
 * Use the xtrace flag to get more output during bootstrap (Yelp#943)
 * Fixed log printouts for command line tools (Yelp#901)
 * Fix to avoid interpreting windows paths as URIs (Yelp#880)
 * Better error message when ssh keyfile is missing (Yelp#858)
 * Update EMR tool ISO8601 parsing to be consistent with EMR runner (Yelp#869)
 * Dropped support for Python 2.5 (Yelp#713)
   * Dropped support for the 1.x EMR AMI series, which uses Python 2.5
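
The Windows-path fix listed above (Yelp#880) can be illustrated with a minimal sketch. This is not mrjob's actual code: looks_like_uri() is a hypothetical helper, and the example is written for Python 3 for convenience even though mrjob 0.4.x targeted Python 2. The underlying problem is that a generic URL parser reports a drive letter such as "C:" as a URI scheme, so one plausible guard is to also require "://".

```python
from urllib.parse import urlparse


def looks_like_uri(path):
    """Hypothetical helper: treat path as a URI only if it contains 'scheme://'."""
    # urlparse() on its own reports the drive letter of a Windows path
    # (for example C:\Users\me\data.txt) as a scheme, so checking the
    # scheme alone is not enough; also require the '://' separator.
    return '://' in path and bool(urlparse(path).scheme)


if __name__ == '__main__':
    for candidate in ('s3://bucket/key', 'hdfs:///tmp/x', r'C:\Users\me\data.txt'):
        print(candidate, looks_like_uri(candidate))
```

With a check like this, local Windows paths fall through to ordinary file handling instead of being dispatched to a remote filesystem.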

v0.4.2

that's one small step for a JAR

 * jobs:
   * can interpolate input and output path(s) into arguments of JarSteps,
     so they can be part of multi-step jobs (Yelp#773)
     * see mrjob/examples/mr_jar_step_example.py
   * JarStep now takes keyword arguments only (Yelp#769)
     * removed useless "name" field; "step_args" is now just "args"
   * MRJobStep (usually accessed via MRJob.mr()) is now MRStep
 * runners:
   * All runners:
     * --setup is now fully functional (Yelp#206)
       * --python-archive, --setup-cmd, and --setup-script are deprecated
     * --bootstrap option works and uses sh (Yelp#206)
       * --bootstrap-cmd, --bootstrap-file, --bootstrap-python-package,
         --bootstrap-script are deprecated
     * setup commands can no longer corrupt a task's input and output (Yelp#803)
     * sh_bin is now "sh -e" by default so setup fails fast (Yelp#810)
       * default is "/bin/sh -e" on EMR
   * EMR:
     * JarSteps work again (Yelp#763)
     * auto-uploads jars for JarSteps (Yelp#772)
       * JARs on the EMR instances can be accessed with file:/// URIs
     * ssh_cat() no longer raises an error when catting a file
       containing an error (Yelp#807)
     * Fixed SignatureDoesNotMatchError that happens with boto 2.10.0+
       with Python prior to 2.7.5 (Yelp#778)
   * Hadoop:
     * now handles JarSteps too (Yelp#770)
 * Fix to mrjob.parse.urlparse() that was breaking Python 2.5
 * mrjob.util.buffer_iterator_to_line_iterator() is now more efficient
   and uses a bounded amount of memory
 * bz2 decompression no longer discards data (Yelp#817)
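
The idea behind turning a stream of buffers into lines with bounded memory, as mentioned for mrjob.util.buffer_iterator_to_line_iterator() above, can be sketched as follows. This is an illustrative generator, not mrjob's implementation; chunks_to_lines() is a hypothetical name, and yielding a final unterminated line as-is is an assumption made here. Only the current chunk plus any partial line carried over from earlier chunks is held in memory.

```python
def chunks_to_lines(chunks):
    """Yield newline-terminated lines from an iterator of byte chunks.

    Hypothetical sketch: memory use is bounded by the chunk size plus
    the length of the longest line, not by the total input size.
    """
    leftover = b''
    for chunk in chunks:
        data = leftover + chunk
        start = 0
        while True:
            end = data.find(b'\n', start)
            if end == -1:
                break
            yield data[start:end + 1]
            start = end + 1
        # keep the trailing partial line for the next chunk
        leftover = data[start:]
    if leftover:
        # assumption: emit a final line without a newline unchanged
        yield leftover


if __name__ == '__main__':
    for line in chunks_to_lines([b'ab\ncd', b'e\nf', b'g']):
        print(line)
```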

v0.4.1

secondary sort and self-terminating job flows

 * jobs:
   * SORT_VALUES: Secondary sort by value (Yelp#240)
     * see mrjob/examples/
   * can now override jobconf() again (Yelp#656)
   * renamed mrjob.compat.get_jobconf_value() to jobconf_from_env()
   * examples:
     * bash_wrap/ (mapper/reducer_cmd() example)
     * mr_most_used_word.py (two step job)
     * mr_next_word_stats.py (SORT_VALUES example)
 * runners:
   * All runners:
     * single --setup option works but is not yet documented (Yelp#206)
     * setup now uses sh rather than python internally
   * EMR runner:
     * max_hours_idle: self-terminating idle job flows (Yelp#628)
       * mins_to_end_of_hour option gives finer control over self-termination.

     * Can reuse pooled job flows where previous job failed (Yelp#633)
     * Throws IOError if output path already exists (Yelp#634)
     * Gracefully handles SSL cert issues (Yelp#621, Yelp#706)
     * Automatically infers EMR/S3 endpoints from region (Yelp#658)
     * ls() supports s3n:// scheme (Yelp#672)
     * Fixed log parsing crash on JarSteps (Yelp#645)
     * visible_to_all_users works with boto <2.8.0 (Yelp#701)
     * must use --interpreter with non-Python scripts (Yelp#683)
     * cat() can decompress gzipped data (Yelp#601)
   * Hadoop runner:
     * check_input_paths: can disable input path checking (Yelp#583)
     * cat() can decompress gzipped data (Yelp#601)
   * Inline/Local runners:
     * Fixed counter parsing for multi-step jobs in inline mode
     * Supports per-step jobconf (Yelp#616)
 * Documentation revamp
 * mrjob.parse.urlparse() works consistently across Python versions (Yelp#686)
 * deprecated:
   * many constants in mrjob.emr replaced with functions in mrjob.aws
 * removed deprecated features:
   * old conf locations (~/.mrjob and in PYTHONPATH) (Yelp#747)
   * built-in protocols must be instances (Yelp#488)
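
What secondary sort by value (SORT_VALUES, Yelp#240) gives a reducer can be modeled in plain Python. The sketch below is an in-memory stand-in under stated assumptions, not how mrjob or Hadoop implement it: reduce_with_sorted_values() is a hypothetical function, and sorting tuples simply mimics the guarantee that each key's values arrive in sorted order.

```python
from itertools import groupby
from operator import itemgetter


def reduce_with_sorted_values(pairs):
    """Group (key, value) pairs by key, with each key's values sorted.

    Hypothetical model of a secondary sort: sorting the tuples orders
    them by key first and by value second, so each group is presented
    with its values already in order.
    """
    ordered = sorted(pairs)
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


if __name__ == '__main__':
    mapper_output = [('b', 2), ('a', 3), ('a', 1), ('b', 1)]
    for key, values in reduce_with_sorted_values(mapper_output):
        print(key, values)
```

Without a secondary sort, only the grouping by key is guaranteed and the order of values within a group is unspecified.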

v0.4.0

v0.4, 2013-04-30 -- Slouching toward nirvana

v0.4.0pre3

0.4 RC3

v0.4.0pre2

0.4 RC2

v0.4.0pre1

First release candidate for the 0.4 release

v0.3.5

v0.3.5, 2012-08-21 -- The Last Ride of v0.3.x[?]

 * EMR:
   * --pool-wait-minutes option lets you wait up to X minutes before creating a
     job flow (Yelp#455)
   * Job flow ID included in error messages on failure (Yelp#452)
   * JOB and JOB_FLOW cleanup options (Yelp#485, Yelp#455)
 * EMR and Hadoop:
   * Compatibility fixes related to deprecated options and Hadoop's bizarre
     non-sequential version numbers (Yelp#489, Yelp#534)
 * Other:
   * Warn when *_PROTOCOL is not a class (Yelp#490)
 * Bug fixes:
   * Unicode strings can be used when specifying interpreters (Yelp#431)
   * --enable-emr-logging no longer causes the wrong counters/logs to be parsed
     (Yelp#446)
   * TMP_DIR inserted into 'sort' environment variables (Yelp#477)
   * Setting hadoop_home in mrjob.conf works again
   * Gzipped input files work when specified with relative paths (Yelp#494)
   * Passthrough options are not re-ordered when sent to Hadoop Streaming
     (Yelp#509)
   * json module is supported again if simplejson doesn't exist (Yelp#544)
   * HadoopJobRunner.path_exists() is no longer backwards (Yelp#549)
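
The simplejson fallback mentioned above (Yelp#544) follows a common import pattern, sketched here. This shows the general idiom rather than mrjob's exact code: prefer simplejson when it is installed, and otherwise use the standard-library json module under the same name.

```python
try:
    # use simplejson when it is available
    import simplejson as json
except ImportError:
    # fall back to the standard library
    import json


if __name__ == '__main__':
    print(json.loads('{"word": "mrjob", "count": 3}'))
```

Because both modules expose loads() and dumps(), the rest of the code can refer to json without caring which one was imported.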

v0.3.4.1

v0.3.4.1, 2012-06-12 -- The test suite doesn't catch everything...

 * Local mode doesn't try to send multiple mappers to the same output file
   when using multiple compressed files as input