Tags: alejandro-rivera/mrjob
SO many bugfixes
 * jobs:
   * MRStep's constructor treats kwarg=None same as not setting it (Yelp#970)
   * parse_counters() and parse_output() are deprecated (Yelp#829)
   * self.mr is deprecated in favor of MRStep (Yelp#815)
 * runners:
   * All runners:
     * You can now set strict_protocols from mrjob.conf (Yelp#726)
       * new --no-strict-protocols command-line option
     * streaming output from closed runner shows a warning (Yelp#853)
   * EMR:
     * --check-input-paths and --no-check-input-paths options (Yelp#864)
     * skip (very slow) validation of s3 buckets if boto < 2.25.0 (Yelp#865)
     * Fix for max_hours_idle bug that was terminating job flows early (Yelp#932)
     * Job flows are visible to all IAM users by default (Yelp#922)
     * --emr-api-param allows users to pass additional parameters to boto's EMR API (Yelp#879)
       * unset parameters with --no-emr-api-param
     * bootstrap_python_packages (deprecated) now works on 3.x EMR AMIs (Yelp#863)
     * Use TERMINATE_CLUSTER instead of deprecated TERMINATE_JOB_FLOW (Yelp#974)
     * updated EC2 instance type data for pooling (Yelp#995)
   * Hadoop:
     * exclude hadoop source jars when looking for streaming jar (Yelp#861)
     * Fixed mkdir_on_hdfs for Hadoop version 2.x (Yelp#923)
     * Fixed hadoop_bin on Windows (Yelp#843)
   * Local:
     * bootstrap mrjob by default (Yelp#984)
   * Inline:
     * fix for add_file_option() (Yelp#851)
     * cd to job's working directory before instantiating mrjob class (Yelp#988)
 * Use pytest to run tests (Yelp#898)
 * collect-emr-active-stats subcommand (Yelp#947)
 * Using xtrace flag to get more output during bootstrap (Yelp#943)
 * Fixed log printouts for command line tools (Yelp#901)
 * Fix to avoid interpreting Windows paths as URIs (Yelp#880)
 * Better error message when ssh keyfile is missing (Yelp#858)
 * Update EMR tool ISO8601 parsing to be consistent with EMR runner (Yelp#869)
 * Dropped support for Python 2.5 (Yelp#713)
   * Dropped support for the 1.x EMR AMI series, which uses Python 2.5
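The Windows path fix (Yelp#880) touches a common pitfall: to a naive parser, a drive letter such as `C:` looks like a URI scheme. Below is a minimal Python 3 sketch of one way to avoid that. `looks_like_uri()` is a hypothetical helper written for illustration, not mrjob's actual code. It assumes that only strings containing `://` should be treated as URIs.

```python
from urllib.parse import urlparse


def looks_like_uri(path):
    """Hypothetical helper: True only for strings such as 's3://bucket/key'."""
    # Requiring '://' keeps a drive letter such as 'C:' from being
    # mistaken for a URI scheme.
    return '://' in path and bool(urlparse(path).scheme)


if __name__ == '__main__':
    for p in ['s3://bucket/input.txt', 'C:\\data\\input.txt', 'data/input.txt']:
        print(p, looks_like_uri(p))
```

Under this assumption, local Windows paths and relative paths are handled as ordinary files, while `s3://` and `hdfs://` style paths are still recognized as URIs.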
that's one small step for a JAR
 * jobs:
   * can interpolate input and output path(s) into arguments of JarSteps, so they can be part of multi-step jobs (Yelp#773)
     * see mrjob/examples/mr_jar_step_example.py
   * JarStep now takes keyword arguments only (Yelp#769)
     * removed useless "name" field; "step_args" is now just "args"
   * MRJobStep (usually accessed via MRJob.mr()) is now MRStep
 * runners:
   * All runners:
     * --setup is now fully functional (Yelp#206)
       * --python-archive, --setup-cmd, and --setup-script are deprecated
     * --bootstrap option works and uses sh (Yelp#206)
       * --bootstrap-cmd, --bootstrap-file, --bootstrap-python-package, --bootstrap-script are deprecated
     * setup commands can no longer corrupt a task's input and output (Yelp#803)
     * sh_bin is now "sh -e" by default so setup fails fast (Yelp#810)
       * default is "/bin/sh -e" on EMR
   * EMR:
     * JarSteps work again (Yelp#763)
       * auto-uploads jars for JarSteps (Yelp#772)
       * JARs on the EMR instances can be accessed with file:/// URIs
     * ssh_cat() no longer raises an error when catting a file containing an error (Yelp#807)
     * Fixed SignatureDoesNotMatchError that happens with boto 2.10.0+ with Python prior to 2.7.5 (Yelp#778)
   * Hadoop:
     * now handles JarSteps too (Yelp#770)
 * Fix to mrjob.parse.urlparse() that was breaking Python 2.5
 * mrjob.util.buffer_iterator_to_line_iterator() is now more efficient and uses a bounded amount of memory
 * bz2 decompression no longer discards data (Yelp#817)
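To show what "a bounded amount of memory" can mean for a buffer-to-line iterator, here is a minimal sketch of the general technique. It is not mrjob's implementation of buffer_iterator_to_line_iterator(); the name `chunks_to_lines()` and the choice to keep trailing newlines are assumptions made for this example.

```python
def chunks_to_lines(chunks):
    """Yield complete lines from an iterable of arbitrary byte chunks."""
    leftover = b''
    for chunk in chunks:
        data = leftover + chunk
        lines = data.split(b'\n')
        # The last piece has no newline yet; carry it over to the next chunk.
        leftover = lines.pop()
        for line in lines:
            yield line + b'\n'
    if leftover:
        # Final line that was not terminated by a newline.
        yield leftover


if __name__ == '__main__':
    for line in chunks_to_lines([b'ab', b'c\nde', b'\nf']):
        print(line)
```

At any moment the generator holds only the current chunk plus the unfinished tail of the previous one, so memory use depends on chunk size and the longest line rather than on the total input size.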
secondary sort and self-terminating job flows
 * jobs:
   * SORT_VALUES: Secondary sort by value (Yelp#240)
     * see mrjob/examples/
   * can now override jobconf() again (Yelp#656)
   * renamed mrjob.compat.get_jobconf_value() to jobconf_from_env()
   * examples:
     * bash_wrap/ (mapper/reducer_cmd() example)
     * mr_most_used_word.py (two step job)
     * mr_next_word_stats.py (SORT_VALUES example)
 * runners:
   * All runners:
     * single --setup option works but is not yet documented (Yelp#206)
     * setup now uses sh rather than python internally
   * EMR runner:
     * max_hours_idle: self-terminating idle job flows (Yelp#628)
       * mins_to_end_of_hour option gives finer control over self-termination.
     * Can reuse pooled job flows where previous job failed (Yelp#633)
     * Throws IOError if output path already exists (Yelp#634)
     * Gracefully handles SSL cert issues (Yelp#621, Yelp#706)
     * Automatically infers EMR/S3 endpoints from region (Yelp#658)
     * ls() supports s3n:// schema (Yelp#672)
     * Fixed log parsing crash on JarSteps (Yelp#645)
     * visible_to_all_users works with boto <2.8.0 (Yelp#701)
     * must use --interpreter with non-Python scripts (Yelp#683)
     * cat() can decompress gzipped data (Yelp#601)
   * Hadoop runner:
     * check_input_paths: can disable input path checking (Yelp#583)
     * cat() can decompress gzipped data (Yelp#601)
   * Inline/Local runners:
     * Fixed counter parsing for multi-step jobs in inline mode
     * Supports per-step jobconf (Yelp#616)
 * Documentation revamp
 * mrjob.parse.urlparse() works consistently across Python versions (Yelp#686)
 * deprecated:
   * many constants in mrjob.emr replaced with functions in mrjob.aws
 * removed deprecated features:
   * old conf locations (~/.mrjob and in PYTHONPATH) (Yelp#747)
   * built-in protocols must be instances (Yelp#488)
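For readers new to the term, "secondary sort by value" means that a reducer receives each key's values already in sorted order. The sketch below emulates that effect in memory with plain Python. It is only an illustration of the concept: `reduce_with_sorted_values()` is a hypothetical name, and a real job would rely on the framework's shuffle rather than sorting everything in one process.

```python
from itertools import groupby
from operator import itemgetter


def reduce_with_sorted_values(pairs):
    """Group (key, value) pairs by key, with each key's values sorted."""
    # Sorting the whole (key, value) tuple orders by key first, then by value.
    ordered = sorted(pairs)
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


if __name__ == '__main__':
    pairs = [('b', 2), ('a', 3), ('a', 1), ('b', 1)]
    for key, values in reduce_with_sorted_values(pairs):
        print(key, values)
```

Because values arrive in order, a reducer can do things like pick the smallest or largest value, or detect consecutive duplicates, without buffering the full list itself.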
v0.3.5, 2012-08-21 -- The Last Ride of v0.3.x[?]
 * EMR:
   * --pool-wait-minutes option lets you wait up to X minutes before creating a job flow (Yelp#455)
   * Job flow ID included in error messages on failure (Yelp#452)
   * JOB and JOB_FLOW cleanup options (Yelp#485, Yelp#455)
 * EMR and Hadoop:
   * Compatibility fixes related to deprecated options and Hadoop's bizarre non-sequential version numbers (Yelp#489, Yelp#534)
 * Other:
   * Warn when *_PROTOCOL is not a class (Yelp#490)
 * Bug fixes:
   * Unicode strings can be used when specifying interpreters (Yelp#431)
   * --enable-emr-logging no longer causes the wrong counters/logs to be parsed (Yelp#446)
   * TMP_DIR inserted into 'sort' environment variables (Yelp#477)
   * Setting hadoop_home in mrjob.conf works again
   * Gzipped input files work when specified with relative paths (Yelp#494)
   * Passthrough options are not re-ordered when sent to Hadoop Streaming (Yelp#509)
   * json module is supported again if simplejson doesn't exist (Yelp#544)
   * HadoopJobRunner.path_exists() is no longer backwards (Yelp#549)
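The json/simplejson item (Yelp#544) describes a familiar optional-dependency pattern: prefer the third-party package when it is installed, otherwise fall back to the standard library. A minimal sketch of that pattern follows. `parse_record()` is a hypothetical helper added only to make the example runnable, and the code is not taken from mrjob.

```python
try:
    import simplejson as json  # optional third-party package, used if installed
except ImportError:
    import json  # standard library fallback


def parse_record(line):
    """Hypothetical helper: decode one JSON-encoded line."""
    return json.loads(line)


if __name__ == '__main__':
    print(parse_record('{"word": "jar", "count": 3}'))
```

Both modules expose the same `loads()` and `dumps()` entry points, which is why the rest of the code does not need to know which one was imported.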