The parcel.json file

adar edited this page Feb 6, 2014 · 29 revisions

The primary metadata file for a parcel is parcel.json. All the other metadata files are optional, but without this one a parcel is not a valid parcel.

A parcel.json Example

The following example is a simplified version of the CDH parcel.json. It truncates the long component/package/user lists, but illustrates all of the different parts of the file.

{
  "name": "CDH",
  "version": "5.0.0",
  "setActiveSymlink": true,

  "depends": "",
  "replaces": "IMPALA, SOLR, SPARK",
  "conflicts": "",

  "provides": [
    "cdh",
    "impala",
    "solr",
    "spark"
  ],

  "scripts": {
    "defines": "cdh_env.sh"
  },

  "packages": [
    { "name"   : "hadoop",
      "version": "2.2.0+cdh5.0.0+609-0.cdh5b2.p0.386~precise-cdh5.0.0"
    },
    { "name"   : "hadoop-client",
      "version": "2.2.0+cdh5.0.0+609-0.cdh5b2.p0.386~precise-cdh5.0.0"
    },
    { "name"   : "hadoop-hdfs",
      "version": "2.2.0+cdh5.0.0+609-0.cdh5b2.p0.386~precise-cdh5.0.0"
    }
  ],

  "components": [
    { "name"       : "hadoop",
      "version"    : "2.2.0-cdh5.0.0-SNAPSHOT",
      "pkg_version": "2.2.0+cdh5.0.0+609"
    },
    { "name"       : "hadoop-hdfs",
      "version"    : "2.2.0-cdh5.0.0-SNAPSHOT",
      "pkg_version": "2.2.0+cdh5.0.0+609"
    }
  ],

  "users": {
    "hdfs": {
      "longname"    : "Hadoop HDFS",
      "home"        : "/var/lib/hadoop-hdfs",
      "shell"       : "/bin/bash",
      "extra_groups": [ "hadoop" ]
    },
    "impala": {
      "longname"    : "Impala",
      "home"        : "/var/run/impala",
      "shell"       : "/bin/bash",
      "extra_groups": [ "hive", "hdfs" ]
    }
  },

  "groups": [
    "hadoop"
  ]
}

Fields

Basic Identity information

  • name: The name of the parcel.
  • version: The version of the parcel.

The name and version fields are the fundamental fields that identify a parcel. Only one parcel may provide a given (name, version) combination, and for any given name Cloudera Manager will consider only one version to be active at a time on a particular cluster.

Parcel directories must be named 'name-version' to match these two fields. Cloudera Manager will flag any parcel where this is not true as invalid.

Compressed parcel files must similarly be named 'name-version-distro_suffix.parcel' to be considered valid. See [here](Parcel Distro Suffixes) for details on distro suffixes.
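The naming rules above can be sketched as a small check. This is only an illustration, not Cloudera Manager's own validation code: the function names are hypothetical, and the suffix value "el6" is used purely as an example of a distro suffix.

```python
import json


def expected_names(parcel_json_text, distro_suffix):
    """Return the (directory name, compressed file name) implied by parcel.json.

    Hypothetical helper: it only applies the 'name-version' and
    'name-version-distro_suffix.parcel' patterns described above.
    """
    meta = json.loads(parcel_json_text)
    base = f"{meta['name']}-{meta['version']}"
    return base, f"{base}-{distro_suffix}.parcel"


def directory_name_matches(parcel_json_text, directory_name):
    """True if a parcel directory name agrees with the name and version fields."""
    expected_directory, _ = expected_names(parcel_json_text, distro_suffix="")
    return directory_name == expected_directory


# The identity fields from the example file; "el6" is an assumed suffix value.
example = '{"name": "CDH", "version": "5.0.0"}'
print(expected_names(example, "el6"))
print(directory_name_matches(example, "CDH-5.0.0"))
```

A directory named anything other than 'CDH-5.0.0' would fail this check for the example file, which corresponds to the case Cloudera Manager flags as invalid.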

Inter-parcel dependency information

  • depends: List parcels that must be present for this parcel to function. This parcel cannot be activated until the dependencies are present.
  • replaces: List parcels that are replaced by this parcel. This parcel cannot be activated until the replaced parcels are removed.
  • conflicts: List parcels that conflict with this parcel. Only one of the conflicting parcels can be active at a time.

These rules follow the syntax used for Debian packages.
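As a rough sketch of what Debian-style syntax means for these fields, the following parser splits a field on commas and accepts an optional "(operator version)" constraint after each name. It assumes the standard Debian relation operators (<<, <=, =, >=, >>) and does not handle alternatives written with "|". The function name is hypothetical, and this page does not say which parts of the Debian syntax Cloudera Manager actually honours.

```python
import re

# One relation: a parcel name, optionally followed by "(operator version)".
_RELATION = re.compile(
    r"^(?P<name>[^\s()]+)\s*"
    r"(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<version>[^\s()]+)\s*\))?$"
)


def parse_relations(field):
    """Split a depends/replaces/conflicts string into (name, op, version) tuples.

    op and version are None when no version constraint is given.
    """
    relations = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # an empty field, as in "depends": "", yields no relations
        match = _RELATION.match(item)
        if match is None:
            raise ValueError(f"unrecognised relation: {item!r}")
        relations.append((match["name"], match["op"], match["version"]))
    return relations


# The "replaces" value from the example file, plus an assumed versioned form.
print(parse_relations("IMPALA, SOLR, SPARK"))
print(parse_relations("CDH (>= 5.0)"))
```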

Providing tags

  • provides: This field is a list of strings, where each string is a tag that indicates what functionality the parcel provides for services being managed by Cloudera Manager.

When Cloudera Manager starts a process as part of a managed service, that process will require access to files from one or more parcels. Most obviously, the binary that's being run has to come from one parcel or another, but there may be plugins or extensions provided by other parcels. To avoid exposing every process to every parcel, we have a concept of tagging, where a parcel can publish tags and services can subscribe to them. In this way we can indicate which parcels should affect which processes.

We do not use parcel names for this purpose, as that would require the service handler to know about all possible relevant parcels ahead of time, which is impossible when trying to account for parcels that provide plugins or extensions. We also did not want to restrict the ability to move files between parcels, or to consolidate parcels or split them apart.

The service handlers built into Cloudera Manager look for a specific set of tags, and third party service handlers provided through CSDs can look for those tags or any new ones they wish to define. Likewise, parcels can provide those new tags so that a third party service handler can connect with a third party parcel.

See [here](Service-Parcel tags recognised by Cloudera Manager) for the list of tags recognised by the built-in service handlers.

If the tag list is empty or a '*' entry is provided, then the parcel will affect all processes for all services. Do not do this without good reason, as Cloudera Manager will consider the parcel to be in-use as long as any process is using it, preventing it from being removed from the cluster; this can cause hassles for users if the processes don't really need the parcel.
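The publish/subscribe rule, including the catch-all case, can be sketched as follows. This is a minimal illustration with a hypothetical function name. It assumes tags are compared as exact strings, which this page does not state.

```python
def parcel_affects_service(provides, subscribed_tags):
    """Decide whether a parcel should affect a service's processes.

    provides: the parcel's "provides" list from parcel.json.
    subscribed_tags: the set of tags the service handler looks for.
    """
    # An empty tag list or a '*' entry means the parcel affects everything.
    if not provides or "*" in provides:
        return True
    # Otherwise the parcel matters only if it publishes a subscribed tag.
    return any(tag in subscribed_tags for tag in provides)


# Tags taken from the example file; the subscriber sets are illustrative.
cdh_tags = ["cdh", "impala", "solr", "spark"]
print(parcel_affects_service(cdh_tags, {"impala"}))
print(parcel_affects_service(cdh_tags, {"some-other-tag"}))
print(parcel_affects_service([], {"some-other-tag"}))
```

The last call shows why an empty list is risky: the parcel counts as relevant to a service that never asked for it, so it stays in use for as long as that service has a running process.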

Scripts

Currently, there is only one script that can be specified and used by Cloudera Manager, although that may change in the future.

  • defines: This script is sourced into the environment of each process that the parcel affects (based on the tags).

The 'defines' script does the real work to make a parcel's contents accessible to processes. This is a substantial topic, discussed [here](The Parcel Defines Script).