The parcel.json file

cynthia edited this page Jun 24, 2021 · 29 revisions

The primary metadata file for a parcel is parcel.json. While all the other files are optional, this one must exist for a parcel to be a parcel.

A parcel.json Example

The following example is a simplified version of the CDH parcel.json. It truncates the long component/package/user lists, but illustrates all of the different parts of the file.

{
  "schema_version": 1,
  "name": "CDH",
  "version": "5.0.0",
  "setActiveSymlink": true,

  "depends": "",
  "replaces": "IMPALA, SOLR, SPARK",
  "conflicts": "",

  "provides": [
    "cdh",
    "impala",
    "solr",
    "spark"
  ],

  "scripts": {
    "defines": "cdh_env.sh"
  },

  "packages": [
    { "name"   : "hadoop",
      "version": "2.2.0+cdh5.0.0+609-0.cdh5b2.p0.386~precise-cdh5.0.0"
    },
    { "name"   : "hadoop-client",
      "version": "2.2.0+cdh5.0.0+609-0.cdh5b2.p0.386~precise-cdh5.0.0"
    },
    { "name"   : "hadoop-hdfs",
      "version": "2.2.0+cdh5.0.0+609-0.cdh5b2.p0.386~precise-cdh5.0.0"
    }
  ],

  "components": [
    { "name"       : "hadoop",
      "version"    : "2.2.0-cdh5.0.0-SNAPSHOT",
      "pkg_version": "2.2.0+cdh5.0.0+609",
      "pkg_release": "0.cdh5.0.0.p0.386"
    },
    { "name"       : "hadoop-hdfs",
      "version"    : "2.2.0-cdh5.0.0-SNAPSHOT",
      "pkg_version": "2.2.0+cdh5.0.0+609",
      "pkg_release": "0.cdh5.0.0.p0.386"
    }
  ],

  "users": {
    "hdfs": {
      "longname"    : "Hadoop HDFS",
      "home"        : "/var/lib/hadoop-hdfs",
      "shell"       : "/bin/bash",
      "extra_groups": [ "hadoop" ]
    },
    "impala": {
      "longname"    : "Impala",
      "home"        : "/var/run/impala",
      "shell"       : "/bin/bash",
      "extra_groups": [ "hive", "hdfs" ]
    }
  },

  "groups": [
    "hadoop"
  ]
}

Fields

Schema Version

  • schema_version: For a parcel to conform to this documentation, and pass validation with our tool, it must declare its schema_version to be '1'.

Basic Identity information

  • name: The name of the parcel.
  • version: The version of the parcel.

The name and version fields are the fundamental fields that identify a parcel; only one parcel may provide a given (name, version) combination, and for any given name, Cloudera Manager will only consider one version to be active at a time, for a particular cluster.

Parcel directories must be named 'name-version' to match these two fields. Cloudera Manager will flag any parcel where this is not true as invalid.

Parcel tarball files, whether compressed or not, must similarly be named 'name-version-distro_suffix.parcel' to be considered valid. See [here](Parcel Distro Suffixes) for details on distro suffixes.

Additionally, the name cannot contain a dash character '-'. This is reserved for separating the name and version fields.
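As a minimal sketch of the naming rules above, the following Python derives the directory and tarball names that a parcel.json implies. The function name `expected_names` and the distro suffix value "el6" are assumptions chosen for illustration; this is not the validation tool mentioned earlier.

```python
import json


def expected_names(parcel_json_text, distro_suffix):
    """Return (directory_name, tarball_name) implied by a parcel.json document."""
    meta = json.loads(parcel_json_text)
    if meta.get("schema_version") != 1:
        raise ValueError("schema_version must be 1")
    name, version = meta["name"], meta["version"]
    # The dash is reserved for separating the name and version fields.
    if "-" in name:
        raise ValueError(f"parcel name {name!r} must not contain '-'")
    directory = f"{name}-{version}"
    tarball = f"{directory}-{distro_suffix}.parcel"
    return directory, tarball


example = '{"schema_version": 1, "name": "CDH", "version": "5.0.0"}'
print(expected_names(example, "el6"))
```

For the example document this yields the directory name 'CDH-5.0.0' and the tarball name 'CDH-5.0.0-el6.parcel'.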

Inter-parcel dependency information

  • depends: List parcels that must be present for this parcel to function. This parcel cannot be activated until the dependencies are present.
  • replaces: List parcels that are replaced by this parcel. This parcel cannot be activated until the replaced parcels are removed.
  • conflicts: List parcels that conflict with this parcel. Only one of the conflicting parcels can be active at a time.

These rules follow the syntax used for Debian packages.
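To illustrate what Debian-style syntax looks like in these fields, here is a rough Python sketch that splits a relationship string into entries, each with an optional version constraint. The helper `parse_relations` is hypothetical, and the versioned example string is illustrative; this page does not state which Debian operators Cloudera Manager actually honours.

```python
import re

# One entry: a parcel name, optionally followed by "(operator version)".
_ENTRY = re.compile(
    r"^\s*([^\s()]+)\s*(?:\(\s*(<<|<=|=|>=|>>)\s*([^\s()]+)\s*\))?\s*$"
)


def parse_relations(field):
    """Split a depends/replaces/conflicts string into (name, op, version) tuples."""
    relations = []
    for entry in field.split(","):
        if not entry.strip():
            continue  # an empty field such as "" lists no parcels
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"cannot parse relation {entry!r}")
        relations.append(match.groups())
    return relations


print(parse_relations("IMPALA, SOLR, SPARK"))
print(parse_relations("CDH (>= 5.0), CDH (<< 6.0)"))
```

Entries without a constraint come back with `None` for both the operator and the version.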

Providing tags

  • provides: This field is a list of strings, where each string is a tag that indicates what functionality the parcel provides for services being managed by Cloudera Manager.

When Cloudera Manager starts a process as part of a managed service, that process will require access to files from one or more parcels. Most obviously, the binary that's being run has to come from one parcel or another, but there may be plugins or extensions provided by other parcels. To avoid exposing every process to every parcel, we use tagging: a parcel publishes tags and services subscribe to them, which indicates which parcels should affect which processes.

We do not use parcel names for this purpose, because that would require the service handler to know about all possible relevant parcels ahead of time, which is impossible when trying to account for parcels that provide plugins or extensions. We also did not want to restrict the ability for files to be moved between parcels, or for parcels to be consolidated or split apart.

The service handlers built into Cloudera Manager look for a specific set of tags, and third party service handlers provided through CSDs can look for those tags or any new ones they wish to define. Likewise, parcels can provide those new tags so that a third party service handler can connect with a third party parcel.

See here for the list of tags recognised by the built-in service handlers.

If the tag list is empty or a '*' entry is provided, then the parcel will affect all processes for all services. Do not do this without good reason, as Cloudera Manager will consider the parcel to be in-use as long as any process is using it, preventing it from being removed from the cluster; this can cause hassles for users if the processes don't really need the parcel.

In the rare event that your parcel contributes no services or processes, it's reasonable to use an arbitrary placeholder tag that is reasonably named for your parcel. Please avoid tags from service handlers built into Cloudera Manager, as detailed above.
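The tag matching described in this section can be sketched roughly as follows. This is an illustration of the stated rules, not Cloudera Manager's implementation: the function name `parcel_affects_process` and the tag "my-plugin" are hypothetical, and exact, case-sensitive string comparison of tags is an assumption.

```python
def parcel_affects_process(provided_tags, subscribed_tags):
    """Return True if a parcel with these 'provides' tags affects a process."""
    provided = set(provided_tags)
    # An empty tag list or a '*' entry means the parcel affects every process.
    if not provided or "*" in provided:
        return True
    # Otherwise the parcel matters only if the service subscribes to one of its tags.
    return not provided.isdisjoint(subscribed_tags)


cdh_tags = ["cdh", "impala", "solr", "spark"]
print(parcel_affects_process(cdh_tags, ["impala"]))     # shared tag
print(parcel_affects_process(cdh_tags, ["my-plugin"]))  # no shared tag
print(parcel_affects_process([], ["my-plugin"]))        # empty list matches everything
```

The last call shows why an empty list or '*' should be used sparingly: such a parcel counts as in use by every process.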

Scripts

Currently, there is only one script that can be specified and used by Cloudera Manager, although that may change in the future.

  • defines: This script is sourced into the environment of each process that the parcel affects (based on the tags). Even if a parcel doesn't require any environment variables to be defined, this script must be provided (the script itself can be empty).

The 'defines' script does the real work to make a parcel's contents accessible to processes. This is a substantial topic, discussed [here](The Parcel Defines Script).

Packages

The packages section is purely informative, and Cloudera Manager does not use any of the information provided here. This section is intended for situations where a parcel is equivalent to a set of packages (as is the case for CDH), and it allows for that set of packages to be documented.

  • name: The name of the package
  • version: The version of the package

Components

The components section is similar to the packages section, but there are two important differences. Firstly, these entries correspond to the logical components inside the parcel - which may not map 1:1 to any packages, and secondly, the component entries are consumed by Cloudera Manager. They are used to populate the installed components table for Hosts in the CM UI and are compared for cross-host consistency by the CM host inspector.

The package-related fields are intended to be optional; if your parcel is not built from the contents of packages, these fields are not relevant and ought to be left out. However, the initial releases of CM (5.0.0, 5.0.1, 5.0.2) have a bug that makes pkg_version required. If the field is not meaningful for your parcel, just repeat the value of version here.

  • name: The name of the component
  • version: The version of the component
  • pkg_version: The version of the equivalent packaged component
  • pkg_release: (Optional) The release string of the equivalent packaged component
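As a small sketch of the pkg_version workaround described above, the following hypothetical helper builds a component entry and repeats the version as pkg_version when no package version applies. The component name "mytool" and its version are made up for illustration.

```python
import json


def component_entry(name, version, pkg_version=None, pkg_release=None):
    """Build one entry for the 'components' list of parcel.json."""
    entry = {
        "name": name,
        "version": version,
        # Repeat the version when the parcel is not built from packages,
        # so that the early CM 5.0.x releases still accept the entry.
        "pkg_version": pkg_version if pkg_version is not None else version,
    }
    if pkg_release is not None:
        entry["pkg_release"] = pkg_release  # optional field
    return entry


print(json.dumps(component_entry("mytool", "1.2.0"), indent=2))
```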

Users

This section describes any additional system users that are required by the programs contained in the parcel. Each user is described by an entry in this section.

  • Map Key: The key string is the desired username, e.g. "hdfs".
  • longname: The descriptive name of the user
  • home: The user's home directory
  • shell: The user's shell program
  • extra_groups: A list of additional groups to add the user to. If you are adding a user to another user's group, then that other user must be listed first in this section.
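The ordering rule for extra_groups can be checked with a short sketch like the one below. It assumes that each user's own group has the same name as the user, and that the users map is read in file order (Python dictionaries and `json.loads` preserve insertion order). The function name `check_user_order` is hypothetical.

```python
def check_user_order(users):
    """Return messages for users that join the group of a user listed later."""
    problems = []
    seen = set()
    for username, spec in users.items():
        for group in spec.get("extra_groups", []):
            # Only groups that belong to another user in this section are
            # subject to the ordering rule; other groups are ignored here.
            if group != username and group in users and group not in seen:
                problems.append(
                    f"{username}: user {group!r} must be listed before {username!r}"
                )
        seen.add(username)
    return problems


good = {
    "hdfs": {"extra_groups": ["hadoop"]},
    "impala": {"extra_groups": ["hive", "hdfs"]},
}
bad = {
    "impala": {"extra_groups": ["hive", "hdfs"]},
    "hdfs": {"extra_groups": ["hadoop"]},
}
print(check_user_order(good))
print(check_user_order(bad))
```

The first map mirrors the order used in the example at the top of this page and produces no messages; the second swaps the two users and is reported.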

Note that CM can be configured to not add the specified users. A product user can decide to disable this if their IT policies or infrastructure do not allow local user creation.

Groups

This is a list of any extra groups that should be created beyond the per-user groups that are created for each of the users listed in the users section.

Note that CM can be configured to not add the specified groups. A product user can decide to disable this if their IT policies or infrastructure do not allow local group creation.