Piotr Wendykier edited this page Sep 12, 2013 · 37 revisions


Introduction

Oozie Maven Plugin (OMP) was created to simplify building packages (tar.gz files) that contain an Apache Oozie workflow definition together with all other files needed to run the workflow (configuration files, libraries, etc.). In addition, the plugin makes workflows reusable: generated packages can be uploaded to a Maven repository, added as dependencies of other workflows, and reused as subworkflows. The plugin defines a new Maven artifact type called oozie, but it uses the standard build lifecycles.

How to add the plugin to your project

The sources of OMP are available at https://github.com/CeON/oozie-maven-plugin.

The binaries are available in the ICM Maven repository. To use the plugin, add the following sections to your pom.xml:

    <build>
        <plugins>
            <plugin>
                <groupId>pl.edu.icm.maven</groupId>
                <artifactId>oozie-maven-plugin</artifactId>
                <version>current_version_number</version>
                <extensions>true</extensions>
            </plugin>
        </plugins>
    </build>

and

    <pluginRepositories>
        <pluginRepository>
            <id>yadda</id>
            <name>YADDA project repository</name>
            <url>http://maven.icm.edu.pl/artifactory/repo</url>
        </pluginRepository>
    </pluginRepositories>

Minimal project

A minimal project that uses OMP needs to contain the following files:

  • pom.xml
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>my-project-groupId</groupId>
        <artifactId>my-project-artifactId</artifactId>
        <version>VERSION_NUMBER</version>
        <packaging>oozie</packaging>

        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        </properties>

        <build>
            <plugins>
                <plugin>
                    <groupId>pl.edu.icm.maven</groupId>
                    <artifactId>oozie-maven-plugin</artifactId>
                    <version>1.1</version>
                    <extensions>true</extensions>
                </plugin>
            </plugins>
        </build>
        <dependencies>
            <dependency>
                <groupId>pl.edu.icm.oozie</groupId>
                <artifactId>oozie-runner</artifactId>
                <version>1.2-SNAPSHOT</version>
                <scope>test</scope>
            </dependency>
        </dependencies>
        <repositories>
            <repository>
                <id>yadda</id>
                <name>YADDA project repository</name>
                <url>http://maven.icm.edu.pl/artifactory/repo</url>
            </repository>
        </repositories>
        <pluginRepositories>
            <pluginRepository>
                <id>yadda</id>
                <name>YADDA project repository</name>
                <url>http://maven.icm.edu.pl/artifactory/repo</url>
            </pluginRepository>
        </pluginRepositories>
    </project>
  • src/main/oozie/workflow.xml

You can generate these files with the Oozie Maven archetype (https://github.com/CeON/oozie-maven-archetype):
 mvn archetype:generate -DarchetypeArtifactId=oozie-maven-archetype \
   -DarchetypeGroupId=pl.edu.icm.maven.archetypes -DarchetypeVersion=1.0-SNAPSHOT \
   -DinteractiveMode=false -DgroupId=my-project-groupId -DartifactId=my-project-artifactId \
   -Dversion=VERSION_NUMBER -DarchetypeRepository=http://maven.icm.edu.pl/artifactory/repo
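
If you write src/main/oozie/workflow.xml by hand rather than generating it, a minimal definition might look like the sketch below. It is a generic Oozie workflow, not necessarily what the archetype produces: it assumes workflow schema version 0.4, and the workflow name, the output directory and the nameNode parameter (which would have to be supplied when the job is submitted, e.g. in a job.properties file) are illustrative placeholders.

    <!-- Illustrative sketch; names, paths and the nameNode parameter are placeholders. -->
    <workflow-app xmlns="uri:oozie:workflow:0.4" name="my-workflow">
        <start to="create-dir"/>

        <!-- A single HDFS action: create a directory in the submitting user's home. -->
        <action name="create-dir">
            <fs>
                <mkdir path="${nameNode}/user/${wf:user()}/my-output"/>
            </fs>
            <ok to="end"/>
            <error to="fail"/>
        </action>

        <!-- Reached only if the action above fails. -->
        <kill name="fail">
            <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>

        <end name="end"/>
    </workflow-app>

Oozie requires exactly one start and one end node; the kill node gives the error transition a well-defined target.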

Build

You can build the project by calling

 mvn install

This command creates a package that can be uploaded to a Maven repository and used in other projects. The package does not include its dependencies (subworkflows, libraries); these should also be stored in a Maven repository. The generated file is named artifactId-version-oozie-wf.tar.gz.
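
As a hedged sketch of that reuse, another workflow project could declare such a package as an ordinary Maven dependency. The coordinates are placeholders, and the <type>oozie</type> element is an assumption based on the plugin defining an oozie artifact type; check the plugin sources for the exact declaration it expects.

    <dependencies>
        <dependency>
            <!-- Placeholder coordinates of another OMP-built workflow. -->
            <groupId>other-workflow-groupId</groupId>
            <artifactId>other-workflow-artifactId</artifactId>
            <version>VERSION_NUMBER</version>
            <!-- Assumption: the artifact type registered by OMP (extensions=true). -->
            <type>oozie</type>
        </dependency>
    </dependencies>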

To build a package that can be run on an Oozie server, call

 mvn install -DjobPackage

The file created (artifactId-version-oozie-job.tar.gz) contains everything that is necessary to run a given workflow.
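
Inside a workflow, a reused workflow is typically invoked through Oozie's sub-workflow action. The fragment below is a sketch only: subworkflowAppPath is a hypothetical parameter that you would set to the HDFS directory containing the subworkflow's workflow.xml (where OMP places subworkflows inside the job package is not described here), and the end and fail nodes are assumed to be defined elsewhere in the same workflow.

    <!-- Goes inside <workflow-app>; "end" and "fail" are assumed to exist. -->
    <action name="call-subworkflow">
        <sub-workflow>
            <!-- Hypothetical parameter pointing at the subworkflow's directory. -->
            <app-path>${subworkflowAppPath}</app-path>
            <!-- Pass the parent job configuration down to the subworkflow. -->
            <propagate-configuration/>
        </sub-workflow>
        <ok to="end"/>
        <error to="fail"/>
    </action>

Drop <propagate-configuration/> if the subworkflow should not inherit the parent job's configuration.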

Support for Apache Pig

OMP supports scripts written in Pig Latin.
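
For orientation, a Pig script is run from workflow.xml through Oozie's pig action. A minimal sketch follows; the script name my-script.pig, the INPUT and OUTPUT parameters, and the jobTracker, nameNode, inputPath and outputPath properties are hypothetical, and the script is assumed to sit in the workflow application directory.

    <!-- Goes inside <workflow-app>; "end" and "fail" are assumed to exist. -->
    <action name="run-pig-script">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Hypothetical script, assumed to be next to workflow.xml. -->
            <script>my-script.pig</script>
            <!-- Available in the script as $INPUT and $OUTPUT. -->
            <param>INPUT=${inputPath}</param>
            <param>OUTPUT=${outputPath}</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>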

Modifying the standard JAR file

OMP lets you use Pig scripts from dependent modules. Such a module (containing Java classes, e.g. UDFs used by the Pig scripts) should be added to your project as a direct dependency. To make sure the dependent module's JAR actually contains the Pig scripts, the module's pom.xml needs appropriate resource configuration. For example, the following fragment in the module's pom.xml meets this requirement:

        <build>
                <resources>
                        <resource>
                                <directory>src/main/pig</directory>
                                <filtering>false</filtering>
                                <includes>
                                        <include>**/*.pig</include>
                                </includes>
                                <excludes>
                                        <exclude>**/AUXIL*.pig</exclude>
                                </excludes>
                                <targetPath>${project.build.directory}/classes/pig</targetPath>
                        </resource>
                </resources>
        </build>

Once the above fragment is added to pom.xml, mvn install adds a pig directory to the generated JAR, containing the *.pig files copied from src/main/pig (except those matching the exclude pattern).

Example 1: file src/main/pig/lorem/ipsum/dolor/sit.pig will appear in the JAR file as pig/lorem/ipsum/dolor/sit.pig.
Example 2: file src/main/pig/lorem/ipsum/dolor/AUXIL_sit.pig will not be added to the JAR file, because it matches the exclude pattern **/AUXIL*.pig.
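
In the workflow project itself, the module carrying the UDFs and Pig scripts is then declared as an ordinary direct dependency. A sketch with placeholder coordinates in the style of the minimal project; because the module is a plain JAR, no type element is needed (jar is Maven's default type).

    <dependencies>
        <dependency>
            <!-- Placeholder coordinates of the module with UDFs and pig/ scripts. -->
            <groupId>my-udf-groupId</groupId>
            <artifactId>my-udf-artifactId</artifactId>
            <version>VERSION_NUMBER</version>
        </dependency>
    </dependencies>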
