Revamped README for DAI MOJO Deployment to CEM (EFM, NiFi Reg, MiNiFi CPP

Removed most of the commands from the README and created a setup.sh script from most of the commands in the previous README.
Also added new commands that install the CEM components (Edge Flow Manager, NiFi Registry, etc.) so that users can
build data flows for MiNiFi C++ in the drag-and-drop UI that EFM provides. Users can still use the custom h2o.ai MiNiFi
Python Processors to deploy the Driverless AI MOJO Scoring Pipeline in a MiNiFi C++ data flow. Added many images to make it
easier to build MiNiFi C++ data flows in EFM. Added a troubleshooting section for when the custom h2o.ai Python Processors
do not appear, explaining how to fix it, and another troubleshooting section on checking whether EFM, NiFi Registry or the
MiNiFi C++ agent is running. I made sure to incorporate feedback from Nick Png and Edge Orendain.
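The README's troubleshooting text for checking whether EFM, NiFi Registry or the MiNiFi C++ agent is running is not rendered on this page. As one possible first check, here is a minimal Python sketch that only tests whether each service's TCP port accepts connections. It assumes the ports configured in the files of this commit: EFM on 10080 (efm.server.port), NiFi Registry on 18080 (from efm.nifi.registry.url), PostgreSQL on 5432 (from efm.db.url), and the MiNiFi C++ controller socket on 9998 (controller.socket.port, open only if that socket is enabled). The helper names are hypothetical, and a successful connect shows only that something is listening, not that the service is healthy.

```python
import socket

# Ports assumed from the configuration files in this commit (hypothetical service labels).
SERVICES = {
    "EFM": 10080,                          # efm.server.port
    "NiFi Registry": 18080,                # efm.nifi.registry.url
    "PostgreSQL (EFM database)": 5432,     # efm.db.url
    "MiNiFi C++ controller socket": 9998,  # controller.socket.port, only if enabled
}


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


def check_services(host: str) -> dict:
    """Map each service label to whether its port currently accepts TCP connections."""
    return {name: is_listening(host, port) for name, port in SERVICES.items()}
```

For example, `check_services("localhost")` on the EFM host returns a dict such as `{"EFM": True, ...}`; any `False` entry points at the component to inspect first.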
james94 committed Oct 3, 2020
1 parent 5ee1fe3 commit c7817c9
Showing 33 changed files with 1,440 additions and 977 deletions.
371 changes: 168 additions & 203 deletions mojo-py-minificpp/README.md


150 changes: 150 additions & 0 deletions mojo-py-minificpp/cem/conf/efm.properties
@@ -0,0 +1,150 @@
# Web Server Properties
# address: the hostname or ip address of the interface to bind to; to bind to all, use 0.0.0.0
efm.server.address=EFM_SERVER_IP
efm.server.port=10080
efm.server.servlet.contextPath=/efm

# Cluster Properties
# address: the address (host:port) to bind to for the embedded Hazelcast instance that coordinates cluster state
# memberAddress: the address (host:port) to advertise to other cluster members, if different from the bindAddress
# members: comma-separated list of all cluster nodes; must be identical on all nodes in the cluster, including order
# format of node address is hostname or IP or hostname:port or IP:port
# port is optional (default port is 5701)
efm.cluster.enabled=false

# Cluster TLS/SSL Tunnel Properties
# enabled: enable secure communication within the cluster via a stunnel proxy
# command: the command or path to executable for stunnel, which must be installed, e.g., /usr/bin/stunnel
# logLevel: the level of stunnel debug output: emerg|alert|crit|err|warning|notice|info|debug
# logFile: (optional) if specified, the file to use for stunnel logs. if not specified, output is to EFM App Log
# caFile: The file containing Certificate Authority certificates. Must be PEM format.
# cert: The file containing this cluster node's public certificate. Must be PEM format.
# key: The file containing this cluster node's private key. Must be PEM format. Can be encrypted or unencrypted
# keyPassword: (optional) If the key file is encrypted with a password, the password to decrypt the key file.
# proxyServerPort: the port that will receive the TLS traffic and redirect to Hazelcast (default 10090)
# proxyClientPortStart: starting with the given port, the ports used to proxy communication with other cluster members
# over the secure TLS tunnel (default 10091). The number of ports used is one fewer than the number of cluster members.
# For additional Stunnel configuration options, see https://www.stunnel.org/static/stunnel.html
# Global options, service-level options, or client-/server-specific service options can be specified as
# key-value pairs with the appropriate prefix efm.cluster.stunnel.[global|service|clientService|serverService].*
efm.cluster.stunnel.enabled=false
efm.cluster.stunnel.command=stunnel
efm.cluster.stunnel.logLevel=warning
efm.cluster.stunnel.caFile=
efm.cluster.stunnel.cert=
efm.cluster.stunnel.key=
efm.cluster.stunnel.keyPassword=
efm.cluster.stunnel.proxyServerPort=10090
efm.cluster.stunnel.proxyClientPortStart=10091

# Web Server TLS Properties
efm.server.ssl.enabled=false
efm.server.ssl.keyStore=./conf/keystore.jks
efm.server.ssl.keyStoreType=jks
efm.server.ssl.keyStorePassword=
efm.server.ssl.keyPassword=
efm.server.ssl.trustStore=./conf/truststore.jks
efm.server.ssl.trustStoreType=jks
efm.server.ssl.trustStorePassword=
efm.server.ssl.clientAuth=WANT

# User Authentication Properties
# authentication via TLS mutual auth with client certificates
efm.security.user.certificate.enabled=false
# authentication via Knox SSO token passed in a cookie header
efm.security.user.knox.enabled=false
efm.security.user.knox.url=
efm.security.user.knox.publicKey=
efm.security.user.knox.cookieName=
efm.security.user.knox.audiences=
# authentication via generic reverse proxy with user passed in a header
efm.security.user.proxy.enabled=false
efm.security.user.proxy.headerName=x-webauth-user

# NiFi Registry Properties
# url: the base URL of a NiFi Registry instance
# bucket: Only set one of bucketId OR bucketName
# flowRefreshInterval: specify value and units (d=days, h=hours, m=minutes, s=seconds, ms=milliseconds)
efm.nifi.registry.enabled=true
efm.nifi.registry.url=http://EFM_SERVER_IP:18080
efm.nifi.registry.bucketId=
efm.nifi.registry.bucketName=DaiMojo
efm.nifi.registry.flowRefreshInterval=60s

# Database Properties
efm.db.url=jdbc:postgresql://EFM_SERVER_IP:5432/efm
efm.db.driverClass=org.postgresql.Driver
efm.db.username=efm
efm.db.password=clouderah2oai
efm.db.maxConnections=50
efm.db.sqlDebug=false

# Heartbeat Retention Properties
# For maxAgeToKeep, specify value and units (d=days, h=hours, m=minutes, s=seconds, ms=milliseconds)
# Set to 0 to disable persisting heartbeats entirely
efm.heartbeat.maxAgeToKeep=0
efm.heartbeat.persistContent=false

# Event Retention Properties
# Specify value and units (d=days, h=hours, m=minutes, s=seconds, ms=milliseconds)
# Set to 0 to disable persisting events entirely
# Set no value to disable auto-cleanup (manual deletion only)
efm.event.cleanupInterval=30s
efm.event.maxAgeToKeep.debug=0m
efm.event.maxAgeToKeep.info=1h
efm.event.maxAgeToKeep.warn=1d
efm.event.maxAgeToKeep.error=7d

# Agent Class Flow Monitor Properties
# Specify value and units (d=days, h=hours, m=minutes, s=seconds, ms=milliseconds)
efm.agent-class-monitor.interval=15s

# Agent Monitoring Properties
# Specify value and units (d=days, h=hours, m=minutes, s=seconds, ms=milliseconds)
# Set to zero to disable threshold monitoring entirely
efm.monitor.maxHeartbeatInterval=5m

# Operation Properties
efm.operation.monitoring.enabled=true
efm.operation.monitoring.inDeployedStateTimeout=5m
efm.operation.monitoring.inDeployedStateCheckFrequency=1m
efm.operation.monitoring.rollingBatchOperationsSize=10
efm.operation.monitoring.rollingBatchOperationsFrequency=5s

# Metrics Properties
management.metrics.export.simple.enabled=false
management.metrics.export.prometheus.enabled=true
management.metrics.enable.efm.heartbeat=true
management.metrics.enable.efm.agentStatus=true
management.metrics.enable.efm.flowStatus=true
management.metrics.enable.efm.repo=true
management.metrics.efm.enable-tag.efmHost=true
management.metrics.efm.enable-tag.agentClass=true
management.metrics.efm.enable-tag.agentManifestId=true
management.metrics.efm.enable-tag.agentId=true
management.metrics.efm.enable-tag.deviceId=false
management.metrics.efm.enable-tag.flowId=true
management.metrics.efm.enable-tag.connectionId=true
management.metrics.efm.max-tags.agentClass=100
management.metrics.efm.max-tags.agentManifestId=10
management.metrics.efm.max-tags.agentId=100
management.metrics.efm.max-tags.deviceId=100
management.metrics.efm.max-tags.flowId=100
management.metrics.efm.max-tags.connectionId=1000

# EL Specification Properties
efm.el.specifications.dir=./specs

# Logging Properties
# logging.level.{logger-name}={DEBUG|INFO|WARN|ERROR}
logging.level.com.cloudera.cem.efm=INFO
logging.level.com.hazelcast=WARN
logging.level.com.hazelcast.internal.cluster.ClusterService=INFO
logging.level.com.hazelcast.internal.nio.tcp.TcpIpConnection=ERROR
logging.level.com.hazelcast.internal.nio.tcp.TcpIpConnector=ERROR

# Encryption Password used for encrypting sensitive data saved to the EFM server
efm.encryption.password=clouderah2oai

# This property is not in the default file, so we added it here (its position in the file does not matter). Default is 'First In'
efm.manifest.strategy=Last In
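The file above uses the literal token EFM_SERVER_IP as a placeholder in efm.server.address, the NiFi Registry URL and the database URL, and its comments describe durations as a number plus a unit (d, h, m, s or ms). How setup.sh fills in the placeholder is not shown in this diff. The following is a minimal Python sketch, assuming a plain textual substitution and simple key=value lines; all function names are hypothetical, and the parser deliberately ignores Java .properties escapes and line continuations.

```python
import re

PLACEHOLDER = "EFM_SERVER_IP"  # token used throughout efm.properties in this commit


def fill_placeholder(text: str, ip: str) -> str:
    """Replace every EFM_SERVER_IP token with a real address (assumed sed-like substitution)."""
    return text.replace(PLACEHOLDER, ip)


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        props[key.strip()] = value.strip()
    return props


_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def duration_ms(value: str) -> int:
    """Convert durations such as '60s', '5m' or '7d' to milliseconds."""
    value = value.strip()
    if value == "0":  # the file uses a bare 0 to mean "disabled"
        return 0
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if not match:
        raise ValueError(f"unrecognized duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MS[unit]


def efm_base_url(props: dict) -> str:
    """Combine server address, port and servlet context path into the EFM base URL."""
    return "http://{}:{}{}".format(
        props["efm.server.address"],
        props["efm.server.port"],
        props["efm.server.servlet.contextPath"],
    )
```

With efm.server.address=EFM_SERVER_IP filled in as 10.0.0.5, `efm_base_url` yields `http://10.0.0.5:10080/efm`, and `duration_ms("60s")` for efm.nifi.registry.flowRefreshInterval gives 60000.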
@@ -1,62 +1,47 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Core Properties #
nifi.version=0.7.0
nifi.flow.configuration.file=./conf/config.yml
nifi.administrative.yield.duration=30 sec
# If a component has no work to do (is "bored"), how long should we wait before checking again for work?
nifi.bored.yield.duration=10 millis
nifi.bored.yield.duration=100 millis

# Provenance Repository #
nifi.provenance.repository.directory.default=${MINIFI_HOME}/provenance_repository
nifi.provenance.repository.max.storage.time=1 MIN
nifi.provenance.repository.max.storage.size=1 MB
nifi.flowfile.repository.directory.default=${MINIFI_HOME}/flowfile_repository
nifi.database.content.repository.directory.default=${MINIFI_HOME}/content_repository
nifi.provenance.repository.class.name=NoOpRepository

#nifi.remote.input.secure=true
#nifi.security.need.ClientAuth=
#nifi.security.client.certificate=
#nifi.security.client.private.key=
#nifi.security.client.pass.phrase=
#nifi.security.client.ca.certificate=

#nifi.rest.api.user.name=admin
#nifi.rest.api.password=password
# Disk space watchdog #
## Stops MiNiFi FlowController activity (excluding C2) when the available disk space on any of the repository
## volumes goes below stop.threshold.bytes, checked every interval.ms, then restarts it when the available space on all
## repository volumes reaches at least restart.threshold.bytes.
minifi.disk.space.watchdog.enable=true
minifi.disk.space.watchdog.interval.ms=15000
minifi.disk.space.watchdog.stop.threshold.bytes=104857600
minifi.disk.space.watchdog.restart.threshold.bytes=157286400

## To enable C2, uncomment each of the following options
## and define those with missing values
#nifi.c2.enable=true
nifi.c2.enable=true
## define protocol parameters
## The default is CoAP, if that extension is built.
## Alternatively, you may use RESTSender if http-curl is built
#nifi.c2.agent.protocol.class=CoapProtocol
#nifi.c2.agent.coap.host=
#nifi.c2.agent.coap.port=
nifi.c2.agent.protocol.class=RESTSender
## base URL of the C2 server,
## very likely the same base URL as the REST URLs
#nifi.c2.flow.base.url=
#nifi.c2.rest.url=
#nifi.c2.rest.url.ack=
nifi.c2.flow.base.url=http://EFM_SERVER_IP:10080/efm/api/c2-protocol/
nifi.c2.rest.url=http://EFM_SERVER_IP:10080/efm/api/c2-protocol/heartbeat
nifi.c2.rest.url.ack=http://EFM_SERVER_IP:10080/efm/api/c2-protocol/acknowledge
nifi.c2.root.classes=DeviceInfoNode,AgentInformation,FlowInformation
## Minimize heartbeat payload size by excluding agent manifest from the heartbeat
nifi.c2.full.heartbeat=false
## heartbeat 4 times a second
#nifi.c2.agent.heartbeat.period=250
nifi.c2.agent.heartbeat.period=250
## define parameters about your agent
#nifi.c2.agent.class=
#nifi.c2.agent.identifier=
nifi.c2.agent.class=MiNiFiCPP_DAI_MOJO_PY_1
nifi.c2.agent.identifier=MiNiFiCPP_DAI_MOJO_PY_001
## define metrics reported
nifi.c2.root.class.definitions=metrics
nifi.c2.root.class.definitions.metrics.name=metrics
@@ -70,9 +55,8 @@ nifi.c2.root.class.definitions.metrics.metrics.processorMetrics.classes=GetFileM

## enable the controller socket provider on port 9998
## off by default. C2 must be enabled to support these
#controller.socket.host=localhost
#controller.socket.port=9998

controller.socket.host=localhost
controller.socket.port=9998

#JNI properties
nifi.framework.dir=${MINIFI_HOME}/minifi-jni/lib
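In the agent properties above, the three C2 URLs share one base, http://EFM_SERVER_IP:10080/efm/api/c2-protocol/, which is the EFM port and context path from efm.properties followed by /api/c2-protocol/. The heartbeat period of 250 ms gives the "4 times a second" stated in the comment, and the watchdog thresholds 104857600 and 157286400 bytes are 100 MiB and 150 MiB. The following Python sketch checks that arithmetic and models the stop/restart hysteresis described in the watchdog comment. It is an illustration of the documented behavior, not MiNiFi C++'s implementation; the function names are hypothetical.

```python
# Values copied from the agent properties in the hunk above.
HEARTBEAT_PERIOD_MS = 250            # nifi.c2.agent.heartbeat.period
STOP_THRESHOLD_BYTES = 104857600     # minifi.disk.space.watchdog.stop.threshold.bytes
RESTART_THRESHOLD_BYTES = 157286400  # minifi.disk.space.watchdog.restart.threshold.bytes
MIB = 1024 * 1024


def c2_urls(efm_host: str, port: int = 10080, context_path: str = "/efm") -> dict:
    """Build the C2 endpoint URLs from the EFM server port and servlet context path."""
    base = f"http://{efm_host}:{port}{context_path}/api/c2-protocol/"
    return {
        "nifi.c2.flow.base.url": base,
        "nifi.c2.rest.url": base + "heartbeat",
        "nifi.c2.rest.url.ack": base + "acknowledge",
    }


def heartbeats_per_second(period_ms: int) -> float:
    """The heartbeat period is given in milliseconds."""
    return 1000 / period_ms


def watchdog_allows_processing(free_bytes: list, currently_running: bool) -> bool:
    """Model of the documented watchdog hysteresis (not MiNiFi's actual code).

    While running, stop as soon as any repository volume drops below the stop
    threshold; once stopped, resume only when every volume has at least the
    restart threshold free.
    """
    if currently_running:
        return all(free >= STOP_THRESHOLD_BYTES for free in free_bytes)
    return all(free >= RESTART_THRESHOLD_BYTES for free in free_bytes)
```

The gap between the two thresholds (50 MiB here) keeps the agent from flapping between stopped and running when free space hovers around a single limit.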


34 changes: 34 additions & 0 deletions mojo-py-minificpp/cem/conf/minifi.properties
@@ -0,0 +1,34 @@
# Core Properties #
nifi.version=0.7.0
nifi.flow.configuration.file=./conf/config.yml
nifi.administrative.yield.duration=30 sec
# If a component has no work to do (is "bored"), how long should we wait before checking again for work?
nifi.bored.yield.duration=10 millis

# Provenance Repository #
nifi.provenance.repository.directory.default=${MINIFI_HOME}/provenance_repository
nifi.provenance.repository.max.storage.time=1 MIN
nifi.provenance.repository.max.storage.size=1 MB
nifi.flowfile.repository.directory.default=${MINIFI_HOME}/flowfile_repository
nifi.database.content.repository.directory.default=${MINIFI_HOME}/content_repository

nifi.c2.root.classes=DeviceInfoNode,AgentInformation,FlowInformation
## define metrics reported
nifi.c2.root.class.definitions=metrics
nifi.c2.root.class.definitions.metrics.name=metrics
nifi.c2.root.class.definitions.metrics.metrics=typedmetrics
nifi.c2.root.class.definitions.metrics.metrics.typedmetrics.name=RuntimeMetrics
nifi.c2.root.class.definitions.metrics.metrics.queuemetrics.name=QueueMetrics
nifi.c2.root.class.definitions.metrics.metrics.queuemetrics.classes=QueueMetrics
nifi.c2.root.class.definitions.metrics.metrics.typedmetrics.classes=ProcessMetrics,SystemInformation
nifi.c2.root.class.definitions.metrics.metrics.processorMetrics.name=ProcessorMetric
nifi.c2.root.class.definitions.metrics.metrics.processorMetrics.classes=GetFileMetrics

#JNI properties
nifi.framework.dir=${MINIFI_HOME}/minifi-jni/lib
nifi.nar.directory=${MINIFI_HOME}/minifi-jni/nars
nifi.nar.deploy.directory=${MINIFI_HOME}/minifi-jni/nardeploy
nifi.nar.docs.directory=${MINIFI_HOME}/minifi-jni/nardocs
# must be comma separated
nifi.jvm.options=-Xmx1G
nifi.python.processor.dir=${MINIFI_HOME}/minifi-python/,${MINIFI_HOME}/minifi-python/h2o/
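The commit message mentions troubleshooting for when the custom h2o.ai Python Processors do not show up. The last line of minifi.properties tells the agent to look in two directories, ${MINIFI_HOME}/minifi-python/ and ${MINIFI_HOME}/minifi-python/h2o/. A quick sanity check is to expand that value and confirm the directories exist and contain the expected scripts. The Python sketch below does this, assuming processors are .py files placed directly inside those directories; the function names and the MINIFI_HOME value passed in are hypothetical.

```python
import os
from string import Template


def python_processor_dirs(prop_value: str, minifi_home: str) -> list:
    """Expand ${MINIFI_HOME} and split the comma-separated nifi.python.processor.dir value."""
    expanded = Template(prop_value).substitute(MINIFI_HOME=minifi_home)
    return [entry.strip() for entry in expanded.split(",") if entry.strip()]


def find_python_processors(dirs: list) -> dict:
    """List .py files directly inside each directory; a missing directory maps to []."""
    found = {}
    for d in dirs:
        if os.path.isdir(d):
            found[d] = sorted(f for f in os.listdir(d) if f.endswith(".py"))
        else:
            found[d] = []
    return found
```

For example, with `minifi_home="/opt/minifi"` the property expands to `["/opt/minifi/minifi-python/", "/opt/minifi/minifi-python/h2o/"]`; an empty list for the h2o directory suggests the h2o scripts were not copied where the agent expects them.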
99 changes: 99 additions & 0 deletions mojo-py-minificpp/cem/conf/pg_hba.conf
@@ -0,0 +1,99 @@
# PostgreSQL Client Authentication Configuration File
# ===================================================
#
# Refer to the "Client Authentication" section in the PostgreSQL
# documentation for a complete description of this file. A short
# synopsis follows.
#
# This file controls: which hosts are allowed to connect, how clients
# are authenticated, which PostgreSQL user names they can use, which
# databases they can access. Records take one of these forms:
#
# local DATABASE USER METHOD [OPTIONS]
# host DATABASE USER ADDRESS METHOD [OPTIONS]
# hostssl DATABASE USER ADDRESS METHOD [OPTIONS]
# hostnossl DATABASE USER ADDRESS METHOD [OPTIONS]
#
# (The uppercase items must be replaced by actual values.)
#
# The first field is the connection type: "local" is a Unix-domain
# socket, "host" is either a plain or SSL-encrypted TCP/IP socket,
# "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a
# plain TCP/IP socket.
#
# DATABASE can be "all", "sameuser", "samerole", "replication", a
# database name, or a comma-separated list thereof. The "all"
# keyword does not match "replication". Access to replication
# must be enabled in a separate record (see example below).
#
# USER can be "all", a user name, a group name prefixed with "+", or a
# comma-separated list thereof. In both the DATABASE and USER fields
# you can also write a file name prefixed with "@" to include names
# from a separate file.
#
# ADDRESS specifies the set of hosts the record matches. It can be a
# host name, or it is made up of an IP address and a CIDR mask that is
# an integer (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that
# specifies the number of significant bits in the mask. A host name
# that starts with a dot (.) matches a suffix of the actual host name.
# Alternatively, you can write an IP address and netmask in separate
# columns to specify the set of hosts. Instead of a CIDR-address, you
# can write "samehost" to match any of the server's own IP addresses,
# or "samenet" to match any address in any subnet that the server is
# directly connected to.
#
# METHOD can be "trust", "reject", "md5", "password", "gss", "sspi",
# "ident", "peer", "pam", "ldap", "radius" or "cert". Note that
# "password" sends passwords in clear text; "md5" is preferred since
# it sends encrypted passwords.
#
# OPTIONS are a set of options for the authentication in the format
# NAME=VALUE. The available options depend on the different
# authentication methods -- refer to the "Client Authentication"
# section in the documentation for a list of which options are
# available for which authentication methods.
#
# Database and user names containing spaces, commas, quotes and other
# special characters must be quoted. Quoting one of the keywords
# "all", "sameuser", "samerole" or "replication" makes the name lose
# its special character, and just match a database or username with
# that name.
#
# This file is read on server startup and when the postmaster receives
# a SIGHUP signal. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect. You can
# use "pg_ctl reload" to do that.

# Put your actual configuration here
# ----------------------------------
#
# If you want to allow non-local connections, you need to add more
# "host" records. In that case you will also need to make PostgreSQL
# listen on a non-local interface via the listen_addresses
# configuration parameter, or via the -i or -h command line switches.




# DO NOT DISABLE!
# If you change this first entry you will need to make sure that the
# database superuser can access the database using some other method.
# Noninteractive access to all databases is required during automatic
# maintenance (custom daily cronjobs, replication, and similar tasks).
#
# Database administrative login by Unix domain socket
local all postgres peer

# TYPE DATABASE USER ADDRESS METHOD

# "local" is for Unix domain socket connections only
local all all trust
# IPv4 connections from any address (trust: no password required):
host all all 0.0.0.0/0 trust
# IPv6 connections from any address (trust: no password required):
host all all ::/0 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local replication postgres peer
#host replication postgres 127.0.0.1/32 md5
#host replication postgres ::1/128 md5
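PostgreSQL uses the first pg_hba.conf record whose connection type, database, user and client address all match, and denies the connection if none does. With the two "host" records above, every IPv4 or IPv6 client therefore reaches the trust method, i.e. connects without a password, provided PostgreSQL also listens on a non-local interface (listen_addresses). The Python sketch below illustrates that first-match rule for these records only. It is simplified and hypothetical: it handles "all", exact names and CIDR addresses, and ignores samehost/samenet, "+group", "@file", replication and the hostssl/hostnossl distinctions.

```python
import ipaddress

# The two "host" records from the pg_hba.conf above: (TYPE, DATABASE, USER, ADDRESS, METHOD).
HOST_RECORDS = [
    ("host", "all", "all", "0.0.0.0/0", "trust"),
    ("host", "all", "all", "::/0", "trust"),
]


def first_matching_method(records, database: str, user: str, client_ip: str):
    """Return the METHOD of the first matching 'host' record, or None if none match.

    PostgreSQL denies access when no record matches; there is no fall-through
    to later records once one matches.
    """
    addr = ipaddress.ip_address(client_ip)
    for conn_type, db, usr, cidr, method in records:
        if conn_type != "host":
            continue
        if db != "all" and db != database:
            continue
        if usr != "all" and usr != user:
            continue
        # An IPv4 address is never "in" an IPv6 network (and vice versa).
        if addr in ipaddress.ip_network(cidr):
            return method
    return None
```

For instance, `first_matching_method(HOST_RECORDS, "efm", "efm", "203.0.113.7")` returns "trust", which is why these records are only appropriate on an isolated demo host.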