Add https sync support (fix #149)
Add configuration tests for serverScheme

Document serverScheme field (#149)
jonekdahl authored and Jeff Bornemann committed Oct 3, 2016
1 parent c0a0b01 commit 0f2be58
Showing 11 changed files with 136 additions and 8 deletions.
2 changes: 1 addition & 1 deletion docs/GeneralLayout.adoc
@@ -3,5 +3,5 @@
There are two primary components to Grabbit: a client and a server that run in the two CQ instances that you want to copy to and from (respectively).

A recommended systems layout style is to have all content from a production publisher copied down to a staging "data warehouse" (DW) server to which all lower environments (beta, continuous integration, developer workstations, etc.) will connect. This way minimal load is placed on Production, and additional DW machines can be added to scale out if needed, each of which can grab from the "main" DW.
-The client sends an HTTP GET request with a content path and "last grab time" to the server and receives a protobuf stream of all the content below it that has changed. The client's BasicAuth credentials are used to create the JCR Session, so the client can never see content they don't have explicit access to. There are a number of ways to tune how the client works, including specifying multiple focused paths, parallel or serial execution, JCR Session batch size (the number of nodes to cache before flushing to disk), etc.
+The client sends an HTTP(S) GET request with a content path and "last grab time" to the server and receives a protobuf stream of all the content below it that has changed. The client's BasicAuth credentials are used to create the JCR Session, so the client can never see content they don't have explicit access to. There are a number of ways to tune how the client works, including specifying multiple focused paths, parallel or serial execution, JCR Session batch size (the number of nodes to cache before flushing to disk), etc.
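
As a rough illustration of the sentence above, the sketch below shows how a scheme, host, and port taken from a configuration map could be combined into the base URL of the GET request. It is not Grabbit's code: `grabbitContentUri` is a hypothetical helper, the map keys simply reuse the field names from this commit, and only the JDK's `java.net.URI` is used. The `/grabbit/content` path is the one that appears in the tasklet further down.

```groovy
// Hypothetical helper (not part of Grabbit): composes the base URL the client would call.
URI grabbitContentUri(Map<String, String> config) {
    // Fall back to "http" when serverScheme is absent, mirroring the documented default.
    String scheme = (config.serverScheme ?: 'http').toLowerCase()
    if (!(scheme in ['http', 'https'])) {
        throw new IllegalArgumentException('serverScheme must be either http or https')
    }
    // java.net.URI(scheme, userInfo, host, port, path, query, fragment)
    return new URI(scheme, null, config.serverHost, config.serverPort as Integer,
                   '/grabbit/content', null, null)
}

def config = [serverHost: 'some.other.server', serverPort: '4502']
println grabbitContentUri(config)                              // scheme omitted, so http is used
println grabbitContentUri(config + [serverScheme: 'https'])    // explicit https
```

Under these assumptions the first call prints an `http://` URL and the second an `https://` URL, with the host, port, and path unchanged.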

1 change: 1 addition & 0 deletions docs/Running.adoc
@@ -130,6 +130,7 @@ The corresponding `YAML` configuration for the JSON above will look something li

===== Optional fields

+* __serverScheme__: string. The protocol the client should use when connecting to the server. Supported options are `http` and `https`. Defaults to `http`.
* __deltaContent__: boolean, ```true``` syncs only 'delta' or changed content. Changed content is determined by comparing one of a number of date properties including jcr:lastModified, cq:lastModified, or jcr:created Date with the last successful Grabbit sync date. Nodes without any of the previously mentioned date properties will always be synced even with deltaContent on, and if a node's data is changed without updating a date property (i.e., from CRX/DE), the change will not be detected. Most common throughput bottlenecks are usually handled by delta sync for cases such as large DAM trees; but if your case warrants a more fine-tuned use of delta sync, you may consider adding mix:lastModified to nodes not usually considered for exclusion, such as extremely large unstructured trees. The deltaContent flag __only__ applies to changes made on the server - changes to the client environment will not be detected (and won't be overwritten if changes were made on the client's path but not on the server).
* __batchSize__: integer. Used to specify the number of nodes in one batch. Defaults to 100.
* __deleteBeforeWrite__: boolean. Before the client retrieves content, should content under each path be cleared? When used in combination with excludePaths, nodes indicated by excludePaths will not be deleted.
1 change: 1 addition & 0 deletions sample-config.json
@@ -1,6 +1,7 @@
{
"serverUsername" : "<username>",
"serverPassword" : "<password>",
+"serverScheme" : "http",
"serverHost" : "some.other.server",
"serverPort" : "4502",
"batchSize" : 150,
1 change: 1 addition & 0 deletions sample_config.yaml
@@ -3,6 +3,7 @@
# Information for connecting to the source content
serverUsername : '<username>'
serverPassword : '<password>'
+serverScheme : http
serverHost : some.other.server
serverPort : 4502

27 changes: 25 additions & 2 deletions src/main/groovy/com/twcable/grabbit/GrabbitConfiguration.groovy
@@ -35,6 +35,7 @@ class GrabbitConfiguration {

final String serverUsername
final String serverPassword
+final String serverScheme
final String serverHost
final String serverPort
final boolean deltaContent
@@ -45,12 +46,13 @@ class GrabbitConfiguration {
final static int DEFAULT_BATCH_SIZE = 100


-private GrabbitConfiguration(@Nonnull String user, @Nonnull String pass, @Nonnull String host,
-@Nonnull String port, boolean deltaContent,
+private GrabbitConfiguration(@Nonnull String user, @Nonnull String pass, @Nonnull String scheme,
+@Nonnull String host, @Nonnull String port, boolean deltaContent,
@Nonnull Collection<PathConfiguration> pathConfigs) {
// all input is being verified by the "create" factory method
this.serverUsername = user
this.serverPassword = pass
+this.serverScheme = scheme
this.serverHost = host
this.serverPort = port
this.deltaContent = deltaContent
@@ -84,6 +86,7 @@ class GrabbitConfiguration {

def serverUsername = nonEmpty(configMap, 'serverUsername', errorBuilder)
def serverPassword = nonEmpty(configMap, 'serverPassword', errorBuilder)
+def serverScheme = schemeVal(configMap, 'serverScheme', errorBuilder)
def serverHost = nonEmpty(configMap, 'serverHost', errorBuilder)
def serverPort = nonEmpty(configMap, 'serverPort', errorBuilder)
def deltaContent = boolVal(configMap, 'deltaContent')
@@ -110,6 +113,7 @@ class GrabbitConfiguration {
return new GrabbitConfiguration(
serverUsername,
serverPassword,
+serverScheme,
serverHost,
serverPort,
deltaContent,
@@ -151,6 +155,25 @@ class GrabbitConfiguration {
}


+private static String schemeVal(Map<String, String> configMap, String key,
+ConfigurationException.Builder errorBuilder) {
+if (configMap.containsKey(key)) {
+def val = configMap.get(key).toLowerCase()
+if (val == "http" || val == "https") {
+return val
+}
+else {
+errorBuilder.add(key, 'must be either http or https')
+return null
+}
+}
+else {
+log.debug "Input doesn't contain ${key} for a URL scheme value. Will default to http"
+return "http"
+}
+}
+
+
private static boolean boolVal(Map<String, String> configMap, String key) {
if (!configMap.containsKey(key)) {
log.debug "Input doesn't contain ${key} for a boolean value. Will default to false"
@@ -40,6 +40,7 @@ class ClientBatchJob {
public static final String PATH = "path"
public static final String EXCLUDE_PATHS = "excludePaths"
public static final String WORKFLOW_CONFIGS = "workflowConfigIds"
+public static final String SCHEME = "scheme"
public static final String HOST = "host"
public static final String PORT = "port"
public static final String SERVER_USERNAME = "serverUsername"
@@ -84,6 +85,7 @@ class ClientBatchJob {
@CompileStatic
static class ServerBuilder {
final ConfigurableApplicationContext configAppContext
+String scheme
String host
String port

@@ -93,7 +95,8 @@ class ClientBatchJob {
}


-CredentialsBuilder andServer(String host, String port) {
+CredentialsBuilder andServer(String scheme, String host, String port) {
+this.scheme = scheme
this.host = host
this.port = port
return new CredentialsBuilder(this)
@@ -188,6 +191,7 @@ class ClientBatchJob {
final jobParameters = [
"timestamp" : System.currentTimeMillis() as String,
(PATH) : pathConfiguration.path,
+(SCHEME) : serverBuilder.scheme,
(HOST) : serverBuilder.host,
(PORT) : serverBuilder.port,
(CLIENT_USERNAME) : credentialsBuilder.clientUsername,
@@ -93,14 +93,15 @@ class CreateHttpConnectionTasklet implements Tasklet {
final String path = jobParameters.get(ClientBatchJob.PATH)
final String excludePathParam = jobParameters.get(ClientBatchJob.EXCLUDE_PATHS)
final excludePaths = (excludePathParam != null && !excludePathParam.isEmpty() ? excludePathParam.split(/\*/) : Collections.EMPTY_LIST) as Collection<String>
+final String scheme = jobParameters.get(ClientBatchJob.SCHEME)
final String host = jobParameters.get(ClientBatchJob.HOST)
final String port = jobParameters.get(ClientBatchJob.PORT)
final String contentAfterDate = jobParameters.get(ClientBatchJob.CONTENT_AFTER_DATE) ?: ""

final String encodedContentAfterDate = URLEncoder.encode(contentAfterDate, 'utf-8')
final String encodedPath = URLEncoder.encode(path, 'utf-8')

-URIBuilder uriBuilder = new URIBuilder(scheme: "http", host: host, port: port as Integer, path: "/grabbit/content")
+URIBuilder uriBuilder = new URIBuilder(scheme: scheme, host: host, port: port as Integer, path: "/grabbit/content")
uriBuilder.addParameter("path", encodedPath)
uriBuilder.addParameter("after", encodedContentAfterDate)
for(String excludePath : excludePaths) {
@@ -60,7 +60,7 @@ class DefaultClientService implements ClientService {
for (PathConfiguration pathConfig : configuration.pathConfigurations) {
try {
final clientBatchJob = new ClientBatchJob.ServerBuilder(configurableApplicationContext)
-.andServer(configuration.serverHost, configuration.serverPort)
+.andServer(configuration.serverScheme, configuration.serverHost, configuration.serverPort)
.andCredentials(clientUsername, configuration.serverUsername, configuration.serverPassword)
.andClientJobExecutions(fetchAllClientJobExecutions())
.withTransactionID(configuration.transactionID)
@@ -163,6 +163,89 @@ class GrabbitConfigurationSpec extends Specification {

}

+def "Should return http server scheme by default"() {
+given:
+def input = """
+{
+"serverUsername" : "admin",
+"serverPassword" : "admin",
+"serverHost" : "localhost",
+"serverPort" : "4503",
+"deltaContent" : false,
+"pathConfigurations" : [
+{
+"path" : "/content/a/b"
+}
+]
+}
+"""
+def expectedOutput = "http"
+
+when:
+def actualOutput = GrabbitConfiguration.create(input)
+
+then:
+actualOutput instanceof GrabbitConfiguration
+actualOutput.serverScheme == expectedOutput
+
+}
+
+def "Should return https scheme if configured"() {
+given:
+def input = """
+{
+"serverUsername" : "admin",
+"serverPassword" : "admin",
+"serverScheme" : "https",
+"serverHost" : "localhost",
+"serverPort" : "4503",
+"deltaContent" : false,
+"pathConfigurations" : [
+{
+"path" : "/content/a/b"
+}
+]
+}
+"""
+def expectedOutput = "https"
+
+when:
+def actualOutput = GrabbitConfiguration.create(input)
+
+then:
+actualOutput instanceof GrabbitConfiguration
+actualOutput.serverScheme == expectedOutput
+
+}
+
+def "Should fail to process invalid scheme"() {
+given:
+def input = """
+{
+"serverUsername" : "admin",
+"serverPassword" : "admin",
+"serverScheme" : "invalid-scheme",
+"serverHost" : "localhost",
+"serverPort" : "4503",
+"deltaContent" : false,
+"pathConfigurations" : [
+{
+"path" : "/content/a/b"
+}
+]
+}
+"""
+def errors = [serverScheme: "must be either http or https"]
+
+when:
+GrabbitConfiguration.create(input)
+
+then:
+final GrabbitConfiguration.ConfigurationException exception = thrown()
+exception.errors == errors
+
+}
+
@Unroll
def "Should create configuration from json input"() {
when:
@@ -46,7 +46,7 @@ class ClientBatchJobSpec extends Specification {
final appContext = Mock(ConfigurableApplicationContext)
appContext.getBean(_ as String, JobOperator) >> Mock(JobOperator)
final job = new ClientBatchJob.ServerBuilder(appContext)
-.andServer("host", "port")
+.andServer("scheme", "host", "port")
.andCredentials("clientUser", "serverUser", "serverPass")
.andClientJobExecutions(jobExecutions)
.andConfiguration(new GrabbitConfiguration.PathConfiguration(path, [], [], deleteBeforeWrite, pathDeltaContent, 100))
@@ -46,6 +46,7 @@ class CreateHttpConnectionTaskletSpec extends Specification {
(ClientBatchJob.SERVER_PASSWORD) : "password",
(ClientBatchJob.PATH) : "/content/test",
(ClientBatchJob.EXCLUDE_PATHS) : "/exclude*/exclude/metoo",
+(ClientBatchJob.SCHEME) : "http",
(ClientBatchJob.HOST) : "localhost",
(ClientBatchJob.PORT) : "4503",
(ClientBatchJob.CONTENT_AFTER_DATE) : "2008-09-22T13:57:31.2311892-04:00"
@@ -146,7 +147,7 @@ class CreateHttpConnectionTaskletSpec extends Specification {
credentials.password == "password"
}

-def "buildURIForRequest() builds a Grabbit URI correctly"() {
+def "buildURIForRequest() builds a Grabbit http URI correctly"() {
given:
final jobParameters = getMockJobParameters()
final CreateHttpConnectionTasklet tasklet = new CreateHttpConnectionTasklet()
@@ -157,4 +158,17 @@ class CreateHttpConnectionTaskletSpec extends Specification {
then:
uri.toString() == "http://localhost:4503/grabbit/content?path=%252Fcontent%252Ftest&after=2008-09-22T13%253A57%253A31.2311892-04%253A00&excludePath=%252Fexclude&excludePath=%252Fexclude%252Fmetoo"
}
+
+def "buildURIForRequest() builds a Grabbit https URI correctly"() {
+given:
+final jobParameters = getMockJobParameters()
+jobParameters.put(ClientBatchJob.SCHEME, "https")
+final CreateHttpConnectionTasklet tasklet = new CreateHttpConnectionTasklet()
+
+when:
+final URI uri = tasklet.buildURIForRequest(jobParameters)
+
+then:
+uri.toString() == "https://localhost:4503/grabbit/content?path=%252Fcontent%252Ftest&after=2008-09-22T13%253A57%253A31.2311892-04%253A00&excludePath=%252Fexclude&excludePath=%252Fexclude%252Fmetoo"
+}
}
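
The expected URIs in the two specs above contain `%252F` and `%253A` rather than `%2F` and `%3A`. That suggests each parameter value is percent-encoded twice: once by the explicit `URLEncoder.encode` call in the tasklet and once more when the value is added to the URI builder. The sketch below reproduces that effect using JDK classes only. `encodeTwice` is a hypothetical helper, and the second `URLEncoder.encode` call is merely a stand-in for whatever encoding the builder applies, so treat it as an approximation rather than the tasklet's actual behaviour.

```groovy
// Hypothetical helper that mimics two encoding passes with the JDK's URLEncoder.
String encodeTwice(String value) {
    String once = URLEncoder.encode(value, 'UTF-8')   // the explicit encode step seen in the tasklet
    return URLEncoder.encode(once, 'UTF-8')           // assumed stand-in for the builder's own encoding
}

// One pass turns "/" into %2F; the second pass turns the "%" itself into %25.
assert URLEncoder.encode('/content/test', 'UTF-8') == '%2Fcontent%2Ftest'
assert encodeTwice('/content/test') == '%252Fcontent%252Ftest'
assert encodeTwice('2008-09-22T13:57:31.2311892-04:00') == '2008-09-22T13%253A57%253A31.2311892-04%253A00'
```

Because the scheme sits outside the query string, switching from `http` to `https` changes only the URI prefix, which is consistent with the two expected strings differing in nothing else.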
