diff --git a/packages/HilltopCrawler/README.md b/packages/HilltopCrawler/README.md
new file mode 100644
index 00000000..148d718a
--- /dev/null
+++ b/packages/HilltopCrawler/README.md
@@ -0,0 +1,108 @@
+# Hilltop Crawler
+
+## Description
+
+This app extracts observation data from Hilltop servers and publishes it to a Kafka `observations` topic for other
+applications to consume. Which Hilltop servers to crawl, and which measurement types to extract, are configured in a
+database table which can be added to while the system is running.
+
+The app intends to keep all versions of data pulled from Hilltop, allowing downstream systems to keep track of when
+data has been updated in Hilltop. Note that because Hilltop does not expose when changes were made, we can only
+capture when the crawler first saw the data change. This is also intended to allow us to update some parts of the
+system without having to re-read all of the data from Hilltop.
+
+## Process
+
+Overall, this is built around the Hilltop API, which exposes three levels of GET API requests:
+
+* SITES_LIST — List of sites, including names and locations
+* MEASUREMENTS_LIST — Per site, the list of measurements available for that site, including details about the first
+  and last observation dates
+* MEASUREMENT_DATA — Per site, per measurement, the timeseries of observed data
+
+The app crawls these by reading each level to decide what to read from the next level, i.e. the SITES_LIST tells the
+app which sites to call MEASUREMENTS_LIST for, which in turn tells it which measurements to call MEASUREMENT_DATA for.
+Requests for MEASUREMENT_DATA are also split into monthly chunks to avoid issues with too much data being returned in
+one request.
+
+The app keeps a list of API requests that it keeps up to date by calling each API on a schedule. This list is stored
+in the `hilltop_fetch_tasks` table and works like a task queue. Each time a request is made, the result is used to try
+to determine when next to schedule the task. A simple example is MEASUREMENT_DATA: if the last observation was recent,
+then a refresh should be attempted soon; if it was a long way in the past, it can be refreshed less often.
+
+The next schedule time is implemented as a random time within an interval, to provide some jitter between task requeue
+times. This hopefully spreads the requests out, and with them the load on the servers we are calling.
+
+The task queue also keeps metadata about the previous history of each task. This is not used by the app; it is there
+to allow engineers to monitor how the system is working.
+
+This process is built around three main components:
+
+* Every hour, monitor the configuration table
+  * Read the configuration table
+  * From the configuration, add any new tasks to the task queue
+
+* Continuously monitor the task queue
+  * Read the next task to work on
+  * Fetch data from Hilltop for that task
+  * If the result is valid and not the same as the previous version
+    * Queue any new tasks derived from the result
+    * Send the result to the Kafka `hilltop.raw` topic
+  * Requeue the task for some time in the future, based on the type of request
+
+* Kafka Streams component
+  * Monitor the stream
+  * For each message, map it to either a `site_details` or an `observations` message
+
+Currently, the Manager component listens to the `observations` topic and stores the data from that in a DB table.
+
+## Task Queue
+
+The task queue is currently a custom implementation on top of a single table, `hilltop_fetch_tasks`.
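+
+To make the queue semantics concrete, the sketch below shows roughly how a worker claims the next task (it mirrors
+`DB.getNextTaskToProcess` in this change). The `FOR UPDATE SKIP LOCKED` clause is what lets several workers poll the
+table concurrently without picking up the same row:
+
+```sql
+-- Claim the next due task; rows already locked by other workers are skipped.
+SELECT id, source_id, request_type, next_fetch_at, fetch_url, previous_data_hash
+FROM hilltop_fetch_tasks
+WHERE next_fetch_at < NOW()
+ORDER BY next_fetch_at, id
+LIMIT 1
+FOR UPDATE SKIP LOCKED;
+```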
+
+This is a slightly specialized queue where:
+
+* Each task has a scheduled run time backed by the `next_fetch_at` column
+* Each time a task runs, it will be re-queued for some time in the future
+* The same task can be enqueued multiple times; a Postgres `ON CONFLICT DO NOTHING` insert prevents duplicate rows
+  from being created (see the example insert at the end of this README)
+
+The implementation relies on the Postgres `SKIP LOCKED` feature to allow multiple worker threads to pull from the
+queue at the same time without getting the same task (see the example query above).
+
+See this [reference](https://www.2ndquadrant.com/en/blog/what-is-select-skip-locked-for-in-postgresql-9-5/) for
+discussion about the `SKIP LOCKED` query.
+
+The queue implementation is fairly simple for this specific use. If it becomes more of a generic work queue, then a
+standard implementation such as Quartz might be worth moving to.
+
+## Example Configuration
+
+These are a couple of insert statements that are not stored in the migration scripts, so developer machines don't load
+them by default.
+
+GW Water Use
+
+```sql
+INSERT INTO hilltop_sources (council_id, hts_url, configuration)
+VALUES (9, 'https://hilltop.gw.govt.nz/WaterUse.hts',
+        '{ "measurementNames": ["Water Meter Volume", "Water Meter Reading"] }');
+```
+
+GW Rainfall
+
+```sql
+INSERT INTO hilltop_sources (council_id, hts_url, configuration)
+VALUES (9, 'https://hilltop.gw.govt.nz/merged.hts', '{ "measurementNames": ["Rainfall"] }');
+```
+
+### TODO / Improvements
+
+* The `previous_history` on tasks will grow unbounded. This needs to be capped.
+* The algorithm for determining the next time to schedule an API refresh could be improved; something could be built
+  from the previous history based on how often data is unchanged.
+* To avoid hammering Hilltop, requests are rate limited using a "token bucket" library; currently this uses one bucket
+  for all requests. It could be split to use one bucket per server.
+* Because of the chunking by month, every time new data arrives we end up storing the whole month up to that new data
+  again in the `hilltop.raw` topic. This seems wasteful, and there are options for cleaning up when a record just
+  supersedes the previous one. But cleaning up would mean losing some knowledge about when an observation was first
+  seen.
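+
+For reference, enqueueing a task is idempotent. New tasks are added with a plain insert that ignores conflicts, so the
+same task can be requested many times (for example, every time a SITES_LIST response is processed) without creating
+duplicate rows. A sketch of the insert, mirroring `DB.createFetchTask` in this change:
+
+```sql
+-- Re-inserting an existing (source_id, request_type, fetch_url) combination is a no-op.
+INSERT INTO hilltop_fetch_tasks (source_id, request_type, fetch_url)
+VALUES (?, ?, ?)
+ON CONFLICT (source_id, request_type, fetch_url) DO NOTHING;
+```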
diff --git a/packages/HilltopCrawler/build.gradle.kts b/packages/HilltopCrawler/build.gradle.kts index d33be2f6..e548e714 100644 --- a/packages/HilltopCrawler/build.gradle.kts +++ b/packages/HilltopCrawler/build.gradle.kts @@ -41,10 +41,14 @@ dependencies { implementation("org.flywaydb:flyway-core:10.1.0") implementation("org.flywaydb:flyway-database-postgresql:10.1.0") implementation("io.github.microutils:kotlin-logging-jvm:3.0.5") + implementation("org.apache.kafka:kafka-streams") + implementation("com.bucket4j:bucket4j-core:8.3.0") testImplementation("org.springframework.boot:spring-boot-starter-test") testImplementation("org.springframework.kafka:spring-kafka-test") - testImplementation("io.kotest:kotest-assertions-core:5.4.2") + testImplementation("io.kotest:kotest-assertions-core:5.7.2") + testImplementation("io.kotest:kotest-assertions-json:5.7.2") + testImplementation("org.mockito.kotlin:mockito-kotlin:5.1.0") } // Don't repackage build in a "-plain" Jar @@ -70,21 +74,6 @@ configure { kotlinGradle { ktfmt() } } -val dbConfig = - mapOf( - "url" to - "jdbc:postgresql://${System.getenv("CONFIG_DATABASE_HOST") ?: "localhost"}:5432/eop_test", - "user" to "postgres", - "password" to "password") - -flyway { - url = dbConfig["url"] - user = dbConfig["user"] - password = dbConfig["password"] - schemas = arrayOf("hilltop_crawler") - locations = arrayOf("filesystem:./src/main/resources/db/migration") -} - testlogger { showStandardStreams = true showPassedStandardStreams = false diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/Application.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/Application.kt index 4eeafdd8..fe89cc92 100644 --- a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/Application.kt +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/Application.kt @@ -1,33 +1,16 @@ package nz.govt.eop.hilltop_crawler -import java.security.MessageDigest import org.springframework.boot.autoconfigure.SpringBootApplication -import org.springframework.boot.autoconfigure.jackson.Jackson2ObjectMapperBuilderCustomizer import org.springframework.boot.context.properties.EnableConfigurationProperties import org.springframework.boot.runApplication -import org.springframework.context.annotation.Bean -import org.springframework.http.converter.json.Jackson2ObjectMapperBuilder -import org.springframework.kafka.annotation.EnableKafka import org.springframework.scheduling.annotation.EnableScheduling @SpringBootApplication @EnableScheduling -@EnableKafka @EnableConfigurationProperties(ApplicationConfiguration::class) -class Application { - - @Bean - fun jsonCustomizer(): Jackson2ObjectMapperBuilderCustomizer { - return Jackson2ObjectMapperBuilderCustomizer { _: Jackson2ObjectMapperBuilder -> } - } -} +class Application {} fun main(args: Array) { System.setProperty("com.sun.security.enableAIAcaIssuers", "true") runApplication(*args) } - -fun hashMessage(message: String) = - MessageDigest.getInstance("SHA-256").digest(message.toByteArray()).joinToString("") { - "%02x".format(it) - } diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/CheckForNewSourcesTask.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/CheckForNewSourcesTask.kt new file mode 100644 index 00000000..619c1f08 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/CheckForNewSourcesTask.kt @@ -0,0 +1,32 @@ +package nz.govt.eop.hilltop_crawler + +import 
java.util.concurrent.TimeUnit +import nz.govt.eop.hilltop_crawler.api.requests.buildSiteListUrl +import nz.govt.eop.hilltop_crawler.db.DB +import nz.govt.eop.hilltop_crawler.fetcher.HilltopMessageType.SITES_LIST +import org.springframework.context.annotation.Profile +import org.springframework.scheduling.annotation.Scheduled +import org.springframework.stereotype.Component + +/** + * This task is responsible for triggering the first fetch task for each source stored in the DB. + * + * It makes sure any new rows added to the DB will start to be pulled from within an hour. + * + * Each time it runs, it will create the initial fetch task for each source found in the DB. This + * relies on the task queue (via "ON CONFLICT DO NOTHING") making sure that duplicate tasks will not + * be created. + */ +@Profile("!test") +@Component +class CheckForNewSourcesTask(val db: DB) { + + @Scheduled(fixedDelay = 1, timeUnit = TimeUnit.HOURS) + fun triggerSourcesTasks() { + val sources = db.listSources() + + sources.forEach { + db.createFetchTask(DB.HilltopFetchTaskCreate(it.id, SITES_LIST, buildSiteListUrl(it.htsUrl))) + } + } +} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/Constants.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/Constants.kt index 85c8248c..8f0c6a8a 100644 --- a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/Constants.kt +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/Constants.kt @@ -1,3 +1,4 @@ package nz.govt.eop.hilltop_crawler const val HILLTOP_RAW_DATA_TOPIC_NAME = "hilltop.raw" +const val OUTPUT_DATA_TOPIC_NAME = "observations" diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/TaskScheduler.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/TaskScheduler.kt deleted file mode 100644 index 5de80c9b..00000000 --- a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/TaskScheduler.kt +++ /dev/null @@ -1,128 +0,0 @@ -package nz.govt.eop.hilltop_crawler - -import java.time.Instant -import java.time.LocalDateTime -import java.time.OffsetDateTime -import java.time.ZoneOffset -import java.time.temporal.ChronoUnit.MINUTES -import mu.KotlinLogging -import nz.govt.eop.hilltop_crawler.db.DB -import nz.govt.eop.hilltop_crawler.db.DB.HilltopFetchTaskRequestType.* -import nz.govt.eop.hilltop_crawler.support.* -import nz.govt.eop.messages.HilltopDataMessage -import org.apache.kafka.clients.admin.NewTopic -import org.springframework.beans.factory.annotation.Qualifier -import org.springframework.kafka.core.KafkaTemplate -import org.springframework.scheduling.annotation.Scheduled -import org.springframework.stereotype.Component - -@Component("HilltopCrawlerTaskScheduler") -class TaskScheduler( - val db: DB, - @Qualifier("hilltopRawDataTopic") val dataTopic: NewTopic, - val kafkaSender: KafkaTemplate -) { - - private val logger = KotlinLogging.logger {} - - @Scheduled(fixedDelay = 60_000 * 60) - fun checkForNewSources() { - val sources = db.listSources() - - sources.forEach { - db.createFetchTask( - DB.HilltopFetchTaskCreate( - it.councilId, - SITES_LIST, - it.htsUrl, - buildUrl(it.htsUrl).query, - DB.HilltopFetchTaskState.PENDING)) - } - } - - @Scheduled(fixedDelay = 10_000) - fun processTasks() { - logger.debug { "Processing tasks" } - do { - val taskToProcess = db.getNextTaskToProcess() - - if (taskToProcess != null) { - logger.info { "Processing task ${taskToProcess.id}" } - val fetchUri = rebuildHim(taskToProcess.baseUrl, 
taskToProcess.queryParams) - val xmlContent = fetch(fetchUri) - val contentHash = hashMessage(xmlContent) - - if (contentHash == taskToProcess.previousDataHash) { - logger.info { "Content hash matches for $fetchUri" } - } else { - val isErrorXml = HilltopXmlParsers.isHilltopErrorXml(xmlContent) - if (isErrorXml) { - logger.info { "Error XML from Hilltop for $taskToProcess.url" } - } else { - - when (taskToProcess.requestType) { - SITES_LIST -> - processSitesXmlResponse( - taskToProcess.councilId, taskToProcess.baseUrl, xmlContent) - MEASUREMENTS_LIST -> - processMeasurementListResponse( - taskToProcess.councilId, taskToProcess.baseUrl, xmlContent) - MEASUREMENT_DATA -> Unit - } - - // Send to Kafka - kafkaSender.send( - dataTopic.name(), - taskToProcess.baseUrl, - HilltopDataMessage( - taskToProcess.councilId, - taskToProcess.baseUrl, - taskToProcess.requestType.toString(), - xmlContent, - Instant.now())) - } - } - db.requeueTask(taskToProcess.id, contentHash, Instant.now().plus(60, MINUTES)) - logger.info { "Processing task ${taskToProcess.id} finished" } - } - } while (taskToProcess != null) - logger.debug { "Processing finished" } - } - - private fun processSitesXmlResponse(councilId: Int, baseUrl: String, xmlContent: String) { - val sitesXml = HilltopSitesParser.parseSites(xmlContent) - sitesXml.sites - .filter { it.isValidSite() } - .forEach { - db.createFetchTask( - DB.HilltopFetchTaskCreate( - councilId, - MEASUREMENTS_LIST, - baseUrl, - buildUrl(baseUrl, it.name).query, - DB.HilltopFetchTaskState.PENDING)) - } - } - - private fun processMeasurementListResponse(councilId: Int, baseUrl: String, xmlContent: String) { - val measurementTypes = HilltopXmlParsers.parseMeasurementNames(xmlContent) - measurementTypes.datasources - .filter { it.name == "Rainfall" || it.name == "SCADA Rainfall" } - .filter { - LocalDateTime.parse(it.to) - .atOffset(ZoneOffset.of("+12")) - .isAfter(OffsetDateTime.now().minusDays(30)) - } - .filter { it.type == "StdSeries" } - .filter { it.measurements.isNotEmpty() } - .forEach { - db.createFetchTask( - DB.HilltopFetchTaskCreate( - councilId, - MEASUREMENT_DATA, - baseUrl, - buildUrl(baseUrl, it.siteId, it.measurements.first().requestAs).query, - DB.HilltopFetchTaskState.PENDING)) - } - } -} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/HilltopFetcher.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/HilltopFetcher.kt new file mode 100644 index 00000000..850bc4a2 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/HilltopFetcher.kt @@ -0,0 +1,40 @@ +package nz.govt.eop.hilltop_crawler.api + +import io.github.bucket4j.Bandwidth +import io.github.bucket4j.BlockingBucket +import io.github.bucket4j.Bucket +import java.net.URI +import java.time.Duration +import mu.KotlinLogging +import org.springframework.stereotype.Component +import org.springframework.web.client.RestClientException +import org.springframework.web.client.RestTemplate + +/** + * Simple wrapper around RestTemplate to limit the number of requests per second This is to avoid + * overloading the Hilltop servers + * + * Could be extended to have a bucket per host to increase throughput. But currently the bottleneck + * is when initially loading data from Hilltop, which ends up effectively processing one host at a + * time. 
+ */ +@Component +class HilltopFetcher(val restTemplate: RestTemplate) { + + private final val logger = KotlinLogging.logger {} + + private final val bucketBandwidthLimit: Bandwidth = Bandwidth.simple(20, Duration.ofSeconds(1)) + private final val bucket: BlockingBucket = + Bucket.builder().addLimit(bucketBandwidthLimit).build().asBlocking() + + fun fetch(fetchRequest: URI): String? { + bucket.consume(1) + logger.trace { "Downloading $fetchRequest" } + return try { + restTemplate.getForObject(fetchRequest, String::class.java) + } catch (e: RestClientException) { + logger.info(e) { "Failed to download $fetchRequest" } + null + } + } +} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopMeasurementValues.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopMeasurementValues.kt new file mode 100644 index 00000000..a6828c63 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopMeasurementValues.kt @@ -0,0 +1,49 @@ +package nz.govt.eop.hilltop_crawler.api.parsers + +import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlElementWrapper +import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty +import java.math.BigDecimal +import java.time.LocalDateTime +import java.time.OffsetDateTime +import java.time.ZoneOffset + +data class HilltopMeasurementValues( + @JacksonXmlProperty(localName = "Measurement") val measurement: Measurement? +) + +data class Measurement( + @JacksonXmlProperty(localName = "SiteName", isAttribute = true) val siteName: String, + @JacksonXmlProperty(localName = "DataSource") val dataSource: DataSource, + @JacksonXmlProperty(localName = "Data") val data: Data +) + +data class DataSource( + @JacksonXmlProperty(localName = "Name", isAttribute = true) val measurementName: String, +) + +data class Data( + @JacksonXmlProperty(localName = "DateFormat", isAttribute = true) val dateFormat: String, + @JacksonXmlProperty(localName = "E") + @JacksonXmlElementWrapper(useWrapping = false) + val values: List = emptyList() +) + +data class Value( + @JacksonXmlProperty(localName = "T") val timestampString: String, + @JacksonXmlProperty(localName = "I1") val value1String: String?, + @JacksonXmlProperty(localName = "Value") val value2String: String?, + @JacksonXmlProperty(localName = "Parameter") + @JacksonXmlElementWrapper(useWrapping = false) + val parameters: List? 
= null +) { + val value: BigDecimal + get() = BigDecimal(value1String ?: value2String) + + val timestamp: OffsetDateTime + get() = LocalDateTime.parse(timestampString).atOffset(ZoneOffset.of("+12")) +} + +data class Parameter( + @JacksonXmlProperty(localName = "Name") val name: String, + @JacksonXmlProperty(localName = "Value") val value: String +) diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopMeasurments.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopMeasurments.kt new file mode 100644 index 00000000..9eca3029 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopMeasurments.kt @@ -0,0 +1,28 @@ +package nz.govt.eop.hilltop_crawler.api.parsers + +import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlElementWrapper +import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty + +data class HilltopMeasurements( + @JacksonXmlProperty(localName = "DataSource") + @JacksonXmlElementWrapper(useWrapping = false) + val datasources: List +) + +data class HilltopDatasource( + @JacksonXmlProperty(localName = "Name", isAttribute = true) val name: String, + @JacksonXmlProperty(localName = "Site", isAttribute = true) val siteName: String, + @JacksonXmlProperty(localName = "From") val from: String, + @JacksonXmlProperty(localName = "To") val to: String, + @JacksonXmlProperty(localName = "TSType") val type: String, + @JacksonXmlProperty(localName = "Measurement") + @JacksonXmlElementWrapper(useWrapping = false) + val measurements: List = emptyList() +) + +data class HilltopMeasurement( + @JacksonXmlProperty(localName = "Name", isAttribute = true) val name: String, + @JacksonXmlProperty(localName = "RequestAs") val requestAs: String, + @JacksonXmlProperty(localName = "Item") val itemNumber: Int, + @JacksonXmlProperty(localName = "VM") val vm: Int? = null +) diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopSites.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopSites.kt new file mode 100644 index 00000000..435eecc1 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopSites.kt @@ -0,0 +1,18 @@ +package nz.govt.eop.hilltop_crawler.api.parsers + +import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlElementWrapper +import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty + +data class HilltopSites( + @JacksonXmlProperty(localName = "Agency") val agency: String, + @JacksonXmlProperty(localName = "Projection") val projection: String?, + @JacksonXmlProperty(localName = "Site") + @JacksonXmlElementWrapper(useWrapping = false) + val sites: List = emptyList() +) + +data class HilltopSite( + @JacksonXmlProperty(localName = "Name", isAttribute = true) val name: String, + @JacksonXmlProperty(localName = "Easting") val easting: Int?, + @JacksonXmlProperty(localName = "Northing") val northing: Int? 
+) diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopXmlParsers.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopXmlParsers.kt new file mode 100644 index 00000000..325654ea --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopXmlParsers.kt @@ -0,0 +1,32 @@ +package nz.govt.eop.hilltop_crawler.api.parsers + +import com.fasterxml.jackson.databind.DeserializationFeature +import com.fasterxml.jackson.dataformat.xml.XmlMapper +import com.fasterxml.jackson.module.kotlin.KotlinModule +import org.springframework.stereotype.Component + +@Component +class HilltopXmlParsers() { + + private final val xmlMapper: XmlMapper = + XmlMapper.builder() + .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false) + .addModule(KotlinModule.Builder().build()) + .build() + + fun isHilltopErrorXml(data: String): Boolean { + return xmlMapper.readTree(data).has("Error") + } + + fun parseSitesResponse(data: String): HilltopSites { + return xmlMapper.readValue(data, HilltopSites::class.java) + } + + fun parseMeasurementsResponse(data: String): HilltopMeasurements { + return xmlMapper.readValue(data, HilltopMeasurements::class.java) + } + + fun parseMeasurementValuesResponse(data: String): HilltopMeasurementValues { + return xmlMapper.readValue(data, HilltopMeasurementValues::class.java) + } +} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/requests/HilltopRequestBuilders.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/requests/HilltopRequestBuilders.kt new file mode 100644 index 00000000..72212b94 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/api/requests/HilltopRequestBuilders.kt @@ -0,0 +1,39 @@ +package nz.govt.eop.hilltop_crawler.api.requests + +import java.time.YearMonth +import org.springframework.web.util.DefaultUriBuilderFactory + +fun buildSiteListUrl(hilltopUrl: String): String = + DefaultUriBuilderFactory() + .uriString(hilltopUrl) + .queryParam("Service", "Hilltop") + .queryParam("Request", "SiteList") + .queryParam("Location", "Yes") + .build() + .toASCIIString() + +fun buildMeasurementListUrl(hilltopUrl: String, siteId: String): String = + DefaultUriBuilderFactory() + .uriString(hilltopUrl) + .queryParam("Service", "Hilltop") + .queryParam("Request", "MeasurementList") + .queryParam("Site", siteId) + .build() + .toASCIIString() + +fun buildPastMeasurementsUrl( + hilltopUrl: String, + siteId: String, + measurementName: String, + month: YearMonth +): String = + DefaultUriBuilderFactory() + .uriString(hilltopUrl) + .queryParam("Service", "Hilltop") + .queryParam("Request", "GetData") + .queryParam("Site", siteId) + .queryParam("Measurement", measurementName) + .queryParam("from", month.atDay(1).atStartOfDay()) + .queryParam("to", month.plusMonths(1).atDay(1).atStartOfDay().minusSeconds(1)) + .build() + .toASCIIString() diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/config/JacksonJsonConfig.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/config/JacksonJsonConfig.kt new file mode 100644 index 00000000..0e6832de --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/config/JacksonJsonConfig.kt @@ -0,0 +1,17 @@ +package nz.govt.eop.hilltop_crawler.config + +import com.fasterxml.jackson.databind.SerializationFeature +import 
org.springframework.boot.autoconfigure.jackson.Jackson2ObjectMapperBuilderCustomizer +import org.springframework.context.annotation.Bean +import org.springframework.context.annotation.Configuration +import org.springframework.http.converter.json.Jackson2ObjectMapperBuilder + +@Configuration +class JacksonJsonConfig { + @Bean + fun jsonCustomizer(): Jackson2ObjectMapperBuilderCustomizer { + return Jackson2ObjectMapperBuilderCustomizer { builder: Jackson2ObjectMapperBuilder -> + builder.featuresToDisable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS) + } + } +} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/config/KafkaConfig.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/config/KafkaConfig.kt index 5446fab0..f208f919 100644 --- a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/config/KafkaConfig.kt +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/config/KafkaConfig.kt @@ -1,22 +1,76 @@ package nz.govt.eop.hilltop_crawler.config +import com.fasterxml.jackson.databind.ObjectMapper import nz.govt.eop.hilltop_crawler.ApplicationConfiguration import nz.govt.eop.hilltop_crawler.HILLTOP_RAW_DATA_TOPIC_NAME +import nz.govt.eop.hilltop_crawler.OUTPUT_DATA_TOPIC_NAME +import nz.govt.eop.hilltop_crawler.fetcher.HilltopMessage +import nz.govt.eop.hilltop_crawler.fetcher.HilltopMessageKey import org.apache.kafka.clients.admin.NewTopic +import org.springframework.boot.autoconfigure.kafka.KafkaProperties import org.springframework.context.annotation.Bean import org.springframework.context.annotation.Configuration +import org.springframework.context.annotation.Profile +import org.springframework.kafka.annotation.EnableKafka +import org.springframework.kafka.annotation.EnableKafkaStreams import org.springframework.kafka.config.TopicBuilder +import org.springframework.kafka.core.DefaultKafkaProducerFactory +import org.springframework.kafka.core.KafkaTemplate +import org.springframework.kafka.core.ProducerFactory +import org.springframework.kafka.support.serializer.JsonSerializer +@Profile("!test") +@EnableKafka +@EnableKafkaStreams @Configuration -class KafkaConfig(val applicationConfiguration: ApplicationConfiguration) { +class KafkaConfig( + val applicationConfiguration: ApplicationConfiguration, + val properties: KafkaProperties, + val objectMapper: ObjectMapper +) { @Bean fun hilltopRawDataTopic(): NewTopic = TopicBuilder.name(HILLTOP_RAW_DATA_TOPIC_NAME) + // For the RAW data we keep the want to keep all messages from a single Hilltop server in + // the + // same partition so that they get consumed in the same order as they were created. + // @see HilltopMessageClient + // 16 partitions roughly equates to one per council. Which is a good starting point for + // each council + // having a Hilltop server. .partitions(16) .replicas(applicationConfiguration.topicReplicas) - .config("max.message.bytes", "134217728") - .config("retention.ms", "-1") - .config("retention.bytes", "-1") + .config("max.message.bytes", "10485760") + .compact() .build() + + @Bean + fun outputDataTopic(): NewTopic = + TopicBuilder.name(OUTPUT_DATA_TOPIC_NAME) + // For the output data it is keyed by Council / Site Name so many more partitions are + // needed. 
+ // 64 is just a random large-ish number but no real insight if this is a good value + .partitions(64) + .replicas(applicationConfiguration.topicReplicas) + .config("max.message.bytes", "10485760") + .build() + + @Bean + fun jsonSerializer(): JsonSerializer { + return JsonSerializer(objectMapper).noTypeInfo() + } + + @Bean + fun producerFactory(): ProducerFactory { + return DefaultKafkaProducerFactory( + properties.buildProducerProperties(), + JsonSerializer(objectMapper).forKeys().noTypeInfo(), + JsonSerializer(objectMapper).noTypeInfo()) + } + + @Bean + fun kafkaTemplate(): KafkaTemplate { + return KafkaTemplate(producerFactory()) + } } diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/config/RestTemplateConfig.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/config/RestTemplateConfig.kt new file mode 100644 index 00000000..a45bbc53 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/config/RestTemplateConfig.kt @@ -0,0 +1,25 @@ +package nz.govt.eop.hilltop_crawler.config + +import java.time.Duration +import org.springframework.boot.autoconfigure.web.client.RestTemplateBuilderConfigurer +import org.springframework.boot.web.client.RestTemplateBuilder +import org.springframework.context.annotation.Bean +import org.springframework.context.annotation.Configuration +import org.springframework.web.client.RestTemplate + +@Configuration +class RestTemplateConfig { + @Bean + fun restTemplateBuilder( + restTemplateBuilderConfigurer: RestTemplateBuilderConfigurer + ): RestTemplateBuilder { + val builder = + RestTemplateBuilder() + .setConnectTimeout(Duration.ofSeconds(5)) + .setReadTimeout(Duration.ofSeconds(30)) + restTemplateBuilderConfigurer.configure(builder) + return builder + } + + @Bean fun restTemplate(): RestTemplate = RestTemplateBuilder().build() +} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/db/DB.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/db/DB.kt index 7b8534fc..64bd5f43 100644 --- a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/db/DB.kt +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/db/DB.kt @@ -1,100 +1,139 @@ package nz.govt.eop.hilltop_crawler.db +import com.fasterxml.jackson.databind.ObjectMapper +import java.net.URI +import java.sql.ResultSet import java.time.Instant import java.time.OffsetDateTime import java.time.ZoneOffset +import nz.govt.eop.hilltop_crawler.fetcher.HilltopMessageType import org.springframework.jdbc.core.JdbcTemplate import org.springframework.stereotype.Component @Component -class DB(val template: JdbcTemplate) { +class DB(val template: JdbcTemplate, val objectMapper: ObjectMapper) { - enum class HilltopFetchTaskState { - PENDING, - } - - enum class HilltopFetchTaskRequestType { - SITES_LIST, - MEASUREMENTS_LIST, - MEASUREMENT_DATA - } + data class HilltopSourcesRow( + val id: Int, + val councilId: Int, + val htsUrl: String, + val config: HilltopSourceConfig + ) - data class HilltopSourcesRow(val councilId: Int, val htsUrl: String) + data class HilltopSourceConfig( + val measurementNames: List, + val excludedSitesNames: List = listOf() + ) data class HilltopFetchTaskCreate( - val councilId: Int, - val requestType: HilltopFetchTaskRequestType, + val sourceId: Int, + val requestType: HilltopMessageType, val baseUrl: String, - val queryParams: String, - val state: HilltopFetchTaskState ) data class HilltopFetchTaskRow( val id: Int, - val councilId: Int, - 
val requestType: HilltopFetchTaskRequestType, - val baseUrl: String, - val queryParams: String, - val state: HilltopFetchTaskState, - val previousDataHash: String? + val sourceId: Int, + val requestType: HilltopMessageType, + val nextFetchAt: Instant, + val fetchUri: URI, + val previousDataHash: String?, ) - fun listSources(): List = - template.query( + data class HilltopFetchResult(val at: Instant, val result: HilltopFetchStatus, val hash: String?) + + enum class HilltopFetchStatus { + SUCCESS, + UNCHANGED, + FETCH_ERROR, + HILLTOP_ERROR, + PARSE_ERROR, + UNKNOWN_ERROR + } + + val hilltopSourcesRowMapper: (rs: ResultSet, rowNum: Int) -> HilltopSourcesRow = { rs, _ -> + HilltopSourcesRow( + rs.getInt("id"), + rs.getInt("council_id"), + rs.getString("hts_url"), + objectMapper.readValue(rs.getString("configuration"), HilltopSourceConfig::class.java)) + } + + fun getSource(id: Int): HilltopSourcesRow = + template.queryForObject( """ - SELECT council_id, hts_url + SELECT id, council_id, hts_url, configuration FROM hilltop_sources + WHERE id = ? + """ + .trimIndent(), + hilltopSourcesRowMapper, + id, + )!! + + fun listSources(): List { + + return template.query( """ - .trimIndent()) { rs, _ -> - HilltopSourcesRow(rs.getInt("council_id"), rs.getString("hts_url")) - } + SELECT id, council_id, hts_url, configuration + FROM hilltop_sources + ORDER BY id + """ + .trimIndent(), + hilltopSourcesRowMapper) + } fun createFetchTask(request: HilltopFetchTaskCreate) { template.update( """ - INSERT INTO hilltop_fetch_tasks (council_id, request_type, base_url, query_params, state) VALUES (?,?,?,?,?) - ON CONFLICT DO NOTHING + INSERT INTO hilltop_fetch_tasks (source_id, request_type, fetch_url) VALUES (?,?,?) + ON CONFLICT(source_id, request_type, fetch_url) DO NOTHING """ .trimIndent(), - request.councilId, + request.sourceId, request.requestType.toString(), - request.baseUrl, - request.queryParams, - request.state.toString()) + request.baseUrl) } fun getNextTaskToProcess(): HilltopFetchTaskRow? = template .query( """ - SELECT id, council_id, request_type, base_url, query_params, state, previous_data_hash + SELECT id, source_id, request_type, next_fetch_at, fetch_url, previous_data_hash FROM hilltop_fetch_tasks - WHERE state = 'PENDING' AND next_fetch_at < NOW() - ORDER BY next_fetch_at + WHERE next_fetch_at < NOW() + ORDER BY next_fetch_at, id LIMIT 1 + FOR UPDATE SKIP LOCKED """ .trimIndent()) { rs, _ -> HilltopFetchTaskRow( rs.getInt("id"), - rs.getInt("council_id"), - HilltopFetchTaskRequestType.valueOf(rs.getString("request_type")), - rs.getString("base_url"), - rs.getString("query_params"), - HilltopFetchTaskState.valueOf(rs.getString("state")), - rs.getString("previous_data_hash"), - ) + rs.getInt("source_id"), + HilltopMessageType.valueOf(rs.getString("request_type")), + rs.getTimestamp("next_fetch_at").toInstant(), + URI(rs.getString("fetch_url")), + rs.getString("previous_data_hash")) } .firstOrNull() - fun requeueTask(id: Int, currentContentHash: String, nextFetchAt: Instant) { + fun requeueTask( + id: Int, + currentContentHash: String?, + currentResult: HilltopFetchResult, + nextFetchAt: Instant + ) { template.update( """ UPDATE hilltop_fetch_tasks - SET state = 'PENDING', previous_data_hash = ?, next_fetch_at = ? + SET previous_data_hash = ?, + previous_history = previous_history || ?::jsonb, + next_fetch_at = ? WHERE id = ? 
""" .trimIndent(), currentContentHash, + objectMapper.writeValueAsString(currentResult), OffsetDateTime.ofInstant(nextFetchAt, ZoneOffset.UTC), id) } diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/FetchTaskProcessor.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/FetchTaskProcessor.kt new file mode 100644 index 00000000..a22304e6 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/FetchTaskProcessor.kt @@ -0,0 +1,157 @@ +package nz.govt.eop.hilltop_crawler.fetcher + +import com.fasterxml.jackson.core.JsonParseException +import com.fasterxml.jackson.databind.JsonMappingException +import java.time.Duration +import java.time.Instant +import java.time.temporal.TemporalAmount +import kotlin.random.Random +import mu.KotlinLogging +import nz.govt.eop.hilltop_crawler.api.HilltopFetcher +import nz.govt.eop.hilltop_crawler.api.parsers.HilltopXmlParsers +import nz.govt.eop.hilltop_crawler.db.DB +import nz.govt.eop.hilltop_crawler.db.DB.HilltopFetchResult +import nz.govt.eop.hilltop_crawler.db.DB.HilltopFetchStatus.* +import nz.govt.eop.hilltop_crawler.fetcher.HilltopMessageType.* +import org.springframework.stereotype.Component +import org.springframework.transaction.annotation.Transactional + +/** + * FetchTaskProcessor is a class responsible for processing fetch tasks from a database. + * + * It follows this pattern: + * - Pick up the next task to process + * - Fetch the content from the hilltop server source + * - Compare the content to the previous content + * - If the content has changed, from the parsed content, create any new tasks based on the content + * - If the content has changed, send a message to Kafka + * - Reschedule the task in the DB to run again in the future + * + * If any errors occur while processing, the task should be rescheduled to run again in the future + */ +@Component +class FetchTaskProcessor( + private val db: DB, + private val fetcher: HilltopFetcher, + private val parsers: HilltopXmlParsers, + private val kafkaClient: HilltopMessageClient +) { + private final val logger = KotlinLogging.logger {} + + @Transactional + fun runNextTask(): Boolean = + db.getNextTaskToProcess()?.let { + runTask(it) + true + } ?: false + + fun runTask(taskToProcess: DB.HilltopFetchTaskRow) { + val fetchedAt = Instant.now() + try { + val source = db.getSource(taskToProcess.sourceId) + val xmlContent = fetcher.fetch(taskToProcess.fetchUri) + + if (xmlContent == null) { + handleTaskErrorRequeue( + taskToProcess, fetchedAt, FETCH_ERROR, Duration.ofMinutes(5), Duration.ofHours(1)) + return + } + + val isErrorXml = + try { + parsers.isHilltopErrorXml(xmlContent) + } catch (e: JsonParseException) { + logger.warn(e) { "Failed to parse content [$xmlContent]" } + true + } + if (isErrorXml) { + handleTaskErrorRequeue( + taskToProcess, fetchedAt, HILLTOP_ERROR, Duration.ofDays(1), Duration.ofDays(30)) + return + } + + val taskMapper = + try { + when (taskToProcess.requestType) { + SITES_LIST -> + SitesListTaskMapper( + source, + taskToProcess.fetchUri, + fetchedAt, + xmlContent, + parsers.parseSitesResponse(xmlContent)) + MEASUREMENTS_LIST -> + MeasurementsListTaskMapper( + source, + taskToProcess.fetchUri, + fetchedAt, + xmlContent, + parsers.parseMeasurementsResponse(xmlContent)) + MEASUREMENT_DATA -> + MeasurementDataTaskMapper( + source, + taskToProcess.fetchUri, + fetchedAt, + xmlContent, + parsers.parseMeasurementValuesResponse(xmlContent)) + } + } catch (e: JsonMappingException) 
{ + logger.warn(e) { "Failed to parse content [${xmlContent}]" } + handleTaskErrorRequeue( + taskToProcess, fetchedAt, PARSE_ERROR, Duration.ofDays(1), Duration.ofDays(30)) + return + } + + if (taskMapper.contentHash != taskToProcess.previousDataHash) { + taskMapper.buildNewTasksList().forEach(db::createFetchTask) + taskMapper.buildKafkaMessage()?.let(kafkaClient::send) + } + + handleTaskRequeue( + taskToProcess, + fetchedAt, + taskMapper.contentHash, + if (taskMapper.contentHash == taskToProcess.previousDataHash) UNCHANGED else SUCCESS, + taskMapper.determineNextFetchAt()) + } catch (e: Exception) { + // This is a catch-all for any errors that occur while processing a task. + logger.error(e) { + "Failed to process task [${taskToProcess.id}] with url [${taskToProcess.fetchUri}]" + } + // We want to make sure if we fail to process a task, we reschedule it to run again + // with a delay so that we don't end up in a situation where we are constantly trying to + // process a task that is failing. + handleTaskErrorRequeue( + taskToProcess, fetchedAt, UNKNOWN_ERROR, Duration.ofDays(1), Duration.ofDays(7)) + } + } + + fun handleTaskErrorRequeue( + task: DB.HilltopFetchTaskRow, + fetchedAt: Instant, + errorCode: DB.HilltopFetchStatus, + minAmount: TemporalAmount, + maxAmount: TemporalAmount + ) = + db.requeueTask( + task.id, + task.previousDataHash, + HilltopFetchResult(fetchedAt, errorCode, null), + randomTimeBetween(fetchedAt.plus(minAmount), fetchedAt.plus(maxAmount))) + + fun handleTaskRequeue( + task: DB.HilltopFetchTaskRow, + fetchedAt: Instant, + newContentHash: String?, + statusCode: DB.HilltopFetchStatus, + nextFetchAt: Instant + ) = + db.requeueTask( + task.id, + newContentHash, + HilltopFetchResult(fetchedAt, statusCode, newContentHash), + nextFetchAt) +} + +private fun randomTimeBetween(earliest: Instant, latest: Instant): Instant = + earliest.plusMillis(Random.nextLong(Duration.between(earliest, latest).toMillis() + 1)) diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/FetchTasksRunner.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/FetchTasksRunner.kt new file mode 100644 index 00000000..8da665c9 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/FetchTasksRunner.kt @@ -0,0 +1,48 @@ +package nz.govt.eop.hilltop_crawler.fetcher + +import jakarta.annotation.PostConstruct +import java.time.Duration +import mu.KotlinLogging +import org.springframework.scheduling.TaskScheduler +import org.springframework.stereotype.Component + +private const val TASKS_TO_RUN = 10 + +/** + * A class responsible for running fetch tasks using a FetchTaskProcessor. + * + * Seperated from the processor so that each task processed runs in its own transaction. + */ +@Component +class FetchTasksRunner(val taskScheduler: TaskScheduler, val processor: FetchTaskProcessor) { + + private final val logger = KotlinLogging.logger {} + + /** + * Uses springs task scheduler abstraction to effectively start a set of worker threads that will + * process tasks from the Hilltop queue. + * + * When a thread starts working, it will process items from the DB queue until there is nothing + * left, when it will stop. 
+ */ + @PostConstruct + fun startTasks() { + logger.info { "Starting" } + (1..TASKS_TO_RUN).forEach { id -> + taskScheduler.scheduleWithFixedDelay({ processTasks(id) }, Duration.ofSeconds(10)) + } + } + + fun processTasks(id: Int) { + while (true) { + try { + val hadWorkToDo = processor.runNextTask() + if (!hadWorkToDo) { + return + } + } catch (e: InterruptedException) { + return + } + } + } +} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/HilltopMessageClient.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/HilltopMessageClient.kt new file mode 100644 index 00000000..73e48625 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/HilltopMessageClient.kt @@ -0,0 +1,22 @@ +package nz.govt.eop.hilltop_crawler.fetcher + +import org.apache.kafka.clients.admin.NewTopic +import org.apache.kafka.clients.producer.ProducerRecord +import org.apache.kafka.clients.producer.internals.BuiltInPartitioner +import org.springframework.beans.factory.annotation.Qualifier +import org.springframework.context.annotation.Profile +import org.springframework.kafka.core.KafkaTemplate +import org.springframework.stereotype.Component + +@Profile("!test") +@Component +class HilltopMessageClient( + @Qualifier("hilltopRawDataTopic") private val dataTopic: NewTopic, + private val kafkaSender: KafkaTemplate +) { + fun send(message: HilltopMessage) { + val partitionKey = "${message.councilId}#${message.hilltopBaseUrl}".toByteArray() + val partition = BuiltInPartitioner.partitionForKey(partitionKey, dataTopic.numPartitions()) + kafkaSender.send(ProducerRecord(dataTopic.name(), partition, message.toKey(), message)) + } +} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/Messages.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/Messages.kt new file mode 100644 index 00000000..edcd253f --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/Messages.kt @@ -0,0 +1,101 @@ +package nz.govt.eop.hilltop_crawler.fetcher + +import com.fasterxml.jackson.annotation.JsonSubTypes +import com.fasterxml.jackson.annotation.JsonTypeInfo +import java.time.Instant +import java.time.YearMonth + +enum class HilltopMessageType { + SITES_LIST, + MEASUREMENTS_LIST, + MEASUREMENT_DATA, +} + +@JsonTypeInfo( + use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.EXISTING_PROPERTY, property = "type") +@JsonSubTypes( + JsonSubTypes.Type(value = HilltopSitesMessageKey::class, name = "SITES_LIST"), + JsonSubTypes.Type(value = HilltopMeasurementListMessageKey::class, name = "MEASUREMENTS_LIST"), + JsonSubTypes.Type(value = HilltopMeasurementsMessageKey::class, name = "MEASUREMENT_DATA")) +abstract class HilltopMessageKey( + val type: HilltopMessageType, +) { + abstract val councilId: Int + abstract val hilltopBaseUrl: String + abstract val at: Instant +} + +data class HilltopSitesMessageKey( + override val councilId: Int, + override val hilltopBaseUrl: String, + override val at: Instant +) : HilltopMessageKey(HilltopMessageType.SITES_LIST) + +data class HilltopMeasurementListMessageKey( + override val councilId: Int, + override val hilltopBaseUrl: String, + override val at: Instant, + val siteName: String +) : HilltopMessageKey(HilltopMessageType.MEASUREMENTS_LIST) + +data class HilltopMeasurementsMessageKey( + override val councilId: Int, + override val hilltopBaseUrl: String, + override val at: Instant, + val siteName: String, + 
val measurementName: String, + val yearMonth: YearMonth +) : HilltopMessageKey(HilltopMessageType.MEASUREMENT_DATA) + +@JsonTypeInfo( + use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.EXISTING_PROPERTY, property = "type") +@JsonSubTypes( + JsonSubTypes.Type(value = HilltopSitesMessage::class, name = "SITES_LIST"), + JsonSubTypes.Type(value = HilltopMeasurementListMessage::class, name = "MEASUREMENTS_LIST"), + JsonSubTypes.Type(value = HilltopMeasurementsMessage::class, name = "MEASUREMENT_DATA")) +abstract class HilltopMessage(val type: HilltopMessageType) { + abstract val councilId: Int + abstract val hilltopBaseUrl: String + abstract val at: Instant + abstract val hilltopUrl: String + abstract val xml: String + + abstract fun toKey(): HilltopMessageKey +} + +data class HilltopSitesMessage( + override val councilId: Int, + override val hilltopBaseUrl: String, + override val at: Instant, + override val hilltopUrl: String, + override val xml: String +) : HilltopMessage(HilltopMessageType.SITES_LIST) { + override fun toKey(): HilltopMessageKey = HilltopSitesMessageKey(councilId, hilltopBaseUrl, at) +} + +data class HilltopMeasurementListMessage( + override val councilId: Int, + override val hilltopBaseUrl: String, + override val at: Instant, + val siteName: String, + override val hilltopUrl: String, + override val xml: String +) : HilltopMessage(HilltopMessageType.MEASUREMENTS_LIST) { + override fun toKey(): HilltopMessageKey = + HilltopMeasurementListMessageKey(councilId, hilltopBaseUrl, at, siteName) +} + +data class HilltopMeasurementsMessage( + override val councilId: Int, + override val hilltopBaseUrl: String, + override val at: Instant, + val siteName: String, + val measurementName: String, + val yearMonth: YearMonth, + override val hilltopUrl: String, + override val xml: String +) : HilltopMessage(HilltopMessageType.MEASUREMENT_DATA) { + override fun toKey(): HilltopMessageKey = + HilltopMeasurementsMessageKey( + councilId, hilltopBaseUrl, at, siteName, measurementName, yearMonth) +} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/TaskMappers.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/TaskMappers.kt new file mode 100644 index 00000000..dc281a28 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/fetcher/TaskMappers.kt @@ -0,0 +1,227 @@ +package nz.govt.eop.hilltop_crawler.fetcher + +import java.net.URI +import java.security.MessageDigest +import java.time.* +import java.time.format.DateTimeFormatter +import java.time.temporal.ChronoUnit +import java.time.temporal.Temporal +import kotlin.random.Random +import nz.govt.eop.hilltop_crawler.api.parsers.HilltopMeasurementValues +import nz.govt.eop.hilltop_crawler.api.parsers.HilltopMeasurements +import nz.govt.eop.hilltop_crawler.api.parsers.HilltopSites +import nz.govt.eop.hilltop_crawler.api.requests.buildMeasurementListUrl +import nz.govt.eop.hilltop_crawler.api.requests.buildPastMeasurementsUrl +import nz.govt.eop.hilltop_crawler.db.DB + +/** + * This is an abstract class that represents a mapper for processing a specific type of task. When + * processing a task we do 3 things: + * * Generate a set of new tasks + * * Optionally send a message to Kafka + * * come up with a time to run the task again + * + * The differences between these are based on the type of task we are processing. 
Implementations of + * this class will be specific to the type of task and are responsible for implementing the logic to + * do the above per task type. + */ +abstract class TaskMapper( + val type: HilltopMessageType, + val sourceConfig: DB.HilltopSourcesRow, + val fetchedUri: URI, + val fetchedAt: Instant, + val content: String, + val parsedContent: T +) { + val contentHash: String = hashMessage(content) + val baseUri = getUriWithoutQuery(fetchedUri) + + abstract fun buildNewTasksList(): List + + abstract fun buildKafkaMessage(): HilltopMessage? + + abstract fun determineNextFetchAt(): Instant + + private fun getUriWithoutQuery(original: URI): String = + URI(original.scheme, original.authority, original.path, null, null).toASCIIString() + + private fun hashMessage(message: String) = + MessageDigest.getInstance("SHA-256").digest(message.toByteArray()).joinToString("") { + "%02x".format(it) + } + + protected fun randomTimeBetween(earliest: Instant, latest: Instant): Instant = + earliest.plusMillis(Random.nextLong(Duration.between(earliest, latest).toMillis() + 1)) +} + +class SitesListTaskMapper( + sourceConfig: DB.HilltopSourcesRow, + fetchedUri: URI, + fetchedAt: Instant, + content: String, + parsedContent: HilltopSites +) : + TaskMapper( + HilltopMessageType.SITES_LIST, + sourceConfig, + fetchedUri, + fetchedAt, + content, + parsedContent) { + + override fun buildNewTasksList(): List = + parsedContent.sites + .filter { it.name !in sourceConfig.config.excludedSitesNames } + .map { + DB.HilltopFetchTaskCreate( + sourceConfig.id, + HilltopMessageType.MEASUREMENTS_LIST, + buildMeasurementListUrl(baseUri, it.name), + ) + } + + override fun buildKafkaMessage(): HilltopMessage = + HilltopSitesMessage( + sourceConfig.councilId, + baseUri, + fetchedAt, + fetchedUri.toASCIIString(), + content, + ) + + override fun determineNextFetchAt(): Instant = + randomTimeBetween(fetchedAt, fetchedAt.plus(30, ChronoUnit.DAYS)) +} + +class MeasurementsListTaskMapper( + sourceConfig: DB.HilltopSourcesRow, + fetchedUri: URI, + fetchedAt: Instant, + content: String, + parsedContent: HilltopMeasurements +) : + TaskMapper( + HilltopMessageType.MEASUREMENTS_LIST, + sourceConfig, + fetchedUri, + fetchedAt, + content, + parsedContent) { + + /** Generates a sequence of `YearMonth` from `startDate` to `endDate` inclusively. 
*/ + private fun generateMonthSequence(startDate: T, endDate: T): List { + val firstElement = YearMonth.from(startDate) + val lastElement = YearMonth.from(endDate) + return generateSequence(firstElement) { it.plusMonths(1) } + .takeWhile { it <= lastElement } + .toList() + } + + override fun buildNewTasksList(): List = + parsedContent.datasources + .filter { sourceConfig.config.measurementNames.contains(it.name) } + .filter { it.type == "StdSeries" } + .filter { it.measurements.isNotEmpty() } + .filter { + it.measurements.firstOrNull { + sourceConfig.config.measurementNames.contains(it.name) && it.vm == null + } != null + } + .flatMap { + val fromDate = + LocalDate.parse( + it.from.subSequence(0, 10), DateTimeFormatter.ofPattern("yyyy-MM-dd")) + val toDate = + LocalDate.parse(it.to.subSequence(0, 10), DateTimeFormatter.ofPattern("yyyy-MM-dd")) + + val requestAs = + it.measurements + .first { + sourceConfig.config.measurementNames.contains(it.name) && it.vm == null + } + .requestAs + + generateMonthSequence(fromDate, toDate).map { yearMonth -> + DB.HilltopFetchTaskCreate( + sourceConfig.id, + HilltopMessageType.MEASUREMENT_DATA, + buildPastMeasurementsUrl(baseUri, it.siteName, requestAs, yearMonth), + ) + } + } + + override fun buildKafkaMessage(): HilltopMessage = + HilltopMeasurementListMessage( + sourceConfig.councilId, + baseUri, + fetchedAt, + parsedContent.datasources.first().siteName, + fetchedUri.toASCIIString(), + content, + ) + + override fun determineNextFetchAt(): Instant = + randomTimeBetween(fetchedAt.plus(1, ChronoUnit.DAYS), fetchedAt.plus(30, ChronoUnit.DAYS)) +} + +class MeasurementDataTaskMapper( + sourceConfig: DB.HilltopSourcesRow, + fetchedUri: URI, + fetchedAt: Instant, + content: String, + parsedContent: HilltopMeasurementValues +) : + TaskMapper( + HilltopMessageType.MEASUREMENT_DATA, + sourceConfig, + fetchedUri, + fetchedAt, + content, + parsedContent) { + override fun buildNewTasksList(): List = emptyList() + + override fun buildKafkaMessage(): HilltopMessage? = + if (parsedContent.measurement != null) { + HilltopMeasurementsMessage( + sourceConfig.councilId, + baseUri, + fetchedAt, + parsedContent.measurement.siteName, + parsedContent.measurement.dataSource.measurementName, + parsedContent.measurement.data.values.first().timestamp.let { YearMonth.from(it) }, + fetchedUri.toASCIIString(), + content, + ) + } else { + null + } + + /** + * When deciding when next to fetch data, we want to try to guess when data might arrive and fetch + * as close as possible after that. + * + * Because Hilltop doesn't tell us when data arrived, we have to guess based on the last timestamp + * in the data. 
+ * + * Its trying to balance refreshing fast enough to get new data in quickly, while not hammering + * the server, or wasting time refreshing data that we don't expect to be updated (since this will + * be called when getting historical data as well) + */ + override fun determineNextFetchAt(): Instant { + val lastValueAt = parsedContent.measurement?.data?.values?.lastOrNull()?.timestamp?.toInstant() + return if (lastValueAt != null && lastValueAt > fetchedAt.minus(1, ChronoUnit.HOURS)) { + randomTimeBetween( + maxOf(lastValueAt.plus(15, ChronoUnit.MINUTES), fetchedAt), + fetchedAt.plus(15, ChronoUnit.MINUTES)) + } else if (lastValueAt != null && lastValueAt > fetchedAt.minus(1, ChronoUnit.DAYS)) { + randomTimeBetween(fetchedAt, fetchedAt.plus(1, ChronoUnit.HOURS)) + } else if (lastValueAt != null && lastValueAt > fetchedAt.minus(7, ChronoUnit.DAYS)) { + randomTimeBetween(fetchedAt, fetchedAt.plus(1, ChronoUnit.DAYS)) + // Just approx to a month + } else if (lastValueAt != null && lastValueAt > fetchedAt.minus(28, ChronoUnit.DAYS)) { + randomTimeBetween(fetchedAt, fetchedAt.plus(7, ChronoUnit.DAYS)) + } else { // Any older or historical data, will rarely change, so we can fetch it less often. + randomTimeBetween(fetchedAt, fetchedAt.plus(30, ChronoUnit.DAYS)) + } + } +} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/producer/ObservationMessages.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/producer/ObservationMessages.kt new file mode 100644 index 00000000..5aca5866 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/producer/ObservationMessages.kt @@ -0,0 +1,75 @@ +package nz.govt.eop.hilltop_crawler.producer + +import com.fasterxml.jackson.annotation.JsonSubTypes +import com.fasterxml.jackson.annotation.JsonTypeInfo +import java.math.BigDecimal +import java.time.OffsetDateTime +import java.time.YearMonth + +enum class ObservationMessageType { + SITE_DETAILS, + OBSERVATION_DATA, +} + +@JsonTypeInfo( + use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.EXISTING_PROPERTY, property = "type") +@JsonSubTypes( + JsonSubTypes.Type(value = SiteMessageKey::class, name = "SITE_DETAILS"), + JsonSubTypes.Type(value = ObservationDataMessageKey::class, name = "OBSERVATION_DATA"), +) +abstract class ObservationMessageKey(val type: ObservationMessageType) { + abstract val councilId: Int + abstract val siteName: String +} + +data class SiteMessageKey( + override val councilId: Int, + override val siteName: String, +) : ObservationMessageKey(ObservationMessageType.SITE_DETAILS) + +data class ObservationDataMessageKey( + override val councilId: Int, + override val siteName: String, + val measurementName: String, + val yearMonth: YearMonth +) : ObservationMessageKey(ObservationMessageType.OBSERVATION_DATA) + +@JsonTypeInfo( + use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.EXISTING_PROPERTY, property = "type") +@JsonSubTypes( + JsonSubTypes.Type(value = SiteDetailsMessage::class, name = "SITE_DETAILS"), + JsonSubTypes.Type(value = ObservationDataMessage::class, name = "OBSERVATION_DATA"), +) +abstract class ObservationMessage(val type: ObservationMessageType) { + abstract val councilId: Int + abstract val siteName: String + + abstract fun toKey(): ObservationMessageKey +} + +data class SiteDetailsMessage( + override val councilId: Int, + override val siteName: String, + val location: Location? 
+) : ObservationMessage(ObservationMessageType.SITE_DETAILS) { + override fun toKey() = + SiteMessageKey( + councilId, + siteName, + ) +} + +data class Location(val easting: Int, val northing: Int) + +data class ObservationDataMessage( + override val councilId: Int, + override val siteName: String, + val measurementName: String, + val observations: List +) : ObservationMessage(ObservationMessageType.OBSERVATION_DATA) { + override fun toKey() = + ObservationDataMessageKey( + councilId, siteName, measurementName, YearMonth.from(observations.first().observedAt)) +} + +data class Observation(val observedAt: OffsetDateTime, val value: BigDecimal) diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/producer/ObservationProducer.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/producer/ObservationProducer.kt new file mode 100644 index 00000000..e2818c43 --- /dev/null +++ b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/producer/ObservationProducer.kt @@ -0,0 +1,87 @@ +package nz.govt.eop.hilltop_crawler.producer + +import com.fasterxml.jackson.databind.ObjectMapper +import nz.govt.eop.hilltop_crawler.api.parsers.HilltopXmlParsers +import nz.govt.eop.hilltop_crawler.fetcher.* +import nz.govt.eop.hilltop_crawler.fetcher.HilltopMessageType.* +import org.apache.kafka.clients.admin.NewTopic +import org.apache.kafka.clients.producer.internals.BuiltInPartitioner +import org.apache.kafka.streams.KeyValue +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream.Consumed +import org.apache.kafka.streams.kstream.Produced +import org.springframework.beans.factory.annotation.Autowired +import org.springframework.beans.factory.annotation.Qualifier +import org.springframework.context.annotation.Profile +import org.springframework.kafka.support.serializer.JsonSerde +import org.springframework.stereotype.Component + +@Profile("!test") +@Component +class ObservationProducer( + val parsers: HilltopXmlParsers, + val objectMapper: ObjectMapper, + @Qualifier("hilltopRawDataTopic") val rawDataTopic: NewTopic, + @Qualifier("outputDataTopic") val outputDataTopic: NewTopic, +) { + + @Autowired + fun buildPipeline(streamsBuilder: StreamsBuilder) { + val messageStream = + streamsBuilder.stream( + rawDataTopic.name(), + Consumed.with( + JsonSerde(HilltopMessageKey::class.java, objectMapper), + JsonSerde(HilltopMessage::class.java, objectMapper))) + + messageStream + .flatMap { _, value -> + when (value.type) { + SITES_LIST -> { + val parsedXml = parsers.parseSitesResponse(value.xml) + + val siteMessages = + parsedXml.sites.map { + val location = + if (it.easting != null && it.northing != null) { + Location(it.easting, it.northing) + } else { + null + } + SiteDetailsMessage(value.councilId, it.name, location) + } + + return@flatMap siteMessages.map { KeyValue.pair(it.toKey(), it) } + } + MEASUREMENT_DATA -> { + val parsedXml = parsers.parseMeasurementValuesResponse(value.xml) + + val observations = + parsedXml.measurement!!.data.values.map { Observation(it.timestamp, it.value) } + + val message = + ObservationDataMessage( + value.councilId, + parsedXml.measurement.siteName, + parsedXml.measurement.dataSource.measurementName, + observations) + return@flatMap listOf( + KeyValue.pair(message.toKey(), message), + ) + } + else -> { + return@flatMap listOf() + } + } + } + .to( + outputDataTopic.name(), + Produced.with( + JsonSerde(objectMapper).noTypeInfo(), + JsonSerde(objectMapper).noTypeInfo()) { _, key, _, numPartitions + -> 
+ BuiltInPartitioner.partitionForKey( + "${key.councilId}#${key.siteName}".toByteArray(), numPartitions) + }) + } +} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopRequests.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopRequests.kt deleted file mode 100644 index 847b19f8..00000000 --- a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopRequests.kt +++ /dev/null @@ -1,65 +0,0 @@ -package nz.govt.eop.hilltop_crawler.support - -import java.io.ByteArrayOutputStream -import java.net.URI -import java.net.http.HttpClient -import java.net.http.HttpRequest -import java.net.http.HttpResponse -import java.nio.charset.Charset -import java.time.Duration -import mu.KotlinLogging -import org.springframework.web.util.DefaultUriBuilderFactory - -private val logger = KotlinLogging.logger {} -private val client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(5)).build() - -fun buildUrl(hilltopUrl: String): URI = - DefaultUriBuilderFactory() - .uriString(hilltopUrl) - .queryParam("Service", "Hilltop") - .queryParam("Request", "SiteList") - .queryParam("Location", "Yes") - .queryParam("Measurement", "Rainfall") - .build() - -fun buildUrl(hilltopUrl: String, siteId: String): URI = - DefaultUriBuilderFactory() - .uriString(hilltopUrl) - .queryParam("Service", "Hilltop") - .queryParam("Request", "MeasurementList") - .queryParam("Site", siteId) - .build() - -fun buildUrl(hilltopUrl: String, siteId: String, measurementName: String): URI = - DefaultUriBuilderFactory() - .uriString(hilltopUrl) - .queryParam("Service", "Hilltop") - .queryParam("Request", "GetData") - .queryParam("Site", siteId) - .queryParam("Measurement", measurementName) - .queryParam("TimeInterval", "P30D/now") - .build() - -fun rebuildHim(baseUrl: String, queryParams: String): URI = - DefaultUriBuilderFactory().uriString(baseUrl).query(queryParams).build() - -fun fetch(fetchRequest: URI): String { - - logger.trace { "Downloading ${fetchRequest}" } - val request = - HttpRequest.newBuilder() - .version(HttpClient.Version.HTTP_1_1) - .uri(fetchRequest) - .timeout(Duration.ofSeconds(30)) - .build() - - val response = client.send(request, HttpResponse.BodyHandlers.ofInputStream()) - if (response.statusCode() == 200) { - val baos = ByteArrayOutputStream() - response.body().use { it.copyTo(baos) } - return baos.toString(Charset.defaultCharset()) - } else { - logger.warn { "Error Downloading ${fetchRequest}" } - throw Exception("Download failed") - } -} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopSitesParser.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopSitesParser.kt deleted file mode 100644 index ae0896f5..00000000 --- a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopSitesParser.kt +++ /dev/null @@ -1,38 +0,0 @@ -package nz.govt.eop.hilltop_crawler.support - -import com.fasterxml.jackson.databind.DeserializationFeature -import com.fasterxml.jackson.dataformat.xml.XmlMapper -import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlElementWrapper -import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty -import com.fasterxml.jackson.module.kotlin.KotlinModule - -data class HilltopSitesXml( - @JacksonXmlProperty(localName = "Agency") val agency: String, - @JacksonXmlProperty(localName = "Projection") val projection: String?, - @JacksonXmlProperty(localName = "Site") - 
@JacksonXmlElementWrapper(useWrapping = false) - val sites: List = arrayListOf() -) { - fun validSites(): List = sites.filter { it.isValidSite() } -} - -data class HilltopSiteXml( - @JacksonXmlProperty(localName = "Name", isAttribute = true) val name: String, - @JacksonXmlProperty(localName = "Easting") val easting: Int?, - @JacksonXmlProperty(localName = "Northing") val northing: Int? -) { - fun isValidSite(): Boolean = name != "HY Maitai at Forks" && easting != null && northing != null -} - -object HilltopSitesParser { - - val xmlMapper = - XmlMapper.builder() - .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false) - .addModule(KotlinModule.Builder().build()) - .build() - - fun parseSites(data: String): HilltopSitesXml { - return xmlMapper.readValue(data, HilltopSitesXml::class.java) - } -} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopXmlParsers.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopXmlParsers.kt deleted file mode 100644 index 207f0d33..00000000 --- a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopXmlParsers.kt +++ /dev/null @@ -1,101 +0,0 @@ -package nz.govt.eop.hilltop_crawler.support - -import com.fasterxml.jackson.databind.DeserializationFeature -import com.fasterxml.jackson.dataformat.xml.XmlMapper -import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlElementWrapper -import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty -import com.fasterxml.jackson.module.kotlin.KotlinModule - -data class HilltopMeasurementTypeXml( - @JacksonXmlProperty(localName = "DataSource") - @JacksonXmlElementWrapper(useWrapping = false) - val datasources: List -) - -data class HilltopDatasource( - @JacksonXmlProperty(localName = "Name", isAttribute = true) val name: String, - @JacksonXmlProperty(localName = "Site", isAttribute = true) val siteId: String, - @JacksonXmlProperty(localName = "From") val from: String, - @JacksonXmlProperty(localName = "To") val to: String, - @JacksonXmlProperty(localName = "TSType") val type: String, - @JacksonXmlProperty(localName = "Measurement") - @JacksonXmlElementWrapper(useWrapping = false) - val measurements: List = emptyList() -) - -data class HilltopMeasurement( - @JacksonXmlProperty(localName = "Name", isAttribute = true) val name: String, - @JacksonXmlProperty(localName = "RequestAs") val requestAs: String, -) - -data class HilltopMeasurementDataXml( - @JacksonXmlProperty(localName = "Measurement") - val measurement: HilltopMeasurementDataMeasurement -) - -data class HilltopMeasurementDataMeasurement( - @JacksonXmlProperty(localName = "Data") val data: Data -) - -data class Data( - @JacksonXmlProperty(localName = "DateFormat", isAttribute = true) val dateFormat: String, - @JacksonXmlProperty(localName = "E") - @JacksonXmlElementWrapper(useWrapping = false) - val values: List -) - -data class Value( - @JacksonXmlProperty(localName = "T") val timestamp: String, - @JacksonXmlProperty(localName = "I1") val value1: String?, - @JacksonXmlProperty(localName = "Value") val value2: String?, - @JacksonXmlProperty(localName = "Parameter") - @JacksonXmlElementWrapper(useWrapping = false) - val parameters: List? -) - -data class Parameter( - @JacksonXmlProperty(localName = "Name") val name: String, - @JacksonXmlProperty(localName = "Value") val value: String -) - -data class Measurement( - val timestamp: String, - val value: String?, - val parameters: Map? 
-) - -data class MeasurementData(val measurements: List) - -object HilltopXmlParsers { - - val xmlMapper = - XmlMapper.builder() - .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false) - .addModule(KotlinModule.Builder().build()) - .build() - - fun isHilltopErrorXml(data: String): Boolean { - return xmlMapper.readTree(data).has("Error") - } - - fun parseMeasurementNames(data: String): HilltopMeasurementTypeXml { - return xmlMapper.readValue(data, HilltopMeasurementTypeXml::class.java) - } - - fun parseSiteMeasurementData(data: String): MeasurementData { - val hilltopXML = xmlMapper.readValue(data, HilltopMeasurementDataXml::class.java) - - val measurements = - hilltopXML.measurement.data.values.map { - val value: String? = it.value1 ?: it.value2 - Measurement( - it.timestamp, - value, - if (it.parameters != null) - it.parameters.map { param -> Pair(param.name, param.value) }.toMap() - else null) - } - - return MeasurementData(measurements) - } -} diff --git a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/messages/Messages.kt b/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/messages/Messages.kt deleted file mode 100644 index e018d7a6..00000000 --- a/packages/HilltopCrawler/src/main/kotlin/nz/govt/eop/messages/Messages.kt +++ /dev/null @@ -1,11 +0,0 @@ -package nz.govt.eop.messages - -import java.time.Instant - -data class HilltopDataMessage( - val councilId: Int, - val hilltopUrl: String, - val type: String, - val xml: String, - val at: Instant -) diff --git a/packages/HilltopCrawler/src/main/resources/application.yml b/packages/HilltopCrawler/src/main/resources/application.yml index 2fdf6933..76226771 100644 --- a/packages/HilltopCrawler/src/main/resources/application.yml +++ b/packages/HilltopCrawler/src/main/resources/application.yml @@ -1,18 +1,26 @@ spring: + application: + name: hilltop-crawler main: banner-mode: off kafka: bootstrap-servers: localhost:29092 producer: - key-serializer: org.apache.kafka.common.serialization.StringSerializer - value-serializer: org.springframework.kafka.support.serializer.JsonSerializer properties: max.request.size: 134217728 buffer.memory: 134217728 - spring: - json.add.type.headers: false compression-type: gzip + streams: + properties: + num.stream.threads: 8 + producer.max.request.size: 10485760 + producer.compression.type: gzip + default: + key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde + value.serde: org.apache.kafka.common.serialization.Serdes$StringSerde + application-id: nz.govt.eop.consumers.hilltop-crawler + datasource: url: jdbc:postgresql://${CONFIG_DATABASE_HOST:localhost}:${CONFIG_DATABASE_PORT:5432}/${CONFIG_DATABASE_NAME:eop_dev} username: ${CONFIG_DATABASE_USERNAME:postgres} @@ -26,11 +34,15 @@ spring: user: ${CONFIG_DATABASE_MIGRATIONS_USERNAME:eop_hilltop_crawler_migrations_user} password: ${CONFIG_DATABASE_MIGRATIONS_PASSWORD:password} + task: + scheduling: + pool: + size: 10 + crawler: topicReplicas: 1 logging: level: root: INFO - nz.govt.eop: INFO - nz.govt.eop.hilltop_crawler.TaskProcessor: DEBUG + nz.govt.eop: DEBUG diff --git a/packages/HilltopCrawler/src/main/resources/db/migration/R__Hilltop_Sources.sql b/packages/HilltopCrawler/src/main/resources/db/migration/R__Hilltop_Sources.sql deleted file mode 100644 index ed4112a4..00000000 --- a/packages/HilltopCrawler/src/main/resources/db/migration/R__Hilltop_Sources.sql +++ /dev/null @@ -1,14 +0,0 @@ -INSERT INTO hilltop_sources (council_id, hts_url) -VALUES (1, 'http://hilltop.nrc.govt.nz/Data.hts'), - (5, 
'http://hilltop.gdc.govt.nz/data.hts'), - (6, 'https://data.hbrc.govt.nz/Envirodata/EMAR.hts'), - (8, 'https://tsdata.horizons.govt.nz/hydrology.hts'), - (9, 'https://hilltop.gw.govt.nz/merged.hts'), - (12, 'http://data.wcrc.govt.nz:9083/data.hts'), - (16, 'https://envdata.tasman.govt.nz/data.hts'), - (17, 'http://envdata.nelson.govt.nz/data.hts'), - (18, 'https://hydro.marlborough.govt.nz/data.hts'), - (13, 'http://wateruse.ecan.govt.nz/WQAll.hts'), - (14, 'http://gisdata.orc.govt.nz/Hilltop//data.hts'), - (15, 'http://odp.es.govt.nz/data.hts') -ON CONFLICT DO NOTHING; diff --git a/packages/HilltopCrawler/src/main/resources/db/migration/V0001__Hilltop_Indexing.sql b/packages/HilltopCrawler/src/main/resources/db/migration/V0001__Hilltop_Indexing.sql index 9da4df24..7e20b14a 100644 --- a/packages/HilltopCrawler/src/main/resources/db/migration/V0001__Hilltop_Indexing.sql +++ b/packages/HilltopCrawler/src/main/resources/db/migration/V0001__Hilltop_Indexing.sql @@ -1,10 +1,11 @@ CREATE TABLE hilltop_sources ( - id SERIAL NOT NULL, - council_id INT NOT NULL, - hts_url VARCHAR NOT NULL, - created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), - updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + id SERIAL NOT NULL, + council_id INT NOT NULL, + hts_url VARCHAR NOT NULL, + configuration JSONB NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), PRIMARY KEY (id), UNIQUE (council_id, hts_url) ); @@ -12,15 +13,14 @@ CREATE TABLE hilltop_sources CREATE TABLE hilltop_fetch_tasks ( id SERIAL NOT NULL, - council_id INT NOT NULL, + source_id INT NOT NULL, request_type VARCHAR NOT NULL, - base_url VARCHAR NOT NULL, - query_params VARCHAR NOT NULL, - state VARCHAR NOT NULL, + fetch_url VARCHAR NOT NULL, next_fetch_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), previous_data_hash VARCHAR, + previous_history JSONB DEFAULT '[]'::JSONB, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), - updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), PRIMARY KEY (id), - UNIQUE (council_id, request_type, base_url, query_params) + UNIQUE (source_id, request_type, fetch_url), + FOREIGN KEY (source_id) REFERENCES hilltop_sources (id) ); diff --git a/packages/HilltopCrawler/src/main/resources/db/migration/V0002__Index_on_next_fetch.sql b/packages/HilltopCrawler/src/main/resources/db/migration/V0002__Index_on_next_fetch.sql new file mode 100644 index 00000000..d79beda7 --- /dev/null +++ b/packages/HilltopCrawler/src/main/resources/db/migration/V0002__Index_on_next_fetch.sql @@ -0,0 +1 @@ +CREATE INDEX ON hilltop_fetch_tasks (next_fetch_at); diff --git a/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/HilltopCrawlerTestConfiguration.kt b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/HilltopCrawlerTestConfiguration.kt new file mode 100644 index 00000000..87a4f7b3 --- /dev/null +++ b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/HilltopCrawlerTestConfiguration.kt @@ -0,0 +1,13 @@ +package nz.govt.eop.hilltop_crawler + +import nz.govt.eop.hilltop_crawler.fetcher.HilltopMessageClient +import org.mockito.Mockito +import org.springframework.boot.test.context.TestConfiguration +import org.springframework.context.annotation.Bean + +@TestConfiguration +class HilltopCrawlerTestConfiguration { + + @Bean + fun hilltopMessageClient(): HilltopMessageClient = Mockito.mock(HilltopMessageClient::class.java) +} diff --git a/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopXmlParsersTest.kt 
b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopXmlParsersTest.kt new file mode 100644 index 00000000..1df59e57 --- /dev/null +++ b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/api/parsers/HilltopXmlParsersTest.kt @@ -0,0 +1,181 @@ +package nz.govt.eop.hilltop_crawler.api.parsers + +import io.kotest.matchers.shouldBe +import org.junit.jupiter.api.Nested +import org.junit.jupiter.api.Test + +class HilltopXmlParsersTest { + + val underTest = HilltopXmlParsers() + + @Nested + inner class SitesResponse { + @Test + fun `should parse a site locations XML`() { + // GIVEN + val input = this.javaClass.getResource("/hilltop-xml/SitesResponse-list.xml")!!.readText() + + // WHEN + val result = underTest.parseSitesResponse(input) + + // THEN + result.agency shouldBe "Horizons" + result.projection shouldBe "NZMG" + result.sites shouldBe + arrayListOf( + HilltopSite("Wrens Creek at Graham Road", 2764950, 6100940), + HilltopSite("X Forest Rd Drain at Drop Structure", null, null)) + } + + @Test + fun `should parse a site locations XML content when there are no sites`() { + // GIVEN + val input = this.javaClass.getResource("/hilltop-xml/SitesResponse-empty.xml")!!.readText() + + // WHEN + val result = underTest.parseSitesResponse(input) + + // THEN + result.agency shouldBe "Horizons" + result.projection shouldBe "NZMG" + result.sites shouldBe arrayListOf() + } + + @Test + fun `should parse a site locations XML content when there is no projection`() { + // GIVEN + val input = + this.javaClass.getResource("/hilltop-xml/SitesResponse-no-projection.xml")!!.readText() + + // WHEN + val result = underTest.parseSitesResponse(input) + + // THEN + result.agency shouldBe "Horizons" + result.projection shouldBe null + result.sites shouldBe + arrayListOf( + HilltopSite("Wrens Creek at Graham Road", 2764950, 6100940), + HilltopSite("X Forest Rd Drain at Drop Structure", null, null)) + } + } + + @Nested + inner class MeasurementsResponse { + @Test + fun `should parse a site measurements response XML`() { + // GIVEN + val input = + this.javaClass.getResource("/hilltop-xml/MeasurementsResponse-list.xml")!!.readText() + + // WHEN + val result = underTest.parseMeasurementsResponse(input) + + // THEN + result.datasources.size shouldBe 18 + result.datasources.map { it.name } shouldBe + arrayListOf( + "Rainfall", + "Voltage", + "SCADA Rainfall", + "SCADA Rainfall (backup)", + "Air Temperature (1.5m)", + "Relative Humidity", + "Campbell Signature", + "Campbell Software Version", + "Rainfall", + "Voltage", + "SCADA Rainfall", + "SCADA Rainfall (backup)", + "Air Temperature (1.5m)", + "Relative Humidity", + "Campbell Signature", + "Campbell Software Version", + "Rainfall", + "SCADA Rainfall") + + result.datasources.map { it.type } shouldBe + arrayListOf( + "StdSeries", + "StdSeries", + "StdSeries", + "StdSeries", + "StdSeries", + "StdSeries", + "StdSeries", + "StdSeries", + "StdQualSeries", + "StdQualSeries", + "StdQualSeries", + "StdQualSeries", + "StdQualSeries", + "StdQualSeries", + "StdQualSeries", + "StdQualSeries", + "CheckSeries", + "CheckSeries") + + result.datasources[0].measurements.size shouldBe 9 + + result.datasources[0].measurements[0].name shouldBe "Rainfall" + result.datasources[0].measurements[0].requestAs shouldBe "Rainfall [Rainfall]" + result.datasources[0].measurements[0].itemNumber shouldBe 1 + result.datasources[0].measurements[0].vm shouldBe null + + result.datasources[0].measurements[8].name shouldBe "Precipitation Total (Gap Series)" + 
result.datasources[0].measurements[8].requestAs shouldBe "Precipitation Total (Gap Series)" + result.datasources[0].measurements[8].itemNumber shouldBe 1 + result.datasources[0].measurements[8].vm shouldBe 1 + + result.datasources[17].measurements[3].name shouldBe "Comment" + result.datasources[17].measurements[3].requestAs shouldBe "Comment [SCADA Rainfall]" + result.datasources[17].measurements[3].itemNumber shouldBe 4 + result.datasources[17].measurements[3].vm shouldBe null + } + } + + @Nested + inner class MeasurementValuesResponse { + @Test + fun `should parse a site measurement values response XML`() { + // GIVEN + val input = + this.javaClass.getResource("/hilltop-xml/MeasurementValuesResponse.xml")!!.readText() + + // WHEN + val result = underTest.parseMeasurementValuesResponse(input) + + // THEN + result.measurement!!.siteName shouldBe "R26/6804" + result.measurement!!.dataSource.measurementName shouldBe "Water Meter Volume" + } + } + + @Nested + inner class ErrorResponse { + @Test + fun `should parse a error response XML`() { + // GIVEN + val input = this.javaClass.getResource("/hilltop-xml/ErrorResponse.xml")!!.readText() + + // WHEN + val result = underTest.isHilltopErrorXml(input) + + // THEN + result shouldBe true + } + + @Test + fun `should parse a non-error response XML`() { + // GIVEN + val input = + this.javaClass.getResource("/hilltop-xml/MeasurementsResponse-list.xml")!!.readText() + + // WHEN + val result = underTest.isHilltopErrorXml(input) + + // THEN + result shouldBe false + } + } +} diff --git a/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/fetcher/FetchTaskProcessorIntegrationTest.kt b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/fetcher/FetchTaskProcessorIntegrationTest.kt new file mode 100644 index 00000000..301c12f3 --- /dev/null +++ b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/fetcher/FetchTaskProcessorIntegrationTest.kt @@ -0,0 +1,594 @@ +package nz.govt.eop.hilltop_crawler.fetcher + +import io.kotest.inspectors.forExactly +import io.kotest.inspectors.forOne +import io.kotest.matchers.collections.shouldHaveSize +import io.kotest.matchers.date.shouldBeAfter +import io.kotest.matchers.date.shouldBeBefore +import io.kotest.matchers.shouldBe +import java.net.URI +import java.time.Instant +import nz.govt.eop.hilltop_crawler.HilltopCrawlerTestConfiguration +import nz.govt.eop.hilltop_crawler.db.DB +import org.junit.jupiter.api.BeforeEach +import org.junit.jupiter.api.Test +import org.mockito.kotlin.* +import org.springframework.beans.factory.annotation.Autowired +import org.springframework.boot.test.context.SpringBootTest +import org.springframework.context.annotation.Import +import org.springframework.http.HttpMethod +import org.springframework.jdbc.core.JdbcTemplate +import org.springframework.jdbc.support.GeneratedKeyHolder +import org.springframework.jdbc.support.KeyHolder +import org.springframework.test.context.ActiveProfiles +import org.springframework.test.web.client.MockRestServiceServer +import org.springframework.test.web.client.match.MockRestRequestMatchers.method +import org.springframework.test.web.client.match.MockRestRequestMatchers.requestTo +import org.springframework.test.web.client.response.MockRestResponseCreators.* +import org.springframework.transaction.annotation.Transactional +import org.springframework.web.client.RestTemplate + +@ActiveProfiles("test") +@SpringBootTest +@Import(HilltopCrawlerTestConfiguration::class) +@Transactional +class 
FetchTaskProcessorIntegrationTest( + @Autowired val underTest: FetchTaskProcessor, + @Autowired val restTemplate: RestTemplate, + @Autowired val jdbcTemplate: JdbcTemplate, + @Autowired val mockKafka: HilltopMessageClient +) { + + private final val mockServer = MockRestServiceServer.createServer(restTemplate) + + @BeforeEach + fun resetMocks() { + reset(mockKafka) + } + + @Test + fun `should return false when no work to do`() { + // GIVEN + // The DB table empty + + // WHEN + val result = underTest.runNextTask() + + // THEN + result shouldBe false + } + + @Test + fun `should process most recent in the queue first`() { + // GIVEN + val sourceId = createDefaultSource(jdbcTemplate) + + createFetchTask( + jdbcTemplate, sourceId, "SITES_LIST", "http://example.com/1", "2021-01-01 00:10:00") + createFetchTask( + jdbcTemplate, sourceId, "SITES_LIST", "http://example.com/2", "2021-01-01 00:00:00") + + val input = this.javaClass.getResource("/hilltop-xml/SitesResponse-empty.xml")!!.readText() + + mockServer + .expect(requestTo("http://example.com/2")) + .andExpect(method(HttpMethod.GET)) + .andRespond(withSuccess(input, null)) + + // WHEN + val result = underTest.runNextTask() + + // THEN + mockServer.verify() + + result shouldBe true + + val tasks = listTasksToProcess(jdbcTemplate) + + tasks shouldHaveSize 2 + + // Has updated the row we just fetched + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com/2") + it.previousDataHash shouldBe + "035676f636972e925c822016c1fd88fc179a7f2d6a8798144e186de833f12fc5" + it.nextFetchAt shouldBeAfter Instant.now() + } + + // Has created rows for the new sites + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com/1") + it.previousDataHash shouldBe null + it.nextFetchAt shouldBeBefore Instant.now() + } + } + + @Test + fun `should requeue a failed HTTP request for later`() { + // GIVEN + val sourceId = createDefaultSource(jdbcTemplate) + + createFetchTask( + jdbcTemplate, sourceId, "SITES_LIST", "http://example.com", "2021-01-01 00:00:00") + + mockServer + .expect(requestTo("http://example.com")) + .andExpect(method(HttpMethod.GET)) + .andRespond(withUnauthorizedRequest()) + + // WHEN + val result = underTest.runNextTask() + + // THEN + mockServer.verify() + + result shouldBe true + + val tasks = listTasksToProcess(jdbcTemplate) + tasks shouldHaveSize 1 + // Has updated the row we just fetched + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com") + it.previousDataHash shouldBe null + it.nextFetchAt shouldBeAfter Instant.now() + } + } + + @Test + fun `should requeue a hilltop error response for later`() { + // GIVEN + val sourceId = createDefaultSource(jdbcTemplate) + + createFetchTask( + jdbcTemplate, sourceId, "SITES_LIST", "http://example.com", "2021-01-01 00:00:00") + + val input = this.javaClass.getResource("/hilltop-xml/ErrorResponse.xml")!!.readText() + + mockServer + .expect(requestTo("http://example.com")) + .andExpect(method(HttpMethod.GET)) + .andRespond(withSuccess(input, null)) + + // WHEN + val result = underTest.runNextTask() + + // THEN + mockServer.verify() + + result shouldBe true + + val tasks = listTasksToProcess(jdbcTemplate) + tasks shouldHaveSize 1 + + // Has updated the row we just fetched + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com") + it.previousDataHash shouldBe null + it.nextFetchAt shouldBeAfter Instant.now() + } + } + + @Test + fun `should requeue a invalid XML for later`() { + // GIVEN + val sourceId = createDefaultSource(jdbcTemplate) + + createFetchTask( + jdbcTemplate, sourceId, 
"SITES_LIST", "http://example.com", "2021-01-01 00:00:00") + + mockServer + .expect(requestTo("http://example.com")) + .andExpect(method(HttpMethod.GET)) + .andRespond(withSuccess("Horizons", null)) + + // WHEN + val result = underTest.runNextTask() + + // THEN + mockServer.verify() + + result shouldBe true + + val tasks = listTasksToProcess(jdbcTemplate) + tasks shouldHaveSize 1 + + // Has updated the row we just fetched + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com") + it.previousDataHash shouldBe null + it.nextFetchAt shouldBeAfter Instant.now() + } + } + + @Test + fun `should requeue a un-parsable XML for later`() { + // GIVEN + val sourceId = createDefaultSource(jdbcTemplate) + + createFetchTask( + jdbcTemplate, sourceId, "SITES_LIST", "http://example.com", "2021-01-01 00:00:00") + + val xml = + """ + + Horizons + 2303.2.2.47 + NZMG + + + + + """ + .trimIndent() + + mockServer + .expect(requestTo("http://example.com")) + .andExpect(method(HttpMethod.GET)) + .andRespond(withSuccess(xml, null)) + + // WHEN + val result = underTest.runNextTask() + + // THEN + mockServer.verify() + + result shouldBe true + + val tasks = listTasksToProcess(jdbcTemplate) + tasks shouldHaveSize 1 + + // Has updated the row we just fetched + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com") + it.previousDataHash shouldBe null + it.nextFetchAt shouldBeAfter Instant.now() + } + } + + @Test + fun `should requeue and ignore content when response has same hash as previous message`() { + // GIVEN + val sourceId = createDefaultSource(jdbcTemplate) + + createFetchTask( + jdbcTemplate, + sourceId, + "SITES_LIST", + "http://example.com", + "2021-01-01 00:00:00", + "0fcafcc9533e521e53cad82226d44c832eca280e75dda23ffe5575b6563995c0") + + val input = this.javaClass.getResource("/hilltop-xml/SitesResponse-list.xml")!!.readText() + + mockServer + .expect(requestTo("http://example.com")) + .andExpect(method(HttpMethod.GET)) + .andRespond(withSuccess(input, null)) + + // WHEN + val result = underTest.runNextTask() + + // THEN + mockServer.verify() + + result shouldBe true + + val tasks = listTasksToProcess(jdbcTemplate) + tasks shouldHaveSize 1 // Didn't create any new tasks + + // Has updated the row we just fetched + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com") + it.requestType shouldBe HilltopMessageType.SITES_LIST + it.previousDataHash shouldBe + "0fcafcc9533e521e53cad82226d44c832eca280e75dda23ffe5575b6563995c0" + it.nextFetchAt shouldBeAfter Instant.now() + } + } + + @Test + fun `should correctly process a site list message`() { + // GIVEN + val sourceId = createDefaultSource(jdbcTemplate) + + createFetchTask( + jdbcTemplate, + sourceId, + "SITES_LIST", + "http://example.com", + "2021-01-01 00:00:00", + ) + + val input = this.javaClass.getResource("/hilltop-xml/SitesResponse-list.xml")!!.readText() + + mockServer + .expect(requestTo("http://example.com")) + .andExpect(method(HttpMethod.GET)) + .andRespond(withSuccess(input, null)) + + // WHEN + val result = underTest.runNextTask() + + // THEN + mockServer.verify() + + result shouldBe true + + val tasks = listTasksToProcess(jdbcTemplate) + tasks shouldHaveSize 3 + + // Has updated the row we just fetched + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com") + it.requestType shouldBe HilltopMessageType.SITES_LIST + it.previousDataHash shouldBe + "0fcafcc9533e521e53cad82226d44c832eca280e75dda23ffe5575b6563995c0" + it.nextFetchAt shouldBeAfter Instant.now() + } + + // Has created rows for the new sites + 
tasks.forExactly(2) { + it.requestType shouldBe HilltopMessageType.MEASUREMENTS_LIST + it.nextFetchAt shouldBeBefore Instant.now() + } + + argumentCaptor().apply { + verify(mockKafka).send(capture()) + + firstValue.councilId shouldBe 1 + firstValue.type shouldBe HilltopMessageType.SITES_LIST + firstValue.hilltopBaseUrl shouldBe "http://example.com" + } + } + + @Test + fun `should correctly process a measurement list message`() { + // GIVEN + val sourceId = createDefaultSource(jdbcTemplate) + + createFetchTask( + jdbcTemplate, + sourceId, + "MEASUREMENTS_LIST", + "http://example.com", + "2021-01-01 00:00:00", + ) + + val input = + this.javaClass.getResource("/hilltop-xml/MeasurementsResponse-list.xml")!!.readText() + + mockServer + .expect(requestTo("http://example.com")) + .andExpect(method(HttpMethod.GET)) + .andRespond(withSuccess(input, null)) + + // WHEN + val result = underTest.runNextTask() + + // THEN + mockServer.verify() + + result shouldBe true + + val tasks = listTasksToProcess(jdbcTemplate) + tasks shouldHaveSize 2 + + // Has updated the row we just fetched + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com") + it.requestType shouldBe HilltopMessageType.MEASUREMENTS_LIST + it.previousDataHash shouldBe + "e58323f66a24dfdc774756f608efba792b55deba4f8c1135c3aefee38f71f404" + it.nextFetchAt shouldBeAfter Instant.now() + } + + // Has created rows for the new sites + tasks.forOne { + it.requestType shouldBe HilltopMessageType.MEASUREMENT_DATA + it.nextFetchAt shouldBeBefore Instant.now() + it.fetchUri shouldBe + URI( + "http://example.com?Service=Hilltop&Request=GetData&Site=Ahuahu%20at%20Te%20Tuhi%20Junction&Measurement=Rainfall%20%5BRainfall%5D&from=2023-03-01T00:00&to=2023-03-31T23:59:59") + } + + argumentCaptor().apply { + verify(mockKafka).send(capture()) + + firstValue.councilId shouldBe 1 + firstValue.type shouldBe HilltopMessageType.MEASUREMENTS_LIST + firstValue.hilltopBaseUrl shouldBe "http://example.com" + firstValue.siteName shouldBe "Ahuahu at Te Tuhi Junction" + } + } + + @Test + fun `should correctly process a measurement values message`() { + // GIVEN + val sourceId = createDefaultSource(jdbcTemplate) + + createFetchTask( + jdbcTemplate, + sourceId, + "MEASUREMENT_DATA", + "http://example.com", + "2021-01-01 00:00:00", + ) + + val input = + this.javaClass.getResource("/hilltop-xml/MeasurementValuesResponse.xml")!!.readText() + + mockServer + .expect(requestTo("http://example.com")) + .andExpect(method(HttpMethod.GET)) + .andRespond(withSuccess(input, null)) + + // WHEN + val result = underTest.runNextTask() + + // THEN + mockServer.verify() + + result shouldBe true + + val tasks = listTasksToProcess(jdbcTemplate) + tasks shouldHaveSize 1 + + // Has updated the row we just fetched + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com") + it.requestType shouldBe HilltopMessageType.MEASUREMENT_DATA + it.previousDataHash shouldBe + "c1b8652916a235b608818d3ce3efa0dc29517ba115b1de1b6221a487b4e696bc" + it.nextFetchAt shouldBeAfter Instant.now() + } + + argumentCaptor().apply { + verify(mockKafka).send(capture()) + + firstValue.councilId shouldBe 1 + firstValue.type shouldBe HilltopMessageType.MEASUREMENT_DATA + firstValue.hilltopBaseUrl shouldBe "http://example.com" + } + } + + @Test + fun `should process a measurement values message and not send to kafka if no measurements`() { + // GIVEN + val sourceId = createDefaultSource(jdbcTemplate) + + createFetchTask( + jdbcTemplate, + sourceId, + "MEASUREMENT_DATA", + "http://example.com", + "2021-01-01 
00:00:00", + ) + + val input = + """ + + + GWRC + + """ + .trimIndent() + + mockServer + .expect(requestTo("http://example.com")) + .andExpect(method(HttpMethod.GET)) + .andRespond(withSuccess(input, null)) + + // WHEN + val result = underTest.runNextTask() + + // THEN + mockServer.verify() + + result shouldBe true + + val tasks = listTasksToProcess(jdbcTemplate) + tasks shouldHaveSize 1 + + // Has updated the row we just fetched + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com") + it.requestType shouldBe HilltopMessageType.MEASUREMENT_DATA + it.previousDataHash shouldBe + "1ddfbd7f9a3d44bca5a6ae05d56f2016eb4d6e4292625fa550102dea10861ce6" + it.nextFetchAt shouldBeAfter Instant.now() + } + + verifyNoInteractions(mockKafka) + } + + @Test + fun `should reschedule a task when an unknown error occurs`() { + // For this faking that the kafka component threw an exception and just making sure the task + // still was rescheduled for later + + // GIVEN + val sourceId = createDefaultSource(jdbcTemplate) + + createFetchTask( + jdbcTemplate, + sourceId, + "SITES_LIST", + "http://example.com", + "2021-01-01 00:00:00", + ) + + val input = this.javaClass.getResource("/hilltop-xml/SitesResponse-list.xml")!!.readText() + + mockServer + .expect(requestTo("http://example.com")) + .andExpect(method(HttpMethod.GET)) + .andRespond(withSuccess(input, null)) + + doThrow(RuntimeException("Something went wrong")).whenever(mockKafka).send(any()) + + // WHEN + val result = underTest.runNextTask() + + // THEN + mockServer.verify() + + result shouldBe true + + val tasks = listTasksToProcess(jdbcTemplate) + + // Has updated the row we just fetched + tasks.forOne { + it.fetchUri shouldBe URI("http://example.com") + it.requestType shouldBe HilltopMessageType.SITES_LIST + it.previousDataHash shouldBe null + it.nextFetchAt shouldBeAfter Instant.now() + } + } +} + +fun listTasksToProcess(template: JdbcTemplate): List = + template.query( + """ + SELECT * + FROM hilltop_fetch_tasks + ORDER BY next_fetch_at, id + """ + .trimIndent()) { rs, _ -> + DB.HilltopFetchTaskRow( + rs.getInt("id"), + rs.getInt("source_id"), + HilltopMessageType.valueOf(rs.getString("request_type")), + rs.getTimestamp("next_fetch_at").toInstant(), + URI(rs.getString("fetch_url")), + rs.getString("previous_data_hash")) + } + +fun createDefaultSource(template: JdbcTemplate): Int { + val keyHolder: KeyHolder = GeneratedKeyHolder() + + template.update( + { connection -> + connection.prepareStatement( + """INSERT INTO hilltop_sources (council_id, hts_url, configuration) VALUES (1, 'http://example.com', '{"measurementNames": ["Rainfall"]}') RETURNING id""", + arrayOf("id")) + }, + keyHolder) + + return keyHolder.keys?.get("id") as Int +} + +fun createFetchTask( + template: JdbcTemplate, + sourceId: Int, + requestType: String, + url: String, + nextFetchAt: String, + previousDataHash: String? 
= null +) = + template.update( + """INSERT INTO hilltop_fetch_tasks (source_id, request_type, fetch_url, next_fetch_at, previous_data_hash) VALUES (?, ?, ?, ?::TIMESTAMP, ?)""", + sourceId, + requestType, + url, + nextFetchAt, + previousDataHash) diff --git a/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/fetcher/MessagesParsingTest.kt b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/fetcher/MessagesParsingTest.kt new file mode 100644 index 00000000..ed3b470f --- /dev/null +++ b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/fetcher/MessagesParsingTest.kt @@ -0,0 +1,399 @@ +package nz.govt.eop.hilltop_crawler.fetcher + +import com.fasterxml.jackson.databind.ObjectMapper +import com.fasterxml.jackson.databind.SerializationFeature +import io.kotest.assertions.json.shouldEqualJson +import io.kotest.matchers.shouldBe +import java.time.Instant +import java.time.YearMonth +import org.junit.jupiter.api.Nested +import org.junit.jupiter.api.Test +import org.springframework.http.converter.json.Jackson2ObjectMapperBuilder +import org.springframework.test.context.ActiveProfiles + +@ActiveProfiles("test") +class MessagesParsingTest { + val objectMapper = + Jackson2ObjectMapperBuilder() + .featuresToDisable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS) + .build() + + @Nested + inner class HilltopMessageKeyParsing { + + @Test + fun `Should parse a HilltopSitesMessageKey from JSON`() { + // GIVEN + val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "SITES_LIST", + "at": "2023-09-11T21:10:17.594098Z" + } + """ + .trimIndent() + + // WHEN + val result = objectMapper.readValue(json, HilltopMessageKey::class.java) + + // THEN + result shouldBe + HilltopSitesMessageKey( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:10:17.594098Z")) + } + + @Test + fun `Should parse a HilltopMeasurementListMessageKey from JSON`() { + // GIVEN + val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "MEASUREMENTS_LIST", + "at": "2023-09-11T21:10:18.034236Z", + "siteName": "259250/2" + } + """ + .trimIndent() + + // WHEN + val result = objectMapper.readValue(json, HilltopMessageKey::class.java) + + // THEN + result shouldBe + HilltopMeasurementListMessageKey( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:10:18.034236Z"), + "259250/2") + } + + @Test + fun `Should parse a HilltopMeasurementsMessageKey from JSON`() { + // GIVEN + val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "MEASUREMENT_DATA", + "at": "2023-09-11T21:11:56.876507Z", + "siteName": "292019/18", + "measurementName": "Water Meter Volume", + "yearMonth": "2022-01" + } + """ + .trimIndent() + + // WHEN + val result = objectMapper.readValue(json, HilltopMessageKey::class.java) + + // THEN + result shouldBe + HilltopMeasurementsMessageKey( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:11:56.876507Z"), + "292019/18", + "Water Meter Volume", + YearMonth.of(2022, 1)) + } + } + + @Nested + inner class HilltopMessageKeySerialization { + + @Test + fun `Should write a HilltopSitesMessageKey to JSON`() { + // GIVEN + val message = + HilltopSitesMessageKey( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:10:17.594098Z")) + + // WHEN + val result = objectMapper.writeValueAsString(message) + + // THEN + 
val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "SITES_LIST", + "at": "2023-09-11T21:10:17.594098Z" + } + """ + .trimIndent() + + result shouldEqualJson json + } + + @Test + fun `Should write a HilltopMeasurementListMessageKey to JSON`() { + // GIVEN + val message = + HilltopMeasurementListMessageKey( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:10:18.034236Z"), + "259250/2") + + // WHEN + val result = objectMapper.writeValueAsString(message) + + // THEN + val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "MEASUREMENTS_LIST", + "at": "2023-09-11T21:10:18.034236Z", + "siteName": "259250/2" + } + """ + .trimIndent() + result shouldEqualJson json + } + + @Test + fun `Should write a HilltopMeasurementsMessageKey to JSON`() { + // GIVEN + val message = + HilltopMeasurementsMessageKey( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:11:56.876507Z"), + "292019/18", + "Water Meter Volume", + YearMonth.of(2022, 1)) + + // WHEN + val result = objectMapper.writeValueAsString(message) + // THEN + val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "MEASUREMENT_DATA", + "at": "2023-09-11T21:11:56.876507Z", + "siteName": "292019/18", + "measurementName": "Water Meter Volume", + "yearMonth": "2022-01" + } + """ + .trimIndent() + result shouldEqualJson json + } + } + + @Nested + inner class HilltopMessageParsing { + + @Test + fun `Should parse a HilltopSitesMessage from JSON`() { + // GIVEN + val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "SITES_LIST", + "at": "2023-09-11T21:10:17.594098Z", + "hilltopUrl": "https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=SiteList&Location=Yes", + "xml": "" + } + """ + .trimIndent() + + // WHEN + val result = objectMapper.readValue(json, HilltopMessage::class.java) + + // THEN + result shouldBe + HilltopSitesMessage( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:10:17.594098Z"), + "https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=SiteList&Location=Yes", + "") + } + + @Test + fun `Should parse a HilltopMeasurementListMessage from JSON`() { + // GIVEN + val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "MEASUREMENTS_LIST", + "at": "2023-09-11T21:10:19.965935Z", + "siteName": "292068/12", + "hilltopUrl": "https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=MeasurementList&Site=292068/12", + "xml": "" + } + """ + .trimIndent() + + // WHEN + val result = objectMapper.readValue(json, HilltopMessage::class.java) + + // THEN + result shouldBe + HilltopMeasurementListMessage( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:10:19.965935Z"), + "292068/12", + "https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=MeasurementList&Site=292068/12", + "") + } + + @Test + fun `Should parse a HilltopMeasurementsMessage from JSON`() { + // GIVEN + val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "MEASUREMENT_DATA", + "at": "2023-09-11T21:11:56.973761Z", + "siteName": "292019/18", + "measurementName": "Water Meter Volume", + "yearMonth": "2022-03", + "hilltopUrl": 
"https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=GetData&Site=292019/18&Measurement=Water%20Meter%20Volume&from=2022-03-01T00:00&to=2022-03-31T23:59:59", + "xml": "" + } + """ + .trimIndent() + + // WHEN + val result = objectMapper.readValue(json, HilltopMessage::class.java) + + // THEN + result shouldBe + HilltopMeasurementsMessage( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:11:56.973761Z"), + "292019/18", + "Water Meter Volume", + YearMonth.of(2022, 3), + "https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=GetData&Site=292019/18&Measurement=Water%20Meter%20Volume&from=2022-03-01T00:00&to=2022-03-31T23:59:59", + "") + } + } + + @Nested + inner class HilltopMessageSerialization { + + @Test + fun `Should write a HilltopSitesMessage to JSON`() { + // GIVEN + val message = + HilltopSitesMessage( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:10:17.594098Z"), + "https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=SiteList&Location=Yes", + "") + + // WHEN + val result = objectMapper.writeValueAsString(message) + + // THEN + val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "SITES_LIST", + "at": "2023-09-11T21:10:17.594098Z", + "hilltopUrl": "https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=SiteList&Location=Yes", + "xml": "" + } + """ + .trimIndent() + + result shouldEqualJson json + } + + @Test + fun `Should write a HilltopMeasurementListMessage to JSON`() { + // GIVEN + val message = + HilltopMeasurementListMessage( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:10:19.965935Z"), + "292068/12", + "https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=MeasurementList&Site=292068/12", + "") + // WHEN + val result = objectMapper.writeValueAsString(message) + + // THEN + val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "MEASUREMENTS_LIST", + "at": "2023-09-11T21:10:19.965935Z", + "siteName": "292068/12", + "hilltopUrl": "https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=MeasurementList&Site=292068/12", + "xml": "" + } + """ + .trimIndent() + + result shouldEqualJson json + } + + @Test + fun `Should write a HilltopMeasurementsMessage to JSON`() { + // GIVEN + val message = + HilltopMeasurementsMessage( + 9, + "https://hilltop.gw.govt.nz/WaterUse.hts", + Instant.parse("2023-09-11T21:11:56.973761Z"), + "292019/18", + "Water Meter Volume", + YearMonth.of(2022, 3), + "https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=GetData&Site=292019/18&Measurement=Water%20Meter%20Volume&from=2022-03-01T00:00&to=2022-03-31T23:59:59", + "") + + // WHEN + val result = objectMapper.writeValueAsString(message) + + // THEN + val json = + """ + { + "councilId": 9, + "hilltopBaseUrl": "https://hilltop.gw.govt.nz/WaterUse.hts", + "type": "MEASUREMENT_DATA", + "at": "2023-09-11T21:11:56.973761Z", + "siteName": "292019/18", + "measurementName": "Water Meter Volume", + "yearMonth": "2022-03", + "hilltopUrl": "https://hilltop.gw.govt.nz/WaterUse.hts?Service=Hilltop&Request=GetData&Site=292019/18&Measurement=Water%20Meter%20Volume&from=2022-03-01T00:00&to=2022-03-31T23:59:59", + "xml": "" + } + """ + .trimIndent() + result shouldEqualJson json + } + } +} diff --git a/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/fetcher/TaskMappersTests.kt 
b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/fetcher/TaskMappersTests.kt new file mode 100644 index 00000000..3c809d1a --- /dev/null +++ b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/fetcher/TaskMappersTests.kt @@ -0,0 +1,561 @@ +package nz.govt.eop.hilltop_crawler.fetcher + +import io.kotest.matchers.collections.shouldHaveSize +import io.kotest.matchers.date.shouldBeAfter +import io.kotest.matchers.date.shouldBeBefore +import io.kotest.matchers.shouldBe +import java.net.URI +import java.time.Instant +import java.time.YearMonth +import java.time.ZoneOffset +import nz.govt.eop.hilltop_crawler.api.parsers.* +import nz.govt.eop.hilltop_crawler.db.DB +import org.junit.jupiter.api.Nested +import org.junit.jupiter.api.Test +import org.springframework.test.context.ActiveProfiles + +@ActiveProfiles("test") +class TaskMappersTests { + + @Nested + inner class SitesListTaskMapperTests { + + private fun createRecordForTesting(sites: List = emptyList()) = + SitesListTaskMapper( + DB.HilltopSourcesRow( + 1, + 1, + "http://some.url", + DB.HilltopSourceConfig(emptyList(), listOf("some ignored site"))), + URI("http://some.uri?foo=bar"), + Instant.parse("2000-01-01T00:00:00Z"), + "some content", + HilltopSites("some agency", "some projection", sites)) + + @Test + fun `should return empty ist of tasks when there are no sites for determine next tasks`() { + // GIVEN + val underTest = createRecordForTesting() + + // WHEN + val result = underTest.buildNewTasksList() + + // THEN + result shouldHaveSize 0 + } + + @Test + fun `should return list of tasks mapped from sites for determine next tasks`() { + // GIVEN + val underTest = createRecordForTesting(listOf(HilltopSite("Some Site", null, null))) + + // WHEN + val result = underTest.buildNewTasksList() + + // THEN + result shouldBe + listOf( + DB.HilltopFetchTaskCreate( + 1, + HilltopMessageType.MEASUREMENTS_LIST, + "http://some.uri?Service=Hilltop&Request=MeasurementList&Site=Some%20Site", + )) + } + + @Test + fun `should ignore sites listed in config for determine next tasks`() { + // GIVEN + val underTest = createRecordForTesting(listOf(HilltopSite("some ignored site", null, null))) + + // WHEN + val result = underTest.buildNewTasksList() + + // THEN + result shouldHaveSize 0 + } + + @Test + fun `should return Kafka message`() { + + // GIVEN + val underTest = createRecordForTesting() + + // WHEN + val result = underTest.buildKafkaMessage() + + // THEN + result shouldBe + HilltopSitesMessage( + 1, + "http://some.uri", + Instant.parse("2000-01-01T00:00:00Z"), + "http://some.uri?foo=bar", + "some content", + ) + } + + @Test + fun `should return next fetch at in next month`() { + // GIVEN + val underTest = createRecordForTesting() + + // WHEN + val result = underTest.determineNextFetchAt() + + // THEN + result shouldBeAfter Instant.parse("2000-01-01T00:00:00Z") + result shouldBeBefore Instant.parse("2000-01-31T00:00:00Z") + } + } + + @Nested + inner class MeasurementsListTaskMapperTests { + + private fun createRecordForTesting() = + createRecordForTesting( + listOf( + HilltopDatasource( + "some datasource name", + "some site name", + "2000-01-01T00:00:00Z", + "2000-01-01T00:00:00Z", + "StdSeries", + listOf(HilltopMeasurement("some measurement name", "some request as", 1, 1))))) + + private fun createRecordForTesting(dataSources: List) = + MeasurementsListTaskMapper( + DB.HilltopSourcesRow( + 1, + 1, + "http://some.url", + DB.HilltopSourceConfig( + listOf("some datasource name", "another datasource name"), 
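+ // Assumed config shape: the first list is the measurement names to crawl, the second is site names to ignore + // (mirroring the SitesListTaskMapper fixture above).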
emptyList())), + URI("http://some.uri?foo=bar"), + Instant.parse("2000-01-01T00:00:00Z"), + "some content", + HilltopMeasurements(dataSources)) + + @Test + fun `should return list of tasks mapped from measurements for determine next tasks`() { + // GIVEN + val underTest = + createRecordForTesting( + listOf( + HilltopDatasource( + "some datasource name", + "some site name", + "2000-01-01T00:00:00Z", + "2000-01-01T00:00:00Z", + "StdSeries", + listOf( + HilltopMeasurement("some datasource name", "some datasource name", 1))))) + + // WHEN + val result = underTest.buildNewTasksList() + + // THEN + result shouldBe + listOf( + DB.HilltopFetchTaskCreate( + 1, + HilltopMessageType.MEASUREMENT_DATA, + "http://some.uri?Service=Hilltop&Request=GetData&Site=some%20site%20name&Measurement=some%20datasource%20name&from=2000-01-01T00:00&to=2000-01-31T23:59:59", + )) + } + + @Test + fun `should return list of tasks mapped from all measurements for determine next tasks`() { + // GIVEN + val underTest = + createRecordForTesting( + listOf( + HilltopDatasource( + "some datasource name", + "some site name", + "2000-01-01T00:00:00Z", + "2000-01-01T00:00:00Z", + "StdSeries", + listOf( + HilltopMeasurement("some datasource name", "some datasource name", 1))), + HilltopDatasource( + "another datasource name", + "some site name", + "2000-01-01T00:00:00Z", + "2000-01-01T00:00:00Z", + "StdSeries", + listOf( + HilltopMeasurement( + "another datasource name", "another datasource name", 1))), + )) + + // WHEN + val result = underTest.buildNewTasksList() + + // THEN + result shouldBe + listOf( + DB.HilltopFetchTaskCreate( + 1, + HilltopMessageType.MEASUREMENT_DATA, + "http://some.uri?Service=Hilltop&Request=GetData&Site=some%20site%20name&Measurement=some%20datasource%20name&from=2000-01-01T00:00&to=2000-01-31T23:59:59", + ), + DB.HilltopFetchTaskCreate( + 1, + HilltopMessageType.MEASUREMENT_DATA, + "http://some.uri?Service=Hilltop&Request=GetData&Site=some%20site%20name&Measurement=another%20datasource%20name&from=2000-01-01T00:00&to=2000-01-31T23:59:59", + )) + } + + @Test + fun `should return list of tasks mapped from measurements split by month for determine next tasks`() { + // GIVEN + val underTest = + createRecordForTesting( + listOf( + HilltopDatasource( + "some datasource name", + "some site name", + "2000-01-01T00:00:00Z", + "2000-06-01T00:00:00Z", + "StdSeries", + listOf( + HilltopMeasurement("some datasource name", "some datasource name", 1))), + )) + + // WHEN + val result = underTest.buildNewTasksList() + + // THEN + result shouldBe + listOf( + DB.HilltopFetchTaskCreate( + 1, + HilltopMessageType.MEASUREMENT_DATA, + "http://some.uri?Service=Hilltop&Request=GetData&Site=some%20site%20name&Measurement=some%20datasource%20name&from=2000-01-01T00:00&to=2000-01-31T23:59:59", + ), + DB.HilltopFetchTaskCreate( + 1, + HilltopMessageType.MEASUREMENT_DATA, + "http://some.uri?Service=Hilltop&Request=GetData&Site=some%20site%20name&Measurement=some%20datasource%20name&from=2000-02-01T00:00&to=2000-02-29T23:59:59", + ), + DB.HilltopFetchTaskCreate( + 1, + HilltopMessageType.MEASUREMENT_DATA, + "http://some.uri?Service=Hilltop&Request=GetData&Site=some%20site%20name&Measurement=some%20datasource%20name&from=2000-03-01T00:00&to=2000-03-31T23:59:59", + ), + DB.HilltopFetchTaskCreate( + 1, + HilltopMessageType.MEASUREMENT_DATA, + "http://some.uri?Service=Hilltop&Request=GetData&Site=some%20site%20name&Measurement=some%20datasource%20name&from=2000-04-01T00:00&to=2000-04-30T23:59:59", + ), + DB.HilltopFetchTaskCreate( + 1, + 
HilltopMessageType.MEASUREMENT_DATA, + "http://some.uri?Service=Hilltop&Request=GetData&Site=some%20site%20name&Measurement=some%20datasource%20name&from=2000-05-01T00:00&to=2000-05-31T23:59:59", + ), + DB.HilltopFetchTaskCreate( + 1, + HilltopMessageType.MEASUREMENT_DATA, + "http://some.uri?Service=Hilltop&Request=GetData&Site=some%20site%20name&Measurement=some%20datasource%20name&from=2000-06-01T00:00&to=2000-06-30T23:59:59", + ), + ) + } + + @Test + fun `should return list of tasks mapped from measurements for determine next tasks using the requestAs name in the URL`() { + // GIVEN + val underTest = + createRecordForTesting( + listOf( + HilltopDatasource( + "some datasource name", + "some site name", + "2000-01-01T00:00:00Z", + "2000-01-01T00:00:00Z", + "StdSeries", + listOf( + HilltopMeasurement("some datasource name", "check me out like this", 1))), + )) + + // WHEN + val result = underTest.buildNewTasksList() + + // THEN + result shouldBe + listOf( + DB.HilltopFetchTaskCreate( + 1, + HilltopMessageType.MEASUREMENT_DATA, + "http://some.uri?Service=Hilltop&Request=GetData&Site=some%20site%20name&Measurement=check%20me%20out%20like%20this&from=2000-01-01T00:00&to=2000-01-31T23:59:59", + ), + ) + } + + @Test + fun `should handle when message no Measurements which are not Virtual and have the same name as the datasource`() { + // GIVEN + val underTest = + createRecordForTesting( + listOf( + HilltopDatasource( + "some datasource name", + "some site name", + "2000-01-01T00:00:00Z", + "2000-01-01T00:00:00Z", + "StdSeries", + listOf( + HilltopMeasurement( + "some datasource name", "check me out like this", 1, 1))), + )) + + // WHEN + val result = underTest.buildNewTasksList() + + // THEN + result shouldHaveSize 0 + } + + @Test + fun `should ignore measurement names that are excluded via source config for determine next tasks`() { + // GIVEN + val underTest = + createRecordForTesting( + listOf( + HilltopDatasource( + "some ignored name", + "some site name", + "2000-01-01T00:00:00Z", + "2000-01-01T00:00:00Z", + "StdSeries", + listOf(HilltopMeasurement("some name", "some request as", 1, 1))))) + + // WHEN + val result = underTest.buildNewTasksList() + + // THEN + result shouldHaveSize 0 + } + + @Test + fun `should return Kafka message`() { + + // GIVEN + val underTest = createRecordForTesting() + + // WHEN + val result = underTest.buildKafkaMessage() + + // THEN + result shouldBe + HilltopMeasurementListMessage( + 1, + "http://some.uri", + Instant.parse("2000-01-01T00:00:00Z"), + "some site name", + "http://some.uri?foo=bar", + "some content", + ) + } + + @Test + fun `should return next fetch at in next month`() { + // GIVEN + val underTest = createRecordForTesting() + + // WHEN + val result = underTest.determineNextFetchAt() + + // THEN + result shouldBeAfter Instant.parse("2000-01-01T00:00:00Z") + result shouldBeBefore Instant.parse("2000-01-31T00:00:00Z") + } + } + + @Nested + inner class MeasurementDataTaskMapperTests { + + private fun createRecordForTesting( + fetchedAtString: String, + lastValueAtString: String + ): MeasurementDataTaskMapper { + + val fetchedAt = Instant.parse(fetchedAtString) + val timestampInPlus12Time = + Instant.parse(lastValueAtString) + .atOffset(ZoneOffset.of("+12")) + .toString() + .substring(0, 16) + + return MeasurementDataTaskMapper( + DB.HilltopSourcesRow( + 1, 1, "http://some.url", DB.HilltopSourceConfig(emptyList(), emptyList())), + URI("http://some.uri?foo=bar"), + fetchedAt, + "some content", + HilltopMeasurementValues( + Measurement( + "some site name", + 
DataSource("some measurement name"), + Data( + "some name", + listOf( + Value(timestampInPlus12Time, "1.0", null), + ))))) + } + + @Test + fun `should return empty list for determine next tasks`() { + // GIVEN + val underTest = createRecordForTesting("2000-01-01T00:00:00Z", "2000-01-01T00:00:00Z") + + // WHEN + val result = underTest.buildNewTasksList() + + // THEN + result shouldHaveSize 0 + } + + @Test + fun `should return null for Kafka message when there are no measurements`() { + // GIVEN + val fetchedAt = Instant.parse("2000-01-01T00:00:00Z") + + val underTest = + MeasurementDataTaskMapper( + DB.HilltopSourcesRow( + 1, 1, "http://some.url", DB.HilltopSourceConfig(emptyList(), emptyList())), + URI("http://some.uri"), + fetchedAt, + "some content", + HilltopMeasurementValues(null)) + + // WHEN + val result = underTest.buildKafkaMessage() + + // THEN + result shouldBe null + } + + @Test + fun `should return Kafka message when there are measurements`() { + // GIVEN + val underTest = createRecordForTesting("2000-01-01T00:20:00Z", "2000-01-01T00:00:00Z") + + // WHEN + val result = underTest.buildKafkaMessage() + + // THEN + result shouldBe + HilltopMeasurementsMessage( + 1, + "http://some.uri", + Instant.parse("2000-01-01T00:20:00Z"), + "some site name", + "some measurement name", + YearMonth.of(2000, 1), + "http://some.uri?foo=bar", + "some content", + ) + } + + @Test + fun `should return next fetch at in the next 30 minutes when last value was recent`() { + // GIVEN + val underTest = createRecordForTesting("2000-01-01T00:20:00Z", "2000-01-01T00:00:00Z") + + // WHEN + val result = underTest.determineNextFetchAt() + + // THEN + result shouldBeAfter Instant.parse("2000-01-01T00:20:00Z") + result shouldBeBefore Instant.parse("2000-01-01T00:50:00Z") + } + + @Test + fun `should return next fetch at least 15 minutes after last value`() { + // GIVEN + val underTest = createRecordForTesting("2000-01-01T00:05:00Z", "2000-01-01T00:00:00Z") + + // WHEN + val result = underTest.determineNextFetchAt() + + // THEN + result shouldBeAfter Instant.parse("2000-01-01T00:15:00Z") + result shouldBeBefore Instant.parse("2000-01-01T00:45:00Z") + } + + @Test + fun `should return next fetch at in the next hour when last value was within a day`() { + // GIVEN + val underTest = createRecordForTesting("2000-01-01T23:00:00Z", "2000-01-01T00:00:00Z") + + // WHEN + val result = underTest.determineNextFetchAt() + + // THEN + result shouldBeAfter Instant.parse("2000-01-01T23:00:00Z") + result shouldBeBefore Instant.parse("2000-01-02T00:00:00Z") + } + + @Test + fun `should return next fetch at in the next day when last value was within a week`() { + // GIVEN + val underTest = createRecordForTesting("2000-01-07T00:00:00Z", "2000-01-01T00:00:00Z") + + // WHEN + val result = underTest.determineNextFetchAt() + + // THEN + result shouldBeAfter Instant.parse("2000-01-07T00:00:00Z") + result shouldBeBefore Instant.parse("2000-01-08T00:00:00Z") + } + + @Test + fun `should return next fetch at in the next week when last value was within a month`() { + // GIVEN + val underTest = createRecordForTesting("2000-01-27T00:00:00Z", "2000-01-01T00:00:00Z") + + // WHEN + val result = underTest.determineNextFetchAt() + + // THEN + result shouldBeAfter Instant.parse("2000-01-27T00:00:00Z") + result shouldBeBefore Instant.parse("2000-02-03T00:00:00Z") + } + + @Test + fun `should return next fetch at in next month when the last measurement is a long time ago`() { + // GIVEN + val underTest = createRecordForTesting("2020-01-01T00:00:00Z", 
"2000-01-01T00:00:00Z") + + // WHEN + val result = underTest.determineNextFetchAt() + + // THEN + result shouldBeAfter Instant.parse("2020-01-01T00:00:00Z") + result shouldBeBefore Instant.parse("2020-01-31T00:00:00Z") + } + + @Test + fun `should return next fetch at in next month when there is no measurements`() { + // GIVEN + val fetchedAt = Instant.parse("2000-01-01T00:00:00Z") + + val underTest = + MeasurementDataTaskMapper( + DB.HilltopSourcesRow( + 1, 1, "http://some.url", DB.HilltopSourceConfig(emptyList(), emptyList())), + URI("http://some.uri"), + fetchedAt, + "some content", + HilltopMeasurementValues(null)) + + // WHEN + val result = underTest.determineNextFetchAt() + + // THEN + result shouldBeAfter Instant.parse("2000-01-01T00:00:00Z") + result shouldBeBefore Instant.parse("2000-01-31T00:00:00Z") + } + } +} diff --git a/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/producer/ObservationsProducerTest.kt b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/producer/ObservationsProducerTest.kt new file mode 100644 index 00000000..8e0c2810 --- /dev/null +++ b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/producer/ObservationsProducerTest.kt @@ -0,0 +1,140 @@ +package nz.govt.eop.hilltop_crawler.producer + +import com.fasterxml.jackson.databind.ObjectMapper +import io.kotest.matchers.collections.shouldContain +import io.kotest.matchers.collections.shouldHaveSize +import java.math.BigDecimal +import java.time.Instant +import java.time.OffsetDateTime +import java.time.YearMonth +import java.util.* +import nz.govt.eop.hilltop_crawler.HILLTOP_RAW_DATA_TOPIC_NAME +import nz.govt.eop.hilltop_crawler.HilltopCrawlerTestConfiguration +import nz.govt.eop.hilltop_crawler.OUTPUT_DATA_TOPIC_NAME +import nz.govt.eop.hilltop_crawler.api.parsers.HilltopXmlParsers +import nz.govt.eop.hilltop_crawler.fetcher.* +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.TestInputTopic +import org.apache.kafka.streams.TestOutputTopic +import org.apache.kafka.streams.TopologyTestDriver +import org.junit.jupiter.api.Test +import org.springframework.beans.factory.annotation.Autowired +import org.springframework.boot.test.context.SpringBootTest +import org.springframework.context.annotation.Import +import org.springframework.kafka.config.TopicBuilder +import org.springframework.kafka.support.serializer.JsonSerde +import org.springframework.test.context.ActiveProfiles + +@ActiveProfiles("test") +@SpringBootTest +@Import(HilltopCrawlerTestConfiguration::class) +class ObservationsProducerTest(@Autowired val objectMapper: ObjectMapper) { + + private final val inputTopic: TestInputTopic + private final val outputTopic: TestOutputTopic + + init { + val streamsBuilder = StreamsBuilder() + + ObservationProducer( + HilltopXmlParsers(), + objectMapper, + TopicBuilder.name(HILLTOP_RAW_DATA_TOPIC_NAME).build(), + TopicBuilder.name(OUTPUT_DATA_TOPIC_NAME).build()) + .buildPipeline(streamsBuilder) + val topology = streamsBuilder.build() + val driver = TopologyTestDriver(topology, Properties()) + inputTopic = + driver.createInputTopic( + HILLTOP_RAW_DATA_TOPIC_NAME, + JsonSerde(HilltopMessageKey::class.java).noTypeInfo().serializer(), + JsonSerde(HilltopMessage::class.java).noTypeInfo().serializer()) + + outputTopic = + driver.createOutputTopic( + OUTPUT_DATA_TOPIC_NAME, + JsonSerde(ObservationMessageKey::class.java).noTypeInfo().deserializer(), + JsonSerde(ObservationMessage::class.java).noTypeInfo().deserializer()) + } + + @Test + fun 
`should map sites list XML to a series of SiteDetailsMessage's`() { + + // GIVEN + val input = this.javaClass.getResource("/hilltop-xml/SitesResponse-list.xml")!!.readText() + + val message = + HilltopSitesMessage( + 555, "http://example.com", Instant.now(), "http://hilltop.example.com/some/path", input) + + // WHEN + inputTopic.pipeInput(message.toKey(), message) + + // THEN + val records = outputTopic.readRecordsToList() + records shouldHaveSize 2 + val recordValues = records.map { it.value() } + + recordValues shouldContain + SiteDetailsMessage(555, "Wrens Creek at Graham Road", Location(2764950, 6100940)) + recordValues shouldContain SiteDetailsMessage(555, "X Forest Rd Drain at Drop Structure", null) + } + + @Test + fun `should ignore a measurement list message`() { + // GIVEN + val input = + this.javaClass.getResource("/hilltop-xml/MeasurementsResponse-list.xml")!!.readText() + + val message = + HilltopMeasurementListMessage( + 555, + "http://hilltop.example.com", + Instant.now(), + "SOME_SITE_NAME", + "http://hilltop.example.com/some/path", + input) + + // WHEN + inputTopic.pipeInput(message.toKey(), message) + + // THEN + val records = outputTopic.readRecordsToList() + records shouldHaveSize 0 + } + + @Test + fun `should map a measurement message to an ObservationDataMessage`() { + // GIVEN + val input = + this.javaClass.getResource("/hilltop-xml/MeasurementValuesResponse.xml")!!.readText() + + val message = + HilltopMeasurementsMessage( + 555, + "http://example.com", + Instant.now(), + "SOME_SITE_NAME", + "SOME_MEASUREMENT_NAME", + YearMonth.of(2023, 8), + "http://hilltop.example.com/some/path", + input) + + // WHEN + inputTopic.pipeInput(message.toKey(), message) + + // THEN + val records = outputTopic.readRecordsToList() + records shouldHaveSize 1 + + val recordValues = records.map { it.value() } + recordValues shouldContain + ObservationDataMessage( + 555, + "R26/6804", + "Water Meter Volume", + listOf( + Observation(OffsetDateTime.parse("2023-08-29T12:00Z"), BigDecimal.valueOf(0)), + Observation(OffsetDateTime.parse("2023-08-30T12:00Z"), BigDecimal.valueOf(100)))) + } +} diff --git a/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopMeasurementListParserTest.kt b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopMeasurementListParserTest.kt deleted file mode 100644 index c4ad61a8..00000000 --- a/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopMeasurementListParserTest.kt +++ /dev/null @@ -1,18 +0,0 @@ -package nz.govt.eop.hilltop_crawler.support - -import org.junit.jupiter.api.Test - -class HilltopMeasurementListParserTest { - - @Test - fun `should parse a site locations XML content`() { - // GIVEN - val input = this.javaClass.getResource("/hilltop-xml/measurementlist.xml")!!.readText() - - // WHEN - HilltopXmlParsers.parseMeasurementNames(input) - - // THEN - // result.size shouldBe 10 - } -} diff --git a/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopSitesParserTest.kt b/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopSitesParserTest.kt deleted file mode 100644 index 8d2810fe..00000000 --- a/packages/HilltopCrawler/src/test/kotlin/nz/govt/eop/hilltop_crawler/support/HilltopSitesParserTest.kt +++ /dev/null @@ -1,62 +0,0 @@ -package nz.govt.eop.hilltop_crawler.support - -import io.kotest.matchers.shouldBe -import org.junit.jupiter.api.Test - -class HilltopSitesParserTest { - - @Test - fun `should parse a 
site locations XML content`() { - // GIVEN - val input = this.javaClass.getResource("/hilltop-xml/sitelist.xml")!!.readText() - - // WHEN - val result = HilltopSitesParser.parseSites(input) - - // THEN - result.agency shouldBe "Horizons" - result.projection shouldBe "NZMG" - result.sites shouldBe - arrayListOf( - HilltopSiteXml("Wrens Creek at Graham Road", 2764950, 6100940), - HilltopSiteXml("X Forest Rd Drain at Drop Structure", null, null)) - - result.validSites() shouldBe - arrayListOf(HilltopSiteXml("Wrens Creek at Graham Road", 2764950, 6100940)) - } - - @Test - fun `should parse a site locations XML content when there are no sites`() { - // GIVEN - val input = this.javaClass.getResource("/hilltop-xml/sitelist-empty.xml")!!.readText() - - // WHEN - val result = HilltopSitesParser.parseSites(input) - - // THEN - result.agency shouldBe "Horizons" - result.projection shouldBe "NZMG" - result.sites shouldBe arrayListOf() - result.validSites() shouldBe arrayListOf() - } - - @Test - fun `should parse a site locations XML content when there is no projection`() { - // GIVEN - val input = this.javaClass.getResource("/hilltop-xml/sitelist-no-projection.xml")!!.readText() - - // WHEN - val result = HilltopSitesParser.parseSites(input) - - // THEN - result.agency shouldBe "Horizons" - result.projection shouldBe null - result.sites shouldBe - arrayListOf( - HilltopSiteXml("Wrens Creek at Graham Road", 2764950, 6100940), - HilltopSiteXml("X Forest Rd Drain at Drop Structure", null, null)) - - result.validSites() shouldBe - arrayListOf(HilltopSiteXml("Wrens Creek at Graham Road", 2764950, 6100940)) - } -} diff --git a/packages/HilltopCrawler/src/test/resources/application-test.yml b/packages/HilltopCrawler/src/test/resources/application-test.yml new file mode 100644 index 00000000..66146740 --- /dev/null +++ b/packages/HilltopCrawler/src/test/resources/application-test.yml @@ -0,0 +1,16 @@ +spring: + datasource: + url: jdbc:postgresql://${CONFIG_DATABASE_HOST:localhost}:${CONFIG_DATABASE_PORT:5432}/eop_test + username: postgres + password: password + + flyway: + schemas: hilltop_crawler + locations: classpath:/db/migration + user: postgres + password: password + +logging: + level: + root: WARN + nz.govt.eop: INFO diff --git a/packages/HilltopCrawler/src/test/resources/graphql-documents/observation.graphql b/packages/HilltopCrawler/src/test/resources/graphql-documents/observation.graphql deleted file mode 100644 index 23b11924..00000000 --- a/packages/HilltopCrawler/src/test/resources/graphql-documents/observation.graphql +++ /dev/null @@ -1,3 +0,0 @@ -mutation($request: CreateObservationRequest!) 
{ - createObservation(request: $request) -} \ No newline at end of file diff --git a/packages/HilltopCrawler/src/test/resources/hilltop-xml/ErrorResponse.xml b/packages/HilltopCrawler/src/test/resources/hilltop-xml/ErrorResponse.xml new file mode 100644 index 00000000..6bb0a5e0 --- /dev/null +++ b/packages/HilltopCrawler/src/test/resources/hilltop-xml/ErrorResponse.xml @@ -0,0 +1,4 @@ + + + No Data Foo + diff --git a/packages/HilltopCrawler/src/test/resources/hilltop-xml/MeasurementValuesResponse.xml b/packages/HilltopCrawler/src/test/resources/hilltop-xml/MeasurementValuesResponse.xml new file mode 100644 index 00000000..d3e55025 --- /dev/null +++ b/packages/HilltopCrawler/src/test/resources/hilltop-xml/MeasurementValuesResponse.xml @@ -0,0 +1,27 @@ + + + GWRC + + + StdSeries + SimpleTimeSeries + Incremental + + Water Meter Volume + F + m3 + ######.# + + + + + 2023-08-30T00:00:00 + 0 + + + 2023-08-31T00:00:00 + 100 + + + + diff --git a/packages/HilltopCrawler/src/test/resources/hilltop-xml/measurementlist.xml b/packages/HilltopCrawler/src/test/resources/hilltop-xml/MeasurementsResponse-list.xml similarity index 99% rename from packages/HilltopCrawler/src/test/resources/hilltop-xml/measurementlist.xml rename to packages/HilltopCrawler/src/test/resources/hilltop-xml/MeasurementsResponse-list.xml index 85817a1e..940e8c81 100644 --- a/packages/HilltopCrawler/src/test/resources/hilltop-xml/measurementlist.xml +++ b/packages/HilltopCrawler/src/test/resources/hilltop-xml/MeasurementsResponse-list.xml @@ -7,7 +7,7 @@ SimpleTimeSeries Incremental 0 - 2018-02-13T17:48:00 + 2023-03-13T17:48:00 2023-03-07T09:42:00 1 @@ -611,4 +611,4 @@ ### - \ No newline at end of file + diff --git a/packages/HilltopCrawler/src/test/resources/hilltop-xml/sitelist-empty.xml b/packages/HilltopCrawler/src/test/resources/hilltop-xml/SitesResponse-empty.xml similarity index 100% rename from packages/HilltopCrawler/src/test/resources/hilltop-xml/sitelist-empty.xml rename to packages/HilltopCrawler/src/test/resources/hilltop-xml/SitesResponse-empty.xml diff --git a/packages/HilltopCrawler/src/test/resources/hilltop-xml/sitelist.xml b/packages/HilltopCrawler/src/test/resources/hilltop-xml/SitesResponse-list.xml similarity index 100% rename from packages/HilltopCrawler/src/test/resources/hilltop-xml/sitelist.xml rename to packages/HilltopCrawler/src/test/resources/hilltop-xml/SitesResponse-list.xml diff --git a/packages/HilltopCrawler/src/test/resources/hilltop-xml/sitelist-no-projection.xml b/packages/HilltopCrawler/src/test/resources/hilltop-xml/SitesResponse-no-projection.xml similarity index 100% rename from packages/HilltopCrawler/src/test/resources/hilltop-xml/sitelist-no-projection.xml rename to packages/HilltopCrawler/src/test/resources/hilltop-xml/SitesResponse-no-projection.xml diff --git a/packages/HilltopCrawler/src/test/resources/logback-test.xml b/packages/HilltopCrawler/src/test/resources/logback-test.xml index 0281f0db..fb7583ef 100644 --- a/packages/HilltopCrawler/src/test/resources/logback-test.xml +++ b/packages/HilltopCrawler/src/test/resources/logback-test.xml @@ -1,6 +1,5 @@ - @@ -17,12 +16,10 @@ - - - + - \ No newline at end of file + diff --git a/packages/Manager/src/main/kotlin/nz/govt/eop/plan_limits/Manifest.kt b/packages/Manager/src/main/kotlin/nz/govt/eop/plan_limits/Manifest.kt index f241b8f4..88049b00 100644 --- a/packages/Manager/src/main/kotlin/nz/govt/eop/plan_limits/Manifest.kt +++ b/packages/Manager/src/main/kotlin/nz/govt/eop/plan_limits/Manifest.kt @@ -23,6 +23,7 @@ class Manifest(val 
queries: Queries, val context: DSLContext) { // for individual queries cause errors update(9) } + @CachePut(cacheNames = [MANIFEST_CACHE_KEY]) fun update(councilId: Int): Map { return generate(councilId) diff --git a/packages/Manager/src/main/resources/db/migration/R__effective_daily_consents.sql b/packages/Manager/src/main/resources/db/migration/R__effective_daily_consents.sql new file mode 100644 index 00000000..9c135c42 --- /dev/null +++ b/packages/Manager/src/main/resources/db/migration/R__effective_daily_consents.sql @@ -0,0 +1,30 @@ +create or replace view effective_daily_consents as + +with all_consents as ( + select distinct source_id from water_allocations +), + +days_in_last_year as ( + select GENERATE_SERIES(DATE_TRUNC('day', now()) - INTERVAL '1 YEAR', DATE_TRUNC('day', now()) - INTERVAL '1 DAY', INTERVAL '1 DAY') as effective_on +), +data_per_day as ( + select * from all_consents cross join days_in_last_year +), + +effective_daily_data as ( + select dpd.source_id, dpd.effective_on, wa.area_id, wa.allocation, wa.consent_id, wa.status, wa.is_metered, wa.metered_allocation_daily, wa.metered_allocation_yearly, wa.meters + from data_per_day dpd + left join lateral + ( + select * from + water_allocations wai + where wai.source_id = dpd.source_id + and date(dpd.effective_on) >= date(wai.effective_from) + and (wai.effective_to is null or date(dpd.effective_on) < date(wai.effective_to)) + order by wai.effective_from desc + limit 1 + ) wa + on wa.source_id = dpd.source_id +) + +select * from effective_daily_data diff --git a/packages/Manager/src/main/resources/db/migration/R__water_allocation_and_usage_by_area.sql b/packages/Manager/src/main/resources/db/migration/R__water_allocation_and_usage_by_area.sql new file mode 100644 index 00000000..47100cbd --- /dev/null +++ b/packages/Manager/src/main/resources/db/migration/R__water_allocation_and_usage_by_area.sql @@ -0,0 +1,76 @@ +create or replace view water_allocation_and_usage_by_area as + +with all_days as ( + select distinct date(effective_on) as effective_on from effective_daily_consents +), +all_areas as ( + select distinct + area_id, + 0 as allocation, + 0 as metered_allocation_daily, + 0 as metered_allocation_yearly, + false as is_metered, + '{}'::varchar[] as meters, + 'inactive' as status + from effective_daily_consents + where area_id is not null +), +data_per_day as ( + select * from all_days cross join all_areas +), +effective_consents_with_defaults as ( + select * from data_per_day + union + select + date(effective_on), + area_id, + allocation, + metered_allocation_daily, + metered_allocation_yearly, + is_metered, + meters, + status + from effective_daily_consents + where area_id is not null and status = 'active' +), +observed_water_use_with_sites as ( + select owu.*, os.name as site_name + from observed_water_use_aggregated_daily owu + inner join observation_sites os on os.id = owu.site_id +), +expanded_meters_per_area as + (select effective_on, area_id, UNNEST(meters) as meter from effective_consents_with_defaults where is_metered = true), +meter_use_by_area as (select effective_on, + area_id, + meter, + use.daily_usage + from expanded_meters_per_area + left join observed_water_use_with_sites use on effective_on = day_observed_at and meter = site_name), +total_daily_use_by_area as (select area_id, + effective_on as date, + SUM(daily_usage) as daily_usage + from meter_use_by_area + group by area_id, effective_on), +total_daily_allocation_by_area as ( + select + area_id, + effective_on as date, + SUM(allocation) as 
allocation, + SUM(metered_allocation_daily) as allocation_daily, + SUM(metered_allocation_yearly) as metered_allocation_yearly + from effective_consents_with_defaults group by 1, 2 + order by date +), +allocated_joined_with_use AS ( + select + area_id, + date, + allocation, + allocation_daily, + metered_allocation_yearly, + coalesce(daily_usage, 0) as daily_usage + from total_daily_allocation_by_area + left join total_daily_use_by_area using (area_id, date) +) +SELECT * +FROM allocated_joined_with_use \ No newline at end of file diff --git a/packages/Manager/src/test/kotlin/nz/govt/eop/consumers/WaterAllocationConsumerTest.kt b/packages/Manager/src/test/kotlin/nz/govt/eop/consumers/WaterAllocationConsumerTest.kt index ceb42878..862bd732 100644 --- a/packages/Manager/src/test/kotlin/nz/govt/eop/consumers/WaterAllocationConsumerTest.kt +++ b/packages/Manager/src/test/kotlin/nz/govt/eop/consumers/WaterAllocationConsumerTest.kt @@ -97,6 +97,7 @@ class WaterAllocationConsumerTest(@Autowired val context: DSLContext) { "firstIngestId", Instant.now()) } + fun fetchWaterAllocations(sourceId: String): Result { return context .selectFrom(WATER_ALLOCATIONS) diff --git a/packages/Manager/src/test/kotlin/nz/govt/eop/plan_limits/WaterAllocationAndUsageViewsTest.kt b/packages/Manager/src/test/kotlin/nz/govt/eop/plan_limits/WaterAllocationAndUsageViewsTest.kt new file mode 100644 index 00000000..8197ab21 --- /dev/null +++ b/packages/Manager/src/test/kotlin/nz/govt/eop/plan_limits/WaterAllocationAndUsageViewsTest.kt @@ -0,0 +1,428 @@ +package nz.govt.eop.plan_limits + +import io.kotest.inspectors.forAll +import io.kotest.matchers.collections.shouldContain +import io.kotest.matchers.shouldBe +import java.math.BigDecimal +import java.time.Instant +import java.time.LocalDate +import java.time.LocalDateTime +import java.time.ZoneOffset +import nz.govt.eop.messages.ConsentStatus +import org.junit.jupiter.api.Test +import org.springframework.beans.factory.annotation.Autowired +import org.springframework.boot.test.context.SpringBootTest +import org.springframework.jdbc.core.DataClassRowMapper +import org.springframework.jdbc.core.JdbcTemplate +import org.springframework.jdbc.support.GeneratedKeyHolder +import org.springframework.jdbc.support.KeyHolder +import org.springframework.test.context.ActiveProfiles +import org.springframework.test.web.servlet.request.MockMvcRequestBuilders.* +import org.springframework.test.web.servlet.result.MockMvcResultMatchers.* +import org.springframework.transaction.annotation.Transactional + +data class WaterAllocationUsageRow( + val areaId: String, + val date: LocalDate, + val allocation: BigDecimal, + val allocationDaily: BigDecimal, + val meteredAllocationYearly: BigDecimal, + val dailyUsage: BigDecimal, +) + +data class AllocationRow( + val sourceId: String, + val consentId: String, + val status: ConsentStatus, + val areaId: String, + val allocation: BigDecimal, + val isMetered: Boolean, + val meteredAllocationDaily: BigDecimal, + val meteredAllocationYearly: BigDecimal, + val meters: List, + val ingestId: String, + val effectiveFrom: Instant, + val effectiveTo: Instant? 
+) + +@ActiveProfiles("test") +@SpringBootTest +@Transactional +class WaterAllocationAndUsageViewsTest(@Autowired val jdbcTemplate: JdbcTemplate) { + + val testSiteId = 1 + val testAreaId = "test-area-id" + val testEffectiveFrom: LocalDateTime = LocalDate.now().atStartOfDay().minusDays(100) + val testAllocation = + AllocationRow( + "source-id", + "consent-id", + ConsentStatus.active, + testAreaId, + BigDecimal(100), + true, + BigDecimal(10), + BigDecimal(10), + listOf(testSiteId.toString()), + "ingest-id", + testEffectiveFrom.toInstant(ZoneOffset.UTC), + null) + + @Test + fun `should be empty with no allocations`() { + // GIVEN + // WHEN + val result = + jdbcTemplate.queryForObject( + "select count(*) from water_allocation_and_usage_by_area", Int::class.java) + + // THEN + result shouldBe 0 + } + + @Test + fun `should include a year of data for an area`() { + // GIVEN + createTestAllocation(testAllocation) + + // WHEN + val result = + jdbcTemplate.queryForMap( + "select count(*) as count, min(date) as min_date, max(date) as max_date from water_allocation_and_usage_by_area where area_id = '${testAllocation.areaId}'") + + // THEN + arrayOf(365L, 366L) shouldContain result["count"] as Long + val min = LocalDate.parse(result["min_date"].toString()) + val max = LocalDate.parse(result["max_date"].toString()) + min.plusYears(1).plusDays(-1) shouldBe max + } + + @Test + fun `should use default allocation data before the effective date`() { + // GIVEN + createTestAllocation(testAllocation) + + // WHEN + val dateFilter = testEffectiveFrom.toString() + val results = + queryAllocationsAndUsage( + "where area_id = '${testAllocation.areaId}' and date < '$dateFilter'") + + // THEN + checkResults(results, testAreaId, BigDecimal(0), BigDecimal(0), BigDecimal(0)) + } + + @Test + fun `should aggregate allocation data from the effective date`() { + // GIVEN + createTestAllocation(testAllocation) + + // WHEN + val dateFilter = testEffectiveFrom.toString() + val results = + queryAllocationsAndUsage( + "where area_id = '${testAllocation.areaId}' and date >= '$dateFilter'") + + // THEN + checkResults( + results, + testAllocation.areaId, + testAllocation.allocation, + testAllocation.meteredAllocationDaily, + testAllocation.meteredAllocationYearly) + } + + @Test + fun `should handle an allocation being effective before the earliest time period`() { + // GIVEN + val dateOlderThanAYear = LocalDate.now().atStartOfDay().minusYears(2).toInstant(ZoneOffset.UTC) + val oldAllocation = testAllocation.copy(effectiveFrom = dateOlderThanAYear) + createTestAllocation(oldAllocation) + + // WHEN + val results = + queryAllocationsAndUsage( + "where area_id = '${oldAllocation.areaId}' and date >= '$dateOlderThanAYear'") + + // THEN + checkResults( + results, + oldAllocation.areaId, + oldAllocation.allocation, + oldAllocation.meteredAllocationDaily, + oldAllocation.meteredAllocationYearly) + } + + @Test + fun `should aggregate observation data`() { + // GIVEN + val observationDate = LocalDate.now().atStartOfDay().minusDays(10) + createTestAllocation(testAllocation) + createTestObservation(testSiteId, 10, observationDate.toInstant(ZoneOffset.UTC)) + + // WHEN + val whereClause = "where area_id = '${testAllocation.areaId}' and date = '${observationDate}'" + + // THEN + val result = queryAllocationsAndUsage(whereClause) + result.size shouldBe 1 + result[0].dailyUsage.compareTo(BigDecimal(864)) shouldBe 0 + + // GIVEN + createTestObservation(testSiteId, 5, observationDate.plusHours(1).toInstant(ZoneOffset.UTC)) + + // WHEN + val 
secondResult = queryAllocationsAndUsage(whereClause) + + // THEN + secondResult[0].dailyUsage.compareTo(BigDecimal(648)) shouldBe 0 + } + + @Test + fun `should handle changes to allocation data`() { + // GIVEN + val allocationUpdatedAt = LocalDate.now().atStartOfDay().minusDays(10) + val initialAllocation = + testAllocation.copy(effectiveTo = allocationUpdatedAt.toInstant(ZoneOffset.UTC)) + createTestAllocation(initialAllocation) + val updateAllocation = + testAllocation.copy( + allocation = BigDecimal(200), + meteredAllocationDaily = BigDecimal(20), + meteredAllocationYearly = BigDecimal(20), + meters = listOf(), + effectiveFrom = allocationUpdatedAt.toInstant(ZoneOffset.UTC)) + createTestAllocation(updateAllocation) + createTestObservation(testSiteId, 10, allocationUpdatedAt.toInstant(ZoneOffset.UTC)) + + // WHEN + val resultBeforeUpdate = + queryAllocationsAndUsage( + "where area_id = '${initialAllocation.areaId}' and date >= '${initialAllocation.effectiveFrom}' and date < '${allocationUpdatedAt}'") + + // THEN + checkResults( + resultBeforeUpdate, + initialAllocation.areaId, + initialAllocation.allocation, + initialAllocation.meteredAllocationDaily, + initialAllocation.meteredAllocationYearly, + BigDecimal(0)) + + // WHEN + val resultAfterUpdate = + queryAllocationsAndUsage( + "where area_id = '${updateAllocation.areaId}' and date >= '${allocationUpdatedAt}'") + + // THEN + checkResults( + resultAfterUpdate, + updateAllocation.areaId, + updateAllocation.allocation, + updateAllocation.meteredAllocationDaily, + updateAllocation.meteredAllocationYearly, + BigDecimal(0)) + } + + @Test + fun `should handle changes to allocation data in the same day`() { + // GIVEN + val firstAllocationUpdatedAt = LocalDate.now().atStartOfDay().minusDays(10) + val secondAllocationUpdatedAt = firstAllocationUpdatedAt.plusHours(2) + createTestAllocation( + testAllocation.copy(effectiveTo = firstAllocationUpdatedAt.toInstant(ZoneOffset.UTC))) + createTestAllocation( + testAllocation.copy( + allocation = BigDecimal(20), + effectiveFrom = firstAllocationUpdatedAt.toInstant(ZoneOffset.UTC), + effectiveTo = secondAllocationUpdatedAt.toInstant(ZoneOffset.UTC))) + createTestAllocation( + testAllocation.copy( + allocation = BigDecimal(30), + effectiveFrom = secondAllocationUpdatedAt.toInstant(ZoneOffset.UTC))) + + // WHEN + val results = + queryAllocationsAndUsage( + "where area_id = '${testAllocation.areaId}' and date = '${secondAllocationUpdatedAt}'") + + // THEN + checkResults(results, testAreaId, allocation = BigDecimal(30)) + } + + @Test + fun `should not include allocation data when a consent status is not active`() { + // GIVEN + val allocationUpdatedAt = LocalDate.now().atStartOfDay().minusDays(10) + val initialAllocation = + testAllocation.copy(effectiveTo = allocationUpdatedAt.toInstant(ZoneOffset.UTC)) + createTestAllocation(initialAllocation) + val updatedAllocation = + testAllocation.copy( + effectiveFrom = allocationUpdatedAt.toInstant(ZoneOffset.UTC), + status = ConsentStatus.inactive) + createTestAllocation(updatedAllocation) + + // WHEN + val results = + queryAllocationsAndUsage( + "where area_id = '${updatedAllocation.areaId}' and date >= '${allocationUpdatedAt}'") + + // THEN + checkResults(results, updatedAllocation.areaId, BigDecimal(0), BigDecimal(0), BigDecimal(0)) + } + + @Test + fun `should not include an allocations observations when is_metered is false`() { + // GIVEN + val allocationWithIsMeteredFalse = testAllocation.copy(isMetered = false) + 
createTestAllocation(allocationWithIsMeteredFalse) + val observationDate = LocalDate.now().atStartOfDay().minusDays(10) + createTestObservation(testSiteId, 10, observationDate.toInstant(ZoneOffset.UTC)) + + // WHEN + val result = + queryAllocationsAndUsage( + "where area_id = '${testAllocation.areaId}' and date = '${observationDate}'") + + // THEN + result[0].dailyUsage.compareTo(BigDecimal(0)) shouldBe 0 + } + + @Test + fun `should aggregate allocation data for different areas separately`() { + // GIVEN + createTestAllocation(testAllocation) + val secondAllocationInSameArea = + testAllocation.copy(sourceId = "another-source-same-area", meters = listOf("2", "3")) + createTestAllocation(secondAllocationInSameArea) + + val allocationInDifferentArea = + testAllocation.copy( + areaId = "different-area-id", + sourceId = "another-source-different-area", + meters = listOf("4")) + createTestAllocation(allocationInDifferentArea) + + val observationDate = LocalDate.now().atStartOfDay().minusDays(10) + createTestObservation(testSiteId, 10, observationDate.toInstant(ZoneOffset.UTC)) + createTestObservation( + secondAllocationInSameArea.meters[0].toInt(), 5, observationDate.toInstant(ZoneOffset.UTC)) + createTestObservation( + secondAllocationInSameArea.meters[1].toInt(), 5, observationDate.toInstant(ZoneOffset.UTC)) + createTestObservation( + allocationInDifferentArea.meters.first().toInt(), + 30, + observationDate.toInstant(ZoneOffset.UTC)) + + // WHEN + val results = + queryAllocationsAndUsage( + "where area_id = '${testAllocation.areaId}' and date = '$observationDate'") + + // THEN + results.size shouldBe 1 + checkResults( + results, + testAllocation.areaId, + testAllocation.allocation + secondAllocationInSameArea.allocation, + testAllocation.meteredAllocationDaily + secondAllocationInSameArea.meteredAllocationDaily, + testAllocation.meteredAllocationYearly + secondAllocationInSameArea.meteredAllocationYearly, + BigDecimal(1728)) + + // WHEN + val resultsInADifferentArea = + queryAllocationsAndUsage( + "where area_id = '${allocationInDifferentArea.areaId}' and date = '$observationDate'") + + // THEN + resultsInADifferentArea[0].dailyUsage.compareTo(BigDecimal(2592)) shouldBe 0 + } + + fun queryAllocationsAndUsage(whereClause: String): MutableList = + jdbcTemplate.query( + """select * from water_allocation_and_usage_by_area $whereClause""", + DataClassRowMapper.newInstance(WaterAllocationUsageRow::class.java)) + + fun checkResults( + results: List, + areaId: String? = null, + allocation: BigDecimal? = null, + meteredAllocationDaily: BigDecimal? = null, + meteredAllocationYearly: BigDecimal? = null, + dailyUsage: BigDecimal? 
= null + ) { + results.forAll { + if (areaId != null) it.areaId shouldBe areaId + if (allocation != null) it.allocation.compareTo(allocation) shouldBe 0 + if (meteredAllocationDaily != null) + it.allocationDaily.compareTo(meteredAllocationDaily) shouldBe 0 + if (meteredAllocationYearly != null) + it.meteredAllocationYearly.compareTo(meteredAllocationYearly) shouldBe 0 + if (dailyUsage != null) it.dailyUsage.compareTo(dailyUsage) shouldBe 0 + } + } + + fun createTestAllocation(allocation: AllocationRow) { + val effectiveTo = if (allocation.effectiveTo != null) "'${allocation.effectiveTo}'" else null + val meters = allocation.meters.joinToString(",") + + jdbcTemplate.update( + """ + INSERT INTO water_allocations (area_id, allocation, ingest_id, source_id, consent_id, status, is_metered, metered_allocation_daily, metered_allocation_yearly, meters, effective_from, effective_to, created_at, updated_at) + VALUES ( + '${allocation.areaId}', + '${allocation.allocation}', + '${allocation.ingestId}', + '${allocation.sourceId}', + '${allocation.consentId}', + '${allocation.status}', + '${allocation.isMetered}', + '${allocation.meteredAllocationDaily}', + '${allocation.meteredAllocationYearly}', + '{$meters}', + '${allocation.effectiveFrom}', + $effectiveTo, + now(), + now() + ) + """) + } + + fun createTestObservation(siteId: Int, amount: Int, timestamp: Instant) { + + val measurementId = createOrRetrieveSiteAndMeasurement(siteId) + jdbcTemplate.update( + """ + INSERT INTO observations (observation_measurement_id, amount, observed_at) + VALUES ($measurementId, $amount, '$timestamp') + """) + } + + fun createOrRetrieveSiteAndMeasurement(siteId: Int): Int { + val councilId = 9 + val measurementName = "Water Meter Volume" + val keyHolder: KeyHolder = GeneratedKeyHolder() + + jdbcTemplate.update( + """ + INSERT INTO observation_sites (id, council_id, name) + VALUES ($siteId, $councilId, '$siteId') + ON CONFLICT (id) DO NOTHING + """) + + jdbcTemplate.update( + { connection -> + connection.prepareStatement( + """ + INSERT INTO observation_sites_measurements (site_id, measurement_name, first_observation_at, last_observation_at, observation_count) + VALUES ($siteId, '$measurementName', now(), now(), 0) + ON CONFLICT(site_id, measurement_name) DO UPDATE SET measurement_name = '$measurementName' + RETURNING id + """, + arrayOf("id")) + }, + keyHolder) + return keyHolder.keys?.get("id") as Int + } +}