OpenSearch and Spark Integration P0 Demo #316
dai-chen
Background
Please find more context about this feature in https://github.com/opensearch-project/sql/issues/1116.
Demo Use Case
Architecture: On the customer side, an ingestion pipeline already pushes ALB logs to an S3 bucket. The dataset is very large and keeps growing. A monitoring system raises an alarm on suspicious client IPs.
Workflow: Once notified, the customer wants to quickly load only the data for the flagged client IP into a predefined OpenSearch index and dashboard, so they can diagnose and troubleshoot fast using the full-text analytics and visualization that OpenSearch offers.
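As a rough illustration of this workflow only (it is not the mechanism the demo itself uses), the sketch below filters log records by a flagged client IP and builds a newline-delimited body in the shape the OpenSearch `_bulk` API accepts. The sample records, their field names other than `client_ip`, the helper function names, and the choice of `alb_logs_raw` as the target index are all assumptions made for the example.

```python
import json

# Hypothetical sample records; field names other than client_ip are
# assumptions for illustration, not the demo's actual schema.
SAMPLE_LOGS = [
    {"time": "2023-02-01T10:00:00Z", "client_ip": "10.0.0.1", "elb_status_code": 200},
    {"time": "2023-02-01T10:00:01Z", "client_ip": "10.0.0.2", "elb_status_code": 403},
    {"time": "2023-02-01T10:00:02Z", "client_ip": "10.0.0.2", "elb_status_code": 403},
]


def select_by_client_ip(records, client_ip):
    """Keep only the records whose client_ip matches the flagged address."""
    return [r for r in records if r.get("client_ip") == client_ip]


def to_bulk_body(records, index_name):
    """Build a newline-delimited body: one action line, then one document line."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


# Assumed target index: alb_logs_raw, as named in the prerequisites.
suspicious = select_by_client_ip(SAMPLE_LOGS, "10.0.0.2")
bulk_body = to_bulk_body(suspicious, "alb_logs_raw")
print(bulk_body)
```

The resulting body could be sent to the `_bulk` endpoint from DevTools or any HTTP client. Scanning the full dataset this way gets slower as the dataset grows.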
Solution for Demo: We propose a new Maximus table format, on which the secondary index and materialized view are based. For the use case:
client_ip as first acceleration
Prerequisites
$HOME/.aws
alb_logs_temp to simulate customer ingestion
deltalog, alb_logs_raw and alb_logs_metrics for Maximus metadata and MV data
alb_logs_raw and alb_logs_metrics index created previously
Please run with DevTools in OpenSearch Dashboards.
Demo
Workflow
Steps
Please run with DevTools in OpenSearch Dashboards.
Cleanup
Remove the docker_os_data and docker_spark_data docker volumes.
Alternatively, if you don't want to lose everything you created in OpenSearch, start docker and run the following commands in CLI and re-create alb_logs_temp only.
Video
OpenSearch.Spark.demo.part.1.mp4
OpenSearch.Spark.demo.part.2.mp4