| subcollection | copyright | lastupdated | lasttested | content-type | services | account-plan | completion-time | use-case |
|---|---|---|---|---|---|---|---|---|
| solution-tutorials | | 2024-01-02 | 2023-10-24 | tutorial | cloud-object-storage, EventStreams, AnalyticsEngine, sql-query, key-protect | paid | 3h | Analytics, Cybersecurity, AIAndML |
{{site.data.keyword.attribute-definition-list}}
# Big data logs with streaming analytics and SQL
{: #big-data-log-analytics}
{: toc-content-type="tutorial"}
{: toc-services="cloud-object-storage, EventStreams, AnalyticsEngine, sql-query, key-protect"}
{: toc-completion-time="3h"}
This tutorial may incur costs. Use the Cost Estimator to generate a cost estimate based on your projected usage. {: tip}
In this tutorial, you will build a log analysis pipeline designed to collect, store and analyze log records to support regulatory requirements or aid information discovery. This solution leverages several services available in {{site.data.keyword.cloud_notm}}: {{site.data.keyword.messagehub}}, {{site.data.keyword.cos_short}}, {{site.data.keyword.sqlquery_short}} (previously SQL Query), {{site.data.keyword.keymanagementserviceshort}}, and {{site.data.keyword.iae_full_notm}}. A script and a tool will help you simulate the transmission of web server log messages from a static file to {{site.data.keyword.messagehub}}. {: shortdesc}
With {{site.data.keyword.messagehub}}, the pipeline scales to receive millions of log records from a variety of producers. Using {{site.data.keyword.sqlquery_short}} or {{site.data.keyword.iae_full_notm}}, log data can be inspected in real time to support business processes. Log messages can also be easily redirected to long-term storage in {{site.data.keyword.cos_short}}, where developers, support staff and auditors can work directly with the data.
While this tutorial focuses on log analysis, it is applicable to other scenarios: storage-limited IoT devices can similarly stream messages to {{site.data.keyword.cos_short}} or marketing professionals can segment and analyze customer events across digital properties with {{site.data.keyword.sqlquery_short}}. {: shortdesc}
## Objectives
{: #big-data-log-analytics-objectives}
- Understand Apache Kafka publish-subscribe messaging
- Store log data for audit and compliance requirements
- Monitor logs to create exception handling processes
- Conduct forensic and statistical analysis on log data
{: caption="Figure 1. Architecture diagram of the tutorial" caption-side="bottom"} {: style="text-align: center;"}
- Application generates log events to {{site.data.keyword.messagehub}}.
- To persist the log events, they are stream landed into {{site.data.keyword.cos_short}} through {{site.data.keyword.sqlquery_short}}.
- The storage bucket and the {{site.data.keyword.sqlquery_short}} jobs are encrypted with the {{site.data.keyword.keymanagementserviceshort}} service. In addition, the stream landing job running in {{site.data.keyword.sqlquery_short}} securely retrieves the service ID credentials from {{site.data.keyword.keymanagementserviceshort}}.
- Auditor or support staff use {{site.data.keyword.sqlquery_short}} or {{site.data.keyword.iae_short}} to perform requests.
- Requests are executed against the data stored in {{site.data.keyword.cos_short}}.
## Before you begin
{: #big-data-log-analytics-prereqs}
This tutorial requires:
- {{site.data.keyword.cloud_notm}} CLI with the {{site.data.keyword.iae_short}} plugin (`ibmcloud plugin install analytics-engine-v3`),
- `curl` and `awk`,
- Optionally, a Docker client.
You will find instructions to download and install these tools for your operating environment in the Getting started with solution tutorials guide.
To avoid the installation of these tools you can use the {{site.data.keyword.cloud-shell_short}} from the {{site.data.keyword.cloud_notm}} console. {: tip}
## Create services
{: #big-data-log-analytics-setup}
{: step}
In this section, you will create the services required to perform analysis of log events generated by your applications. Choose a region that supports all services. Dallas and Frankfurt are supported.
### Configure platform logs
{: #big-data-log-analytics-platform-logs}
Platform logs are generally useful for troubleshooting resources and will be required in a later step to see the output of {{site.data.keyword.iae_short}}.
You can have multiple {{site.data.keyword.loganalysisshort_notm}} instances; however, only one instance in a region can be configured to receive platform logs from enabled cloud services in that {{site.data.keyword.Bluemix_notm}} region. Also, be aware of the logging limitations for {{site.data.keyword.iae_short}} serverless instances. {: important}
- Navigate to the Observability{: external} page and click Logging. Look for any existing log analysis services with `Platform logs` enabled. If there is a platform logging instance in the region, no further configuration is required.
- To create a new {{site.data.keyword.loganalysislong_notm}} instance, click Options, then Create. Continue to create a logging instance with the 7-day plan. If the new instance is not visible, click the refresh button.
- Back in the Observability{: external} page, click Logging on the left pane.
- Click on Options > Edit platform and select the region.
- Select the {{site.data.keyword.loganalysisshort_notm}} service instance created for platform logs.
For more information, see Configuring {{site.data.keyword.Bluemix_notm}} platform logs. {: tip}
### Create the services with {{site.data.keyword.bpshort}}
{: #big-data-log-analytics-schematics}
- Navigate to Create {{site.data.keyword.bpshort}} Workspaces.
- Under the Specify Template section, verify:
   - Repository URL is `https://github.com/IBM-Cloud/big-data-log-analytics`
   - Terraform version is `terraform_v1.5`
- Under Workspace details:
   - Provide a workspace name: `big-data-log-analytics`.
   - Choose a Resource Group and a Location.
   - Click on Next.
- Verify the details and then click on Create.
- Under the Variables section, you can optionally set a value for `prefix` or `region`.
- Scroll to the top of the page and click Apply plan.
- Wait for the Apply job to complete. Check the logs to see the status of the services created.
### Explore the resources
{: #big-data-log-analytics-explore}
- Navigate to the resource list.
- Set Group to the resource group created by {{site.data.keyword.bpshort}}. The resource group is named after the `prefix` you set. If you left `prefix` blank, the resource group will follow the pattern `blda-1234-group`.
- Locate all of the resources from the diagram above:
   - an instance of {{site.data.keyword.messagehub}},
   - an instance of {{site.data.keyword.sqlquery_short}},
   - an instance of {{site.data.keyword.cos_short}} and a bucket to persist log data and files generated by {{site.data.keyword.sqlquery_short}} jobs,
   - an instance of {{site.data.keyword.iae_short}} to inspect log data,
   - and an instance of {{site.data.keyword.keymanagementserviceshort}} used to encrypt the storage bucket, and to store {{site.data.keyword.sqlquery_short}} job data and the service ID for stream landing.
## Stream log data to {{site.data.keyword.cos_short}}
{: #big-data-log-analytics-configure-streams}
{: step}
### Create a stream landing configuration
{: #big-data-log-analytics-streamlanding}
In this section, you will learn how to run a fully managed ingestion of streaming data from {{site.data.keyword.messagehub}} into Parquet objects on {{site.data.keyword.cos_full_notm}}. {{site.data.keyword.sqlquery_notm}} is the key component in the stream landing approach: it is the service that connects to {{site.data.keyword.messagehub}} and copies the data to {{site.data.keyword.cos_full_notm}}.
Parquet{: external} is an open source file format for nested data structures in a flat columnar format. Compared to the traditional approach where data is stored in rows, Parquet is more efficient in terms of storage and performance.
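To make the row-versus-column difference concrete, here is a minimal PySpark sketch that converts newline-delimited JSON log records into Parquet and then reads back only two columns. The file names are illustrative; in this tutorial the stream landing job writes the Parquet objects for you.

```python
from pyspark.sql import SparkSession

# Minimal sketch: convert newline-delimited JSON log records to Parquet.
spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

logs = spark.read.json("NASA_logs.json")              # one JSON record per line (illustrative file name)
logs.write.mode("overwrite").parquet("logs_parquet")  # columnar, compressed output

# With a columnar layout, the engine reads only the columns a query touches.
spark.read.parquet("logs_parquet").select("host", "responseCode").show(5)

spark.stop()
```
{: codeblock}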
- In your browser, navigate to the resource list{: external} and under Integration, click on the {{site.data.keyword.messagehub}} `log-analysis-es` service.
- Select Topics from the navigation pane on the left.
- Select the context menu (three vertical dots) for your topic `webserver` and click Create stream landing configuration.
   {: caption="Event Streams topics" caption-side="bottom"}
- Click Start and select the `log-analysis-cos` service. Click Next.
- Select the bucket with the name ending with `-bucket` and click Next.
- Select the {{site.data.keyword.sqlquery_short}} `log-analysis-sql` service and click Next.
- Configure how you want your topic data to be streamed to {{site.data.keyword.cos_short}}:
   - Prefix for objects added to the {{site.data.keyword.cos_short}} bucket: `logs-stream-landing`
   - Create a new Service ID: `logs-stream-landing-service-id`
   - Select Key Protect as the Authorization service.
   - Select the log-analysis-kp {{site.data.keyword.keymanagementserviceshort}} service instance.
   - Click Start streaming data.
You now see the status `Queued` for your topic. It may take up to 5 minutes until the streaming job is fully dispatched and up and running. You will see the status switch to `Running` at that point. In the context menu, you will find a new option called View stream landing configuration.
### Prepare the shell
{: #big-data-log-analytics-prepare-shell}
- In {{site.data.keyword.bpshort}}, select the workspace you created.
- In Settings, locate the Workspace ID.
- Define `WORKSPACE_ID`:
   ```sh
   WORKSPACE_ID=<your workspace id>
   ```
   {: pre}
- Target the region where you deployed resources:
   ```sh
   ibmcloud target -r us-south
   ```
   {: pre}
- View the outputs generated by the Apply job:
   ```sh
   ibmcloud schematics output -id $WORKSPACE_ID --output json | jq
   ```
   {: pre}

   Notice the outputs are small scripts. You will use them in the next steps.
- Retrieve all the outputs produced by the workspace into a variable:
   ```sh
   OUTPUTS_JSON=$(ibmcloud schematics output -id $WORKSPACE_ID --output json)
   ```
   {: pre}
### Send log messages with kcat
{: #big-data-log-analytics-kafkatools}
The streaming job is currently idle and awaiting messages. In this section, you will configure the `kcat`{: external} tool to work with {{site.data.keyword.messagehub}}. `kcat` allows you to produce arbitrary messages from the terminal and send them to {{site.data.keyword.messagehub}}. In the steps below, the Kafka message feed will be persisted in your data lake on {{site.data.keyword.cos_full_notm}}.
- Either install `kcat`{: external} on your machine or use it via Docker. The tutorial uses the Docker image.
- Run the following command to generate the `kcat` configuration file. The file includes the list of Kafka brokers and the authentication settings:
   ```sh
   echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .kcat_config | .value' > kcat.config
   ```
   {: pre}
- Run `kcat`:
   ```sh
   docker run -v ${PWD}:/bdla -w /bdla -it edenhill/kcat:1.7.0 -F kcat.config -P -t webserver
   ```
   {: pre}

   The command connects as an event producer to the topic `webserver`. Remember the command for later.
- The kcat tool is awaiting input. Copy and paste the log messages below into the terminal. Hit `enter`, then `CTRL-d` to send the log messages to {{site.data.keyword.messagehub}}:
   ```json
   { "host": "199.72.81.55", "time_stamp": "01/Jul/1995:00:00:01 -0400", "request": "GET /history/apollo/ HTTP/1.0", "responseCode": "200", "bytes": "6245" }
   { "host": "199.72.81.55", "time_stamp": "01/Jul/1995:00:00:01 -0400", "request": "GET /history/apollo/ HTTP/1.0", "responseCode": "200", "bytes": "6245" }
   { "host": "199.72.81.55", "time_stamp": "01/Jul/1995:00:00:01 -0400", "request": "GET /history/apollo/ HTTP/1.0", "responseCode": "200", "bytes": "6245" }
   { "host": "199.72.81.55", "time_stamp": "01/Jul/1995:00:00:01 -0400", "request": "GET /history/apollo/ HTTP/1.0", "responseCode": "200", "bytes": "6245" }
   { "host": "199.72.81.55", "time_stamp": "01/Jul/1995:00:00:01 -0400", "request": "GET /history/apollo/ HTTP/1.0", "responseCode": "200", "bytes": "6245" }
   ```
   {: codeblock}
- Repeat the previous step to generate additional events.
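If you prefer producing test messages programmatically instead of pasting them into `kcat`, a small Python producer can do the same job. The sketch below is not part of the tutorial assets; it assumes the `confluent-kafka` package and uses placeholder broker and API key values (the same information found in `kcat.config`).

```python
import json
from confluent_kafka import Producer  # pip install confluent-kafka

# Placeholders/assumptions: take the broker list and API key from your Event Streams
# service credentials (the same values written into kcat.config). Event Streams uses
# SASL PLAIN with the literal username "token" and the API key as the password.
BROKERS = "<broker-0>:9093,<broker-1>:9093"
API_KEY = "<event-streams-api-key>"

producer = Producer({
    "bootstrap.servers": BROKERS,
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "token",
    "sasl.password": API_KEY,
})

record = {
    "host": "199.72.81.55",
    "time_stamp": "01/Jul/1995:00:00:01 -0400",
    "request": "GET /history/apollo/ HTTP/1.0",
    "responseCode": "200",
    "bytes": "6245",
}

# Send a handful of sample log events to the webserver topic, then wait for delivery.
for _ in range(5):
    producer.produce("webserver", value=json.dumps(record).encode("utf-8"))
producer.flush()
```
{: codeblock}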
### Verify the landed messages
{: #big-data-log-analytics-checkmessages}
You can check the landed data in the {{site.data.keyword.sqlquery_short}} UI and also in the {{site.data.keyword.cos_short}} bucket.
- Navigate to the resource list{: external} and under Databases, click on the `log-analysis-sql` service.
- Click on Launch {{site.data.keyword.sqlquery_short}} UI to open the {{site.data.keyword.sqlquery_short}} UI. You should see the streaming job `Running`.
- Click on the Details tab to see the actual SQL statement that was submitted to {{site.data.keyword.sqlquery_short}} for the stream landing. Notice the Result location; it will be used shortly to query the data.
   {: caption="{{site.data.keyword.sqlquery_short}} console" caption-side="bottom"}

   The SELECT statement looks like:
   ```sql
   SELECT * FROM <EVENT_STREAMS_CRN>/webserver STORED AS JSON
   EMIT cos://<REGION>/<BUCKET_NAME>/logs-stream-landing/topic=webserver STORED AS PARQUET
   EXECUTE AS <KEY_PROTECT_CRN_WITH_KEY>
   ```
   {: codeblock}

   It is a SELECT statement from your {{site.data.keyword.messagehub}} instance and topic (identified via the unique CRN), and the selected data is emitted (EMIT) to your {{site.data.keyword.cos_short}} bucket in PARQUET format. The operation is executed (EXECUTE) with the service ID's API key that is stored in the {{site.data.keyword.keymanagementserviceshort}} instance. {: tip}
- Click on the link in the Result location field, which opens the {{site.data.keyword.cos_short}} UI with a filter set to the objects that are being written by that job.
   {: caption="{{site.data.keyword.cos_short}} object view" caption-side="bottom"}

   In the {{site.data.keyword.cos_short}} UI, switch to the object view by clicking on the icon next to Upload. You should see a couple of metadata objects used for tracking, such as the latest offset that has been consumed and landed. In addition, you can find the Parquet files with the actual payload data. {: tip}
- Return to the {{site.data.keyword.sqlquery_short}} UI and in the Details tab click on Query the result and then click Run to execute a Batch job. You should see the query in the panel pointing to the {{site.data.keyword.cos_short}} file (under `FROM`) with the log message(s) you sent above. Wait for the job to change to `Completed`.
- Click on the Results tab to see the log messages in a tabular format.

   The query saves the result to a `CSV` file under a different bucket with a name like `sql-<SQL_QUERY_GUID>`. Check the `INTO` part of the query. {: tip}

   If the query fails with a message like "The Parquet file is empty or corrupted", send additional messages and confirm that files are created in the bucket. {: tip}
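Besides the {{site.data.keyword.sqlquery_short}} and {{site.data.keyword.cos_short}} UIs, you can also check the landed objects programmatically. The following is a hedged sketch using the `ibm-cos-sdk` Python package; the endpoint, API key, instance CRN, and bucket name are placeholders you would take from your {{site.data.keyword.cos_short}} service credentials.

```python
import ibm_boto3
from ibm_botocore.client import Config  # pip install ibm-cos-sdk

# Placeholders: fill in values from your Cloud Object Storage service credentials.
COS_ENDPOINT = "https://s3.us-south.cloud-object-storage.appdomain.cloud"
COS_API_KEY = "<cos-api-key>"
COS_INSTANCE_CRN = "<cos-instance-crn>"
BUCKET = "<your-bucket>"

cos = ibm_boto3.client(
    "s3",
    ibm_api_key_id=COS_API_KEY,
    ibm_service_instance_id=COS_INSTANCE_CRN,
    config=Config(signature_version="oauth"),
    endpoint_url=COS_ENDPOINT,
)

# List the metadata and Parquet objects written by the stream landing job.
resp = cos.list_objects_v2(Bucket=BUCKET, Prefix="logs-stream-landing/topic=webserver/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```
{: codeblock}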
### Increase the message volume
{: #big-data-log-analytics-streamsload}
For later analysis purposes, increase the message volume sent to {{site.data.keyword.messagehub}}. The provided script simulates a flow of messages to {{site.data.keyword.messagehub}} based on traffic to the web server. To demonstrate the scalability of {{site.data.keyword.messagehub}}, you will increase the throughput of log messages.
- Download the Jul 01 to Jul 31, ASCII format, 20.7 MB gzip compressed log file from NASA:
   ```sh
   curl ftp://ita.ee.lbl.gov/traces/NASA_access_log_Jul95.gz -o NASA_access_log_Jul95.gz
   ```
   {: pre}
- Turn the access logs into JSON format by running the following command (a Python sketch of the same conversion appears after this list):
   ```sh
   gunzip -c NASA_access_log_Jul95.gz | awk -F " " '{ print "{\"host\":\"" $1 "\",\"time_stamp\":\"" $4 " " $5 "\",\"request\":" $6 " " $7 " " $8 ",\"responseCode\":\"" $9 "\",\"bytes\":\"" $10 "\"}" }' > NASA_logs.json
   ```
   {: pre}
- Create a shell script that sends only a few log lines per second. Create a new file `rate_limit.sh` and copy the following into it:
   ```sh
   #! /bin/bash
   if [ -z "$1" ]; then
     echo "usage: $0 filename lines wait"
     exit
   fi
   INPUT_FILE=$1
   NUM_LINES=$2
   COUNTER=0
   WAIT_SECONDS=$3
   while read -u3 input_text rest; do
     trap 'exit 130' INT
     echo $input_text $rest
     ((COUNTER++))
     if (( COUNTER == $NUM_LINES )); then
       sleep $WAIT_SECONDS
       COUNTER=0
     fi
   done 3< "$INPUT_FILE"
   ```
   {: pre}

   The script accepts a file name, the number of lines to output as a chunk, and how many seconds to wait between chunks.
- Make the script executable:
   ```sh
   chmod +x rate_limit.sh
   ```
   {: pre}
- Run the following command to send lines from the access log to {{site.data.keyword.messagehub}}. It uses the converted log file from above, sends 10 lines at a time, and waits 1 second before sending the next lines:
   ```sh
   ./rate_limit.sh NASA_logs.json 10 1 | docker run -v ${PWD}:/bdla -w /bdla -i edenhill/kcat:1.7.0 -F kcat.config -P -t webserver
   ```
   {: pre}
- The script configuration above pushes about 10 lines/second. Stop the script after the desired number of messages have been streamed using `control+C`.
- In your browser, return to the {{site.data.keyword.sqlquery_short}} UI and the Details tab. There, click on Query the result and then click Run to see some received messages under the Results tab of the batch job.
- You can experiment with {{site.data.keyword.messagehub}} by increasing or decreasing the number of lines value.
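If you would rather not rely on `awk`, the same conversion can be done in Python. This is a sketch under the assumption that the NASA log follows the Common Log Format; the field handling approximates the `awk` one-liner above and writes the same output file name.

```python
import gzip
import json
import re

# Approximate Common Log Format parser: host, [timestamp], "request", status, bytes.
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ (\[[^\]]+\]) "([^"]*)" (\S+) (\S+)')

with gzip.open("NASA_access_log_Jul95.gz", "rt", errors="replace") as src, \
        open("NASA_logs.json", "w") as dst:
    for line in src:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip malformed lines
        host, time_stamp, request, status, size = match.groups()
        record = {
            "host": host,
            "time_stamp": time_stamp,
            "request": request,
            "responseCode": status,
            "bytes": size if size.isdigit() else "0",
        }
        dst.write(json.dumps(record) + "\n")
```
{: codeblock}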
## Investigate the logs with {{site.data.keyword.sqlquery_short}}
{: #big-data-log-analytics-sqlquery}
{: step}
Depending on how long you ran the transfer, the number of files on {{site.data.keyword.cos_short}} has certainly grown. You will now act as an investigator answering audit or compliance questions by combining {{site.data.keyword.sqlquery_short}} with your log file. The benefit of using {{site.data.keyword.sqlquery_short}} is that the log file is directly accessible - no additional transformations or database servers are necessary.
- Back in the {{site.data.keyword.sqlquery_short}} UI, edit the streaming job query:
   - Click on the drop down to select Streaming jobs (as opposed to Batch jobs).
   - Open the Details and click on Query the result.
   - Notice the query editor above is populated with a query.
   - Notice the FROM clause does not specify a specific Parquet object in the bucket but references the job ID, which means all objects in the job are included. Perfect!
     ```
     cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID>
     ```
   - Remove the INTO clause to display the output without storing the results:
     ```
     INTO cos://<Region>/sql-<ID>/result/ STORED AS CSV
     ```
   - Run the query.
   - When it is complete, observe the results in the Result tab. It includes the first 50 messages you sent.
   - Now let's do some investigation by modifying this basic query.
- In the {{site.data.keyword.sqlquery_short}} UI, edit the SQL in the text area to look more like the following, keeping the FROM statement as is:
   ```sql
   -- What are the top 10 web pages on NASA from July 1995?
   -- Which mission might be significant?
   SELECT REQUEST, COUNT(REQUEST)
   FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
   WHERE REQUEST LIKE '%.htm%'
   GROUP BY REQUEST
   ORDER BY 2 DESC
   LIMIT 10
   ```
   {: codeblock}
- Update the `FROM` clause with your Object SQL URL and click Run.
- Click on the latest Completed job to see the result under the Result tab.
- Select the Details tab to view additional information, such as the location where the result was stored on {{site.data.keyword.cos_short}}.
- Try the following question and answer pairs by adding them individually to the Type SQL here ... text area.

   ```sql
   -- Who are the top 5 viewers?
   SELECT HOST, COUNT(*)
   FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
   GROUP BY HOST
   ORDER BY 2 DESC
   LIMIT 5
   ```
   {: codeblock}

   ```sql
   -- Which viewer has suspicious activity based on application failures?
   SELECT HOST, COUNT(*)
   FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
   WHERE `responseCode` == 500
   GROUP BY HOST
   ORDER BY 2 DESC
   ```
   {: codeblock}

   ```sql
   -- Which requests showed a page not found error to the user?
   SELECT DISTINCT REQUEST
   FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
   WHERE `responseCode` == 404
   ```
   {: codeblock}

   ```sql
   -- What are the top 10 largest files?
   SELECT DISTINCT REQUEST, BYTES
   FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
   WHERE BYTES > 0
   ORDER BY CAST(BYTES as Integer) DESC
   LIMIT 10
   ```
   {: codeblock}

   ```sql
   -- What is the distribution of total traffic by hour?
   SELECT SUBSTRING(TIME_STAMP, 13, 2), COUNT(*)
   FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
   GROUP BY 1
   ORDER BY 1 ASC
   ```
   {: codeblock}

   ```sql
   -- Why did the previous result return an empty hour?
   -- Hint, find the malformed hostname.
   SELECT HOST, REQUEST
   FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
   WHERE SUBSTRING(TIME_STAMP, 13, 2) == ''
   ```
   {: codeblock}
## Analyze the logs with Apache Spark
{: #big-data-log-analytics-5}
{: step}
The data stream landed to {{site.data.keyword.cos_short}} can also be queried using Apache Spark, which is part of the {{site.data.keyword.iae_short}} serverless instance. Programs need to be loaded into a bucket and executed from the command line. The PySpark Python environment is used in this tutorial.
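The helper commands in the workspace outputs rely on the {{site.data.keyword.iae_short}} CLI plugin listed in the prerequisites. Spark applications can also be submitted over the instance's REST API. The sketch below is an assumption-laden illustration: the region, instance GUID, API key, `cos://` path, and the exact payload shape should be checked against the current {{site.data.keyword.iae_short}} serverless API reference.

```python
import requests

# Placeholders/assumptions: region, instance GUID, API key, and the cos:// path to your script.
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"
AE_APPS_URL = "https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance-guid>/spark_applications"
IBMCLOUD_API_KEY = "<ibmcloud-api-key>"

# Exchange the API key for an IAM bearer token.
token = requests.post(
    IAM_TOKEN_URL,
    data={"grant_type": "urn:ibm:params:oauth:grant-type:apikey", "apikey": IBMCLOUD_API_KEY},
).json()["access_token"]

# Submit a PySpark application stored in the Cloud Object Storage bucket.
resp = requests.post(
    AE_APPS_URL,
    headers={"Authorization": f"Bearer {token}"},
    json={"application_details": {"application": "cos://<bucket>.<cos-service-name>/hello.py"}},
)
print(resp.status_code, resp.json())
```
{: codeblock}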
### Run a simple Spark application
{: #big-data-log-analytics-simple-spark}
- In case your {{site.data.keyword.cloud-shell_short}} connection has expired, reconnect the session and restore the environment variables:
   ```sh
   WORKSPACE_ID=<your workspace id>
   OUTPUTS_JSON=$(ibmcloud schematics output -id $WORKSPACE_ID --output json)
   ```
   {: pre}
- Set the environment variables needed for the next steps:
   ```sh
   eval "$(echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_01_env_variables | .value')"
   ```
   {: pre}
- Run a program that is built into the IBM Spark runtime to print some output:
   ```sh
   eval "$(echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_02_run_word_count | .value')"
   ```
   {: pre}

   To view the command before it gets executed, run `echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_02_run_word_count | .value'`. {: tip}
- Navigate to the resource list{: external} and under Analytics, click on the `log-analysis-iae` service.
- Select the Applications tab and click refresh. Notice the submitted job. Eventually it gets marked as Finished.
### View the Spark application logs
{: #big-data-log-analytics-view-spark-log}
- Navigate to the Logging{: external} page and look for the existing {{site.data.keyword.la_short}} service in the region with `Platform logs` enabled.
- Click Open dashboard.
- In a few minutes, you should see the logs associated with the program. Search for `host:ibmanalyticsengine`. There will be a lot of output. Look for:
   ```
   Michael,: 1
   29: 1
   Andy,: 1
   30: 1
   Justin,: 1
   19: 1
   ```
   {: codeblock}
### Run hello world
{: #big-data-log-analytics-hello-world}
During provisioning, {{site.data.keyword.bpshort}} created a simple `hello.py` file in the {{site.data.keyword.cos_short}} bucket.
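The exact content of `hello.py` is defined by the Terraform template; a minimal PySpark application of this kind might look like the following sketch.

```python
from pyspark.sql import SparkSession

# A guess at a minimal hello-world PySpark application; the real hello.py is created by Schematics.
spark = SparkSession.builder.appName("hello-world").getOrCreate()
print("hello world")
spark.stop()
```
{: codeblock}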
- Run the `hello.py` script as an application in the IBM Spark runtime:
   ```sh
   eval "$(echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_03_run_hello_world | .value')"
   ```
   {: pre}

   To view the command before it gets executed, run `echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_03_run_hello_world | .value'`. {: tip}
- In the {{site.data.keyword.iae_short}} instance, on the Applications tab, click refresh to see the status of the application. Wait for it to finish.
- Look for `hello world` in the Platform Logs.
### Analyze the landed log data
{: #big-data-log-analytics-cos-app}
The final step is to submit the Spark application that accesses the data in the same bucket. The `solution.py` Python script has already been created by {{site.data.keyword.bpshort}} in the {{site.data.keyword.cos_short}} bucket.

The script requires you to provide the ID of the {{site.data.keyword.sqlquery_short}} job used by stream landing.
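The actual `solution.py` is provided by the Terraform template. As a rough, assumption-laden sketch of such an application: it reads the landed Parquet objects directly from the bucket through the `cos://` (Stocator) connector and runs a simple aggregation. The bucket name, service name, endpoint, credentials, and Hadoop configuration keys below are placeholders, not values from the tutorial.

```python
import sys
from pyspark.sql import SparkSession

# Placeholders: bucket, Stocator service name, endpoint, API key, and the stream landing job ID.
BUCKET = "<your-bucket>"
JOB_ID = sys.argv[1] if len(sys.argv) > 1 else "<stream-landing-job-id>"

spark = SparkSession.builder.appName("log-analysis").getOrCreate()

# Configure the Stocator connector for IAM access to Cloud Object Storage (keys are assumptions).
hconf = spark.sparkContext._jsc.hadoopConfiguration()
hconf.set("fs.cos.myservice.endpoint", "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud")
hconf.set("fs.cos.myservice.iam.api.key", "<api-key>")

# Read every Parquet object the stream landing job wrote for the webserver topic.
logs = spark.read.parquet(
    f"cos://{BUCKET}.myservice/logs-stream-landing/topic=webserver/jobid={JOB_ID}/"
)
logs.show(10)
logs.groupBy("responseCode").count().orderBy("count", ascending=False).show()

spark.stop()
```
{: codeblock}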
- Navigate to the resource list{: external} and under Storage, click on the `log-analysis-cos` service.
- Select the bucket with the name ending with `-bucket`.
- Click the second object.
- In the side panel, the full object name is visible with a format similar to `logs-stream-landing/topic=webserver/jobid=123456-aaa-4444-bbbb-08f3d1626c46`. Write down the job ID. In this example the job ID is `123456-aaa-4444-bbbb-08f3d1626c46`.
- In {{site.data.keyword.cloud-shell_short}}, define a JOB_ID variable:
   ```sh
   JOB_ID=<the value you got from the bucket object>
   ```
   {: pre}
- Run the `solution.py` application:
   ```sh
   eval "$(echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_04_run_solution | .value')"
   ```
   {: pre}

   To view the command before it gets executed, run `echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_04_run_solution | .value'`. {: tip}
- In the {{site.data.keyword.iae_short}} instance, on the Applications tab, click refresh to see the status of the application. Wait for it to finish.
- Look for the application logs in the Platform Logs:
   ```
   +---------------------------+-----+--------------------+--------------------+------------+--------------------+
   |_corrupt_or_schema_mismatch|bytes|                host|             request|responseCode|          time_stamp|
   +---------------------------+-----+--------------------+--------------------+------------+--------------------+
   |                       null| 9867|      ntigate.nt.com|GET /software/win...|         200|[01/Jul/1995:04:1...|
   |                       null| 7634|piweba3y.prodigy.com|GET /shuttle/miss...|         200|[01/Jul/1995:04:1...|
   |                       null|25218|      ntigate.nt.com|GET /software/win...|         200|[01/Jul/1995:04:1...|
   |                       null| 4441|      ntigate.nt.com|GET /software/win...|         200|[01/Jul/1995:04:1...|
   |                       null| 1414|      ntigate.nt.com|GET /images/const...|         200|[01/Jul/1995:04:1...|
   |                       null|45308|line03.pm1.abb.mi...|GET /shuttle/miss...|         200|[01/Jul/1995:04:1...|
   |                       null|  669|  source.iconz.co.nz|GET /images/WORLD...|         200|[01/Jul/1995:04:1...|
   |                       null|  234|  source.iconz.co.nz|GET /images/USA-l...|         200|[01/Jul/1995:04:1...|
   |                       null|  363|  source.iconz.co.nz|GET /images/MOSAI...|         200|[01/Jul/1995:04:1...|
   |                       null|13372|      ntigate.nt.com|GET /software/win...|         200|[01/Jul/1995:04:1...|
   +---------------------------+-----+--------------------+--------------------+------------+--------------------+
   ```
   {: codeblock}
## Expand the tutorial
{: #big-data-log-analytics-expand}
Congratulations, you have built a log analysis pipeline with {{site.data.keyword.cloud_notm}}. Follow the Build a data lake using {{site.data.keyword.cos_short}} tutorial to add a dashboard to log data.
## Remove resources
{: #big-data-log-analytics-removal}
{: step}
- From the Resource List{: external}, select the log-analysis-kp* service instance.
- Delete the key with a name starting with `streaming-job`. This key was created outside of {{site.data.keyword.bpshort}} and should be deleted first.
. This key was created outside of {{site.data.keyword.bpshort}} and should be deleted first. - Go to {{site.data.keyword.bpshort}} and select your workspace.
- Under Actions, select Destroy resources.
- Wait for {{site.data.keyword.bpshort}} to destroy all resources.
- Delete the workspace.
- Navigate to Manage > Access (IAM) > Service IDs{: external} in the {{site.data.keyword.cloud_notm}} console and remove the `logs-stream-landing-service-id` service ID.
Depending on the resource it might not be deleted immediately, but retained (by default for 7 days). You can reclaim the resource by deleting it permanently or restore it within the retention period. See this document on how to use resource reclamation. {: tip}
## Related content
{: #big-data-log-analytics-8}
{: related}
- Apache Kafka{: external}