---
subcollection: solution-tutorials
copyright:
  years: 2024
lastupdated: "2024-01-02"
lasttested: "2023-10-24"

content-type: tutorial
services: cloud-object-storage, EventStreams, AnalyticsEngine, sql-query, key-protect
account-plan: paid
completion-time: 3h
use-case: Analytics, Cybersecurity, AIAndML
---

{{site.data.keyword.attribute-definition-list}}

Process big data logs with SQL

{: #big-data-log-analytics} {: toc-content-type="tutorial"} {: toc-services="cloud-object-storage, EventStreams, AnalyticsEngine, sql-query, key-protect"} {: toc-completion-time="3h"}

This tutorial may incur costs. Use the Cost Estimator to generate a cost estimate based on your projected usage. {: tip}

In this tutorial, you will build a log analysis pipeline designed to collect, store and analyze log records to support regulatory requirements or aid information discovery. This solution leverages several services available in {{site.data.keyword.cloud_notm}}: {{site.data.keyword.messagehub}}, {{site.data.keyword.cos_short}}, {{site.data.keyword.sqlquery_short}} (previously SQL Query), {{site.data.keyword.keymanagementserviceshort}}, and {{site.data.keyword.iae_full_notm}}. A script and a tool will help you simulate the transmission of web server log messages from a static file to {{site.data.keyword.messagehub}}. {: shortdesc}

With {{site.data.keyword.messagehub}}, the pipeline scales to receive millions of log records from a variety of producers. Using {{site.data.keyword.sqlquery_short}} or {{site.data.keyword.iae_full_notm}}, log data can be inspected in real time to feed business processes. Log messages can also be easily redirected to long-term storage using {{site.data.keyword.cos_short}}, where developers, support staff and auditors can work directly with the data.

While this tutorial focuses on log analysis, it is applicable to other scenarios: storage-limited IoT devices can similarly stream messages to {{site.data.keyword.cos_short}} or marketing professionals can segment and analyze customer events across digital properties with {{site.data.keyword.sqlquery_short}}. {: shortdesc}

Objectives

{: #big-data-log-analytics-objectives}

  • Understand Apache Kafka publish-subscribe messaging
  • Store log data for audit and compliance requirements
  • Monitor logs to create exception handling processes
  • Conduct forensic and statistical analysis on log data

Architecture{: caption="Figure 1. Architecture diagram of the tutorial" caption-side="bottom"} {: style="text-align: center;"}

  1. Application generates log events to {{site.data.keyword.messagehub}}.
  2. To persist the log events, they are stream landed into {{site.data.keyword.cos_short}} through {{site.data.keyword.sqlquery_short}}.
  3. The storage bucket and the {{site.data.keyword.sqlquery_short}} jobs are encrypted with the {{site.data.keyword.keymanagementserviceshort}} service. The stream landing job running in {{site.data.keyword.sqlquery_short}} also securely retrieves the service ID's API key from {{site.data.keyword.keymanagementserviceshort}}.
  4. Auditors or support staff use {{site.data.keyword.sqlquery_short}} or {{site.data.keyword.iae_short}} to run queries.
  5. Queries are executed against the data stored in {{site.data.keyword.cos_short}}.

Before you begin

{: #big-data-log-analytics-prereqs}

This tutorial requires:

  • {{site.data.keyword.cloud_notm}} CLI with the {{site.data.keyword.iae_short}} plugin (ibmcloud plugin install analytics-engine-v3)
  • curl and awk
  • Optionally, a Docker client.

You will find instructions to download and install these tools for your operating environment in the Getting started with solution tutorials guide.

To avoid the installation of these tools you can use the {{site.data.keyword.cloud-shell_short}} from the {{site.data.keyword.cloud_notm}} console. {: tip}

Create services

{: #big-data-log-analytics-setup} {: step}

In this section, you will create the services required to perform analysis of log events generated by your applications. Choose a region that supports all services. Dallas and Frankfurt are supported.

Enable Platform Logs

{: #big-data-log-analytics-platform-logs}

Platform logs are generally useful for troubleshooting resources and will be required in a later step to see the output of {{site.data.keyword.iae_short}}.

You can have multiple {{site.data.keyword.loganalysisshort_notm}} instances; however, only one instance in a region can be configured to receive platform logs from enabled cloud services in that {{site.data.keyword.Bluemix_notm}} region. Also, be aware of the limitations of {{site.data.keyword.iae_short}} serverless instances in terms of logging. {: important}

  1. Navigate to the Observability{: external} page and click Logging, then look for any existing log analysis services with Platform logs enabled. If there is a platform logging instance in the region, no further configuration is required.
  2. To create a new {{site.data.keyword.loganalysislong_notm}} instance, click Options and then Create. Continue to create a logging instance with the 7-day plan. If the new instance is not visible, click the refresh button.
  3. Back on the Observability{: external} page, click Logging in the left pane.
    1. Click on Options > Edit platform and select the region.
    2. Select the {{site.data.keyword.loganalysisshort_notm}} service instance created for platform logs.

For more information, see Configuring {{site.data.keyword.Bluemix_notm}} platform logs {: tip}

Create the tutorial resources with {{site.data.keyword.bpshort}}

{: #big-data-log-analytics-schematics}

  1. Navigate to Create {{site.data.keyword.bpshort}} Workspaces.
  2. Under the Specify Template section, verify:
    1. Repository URL is https://github.com/IBM-Cloud/big-data-log-analytics
    2. Terraform version is terraform_v1.5
  3. Under Workspace details,
    1. Provide a workspace name: big-data-log-analytics.
    2. Choose a Resource Group and a Location.
    3. Click on Next.
  4. Verify the details and then click on Create.
  5. Under the Variables section, you can optionally set a value for prefix or region.
  6. Scroll to the top of the page and click Apply plan.
  7. Wait for the Apply job to complete. Check the logs to see the status of the services created.

Explore the resources created by {{site.data.keyword.bpshort}}

{: #big-data-log-analytics-explore}

  1. Navigate to the resource list.
  2. Set Group to the resource group created by {{site.data.keyword.bpshort}}. The resource group is named after the prefix you set. If you left prefix blank, the resource group will follow the pattern blda-1234-group.
  3. Locate all of the resources from the diagram above:
    • an instance of {{site.data.keyword.messagehub}},
    • an instance of {{site.data.keyword.sqlquery_short}},
    • an instance of {{site.data.keyword.cos_short}} and a bucket to persist log data and files generated by {{site.data.keyword.sqlquery_short}} jobs,
    • an instance of {{site.data.keyword.iae_short}} to inspect log data,
    • and an instance of {{site.data.keyword.keymanagementserviceshort}} used to encrypt the storage bucket, and to store {{site.data.keyword.sqlquery_short}} job data and the service ID for stream landing.

Stream landing from {{site.data.keyword.messagehub}} to Cloud {{site.data.keyword.cos_short}}

{: #big-data-log-analytics-configure-streams} {: step}

Configure stream landing

{: #big-data-log-analytics-streamlanding}

In this section, you will learn how to run a fully-managed stream data ingestion from {{site.data.keyword.messagehub}} into Parquet on {{site.data.keyword.cos_full_notm}}. {{site.data.keyword.sqlquery_notm}} is the key component in the Stream Landing approach. It is the service that connects to {{site.data.keyword.messagehub}} and copies the data to {{site.data.keyword.cos_full_notm}}.

Parquet{: external} is an open source file format for nested data structures in a flat columnar format. Compared to the traditional approach where data is stored in rows, Parquet is more efficient in terms of storage and performance.
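If you want to see columnar access in action locally, the following short Python sketch (not part of the tutorial pipeline; it assumes pandas and pyarrow are installed on your workstation) writes a few log records to a Parquet file and then reads back a single column without loading the rest of the data.

    import pandas as pd

    # Write a handful of log records to a Parquet file (pandas uses pyarrow under the covers).
    df = pd.DataFrame([
        {"host": "199.72.81.55", "responseCode": "200", "bytes": 6245},
        {"host": "unicomp6.unicomp.net", "responseCode": "200", "bytes": 3985},
    ])
    df.to_parquet("sample-logs.parquet")

    # Columnar layout: read back only the "host" column.
    hosts = pd.read_parquet("sample-logs.parquet", columns=["host"])
    print(hosts)

{: codeblock}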

  1. In your browser, navigate to the resource list{: external} and under Integration, click on the {{site.data.keyword.messagehub}} log-analysis-es service.
  2. Select Topics from the navigation pane on the left.
  3. Select the context menu (three vertical dots) for your topic webserver and click Create stream landing configuration. Event Streams topics{: caption="Event Streams topics" caption-side="bottom"}
  4. Click Start and select the log-analysis-cos service. Click Next.
  5. Select the bucket with the name ending with -bucket and click Next.
  6. Select the {{site.data.keyword.sqlquery_short}} log-analysis-sql service and click Next.
  7. Configure how you want your topic data to be streamed to {{site.data.keyword.cos_short}}:
    1. Prefix for objects added to {{site.data.keyword.cos_short}} bucket: logs-stream-landing
    2. Create a new Service ID : logs-stream-landing-service-id
    3. Select Key Protect as the Authorization service.
    4. Select the log-analysis-kp {{site.data.keyword.keymanagementserviceshort}} service instance.
    5. Click Start streaming data.

You now see the status Queued for your topic. It may take up to 5 minutes until the streaming job is fully dispatched and up and running. You will see the status switch to Running at that point. In the context menu, you find a new option called View stream landing configuration.

Prepare {{site.data.keyword.cloud-shell_short}}

{: #big-data-log-analytics-prepare-shell}

  1. In {{site.data.keyword.bpshort}}, select the workspace you created.

  2. In Settings, locate the Workspace ID.

  3. Open {{site.data.keyword.cloud-shell_short}}.

  4. Define WORKSPACE_ID

    WORKSPACE_ID=<your workspace id>

    {: pre}

  5. Target the region where you deployed resources:

    ibmcloud target -r us-south

    {: pre}

  6. View the outputs generated by the Apply job.

    ibmcloud schematics output -id $WORKSPACE_ID --output json | jq

    {: pre}

    Notice the outputs are small scripts. You will use them in the next steps.

  7. Retrieve all the outputs produced by the workspace in a variable.

    OUTPUTS_JSON=$(ibmcloud schematics output -id $WORKSPACE_ID --output json)

    {: pre}

Using kcat with {{site.data.keyword.messagehub}}

{: #big-data-log-analytics-kafkatools}

The streaming job is currently idle and awaiting messages. In this section, you will configure the tool kcat{: external} to work with {{site.data.keyword.messagehub}}. kcat allows you to produce arbitrary messages from the terminal and send them to {{site.data.keyword.messagehub}}. In the following steps, the Kafka message feed will be persisted in your data lake on {{site.data.keyword.cos_full_notm}}.

  1. Either install kcat{: external} on your machine or use it via Docker. The tutorial uses the Docker image.

  2. Run the following command to generate the kcat configuration file. The file includes the list of Kafka brokers and the authentication settings.

    echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .kcat_config | .value' > kcat.config

    {: pre}

  3. Run kcat

    docker run -v  ${PWD}:/bdla -w /bdla -it edenhill/kcat:1.7.0 -F kcat.config -P -t webserver

    {: pre}

    The command connects as an event producer to the topic webserver. Remember the command for later.

  4. The kcat tool is awaiting input. Copy and paste the log messages from below into the terminal. Press Enter and then CTRL-D to send the log messages to {{site.data.keyword.messagehub}}.

    { "host": "199.72.81.55", "time_stamp": "01/Jul/1995:00:00:01 -0400", "request": "GET /history/apollo/ HTTP/1.0", "responseCode": "200", "bytes": "6245" }
    { "host": "199.72.81.55", "time_stamp": "01/Jul/1995:00:00:01 -0400", "request": "GET /history/apollo/ HTTP/1.0", "responseCode": "200", "bytes": "6245" }
    { "host": "199.72.81.55", "time_stamp": "01/Jul/1995:00:00:01 -0400", "request": "GET /history/apollo/ HTTP/1.0", "responseCode": "200", "bytes": "6245" }
    { "host": "199.72.81.55", "time_stamp": "01/Jul/1995:00:00:01 -0400", "request": "GET /history/apollo/ HTTP/1.0", "responseCode": "200", "bytes": "6245" }
    { "host": "199.72.81.55", "time_stamp": "01/Jul/1995:00:00:01 -0400", "request": "GET /history/apollo/ HTTP/1.0", "responseCode": "200", "bytes": "6245" }

    {: codeblock}

  5. Repeat the previous step to generate additional events.
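If you prefer to produce events from a script rather than typing into kcat, the following Python sketch uses the confluent-kafka client to send the same kind of JSON log records. It is an optional alternative, not part of the tutorial steps; the broker list and API key are placeholders that you can copy from the bootstrap.servers and sasl.password entries in your kcat.config file.

    import json
    from confluent_kafka import Producer

    # Connection settings for Event Streams; copy the values from kcat.config.
    producer = Producer({
        "bootstrap.servers": "<BROKER_1>:9093,<BROKER_2>:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "token",
        "sasl.password": "<EVENT_STREAMS_API_KEY>",
    })

    record = {
        "host": "199.72.81.55",
        "time_stamp": "01/Jul/1995:00:00:01 -0400",
        "request": "GET /history/apollo/ HTTP/1.0",
        "responseCode": "200",
        "bytes": "6245",
    }

    # Send a few copies of the record to the webserver topic and wait for delivery.
    for _ in range(5):
        producer.produce("webserver", json.dumps(record).encode("utf-8"))
    producer.flush()

{: codeblock}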

Check the landed data

{: #big-data-log-analytics-checkmessages}

You can check the landed data in the {{site.data.keyword.sqlquery_short}} UI and also in the {{site.data.keyword.cos_short}} bucket.

  1. Navigate to the resource list{: external} and under Databases, click on log-analysis-sql service.

  2. Click on Launch {{site.data.keyword.sqlquery_short}} UI to open the {{site.data.keyword.sqlquery_short}} UI. You should see the streaming job Running.

  3. Click on the Details tab to see the actual SQL statement that was submitted to {{site.data.keyword.sqlquery_short}} for the stream landing. Notice the Result location; it will be used shortly to query the data. {{site.data.keyword.sqlquery_short}} console{: caption="{{site.data.keyword.sqlquery_short}} console" caption-side="bottom"}

    The SELECT statement looks similar to this:

    SELECT * FROM <EVENT_STREAMS_CRN>/webserver 
    STORED AS JSON EMIT cos://<REGION>/<BUCKET_NAME>/logs-stream-landing/topic=webserver 
    STORED AS PARQUET EXECUTE AS <KEY_PROTECT_CRN_WITH_KEY>

    {: codeblock}

    It is a SELECT statement reading from your {{site.data.keyword.messagehub}} instance and topic (identified by the unique CRN); the selected data is emitted (EMIT) to your {{site.data.keyword.cos_short}} bucket in PARQUET format. The operation is executed (EXECUTE) with the service ID's API key that is stored in the {{site.data.keyword.keymanagementserviceshort}} instance. {: tip}

  4. Click on the link in the Result location field, which opens the {{site.data.keyword.cos_short}} UI with a filter set to the objects that are being written by that job. {{site.data.keyword.cos_short}} object view{: caption="{{site.data.keyword.cos_short}} object view" caption-side="bottom"}

    In the {{site.data.keyword.cos_short}} UI, switch to the object view by clicking the icon next to Upload. You should see a couple of metadata objects that track, for example, the latest offset that has been consumed and landed. In addition, you can find the Parquet files with the actual payload data (a programmatic way to inspect them is sketched after this list). {: tip}

  5. Return to the {{site.data.keyword.sqlquery_short}} UI and in the Details tab click on Query the result and then click Run to execute a Batch job. You should see the query in the panel pointing to the {{site.data.keyword.cos_short}} file (under FROM) with the log message(s) you sent above. Wait for the job to change to Completed.

  6. Click on the Results tab to see the log messages in a tabular format.

    The query saves the result to a CSV file in a different bucket named sql-<SQL_QUERY_GUID>. Check the INTO part of the query. {: tip}

    If the query fails with a message like The Parquet file is empty or corrupted, send additional messages and confirm that files are created in the bucket. {: tip}
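Besides the consoles, you can also inspect the landed Parquet objects programmatically. The sketch below uses the ibm-cos-sdk (ibm_boto3) together with pandas and pyarrow; the API key, instance CRN, endpoint, bucket name, and object key are placeholders that you need to replace with the values of your own {{site.data.keyword.cos_short}} instance.

    import ibm_boto3
    import pandas as pd
    from ibm_botocore.client import Config

    # Client for IBM Cloud Object Storage; all credentials below are placeholders.
    cos = ibm_boto3.client(
        "s3",
        ibm_api_key_id="<COS_API_KEY>",
        ibm_service_instance_id="<COS_INSTANCE_CRN>",
        config=Config(signature_version="oauth"),
        endpoint_url="https://s3.us-south.cloud-object-storage.appdomain.cloud",
    )

    bucket = "<BUCKET_NAME>"
    prefix = "logs-stream-landing/topic=webserver/"

    # List the objects written by the stream landing job (payload plus metadata objects).
    for obj in cos.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
        print(obj["Key"], obj["Size"])

    # Download one of the payload (Parquet) objects and read it with pandas.
    key = "<ONE_OF_THE_PARQUET_OBJECT_KEYS>"
    cos.download_file(bucket, key, "landed.parquet")
    print(pd.read_parquet("landed.parquet").head())

{: codeblock}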

Increase message load

{: #big-data-log-analytics-streamsload}

For later analysis, increase the message volume sent to {{site.data.keyword.messagehub}}. The provided script simulates a flow of messages to {{site.data.keyword.messagehub}} based on traffic to the webserver. To demonstrate the scalability of {{site.data.keyword.messagehub}}, you will increase the throughput of log messages.

  1. Download the Jul 01 to Jul 31, ASCII format, 20.7 MB gzip compressed log file from NASA:

    curl ftp://ita.ee.lbl.gov/traces/NASA_access_log_Jul95.gz -o NASA_access_log_Jul95.gz

    {: pre}

  2. Turn the access logs into JSON format by running the following awk one-liner (a commented Python equivalent is sketched after this list):

    gunzip -c NASA_access_log_Jul95.gz | awk -F " " '{ print "{\"host\":\"" $1 "\",\"time_stamp\":\"" $4 " "  $5 "\",\"request\":" $6 " " $7 " " $8 ",\"responseCode\":\"" $9 "\",\"bytes\":\"" $10 "\"}" }' > NASA_logs.json

    {: pre}

  3. Create a shell script that sends only a few log lines per second. Create a new file named rate_limit.sh and copy the following into it:

    #!/bin/bash
    # Echo INPUT_FILE to stdout in chunks of NUM_LINES lines,
    # sleeping WAIT_SECONDS between chunks to limit the message rate.
    if [ -z "$1" ]; then
       echo "usage: $0 filename lines wait"
       exit 1
    fi
    INPUT_FILE=$1
    NUM_LINES=$2
    WAIT_SECONDS=$3
    COUNTER=0
    trap 'exit 130' INT
    while read -u3 input_text rest; do
       echo $input_text $rest
       ((COUNTER++))
       if (( COUNTER == $NUM_LINES )); then
          sleep $WAIT_SECONDS
          COUNTER=0
       fi
    done 3< "$INPUT_FILE"

    {: pre}

    The script accepts a file name, the number of lines to output per chunk, and the number of seconds to wait between chunks.

  4. Make the script executable:

    chmod +x rate_limit.sh

    {: pre}

  5. Run the following command to stream lines from the access log to {{site.data.keyword.messagehub}}. It uses the converted log file from above, sends 10 lines at a time, and waits 1 second before sending the next chunk:

    ./rate_limit.sh NASA_logs.json 10 1 | docker run -v  ${PWD}:/bdla -w /bdla -i edenhill/kcat:1.7.0 -F kcat.config -P -t webserver

    {: pre}

  6. The configuration above pushes about 10 lines per second. Stop the script with CTRL-C after the desired number of messages has been streamed.

  7. In your browser, return to the {{site.data.keyword.sqlquery_short}} UI and the Details tab. There, click on Query the result and then click Run to see some received messages under the Results tab of the batch job.

  8. You can experiment with {{site.data.keyword.messagehub}} by increasing or decreasing the number of lines value.
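For reference, here is a commented Python equivalent of the awk one-liner from step 2. It is a sketch, assuming the compressed NASA log file from step 1 sits in the current directory; the field positions mirror the awk command.

    import gzip
    import json

    # Convert the NASA access log into one JSON record per line (same layout as the awk one-liner).
    with gzip.open("NASA_access_log_Jul95.gz", "rt", errors="replace") as logfile, \
            open("NASA_logs.json", "w") as out:
        for line in logfile:
            fields = line.split(" ")
            if len(fields) < 10:
                continue  # skip lines that do not match the expected layout
            record = {
                "host": fields[0],
                "time_stamp": fields[3] + " " + fields[4],       # e.g. [01/Jul/1995:00:00:01 -0400]
                "request": " ".join(fields[5:8]).strip('"'),     # method, path, and protocol
                "responseCode": fields[8],
                "bytes": fields[9].strip(),
            }
            out.write(json.dumps(record) + "\n")

{: codeblock}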

Investigating log data using {{site.data.keyword.sqlquery_short}}

{: #big-data-log-analytics-sqlquery} {: step}

Depending on how long you ran the transfer, the number of files on {{site.data.keyword.cos_short}} has certainly grown. You will now act as an investigator answering audit or compliance questions by combining {{site.data.keyword.sqlquery_short}} with your log file. The benefit of using {{site.data.keyword.sqlquery_short}} is that the log file is directly accessible - no additional transformations or database servers are necessary.

  1. Back in the {{site.data.keyword.sqlquery_short}} UI, open the query that the streaming job uses.

    • Click the drop-down to select Streaming jobs (as opposed to Batch jobs).
    • Open the Details tab and click Query the result.
    • Notice that the query editor above is populated with a query.
    • Notice that the FROM clause does not reference a specific Parquet object in the bucket but the job ID, which means all objects written by the job are included:
      cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID>
    • Remove the INTO clause to display the output without storing the results.
      INTO cos://<Region>/sql-<ID>/result/ STORED AS CSV
    • Run the query.
    • When the job is complete, observe the results in the Result tab. They include the first 50 messages you sent.
    • Now let's do some investigation by modifying this basic query.
  2. In the {{site.data.keyword.sqlquery_short}} UI, edit the SQL in the text area to look more like the following, keeping the FROM clause as is.

    -- What are the top 10 web pages on NASA from July 1995?
    -- Which mission might be significant?
    SELECT REQUEST, COUNT(REQUEST)
    FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
    WHERE REQUEST LIKE '%.htm%'
    GROUP BY REQUEST
    ORDER BY 2 DESC
    LIMIT 10

    {: codeblock}

  3. Update the FROM clause with your Object SQL URL and click Run.

  4. Click on the latest Completed job to see the result under the Result tab.

  5. Select the Details tab to view additional information such as the location where the result was stored on {{site.data.keyword.cos_short}}.

  6. Try the following question and answer pairs by adding them individually to the Type SQL here ... text area.

    -- Who are the top 5 viewers?
    SELECT HOST, COUNT(*)
    FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
    GROUP BY HOST
    ORDER BY 2 DESC
    LIMIT 5

    {: codeblock}

    -- Which viewer has suspicious activity based on application failures?
    SELECT HOST, COUNT(*)
    FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
    WHERE `responseCode` == 500
    GROUP BY HOST
    ORDER BY 2 DESC

    {: codeblock}

    -- Which requests showed a page not found error to the user?
    SELECT DISTINCT REQUEST
    FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
    WHERE `responseCode` == 404

    {: codeblock}

    -- What are the top 10 largest files?
    SELECT DISTINCT REQUEST, BYTES
    FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
    WHERE BYTES > 0
    ORDER BY CAST(BYTES as Integer) DESC
    LIMIT 10

    {: codeblock}

    -- What is the distribution of total traffic by hour?
    SELECT SUBSTRING(TIME_STAMP, 13, 2), COUNT(*)
    FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
    GROUP BY 1
    ORDER BY 1 ASC

    {: codeblock}

    -- Why did the previous result return an empty hour?
    -- Hint, find the malformed hostname.
    SELECT HOST, REQUEST
    FROM cos://<REGION>/<BUCKET-NAME>/logs-stream-landing/topic=webserver/jobid=<JOBID> STORED AS PARQUET
    WHERE SUBSTRING(TIME_STAMP, 13, 2) == ''

    {: codeblock}

Investigating data using {{site.data.keyword.iae_short}} serverless instances

{: #big-data-log-analytics-5} {: step}

The data stream landed to {{site.data.keyword.cos_short}} can also be queried using Apache Spark, which is part of the {{site.data.keyword.iae_short}} serverless instance. Programs need to be loaded into a bucket and submitted from the command line. The PySpark Python environment is used in this tutorial.
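The applications used in the next steps were uploaded to the bucket for you by {{site.data.keyword.bpshort}}. As an illustration of what such a PySpark program can look like, here is a sketch that reads the landed Parquet data and runs one of the earlier queries; the Stocator service alias (myCos), endpoint, API key, bucket name, and job ID are placeholders for this sketch.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("log-analysis")
        # Stocator (cos://) connector settings; the alias, endpoint, and key are placeholders.
        .config("spark.hadoop.fs.cos.myCos.endpoint",
                "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud")
        .config("spark.hadoop.fs.cos.myCos.iam.api.key", "<COS_API_KEY>")
        .getOrCreate()
    )

    # All Parquet objects written by the stream landing job share this prefix.
    path = "cos://<BUCKET_NAME>.myCos/logs-stream-landing/topic=webserver/jobid=<JOBID>"
    logs = spark.read.parquet(path)
    logs.createOrReplaceTempView("weblogs")

    # Same question as in the SQL section: the top 10 requested pages.
    spark.sql("""
        SELECT request, COUNT(*) AS hits
        FROM weblogs
        WHERE request LIKE '%.htm%'
        GROUP BY request
        ORDER BY hits DESC
        LIMIT 10
    """).show(truncate=False)

{: codeblock}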

Submit a simple job to the Spark runtime

{: #big-data-log-analytics-simple-spark}

  1. In case your {{site.data.keyword.cloud-shell_short}} connection has expired, reconnect the session and restore the environment variables:

    WORKSPACE_ID=<your workspace id>
    OUTPUTS_JSON=$(ibmcloud schematics output -id $WORKSPACE_ID --output json)

    {: pre}

  2. Set the environment variables needed for the next steps:

    eval "$(echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_01_env_variables | .value')"

    {: pre}

  3. Run a program that is built into the IBM spark runtime to print some output:

    eval "$(echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_02_run_word_count | .value')"

    {: pre}

    To view the command before it is executed, run echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_02_run_word_count | .value' {: tip}

  4. Navigate to the resource list{: external} and under Analytics, click on log-analysis-iae service.

  5. Select the Applications tab, click refresh. Notice the submitted job. Eventually it gets marked as Finished.

View the simple job output

{: #big-data-log-analytics-view-spark-log}

  1. Navigate to the Logging{: external} page and look for the existing {{site.data.keyword.la_short}} service in the region with Platform logs enabled.

  2. Click Open dashboard.

  3. In a few minutes you should see the logs associated with the program. Search for host:ibmanalyticsengine. There will be a lot of output. Look for:

    Michael,: 1
    29: 1
    Andy,: 1
    30: 1
    Justin,: 1
    19: 1
    

    {: codeblock}

Run hello world

{: #big-data-log-analytics-hello-world}

During provisioning, {{site.data.keyword.bpshort}} created a simple hello.py file in the {{site.data.keyword.cos_short}} bucket.
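The exact content of hello.py is generated for you; a minimal PySpark application of this kind might look like the following sketch.

    from pyspark.sql import SparkSession

    # Minimal PySpark application: start a session, print a message, and stop.
    spark = SparkSession.builder.appName("hello-world").getOrCreate()
    print("hello world")
    spark.stop()

{: codeblock}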

  1. Run the hello.py script as an application in the IBM spark runtime:

    eval "$(echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_03_run_hello_world | .value')"

    {: pre}

    To view the command before it is executed, run echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_03_run_hello_world | .value' {: tip}

  2. In the {{site.data.keyword.iae_short}} instance, on the Applications tab, click refresh to see the status of the application. Wait for it to finish.

  3. Look for hello world in the Platform Logs.

Run an application going through the files created by the stream landing configuration

{: #big-data-log-analytics-cos-app}

The final step is to submit a Spark application that accesses the data in the same bucket. The solution.py Python script has already been created by {{site.data.keyword.bpshort}} in the {{site.data.keyword.cos_short}} bucket.

The script requires you to provide the ID of the {{site.data.keyword.sqlquery_short}} job used by stream landing.

  1. Navigate to the resource list{: external} and under Storage, click on log-analysis-cos service.

  2. Select the bucket ending with -bucket

  3. Click the second object.

  4. In the side panel, the full object name is visible with a format similar to logs-stream-landing/topic=webserver/jobid=123456-aaa-4444-bbbb-08f3d1626c46. Write down the job ID. In this example the job ID is 123456-aaa-4444-bbbb-08f3d1626c46

  5. In {{site.data.keyword.cloud-shell_short}}, define a JOB_ID variable:

    JOB_ID=<the value you got from the bucket object>

    {: pre}

  6. Run the solution.py application

    eval "$(echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_04_run_solution | .value')"

    {: pre}

    To view the command before it is executed, run echo $OUTPUTS_JSON | jq -r '.[] | .output_values | .[] | .iae_04_run_solution | .value' {: tip}

  7. In the {{site.data.keyword.iae_short}} instance, on the Applications tab, click refresh to see the status of the application. Wait for it to finish.

  8. Look for the application logs in the Platform Logs.

    +---------------------------+-----+--------------------+--------------------+------------+--------------------+
    |_corrupt_or_schema_mismatch|bytes|                host|             request|responseCode|          time_stamp|
    +---------------------------+-----+--------------------+--------------------+------------+--------------------+
    |                       null| 9867|      ntigate.nt.com|GET /software/win...|         200|[01/Jul/1995:04:1...|
    |                       null| 7634|piweba3y.prodigy.com|GET /shuttle/miss...|         200|[01/Jul/1995:04:1...|
    |                       null|25218|      ntigate.nt.com|GET /software/win...|         200|[01/Jul/1995:04:1...|
    |                       null| 4441|      ntigate.nt.com|GET /software/win...|         200|[01/Jul/1995:04:1...|
    |                       null| 1414|      ntigate.nt.com|GET /images/const...|         200|[01/Jul/1995:04:1...|
    |                       null|45308|line03.pm1.abb.mi...|GET /shuttle/miss...|         200|[01/Jul/1995:04:1...|
    |                       null|  669|  source.iconz.co.nz|GET /images/WORLD...|         200|[01/Jul/1995:04:1...|
    |                       null|  234|  source.iconz.co.nz|GET /images/USA-l...|         200|[01/Jul/1995:04:1...|
    |                       null|  363|  source.iconz.co.nz|GET /images/MOSAI...|         200|[01/Jul/1995:04:1...|
    |                       null|13372|      ntigate.nt.com|GET /software/win...|         200|[01/Jul/1995:04:1...|
    +---------------------------+-----+--------------------+--------------------+------------+--------------------+
    

    {: codeblock}
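The solution.py script itself is provided in the bucket and not listed in this tutorial. Conceptually, an application that produces output like the one above could take the stream landing job ID as its first argument and read the matching Parquet partition; the sketch below illustrates that idea with a placeholder bucket name and omits the {{site.data.keyword.cos_short}} connector configuration shown earlier.

    import sys
    from pyspark.sql import SparkSession

    # Hypothetical sketch: the stream landing job ID is passed as the first argument
    # and used to build the Parquet path (COS connector settings omitted here).
    job_id = sys.argv[1]
    bucket = "<BUCKET_NAME>"  # placeholder

    spark = SparkSession.builder.appName("log-analysis-solution").getOrCreate()
    path = f"cos://{bucket}.myCos/logs-stream-landing/topic=webserver/jobid={job_id}"
    spark.read.parquet(path).show(10)
    spark.stop()

{: codeblock}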

Expand the tutorial

{: #big-data-log-analytics-expand}

Congratulations, you have built a log analysis pipeline with {{site.data.keyword.cloud_notm}}. Follow the Build a data lake using {{site.data.keyword.cos_short}} tutorial to add a dashboard to log data.

Remove services

{: #big-data-log-analytics-removal} {: step}

  1. From the Resource List{: external}, select the log-analysis-kp* service instance.
  2. Delete the key with a name starting with streaming-job. This key was created outside of {{site.data.keyword.bpshort}} and should be deleted first.
  3. Go to {{site.data.keyword.bpshort}} and select your workspace.
  4. Under Actions, select Destroy resources.
  5. Wait for {{site.data.keyword.bpshort}} to destroy all resources.
  6. Delete the workspace.
  7. Navigate to Manage > Access (IAM) > Service IDs{: external} in the {{site.data.keyword.cloud_notm}} console and remove the logs-stream-landing-service-id service ID.

Depending on the resource it might not be deleted immediately, but retained (by default for 7 days). You can reclaim the resource by deleting it permanently or restore it within the retention period. See this document on how to use resource reclamation. {: tip}

Related content

{: #big-data-log-analytics-8} {: related}