Virtual Machine README

Phil Hagen edited this page Oct 14, 2024 · 7 revisions

Background

This page contains details for the SOF-ELK® (Security Operations and Forensics Elasticsearch, Logstash, Kibana) VM. The VM is provided as a community resource but is covered at varying depths in the following SANS courses:

The VM is also used in the following courseware and/or other instructional materials:

All parsers and dashboards for this VM are now maintained in the project's GitHub repository.

Download

The latest version of the VM itself is always available at https://for572.com/sof-elk-vm.

Latest Distribution Vitals

  • Basic details on the distribution
    • VM is a CentOS 7.9 base with all OS updates as of 2024-10-11
    • Includes Elastic stack components v8.15.2
    • Configuration files are from the "public/v20241011" branch of this GitHub repository
  • Metadata
    • Filename and size: Public SOF-ELK v20241011.7z (2,456,956,798 bytes)
    • MD5: 505b4f71ccfee6d97186bd29d7247625
    • SHA256: a7ea2505d8396dde46110085d29bb079f6af33db6d348853b183031a6cce263f
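A downloaded archive can be checked against the size and hashes listed above before extraction. The following is a minimal sketch, not part of the SOF-ELK® distribution; the function names `file_digests` and `matches_published` are hypothetical, and the three constants are copied from the metadata list above.

```python
import hashlib

# Published values copied from the "Metadata" list above.
EXPECTED_SIZE = 2_456_956_798
EXPECTED_MD5 = "505b4f71ccfee6d97186bd29d7247625"
EXPECTED_SHA256 = "a7ea2505d8396dde46110085d29bb079f6af33db6d348853b183031a6cce263f"


def file_digests(path, chunk_size=1024 * 1024):
    """Return (size, md5_hex, sha256_hex), reading the file in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for a multi-gigabyte archive.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
            size += len(chunk)
    return size, md5.hexdigest(), sha256.hexdigest()


def matches_published(path):
    """True only if the size and both digests match the published values."""
    return file_digests(path) == (EXPECTED_SIZE, EXPECTED_MD5, EXPECTED_SHA256)
```

Calling `matches_published("Public SOF-ELK v20241011.7z")` from the download directory (an assumed location) returns `True` only when all three values agree.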

General Information

  • The VM was created with VMware Fusion v13.6.0 and ships with virtual hardware v18.

    • If you're using an older version of VMware Workstation/Fusion/Player, you will likely need to convert the VM back to a previous version of the hardware.
    • Some VMware software provides this function via the GUI, or you may find the free "VMware vCenter Converter" tool helpful.
  • The VM is deployed with the "NAT" network mode enabled

  • Credentials:

    • username: elk_user
      • password: forensics
      • has sudo access to run ALL commands
  • Logstash will ingest all files from the following filesystem locations:

    • /logstash/aws/: JSON-formatted Amazon Web Services CloudTrail log files. Use the included aws-cloudtrail2sof-elk.py loader script.
    • /logstash/azure/: JSON-formatted Microsoft Azure logs. At this time, the following log types are supported: Event Logs, Sign In Logs, Audit Logs, Admin Activity Logs, and Storage Logs.
    • /logstash/gcp/: JSON-formatted Google Cloud Platform logs.
    • /logstash/gws/: JSON-formatted Google Workspace logs extracted using the Google Workspace API.
    • /logstash/httpd/: Apache logs in common, combined, or vhost-combined formats
    • /logstash/kape/: JSON-format files generated by the KAPE triage collection tool. (See this document for details on which specific output files are currently supported and their required file naming structure.)
    • /logstash/kubernetes/: Kubernetes log files.
    • /logstash/microsoft365/: JSON-formatted Microsoft 365 logs only.
    • /logstash/nfarch/: Archived NetFlow output, formatted as described below.
    • /logstash/passivedns/: Logs from the passivedns utility.
    • /logstash/plaso/: CSV bodyfile-format files generated by the Plaso tool from the log2timeline framework. (See this document for details on creating CSV files in a supported format.)
    • /logstash/syslog/: Syslog-formatted data
      • NOTICE: Remember that syslog DOES NOT reflect the year of a log entry! Therefore, Logstash has been configured to look for a year value in the path to a file. For example: /logstash/syslog/2015/var/log/messages will assign all entries from that file to the year 2015. If no year is present, the current year will be assumed. This is enabled only for the /logstash/syslog/ directory.
    • /logstash/zeek/: JSON-formatted logs from the Zeek Network Security Monitoring platform. These must be in decompressed form. The following Zeek logs are supported:
      • conn.log: Treated like NetFlow and stored in the netflow-* indices.
      • dns.log: Treated like other DNS logs and stored in the logstash-* indices.
      • http.log: Treated like other HTTP logs and stored in the httpdlog-* indices.
      • The following logs are stored in the zeek-* indices:
        • files.log
        • ftp.log
        • notice.log
        • ssl.log
        • weird.log
        • x509.log
  • Commands to be familiar with:

    • /usr/local/sbin/sof-elk_clear.py: DESTROY contents of the Elasticsearch database. Most frequently used with an index name base (e.g. sof-elk_clear.py -i logstash deletes all data from the Elasticsearch logstash-* indices). Other options are detailed with the -h flag.
    • /usr/local/sbin/sof-elk_update.sh: Update the SOF-ELK® configuration files from the GitHub repository. (Requires sudo.)
  • Files to be familiar with:

    • /etc/logstash/conf.d/*.conf: Symlinks to GitHub-based configuration files that handle input, preprocessing, parsing, postprocessing, and output of log events.
    • /usr/local/sof-elk/*: Clone of the project GitHub repository, with the public/v* branch corresponding to the virtual machine's release version.
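The year-in-path rule described above for /logstash/syslog/ can be illustrated with a short sketch. This is not the Logstash configuration that ships with the VM; it only models the documented behavior, and it assumes the year appears as a standalone four-digit directory name beginning with 19 or 20. The function name `year_for_syslog_path` is hypothetical.

```python
import re
from datetime import date

# Assumption: the year is a whole path component such as "2015".
YEAR_COMPONENT = re.compile(r"(?:19|20)\d{2}")


def year_for_syslog_path(path, default_year=None):
    """Return the year implied by a file path, else a fallback year."""
    for component in path.split("/"):
        if YEAR_COMPONENT.fullmatch(component):
            return int(component)
    # No year in the path: fall back to the current year, as documented.
    return default_year if default_year is not None else date.today().year


print(year_for_syslog_path("/logstash/syslog/2015/var/log/messages"))
```

Placing evidence under a year directory before loading it avoids having old syslog entries assigned to the current year.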

How to Use

  • Extract the compressed archive to your host system
  • Open and boot the VM
  • Log into the VM with the elk_user credentials (see above)
    • Logging in via SSH is recommended. If you use the console login with a non-US keyboard, run sudo loadkeys uk, replacing uk as needed for your local keyboard mapping
  • cd to one of the /logstash/*/ directories as appropriate
  • Place files in this location. (Mind the above warning about the year for syslog data. Files must also be readable by the "logstash" user.)
  • Open the main Kibana dashboard using the Kibana URL shown in the pre-authentication screen, http://<ip_address>:5601
    • This dashboard gives a basic overview of what data has been loaded and how many records are present
    • There are links to several stock dashboards on the left-hand side
  • Wait for Logstash to parse the input files, load the appropriate dashboard URL, and start interacting with your data
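The step of placing files where the "logstash" user can read them can be sketched as follows. This is an illustrative helper, not a tool included with the VM; `stage_for_logstash` is a hypothetical name, and making the copy world-readable is only one assumed way to satisfy the readability requirement (any parent directories must also be traversable).

```python
import os
import shutil
import stat
from pathlib import Path


def stage_for_logstash(source, dest_dir):
    """Copy source into dest_dir and add read permission for all users."""
    target = Path(dest_dir) / Path(source).name
    shutil.copyfile(source, target)  # copies contents only, not permissions
    mode = target.stat().st_mode
    # Add read bits for owner, group, and others; keep the other bits as-is.
    os.chmod(target, mode | stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

On the VM this might be called as `stage_for_logstash("access_log", "/logstash/httpd/")`, where the source file name is a placeholder.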

Ingesting Archived NetFlow

  • To ingest existing nfcapd-created NetFlow evidence, it must be parsed into a specific format. The included nfdump2sof-elk.sh script will take care of this.
    • Read from single file: nfdump2sof-elk.sh -r /path/to/netflow/nfcapd.201703190000 -w /logstash/nfarch/inputfile_1.txt
    • Read recursively from directory: nfdump2sof-elk.sh -r /path/to/netflow/ -w /logstash/nfarch/inputfile_2.txt
    • Optionally, you can specify the IP address of the exporter that created the flow data: nfdump2sof-elk.sh -e 10.3.58.1 -r /path/to/netflow/ -w /logstash/nfarch/inputfile_3.txt
  • To ingest existing AWS VPC Flow data files in JSON format, use the included aws-vpcflow2sof-elk.sh script.
    • Read recursively from directory: aws-vpcflow2sof-elk.sh -r /path/to/aws-vpcflow/ -w /logstash/nfarch/aws-vpcflow_1.txt
  • To ingest existing Azure VPC Flow data files in JSON format, use the included azure-vpcflow2sof-elk.py script.
    • Read from single file: azure-vpcflow2sof-elk.py -r /path/to/azure-vpcflow/file1.json -w /logstash/nfarch/azure-vpcflow_1.txt
    • Read recursively from directory: azure-vpcflow2sof-elk.py -r /path/to/azure-vpcflow/ -w /logstash/nfarch/azure-vpcflow_2.txt
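When many NetFlow collections need converting, the nfdump2sof-elk.sh invocations shown above can be assembled programmatically. The sketch below only builds the argument list from the documented -e, -r, and -w flags; the helper name `nfdump2sof_elk_command` is hypothetical, and no other flags are assumed.

```python
import shlex


def nfdump2sof_elk_command(source, output, exporter_ip=None):
    """Build the argument list for nfdump2sof-elk.sh from documented flags."""
    command = ["nfdump2sof-elk.sh"]
    if exporter_ip is not None:
        command += ["-e", exporter_ip]  # optional exporter IP address
    command += ["-r", source, "-w", output]
    return command


argv = nfdump2sof_elk_command(
    "/path/to/netflow/", "/logstash/nfarch/inputfile_3.txt", exporter_ip="10.3.58.1"
)
print(shlex.join(argv))
```

On the VM, the resulting list could be passed to `subprocess.run(argv, check=True)`. Passing a list rather than a shell string avoids quoting problems with unusual path names.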

Sample Data Included

  • Syslog data in ~elk_user/lab-2.3_source_evidence/
    • Unzip each of these files into the /logstash/syslog/ directory, such as: unzip -d /logstash/syslog/ ~elk_user/lab-2.3_source_evidence/<file>
    • Use the time frame 2013-06-08 15:00:00 to 2013-06-08 23:30:00 to examine this data.
  • NetFlow data in ~elk_user/lab-3.1_source_evidence/
    • Use the nfdump2sof-elk.sh script and write output to the /logstash/nfarch/ directory, such as: cd /home/elk_user/lab-3.1_source_evidence/ ; nfdump2sof-elk.sh -e 10.3.58.1 -r ~elk_user/lab-3.1_source_evidence/netflow/ -w /logstash/nfarch/lab-3.1_netflow.txt
    • Use the time frame 2012-04-02 to 2012-04-07 to examine this data.
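Unzipping each sample syslog archive, as described above, can also be scripted. This sketch assumes the files in ~elk_user/lab-2.3_source_evidence/ are standard zip archives with a .zip extension, which the source does not state explicitly; `extract_all_zips` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_all_zips(source_dir, dest_dir):
    """Extract every *.zip in source_dir into dest_dir; return names processed."""
    processed = []
    for archive in sorted(Path(source_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Directory structure inside the archive is preserved, so any
            # year directory in the archive paths stays intact.
            zf.extractall(dest_dir)
        processed.append(archive.name)
    return processed
```

On the VM this might be called as `extract_all_zips(Path.home() / "lab-2.3_source_evidence", "/logstash/syslog/")` while logged in as elk_user.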

Credits and Special Thanks

This project continues in part due to the amazing support from a range of people in the security industry. The valuable and vital contributions from those who have committed content, filed issues, and submitted pull requests are reflected in the GitHub interface for those respective functions. Support from others is just as critical, including generating or providing sample data to test new features, documentation input, and more. The following is not an exhaustive list, but the efforts of the information security community are always an important factor in the success of any open source project.

  • Derek B: Cisco ASA parsing/debugging and sample data
  • Barry A: Sample data and troubleshooting
  • Ryan Johnson: Testing
  • Matt Bromiley: Testing
  • Mike Pilkington: Testing
  • Mark Hallman: Testing
  • David Szili: Testing and troubleshooting
  • Pierre Lidome: Microsoft 365 assistance, test data, and overall testing of the cloud data parsers
  • Josh Lemon: GCP assistance
  • David Cowen: AWS assistance
  • Megan Roddie: Testing
  • Bedang Sen: Documentation regarding building new parsers
  • Tony Knutson: Sample data for the KAPE, Snare, and Plaso parsers; Overall testing

Admin/Legal

  • This virtual appliance is provided "as is" with no express or implied warranty for accuracy or accessibility. No support for the functionality the VM provides is offered outside of this document.
  • This virtual appliance includes GeoLite2 data created by MaxMind, available from https://www.maxmind.com and subject to the GeoLite2 EULA. The included GeoIP database files are from December 17, 2019 and are covered by the previous MaxMind license that permitted redistribution of these files without collecting user contact information. Installation of updated GeoIP databases should be accomplished by the included /usr/local/sbin/geoip_bootstrap.sh script. This script also optionally enables scheduled automatic updates to the databases for Internet-connected systems. You can learn more about the GeoLite2 databases, and sign up for a free MaxMind account, on the MaxMind website.
  • SOF-ELK® is a registered trademark of Lewes Technology Consulting, LLC. Content is copyrighted by its respective contributors. SOF-ELK logo is a wholly owned property of Lewes Technology Consulting, LLC and is used by permission.