Nicolas Toussaint edited this page Aug 24, 2020 · 80 revisions

Welcome to the GSoC_2020_FOSSology (Fossdash) wiki!

Introduction

FOSSology generates a large set of data that is exported to the time-series database InfluxDB and visualized with Grafana. I wrote a fossdash_publisher script that collects useful metrics from the FOSSology database (PostgreSQL) and pushes them to InfluxDB, and developed a visualization dashboard in Grafana that uses InfluxDB as its data source.

This project is divided into two parts:

  1. Generating meaningful data from the FOSSology DB and publishing it to InfluxDB ( a time-series database ) using the fossdash-publisher script.
  2. Querying that InfluxDB data from Grafana and presenting it with meaningful charts and graphs.
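The publish step of part 1 can be sketched in a few lines: build InfluxDB line-protocol points and POST them to the database's /write endpoint. This is a minimal sketch, not the actual fossdash-publish.py code; the function names and example tag values are illustrative.

```python
import time
import urllib.request

def to_line(measurement, tags, value, ts_ns=None):
    """Render one metric in InfluxDB line protocol:
    measurement,tag1=v1,tag2=v2 value=<n> <timestamp_ns>"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    ts_ns = ts_ns if ts_ns is not None else time.time_ns()
    return f"{measurement},{tag_str} value={value} {ts_ns}"

def push(write_url, lines):
    """POST newline-separated points to the InfluxDB /write endpoint."""
    body = "\n".join(lines).encode("utf-8")
    req = urllib.request.Request(write_url, data=body, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status  # InfluxDB answers 204 No Content on success

line = to_line("agents_count",
               {"instance": "my-fossology", "type": "copyright"},
               1, ts_ns=1591246503000000000)
# push("http://localhost:8086/write?db=fossology_db", [line])
```

The timestamp is in nanoseconds, which is InfluxDB's default write precision.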

Architecture

(Architecture diagram)

Configuration - To Get Started with Fossdash

  • Install fossology - you can find more information in the fossology wiki: https://github.com/fossology/fossology/wiki
  • Install the fossdash dependencies by running the script install/fossdash/fossdash_dep_install.sh
  • Fossdash is configured from the fossology sysconfig UI page, by going to Admin->Fossdash.
  • Enable/Disable fossdash from the same sysconfig page.
  • Set the InfluxDB server URL (FossDash Endpoint URL): all collected data metrics are pushed to the specified InfluxDB URL.
    • The URL combines the InfluxDB base URL and the database name.
    • If running a local installation of fossology: http://localhost:8086/write?db=fossology_db
    • If running a Docker instance of fossology: http://influxdb:8086/write?db=fossology_db
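Composing that endpoint URL is mechanical; a tiny sketch (the function name is illustrative):

```python
def influx_write_url(base, database):
    """Compose the FossDash endpoint URL: <influx base URL>/write?db=<database>."""
    return f"{base.rstrip('/')}/write?db={database}"

# Local installation vs. Docker instance differ only in the base URL:
local_url = influx_write_url("http://localhost:8086", "fossology_db")
docker_url = influx_write_url("http://influxdb:8086", "fossology_db")
```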
  • Fossology_instance_name: set the fossology instance name; leave it empty to use an autogenerated UUID value.
    • If you leave this field empty, a default UUID is used automatically (e.g. 569ef786-4182-4b8d-bbf4-bbe055cfc3f3 )
    • Alternatively, you can configure your own unique fossology instance name.
  • Cron job configuration: the fossdash-publisher script is triggered to run at the interval specified by the cron expression.
    • Every minute: * * * * *
    • Schedule a cron to execute at 1am daily: 0 1 * * *
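The cron expressions above become one crontab line per installation; a minimal sketch, where the preset names and the script path are assumptions for illustration, not the real fossdash defaults:

```python
# Hypothetical mapping from a friendly interval name to the crontab schedule
# the fossdash publisher would be registered with.
CRON_PRESETS = {
    "every_minute": "* * * * *",   # every minute
    "daily_1am":    "0 1 * * *",   # 1am daily
    "hourly":       "0 * * * *",   # top of every hour
}

def fossdash_cron_line(interval,
                       script="/usr/local/share/fossology/fossdash/fossdash-publish.py"):
    """Build a crontab entry: '<schedule> <command>'. The script path here
    is a placeholder, not the verified install location."""
    return f"{CRON_PRESETS[interval]} {script}"
```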
  • Fossdash reported-files cleaning: number of days for which successfully pushed metric files are archived. Older files are deleted. Leave empty to disable cleanup.
    • This saves disk space by removing old reported files.
    • Set it to zero to delete all reported files.
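The cleanup behaves roughly like `find <dir> -maxdepth 1 -ctime +N -delete`. A sketch of the age-based deletion, assuming (hypothetically) that reported files carry a `.reported` suffix:

```python
import time
from pathlib import Path

def clean_reported_files(directory, max_age_days):
    """Delete successfully-pushed metric files older than max_age_days.
    With max_age_days == 0 effectively every reported file is removed.
    The '*.reported' naming is an assumption for this sketch."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for f in Path(directory).glob("*.reported"):
        if f.stat().st_mtime < cutoff:   # older than the retention window
            f.unlink()
            removed.append(f.name)
    return removed
```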
  • Auth_type for InfluxDB: you can choose either username/password-based authentication or token-based authentication to push data to InfluxDB.
    • username_password: ask the InfluxDB admin to create a username and password for you with access to the fossology_db database.
      • To test username/password auth: curl -XPOST "localhost:8086/query?db=fossology_db&u=admin&p=admin" --data-urlencode 'q=show MEASUREMENTS'
    • token_based: generate a JWT token from your InfluxDB username, the shared secret key of InfluxDB (taken from the InfluxDB config file), and an expiration timestamp.
    • Steps to generate an InfluxDB token
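The token is a standard HS256 JWT built from exactly those three inputs. A stdlib-only sketch (InfluxDB 1.x accepts such tokens as `Authorization: Bearer <token>` when a shared-secret is configured); treat the claim names here as the commonly documented ones, not verified against your InfluxDB version:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> bytes:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def influxdb_jwt(username, shared_secret, lifetime_s=3600):
    """Build an HS256 JWT: payload carries the InfluxDB username and an
    expiry timestamp, signed with the shared secret from the config file."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(
        {"username": username, "exp": int(time.time()) + lifetime_s}).encode())
    signing_input = header + b"." + payload
    sig = b64url(hmac.new(shared_secret.encode(),
                          signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

# token = influxdb_jwt("admin", "my-shared-secret")
# Then send:  Authorization: Bearer <token>
```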
  • Fossdash metric-reporting config: you can include or exclude the fossdash metrics displayed in the dashboard using this configuration.
    • If left empty, the default metric config file is used: install/fossdash/fossdash_metrics.yml
    • To add a new metric field to the dashboard:
      • Add the new metric name to the QUERIES_NAME list:

          QUERIES_NAME: [ ... , "number_of_users" ]

      • Add the same query name and its SQL query for fetching the data from Postgres (the fossology DB):

          QUERY:
            ...
            number_of_users: "SELECT count(u.*) AS users FROM users u;"
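The publisher then runs every query that appears in both sections. A sketch of that selection step, using a plain dict in place of the parsed YAML (the agents_count SQL shown is a made-up placeholder; only the number_of_users query comes from the example above):

```python
# Mirror of the QUERIES_NAME / QUERY layout of fossdash_metrics.yml.
CONFIG = {
    "QUERIES_NAME": ["agents_count", "number_of_users"],
    "QUERY": {
        # placeholder SQL for illustration:
        "agents_count":    "SELECT agent_name, count(*) FROM agent GROUP BY agent_name;",
        # taken from the example in the text:
        "number_of_users": "SELECT count(u.*) AS users FROM users u;",
    },
}

def enabled_queries(config):
    """Return (name, sql) pairs for every metric listed in QUERIES_NAME
    that also has a matching SQL statement under QUERY."""
    return [(name, config["QUERY"][name])
            for name in config["QUERIES_NAME"]
            if name in config["QUERY"]]
```

A metric listed in QUERIES_NAME without a matching QUERY entry is simply skipped.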

Fossdash - Grafana Dashboard

  • Clone this repository: https://github.com/Orange-OpenSource/fossdash
  • Configure the environment variables for Grafana and InfluxDB
    • SERVICE_URL_ROOT=http://localhost
    • INFLUXDB_ADMIN_USER=admin
    • INFLUXDB_ADMIN_PASSWORD=admin
    • etc...
  • Once you are done with the configuration, you can run it as a Docker instance: docker-compose up -d
  • Browse to http://localhost:8081/grafana
    • There are two dashboards:
      • Instances-Specific_FOSSY_DASH: choose a fossology instance from the drop-down to get all stats related to the selected instance.
      • FOSSY_DASH: a generic dashboard that gives stats about all fossology instances.
    • You can change the time range to get statistics between two points in time (absolute time range, top-right corner).

Basepath and logs

  • Reported and unreported metric files are stored in /srv/fossology/repository/fossdash
  • The fossdash log is written to /srv/fossology/repository/fossdash/fossdash.log
  • Fossdash metrics config: /usr/local/etc/fossology/fossdash_metrics.yml

Codebase

  • Added fossdash configuration.
    • src/lib/php/fossdash-config.php
    • src/www/ui/admin-fossdash-config.php
  • Changes in fossology sysconfig UI to add Bootstrap classes.
    • src/www/ui/admin-config.php
    • src/lib/php/common-sysconfig.php
  • Changes in the respective Makefiles
  • Wrote the fossdash-publisher script in Python to fetch the latest data from Postgres, convert it to the InfluxDB line-protocol format, and push it to InfluxDB.
    • install/fossdash/fossdash-publish.py.in
    • Metric-reported config for fossdash: install/fossdash/fossdash_metrics.yml
  • Developed a Dashboard in grafana to show these data metrics in a meaningful way.
    • General Dashboard: It shows combined information of all instances of fossology.
    • Instance-specific-Dashboard: It shows only instance-specific data metric on the dashboard. You can choose any specific instance name from the dropdown.
  • Check the cron job entry using sudo -u www-data crontab -l, or check the file /var/spool/cron/crontabs/www-data (in Docker: in the web container)

Git Repository

Team

  • Student: Darshan Kansagara (darshank15)
  • Mentor(s): Gaurav Mishra (@GMishx), Shaheem Azmal (@shaheemazmalmmd), Sandipbhuyan (@sandipbhuyan)

Weekly Report

Community Bonding (04-May - 01-June)

  • Cloned the fossology repo and set it up to run locally
  • Learned the terminology used by the fossology project by going through the documentation and reading the fossology wiki pages
  • Learned more about Prometheus, the InfluxDB real-time data source, and Grafana
  • Created two demo architectures for the dashboard
    • Using Prometheus as a data source ( Pull-based architecture )
    • Using InfluxDB as a data source ( Push-based architecture )
  • Created a dashboard for each to showcase as a POC of our idea.
  • Created a document on basic Prometheus and Grafana terminology for beginners Link .
  • Link to commit: https://github.com/darshank15/GSoC_2020_FOSSOlogy/commit/121695cc6569b9f0a042d6b88f8bd8fc287633a7
  • Discussed the project and its goals with the mentors to understand them more clearly.
  • Studied bash scripting, as it is used heavily in many fossology scripts.

Week_1 (01-June to 06-June)

  • Looked into some Docker commands and learned docker-compose
  • Took a look at the fossology fossdash branch dev/fossdash-exporter
  • Ran it locally to see the data generated by the Python script “fossdash-publish.py”
  • Solved an issue by modifying the above Python file
  • Link to issue: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/1
Initially, the Python script generated data in the following format:

agents_count.copyright,instance=c7fe15ee-5c9c-4687-91d8-b1ba840e6b00 value=1 1591286501000000000
agents_count.ecc,instance=c7fe15ee-5c9c-4687-91d8-b1ba840e6b00 value=1 1591286501000000000

We modified the Python script to produce the data as below, with a new tag set for the type of agents_count:

agents_count,instance=c7fe15ee-5c9c-4687-91d8-b1ba840e6b00,type=copyright value=1 1591246503000000000
agents_count,instance=c7fe15ee-5c9c-4687-91d8-b1ba840e6b00,type=ecc value=1 1591246503000000000

This way we can group by INSTANCE as well as by TYPE.
  • Made a simple dashboard to showcase the newly generated data.
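The rewrite above can be expressed as a small transformation over a line-protocol point: move the dotted suffix of the measurement name into a `type` tag. A sketch of that idea (this is not the actual patch from fossdash-publish.py):

```python
def split_measurement(line):
    """Rewrite 'agents_count.copyright,instance=... value=1 <ts>' into
    'agents_count,instance=...,type=copyright value=1 <ts>' so the data
    can be grouped by both instance and type."""
    head, rest = line.split(",", 1)          # measurement vs. tags+fields
    tags, fields = rest.split(" ", 1)        # tag set vs. 'value=.. <ts>'
    if "." in head:
        measurement, metric_type = head.split(".", 1)
        tags = f"{tags},type={metric_type}"  # promote the suffix to a tag
        head = measurement
    return f"{head},{tags} {fields}"
```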

Week_2 (08-June to 13-June)

  • Looked into an issue about configuring fossdash to work both in Docker containers and in a source install.
  • The issue is here: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/2
  • Looked into the Makefile for the configuration
  • Changed the code in common-sysconfig.php to add an input box in the UI for configuring the FossDash URL and storing it in the sysconfig database table.
  • Wrote the script run_me.py, which is triggered to read the updated data from the database and modify the fossdash.conf file.
  • Also started working on code to include VERSION info in InfluxDB.
  • The issue is here: https://github.com/darshank15/GSoC_2020_FOSSOlogy/issues/3
  • Currently, all version and build info is fetched from the VERSION file; later on, this data could come from a database table.
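Reading the VERSION file can be sketched as simple KEY=VALUE parsing; the exact keys (e.g. VERSION, COMMIT_HASH) in FOSSology's VERSION file are an assumption in this sketch:

```python
def parse_version_file(text):
    """Parse simple KEY=VALUE lines into a dict of version/build info
    that could be attached to the pushed metrics. Comment lines and
    section headers are skipped."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # only keep lines that actually contain '='
            info[key.strip()] = value.strip()
    return info
```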

Week_3 (15-June to 20-June)

Week_4 (22-June to 27-June)

Week_5 (29-June to 04-July)

  • Rename UUID on a Fossology instance
  • Done with the first GSoC evaluation.
  • Task-1: Cleaning old fossdash reported files.
    • Until now, all reported files were kept after their data was sent to InfluxDB. Since the fossdash script may run every day, the number of reported files in local storage grows, consuming more and more disk space over time.
    • Implemented this functionality using the find command (ctime, maxdepth) to locate and delete older reported files and save disk space.
  • Task-2: Cron job configuration to schedule the interval for the fossdash script.
    • From the configuration, the user can change the cron job schedule interval for fossdash.
    • Done using the crontab command to update the cron job interval.
  • Task-3: Implemented an Enable/Disable button to control the fossdash functionality.

Week_6 (06-July to 11-July)

Week_7 (13-July to 18-July)

Week_8 (20-July to 25-July)

Week_9 (26-July to 31-July)

Week_10 (03-August to 08-August)

Week_11 (10-August to 15-August)

Week_12 (17-August to 22-August)