Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor license checker task to work with Apache Rat v0.16.1 #16121

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

hainenber
Copy link

Description

[Describe what this change achieves]
Apache Rat v0.16+ has a major refactor that is quite breaking to current Ant integration for :licenseChecker task. This PR is to remediate it.

Refactor license checker task to work with Apache Rat v0.16.1

Testing

I deliberately change a file's license header to disallowed BSD4 and deleting entire header of another file. The precommit task works as expected.

image image

Related Issues

Resolves #16071

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. enhancement Enhancement or improvement to existing feature or request good first issue Good for newcomers labels Sep 29, 2024
Copy link
Contributor

❌ Gradle check result for b1d0060: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 86efeeb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@hainenber
Copy link
Author

Yikes, seems to be plenty of flaky tests. Seems to be unrelated to my change :/

image

@hainenber hainenber force-pushed the bump-apache-rat-0.16.1 branch from 86efeeb to 03d2954 Compare October 1, 2024 01:06
@hainenber
Copy link
Author

Apologies for the force push but I need to rebase added changes with --sign-off argument. Thanks @sandeshkr419 for the tip!

@sandeshkr419 sandeshkr419 added the backport 2.x Backport to 2.x branch label Oct 1, 2024
Copy link
Contributor

github-actions bot commented Oct 1, 2024

❌ Gradle check result for 03d2954: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@hainenber hainenber force-pushed the bump-apache-rat-0.16.1 branch from e03523c to dfad9dd Compare October 1, 2024 14:53
@hainenber
Copy link
Author

Yikes, rebase sheenanigan!

@hainenber hainenber force-pushed the bump-apache-rat-0.16.1 branch from dfad9dd to d3521ae Compare October 1, 2024 14:58
@reta
Copy link
Collaborator

reta commented Oct 1, 2024

Yikes, rebase sheenanigan!

Thanks @hainenber for this change, I've asked one of the Apache RAT committers (@Claudenw) to take a look and help us out with possible disruptions that may come with 0.17

@Claudenw
Copy link

Claudenw commented Oct 1, 2024

The method of calling the Rat executable is very questionable. I wrote most of the changes for Rat 0.16 and 0.16.1 and for the upcoming 0.17. I am not certain that this code will continue to work going forward.

The recommended way forward is to look at https://github.com/apache/creadur-rat/blob/apache-rat-project-0.16.1/apache-rat-core/src/main/resources/org/apache/rat/default.xml and write a configuration file that does what you want with respect to licenses and approved licenses.

You should then be able to call the rat executable passing --licenses <your/file/here> to read your configuration. You will also need to pass the directory to process.

If you are using the Ant task you can simply create an instance of the Ant Report task and call the setLicenses() method passing your configuration file, and probably call setUseDefaultLicenses(false) to disable the default licenses.

Finally, you could create an instance of ReportConfiguration, set the properties you want, and call Reporter.report(config). See the antTask/Report.exec() method for an example.

As an aside, there is a gradle Rat plugin. However, it has not been updated and will probably not be updated until after 0.17 is released as we are cleaning up how options are made available to the UIs (e.g. Maven, Ant, Gradle).

If you would like further information please reach out to me: [email protected]

@hainenber
Copy link
Author

hi @Claudenw , thanks for visiting by and leaving a review! Checking RAT's master branch, I'm pretty sure It'll be broken again 😄 (though not requiring heavy refactor like this)

Consider that the :licenseHeader task is currently configured to be ran programmatically as Ant task, I believe this is the approach I'll try my hands around.

Finally, you could create an instance of ReportConfiguration, set the properties you want, and call Reporter.report(config). See the antTask/Report.exec() method for an example.

Copy link
Contributor

github-actions bot commented Oct 1, 2024

✅ Gradle check result for 9b2dc22: SUCCESS

Copy link

codecov bot commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 0% with 65 lines in your changes missing coverage. Please review.

Project coverage is 71.87%. Comparing base (a767e92) to head (9b2dc22).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...nsearch/gradle/precommit/LicenseHeadersTask.groovy 0.00% 65 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16121      +/-   ##
============================================
- Coverage     71.98%   71.87%   -0.11%     
+ Complexity    64542    64478      -64     
============================================
  Files          5288     5289       +1     
  Lines        301474   301524      +50     
  Branches      43552    43562      +10     
============================================
- Hits         217024   216730     -294     
- Misses        66657    66994     +337     
- Partials      17793    17800       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hainenber
Copy link
Author

✅ Gradle check result for 9b2dc22: SUCCESS

Why would you be green when im about to refactor? 😭

@Claudenw
Copy link

Claudenw commented Oct 2, 2024

@hainenber Paul Merlin ([email protected] has a Gradle plugin for Rat. You may want to contact him to see if it makes sense to use his plugin.

@hainenber
Copy link
Author

Refactored following this approach as recommended by Claude. Initially I tried a bit with Gradle plugin one but yeahhh, let's wait until 0.17.1 coming out and all things got cleaned up.

Finally, you could create an instance of ReportConfiguration, set the properties you want, and call Reporter.report(config). See the antTask/Report.exec() method for an example.

The new change works as expected. An example for a file with unapproved BSD4 license and another one without any header

image image image

Copy link
Contributor

github-actions bot commented Oct 6, 2024

❌ Gradle check result for 31bd039: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@Claudenw
Copy link

Claudenw commented Oct 7, 2024

@hainenber I think you misunderstood me. I didn't mean to reuse the ant tasks (unless you can call Ant tasks directly, then it might make sense) but to implement something similar.

First, move all your license definitions into an XML file. This will keep the definitions separate from code and will allow you to use the same definitions if you run the command line or any other Rat UI.

<?xml version="1.0" encoding="UTF-8"?>
<!--
  ~ SPDX-License-Identifier: Apache-2.0
  ~
  ~ The OpenSearch Contributors require contributions made to
  ~ this file be licensed under the Apache-2.0 license or a
  ~ compatible open source license.
  -->

<rat-config>
    <families>
        <family id="AL" name="Apache License Version 2.0" />
        <family id="GEN" name="Generated Files" />
        <family id='BSD-3' name="BSD 3 clause" />
    </families>
    <licenses>
        <license family="AL">
            <any id="ALStandard">
                <text>Licensed under the Apache License, Version 2.0 (the
                    "License")</text>
                <text>Licensed to the Apache Software Foundation (ASF) under
                    one or more contributor license agreements; and to You under
                    the Apache License, Version 2.0.</text>
                <text>http://www.apache.org/licenses/LICENSE-2.0</text>
                <text>https://www.apache.org/licenses/LICENSE-2.0</text>
                <text>http://www.apache.org/licenses/LICENSE-2.0.html</text>
                <text>https://www.apache.org/licenses/LICENSE-2.0.html</text>
                <text>http://www.apache.org/licenses/LICENSE-2.0.txt</text>
                <text>https://www.apache.org/licenses/LICENSE-2.0.txt</text>
                <spdx name='Apache-2.0' />
            </any>
        </license>

        <license family="GEN">
            <note>
                Files that are automatically generated.</note>
            <or>
                <any resource="/org/apache/rat/generation-keywords.txt" />
                <text>ANTLR GENERATED CODE</text>
            </or>
        </license>

        <license family='BSD-3'>
            <any>
                <all>
                    <copyright />
                    <text id="BSD-3-txt">

                        Redistribution and use in source and binary forms, with
                        or without modification, are permitted provided that the
                        following conditions are met:

                        1. Redistributions of source code must retain the above
                        copyright notice, this list of conditions and the
                        following disclaimer.

                        2. Redistributions in binary form must reproduce the
                        above copyright notice, this list of conditions and the
                        following disclaimer in the documentation and/or other
                        materials provided with the distribution.

                        3. Neither the name of the copyright holder nor the
                        names of its contributors may be used to endorse or
                        promote products derived from this software without
                        specific prior written permission.

                        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
                        CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED
                        WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
                        WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
                        PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
                        COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY
                        DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
                        CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
                        PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
                        USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
                        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
                        CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
                        NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
                        USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
                        OF SUCH DAMAGE.</text>

                </all>
                <spdx name='BSD-3-Clause' />
            </any>
        </license>
    </licenses>
    <approved>
        <family license_ref='AL' />
        <family license_ref="BSD-3" />
        <family license_ref='GEN' />
    </approved>
    <matchers>
        <matcher class="org.apache.rat.configuration.builders.AllBuilder" />
        <matcher class="org.apache.rat.configuration.builders.AnyBuilder" />
        <matcher class="org.apache.rat.configuration.builders.CopyrightBuilder" />
        <matcher class="org.apache.rat.configuration.builders.MatcherRefBuilder" />
        <matcher class="org.apache.rat.configuration.builders.NotBuilder" />
        <matcher class="org.apache.rat.configuration.builders.RegexBuilder" />
        <matcher class="org.apache.rat.configuration.builders.SpdxBuilder" />
        <matcher class="org.apache.rat.configuration.builders.TextBuilder" />
    </matchers>
</rat-config>

I called the above file rat-config.xml and placed it in the same directory as the java code below, but you can put it anywhere as long as you can open it in the java code.

In looking at your code you define several licenses that I don't think make sense. (see my latest review notes).

The actual Gradle task should looks something like:

/*
 * SPDX-License-Identifier: Apache-2.0
 *
 * The OpenSearch Contributors require contributions made to
 * this file be licensed under the Apache-2.0 license or a
 * compatible open source license.
 */

package org.opensearch.gradle.precommit;

import org.apache.rat.Defaults;
import org.apache.rat.ReportConfiguration;
import org.apache.rat.Reporter;
import org.apache.rat.walker.DirectoryWalker;
import org.apache.rat.report.claim.ClaimStatistic;
import org.apache.rat.utils.Log;
import org.gradle.api.DefaultTask;
import org.gradle.api.tasks.OutputFile;
import org.gradle.api.tasks.TaskAction;


import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RatTask  extends DefaultTask {

    // I don't know if the annotations sets the file or not. 
    @OutputFile
    File reportFile;

    public RatTask() {
    }
    // I assume the TaskAction annotation will cause this code to run when the task is called in Gradle.
    @TaskAction
    final void executeTask() {
        try {
            Files.deleteIfExists(reportFile.toPath());
        } catch (IOException e) {
            throw new IllegalStateException("Can not delete "+reportFile.getName());
        }

        // create an empty configuration with the Logger defined below.
        ReportConfiguration configuration = new ReportConfiguration(new Logger(this.getLogger()));

        // set the file to write the results to.  If not set standard out will be used.
        configuration.setOut(reportFile);

        // setup the defaults. to read the rat-config.xml and not to read the default configuraitons.
        Defaults.Builder defaultsBuilder = Defaults.builder().add(this.getClass().getResource("rat-config.xml"))
            .noDefault();

        // set the configuration from the defaults.
        configuration.setFrom(defaultsBuilder.build());

        // create a directory walker from the directory of the project.  The Ant project has the concept of a Union of nested
        // resources.  If Gradle has the same concept that could be used here to create the list of files to process.  In that
        // case see the Rat Ant task code ResourceCollectionContainer.java to for an example of how to wrap it.
        DirectoryWalker directoryWalker = new DirectoryWalker(this.getProject().getRootDir(), configuration.getInputFileFilter(), configuration.getDirectoryFilter());
        configuration.setReportable(directoryWalker);

        // Execute the report
        ClaimStatistic result;
        try {
            result = Reporter.report(configuration);
        } catch (Exception e) {
            throw new IllegalStateException("Can not execute Rat report", e);
        }
        if (result.getNumUnknown() > 0 || result.getNumUnApproved() > 0) {
            throw new IllegalStateException("License header problems were found! Full details: " + reportFile.getAbsolutePath());
        }
    }

    // class that wraps the Gradle logger for use within Rat.
    private class Logger implements Log {
        private final org.gradle.api.logging.Logger gradleLogger;
        Logger(org.gradle.api.logging.Logger gradleLogger) {
            this.gradleLogger = gradleLogger;
        }

        @Override
        public void log(Level level, String msg) {
            switch (level) {
                case DEBUG:
                    gradleLogger.debug(msg);
                    break;
                case INFO:
                    gradleLogger.info(msg);
                    break;
                case WARN:
                    gradleLogger.warn(msg);
                    break;
                case ERROR:
                    gradleLogger.error(msg);
                    break;
                case OFF:
                default:
                    break;
            }
        }
    }
}

pattern(substring: "Copyright OpenSearch Contributors.")
}

generateRatLicense("SPDX", "SPDX", "SPDX-License-Identifier: Apache-2.0"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered in the standard AL license definition. See the configuration file in my long comment for an example of how this is done.


generateRatLicense("SPDX", "SPDX", "SPDX-License-Identifier: Apache-2.0"),
// SPDX: Apache license (OpenSearch)
generateRatLicense("SPDX-ES", "SPDX", "Copyright OpenSearch Contributors."),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This license makes no sense. It is not a license but a copyright notice. Just placing the copyright in the file is not enough to make it a license.

// Apache 2
generateRatLicense("AL2", "Apache License Version 2.0", "Licensed to the Apache Software Foundation (ASF)"),
// Generated code from Protocol Buffer compiler
generateRatLicense("GEN", "Generated", "Generated by the protocol buffer compiler. DO NOT EDIT!"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this in my example but you should be able to add it to the configuration file fairly easily following the example I have there.

Please submit a bug report to Rat indicating that the string "Generated by the protocol buffer compiler. DO NOT EDIT!" should be added to the list of generated text tags.

Comment on lines +96 to 119
/**
* Create license matcher from allowed/disallowed license list.
*
* @param licenseSettingsMap A map of license identifier and its associated data (family name, category and pattern)
*/
private static ILicense generateRatLicense(String licenseCategory, String licenseFamilyName, String pattern) {
SortedSet<ILicenseFamily> licenseCtx = new TreeSet<ILicenseFamily>()
var licenseFamilyBuilder = new ILicenseFamilyBuilder()
var licenseFamily = licenseFamilyBuilder.setLicenseFamilyCategory(licenseCategory)
.setLicenseFamilyName(licenseFamilyName)
.build()
licenseCtx.add(licenseFamily)

var license = new License()
license.setName(licenseFamily.getFamilyName())
license.setFamily(licenseFamily.getFamilyCategory())
license.add(new SimpleTextMatcher(pattern))

var configuredLicense = license.build(licenseCtx)

return configuredLicense
}

/**
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This configuration should be done in the XML configuration file.

pattern(substring: additional.getValue())
}
}
generateRatLicense("GEN", "Generated", "ANTLR GENERATED CODE"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I have in the example. The protocol buffer generated code string should go into the same license.

// // we keep this here, in case someone adds BSD code for some reason, it should never be allowed.
// generateRatLicense("BSD4", "Original BSD License (with advertising clause)", "All advertising materials"),
// )

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above comment is incorrect. There are two ways to do this.

  1. Add BSD4 to the configuration file and DO NOT add it to the list of approved licenses.
  2. explicitly call the configuraiton option andremove the id "BSD4" from the list of approved licenses.

// generateRatLicense("BSD4", "Original BSD License (with advertising clause)", "All advertising materials"),
// )

ReportConfiguration configuration = new ReportConfiguration(DefaultLog.INSTANCE);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't want to log to the default log (std out). So create a wrapper around the Gradle logger that implements the Rat Log interface. See the example code I provided in the long comment.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. enhancement Enhancement or improvement to existing feature or request good first issue Good for newcomers stalled Issues that have stalled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Dependency Upgrade] Bump org.apache.rat:apache-rat from 0.15 to 0.16.1 (or later) in /buildSrc
4 participants