Project Lifecycle: Acceptance Criteria #39

Closed · Naomi-Wash opened this issue Aug 7, 2024 · 2 comments
Labels: documentation (Improvements or additions to documentation)

@Naomi-Wash (Contributor) commented Aug 7, 2024

Comment moved from Project Lifecycle Document Section 3. Stages - Definitions & Expectations - Labs Stage - Acceptance Criteria

Since these metrics can vary significantly depending on a project's type, scope, and size, the TAC has final judgment over the level of activity adequate to meet these criteria.

Discussion

This seems very subjective. Would it be worthwhile spelling these criteria out, at least for OQS? What counts as a "substantial" flow of commits and participation? What is a "healthy" number of committers? Spelling this out may also help the maintainers, currently spread across different sub-projects, understand what is expected of them.

The "community metrics" portion of the approval is by far the most subjective part of the process, and it's not because people haven't tried to define it in detail. We have seen a number of cases where people just try to do the minimum to game the system (i.e., if you require X commits, someone just makes X documentation changes or something like that) so most projects and folks leave this to be judged in a somewhat subjective manner. Personally, I have spent hours and hours over the years discussing this with people, and have never come up with a metric that totally works and isn't subjective. I'm not saying it's impossible, but it is definitely tricky.

Well, if it's not impossible, then we should at least try before qualifying projects at the wrong level. This may be easier when considering the criteria for just the one direction I'm mostly concerned about: "upwards". Crypto software in particular should never be lightly considered "production ready" (which is your "Impact Stage", if I understand it correctly), because so many external users then rely on that classification without any chance to properly judge for themselves. The criteria currently listed are mostly process-oriented, not quality-oriented. The one and only measurable criterion on technical community capability ("committers...from at least 2 organizations") is silent on the type of organizations: would you trust code for production use that has been created by two pure research organizations, say? Organizations that measure and promote people on the papers and blog posts they produce rather than on engineering prowess, experience, or FOSS community reputation?

A couple of things: 1. It's much easier to judge processes than actual quality. This is the case for just about everything in security! A full audit of code is hard; making sure that a project has the right security processes in place is much easier. 2. As someone with a research background, yes, I would trust research orgs. I think it's going to be very hard to classify companies if you want to go this route. What is a research company versus an engineering company? Don't all companies have an engineering angle? Would this only rule out universities?

On your item 1: I'm not worried that "easy" things don't get addressed here -- I'm concerned about the "hard" ones, e.g., when and under which circumstances to add obligations that stand in the way of progress (say, quick "experimentation" or a new feature for the next paper) but make the code more reliable.
On your item 2: As someone with a research background, I do NOT trust research organizations to do the right thing for production-use code: their organizational mindset is too different for that. The next paper is always more important than fixing a bug on a specific platform ("publish or perish"); promotion is achieved by publication count, not GitHub reputation; talking and "presence" are always more appreciated in that community than writing code, getting PRs accepted, or delivering code that doesn't contain (too many) bugs. I DO trust specific people (within and outside of research organizations) who have demonstrated substantial "code responsibility" and/or who are actually in charge of products ("the buck stops here" kind of people). In FOSS those are the folks whom everyone asks for advice on GH when some brownish stuff hits a fast-moving rotator :)

In my view, all projects benefit from diversity: coders and theorists, testing vs. code vs. docs, careful diligent people vs. those with more radical ideas. Measuring those characteristics is hard, but I do think a broader spread of committers and orgs (far more than 2) at least goes some way. There are a lot of really good points in this discussion, but I still think we mostly need the flexibility to discuss on a case-by-case basis.
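To make the "spread of committers & orgs" idea concrete, here is a rough sketch of how one could approximate organizational diversity from a local clone by grouping committer email domains. This is purely illustrative and not part of the lifecycle document: the repository path, the time window, and the domain-to-organization heuristic are assumptions, and email domains are at best a proxy (consultants, personal addresses, and noreply addresses all blur the picture).

```python
# Illustrative sketch only -- not an official PQCA/TAC metric.
# Approximates organizational diversity by grouping committer email
# domains in a local clone. Path, window, and the GENERIC set are
# arbitrary assumptions for illustration.
import subprocess
from collections import Counter

REPO = "/path/to/local/clone"   # hypothetical path
SINCE = "1 year ago"            # arbitrary activity window

# Personal/placeholder domains we do not count as distinct organizations.
GENERIC = {"gmail.com", "outlook.com", "users.noreply.github.com"}

log = subprocess.run(
    ["git", "-C", REPO, "log", f"--since={SINCE}", "--pretty=format:%ae"],
    capture_output=True, text=True, check=True,
).stdout

domains = Counter(
    addr.rsplit("@", 1)[-1].lower()
    for addr in log.splitlines()
    if "@" in addr
)

orgs = {d: n for d, n in domains.items() if d not in GENERIC}
print(f"{len(orgs)} distinct non-generic committer email domains:")
for domain, commits in sorted(orgs.items(), key=lambda kv: -kv[1]):
    print(f"  {domain}: {commits} commits")
```

Whether two, five, or ten such domains count as "healthy" is exactly the judgment call this thread is debating; a script like this only makes the raw number visible.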

Measuring the baseline notion of diversity is pretty simple: check the number of people actively contributing. If there are too few, the project has problems. Next, of course, is checking what they contribute, e.g., whether contributors always do the "same" thing (e.g., docs) or whether they "grow" (e.g., begin to change code); a sketch of one way to automate such a check follows after this comment.
The "good to have flexibility" argument is a "get out of jail" argument if applied without proper care: It's correct if applied to allow for a security project to "fall" in lifecycle stage (err on the side of caution for such projects). In turn, it should not be used to allow for a premature "lifecycle raise" (e.g., a politically expedient declaration of "code fitness", e.g., to "declare success" and reduce contribution efforts or resourcing).

I opened https://github.com/PQCA/TAC/issues/31 to suggest we enable the LF Insights metrics on our projects. This includes organizational diversity and captures trends over time. Being part of the LF, we may also be able to feed back which metrics work for us, which don't, and possible improvements. I used it on a previous project (in part to support funding).
We can then review the data and decide what actions or improvements, if any, we should take.
There are also some checks made by the OpenSSF Scorecard.
I suggest closing the issue here -- but it is important not to lose the thought about understanding our community better (as well as how it feeds into issues like oqs#1).

@Naomi-Wash added the documentation label Aug 7, 2024
@baentsch commented Aug 8, 2024

the TAC has final judgment over the level of activity adequate to meet these criteria.

This statement does not capture the gist of the discussion as I understood it.

The summary should rather be: "For security software projects, e.g., OQS and PQCP, in case of disagreement over acceptance assessment criteria, the more conservative interpretation assumed by either the TAC or the (sub)project maintainers shall rule." Should PQCA consider all its (also future) projects to be of this nature, feel free to simplify to:

In case of disagreement over acceptance assessment criteria, the more conservative interpretation assumed by either the TAC or the (sub)project maintainers shall rule.

As stated elsewhere, I'm concerned that corporate-interest-driven, thus typically TAC-based, entities (as the LF/PQCA contract gives corporate PQCA sponsors majority control of the TAC) could "force" the external publication of falsely optimistic "project readiness" assessments, thereby misleading the public. This may be OK for projects that purely market some corporate pet project released as OSS, say, but IMO it is not acceptable for software that people may rely on for their (also personal) security. I'd actually urge PQCA to consider adopting the OpenSSL mission and values, which would prohibit such things.

Based on this, a TAC that is corporate-controlled by LF contractual design cannot have final judgement over these criteria.

Finally, as a worked example: if the TAC had "final judgement", entities that control the TAC but do not seriously contribute to a project could declare a low level of contribution sufficient for a (sub)project to be "healthy", thus putting it into a higher lifecycle stage. This would place the onus of delivering on that false lifecycle assertion on the shoulders of the (sub)project maintainers, and exonerate the (TAC-based) non-contributors from the responsibility to contribute or to answer for their false (or "optimistic") assessment decision.

Also here, please tag me explicitly, @Naomi-Wash, on any suggested resolution of this issue, as I think it touches on core reputational risks to software I feel personally responsible for. Thanks.

@brian-jarvis-aws self-assigned this Sep 11, 2024

@brian-jarvis-aws (Contributor) commented

@baentsch I see you requested to be specifically tagged prior to resolution. I believe this issue was already addressed in a similar manner to issue #40. The verbiage that this comment thread originally started on is no longer in the Labs Stage acceptance criteria. In its place, just as in the Impact Stage, there is an expectation that the project includes the metrics used in justifying their proposed lifecycle stage. I plan to resolve this issue unless there’s a suggestion for further updates to the Labs Stage acceptance criteria.
