This repository has been archived by the owner on Aug 25, 2024. It is now read-only.

docs: arch: 2nd and 3rd party plugins #1061

Closed
wants to merge 3 commits into from


johnandersen777 commented Apr 1, 2021

@johnandersen777 (author)

Pinging @yashlamba @sakshamarora1 for re-review

johnandersen777 (author) commented May 7, 2021

We need to make sure to account for how we deal with different Python versions. For example, it looks like TensorFlow doesn't support Python 3.9 yet.
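One way to account for this is to record the Python versions each plugin supports and only schedule a plugin's CI on versions it supports. The sketch below is a minimal illustration under stated assumptions: `PLUGIN_PYTHON_REQUIRES` and its version bounds are hypothetical (the TensorFlow bound just mirrors the note above), and the parser deliberately understands only `>=` and `<` clauses. Real projects would use `packaging.specifiers.SpecifierSet`, which implements full PEP 440 specifiers.

```python
import sys

# Hypothetical: Python versions each plugin supports, as ">=X.Y" / "<X.Y"
# clauses (a tiny subset of PEP 440 specifiers).
PLUGIN_PYTHON_REQUIRES = {
    "dffml": ">=3.7",
    "dffml-model-tensorflow": ">=3.7,<3.9",  # illustrative bound only
}


def parse_version(text: str) -> tuple:
    return tuple(int(part) for part in text.split("."))


def supports(spec: str, version: tuple) -> bool:
    """Return True if version satisfies every clause in spec."""
    for clause in filter(None, (c.strip() for c in spec.split(","))):
        if clause.startswith(">="):
            if version < parse_version(clause[2:]):
                return False
        elif clause.startswith("<") and not clause.startswith("<="):
            if version >= parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def plugins_for(version: tuple) -> list:
    """Plugins whose CI should run on the given (major, minor) version."""
    return sorted(
        name
        for name, spec in PLUGIN_PYTHON_REQUIRES.items()
        if supports(spec, version)
    )


if __name__ == "__main__":
    print(plugins_for(tuple(sys.version_info[:2])))
```

With these illustrative bounds, a Python 3.9 CI matrix entry would skip the TensorFlow plugin while 3.8 would still run it.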

Co-authored-by: Yash Lamba <[email protected]>
Co-authored-by: Saksham Arora <[email protected]>
Signed-off-by: John Andersen <[email protected]>
johnandersen777 (author) commented Oct 26, 2021

@johnandersen777 (author)

  • Manifest as dataflow
  • How to install is an operation name
  • Deployment types, where certain operation instances are overridden for that deployment type by another operation name, as well as the ability for a deployment type to specify which implementation networks should be preferred for it
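The deployment-type idea can be sketched as plain data plus a resolver: a deployment type may swap one operation name for another and may rank implementation networks. This is a minimal illustration, not DFFML's API; `DeploymentType`, `resolve_operation`, the `install`/`install_from_cache` operation names, and the `python`/`shell` network names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DeploymentType:
    """Hypothetical deployment type: operation overrides plus network preference."""

    name: str
    # Operation name -> operation name to use instead for this deployment type
    overrides: dict = field(default_factory=dict)
    # Implementation network names, most preferred first
    preferred_networks: list = field(default_factory=list)


def resolve_operation(op_name: str, deployment: DeploymentType, implementations: dict):
    """Return (operation name, implementation network) for a deployment type.

    implementations maps an operation name to the list of implementation
    networks that provide it, in their default order.
    """
    name = deployment.overrides.get(op_name, op_name)
    available = implementations.get(name, [])
    if not available:
        raise LookupError(f"no implementation network provides {name!r}")
    for network in deployment.preferred_networks:
        if network in available:
            return name, network
    # No preference matched, fall back to the default ordering
    return name, available[0]


# The "how to install" step is just an operation name in the manifest.
IMPLEMENTATIONS = {
    "install": ["python", "shell"],
    "install_from_cache": ["shell"],
}
DEV = DeploymentType("dev")
AIRGAPPED = DeploymentType(
    "airgapped",
    overrides={"install": "install_from_cache"},
    preferred_networks=["shell"],
)
```

Here `resolve_operation("install", AIRGAPPED, IMPLEMENTATIONS)` picks `("install_from_cache", "shell")`, while the `dev` deployment keeps the default `("install", "python")`. Keeping overrides as data means the manifest itself never changes per deployment.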

@johnandersen777 (author)

Three manifests

  • One for inputs
  • One for operations
  • One for orchestration

After that, there can be acceptance criteria on the output of each operation, or on the outputs as a set (e.g., "don't mind if these parts fail").
The combination of these three things is a RunDataFlow (a serializable version of the run_dataflow operation; think of a serialized RunSingleConfig).
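A minimal sketch of the three manifests bundled into one serializable object, with per-operation acceptance criteria and an "optional" set whose failures are ignored. `RunDataFlowManifest` and `failed_acceptance` are hypothetical names chosen for illustration; they are not the RunDataFlow or RunSingleConfig classes themselves, and JSON is just one assumed serialization.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunDataFlowManifest:
    """Hypothetical serializable bundle of the three manifests."""

    inputs: dict         # manifest 1: input values keyed by definition name
    operations: list     # manifest 2: names of the operations in the flow
    orchestration: dict  # manifest 3: which orchestrator to use and its config

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "RunDataFlowManifest":
        return cls(**json.loads(text))


def failed_acceptance(outputs: dict, criteria: dict, optional=frozenset()) -> list:
    """Names of operations whose output fails its acceptance criterion.

    criteria maps an operation name to a predicate over that operation's
    output. Operations listed in optional are skipped, i.e. we "don't mind
    if these parts fail".
    """
    return [
        op_name
        for op_name, criterion in criteria.items()
        if op_name not in optional and not criterion(outputs.get(op_name))
    ]
```

Round-tripping through `to_json`/`from_json` yields an equal object, which is the property that lets a run be stored and replayed.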

johnandersen777 (author) commented Feb 17, 2022

A webhook service which checks the dependency tree of the project for each incoming webhook and dispatches downstream validation for other projects.

Related: #1315


- 0 main package

- 1 2nd party required pass for release
@johnandersen777 (author)


Essentially, trigger a domino effect: we analyze the requirements files of all of the plugins that are either first or second party (possibly supporting third party later, somehow) and build a dependency tree to understand which plugins depend on the plugin being changed in the original pull request. We run validation for the original pull request, and then we trigger the CI runs of all of the downstream projects with the original PR applied. If any of the downstream repos would need to be changed for their CI to pass, we can create PRs against those repos. In the original PR we can provide overrides for each downstream package, so that when we trigger validation for that package we can say "use this PR". So if you've made an API-breaking change, you go through all of the downstream dependents, make the changes, and submit PRs that make them pass; then you specify all of those PRs in the original PR, and they will be used when running the CI of the respective downstream dependents.
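The domino effect above can be sketched as: parse each plugin's requirements, invert them into a "who depends on me" map, walk it breadth first from the changed package, and attach any per-package override PRs. This is a simplified sketch under assumptions: the requirements parser only understands plain `name[specifier]` lines, and the plugin names, requirement contents, and PR identifiers in the usage below are illustrative.

```python
import re
from collections import deque


def normalize(name: str) -> str:
    # PEP 503 style normalization so "Foo_Bar" and "foo-bar" compare equal
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(requirements_txt: str) -> set:
    """Bare package names from a requirements file (simplified parser)."""
    names = set()
    for line in requirements_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -e / -r
        match = re.match(r"[A-Za-z0-9._-]+", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def reverse_dependencies(plugins: dict) -> dict:
    """Map each dependency to the set of plugins that require it.

    plugins maps a plugin name to the contents of its requirements file.
    """
    dependents = {}
    for plugin, requirements_txt in plugins.items():
        for dep in requirement_names(requirements_txt):
            dependents.setdefault(dep, set()).add(normalize(plugin))
    return dependents


def downstream_of(changed: str, dependents: dict) -> list:
    """Every plugin that transitively depends on changed, breadth first."""
    start = normalize(changed)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        for dependent in sorted(dependents.get(queue.popleft(), ())):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


def validation_plan(changed: str, original_pr: str, dependents: dict, overrides=None) -> list:
    """(package, PRs to apply) pairs: the changed package first, then each
    downstream package with the original PR plus any override PR for it."""
    overrides = overrides or {}
    plan = [(normalize(changed), [original_pr])]
    for package in downstream_of(changed, dependents):
        extra = [overrides[package]] if package in overrides else []
        plan.append((package, [original_pr] + extra))
    return plan
```

For example, with a hypothetical `dffml-source-mysql` that needs a fix for an API break, `validation_plan("dffml", "original-pr", deps, {"dffml-source-mysql": "mysql-fix-pr"})` runs that plugin's CI with both PRs applied and every other downstream plugin with only the original PR.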

We should also make sure to support 3rd party plugins' ability to revalidate against their dependencies whenever one of those dependencies changes. Possibly this could be some kind of service people can set as a webhook, which acts as a sort of pubsub. The SCM server, such as GitHub, publishes webhook events to the service (`dffml-service-sw-src-change-notify`). The service then relays them to any listeners. Listeners are downstream projects, which can register themselves with the service to receive change events for any of their dependencies. Registration involves plugin-based configurable callbacks.
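The relay described above can be sketched in-process: downstream projects register a callback per upstream repository, and each incoming webhook payload is fanned out to those callbacks. `ChangeNotifyService` is a hypothetical stand-in, not the actual `dffml-service-sw-src-change-notify` implementation; a real service would receive payloads over HTTP and verify webhook signatures. The only payload field relied on is `repository.full_name`, which GitHub webhook payloads include.

```python
from collections import defaultdict
from typing import Callable


class ChangeNotifyService:
    """In-process sketch of a pubsub relay for dependency change events."""

    def __init__(self) -> None:
        # Upstream repository full name -> callbacks registered by downstream projects
        self.listeners = defaultdict(list)

    def register(self, dependency: str, callback: Callable[[dict], None]) -> None:
        """A downstream project asks to hear about changes to dependency."""
        self.listeners[dependency].append(callback)

    def handle_webhook(self, payload: dict) -> int:
        """Relay a parsed webhook payload to listeners; return how many were notified."""
        repository = payload["repository"]["full_name"]
        callbacks = self.listeners.get(repository, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)
```

In the plugin-based design, each callback would be a configurable plugin (for example, one that re-triggers the downstream project's CI).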
johnandersen777 (author) commented Mar 4, 2022

  • Could create DIDs for each event (change)
    • Update 2023-03-21: Doing ActivityPub for now; ideally we content-address those statuses, or even better use hardware-rooted DIDs (did:keri), as status IDs
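Content addressing an event means deriving its ID from a hash of its content, so identical events always get the same ID. The sketch below is a simplification, not a DID method: it uses sorted-key compact JSON as a stand-in for a proper canonicalization scheme such as RFC 8785 (JCS), and the `sha256:` prefix is an assumed convention.

```python
import hashlib
import json


def content_address(event: dict) -> str:
    """Derive a stable ID from an event's content.

    Serializing with sorted keys and no extra whitespace means the same
    event always hashes to the same ID, regardless of key order.
    """
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```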

johnandersen777 (author) commented May 18, 2022

Contributing To a 2nd Party Plugin

$ git clone https://github.com/owner/repo
$ cd repo
$ python -m venv .venv
$ . .venv/bin/activate # Windows: .\.venv\Scripts\activate
$ python -m pip install -U pip setuptools wheel distlib
$ python -m pip install -U \
    -e .[dev] \
    "https://github.com/intel/dffml/archive/main.zip#egg=dffml"

If you are locally developing on DFFML as well, then also run the following.

The reasoning behind SETUPTOOLS_USE_DISTUTILS=stdlib is explained here:
pypa/setuptools#2938 (comment)

$ mkdir -p ~/Documents/python
$ git clone -b some-branch https://github.com/intel/dffml ~/Documents/python/dffml
$ python -m pip uninstall -y dffml
$ SETUPTOOLS_USE_DISTUTILS=stdlib python -m pip install -U \
    -e ~/Documents/python/dffml
$ cd ~/Documents/python/dffml
$ python -m venv .venv
$ . .venv/bin/activate # Windows: .\.venv\Scripts\activate
$ python -m pip install -U pip setuptools wheel distlib
$ SETUPTOOLS_USE_DISTUTILS=stdlib python -m pip install -U \
    -e .[dev]

You now have a separate virtual environment for working on DFFML and for this
project.
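To double-check which environment is active and which copy of a package Python will import, a small stdlib-only check like the following can help. It is a sketch, not part of DFFML; the helper names are hypothetical.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


def module_location(name: str):
    """File a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None
```

Inside the plugin's virtual environment, after the editable install above, `module_location("dffml")` should typically point into `~/Documents/python/dffml` rather than into `site-packages`.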


johnandersen777 (author) commented Sep 27, 2022

Didn't mean to close this.

Related: Alice, who will help us help our ecosystem: #1399, #1401, and #1207


Rolling Alice: Progress Report 1: Where are we

https://www.youtube.com/watch?v=dI1oGv7K21A&list=PLtzAOVTpO2jYt71umwc-ze6OmwwCIMnLw

Okay, so we are going to do our series of tutorials on building Alice, our
software architect. And so as a part of that we're going to do a bunch of
engineering log videos, just the same as we do the weekly sync stuff. So this is
going to be not like the weekly sync where we do, you know, it's more of like an
office hours. This is more of, you know, whoever's working on something will
stream what they're working on so that there's, you know, basically like you can
understand their thought process and we'll have, you know, basically as detailed
as it gets logs of how we built this thing. So that's the plan. This is going to
be the first video. So we're going to do this tutorial series. Basically we're
going to do, so 12 chapters in 12 months. And yeah, and by the end of it, what
we're hoping to do is have this AI driven software architect and her name will
be Alice, like Alice in Adventures in Wonderland. And that's in the public
domain so we can riff all over that. And it's great. I love it. So we're going
to build this tutorial series. We're going to build this AI. You know, hopefully
she can help us maintain our projects, right? Just doing more of what we always
do. So writing more test cases and in effect, you know, our test cases are just
going to be this AI driven pile of CI jobs, right? We will, for instance, you
know, we've been talking about the second party plugins and that third party
plugins for a while. So basically as we, we are finally splitting that stuff
out. And as we do it, we're going to need help, sort of. Because we're going to
end up with this situation where we've got this like poly repo setup and we
don't have access to all the repos and so there's different maintainers. And
these are the third party people and then we have our second party people, which
is our org. And you know, the whole thing is going to be chaos, right? And so,
well, you know, kind of look up Alice in Wonderland, it says it's this genre of
like nonsense literature. It's like chaos. There's chaos everywhere. So as we
know, so is software development and so is life in general and so is everything,
right? So, Alice is going to help us make sense of that chaos. She is going to
rely on data flows and machine learning models and hopefully some web3 stuff,
which we'll cover in our first, you know, few videos of context here. We're
going to lay some groundwork with that. So, essentially, if you're familiar with
like KVM nested virtualization, then this is going to be a familiar concept to
you. But we're going to sort of do the simplest approach to this first. The
simplest approach and something that we can sort of like massively parallelize.
Just so if we get more compute, you know, then we can use more compute. So
great. Okay. So and what are we going to build? Okay. So what's the first thing?
Our first target is to have Alice have an understanding of an individual code
base. So once we do that, we're basically going to do an extension of shouldi.
So there's a bunch of shouldi tutorials up there. Ties in with the building
operations and stuff. The way this sort of went is, you know, the initial
project rolled out. We did this automated classification demo.


PR validation

@johnandersen777 (author)

We don't necessarily need to update status checks via the API; we can just have a pipeline within the PR workflows which says that this other PR must be merged in an upstream or downstream repo before this one can auto-merge.
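Such a pipeline step boils down to a gate: list what still blocks this PR (missing approval, required PRs elsewhere not yet merged) and fail the job while anything remains. The sketch below is hypothetical: `PullRequest`, `merge_blockers`, and `can_auto_merge` are illustrative names, and a real workflow would fetch PR state from the SCM rather than from in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    repo: str
    number: int
    merged: bool = False
    approved: bool = False
    # PRs in upstream or downstream repos that must be merged before this one
    requires: list = field(default_factory=list)


def merge_blockers(pr: PullRequest) -> list:
    """Human-readable reasons this PR cannot auto-merge yet."""
    reasons = [] if pr.approved else ["not approved"]
    reasons += [
        f"waiting on {req.repo}#{req.number}" for req in pr.requires if not req.merged
    ]
    return reasons


def can_auto_merge(pr: PullRequest) -> bool:
    return not merge_blockers(pr)
```

A workflow job could print `merge_blockers(pr)` and exit non-zero while the list is non-empty, so auto-merge waits without any custom status-check API calls.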

johnandersen777 (author) commented Feb 8, 2023

schema/github/actions/result/container/example-pull-request-validation.yaml

$schema: "https://github.com/intel/dffml/raw/dffml/schema/github/actions/result/container/0.0.0.schema.json"
commit_url: "https://github.com/intel/dffml/commit/1f347bc7f63f65041a571d9e3c174d8b9ead24aa"
job_url: "https://github.com/intel/dffml/actions/runs/4185582030/jobs/7252852590"
result: "docker.io/intelotc/dffml@sha256:ae636f72f96f499ff5206150ebcaafbd64ce30affa7560ce0a41f54e871da2"
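A consumer of this manifest might first check its basic shape. The sketch below assumes the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`) and treats the four keys of the example as required; that is an assumption, since the referenced `0.0.0.schema.json` may define more or different constraints. It only checks that the URLs use https and that `result` is pinned by a `sha256` digest, then splits the image reference.

```python
from urllib.parse import urlparse

# Assumed required keys, taken from the example manifest; the actual JSON
# schema may define more (or different) constraints.
REQUIRED_KEYS = ("$schema", "commit_url", "job_url", "result")


def check_result_manifest(manifest: dict) -> dict:
    """Validate the basic shape of a result manifest and split its image ref."""
    missing = [key for key in REQUIRED_KEYS if key not in manifest]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for key in ("$schema", "commit_url", "job_url"):
        if urlparse(manifest[key]).scheme != "https":
            raise ValueError(f"{key} must be an https URL")
    image, sep, digest = manifest["result"].partition("@")
    if not sep or not digest.startswith("sha256:"):
        raise ValueError("result must be pinned by digest (name@sha256:...)")
    return {"image": image, "digest": digest}
```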

johnandersen777 (author) commented Feb 24, 2023

2023-02-23 @pdxjohnny Engineering Logs

  • https://github.com/cloudfoundry-community/node-cfenv
  • Eventing helps us have Alice sit alongside and look at new issues, workflow runs, etc. This will help her help developers stay away from known bad/unhelpful trains of thought.
    • She can look at issue bodies for similar stack traces
      • Eventually we'll have updating like we do now, where we update the issue or discussion thread with the console commands we run and their outputs while debugging, or we'll just do peer-to-peer, depending on context!
      • docs: arch: Inventory #1207
        • live at HEAD is great, but poly repo PR validation will bring us into the future, since we'll be running inference over all the active pull requests
          • We'll take this further to branches, then to the in-progress trains of thought (active debug, states of the art which the gatekeeper/umbrella/prioritizer says are active based on overlays for context of scientific exploration)
            • As our inference gets better, we'll look across the trains of thought and Prophet.predict() state-of-the-art trains of thought, then validate those via dispatch/distributed compute. Then we'll start to just infer the outputs of the distributed compute and validate based on risk and criticality; we'll then have our best-guess muscle memory machine.
  • Mermaid has mind map functionality now
  • https://www.youtube.com/watch?v=tXJ03mPChYo&t=375s
    • Alice helps us understand the security posture of this whole stack over its lifecycle. She's trying to help us understand the metrics and models produced from analysis of our software and improve it in arbitrary areas (via overlays). She has overlays for dependency analysis and for deciding if there is anything she can do to help improve those dependencies. `alice threats` will be where she decides if those changes, or the stats mined by shouldi, are aligned with her strategic principles. We'll also look to generate threat models based on analysis of the dependencies found going down the rabbit hole again with `alice shouldi` (shouldi: deptree: Create dependency tree of project #596). These threat models can then be improved by running the https://github.com/johnlwhiteman/living-threat-models auditor.py (`alice threats audit`). Threats are inherently strategic and based on deployment context: they require knowledge of the code (static), past behavior (pulled from the event stream of distributed compute runs), and an understanding of which deployments are relevant for vuln analysis per the threat model.
      • Entity, infrastructure (methodology for traversal and chaining), (open) architecture
      • What are you running (+deps), where are you running it (overlaid deployment; this is evaluated in federated downstream SCITT for applicability and reissuance of VEX/VDR by downstream), and is the upstream threat model telling you that you should care, because what you're running and how you're running it yields unmitigated threats? If so, and Alice knows how to contribute, Alice, please contribute. Otherwise, if Alice doesn't know how to contribute, Alice, please log TODOs across the org's relevant poly repos.
      • When we do our depth of field mapping (ref early engineering log streams) we'll merge all the event stream analysis via the tuned brute force prioritizer (grep alice discussion arch)
  • Loosely coupled DID VC CI/CD enables AI in the loop development in a decentralized poly repo environment (Open Source Software cross orgs)

WIP: IETF SCITT: Use Case: OpenSSF Metrics: activitypub extensions for security.txt

johnandersen777 (author) commented Apr 19, 2023

graph TD
    subgraph transparency_service[Transparency Service]
        transparency_service_pypi_known_good_package[Trust Attestation in-toto style<br>test result for known-good-package]
    end
    subgraph shouldi[shouldi - OSS Risk Analysis]
        subgraph shouldi_pypi[PyPi]
            shouldi_pypi_insecure_package[insecure-package]
            shouldi_pypi_known_good_package[known-good-package]
        end
    end
    subgraph cache_index[Container with pip download for use with file:// pip index]
        subgraph cache_index_pypi[PyPi]
            cache_index_pyOpenSSL[pyOpenSSL]
        end
    end
    subgraph fork[Forked Open Source Packages]
        subgraph fork_c[C]
            fork_OpenSSL[fork - OpenSSL]
        end
        subgraph fork_python[Python]
            fork_pyOpenSSL[fork - pyOpenSSL]
        end

        fork_OpenSSL -->|Compile, link, embed| fork_pyOpenSSL
    end
    subgraph cicd[CI/CD]
        runner_tool_cache[$RUNNER_TOOL_CACHE]
        runner_image[Runner container image - OSDecentrAlice]
        subgraph loopback_index_service[Loopback/sidecar package index]
            serve_package[Serve Package]
        end

        subgraph workflow[Python project workflow]
            install_dependencies[Install Dependencies]
            install_dependencies -->|Deps from N-1 2nd<br>party SBOMs get cached| runner_tool_cache
            install_dependencies -->|PIP_INDEX_URL| loopback_index_service
        end

        runner_tool_cache --> runner_image
    end

    shouldi_pypi_known_good_package --> transparency_service_pypi_known_good_package

    serve_package -->|Check for presence of trust attestation<br>inserted against relevant statement<br>URN of policy engine workflow used| transparency_service_pypi_known_good_package

    cache_index_pypi -->|Populate $RUNNER_TOOL_CACHE<br>from cached index| runner_image

    fork_pyOpenSSL -->|Publish| cache_index_pyOpenSSL
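The `serve_package` edge in the diagram amounts to a gate: the loopback index only hands a package to the workflow if the transparency service holds an attestation for that exact package and version, produced by the expected policy engine workflow. In the sketch below, the transparency service is stood in for by an in-memory set (a real one would return signed receipts), and `Attestation`, `LoopbackIndex`, the package names, and the URN are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    package: str
    version: str
    policy_workflow_urn: str  # URN of the policy engine workflow used


class LoopbackIndex:
    """Serve a package only if an attestation from the expected policy
    workflow exists for that exact package and version."""

    def __init__(self, transparency_log, policy_workflow_urn: str, packages: dict):
        # Hypothetical in-memory stand-in for the transparency service
        self.transparency_log = set(transparency_log)
        self.policy_workflow_urn = policy_workflow_urn
        # (name, version) -> file path in the cached index
        self.packages = packages

    def serve_package(self, name: str, version: str) -> str:
        wanted = Attestation(name, version, self.policy_workflow_urn)
        if wanted not in self.transparency_log:
            raise PermissionError(f"no trust attestation for {name}=={version}")
        return self.packages[(name, version)]
```

With `PIP_INDEX_URL` pointed at such an index, an unattested `insecure-package` would fail to install instead of being silently pulled in.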

johnandersen777 pushed a commit that referenced this pull request Feb 15, 2023
… request validation manifest

Related: 1f347bc
Related: #1401
Related: #1207
Related: #1061
Alice Engineering Comms 2023-02-15 @pdxjohnny Engineering Logs: #1406
johnandersen777 pushed a commit that referenced this pull request Mar 21, 2023
…r eventing across pull requests in poly repo env

Related: #1061

Repo locks

For what amounts to essentially a poly repo structure to work, given the way we're validating all of our pull requests against each other before merge, we need to ensure that when the original PR is merged, all the rest of the PRs associated with it (which might, for example, fix API-breaking changes in downstream dependent packages) are also merged. Therefore we will need some sort of system account or bot which must approve every pull request. We can make the bot's logic such that if an approved reviewer has approved the pull request, the bot will approve the pull request, initiate the locking procedure, and rebase the pull request into the repo. So when we have a change which affects more than one repo, we will trigger rebases into the respective repos' main branches while all of those repos are locked. In fact, all of the repos will be locked, within both the main repo and the 2nd party org. This is because we need to ensure that all of the changes get merged and that there are no conflicts which would leave us in an unknown state. Our state is known so long as we have tested all of the PRs involved against the main branch, or rather the latest commit before rebase. When all PRs in a set across repos are approved, the bot will merge them, starting with the farthest downstream PR. It will somehow specify version information to the CI so that the CI can block, waiting for the commit from the original PR to be merged before continuing. This ensures that the CI jobs do not run against a slightly outdated version of the repo which the original PR was made against.
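The merge procedure can be sketched as: lock every known repo, order repos so the farthest downstream comes first, then merge each repo's PR in that order and release the locks. This is a minimal sketch under assumptions: `RepoLocks`, `merge_order`, and `merge_pr_set` are hypothetical; the locks are in-memory rather than enforced by the SCM; and the `merge` callback stands in for whatever the bot uses to merge a PR. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+), whose `static_order()` yields dependencies before their dependents, so reversing it puts downstream repos first.

```python
from contextlib import contextmanager
from graphlib import TopologicalSorter


def merge_order(depends_on: dict) -> list:
    """depends_on maps a repo to the set of upstream repos it depends on.

    static_order() yields upstream repos before their dependents, so the
    reversed order starts with the farthest downstream repos.
    """
    return list(reversed(list(TopologicalSorter(depends_on).static_order())))


class RepoLocks:
    """Hypothetical in-memory stand-in for the bot's repo locks."""

    def __init__(self) -> None:
        self.locked = set()

    @contextmanager
    def hold(self, repos):
        repos = set(repos)
        busy = repos & self.locked
        if busy:
            raise RuntimeError(f"already locked: {sorted(busy)}")
        self.locked |= repos
        try:
            yield
        finally:
            self.locked -= repos


def merge_pr_set(prs: dict, depends_on: dict, locks: RepoLocks, merge) -> list:
    """Merge each repo's PR (prs: repo -> PR id), farthest downstream first."""
    order = merge_order(depends_on)
    merged = []
    # Lock every known repo, not just the ones with a PR in this set
    with locks.hold(order):
        for repo in order:
            if repo in prs:
                merge(repo, prs[repo])
                merged.append(repo)
    return merged
```

Failing fast when a repo is already locked keeps two cross-repo PR sets from interleaving their rebases and merges.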
johnandersen777 (author) commented Mar 21, 2023

johnandersen777 pushed a commit that referenced this pull request May 4, 2023
johnandersen777 pushed a commit that referenced this pull request May 4, 2023
johnandersen777 pushed a commit that referenced this pull request May 4, 2023