Write a design document for how to support richer testing input #1295

Closed
bassosimone opened this issue Nov 18, 2022 · 4 comments
Assignees
Labels
documentation Improvements or additions to documentation funder/drl2022-2024 priority/medium

Comments

@bassosimone
Contributor

This issue is a child issue of #1291. The aim is to write a design document explaining how richer testing input will flow from the backend, to the probe, to the generated data, to the pipeline, and to explorer.

@bassosimone bassosimone added documentation Improvements or additions to documentation priority/medium funder/drl2022-2024 labels Nov 18, 2022
@bassosimone bassosimone self-assigned this Nov 18, 2022
bassosimone added a commit to ooni/probe-cli that referenced this issue Feb 6, 2023
This PoC investigates whether it would be possible to run
experiments directly without using the internal/engine
abstraction as the middle man.

The PoC is in the context of ooni/ooni.org#1295
bassosimone added a commit to ooni/2023-05-richer-input that referenced this issue Jun 12, 2023
@bassosimone
Contributor Author

As explained in ooni/2023-05-richer-input@7871d3b, the https://github.com/ooni/2023-05-richer-input repository contains a prototype that explores the richer input domain and redesigns ooniprobe around richer input. Notably, that repository contains an initial design document.

@bassosimone
Contributor Author

bassosimone commented Oct 19, 2023

So, here's the plan I propose for starting to support and experiment with richer input.

Check-in API

We will keep using /api/v1/check-in for now. We will include extra information in the responses for dnscheck, riseupvpn, signal, stunreachability, and torsf. To this end, we will extend the experiment-specific stanza inside the check-in response. For example, here's how we will extend the dnscheck stanza of the /api/v1/check-in response:

{
  "tests": {
    "dnscheck": {
      "report_id": "", // same as before
      "targets": [{
        "input": "https://dns.google/dns-query",
        "options": {"HTTP3Enabled": true}
      }]
    }
  }
}

In other words, each experiment will have its own richer input definition, shaped by the characteristics of that experiment. The probe will process this structure and act accordingly.
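As a rough illustration of how a probe might decode the extended dnscheck stanza, here is a minimal Go sketch. The type and function names (`CheckInResponse`, `DNSCheckStanza`, `DNSCheckTarget`, `ParseDNSCheckTargets`) are hypothetical and are not taken from probe-cli. The sample response omits the inline comment shown above, because standard JSON does not allow comments.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DNSCheckTarget is a hypothetical type for one richer-input target:
// an input URL plus the options to apply when measuring it.
type DNSCheckTarget struct {
	Input   string         `json:"input"`
	Options map[string]any `json:"options"`
}

// DNSCheckStanza is a hypothetical type for the dnscheck stanza
// of the check-in response.
type DNSCheckStanza struct {
	ReportID string           `json:"report_id"`
	Targets  []DNSCheckTarget `json:"targets"`
}

// CheckInResponse models only the part of the response used here.
type CheckInResponse struct {
	Tests struct {
		DNSCheck *DNSCheckStanza `json:"dnscheck"`
	} `json:"tests"`
}

// ParseDNSCheckTargets returns the dnscheck targets contained in a
// check-in response, or nil when the stanza is missing.
func ParseDNSCheckTargets(data []byte) ([]DNSCheckTarget, error) {
	var resp CheckInResponse
	if err := json.Unmarshal(data, &resp); err != nil {
		return nil, fmt.Errorf("cannot parse check-in response: %w", err)
	}
	if resp.Tests.DNSCheck == nil {
		return nil, nil
	}
	return resp.Tests.DNSCheck.Targets, nil
}

// sampleCheckInResponse mirrors the example stanza from the plan.
const sampleCheckInResponse = `{
  "tests": {
    "dnscheck": {
      "report_id": "",
      "targets": [{
        "input": "https://dns.google/dns-query",
        "options": {"HTTP3Enabled": true}
      }]
    }
  }
}`

func main() {
	targets, err := ParseDNSCheckTargets([]byte(sampleCheckInResponse))
	if err != nil {
		panic(err)
	}
	for _, target := range targets {
		fmt.Println(target.Input, target.Options)
	}
}
```

Keeping `options` as a generic map is only one possibility; an experiment could equally decode it into its own typed options structure.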

With richer input we aim to solve the following experiment needs:

  • dnscheck: ability to deliver input and options
  • riseupvpn: ability to deliver the correct CA, the correct endpoints, and to disable if riseup API is down
  • signal: ability to deliver the correct CA
  • stunreachability: ability to deliver the correct STUN endpoints
  • torsf: ability to deliver the correct rendezvous method and cloud-fronted SNI
  • urlgetter: ability to run some measurements for research

Regarding disabling riseupvpn, we already implemented the core functionality in ooni/probe-cli#1355.

Probe

To implement this plan we need to ensure that we periodically call check-in. The best course of action would probably be to refactor the code such that we call check-in at the beginning of a measurement session. This refactoring would require changes in the probe-cli, probe-android, and probe-ios codebases.

Now, because of the backend changes described above, the check-in response contains richer input. We already have a package in the probe called checkincache that caches information extracted from the check-in response. We will extend this package to cache the richer input provided for each richer-input-aware experiment.

We will also refactor how we run experiments. We will define the following JSON structure as the most fundamental pure-data structure representing the execution of a single experiment with a single input and some options:

{
  "input": "https://dns.google/dns-query",
  "options": {"HTTP3Enabled": true},
  "test_name": "dnscheck"
}

Note how this data structure corresponds exactly to executing this command:

./miniooni dnscheck -O HTTP3Enabled=true -i https://dns.google/dns-query

So, if the user runs the above command, miniooni would translate the command line to the data structure and execute the data structure. Instead, if the user runs:

./miniooni dnscheck -O HTTP3Enabled=true -i https://dns.google/dns-query -i https://example.com/dns-query

We will translate the command line invocation to two data structures, one for each input.

Additionally, if the user runs:

./miniooni dnscheck

We will use the checkincache package to load targets for dnscheck and produce a list of data structures to run.
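The three cases above (one input, several inputs, no input) could be sketched in Go as follows. This is only an illustration: `ExperimentSpec`, `BuildSpecs`, and `EncodeSpec` are hypothetical names, and the `cached` parameter stands in for whatever the checkincache package would return.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ExperimentSpec is a hypothetical Go mirror of the pure-data structure
// describing a single experiment run with a single input and options.
type ExperimentSpec struct {
	Input    string         `json:"input"`
	Options  map[string]any `json:"options"`
	TestName string         `json:"test_name"`
}

// BuildSpecs translates a command line invocation into a list of specs.
// With explicit inputs it emits one spec per input, all sharing the same
// options. Without inputs it falls back to the cached targets.
func BuildSpecs(testName string, inputs []string, options map[string]any, cached []ExperimentSpec) []ExperimentSpec {
	if len(inputs) == 0 {
		return cached
	}
	specs := make([]ExperimentSpec, 0, len(inputs))
	for _, input := range inputs {
		specs = append(specs, ExperimentSpec{
			Input:    input,
			Options:  options,
			TestName: testName,
		})
	}
	return specs
}

// EncodeSpec serializes a spec to its JSON representation.
func EncodeSpec(spec ExperimentSpec) (string, error) {
	data, err := json.Marshal(spec)
	if err != nil {
		return "", fmt.Errorf("cannot encode spec: %w", err)
	}
	return string(data), nil
}

func main() {
	options := map[string]any{"HTTP3Enabled": true}
	inputs := []string{
		"https://dns.google/dns-query",
		"https://example.com/dns-query",
	}
	for _, spec := range BuildSpecs("dnscheck", inputs, options, nil) {
		encoded, err := EncodeSpec(spec)
		if err != nil {
			panic(err)
		}
		fmt.Println(encoded)
	}
}
```

With this shape, the code that executes experiments only needs to consume a list of specs, regardless of whether they came from the command line or from the cache.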

OONI Run v2

An OONI Run v2 descriptor contains a superset of the following information:

{
  "nettests": [{
    "inputs": ["https://dns.google/dns-query"],
    "options": {"HTTP3Enabled": true},
    "test_name": "dnscheck"
  }]
}

Thus, note how OONI Run v2 and miniooni command line invocations are isomorphic. So, there exists a data transformation to convert an OONI Run v2 nettest descriptor to a miniooni command line invocation.

Hence, supporting richer input for miniooni implies supporting it for OONI Run v2. (The proper course of action would be for ./miniooni command line invocations to produce ephemeral OONI Run v2 descriptors and then invoke the OONI Run v2 executor engine to perform measurements.)
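One direction of this transformation, from a nettest descriptor to a miniooni command line, might look like the Go sketch below. The `Nettest` type and the `ToMiniooniArgs` and `CommandLine` functions are hypothetical. The sketch assumes that every option value can be rendered as `key=value`, and it sorts option keys only to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Nettest is a hypothetical Go mirror of one entry in the "nettests"
// array of the descriptor fragment shown above.
type Nettest struct {
	Inputs   []string       `json:"inputs"`
	Options  map[string]any `json:"options"`
	TestName string         `json:"test_name"`
}

// ToMiniooniArgs converts a nettest into the argument vector of the
// equivalent miniooni invocation: one -O flag per option and one -i
// flag per input.
func ToMiniooniArgs(nt Nettest) []string {
	args := []string{"./miniooni", nt.TestName}
	keys := make([]string, 0, len(nt.Options))
	for key := range nt.Options {
		keys = append(keys, key)
	}
	sort.Strings(keys) // deterministic flag order
	for _, key := range keys {
		args = append(args, "-O", fmt.Sprintf("%s=%v", key, nt.Options[key]))
	}
	for _, input := range nt.Inputs {
		args = append(args, "-i", input)
	}
	return args
}

// CommandLine renders the argument vector as a single string. It does
// not perform shell quoting, so it is only suitable for display.
func CommandLine(nt Nettest) string {
	return strings.Join(ToMiniooniArgs(nt), " ")
}

func main() {
	nt := Nettest{
		Inputs:   []string{"https://dns.google/dns-query"},
		Options:  map[string]any{"HTTP3Enabled": true},
		TestName: "dnscheck",
	}
	fmt.Println(CommandLine(nt))
}
```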

Data processing pipeline

We add richer input to dnscheck, riseupvpn, signal, stunreachability, and torsf. Thus, we need to ask the question of where and how to process the richer-input-enhanced measurement results.

  • Regarding dnscheck and urlgetter, the best place to process the results is ooni/data, given that these results are best understood by exploding the measurement JSON and looking at individual observations.

  • Regarding riseupvpn, signal, stunreachability, and torsf, richer input is only meant to make the experiments more reliable and there is no need to change the way in which we are processing their results.

OONI Explorer

We will start exposing information about observations extracted using ooni/data.

Required Cleanups

We can safely delete /api/_/check-in, the experimental check-in API developed while we were experimenting with designs to support richer input. We don't need it anymore.

Rejected Designs

In ooni/2023-05-richer-input, we experimented with several designs for implementing richer input. The most promising design was the one based on an external DSL written in JSON, which described how we would run experiments. Unfortunately, this design does not allow for computing the results of measurements directly in the probe given that the DSL is very limited and only allows for composing basic measurement primitives (e.g., DNS lookups, TCP connect, TLS handshake). It seems the correct way forward in this respect would be to write experiments in a high-level language (e.g., JavaScript) and serve this code to probes. However, this design is way beyond the original intent and scope of richer input, therefore we should shelve it for now.

@bassosimone
Contributor Author

I have started to create follow-up issues for this issue. However, I am not done. When I am done, I will close this issue. I will try to do this as part of the current sprint.

@bassosimone
Contributor Author

@jbonisteel today we need to discuss whether to close this issue in light of the set of issues I have created or whether we need to additionally scope down and split some backend or explorer issues.
