Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How should I represent user-defined types that are SMT bit-vectors #78

Open
bubaflub opened this issue Jan 3, 2024 · 5 comments
Open

Comments

@bubaflub
Copy link

bubaflub commented Jan 3, 2024

I'd like to create user-defined types to represent IP addresses. Specifically, a bv[32] to represent an IPv4 address and a bv[128] to represent an IPv6 address. Since i32 is interchangeable with bv[32] I'm currently just doing:

type ipv4address = i32

And then I can parse an IPv4 address from a string to a 32-bit integer:

fun parseIPv4Address(address: string) : ipv4address option = ...

rel ipv4addresses(string, ipv4address option)
ipv4addresses("11.2.3.4", parseIPv4Address("11.2.3.4")).

And when I run with --dump-all I do see this relation:

ipv4addresses("11.2.3.4", some(184681220))

Where 184681220 is the decimal representation of that IPv4 address.

However, this won't work for IPv6 addresses since there is no corresponding native 128-bit type. Additionally, I'd like to extend this to represent CIDR ranges / subnetworks. For example, the CIDR 192.168.0.0/16 represents the range of IP addresses from 192.168.0.0 to 192.168.255.255. I'd like to represent this as an SMT bit vector where the first 16-bits are fixed (has the values from the first two octets, 192 and 168) and the last 16-bits are free. This representation makes it very easy to check if a (concrete) IP address is in the range of a CIDR. And I'll be using IP addresses and CIDRs to build out models of firewalls and routing tables.

So my questions are:

  1. How can I define a user-type that is just an arbitrary-length SMT bit-vector?
  2. Once that's done, I'll need functions that parse IP address (or CIDR) strings to their binary representations. I'll also need functions that can check if a particular IP address is within the CIDR range. Can these just functions written in ML+SMT? Or do I need something else to manipulate these bit-vectors?
@mgree
Copy link
Contributor

mgree commented Jan 4, 2024

You can use the bv constructor---symbolic evaluation has an example.

But per your second question, you'll be limited in how you work with such a bitvector: it's in SMT, so it's possibly symbolic. If you made IP addresses bitvectors, you'd have to work with them in SMT. There's an example of this later in the same file.

@bubaflub
Copy link
Author

Great, thanks for links to the examples. I'll give it a shot and report back.

Could clarify more about:

you'll be limited in how you work with such a bitvector

What are the limitations when using a symbolic value? Am I prevented from using (possibly) symbolic values in datalog relations and queries?

Taking a step back, what I'd like to ultimately accomplish is to represent a large number of IP addresses and CIDRs in my program and query if an IP address is contained by a CIDR. I'd like to combine those building blocks to model a network firewall where allowing or denying a network connection depends on the source and destination IP CIDRs and a list of allow/deny rules matching CIDRs. Then a (very basic) model of network reachability is finding a path from A to B through any number of firewalls that would accept or drop the connection.

I was thinking that representing IP addresses as symbolic bitvectors would be a natural way to represent this, but I suppose I could also treat these IP addresses as (binary) numbers and a CIDR as a range between two concrete values. That's not exactly correct but works for my purposes.

@mgree
Copy link
Contributor

mgree commented Jan 17, 2024

Right: you can't look at the "value" of a bv[k] smt without first getting a model---and then it's just the value in that model. You're free to use such a value to build up larger symbolic values, but they X smt has no X value until you have a model.

I would be tempted to go with a more explicit representation, i.e., an IP address is four concrete ints, and a CIDR is a datatype that for each class, holds the information you need (e.g., CIDR_ClassA just holds a single int, representing just the 8 bits of that class A network; CIDR_ClassB holds 2 ints, and so on). You could define matching explicitly by doing appropriate masking in a functional program. If you then wanted to reason more symbolically about these, you could also define CIDR matching over SMT terms.

If that doesn't make sense, I'd be happy to meet---probably a lot clearer that way. 😁

@aaronbembenek
Copy link
Contributor

@mgree Just to clarify: do you mean i32 (Formulog's type for a concrete, 32-bit machine integer) instead of int (Formulog's type for an SMT math integer)?

@mgree
Copy link
Contributor

mgree commented Feb 1, 2024

Whoops, yes, I do, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants