Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrumented path based operations using hooks defined in Checker #325

Merged
merged 10 commits into from
Oct 27, 2024

Conversation

ajaychandran
Copy link
Contributor

@ajaychandran ajaychandran commented Oct 23, 2024

Instrumented path based operations using hooks defined in Checker.

trait Checker {
  def onRead(path: ReadablePath): Unit
  def onWrite(path: Path): Unit
}

Exceptions

The following operations were not instrumented:

  • followLink, readLink
  • list, walk
  • exists, isLink, isFile, isDir
  • read operations for permissions/stats
  • watch

Future work

  • A more comprehensive design would add hooks for each core operation. This would eliminate the special check handling in operations like move and symlink.
  • As such, the methods of ReadablePath represent escape hatches. These cannot be "plugged" without breaking binary compatibility.

This resolves part 1 of mill #3746.

Comment on lines 38 to 69
test("group") {
test - prep { wd =>
test("checker") - prepChecker { wd =>
if (Unix()) {
// Only works as root :(
if (false) {
intercept[WriteDenied] {
os.owner.set(rd / "File.txt", "nobody")
}

val originalOwner = os.owner(wd / "File.txt")

os.owner.set(wd / "File.txt", "nobody")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original "group" test was testing owner operations.

Comment on lines -194 to +214
def copyOne(p: Path): file.Path = {
def copyOne(p: Path): Unit = {
val target = to / p.relativeTo(from)
if (mergeFolders && isDir(p, followLinks) && isDir(target, followLinks)) {
// nothing to do
target.wrapped
} else {
Files.copy(p.wrapped, target.wrapped, opts1 ++ opts2 ++ opts3: _*)
}
}

copyOne(from)
if (stat(from, followLinks = followLinks).isDir) walk(from).map(copyOne)
if (stat(from, followLinks = followLinks).isDir) for (p <- walk(from)) copyOne(p)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor optimization that eliminates the creation of an unused collection.
This isn't required for the original ticket.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization! Would walk(from).foreach(copyOne) be more idiomatic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to avoid this.

Comment on lines +49 to +50
// check read preemptively in case "dest" is created
for (source <- sources) checker.value.onRead(source.src)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, the implementation should be refactored to defer file creation until all data is collected.

@ajaychandran
Copy link
Contributor Author

@lihaoyi Please review.

@lihaoyi
Copy link
Member

lihaoyi commented Oct 24, 2024

Thanks @ajaychandran, I think this looks good. Could you add Scaladoc to the checker API as well? Then I can merge this and cut an unstable -M1 so you can start using it in Mill

@lihaoyi
Copy link
Member

lihaoyi commented Oct 24, 2024

The following operations were not instrumented:

Let's instrument these as well. For the purposes of Mill's (and probably others), we also want to block code from making decisions based on filesystem metadata outside their designated sandboxes, not just file contents

@ajaychandran
Copy link
Contributor Author

A more comprehensive design would add hooks for each core operation. This would eliminate the special check handling in operations like move and symlink.

@lihaoyi Any thoughts on this?
I'd rather not add special handling in the operation implementations.
The number of core operations is not that large. We could even pass in additional parameters like createFolders, followLinks to have the full context available.

@ajaychandran
Copy link
Contributor Author

Could you add Scaladoc to the checker API as well?

I had removed Scaladoc from the PR description. Do we need more than this?

/**
 * Defines hooks for path based operations.
 *
 * This, in conjunction with [[checker]], can be used to implement custom checks like
 *  - restricting operations to some path(s)
 *  - logging operations
 */
trait Checker {

  /** A hook for a read operation on `path`. */
  def onRead(path: ReadablePath): Unit

  /** A hook for a write operation on `path`. */
  def onWrite(path: Path): Unit
}```

@lihaoyi
Copy link
Member

lihaoyi commented Oct 24, 2024

@ajaychandran I'm not sure what you mean by special check handling. Could you elaborate?

  • For move, I would expect it to call onWrite to both source and destination paths, since it is modifying both.
  • For symlink, I would only expect it to call onWrite for the path of the created link, and do nothing for the path the link is pointing to (since we are neither reading anything from it nor modifying it)

I think for the purposes of this ticket, let's stick with a simple onRead and onWrite API. That's what we'll need in Mill. We can flesh it out more in future releases as necessary

@ajaychandran
Copy link
Contributor Author

@lihaoyi
The current implementation assumes an equivalence between write and create/remove operations. I am not sure if this would fit all use cases.

  • The modified implementation for move checks for write access on the source parent directory. I gather this is not per your expectation.
  • I assumed a symlinked path could be used as argument to read. If so, a user could gain access to any file.
  • Should makeDir test write access on an existing parent folder when createFolders is true?

Adding a hook for each core operation lets os-lib remain unopinionated.
Maybe rename Checker to Hook?

@@ -38,6 +42,7 @@ object makeDir extends Function1[Path, Unit] {
object all extends Function1[Path, Unit] {
def apply(path: Path): Unit = apply(path, null, true)
def apply(path: Path, perms: PermSet = null, acceptLinkedDirectory: Boolean = true): Unit = {
checker.value.onWrite(path)
Copy link

@dabd dabd Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original implementation creates any necessary enclosing directories via java.nio.file.Files#createDirectories.

Here the checker is only checking for write access on the full path.

Don't we need to apply it to the enclosing directories as well?

I see this would entail providing a shim for java.nio.file.Files#createDirectories.

@lihaoyi
Copy link
Member

lihaoyi commented Oct 24, 2024

Good point on the symlinks and enclosing folders. In the end this is not going to be an airtight sandbox, so it's a judgement call exactly where to draw the line.

  1. For move, let's call onWrite it on the full source and destination paths

  2. I think for creating enclosing folders, let's just punt on that and say it's up the the implementer of def onWrite

  3. For symlinks, I think what you say is reasonable, so lets call onWrite on the destination path, since we can compute it in-memory by resolving any relative path without needing to perform any filesystem operations

I'm a bit reluctant to try and match the entire OS-Lib API in the Checker interface. Let's go with the more minimal ad-hoc interface for now.

For the purposes of this pull request, could you add an @experimental annotation, apply it to the stuff you added, and add it to the def mimaExcludeAnnotations in the build file? Just to make it clear that this API isn't stable, and we're only publishing it because we need to experiment with it in Mill

@lihaoyi
Copy link
Member

lihaoyi commented Oct 25, 2024

@ajaychandran not everything needs to be experimental, only the public APIs you added: os.checker and os.Checker I think

@ajaychandran
Copy link
Contributor Author

@lihaoyi This is ready for another review.

@ajaychandran
Copy link
Contributor Author

Updated hardlink as well.

@@ -59,6 +59,8 @@ package object os {

val sub: SubPath = SubPath.sub

val checker: DynamicVariable[Checker] = new DynamicVariable[Checker](Checker.Nop)
Copy link

@dabd dabd Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding these for ergonomics?

def withChecker[T](c: Checker)(op: => T): T = 
  checker.withValue(c)(op)
  
  
def checkReadAccess(path: os.Path): Unit = checker.value.onRead(path)
def checkWriteAccess(path: os.Path): Unit = checker.value.onWrite(path)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now we can punt on these, since it's a pretty advanced feature so being easy to use isn't as important

@lihaoyi
Copy link
Member

lihaoyi commented Oct 26, 2024

@ajaychandran code changes look good. Last request here is to move all the checker tests into one separate test suite, rather than having them mixed in with all the non-checker tests. That will make it easier to skim them and audit them as necessary.

After that i will merge this and cut a milestone release for experimentation in Mill

@ajaychandran
Copy link
Contributor Author

@lihaoyi Checker tests have been moved.

Some compound operations like zip incur the cost of double-checking.
One solution would be to define a private[os] def unchecked variant, for each operation, for internal use. This can be deferred until the feature is stable.

@dabd
Copy link

dabd commented Oct 26, 2024

@lihaoyi Checker tests have been moved.

Some compound operations like zip incur the cost of double-checking. One solution would be to define a private[os] def unchecked variant, for each operation, for internal use. This can be deferred until the feature is stable.

Another option is to delegate some sort of caching solution to the implementor of Checker.

@lihaoyi lihaoyi merged commit e8519c0 into com-lihaoyi:main Oct 27, 2024
8 checks passed
@lihaoyi
Copy link
Member

lihaoyi commented Oct 27, 2024

@ajaychandran pushed a tag for 0.11.4-M1 https://github.com/com-lihaoyi/os-lib/actions/runs/11537940922 hopefully it goes out soon and you can try it out in Mill. If you're going to work on the downstream tasks, we can hold off on payment for now to save on transfer fees. If not I can transfer the first portion of the bounty using the details we used before

@ajaychandran
Copy link
Contributor Author

ajaychandran commented Oct 27, 2024

@lihaoyi Thanks. Lets hold off payment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants