Hyperscan support #1221

dvershinin · 2024-11-15T11:48:12Z

Coraza would greatly benefit from Hyperscan support, which could largely solve issues like "Performance drop with larger request body".

Here I'd like to share some findings from an attempt to integrate it using an @rx plugin and Hyperscan 5.4, hoping they are useful for a future implementation.

Our findings:

Bare Hyperscan does not work with a significant number of the SecRule directives in the OWASP ruleset or most other rulesets. It fails on basic things like !@rx ^$ with the error: Start of match is not currently supported for patterns which match an empty buffer.
We then tried Chimera, which solved the regex support problem.
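The empty-buffer error above suggests that patterns could be pre-classified before they are handed to an engine. The following is only a minimal sketch of that idea, not part of our plugin: it uses Go's standard regexp package as a stand-in to detect patterns that match an empty input. The names matchesEmpty and partition are hypothetical, and Go's RE2 syntax differs from what Hyperscan or PCRE accept, so the classification is approximate.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesEmpty reports whether pattern can match a zero-length input.
// Assumption: Go's regexp (RE2 syntax) is close enough to the target
// engine's syntax for this check; that will not hold for every rule.
func matchesEmpty(pattern string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(""), nil
}

// partition splits patterns into candidates for the fast engine and
// patterns that should stay on a fallback engine. Patterns that match
// an empty buffer, or that this sketch cannot compile, go to fallback.
func partition(patterns []string) (fast, fallback []string) {
	for _, p := range patterns {
		empty, err := matchesEmpty(p)
		if err != nil || empty {
			fallback = append(fallback, p)
			continue
		}
		fast = append(fast, p)
	}
	return fast, fallback
}

func main() {
	fast, fallback := partition([]string{`^$`, `admin`, `a*`, `(?i)login`})
	fmt.Println("fast:", fast)
	fmt.Println("fallback:", fallback)
}
```

Here `^$` and `a*` end up in the fallback set because both match an empty string, while `admin` and `(?i)login` remain candidates for the fast engine.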
Our initial implementation was roughly based on the patch, adapted to the Chimera Go API:

var globalHandler chimera.HandlerFunc = func(id uint, from, to uint64, flags uint, captured []*chimera.Capture, ctx interface{}) chimera.Callback {
	// Type-assert ctx to *HandlerContext
	handlerCtx, ok := ctx.(*HandlerContext)
	if !ok {
		return chimera.Terminate // Stop scanning if ctx is not of the expected type
	}

	// Access the counter and matches fields from handlerCtx
	if atomic.LoadInt32(handlerCtx.counter) >= 10 {
		*handlerCtx.matches = true
		return chimera.SkipPattern // Stop after 10 matches
	}
	atomic.AddInt32(handlerCtx.counter, 1)

	// Capture matches if capturing is enabled
	if handlerCtx.tx.Capturing() {
		bts := handlerCtx.bts
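		// Expand the match to the boundaries of the line that contains it
		start := bytes.LastIndexByte(bts[:from], '\n')
		if start == -1 {
			start = 0
		} else {
			start++
		}
		// Check for -1 before adding the offset; otherwise a missing
		// trailing newline would never be detected
		end := len(bts)
		if idx := bytes.IndexByte(bts[to:], '\n'); idx != -1 {
			end = int(to) + idx
		}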
		handlerCtx.tx.CaptureField(int(*handlerCtx.counter), string(bts[start:end]))
	}
	return chimera.Continue
}

followed by o.db.Database.Scan(bts, scratch, globalHandler, handlerCtx).

I am not providing the complete plugin code because it performs worse than even Go's native regexp package. The reason why is where things get interesting:

Hyperscan will work faster only if relevant patterns are pre-compiled into a single Hyperscan database

Both the patch I referenced and the first version of our @rx plugin allocated a separate Hyperscan database per pattern, which runs completely against Hyperscan's design principle. This approach is slower, so we looked into how a shared Hyperscan database could be used with Coraza.

A second iteration of the @rx plugin simply registered all patterns in a single Hyperscan database. This degraded runtime performance even further, for two reasons: the database is bloated with patterns that are irrelevant to most data points yet are scanned against all of them, and Evaluate is still invoked for each pattern at runtime anyway.

So several things need to be accounted for to make the implementation actually faster, and they are in Coraza itself. Primarily, patterns that inspect the same data point in the same phase should be batched into a single Hyperscan database, e.g.:

SecRule REQUEST_URI "@rx admin" "id:1001,phase:2,deny,log,msg:'Admin Access Attempt'"
SecRule REQUEST_URI "@rx login" "id:1002,phase:2,deny,log,msg:'Login Attempt'"
SecRule REQUEST_URI "@rx password" "id:1003,phase:2,deny,log,msg:'Password Attempt'"
SecRule REQUEST_URI "@rx user" "id:1004,phase:2,deny,log,msg:'User Attempt'"

The patterns of these rules can be batched into one Hyperscan database because:

They all inspect a single data point, REQUEST_URI
They belong to the same phase
They are not chained
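To make the batching criteria above concrete, here is a minimal sketch of how rules might be grouped before any database is compiled. It is an illustration only: Rule, batchKey, and groupRules are hypothetical names, not Coraza types, and the sketch stops at grouping, without touching Hyperscan at all.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified view of a SecRule.
// It is not Coraza's internal rule type.
type Rule struct {
	ID       int
	Variable string // data point the rule inspects, e.g. "REQUEST_URI"
	Phase    int
	Pattern  string // argument of the @rx operator
	Chained  bool   // true if the rule takes part in a chain
}

// batchKey identifies rules whose patterns could share one database.
type batchKey struct {
	Variable string
	Phase    int
}

// groupRules batches non-chained rules by variable and phase.
// Chained rules are returned separately so that they can keep
// being evaluated sequentially.
func groupRules(rules []Rule) (map[batchKey][]Rule, []Rule) {
	batches := make(map[batchKey][]Rule)
	var sequential []Rule
	for _, r := range rules {
		if r.Chained {
			sequential = append(sequential, r)
			continue
		}
		k := batchKey{Variable: r.Variable, Phase: r.Phase}
		batches[k] = append(batches[k], r)
	}
	return batches, sequential
}

func main() {
	rules := []Rule{
		{ID: 1001, Variable: "REQUEST_URI", Phase: 2, Pattern: "admin"},
		{ID: 1002, Variable: "REQUEST_URI", Phase: 2, Pattern: "login"},
		{ID: 1003, Variable: "REQUEST_URI", Phase: 2, Pattern: "password"},
		{ID: 1004, Variable: "REQUEST_URI", Phase: 2, Pattern: "user"},
	}
	batches, sequential := groupRules(rules)
	for k, rs := range batches {
		fmt.Printf("%s phase %d: %d patterns in one batch\n", k.Variable, k.Phase, len(rs))
	}
	fmt.Println("rules left for sequential evaluation:", len(sequential))
}
```

With the four example rules, everything lands in a single REQUEST_URI/phase 2 batch, which is the shape Hyperscan's multi-pattern databases are designed for.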

But currently, from what I can tell from looking at the Coraza code, rules are processed sequentially. How feasible would it be to pre-aggregate patterns in order to support Hyperscan's "multiple patterns in a database" design requirement? Or do you think it completely contradicts the nature of how ModSecurity rulesets must be processed? As immediate roadblocks, I see:

Data transformations would require separate scans and separate Hyperscan databases
Aggregating patterns that come from chained rules
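The first roadblock can be illustrated with a small sketch: rules that apply different transformation chains to the same data point effectively scan different buffers, so the chain would have to be part of any batching key. Everything below is an assumption made for illustration. The transforms map uses Go standard library functions as stand-ins for transformations such as lowercase and urlDecode, and chainKey and buffersFor are hypothetical helpers, not Coraza APIs.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// transform is a hypothetical stand-in for a transformation function.
type transform func(string) string

// transforms maps transformation names to stand-in implementations.
// These are approximations and not Coraza's own transformations.
var transforms = map[string]transform{
	"lowercase": strings.ToLower,
	"urlDecode": func(s string) string {
		out, err := url.QueryUnescape(s)
		if err != nil {
			return s // leave the input unchanged on malformed escapes
		}
		return out
	},
}

// chainKey builds a stable key for an ordered transformation chain.
func chainKey(chain []string) string {
	if len(chain) == 0 {
		return "none"
	}
	return strings.Join(chain, ",")
}

// buffersFor computes each distinct transformed buffer exactly once.
// Every entry in the result would need its own scan, and therefore its
// own pattern batch. Unknown transformation names are skipped.
func buffersFor(input string, chains [][]string) map[string]string {
	buffers := make(map[string]string)
	for _, chain := range chains {
		key := chainKey(chain)
		if _, done := buffers[key]; done {
			continue
		}
		buf := input
		for _, name := range chain {
			if t, ok := transforms[name]; ok {
				buf = t(buf)
			}
		}
		buffers[key] = buf
	}
	return buffers
}

func main() {
	chains := [][]string{
		nil,
		{"lowercase"},
		{"urlDecode", "lowercase"},
		{"lowercase"}, // duplicate chain, reuses the buffer computed above
	}
	for key, buf := range buffersFor("/Admin%2Fpanel", chains) {
		fmt.Printf("%-20s %s\n", key, buf)
	}
}
```

In this example four rules need three distinct buffers, so at least three scans per data point would be required no matter how the patterns are batched.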


jptosso · 2024-11-15T12:32:12Z

Hey! Thank you for your detailed report.
Indeed, we know we would benefit from this, but we have a minor setback. Most of the core team uses M1/M2 MacBooks and we cannot test this locally. It breaks compatibility with ARM and it doesn't scale performance for AMD.
Regarding your implementation, I was looking at a similar approach integrating with our MEMOIZE feature. But I believe we would have to create a regex service that would take care of a regex pool asynchronously.
This discussion is open and I'm personally interested in this feature.

dvershinin · 2024-12-01T16:21:54Z

I have the same setback, but I used GoLand and set it up to compile and run the code on a remote Intel VPS. It looks like it would be considerably easier to go with https://github.com/VectorCamp/vectorscan, which supports more platforms, including ARM.

jptosso added the enhancement New feature or request label Nov 15, 2024