-
Notifications
You must be signed in to change notification settings - Fork 49
Hadrian Analyzing Engine
After loading a PFA file, Hadrian provides a list of com.opendatagroup.hadrian.jvmcompiler.PFAEngine
instances. Because of intentional restrictions in the PFA language, a lot of useful information is available about the algorithm it represents without running it.
The Hadrian Basic Use page describes how to create a PFAEngine
, execute it, take snapshots and revert its state, and look at or convert its Abstract Syntax Tree (AST). This page explains how to statically analyze the PFAEngine
without executing it.
A call graph is a directed graph in which functions are nodes and if function A calls function B, there is a directed edge from A to B. This graph can be used to determine if a PFA file calls any functions that are deemed unsafe or otherwise undesirable, if they're recursive, if the scoring engine has mutable state, etc.
The call graph is stored in PFAEngine.callGraph
, which is a Map[String, Set[String]]
from a string-based function name to the set of all functions it calls. Special forms and methods like begin
, action
, and end
are given in parentheses.
PFAEngine
has several convenience functions for common applications:
-
calledBy
: determine which functions call a given function (reverse lookup, opposite of thecallGraph
'sapply
method). -
callDepth
: determine how many levels deep this function calls other functions, terminating on library functions and special forms. The return type isDouble
so that recursive functions can be expressed asInfinity
. -
isRecursive
: determine if this function directly calls itself. -
hasRecursive
: determine if this function has any recursive loops (isRecursive
implieshasRecursive
, but not the other way around). -
hasSideEffects
: determine if a function has"(cell-to)"
or"(pool-to)"
anywhere in its subgraph. These are the only ways to modify the persistent state of a PFA engine.
To determine if engine
, an instance of PFAEngine
, is mutable, call
engine.hasSideEffects("(action)")
Although Hadrian has transparent, built-in support for concurrency within a JVM process, scoring engines running in different processes or on different computers cannot take advantage of this mechanism. An execution environment may want to forbid or handle mutable PFA differently from immutable PFA.
All of the persistent state of a PFA scoring engine (mutable or immutable) is contained in its cells (single, global variables) and pools (R-like environments, containing global variables that can be created or destroyed at runtime). If the scoring engine is mutable, its changing state can be monitored without creating snapshots of its entire state.
-
analyzeCell
takes a cell name and an analysis, an arbitrary Scala function, and returns the result of that function when applied to the given cell. -
analyzePool
takes a pool name and an analysis, an arbitrary Scala function, and returns aMap[String, X]
of that function's results when applied to each element of the pool.
Hadrian can also track memory use of a running PFA engine. General Java tools exist for tracking the total memory used by a JVM or the memory used by specific classes, but neither of these provide granularity at the level of a PFA engine, which consists of many classes and not the whole JVM.
com.opendatagroup.hadrian.memory.EngineReport
provides an analysis of the memory used by one PFAEngine
's persistent state, and com.opendatagroup.hadrian.memory.EnginesReport
provides an analysis of memory used by a collection of PFAEngine
instances that may share memory.
Here is how to use them on engine
(an instance of PFAEngine
) and engines
(a Scala collection of PFAEngine
instances):
val report1 = EngineReport(engine)
println(report1.total)
println(report)
val report2 = EnginesReport(engines)
println(report2.sharedCells)
println(report2)
EngineReport
and EnginesReport
are both immutable case classes with meaningful fields and a convenient toString
method for generating a tabular report.
The analysis is based on counting existing data elements and multiplying by appropriate factors for each type of data element (including Java's object padding, distinctions between boxed and unboxed primitives, etc.). For large collections (arrays and maps), approximations are used, since the underlying data is represented as a tree of unknown balancing. This method does not account for structural sharing of immutable data and is therefore an overestimate. It also does not count data that is beyond scope but not yet garbage collected, so it is a more stable measure than total heap space used.
Some data types are fast to compute, such as an array of integers, since one only needs to know the size of an integer and the number of elements in the array. Others are slow, such as an array of strings, since one must traverse the array and add memory usage of each variable-length string.
Return to the Hadrian wiki table of contents.
Licensed under the Hadrian Personal Use and Evaluation License (PUEL).