Skip to content

Latest commit

 

History

History
144 lines (101 loc) · 6.63 KB

program-abstraction.adoc

File metadata and controls

144 lines (101 loc) · 6.63 KB

Program Abstraction in Tai-e (core classes and IR)

This document introduces Tai-e’s abstraction of the Java program being analyzed. You will likely need to use the classes introduced in this document when developing analyses on top of Tai-e. See Section 2 of Tai-e’s paper for more discussions.

Core Classes

  • JClass (in pascal.taie.language.classes) represents classes in the program. Each instance contains various information of a class, such as class name, modifiers, declared methods and fields, etc.

  • JMethod and JField: (in pascal.taie.language.classes): represents class members, i.e., methods and fields in the program. Each JMethod/JField instance contains various information of a method/field, such as declaring class, name, etc.

  • ClassHierarchy (in pascal.taie.language.classes): manages all the classes of the program. It offers APIs to query class hierarchy information, such as method dispatching, subclass checking, etc.

  • Type (in pascal.taie.language.type): represents types in the program. It has several subclasses, e.g., PrimitiveType, ClassTyp, and ArrayType, representing different kinds of Java types.

  • TypeSystem (in pascal.taie.language.type): provides APIs for retrieving specific types and subtype checking.

  • World (in pascal.taie): manages the whole-program information of the program. By using its getters, you can access these information, e.g., ClassHierarchy and TypeSystem. World is essentially a singleton class, and you can obtain the instance by calling World.get().

Tai-e IR

Tai-e IR is typed, 3-address, statement and expression based representation of Java method body.

You could dump IR for the classes of input program to .tir files via option -a ir-dumper. By default, Tai-e dumps IR to its default output directory output/. If you want to dump IR to a specific directory, just use option -a ir-dumper=dump-dir:path/to/dir. ir-dumper is implemented as a class analysis, thus the scope of the classes it dumps are affected by option -scope.

The IR classes reside in package pascal.taie.ir and its sub-packages.

There are three core classes in Tai-e IR:

  • IR is the central data structure of intermediate representation in Tai-e, and each IR instance can be seen as a container of the information for the body of a particular method, such as variables, parameters, statements, etc. You could easily obtain IR instance of a method by JMethod.getIR() (providing the method is not abstract).

  • Stmt represents all statements in the program. This interface has a dozen of subclasses, corresponding to various statements. Stmt`s are stored in `IR, and you could obtain them via IR.getStmts().

  • Exp represents all expressions in the program. This interface has dozens of subclasses, corresponding to various expressions. Exp`s are associated with `Stmt`s, and you could obtain them via specific APIs of `Stmt.

We believe that the API of IR is self-documenting and easy to use. To make IR more intelligible, we present a formal definition (i.e., context-free grammar) below that illustrates all kinds of expressions and statements in the IR, and how Stmt are formed by Exp. Most non-terminals in the grammar corresponds to classes in pascal.taie.ir.

Grammar of Expressions

Exp → Var | Literal | FieldAccess | ArrayAccess | NewExp | InvokeExp | UnaryExp | BinaryExp | InstanceOfExp | CastExp

  • Var → Identifier

  • Literal → IntLiteral | LongLiteral | FloatLiteral | DoubleLiteral | StringLiteral | ClassLiteral | NullLiteral | MethodHandle | MethodType

    • FieldAccess → InstanceFieldAccess | StaticFieldAccess

    • InstanceFieldAccess → Var.FieldRef

    • StaticFieldAccess → FieldRef

    • FieldRef → <ClassType: Type FieldName>

    • FieldName → Identifier

  • ArrayAccess → Var[Var]

    • NewExp → NewInstance | NewArray | NewMultiArray

    • NewInstance → new ClassType

    • NewArray → new Type[Var]

    • NewMultiArray → new Type LengthList EmptyList

    • LengthList → [Var] | [Var]LengthList

    • EmptyList → ε | []EmptyList

  • InvokeExp → InvokeVirtual | InvokeInterface | InvokeSpecial | InvokeStatic | InvokeDynamic

    • InvokeVirtual → invokevirtual Var.MethodRef(ArgList)

    • InvokeInterface → invokeinterface Var.MethodRef(ArgList)

    • InvokeSpecial → invokespecial Var.MethodRef(ArgList)

    • InvokeStatic → invokestatic MethodRef(ArgList)

    • InvokeDynamic → invokedynamic BootstrapMethodRef MethodName MethodType [BootstrapArgList] (ArgList)

    • MethodRef → <ClassType: Type MethodName(TypeList)>

    • MethodName → Identifier

    • TypeList → ε | Type TypeList'

    • TypeList' → ε | , Type TypeList'

    • ArgList → ε | Var ArgList'

    • ArgList' → ε | , Var ArgList'

    • BootstrapMethodRef → MethodRef

    • BootstrapArgList → ε | Literal BootstrapArgList'

    • BootstrapArgList' → ε | , Literal BootstrapArgList'

  • UnaryExp → NegExp | ArrayLengthExp

    • NegExp → !Var

    • ArrayLengthExp → Var.length

  • BinaryExp → ArithmeticExp | BitwiseExp | ComparisonExp | ConditionExp | ShiftExp

    • ArithmeticExp → Var ArithmeticOp Var

    • ArithmeticOp → + | - | * | / | %

    • BitwiseExp → Var BitwiseOp Var

    • BitwiseOp → "|" | & | ^

    • ComparisonExp → Var ComparisonOp Var

    • ComparisonOp → cmp | cmpl | cmpg

    • ConditionExp → Var ConditionOp Var

    • ConditionOp → == | != | < | > | ⇐ | >=

    • ShiftExp → Var ShiftOp Var

    • ShitOp → << | >> | >>>

  • InstanceOfExp → Var instanceof Type

  • CastExp → (Type) Var

Grammar of Statements

Stmt → AssignStmt | JumpStmt | Invoke | Return | Throw | Catch | Monitor | Nop

  • AssignStmt → New | AssignLiteral | Copy | LoadArray | StoreArray | LoadField | StoreField | Unary | Binary | InstanceOf | Cast

    • New → Var = NewExp;

    • AssignLiteral → Var = Literal;

    • Copy → Var = Var;

    • LoadArray → Var = ArrayAccess;

    • StoreArray → ArrayAccess = Var;

    • LoadField → Var = FieldAccess;

    • StoreField → FieldAccess = Var;

    • Unary → Var = UnaryExp;

    • Binary → Var = BinaryExp;

    • InstanceOf → Var = InstanceOfExp;

    • Cast → Var = CastExp;

  • JumpStmt → Goto | If | Switch

    • Goto → goto Label;

    • If → if ConditionExp goto Label;

    • Switch → TableSwitch | LookupSwitch

    • TableSwitch → tableswitch (Var) { CaseList default: goto Label; }

    • LookupSwitch → lookupswitch (Var) { CaseList default: goto Label; }

    • Label → IntLiteral

    • CaseList → ε | case IntLiteral: goto Label; CaseList

  • Invoke → InvokeExp; | Var = InvokeExp;

  • Return → return; | return Var;

  • Throw → throw Var;

  • Catch → catch Var;

  • Monitor → monitorenter Var; | monitorexit Var;

  • Nop → nop;