This repository has been archived by the owner on Sep 24, 2019. It is now read-only.
Abstract Syntax Tree for BEL Terms
Anthony Bargnesi edited this page Aug 31, 2015 · 9 revisions
A parser exists that recognizes several types of BEL expressions. The types currently recognized are:
- value (e.g. AKT1)
- namespaced value (e.g. HGNC:AKT1)
- term (e.g. p(AKT1), p(HGNC:AKT1), tscript(p(HGNC:AKT1, pmod(S,Y,694))))
Each parsed token records whether it was deemed complete (shown as [1] or [0] after the token name in the printed trees) and has a character position interval [start, end), i.e. left-closed and right-open.
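To make the right-open convention concrete, the sketch below slices the input string with the positions reported for p(HGNC:AKT1 in the example that follows. It uses plain Ruby only: token_text is a hypothetical helper, not part of bel.rb, and treating -1 as "indeterminate" is an assumption based on the incomplete TERM token in that example.

```ruby
# Hypothetical helper: Ruby's exclusive range start...stop selects exactly
# the characters of a right-open interval [start, stop).
SOURCE = 'p(HGNC:AKT1'

# Returns nil when positions are indeterminate (assumed to be reported as -1).
def token_text(source, start, stop)
  return nil if start.negative? || stop.negative?

  source[start...stop]
end

p token_text(SOURCE, 0, 1)   # fx(p)     => "p"
p token_text(SOURCE, 2, 6)   # pfx(HGNC) => "HGNC"
p token_text(SOURCE, 7, 11)  # val(AKT1) => "AKT1"
p token_text(SOURCE, 2, 11)  # NV        => "HGNC:AKT1"
p token_text(SOURCE, -1, -1) # TERM      => nil

# The length of a right-open interval is simply stop - start (11 - 2).
p token_text(SOURCE, 2, 11).length # => 9
```

Using an exclusive end means no off-by-one adjustment is needed when converting positions to substring lengths.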
For example, recognizing the incomplete expression p(HGNC:AKT1 (its closing parenthesis is missing) produces the following tree:
TERM[0](-1, -1]
  fx(p)(0, 1]
  ARG[1]
    NV[1](2, 11]
      pfx(HGNC)(2, 6]
      val(AKT1)(7, 11]
    ARG[0]
      (null)
      (null)
With this example, notice that:
- An incomplete TERM token was recognized with indeterminate character positions (reported as -1).
- A complete argument (ARG) was recognized; its argument happens to be a namespaced value (NV) token. The NV's character position interval was [2, 11), printed as (2, 11] in the tree above.
- An argument (ARG) exists with two NULL children. This leaf node marks the exclusive end of the argument list. Having this right child also preserves the structure of a binary tree: parent nodes always have exactly two children.
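The binary layout, and the role of the ARG sentinel, can be sketched in plain Ruby. This is a minimal model assuming the structure shown in the tree above; AstNode, sentinel? and arguments are hypothetical names, not bel.rb's BelAst API, and leaf tokens are simply marked complete for illustration.

```ruby
# Hypothetical model of the binary AST; field names are illustrative only.
AstNode = Struct.new(:type, :value, :complete, :start, :stop, :left, :right)

# An ARG whose children are both nil marks the exclusive end of an argument list.
def sentinel?(node)
  node.type == 'ARG' && node.left.nil? && node.right.nil?
end

# Follow the right spine of ARG nodes, collecting each argument (left child)
# until the sentinel is reached.
def arguments(term)
  result = []
  arg = term.right
  until arg.nil? || sentinel?(arg)
    result << arg.left
    arg = arg.right
  end
  result
end

# The tree recognized for the incomplete input p(HGNC:AKT1.
leaf = ->(type, value, start, stop) { AstNode.new(type, value, true, start, stop, nil, nil) }
nv = AstNode.new('NV', nil, true, 2, 11, leaf.('pfx', 'HGNC', 2, 6), leaf.('val', 'AKT1', 7, 11))
end_of_args = AstNode.new('ARG', nil, false, nil, nil, nil, nil) # ARG[0]
args = AstNode.new('ARG', nil, true, nil, nil, nv, end_of_args)  # ARG[1]
term = AstNode.new('TERM', nil, false, -1, -1, leaf.('fx', 'p', 0, 1), args)

arguments(term).each do |a|
  puts "#{a.type} #{a.left.value}:#{a.right.value} [#{a.start}, #{a.stop})"
end
# prints: NV HGNC:AKT1 [2, 11)
```

Because every list ends in the same kind of node, a traversal never has to special-case a missing right child.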
Install bel.rb from the term_ast branch using:
git clone git@github.com:OpenBEL/bel.rb.git
cd bel.rb
git checkout term_ast
gem build bel.gemspec
gem install bel-0.3.3.gem
Open up an irb (or pry) session and try:
require 'bel'
# Returns an Abstract Syntax Tree that you can traverse.
BEL::Parser.parse('AKT1')
=> #<BEL::LibBEL::BelAst:0x000000035adea0>
# You can also print to a flattened string for debugging.
BEL::LibBEL.bel_print_ast(BEL::Parser.parse('AKT1'))
=> ARG[1] NV[1][0, 4] pfx((null)) val(AKT1)[0, 4] ARG[0] (null) (null)
# Structured as a tree it would be:
# ARG[1]
#   NV[1][0, 4]
#     pfx((null))
#     val(AKT1)[0, 4]
#   ARG[0]
#     (null)
#     (null)
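The flattened string returned by bel_print_ast appears to be a pre-order listing of the tree, so the indented view can be rebuilt mechanically. The sketch below assumes a token grammar inferred only from the single sample output above (upper-case tokens such as ARG[1] or NV[1][0, 4] have two children; pfx(...), val(...), fx(...) and (null) are leaves). TOKEN and indent_tokens are hypothetical names, and the code does not call bel.rb.

```ruby
# Token grammar inferred from one sample (an assumption, not a spec).
TOKEN = /[A-Z]+\[\d\](?:\[-?\d+, -?\d+\])?|[a-z]+\((?:[^()]|\([^()]*\))*\)(?:\[-?\d+, -?\d+\])?|\(null\)/

# Rebuild the indented view from a pre-order token list (consumes tokens).
def indent_tokens(tokens, depth = 0, out = [])
  token = tokens.shift or raise ArgumentError, 'truncated AST string'
  out << ('  ' * depth) + token
  # Upper-case tokens (ARG, NV, TERM) are parents with exactly two children.
  2.times { indent_tokens(tokens, depth + 1, out) } if token.match?(/\A[A-Z]/)
  out
end

flat = 'ARG[1] NV[1][0, 4] pfx((null)) val(AKT1)[0, 4] ARG[0] (null) (null)'
puts indent_tokens(flat.scan(TOKEN))
```

This relies on the binary-tree guarantee described earlier: since every parent has exactly two children, the pre-order listing is unambiguous without any brackets.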
Screen recording showing examples of expression parsing with BEL::Parser.
Simple protein term:
p(HGNC:AKT1)
TERM
  fx(p)
  ARG
    NV
      pfx(HGNC)
      val(AKT1)
    ARG
      NULL
      NULL
Modified protein term:
p(SFAM:"STAT5 Family",pmod(P,Y,694))
TERM
  fx(p)
  ARG
    NV
      pfx(SFAM)
      val(STAT5 Family)
    ARG
      TERM
        fx(pmod)
        ARG
          NV
            pfx(NULL)
            P
          ARG
            NV
              pfx(NULL)
              Y
            ARG
              NV
                pfx(NULL)
                694
              ARG
                NULL
                NULL
      ARG
        NULL
        NULL
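To show how a complete term maps onto this binary shape, here is a toy recursive-descent parser. It is an illustrative sketch, not bel.rb's parser: it handles only names, quoted values, optional namespace prefixes and nested terms, it raises on incomplete input instead of reporting incomplete tokens, and ToyNode, ToyTermParser and tree_lines are hypothetical names. It also prints values as val(...) throughout, whereas the diagram above shows the pmod arguments bare.

```ruby
require 'strscan'

# Hypothetical node type: a label plus left/right children (nil means NULL).
ToyNode = Struct.new(:label, :left, :right)

# Toy recursive-descent parser for complete BEL terms; not bel.rb's parser.
class ToyTermParser
  NAME   = /[A-Za-z0-9_]+/
  QUOTED = /"([^"]*)"/

  def initialize(text)
    @s = StringScanner.new(text)
  end

  def parse
    node = parse_arg
    raise ArgumentError, "trailing input at #{@s.pos}" unless @s.eos?

    node
  end

  private

  # arg := NAME '(' arglist | NAME ':' value | NAME | QUOTED
  def parse_arg
    @s.skip(/\s*/)
    return nv(nil, @s[1]) if @s.scan(QUOTED)

    name = @s.scan(NAME) or raise ArgumentError, "expected a name at #{@s.pos}"
    if @s.scan(/\(/)
      ToyNode.new('TERM', ToyNode.new("fx(#{name})"), parse_arglist)
    elsif @s.scan(/:/)
      nv(name, parse_value)
    else
      nv(nil, name)
    end
  end

  def parse_value
    return @s[1] if @s.scan(QUOTED)

    @s.scan(NAME) or raise ArgumentError, "expected a value at #{@s.pos}"
  end

  # Each ARG holds one argument (left) and the rest of the list (right);
  # an ARG with two nil children terminates the list.
  def parse_arglist
    return ToyNode.new('ARG') if @s.scan(/\s*\)/)

    arg = parse_arg
    @s.skip(/\s*/)
    unless @s.check(/\)/) || @s.scan(/,/)
      raise ArgumentError, "expected ',' or ')' at #{@s.pos}"
    end
    ToyNode.new('ARG', arg, parse_arglist)
  end

  # A namespaced value: prefix on the left (NULL when absent), value on the right.
  def nv(prefix, value)
    ToyNode.new('NV', ToyNode.new("pfx(#{prefix || 'NULL'})"), ToyNode.new("val(#{value})"))
  end
end

# Indented pre-order listing; TERM, ARG and NV always print two children.
def tree_lines(node, depth = 0, out = [])
  out << ('  ' * depth) + (node ? node.label : 'NULL')
  if node && %w[TERM ARG NV].include?(node.label)
    tree_lines(node.left, depth + 1, out)
    tree_lines(node.right, depth + 1, out)
  end
  out
end

puts tree_lines(ToyTermParser.new('p(SFAM:"STAT5 Family",pmod(P,Y,694))').parse)
```

Parsing p(HGNC:AKT1) the same way yields the nine lines of the simple protein term above: TERM, fx(p), ARG, NV, pfx(HGNC), val(AKT1), the sentinel ARG and its two NULL children.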