This repository has been archived by the owner on Sep 24, 2019. It is now read-only.

Format Extensions

Anthony Bargnesi edited this page Jul 27, 2015 · 4 revisions

New formats for reading and writing BEL Evidence can be supported by adding a format extension. A format extension reads data in its format as BEL Evidence objects, and writes BEL Evidence objects back out as data in that format.

Quickstart (for the eager)

  1. Create your extension under bel/extensions/ on the Ruby LOAD_PATH. For example, create a gem, then create the file lib/bel/extensions/my_format.rb.
  2. Create a class that includes BEL::Extension::Format::Formatter to deserialize and serialize your format.
  3. Register a formatter using BEL::Extension::Format.register_formatter.
  4. Load your extension by id with BEL::Extension.load_extension(:my_format).
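To make the register-then-look-up-by-id idea concrete without the bel gem, here is a minimal, self-contained sketch. `FormatRegistry` and `MyFormat` are hypothetical names: `FormatRegistry.register_formatter` only imitates the role of `BEL::Extension::Format.register_formatter`, and the lookup by id stands in for how a loaded extension is later found by its id symbol. It is not the bel gem's implementation.

```ruby
require 'stringio'

# Hypothetical stand-in for BEL::Extension::Format: formatter instances
# stored under the symbol their #id method returns.
module FormatRegistry
  @formatters = {}

  # Step 3 analog: remember a formatter under its id.
  def self.register_formatter(formatter)
    @formatters[formatter.id] = formatter
  end

  # Step 4 analog: find a registered formatter by id (symbol or string).
  def self.formatter(id)
    @formatters.fetch(id.to_sym) do
      raise ArgumentError, "no formatter registered for #{id.inspect}"
    end
  end
end

# Hypothetical extension implementing a toy one-record-per-line format.
class MyFormat
  def id
    :my_format
  end

  # Return something that responds to #each; records are produced lazily.
  def deserialize(data)
    data.each_line.lazy.map(&:chomp)
  end

  # Write each record to the writer and hand the writer back.
  def serialize(objects, writer = StringIO.new, options = {})
    objects.each { |record| writer << "#{record}\n" }
    writer
  end
end

# Registration normally happens once, when the extension file is required.
FormatRegistry.register_formatter(MyFormat.new)

formatter = FormatRegistry.formatter(:my_format)
records = formatter.deserialize(StringIO.new("first\nsecond\n"))
puts formatter.serialize(records).string
```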

Format Extension

The following is a skeleton format extension for the Example format:

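require 'bel'
require 'stringio'

# Open the BEL::Extension::Format module.
module BEL::Extension::Format

  class FormatExample
    include Formatter

    def id
      :example
    end

    # Read Evidence objects from data encoded in the Example format.
    # Returns an object that responds to #each.
    def deserialize(data, &block)
      ExampleToEvidenceYielder.new(data)
    end

    # Write Evidence objects to data encoded in the Example format.
    # Returns the writer so output written to the default StringIO is not lost.
    def serialize(objects, writer = StringIO.new, options = {})
      EvidenceToExampleYielder.new(objects).each { |example_encoded_object|
        writer << "#{example_encoded_object}"
        writer.flush
      }
      writer
    end
  end

  class ExampleToEvidenceYielder
    def initialize(data)
      @data = data
    end

    # Your implementation for reading Example data to Evidence objects:
    # yield each Evidence object as it is read, or return an Enumerator.
    def each
      return to_enum(:each) unless block_given?
    end
  end

  class EvidenceToExampleYielder
    def initialize(objects)
      @objects = objects
    end

    # Your implementation for writing Evidence objects to Example data:
    # yield each encoded chunk, or return an Enumerator.
    def each
      return to_enum(:each) unless block_given?
    end
  end

  # register your Example format extension with BEL
  register_formatter(FormatExample.new)
end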
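The skeleton leaves both yielder classes to you. As one possible sketch, assume the Example format is JSON Lines (one JSON object per line) and use a hypothetical `ToyEvidence` struct in place of real Evidence objects; neither choice comes from the bel gem. What carries over is the shape: each yielder's `each` yields one item at a time when given a block and returns an Enumerator otherwise. The classes are defined at the top level here so the sketch runs on its own.

```ruby
require 'json'
require 'stringio'

# Hypothetical stand-in for a BEL Evidence object (not the bel gem's class).
ToyEvidence = Struct.new(:bel_statement, :citation)

# Reads Example data (assumed here to be JSON Lines) and yields one
# ToyEvidence per non-blank line as the IO is read.
class ExampleToEvidenceYielder
  include Enumerable

  def initialize(data)
    @data = data
  end

  def each
    return to_enum(:each) unless block_given?

    @data.each_line do |line|
      next if line.strip.empty?
      hash = JSON.parse(line)
      yield ToyEvidence.new(hash['bel_statement'], hash['citation'])
    end
  end
end

# Encodes each ToyEvidence as one JSON Lines record.
class EvidenceToExampleYielder
  include Enumerable

  def initialize(objects)
    @objects = objects
  end

  def each
    return to_enum(:each) unless block_given?

    @objects.each do |evidence|
      record = { 'bel_statement' => evidence.bel_statement,
                 'citation'      => evidence.citation }
      yield JSON.generate(record) + "\n"
    end
  end
end

# Made-up sample: decode one line, then encode it again.
line = '{"bel_statement":"p(HGNC:GENE_A) increases p(HGNC:GENE_B)","citation":"12345"}' + "\n"
evidence = ExampleToEvidenceYielder.new(StringIO.new(line)).to_a
EvidenceToExampleYielder.new(evidence).each { |encoded| print encoded }
```

Because `ExampleToEvidenceYielder` reads the next line only when the consumer asks for the next object, a `deserialize` that returns it can be enumerated lazily over a large IO.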

Deserialization

The deserialize method should convert an IO into Evidence objects. It is expected to return an object that can be enumerated by calling its each method. Returning such an object lets Evidence objects be yielded as the IO is read, which keeps memory requirements low.

Let us look at the FormatBEL extension as an example of lazily yielding Evidence objects.

class FormatBEL
  include Formatter

  # omitted for brevity

  def deserialize(data, &block)
    EvidenceYielder.new(data)
  end
end

class EvidenceYielder

  # omitted for brevity

  def each
    if block_given?
      ::BEL::Script.parse(@data).each { |parsed_obj|
        # yield evidence
      }
    else
      to_enum(:each)
    end
  end
end

Note: This works because BEL::Script.parse(@data) yields each parsed object as the IO-like object is read. You can picture the process as a pipeline in which only one Evidence object is held in memory at a time.
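The memory claim can be checked with a small, self-contained experiment. `LineRecordYielder` is a hypothetical class written in the style of `EvidenceYielder`, with a plain line split standing in for `BEL::Script.parse`; it counts how many lines it has pulled from the IO. Asking a lazy enumerator for the first three records reads only three of the thousand lines.

```ruby
require 'stringio'

# Hypothetical yielder in the style of EvidenceYielder: parses one record
# per line and yields it immediately instead of building an array.
class LineRecordYielder
  include Enumerable

  attr_reader :lines_read

  def initialize(io)
    @io = io
    @lines_read = 0
  end

  def each
    return to_enum(:each) unless block_given?

    @io.each_line do |line|
      @lines_read += 1
      yield line.chomp # a real extension would build an Evidence object here
    end
  end
end

io = StringIO.new((1..1_000).map { |i| "record #{i}\n" }.join)
yielder = LineRecordYielder.new(io)

# Only three records are requested, so only three lines are read.
first_three = yielder.lazy.map(&:upcase).first(3)
p first_three        # => ["RECORD 1", "RECORD 2", "RECORD 3"]
p yielder.lines_read # => 3
```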

Serialization

The serialize method should convert Evidence objects to your format. It can return a stream of encoded objects, but it can also write directly to the provided IO; writing to the IO is the more common case when serializing data.

Let us look again at the FormatBEL extension for iteratively writing Evidence objects to an IO stream.

class FormatBEL
  include Formatter

  # omitted for brevity

  def serialize(objects, writer = StringIO.new, options = {})
    BELYielder.new(objects).each { |bel_part|
      writer << "#{bel_part}"
      writer.flush
    }
  end
end

class BELYielder

  # omitted for brevity

  def each
    if block_given?
      @data.each { |evidence|
        # yield bel
      }
    else
      to_enum(:each)
    end
  end
end
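Both serialization styles described above can be sketched in plain Ruby. `TextPartYielder`, `serialize`, and `serialize_to_stream` are hypothetical and only mirror the shape of `FormatBEL#serialize` and `BELYielder`: one version writes each part to the caller's IO as soon as it is produced, the other returns the stream of parts and leaves the writing to the caller. Unlike the `FormatBEL#serialize` shown above, this `serialize` also returns the writer, so a caller relying on the default `StringIO` can read the result.

```ruby
require 'stringio'

# Hypothetical yielder in the style of BELYielder: turns each object
# into a text part as it is iterated.
class TextPartYielder
  def initialize(objects)
    @objects = objects
  end

  def each
    return to_enum(:each) unless block_given?

    @objects.each { |statement| yield "#{statement}\n" }
  end
end

# Style 1: push each part to the writer as soon as it exists, flushing as
# we go, rather than collecting parts in an intermediate array.
def serialize(objects, writer = StringIO.new, options = {})
  TextPartYielder.new(objects).each do |part|
    writer << part
    writer.flush
  end
  writer
end

# Style 2: return the stream of parts; the caller decides where they go.
def serialize_to_stream(objects)
  TextPartYielder.new(objects).each
end

statements = ['p(HGNC:GENE_A) increases p(HGNC:GENE_B)',
              'p(HGNC:GENE_B) decreases p(HGNC:GENE_C)']

serialize(statements, $stdout)             # writes two lines to standard output
buffered = serialize(statements)           # default StringIO writer
parts = serialize_to_stream(statements).to_a
```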

Use case: Read Evidence objects.

Note: This uses the BEL::Format.evidence API.

require 'bel'
BEL::Extension.load_extension('bel')

lazy_evidence_stream =
  BEL::Format.
    evidence(
      File.open('large_corpus.bel'), # The IO to read.
      :bel                           # The formatter identified by id symbol.
    ).
    lazy

# Now you can lazily filter, map, and reduce the stream.
lazy_evidence_stream.
  select { |evidence|
    ...
  }.
  map { |evidence|
    ...
  }.
  reduce { |result, object|
    ...
  }
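The `...` placeholders stand for your own filter, transform, and reduction. As a hedged illustration that runs without the bel gem or a corpus, the sketch below uses a plain `Enumerator` of made-up BEL statement strings in place of the Evidence stream. The helper `count_increasing_subjects` is hypothetical: it keeps statements that use `increases`, maps each to its subject term, and tallies the subjects. As with the stream above, nothing is evaluated until `reduce` pulls values through the chain.

```ruby
# Made-up stand-in for the lazy Evidence stream: statements are produced
# one at a time rather than held in an array.
statements = Enumerator.new do |y|
  y << 'p(HGNC:GENE_A) increases p(HGNC:GENE_B)'
  y << 'p(HGNC:GENE_C) decreases p(HGNC:GENE_A)'
  y << 'p(HGNC:GENE_A) increases p(HGNC:GENE_D)'
end

# Hypothetical helper: select, map, and reduce over a lazy stream.
def count_increasing_subjects(stream)
  stream.lazy.
    select { |statement| statement.include?(' increases ') }.
    map { |statement| statement.split(' ').first }.
    reduce(Hash.new(0)) { |counts, subject| counts[subject] += 1; counts }
end

# One subject, p(HGNC:GENE_A), is counted twice.
p count_increasing_subjects(statements)
```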

Use case: Translate Evidence from one format to another.

Note: This uses the BEL::Format.translate API.

require 'bel'
BEL::Extension.load_extension('bel', 'jgf')

BEL::Format.
    translate(
      File.open('large_corpus.bel'), # The IO to read from.
      :bel,                          # The input format  (either by id, file extension, or media type)
      %s(application/vnd.jgf+json),  # The output format (either by id, file extension, or media type)
      $stdout                        # The IO to write to.
    )
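Conceptually, a translation deserializes the input with one formatter and serializes the resulting stream with another, so only one object needs to be in flight at a time. The sketch below shows that composition with two hypothetical toy formatters (`LinesFormat` and `JsonLinesFormat`) and a hypothetical `translate` helper that looks formats up by id only. It is an assumption-laden illustration, not `BEL::Format.translate`, which as the comments above note also accepts a file extension or media type.

```ruby
require 'json'
require 'stringio'

# Toy formatter: one statement per line of plain text.
class LinesFormat
  def id
    :lines
  end

  def deserialize(data)
    data.each_line.lazy.map(&:chomp).reject(&:empty?)
  end

  def serialize(objects, writer = StringIO.new, options = {})
    objects.each { |statement| writer << "#{statement}\n" }
    writer
  end
end

# Toy formatter: one {"statement": ...} JSON object per line.
class JsonLinesFormat
  def id
    :jsonl
  end

  def deserialize(data)
    data.each_line.lazy.map { |line| JSON.parse(line)['statement'] }
  end

  def serialize(objects, writer = StringIO.new, options = {})
    objects.each do |statement|
      writer << JSON.generate({ 'statement' => statement }) << "\n"
    end
    writer
  end
end

FORMATTERS = [LinesFormat.new, JsonLinesFormat.new].map { |f| [f.id, f] }.to_h

# Hypothetical translate helper: the output formatter consumes the input
# formatter's lazy stream, so the whole input is never materialized.
def translate(input, from, to, output)
  objects = FORMATTERS.fetch(from).deserialize(input)
  FORMATTERS.fetch(to).serialize(objects, output)
end

# prints {"statement":"p(HGNC:GENE_A) increases p(HGNC:GENE_B)"}
translate(StringIO.new("p(HGNC:GENE_A) increases p(HGNC:GENE_B)\n"),
          :lines, :jsonl, $stdout)
```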