This repository has been archived by the owner on Sep 24, 2019. It is now read-only.

Commit

batch evidence to an array, avoid JRuby enumerator
The JRuby enumerator uses a thread for each next object in an
enumeration, which proves costly. Hundreds of threads are created
(observed with YourKit) when batch-creating evidence because of the
"each_slice(500)" call on the enumerator.

This issue is logged in JRuby:
jruby/jruby#2577
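
For readers unfamiliar with the distinction, here is a minimal sketch of internal versus external enumeration in plain Ruby. The `producer` enumerator is a hypothetical stand-in for `BEL.evidence(io, type)`, not the project's code. Internal iteration hands each item straight to a block; external iteration through `Enumerator#next` has to suspend and resume the producer between calls, so the runtime needs a separate execution context (a Fiber on MRI; a thread on JRuby, according to this commit message).

```ruby
# Hypothetical producer standing in for BEL.evidence(io, type).
producer = Enumerator.new do |yielder|
  3.times { |i| yielder << i }
end

# Internal iteration: each item is handed straight to the block,
# so no extra execution context is needed.
internal = []
producer.each { |item| internal << item }

# External iteration: the producer is suspended between calls to #next.
# Kernel#loop rescues the StopIteration raised at the end.
external = []
loop { external << producer.next }
```

Both approaches see the same items; only the mechanism used to drive the producer differs.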

The solution is to yield each evidence object directly to the block
and collect them into an array, 500 at a time. This should avoid the
OOM exception that was received:

java.lang.OutOfMemoryError: unable to create new native thread

Indeed, the thread count was observed to be lower in YourKit.
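
The batching approach described above can be sketched in isolation as follows. The helper name `each_batch` and its parameters are hypothetical and are not part of the OpenBEL API. It accumulates items from a plain `each` call into an array, yields every full batch, and then yields whatever remains. Unlike the commit, which clears and reuses one array, this sketch allocates a fresh array per batch so that callers can safely keep a reference to each one.

```ruby
# Hypothetical helper: groups the items of `source` into arrays of at most
# `batch_size` using internal iteration only (no Enumerator#next).
def each_batch(source, batch_size)
  batch = []
  source.each do |item|
    batch << item
    if batch.size == batch_size
      yield batch
      batch = [] # start a new array; the yielded one stays intact
    end
  end
  # Flush the final, partially filled batch.
  yield batch unless batch.empty?
end

batches = []
each_batch(1..7, 3) { |b| batches << b }
# batches is now [[1, 2, 3], [4, 5, 6], [7]]
```

The trailing flush matters: without it, any items left over after the last full batch would be silently dropped.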
Anthony Bargnesi committed Jan 12, 2016
1 parent fb58eee commit a515587
Showing 1 changed file with 32 additions and 17 deletions.
49 changes: 32 additions & 17 deletions app/openbel/api/routes/datasets.rb
@@ -16,14 +16,15 @@ class Datasets < Base
include OpenBEL::Helpers

DEFAULT_TYPE = 'application/hal+json'

ACCEPTED_TYPES = {
:bel => 'application/bel',
:xml => 'application/xml',
:xbel => 'application/xml',
:json => 'application/json',
}

+EVIDENCE_BATCH = 500

def initialize(app)
super

@@ -233,33 +234,47 @@ def retrieve_dataset(uri)
 # Create dataset in RDF.
 @rr.insert_statements(void_dataset)

-dataset = retrieve_dataset(void_dataset_uri)
+dataset    = retrieve_dataset(void_dataset_uri)
+dataset_id = dataset[:identifier]

+# Add batches of read evidence objects; save to Mongo and RDF.
+# TODO Add JRuby note regarding Enumerator threading.
+evidence_batch = []
+BEL.evidence(io, type).each do |ev|
+  # Standardize annotations from experiment_context.
+  @annotation_transform.transform_evidence!(ev, base_url)
+
-# Add slices of read evidence objects; save to Mongo and RDF.
-BEL.evidence(io, type).each.lazy.each_slice(500) do |slice|
-  slice.map! do |ev|
-    # Standardize annotations from experiment_context.
-    @annotation_transform.transform_evidence!(ev, base_url)
+  ev.metadata[:dataset] = dataset_id
+  facets = map_evidence_facets(ev)
+  ev.bel_statement = ev.bel_statement.to_s
+  hash = ev.to_h
+  hash[:facets] = facets
+  # Create dataset field for efficient removal.
+  hash[:_dataset] = dataset_id

-    # Add filterable metadata field for dataset identifier.
-    ev.metadata[:dataset] = dataset[:identifier]
+  evidence_batch << hash

-    facets = map_evidence_facets(ev)
-    ev.bel_statement = ev.bel_statement.to_s
-    hash = ev.to_h
-    hash[:facets] = facets
+  if evidence_batch.size == EVIDENCE_BATCH
+    _ids = @api.create_evidence(evidence_batch)

-    # Create dataset field for efficient removal.
-    hash[:_dataset] = dataset[:identifier]
-    hash
+    dataset_parts = _ids.map { |object_id|
+      RDF::Statement.new(void_dataset_uri, RDF::DC.hasPart, object_id.to_s)
+    }
+    @rr.insert_statements(dataset_parts)
+
+    evidence_batch.clear
   end
+end

-  _ids = @api.create_evidence(slice)
+unless evidence_batch.empty?
+  _ids = @api.create_evidence(evidence_batch)

   dataset_parts = _ids.map { |object_id|
     RDF::Statement.new(void_dataset_uri, RDF::DC.hasPart, object_id.to_s)
   }
   @rr.insert_statements(dataset_parts)
+
+  evidence_batch.clear
 end

 status 201
