Optional type as return value of parser calls #51

julik · 2018-01-15T12:56:49Z

We call parsers in succession, and a single parser failing should not, at least for most errors, lead to other parsers failing as well. We currently capture a tiny subset of errors in the .parse method of the FormatParser module, but there is no guarantee these are the only errors a parser could raise.

I think what we could use something like this instead:

result_nil_or_error = ParserOptional.(parser, input)

case result_nil_or_error
when Exception #=> register an exception and continue
when nil # => parser didn't do anything
else # => We did recover a meaningful result
end

For this we only have to introduce one wrapper method, and we don't have to change the parsers at all.

The text was updated successfully, but these errors were encountered:

duplamatyi · 2018-02-17T16:25:02Z

@julik I would like to implement this.
Maybe I could also implement also some small other changes in this scope for the parse method. For example the results argument is reused as a variable storing the results of the parser calls. It is a bit confusing...
https://github.com/WeTransfer/format_parser/blob/master/lib/format_parser.rb#L80
Also the results.first is enough to break out of the map method, so setting the amount variable and using take is not necessary. At least according to my quick tests.
In regards with the exception handling I would just catch every StandardError. Catching any other exception I consider a bad idea.
Any suggestions on what does registering an exception mean? Does it make sense returning them to the caller?

So here is a code snippet. Here I just return the errors in a hash if the strategy is all_results.

  STRATEGIES = %i[all_results first_result]

  def self.parse(io, natures: @parsers_per_nature.keys, formats: @parsers_per_format.keys, strategy: :all_results)
    # Check for strategy
    unless STRATEGIES.include? strategy
      throw ArgumentError.new(
        "Uknown strategy #{strategy}. Supported strategies: #{STRATEGIES}."
      )
    end

    # If the cache is preconfigured do not apply an extra layer. It is going
    # to be preconfigured when using parse_http.
    io = Care::IOWrapper.new(io) unless io.is_a?(Care::IOWrapper)

    # Always instantiate parsers fresh for each input, since they might
    # contain instance variables which otherwise would have to be reset
    # between invocations, and would complicate threading situations
    parsers = parsers_for(natures, formats)
    parser_errors = {}

    results = parsers.lazy.map do |parser|
      # We need to rewind for each parser, anew
      io.seek(0)
      # Limit how many operations the parser can perform
      limited_io = ReadLimiter.new(io, max_bytes: MAX_BYTES, max_reads: MAX_READS, max_seeks: MAX_SEEKS)
      begin
        parser.call(limited_io)
      rescue => error
        parser_errors[parser.class.name] = error
        nil
      end
    end.reject(&:nil?)

    return results.first if strategy == :first_result

    {
      results: results.to_a,
      errors: parser_errors
    }
  end

Could you comment on it?

julik · 2018-02-18T20:29:44Z

Any suggestions on what does registering an exception mean?

Good question. Something like this: https://github.com/WeTransfer/image_vise/blob/master/lib/image_vise/render_engine.rb#L83

In other words, we can have a situation where parsers A and B return a valid result, and parser C errors out. In that case, we need to have some way to communicate that parser C failed to the caller, and maybe use a global FormatParser-configured hook for reporting the error, say, to Appsignal. So by "registering" I mean "communicating".

Your idea seems nice, however I think we should return a more "monadic" flat list of parsers + their results or errors. Something like a sum type for Either<StandardError,Result> that you either have to unwrap explicitly or we bridge over methods and fail if you call a method that belongs on a result on an Exception object?..

duplamatyi · 2018-02-22T11:59:27Z

@julik So each parser would return either a StandardError, a Result or simply nil.

Should this result be encapsulated in a Hash or some other data structure so that you know which specific parsers did raise those errors or returned a result? Maybe every element in this lazy enumerator should be a hash containing the result and the name of the parser that produced it.

{
  result: safe_parse(parser, limited_io) # returns either StandardError, Result or nil
  parser: parser.class.name
}

Should the first_result strategy return the first non StandardError or nil output? Also, here would it be interesting to remove the strategy from this method and return just the lazy enumerator and let the caller or some other wrapper methods we implement decide what to do with it?

julik · 2018-02-22T12:00:47Z

@WJWH I know what you will say, but we might be needing a sum type or a monad here. Ideas?

julik added the apidesign label Jan 15, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optional type as return value of parser calls #51

Optional type as return value of parser calls #51

julik commented Jan 15, 2018

duplamatyi commented Feb 17, 2018

julik commented Feb 18, 2018

duplamatyi commented Feb 22, 2018 •

edited

Loading

julik commented Feb 22, 2018

Optional type as return value of parser calls #51

Optional type as return value of parser calls #51

Comments

julik commented Jan 15, 2018

duplamatyi commented Feb 17, 2018

julik commented Feb 18, 2018

duplamatyi commented Feb 22, 2018 • edited Loading

julik commented Feb 22, 2018

duplamatyi commented Feb 22, 2018 •

edited

Loading