Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional type as return value of parser calls #51

Open
julik opened this issue Jan 15, 2018 · 4 comments
Open

Optional type as return value of parser calls #51

julik opened this issue Jan 15, 2018 · 4 comments

Comments

@julik
Copy link
Contributor

julik commented Jan 15, 2018

We call parsers in succession, and a single parser failing should not, at least for most errors, lead to other parsers failing as well. We currently capture a tiny subset of errors in the .parse method of the FormatParser module, but there is no guarantee these are the only errors a parser could raise.

I think what we could use something like this instead:

result_nil_or_error = ParserOptional.(parser, input)

case result_nil_or_error
when Exception #=> register an exception and continue
when nil # => parser didn't do anything
else # => We did recover a meaningful result
end

For this we only have to introduce one wrapper method, and we don't have to change the parsers at all.

@duplamatyi
Copy link
Contributor

@julik I would like to implement this.
Maybe I could also implement also some small other changes in this scope for the parse method. For example the results argument is reused as a variable storing the results of the parser calls. It is a bit confusing...
https://github.com/WeTransfer/format_parser/blob/master/lib/format_parser.rb#L80
Also the results.first is enough to break out of the map method, so setting the amount variable and using take is not necessary. At least according to my quick tests.
In regards with the exception handling I would just catch every StandardError. Catching any other exception I consider a bad idea.
Any suggestions on what does registering an exception mean? Does it make sense returning them to the caller?

So here is a code snippet. Here I just return the errors in a hash if the strategy is all_results.

  STRATEGIES = %i[all_results first_result]

  def self.parse(io, natures: @parsers_per_nature.keys, formats: @parsers_per_format.keys, strategy: :all_results)
    # Check for strategy
    unless STRATEGIES.include? strategy
      throw ArgumentError.new(
        "Uknown strategy #{strategy}. Supported strategies: #{STRATEGIES}."
      )
    end

    # If the cache is preconfigured do not apply an extra layer. It is going
    # to be preconfigured when using parse_http.
    io = Care::IOWrapper.new(io) unless io.is_a?(Care::IOWrapper)

    # Always instantiate parsers fresh for each input, since they might
    # contain instance variables which otherwise would have to be reset
    # between invocations, and would complicate threading situations
    parsers = parsers_for(natures, formats)
    parser_errors = {}

    results = parsers.lazy.map do |parser|
      # We need to rewind for each parser, anew
      io.seek(0)
      # Limit how many operations the parser can perform
      limited_io = ReadLimiter.new(io, max_bytes: MAX_BYTES, max_reads: MAX_READS, max_seeks: MAX_SEEKS)
      begin
        parser.call(limited_io)
      rescue => error
        parser_errors[parser.class.name] = error
        nil
      end
    end.reject(&:nil?)

    return results.first if strategy == :first_result

    {
      results: results.to_a,
      errors: parser_errors
    }
  end

Could you comment on it?

@julik
Copy link
Contributor Author

julik commented Feb 18, 2018

Any suggestions on what does registering an exception mean?

Good question. Something like this: https://github.com/WeTransfer/image_vise/blob/master/lib/image_vise/render_engine.rb#L83

In other words, we can have a situation where parsers A and B return a valid result, and parser C errors out. In that case, we need to have some way to communicate that parser C failed to the caller, and maybe use a global FormatParser-configured hook for reporting the error, say, to Appsignal. So by "registering" I mean "communicating".

Your idea seems nice, however I think we should return a more "monadic" flat list of parsers + their results or errors. Something like a sum type for Either<StandardError,Result> that you either have to unwrap explicitly or we bridge over methods and fail if you call a method that belongs on a result on an Exception object?..

@duplamatyi
Copy link
Contributor

duplamatyi commented Feb 22, 2018

@julik So each parser would return either a StandardError, a Result or simply nil.

Should this result be encapsulated in a Hash or some other data structure so that you know which specific parsers did raise those errors or returned a result? Maybe every element in this lazy enumerator should be a hash containing the result and the name of the parser that produced it.

{
  result: safe_parse(parser, limited_io) # returns either StandardError, Result or nil
  parser: parser.class.name
}

Should the first_result strategy return the first non StandardError or nil output? Also, here would it be interesting to remove the strategy from this method and return just the lazy enumerator and let the caller or some other wrapper methods we implement decide what to do with it?

@julik
Copy link
Contributor Author

julik commented Feb 22, 2018

@WJWH I know what you will say, but we might be needing a sum type or a monad here. Ideas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants