Multiple errors (function rb_frame_method_id_and_class cannot be found; internal exception) with torch.rb #3205

Open
mtortonesi opened this issue Aug 10, 2023 · 5 comments
@mtortonesi
Contributor

I am trying to get torch.rb working with TruffleRuby (version truffleruby+graalvm-23.0.0, installed with rbenv + ruby-build on an M1 MacBook Pro). Unfortunately, the following very simple command, which builds a 1D tensor from the data in an array, fails:

bundle exec ruby -e 'require "torch"; Torch::Tensor.new([1,2,3,4])'

I get the following error:

/Users/mauro/code/test/truffleruby/reinforce/vendor/bundle/truffleruby/3.1.3.23.0.0/gems/rice-4.1.0/include/rice/rice.hpp:4094:in `call': External LLVMFunction rb_frame_method_id_and_class cannot be found. (Polyglot::ForeignException)
	from /Users/mauro/.rbenv/versions/truffleruby+graalvm-23.0.0/graalvm/Contents/Home/languages/ruby/lib/truffle/truffle/cext_ruby.rb:41:in `'
	from /Users/mauro/.rbenv/versions/truffleruby+graalvm-23.0.0/graalvm/Contents/Home/languages/ruby/lib/truffle/truffle/cext_ruby.rb:41:in `Torch::TensorOptions#initialize'
	from /Users/mauro/code/test/truffleruby/reinforce/vendor/bundle/truffleruby/3.1.3.23.0.0/gems/torch-rb-0.13.2/lib/torch.rb:529:in `Torch.tensor_options'
	from /Users/mauro/code/test/truffleruby/reinforce/vendor/bundle/truffleruby/3.1.3.23.0.0/gems/torch-rb-0.13.2/lib/torch.rb:429:in `Torch.tensor'
	from /Users/mauro/code/test/truffleruby/reinforce/vendor/bundle/truffleruby/3.1.3.23.0.0/gems/torch-rb-0.13.2/lib/torch.rb:281:in `#.new'
	from /Users/mauro/code/test/truffleruby/reinforce/vendor/bundle/truffleruby/3.1.3.23.0.0/gems/torch-rb-0.13.2/lib/torch/tensor.rb:29:in `Torch::Tensor.new'
	from -e:1:in `'

In addition, slightly changing the code to give dimensions instead of data to Tensor.new:

bundle exec ruby -e 'require "torch"; Torch::Tensor.new(1,2)'

generates a completely different error that reports an internal exception:

truffleruby: an internal exception escaped out of the interpreter, please report it to https://github.com/oracle/truffleruby/issues.
<no message> (com.oracle.truffle.api.CompilerDirectives.ShouldNotReachHere)
	from com.oracle.truffle.api.CompilerDirectives.shouldNotReachHere(CompilerDirectives.java:574)
	from com.oracle.truffle.api.CompilerDirectives.shouldNotReachHere(CompilerDirectives.java:546)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMDispatchNode.getSignatureSource(LLVMDispatchNode.java:165)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMDispatchNode.bindSymbol(LLVMDispatchNode.java:303)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMDispatchNodeGen.executeAndSpecialize(LLVMDispatchNodeGen.java:536)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMDispatchNodeGen.executeDispatch(LLVMDispatchNodeGen.java:304)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMCallNode.doCall(LLVMCallNode.java:82)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMCallNodeGen.executeGeneric(LLVMCallNodeGen.java:37)
	from com.oracle.truffle.llvm.runtime.nodes.api.LLVMVoidStatementNodeGen.execute(LLVMVoidStatementNodeGen.java:31)
	from com.oracle.truffle.llvm.runtime.nodes.api.LLVMFrameNuller.doExecute(LLVMFrameNuller.java:64)
	from com.oracle.truffle.llvm.runtime.nodes.api.LLVMFrameNullerNodeGen.execute(LLVMFrameNullerNodeGen.java:29)
	from com.oracle.truffle.llvm.runtime.nodes.base.LLVMBasicBlockNode$InitializedBlockNode.execute(LLVMBasicBlockNode.java:154)
	from com.oracle.truffle.llvm.runtime.nodes.control.LLVMDispatchBasicBlockNode.dispatchFromBasicBlock(LLVMDispatchBasicBlockNode.java:123)
	from com.oracle.truffle.llvm.runtime.nodes.control.LLVMDispatchBasicBlockNode.doDispatch(LLVMDispatchBasicBlockNode.java:90)
	from com.oracle.truffle.llvm.runtime.nodes.control.LLVMDispatchBasicBlockNodeGen.executeGeneric(LLVMDispatchBasicBlockNodeGen.java:33)
	from com.oracle.truffle.llvm.runtime.nodes.control.LLVMFunctionRootNode.doRun(LLVMFunctionRootNode.java:81)
	from com.oracle.truffle.llvm.runtime.nodes.control.LLVMFunctionRootNodeGen.executeGeneric(LLVMFunctionRootNodeGen.java:34)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMFunctionStartNode.execute(LLVMFunctionStartNode.java:102)
/opt/homebrew/Cellar/pytorch/2.0.1/include/torch/csrc/autograd/generated/variable_factories.h:261:in `empty_symint'
	from /Users/mauro/code/test/truffleruby/reinforce/vendor/bundle/truffleruby/3.1.3.23.0.0/gems/torch-rb-0.13.2/ext/torch/torch_functions.cpp:4970:in `torch_empty'
	from /Users/mauro/.rbenv/versions/truffleruby+graalvm-23.0.0/graalvm/Contents/Home/languages/ruby/lib/truffle/truffle/cext_ruby.rb:41:in `empty'
	from /Users/mauro/code/test/truffleruby/reinforce/vendor/bundle/truffleruby/3.1.3.23.0.0/gems/torch-rb-0.13.2/lib/torch.rb:285:in `new'
	from /Users/mauro/code/test/truffleruby/reinforce/vendor/bundle/truffleruby/3.1.3.23.0.0/gems/torch-rb-0.13.2/lib/torch/tensor.rb:29:in `new'
	from -e:1:in `<main>'
Caused by:
unsupported type [2 x i64] in native interop (com.oracle.truffle.llvm.runtime.NativeContextExtension.UnsupportedNativeTypeException)
	from com.oracle.truffle.llvm.nativemode.runtime.NFIContextExtension.getNativeType(NFIContextExtension.java:424)
	from com.oracle.truffle.llvm.nativemode.runtime.NFIContextExtension.getNativeArgumentTypes(NFIContextExtension.java:430)
	from com.oracle.truffle.llvm.nativemode.runtime.NFIContextExtension.getNativeSignature(NFIContextExtension.java:607)
	from com.oracle.truffle.llvm.nativemode.runtime.NFIContextExtension$SignatureSourceCache.getSignatureSource(NFIContextExtension.java:162)
	from com.oracle.truffle.llvm.nativemode.runtime.NFIContextExtension$SignatureSourceCache.getSignatureSourceSkipStackArg(NFIContextExtension.java:155)
	from com.oracle.truffle.llvm.nativemode.runtime.NFIContextExtension.getNativeSignatureSourceSkipStackArg(NFIContextExtension.java:561)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMDispatchNode.getSignatureSource(LLVMDispatchNode.java:163)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMDispatchNode.bindSymbol(LLVMDispatchNode.java:303)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMDispatchNodeGen.executeAndSpecialize(LLVMDispatchNodeGen.java:536)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMDispatchNodeGen.executeDispatch(LLVMDispatchNodeGen.java:304)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMCallNode.doCall(LLVMCallNode.java:82)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMCallNodeGen.executeGeneric(LLVMCallNodeGen.java:37)
	from com.oracle.truffle.llvm.runtime.nodes.api.LLVMVoidStatementNodeGen.execute(LLVMVoidStatementNodeGen.java:31)
	from com.oracle.truffle.llvm.runtime.nodes.api.LLVMFrameNuller.doExecute(LLVMFrameNuller.java:64)
	from com.oracle.truffle.llvm.runtime.nodes.api.LLVMFrameNullerNodeGen.execute(LLVMFrameNullerNodeGen.java:29)
	from com.oracle.truffle.llvm.runtime.nodes.base.LLVMBasicBlockNode$InitializedBlockNode.execute(LLVMBasicBlockNode.java:154)
	from com.oracle.truffle.llvm.runtime.nodes.control.LLVMDispatchBasicBlockNode.dispatchFromBasicBlock(LLVMDispatchBasicBlockNode.java:123)
	from com.oracle.truffle.llvm.runtime.nodes.control.LLVMDispatchBasicBlockNode.doDispatch(LLVMDispatchBasicBlockNode.java:90)
	from com.oracle.truffle.llvm.runtime.nodes.control.LLVMDispatchBasicBlockNodeGen.executeGeneric(LLVMDispatchBasicBlockNodeGen.java:33)
	from com.oracle.truffle.llvm.runtime.nodes.control.LLVMFunctionRootNode.doRun(LLVMFunctionRootNode.java:81)
	from com.oracle.truffle.llvm.runtime.nodes.control.LLVMFunctionRootNodeGen.executeGeneric(LLVMFunctionRootNodeGen.java:34)
	from com.oracle.truffle.llvm.runtime.nodes.func.LLVMFunctionStartNode.execute(LLVMFunctionStartNode.java:102)

I would like to help fix this, but I don't know where to start. Any suggestions?

@mtortonesi
Contributor Author

I upgraded to truffleruby+graalvm-dev (and also applied the #3151 patch), but the problem persists.

@mtortonesi
Contributor Author

Apparently, the problem is that the rice gem does not work with TruffleRuby: ruby-rice/rice#189

Any suggestions?

@eregon eregon added the cexts label Aug 16, 2023
@eregon
Member

eregon commented Aug 16, 2023

External LLVMFunction rb_frame_method_id_and_class cannot be found.

This means that the C API function rb_frame_method_id_and_class is not yet implemented in TruffleRuby.
It needs to be implemented to fix the first error.

@eregon
Member

eregon commented Aug 16, 2023

Regarding the second error:

unsupported type [2 x i64] in native interop (com.oracle.truffle.llvm.runtime.NativeContextExtension.UnsupportedNativeTypeException)

I think that means a struct is passed by value as an argument or return type, which NFI does not support yet.
It would be good to confirm if that's the case based on the backtrace and C source code.
#3118 might solve this struct-by-value problem for native extensions in general.
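
To make the struct-by-value hypothesis concrete, here is a minimal C++ sketch. The `ArrayView` type and both functions are hypothetical and are not taken from the PyTorch or torch.rb sources; they only mimic a pointer-plus-length "view" type. As far as I understand, on AArch64 (such as the M1 in this report) Clang usually lowers a 16-byte struct passed by value to `[2 x i64]` in LLVM IR, which would match the type named in the exception. Whether the call at variable_factories.h:261 really has this shape is still unconfirmed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a pointer + length view type.
// It occupies 16 bytes on typical 64-bit targets.
struct ArrayView {
    const std::int64_t* data;
    std::size_t size;
};

// Struct passed BY VALUE: at the ABI level the compiler may split this
// argument into two 64-bit integers, which a foreign-call layer has to
// understand in order to call the function.
std::int64_t sum_by_value(ArrayView view) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < view.size; ++i) {
        total += view.data[i];
    }
    return total;
}

// Same logic with the struct passed BY POINTER: the signature then
// contains only a plain pointer argument.
std::int64_t sum_by_pointer(const ArrayView* view) {
    assert(view != nullptr);
    std::int64_t total = 0;
    for (std::size_t i = 0; i < view->size; ++i) {
        total += view->data[i];
    }
    return total;
}
```

Both functions compute the same result; only the calling convention differs. If the hypothesis holds, a signature like `sum_by_value` is the kind that would trigger the error, while one like `sum_by_pointer` would not.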

@eregon
Member

eregon commented Aug 16, 2023

It would be good to confirm if that's the case based on the backtrace and C source code.

Could you quote here what these two lines contain?

/opt/homebrew/Cellar/pytorch/2.0.1/include/torch/csrc/autograd/generated/variable_factories.h:261:in `empty_symint'
	from /Users/mauro/code/test/truffleruby/reinforce/vendor/bundle/truffleruby/3.1.3.23.0.0/gems/torch-rb-0.13.2/ext/torch/torch_functions.cpp:4970:in `torch_empty'
