I'm trying to make a deployable version of torchMoji. I'm still very new to PySpark, and I'm doing this project on Databricks.
My code:
import pyspark.sql.functions as F
from pyspark.sql.types import *

# st, model, top_elements, EMOJIS and emoji are defined earlier in the notebook (not shown here)
def deepmojify(sentence, top_n=1):
    tokenized, _, _ = st.tokenize_sentences([sentence])
    prob = model(tokenized)[0]
    emoji_ids = top_elements(prob, top_n)
    emojis = map(lambda x: EMOJIS[x], emoji_ids)
    # return the top emojis joined into a single space-separated string
    return emoji.emojize(f"{' '.join(emojis)}", use_aliases=True)

# wrap the function as a Spark UDF that returns a string column
udf_deepmojify = F.udf(deepmojify, StringType())
test_udf_deepmojify = df.withColumn("emojis", udf_deepmojify("review_by_customer"))
display(test_udf_deepmojify)
The error I keep getting is Py4JJavaError and ModuleNotFoundError: No module named 'torchmoji'.
What do I do?
Full error:
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-4055905865243119> in <module>
----> 1 final_df_test.show()

/databricks/spark/python/pyspark/sql/dataframe.py in show(self, n, truncate, vertical)
    382         """
    383         if isinstance(truncate, bool) and truncate:
--> 384             print(self._jdf.showString(n, 20, vertical))
    385         else:
    386             print(self._jdf.showString(n, int(truncate), vertical))

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258
   1259         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling o12923.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 4 times, most recent failure: Lost task 0.3 in stage 77.0 (TID 201, 10.139.64.5, executor 0): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 464, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
  File "/databricks/spark/python/pyspark/worker.py", line 316, in read_udfs
    arg_offsets, udf = read_single_udf(pickleSer, infile, eval_type, runner_conf)
  File "/databricks/spark/python/pyspark/worker.py", line 170, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
  File "/databricks/spark/python/pyspark/worker.py", line 73, in read_command
    command = serializer.loads(command.value)
  File "/databricks/spark/python/pyspark/serializers.py", line 695, in loads
    return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'torchmoji'

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:540)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:81)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:494)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:640)
    at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:62)
    at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:159)
    at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:158)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
    at org.apache.spark.scheduler.Task.run(Task.scala:113)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2362)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2350)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2349)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2349)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1102)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1102)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1102)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2582)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2529)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2517)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:897)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2280)
    at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:270)
    at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:280)
    at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:80)
    at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:86)
    at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:508)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollectResult(limit.scala:57)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectResult(Dataset.scala:2905)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3517)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2634)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2634)
    at org.apache.spark.sql.Dataset$$anonfun$54.apply(Dataset.scala:3501)
    at org.apache.spark.sql.Dataset$$anonfun$54.apply(Dataset.scala:3496)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1$$anonfun$apply$1.apply(SQLExecution.scala:112)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:98)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:835)
    at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:74)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:184)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3496)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2634)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2848)
    at org.apache.spark.sql.Dataset.getRows(Dataset.scala:279)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:316)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 464, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
  File "/databricks/spark/python/pyspark/worker.py", line 316, in read_udfs
    arg_offsets, udf = read_single_udf(pickleSer, infile, eval_type, runner_conf)
  File "/databricks/spark/python/pyspark/worker.py", line 170, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
  File "/databricks/spark/python/pyspark/worker.py", line 73, in read_command
    command = serializer.loads(command.value)
  File "/databricks/spark/python/pyspark/serializers.py", line 695, in loads
    return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'torchmoji'

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:540)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:81)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:494)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:640)
    at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:62)
    at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:159)
    at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:158)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
    at org.apache.spark.scheduler.Task.run(Task.scala:113)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
Sorry, not familiar with PySpark, so I could be completely off.
The error says it can't find the torchMoji module. Maybe it is not installed correctly? Also, I assume the above isn't your complete code snippet, since st is not created or imported in it, so I may be missing additional context.
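One way to test the "not installed correctly" guess: the traceback fails inside pickle.loads in pyspark/worker.py, which runs on an executor, so the package may be importable in the notebook (the driver) but missing on the worker nodes. Below is a minimal sketch of a check, not a confirmed fix. The helper names module_available and probe are hypothetical, and the commented-out executor check assumes a live SparkContext named sc, as Databricks notebooks normally provide.

```python
import importlib.util
import socket


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this Python process."""
    return importlib.util.find_spec(name) is not None


def probe(name: str):
    """Report (hostname, module name, availability) for the current process."""
    return (socket.gethostname(), name, module_available(name))


# Check the driver, i.e. the process running the notebook cell.
print(probe("torchmoji"))

# Check the executors. Assumes a SparkContext named `sc` exists; left commented
# out so the sketch also runs outside Spark.
# results = sc.parallelize(range(8), 8).map(lambda _: probe("torchmoji")).collect()
# print(set(results))
```

If the driver reports the module as available while the executors do not, the package was probably installed only on the driver. Installing it as a cluster-scoped library, so that every node gets it, is the usual thing to try next.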