Py4JJavaError: An error occurred while calling o2.iterator. #1

Open
Vimos opened this issue May 15, 2017 · 7 comments

@Vimos

Vimos commented May 15, 2017

Hi, I am trying to read an ORC file.

In [1]: from orcreader import OrcReader
   ...: reader = OrcReader('dt=2017-05-14_os=android_part=000004_0')
   ...: reader.open()
   ...: 

I can successfully get the schema:

In [3]: reader.schema()
Out[3]: 
OrderedDict([(u'log_id', u'string'),
             (u'city_id', u'string'),
             (u'city_name', u'string'),
             (u'city_name_en', u'string'),
             (u'province_id', u'string'),
             (u'province_name', u'string'),
 ...  
             (u'activity_flag', u'string')])

But when I try to read rows, it reports the following error:

In [2]: for row in reader:
   ...:     print row
   ...:     
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-2-df0cbab3b6b5> in <module>()
----> 1 for row in reader:
      2     print row
      3 

/usr/local/lib/python2.7/dist-packages/python_orc-0.0.1-py2.7.egg/orcreader/reader.pyc in __iter__(self)
     79 
     80     def __iter__(self):
---> 81         return OrcRecordIterator(self.reader.iterator())
     82 
     83     def __enter__(self):

/usr/local/lib/python2.7/dist-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.pyc in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

/usr/local/lib/python2.7/dist-packages/py4j-0.10.4-py2.7.egg/py4j/protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
    317                 raise Py4JJavaError(
    318                     "An error occurred while calling {0}{1}{2}.\n".
--> 319                     format(target_id, ".", name), value)
    320             else:
    321                 raise Py4JError(

Py4JJavaError: An error occurred while calling o2.iterator.
: java.lang.RuntimeException: Unable to init iterator
	at com.pythonorc.SimplifiedOrcReader.iterator(SimplifiedOrcReader.java:72)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

Any suggestions on how to debug this error?

@nqbao
Owner

nqbao commented May 15, 2017

Is it possible to share the ORC file? I can try to take a look at it.

@Vimos
Author

Vimos commented May 15, 2017

I am sorry, I am not permitted to send you the data. I can offer more debug info from the Java side:

/usr/lib/jvm/java-8-oracle/bin/java -agentlib:jdwp=transport=dt_socket,address=127.0.0.1:32805,suspend=y,server=n -Dfile.encoding=UTF-8 -classpath /usr/lib/jvm/java-8-oracle/jre/lib/charsets.jar:/usr/lib/jvm/java-8-oracle/jre/lib/deploy.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/cldrdata.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/dnsns.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/jaccess.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/jfxrt.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/localedata.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/nashorn.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/sunec.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/sunjce_provider.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/sunpkcs11.jar:/usr/lib/jvm/java-8-oracle/jre/lib/ext/zipfs.jar:/usr/lib/jvm/java-8-oracle/jre/lib/javaws.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jce.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jfr.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jfxswt.jar:/usr/lib/jvm/java-8-oracle/jre/lib/jsse.jar:/usr/lib/jvm/java-8-oracle/jre/lib/management-agent.jar:/usr/lib/jvm/java-8-oracle/jre/lib/plugin.jar:/usr/lib/jvm/java-8-oracle/jre/lib/resources.jar:/usr/lib/jvm/java-8-oracle/jre/lib/rt.jar:/home/vimos/Public/github/ml/python-orc/java-gateway/target/classes:/data/home/vimos/.m2/repository/net/sf/py4j/py4j/0.10.2.1/py4j-0.10.2.1.jar:/data/home/vimos/.m2/repository/org/apache/orc/orc-core/1.1.1/orc-core-1.1.1.jar:/data/home/vimos/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar:/data/home/vimos/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/data/home/vimos/.m2/repository/commons-lang/commons-lang/2.6/commons-lang-2.6.jar:/data/home/vimos/.m2/repository/org/apache/hadoop/hadoop-common/2.6.0/hadoop-common-2.6.0.jar:/data/home/vimos/.m2/repository/org/apache/hadoop/hadoop-annotations/2.6.0/hadoop-annotations-2.6.0.jar:/usr/lib/jvm/java-8-oracle/lib/tools.jar:/data/home/vimos/.m2/repository/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.jar:/data/home
/vimos/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/data/home/vimos/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/data/home/vimos/.m2/repository/commons-codec/commons-codec/1.4/commons-codec-1.4.jar:/data/home/vimos/.m2/repository/commons-io/commons-io/2.4/commons-io-2.4.jar:/data/home/vimos/.m2/repository/commons-net/commons-net/3.1/commons-net-3.1.jar:/data/home/vimos/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/data/home/vimos/.m2/repository/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar:/data/home/vimos/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar:/data/home/vimos/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar:/data/home/vimos/.m2/repository/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar:/data/home/vimos/.m2/repository/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar:/data/home/vimos/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:/data/home/vimos/.m2/repository/org/codehaus/jackson/jackson-jaxrs/1.8.3/jackson-jaxrs-1.8.3.jar:/data/home/vimos/.m2/repository/org/codehaus/jackson/jackson-xc/1.8.3/jackson-xc-1.8.3.jar:/data/home/vimos/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar:/data/home/vimos/.m2/repository/asm/asm/3.1/asm-3.1.jar:/data/home/vimos/.m2/repository/tomcat/jasper-compiler/5.5.23/jasper-compiler-5.5.23.jar:/data/home/vimos/.m2/repository/tomcat/jasper-runtime/5.5.23/jasper-runtime-5.5.23.jar:/data/home/vimos/.m2/repository/commons-el/commons-el/1.0/commons-el-1.0.jar:/data/home/vimos/.m2/repository/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar:/data/home/vimos/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/data/home/vimos/.m2/repository/net/java/dev/jets3t/jets3t/0.9.0/jets3t-0.9.0.jar:/data/home/vimos/.m2/repository/org/apache/httpcomponents/httpclient/4.1.2/httpclient-4.1.2.jar:/data/home/vimos/.m2/repository/org/apache/httpcomponents/httpc
ore/4.1.2/httpcore-4.1.2.jar:/data/home/vimos/.m2/repository/com/jamesmurty/utils/java-xmlbuilder/0.4/java-xmlbuilder-0.4.jar:/data/home/vimos/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/data/home/vimos/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/data/home/vimos/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/data/home/vimos/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/data/home/vimos/.m2/repository/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar:/data/home/vimos/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar:/data/home/vimos/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar:/data/home/vimos/.m2/repository/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar:/data/home/vimos/.m2/repository/org/apache/hadoop/hadoop-auth/2.6.0/hadoop-auth-2.6.0.jar:/data/home/vimos/.m2/repository/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar:/data/home/vimos/.m2/repository/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar:/data/home/vimos/.m2/repository/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar:/data/home/vimos/.m2/repository/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar:/data/home/vimos/.m2/repository/org/apache/curator/curator-framework/2.6.0/curator-framework-2.6.0.jar:/data/home/vimos/.m2/repository/com/jcraft/jsch/0.1.42/jsch-0.1.42.jar:/data/home/vimos/.m2/repository/org/apache/curator/curator-client/2.6.0/curator-client-2.6.0.jar:/data/home/vimos/.m2/repository/org/apache/curator/curator-recipes/2.6.0/curator-recipes-2.6.0.jar:/data/home/vimos/.m2/repository/org/htrace/htrace-core/3.0.4/htrace-core-3.0.4.jar:/data/home/vimos/.m2/repository/org/apache/zookeeper/zookeeper/3.4.6/zookeeper-3.4.
6.jar:/data/home/vimos/.m2/repository/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar:/data/home/vimos/.m2/repository/org/tukaani/xz/1.0/xz-1.0.jar:/data/home/vimos/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.6.0/hadoop-hdfs-2.6.0.jar:/data/home/vimos/.m2/repository/commons-daemon/commons-daemon/1.0.13/commons-daemon-1.0.13.jar:/data/home/vimos/.m2/repository/io/netty/netty/3.6.2.Final/netty-3.6.2.Final.jar:/data/home/vimos/.m2/repository/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar:/data/home/vimos/.m2/repository/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar:/data/home/vimos/.m2/repository/org/apache/hive/hive-storage-api/2.1.0-pre-orc/hive-storage-api-2.1.0-pre-orc.jar:/data/home/vimos/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar:/data/home/vimos/.m2/repository/stax/stax-api/1.0.1/stax-api-1.0.1.jar:/data/home/vimos/.m2/repository/org/iq80/snappy/snappy/0.2/snappy-0.2.jar:/data/home/vimos/.m2/repository/org/slf4j/slf4j-api/1.7.5/slf4j-api-1.7.5.jar:/data/home/vimos/.m2/repository/com/google/guava/guava/14.0.1/guava-14.0.1.jar:/opt/jetbrains/idea-IU-171.4249.39/lib/idea_rt.jar com.pythonorc.SimplifiedOrcReader
Connected to the target VM, address: '127.0.0.1:32805', transport: 'socket'
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[log_id, mhotelid, city_id, city_name, city_name_en, province_id, province_name, province_name_en, m_city_id, m_province_id, log_type, uid, source, os_version, card_number, user_name, user_ip, appid, user_agent, trace_id, latitude, longitude, carrier, userinfo_channel, level, model, brand, orderid, proxyid, caller_attr_channel, economic_hotel, fast_filter_keywords, mhotel_ids, return_has_xianfu_hotel, return_has_yufu_hotel, hotel_brand_id, only_limitime_sale, facility_ids, theme_ids, star_rates, district_id, district_type, price_pair, payment_methods, nearby, poi_id, region_id, check_in, check_out, id, executetime, keywords, setkeywords, setbrandid, setstarrates, inner_search_type, hotel_group_id, sorting_method, setfilterattr, mrankflag, mranktype, setnearby, setfastfilter_attr, setpoi_id, sethotel_group_id, response_mhotelids, setprice_pair, star_ratessize, facility_idssize, setdistrict_type, settheme_ids, setdistrict_id, settrace_id, pageindex, pagesize, recreqattrtype, ifun, crawled_flag, geo_type, activity_flag]
80
1149130
Exception in thread "main" java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3863789
	at org.apache.orc.impl.InStream$CompressedStream.readHeader(InStream.java:217)
	at org.apache.orc.impl.InStream$CompressedStream.read(InStream.java:262)
	at java.io.InputStream.read(InputStream.java:101)
	at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
	at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
	at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
	at org.apache.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10679)
	at org.apache.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10643)
	at org.apache.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10748)
	at org.apache.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10743)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:89)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:95)
	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
	at org.apache.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:10976)
	at org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.readStripeFooter(RecordReaderUtils.java:165)
	at org.apache.orc.impl.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:236)
	at org.apache.orc.impl.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:849)
	at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:820)
	at org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:977)
	at org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1012)
	at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:212)
	at org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:579)
	at org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:566)
	at com.pythonorc.SimplifiedOrcReader.iterator(SimplifiedOrcReader.java:70)
	at com.pythonorc.SimplifiedOrcReader.main(SimplifiedOrcReader.java:285)
Disconnected from the target VM, address: '127.0.0.1:32805', transport: 'socket'

@nqbao
Owner

nqbao commented May 16, 2017

That gives me a hint. Let me try to work on it.

@nqbao
Owner

nqbao commented May 16, 2017

I don't have any sample that reproduces this error. But if I understand correctly, the bufferSize is stored in the footer of the ORC file. Maybe, for some reason, the bufferSize in the footer is incorrect.

Can you check out the branch add-fetch-filemetainfo, build again, then fetch reader.fileMetaInfo and paste the output back here? A sample output would be:

{u'metadataSize': u'250', u'compressionType': u'ZLIB', u'writerVersion': u'1', u'versionLists': u'[0, 12]', u'bufferSize': u'10000'}

This info will help me debug the problem further.

@Vimos
Author

Vimos commented May 17, 2017

I used the ORC tools and got this:

➜  src git:(master) ./orc-metadata ../../../../../python-orc/dt=2017-05-14_os=android_part=000004_0
{ "name": "../../../../../python-orc/dt=2017-05-14_os=android_part=000004_0",
  "type": "struct<log_id:string,mhotelid:string,city_id:string,city_name:string,city_name_en:string,province_id:string,province_name:string,province_name_en:string,m_city_id:string,m_province_id:string,log_type:string,uid:string,source:string,os_version:string,card_number:string,user_name:string,user_ip:string,appid:string,user_agent:string,trace_id:string,latitude:string,longitude:string,carrier:string,userinfo_channel:string,level:string,model:string,brand:string,orderid:string,proxyid:string,caller_attr_channel:string,economic_hotel:string,fast_filter_keywords:string,mhotel_ids:string,return_has_xianfu_hotel:string,return_has_yufu_hotel:string,hotel_brand_id:string,only_limitime_sale:string,facility_ids:string,theme_ids:string,star_rates:string,district_id:string,district_type:string,price_pair:string,payment_methods:string,nearby:string,poi_id:string,region_id:string,check_in:string,check_out:string,id:string,executetime:string,keywords:string,setkeywords:string,setbrandid:string,setstarrates:string,inner_search_type:string,hotel_group_id:string,sorting_method:string,setfilterattr:string,mrankflag:string,mranktype:string,setnearby:string,setfastfilter_attr:string,setpoi_id:string,sethotel_group_id:string,response_mhotelids:array<string>,setprice_pair:string,star_ratessize:string,facility_idssize:string,setdistrict_type:string,settheme_ids:string,setdistrict_id:string,settrace_id:string,pageindex:string,pagesize:string,recreqattrtype:string,ifun:string,crawled_flag:string,geo_type:string,activity_flag:string>",
  "rows": 1149130,
  "stripe count": 3,
  "format": "0.12", "writer version": "original",
  "compression": "zlib", "compression block": 262144,
  "file length": 116043721,
  "content": 116041038, "stripe stats": 3599, "footer": 2549, "postscript": 23,
  "row index stride": 10000,
  "user metadata": {
  },
  "stripes": [
    { "stripe": 0, "rows": 575000,
      "offset": 3, "length": 57272824,
      "index": 72291, "data": 57198680, "footer": 1853
    },
    { "stripe": 1, "rows": 510000,
      "offset": 57272827, "length": 51277701,
      "index": 64624, "data": 51211228, "footer": 1849
    },
    { "stripe": 2, "rows": 64130,
      "offset": 108550528, "length": 7490510,
      "index": 12819, "data": 7475943, "footer": 1748
    }
  ]
}
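The numbers in this output can be cross-checked against each other, which is a quick way to rule out a damaged stripe layout before looking at buffer sizes. Below is a minimal sketch with the values copied from the output above. The helper name `check_stripes` is made up for illustration and is not part of orc-metadata or python-orc, and treating "content" as the end of the last stripe is an assumption that happens to hold for this file.

```python
# Values copied from the orc-metadata output above (schema omitted).
metadata = {
    "rows": 1149130,
    "content": 116041038,
    "stripes": [
        {"rows": 575000, "offset": 3, "length": 57272824,
         "index": 72291, "data": 57198680, "footer": 1853},
        {"rows": 510000, "offset": 57272827, "length": 51277701,
         "index": 64624, "data": 51211228, "footer": 1849},
        {"rows": 64130, "offset": 108550528, "length": 7490510,
         "index": 12819, "data": 7475943, "footer": 1748},
    ],
}


def check_stripes(meta):
    """Return a list of inconsistencies in the stripe layout (empty if none)."""
    problems = []
    stripes = meta["stripes"]
    expected_offset = stripes[0]["offset"]
    for i, stripe in enumerate(stripes):
        # index + data + footer sections should fill the stripe exactly.
        sections = stripe["index"] + stripe["data"] + stripe["footer"]
        if sections != stripe["length"]:
            problems.append("stripe %d: sections do not add up to length" % i)
        # Each stripe should start where the previous one ended.
        if stripe["offset"] != expected_offset:
            problems.append("stripe %d: unexpected offset" % i)
        expected_offset = stripe["offset"] + stripe["length"]
    # Per-stripe row counts should add up to the file's row count.
    if sum(s["rows"] for s in stripes) != meta["rows"]:
        problems.append("row counts do not add up")
    # Assumption: "content" marks the end of the last stripe.
    if expected_offset != meta["content"]:
        problems.append("last stripe does not end at 'content'")
    return problems


print(check_stripes(metadata) or "stripe layout is consistent")
```

For this file every check passes, so the stripe boundaries themselves look sane. The stack trace above fails later, while reading a compressed chunk of the stripe footer.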

@Vimos
Author

Vimos commented May 17, 2017

Using the new branch, I got this.

In [1]: from orcreader import OrcReader
   ...: reader = OrcReader('dt=2017-05-14_os=android_part=000004_0')
   ...: reader.open()
   ...: 

In [2]: reader.fileMetaInfo
Out[2]: {u'metadataSize': u'3599', u'compressionType': u'ZLIB', u'writerVersion': u'0', u'versionLists': u'[0, 12]', u'bufferSize': u'262144'}

@nqbao
Owner

nqbao commented May 17, 2017

Yeah. By default, the reader uses the blockSize from the metadata, which is "compression block": 262144.

One possible option is to manually override the blockSize. I will work on this later today.
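
For context, the "Buffer size too small" message comes from a check the Java reader performs on each compressed chunk header. Below is a minimal Python 3 sketch of that check, assuming the 3-byte little-endian chunk header described in the ORC specification (chunk length shifted left by one bit, with the lowest bit set when the chunk is stored uncompressed). The function names are hypothetical; this is an illustration, not code from python-orc or from the ORC library.

```python
def encode_chunk_header(length, is_original):
    """Build a 3-byte ORC-style chunk header (illustrative helper)."""
    value = (length << 1) | int(is_original)
    return value.to_bytes(3, "little")


def decode_chunk_header(header):
    """Return (chunk_length, is_original) from a 3-byte chunk header."""
    if len(header) != 3:
        raise ValueError("chunk header must be exactly 3 bytes")
    b0, b1, b2 = header
    is_original = bool(b0 & 0x01)
    # Drop the flag bit, then reassemble the 23-bit little-endian length.
    chunk_length = (b2 << 15) | (b1 << 7) | (b0 >> 1)
    return chunk_length, is_original


def check_chunk(header, buffer_size):
    """Reject a chunk that claims to be larger than the reader's buffer."""
    chunk_length, is_original = decode_chunk_header(header)
    if chunk_length > buffer_size:
        raise ValueError(
            "Buffer size too small. size = %d needed = %d"
            % (buffer_size, chunk_length)
        )
    return chunk_length, is_original


# The numbers from this issue: buffer of 262144, header claiming 3863789.
try:
    check_chunk(encode_chunk_header(3863789, False), 262144)
except ValueError as exc:
    print(exc)
```

With the buffer size taken from the file (262144) and a header that decodes to 3863789, the sketch raises the same message as the stack trace. Overriding the buffer size only helps if the header really describes a chunk that large; a header read from the wrong position would also decode to an arbitrary length.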
