generate avro "not a data file" #10

Open
ty-n-42 opened this issue Jan 18, 2015 · 3 comments

Comments

ty-n-42 commented Jan 18, 2015

Hi,
This seems like a great tool. Unfortunately I'm a newb, and when I create .avsc and .avro files from XSD and XML files I run into trouble trying to use the .avro file.

In Hive, and with the command-line avro-tools.jar, I get the error message "Not a data file". Also, when I view the .avro file content in Hue, the preview screen renders it like a binary file, so I don't think Hue is recognising it as an Avro file either.

Is there something I need to do with the .avro files xml-avro creates before I can use them?

Here's the exception from avro-tools:

java -jar ~/Downloads/avro-tools-1.7.7.jar totext ./test.avro -
Exception in thread "main" java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
    at org.apache.avro.tool.ToTextTool.run(ToTextTool.java:67)
    at org.apache.avro.tool.Main.run(Main.java:84)
    at org.apache.avro.tool.Main.main(Main.java:73)
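For context on this error: Avro's DataFileStream starts by reading the first four bytes of the file and throws "Not a data file." unless they are the object container magic, the ASCII characters `Obj` followed by the byte 1. The rough stdlib-only sketch below performs the same check, which is a quick way to see whether a generated .avro file is a container file at all. The class and method names (`AvroMagicCheck`, `hasContainerMagic`, `looksLikeAvroDataFile`) are hypothetical and not part of Avro or xml-avro.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: mimics the magic-byte check that DataFileStream does first.
final class AvroMagicCheck {
    // Container-file magic from the Avro spec: 'O', 'b', 'j', then version byte 1
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // True if the given leading bytes start with the container magic
    static boolean hasContainerMagic(byte[] header) {
        return header.length >= MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    // Reads only the first four bytes of the file (requires Java 9+ for readNBytes)
    static boolean looksLikeAvroDataFile(String path) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            byte[] header = new byte[MAGIC.length];
            int n = in.readNBytes(header, 0, header.length);
            return n == MAGIC.length && hasContainerMagic(header);
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "test.avro";
        boolean ok = looksLikeAvroDataFile(path);
        System.out.println(path + (ok ? " has" : " lacks") + " the Avro container header");
    }
}
```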

Update:
I managed to get the code up and running in NetBeans and changed the last part of Converter.main() to see if I could get some textual output:

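try (OutputStream stream = new FileOutputStream(opts.avroFile)) {
    DatumWriter<Object> datumWriter = new SpecificDatumWriter<>(schema);
    // Raw binary encoding: writes only the datum, with no container header
    //datumWriter.write(datum, EncoderFactory.get().directBinaryEncoder(stream, null));
    Encoder encoder = EncoderFactory.get().jsonEncoder(schema, stream);
    datumWriter.write(datum, encoder);
    encoder.flush(); // the JSON encoder buffers its output
}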

This produced the content I was expecting. Any suggestions on what may be happening with directBinaryEncoder would be appreciated.

Thanks
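A binary encoder writes only the Avro binary encoding of the datum itself: no magic bytes, no schema and no sync markers, which is why DataFileStream rejects the file. As an illustration, here is a minimal stdlib-only sketch of that encoding for a hypothetical record with a string `name` and a long `id`, following the rules in the Avro specification (longs as zig-zag varints, strings as a length followed by UTF-8 bytes). `RawAvroBinary` is a made-up name, not an Avro class, and the record shape is an assumption rather than the schema xml-avro generates.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of Avro binary encoding for the record {"name": string, "id": long}.
final class RawAvroBinary {
    // Avro long: zig-zag mapping, then little-endian base-128 varint
    static void writeLong(ByteArrayOutputStream out, long value) {
        long n = (value << 1) ^ (value >> 63); // zig-zag: small magnitudes -> small codes
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            n >>>= 7;
        }
        out.write((int) n); // final byte, continuation bit clear
    }

    // Avro string: byte length as an Avro long, then the UTF-8 bytes
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is just its fields concatenated in schema order: no header, no schema
    static byte[] encodeRecord(String name, long id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, id);
        return out.toByteArray();
    }
}
```

For example, encodeRecord("ab", 1) yields the four bytes 04 61 62 02. They do not start with `Obj`, so a reader expecting a container file fails on the very first check.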

OzLe commented Jul 31, 2016

Your fix is not correct; it just creates a JSON file. To create an Avro data file that Hadoop can read and that contains the schema, replace the last statement with:

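try (OutputStream stream = new FileOutputStream(opts.avroFile)) {
    DataFileWriter<Object> fileWriter = new DataFileWriter<>(new SpecificDatumWriter<>(schema));
    fileWriter.create(schema, stream); // writes the header: magic bytes + schema
    fileWriter.append(datum);
    fileWriter.close(); // flushes the final data block and closes the stream
}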

This will work.
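To show what DataFileWriter adds on top of the raw encoding, here is a rough stdlib-only sketch of the object container layout described in the Avro specification: the magic bytes, a metadata map holding `avro.schema` and `avro.codec`, a random 16-byte sync marker, then a data block made of the object count, the block's byte size, the encoded datums and the sync marker again. It is for illustration only, assumes the `null` codec and a single block, and `MiniContainer` is a hypothetical name; in real code, use DataFileWriter as above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Hypothetical illustration: hand-built Avro container image (null codec, one block).
final class MiniContainer {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Avro long: zig-zag, then little-endian base-128 varint
    static void writeLong(ByteArrayOutputStream out, long value) {
        long n = (value << 1) ^ (value >> 63);
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
    }

    // Avro bytes/string: length as an Avro long, then the raw bytes
    static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Wraps datums that are already Avro-binary-encoded into a container file image
    static byte[] wrap(String schemaJson, byte[][] encodedDatums) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Header, part 1: magic bytes
        out.write(MAGIC, 0, MAGIC.length);

        // Header, part 2: metadata as map<bytes>, one block of 2 entries, then a 0 count
        writeLong(out, 2);
        writeBytes(out, utf8("avro.schema"));
        writeBytes(out, utf8(schemaJson));
        writeBytes(out, utf8("avro.codec"));
        writeBytes(out, utf8("null"));
        writeLong(out, 0);

        // Header, part 3: random 16-byte sync marker
        byte[] sync = new byte[16];
        new SecureRandom().nextBytes(sync);
        out.write(sync, 0, sync.length);

        // Data block: object count, byte size, the datums, then the sync marker again
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] d : encodedDatums) {
            block.write(d, 0, d.length);
        }
        byte[] data = block.toByteArray();
        writeLong(out, encodedDatums.length);
        writeLong(out, data.length);
        out.write(data, 0, data.length);
        out.write(sync, 0, sync.length);
        return out.toByteArray();
    }
}
```

For example, wrap("\"long\"", new byte[][] {{2}}) builds a 76-byte image holding the single long value 1 (zig-zag encoded as 0x02), and unlike the raw output it begins with the `Obj` magic that avro-tools and Hive look for.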

@rajabhathor

I have the same issue! Hive loads the data fine but complains it's not a data file...
I'm working for a marquee HDP client and would imagine this gets some attention...
I don't feel comfortable messing around with the code, for obvious reasons.
Any assistance is appreciated!
Raj

@GeethanadhP

@OzLe's code works; I have tested it on Hive as well.
I have updated code with some fixes available in the fork.


4 participants