Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

continued project cannot read xls files #19

Open
bengalo opened this issue Jan 10, 2015 · 12 comments
Open

continued project cannot read xls files #19

bengalo opened this issue Jan 10, 2015 · 12 comments

Comments

@bengalo
Copy link

bengalo commented Jan 10, 2015

You are probably aware that in this project you are using your libxls and the follow-up project by Jan is using xlslib which cannot read xls files at all, right?
While there is no option to read/parse files in Jan's project (yet?), I'd advise to update the README once again to reflect that.
Might save someone (like me) an hour or two ;)

@dhoerl
Copy link
Owner

dhoerl commented Jan 10, 2015

Sorry for the confusion - I tried to answer too fast. For sure I will fix this problem - give me a few days.

@dhoerl dhoerl closed this as completed Jan 11, 2015
@DLGill
Copy link

DLGill commented Jan 12, 2015

I can generate xlxs files instead of xls files if JXLS will read those.
So which project should I use to read either XLS or XLXS files on
Machintosh OSx? Is it the DHxlsreader, or JXLS?

nPowerLogo.gif

IwLogo.gif
Kind Regards,

David Gill
President

IntegrityWare, Inc.(http://www.IntegrityWare.com)
nPower Software (http://www.nPowerSoftware.com/)
[email protected] mailto:[email protected]
Phone: 858-592-8866
Fax: 858-592-8844

On 1/10/2015 9:48 AM, bengalo wrote:

You are probably aware that in this project you are using your libxls
and the follow-up project by Jan is using xlslib which cannot read xls
files at all, right?
While there is no option to read/parse files in Jan's project (yet?),
I'd advise to update the README once again to reflect that.
Might save someone (like me) an hour or two ;)


Reply to this email directly or view it on GitHub
#19.

@dhoerl
Copy link
Owner

dhoerl commented Jan 12, 2015

The code can only read .xls binary files - the xlsx are XML format and totally different. Very sorry for the confusion earlier - I had a DHxlslib project earlier, and stopped supporting it due to Jan taking it over (renaming it and adding more features). DHlibxls to read on Mac/ios, and JXLS to write them.

@dhoerl dhoerl reopened this Jan 12, 2015
@DLGill
Copy link

DLGill commented Jan 12, 2015

Hi David,

So, was I correct to use the DHxlsReader, or should I be using the
JXLS? I took the DHxlsReader out of my project over the weekend, but I
hadn't quite got the JXLS hooked up yet.

By the way, I found that everything else in the DHxlsReader is very high
quality. It is well written code, and works very well (except the one
problem I found).

nPowerLogo.gif

IwLogo.gif
Kind Regards,

David Gill
President

IntegrityWare, Inc.(http://www.IntegrityWare.com)
nPower Software (http://www.nPowerSoftware.com/)
[email protected] mailto:[email protected]
Phone: 858-592-8866
Fax: 858-592-8844

On 1/10/2015 3:09 PM, David Hoerl wrote:

Sorry for the confusion - I tried to answer too fast. For sure I will
fix this problem - give me a few days.


Reply to this email directly or view it on GitHub
#19 (comment).

@dhoerl
Copy link
Owner

dhoerl commented Jan 12, 2015

To add to the confusion is that I named the project 'DHlibxls', but the actual framework is 'DHxlsReader'. You only need JXLS if you want to WRITE Excel files. BTW, did you send me the problematic .xls file yet?

@DLGill
Copy link

DLGill commented Jan 12, 2015

Hi David,

Here is the Excel file. So I will put DHxlsReader back into my
project. By the way, I will continue to try to debug the problem and
send you a fix if I find it. I haven't gotten to that point yet (but I
think I was pretty close). It must have something to do with being an
empty cell inside of a table. The 3B cell in the first empty cell, and
the one that it errors out on.

By the way, I really appreciate your code. You have done a very nice
job with it.

nPowerLogo.gif

IwLogo.gif
Kind Regards,

David Gill
President

IntegrityWare, Inc.(http://www.IntegrityWare.com)
nPower Software (http://www.nPowerSoftware.com/)
[email protected] mailto:[email protected]
Phone: 858-592-8866
Fax: 858-592-8844

On 1/12/2015 12:09 PM, David Hoerl wrote:

To add to the confusion is that I named the project 'DHlibxls', but
the actual framework is 'DHxlsReader'. You only need JXLS if you want
to WRITE Excel files. BTW, did you send me the problematic .xls file yet?


Reply to this email directly or view it on GitHub
#19 (comment).

@DLGill
Copy link

DLGill commented Jan 13, 2015

Hello David,

No worries.

Isn't the .xlxs format just the newer Excel format? I can produce
either, but my version of Excel defaults to the .xlxs format.
DHxlsReader kicks them out as non Excel files, but they are just newer
format.

nPowerLogo.gif

IwLogo.gif
Kind Regards,

David Gill
President

IntegrityWare, Inc.(http://www.IntegrityWare.com)
nPower Software (http://www.nPowerSoftware.com/)
[email protected] mailto:[email protected]
Phone: 858-592-8866
Fax: 858-592-8844

On 1/12/2015 11:47 AM, David Hoerl wrote:

The code can only read .xls binary files - the xlsx are XML format and
totally different. Very sorry for the confusion earlier - I had a
DHxlslib project earlier, and stopped supporting it due to Jan taking
it over (renaming it and adding more features). DHlibxls to read on
Mac/ios, and JXLS to write them.


Reply to this email directly or view it on GitHub
#19 (comment).

@DLGill
Copy link

DLGill commented Feb 26, 2015

Hey Dave,

I just wondered if you have been able to track down the problem with tables.

David Gill

@ghost
Copy link

ghost commented Feb 26, 2015

Unless you can convince me this is a legitimate un-protected file, which all evidence indicates to the contrary, I don't plan to waste my time on it. Numbers refuses to open it, and also saved variants with an additional number added in the column you indicated. I just tried to open it with google documents, and only a few things show up.

This file has serious issues. If you want to compensate me for the time needed to delve into the belly of libxls and all the microsoft oddities, I can do it - but I am not going to waste the few hours of time I can devote to open source on this.

One of the projects contributors has a production app that reads and imports files using the library. From time to time files show up that cannot be opened - they were created in a buggy point release of Excel or some other program. Excel itself has knowledge of all past oddities, is supported by hundreds of paid programmers, and has a compelling reason to open any file.

@DLGill
Copy link

DLGill commented Feb 27, 2015

Can you tell me which protection error it is generating (0x86, 0x63, or 0x2F), and where that error is returned? I will try to track it down. I don't understand how the file can be protected because I saved the file by hand (converting it from a .xlsx file) from within Excel and none of the protection flags are on (it's not read-only, there is no password set, protection is turned off, etc.). Perhaps I have a buggy version of Excel, but that seems odd too as it is a standard, off the shelf purchased version.

@DLGill
Copy link

DLGill commented Feb 27, 2015

It appears from looking at xls_ParseWorkBook (in xls.c) that all of the protection flags are skipped and not processed. Is this correct?

@ghost
Copy link

ghost commented Feb 27, 2015

If you 'down' converted it from an xlsx file, well who knows what Excel does. Unfortunately at this time I only have an old 2002ish version of Excel that only runs on a PowerPC based Mac. My experience has been that when I see protection flags, it means Excel has done something to the data that makes it difficult for libxls to decode. libxls does nothing with the protection flags, but its an indication that the file has been 'protected' in some form by Excel. In fact, we've considered just aborting when we see those flags, to avoid long discussions such as this one.

The way libxls gets data is to decode the file first - to extract out all the tables and such - then as it iterates through all sheets and cells, it uses associations as indicated by the particular data cell to link to the data. In some cases with protection, there is no data - Excel itself takes blocks of 'protected data', and decodes it, then puts it in the 'common' accessible areas to be used. This is possibly one reason for the issue here. Another reason for the missing data may be that in down converting xlsx files, it uses some sophisticated technique that libxls is not aware of.

What I would do to track this down, and something you can do your self, is to turn debugging way up - 10 or higher, then dump the file. It will be easier if you use libxls directly - I have an Xcode Mac project that I use for this, but it could be anything even command line.

You set the debug flag to 10, then dump the file. You can then start adding printf's all over the place in the specific areas you are trying to see, and find out for the particular cell that has no data, where the library is trying to get it from. You can then dump that whole table (linked list, whatever). You can then find the code that creates that table and add printfs there. Use the OpenOffice Excel Format Specification (google it), it helps a lot. You will see that open office also passed over all the protection schemes and objects - that is all are marked as TBD.

Early encryption was very simple and easy to circumvent - its possible other .xls readers have implemented decoding (we passed on doing so years ago, as we didn't want to start down that path).

This library has been around for years, and has wide adoption. There have been several real bugs that appeared, but the frequency has dropped over time. I have actually had success using Numbers to both create .xls files and read them when others like you submit them - to see what they look like before trying to process them with libxls. That neither Numbers nor Google Sheets can effectively read this file is a huge red flag.

If you do find a bug in libxls, or something simple that needs correction so your file can be read, then I'd for sure add it to the source. I just am not in a position currently to burn a day tracking this down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants