Implement xlsxioread_sheet_next_cell_struct with cell types #47

Paxa · 2019-03-13T20:40:51Z

I took suggested implementation from #12 and add new method to keep existing API same

Fixes #12

Now I found similar fork matejd@a0b6a0b

Travis status: (https://travis-ci.org/Paxa/xlsxio)

How to use xlsxioread_sheet_next_cell_struct:

  xlsxioreader xlsxioread;

  if ((xlsxioread = xlsxioread_open(filename)) == NULL) {
    fprintf(stderr, "Error opening .xlsx file\n");
    return 1;
  }

  xlsxioread_cell value;
  const char *typeNames[] = { "none", "value", "boolean", "string", "date" };

  printf("Contents of first sheet:\n");
  xlsxioreadersheet sheet = xlsxioread_sheet_open(xlsxioread, NULL, XLSXIOREAD_SKIP_EMPTY_ROWS);
  while (xlsxioread_sheet_next_row(sheet)) {
    while ((value = xlsxioread_sheet_next_cell_struct(sheet)) != NULL) {
      /*
        value is a struct:
            size_t row_num;
            size_t col_num;
            char* data;
            cell_type_enum cell_type;
            char* number_fmt; - only for numbers and dates
      */
      printf("row_num = %zu \n", value->col_num);
      printf("col_num = %zu \n", value->row_num);
      printf("type = %s\n", typeNames[value->cell_type]);
      if (value->number_fmt) {
        printf("number_format = %s value = %s\n", value->number_fmt);
      }
      XML_Char_printf(X("value = %s\n"), value->data);
      free(value);
    }
    printf("\n");
  }
  xlsxioread_sheet_close(sheet);
  xlsxioread_close(xlsxioread);

How to use xlsxioread_debug_internals:
it prints internal structures with extracted number formats (just for debugging)

  xlsxioreader xlsxioread;

  if ((xlsxioread = xlsxioread_open(filename)) == NULL) {
    fprintf(stderr, "Error opening .xlsx file\n");
    return 1;
  }
  xlsxioread_debug_internals(xlsxioread);
  xlsxioread_close(xlsxioread);

Output:

XLSXIO_READ DEBUG - test_dates.xlsx
number_formats:
  number_format 0 - 0
  number_format 1 - 164
  number_format 2 - 165
date_formats:
  date_format id = 164	 isDate = 0	 fmt = #,##0.00
  date_format id = 165	 isDate = 1	 fmt = [$-409]m/d/yy h:mm AM/PM;@

(First commit message, yay!)

Move structs to the top Add xlsxioread_debug_internals(xlsxioreader) Add xlsxio_read_struct in xlsxio_read.h (now its with alias and without)

Paxa · 2019-03-15T09:50:03Z

Do you know how to install specify minizip path in windows?
Travis builds works only for windows and mac now

…igurable

brechtsanders · 2019-03-16T08:19:58Z

You can tell XLSX I/O buid to use a specific Minizip install location with the CMake parameter -DMINIZIP_DIR:PATH=

I have started reviewing some of your code, and already have several remarks:

You added a dependancy on uthash, which uses a BSD revised license. I believe this is not entirely compatible with the MIT license (but I still have to check if this is really incompatible)
You included files from uthash, instead of using it as a library, but I could fix that if your code gets accepted (I already have static and shared libraries built for Windows with MinGW)
There are several unshielded malloc() calls, causing possible vulnerabilities/unstabilities which I am not willing to introduce

Paxa · 2019-03-16T08:45:04Z

I not sure about licenses, will it be better if we download utarray.h and uthash.h in a make file?
utarray.h and uthash.h is only header files with macros, not sure if we can compile it as a library
Do you have any suggestions for alternatives?

I don't think we gonna have many styles, so performance not so critical.
And in our use case we just need append and get, may be implement own linked list for that

Paxa · 2019-03-16T08:46:20Z

What should we do with malloc? check if (res == NULL) { return NULL }?

Paxa · 2019-03-16T08:49:36Z

I deleted files inside build folder, is it ok? (I don't know what those files was for)
btw I made all these changes to create ruby library https://github.com/Paxa/fast_excel_reader
according to my benchmark already got 9-15 times improvement

brechtsanders · 2019-03-16T12:46:48Z

I usually do:

if ((s = (char*)malloc(wantedsize)) == NULL)
  return NULL; //or some error code

that way you don't assume malloc worked causing the application to crash or even to allow hacker to exploit this.

The build folder contains the Code::Blocks project files, which I use to develop and debug on Windows.
Please don't delete it.

uthash is a package in most package systems (e.g. uthash-dev apt package on Debian Linux).
I would just add a CMake parameter to locate it for systems that don't have it in their package manager (like Windows).
But can you tell me why exactly we need this? I'm not keen to add dependancies...

Question:
Do your changes look at the type on cell-level, or also to colomn or row settings in case it's not defined on cell level?

Paxa · 2019-03-16T13:22:02Z

Will try to make malloc as suggested and bring back build/ folder

But can you tell me why exactly we need this? I'm not keen to add dependancies...

it was in code uploaded in #12 and I keep it. It is used to find relations between cell and number format:

Example styles.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <numFmts count="2">
    <numFmt numFmtId="164" formatCode="#,##0.00"/>
    <numFmt numFmtId="165" formatCode="[$-409]m/d/yy h:mm AM/PM;@"/>
  </numFmts>
  ...
  <cellStyleXfs count="1">
    <xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>
  </cellStyleXfs>
  <cellXfs count="3">
    <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
    <xf numFmtId="164" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
    <xf numFmtId="165" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
  </cellXfs>
  <cellStyles count="1">
    <cellStyle name="Normal" xfId="0" builtinId="0"/>
  </cellStyles>
  <dxfs count="0"/>
  <tableStyles count="0" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>
</styleSheet>

Example sheet.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <dimension ref="A1:G5"/>
  ...
  <cols>
    <col min="2" max="2" width="20.7109375" style="1" customWidth="1"/>
    <col min="3" max="3" width="20.7109375" style="2" customWidth="1"/>
  </cols>
  <sheetData>
    <row r="1" spans="1:7">
      <c r="A1" t="s">
        <v>0</v>
      </c>
      <c r="B1" s="1">
        <v>1234</v>
      </c>
      <c r="C1" s="2">
        <v>43537.51777806081</v>
      </c>
      <c r="D1">
        <v>1234</v>
      </c>
      <c r="E1" t="b">
        <v>1</v>
      </c>
      <c r="F1">
        <f>(B1 * 2)</f>
        <v>0</v>
      </c>
      <c r="G1" t="s">
        <v>1</v>
      </c>
    </row>
    <row r="5" s="2" customFormat="1" ht="20" customHeight="1"/>
  </sheetData>
  <hyperlinks>
    <hyperlink ref="G1" r:id="rId1"/>
  </hyperlinks>
  <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
</worksheet>

In example files

<c r="C1" s="2"> means cell have style = 2
style number 2 is <xf numFmtId="165" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>, it reference to number format 165
which is <numFmt numFmtId="165" formatCode="[$-409]m/d/yy h:mm AM/PM;@"/>

From format code we can recognize it's a date, not just number. This code also consider builtin number formats 14 - 22.

If <numFmts ...> and <cellXfs ...> will always have count attribute then we can read it and allocate simple array of int for style-format relations and struct for number formats, or event array of style_id->number format struct (should be fast and simple).

For now only on cell-level, I tried to create file with row format and column format, but when extract xml files, and each cell will have style reference e.g s="2"
If you have example of file with column format or row format - I will try to make it work

Btw where do you find format specification?

This reverts commit b708ea6.

brechtsanders · 2019-03-16T13:52:49Z

Styles are numbered. That's why I was wondering why hashing was used. A (sparse) array structure would do for this.
But I guess it will do for now. Worst case I witch the license from MIT to BSD.

How much of the API is modified? I'm assuming your changes are not very backwards compatible?

Paxa · 2019-03-16T16:15:01Z

I changed code to use arrays of structs, but not sure if I implemented it correctly, especially memory allocation and memory releasing. So not using utarray and uthash anymore

How much of the API is modified? I'm assuming your changes are not very backwards compatible?

Should be totally backwards compatible, changed xlsxioread_sheet_next_cell to xlsxioread_sheet_next_cell_struct and added xlsxioread_sheet_next_cell that returns just string

What is your preferences for variable names? camelCase or snake_case? e.g. handle->numberFormats or handle->number_formats?

brechtsanders · 2019-03-16T16:26:33Z

Wow, you're quick!
I avoid camelCase in pure C. I stick with lower case and use underscore to separate things that are separate (e.g. get_numberformat).

Paxa · 2019-03-16T17:31:39Z

lower case and use underscore to separate things that are separate (e.g. get_numberformat).

done

Paxa · 2019-03-16T17:32:13Z

may be you know how to install minizip on windows? or is windows build is important at all?

brechtsanders · 2019-03-16T22:37:34Z

For me MinGW in Windows is my primary development platform, but I also make sure it builds on Debian Linux and macOS.
MiniZip 1.2.8 (downloaded from http://www.gaia-gis.it/gaia-sins/dataseltzer-sources/) builds fine on Windows with MinGW using CMake but for me it needed this patch to avoid WinRT specifics:

patch -ulbf iowin32.c << EOF
--- iowin32.c  2013-04-29 00:57:12 +0200
+++ iowin32.c  2014-08-09 23:11:20 +0200
@@ -28,3 +28,3 @@

-#if defined(WINAPI_FAMILY_PARTITION) && (!(defined(IOWIN32_USING_WINRT_API)))
+#if defined(WINAPI_FAMILY_PARTITION) && (!(defined(IOWIN32_USING_WINRT_API))) && !defined(__MINGW32__)
 #if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_APP)
EOF

If you need binaries I could send them to you if you like.

AlexanderBurtasov · 2019-03-25T15:59:17Z

Hi, all.
(Excuse my poorly-bad English).
I'm trying to use this PR in my project and facing with problem: Date-format cells is recognised as "none", not "date".

Paxa · 2019-03-25T16:49:35Z

Do you have example of a file?

AlexanderBurtasov · 2019-03-26T07:10:09Z

mixed.xlsx

Paxa · 2019-03-26T18:04:56Z

Thank you @AlexanderBurtasov for founding a bug. I fixed it

AlexanderBurtasov · 2019-03-27T08:34:30Z

17596_reduced.xlsx
Hi again.
PR completely falls while loading this file. Falls while calling xlsxioread_open.

Paxa · 2019-03-27T09:25:31Z

@AlexanderBurtasov fixed

Paxa added 12 commits March 14, 2019 04:36

Implement xlsxioread_sheet_next_cell_struct with cell types

20e2c9d

(First commit message, yay!)

Store number & date formats inside reader handle

93085f5

Move structs to the top Add xlsxioread_debug_internals(xlsxioreader) Add xlsxio_read_struct in xlsxio_read.h (now its with alias and without)

Cleanup code

c8a731f

Better formatting for xlsxioread_debug_internals

d88307e

Try with travis

b1135f8

Try with travis

a5f0064

Remove files generated by cmake

b708ea6

Keep trying with travis

e32a702

Keep trying with travis

ed9b88e

Keep trying with travis

a7589dc

Keep trying with travis

49de0ef

Travis: Allow windows to fail

c8088d2

Update Makefile: make XLSXIOREAD_LDFLAGS and XLSXIOREADW_LDFLAGS conf…

426d61f

…igurable

Paxa added 3 commits March 16, 2019 21:35

Try to fix windows build

909fafc

Revert "Remove files generated by cmake"

68f19c5

This reverts commit b708ea6.

Try to fix windows build on travis

f99971a

Paxa added 2 commits March 16, 2019 22:02

Try to fix windows build on travis

5618ee0

Remove utarray and uthash, use arrays if structs

05f4b9b

Paxa added 2 commits March 17, 2019 01:14

Try to fix build on windows

574af78

Convert camelCase to snake_case

99162be

Fix detecting format

43402c5

Fix processing <numFmts>

74a030b

kynx mentioned this pull request Jun 7, 2021

Get type of column when reading row? viest/php-ext-xlswriter#369

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement xlsxioread_sheet_next_cell_struct with cell types #47

Implement xlsxioread_sheet_next_cell_struct with cell types #47

Paxa commented Mar 13, 2019 •

edited

Loading

Paxa commented Mar 15, 2019

brechtsanders commented Mar 16, 2019

Paxa commented Mar 16, 2019

Paxa commented Mar 16, 2019

Paxa commented Mar 16, 2019

brechtsanders commented Mar 16, 2019

Paxa commented Mar 16, 2019 •

edited

Loading

brechtsanders commented Mar 16, 2019

Paxa commented Mar 16, 2019

brechtsanders commented Mar 16, 2019

Paxa commented Mar 16, 2019

Paxa commented Mar 16, 2019

brechtsanders commented Mar 16, 2019

AlexanderBurtasov commented Mar 25, 2019

Paxa commented Mar 25, 2019

AlexanderBurtasov commented Mar 26, 2019

Paxa commented Mar 26, 2019

AlexanderBurtasov commented Mar 27, 2019 •

edited

Loading

Paxa commented Mar 27, 2019

Implement xlsxioread_sheet_next_cell_struct with cell types #47

Are you sure you want to change the base?

Implement xlsxioread_sheet_next_cell_struct with cell types #47

Conversation

Paxa commented Mar 13, 2019 • edited Loading

Paxa commented Mar 15, 2019

brechtsanders commented Mar 16, 2019

Paxa commented Mar 16, 2019

Paxa commented Mar 16, 2019

Paxa commented Mar 16, 2019

brechtsanders commented Mar 16, 2019

Paxa commented Mar 16, 2019 • edited Loading

brechtsanders commented Mar 16, 2019

Paxa commented Mar 16, 2019

brechtsanders commented Mar 16, 2019

Paxa commented Mar 16, 2019

Paxa commented Mar 16, 2019

brechtsanders commented Mar 16, 2019

AlexanderBurtasov commented Mar 25, 2019

Paxa commented Mar 25, 2019

AlexanderBurtasov commented Mar 26, 2019

Paxa commented Mar 26, 2019

AlexanderBurtasov commented Mar 27, 2019 • edited Loading

Paxa commented Mar 27, 2019

Paxa commented Mar 13, 2019 •

edited

Loading

Paxa commented Mar 16, 2019 •

edited

Loading

AlexanderBurtasov commented Mar 27, 2019 •

edited

Loading