- Fixed bug where `multi_extract` did not return `{:error, msg}` for a nonexistent file. Thanks to Peter Sumskas (@brushbox) for contribution.
- Fixed bug where `Xlsxir.get_list/1` was not populating empty cells with `nil` properly.
- Pattern matching on error cells was widened to include additional use cases. Thanks to Peter Sumskas (@brushbox) for contribution.
- Updated tests, docs and various code styles.
- Fixed bug where `Xlsxir.get_cell/2` raised instead of returning `nil` on non-existing cell. Thanks to @ZombieHarvester for contribution.
- Various documentation updates.
- Huge parsing performance improvement thanks to Alex Kovalevych's (@AlexKovalevych) contribution.
- Added the ability to choose between parsing in memory or on the file system, as well as the ability to stream via `Xlsxir.stream_list/2`. Thanks to Thibaut Decaudain (@Tricote) for contribution.
- Code improved to better handle complex multi-formatted strings. Thanks to Peter Sumskas (@brushbox) for contribution.
- Bug fix to handle additional date format. Thanks to Sudhir Rao (@sudrao) for contribution.
- Fixes for some `xlsx` variants and repeatable stream issues. Thanks to @rhetzler for contribution.
- Various error message improvements. Thanks to Craig Lyons (@craiglyons) for contribution.
- Fixed bug that occurred when a worksheet was empty. Thanks to Alex Kovalevych (@AlexKovalevych) for contribution.
- Changed `get_cell/1` to return `nil` if the requested cell doesn't exist. Thanks to Peter Sumskas (@brushbox) for contribution.
- Removed `Timex` dependency. Thanks to Paulo Almeida (@pma) for contribution.
- Xlsxir requires Elixir v1.4+ with this update
- Added ability to extract only a given number of rows from a worksheet via `Xlsxir.peek/3`. Thanks to Ali Tahbaz (@tahbaza) for contribution.
- `DateTime` type values are now converted to an Elixir `NaiveDateTime` type upon extraction. Regular `Date` types are still converted to Erlang `:calendar.date()` type. Thanks to Ali Tahbaz (@tahbaza) for contribution.
- A bug in `convert_char_number/1` was fixed to allow support for floats with scientific notation in them. Thanks to Daniel Parnell (@dparnell) for contribution.
- Minor bug fixes and documentation updates.
- Added parsing support for time values. Thanks to Edgar Cabrera (@aleandros) for contribution.
- Fixed bug that prevented worksheet ETS tables from closing. Thanks to Alex Kovalevych (@AlexKovalevych) for contribution.
- Minor documentation updates.
- `Xlsxir.extract/3` and `Xlsxir.multi_extract/3` now parse all worksheets of the file given by default, returning a list of tuple results (i.e. `[{:ok, table_1_id}, {:ok, table_2_id}, ...]`). See updated docs for more detail. Thanks to Alex Kovalevych (@AlexKovalevych) for contribution.
- Fixed bug where the string(s) from merged cells that contained multiple formatting leaked into other cells thereby corrupting other rows of data.
- Sorted cell attribute keys to ensure consistent pattern matching. Thanks to Alex Kovalevych (@AlexKovalevych) for contribution.
- Updated documentation to reflect changes and added additional doc tests.
- Added boolean value support. Thanks to Pikender Sharma (@pikender) for contribution.
- Added support for data type `inlineStr`.
- `Xlsxir.extract/3` and `Xlsxir.multi_extract/3` now return `{:error, reason}` instead of throwing an exception when an invalid file type or worksheet index is provided as an argument.
- Changed the way file paths are validated prior to parsing. It no longer matters whether or not the extension is `.xlsx`. As long as it is a valid file, Xlsxir will attempt to parse it.
- Refactored `Unzip.delete_dir/1` for simplification.
- Minor documentation updates.
- Fixed bug where unnecessary cells with `nil` values were added to worksheets with rows containing data beyond column `"Z"`.
- Fixed bug related to parsing a worksheet containing conditional formatting. Thanks to Justin Nauman (@jrnt30) for contribution.
- Fixed bug where row number was erroneously represented as a string (instead of integer) in the ETS table causing `Enum.sort` to not work as expected on larger files.
- Minor documentation updates.
- Minor bug fixes.
- Fixed bug where dates in the year 1900 were off by one day due to the fact that Excel erroneously considers the year 1900 a leap year.
- Minor documentation updates.
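
The 1900 leap-year quirk mentioned above can be illustrated with a minimal sketch. It assumes Excel's 1900 date system, where serial 1 is 1900-01-01 and serial 60 stands for the nonexistent 1900-02-29. The module and function names are hypothetical and are not part of Xlsxir's API; the sketch also returns Elixir `Date` structs rather than the Erlang date tuples Xlsxir produces.

```elixir
defmodule SerialDateSketch do
  # Hypothetical helper for illustration only; not Xlsxir's implementation.

  # Serials 1..59 fall before the phantom leap day, so they are counted
  # from 1899-12-31 (serial 1 -> 1900-01-01).
  def to_date(serial) when is_integer(serial) and serial >= 1 and serial <= 59 do
    {:ok, Date.add(~D[1899-12-31], serial)}
  end

  # Serial 60 is Excel's 1900-02-29, which never existed.
  def to_date(60), do: {:error, :nonexistent_date}

  # Later serials are shifted back one day to skip the phantom leap day,
  # so they are counted from 1899-12-30 (serial 61 -> 1900-03-01).
  def to_date(serial) when is_integer(serial) and serial > 60 do
    {:ok, Date.add(~D[1899-12-30], serial)}
  end
end
```

Using a single epoch for every serial is what shifts dates on one side of the phantom leap day by one day; splitting on serial 60 avoids that.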
- Fixed issue where empty cells were skipped. Empty cells will now be represented as `nil`. For example, if cells "A1" = 1, "B1" = 2, and "D1" = 4, `Xlsxir.get_list/1` would previously return `[[1, 2, 4]]`. The same situation will now return `[[1, 2, nil, 4]]` to account for the fact that cell "C1" was empty.
- Minor updates to documentation to reflect change.
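
The gap-filling behavior described above can be sketched as follows. This is an illustration only, assuming a row arrives as a list of `{cell_reference, value}` tuples; `RowFillSketch`, `col_index/1`, and `fill_row/1` are hypothetical names and do not reflect how Xlsxir implements it internally.

```elixir
defmodule RowFillSketch do
  # Hypothetical helper for illustration only; not part of Xlsxir's API.

  # Converts column letters to a 1-based index ("A" -> 1, "Z" -> 26, "AA" -> 27).
  def col_index(letters) do
    letters
    |> String.to_charlist()
    |> Enum.reduce(0, fn char, acc -> acc * 26 + (char - ?A + 1) end)
  end

  # Takes the populated cells of one row and returns the row's values,
  # with nil standing in for every column that had no cell.
  def fill_row(cells) do
    by_index =
      Map.new(cells, fn {ref, value} ->
        # Keep only the leading column letters of a reference such as "D1".
        [letters] = Regex.run(~r/^[A-Z]+/, ref)
        {col_index(letters), value}
      end)

    case Map.keys(by_index) do
      [] -> []
      keys -> Enum.map(1..Enum.max(keys), &Map.get(by_index, &1))
    end
  end
end
```

With the cells from the example, `RowFillSketch.fill_row([{"A1", 1}, {"B1", 2}, {"D1", 4}])` yields a four-element row with `nil` in the third position.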
- Added ability to parse multiple worksheets via `Xlsxir.multi_extract/3` which returns a unique table identifier for each ETS process created, enabling the user to access parsed data from multiple worksheets simultaneously.
- Created an `Xlsxir.TableId` module which controls an agent process that temporarily holds a table identifier during the extraction process.
- Refactored `Xlsxir` access functions to work with `Xlsxir.multi_extract/3` whereby a table identifier is passed through the various functions to specify which ETS process is to be accessed.
- Refactored `Xlsxir.SaxParser`, `Xlsxir.ParseWorksheet` and `Xlsxir.Worksheet` modules to support new functionality.
- Refactored `Xlsxir.ParseWorksheet` to ignore empty cells.
- Updated documentation and tests
- Fixed a few minor bugs that were generating warning messages.
- Removed Ex-Doc and Earmark dependencies from Hex.
- Added Change Log link to Hex.
- Minor doc changes and bug fixes.
- Added `Xlsxir` access function `Xlsxir.get_mda/0` that accesses `:worksheet` ETS table and returns an indexed map which functions like a multi-dimensional array in other languages.
- Modified the way rows are saved to the `:worksheet` ETS table. Replaced the generic index with the actual row number to allow for performance improvement of supporting `Xlsxir` access functions.
- Refactored `Xlsxir` access functions to improve performance.
- Created `Xlsxir.get_info/1` function which returns number of rows, columns and cells.
- Various minor modifications to docs.
Major changes in version 1.0.0 (non-backwards compatible) to improve performance and incorporate new functionality, including:
- Refactored the `Xlsxir.Unzip` module to extract `.xlsx` contents to file instead of memory to improve memory usage. The following functions were created to support this functionality:
  - `Xlsxir.Unzip.extract_xml_to_file/2` - Extracts necessary files to a `./temp` directory for use during the parsing process
  - `Xlsxir.Unzip.delete_dir/1` - Deletes `./temp` directory and all of its contents
- Implemented Simple API for XML (SAX) parsing functionality via the Erlsom Erlang library to improve performance and allow support for large `.xlsx` files. The `SweetXml` parsing library has been deprecated from `Xlsxir` and is no longer utilized in v1.0.0.
- Implemented Erlang Term Storage (ETS) for temporary storage of extracted data.
- Replaced `option` argument from the initial extract function (`Xlsxir.extract/3`) with `timer` which is a boolean flag that controls `Xlsxir.Timer` functionality. Data is no longer returned via `Xlsxir.extract/3` and is instead stored in an ETS process.
- Implemented various functions for accessing the extracted data:
  - `Xlsxir.get_list/0` - Return entire worksheet data in the form of a list of row lists
  - `Xlsxir.get_map/0` - Return entire worksheet data in the form of a map of cell names and values
  - `Xlsxir.get_cell/1` - Return value of specified cell
  - `Xlsxir.get_row/1` - Return values of specified row
  - `Xlsxir.get_col/1` - Return values of specified column
- Implemented `Xlsxir.close/0` function to allow the deletion of the ETS process containing extracted worksheet data to free up memory.
- Implemented `Xlsxir.Timer` module for tracking elapsed time of extraction process.
- Changed cell references from `atoms` to `strings` due to Elixir `atom` limitations (i.e. `:A1` to `"A1"`).
- Updated documentation and testing to incorporate changes.
- Minor bug fixes and documentation updates.
- Expanded coverage of Office Open XML standard `numFmt` (Standard Number Format). The `formatCode` for a standard `numFmt` is implied rather than explicitly identified in the XML file.
- Implemented support for Office Open XML custom `numFmt` (Custom Number Format) utilizing the `formatCode` explicitly identified in the XML file.
- Added `Number Styles` documentation covering standard and custom `numFmt` and how to manually add an unsupported `numFmt`.
- Fixed issue resulting when no strings exist in a worksheet and therefore there is no `sharedStrings.xml` file (`:file_not_found` error).
- Fixed issue related to strings that contain special characters. Refactored `Xlsxir.Parse.shared_strings/1` to properly parse strings with special characters.
- Refactored `Xlsxir.Parse` functions to improve `extract` performance on larger files.
- Expanded documentation and test coverage.
- Completed `Xlsxir.ConvertDate` module.
- Initial draft. Functionality limited to very small Excel worksheets. `Xlsxir.ConvertDate` functionality incomplete.