Microsoft Excel Workbooks (xls and xlsx) are file formats in which one or more WorkSheets are stored. There are a number of difficulties in reading data from Excel spreadsheets and populating relational database tables. These include:
- Cells in worksheets are formatted according to the Microsoft Formatting Language
- Cells do not have to appear as a contiguous block.
- Rows / Columns can be hidden.
- Cells can contain formulas
Excel files are read using the NPOI library. Support for number formats is added by the ExcelNumberFormat library.
Data read from the file is converted to a string (textual) representation and then treated exactly like other untyped inputs (See Type Determination)
The following behaviours should be expected when reading from the above example Excel Spreadsheet.
Rule | Behaviour | Example Range |
---|---|---|
Column Headers | Headers are determined by the cells in the first row which has any data on it. | B2-G2 |
Blank Rows | If a row has no data in it for any header it will be ignored. | Rows 7 and 9 |
Blank Columns | If a column has no header it is skipped and not added to the data table. | Column A |
Hanging Values | If a value does not appear under a valid header then the cell is ignored and a warning will be issued. | H9 |
In Excel all Date/Time fields are stored only as a number. You can see this number yourself by changing the Cell Formatting to "General". This means that loading dates/times correctly involves parsing the Cell Formatting. This is done by applying the following rules:
Rule | Behaviour | Example Range |
---|---|---|
Date And Time Format | If the Cell Formatting includes both a date component (y) and a time component (h) it will be translated in an unambiguous format e.g. "2001-01-01 10:30:00" | E4 |
Date Only Format | If the Cell Formatting includes a date component (y) but no time component (h) it will be translated into a date only format e.g. "2015-09-18" | E8 |
Time Only Format | If the Cell Formatting includes a time component (h) but no date component (y) it will be translated into a time only format e.g. "15:09:00" | F8 |
Tests confirming the above behaviour are included in the following tests: