In the past, the business users had transcribed the excel data that they had one row at a time into the database through a web or Swing UI. We needed something faster this time around. We needed a way to import the excel spreadsheet directly.
This idea had appeal on an intuitive level. The business users could work in the structure they were most experienced at, and we could get fresh data almost as soon as they'd revised it. We knew that there were some data validation issues where values had to be restricted to a list, but we thought that Excel's validation features would cope with that.
I looked into reading Excel in its XML dialect, but quickly discarded the idea after eyeballing the XML Excel generated itself. Instead, I looked for a library which would do the work of conversion for me, and present an easy to use API.
There was only one freely-available Java library that handled Excel spreadsheets, and that was POI. POI is a library written specifically to handle Microsoft Office formats, and their solution for Excel was called HSSF (Horrible SpreadSheet Format).
On the database side of things, we had an advantage in that we were using ATG's persistence solution, the SQL repository (or Data Anywhere Architecture, as marketing calls it). Because the SQL repository can be fed data imported through XML, we could convert the Excel spreadsheet to a file in XML format, and then run startSQLRepository to import data.
The documentation for HSSF was eclectic, but servicable. Once I'd found the document to read an Excel document, I could iterate through the rows, and call row.getCellAt(0) to get the first cell in the row.
Once I'd done this, I copied the data from the row to a JavaBean object, passed a list of those object to the export class, and generated XML from them.
There were a number of issues that popped up in implementing this solution. The first one was that POI is abandonware. The last update was from 2004. Using Excel XP with some features (validation of specific cells, for example) cause NullPointerExceptions when it tries to parse it. The solution here was to not use those features.
Another issue was that the Excel data was linked between different workbooks, but was not linked tightly together. Because all the fields were text, any text that was not specifically validated had to match between workbooks exactly. Identifying data mismatches between workbooks was a time consuming process, and I didn't think of a good way to automate it.
Finally, as new data was added, the columns would be moved around or added to as the business users tweaked it. Sometimes I would be notified of this change, and sometimes I would find out by having the script fail.
Importing data directly from an Excel spreadsheet is a good solution in many circumstances. However, it's probably a good idea to have a 'business' that can be quickly modified, and a 'data' spreadsheet with immutable columns specifically for data export. Given a static schema, it's easy to use and extend, and is much easier to modify than tweaking XML directly.
http://jexcelapi.sourceforge.net/
One is that we need source code to be under version control, and having the VBA code inside the spreadsheet would make it inaccessible.
Second, VBA source code is not well understood by everyone at the client site. It would involve giving up Eclipse, the build system, a bunch of metrics, etc.
Third, we'd need to copy over the VBA code for every spreadsheet.
Fourth, it would bulk up the size of the spreadsheet and make it more difficult to email. Spreadsheets can get hefty.