I'm investigating a mysterious bug in the usually very good open source project Excel Data Reader. It skips values when reading one particular OpenXML .xlsx spreadsheet of mine.
The problem occurs in the ReadSheetRow method (demonstration code below). The strange behaviour only appears with the source XML exactly as Excel saved it, which contains no whitespace between elements. The same XML reformatted with whitespace (e.g. in Visual Studio via Edit, Advanced, Format Document) works completely fine!
Test data with whitespace:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<sheetData>
<row r="5" spans="1:73" s="7" customFormat="1">
<c r="B5" s="12">
<v>39844</v>
</c>
<c r="C5" s="8"/>
<c r="D5" s="8"/>
<c r="E5" s="8"/>
<c r="F5" s="8"/>
<c r="G5" s="8"/>
<c r="H5" s="12">
<v>39872</v>
</c>
<c r="I5" s="8"/>
<c r="J5" s="8"/>
<c r="K5" s="8"/>
<c r="L5" s="8"/>
<c r="M5" s="8"/>
<c r="N5" s="12">
<v>39903</v>
</c>
</row>
</sheetData>
</worksheet>
Test data without whitespace:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"><sheetData><row r="5" spans="1:73" s="7" customFormat="1"><c r="B5" s="12"><v>39844</v></c><c r="C5" s="8"/><c r="D5" s="8"/><c r="E5" s="8"/><c r="F5" s="8"/><c r="G5" s="8"/><c r="H5" s="12"><v>39872</v></c><c r="I5" s="8"/><c r="J5" s="8"/><c r="K5" s="8"/><c r="L5" s="8"/><c r="M5" s="8"/><c r="N5" s="12"><v>39903</v></c></row></sheetData></worksheet>
Example code that demonstrates the problem:
Note that *A* is output after reader.Read(), *B* after ReadToDescendant, and *C* after ReadElementContentAsObject.
while (reader.Read())
{
    if (reader.NodeType != XmlNodeType.Whitespace)
        outStream.WriteLine(String.Format("*A* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));
    if (reader.NodeType == XmlNodeType.Element && reader.Name == "c")
    {
        string a_s = reader.GetAttribute("s");
        string a_t = reader.GetAttribute("t");
        string a_r = reader.GetAttribute("r");
        bool matchingDescendantFound = reader.ReadToDescendant("v");
        if (reader.NodeType != XmlNodeType.Whitespace)
            outStream.WriteLine(String.Format("*B* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));
        object o = reader.ReadElementContentAsObject();
        if (reader.NodeType != XmlNodeType.Whitespace)
            outStream.WriteLine(String.Format("*C* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));
    }
}
Test results for XML with whitespace:
*A* NodeType: XmlDeclaration, Name: 'xml', Empty: False, Value: 'version="1.0" encoding="UTF-8" standalone="yes"'
*A* NodeType: Element, Name: 'worksheet', Empty: False, Value: ''
*A* NodeType: Element, Name: 'sheetData', Empty: False, Value: ''
*A* NodeType: Element, Name: 'row', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: False, Value: ''
*B* NodeType: Element, Name: 'v', Empty: False, Value: ''
*A* NodeType: EndElement, Name: 'c', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: True, Value: ''
*B* NodeType: Element, Name: 'c', Empty: True, Value: ''
...
Test results for XML without whitespace:
*A* NodeType: XmlDeclaration, Name: 'xml', Empty: False, Value: 'version="1.0" encoding="UTF-8" standalone="yes"'
*A* NodeType: Element, Name: 'worksheet', Empty: False, Value: ''
*A* NodeType: Element, Name: 'sheetData', Empty: False, Value: ''
*A* NodeType: Element, Name: 'row', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: False, Value: ''
*B* NodeType: Element, Name: 'v', Empty: False, Value: ''
*C* NodeType: EndElement, Name: 'c', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: True, Value: ''
*B* NodeType: Element, Name: 'c', Empty: True, Value: ''
...
The difference between the two patterns (a *C* line on the EndElement without whitespace, an *A* line on it with whitespace) points to an issue in ReadElementContentAsObject, or possibly in where ReadToDescendant leaves the XmlReader.
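To narrow this down, here is a minimal sketch that prints where the reader sits after each call. It rests on assumptions: the XML string is a hypothetical cut-down of the compact test data (no namespaces, three cells), the class name ReaderPositionSketch is made up for illustration, and it relies on my reading of the documentation that ReadElementContentAsObject moves the reader past the element's end tag. If that reading is right, then in compact XML the node it lands on may itself be the next c element, which the Read() at the top of the loop would then step over.

```csharp
using System;
using System.IO;
using System.Xml;

class ReaderPositionSketch
{
    static void Main()
    {
        // Hypothetical cut-down input: same shape as the compact test data,
        // with no whitespace between elements and no namespaces.
        const string xml =
            "<row><c r=\"B5\"><v>39844</v></c><c r=\"C5\"/><c r=\"D5\"><v>39872</v></c></row>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element || reader.Name != "c")
                    continue;

                string cell = reader.GetAttribute("r");

                // Same two calls as in the demonstration code above.
                bool found = reader.ReadToDescendant("v");
                object value = reader.ReadElementContentAsObject();

                // Report which node the reader is on before the loop calls Read() again.
                // GetAttribute returns null (printed as empty) when the node has no "r".
                Console.WriteLine(
                    "cell {0}: found v = {1}, value = '{2}', reader now on {3} '{4}' (r = {5})",
                    cell, found, value, reader.NodeType, reader.Name, reader.GetAttribute("r"));
            }
        }
    }
}
```

Comparing this output for the compact string against an indented copy of the same XML should show whether a c element is ever the current node at the moment the loop calls Read() again. Checking the found flag before calling ReadElementContentAsObject would be one thing to try, though I have not confirmed that this is what the library intends.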
Does anyone know what might be happening here?