I am trying to write a Thunderbird extension that lets you compose a message as usual but processes the message text before it is sent. To do that, I need access to the plain-text content of the email body.
Here is what I have so far, as test code in the Extension Developer JavaScript console:
var composer = document.getElementById('msgcomposeWindow');
var frame = composer.getElementsByAttribute('id', 'content-frame').item(0);
if(frame.editortype != 'textmail') {
    print('Sorry, you are not composing in plain text.');
    return;
}
var doc = frame.contentDocument.documentElement;
// XXX: This does not work because newlines are not in the string!
var text = doc.textContent;
print('Message content:');
print(text);
print('');
// Do a TreeWalker through the composition window DOM instead.
var body = doc.getElementsByTagName('body').item(0);
var acceptAllNodes = function(node) { return NodeFilter.FILTER_ACCEPT; };
var walker = document.createTreeWalker(body, NodeFilter.SHOW_TEXT | NodeFilter.SHOW_ELEMENT, { acceptNode: acceptAllNodes }, false);
var lines = [];
var justDidNewline = false;
while(walker.nextNode()) {
    if(walker.currentNode.nodeName == '#text') {
        lines.push(walker.currentNode.nodeValue);
        justDidNewline = false;
    }
    else if(walker.currentNode.nodeName == 'BR') {
        if(justDidNewline)
            // This indicates back-to-back newlines in the message text.
            lines.push('');
        justDidNewline = true;
    }
}
// Declare the loop variable so it does not leak into the global scope.
for(var a in lines) {
    print(a + ': ' + lines[a]);
}
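
For comparison, the same walk can be written as a small recursive function that does not need a TreeWalker at all. This is only a sketch under my own assumptions: `collectLines` is a made-up name, it treats every BR as the end of the current line, and it joins adjacent text nodes into one line (the loop above pushes each text node separately). It only relies on `nodeName`, `nodeValue` and `childNodes`, so it can be tried outside Thunderbird with plain objects.

```javascript
// Hypothetical helper: collect message lines from a DOM-like subtree.
// Works on anything exposing nodeName, nodeValue and childNodes.
function collectLines(root) {
    var lines = [];
    var current = null; // text seen since the last BR; null means none yet

    function visit(node) {
        if (node.nodeName === '#text') {
            // Append to the line currently being built.
            current = (current === null ? '' : current) + node.nodeValue;
        } else if (node.nodeName === 'BR') {
            // A BR ends the line; with no text since the last BR, the line is empty.
            lines.push(current === null ? '' : current);
            current = null;
        } else {
            // Any other element (SPAN, BODY, ...) is ignored, but its children are visited.
            var children = node.childNodes || [];
            for (var i = 0; i < children.length; i++) {
                visit(children[i]);
            }
        }
    }

    visit(root);
    if (current !== null) {
        lines.push(current); // trailing text with no BR after it
    }
    return lines;
}
```

In the console this would presumably be called as `collectLines(body)`, with `body` being the element looked up above.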
I would appreciate any feedback as to whether I'm on the right track. I also have some specific questions:
- Does doc.textContent really not have newlines? How stupid is that? I'm hoping it's just a bug with the JavaScript console, but I suspect not.
- Is the TreeWalker correct? I first tried NodeFilter.SHOW_TEXT, but it did not traverse into the <SPAN>s which contain the quoted material in a reply. Similarly, it seems funny to FILTER_ACCEPT every node and then manually cherry-pick it later, but I had the same problem where if I rejected a SPAN node, the walker would not step inside.
- Consecutive <BR>s break the naive implementation because there is no #text node in between them. So I manually detect them and push empty lines on my array. Is it really necessary to do that much manual work to access the message content?