In the spirit of SO, I have worked out what I think is the best answer, so I am posting it myself.
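from lxml import html

# Read and parse the HTML file
with open(r'c:\temp\testlxml.htm') as f:
    aTree = html.fromstring(f.read())

# All <b> elements (cssselect() needs the separate cssselect package)
bolds = aTree.cssselect('b')

# Bold elements whose text contains 'KEY', and their titles
theBoldItems = [item for item in bolds if item.text and 'KEY' in item.text]
theTitles = [item.text for item in theBoldItems]

# Every element in the tree, in document order
theFullList = list(aTree.iter())

# Positions of the first two KEY headings in that list
for numb, item in enumerate(theFullList):
    if item == theBoldItems[0]:
        first = numb
    if item == theBoldItems[1]:
        second = numb

# Text and tail of every element from the first heading up to
# (but not including) the second one
theText = []
for item in theFullList[first:second]:
    if item.text:
        theText.append(item.text)
    if item.tail:
        theText.append(item.tail)
aString = ' '.join(theText)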
A little bit of explanation.
My goal is to apply some logic to the bolded parts of the document, because the bold headings that contain the word KEY mark where its different sections begin. theTitles is a list of the text of the bold elements that include the word 'KEY'. Depending on my particular needs, I might want all of the text between any two items in theTitles, so I can write tests and whatever logic is necessary to select items from theTitles.
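A minimal sketch of that selection step, assuming you only need to pick a start heading and an end heading by some test on their titles. It uses the standard library's `xml.etree.ElementTree` on a small, made-up, well-formed snippet as a stand-in for `lxml.html` (which is what you want for real-world HTML), and `.iter('b')` in place of `cssselect('b')`; lxml elements offer the same `.iter()`, `.text` and `.tail` interface. The helper `pick_pair` is hypothetical, written only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the real document.
DOC = (
    "<div>"
    "<p><b>KEY Intro</b> opening text</p>"
    "<p><b>Not a key</b> more text</p>"
    "<p><b>KEY Terms</b> terms text</p>"
    "<p><b>KEY Appendix</b> closing text</p>"
    "</div>"
)

root = ET.fromstring(DOC)
# Text of the bold elements that mention KEY, in document order.
theTitles = [b.text for b in root.iter('b') if b.text and 'KEY' in b.text]


def pick_pair(titles, start_test, end_test):
    """Hypothetical helper: index of the first title passing start_test,
    then index of the first later title passing end_test.
    Raises StopIteration if either one is not found."""
    start = next(i for i, t in enumerate(titles) if start_test(t))
    end = next(i for i, t in enumerate(titles)
               if i > start and end_test(t))
    return start, end


# Everything from the "Terms" heading up to the next KEY heading.
pair = pick_pair(theTitles, lambda t: 'Terms' in t, lambda t: True)
print(theTitles, pair)
```

Keeping the tests as plain functions means the selection rules can change without touching the extraction code.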
theBoldItems is a list of the corresponding elements themselves, so for any i, theTitles[i] == theBoldItems[i].text.
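One way to keep that invariant true by construction is to filter the bold elements once and derive the titles from the filtered list, instead of running the same filter twice. A small sketch, again using `xml.etree.ElementTree` and a made-up snippet as a stand-in for the lxml tree:

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet; the empty <b></b> has text None.
DOC = "<div><b>KEY A</b> a <b>plain</b> b <b>KEY B</b> c <b></b></div>"
root = ET.fromstring(DOC)

# Filter once; the `item.text and` guard skips elements with no text.
theBoldItems = [item for item in root.iter('b')
                if item.text and 'KEY' in item.text]
# Derived from theBoldItems, so the two lists cannot drift apart.
theTitles = [item.text for item in theBoldItems]

for i, title in enumerate(theTitles):
    assert title == theBoldItems[i].text
print(theTitles)
```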
Next I build theFullList, which holds every element in the tree. Because lxml's iter() walks the tree in document order, I know that what I want to capture is every element between theBoldItems[i] and theBoldItems[i+1]. Conveniently, Python makes the test for that easy: comparing each element in theFullList with the two bold elements gives their positions, and a slice takes everything in between.
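The position lookup can be sketched on its own. Since `iter()` yields the element objects themselves, an identity test (`is`) is enough to find each heading in the flattened list; in lxml this holds because a node you still hold a reference to (here, through `theBoldItems`) comes back as the same Python object. As before, `xml.etree.ElementTree` and the snippet are stand-ins, and `position_of` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DOC = ("<div><p><b>KEY one</b> first</p>"
       "<p>middle <i>text</i></p>"
       "<p><b>KEY two</b> second</p></div>")
root = ET.fromstring(DOC)

theBoldItems = [b for b in root.iter('b') if b.text and 'KEY' in b.text]
# Pre-order walk: each element appears where its start tag appears.
theFullList = list(root.iter())


def position_of(elements, target):
    """Hypothetical helper: index of `target` itself (identity, not equality)."""
    return next(i for i, e in enumerate(elements) if e is target)


first = position_of(theFullList, theBoldItems[0])
second = position_of(theFullList, theBoldItems[1])
between = theFullList[first:second]
print(first, second, [e.tag for e in between])  # 2 6 ['b', 'p', 'i', 'p']
```

Note that the slice includes the `<p>` that contains the second heading, so that paragraph's leading text is captured too.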
I can now get the text of all of those elements, and although I still need to clean it up somewhat, I have successfully extracted all of the text between any two items I might want.
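One thing worth checking during that clean-up: collecting `.text` and `.tail` in `iter()` order can put text out of sequence, because an element's tail comes after all of its descendants in the document, while `iter()` visits the element before them. For `<p>A<i>B</i>C</p>D` the flat loop yields A, D, B, C. Below is a sketch of a different technique (not the method above): a recursive walk that emits each element's text, then its children, then its tail, and stops at the end heading. It uses `xml.etree.ElementTree` as a stand-in; `flat_text` and `text_between` are hypothetical helpers. The `isinstance(el.tag, str)` check skips the text of comments and processing instructions, which have a non-string tag.

```python
import xml.etree.ElementTree as ET

DOC = "<div><b>KEY 1</b><p>A<i>B</i>C</p>D<b>KEY 2</b> after</div>"
root = ET.fromstring(DOC)
start, end = [b for b in root.iter('b') if b.text and 'KEY' in b.text][:2]


def flat_text(root, start, end):
    """The iter()-order approach: text and tail of each element in the slice."""
    elems = list(root.iter())
    chunk = elems[elems.index(start):elems.index(end)]
    return [t for e in chunk for t in (e.text, e.tail) if t]


def text_between(root, start, end):
    """Hypothetical helper: text in document order from `start` (inclusive)
    up to `end` (exclusive)."""
    pieces = []
    collecting = False

    def walk(el):
        nonlocal collecting
        if el is end:
            return True  # reached the closing heading: stop the whole walk
        if el is start:
            collecting = True
        # Skip comment/PI text (non-string tag) but still keep their tail.
        if collecting and isinstance(el.tag, str) and el.text:
            pieces.append(el.text)
        for child in el:
            if walk(child):
                return True
        if collecting and el.tail:
            pieces.append(el.tail)
        return False

    walk(root)
    return pieces


print(flat_text(root, start, end))     # ['KEY 1', 'A', 'D', 'B', 'C']
print(text_between(root, start, end))  # ['KEY 1', 'A', 'B', 'C', 'D']
```

Joining the result with `' '.join(...)` and then collapsing runs of whitespace with `' '.join(s.split())` gives a tidy string.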