I am attempting to parse a Wiktionary entry to retrieve all English definitions. I am able to retrieve all definitions; the problem is that some of them are in other languages. What I would like to do is retrieve only the HTML block containing the English definitions. I have found that, when the entry has sections for other languages, the header after the English definitions can be retrieved with:
header = (doc/"h2")[3]
So I would like to search only the elements that come before this header element. I thought that might be possible with header.preceding_siblings(), but that does not seem to be working. Any suggestions?
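
To illustrate the result I am after, here is a minimal sketch. It uses REXML from the Ruby standard distribution rather than Hpricot, and the markup is a made-up, heavily simplified stand-in for a Wiktionary page (the real page structure is an assumption here). The helper name section_elements is hypothetical. It walks forward from the English header and stops at the next h2, which gives the same elements as looking backwards from the following header:

```ruby
require "rexml/document"

# Hypothetical, simplified stand-in for a Wiktionary entry: each language
# section starts with an h2 and is followed by its definitions.
sample_html = <<~HTML
  <div id="content">
    <h2>English</h2>
    <ol><li>A domesticated feline.</li></ol>
    <h2>French</h2>
    <ol><li>Not an English definition.</li></ol>
  </div>
HTML

# Collect the sibling elements that follow the h2 whose text is `language`,
# stopping at the next h2 (the start of the next language section).
def section_elements(doc, language)
  header = REXML::XPath.match(doc, "//h2").find { |h| h.text.to_s.strip == language }
  return [] unless header

  elements = []
  node = header.next_element
  while node && node.name != "h2"
    elements << node
    node = node.next_element
  end
  elements
end

doc = REXML::Document.new(sample_html)
english = section_elements(doc, "English")

# Pull the text of every list item inside the English section only.
definitions = english.flat_map { |el| REXML::XPath.match(el, ".//li").map(&:text) }
puts definitions
```

REXML needs well-formed XML, so real Wiktionary HTML would probably still need an HTML parser; the sketch is only meant to show the "everything between two headers" selection.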