Maybe this is just impossible and I should give up all hope. Or maybe there's a really clever way to do it that I haven't thought of.
Here are two examples of what I've got:
يَبِسَ - يَيْبَسُ (yabisa, yaybasu)[y-b-s][ي-ب-س] (To become dry, stiff, rigid) 20:77 yabasan = dry. يَسَّرَ - يُيَسِّرُ (yassara, yuyassiru)[y-s-r][ي-س-ر] (To facilitate, make it easy) 92:7 nuyassiruhuu = We will ease him.
and
Zu Hülfe! zu Hülfe! Help! Help!
Sonst bin ich verloren! Otherwise I am lost! Zu Hülfe! Zu Hülfe! Help! Help! Sonst bin ich verloren! Otherwise I am lost! Der listigen Schlange zum Opfer erkoren, Selected as offering to the cunning snake, Barmherzigige Götter! Merciful Gods! Schon nahet sie sich, Already it gets closer, Schon nahet sie sich, Already it gets closer,
In both cases two languages are interleaved, and it would be really annoying to go through by hand and delete one of them before processing the text any further.
One way I was thinking this could be done in NLTK was to split the text into tokens, work out which language each token belongs to based on a small corpus, and then ask NLTK to 'reconstitute' only the tokens in the language of my choosing. Is this just a wild fantasy?
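To make the idea concrete, here is a minimal sketch in plain Python (standard library only, not NLTK). It rests on several assumptions: tokens are split on whitespace, Arabic is detected by Unicode character names, and the "small corpus" is a hand-written word set. The names `ENGLISH_WORDS`, `has_arabic`, `drop_arabic`, and `keep_known` are all made up for illustration.

```python
import string
import unicodedata

# Hypothetical stand-in for a "small corpus": a few known English words.
ENGLISH_WORDS = {"help", "otherwise", "i", "am", "lost"}


def has_arabic(token):
    """True if any character's Unicode name starts with 'ARABIC'
    (this covers both letters and diacritics such as fatha or shadda)."""
    return any(unicodedata.name(ch, "").startswith("ARABIC") for ch in token)


def drop_arabic(text):
    """Keep only whitespace-separated tokens with no Arabic characters."""
    return " ".join(tok for tok in text.split() if not has_arabic(tok))


def keep_known(text, vocab):
    """Keep only tokens whose lowercased, punctuation-stripped form is in vocab."""
    kept = []
    for tok in text.split():
        if tok.strip(string.punctuation).lower() in vocab:
            kept.append(tok)
    return " ".join(kept)


print(drop_arabic("\u064a\u064e\u0628\u0650\u0633\u064e (yabisa) dry"))
print(keep_known("Sonst bin ich verloren! Otherwise I am lost!", ENGLISH_WORDS))
```

The script check is the easy case, since Arabic and Latin characters never overlap. The word-list check is much shakier: German and English share spellings ("die", "in", "war"), so a real version would probably need to look at neighbouring tokens or whole lines rather than single words.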