I'm trying to find a workaround for displaying old and rare characters in Unicode by combining characters. I'm currently converting some dictionaries from EPWING into text, and there are 36 characters that cannot be reproduced with ordinary Unicode (UTF-8) characters. Below is the problematic section of the EPWING gaiji-to-Unicode mappings for one of the dictionaries I'm converting. In places it uses an interesting syntax that is clearly meant to combine characters in different ways. I'm hoping someone can identify what this syntax is and point me to documentation or a tutorial on how to use it.

s/<?w=b02a>//g
s/<?w=b04b>/者/g
s/<?w=b064>/<⾱ >/g
s/<?w=b077>/<彳<匕\/匕>>/g
s/<?w=b07c>/<山\/⺀>/g
s/<?w=b12e>//g
s/<?w=b155>/</>/g
s/<?w=b156>/<\/>/g
s/<?w=b157>/<\/\/>/g
s/<?w=b158>/<こ[1]/と|ヿ>/g
s/<?w=b16f>/<㗢>/g
s/<?w=b170>/<㗥>/g
s/<?w=b171>/ଏ/g
s/<?w=b175>/lb/g
s/<?w=b22a>//g
s/<?w=b234>/ff/g
s/<?w=b25e>/㯌/g
s/<?w=b271>/<扌 晉>/g
s/<?w=b36b>//g
s/<?w=b373>//g
s/<?w=b42c>//g
s/<?w=b434>/<已\/大>/g
s/<?w=b438>//g
s/<?w=b43a>//g
s/<?w=b43f>/<㇀/丶>/g
s/<?w=b440>//g
s/<?w=b45a>/<?>/g
s/<?w=b45b>/<|>/g
s/<?w=b53d>/<?>/g
s/<?w=b53e>/<?>/g
s/<?w=b540>/<o>/g
s/<?w=b537>/<ト モ>/g
s/<?w=b541>/<一/>/g
s/<?w=b544>/<?>/g
s/<?w=b546>/<[r45]卐>/g
s/<?w=b55f>/*/g
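To check what the list above actually produces, it can help to load it into a table instead of running it through sed. The sketch below is a minimal, hypothetical helper (`parse_sed_mappings` and `apply_mappings` are my own names, not part of any EPWING tool). It assumes every rule has the shape `s/<?w=XXXX>/REPLACEMENT/g` and deliberately treats everything between the tag and the trailing `/g` as the replacement. That matters because lines such as `s/<?w=b155>/</>/g` contain an unescaped `/` in the replacement, which real sed would probably reject as a malformed `s` command.

```python
import re

# Matches one line of the mapping file, e.g. s/<?w=b04b>/者/g.
# The replacement is everything between the gaiji tag and the trailing /g,
# so unescaped slashes such as in </> or <一/> are kept rather than rejected.
RULE = re.compile(r"s/(<\?w=[0-9A-Fa-f]{4}>)/(.*)/g")
TAG = re.compile(r"<\?w=[0-9A-Fa-f]{4}>")


def parse_sed_mappings(lines):
    """Build a {gaiji_tag: replacement} table from s/<?w=XXXX>/REPL/g lines."""
    table = {}
    for line in lines:
        m = RULE.fullmatch(line.strip())
        if m is None:
            continue  # skip blank or unrelated lines
        tag, replacement = m.groups()
        table[tag] = replacement.replace("\\/", "/")  # undo sed's \/ escape
    return table


def apply_mappings(text, table):
    """Replace every known gaiji tag in text; unknown tags are left untouched."""
    return TAG.sub(lambda m: table.get(m.group(0), m.group(0)), text)


rules = [
    "s/<?w=b04b>/者/g",
    r"s/<?w=b077>/<彳<匕\/匕>>/g",
    "s/<?w=b155>/</>/g",
]
table = parse_sed_mappings(rules)
print(apply_mappings("A<?w=b04b>B<?w=b077>C<?w=ffff>", table))
# prints A者B<彳<匕/匕>>C<?w=ffff>
```

Rules with an empty replacement (such as `s/<?w=b02a>//g`) map to an empty string, so, as with sed, the tag simply disappears from the output.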

I know that this line is supposed to represent a character with 彳 as the left-hand component and two 匕 stacked one above the other as the right-hand component:

s/<?w=b077>/<彳<匕\/匕>>/g
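The nesting in that example looks structurally similar to Unicode's Ideographic Description Sequences (IDS), which describe a character using operators such as U+2FF0 ⿰ (left to right) and U+2FF1 ⿱ (above to below). The sketch below is a guess at the grammar, not a documented spec: it assumes that characters placed side by side (with or without a space) sit left to right, that `/` stacks top to bottom and binds more loosely, and that `<...>` groups can nest. `to_ids` is a hypothetical name. It does not handle modifiers such as `[r45]` or `|`. Note that an IDS is only a textual description; most fonts will show the individual components rather than compose a new glyph.

```python
# Assumed reading of the notation (not taken from any specification):
#   <A B> or <AB> : A to the left of B -> IDS operator U+2FF0 (⿰)
#   <A/B>         : A above B          -> IDS operator U+2FF1 (⿱)
# "/" binds more loosely than side-by-side placement, and groups may nest.
LEFT_TO_RIGHT = "\u2ff0"
ABOVE_TO_BELOW = "\u2ff1"


def _nest(op, parts):
    """Fold parts into binary IDS from the right: [a, b, c] -> op a op b c."""
    result = parts[-1]
    for part in reversed(parts[:-1]):
        result = op + part + result
    return result


def to_ids(notation):
    """Convert e.g. <彳<匕/匕>> into the IDS string ⿰彳⿱匕匕."""
    text = notation.replace("\\/", "/")  # also accept the sed-escaped form
    pos = 0

    def parse_group():
        nonlocal pos
        if not text.startswith("<", pos):
            raise ValueError(f"expected '<' at position {pos}")
        pos += 1
        rows = [parse_row()]
        while text.startswith("/", pos):
            pos += 1
            rows.append(parse_row())
        if not text.startswith(">", pos):
            raise ValueError(f"expected '>' at position {pos}")
        pos += 1
        return _nest(ABOVE_TO_BELOW, rows)

    def parse_row():
        nonlocal pos
        items = []
        while pos < len(text) and text[pos] not in "/>":
            if text[pos] == " ":
                pos += 1  # a space only separates side-by-side parts
            elif text[pos] == "<":
                items.append(parse_group())
            else:
                items.append(text[pos])
                pos += 1
        if not items:
            raise ValueError(f"empty component at position {pos}")
        return _nest(LEFT_TO_RIGHT, items)

    result = parse_group()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return result


print(to_ids(r"<彳<匕\/匕>>"))  # ⿰彳⿱匕匕
print(to_ids("<扌 晉>"))        # ⿰扌晉
```

Under the same assumption, `<山\/⺀>` becomes ⿱山⺀ and `<已\/大>` becomes ⿱已大, while entries such as `</>` raise an error because they do not describe a composition at all.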

This one is also fairly clear; it is 卐 rotated by 45 degrees:

s/<?w=b546>/<[r45]卐>/g
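Plain text has no portable way to show a single character rotated by 45 degrees, so one workaround is to recognise the modifier and hand the glyph to a renderer. The sketch below assumes `[rN]` always means "rotate the following character by N degrees" (inferred from this one example) and emits a small inline SVG using the standard SVG `rotate(angle cx cy)` transform. `parse_rotation` and `rotated_svg` are hypothetical helper names, and the size and font values are arbitrary.

```python
import re
from html import escape

# Assumption: "[rN]" in front of a character means "draw it rotated by N degrees".
ROTATED = re.compile(r"<\[r(-?\d+)\](.)>")


def parse_rotation(notation):
    """Return (angle, character) for notation like <[r45]卐>, else None."""
    m = ROTATED.fullmatch(notation)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


def rotated_svg(char, angle, size=32):
    """Inline SVG that draws one character rotated about the centre of its box."""
    c = size // 2
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        f'<text x="{c}" y="{c}" font-size="{size * 3 // 4}" text-anchor="middle" '
        f'dominant-baseline="central" transform="rotate({angle} {c} {c})">'
        f"{escape(char)}</text></svg>"
    )


angle, char = parse_rotation("<[r45]卐>")
print(rotated_svg(char, angle))
```

This only helps where the output format can embed SVG or HTML; for a pure text export, the mapping would still need a fallback such as the unrotated 卐.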

Note: the four-character hexadecimal code that follows ?w= identifies the EPWING gaiji that the Unicode replacement is meant to stand for.
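Because every gaiji reference has the same `<?w=XXXX>` shape, the converted text can be scanned afterwards to see which of the codes are still unresolved and how often each one occurs. This is a minimal sketch; `unresolved_gaiji` is a hypothetical name, and it assumes the code is always exactly four hex digits, as in the list above.

```python
import re
from collections import Counter

# <?w=XXXX>, where XXXX is the four-digit hex code of the EPWING gaiji
GAIJI_TAG = re.compile(r"<\?w=([0-9A-Fa-f]{4})>")


def unresolved_gaiji(text):
    """Count the gaiji tags still present in converted text, keyed by lowercase hex code."""
    return Counter(code.lower() for code in GAIJI_TAG.findall(text))


print(unresolved_gaiji("a<?w=b077>b<?w=B077>c<?w=b04b>").most_common())
# [('b077', 2), ('b04b', 1)]
```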

Thank you for your time.