Our publishing workflow includes Windows and Linux machines (there are some Macs too, but not in the critical-path workflow). Many texts include both English and Khmer and are marked up in XML.

XML Copy Editor is the best cross-platform open-source XML editor I've discovered. It utilizes the Scintilla editing component, which is generally good with Unicode but which does not enable non-printing or invisible characters like U+200B (zero-width space) and U+200C (zero-width non-joiner) to be displayed. Khmer does not separate words with a space character as Western languages do, so ZWSP is used in electronic texts to enable applications to break lines easily.
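
Where an editor cannot show these characters, a small script can at least report where they are. A minimal Python sketch, assuming the text is already decoded to a `str`; the helper name `find_invisibles` is hypothetical, and flagging every Unicode format character (category `Cf`, which includes U+200B and U+200C) is an illustrative choice that will also catch others such as U+FEFF:

```python
import unicodedata


def find_invisibles(text):
    """Return (offset, code point, name) for every format (Cf) character in text."""
    hits = []
    for offset, ch in enumerate(text):
        # U+200B ZERO WIDTH SPACE and U+200C ZERO WIDTH NON-JOINER are both category Cf
        if unicodedata.category(ch) == "Cf":
            hits.append((offset, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


if __name__ == "__main__":
    # Khmer letters KA, KHA, KO separated by a ZWSP and a ZWNJ
    sample = "\u1780\u200b\u1781\u200c\u1782"
    for hit in find_invisibles(sample):
        print(hit)
```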

Ideally I'd edit the markup and the content in a single editor, but XML awareness is at times less important than being able to display invisibles. (OpenOffice.org Writer and Microsoft Word are the only two apps I know of that will display ZWSP. Unfortunately, they are not suitable for the markup and text manipulation needed to prepare manuscripts for publication, although I guess they're fine for authoring.)

I tried out a promising editor last week, but a regex search-and-replace operation that took under a second in TextPad 4.7.3 took over twenty seconds there. So speed and the ability to handle large files (up to 150 MB) are also concerns.

Is there a good, fast, free or not-too-expensive text editor, with versions for Windows and Linux (and maybe Mac too), that is Unicode-aware and can display invisibles like ZWSP? It should also have syntax highlighting, handle large files, and be customizable enough that I won't tear my hair out in frustration.

Thanks, Roger_S

A: 

I don't know about ZWSP in particular, but EditPad Pro is good, fast, not expensive, has a very good regex engine and is Unicode-aware (and well suited to editing XML, too). The developer (Jan Goyvaerts) lives in Thailand and knows about the requirements of Eastern scripts and languages, so chances are good that it will be able to handle these texts.

Tim Pietzcker
I'll look more closely at this. Thai entered the computer era way ahead of Khmer, and although there are no spaces between Thai words either, its pre-Unicode incubation meant that electronic texts received no markers of any sort between words. (Instead, each application used its own line-breaking algorithm.) Khmer came late to the party, which meant everybody wanted their text to display properly on the web, so there was little resistance to this additional authoring/production requirement. I hope it's not just aware of ZWSP but is also able to display it. Thanks!
Roger_S
I've tried to insert 0x200B into a "normal" XML document, and I didn't see anything, but maybe you need a special font for this. If it doesn't work, ask @Jan Goyvaerts (support@editpadpro) yourself. I'm sure he'll have that implemented soon.
Tim Pietzcker
@Tim: ZWSP is a zero-width space. It is supposed to be invisible and take up no space. It is used to delimit words in (some) languages that don't write spaces between words. Making this visible would require EditPad to substitute another character when displaying text.
Jan Goyvaerts
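
Jan's point can be illustrated outside any editor: to make a zero-width character visible, the display layer has to swap in a visible stand-in. A minimal Python sketch of that substitution for a read-only view of the text; the marker strings in `MARKERS` and the helper name `make_visible` are arbitrary assumptions, not anything EditPad Pro or Scintilla actually uses:

```python
# Visible stand-ins for display only; the stored text itself is never modified.
MARKERS = {
    "\u200b": "[ZWSP]",  # ZERO WIDTH SPACE
    "\u200c": "[ZWNJ]",  # ZERO WIDTH NON-JOINER
}

# Build the translation table once; keys are single characters, values are strings.
_TABLE = str.maketrans(MARKERS)


def make_visible(text):
    """Return a copy of text with each zero-width character replaced by a marker."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(make_visible("\u1780\u200b\u1781\u200c\u1782"))
```

Because the substitution happens on a copy, saving the original string keeps the real U+200B characters intact, which is the behavior an editor would need.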
A: 

EditPad Pro does not (yet) have the ability to visualize non-printable characters other than the ASCII space and tab. Version 6 does recognize ZWSP as a word boundary when doing word wrapping and selecting words by double-clicking or Ctrl+Shift+Left/Right.
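
Treating ZWSP as a word boundary amounts to splitting on it. A minimal Python sketch of that idea, assuming word segments are separated by ZWSP within phrases and by ordinary whitespace between phrases; `khmer_words` is a hypothetical helper, not how EditPad Pro implements it. Note that Python's `str.split()` with no argument does not treat U+200B as whitespace, so it must be split on explicitly:

```python
def khmer_words(text):
    """Split text into word segments at ZWSP and at ordinary whitespace."""
    words = []
    # str.split() with no argument splits on whitespace only; U+200B is not whitespace
    for chunk in text.split():
        # ZWSP marks a break opportunity between words; drop empty pieces from doubled ZWSPs
        words.extend(piece for piece in chunk.split("\u200b") if piece)
    return words


if __name__ == "__main__":
    print(khmer_words("\u1780\u200b\u1781 \u1782\u200b\u200b\u1783"))
```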

What you can do is search for the regular expression \u200B. Though this doesn't make the zero-width space visible, it will select it and put the cursor after it. You could use the regex \u200B\X and turn on the Highlight button on the search panel to highlight each grapheme after U+200B. You could even use the syntax coloring scheme editor to modify the provided XML scheme so that it uses that regex to always highlight each grapheme after U+200B.
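
The same \u200B search works in Python's standard `re` module, which understands \uXXXX escapes in patterns. `re` does not support the \X grapheme escape (the third-party `regex` module does), so this sketch approximates "the grapheme after U+200B" as one base character plus any following combining marks. The helper name `after_zwsp` is hypothetical, and the approximation is rough: for example, it stops after a Khmer COENG (U+17D2) rather than absorbing the subscript consonant that follows it.

```python
import re
import unicodedata

ZWSP = re.compile(r"\u200B")  # same escape suggested for EditPad Pro


def after_zwsp(text):
    """Yield (offset, segment) for the text that follows each ZWSP.

    The segment is one base character plus any combining marks after it,
    a rough stand-in for the \\X grapheme escape, which Python's re lacks.
    """
    for m in ZWSP.finditer(text):
        start = end = m.end()
        if end < len(text):
            end += 1  # the base character right after the ZWSP
            # absorb combining marks (categories Mn, Mc, Me), e.g. Khmer vowel signs
            while end < len(text) and unicodedata.category(text[end]).startswith("M"):
                end += 1
        yield start, text[start:end]


if __name__ == "__main__":
    # KA, ZWSP, KHA + VOWEL SIGN AA, KO
    for hit in after_zwsp("\u1780\u200b\u1781\u17b6\u1782"):
        print(hit)
```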

EditPad Pro easily handles 150 MB files and has a powerful regex engine (same as used in RegexBuddy and PowerGREP). Maximum file size is 2 GB. Windows only.

Jan Goyvaerts
A: 

I'm using CKEditor; it's cross-platform and fully supports Unicode.

Take a look at it.

Nasser Hadjloo