This isn't a complete answer, but when I copy and paste the Unicode characters "चौरेउत्तमयादव " and then use a couple of tools to analyze what's there, I do not see any spaces:
echo "चौरेउत्तमयादव " | odx
This produces a hex dump of the data; there's a blank at the end, but none in the middle.
0x0000: E0 A4 9A E0 A5 8C E0 A4 B0 E0 A5 87 E0 A4 89 E0 ................
0x0010: A4 A4 E0 A5 8D E0 A4 A4 E0 A4 AE E0 A4 AF E0 A4 ................
0x0020: BE E0 A4 A6 E0 A4 B5 20 0A ....... .
0x0029:
And the second command decodes UTF-8 data:
echo "चौरेउत्तमयादव " | utf8-unicode
It produces:
0xE0 0xA4 0x9A = U+091A
0xE0 0xA5 0x8C = U+094C
0xE0 0xA4 0xB0 = U+0930
0xE0 0xA5 0x87 = U+0947
0xE0 0xA4 0x89 = U+0909
0xE0 0xA4 0xA4 = U+0924
0xE0 0xA5 0x8D = U+094D
0xE0 0xA4 0xA4 = U+0924
0xE0 0xA4 0xAE = U+092E
0xE0 0xA4 0xAF = U+092F
0xE0 0xA4 0xBE = U+093E
0xE0 0xA4 0xA6 = U+0926
0xE0 0xA4 0xB5 = U+0935
0x20 = U+0020
0x0A = U+000A
So, it seems that your problem might be with the input to 'toEscapedUnicode' rather than with its output.
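For anyone without those two tools, the same check can be sketched in a few lines of Python. This is an illustration only: `dump_utf8` is a hypothetical helper name, not part of either tool, and the pasted text is written with escapes so that the code points are unambiguous.

```python
def dump_utf8(data: bytes) -> list[str]:
    """Return one 'bytes = U+XXXX' line per code point in UTF-8 data."""
    lines = []
    # decode() raises UnicodeDecodeError if the input is not valid UTF-8.
    for ch in data.decode("utf-8"):
        hex_bytes = " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))
        lines.append(f"{hex_bytes} = U+{ord(ch):04X}")
    return lines

# The text as pasted from the question, including the trailing blank.
pasted = ("\u091A\u094C\u0930\u0947\u0909\u0924\u094D\u0924"
          "\u092E\u092F\u093E\u0926\u0935 ")

if __name__ == "__main__":
    # echo appends a newline, so add one here too.
    for line in dump_utf8((pasted + "\n").encode("utf-8")):
        print(line)
    # Is there a blank anywhere other than at the end?
    print("blank in the middle:", " " in pasted.rstrip())
```

The last line reports whether a U+0020 appears anywhere before the trailing blank.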
Also, it seems that what I copy'n'paste from the question doesn't match what you say is in the string:
Yours    Mine
\u0938   U+091A
\u0941   U+094C
\u0916   U+0930
\u091A   U+0947
\u0948   U+0909
\u0928   U+0924
\u093E   U+094D
\u0928   U+0924
\u0940   U+092E
\u0020
\u0930   U+092F
\u0940   U+093E
\u091D   U+0926
\u0941   U+0935
\u092E
\u0932
\u0020
\u091C
\u093F
\u0935
\u0924
So the pasted text does not match the claimed translation for other reasons too: the code points themselves differ, not just the spaces.
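The comparison can also be done mechanically. The following is a rough Python sketch, assuming the escaped string is available as literal backslash-u text; `escaped_to_codepoints` is a name made up for this illustration, and the values are the ones listed in the two columns above.

```python
import re

def escaped_to_codepoints(escaped: str) -> list[int]:
    """Extract code points from a string of literal backslash-u escapes."""
    return [int(h, 16) for h in re.findall(r"\\u([0-9A-Fa-f]{4})", escaped)]

# 'Yours': the escaped string as quoted (a raw string, so the escapes stay literal).
claimed = (r"\u0938\u0941\u0916\u091A\u0948\u0928\u093E\u0928\u0940\u0020"
           r"\u0930\u0940\u091D\u0941\u092E\u0932\u0020\u091C\u093F\u0935\u0924")

# 'Mine': the text pasted from the question, without the trailing blank.
pasted = ("\u091A\u094C\u0930\u0947\u0909\u0924\u094D\u0924"
          "\u092E\u092F\u093E\u0926\u0935")

claimed_cps = escaped_to_codepoints(claimed)
pasted_cps = [ord(c) for c in pasted]

# Compare position by position over the shorter of the two sequences.
mismatches = [(i, a, b)
              for i, (a, b) in enumerate(zip(claimed_cps, pasted_cps))
              if a != b]

if __name__ == "__main__":
    print(f"{len(claimed_cps)} claimed vs {len(pasted_cps)} pasted code points")
    for i, a, b in mismatches:
        print(f"position {i}: U+{a:04X} != U+{b:04X}")
```

Note that this compares strictly by position, so it does not skip the blank the way the table above does.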
I believe that the Unicode string you specify should look like:
सुखचैनानी रीझुमल जिवतराम
I used a file containing the values you claimed, minus the \u prefixes and with 0020 in place of the blanks:
0938
0941
0916
091A
0948
0928
093E
0928
0940
0020
0930
0940
091D
0941
092E
0932
0020
091C
093F
0935
0924
0930
093E
092E
I then used this home-brew Perl script to generate the UTF-8 string that I propose as the equivalent of your escaped Unicode string. I'm sure Perl has other mechanisms for doing this (using Unicode-related modules), but this worked for me. (It would be less verbose if I hadn't left the debug code in.)
#!/bin/perl -w
use strict;
use constant debug => 0;

# Read one hexadecimal code point per line; write its UTF-8 encoding to stdout.
while (<>)
{
    chomp;
    my $i = hex;
    printf STDERR "0x%04X = %4d\n", $i, $i if debug;
    if ($i < 0x80)
    {
        # 1-byte UTF-8 (ASCII only; U+0080 and up need at least two bytes)
        printf STDERR " 0x%02X (%3d)\n", $i, $i if debug;
        printf "%c", $i;
    }
    elsif ($i < 0x800)
    {
        # 2-byte UTF-8: 110xxxxx 10xxxxxx
        my($b1) = 0xC0 | (($i >> 6) & 0x1F);
        my($b2) = 0x80 | ($i & 0x3F);
        printf STDERR " 0x%02X (%3d)\n", $b1, $b1 if debug;
        printf STDERR " 0x%02X (%3d)\n", $b2, $b2 if debug;
        printf "%c%c", $b1, $b2;
    }
    elsif ($i < 0x10000)
    {
        # 3-byte UTF-8: 1110xxxx 10xxxxxx 10xxxxxx
        my($b1) = 0xE0 | (($i >> 12) & 0x0F);
        my($b2) = 0x80 | (($i >> 6) & 0x3F);
        my($b3) = 0x80 | ( $i & 0x3F);
        printf STDERR " 0x%02X (%3d)\n", $b1, $b1 if debug;
        printf STDERR " 0x%02X (%3d)\n", $b2, $b2 if debug;
        printf STDERR " 0x%02X (%3d)\n", $b3, $b3 if debug;
        printf "%c%c%c", $b1, $b2, $b3;
    }
    else
    {
        # 4-byte UTF-8 or error
        die "Oh bother!";
    }
}
print "\n";
You can fill in the 4-byte UTF-8 and error handling stuff. I don't diagnose invalid code points (notably the UTF-16 surrogates, U+D800 to U+DFFF), so if you put bogus Unicode code points in, you will get bogus UTF-8 values out of the script. If you need to know more about that, read Chapter 3 of the Unicode book (available for download, as a chapter, from Unicode.org) or the FAQ on UTF-8, UTF-16, UTF-32 and BOM.
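As a rough idea of what the missing pieces might look like, here is a sketch in Python rather than Perl. It adds the 4-byte case and rejects surrogates and out-of-range values. The names `encode_utf8` and `encode_hex_lines` are invented for this illustration, and the input is assumed to be in the same one-hex-value-per-line format as the file above.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogate U+{cp:04X} cannot be encoded as UTF-8")
    if cp < 0x80:                       # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def encode_hex_lines(lines) -> bytes:
    """Encode an iterable of hex code points (one per line), skipping blank lines."""
    return b"".join(encode_utf8(int(line, 16)) for line in lines if line.strip())

if __name__ == "__main__":
    sample = ["0938", "0941", "0916"]   # first three values from the file above
    data = encode_hex_lines(sample)
    print(" ".join(f"{b:02X}" for b in data))
    # Cross-check against Python's built-in encoder.
    print(data == "\u0938\u0941\u0916".encode("utf-8"))
```

In practice the built-in encoder (`chr(cp).encode("utf-8")` in Python) does all of this already; the hand-written version is only useful for seeing where each bit goes.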