Hi,
Is there a tool available that can: 1) detect the character encoding of a file (e.g. UTF-8, Big5, etc.), and 2) convert a file from one encoding to another (e.g. Big5 -> UTF-8)?
Thanks in advance
You cannot reliably figure out file encodings because many byte sequences are valid in more than one encoding. As an extreme example, nearly every byte sequence is valid in all fixed-width 8-bit encodings like the ISO-8859 family. Unless you can understand the text, you cannot distinguish between those encodings. That said, UTF-8 and UTF-16 are usually easy to identify, and the heuristic built into the file tool seems to be quite impressive. Once you have identified the encoding, converting is easy. The standard conversion tool on Unix-like systems is called iconv (for example: iconv -f BIG5 -t UTF-8 input.txt > output.txt).
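If you would rather do this from code than from the shell, the same two steps can be sketched in Java. This is only a minimal illustration, not what file or iconv do internally: the class name EncodingSketch and its two helper methods are made up for this example, the check only answers "is this well-formed UTF-8?" (it cannot tell ISO-8859 variants apart, for the reason given above), and the source charset for the conversion has to be supplied by the caller.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class EncodingSketch {

    // Returns true if the bytes decode as strict UTF-8 (no malformed sequences).
    // A "true" result is only a strong hint: pure ASCII passes as well.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Re-encodes bytes from a caller-supplied charset (e.g. "Big5") to UTF-8.
    // Assumes the caller already knows the source encoding.
    static byte[] toUtf8(byte[] data, String sourceCharset) {
        String text = new String(data, Charset.forName(sourceCharset));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Usage: java EncodingSketch <input> <sourceCharset> <output>
    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: EncodingSketch <input> <sourceCharset> <output>");
            return;
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        if (isValidUtf8(raw)) {
            System.out.println("Input already looks like valid UTF-8; nothing to do.");
            return;
        }
        Files.write(Paths.get(args[2]), toUtf8(raw, args[1]));
    }
}
```

Whether a charset name such as "Big5" is accepted depends on the Java runtime in use; Charset.forName throws an exception if the name is not supported.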
I think this belongs on SuperUser.
I'm not sure what you are aiming at with your second question, but in my experience FFmpeg is a great tool for recognizing all kinds of media formats. Just feed it the file you would like recognized, with ffmpeg -i file, and it reports back what it knows. For example:
% ffmpeg -i image.png
FFmpeg version SVN-r21627, Copyright (c) 2000-2010 Fabrice Bellard, et al.
built on Feb 3 2010 21:28:15 with gcc 4.2.1 (Apple Inc. build 5646) (dot 1)
configuration: --prefix=/usr/local --enable-gpl --enable-nonfree --enable-shared --enable-postproc --enable-avfilter --enable-avfilter-lavf --enable-pthreads --enable-x11grab --enable-bzlib --enable-libmp3lame --enable-libtheora --enable-libvorbis --enable-libx264 --enable-zlib --enable-libfaac --enable-libfaad
libavutil 50. 8. 0 / 50. 8. 0
libavcodec 52.52. 0 / 52.52. 0
libavformat 52.50. 0 / 52.50. 0
libavdevice 52. 2. 0 / 52. 2. 0
libavfilter 1.17. 0 / 1.17. 0
libswscale 0. 9. 0 / 0. 9. 0
libpostproc 51. 2. 0 / 51. 2. 0
Input #0, image2, from 'Firefox003.png':
Duration: 00:00:00.04, start: 0.000000, bitrate: N/A
Stream #0.0: Video: png, rgb24, 386x319, 25 tbr, 25 tbn, 25 tbc
At least one output file must be specified
As you can see, the file is recognized as an image, and more specifically as a PNG. With some regex you can pick out both. For example, in Java (quoting):
Pattern IMAGE_PATTERN = Pattern.compile("^Input #\\d+?, (image\\d*), from.*?");
And:
Pattern VIDEO_PATTERN = Pattern.compile(".*?\\sVideo: .*?, .*?, ([0-9]+)x([0-9]+).*");
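To show how the two patterns might be applied, here is a small sketch that runs them against the "Input" and "Stream" lines from the output above. The class and method names (FfmpegOutputSketch, imageFormat, videoSize) are invented for this example, and it assumes you have already captured FFmpeg's output line by line (FFmpeg normally writes this information to standard error). Other FFmpeg versions may format these lines differently, so treat the patterns as a starting point.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class FfmpegOutputSketch {
    static final Pattern IMAGE_PATTERN =
            Pattern.compile("^Input #\\d+?, (image\\d*), from.*?");
    static final Pattern VIDEO_PATTERN =
            Pattern.compile(".*?\\sVideo: .*?, .*?, ([0-9]+)x([0-9]+).*");

    // Returns the demuxer name (e.g. "image2") if the line is an image
    // "Input" line, otherwise null.
    static String imageFormat(String line) {
        Matcher m = IMAGE_PATTERN.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Returns {width, height} if the line describes a video stream,
    // otherwise null.
    static int[] videoSize(String line) {
        Matcher m = VIDEO_PATTERN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2))
        };
    }

    public static void main(String[] args) {
        // Lines copied from the sample FFmpeg output above.
        String inputLine = "Input #0, image2, from 'Firefox003.png':";
        String streamLine =
                "    Stream #0.0: Video: png, rgb24, 386x319, 25 tbr, 25 tbn, 25 tbc";

        System.out.println("format: " + imageFormat(inputLine));
        int[] size = videoSize(streamLine);
        System.out.println("size: " + size[0] + "x" + size[1]);
    }
}
```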
But of course this is only useful for binary files. I hope it helps.