automatic UTF-8 detection
Submitted on Tuesday, 8 August, 2006 - 09:14
In several places, the developers state that UTF-8 files cannot be auto-detected.
That is not true. Other editors can do that reliably. Even Windows Notepad.
Here is what I think they are doing internally:
1. open the file byte-oriented
2. check whether there are special UTF-8 chars in the file (e.g. Ã)
   if not: display the file this way (byte-wise)
   if so:
3. re-open (or scan) the file as UTF-8 and check whether it loads error-free
   (for instance, all Ãx character pairs are valid UTF-8 chars)
   if so: display the file as UTF-8
   if not:
4. re-open the file byte-oriented and display it this way
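The steps above can be sketched roughly as follows. This is only my guess at the approach, not what Notepad or any other editor actually does; the function name `detect_encoding` and the `fallback` parameter are made up for illustration, and Latin-1 stands in for whatever byte-oriented encoding is the default.

```python
def detect_encoding(data: bytes, fallback: str = "latin-1") -> str:
    """Guess whether raw file bytes are UTF-8 or a byte-oriented encoding.

    Hypothetical helper: 'fallback' is the assumed byte-oriented default.
    """
    # Steps 1-2: look for bytes outside the ASCII range. If there are none,
    # both interpretations give the same text, so stay byte-oriented.
    if not any(b >= 0x80 for b in data):
        return fallback

    # Step 3: try a strict UTF-8 decode of the whole file.
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Step 4: at least one invalid sequence, so treat it as byte-oriented.
        return fallback
    return "utf-8"


print(detect_encoding("Grüße".encode("utf-8")))    # valid multi-byte sequences
print(detect_encoding("Grüße".encode("latin-1")))  # invalid as UTF-8
print(detect_encoding(b"plain ascii"))             # no high bytes at all
```

A real editor would probably scan only the first few kilobytes instead of the whole file, but the decision logic would stay the same.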
Of course, there is a chance that somebody combines weird chars in a way that accidentally looks like a valid UTF-8 file. But compare how often such a file would occur and be mis-detected by the proposed algorithm with the number of genuine UTF-8 files that get opened in jEdit and are mis-detected by the current auto-detection. My guess is that it is a dozen against some millions.
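To make the mis-detection case concrete, here is a small illustration, assuming Latin-1 as the byte-oriented encoding. A Latin-1 file that really contains the two characters Ã and © next to each other is also a valid UTF-8 sequence, so a validity check alone cannot tell the two apart.

```python
# A byte-oriented (Latin-1) text that happens to be well-formed UTF-8.
ambiguous = "Ã©".encode("latin-1")       # the two bytes 0xC3 0xA9

as_latin1 = ambiguous.decode("latin-1")  # what the author of the file meant
as_utf8 = ambiguous.decode("utf-8")      # what a UTF-8 detector would show

print(ambiguous.hex(), as_latin1, as_utf8)
```

Such pairs are rare in real text, because every byte from 0xC2 upwards must be followed by the right number of bytes in the range 0x80 to 0xBF for the file to pass.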
Looking at entries in this forum, I think this feature is more than due in order for jEdit to keep up-to-date and to continue gaining new users.