jEdit Community - Resources for users of the jEdit Text Editor
automatic UTF-8 detection
Submitted by mh147 on Tuesday, 8 August, 2006 - 09:14
In several places, the developers state that one cannot auto-detect UTF-8 files.

That is not true. Other editors can do that reliably. Even Windows Notepad.

Here is what I think they are doing internally:
1. Open the file byte-oriented.
2. Check whether the file contains special UTF chars (e.g. Ã).
   If not: display the file this way (byte-wise).
   If so:
3. Re-open (or scan) the file as UTF-8 and check whether it loads error-free
   (for instance, all Ãx character pairs are valid UTF chars).
   If so: display the file as UTF-8.
   If not:
4. Re-open the file byte-oriented and display it this way.
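
The steps above can be sketched in Java roughly as follows. This is only an illustration, not jEdit's actual loading code: the class `Utf8Sniffer` and its methods are hypothetical names, "special UTF chars" is interpreted here as "any byte with the high bit set", and the fallback encoding is whatever the caller passes in. The strict check relies on the standard `CharsetDecoder` with `CodingErrorAction.REPORT`, which throws on malformed input instead of silently substituting characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jEdit: guesses an encoding from raw bytes.
final class Utf8Sniffer {

    // Step 2: does the data contain any byte outside 7-bit ASCII?
    static boolean hasHighBytes(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return true;
            }
        }
        return false;
    }

    // Step 3: decode strictly; any malformed sequence makes the decoder throw.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Steps 1 to 4 combined: report UTF-8 only when high bytes are present
    // and the whole buffer decodes cleanly; otherwise use the fallback name.
    static String guessEncoding(byte[] data, String fallback) {
        if (hasHighBytes(data) && isValidUtf8(data)) {
            return "UTF-8";
        }
        return fallback;
    }
}
```

A caller would read the file into a byte array (for example with `java.nio.file.Files.readAllBytes`) and pass it to `guessEncoding` together with the configured default encoding. Pure ASCII files fall through to the fallback, which is harmless because ASCII text decodes identically in UTF-8 and ISO-8859-1.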

Of course, there is a chance that somebody would combine weird chars in a way that accidentally looks like a valid UTF-8 file. But compare how often such a file would occur and be mis-detected by the proposed algorithm to the number of genuine UTF-8 files that get opened in jEdit and are mis-detected by the current auto-detection. This is a dozen against some million.

Looking at entries in this forum, I think this feature is more than due in order for jEdit to keep up-to-date and to continue gaining new users.
errors detecting UTF-8 ... and other encodings
by andrX on Tue, 08/05/2007 - 08:47
Agreed that UTF-8 can be recognized most of the time. But my experience with an editor similar to jEdit is that frequently autodetection chooses an incorrect encoding. Selecting UTF-8 as the default encoding minimizes (but doesn't eliminate) such errors, but results in more errors for other encodings, such as ISO-8859-1.
However, recently I discovered ...

There is a version of UTF-8 which uses a 3-byte signature (EF BB BF in hex) at the beginning of the file to ensure correct detection.
It is called UTF-8Y in jEdit.
It is used by other editors ... also known as UTF-8(BOM) ... so it seems to be a standard.
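
For illustration, checking for that signature takes only a few lines of Java. This is a minimal sketch; the class name `BomCheck` is made up for the example and is not jEdit's own implementation.

```java
// Hypothetical helper, not part of jEdit: tests for the UTF-8 signature.
final class BomCheck {

    // Returns true if the data starts with the three bytes EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }
}
```

The masking with `0xFF` is needed because Java bytes are signed, so `0xEF` stored in a `byte` compares as a negative number otherwise.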

To change the encoding, click on the status bar at the bottom.

Being new to jEdit, I can't tell you how to convert the encoding of a file.
If the encoding of a file is detected incorrectly on loading and the encoding setting is then corrected, the characters that were already misinterpreted are not fixed.
If anyone knows how to change the encoding ... ???
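
As a rough illustration of why a repair is possible in one common case: if UTF-8 bytes were decoded as ISO-8859-1, nothing is lost, because ISO-8859-1 maps every byte to exactly one character. Re-encoding the garbled text as ISO-8859-1 recovers the original bytes, which can then be decoded as UTF-8. The sketch below assumes exactly that situation; the class `Redecode` is a hypothetical name, and this is not a description of how jEdit handles encoding changes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jEdit.
final class Redecode {

    // Assumes 'garbled' came from decoding UTF-8 bytes as ISO-8859-1.
    static String latin1ToUtf8(String garbled) {
        // ISO-8859-1 is a one-to-one byte mapping, so this restores the raw bytes.
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        // Decode the restored bytes with the encoding the file really uses.
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

This does not work in general for other wrongly chosen encodings, which may drop or substitute bytes they cannot map; in that case the file has to be read again from disk with the right encoding.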
 
Toggle encoding macro
by takeshin on Thu, 19/06/2008 - 08:54
Maybe this macro will be helpful for you:
Toggle encoding jEdit macro

regards,
takeshin