How to convert file encoding
In this post, I will introduce two ways to convert the encoding of a file.
Method 1: use the Linux command iconv
iconv -f sjis -t utf-8 -o <output file> <input file>
<input file> is read in sjis encoding and re-written to <output file> in utf-8 encoding.
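Here is the explanation of the flags:
- -f: the encoding to convert from (sjis)
- -t: the encoding to convert to (utf-8)
- -o: the output file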
To see the list of supported encodings, run:
iconv -l
Here, -l means list.
This method requires you to remember the command name and its flags, and, obviously, iconv has to be installed.
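To make explicit what the iconv command does with the bytes, here is a minimal Python sketch of the same sjis to utf-8 conversion. It is an illustration, not a replacement for iconv: the helper name `convert_encoding` is hypothetical, and using Python's `shift_jis` codec as the counterpart of iconv's `sjis` is an assumption (the two may disagree on some vendor-specific characters).

```python
from pathlib import Path


def convert_encoding(src: str, dst: str,
                     from_enc: str = "shift_jis", to_enc: str = "utf-8") -> None:
    """Hypothetical helper, roughly `iconv -f sjis -t utf-8 -o dst src`."""
    # Read the raw bytes exactly as they are stored on disk
    raw = Path(src).read_bytes()
    # Decode strictly: invalid input raises UnicodeDecodeError,
    # similar to iconv stopping on an illegal input sequence
    text = raw.decode(from_enc)
    # Re-encode the decoded text and write it to the output file
    Path(dst).write_bytes(text.encode(to_enc))
```

Calling `convert_encoding("input.txt", "output.txt")` corresponds to the iconv command above; the file names are placeholders.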
Method 2: use vim
This method has the following merits:
- Easily accomplished even on Windows, once vim is installed
- You can check the encoding interactively in each step
- Actually, vim uses …
To read a file in a specific encoding (e.g. sjis), open it and then re-edit it with ++enc:
vim <file name>
:e ++enc=sjis
To write a file in a specific encoding (e.g. utf-8), regardless of which encoding was used when reading it:
:set fenc=utf-8
:w
To check the list of supported encodings in vim, see:
:help encoding-names
Deeper explanation of the fenc (fileencoding) option in vim
First, do not confuse it with the enc (encoding) option. The enc option sets the encoding vim uses internally; it does not decide how vim reads, writes, or interprets a file. Moreover, in neovim the enc option can no longer be changed: its value is fixed as utf-8.
++enc is an argument for file commands such as :e and :w (see :help ++opt). It has nothing to do with the enc option, except that their names happen to look alike.
fenc decides how vim interprets the bytes of a file to display its content in the terminal. Your text file is stored on disk as a sequence of bytes, so unless you want to work with the raw bytes, vim needs an option (with a default value) that controls how those bytes are decoded and shown to you.
Let's see an example:
Suppose there is a file a.txt whose content in hex is e3 81 82.
When it is opened with vim a.txt, vim uses utf-8 as the default fileencoding. It interprets (decodes) e3 81 82 as utf-8 and displays あ.
When the fenc option is changed to sjis, vim converts the content so that, when it is decoded as sjis, it still displays the same character あ. The bytes vim will write become 82 a0, and the buffer is marked as modified, which means you need to :write to store the converted content.
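The byte values in this example can be checked with a few lines of Python (3.8 or later, for `bytes.hex` with a separator). This is only a sketch of the decoding and encoding steps, not of vim itself; Python's `shift_jis` codec stands in for vim's `sjis`, which I assume behaves the same for this character.

```python
# Bytes stored in a.txt, as given in the example
raw = bytes.fromhex("e3 81 82")

# Decode them as utf-8, the fileencoding vim picks by default here
char = raw.decode("utf-8")
print(f"U+{ord(char):04X}")    # U+3042, i.e. あ

# Encode the same character as Shift_JIS, which is what :set fenc=sjis then :w stores
converted = char.encode("shift_jis")
print(converted.hex(" "))      # 82 a0
```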
If you re-open the converted file (82 a0), vim tries to decode it as utf-8 by default, and these bytes are not valid utf-8. Thus, the text is garbled and shown as bytes such as <82>. To read the text correctly, you must specify the ++enc option, e.g. via :e ++enc=sjis.
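The following Python sketch decodes the converted bytes both ways to show why the re-opened file looks broken and why ++enc=sjis fixes it. The `errors="backslashreplace"` handler is used here only to loosely imitate vim showing undecodable bytes as <82>-style markers; what vim actually displays also depends on your fileencodings setting.

```python
# Bytes on disk after the sjis conversion in the example
converted = bytes.fromhex("82 a0")

# A strict utf-8 decode fails: 0x82 can never start a utf-8 sequence
try:
    converted.decode("utf-8")
except UnicodeDecodeError as err:
    print("utf-8 failed:", err.reason)    # utf-8 failed: invalid start byte

# Keep the undecodable bytes visible, loosely like vim's <82> markers
print(converted.decode("utf-8", errors="backslashreplace"))   # \x82\xa0

# Decoding with the encoding the file was written in (what ++enc=sjis asks for) works
print(f"U+{ord(converted.decode('shift_jis')):04X}")          # U+3042
```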
Note: when you try the above example, vim usually appends a newline character (0a) at the end of the file. This behavior can be disabled by executing :set binary and :set noeol.
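To see where the extra byte at the end comes from, compare the Shift_JIS bytes with and without a trailing line feed. This Python sketch assumes vim writes a single line feed (0a) after the last line, as it does with 'fileformat' set to unix.

```python
char = "\u3042"  # あ

# Without a trailing newline: just the two Shift_JIS bytes
without_eol = char.encode("shift_jis")
# With the line feed vim writes after the last line ('fileformat' unix assumed)
with_eol = (char + "\n").encode("shift_jis")

print(without_eol.hex(" "))   # 82 a0
print(with_eol.hex(" "))      # 82 a0 0a
```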