
RVHTML and UTF-8

Posted: Thu May 17, 2007 12:33 pm
by martindholmes
Hi there,

When I import an HTML file (using RVHTML) with a UTF-8 byte-order mark at the beginning (EF BB BF), the BOM is shown in my TRichViewEdit component, like this:

ï»¿

as if it were a sequence of single-byte characters. Any Unicode text is also garbled. The file contains not only the BOM, but also a meta tag specifying the encoding:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
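To illustrate the symptom, here is a small Python sketch (not RVHTML code) that builds some hypothetical file bytes and decodes them two ways. It assumes Windows-1252 as the single-byte code page, which is typical for Western Windows systems. Read as Windows-1252, the BOM bytes EF BB BF become the three characters "ï»¿" and UTF-8 text such as "é" turns into "Ã©". Read as UTF-8, the text comes out as intended.

```python
# EF BB BF is the UTF-8 byte-order mark (U+FEFF encoded as UTF-8).
BOM = b"\xef\xbb\xbf"

# Hypothetical file content: BOM + the meta tag + some non-ASCII text.
html = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n<p>café</p>'
raw = BOM + html.encode("utf-8")

# Decoding with a single-byte code page (assumed Windows-1252 here) turns the
# BOM into three visible characters and mangles the UTF-8 sequence for "é".
as_cp1252 = raw.decode("cp1252")
print(as_cp1252[:3])

# Decoding as UTF-8 gives the intended text; "utf-8-sig" also strips the BOM.
as_utf8 = raw.decode("utf-8-sig")
print(as_utf8 == html)
```

The first `print` shows the same kind of three-character prefix that appears in the editor. The second confirms that a UTF-8 decode recovers the original markup unchanged.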

Can anyone tell me how to get RVHTML to treat Unicode text correctly?

All help appreciated,
Martin

Posted: Thu May 17, 2007 3:09 pm
by Sergey Tkachenko
RVHTML cannot autodetect UTF-8 encoding; the Encoding property must be set to rvhtmleUTF8.
Make sure that you have the latest version of RVHTML (0.0024).

Posted: Thu May 17, 2007 4:46 pm
by martindholmes
Does rvhtml detect any encodings at all, or do I need to parse the file and figure out the encoding before passing it to rvhtml?

Cheers,
Martin

Posted: Thu May 17, 2007 5:47 pm
by Sergey Tkachenko
No, it detects no encoding. It assumes that the file is either in UTF-8 or in DEFAULT_CHARSET.
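Because RVHTML does no detection, the caller has to sniff the encoding before loading. Below is a minimal Python sketch of one way to do that. It checks for a UTF-8 BOM first, then falls back to a `charset=` declaration in a `<meta>` tag near the top of the file. The function name `detect_html_encoding` and the 1024-byte scan limit are hypothetical choices. This is a simple heuristic and is not the full HTML encoding-sniffing algorithm, nor part of RVHTML.

```python
import re
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"

# Matches a charset declaration inside a meta tag, e.g.
#   <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
#   <meta charset="utf-8">
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_\-:.]+)""",
    re.IGNORECASE,
)


def detect_html_encoding(data: bytes, scan_limit: int = 1024) -> Optional[str]:
    """Return a lower-case charset name, or None if no hint was found."""
    # 1. A UTF-8 byte-order mark is an unambiguous signal.
    if data.startswith(UTF8_BOM):
        return "utf-8"
    # 2. Otherwise look for a meta charset declaration near the top of the file.
    match = _META_CHARSET.search(data[:scan_limit])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. No hint at all: the caller must fall back to a default code page.
    return None
```

When the result is `"utf-8"`, the Delphi side would set Encoding to rvhtmleUTF8 before loading. Any charset other than UTF-8 or the system default would presumably need converting first, since the reply above says RVHTML only handles those two cases.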