RTF to HTML, and HTML to RTF
I'm currently researching a server-side solution for converting RTF data in our DB to HTML for use in mobile/web applications. The memo may be modified by the user, so I need to be able to convert it back from HTML to RTF so that our Windows clients handle it correctly. The RTF stored in the database is pretty simple for the most part: bold/italics, maybe a font size change, possibly colored text. There should be no images embedded.
I came up with a convoluted solution using TWebBrowser, but then found the TRichView components. From what I've been reading, I think they may be the solution we need. What I'm looking for is some input on which components, methods, demos or help file information would help me evaluate this as quickly as possible. I am researching and reading on my own, but turning to you all could speed up the process.
What I need is to be able to do:
- Load a RichView from a dataset TField that is in RTF format.
- Convert that to HTML and use it as a string
- Later, take an HTML string and convert that to RTF
- Load the RTF into a dataset TField
Thank you for any assistance you can provide,
Rich Werning
Site Admin (Posts: 17564, Joined: Sat Aug 27, 2005 10:28 am):
Yes, you can do it. But converting to HTML and back may lose some formatting.
- For loading RTF from db, you can use LoadRTFFromStream
- For saving to HTML, you can use SaveHTMLToStreamEx (or a simplified version, SaveHTMLToStream) to save to TStringStream
- For HTML loading, you need to use additional free components (TrvHtmlViewImporter (recommended) or TrvHtmlImporter), see http://www.trichview.com/resources/
- For RTF saving, you can use SaveRTFToStream
Thanks for the response
Thanks for the prompt response. Eventually this will all be done non-visually on the server, but for my demo I'm using visual components to help view the results. After much trial and error, I have the following code working to load the RichView component and move its content as HTML to a Memo component. Does this look correct to you?
Code: Select all
aStream2 := nil;
aStream := nil;
try
  Memo1.Clear;
  RichView1.Clear;
  RichView1.DeleteUnusedStyles(True, True, True);
  RichView1.RTFReadProperties.TextStyleMode := rvrsAddIfNeeded;
  RichView1.RTFReadProperties.ParaStyleMode := rvrsAddIfNeeded;
  RichView1.Options := RichView1.Options + [rvoTagsArePChars];
  RichView1.RVFOptions := RichView1.RVFOptions + [rvfoSaveTextStyles, rvfoSaveParaStyles];
  // Load the RTF from the dataset field
  Value := SqlQuery1.FieldByName('MEMO_TEXT').AsString;
  aStream := TStringStream.Create(Value);
  aStream.Position := 0;
  if not RichView1.LoadRTFFromStream(aStream) then
    ShowMessage('failed LoadRTFFromStream');
  RichView1.Format;
  // Save as HTML and show the result in the memo
  aStream2 := TStringStream.Create;
  RichView1.SaveHTMLToStream(aStream2, 'c:\', 'RtfToHtml', 'RtfToHtml', []);
  aStream2.Position := 0;
  Memo1.Text := aStream2.ReadString(aStream2.Size);
finally
  aStream.Free;
  aStream2.Free;
end;
Site Admin:
Yes, this code looks correct.
Notes:
1) If you just need to convert RTF to HTML, and do not need displaying, you can skip calling Format. Format is quite slow, and the only saving procedures requiring it are saving RTF and DocX containing tables.
2) I recommend including the options [rvsoUTF8, rvsoUseCheckpointsNames] when saving HTML.
3) SaveHTMLToStream generates basic HTML. SaveHTMLToStreamEx produces a better result.
HTMLViewImport
I can't seem to get the HtmlViewImport working non-visually to convert the HTML back to RTF. I get the message "Control '' has no parent window" when I call LoadFromString. As this is going to be done in a Windows service, there is no main form or parent control. I'll try the other HTML converter (RvHtmlImporter) tomorrow, but wanted to ask whether you had any suggestions on how to proceed with this one.
Thank you again
Rich
Code: Select all
procedure SetupRichView(Sender: TRichView);
begin
  // Set up the RichView component
  Sender.Clear;
  Sender.DeleteUnusedStyles(True, True, True);
  Sender.RTFReadProperties.TextStyleMode := rvrsAddIfNeeded;
  Sender.RTFReadProperties.ParaStyleMode := rvrsAddIfNeeded;
  Sender.Options := Sender.Options + [rvoTagsArePChars];
  Sender.RVFOptions := Sender.RVFOptions + [rvfoSaveTextStyles, rvfoSaveParaStyles];
end;

function HTMLToRTF(value: string): string;
var
  Viewer: THTMLViewer;
  Importer: TRVHTMLViewImporter;
  RichView: TRichView;
  Style: TRvStyle;
  aStream: TStringStream;
begin
  Result := ''; // defined result when value is empty
  Viewer := nil;
  aStream := nil;
  Importer := nil;
  Style := nil;
  RichView := TRichView.Create(nil);
  try
    Style := TRvStyle.Create(nil);
    RichView.Style := Style;
    SetupRichView(RichView); // Standard setup of the RichView component
    if value <> '' then
    begin
      Viewer := THTMLViewer.Create(nil);
      Viewer.Visible := False;
      // Viewer.Parent := RichView.Parent;
      Viewer.DefBackground := clWhite;
      Importer := TRVHTMLViewImporter.Create(nil);
      Viewer.LoadFromString(value); // <<<< Error occurs here <<<
      Importer.ImportHtmlViewer(Viewer, RichView);
      aStream := TStringStream.Create;
      RichView.SaveRTFToStream(aStream, False);
      aStream.Position := 0;
      Result := aStream.ReadString(aStream.Size);
    end;
  finally
    RichView.Free;
    Importer.Free;
    Viewer.Free;
    aStream.Free;
    Style.Free;
  end;
end;
Site Admin:
Additional help if possible
Thank you once again, I believe I can make it work this way. I am having issues with the RTF format once I do the final conversion back from HTML to RTF: the bullet items are not displayed correctly. I am sending a demo application that demonstrates the problem to the Gmail address you have listed elsewhere.
Instead of showing like normal bullets:
Code: Select all
A blank window, known as a form, on which to design the UI for your application.
■ Extensive class libraries with many reusable objects.
■ An Object Inspector for examining and changing object traits.
They are showing like this:
Code: Select all
A blank window, known as a form, on which to design the UI for your application.
â– Extensive class libraries with many reusable objects.
â– An Object Inspector for examining and changing object traits.
- Rich
Site Admin:
I received your email.
I’ll try to explain what’s happening.
Loading RTF
This RTF file does not actually contain bullets, but characters of the “Wingdings” font followed by tab characters. LoadRTF loads these characters as they are.
Saving HTML
It is impossible to save a “Wingdings” character (or a character of any font with SYMBOL_CHARSET) to HTML. HTML can only contain characters belonging to the Unicode standard (currently, only IE can display SYMBOL_CHARSET characters, and only in compatibility mode).
Two symbol fonts are especially important because they are the most frequently used: “Wingdings” and “Symbol”. TRichView has special code for saving these characters to HTML: it converts characters of these fonts to the most similar Unicode characters (or named HTML entities).
So the character from your file is converted to the Unicode character $25A0 (“black square”).
Tab characters cannot be saved to HTML either, so they are converted to several space characters.
Loading HTML
Your HTML has UTF-8 encoding (a good choice).
But you perform some implicit conversions to UnicodeString, and the data are corrupted because the code does not state explicitly that the bytes are UTF-8, so they are decoded using the default ANSI code page instead.
Required changes:
1) In TForm1.btnStep2Click(Sender: TObject), aStream must be created as
aStream := TStringStream.create('', TEncoding.UTF8);
2) In TForm1.btnrvHtmlImportClick, the first time aStream must be created as
aStream := TStringStream.Create(mmoHtml.Text, TEncoding.UTF8);
3) In the same procedure, assign importer.Encoding := rvhtmleUTF8;
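To illustrate why the stream encoding matters, here is a minimal console sketch that uses only the Delphi RTL (no TRichView calls). It assumes Delphi XE2 or later and that the machine's ANSI code page is Windows-1252; ReadUtf8BytesAs and CodeUnits are hypothetical helper names made up for this illustration. It stores the UTF-8 bytes of $25A0 in a TStringStream and reads them back, once through an ANSI stream and once through a UTF-8 stream.

```pascal
program Utf8StreamDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper (not part of TRichView): stores the UTF-8 bytes of Text
// in a TStringStream created with ReadEncoding, then reads the stream back
// as a string through that same encoding.
function ReadUtf8BytesAs(const Text: string; ReadEncoding: TEncoding): string;
var
  Bytes: TBytes;
  Stream: TStringStream;
begin
  Bytes := TEncoding.UTF8.GetBytes(Text);
  // False: the stream must not free an encoding it does not own.
  Stream := TStringStream.Create('', ReadEncoding, False);
  try
    if Length(Bytes) > 0 then
      Stream.WriteBuffer(Bytes[0], Length(Bytes));
    Result := Stream.DataString;
  finally
    Stream.Free;
  end;
end;

// Hypothetical helper: lists the UTF-16 code units of S as 4-digit hex values.
function CodeUnits(const S: string): string;
var
  C: Char;
begin
  Result := '';
  for C in S do
    Result := Result + IntToHex(Ord(C), 4) + ' ';
  Result := Trim(Result);
end;

var
  Ansi: TEncoding;
begin
  // Assumption: Windows-1252 stands in for the default ANSI code page.
  Ansi := TEncoding.GetEncoding(1252);
  try
    // UTF-8 bytes E2 96 A0 read as ANSI: three unrelated characters.
    Writeln(CodeUnits(ReadUtf8BytesAs(#$25A0, Ansi)));           // 00E2 2013 00A0
    // The same bytes read as UTF-8: the original black square.
    Writeln(CodeUnits(ReadUtf8BytesAs(#$25A0, TEncoding.UTF8))); // 25A0
  finally
    Ansi.Free;
  end;
end.
```

The first line of output corresponds to the "â– " sequence you see in place of each bullet. Creating the stream with TEncoding.UTF8, as in changes 1 and 2 above, gives the second result.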