
Issues extracting text from RTF with TRichview 20

Posted: Mon Jan 03, 2022 1:43 pm
by Alexander_Dober
Hello,

I've installed the trial version of TRichView 20 and can no longer compile my project. The errors are in some helper functions that were created with your help some time ago:

Code: Select all

type
   TRTFConverter = class(TObject)
      private
         FParser: TRVRTFReader;
         FFirstLine: Boolean;
         FRTFText: TRVUnicodeString;
         FStream: TStream;
         procedure DoReaderText(Sender: TRVRTFReader; const Text: TRVAnsiString; Position: TRVRTFPosition);
         procedure DoReaderUnicodeText(Sender: TRVRTFReader; const Text: TRVUnicodeString; Position: TRVRTFPosition);
      public
         constructor Create;
         destructor Destroy; override;

         function ExtractTextFromRTF(RTFContent: string): string;
   end;

...
...

procedure TRTFConverter.DoReaderText(Sender: TRVRTFReader; const Text: TRVAnsiString;
  Position: TRVRTFPosition);
var
   TextW: TRVUnicodeString;
   CodePage: Cardinal;
begin
   if (Position <> rtf_ts_ContinuePara) and not FFirstLine then begin
      FRTFText := FRTFText + #13#10;
   end;
   FFirstLine := False;
   if Sender.FontTable.Count = 0 then begin
      CodePage := CP_ACP;
   end
   else begin
      CodePage := RVU_Charset2CodePage(Sender.FontTable[Sender.RTFState.CharProps.FontIndex].Charset);
   end;
   TextW := RVU_RawUnicodeToWideString(RVU_AnsiToUnicode(CodePage, Text));
   FRTFText := FRTFText + TextW;
end;

procedure TRTFConverter.DoReaderUnicodeText(Sender: TRVRTFReader; const Text: TRVUnicodeString;
  Position: TRVRTFPosition);
begin
   if (Position <> rtf_ts_ContinuePara) and not FFirstLine then begin
      FRTFText := FRTFText + #13#10;
   end;
   FFirstLine := False;
   FRTFText := FRTFText + Text;
end;

function TRTFConverter.ExtractTextFromRTF(RTFContent: string): string;
var
   lList: TStringList;
   lBlobStream: TMemoryStream;
begin
   FRTFText := '';
   FFirstLine := True;

   FParser := TRVRTFReader.Create(nil);
   lList := TStringList.Create;
   lBlobStream := TMemoryStream.Create;

   try
      lList.Text := RTFContent;
      lList.SaveToStream(lBlobStream);
      lBlobStream.Position := 0;

      FParser.OnNewText := DoReaderText;                 // <- Incompatible types: 'TCustomRVMSWordReader' and 'TRVRTFReader'
      FParser.OnNewUnicodeText := DoReaderUnicodeText;   // <- Incompatible types: 'TCustomRVMSWordReader' and 'TRVRTFReader'
      if FParser.ReadFromStream(lBlobStream) = rtf_ec_OK then begin   // <- Incompatible types
         Result := FRTFText;
      end;
   finally
      FreeAndNil(lBlobStream);
      FreeAndNil(lList);
      FParser.Free;
   end;
end;
I can't find anything about this in the help file or in the announcement threads since version 18.0 (where it still worked).

Re: Issues extracting text from RTF with TRichview 20

Posted: Mon Jan 03, 2022 2:02 pm
by Sergey Tkachenko
Parameters of these events were changed when DocX import was implemented: DocX and RTF readers are inherited from the same class.

1) Change the declarations:

Code: Select all

procedure DoReaderText(Sender: TCustomRVMSWordReader; const Text: TRVAnsiString; Position: TRVRTFPosition);
procedure DoReaderUnicodeText(Sender: TCustomRVMSWordReader; const Text: TRVUnicodeString; Position: TRVRTFPosition);
2) ReadFromStream now returns a boolean value (True: success; False: error)
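
For point 2, the check in ExtractTextFromRTF from the first post could be adjusted roughly as follows. This is only a sketch of the changed lines inside the existing try block; the else branch that clears Result is an added assumption and was not in the original code.

Code: Select all

      FParser.OnNewText := DoReaderText;
      FParser.OnNewUnicodeText := DoReaderUnicodeText;
      // ReadFromStream returns a Boolean now, so there is no comparison with rtf_ec_OK
      if FParser.ReadFromStream(lBlobStream) then begin
         Result := FRTFText;
      end
      else begin
         // assumption: return an empty string if reading fails
         Result := '';
      end;
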

Re: Issues extracting text from RTF with TRichview 20

Posted: Mon Jan 03, 2022 3:33 pm
by Alexander_Dober
Thanks,

After changing the Sender type to TCustomRVMSWordReader, this part still doesn't compile:

Code: Select all

...
   if Sender.FontTable.Count = 0 then begin
      CodePage := CP_ACP;
   end
   else begin
      CodePage := RVU_Charset2CodePage(Sender.FontTable[Sender.RTFState.CharProps.FontIndex].Charset);
   end;
...
.FontTable and .RTFState don't exist here. What is supposed to be used here?

Re: Issues extracting text from RTF with TRichview 20

Posted: Mon Jan 03, 2022 4:05 pm
by Sergey Tkachenko
Typecast Sender to TRVRTFReader:

if TRVRTFReader(Sender).FontTable.Count = 0 then begin

Re: Issues extracting text from RTF with TRichview 20

Posted: Mon Jan 03, 2022 5:11 pm
by Sergey Tkachenko
One more thing: the call to RVU_RawUnicodeToWideString must be removed. That is, instead of
TextW := RVU_RawUnicodeToWideString(RVU_AnsiToUnicode(CodePage, Text));
use
TextW := RVU_AnsiToUnicode(CodePage, Text);

You used the demo from this topic:
https://www.trichview.com/forums/viewtopic.php?t=2702
It was rather old, so I updated it for TRichView version 20 compatibility.
Now it supports not only RTF, but also DocX.
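
Putting the replies in this thread together, the DoReaderText handler from the first post might end up looking like the sketch below. It is not the code of the updated demo. The local variable Reader is a name introduced here for illustration, and accessing RTFState through the same typecast as FontTable is an assumption: the reply above only shows the FontTable line. The declaration in the class must use the same TCustomRVMSWordReader parameter type.

Code: Select all

procedure TRTFConverter.DoReaderText(Sender: TCustomRVMSWordReader; const Text: TRVAnsiString;
  Position: TRVRTFPosition);
var
   Reader: TRVRTFReader; // illustrative local name, not from the original code
   TextW: TRVUnicodeString;
   CodePage: Cardinal;
begin
   if (Position <> rtf_ts_ContinuePara) and not FFirstLine then begin
      FRTFText := FRTFText + #13#10;
   end;
   FFirstLine := False;
   // Sender is declared as the common base class; this converter only creates a TRVRTFReader
   Reader := TRVRTFReader(Sender);
   if Reader.FontTable.Count = 0 then begin
      CodePage := CP_ACP;
   end
   else begin
      // assumption: RTFState is reached through the same typecast as FontTable
      CodePage := RVU_Charset2CodePage(Reader.FontTable[Reader.RTFState.CharProps.FontIndex].Charset);
   end;
   // RVU_RawUnicodeToWideString is no longer called
   TextW := RVU_AnsiToUnicode(CodePage, Text);
   FRTFText := FRTFText + TextW;
end;

The hard typecast is only safe as long as the handler is attached to a TRVRTFReader, as it is in ExtractTextFromRTF.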