<?xml version="1.0" encoding="utf-8"?><!-- generator="wordpress/1.5.1.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: An implementation of a Copernic Desktop Search Custom Extractor in C#</title>
	<link>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/</link>
	<description>General musings and programming stuff</description>
	<pubDate>Sat, 22 Nov 2008 05:36:55 +0000</pubDate>
	<generator>http://wordpress.org/?v=1.5.1.3</generator>

	<item>
		<title>by: Tomer Gabel</title>
		<link>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-186321</link>
		<pubDate>Thu, 17 Jul 2008 08:41:38 +0000</pubDate>
		<guid>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-186321</guid>
					<description>In case someone still needs this, the URL for the .NET CF version of the IStream wrapper has moved to http://www.tomergabel.com/ManagedIStreamWrapperForNETCompactFramework20.aspx</description>
		<content:encoded><![CDATA[	<p>In case someone still needs this, the URL for the .NET CF version of the IStream wrapper has moved to <a href='http://www.tomergabel.com/ManagedIStreamWrapperForNETCompactFramework20.aspx'>http://www.tomergabel.com/ManagedIStreamWrapperForNETCompactFramework20.aspx</a>
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Oliver Sturm</title>
		<link>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-63408</link>
		<pubDate>Tue, 12 Dec 2006 10:54:57 +0000</pubDate>
		<guid>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-63408</guid>
					<description>Tomer, thanks for posting this, and also for your work. 

You're the second guy to comment on the captcha, so I decided to fiddle with the settings somewhat to make it easier to read in the majority of cases. Before, it managed to kill at least 99.9% of the comment spam coming this way (and I know that this means thousands of spams, from the time I didn't have a captcha) - I'll see how well it does now. Thanks for the feedback in any case!</description>
		<content:encoded><![CDATA[	<p>Tomer, thanks for posting this, and also for your work. </p>
	<p>You&#8217;re the second guy to comment on the captcha, so I decided to fiddle with the settings somewhat to make it easier to read in the majority of cases. Before, it managed to kill at least 99.9% of the comment spam coming this way (and I know that this means thousands of spams, from the time I didn&#8217;t have a captcha) - I&#8217;ll see how well it does now. Thanks for the feedback in any case!
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Tomer Gabel</title>
		<link>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-63407</link>
		<pubDate>Tue, 12 Dec 2006 10:33:30 +0000</pubDate>
		<guid>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-63407</guid>
					<description>Thanks for the IStream wrapper - it saved me an awful lot of work. I've adapted it to work on the .NET Compact Framework 2.0; if you're interested, it's posted &lt;a href=&quot;http://www.tomergabel.com/Managed+IStream+Wrapper+For+NET+Compact+Framework+20.aspx&quot;&gt;here&lt;/a&gt; with full credit. Thanks again for publishing your findings!

(BTW, your Captcha provider is horrible - the text is barely legible, and I think it would still be easier on OCRs than most Captchas. Please consider replacing it?)</description>
		<content:encoded><![CDATA[	<p>Thanks for the IStream wrapper - it saved me an awful lot of work. I&#8217;ve adapted it to work on the .NET Compact Framework 2.0; if you&#8217;re interested, it&#8217;s posted <a href="http://www.tomergabel.com/Managed+IStream+Wrapper+For+NET+Compact+Framework+20.aspx">here</a> with full credit. Thanks again for publishing your findings!</p>
	<p>(BTW, your Captcha provider is horrible - the text is barely legible, and I think it would still be easier on OCRs than most Captchas. Please consider replacing it?)
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Stefe</title>
		<link>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-1395</link>
		<pubDate>Tue, 05 Jul 2005 14:59:06 +0000</pubDate>
		<guid>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-1395</guid>
					<description>I can't get it running; I always get a missing GUID reference or so.</description>
		<content:encoded><![CDATA[	<p>I can&#8217;t get it running; I always get a missing GUID reference or so.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Oliver Sturm</title>
		<link>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-420</link>
		<pubDate>Mon, 23 May 2005 14:49:16 +0000</pubDate>
		<guid>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-420</guid>
					<description>Guys, thanks for the updated information on this. I haven't had a look at the newer release versions of CDS and the corresponding APIs for quite a while, because the original purpose of the extension I was looking at got pushed to a lower priority. I appreciate the current information, though!</description>
		<content:encoded><![CDATA[	<p>Guys, thanks for the updated information on this. I haven&#8217;t had a look at the newer release versions of CDS and the corresponding APIs for quite a while, because the original purpose of the extension I was looking at got pushed to a lower priority. I appreciate the current information, though!
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Bloggit</title>
		<link>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-419</link>
		<pubDate>Mon, 23 May 2005 14:45:05 +0000</pubDate>
		<guid>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-419</guid>
					<description>Actually that code won't work with the newest release, due to a new bug in build 646 of Copernic.  And it turns out that if you had implemented IStream::Seek() (as I did), it also wouldn't work due to a bug in their Preview (though indexing works), in the Build 644 they released as 1.5.  And lastly it takes about 20 seconds (and 250 spurious reads) for their preview to return.

All-in-all, there are quite a few issues with Copernic's current API.  I hope they get it debugged soon, but they haven't been very responsive yet.</description>
		<content:encoded><![CDATA[	<p>Actually that code won&#8217;t work with the newest release, due to a new bug in build 646 of Copernic.  And it turns out that if you had implemented IStream::Seek() (as I did), it also wouldn&#8217;t work due to a bug in their Preview (though indexing works), in the Build 644 they released as 1.5.  And lastly it takes about 20 seconds (and 250 spurious reads) for their preview to return.</p>
	<p>All-in-all, there are quite a few issues with Copernic&#8217;s current API.  I hope they get it debugged soon, but they haven&#8217;t been very responsive yet.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: ian thomas</title>
		<link>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-359</link>
		<pubDate>Thu, 05 May 2005 08:51:01 +0000</pubDate>
		<guid>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-359</guid>
					<description>A couple of things, here. First, your remark &quot;there’s no way to visualise your data 
in Copernic up to now&quot;. There is a way - 
 
I have followed, in recipe fashion, the approach used by 'Pythonner' 
(http://pythonner.blogspot.com/2005/03/copernic-desktop-search-plug-in.html) 
where he presents a quick-and-dirty solution for indexing and previewing 
of .PS (postscript) files with CDS. 

This uses a C# wrapper. I have compiled, signed, installed this and it all works 
as expected on my Win2K system with Copernic Desktop Search installed.

Because in this special case, GhostScript is used and it has a mechanism to process
any .PS files using a conversion file (PS2ASCII.PS) and a range of command parameters,
producing a plain text stream, search and preview are possible.
(I used GhostScript 8.15).  

Using something like GhostScript is a very heavy burden (several MB of DLL and files, 
file locations to be explicitly defined, etc) but I guess it's no worse than using 
the Adobe iFilter for Acrobat 7, when using Microsoft's OS Indexing Service: 
that's 7 MB, as I recall. 

I might use the same approach, since it is quite feasible for an application I have 
in mind to extract quite particular text from a GIS document file, using an installed
application (which is closer to 70 MB in size, as it happens). 

Secondly, I'm a real novice with COM - I can *just* follow what's in your code. 
First, I need to implement what you've written in your 3 March 2005 blog article 
before scratching my head about how to 'simply “append” content from other sources 
to the data from the file itself', as you say. That part's relatively easy, I guess.

Thanks for the helping hand.</description>
		<content:encoded><![CDATA[	<p>A couple of things, here. First, your remark &#8220;there’s no way to visualise your data<br />
in Copernic up to now&#8221;. There is a way - </p>
	<p>I have followed, in recipe fashion, the approach used by &#8216;Pythonner&#8217;<br />
(http://pythonner.blogspot.com/2005/03/copernic-desktop-search-plug-in.html)<br />
where he presents a quick-and-dirty solution for indexing and previewing<br />
of .PS (postscript) files with CDS. </p>
	<p>This uses a C# wrapper. I have compiled, signed, installed this and it all works<br />
as expected on my Win2K system with Copernic Desktop Search installed.</p>
	<p>Because in this special case, GhostScript is used and it has a mechanism to process<br />
any .PS files using a conversion file (PS2ASCII.PS) and a range of command parameters,<br />
producing a plain text stream, search and preview are possible.<br />
(I used GhostScript 8.15).  </p>
	<p>Using something like GhostScript is a very heavy burden (several MB of DLL and files,<br />
file locations to be explicitly defined, etc) but I guess it&#8217;s no worse than using<br />
the Adobe iFilter for Acrobat 7, when using Microsoft&#8217;s OS Indexing Service:<br />
that&#8217;s 7 MB, as I recall. </p>
	<p>I might use the same approach, since it is quite feasible for an application I have<br />
in mind to extract quite particular text from a GIS document file, using an installed<br />
application (which is closer to 70 MB in size, as it happens). </p>
	<p>Secondly, I&#8217;m a real novice with COM - I can *just* follow what&#8217;s in your code.<br />
First, I need to implement what you&#8217;ve written in your 3 March 2005 blog article<br />
before scratching my head about how to &#8216;simply “append” content from other sources<br />
to the data from the file itself&#8217;, as you say. That part&#8217;s relatively easy, I guess.</p>
	<p>Thanks for the helping hand.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Oliver Sturm</title>
		<link>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-219</link>
		<pubDate>Thu, 28 Apr 2005 11:45:25 +0000</pubDate>
		<guid>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-219</guid>
					<description>The code works with the final 1.5 release just as well as it did with the beta. 

About indexing alternate data streams... you'd have to combine the content from the file with that from the alternate stream, I guess. It would be easy to write an extension to my IStreamWrapper that would simply &quot;append&quot; content from other sources to the data from the file itself. Copernic won't be able to distinguish between data from the file and data from alternate streams, but as there's no way to visualise your data in Copernic up to now, this shouldn't be a great problem.</description>
		<content:encoded><![CDATA[	<p>The code works with the final 1.5 release just as well as it did with the beta. </p>
	<p>About indexing alternate data streams&#8230; you&#8217;d have to combine the content from the file with that from the alternate stream, I guess. It would be easy to write an extension to my IStreamWrapper that would simply &#8220;append&#8221; content from other sources to the data from the file itself. Copernic won&#8217;t be able to distinguish between data from the file and data from alternate streams, but as there&#8217;s no way to visualise your data in Copernic up to now, this shouldn&#8217;t be a great problem.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: ian thomas</title>
		<link>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-218</link>
		<pubDate>Thu, 28 Apr 2005 11:39:27 +0000</pubDate>
		<guid>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-218</guid>
					<description>I've become interested in Copernic after a long absence (v1.2), and was thinking that I should use MS Indexing Service (on 2000 / XP / 2003 systems and NTFS volumes only) instead.
I have a couple of quick questions - 
1. Have you tried your C# code with the actual 1.5 release?
2. Have you got any ideas on how to implement indexing of NTFS alternate data streams, using Copernic?

I'll be in touch later - once I have done a few tests with Copernic DS v1.5

Regards

Ian Thomas</description>
		<content:encoded><![CDATA[	<p>I&#8217;ve become interested in Copernic after a long absence (v1.2), and was thinking that I should use MS Indexing Service (on 2000 / XP / 2003 systems and NTFS volumes only) instead.<br />
I have a couple of quick questions -<br />
1. Have you tried your C# code with the actual 1.5 release?<br />
2. Have you got any ideas on how to implement indexing of NTFS alternate data streams, using Copernic?</p>
	<p>I&#8217;ll be in touch later - once I have done a few tests with Copernic DS v1.5</p>
	<p>Regards</p>
	<p>Ian Thomas
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Oliver Sturm&#8217;s weblog - Copernic Desktop Search 1.5 has been released</title>
		<link>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-56</link>
		<pubDate>Thu, 31 Mar 2005 18:05:30 +0000</pubDate>
		<guid>http://www.sturmnet.org/blog/archives/2005/03/03/cds-csharp-extractor/#comment-56</guid>
					<description>[...] DS since the beta and they haven&#8217;t taken up much that was suggested at that time. My C# custom extractor for CDS still runs, but they haven&#8217;t done anythi [...]</description>
		<content:encoded><![CDATA[	<p>[&#8230;] DS since the beta and they haven&#8217;t taken up much that was suggested at that time. My C# custom extractor for CDS still runs, but they haven&#8217;t done anythi [&#8230;]
</p>
]]></content:encoded>
				</item>
</channel>
</rss>
