Kinook Software Forum

  #4  
01-24-2007, 06:04 PM
danson
Registered User
 
Join Date: 01-10-2006
Posts: 96
I bet there is some way to make this even cleverer.

Can you think of some kind of data structure that indexes not only which words occur in which documents but also each word's offset from the beginning of the document?

I suppose the current index looks like:

WORD DOCUMENT-ID
================
wordA: 2 5 9 1 3
wordB: 2 12 99 293

You could update the index to show not just which documents the word appears in but also its position:

wordA: 2(4) 5(29)...
wordB: 2(5) 9(23)...

So wordA occurs in document 2, offset 4 and document 5, offset 29.
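
As a rough sketch of what such an index might look like in code (Python here, purely for illustration; the function name `build_positional_index` and the naive whitespace tokenizer are my own assumptions, not anything Ultra Recall is known to do):

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each word to {document id: [word offsets within that document]}.

    Hypothetical helper: `docs` is a dict of document id -> text, and
    words are split on whitespace only (no stemming or case folding).
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for offset, word in enumerate(text.split()):
            # Offsets are appended in increasing order, so each list stays sorted.
            index[word].setdefault(doc_id, []).append(offset)
    return index

# Two made-up documents, keyed by document id.
docs = {
    2: "the quick wordA wordB jumps",
    5: "wordB then later wordA",
}
index = build_positional_index(docs)
print(index["wordA"])  # {2: [2], 5: [3]}
print(index["wordB"])  # {2: [3], 5: [0]}
```

Offsets here count words rather than characters, which is what makes "differ by 1" mean "adjacent words".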

Then searching for the phrase "wordA wordB" would simply be a case of finding the documents that contain both words and checking for offsets that differ by 1 (or perhaps by up to some tolerance factor).

That final comparison can probably also be optimised with the right algorithm.
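
One possible way to do that comparison, assuming each word's offsets are kept sorted per document, is to walk the two offset lists in step instead of comparing every pair. This is only a sketch: `phrase_match` and its `tolerance` parameter are hypothetical names, and the index literal just reuses the example entries above.

```python
def phrase_match(index, first, second, tolerance=0):
    """Return sorted ids of documents where `second` follows `first`.

    A match means `second` occurs between 1 and 1 + tolerance words
    after `first`. `index` maps word -> {doc_id: sorted offsets}.
    """
    postings_a = index.get(first, {})
    postings_b = index.get(second, {})
    hits = []
    # Only documents that contain both words can match.
    for doc_id in sorted(postings_a.keys() & postings_b.keys()):
        offsets_a, offsets_b = postings_a[doc_id], postings_b[doc_id]
        i = j = 0
        while i < len(offsets_a) and j < len(offsets_b):
            gap = offsets_b[j] - offsets_a[i]
            if 1 <= gap <= 1 + tolerance:
                hits.append(doc_id)
                break
            if gap < 1:
                j += 1  # second word is at or before the first: advance it
            else:
                i += 1  # second word is too far ahead: advance the first
    return hits

# The example entries from above: word -> {document id: [offsets]}.
index = {
    "wordA": {2: [4], 5: [29]},
    "wordB": {2: [5], 9: [23]},
}
print(phrase_match(index, "wordA", "wordB"))  # [2]
```

Because both lists are sorted, each offset is visited at most once, so the check per document is linear in the number of occurrences rather than quadratic.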

Perhaps, though, you already do something much cleverer...

Daniel
All times are GMT -5.


Copyright © 1999-2023 Kinook Software, Inc.