Thread: tag cloud
  #22  
Old 08-04-2007, 08:07 AM
quant
Quote:
Originally posted by janrif
So why do I care how many times the keyword is expressed in each item?

The only thing I would care about is whether the list is ordered correctly, i.e. the most relevant item displayed at the top of the list, with some relevance statistic shown in a column in the search pane.
I don't understand: you ask a question in the first paragraph and answer it yourself in the second.

"Tag cloud" and "relevance" features could be implemented at the same time.

Example:
Imagine you imported several documents dealing with financial analysis and put them in the "new documents" directory. Now you create a search, tick "limit search to siblings" and tick the newly created "keyword frequency analysis" option. The search returns this list:

keyword frequency
dollar 1000
volatility 600
bond 300
market 200
price 100
...

From this search, I know the main topics the documents deal with. If I wanted to learn more about bonds, I'd click on "bond". This would run a search on the same subset of documents, but with the search string "bond", and the result would look like:

item frequency
Bond analysis.pdf 78
Capital markets.pdf 23
Fixed income products.pdf 20
...

So if I want to learn about bonds, the above documents would probably be the best candidates to start with. Without frequency, the search would return 20 documents that contain the keyword "bond" maybe once, and I'd waste time opening each of them to see whether it is the document I need to learn about bonds.

This is a very primitive frequency analysis, easily implemented, and maybe enough for a start. For better search results, I decided to only link documents to UR, so that I can search them with SearchInform, which offers much more: morphology, exact, containing, phrase, ... search.
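
To illustrate how little is needed for the two searches described above, here is a rough Python sketch. It assumes the documents are already available as plain text; the names `keyword_frequencies`, `rank_items`, the tiny stop-word list and the sample documents are all made up for this example and have nothing to do with how UR or SearchInform actually work.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_frequencies(docs):
    """Count how often each non-stop word occurs across all documents."""
    totals = Counter()
    for text in docs.values():
        totals.update(w for w in tokenize(text) if w not in STOP_WORDS)
    return totals

def rank_items(docs, keyword):
    """List (name, count) for documents containing the keyword, highest count first."""
    keyword = keyword.lower()
    counts = {name: tokenize(text).count(keyword) for name, text in docs.items()}
    hits = [(name, n) for name, n in counts.items() if n > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Made-up sample data standing in for the imported documents.
docs = {
    "Bond analysis.txt": "bond bond bond price market",
    "Capital markets.txt": "market bond volatility",
    "Notes.txt": "dollar dollar price",
}

# First search: the "keyword / frequency" list over the whole subset.
print(keyword_frequencies(docs).most_common(3))

# Second search: the "item / frequency" list for one clicked keyword.
print(rank_items(docs, "bond"))
```

Note that this only counts exact word matches, so "bonds" would not be counted under "bond". That is the kind of gap a morphology-aware search is meant to close.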