URD is Bloating


tfjern
07-05-2008, 07:54 PM
Not much activity on the forum these days, so here is a minor contribution. I recently created a UR database, and I decided to take someone's advice (which I read on the forum) that it is better to link rather than store material (especially PDFs and the like). But I noticed that the file is already 80 megs, and I've barely started. Kinook cautions that things start to slow down ("degrade" is the word he uses) after 200 megs, though the maximum size for a UR file is much larger.

My question: how can my UR database be so large already when it consists mostly of links? At this rate of bloat I expect things such as searches will start to suffer major degradation.

quant
07-06-2008, 03:21 AM
Did you delete a large amount of data? If YES, go to Tools->Compact and Repair and choose "Compact Database":

http://www.kinook.com/UltraRecall/Manual/compactandrepairdialog.htm

"Compact database: If this check box is checked when OK is clicked, the current Info Database will be shrunk, removing any free space.

Note: Ultra Recall uses a highly efficient, binary format for Info Database files. As Info Items are deleted, empty space remains within the file, and is reused when new data is added to the Info Database. If you delete a large amount of data, you can immediately shrink the Info Database, removing this free space by using the provided Compact functionality."
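Ultra Recall's .urd files appear to be SQLite databases (Kinook later in the thread suggests using SQLite on them directly), so the free-space behaviour the manual describes can be sketched with plain SQLite from Python. This is a minimal sketch, assuming Compact works like SQLite's VACUUM; the table name and sizes are made up for illustration. Deleting rows leaves the file size unchanged, and only VACUUM shrinks it.

```python
import os
import sqlite3
import tempfile


def demo_compact() -> tuple[int, int, int]:
    """Return file sizes (after insert, after delete, after VACUUM) for a throwaway SQLite file."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        conn.execute("PRAGMA auto_vacuum = NONE")  # keep freed pages inside the file
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, body BLOB)")
        # About 5 MB of stand-in "stored documents"
        conn.executemany(
            "INSERT INTO items (body) VALUES (?)",
            [(os.urandom(50_000),) for _ in range(100)],
        )
        conn.commit()
        after_insert = os.path.getsize(path)

        conn.execute("DELETE FROM items")  # pages go on the free list; the file does not shrink
        conn.commit()
        after_delete = os.path.getsize(path)

        conn.execute("VACUUM")  # rebuild the file without the free pages
        after_vacuum = os.path.getsize(path)
        conn.close()
        return after_insert, after_delete, after_vacuum
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo_compact())
```

The free pages left by the delete are what SQLite reuses for new data, which matches the manual's note that empty space "is reused when new data is added".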

tfjern
07-06-2008, 05:31 AM
Thanks, but I had already looked into that. I'll just have to be sure I don't store but link whenever possible. Be nice if UR had global searches across databases. Maybe someday.

I just noticed something. In Database Properties my URD reads:

1. File Size: 74,345,688
2. Stored Document Size: 387,992
3. Item Rich Text Text: 27,203,470
4. Icon Data: 235,373

Now 1. indicates the total of both stored and linked files. I was wondering if this item is the one to start being concerned about when the file size exceeds 200 megs?

Also, how can 3. (Rich Text Stored) exceed 2. (Stored Doc. Size)? Something's wrong here.

quant
07-06-2008, 06:02 AM
hmmm, ok. So when you go to file->properties what is the size of Stored Documents?

PS:
I don't know if you were referring to my comments about storing/linking preferences. I don't think search speed will deteriorate much as the file size increases (since everything is indexed), i.e. that's not why I personally prefer linking to storing. My main concern is the quality of the search results. When I link the files, they remain available to other applications, in my case Archivarius3000, which has far better search capabilities than UR (relevance ranking, highlighted fragments of found terms, morphology, searching for keywords close to each other, ...)

tfjern
07-06-2008, 06:05 AM
Quant, could you look at the edited post please?

quant
07-06-2008, 06:20 AM
Originally posted by tfjern

Now 1. indicates the total of both stored and linked files.

No, 1. is the URD file size (linked files are not included).

Originally posted by tfjern
Also, how can 3. (Rich Text) exceed 2. (Stored Doc. Size)?

3. is the text in your Text, Folder, etc. items, i.e. all those derived from the Text core template. Remember, even if your item has only a few words, the RTF is huge!
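A rough Python sketch of why even a short item's rich text is larger than its words. The RTF header below is a hand-written minimal example (real editors emit considerably more markup), and zlib only stands in for whatever compression UR actually uses, which the thread does not name.

```python
import zlib

# Hand-written minimal RTF wrapper (illustrative only; real editors add far more markup)
RTF_HEADER = r"{\rtf1\ansi\deff0{\fonttbl{\f0\fswiss Arial;}}\pard\f0\fs20 "
RTF_FOOTER = r"\par}"


def rtf_sizes(text: str) -> tuple[int, int, int]:
    """Return (plain text bytes, RTF bytes, zlib-compressed RTF bytes)."""
    plain = text.encode("ascii")
    rtf = (RTF_HEADER + text + RTF_FOOTER).encode("ascii")
    return len(plain), len(rtf), len(zlib.compress(rtf, 9))


if __name__ == "__main__":
    for sample in ["Call Bob.", "Meeting notes: review the database size. " * 50]:
        plain, rtf, packed = rtf_sizes(sample)
        print(f"plain={plain} rtf={rtf} compressed={packed}")
```

For a two-word note the markup dominates; for a long, repetitive note compression claws most of it back.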



I'm still wondering why your file size is so big ... what was your urd file size before compacting? When you did compact, did it show something like "80% shrunk"?

tfjern
07-06-2008, 06:53 AM
First, you said that 1. File Size does NOT include linked files, but when I tested this by linking five pdf files, the only change in the four sizes occurred in the first, 1. File Size.

Second, how can Stored Document Size be smaller than Item Rich Text? This doesn't make sense.

Third, upon compacting no percentage change was indicated, since, as I said, almost all the data in the database is linked, not stored.

quant
07-06-2008, 07:12 AM
Originally posted by tfjern
First, you said that 1. File Size does NOT include linked files, but when I tested this by linking five pdf files, the only change in the four sizes occurred in the first, 1. File Size.
You probably index them; that's why the urd increases, but it certainly doesn't include the file sizes themselves. How many keywords does your urd file have (File->Properties)? This is probably the answer to your 80MB urd file size.

Originally posted by tfjern
Second, how can Stored Document Size be smaller than Item Rich Text? This doesn't make sense.
because they are most probably zipped

Originally posted by tfjern
Third, upon compacting no percentage change was indicated, since, as I said, almost all the data in the database is linked, not stored.
OK, I thought you had changed your urd file: first removing the items that were stored, and then linking them.

Jon Polish
07-06-2008, 11:11 AM
I think the bloat you are observing is due to keywords. I have observed that my databases are at least twice the expected size.

I would try this experiment. Before attempting this, backup your database!

1. In the explorer pane, select all items.

2. Go to Keywords and delete all of them.

3. Compact this database.

4. Compare the sizes of the newly created, keyword-less database with your backup copy.

I would be interested if you could post your results.

Jon

tfjern
07-06-2008, 06:40 PM
OK, I selected everything in Data Explorer. Then went to Item / Keywords, but both panes -- user-defined keywords and auto-generated keywords -- were empty. Delete what?

On the other hand, if I press Control + K on a particular item (linked, by the way), I get a list of auto-generated keywords (which seems to vary item by item, though not always!). Am I supposed to go through the entire database and delete the item keywords? Please say no.

Also, in Properties it says I have 1.1 million keywords! Nice.

So how, pray tell, am I supposed to delete these keywords?

You would think Kinook would address this problem in a more serious manner, but I get the feeling that after creating a great piece of software, UR, they feel their job is done, and the users should be savvy enough to figure out things on their own. And they do, albeit usually serendipitously, or more often on the forum. I have a sinking feeling keywords is a flawed concept or at least a work in progress.

quant
07-07-2008, 02:37 AM
Originally posted by tfjern
You would think Kinook would address this problem in a more serious manner, but I get the feeling that after creating a great piece of software, UR, they feel their job is done, and the users should be savvy enough to figure out things on their own. And they do, albeit usually serendipitously, or more often on the forum. I have a sinking feeling keywords is a flawed concept or at least a work in progress.

You set UR to keyword your documents and you have 1 million keywords, so no wonder your file is 80MB; in return, they make searches lightning fast.
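The tradeoff quant describes is the classic inverted-index one: storing a word-to-item map costs space, but a search becomes a lookup instead of a rescan of every document. A generic sketch in Python (not UR's actual storage format, which the thread does not describe; the documents are made up):

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: each word maps to the ids of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def index_postings(index: dict[str, set[int]]) -> int:
    """Rough storage cost of the index: one entry per (word, document) pair."""
    return sum(len(ids) for ids in index.values())


def search_scan(docs: dict[int, str], word: str) -> set[int]:
    """No index: re-read every document on every query."""
    return {doc_id for doc_id, text in docs.items() if word.lower() in tokenize(text)}


def search_index(index: dict[str, set[int]], word: str) -> set[int]:
    """With an index: a single dictionary lookup."""
    return set(index.get(word.lower(), set()))


docs = {
    1: "Compact the database after deleting items",
    2: "Linked PDF files are keyworded too",
    3: "Keywords make the database larger but searches faster",
}
index = build_index(docs)
print(search_scan(docs, "database"), search_index(index, "database"))  # {1, 3} {1, 3}
print(index_postings(index))
```

Both searches return the same items; the index simply trades extra stored entries for not re-reading the documents, and that extra storage grows with the number of keywords.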

Unfortunately, Jon Polish was not right, because that's not the way to delete keywords. To delete them, first set which file extensions should not be keyworded, then select the items and resynchronize ...

Everything is in the help file!!! And it's right where you'd look first: auto-generated keywords. So if by "savvy enough" you mean someone who knows how to use the help file, then I share your frustration ;-)

"Auto-generated Keywords can't be manually added (see User-Defined Keywords) but can be deleted (they are automatically replaced when the Info Item is Synchronized)."

tfjern
07-07-2008, 03:46 AM
The UR help file stinks -- and this is the consensus, excluding the outliers. Too often the explanations there are as clear as mud.

For a simple example, why is my Stored Document Size: 387,992 smaller than my Item Rich Text Text: 27,203,470?

You wrote, "Everything is in the help file!!!" Really? Where?

quant
07-07-2008, 04:08 AM
Originally posted by tfjern
The UR help file stinks -- and this is the consensus, excluding the outliers. Too often the explanations there are as clear as mud.

For a simple example, why is my Stored Document Size: 387,992 smaller than my Item Rich Text Text: 27,203,470?

You wrote, "Everything is in the help file!!!" Really? Where?
Please calm down ... I don't take away from you your opinion on UR help file, mine is simply different. So I repeat mine, UR help file is very good and comprehensive. If Kinook is planning to make it even better, that's great!

So in help file, you can read:

Stored Document Size: The combined size of all stored documents.

Item Rich Text: The combined size of all rich text stored.

Something unclear?

tfjern
07-07-2008, 07:22 AM
I am calm. Annoyed, perhaps, but still calm.

You borrowed two definitions from your crystal-clear "help" file ("Mine is simply different"), but you still haven't answered my simple question:

viz., why is my Stored Document Size (387,992) smaller than my Item Rich Text (also stored, but NOT zipped) (27,203,470)?

Logically, shouldn't the size of the latter be smaller than the former, or am I missing something? Or perhaps we have entered the mysterious realm of quantum computing and Heisenberg's Uncertainty Principle has taken effect.

Kinook, are you there? Hello?

Jon Polish
07-07-2008, 07:37 AM
Originally posted by quant
Unfortunately, Jon Polish was not right, because that's not the way to delete keywords. To delete them, first set which file extensions should not be keyworded, then select the items and resynchronize ...

Hi Quant:

I don't quite understand why this is not correct. UR cannot keyword all files (WordPerfect for example), but those that it can (text based pdf, Word, etc.) do display for me. The keywords appear using the method I suggested whether the items are stored or linked.

Jon

kinook
07-07-2008, 07:39 AM
There is no relationship between 'Stored Document Size' and 'Item Rich Text' in the File | Properties dialog, nor any reason to expect one to be larger or smaller than the other.

Stored Document Size is the total size of any documents stored in the .urd file (PDF files, stored web pages, Word documents, etc.).

Item Rich Text is the total size of any rich text in the .urd file (from non-document items [i.e., Text, Task, Contact, etc.] and Item Notes).

And both documents and rich text are stored compressed in the .urd file.

These numbers don't include space used to index keywords in documents (whether stored or linked) or rich text. If linked documents are keyworded by UR, that can add to the size of the .urd file (and make searches much quicker -- there is always a tradeoff between size and speed).

You could use SQLite to delete all auto-generated keywords from a database: http://www.kinook.com/Forum/showthread.php?threadid=2825 (and then compact the database).
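The linked thread has the actual SQL for Ultra Recall's schema, which is not reproduced here. The sketch below only illustrates the general pattern (back up, delete auto-generated keyword rows, then VACUUM to reclaim the space) against a made-up table named item_keyword with an auto_generated flag. Both names are hypothetical and are not UR's real tables.

```python
import os
import shutil
import sqlite3
import tempfile


def make_demo_db(path: str) -> None:
    """Create a toy database with a HYPOTHETICAL keyword table (not UR's schema)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA auto_vacuum = NONE")  # freed pages stay in the file until VACUUM
    conn.execute(
        "CREATE TABLE item_keyword (item_id INTEGER, keyword TEXT, auto_generated INTEGER)"
    )
    rows = [(i % 500, f"word{i}", 1) for i in range(50_000)]  # auto-generated keywords
    rows += [(i, "project-x", 0) for i in range(500)]          # user-defined keywords
    conn.executemany("INSERT INTO item_keyword VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()


def purge_auto_keywords(path: str) -> tuple[int, int, int]:
    """Back up, delete auto-generated keyword rows, compact.
    Returns (rows deleted, size before, size after)."""
    shutil.copy2(path, path + ".bak")  # back up first, as Jon Polish advises
    size_before = os.path.getsize(path)
    conn = sqlite3.connect(path)
    deleted = conn.execute("DELETE FROM item_keyword WHERE auto_generated = 1").rowcount
    conn.commit()
    conn.execute("VACUUM")  # rebuild the file without the freed pages
    conn.close()
    return deleted, size_before, os.path.getsize(path)


def demo() -> tuple[int, int, int, int]:
    """Run the purge on a throwaway copy; returns (deleted, remaining, size before, size after)."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "demo.urd")
    make_demo_db(path)
    deleted, before, after = purge_auto_keywords(path)
    conn = sqlite3.connect(path)
    remaining = conn.execute("SELECT COUNT(*) FROM item_keyword").fetchone()[0]
    conn.close()
    shutil.rmtree(workdir)
    return deleted, remaining, before, after


if __name__ == "__main__":
    print(demo())
```

User-defined keywords survive the purge; only the auto-generated rows go, and the file shrinks only after the VACUUM step.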

tfjern
07-07-2008, 07:50 AM
Ah, I understand now. What threw me off was that in our beloved "help" file, in the Database Properties Dialog, you can see Stored Doc. Size as 122,907 bytes and Item Rich Text as 32,128 bytes, so I assumed, mistakenly, that the latter should be smaller than the former. You might want to add this information to future new and improved help files because I don't think I'm the only dummy who might be confused by this.

quant
07-07-2008, 07:51 AM
Originally posted by Jon Polish
Hi Quant:

I don't quite understand why this is not correct. UR cannot keyword all files (WordPerfect for example), but those that it can (text based pdf, Word, etc.) do display for me. The keywords appear using the method I suggested whether the items are stored or linked.

Jon
because if you select more than one item, the keyword dialog will show only the common keywords ...
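quant's explanation amounts to a set intersection. A tiny Python model of the behaviour he describes (a model only, not UR's code; the item keyword sets are made up):

```python
def keywords_shown(selection: list[set[str]]) -> set[str]:
    """Keywords the dialog would list for a multi-item selection: those common to all items."""
    if not selection:
        return set()
    return set.intersection(*selection)


def delete_shown(selection: list[set[str]]) -> list[set[str]]:
    """Deleting what the dialog shows removes only the common keywords from each item."""
    shown = keywords_shown(selection)
    return [keywords - shown for keywords in selection]


items = [
    {"database", "compact", "pdf"},
    {"database", "keyword", "pdf"},
    {"database", "index"},
]
print(keywords_shown(items))  # {'database'}
print(delete_shown(items))
```

Keywords unique to one item (compact, keyword, index) and those shared by only some items (pdf) survive, so the experiment shrinks the index without emptying it.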

Jon Polish
07-07-2008, 08:49 AM
Originally posted by quant
because if you select more than one item, the keyword dialog will show only the common keywords ...

Ah, yes. You are correct. This makes sense and I should have realized how this works. Related to this, it is amazing how many common keywords exist and are indexed. So many for me that when I performed my experiment, I thought I was deleting the whole index. Still, the size of my database, once compacted, was reduced dramatically.

I take no particular exception to the added size of the database. My problem has always been with UR's speed. Importing, exporting, and copying from the web have improved over time, but are still quite slow compared to some other programs. One could also argue that fast searching does not require indexing. Just look at programs that do not index (and the relatively small size of their databases). I am referring to Info Select and askSam (the professional version indexes, the other does not).

However, nothing comes close to UR's versatility, consistency, and depth.

Jon