Why did we choose a binary data format for Ultra Recall databases?

Posted January 24th, 2006 by kevin

An Ultra Recall database (.urd) is stored as a modified SQLite database file. There are several good reasons for storing Ultra Recall data in a real database as opposed to a flat text or XML file:

  • Instant load time: Only the expanded portion of the Search and Explorer trees and the detail information for the last selected item need to be loaded at startup, which is much quicker than loading and parsing an entire text or XML file, especially as your database grows.
  • Performance: With a true database engine, the time required to add, modify, or even retrieve data remains constant as the database grows. Storing to a flat text or XML file causes save time to increase with the size of the file.
  • Indexed searches: Since .urd files are actually binary database files, they can be indexed in multiple ways, in some cases increasing the performance of searches by a factor of 100 or more.
  • Smaller files: Most human-readable formats such as XML are not very space efficient, with repetitive text consuming significant amounts of space and little opportunity for compression. An Ultra Recall database file is highly efficient in size, and all stored binary data is compressed. The net result is that, quite often, documents stored in Ultra Recall (including meta data and indices) consume less space than the original files on disk.
  • Data consistency: Due to the additional information stored (attributes, keywords, logical links, etc.) for every item of data added to Ultra Recall, multiple database changes are required for most operations. Using a database with atomic commits ensures that either all or none of the changes in an overall update are applied; other file formats do not provide this capability.
    However, we do understand users’ concerns about having their data locked into our database format, which is why Ultra Recall provides the ability to export all data with full fidelity (organizational hierarchy including logical linking references, all stored document data, rich text details and notes, meta data, item attributes, etc.) in XML format. If you ever want to migrate your data to another application (provided that you can find one that can represent and manage all of the data Ultra Recall handles), you can simply export to XML and then import into that application (tweaking the data format if necessary, depending on the import formats it supports). The XML dialect generated by Ultra Recall export is an extension of the OML standard, which by itself is rather limited and can’t express all of the data and relationships that Ultra Recall defines.

    Because our modified SQLite format uses a custom header signature, supports encryption (which isn’t included in the open source version of SQLite), and uses ZIP compressed storage for blob information (documents, icons, rich text, etc.), the data is not readily accessible via standard SQLite APIs at this time. However, we plan to expose the data in a read-only fashion via an API and/or ODBC driver in a future release.
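
    To make the indexing and atomic-commit points above a bit more concrete, here is a minimal sketch using stock SQLite through Python's built-in sqlite3 module. It does not read .urd files (which, as noted, aren't accessible via standard SQLite APIs), and the table, column, and function names are hypothetical, chosen only for illustration; they say nothing about Ultra Recall's actual schema.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT Ultra Recall's real layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE keywords (item_id INTEGER NOT NULL, word TEXT NOT NULL);
    CREATE INDEX idx_keywords_word ON keywords(word);
""")


def add_item(conn, title, words):
    """Insert an item plus its keywords as one all-or-nothing update."""
    # The connection's context manager commits if the block succeeds
    # and rolls back every change in it if an exception escapes.
    with conn:
        cur = conn.execute("INSERT INTO items (title) VALUES (?)", (title,))
        conn.executemany(
            "INSERT INTO keywords (item_id, word) VALUES (?, ?)",
            [(cur.lastrowid, word) for word in words],
        )


add_item(conn, "Meeting notes", ["project", "budget"])

# This update fails partway through: None violates NOT NULL on keywords.word.
# The item row and the "ok" keyword from the same transaction are undone too.
try:
    add_item(conn, "Broken item", ["ok", None])
except sqlite3.IntegrityError:
    pass

item_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
keyword_count = conn.execute("SELECT COUNT(*) FROM keywords").fetchone()[0]
print(item_count, keyword_count)

# Ask SQLite how it would run a keyword lookup; the plan text should name
# the index rather than describe a full scan of the keywords table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT item_id FROM keywords WHERE word = ?",
    ("budget",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

    After the failed update, only the first item and its two keywords remain, which is the "all or none" behavior described above. A flat text or XML file would need its own scheme to get the same guarantee.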

  • One Comment on “Why did we choose a binary data format for Ultra Recall databases?”

    1. Adrian McNeil Says:

      I just can’t resist that “be the first to comment” suggestion.

      However, this is the second time I’ve been the first comment on this blog post. You reposted, I think.

      The first time, I was pleased to discover that it is possible to drop out the entire database in an XML form, for some sort of statistical processing. I have now found the name of the multivariate analysis software which is designed to draw inferences from the pattern of data stored in a set of EndNote abstract fields. I think it is called VizRef and is also by Thomson research. Their web site shows how you can display and elicit related/common themes from your EndNote abstracts.

      PS I recall some discussion about comparing Abstract fields. Actually text fields, generally. People were suggesting that the whole process breaks down if some of your fields have only one or two words in them, or if some have several thousand. I think they suggested about 300 to 500 words was a good choice, but it may have been 50 words. Having a fairly consistent size is probably more important than exactly what the size is. I would guess that they also drop a large number of common words before they do the analysis.

      Until now I have used EndNote as my Ultra Recall. It was fairly convenient to hold selected full-text information in the abstract field, and of course it does bibliographies.

      I have just bought Ultra Recall, and it works fine with Firefox. However, there is almost no discussion about how to do bibliographies in Ultra Recall.

      Since I get EndNote free from the University, I will be looking for some way to set up and export something EndNote can read. Then I don’t have to worry about the bibliography stuff that other people have already done for me.

      warm regards,
      Adrian
