EggSqliteStore, A GtkTreeModel ontop of SQLite3.

The other day, Saturday to be exact, I was trying to come up with ways to load an immense amount of data into a GtkTreeView without loading huge amounts of memory. I've come up with part one of what I hope to be a 2 part series in dealing with this.

So what exactly is the problem? In my case, I want to have aggregated RSS data from every feed I regularly collect stored in a sqlite database. There are lots of reasons for this, perhaps the most relevant is that python pickling is far from my desired data format. So with this being said, I found that a couple weeks of data was huge, and not something I wanted to persist in memory.

This brought me to write a GtkTreeModel which gets its data directly from sqlite. The benefit here is that strings are only loaded into memory when the user moves the GtkTreeView to the visible rows. What this doesn't solve, its hundreds of SQL queries and parsing of SQL queries per second.

So step two here is to put a basic caching front on all these SQL queries. I decided to use two separate b-trees for indexing, as GtkTreeModel seems to ask for data in vastly different ways depending on the context. I found the most commonly used path was to ask for a row at offset N, while the other is finding the next iter in a sequence.

Therefore, one b-tree is the loaded data with the offset from position 0 as the key. This is an integer key of course, so GINTTOPOINTER and vice-versa is used if you have any need to look at the code. The second index (or as coded the primary index) is storing pointers to the same data, but with the rows `oid' from sqlite as the key. While the oid is always an integer as far as I can tell, I choose to have the index as a gchar*.

Now, you may ask, if the rows oid are in ascending order, why do you need two indexes again? Simple, if a user deletes data, you would obviously have gaps in the sequence and the offset would not be accurate.

So in part 2, I hope to add a LRU or ARC based cache (ARC actually uses 4 LRU's to self balance). This will allow us to free the least hit cached items and have a relative variance of loaded memory. The number of void pointers loaded will be exact, but I would imagine the strings they point to are relatively similar in size over a large sample set.

Anyway, here is a quick sample of how you might use it. I'm including absolutely no gui code in this to keep it brief.

Oh, and you can grab the code here. I probably wont update this, but look for a release of an app soon that may be using this. Who knows though, I got a lot on my plate now with getting some new exciting stuff started.

#include <stdlib.h>

#include <glib.h>
#include <gtk/gtk.h>

#include "egg-sqlite-store.h"

gint
main (gint argc, gchar **argv)
{
  GtkTreeModel *tree_model;
  GError *error;

  if (argc < 3)
    {
      g_print ("Usage: %s filename tablename\\n", argv[0]);
      return EXIT_FAILURE;
    }

  gtk_init (&argc, &argv);

  tree_model = egg_sqlite_store_new ();

  error = NULL;
  egg_sqlite_store_set_filename (EGG_SQLITE_STORE (tree_model), argv[1]);

  if (error)
    {
      g_print ("Error: %s\\n", error->message);
      g_error_free (error);
      return EXIT_FAILURE;
    }

  error = NULL;
  egg_sqlite_store_set_table (EGG_SQLITE_STORE (tree_model), argv[2]);

  if (error)
    {
      g_print ("Error: %s\\n", error->message);
      g_error_free (error);
      return EXIT_FAILURE;
    }

  g_print ("Opened %s successfully and introspected %s.\\n", argv[1], argv[2]);

  return EXIT_SUCCESS;
}

-- Christian Hergert 2007-05-30

Back to Index