Skip to content

Nextgen

vmx edited this page Feb 5, 2013 · 1 revision

Next generation GeoCouch

Current plan

Creating an index structure similar to LSM-trees. You won't have dynamic inserts but only statically built (bulk loaded) trees.

Each tree can held an exponentially higher number of nodes. Whenever it overlows, it will be merged with the next bigger one.

Queries

For querying your data, you need to query all files. The scan through the files will happen sequentially, starting with the smallest one. This is important for the way deletions work.

Insertion

Try to insert your data into the first tree. If it is empty you are done, else it overflows and you need to merge it with the next one. Keep on doing the merging until there's a place where all nodes fit in.

Deletion

The deletion works with writing tomb stones. You get the needed information from the back-index (which is a B-tree having a compound key containing the partition and the Document ID)

Thanks to the nature of merging with previously written data it is guaranteed that the information that there is a tomb stone is encountered first, and every subsequent occurence of the item can be safely ignored.

Updates

Updates are just a deletes followed by an insert.

Partition awareness

The trees needs to be partition aware. The leaf nodes will contain the partition they belong to so that they can be filtered out on query time.

In the header of each tree the current state of the partitions is kept. This way you can atomically between active and passive partitions (e.g. a partition switched from active to passive, and it switches back to active, though the updates haven't fully caught up yet. In this case the query bitmap mask contains this partition, but the tree can indicate in the header, that it is not ready yet to server those partitions and that another node will serve those).

If a partition switches from active to non-active, write tomb stones. Get all the needed information from one partition from the back-index.

Updates during cascaded tree rebuilding

The tree rebuilds might cascade to a very deep level and will take some time. During this period there will be new updates. In order to store and query them, they will build an alternate forest of trees. We call this concept versioning.

Once the rebuild is done, the new forst will be merged into the old tree. Often the old tree will now contain emtpy trees (thanks to the cascaded rebuilding), so those new files can just be re-used as they are.

Future improvements

Make it similar to the generational storage, where the hot data is kept in the small trees. This would need dynamic updates of the index structure.