Secondary indexes in Cassandra

Ed Anuff —  July 15, 2010 — 7 Comments

One of the common issues when working with the Apache Cassandra database is how to build secondary indexes on the columns within a row. This post will discuss one
technique, far from the only one, for managing this. One thing that experienced Cassandra users will hopefully find interesting is that SuperColumns will not be used at all,
in order to avoid the complexity and limitations they introduce. It should also be pointed out that Cassandra will have native secondary index support in the upcoming 0.7 release (see CASSANDRA-749), which will make all of this much simpler, but the approach is still a valid way to think about this sort of problem, and will still be applicable in some situations. Once that version gets closer to release, I’ll do a follow-up post looking at it.

So, to start, let’s assume a scenario where we have a container (e.g. a group) of items (e.g. users in the group), each of which has an arbitrary set of properties that
are searchable by value within the context of the container. Items might also be members of other containers, but we won’t explicitly deal with that in this
scenario.

One way to model this in Cassandra is with two Column Families (Cassandra-speak for table-like entities), which we’ll just call CFs for short. The first
CF holds the items’ properties, and we’ll call it Item_Properties. This is the simplest use of Cassandra’s data model: rows in
Item_Properties are looked up by a key, which in this example will be a UUID, the column names inside the CF are the property names, and the values stored
in the columns are the property values.

CF: Item_Properties
Key: item_id
Compare with: BytesType

Name           Value
property_name  property_value

The next CF is for the containers that will hold sets of items, and is appropriately named Container_Items. Each column in a Container_Items row is
the key of a row in Item_Properties. This is one of the first things that trips people up in using Cassandra: you can use a Column Family
as a simple table, with each row acting as a record the way it would in a normal database, but each row can also be used as a simple table in and of
itself, which is particularly useful for modeling join tables. In the Container_Items CF, each column uses the key of a row in Item_Properties as its name
and the timestamp of when the item was added to the collection as its value. This row can grow quite large as more and more items are added
to it, but since each column is about 42 bytes for the UUID and the timestamp, you’ll hit a limit of 40M+ items in a group under pre-0.7
limitations. For a container of users in a group, this might be a reasonable limitation, but you’d also use this same mechanism for collecting
things like status updates (i.e. "tweets"), which could much more conceivably exceed that. The 0.7 release of Cassandra will remove this limitation. UPDATE: In version 0.7, you can now have 2B columns in a row.

CF: Container_Items
Key: container_id
Compare with: TimeUUIDType

Name     Value
item_id  insertion_timestamp
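To make the two basic CFs concrete, here is a minimal sketch that models them as in-memory dictionaries (a hypothetical stand-in, not a real Cassandra client — the function and variable names are illustrative, not from any library):

```python
import time
import uuid

# Each CF is modeled as a dict of row_key -> {column_name: column_value}.
item_properties = {}   # CF: Item_Properties
container_items = {}   # CF: Container_Items

def add_item_to_container(container_id, properties):
    """Create an item row, then record its key as a column in the container row."""
    item_id = uuid.uuid4()
    # In Item_Properties, column names are property names, values are property values.
    item_properties[item_id] = dict(properties)
    # In Container_Items, column names are item keys, values are insertion timestamps.
    container_items.setdefault(container_id, {})[item_id] = time.time()
    return item_id

group = "group-1"
alice = add_item_to_container(group, {"name": "alice", "state": "CA"})
bob = add_item_to_container(group, {"name": "bob", "state": "NY"})
assert len(container_items[group]) == 2   # one column per item in the container row
```

Note how the container row is itself acting as a small table: its columns are the keys of other rows.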

So far, this is fairly basic Cassandra data modeling. Where things get complicated is when you want to start retrieving items from
the group by querying against specific property values. For this, you’ll need to manage your own indexes, as this is outside
the purposefully minimalist design of Cassandra. To do this, we’ll create two more column families. The first
CF holds the actual index, and we use the container ID plus the property name we want to index as the key to look it
up. It would look something like this:

CF: Container_Items_Property_Index
Key: container_id + property_name
Compare with: compositecomparer.CompositeType

Name                                                 Value
composite(property_value, item_id, entry_timestamp)  item_id

Where this indexing technique departs a bit from techniques described elsewhere is in how each column in the index is constructed.
Cassandra provides a set of column types that it can use to sort the columns in a row. You can only specify one, and
it’s defined at column family creation time. Cassandra also lets you define your own column types, and what we’ve done here
is create a CompositeType (see http://github.com/edanuff/CassandraCompositeType).
This lets us define columns, sorted for us by Cassandra, that combine several component values into a single composite type.
This in turn lets us create unique column entries for potentially (and probably) non-unique property values by adding additional values to the
column name as discriminators.
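The ordering this buys us can be sketched in a few lines. The real comparer linked above is a Java comparator; here a plain Python tuple stands in for the composite column name, since tuples compare component-wise, which is exactly the behavior the comparator provides:

```python
import uuid

def composite(property_value, item_id, entry_timestamp):
    # A tuple models composite(property_value, item_id, entry_timestamp):
    # components compare left to right, so columns sort by value first.
    return (property_value, item_id, entry_timestamp)

u1, u2 = uuid.UUID(int=1), uuid.UUID(int=2)
cols = sorted([
    composite("banana", u1, 10),
    composite("apple", u2, 50),
    composite("apple", u1, 100),
])
# All "apple" entries group together, and the (item_id, entry_timestamp)
# discriminators make each column name unique even for duplicate values.
assert [c[0] for c in cols] == ["apple", "apple", "banana"]
```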

So, now, in order to query a container for all the items where a property equals a certain value, we take the key of the container, append
the property name to it, and do a slice query against the value. All columns where that value is the first component of the composite type
will be returned, and the value of each column is the key of the appropriate row in Item_Properties. Note that it doesn’t actually have to be exact
equality; range queries will work with this technique too.

The final issue to deal with is what happens when the value of a property changes and the index has to be updated. The simple answer is that you’d
insert a column for the new value in the Container_Items_Property_Index column family and remove the column for the old value. However,
for a number of reasons related to Cassandra’s model of eventual consistency and its lack of transactions, simply reading the previous value from Item_Properties
before updating it and then removing the index entry for that value from Container_Items_Property_Index will not reliably work. Instead,
we maintain a list of previous values for the specific property of a given item and use that list to remove those values from the index before adding the
new value to the index. These are stored in the following CF:

CF: Container_Item_Property_Index_Entries
Key: container_id + item_id + property_name
Compare with: LongType

Name             Value
entry_timestamp  property_value

We remove these columns from the CF after retrieving them, so this row never gets too large; in most cases it probably never grows beyond one or two columns, or a few more if frequent changes are being made. By the way, it’s a really good idea to make sure you understand why this CF is necessary, because you can use variations of it to solve a lot of problems with “eventual consistency” datastores.

So, to wrap it up, there are two basic operations that you’ll end up wanting to perform – setting a property value for an item in a collection, and getting the items
in the collection that match a certain value. These would look like this:

Set Property(property_name) For Item(item_id) In Container(container_id) To Value(property_value)

  1. Get the entry_timestamp as the current timestamp value.
  2. Get_Slice all columns on CF Container_Item_Property_Index_Entries for key (container_id + item_id + property_name)
  3. Do batch_mutate with the following:
    • Delete all matching columns in Container_Items_Property_Index from the entries retrieved from Container_Item_Property_Index_Entries in the previous step
    • Delete all columns in Container_Item_Property_Index_Entries that were previously retrieved
    • Insert column named property_name with column value of property_value in Item_Properties
    • Insert new entry in Container_Items_Property_Index
    • Insert new entry in Container_Item_Property_Index_Entries
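The steps above can be sketched against an in-memory model of the three CFs involved (a sketch only — dicts stand in for rows, and the batch_mutate steps run one after another rather than atomically):

```python
import time
import uuid

item_properties = {}   # item_id -> {property_name: value}
property_index = {}    # (container_id, property_name) -> {composite_name: item_id}
index_entries = {}     # (container_id, item_id, property_name) -> {entry_ts: old_value}

def set_property(container_id, item_id, name, value):
    entry_ts = time.time_ns()                               # 1. current timestamp
    entries_key = (container_id, item_id, name)
    old_entries = dict(index_entries.get(entries_key, {}))  # 2. get_slice of old values
    index = property_index.setdefault((container_id, name), {})
    # 3. the batch_mutate:
    for old_ts, old_value in old_entries.items():
        index.pop((old_value, item_id, old_ts), None)       # delete stale index columns
    index_entries[entries_key] = {}                         # delete retrieved entries
    item_properties.setdefault(item_id, {})[name] = value   # write the property itself
    index[(value, item_id, entry_ts)] = item_id             # insert new index column
    index_entries[entries_key][entry_ts] = value            # record value for next update

group, user = "group-1", uuid.uuid4()
set_property(group, user, "state", "CA")
set_property(group, user, "state", "NY")
# The stale "CA" index column was removed; only the "NY" column remains.
assert [k[0] for k in property_index[(group, "state")]] == ["NY"]
```

The second call shows why the entries CF matters: it is the record of which old index columns need deleting, without re-reading Item_Properties.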

Get Items In Container(container_id) Where Property(property_name) Equals Value(property_value)

  1. Get_slice range on CF Container_Items_Property_Index for key (container_id + property_name) for columns matching property_value
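The read path can be sketched the same way — a slice over the composite-sorted index row, selecting every column whose first component equals property_value (a sorted Python list stands in for the Cassandra row; the bounds shown are one illustrative way to form the slice range):

```python
import bisect
import uuid

u1, u2, u3 = (uuid.UUID(int=n) for n in (1, 2, 3))
# Container_Items_Property_Index row for key (container_id + "state"),
# kept in composite sort order.
index_row = sorted([
    ("CA", u1, 100),
    ("CA", u3, 120),
    ("NY", u2, 110),
])

def get_items_where_equals(row, property_value):
    # Slice from the lowest composite starting with the value up to the
    # highest one: everything in between is an exact match on the value.
    lo = bisect.bisect_left(row, (property_value,))
    hi = bisect.bisect_right(
        row, (property_value, uuid.UUID(int=2**128 - 1), float("inf")))
    return [item_id for _value, item_id, _ts in row[lo:hi]]

assert get_items_where_equals(index_row, "CA") == [u1, u3]
```

A range query is the same slice with different bounds, e.g. from ("CA",) up to ("NY",), which is why inexact matches come for free with this technique.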

This may all seem like a lot of work, but in practice, this all gets wrapped up in middleware that hides it. You can find an implementation of the composite column comparer at CassandraCompositeType on GitHub and a simple implementation of the indexing technique described at CassandraIndexedCollections on GitHub.

Update: Mike Malone points out that, since Cassandra already stores a timestamp along with each column value, it’s redundant to store it in the column value as well; it can be omitted in the Container_Items and Container_Item_Property_Index_Entries column families, which would reduce storage space by about 20%.

7 responses to Secondary indexes in Cassandra

  1. Patricio Echague January 15, 2011 at 4:57 pm

Hi Ed, this is an interesting approach. The potential issue I see there is what happens if the client or Cassandra fails between points 3.3 and 3.4.

    If I understood properly, at that point you have deleted the previous values and inserted the new ones, but the index didn’t get updated. Is that correct, or am I missing something?

  2. That’s a good catch. Some provision for repeating step 3 until it succeeds would be recommended, or a more complicated rollback mechanism could be implemented.

  3. Cyril Auburtin May 13, 2012 at 11:34 pm

    Would it be possible to do:

    Get Items In Container(container_id) Where Property(property_name) Equals Value(property_value) And Property(other_property_name) Equals Value(other_property_value)?

    maybe by extending the model

    thx and congrats on making this model of query much more performant

  4. Hello Ed, can you elaborate on the need for the CF Container_Item_Property_Index_Entries? You mention that “for a number of reasons related to Cassandra’s model of eventual consistency … will not reliably work” and “it’s a really good idea to make sure you understand why this CF is necessary”, yet it is unclear why that is actually the case. Without further explanation, it sounds like any and all Cassandra reads are inherently unsafe. Is the purpose of such a CF to ensure consistency when multiple writes are present, or to assist with recovery in case an operation is interrupted midway?

  5. This allows you to repair the index at read. Cassandra isn’t inherently unsafe, but creating a ledger of past values means the index can always be returned to a consistent state. The example code doesn’t show this, but the Usergrid code does have a repair utility that fixes inconsistent indexes using this information to remove old index values.

  6. Won’t this method flood your SSTables with tombstones over time, due to deleting a column on every value change of an indexed value?
    Reference: http://www.youtube.com/watch?v=0u-EKJBPrj8

    If that is the case, it seems like one way around this may be to recreate index rows periodically.

    • Yes, it will under heavy load. Some solutions that we’ve used include lowering tombstones and checking for a delta before deleting. Recreating the row might be a good way as well.
