Saturday, June 15, 2013

Notes From The Cassandra Summit 2013

It was nice to see my former DataStax coworkers during the summit, which took place at Fort Mason in San Francisco. As expected, each Cassandra Summit is more crowded than the last, a sign that the technology is gaining popularity in the industry. I was impressed, and happy, to see that all sorts of companies are moving towards Cassandra for different areas of their stack. A few examples: Instagram replaced their sharded Redis cluster with a Cassandra cluster running on SSDs that serves peaks of 20k writes and 15k reads per second, and Spotify serves their 24+ million users out of about 300 Cassandra nodes spread across the 24 different services that use the database, storing 50TB of data.
Solid-state drives (SSDs) give really high throughput, especially if your access pattern is mostly random. If you manage your own hardware, you should definitely consider SSDs for the data partition. If AWS is your call, then hi1.4xlarge is your friend.

Saturday, June 01, 2013

Distributed Cassandra-based Locks in Hector Client

After almost 3 years of not updating this blog, I decided that writing about the latest interesting feature in Hector was a good excuse to break the ice. I wrote the first implementation of distributed lock support for Hector on July 15th, 2012, and Todd Nine took it to the next step.
The feature is an implementation of Dominic Williams's Wait Chain with minor adjustments, backed 100% by Cassandra, which means that it is horizontally scalable.

The framework is composed of three main entities:
  • HLock: Self-explanatory. It is the lock we are trying to acquire.
  • HLockManager: The entity responsible for acquiring and releasing the lock.
  • HLockManagerConfigurator: Responsible for configuring the lock system. HLMC from now on.

HLMC defines important properties needed for the normal operation of the lock system. Hector implements this feature by storing lock information in the HLocks column family under a dedicated keyspace, HLockingManager, with a default replication factor of 3. Additionally, row cache is enabled by default, and locks expire after 5 seconds.

All of the properties mentioned above can be changed via HLMC.
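
To make the expiry behaviour concrete, here is a minimal in-memory sketch of a lock entry that stops counting once its time-to-live has passed. This is not Hector's implementation: Hector keeps this state in Cassandra, while the class ExpiringLockTable, its methods, and the injected clock are hypothetical names made up for this illustration. It also skips ownership checks, so any caller may release a lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical in-memory stand-in for a lock table whose entries expire after a TTL. */
public class ExpiringLockTable {
    // Lock path -> timestamp (in milliseconds) at which the current holder's entry expires.
    private final Map<String, Long> expiryByPath = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be exercised without sleeping

    public ExpiringLockTable(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns true if the caller obtained the lock, false if an unexpired holder exists. */
    public synchronized boolean tryAcquire(String path) {
        long now = clock.getAsLong();
        Long expiresAt = expiryByPath.get(path);
        if (expiresAt != null && expiresAt > now) {
            return false; // someone else still holds a live entry
        }
        // No entry, or the previous one expired: take over with a fresh TTL.
        expiryByPath.put(path, now + ttlMillis);
        return true;
    }

    /** Drops the entry for the path; this sketch does not check who the holder is. */
    public synchronized void release(String path) {
        expiryByPath.remove(path);
    }
}
```

With a 5000 ms TTL, a second tryAcquire on the same path fails while the first entry is live and succeeds once the clock reaches the expiry time, even if release was never called. That is the practical consequence of the default: a holder that crashes cannot block others for longer than the TTL.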

How to initialize the locking system


The following snippet shows how to initialize the framework. It can be placed alongside the code where you set up Hector's CassandraHostConfigurator:
   
// Initialize Locking Framework
cluster = getOrCreateCluster("MyCluster", getCHCForTest());
HLockManagerConfigurator hlc = new HLockManagerConfigurator();
hlc.setReplicationFactor(1);
lm = new HLockManagerImpl(cluster, hlc);
lm.init();

Acquiring and Releasing Locks


This snippet shows how to use the locks. It assumes you hold an instance of HLockManager somewhere. Guice and Spring are good frameworks to solve this problem.
HLock lock = lm.createLock("/Users/patricioe");
try {
    lm.acquire(lock);

    // Do something ...
} finally {
    lm.release(lock);
}

Thread safety


The implementation of HLockManager (HLockManagerImpl) is thread-safe and thus can be shared across different threads. Instances of HLock (HLockImpl), on the other hand, are stateful and should not be shared across threads. They are meant to be created and released within a short period of time (5 seconds by default).
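
The usage pattern this implies is one shared manager and a fresh lock object per thread and per critical section. Below is a self-contained sketch of that pattern. It does not use Hector: DemoLock and DemoLockManager are hypothetical in-memory stand-ins invented for this example, and the spin loop in acquire is a simplification, not how Hector waits for a lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class PerThreadLockDemo {

    /** Hypothetical stateful handle: plain mutable fields, so it must stay on one thread. */
    static final class DemoLock {
        final String path;
        boolean acquired; // unsynchronized state, which is why sharing it would be unsafe
        DemoLock(String path) { this.path = path; }
    }

    /** Hypothetical thread-safe manager: all shared state lives in concurrent structures. */
    static final class DemoLockManager {
        private final ConcurrentHashMap<String, AtomicBoolean> held = new ConcurrentHashMap<>();

        DemoLock createLock(String path) { return new DemoLock(path); }

        void acquire(DemoLock lock) {
            AtomicBoolean flag = held.computeIfAbsent(lock.path, p -> new AtomicBoolean(false));
            while (!flag.compareAndSet(false, true)) {
                Thread.yield(); // simple spin; a real client would back off or time out
            }
            lock.acquired = true;
        }

        void release(DemoLock lock) {
            if (lock.acquired) {
                lock.acquired = false;
                held.get(lock.path).set(false);
            }
        }
    }

    private static int counter; // deliberately not atomic; protected only by the lock

    /** Runs the given number of threads against one shared manager and returns the final count. */
    static int run(int threads, int incrementsPerThread) {
        counter = 0;
        DemoLockManager manager = new DemoLockManager(); // one instance shared by every thread
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    DemoLock lock = manager.createLock("/Users/patricioe"); // new lock each time
                    manager.acquire(lock);
                    try {
                        counter++;
                    } finally {
                        manager.release(lock);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```

Each worker creates, acquires, and releases its own DemoLock, so the mutable acquired flag is never visible to another thread, while the manager is the only object that crosses thread boundaries.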

Miscellaneous 


Some people are already using this feature, and I recommend that you give it a try and send us feedback or questions at hector-users@googlegroups.com or here on this blog. I hope you enjoyed the read.

You should follow me on Twitter @patricioe