Selecting a Datastore

Selecting a Datastore, Part 1: What I’ve Learned

Last week, in a rush to get where I needed to be, my friends and I succumbed to the call of fast food and went straight to the nearest drive-thru. Looking for finger food, we opted for the 20-piece chicken nuggets meal, which included two sides of fries and two drinks and came to about $10. We already had drinks in the car and didn’t want the ones that came with the value meal, so we asked them to hold the drinks.

As it turns out, because of the way their system was set up, this was not possible at the same price. If we ordered the food without the drinks, it would no longer count as a “value meal,” so ordering fewer items would end up costing more. It would have been cheaper to buy the value meal and throw away the drinks than to remove the drinks from the register order.

This got me thinking about the “out-of-the-box” solutions we have offered at Predii. We often integrate and implement based on client needs, and we work hard to mix and match solutions so the results are effective and make sense for our clients. With one particular client, what we normally offered just didn’t fit their environmental constraints. In a different situation, the tools we had available at the time would have worked as they were, but given the nature of this data they wouldn’t suffice: our client required that the data be hosted on premises and supported by their in-house operations team. It was important to us to come up with something “out-of-the-box” so we could find a solution that worked for our client.

I thought I’d share with you what we learned. Hopefully it will help if you need to figure out which datastore will work best for you. When it came to making decisions about databases, queueing systems, and distributed processing platforms, our requirements were fairly straightforward. There was a large amount of data to process, and it needed to be kept on-site. Whatever solution we used would need to run on servers hosted on premises, any operational requirements would be handled by internal teams, and cloud services such as Amazon AWS were out of the question.

With that in mind, we ran brief exploratory trials with powerhouses like Couchbase, MemSQL, and Mongo. For the most part, we have settled on Mongo.

 

The Top 5 Reasons We Chose the Datastore We Did:

1. Optional Authentication

One of the surprising quirks we noticed, coming from a MySQL and Postgres world, was that a username and password were not required. This was, after all, an on-site server, barricaded behind a high-class VPN and locked down with IP filtering, even for internal clients. Not having another password to manage was one less thing to worry about. Kudos to that! You can still set a password if you want to; Mongo supports either choice.
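To make the “optional” part concrete, here is a small sketch in Python of how a connection string might be built with or without credentials. The standard mongodb:// URI format is real, but the helper name build_mongo_uri, the host name, and the credentials are made up for illustration, and this is not necessarily how our own code does it.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port=27017, username=None, password=None):
    """Return a mongodb:// URI, adding credentials only when both are given."""
    if username and password:
        # Percent-encode so characters like '@' or ':' do not break the URI.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    else:
        credentials = ""
    return f"mongodb://{credentials}{host}:{port}/"


# Hypothetical host and credentials, for illustration only.
open_uri = build_mongo_uri("db.internal.example")
secured_uri = build_mongo_uri(
    "db.internal.example", username="app", password="p@ss:word"
)
```

The same helper covers both cases, so switching authentication on later only means supplying the two extra arguments.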


2. Schemaless

At one point, on a prior project at another company, we had the nerve to try dynamically creating MySQL columns. This time I’m glad we went with Mongo, because adding attributes to documents within a collection is as simple as adding keys. Databases, collections, and documents are created when they are first referenced, which allows for flexible setup of the domain objects.
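The idea is easy to picture with plain Python dictionaries. The sketch below is an in-memory stand-in, not the Mongo driver: the names store, inventory, and parts, and the sample documents, are invented for illustration. It only mimics the two behaviors described above, creation on first reference and adding attributes by adding keys.

```python
from collections import defaultdict

# In-memory stand-in: database name -> collection name -> list of documents.
# Both levels spring into existence the first time they are referenced.
store = defaultdict(lambda: defaultdict(list))

# No schema is declared up front; each document is just a set of keys.
store["inventory"]["parts"].append({"_id": 1, "name": "widget"})

# A later document carries an extra attribute; nothing has to be altered first.
store["inventory"]["parts"].append({"_id": 2, "name": "gadget", "color": "red"})

# Readers have to tolerate missing keys, which is the price of the flexibility.
colors = [doc.get("color") for doc in store["inventory"]["parts"]]
```

Note the use of doc.get when reading: once documents in the same collection can have different keys, the code that consumes them has to decide what a missing attribute means.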


3. Sharding

Now, this was a concept that made so much sense, even if it did cause a management nightmare. Once you’ve come up with a strong enough sharding algorithm for your ID key, all other operations should be straightforward. There are no additional special commands to perform CRUD operations across the shards. The benefit is the distribution of data across the shards: each of the Mongo servers performs the CRUD operations on its own portion of the data, and the results are then aggregated.
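Here is a toy illustration of that scatter-and-aggregate idea in Python. It is not how Mongo routes data internally; it assumes a simple hash-modulo scheme on the ID key, a hypothetical cluster of three shards, and made-up documents, purely to show why a well-distributed key lets each server work on its own slice.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Map an ID key to a shard index using a stable hash."""
    digest = hashlib.md5(str(doc_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Scatter: each document lands on exactly one shard.
shards = [[] for _ in range(NUM_SHARDS)]
for doc_id in range(1000):
    shards[shard_for(doc_id)].append({"_id": doc_id, "even": doc_id % 2 == 0})


def count_even(shard):
    """Work that each shard can do on its own slice of the data."""
    return sum(1 for doc in shard if doc["even"])


# Gather: combine the per-shard partial results into one answer.
partial_counts = [count_even(shard) for shard in shards]
total_even = sum(partial_counts)
```

The caller never needs to know which shard holds which document; it only sees the combined total, which is the property that keeps CRUD code free of shard-specific commands.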


4. Storage Engines

Since the Mongo version 3 release, we’ve been introduced to additional storage engines. A storage engine in Mongo is the component that manages how data is persisted on disk, including whether and how it is compressed. The integration with WiredTiger was one of the impressive highlights we were happy to discover. It is not on by default in version 3.0, but the configuration change is worth it. Our initial database went from 1.6TB down to about 600GB. The trade-off is that selectively backing up and restoring individual collections by copying the system files is more difficult than in previous versions. If you do not have to do this frequently, the benefits of a compact database outweigh the inconvenience.
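For a sense of scale, the quick arithmetic below works out the saving from the two sizes quoted above. It assumes decimal units (1 TB = 1000 GB), and since “about 600GB” is itself approximate, the result is approximate too.

```python
# Sizes quoted in the article; 1.6 TB is taken as 1600 GB (decimal units).
before_gb = 1600
after_gb = 600

reduction = 1 - after_gb / before_gb   # fraction of disk space saved
ratio = before_gb / after_gb           # how many times smaller the data is

print(f"saved {reduction:.1%}, about {ratio:.1f}x smaller")
```

In other words, the compressed database takes a bit more than a third of the original disk space.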


5. Client API

With any tool, the availability of client libraries and their integration with existing software drastically reduce development cycles. Spending time and effort on quick integration using supported Mongo frameworks was preferable to building and testing our own; the testing and integration work would have been needed in either case. There are even Spring packages available. We chose a mix of the various packages because some of our projects needed to connect to multiple Mongo instances, which is not easy to do with Spring.
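To show what “multiple Mongo instances” means in practice, here is a rough Python sketch of a registry that keeps one client per named instance. This is not our Spring setup: the ClientRegistry class, the instance names, and the hosts are all hypothetical, and the driver is passed in as a factory (in a real project that might be pymongo.MongoClient) so the sketch runs without a server.

```python
class ClientRegistry:
    """Keep one client per named Mongo instance, created on first use."""

    def __init__(self, uris, client_factory):
        self._uris = dict(uris)          # instance name -> connection URI
        self._factory = client_factory   # callable that takes a URI
        self._clients = {}

    def get(self, name):
        if name not in self._uris:
            raise KeyError(f"unknown Mongo instance: {name}")
        if name not in self._clients:
            # Connect lazily and reuse the same client afterwards.
            self._clients[name] = self._factory(self._uris[name])
        return self._clients[name]


# Hypothetical hosts; a plain lambda stands in for the real driver here.
registry = ClientRegistry(
    {
        "catalog": "mongodb://catalog.internal.example:27017/",
        "events": "mongodb://events.internal.example:27017/",
    },
    client_factory=lambda uri: {"connected_to": uri},
)
catalog = registry.get("catalog")
events = registry.get("events")
```

Keeping the lookup by name in one place means the rest of the code asks for “catalog” or “events” and never handles connection strings directly.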


In evaluating software, the intricacies and quirks start to bubble up along with the features. As with any relationship, you must determine whether these little quirks are showstoppers or not. In our follow-up article, we’ll look at some of these quirks and how we got around them.