
One of my clients is proposing a project that requires good storage performance and high reliability. It's an entirely new project so there's no legacy code to deal with.
The traditional way of doing this would be to have a cluster of systems, maybe in an active/passive configuration with database replication, or with MySQL or PostgreSQL clustering. Those solutions are difficult to manage and upgrade.
http://en.wikipedia.org/wiki/Apache_Cassandra
I think that probably the best thing to do is to use something like Cassandra on a cluster of servers in a DC to run this. The Cassandra feature set seems good (including being able to add new servers at run-time) and developing it from scratch can't be a lot harder than doing MySQL development.
Does anyone have any suggestions for planning at this early stage?
I had thought of doing something similar with the Amazon EC2 equivalent to Cassandra, but a quick scan of their web site reveals no mention of it. Did Amazon cancel their cloud key-value store service?
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

On Tue, 17 Apr 2012 05:29:13 pm Russell Coker wrote:
I had thought of doing something similar with the Amazon EC2 equivalent to Cassandra, but a quick scan of their web site reveals no mention of it. Did Amazon cancel their cloud key-value store service?
SimpleDB and DynamoDB both still exist in AWS, DynamoDB being the newer of the two.

On 17/04/12 17:29, Russell Coker wrote:
I had thought of doing something similar with the Amazon EC2 equivalent to Cassandra, but a quick scan of their web site reveals no mention of it. Did Amazon cancel their cloud key-value store service?

Hi Russell
We maintain, and regularly deploy, dual master setups for MySQL. Upgrades and maintenance are very easy, with zero downtime for the frontend. Obviously for scaling of a web app, read-only slaves are used and extra ones can be added as needed. There too, maintenance and upgrades are not an issue provided there's sufficient capacity in the overall system (which there should be, otherwise it can't handle failures either).
Just to be clear, a master-master setup is not "active/passive" in the traditional sense; technically both masters are active.
One thing to note is that Amazon is not a good place to host such setups because a floating IP address is used, and Amazon's Elastic IP offering is charged at datacenter-external rates.
Depending on the load/activity, Amazon is not particularly economical for bigger web/db setups, so it's important to do the math. For a site that has lots of traffic for some of the day but is doing nothing the rest of the day, Amazon works well. For a site that has traffic most of the day, the cost in CPU cycles will be much higher than for instance a setup on a set of Linode servers. Linode also has an API and other tools so deployments can be automated. We use Puppet as well.
Depending on dataset size and load, solutions like Cassandra can be overkill. Factors to consider are whether you need RDBMS-type access with joins, grouping and aggregates (which typically you don't get with distributed systems: you have to DIY those things, it's a trade-off) and knowledge of the technology. MySQL is not necessarily optimal for some scenarios, but it's a general purpose system and its performance, behaviour, advantages and pitfalls are well known. Known factors are easier to manage.
Regards, Arjen.
----- Original Message -----
One of my clients is proposing a project that requires good storage performance and high reliability. It's an entirely new project so there's no legacy code to deal with.
The traditional way of doing this would be to have a cluster of systems maybe in an active/passive configuration with database replication or with MySQL or PostgreSQL clustering. Those solutions are difficult to manage and upgrade.
http://en.wikipedia.org/wiki/Apache_Cassandra
I think that probably the best thing to do is to use something like Cassandra on a cluster of servers in a DC to run this. The Cassandra feature set seems good (including being able to add new servers at run-time) and developing it from scratch can't be a lot harder than doing MySQL development.
Does anyone have any suggestions for planning at this early stage?
I had thought of doing something similar with the Amazon EC2 equivalent to Cassandra, but a quick scan of their web site reveals no mention of it. Did Amazon cancel their cloud key-value store service?
-- Exec.Director @ Open Query (http://openquery.com) MySQL services Sane business strategy explorations at http://Upstarta.biz Personal blog at http://lentz.com.au/blog/
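A rough sketch of the "do the math" point in Arjen's message, assuming a simple model where one provider bills per hour of use and the other charges a flat monthly rate. The prices are made up for illustration and are not real Amazon or Linode rates; the function names are hypothetical.

```python
def monthly_cost_on_demand(hourly_rate, busy_hours_per_day, days=30):
    """Monthly cost when capacity is only paid for while it is running."""
    return hourly_rate * busy_hours_per_day * days


def break_even_hours(hourly_rate, flat_monthly_rate, days=30):
    """Busy hours per day at which hourly billing costs the same as the flat rate."""
    return flat_monthly_rate / (hourly_rate * days)


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
HOURLY_RATE = 0.50    # assumed cost per server-hour on the hourly provider
FLAT_MONTHLY = 120.0  # assumed cost per server-month on the flat-rate provider

for hours in (4, 8, 24):
    cost = monthly_cost_on_demand(HOURLY_RATE, hours)
    print(f"{hours:2d} busy hours/day: hourly billing {cost:.2f} vs flat {FLAT_MONTHLY:.2f}")

print("break-even busy hours/day:", break_even_hours(HOURLY_RATE, FLAT_MONTHLY))
```

With these assumed numbers, a site that is busy for only part of the day comes out cheaper on hourly billing, while a site that is busy most of the day does not. Real pricing has more components (bandwidth, storage, IP addresses), so this is only a starting point.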

On Wed, 18 Apr 2012, Arjen Lentz wrote:
Factors to consider are whether you need RDBMS-type access with joins, grouping and aggregates (which typically you don't get with distributed systems - you have to DIY those things, it's a trade-off)
Sorry for hijacking this thread.
I am currently re-implementing an outdated piece of software using a proper RDBMS. The old software suffers from spontaneous inconsistencies, mainly because the underlying storage does not support transactions and parts are written before something unexpected happens.
Knowing that I cannot write anything without having the foreign key dependencies right, and without having written everything that belongs together, and so on, is such a relief.
I don't know how people write this stuff on NoSQL these days but it feels like a step back into pre-historic IT times. I would be interested to know how developers deal with this.
Regards Peter
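A minimal sketch of the two guarantees Peter describes, using Python's built-in sqlite3 module as a stand-in for "a proper RDBMS". The customer and invoice tables and the helper function are hypothetical; the thread does not say what the real schema looks like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless asked to
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " amount INTEGER NOT NULL)"
)
conn.commit()


def add_customer_with_invoice(conn, customer_id, name, invoice_customer_id, amount):
    """Write both rows in one transaction; return False if the database rejects them."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "INSERT INTO customer (id, name) VALUES (?, ?)", (customer_id, name)
            )
            conn.execute(
                "INSERT INTO invoice (customer_id, amount) VALUES (?, ?)",
                (invoice_customer_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = add_customer_with_invoice(conn, 1, "Alice", 1, 100)
# The invoice below points at customer 99, which does not exist, so the
# foreign key check fails and the customer row for Bob is rolled back too.
failed = add_customer_with_invoice(conn, 2, "Bob", 99, 50)
customers = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(ok, failed, customers)
```

On a store without transactions or foreign keys, the application itself would have to detect the bad reference and undo the first write.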

On Wed, 18 Apr 2012 09:57:37 am Peter Ross wrote:
I don't know how people write this stuff on NoSQL these days but it feels like a step back into pre-historic IT times.
I would be interested to know how developers deal with this.
One thing to consider is that it's not necessarily about NoSQL -OR- RDBMS. You can easily use both depending on various data requirements - an application we are working on uses both models for different types of data. For some things like storing sessions or other key=>value data where it isn't essential to rely on foreign keys, NoSQL can be quite useful and very efficient. Some projects will use only one of the two with no issues, yet bigger applications (or even small ones with specialised uses) may use both.

On Wed, 18 Apr 2012, Mark Johnson wrote:
On Wed, 18 Apr 2012 09:57:37 am Peter Ross wrote:
I don't know how people write this stuff on NoSQL these days but it feels like a step back into pre-historic IT times.
I would be interested to know how developers deal with this.
One thing to consider is that it's not necessarily about NoSQL -OR- RDBMS. You can easily use both depending on various data requirements - an application we are working on uses both models for different types of data. For some things like storing sessions or other key=>value data where it isn't essential to rely on foreign keys, NoSQL can be quite useful and very efficient.
Yes. I wondered whether it is feasible to build something "you would do with an RDBMS" these days with a cloud-based NoSQL.
I have quite some respect for this old software I am rewriting - it had to do without and still it is working most of the time. But I also see how much pain it is to maintain this piece.
The loss of RDBMS functionality needs a lot of thought, and for my software (as an example) I would be very unhappy if someone offered me just a bit of key->value store.
Russell considers it because of the complexity of a cluster setup, but he might end up with unhappy developers because of increased application complexity.
Regards Peter
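To illustrate the "increased application complexity" trade-off, here is a small sketch comparing a per-customer total computed by hand over a key->value store with the same result from SQL. A plain Python dict stands in for the key-value store and sqlite3 for the RDBMS; the order data and function names are invented for the example.

```python
import sqlite3
from collections import defaultdict

# Invented sample data: (order id, customer, amount)
orders = [("o1", "alice", 30), ("o2", "bob", 20), ("o3", "alice", 15)]

# Key-value view: one opaque record per key, no query language on top.
kv_store = {oid: {"customer": cust, "amount": amt} for oid, cust, amt in orders}


def totals_from_kv(store):
    """DIY aggregate: scan every record and sum amounts per customer in the application."""
    totals = defaultdict(int)
    for record in store.values():
        totals[record["customer"]] += record["amount"]
    return dict(totals)


def totals_from_sql(rows):
    """The same aggregate expressed as a single GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = dict(
        conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
    )
    conn.close()
    return result


print(totals_from_kv(kv_store))
print(totals_from_sql(orders))
```

Both give the same totals, but in the key-value case the scan, grouping and any consistency concerns live in application code, and on a distributed store the scan itself may be expensive.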

On Wed, Apr 18, 2012 at 09:57:37AM +1000, Peter Ross wrote:
I don't know how people write this stuff on NoSQL these days but it feels like a step back into pre-historic IT times.
I would be interested to know how developers deal with this.
"Mongo DB Is Web Scale" http://www.youtube.com/watch?v=b2F-DItXtZs funny. and painfully true to real-life. "piping to /dev/null is fast as hell." craig -- craig sanders <cas@taz.net.au> BOFH excuse #396: Mail server hit by UniSpammer.

On 18/04/12 21:03, Craig Sanders wrote:
On Wed, Apr 18, 2012 at 09:57:37AM +1000, Peter Ross wrote:
I don't know how people write this stuff on NoSQL these days but it feels like a step back into pre-historic IT times.
I would be interested to know how developers deal with this.
"Mongo DB Is Web Scale"
http://www.youtube.com/watch?v=b2F-DItXtZs
funny. and painfully true to real-life.
"piping to /dev/null is fast as hell."

On 17/04/12 17:29, Russell Coker wrote:
One of my clients is proposing a project that requires good storage performance and high reliability. It's an entirely new project so there's no legacy code to deal with.
The traditional way of doing this would be to have a cluster of systems maybe in an active/passive configuration with database replication or with MySQL or PostgreSQL clustering. Those solutions are difficult to manage and upgrade.
I'm not sure if they (Pg and MySQL) are significantly harder to manage and upgrade than Cassandra. Various very large, high-traffic organisations successfully use PostgreSQL, e.g. Instagram, and Japan's largest telco, NTT. Managing a cluster of Cassandra servers comes with its own complications, I'm sure. Using a NoSQL solution will require more effort on your coders' part to manage all the edge-cases that regular SQL protects you from.
I didn't know that Facebook had ditched Cassandra in favour of HBase. Does anyone know the reasons behind that?
-Toby

Matt Bottrell mbottrell@gmail.com On 17 April 2012 17:29, Russell Coker <russell@coker.com.au> wrote:
One of my clients is proposing a project that requires good storage performance and high reliability. It's an entirely new project so there's no legacy code to deal with.
Russell, before choosing a technology... it might pay to give some rough ballpark figures for data sizes, potential growth and the traffic you would expect to see. This will help determine the tech selected. Remember that what goes in today becomes the next decade's legacy app.
Does anyone have any suggestions for planning at this early stage?
Some specs first before jumping in... Also, have you had experience with NoSQL databases? It's quite a leap from a traditional RDBMS if you've never gone that way before. I wouldn't rule out a clustered database so quickly. See what features the project requires, then match the technology.
Cheers, Matt.
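One way to produce the rough ballpark figures Matt asks for is a back-of-envelope calculation like the sketch below. It assumes compound monthly growth and a simple peak-to-average traffic factor; every input value is a placeholder to be replaced with real project estimates, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def projected_size_gb(initial_gb, monthly_growth_rate, months):
    """Dataset size after compound growth, e.g. a rate of 0.05 means 5% per month."""
    return initial_gb * (1 + monthly_growth_rate) ** months


def peak_requests_per_second(requests_per_day, peak_factor):
    """Average request rate scaled by an assumed peak-to-average factor."""
    return requests_per_day / SECONDS_PER_DAY * peak_factor


# Placeholder inputs, not figures from the thread.
size_in_ten_years = projected_size_gb(initial_gb=50, monthly_growth_rate=0.05, months=120)
peak_rps = peak_requests_per_second(requests_per_day=864_000, peak_factor=3)

print(f"projected size after 10 years: {size_in_ten_years:.0f} GB")
print(f"estimated peak load: {peak_rps:.0f} requests/second")
```

Numbers like these only bound the problem. As noted elsewhere in the thread, the access pattern (reads versus writes, joins, aggregates) matters as much as the raw size.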

Data size is in a way less important than the access pattern. At some scale both become important.
On 17 April 2012 17:29, Russell Coker < russell@coker.com.au > wrote:
One of my clients is proposing a project that requires good storage performance and high reliability. It's an entirely new project so there's no legacy code to deal with.
Russell, before choosing a technology... it might pay to give some rough ballpark figures for data sizes, potential growth and the traffic you would expect to see. This will help determine the tech selected. Remember that what goes in today becomes the next decade's legacy app.
-- Exec.Director @ Open Query (http://openquery.com) MySQL services Sane business strategy explorations at http://Upstarta.biz Personal blog at http://lentz.com.au/blog/
participants (7): Arjen Lentz, Craig Sanders, Mark Johnson, Matt Bottrell, Peter Ross, Russell Coker, Toby Corkindale