
On Wed, Apr 04, 2012 at 07:38:15PM +1000, Russell Coker wrote:
> Does anyone know of a mail store that uses a distributed database like Cassandra?

http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf

    BlueRunner: Building an Email Service in the Cloud
    by Jun Rao
    IBM Almaden Research Center
    Apache Cassandra Committer

found with:

http://www.google.com.au/search?q=apache+cassandra+%2B%22mail+store%22

it occurs to me that openstack's Swift[1] object store might be good for this. store the message with an object id of the message-id. you probably don't even have to care about the fact that message-id is only unique(*) per message, not per recipient (in fact, that's probably an advantage).

i've always been against the idea of storing mail in a database, but an object store isn't a database... it's more like an enormous flat filesystem (with buckets) or a giant key/value-pair store, which is a much better fit for this task than a relational database.

[1] http://swift.openstack.org/

hmmm. you'd still need some sort of database so that you could get from a recipient address, subject, and/or other fields to the message-id (and hence to the msg body in the object store). apart from offloading the fulltext storage to something outside of the db, there might not be enough value in doing this. not sure. might be good in combination with cassandra.

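as a rough illustration of the two pieces described above (each message stored once under its message-id, plus a separate index to get from a recipient address to message-ids), here is a minimal python sketch. everything in it is an assumption made for illustration: plain dicts stand in for the swift container and for the index database, the names `object_store`, `index`, `deliver` and `messages_for` are hypothetical, and recipients are read from the To/Cc headers rather than from the SMTP envelope that a real delivery agent would use.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses

# Hypothetical stand-ins: one dict plays the object store (e.g. a swift
# container), the other plays the index database.
object_store = {}            # message-id -> raw message text
index = defaultdict(set)     # recipient address -> set of message-ids


def deliver(raw_message):
    """Store one copy of the message under its message-id and index it."""
    msg = message_from_string(raw_message)
    message_id = msg.get("Message-ID")
    if message_id is None:
        raise ValueError("message has no Message-ID header")
    message_id = message_id.strip()
    # Only one copy is kept, however many recipients the message has.
    object_store.setdefault(message_id, raw_message)
    # Simplification: take recipients from the headers, not the envelope.
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    for _name, address in getaddresses(headers):
        if address:
            index[address.lower()].add(message_id)
    return message_id


def messages_for(address):
    """Look up message-ids in the index, then fetch bodies from the store."""
    message_ids = index.get(address.lower(), ())
    return [object_store[mid] for mid in sorted(message_ids)]
```

delivering the same message twice, or to several recipients, leaves a single object in the store; only the index grows. that is the "probably an advantage" case: per-recipient state lives in the index, not in the stored message.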
> I want something that has a delivery agent with a similar interface to maildrop or procmail and which has POP and IMAP servers to provide client access.

you'd have to write a swift access module for the pop/imap daemon of your choice (dovecot is quite modular and would probably be a good choice), and inserting incoming messages into the store would be a simple wrapper around either the command-line tools or the http api. there are also python libs.

(*) for pretty-damn-good values of "unique".

note: you need at least three nodes (preferably more than 5) to run swift. you also need a second NIC for the nodes to talk to each other - they chatter a LOT. you can imagine it as something like:

    node1 -> node2: do you have version x of foo?
    node1 -> node3: do you have version x of foo?
    node2 -> node1: yes.
    node2 -> node1: do you have version y of bar?
    node3 -> node1: i have a later version, here it is.
    node3 -> node2: do you have version x of foo?
    node1 -> node2: no, gimme.
    node2 -> node3: yes.
    node2 -> node1: node 5 has gone down, you're secondary so grab a copy of this.
    node2 -> node1: here it is.
    blah blah blah.

the chatter is constant. however, the data is highly redundant and highly available, and the data store is self-repairing. it's also massively scalable - add more nodes as storage and load require.

craig

-- 
craig sanders <cas@taz.net.au>

BOFH excuse #89:

Electromagnetic energy loss