
On Wed, 4 Apr 2012, Craig Sanders <cas@taz.net.au> wrote:
> On Wed, Apr 04, 2012 at 07:38:15PM +1000, Russell Coker wrote:
> > Does anyone know of a mail store that uses a distributed database like Cassandra?
>
> http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf
> BlueRunner: Building an Email Service in the Cloud by Jun Rao IBM Almaden Research Center Apache Cassandra Committer
> found with:
> http://www.google.com.au/search?q=apache+cassandra+%2B%22mail+store%22
Yes, that's the one that has nothing released apart from a PDF.
> it occurs to me that openstack's Swift[1] object store might be good for this. store the message with an object id of the message-id. you probably don't even have to care about the fact that message-id is only unique(*) per message, not per recipient (in fact, that's probably an advantage).
Does IMAP allow altering a message? If so you would need copy on write in that case. Also the "Delivered-To" header would need to be fudged somehow. Also rumor has it that some MTAs duplicate the message-ID. I haven't tried to verify that claim.
> hmmm. you'd still need some sort of database so that you could get from a recipient address, subject, and/or other fields to the message-id (and hence to the msg body in the object store). apart from offloading the fulltext storage to something outside of the db, there might not be enough value in doing this. not sure.
>
> might be good in combination with cassandra.
It apparently worked well for IBM, but they didn't share the code.
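To make the layout discussed above concrete, here is a rough sketch of an object store keyed by Message-ID plus a separate recipient index. Everything in it is hypothetical: `MailStore`, `deliver` and `fetch` are names made up for illustration, and plain Python dicts stand in for Swift and for whatever database (Cassandra or otherwise) would hold the index. It is not code from any existing project.

```python
from email import message_from_string


class MailStore:
    """In-memory stand-in: `objects` plays the object store, `index` the database."""

    def __init__(self):
        self.objects = {}  # message-id -> raw message text, stored once
        self.index = {}    # recipient address -> list of message-ids

    def deliver(self, raw, recipient):
        """Store one copy of the message and record it for this recipient."""
        msg_id = message_from_string(raw)["Message-ID"]
        if msg_id is None:
            raise ValueError("message has no Message-ID header")
        msg_id = msg_id.strip()
        # Assumption: equal Message-IDs mean equal messages. A duplicated
        # Message-ID with different content would be silently dropped here.
        self.objects.setdefault(msg_id, raw)
        ids = self.index.setdefault(recipient, [])
        if msg_id not in ids:
            ids.append(msg_id)
        return msg_id

    def fetch(self, recipient):
        """Return the raw messages indexed for one recipient."""
        return [self.objects[i] for i in self.index.get(recipient, [])]
```

Delivering the same message to two recipients leaves one stored object and two index entries. Per-recipient details such as Delivered-To, and any copy on write for altered messages, would have to live in the index side and are not handled in this sketch.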
I want something that has a delivery agent with a similar interface to maildrop or procmail and which has POP and IMAP servers to provide client access.
> you'd have to write a swift access module for the pop/imap daemon of your choice (dovecot is quite modular and would probably be a good choice), and inserting incoming messages into the store would be a simple wrapper around either the command-line tools or the http api. there are also python libs.
If I was going to write it myself then I would look at Dovecot and Cassandra. But I really don't have the time for it, so if I end up doing some coding on such things it'll be helping out with someone else's project.
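For what the "simple wrapper around the http api" might look like on the delivery side, here is a minimal sketch using only the Python standard library. It builds (but does not send) a PUT request for one message, using the Message-ID as the object name. The function name `build_put_request`, the container name and the storage URL are all placeholders I made up, and the token is assumed to have been obtained from the auth service beforehand.

```python
from email import message_from_bytes
from urllib.parse import quote
from urllib.request import Request


def build_put_request(storage_url, container, token, raw):
    """Build a Swift-style object PUT for one raw message (bytes)."""
    msg_id = message_from_bytes(raw)["Message-ID"]
    if msg_id is None:
        raise ValueError("message has no Message-ID header")
    # Use the Message-ID without its angle brackets as the object name,
    # percent-encoded so it is safe inside a URL path.
    name = quote(msg_id.strip().strip("<>"), safe="")
    url = "{}/{}/{}".format(storage_url.rstrip("/"), quote(container, safe=""), name)
    return Request(
        url,
        data=raw,
        method="PUT",
        headers={"X-Auth-Token": token, "Content-Type": "message/rfc822"},
    )
```

A delivery agent in the style of procmail or maildrop would read the message with `sys.stdin.buffer.read()`, call this function, and pass the result to `urllib.request.urlopen()`. Error handling, retries and the recipient index are left out.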
> note: you need at least three nodes (preferably more than 5) to run swift. you also need a second NIC for the nodes to talk to each other - they chatter a LOT. you can imagine it as something like:
3 nodes is the practical minimum for any sort of distributed system no matter how you do it. With less than 3 you can't have quorum if one node goes away. A second Ethernet card on each server with a GigE switch doesn't add much to the cost.

On Wed, 4 Apr 2012, Craig Sanders <cas@taz.net.au> wrote:
> OTOH, have you seen what use google's ganeti has made of DRBD layered on top of LVM?
No, but my recent experience with DRBD hasn't made me inclined to go back for more. :(

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/