Re: mail storage in a distributed database

4 Apr 2012

      On Wed, 4 Apr 2012, Craig Sanders <cas@taz.net.au> wrote:
...
On Wed, Apr 04, 2012 at 07:38:15PM +1000, Russell Coker wrote:
...
Does anyone know of a mail store that uses a distributed database like
Cassandra?
http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf
BlueRunner: Building an Email Service in the Cloud
  by Jun Rao
    IBM Almaden Research Center
    Apache Cassandra Committer
found with:
http://www.google.com.au/search?q=apache+cassandra+%2B%22mail+store%22
Yes, that's the one that has nothing released apart from a PDF.
...
it occurs to me that openstack's Swift[1] object store might be good
for this. store the message with an object id of the message-id. you
probably don't even have to care about the fact that message-id is only
unique(*) per message, not per recipient (in fact, that's probably an
advantage).
Does IMAP allow altering a message?  If so, you would need copy-on-write.
Also the "Delivered-To" header would need to be fudged somehow.

Also rumor has it that some MTAs duplicate the message-ID.  I haven't tried to 
verify that claim.
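
To make the concerns above concrete, here is a rough sketch (not taken from any existing mail store) of keeping one shared copy of each message while handling the per-recipient parts separately. The MailStore class and its method names are made up for illustration, and the two dicts stand in for the object store and a per-user mailbox table. The object key combines the Message-ID with a hash of the content, so two different messages that happen to share a Message-ID do not overwrite each other. As far as I know, IMAP treats message content as immutable and only lets clients change flags, so in this sketch the flags live in the mailbox entry and "Delivered-To" is added when the message is read.

```python
import hashlib
from email import message_from_string


class MailStore:
    """In-memory stand-in for an object store plus per-recipient mailboxes."""

    def __init__(self):
        self.objects = {}    # object key -> raw message text (one shared copy)
        self.mailboxes = {}  # recipient -> list of per-recipient entries

    @staticmethod
    def object_key(raw):
        # Message-ID plus a content hash: identical copies share a key,
        # different messages with a duplicated Message-ID do not collide.
        msg = message_from_string(raw)
        msg_id = (msg.get("Message-ID") or "").strip()
        digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        return msg_id + ":" + digest

    def deliver(self, raw, recipient):
        key = self.object_key(raw)
        self.objects.setdefault(key, raw)  # store the body only once
        entry = {"key": key, "flags": set()}  # mutable state stays per recipient
        self.mailboxes.setdefault(recipient, []).append(entry)
        return key

    def fetch(self, recipient, position):
        # Prepend the per-recipient header at read time instead of storing it.
        entry = self.mailboxes[recipient][position]
        return "Delivered-To: " + recipient + "\n" + self.objects[entry["key"]]
```

Delivering the same message to two recipients leaves a single entry in objects, while each recipient gets their own flags and their own "Delivered-To" line on fetch.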
...
hmmm.  you'd still need some sort of database so that you could get
from a recipient address, subject, and/or other fields to the message-id
(and hence to the msg body in the object store).  apart from offloading
the fulltext storage to something outside of the db, there might not be
enough value in doing this.  not sure.
might be good in combination with cassandra.
It apparently worked well for IBM, but they didn't share the code.
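
A minimal sketch of the lookup side described above, assuming the bodies live elsewhere and the database only maps header fields to message-ids. The function names and the two plain dicts are hypothetical stand-ins for whatever database (Cassandra or otherwise) would really hold the index.

```python
from collections import defaultdict
from email import message_from_string


def new_index():
    # Two inverted indexes: recipient -> message-ids, subject word -> message-ids.
    return {"recipient": defaultdict(set), "subject": defaultdict(set)}


def index_message(index, raw, recipient):
    """Record which message-id belongs to a recipient and to each subject word."""
    msg = message_from_string(raw)
    msg_id = (msg.get("Message-ID") or "").strip()
    index["recipient"][recipient.lower()].add(msg_id)
    for word in (msg.get("Subject") or "").lower().split():
        index["subject"][word].add(msg_id)
    return msg_id


def search(index, recipient, word):
    """Return message-ids for this recipient whose subject contains the word."""
    by_recipient = index["recipient"].get(recipient.lower(), set())
    by_subject = index["subject"].get(word.lower(), set())
    return by_recipient & by_subject
```

The returned message-ids would then be used as keys to fetch the bodies from the object store.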
...
...
I want something that has a delivery agent with a similar interface to
maildrop or procmail and which has POP and IMAP servers to provide client
access.
you'd have to write a swift access module for the pop/imap daemon of
your choice (dovecot is quite modular and would probably be a good
choice), and inserting incoming messages into the store would be
a simple wrapper around either the command-line tools or the http
api.  there are also python libs.
If I were going to write it myself then I would look at Dovecot and Cassandra.  
But I really don't have the time for it, so if I end up doing some coding on 
such things it'll be helping out with someone else's project.
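
For what it's worth, the "simple wrapper around the http api" could look roughly like the sketch below. It only builds the request and does not send it. The function name, container name, storage URL and token are placeholders; the parts taken from Swift's object API are the PUT to storage-url/container/object and the X-Auth-Token header. The Message-ID is percent-encoded because it contains "<", ">" and "@".

```python
import urllib.parse
import urllib.request


def build_store_request(storage_url, token, container, message_id, raw_message):
    """Build (but do not send) an HTTP PUT that stores one message as an object."""
    # Encode every reserved character so the Message-ID is a single path segment.
    object_name = urllib.parse.quote(message_id, safe="")
    url = storage_url.rstrip("/") + "/" + container + "/" + object_name
    return urllib.request.Request(
        url,
        data=raw_message.encode("utf-8"),
        headers={"X-Auth-Token": token, "Content-Type": "message/rfc822"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_store_request(
        "https://swift.example.com/v1/AUTH_demo",  # placeholder storage URL
        "placeholder-token",
        "mail",
        "<abc@example.com>",
        "Subject: test\n\nhello\n",
    )
    print(req.get_method(), req.full_url)
```

A delivery agent in the style of maildrop or procmail would read the message from stdin, call something like this, and pass the result to urllib.request.urlopen().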
...
note: you need at least three nodes (preferably more than 5) to run
swift. you also need a second NIC for the nodes to talk to each
other - they chatter a LOT. you can imagine it as something like:
3 nodes is the practical minimum for any sort of distributed system no matter 
how you do it.  With fewer than 3 you can't have quorum if one node goes away.
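
The arithmetic behind that, assuming a simple majority quorum (other quorum schemes exist, and the function names here are just for illustration):

```python
def quorum_size(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can go away while a majority is still reachable."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "nodes: quorum", quorum_size(n), "survives", tolerated_failures(n), "failures")
```

With 2 nodes the majority is 2, so losing either one loses quorum; 3 nodes is the first size that survives a single failure.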

A second Ethernet card on each server with a GigE switch doesn't add much to 
the cost.

On Wed, 4 Apr 2012, Craig Sanders <cas@taz.net.au> wrote:
...
OTOH, have you seen what use google's ganeti has made of DRBD layered
on top of LVM?
No, but my recent experience with DRBD hasn't made me inclined to go back for 
more.  :(

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/