RE: CEPH (was ZFS vs RAID (was gpt and grub))

James Harper <james.harper@bendigoit.com.au> wrote:
> Ceph does sound exciting. Is anyone here doing it with Xen?
It was still under heavy development when I last read about it. The kernel module has entered the mainline kernel and, I assume, has undergone further work since then.
File systems take a long time to mature. Testing and bug-finding efforts always help, however.
Full of enthusiasm, I've just created three Xen VMs and built a three-node Ceph cluster inside them. It's amazingly easy to set up, but from what I can see, some of the init tools automate a lot of the steps that I assume need to be done manually for ongoing maintenance, so I may go back and set it up the long way around to get a feel for the steps involved, or maybe just add a few more nodes.

For a production Ceph installation, I think I would still use RAID1 underneath and then deploy Ceph on top, as that lowers the risk of a node-down situation, at the expense of requiring additional disks.

Debian appears to ship some OCF resource agents, which I haven't investigated yet, as I'd still want to use Pacemaker for managing the actual VMs.

For a five-node cluster with two copies of all Xen virtual disks, I assume it becomes important to run VMs on a node that has a local replica of the data... or is the data striped across all nodes transparently, in a "you don't need to know" way? I'd better do some more reading, I guess :)

James
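PS: in case anyone wants to try the same thing, this is roughly what the quick setup looked like. Treat it as a minimal sketch: I'm assuming the ceph-deploy tool here, the node names and the disk device are made up, and the exact subcommands vary between releases.

    # run from an admin box with ssh access to the three VMs (hypothetical names)
    ceph-deploy new ceph1 ceph2 ceph3         # generate the initial ceph.conf and monitor map
    ceph-deploy install ceph1 ceph2 ceph3     # install the Ceph packages on each node
    ceph-deploy mon create-initial            # start the monitors and gather the keys
    ceph-deploy osd create ceph1:/dev/xvdb    # one OSD per node, on a spare disk
    ceph-deploy osd create ceph2:/dev/xvdb
    ceph-deploy osd create ceph3:/dev/xvdb

    # sanity checks once it's up
    ceph status                               # should reach HEALTH_OK once placement groups settle
    ceph osd tree                             # shows the CRUSH hierarchy of hosts and OSDs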

James Harper <james.harper@bendigoit.com.au> wrote:
> For a production Ceph installation, I think I would still use RAID1 underneath and then deploy Ceph on top, as that lowers the risk of a node-down situation, at the expense of requiring additional disks.
I think it's supposed to perform all of the replication transparently across the disks that you provide, based on what I've read, so you shouldn't need RAID. A quick web search didn't turn up a recent overview of Ceph, but the original paper is here: http://www.ssrc.ucsc.edu/Papers/weil-osdi06.pdf and it's now on my reading list.
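From what I can tell, the replica count isn't a property of the disks at all but of each storage pool, so no RAID layer is involved. A sketch based on the documentation (the default 'rbd' pool name and the exact commands may differ between versions):

    ceph osd pool get rbd size         # show the current replica count for the pool
    ceph osd pool set rbd size 2       # keep two copies of every object
    ceph osd pool set rbd min_size 1   # keep serving I/O even with a single copy left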

Jason White <jason@jasonjgw.net> wrote:
> I think it's supposed to perform all of the replication transparently across the disks that you provide, based on what I've read, so you shouldn't need RAID.

Furthermore, the CRUSH map feature looks very interesting: http://ceph.com/docs/master/rados/operations/crush-map/. You can make the logical structure of the map correspond to the organization of your physical infrastructure, and thereby ensure that replicas are automatically placed so as to survive likely hardware failure scenarios.
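According to those docs, the map is plain text once decompiled, so the failure-domain policy is directly visible and editable. A sketch of what the relevant rule appears to look like (the commands are from the documentation; the rule shown is the stock replicated rule, and bucket and rule names can differ per cluster):

    # fetch and decompile the cluster's current CRUSH map
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # inside crush.txt, the stock replicated rule places each replica under a different host:
    rule replicated_ruleset {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default                    # start at the root of the hierarchy
            step chooseleaf firstn 0 type host   # each replica under a distinct host bucket
            step emit
    }

Changing "type host" to, say, "type rack" (given rack buckets in the map) appears to be how you push replicas across racks instead of just across machines.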

Jason White <jason@jasonjgw.net> wrote:
> I think it's supposed to perform all of the replication transparently across the disks that you provide, based on what I've read, so you shouldn't need RAID.
> Furthermore, the CRUSH map feature looks very interesting: http://ceph.com/docs/master/rados/operations/crush-map/. You can make the logical structure of the map correspond to the organization of your physical infrastructure, and thereby ensure that replicas are automatically placed so as to survive likely hardware failure scenarios.
Yes, I'm liking this more and more. Each of my servers has two network interfaces and two disks. I can bond the interfaces together (probably LACP), or keep them on separate subnets (as they are now, for iSCSI multipath) and use the CRUSH map to distribute the data in such a way that both networks see similar load. In fact, that probably happens automatically with uniform distribution across nodes. Some testing is in order, I think.

James
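PS: it also looks as though the two interfaces could be split by role instead of bonded; ceph.conf apparently supports separate client-facing and replication networks. A sketch, with invented subnets:

    [global]
            # client and monitor traffic
            public network = 10.1.1.0/24
            # OSD-to-OSD replication traffic
            cluster network = 10.1.2.0/24

That way the replication load lands on the second interface without any CRUSH trickery, if I've understood the docs correctly.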
participants (2):
- James Harper
- Jason White