
On 22/05/12 17:17, Russell Coker wrote:
On Tue, 22 May 2012, James Harper<james.harper@bendigoit.com.au> wrote:
vg's 0, 1, and 2 are on different sets of spindles to give higher IOPS. The problem is that the 4 snapshots need to be created atomically or else I'm going to get my database and transaction logs out of sync as there could be a few ms between the snapshot of lv2 and lv3.
If a database is designed correctly shouldn't it be possible to be consistent even if the snapshots are at different times?
The purpose of the log is that logged transactions can be replayed if they didn't get to the main data store. So if you snapshot the database filesystem and then snapshot the log a bit later then the DB server should just replay the newer transactions and everything should be fine - according to my knowledge of databases.
Am I wrong in my understanding of databases?
I think you're missing a bit of the understanding of how they work, but the end result is the same -- if you capture the main datastore and enough surrounding logs, then you're usually OK.. In case you're interested about how PostgreSQL (and to a lesser degreee, SQLite) work with logs.. Keep in mind that your database needs to maintain quite a few sets of the *same* info. ie. If a user has started a transaction and is working on some data, but another user has come along and changed that data in a separate transaction. Firstly, you need to keep uncommitted changes to data around without committing it to the primary data store. Then once it's committed, you still can't write over the old data until every other transaction looking at it has finished too. Thirdly, databases were written with spinning disks in mind, which suck at random write i/o. So they tend to write everything out to log files, sequentially, then write it to the main datastore in the background as time allows. So, the "log" files are actually quite important -- they contain data that may *not be allowed (yet)* to be written to the main store.
FWIW I have my MySQL VMs setup with a /mysql filesystem that contains database files under /mysql/db and logs under /mysql/log.
It's generally claimed that the benefits of having a separate filesystem for the database log at the low end involve different RAID levels. For high end databases I've seen an Oracle document about how to tune different ZFS filesystems for different parts of an Oracle database. But both those cases probably aren't great options for a VM.
Indeed -- as mentioned above, the logs are all sequential writes, whereas the data store is random writes. (Both then feature random and sequential reads, but the logs are usually still in your file system cache so it doesn't matter for them)