
Is there any good FOSS distributed database that's not a heap of Maven rubbish that can't be supported in a distribution? I've been briefly looking at Cockroach, HBase, Voldemort, Ignite, and Accumulo, and of course I had tried Cassandra at a LUV event. All the ones I looked at in detail couldn't be packaged for Debian because they used Maven for the build system, and a build process that downloads Java programs from the web doesn't fit with reproducible builds. I presume that the others which aren't in Debian are in a similar situation. Does anyone know of a good candidate that could be packaged for Debian? Failing that, which of the ones that suck too badly for inclusion in Debian don't suck so badly that they are horrible to use?

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

On Wed, Jun 10, 2020 at 04:35:11PM +1000, Russell Coker via luv-main wrote:
> Is there any good FOSS distributed database that's not a heap of Maven rubbish that can't be supported in a distribution? I've been briefly looking at Cockroach, HBase, Voldemort, Ignite, and Accumulo, and of course I had tried Cassandra at a LUV event.
What properties do you (not) need? Considering your list, I'll recommend Anna. Page 3 of Anna's paper references 20 systems, including 3 of the 5 from your list: HBase, Voldemort, and Cassandra.
https://dsf.berkeley.edu/jmh/papers/anna_ieee18.pdf#page=3
> All the ones I looked at in detail couldn't be packaged for Debian because they used Maven for the build system, and a build process that downloads Java programs from the web doesn't fit with reproducible builds. I presume that the others which aren't in Debian are in a similar situation.
Anna is written in C++ with the usual CMake build process. No Java.
https://github.com/hydro-project/anna
https://hydro-project.github.io/
https://databeta.wordpress.com/2018/03/09/anna-kvs/
https://rise.cs.berkeley.edu/blog/going-fast-and-cheap-how-we-made-anna-auto...

I discovered Anna while following 'The Morning Paper', a site with a very high signal-to-noise ratio.
https://blog.acolyer.org/2018/03/27/anna-a-kvs-for-any-scale/
https://blog.acolyer.org/
> Does anyone know of a good candidate that could be packaged for Debian? Failing that, which of the ones that suck too badly for inclusion in Debian don't suck so badly that they are horrible to use?
I don't believe Anna is packaged in Debian, but it shouldn't be difficult to add. The table on page 3 points to many alternatives, some of which may be packaged. Anna is Apache-2.0 licensed and its last commit was on the 9th of May.

The paper I linked for Anna is "v0" and has been extended with "v1". The C++ implementation follows "v1".
https://arxiv.org/abs/1809.00089

v0 implements any-scale, coordination-free, partitioned lattice replication with simple multi-master replication, using a single replication factor for the whole system. v1 replaces that single number with selective (per-key) replication, and adds vertical tiering (in-memory vs. persistent) and horizontal elasticity (scaling the cluster with load to keep latency within a bound).

This adds two services around the core anna-kvs, namely anna-monitor and anna-router. The monitor watches the cluster and tunes the selective replication so that hot keys are available on many in-memory nodes, while cold keys exist on fewer on-disk nodes. If the network is under- or over-utilized, the monitor can add or remove resources (if given the functionality and authority to do so; otherwise it may just warn its administrator). The router hides the indirections of the elastic system from clients.

v0 was approximately 2k lines of C++ excluding external libraries (ZeroMQ and Protocol Buffers) but including the lattice library and client code (probably also excluding comments and blank lines). v1 is currently 3166 lines of code, excluding comments and blank lines.

There are a few lattices defined with different consistency semantics: (causally) (un)ordered (multi-)values, i.e. {get,put} key value, {get,put}_set key [values], and {get,put}_causal key value. Implementing a different semantic, if your application needs one, takes only about 10-20 additional lines of code.

Finally, v0 was a prototype, and I haven't seen any indication that v1 is "production ready", but it does not appear far from it. I have fixed one issue and reported a few more, but these are minor and should be easy to resolve. Hint: don't worry about the current 100% CPU utilization; that will likely be fixed by replacing just one asynchronous operation with a blocking one.
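To illustrate why lattices allow coordination-free replication, here is a minimal Python sketch. It is not Anna's code (Anna implements its lattices in C++), and the names LWWLattice, SetLattice, merge and converge are hypothetical. The key property is that merge is commutative, associative and idempotent, so replicas can accept puts independently and still agree once they exchange state, in any order and any number of times.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class LWWLattice:
    """Last-writer-wins register: keeps the value with the highest timestamp.

    Hypothetical illustration, not Anna's implementation. Ties on timestamp
    are broken by comparing values so that merge stays commutative.
    """
    timestamp: int
    value: str

    def merge(self, other: "LWWLattice") -> "LWWLattice":
        # max over (timestamp, value) is commutative, associative and idempotent
        return max(self, other, key=lambda l: (l.timestamp, l.value))


@dataclass(frozen=True)
class SetLattice:
    """Grow-only set: merge is set union, so no concurrent put is lost."""
    values: FrozenSet[str] = frozenset()

    def merge(self, other: "SetLattice") -> "SetLattice":
        return SetLattice(self.values | other.values)


Lattice = Union[LWWLattice, SetLattice]


def converge(replicas: List[Lattice]) -> Lattice:
    """Fold every replica's state into one; the order does not matter."""
    state = replicas[0]
    for replica in replicas[1:]:
        state = state.merge(replica)
    return state


if __name__ == "__main__":
    # Two replicas accepted writes for the same key without coordinating.
    a = LWWLattice(timestamp=5, value="blue")
    b = LWWLattice(timestamp=7, value="green")
    print(converge([a, b]).value, converge([b, a]).value)  # green green

    s1 = SetLattice(frozenset({"x"}))
    s2 = SetLattice(frozenset({"y"}))
    print(sorted(converge([s1, s2, s1]).values))  # ['x', 'y']
```

The last-writer-wins register roughly corresponds to a plain put key value and the grow-only set to put_set; a causal lattice would need something like vector clocks instead of a single timestamp, which this sketch does not attempt.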

On 10/6/20 4:35 pm, Russell Coker via luv-main wrote:
> Is there any good FOSS distributed database that's not a heap of Maven rubbish that can't be supported in a distribution?
ScyllaDB is a C++ rewrite of Cassandra: https://www.scylladb.com/
I haven't tried it, but it is apparently much faster.
participants (3)
- James McGlashan
- Paul Dwerryhouse
- Russell Coker