
On Wed, Jul 08, 2015 at 10:19:19PM +1000, Tim Connors wrote:
> I mentioned at the meeting that sometimes memory has to be shunted between NUMA nodes via swapping out to disk because... brokenness.
AFAIK that hasn't happened for years. that was the first NUMA migration implementation that SGI did (IIRC the place I used to work paid for that work as part of a supercomputer contract). NUMA migration has since been refined to not use anything so crude. it all happens in RAM now, as it should.

cgroups have re-raised a whole bunch of these issues though, as they pretend to be mini-machines using a subset of RAM, and they don't yet have all the sophistication of the real virtual memory system. they're getting there though...
> Stewart's post just handily came up and reminded me about this, and came with Citations Needed[TM]:
> https://www.flamingspork.com/blog/2015/07/08/the-sad-state-of-mysql-and-numa...
in 2010 the default distro zone_reclaim_mode could still have been wrong, and I guess it could cause spurious swapping. setting zone_reclaim_mode=0 fixes it. that sounds like their problem to me. zone_reclaim_mode can even cause NUMA-related deadlocks if it is != 0. BoM almost kicked out their last supercomputer vendor before I told folks at BoM to change that setting :)

the default is 0 for modern kernels.

cheers,
robin
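
For anyone wanting to check this setting on their own box, here is a minimal sketch in Python. It assumes a Linux kernel built with NUMA support, which exposes the value at /proc/sys/vm/zone_reclaim_mode. The helper names (read_zone_reclaim_mode, decode_zone_reclaim_mode) are made up for illustration; the bit meanings follow the kernel's vm sysctl documentation.

```python
from pathlib import Path

# Assumed location of the sysctl; only present on kernels built with NUMA support.
ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")

# Bit meanings as described in the kernel's vm sysctl documentation.
FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value):
    """Return the list of behaviours enabled by a zone_reclaim_mode value."""
    return [name for bit, name in FLAGS.items() if value & bit]


def read_zone_reclaim_mode(path=ZONE_RECLAIM_PATH):
    """Read the current value, or return None if the file does not exist."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    mode = read_zone_reclaim_mode()
    if mode is None:
        print("zone_reclaim_mode not available (non-NUMA kernel or not Linux?)")
    elif mode == 0:
        print("zone_reclaim_mode=0: zone reclaim is off")
    else:
        print(f"zone_reclaim_mode={mode}: " + ", ".join(decode_zone_reclaim_mode(mode)))
```

If it reports a non-zero value, running `sysctl -w vm.zone_reclaim_mode=0` as root changes it for the running system; making that persistent across reboots depends on the distro's sysctl configuration.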