Re: NUMA swapping

8 Jul 2015

On Wed, Jul 08, 2015 at 10:19:19PM +1000, Tim Connors wrote:
...
> I mentioned at the meeting that sometimes memory has to be shunted between
> NUMA nodes via swapping out to disk because.... brokenness.
AFAIK that hasn't happened for years. that was the first numa
migration implementation that SGI did (IIRC the place I used to work
paid for that work as part of a supercomputer contract).

numa migration has since been refined to not use anything so crude.
all happens in ram now, as it should.

cgroups have re-raised a whole bunch of these issues though, as they
pretend to be mini-machines using a subset of ram, and they don't yet
have all the sophistication of the real virtual memory system. they're
getting there though...
...
> Stewart's post just handily came up and reminded me about this, and came
> with Citations Needed[TM]:
> https://www.flamingspork.com/blog/2015/07/08/the-sad-state-of-mysql-and-numa...
in 2010 the default distro zone_reclaim_mode could still have been
wrong and I guess it could cause spurious swapping. setting it to
zone_reclaim_mode=0 fixes it. that sounds like their problem to me.

zone_reclaim_mode can even cause numa related deadlocks if != 0.
BoM almost kicked out their last supercomputer vendor before I told
folks at BoM to change that setting :)

the default is 0 for modern kernels.
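
if you want to check what a given box is actually running with, something
like the sketch below should do. it only assumes the usual Linux procfs
path /proc/sys/vm/zone_reclaim_mode (present on NUMA-enabled kernels); the
helper names read_zone_reclaim_mode and describe are made up for this
example.

```python
from pathlib import Path

# Standard procfs location of the sysctl; the file is absent on
# non-Linux systems and on kernels built without NUMA support.
ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")


def read_zone_reclaim_mode(path=ZONE_RECLAIM_PATH):
    """Return the current zone_reclaim_mode as an int, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Missing file, permission problem, or unexpected content.
        return None


def describe(mode):
    """Turn the raw value into a short human-readable verdict."""
    if mode is None:
        return "zone_reclaim_mode not available (non-Linux or non-NUMA kernel?)"
    if mode == 0:
        return "zone_reclaim_mode=0: zone reclaim is off"
    return f"zone_reclaim_mode={mode}: zone reclaim is on, consider setting it to 0"


if __name__ == "__main__":
    print(describe(read_zone_reclaim_mode()))
```

if it reports non-zero, `sysctl -w vm.zone_reclaim_mode=0` as root changes
it on the running system; a `vm.zone_reclaim_mode = 0` line in
/etc/sysctl.conf keeps it that way across reboots.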

cheers,
robin

Robin Humble