Re: [luv-main] Purchasing a new laptop

On Tue, 27 Sep 2011, Daniel Pittman <daniel@rimspace.net> wrote:
Yes, but do you care? The day to day performance difference approximates zero, and it has done for several years, for most practical purposes. You can tell the difference when it comes to compiling software in languages with highly efficient compilers (eg: not, generally, C or C++), and when doing extremely CPU intensive operations (3D rendering, encoding), but desktop stuff?
The main performance issue I have on DESKTOP tasks (as opposed to compiling, video processing, etc) is web browsing. For that the difference between a Pentium-M 1.7GHz (like a P3) and a dual-core 64bit CPU wasn't that obvious - Mozilla is slow everywhere.

Supporting more than 4G of RAM is a real performance benefit for modern desktop software, something that is a problem with my latest desktop system that is limited to 3300M of RAM (nasty Intel).

For my desktop stuff moving from Mozilla to Chrome was a much better performance boost than moving from 32bit to dual-core 64bit.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

On Tue, 27 Sep 2011 14:32:27 +1000 Russell Coker <russell@coker.com.au> wrote:
... latest desktop system that is limited to 3300M of RAM (nasty Intel).
For my desktop stuff moving from Mozilla to Chrome was a much better performance boost than moving from 32bit to dual-core 64bit.
IMHO there is a noticeable performance improvement with Firefox 7 (or even 8 beta).

Daniel.

--
----------------------------------------
Daniel Jitnah
Melbourne, Australia
e: djitnah@greenwareit.com.au
w: www.greenwareit.com.au
SIP: dj-git@ekiga.net
----------------------------------------
** For All your Linux, Open Source and IT requirements visit: www.greenwareit.com.au **

On Tue, 27 Sep 2011, Russell Coker wrote:
On Tue, 27 Sep 2011, Daniel Pittman <daniel@rimspace.net> wrote:
Yes, but do you care? The day to day performance difference approximates zero, and it has done for several years, for most practical purposes. You can tell the difference when it comes to compiling software in languages with highly efficient compilers (eg: not, generally, C or C++), and when doing extremely CPU intensive operations (3D rendering, encoding), but desktop stuff?
The main performance issue I have on DESKTOP tasks (as opposed to compiling, video processing, etc) is web browsing.
For that the difference between a Pentium-M 1.7GHz (like a P3) and a dual-core 64bit CPU wasn't that obvious - Mozilla is slow everywhere.
My server is a laptop with 4G of RAM. According to munin[1], its average IO service time is 1 second! Some IO tasks are quick, but on the whole it is very bogged down. Opera taking 1GB, Mozilla taking 500MB, a bunch of xterms, and a gig of cache should all fit in that fine. Why then do so many tasks end up in a D state waiting for each other? I wonder if I have a rogue process doing extraneous fsync()s screwing up filesystem access? I already run Mozilla with libeatmydata, because of the ill-conceived misinterpretation of POSIX file atomicity.

The really weird thing is that, according to vmstat, it is swapping a lot. The really, really weird thing is that at any given moment it might have 300MB free and 900MB cached and/or buffered. That cache was allocated an hour ago when kaffeine was recording a show. Why on earth does it preferentially swap out currently used apps rather than drop cache I haven't accessed in an hour? /proc/.../*/swappiness is at the default 10.

It's just getting worse and worse as I upgrade kernels. The other machines are at 2.6.32, which was bad enough, but 3.0 seems absolutely terrible at virtual memory management. I've had to echo 3 to /proc/sys/vm/drop_caches to temporarily clear up the thrashing.

Obviously no one else is suffering from these issues, as otherwise people would be rioting in the streets. I wonder what is different about mine? Is it just that I'm trying to do desktop-style usage on a laptop hard disk? It is 5400rpm from memory, and it can sustain 80MB/s when writing contiguously, but the seeks involved in swapping drag it down to 3MB/s or so. But that's all my desktops could ever do when swapping too.

Now that I think of it, I think the VMM behaviour sucks on large machines too. We've currently got a case open at work with Red Hat, where a development webserver with 12G of RAM is filling up with cache to the point where order-1 allocations are failing. Just drop the frigging cache when there's memory pressure or fragmentation, for fscks sake! See also: http://tau-iota-mu-c.livejournal.com/172693.html

(And no, I had to give up on zram in the end, because the compression was still too slow. I've just bought an extra 4G of RAM. ark.intel.com tells me my chipset won't support it, but plenty of people on the web say it will actually work, and others say 8G total doesn't work but 6G does. I guess we'll find out!)

[1] Of course, I can't run munin all the time, because every 5 minutes it fires up and tries to allocate 3 processes that take 20MB of RAM each. Apparently that's too much for my poor widdle machine with 4GB of RAM, and it dives back into swap (instead of, you know, dropping some of the 800MB of cache). Regular as cronwork^Wclockwork.

--
Tim Connors
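[For anyone wanting to watch this behaviour as it happens, here is a minimal sketch - not from the original messages - that samples /proc/vmstat and /proc/meminfo once a second and prints swap-in/swap-out rates next to the current cache size, so you can see the kernel swapping while the page cache stays large. The field names (pswpin, pswpout, Cached, MemFree) are standard on 2.6/3.x kernels; the sampling interval and output format are arbitrary choices.]

#!/usr/bin/env python
# Sample swap activity (pages swapped in/out per interval) alongside the
# size of the page cache, to confirm "swapping while the cache is huge".
import time

def vmstat():
    # /proc/vmstat: one "name value" pair per line
    with open('/proc/vmstat') as f:
        return {k: int(v) for k, v in (line.split() for line in f)}

def meminfo_kb(field):
    # /proc/meminfo: lines like "Cached:   123456 kB"
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith(field + ':'):
                return int(line.split()[1])
    return 0

prev = vmstat()
while True:
    time.sleep(1)
    cur = vmstat()
    print("swap-in: %5d pages/s  swap-out: %5d pages/s  cache: %6d MB  free: %6d MB" % (
        cur['pswpin'] - prev['pswpin'],
        cur['pswpout'] - prev['pswpout'],
        meminfo_kb('Cached') // 1024,
        meminfo_kb('MemFree') // 1024))
    prev = cur

[If swap-out stays non-zero while Cached remains large, that matches the behaviour described above.]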

On 27/09/11 15:48, Tim Connors wrote:
It's just getting worse and worse as I upgrade kernels. The other machines are at 2.6.32, which was bad enough, but 3.0 seems absolutely terrible at virtual memory management.
I've seen the same, dropped back to 2.6.38 in Ubuntu after trying 3.0.x for a while to regain some sanity. Sigh..

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Chris Samuel <chris@csamuel.org> wrote:
I've seen the same, dropped back to 2.6.38 in Ubuntu after trying 3.0.x for a while to regain some sanity. Sigh..
I don't recall reading about any virtual memory changes in recent kernels. There was an interesting LWN article on patches (not yet integrated, because not adequately tested) to improve page write-back performance significantly. Someone affected by this really should push bug reports upstream.

On 27/09/11 16:43, Jason White wrote:
I don't recall reading about any virtual memory changes in recent kernels.
Neither do I - but something has certainly gone askew in post 2.6.38 kernels from what I saw. Sadly I don't have the time or energy to do a git bisect to try and track this down at present.

cheers,
Chris

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Chris Samuel <chris@csamuel.org> wrote:
On 27/09/11 16:43, Jason White wrote:
I don't recall reading about any virtual memory changes in recent kernels.
Neither do I - but something has certainly gone askew in post 2.6.38 kernels from what I saw.
Mainline kernels or distribution-patched kernels?

On 27/09/11 16:54, Jason White wrote:
Mainline kernels or distribution-patched kernels?
Ubuntu mainline PPA (which I *believe* are unpatched).

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

On Tue, 27 Sep 2011, Jason White wrote:
Chris Samuel <chris@csamuel.org> wrote:
I've seen the same, dropped back to 2.6.38 in Ubuntu after trying 3.0.x for a while to regain some sanity. Sigh..
I don't recall reading about any virtual memory changes in recent kernels.
There was an interesting LWN article on patches (not yet integrated, because not adequately tested) to improve page write-back performance significantly.
Someone affected by this really should push bug reports upstream.
Problem is, it's just 2% here and there. 2.6.38 is much worse than 2.6.26, but I can't boot the latter anymore. 2.6.26 is much worse than 2.6.8, which was much worse than 2.4.x. How do I submit benchmarks? "2.6.38 feels much slower than old kernels. But in the meantime, I upgraded Opera, Xorg, and everything else. Please fix."?

The current production internal webserver, with 15 million files and 146,000 directories, rsyncs daily to our new machine. They're identical apart from the OS and the old one actually having a load and being busy. The new machine can't allocate order-1 pages when backing up, despite having 11.5 out of 12GB of memory free. The old machine could. But it was Red Hat 4 and the new one is Red Hat 6. Naturally, we have a case open with Red Hat, but if you've ever had the misfortune of doing that yourself, you'll know how futile that is.

--
Tim Connors
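[On the order-1 failures: a small sketch - my own illustration, not from the thread - that parses /proc/buddyinfo to show how many free blocks of each order exist per zone. This makes "11.5GB free, yet order-1 allocations fail" concrete: plenty of free pages can coexist with very few free order-1 or larger blocks. The column layout is the standard buddyinfo format; the 4KiB page size is an x86 assumption.]

#!/usr/bin/env python
# Summarise free memory per allocation order from /proc/buddyinfo.
# Columns after "zone <name>" are counts of free blocks of order 0, 1, 2, ...
PAGE_KB = 4  # assume 4 KiB pages (x86)

with open('/proc/buddyinfo') as f:
    for line in f:
        parts = line.split()
        node, zone = parts[1].rstrip(','), parts[3]
        counts = [int(c) for c in parts[4:]]
        total_kb = sum(c * PAGE_KB * (1 << order) for order, c in enumerate(counts))
        print("node %s zone %-8s free: %6d MB  order-1 blocks: %d" % (
            node, zone, total_kb // 1024, counts[1] if len(counts) > 1 else 0))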

Tim Connors <tconnors@rather.puzzling.org> wrote:
The current production internal webserver, with 15 million files and 146,000 directories, rsyncs daily to our new machine. They're identical apart from the OS and the old one actually having a load and being busy. The new machine can't allocate order-1 pages when backing up, despite having 11.5 out of 12GB of memory free. The old machine could. But it was Red Hat 4 and the new one is Red Hat 6. Naturally, we have a case open with Red Hat, but if you've ever had the misfortune of doing that yourself, you'll know how futile that is.
I would suggest taking it to a kernel-oriented list, but I don't know how interested they would be if it's a Red Hat kernel rather than a mainline kernel, since RH apply extensive patches, or so I've read. I'm also wondering whether the kernel tracing frameworks would help you to track down the cause. The problem is that they aren't well documented as far as I've seen. A further problem is that if nobody reports these issues properly, they won't get fixed.
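[The tracing suggestion can be tried without much ceremony. Here is a rough sketch - again my own, not from the thread - that enables the vmscan (page reclaim) tracepoints via the ftrace debugfs interface and streams the events, which at least shows when direct reclaim kicks in during the stalls. It assumes a kernel with tracing support (the vmscan tracepoints exist in 2.6.36 and later), debugfs mounted at /sys/kernel/debug, and root privileges.]

#!/usr/bin/env python
# Stream vmscan (page reclaim) tracepoint events via the ftrace interface.
# Assumes debugfs is mounted at /sys/kernel/debug and tracing is compiled in.
import sys

TRACING = '/sys/kernel/debug/tracing'

def write(path, value):
    with open(path, 'w') as f:
        f.write(value)

try:
    # Enable all vmscan events (direct reclaim begin/end, kswapd wakeups, ...)
    write(TRACING + '/events/vmscan/enable', '1')
    write(TRACING + '/tracing_on', '1')
    # trace_pipe blocks until events arrive; print them as they come
    with open(TRACING + '/trace_pipe') as pipe:
        for line in pipe:
            sys.stdout.write(line)
except KeyboardInterrupt:
    pass
finally:
    write(TRACING + '/events/vmscan/enable', '0')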

On Tue, 27 Sep 2011, Jason White <jason@jasonjgw.net> wrote:
A further problem is that if nobody reports these issues properly, they won't get fixed.
The best thing to do is to report bugs that can be easily reproduced. Reporting something that happens on your production network is not easy to reproduce. A bug report that starts with "get a server with 4*SATA disks and 100 copies of the 3.0.0 kernel source tree on an Ext4 filesystem on software RAID-5" is something that can be reproduced without excessive effort.

If your employer is serious about server performance then they should buy you a bunch of systems for testing such things. Getting some commodity servers to reproduce the bug is a good thing as it increases the chance that someone with the required skill and interest will have access to the hardware.

Also it's not THAT hard to have an auto-build configuration with PXE, and setting up some servers with ssh access isn't THAT hard either. If you put a pair of servers online that were identical apart from kernel version and invited any interested person to log in and find out why the performance was different, it would get some interest.

For any organisation that has millions of files on a web server, they should be able to afford a couple of white-box systems for such testing.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
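[On the "easily reproduced" point: a reproducer doesn't have to be fancy. The sketch below is purely illustrative - the path, file counts and sizes are made up - but it builds a large tree of small files and then times a stat()+read sweep over it, roughly the access pattern of the failing rsync backup. Run the same script under two kernels and you have numbers to attach to a bug report.]

#!/usr/bin/env python
# Create a large tree of small files, then time a stat()+read sweep over it,
# approximating an rsync-style backup scan. All sizes/counts are arbitrary.
import os, time

ROOT = '/srv/testtree'   # hypothetical scratch location
DIRS, FILES_PER_DIR, FILE_BYTES = 1000, 100, 4096

def build():
    for d in range(DIRS):
        path = os.path.join(ROOT, 'd%04d' % d)
        os.makedirs(path)
        for f in range(FILES_PER_DIR):
            with open(os.path.join(path, 'f%04d' % f), 'wb') as fh:
                fh.write(b'\0' * FILE_BYTES)

def sweep():
    start, n = time.time(), 0
    for dirpath, dirnames, filenames in os.walk(ROOT):
        for name in filenames:
            full = os.path.join(dirpath, name)
            os.stat(full)
            with open(full, 'rb') as fh:
                fh.read()
            n += 1
    print("%d files scanned in %.1f s" % (n, time.time() - start))

if not os.path.isdir(ROOT):
    build()
sweep()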

On 27/09/11 17:50, Russell Coker wrote:
If your employer is serious about server performance then they should buy you a bunch of systems for testing such things.
That'd be nice. But that costs, and they may not have the budget.

At VLSCI we're lucky enough to have a spare box for QA, but that's not a compute node clone, it's an infrastructure box clone. There's not really such a thing as a spare HPC compute node - they're just broken ones waiting to be fixed so you can get them back in service to run some more of the queue of jobs you've got waiting..

cheers,
Chris

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

On Tue, 27 Sep 2011, Chris Samuel wrote:
On 27/09/11 17:50, Russell Coker wrote:
If your employer is serious about server performance then they should buy you a bunch of systems for testing such things.
That'd be nice. But that costs, and they may not have the budget.
We're government. As far as I can work out, given my experience with two of them, it's been impossible for government departments to actually buy equipment in the last couple of years.

--
Tim Connors

On Tue, 27 Sep 2011, Chris Samuel <chris@csamuel.org> wrote:
On 27/09/11 17:50, Russell Coker wrote:
If your employer is serious about server performance then they should buy you a bunch of systems for testing such things.
That'd be nice. But that costs, and they may not have the budget. At VLSCI we're lucky enough to have a spare box for QA, but that's not a compute node clone, it's an infrastructure box clone.
You shouldn't need a box of the same specs as production systems. But hypothetically speaking, even if you did need one with the same specs, any reasonable cost-benefit analysis would justify it. If you have 100 production systems then the 1 test system only needs to deliver a 1% performance benefit in production to pay for itself.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

On Tue, 27 Sep 2011, Chris Samuel wrote:
On 27/09/11 15:48, Tim Connors wrote:
It's just getting worse and worse as I upgrade kernels. The other machines are at 2.6.32, which was bad enough, but 3.0 seems absolutely terrible at virtual memory management.
I've seen the same, dropped back to 2.6.38 in Ubuntu after trying 3.0.x for a while to regain some sanity. Sigh..
Given that contiguous reads/writes off commodity hardware today yield data at about 80MB/s, while data being pulled in and out of swap is more like 2MB/s because of seeks, I wonder why no one has thought to implement a cache-flushing policy heuristic that preferentially evicts large contiguous blocks from the cache 40 times more readily than small blocks. Or evicts large blocks 40 times more readily than causing something to swap. Keep the current policy for blocks less than, say, about 1MB. Tunable, of course (although it would be far better if it self-tuned dynamically based on measured IO response times and data access patterns).

There's no need to keep my HD movie in RAM from when my recorder recorded it 3 hours ago, given that when I come to read it again in 3 weeks time, the few MB/s needed to read the compressed stream at real-time rates will be orders of magnitude less than my disk could provide. If it so desired, the kernel could buffer an entire movie into memory in the order of a few tens of seconds, but if it decided to dump the equivalent amount of data from Mozilla out to swap, that would take about half an hour.

Funnily enough, the same heuristic would be just as useful for SSDs, given their crappy write performance for small writes.

--
Tim Connors
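[To make the proposed heuristic concrete, here is a toy user-space sketch of the weighting described above - my illustration only, nothing like this exists in the kernel. Each resident chunk gets an eviction score based on how long it has been idle, and chunks belonging to large contiguous extents have their score multiplied by a "cheap to re-read" factor of about 40 (the 80MB/s vs 2MB/s ratio), so streamed recordings get dropped long before small, seek-bound working-set pages get pushed to swap. The example extents and sizes are made up.]

# Toy model of the proposed eviction heuristic: prefer evicting pages that
# belong to large contiguous extents (cheap to re-read sequentially, ~80MB/s)
# over small scattered blocks (seek-bound, ~2MB/s), roughly a 40x bias.
import time

SEQ_BW_MBS = 80.0          # assumed sequential throughput
SEEK_BW_MBS = 2.0          # assumed effective throughput for seeky reload
LARGE_EXTENT_MB = 1.0      # below this, keep the existing LRU-like policy
CONTIG_BIAS = SEQ_BW_MBS / SEEK_BW_MBS   # = 40

class CachedExtent(object):
    def __init__(self, name, size_mb, last_access):
        self.name, self.size_mb, self.last_access = name, size_mb, last_access

    def eviction_score(self, now):
        # Higher score = evict sooner. Base score is idle time (LRU-ish).
        score = now - self.last_access
        if self.size_mb >= LARGE_EXTENT_MB:
            # Large contiguous data is cheap to re-read later, so bias
            # strongly towards dropping it before anything gets swapped.
            score *= CONTIG_BIAS
        return score

now = time.time()
cache = [
    CachedExtent('recorded-movie.ts', 3500.0, now - 3 * 3600),  # streamed hours ago
    CachedExtent('mozilla-heap-page', 0.004, now - 60),         # recently used app memory
    CachedExtent('xterm-scrollback', 0.004, now - 600),
]
for extent in sorted(cache, key=lambda e: e.eviction_score(now), reverse=True):
    print("%-20s %10.3f MB  score %12.1f" % (extent.name, extent.size_mb,
                                             extent.eviction_score(now)))

[A real implementation would live in the kernel's reclaim path and, as suggested above, would ideally self-tune the bias from measured IO latencies rather than hard-coding it.]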

Here's the LWN article that I had in mind earlier in the thread: http://lwn.net/Articles/456904/
participants (5):
- Chris Samuel
- Daniel Jitnah
- Jason White
- Russell Coker
- Tim Connors