
Hi there,

The server runs:

- dovecot
- apache (roundcube webmail)
- sendmail
- clamav-milter
- amavisd-milter

RAM: 4.10 GB
CPU: QEMU Virtual CPU version (cpu64-rhel6), 5 cores

The server is a VM on a host server that also provides http / mysql services. The host server runs cron jobs to poll the email server (importing data from mail boxes into the CRM), so - to clutch at straws - I am not sure if the host and guest are competing for disk IO at the same time with these calls. Against that, the host server does not experience any slow downs.

Before the holidays we added another 30 users to the servers. Below is some of the output from some commands. At the moment I am thinking of moving the VM services onto a cloud based host, but this is not ideal in the short term.

Any ideas?

Thanks

P

CPU load averages: 8.74 (1 min), 8.05 (5 mins), 5.63 (15 mins)

[root]# vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff   cache   si   so    bi   bo    in   cs us sy id wa st
 0  5 211056 141268  42948 2472992    1    1   246   47     2    2  2  1 93  5  0
 0  3 211056 127876  43208 2486312    0    0  2710   46  1212  895  2  1 66 31  0
 0  3 211104 141252  41136 2470440    0   10  3164   45  1109  804  2  1 74 23  0
 0  3 211104 128228  41412 2484500    0    0  2859   18   566  640  0  0 60 39  0
 0  3 211180 143320  40464 2464324    0   15  3122   70  1254 1030  2  1 57 40  0

Tasks: 502 total, 1 running, 501 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.7%us, 2.1%sy, 0.0%ni, 48.3%id, 41.6%wa, 0.0%hi, 0.0%si, 0.2%st
Mem: 4296356k total, 4169448k used, 126908k free, 41144k buffers
Swap: 2621436k total, 211980k used, 2409456k free, 2484540k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 7067 clam     20  0 1771m 361m 1584 S 29.5  8.6 265:23.91 clamd
 6450 kylier   20  0 19300 2828 1404 D  1.0  0.1   0:00.03 imap
 6544 root     20  0 15396 1552  888 R  1.0  0.0   0:00.29 top
 3848 root     20  0 78752 3024 2244 S  0.7  0.1   0:00.23 auth
 6398 vickij   20  0 18528 2072 1400 D  0.7  0.0   0:00.48 imap
 6419 uuu.s    20  0 18524 2024 1388 D  0.7  0.0   0:00.42 imap
 6424 uuuu     20  0 19300 2812 1404 D  0.7  0.1   0:00.23 imap
23252 jasmin.s 20  0 19004 2652 1572 S  0.7  0.1   0:00.42 imap
23345 jasmin.s 20  0 26136 2472 1688 S  0.7  0.1   0:02.72 imap
29215 abigail. 20  0 18840 2068 1536 S  0.7  0.0   0:00.58 imap
    1 root     20  0 19356  872  648 S  0.3  0.0   6:07.28 init
 1336 root     20  0  244m 5128  508 S  0.3  0.1  19:20.96 rsyslogd
 6293 glenn.ge 20  0 18388 1888 1432 S  0.3  0.0   0:00.01 imap
 6567 nic.will 20  0 18408 1996 1512 D  0.3  0.0   0:00.08 imap
 7091 root     20  0 19268  568  404 S  0.3  0.0  12:40.65 dovecot
 7175 root     20  0 98.6m  364  320 S  0.3  0.0   3:43.93 tail
23388 uuuu     20  0 19000 2568 1544 S  0.3  0.1   0:14.31 imap
23460 jasmin.s 20  0 23248 2756 1756 D  0.3  0.1   0:01.54 imap
---8<---------8<---- # like this from here down
    2 root     20  0     0    0    0 S  0.0  0.0   0:28.72 kthreadd

On Tue, 19 Jan 2016 11:12:12 AM Piers Rowan via luv-main wrote:
RAM: 4.10 GB
CPU: QEMU Virtual CPU version (cpu64-rhel6), 5 cores
The server is a VM on a host server that also provides http / mysql services. The host server runs cron jobs to poll the email server (importing data from mail boxes into the CRM) so - to clutch at straws - I am not sure if the host and guest are competing for the disk IO at the same time with these calls. Contrary to that is that the host server does not experience any slow downs.
The best way to see if disk IO is the problem is to run iostat. I run "iostat -x 10" as I find that if a disk is at high load for 10 seconds that usually means that there is a serious performance issue. In the top output you can see that multiple users have IMAP processes blocked on disk IO which is an indication that disk IO speed is probably the issue.
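For example (the device names will differ on your system):

# extended per-device stats, 10 second intervals
iostat -x 10
# watch %util (how busy the device is), await (average wait per request in ms)
# and avgqu-sz (queue depth) for the device holding the mail spool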
  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 7067 clam     20  0 1771m 361m 1584 S 29.5  8.6 265:23.91 clamd
 6450 kylier   20  0 19300 2828 1404 D  1.0  0.1   0:00.03 imap
 6544 root     20  0 15396 1552  888 R  1.0  0.0   0:00.29 top
 3848 root     20  0 78752 3024 2244 S  0.7  0.1   0:00.23 auth
 6398 vickij   20  0 18528 2072 1400 D  0.7  0.0   0:00.48 imap
 6419 uuu.s    20  0 18524 2024 1388 D  0.7  0.0   0:00.42 imap
 6424 uuuu     20  0 19300 2812 1404 D  0.7  0.1   0:00.23 imap
-- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

The best way to see if disk IO is the problem is to run iostat. I run "iostat -x 10" as I find that if a disk is at high load for 10 seconds that usually means that there is a serious performance issue. In the top output you can see that multiple users have IMAP processes blocked on disk IO which is an indication that disk IO speed is probably the issue.
sda = a usb backup drive
dm-0 = / (MySQL) RAID
dm-1 = / (MySQL) RAID
dm-2-5 = /home (also where VM's Live) [1 x hot spare]

Does this help at all?

Thanks for your input.

Cheers

Piers

iostat -x 10
Linux 2.6.32-573.7.1.el6.x86_64 19/01/16 _x86_64_ (12 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
          5.27  0.00    1.74    3.98   0.00 89.01
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda    144.40 398.34 92.54 95.71 5790.73 4493.15 54.63 0.32 1.71 1.38 25.92
dm-0     0.00   0.00 154.36 81.59 1234.90 652.68 8.00 0.04 0.16 0.44 10.43
dm-1     0.00   0.00 154.36 81.59 1234.90 652.68 8.00 0.04 0.17 0.44 10.44
dm-2     0.00   0.00 0.00 0.00 0.00 0.00 8.00 0.00 3.36 3.07 0.00
dm-3     0.00   0.00 0.00 0.00 0.00 0.00 8.00 0.00 0.00 0.00 0.00
dm-4     0.00   0.00 9.63 8.24 77.01 65.93 8.00 0.22 12.41 1.52 2.72
dm-5     0.00   0.00 73.20 404.46 4478.78 3774.52 17.28 0.24 0.27 0.46 21.75

avg-cpu: %user %nice %system %iowait %steal %idle
          7.60  0.00    1.51    7.34   0.00 83.55
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda    574.70 116.50 191.70 62.50 8408.00 1175.20 37.70 2.62 9.83 1.67 42.39
dm-0     0.00   0.00 654.50 168.90 5236.00 1351.20 8.00 17.86 9.75 0.36 29.93
dm-1     0.00   0.00 654.50 168.90 5236.00 1351.20 8.00 17.86 9.76 0.36 29.93
dm-2     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5     0.00   0.00 111.40 10.50 3216.00 233.60 28.30 1.46 11.89 2.98 36.29

avg-cpu: %user %nice %system %iowait %steal %idle
          6.20  0.00    1.83   16.75   0.00 75.22
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda   1108.30 102.70 499.10 31.70 18019.20 1560.00 36.89 4.53 8.76 1.88 99.87
dm-0     0.00   0.00 1346.00 126.90 10768.00 1015.20 8.00 28.12 25.76 0.50 74.09
dm-1     0.00   0.00 1346.00 126.90 10768.00 1015.20 8.00 28.12 25.76 0.50 74.08
dm-2     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4     0.00   0.00 3.90 0.00 31.20 0.00 8.00 0.04 10.62 4.38 1.71
dm-5     0.00   0.00 257.80 7.10 7170.40 135.20 27.58 2.67 10.09 3.76 99.68

avg-cpu: %user %nice %system %iowait %steal %idle
          7.48  0.00    2.10   11.25   0.00 79.17
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda     70.20  54.90 421.30 33.90 12352.00 855.20 29.01 2.13 4.69 2.08 94.63
dm-0     0.00   0.00 86.80 77.70 694.40 621.60 8.00 1.43 8.66 1.29 21.17
dm-1     0.00   0.00 86.80 77.70 694.40 621.60 8.00 1.43 8.66 1.29 21.18
dm-2     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4     0.00   0.00 12.40 0.00 99.20 0.00 8.00 0.19 15.31 3.06 3.79
dm-5     0.00   0.00 392.50 11.10 11556.00 233.60 29.21 1.65 4.10 2.34 94.48

avg-cpu: %user %nice %system %iowait %steal %idle
          7.65  0.00    2.05   13.18   0.00 77.12
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda    672.70 144.50 447.50 78.20 16368.00 1842.40 34.64 4.91 9.32 1.81 95.16
dm-0     0.00   0.00 820.20 208.60 6561.60 1668.80 8.00 24.26 23.57 0.68 70.00
dm-1     0.00   0.00 820.20 208.60 6561.60 1668.80 8.00 24.26 23.57 0.68 70.00
dm-2     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4     0.00   0.00 4.30 0.00 34.40 0.00 8.00 0.13 29.98 4.95 2.13
dm-5     0.00   0.00 295.80 14.20 9770.40 174.40 32.08 2.75 8.87 3.03 93.81

avg-cpu: %user %nice %system %iowait %steal %idle
         10.51  0.00    2.53   14.68   0.00 72.28
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda     45.00  88.70 128.50 94.70 3609.60 1676.80 23.68 35.44 131.92 4.00 89.20
dm-0     0.00   0.00 83.40 182.10 667.20 1456.80 8.00 44.11 135.81 2.08 55.29
dm-1     0.00   0.00 83.40 182.10 667.20 1456.80 8.00 44.11 135.81 2.08 55.29
dm-2     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4     0.00   0.00 3.30 0.00 26.40 0.00 8.00 0.09 28.73 6.48 2.14
dm-5     0.00   0.00 86.90 15.60 2922.40 388.80 32.30 1.93 18.82 8.25 84.59

avg-cpu: %user %nice %system %iowait %steal %idle
          9.36  0.00    1.62    6.39   0.00 82.63
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda     39.10  49.00 67.60 180.80 1193.60 3053.60 17.10 2.95 36.02 1.67 41.40
dm-0     0.00   0.00 64.60 75.90 516.80 607.20 8.00 3.23 80.40 2.02 28.38
dm-1     0.00   0.00 64.60 75.90 516.80 607.20 8.00 3.24 80.40 2.02 28.39
dm-2     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4     0.00   0.00 15.80 0.00 126.40 0.00 8.00 0.25 15.58 4.72 7.45
dm-5     0.00   0.00 26.20 139.60 544.00 2277.60 17.02 0.87 5.20 2.20 36.51

avg-cpu: %user %nice %system %iowait %steal %idle
         12.29  0.00    1.80   12.47   0.00 73.44
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda     43.40 123.30 194.70 323.90 2689.60 4763.20 14.37 4.36 8.42 1.39 72.07
dm-0     0.00   0.00 77.90 233.60 623.20 1868.80 8.00 2.75 8.83 1.26 39.38
dm-1     0.00   0.00 77.90 233.60 623.20 1868.80 8.00 2.75 8.83 1.26 39.39
dm-2     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4     0.00   0.00 28.90 0.00 231.20 0.00 8.00 0.29 9.65 1.99 5.76
dm-5     0.00   0.00 131.40 213.50 1838.40 2893.60 13.72 2.91 8.45 1.93 66.50

avg-cpu: %user %nice %system %iowait %steal %idle
          8.93  0.00    1.91    8.96   0.00 80.20
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda     45.30  51.80 144.70 142.80 1862.40 2254.40 14.32 4.33 15.05 2.30 66.21
dm-0     0.00   0.00 59.30 85.50 474.40 684.00 8.00 4.01 27.67 2.42 35.08
dm-1     0.00   0.00 59.30 85.50 474.40 684.00 8.00 4.01 27.67 2.42 35.11
dm-2     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4     0.00   0.00 36.00 0.00 288.00 0.00 8.00 0.43 11.91 2.08 7.50
dm-5     0.00   0.00 94.60 109.10 1096.80 1570.40 13.09 1.28 6.27 2.59 52.83

The best way to see if disk IO is the problem is to run iostat. I run "iostat -x 10" as I find that if a disk is at high load for 10 seconds that usually means that there is a serious performance issue. In the top output you can see that multiple users have IMAP processes blocked on disk IO which is an indication that disk IO speed is probably the issue.
The mail server VM

iostat -x 10
Linux 2.6.32-573.12.1.el6.x86_64 (clients.webgen.com.au) 19/01/16 _x86_64_ (5 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
          1.56  0.01    0.58    4.68   0.04 93.14
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
vda      1.20  14.03 1.51 4.86 60.27 145.11 32.21 0.21 33.21 6.05 3.86
vdb      0.08  19.10 10.89 0.84 982.21 161.89 97.52 0.27 22.99 6.96 8.17
vdc      0.30   8.25 18.70 11.90 1396.54 158.89 50.83 1.98 64.62 3.61 11.05
dm-0     0.00   0.00 1.35 17.22 49.29 137.51 10.06 0.19 10.34 1.72 3.20
dm-1     0.00   0.00 1.37 0.95 10.96 7.59 8.00 1.08 467.01 4.59 1.06
dm-2     0.00   0.00 19.03 20.11 1396.54 158.89 39.74 1.46 37.35 2.82 11.05
dm-3     0.00   0.00 10.97 20.24 982.21 161.89 36.66 0.98 31.25 2.62 8.17

avg-cpu: %user %nice %system %iowait %steal %idle
          1.17  0.00    0.70   44.64   0.06 53.43
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
vda      0.90   5.20 3.00 3.70 96.80 68.00 24.60 0.79 117.54 31.31 20.98
vdb      0.20  57.90 126.70 1.70 4590.40 475.20 39.45 2.77 21.47 7.72 99.17
vdc      0.00   3.20 68.90 15.80 1103.20 147.20 14.76 4.56 53.92 11.70 99.09
dm-0     0.00   0.00 1.70 8.40 78.40 67.20 14.42 1.59 157.81 19.95 20.15
dm-1     0.00   0.00 0.00 0.10 0.00 0.80 8.00 0.00 3.00 3.00 0.03
dm-2     0.00   0.00 68.70 18.80 1102.40 147.20 14.28 4.61 52.67 11.32 99.09
dm-3     0.00   0.00 126.80 59.40 4588.80 475.20 27.20 3.81 20.43 5.33 99.17

avg-cpu: %user %nice %system %iowait %steal %idle
          0.92  0.00    0.56   41.01   0.04 57.46
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
vda      0.00   7.00 0.00 2.60 0.00 72.00 27.69 0.14 55.31 43.92 11.42
vdb      0.00   2.20 101.20 1.10 5320.80 31.20 52.32 1.21 11.92 9.68 99.01
vdc      0.60   2.30 104.10 2.10 1779.20 33.60 17.07 1.33 12.49 9.39 99.75
dm-0     0.00   0.00 0.00 9.00 0.00 72.00 8.00 0.81 90.49 12.69 11.42
dm-1     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2     0.00   0.00 104.70 4.20 1779.20 33.60 16.65 1.34 12.29 9.16 99.75
dm-3     0.00   0.00 101.20 3.90 5319.20 31.20 50.91 1.21 11.62 9.42 99.01

The mail server VM
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_1192521767-lv_root   22G  3.3G   18G  16% /
tmpfs                              2.9G     0  2.9G   0% /dev/shm
/dev/vda1                          477M  147M  305M  33% /boot
/dev/mapper/vg_c-home              245G  216G   16G  94% /home
/dev/mapper/vg_b-spool              44G   36G  5.5G  87% /var/spool/mail

avg-cpu: %user %nice %system %iowait %steal %idle
          0.35  0.00    0.39   27.75   0.04 71.46
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
..
dm-1     0.00   0.00 1.37 0.95 10.97 7.61 8.00 1.08 466.65 4.58 1.06

[root@clients webgen]# ls -l /dev/mapper/*
..
lrwxrwxrwx. 1 root root 7 Dec 26 13:54 /dev/mapper/vg_1192521767-lv_swap -> ../dm-1

Looks like it is trying to write a lot of stuff to swap - but it's not a large amount of data:

Filename   Type       Size     Used    Priority
/dev/dm-1  partition  2621436  500044  -1

Would having more RAM reduce this? Any help appreciated.

Cheers

Piers

On Tue, 19 Jan 2016 01:31:14 PM Piers Rowan via luv-main wrote:
avg-cpu: %user %nice %system %iowait %steal %idle
          0.35  0.00    0.39   27.75   0.04 71.46
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
..
dm-1     0.00   0.00 1.37 0.95 10.97 7.61 8.00 1.08 466.65 4.58 1.06
[root@clients webgen]# ls -l /dev/mapper/*
..
lrwxrwxrwx. 1 root root 7 Dec 26 13:54 /dev/mapper/vg_1192521767-lv_swap -> ../dm-1
Looks like it is trying to write a lot of stuff to swap - but its not a large amount of data:
Swap isn't being accessed much. Often a server will swap out things that aren't used much to free RAM for disk cache. If iostat says it's not accessed much then it's not a big deal. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

On Tue, 19 Jan 2016 01:15:30 PM Piers Rowan via luv-main wrote:
dm-2     0.00   0.00 68.70 18.80 1102.40 147.20 14.28 4.61 52.67 11.32 99.09
dm-3     0.00   0.00 126.80 59.40 4588.80 475.20 27.20 3.81 20.43 5.33 99.17
dm-2 and dm-3 are the problem ones, 99% IO utilisation.

Why do you use LVM inside a virtual machine? That offers no real benefit and makes things more difficult to debug, as it will be a pain if the VM doesn't boot properly and you need to fix that LV from the Dom0.

-- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

On Tue, Jan 19, 2016 at 02:24:01PM +1100, Russell Coker wrote:
Why do you use LVM inside a virtual machine?
my guess is that it's the default for the RH installer to use lvm. ditto for centos and fedora.
That offers no real benefit and makes things more difficult to debug things as it will be a pain if the VM doesn't boot properly and you need to fix that LV from the Dom0.
the best way to use LVM for VMs is to create an LVM volume on the host machine and tell the VM to use that as its disk. The VM doesn't need to know or care what the underlying disk is, and it certainly shouldn't be running LVM on top of whatever the host gives it. the only time that makes any sense is when you're using a VM to experiment with LVM (or ZFS or btrfs or whatever)....i.e. testing and research, not production use. craig -- craig sanders <cas@taz.net.au>
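a rough sketch of that arrangement, with made-up volume group and VM names:

# on the host: carve out an LV to act as the guest's disk
lvcreate -L 100G -n mailvm-disk vg_host
# hand the whole LV to the guest as a virtio disk (libvirt example)
virsh attach-disk mailvm /dev/vg_host/mailvm-disk vdb --persistent
# inside the guest, just mkfs /dev/vdb directly - no LVM layer needed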

On 19/01/16 13:24, Russell Coker wrote:
dm-2 and dm-3 are the problem ones, 99% IO utilisation. Why do you use LVM inside a virtual machine? That offers no real benefit and makes things more difficult to debug things as it will be a pain if the VM doesn't boot properly and you need to fix that LV from the Dom0.
Sorry for the late reply - told you checking email was slow.

LVM was put in partly because of the default CentOS install and partly as a measure against running out of disk space. Prior to that I used NFS mounts for /home and /var/spool/mail, but this played up with shared mailboxes and IMAP/Dovecot on large (4GB+) mail files.

Probably best to start again with this machine as a VM.

Thanks

P

On Tue, 19 Jan 2016 01:03:24 PM Piers Rowan via luv-main wrote:
dm-2-5 = /home (also where VM's Live) [1 x hot spare]
dm-2     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3     0.00   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4     0.00   0.00 3.90 0.00 31.20 0.00 8.00 0.04 10.62 4.38 1.71
dm-5     0.00   0.00 257.80 7.10 7170.40 135.20 27.58 2.67 10.09 3.76 99.68
What are dm-2 dm-3 and dm-4 doing? Why are 3 of them idle while 1 is busy? What are the physical devices for those dm devices? Can you show us iostat for the physical devices? -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/
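Something along these lines should show what each dm device is and which physical disks sit underneath it (the sda/sdb at the end are only examples):

# map dm-N names back to LVM volumes
ls -l /dev/mapper/
dmsetup ls
# show the whole block device tree, LVs down to physical disks
lsblk
lvs -o lv_name,vg_name,devices
# then point iostat at the underlying disks, e.g.
iostat -x 10 sda sdb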

On 19/01/2016 12:33 pm, Piers Rowan wrote:
The best way to see if disk IO is the problem is to run iostat. I run "iostat -x 10" as I find that if a disk is at high load for 10 seconds that usually means that there is a serious performance issue. In the top output you can see that multiple users have IMAP processes blocked on disk IO which is an indication that disk IO speed is probably the issue.
sda = a usb backup drive
dm-0 = / (MySQL) RAID
dm-1 = / (MySQL) RAID
dm-2-5 = /home (also where VM's Live) [1 x hot spare]
Does this help at all?
Thanks for your input.
Cheers
Piers
Hi Piers

I use a program called 'atop'; it makes seeing IO and other issues very easy. I suggest that you run it on the host and on each of your guest VMs - you need to find which guest is causing the IO load.

Look for the lines on the left labelled DSK: the 'busy' number, which is a percentage, gives a very good idea of the load on your disk subsystem. The 'avio' figure is also a good indicator.

Mike
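For example (atop is packaged for most distros, e.g. via EPEL on CentOS):

# refresh every 5 seconds; press 'd' to show per-process disk activity
atop 5
# the DSK lines near the top show, per disk, the 'busy' percentage and
# 'avio' (the average time spent per I/O request)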

On Tue, Jan 19, 2016 at 10:12:12AM +1000, Piers Rowan wrote:
- dovecot
- apache (roundcube webmail)
- sendmail
unless you're a sendmail expert with a decade or two of experience working with it, you might want to think about switching to postfix.
- clamav-milter
- amavisd-milter
RAM: 4.10 GB
CPU: QEMU Virtual CPU version (cpu64-rhel6), 5 cores
CPU allocation of 5 cores is probably overkill for mail, unless you're processing a LOT of incoming mail with clamav and spamassassin. how many users do you have, and how much mail are you receiving and processing (msgs/day and megabytes/day)?

btw, I used to run mail servers at ISPs for thousands of users on 1990s and 2000s hardware, which was nowhere near as fast (slower disks, slower CPU, *much* less RAM) as what you can get today for a fraction of the price. if your mail VM can't even handle a few dozen or few hundred users, there's something seriously wrong....probably disk I/O contention from other services running on the same machine and disk.
The server is a VM on a host server that also provides http / mysql services.
mail is VERY dependent on disk I/O - especially if you have multiple users reading their mail via POP/IMAP (or via webmail such as roundcube, which connects via imap). and it only gets worse if you have other processes fighting dovecot for disk access.

If at all possible, you should consider having a dedicated mail server, or at least dedicated drive(s) for mail, that doesn't have to share disk I/O with anything else - and certainly not with other I/O heavy services like a web server or mysql.

If your server is located in-house (i.e. not in a co-lo facility), you may also want to consider adding a fast SSD (or two in RAID-1 configuration for safety AND roughly double the read performance) just for the mail spool (and make sure dovecot etc are configured NOT to move mail to user home directories, but to leave it on the SSD). You can get a Samsung 850 Pro 128GB for $116, or $188 for 256GB. if your total mail size is < 128GB and unlikely to grow that large in the foreseeable future, you're better off with a pair of 128GB drives in RAID-1 than a single 256GB drive.

if it's a VM at a co-lo facility, talk to them about getting a host with at least one SSD so you can move your mail spool to that.

the random-access nature of SSDs (i.e. they don't have to waste time moving the disk head around to access data) mitigates many of the speed problems caused by having multiple users read large mailboxes at the same time. spinning hard disks will get around 100 IOPS at best. a good SSD will get anywhere up to 100,000 IOPS depending on how it's being used (but you can expect a minimum of 10,000).
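as a sketch of what moving the spool onto a mirrored pair of SSDs could look like (device names and the temporary mount point are placeholders; stop mail delivery and take a backup first):

# mirror the two SSDs
mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
mkfs.ext4 /dev/md10
# copy the spool across, then remount the array in its place
mkdir -p /mnt/newspool
mount /dev/md10 /mnt/newspool
rsync -a /var/spool/mail/ /mnt/newspool/
umount /mnt/newspool
mount /dev/md10 /var/spool/mail    # plus a matching /etc/fstab entry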
The host server runs cron jobs to poll the email server (importing data from mail boxes into the CRM) so - to clutch at straws -
so you're regularly importing mail into a database of some sort? that may be the source of your problem - dovecot will be contending with mysql for disk I/O unless the mysql db and the mail spool are on different disks (ideally, two separate SSDs. or two separate RAID-1 devices on SSD)

how big is the database? if it's huge, can you archive older stuff to another server, or do you need instant access to the old data?

btw it's worthwhile doing some research on database tuning. here's a useful Q&A on tuning mysql for SSDs:

http://dba.stackexchange.com/questions/59828/ssd-vs-hdd-for-databases
I am not sure if the host and guest are competing for the disk IO at the same time with these calls. Contrary to that is that the host server does not experience any slow downs.
they almost certainly are, from what you've said about what the server is doing. as russell suggested, run iostat - and run it on the host, not on the VM.
Before the holidays we added another 30 users to the servers.
30 users is not a lot, and is unlikely to have made much difference unless they're extraordinarily heavy users of mail, several orders of magnitude more than all your previous users. i could see adding a few thousand or even a few hundred extra users making a significant performance impact, but not just a few dozen.

craig -- craig sanders <cas@taz.net.au>

On Tue, 19 Jan 2016 12:22:31 PM Craig Sanders via luv-main wrote:
On Tue, Jan 19, 2016 at 10:12:12AM +1000, Piers Rowan wrote:
- dovecot
- apache (roundcube webmail)
- sendmail
unless you're a sendmail expert with a decade or two of experience working with it, you might want to think about switching to postfix.
While Sendmail is generally a bad choice, it seems unlikely that it is contributing to the disk IO problem here.

One thing that can be done to improve performance on the MTA side is to use Dovecot's delivery agent instead of having the MTA deliver directly or use Procmail or similar. If Dovecot delivers directly it will index the mail while delivering it, which will save time later.
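As a sketch: with Postfix (already suggested in this thread) the hand-off to Dovecot's LDA is a single main.cf line; the path below is the usual Red Hat location for dovecot-lda, but check where your package installs it:

# /etc/postfix/main.cf - local delivery via dovecot-lda so mail is indexed at delivery time
mailbox_command = /usr/libexec/dovecot/dovecot-lda -f "$SENDER" -a "$RECIPIENT"

Sendmail can also be pointed at dovecot-lda as its local mailer, it just takes more m4 plumbing.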
You can get Samsung 850 Pro 128GB for $116. $188 for 256GB. if your total mail size is < 128GB and unlikely to grow that large in the forseeable future, you're better off with a pair of 128GB drives in RAID-1 than a single 256GB drive.
if it's a VM at a co-lo facility, talk to them about getting a host with at least one SSD so you can move your mail spool to that.
hetzner.de has some good deals on servers with SSD.
so you're regularly importing mail into a database of some sort?
that may be the source of your problem - dovecot will be contending with mysql for disk I/O unless the mysql db and the mail spool are on different disks (ideally, two separate SSDs. or two separate RAID-1 devices on SSD)
I find it difficult to imagin a mail service of that nature which needs performance that is greater than a pair of SSDs in a RAID-1 can provide. I think we're talking about a mail server for a small company not hotmail.
as russell suggested, run iostat - and run it on the host, not on the VM.
Run it on both. Generally run everything everywhere. It doesn't do any harm and you never know what you might discover. But in the specific case of iostat if the host is doing disk IO that causes delays for the VM it's something you need to know. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

On Tue, Jan 19, 2016 at 12:59:09PM +1100, Russell Coker wrote:
On Tue, 19 Jan 2016 12:22:31 PM Craig Sanders via luv-main wrote:
On Tue, Jan 19, 2016 at 10:12:12AM +1000, Piers Rowan wrote:
- dovecot
- apache (roundcube webmail)
- sendmail
unless you're a sendmail expert with a decade or two of experience working with it, you might want to think about switching to postfix.
While Sendmail is generally a bad choice, it seems unlikely that it is contributing to the disk IO performance here.
probably not. but there's no good reason to use it these days unless you know it extremely well. there are better options that are easier and saner to configure (postfix and exim for starters). postfix's an almost drop-in replacement for sendmail, so it's an easy conversion - IIRC some of the map tables (like the virtual table) have a slightly different format.

OTOH, last time i used sendmail (admittedly over 10 years ago) it suffered from a massive thundering herd problem....too much inbound *or* outbound mail, or both at once, and the system would slow to a crawl and eventually crash (due mostly to running out of memory) because it tried to send/receive/process it all at once. no amount of tuning would help.

postfix's queue management was vastly superior....just switching to postfix on the exact same server (IIRC, a pentium pro with something like 64MB or 256MB of RAM, running a mailing list with a few hundred thousand subscribers - so a lot of mail to send out, and a lot of incoming bounces) caused it to stop crashing under the load and deliver all the mail in about 1/10th of the time.
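for example, postfix's virtual alias table is just a flat lookup file compiled with postmap (addresses below are made up):

# /etc/postfix/virtual
info@example.com    piers
sales@example.com   piers, jasmin

postmap /etc/postfix/virtual
# and in main.cf:
#   virtual_alias_maps = hash:/etc/postfix/virtual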
One thing that can be done to improve performance on the MTA side is to use Dovecot's delivery agent instead of having the MTA deliver directly or use Procmail or similar. If Dovecot delivers directly it will index the mail while delivering it which will save tim later.
yep.
so you're regularly importing mail into a database of some sort?
that may be the source of your problem - dovecot will be contending with mysql for disk I/O unless the mysql db and the mail spool are on different disks (ideally, two separate SSDs. or two separate RAID-1 devices on SSD)
I find it difficult to imagin a mail service of that nature which needs performance that is greater than a pair of SSDs in a RAID-1 can provide.
i meant one SSD or RAID-1 pair for mail, and another for mysql. they're the two big disk I/O hogs, so moving them onto separate disks is going to be a huge win for performance. RAID-1 is a bonus, but just having them on separate SSDs would help enormously. and with only one disk for each, you can create them as degraded RAID-1 (i.e. RAID-1 without a mirror disk - Linux mdadm supports this, without any problem) so it's easy to add a 2nd drive to each later if buying four drives at once stretches the budget too far.
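for the record, the degraded-array trick looks like this (device names are placeholders):

# create a RAID-1 with one member now; 'missing' reserves the second slot
mdadm --create /dev/md11 --level=1 --raid-devices=2 /dev/sde missing
# later, when a second SSD turns up:
mdadm --add /dev/md11 /dev/sdf
# mdadm then rebuilds the mirror onto the new drive automatically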
I think we're talking about a mail server for a small company not hotmail.
that's my impression too. craig -- craig sanders <cas@taz.net.au>

On 19/01/16 12:59, Russell Coker via luv-main wrote:
One thing that can be done to improve performance on the MTA side is to use Dovecot's delivery agent instead of having the MTA deliver directly or use Procmail or similar. If Dovecot delivers directly it will index the mail while delivering it, which will save time later.
Another advantage of using Dovecot's LDA is that you can then enable compression on the stored mail, which not only saves some disk space but also likely improves performance. Cheers, Andrew
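A minimal sketch of turning that on, assuming Dovecot 2.x with the zlib plugin available (file locations vary between distros):

# e.g. /etc/dovecot/conf.d/90-plugin.conf
mail_plugins = $mail_plugins zlib
plugin {
  zlib_save = gz         # compress newly delivered mail with gzip
  zlib_save_level = 6    # 1-9: higher saves more space, costs more CPU
}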

On Tue, 19 Jan 2016 10:12:12 AM Piers Rowan via luv-main wrote:
The server is a VM on a host server that also provides http / mysql services. The host server runs cron jobs to poll the email server (importing data from mail boxes into the CRM) so - to clutch at straws - I am not sure if the host and guest are competing for the disk IO at the same time with these calls. Contrary to that is that the host server does not experience any slow downs.
Some ideas that I've not seen mentioned yet:

1) perf top - to see where the system is spending time as a whole (and if you need to drill down on a process you can do perf top -p $PID).

2) latencytop - as long as your kernel has CONFIG_LATENCYTOP.

3) iotop - if your version is new enough then the -o option will hide idle processes, otherwise just press 'o' when you get the main display.

Best of luck!
Chris

-- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
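A quick reference for those (the PID is just the clamd example from the top output earlier):

perf top              # system-wide: hottest kernel/user functions
perf top -p 7067      # drill down into a single process, e.g. clamd
latencytop            # needs CONFIG_LATENCYTOP in the kernel
iotop -o              # only show processes actually doing I/O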