Root filesystem unexpectedly remounted in read-only

Hello.
I am working on an Ubuntu 12.04 machine, virtualized on a VMWare server, usually through a remote desktop protocol (see xrdp) or a plain ssh connection. This machine is rebooted very rarely (basically only when an update needs a reboot to be applied).
This morning, when I reconnected to the rdp session, I found that the root filesystem was mounted read-only (visible in /proc/mounts, because /etc/mtab was not up to date). After a reboot and a filesystem check everything seems to work, but it was quite a curious situation!
Searching the web, I found that this typically happens when the kernel finds bad blocks on the disk, but since it's a virtual disk, I have some questions.
1) Can bad blocks appear on a virtual disk too? Even if it is eventually just a flat file in the host filesystem?
2) Are those bad blocks related to real bad blocks on the physical host filesystem?
Thanks in advance for any answer.
-- Mick
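The check described in this message (reading the kernel's own mount table in /proc/mounts instead of a possibly stale /etc/mtab) can be sketched in Python. This is only an illustrative sketch: the function name `readonly_mounts` is hypothetical, and it assumes the usual six-field /proc/mounts layout (device, mount point, filesystem type, options, dump, pass).

```python
def readonly_mounts(mounts_text):
    """Return the mount points whose option list contains the 'ro' flag.

    mounts_text is the content of /proc/mounts (assumed format:
    device, mount point, fs type, comma-separated options, dump, pass).
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Compare whole options, so "errors=remount-ro" does not count as "ro".
        if "ro" in options:
            result.append(mount_point)
    return result
```

On a live system one would pass it the file content, for example `readonly_mounts(open("/proc/mounts").read())`, and then test whether `"/"` is in the returned list. Other mount points can legitimately be read-only, so only the root entry is of interest here.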

On Thu, Sep 18, 2014 at 10:48:00AM +0200, Michele Bert wrote:
1) Can bad blocks appear on a virtual disk too? Even if it is eventually just a flat file in the host filesystem?
2) Are those bad blocks related to real bad blocks on the physical host file system?
yes and yes and maybe.
for example, if the physical disk has bad blocks and the VM's virtual disk uses those blocks then both the VM and the VMWare server could have errors when trying to access those blocks.
it's even possible that the VM will have errors while the VMWare server doesn't, if the VM retries less often or times out the request earlier than the server.
other possibilities include:
- VMWare server overloaded for a long time and unable to service ubuntu VM's request for disk IO
- disk faults
- cabling faults (e.g. loose cables can vibrate and cause transient errors)
- disk controller faults
- RAM faults
- network outages if the disks (virtual and/or physical) are accessed over the network (e.g. iscsi or nfs or whatever)
craig
-- craig sanders <cas@taz.net.au>

On Fri, 19 Sep 2014, Craig Sanders wrote:
On Thu, Sep 18, 2014 at 10:48:00AM +0200, Michele Bert wrote:
1) Can bad blocks appear on a virtual disk too? Even if it is eventually just a flat file in the host filesystem?
2) Are those bad blocks related to real bad blocks on the physical host file system?
yes and yes and maybe. for example, if the physical disk has bad blocks and the VM's virtual disk uses those blocks then both the VM and the VMWare server could have errors when trying to access those blocks.
it's even possible that the VM will have errors while the VMWare server doesn't, if the VM retries less often or times out the request earlier than the server.
other possibilities include:
- VMWare server overloaded for a long time and unable to service ubuntu VM's request for disk IO
- disk faults
- cabling faults (e.g. loose cables can vibrate and cause transient errors)
- disk controller faults
- RAM faults
- network outages if the disks (virtual and/or physical) are accessed over the network (e.g. iscsi or nfs or whatever)
I.e., look in the vmware logs for that host, as well as alerts and alarms. vmware will tend to drop disk paths well before linux would have a problem with them, in the name of High Availability.
Whilst Linux would just log a 120s hangcheck timer alert to the syslog if the disk didn't answer in 120 seconds, vmware might respond to the same disk outage by
*) dropping IO that happened to be in progress on the floor (the only symptoms are that 4 of your 250 VMs go spontaneously readonly, and you only notice that if you're looking at your syslogs religiously, because most monitoring sure as hell won't pick up on it)
*) rebooting the VM
*) vmotioning the VM
*) isolating the VM host from the cluster
and a whole bunch of other failures that I've probably seen but subsequently purged from my memory.
-- Tim Connors
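Since the point above is that these events are easy to miss unless the guest's syslog is watched, here is a minimal Python sketch of such a scan. The names `SYMPTOMS` and `scan_log` are hypothetical, and the patterns are assumptions about how the Linux kernel typically words these messages (a read-only remount, a blocked-task warning, a block-layer I/O error); the exact wording varies by kernel version and filesystem, so treat them as a starting point rather than a complete list.

```python
import re

# Assumed typical kernel message fragments; adjust to what your syslog shows.
SYMPTOMS = {
    "remounted read-only": re.compile(r"Remounting filesystem read-only"),
    "hung task": re.compile(r"blocked for more than \d+ seconds"),
    "I/O error": re.compile(r"I/O error", re.IGNORECASE),
}


def scan_log(lines):
    """Return (line_number, label) pairs for lines matching a known symptom."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in SYMPTOMS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits
```

It could be fed the lines of the guest's syslog file (the path depends on the distribution) from a cron job, raising an alert whenever the returned list is non-empty.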

On 19 September 2014 00:28:13 CEST, Craig Sanders <cas@taz.net.au> wrote:
On Thu, Sep 18, 2014 at 10:48:00AM +0200, Michele Bert wrote:
1) Can bad blocks appear on a virtual disk too? Even if it is eventually just a flat file in the host filesystem?
2) Are those bad blocks related to real bad blocks on the physical host file system?
yes and yes and maybe. for example, if the physical disk has bad blocks and the VM's virtual disk uses those blocks then both the VM and the VMWare server could have errors when trying to access those blocks.
it's even possible that the VM will have errors while the VMWare server doesn't, if the VM retries less often or times out the request earlier than the server.
Really interesting. Unfortunately I don't have access to the server logs, but I now have a better idea of what could have happened.
Thanks all!
Mick