SmartOS reboots spontaneously

I run a SmartOS system on a Hetzner EX4S (Intel Core i7-2600, 32 GB RAM, 2x 3 TB SATA HDD).
There are six virtual machines on the host:

[root@10-bf-48-7f-e7-03 ~]# vmadm list
UUID                                  TYPE  RAM      STATE             ALIAS
d2223467-bbe5-4b81-a9d1-439e9a66d43f  KVM   512      running           xxxx1
5f36358f-68fa-4351-b66f-830484b9a6ee  KVM   1024     running           xxxx2
d570e9ac-9eac-4e4f-8fda-2b1d721c8358  OS    1024     running           xxxx3
ef88979e-fb7f-460c-bf56-905755e0a399  KVM   1024     running           xxxx4
d8e06def-c9c9-4d17-b975-47dd4836f962  KVM   4096     running           xxxx5
4b06fe88-db6e-4cf3-aadd-e1006ada7188  KVM   9216     running           xxxx5
[root@10-bf-48-7f-e7-03 ~]#

The host reboots several times a week. It leaves no crash dump in /var/crash and no error messages in /var/adm/messages.
In /var/adm/messages it looks as if the machine had simply been hard-reset:

2012-11-23T08:54:43.210625+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:14:43.187589+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:34:43.165100+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T09:54:43.142065+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:14:43.119365+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:34:43.096351+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:54:43.073821+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --
2012-11-23T10:57:55.610954+00:00 10-bf-48-7f-e7-03 genunix: [ID 540533 kern.notice] #015SunOS Release 5.11 Version joyent_20121018T224723Z 64-bit
2012-11-23T10:57:55.610962+00:00 10-bf-48-7f-e7-03 genunix: [ID 299592 kern.notice] Copyright (c) 2010-2012, Joyent Inc. All rights reserved.
2012-11-23T10:57:55.610967+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: lgpg
2012-11-23T10:57:55.610971+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: tsc
2012-11-23T10:57:55.610974+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: msr
2012-11-23T10:57:55.610978+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mtrr
2012-11-23T10:57:55.610981+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pge
2012-11-23T10:57:55.610984+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: de
2012-11-23T10:57:55.610987+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cmov
2012-11-23T10:57:55.610995+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mmx
2012-11-23T10:57:55.611000+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mca
2012-11-23T10:57:55.611004+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pae
2012-11-23T10:57:55.611008+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cv8
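To narrow down when each reset happened, one could pair every kernel boot banner ("SunOS Release ...") with the last log entry written before it. The following is a minimal sketch, assuming the ISO 8601 timestamp format shown above. `boot_gaps` is a hypothetical helper, not a SmartOS tool. Because rsyslogd writes a MARK every 20 minutes, the gap bounds the window in which the host went down. The banner is probably stamped when syslog ingests it after boot, so treat its time as an upper bound.

```python
from datetime import datetime

# Kernel banner that opens every boot in /var/adm/messages (as in the log above).
BOOT_MARKER = "SunOS Release"


def boot_gaps(lines):
    """For each boot banner, return (last entry before it, banner time, gap in seconds).

    Lines whose first field is not an ISO 8601 timestamp are skipped.
    """
    results = []
    prev_ts = None
    for line in lines:
        try:
            ts = datetime.fromisoformat(line.split(" ", 1)[0])
        except ValueError:
            continue  # not a timestamped syslog line
        if BOOT_MARKER in line and prev_ts is not None:
            results.append((prev_ts, ts, (ts - prev_ts).total_seconds()))
        prev_ts = ts
    return results


# Two lines copied from the log above.
SAMPLE = [
    "2012-11-23T10:54:43.073821+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK --",
    "2012-11-23T10:57:55.610954+00:00 10-bf-48-7f-e7-03 genunix: [ID 540533 kern.notice] "
    "#015SunOS Release 5.11 Version joyent_20121018T224723Z 64-bit",
]

for before, boot, gap in boot_gaps(SAMPLE):
    print(f"last entry {before.isoformat()}, boot at {boot.isoformat()}, gap {gap:.0f} s")
```

On the host itself, the function could be fed the whole file with `open("/var/adm/messages")`, since it only inspects the first field of each line.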

The problem is that the host sometimes loses its network interface on reboot, so we have to request a manual hardware reset to bring it back.
We do not have physical or virtual access to the server console: no KVM, no iLO, nothing like that. So the only way to debug is to analyze crash dumps and log files.
I am not a SmartOS/Solaris expert, so I am not sure how to proceed. Is there an equivalent of Linux netconsole for SmartOS? Can I simply redirect the console output to the network somehow? Or maybe I am missing something obvious and the crash information is stored somewhere else.

Answer

Run the command dumpadm to check whether crash dumps are enabled, and which device they are written to.
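dumpadm prints its configuration as "Label: value" lines. The following is a minimal sketch that parses such output and flags the two settings that would prevent a panic from leaving a dump behind. The sample text is hypothetical and only approximates typical illumos output; it was not captured from this host. `parse_dumpadm` and `dump_problems` are illustrative helpers. The assumption that a missing device shows up as empty or "none" is marked in the code.

```python
# Hypothetical dumpadm output; real values depend on the host's configuration.
SAMPLE_OUTPUT = """\
      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/zones/dump (dedicated)
Savecore directory: /var/crash
  Savecore enabled: yes
"""


def parse_dumpadm(text):
    """Turn dumpadm's 'Label: value' lines into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            config[key.strip()] = value.strip()
    return config


def dump_problems(config):
    """List reasons why a panic might not leave a crash dump behind."""
    problems = []
    device = config.get("Dump device", "")
    # Assumption: an unconfigured device is reported as empty or "none".
    if not device or device.lower().startswith("none"):
        problems.append("no dump device configured")
    if config.get("Savecore enabled", "").lower() != "yes":
        problems.append("savecore is not enabled")
    return problems


print(dump_problems(parse_dumpadm(SAMPLE_OUTPUT)) or "crash dumps look enabled")
```

If Python is available where you run it, the real text could come from `subprocess.run(["dumpadm"], capture_output=True, text=True).stdout`. Otherwise, just read the same two lines by eye.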

If it is enabled and you still find no crash dumps, suspect a hardware fault and ask your hosting company to move you to a different physical server. (They can also check the hardware logs and fault lights, call the vendor, and so on.)
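The rule of thumb above can be sketched as a small decision helper. The reasoning is that a kernel panic with dumps enabled should leave files in the savecore directory, while a hard reset (power, watchdog, failing hardware) leaves nothing. This is a minimal sketch, assuming savecore's usual file names (`vmdump.N`, or `unix.N` plus `vmcore.N`). `saved_dumps` and `next_step` are hypothetical helpers. The directory to check is whatever dumpadm reports as the savecore directory.

```python
import os
import re
import tempfile

# Assumed savecore naming: vmdump.N (compressed) or unix.N + vmcore.N (uncompressed).
DUMP_FILE = re.compile(r"^(vmdump|vmcore|unix)\.\d+$")


def saved_dumps(savecore_dir):
    """Return the crash-dump files in savecore_dir (empty if the directory is missing)."""
    if not os.path.isdir(savecore_dir):
        return []
    return sorted(n for n in os.listdir(savecore_dir) if DUMP_FILE.match(n))


def next_step(dumps_enabled, dumps):
    """Apply the answer's rule of thumb to the dump configuration and findings."""
    if not dumps_enabled:
        return "enable crash dumps, then wait for the next reboot"
    if dumps:
        return "analyse the saved dump(s)"
    return "dumps enabled but none saved: suspect hardware, ask the host to move you"


# Demonstration on a throwaway directory standing in for the savecore directory.
with tempfile.TemporaryDirectory() as d:
    print(next_step(True, saved_dumps(d)))
    open(os.path.join(d, "vmdump.0"), "w").close()
    print(next_step(True, saved_dumps(d)))
```

The regular expression deliberately ignores savecore's `bounds` counter file, so an empty directory apart from `bounds` still counts as "no dumps".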

Attribution
Source: Link, Question Author: Alex, Answer Author: ramruma
