I run a SmartOS system on a Hetzner EX4S (Intel Core i7-2600, 32G RAM, 2x3Tb SATA HDD).
There are six virtual machines on the host:[root@10-bf-48-7f-e7-03 ~]# vmadm list UUID TYPE RAM STATE ALIAS d2223467-bbe5-4b81-a9d1-439e9a66d43f KVM 512 running xxxx1 5f36358f-68fa-4351-b66f-830484b9a6ee KVM 1024 running xxxx2 d570e9ac-9eac-4e4f-8fda-2b1d721c8358 OS 1024 running xxxx3 ef88979e-fb7f-460c-bf56-905755e0a399 KVM 1024 running xxxx4 d8e06def-c9c9-4d17-b975-47dd4836f962 KVM 4096 running xxxx5 4b06fe88-db6e-4cf3-aadd-e1006ada7188 KVM 9216 running xxxx5 [root@10-bf-48-7f-e7-03 ~]#
The host reboots several times a week with no crash dump in
/var/crash
and no messages in the/var/adm/messages
log.
Basically/var/adm/messages
looks like there was a hard reset:2012-11-23T08:54:43.210625+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK -- 2012-11-23T09:14:43.187589+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK -- 2012-11-23T09:34:43.165100+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK -- 2012-11-23T09:54:43.142065+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK -- 2012-11-23T10:14:43.119365+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK -- 2012-11-23T10:34:43.096351+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK -- 2012-11-23T10:54:43.073821+00:00 10-bf-48-7f-e7-03 rsyslogd: -- MARK -- 2012-11-23T10:57:55.610954+00:00 10-bf-48-7f-e7-03 genunix: [ID 540533 kern.notice] #015SunOS Release 5.11 Version joyent_20121018T224723Z 64-bit 2012-11-23T10:57:55.610962+00:00 10-bf-48-7f-e7-03 genunix: [ID 299592 kern.notice] Copyright (c) 2010-2012, Joyent Inc. All rights reserved. 2012-11-23T10:57:55.610967+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: lgpg 2012-11-23T10:57:55.610971+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: tsc 2012-11-23T10:57:55.610974+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: msr 2012-11-23T10:57:55.610978+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mtrr 2012-11-23T10:57:55.610981+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pge 2012-11-23T10:57:55.610984+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: de 2012-11-23T10:57:55.610987+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cmov 2012-11-23T10:57:55.610995+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mmx 2012-11-23T10:57:55.611000+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: mca 2012-11-23T10:57:55.611004+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: pae 2012-11-23T10:57:55.611008+00:00 10-bf-48-7f-e7-03 unix: [ID 223955 kern.info] x86_feature: cv8
The problem is that sometimes the host loses the network interface on reboot so we need to perform a manual hardware reset to bring it back.
We do not have physical or virtual access to the server console – no KVM, no iLO or anything like this. So, the only way to debug is to analyze crash dumps/log files.
I am not a SmartOS/Solaris expert so I am not sure how to proceed. Is there any equivalent of Linux netconsole for SmartOS? Can I just redirect the console output to the network port somehow? Maybe I am missing something obvious and crash information is located somewhere else.
Answer
Run the command dumpadm
to check crash dumps are enabled, and on what device.
If it is enabled and you find no crash dumps, then suspect a hardware fault and ask your hosting company to move you to a different physical server. (They will also be able to check hardware logs and fault lights and call the vendor and so on.)
Attribution
Source : Link , Question Author : Alex , Answer Author : ramruma