THogan.com Adventures In Information Systems Engineering

6 Sep 2010

Xen 3.4 – 4.0 Bug On AMD 6100 Series (Magny-Cours) Opteron


Xen 3.4 through 4.0 will not boot on the new AMD 6100 series (Magny-Cours) Opteron CPU (Socket G34).  There seems to be a bug in the Machine Check Exception (MCE) handling code that causes Xen to panic.  The 'nomce' (3.4) and 'no-mce' (4.0) boot options do not properly turn off the MCE code, so they cannot be used to work around the issue.
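If you are not sure whether a host has one of the affected CPUs, a quick check of /proc/cpuinfo (from a non-Xen kernel, if Xen will not boot) can help.  This is only a rough sketch: the function name is my own, and it assumes the 'model name' line looks like "AMD Opteron(tm) Processor 6128", which may not hold on every kernel or BIOS.

```shell
# Sketch: succeed if a cpuinfo-style file lists an Opteron 61xx CPU.
# Assumption: the "model name" line has the form
#   "AMD Opteron(tm) Processor 6128"
# is_opteron_6100 is a hypothetical helper name, not a standard tool.
is_opteron_6100() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -Eq '^model name[[:space:]]*:.*Opteron.*Processor 61[0-9]{2}([^0-9]|$)' "$cpuinfo"
}

# Example use:
#   if is_opteron_6100; then echo "Opteron 6100 series CPU found"; fi
```

A match only says the CPU is in the 6100 series; the panic output described in this post is still the real confirmation of the bug.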

There is a patch in Xen 4.0 unstable that fixes the 'nomce' boot option so that it works properly.  Unfortunately, most of the Linux distros have not yet back-ported this change to Xen 3.4.

I have applied the fix to Xen 3.4.1 for OpenSUSE 11.2 and built RPMs for the affected packages.  This did the trick and my AMD 6100 series systems are now running Xen just fine with the 'nomce' boot option.  Read on for more details and the link to the RPMs with the fix.

Xen Panic Output

If you have an AMD 6100 series CPU and are getting the following output, then you are probably running into the MCE bug:

(XEN) Xen BUG at amd_nonfatal.c:165
(XEN) ----[ Xen-3.4.2  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff828c801778f9>] mce_amd_work_fn+0x1d9/0x1f0
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: 0000000000000ffe   rbx: ffff828c8024ff28   rcx: 0000000000000000
(XEN) rdx: c0080ffe01000000   rsi: 0000000000000413   rdi: 0000000000000000
(XEN) rbp: 000000025f13f8e0   rsp: ffff828c8024fe60   r8:  ffff828c8028f800
(XEN) r9:  0000000000000000   r10: 0000000000000005   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: ffff828c80177720   r14: ffff83081fd7b190
(XEN) r15: ffff83081fd7b190   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 00000004ca4a6000   cr2: 000000000083c770
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff828c8024fe60:
(XEN)    0000000000000000 c0080ffe01000000 ffff828c80221180 ffff828c8011a12c
(XEN)    ffff8300dfc2c060 ffff828c80221180 ffff83081fd7b198 ffff828c8011a20d
(XEN)    000000024ab06880 0000000000000000 ffff828c8024ff28 ffff828c80267900
(XEN)    ffff828c80266900 0000000000000000 ffff828c80221100 ffff828c801185b8
(XEN)    000000000000e008 ffff828c8024ff28 ffff828c80266900 ffff828c802215b0
(XEN)    000000025e3b7f20 ffff828c80138fcc 0000000000000000 ffff8300dfafc000
(XEN)    ffff8300dfc2c000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000246
(XEN)    0000000000000008 00000000ffff8e54 0000000000000054 0000000000000000
(XEN)    ffffffff802053aa 0000000000000001 0000000000000000 0000000000000001
(XEN)    0000010000000000 ffffffff802053aa 000000000000e033 0000000000000246
(XEN)    ffffffff80511f50 000000000000e02b 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff8300dfafc000
(XEN) Xen call trace:
(XEN)    [<ffff828c801778f9>] mce_amd_work_fn+0x1d9/0x1f0
(XEN)    [<ffff828c8011a12c>] execute_timer+0x2c/0x50
(XEN)    [<ffff828c8011a20d>] timer_softirq_action+0xbd/0x2e0
(XEN)    [<ffff828c801185b8>] do_softirq+0x58/0x80
(XEN)    [<ffff828c80138fcc>] idle_loop+0x4c/0xa0
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at amd_nonfatal.c:165
(XEN) ****************************************
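If you capture the hypervisor console to a file (for example over a serial console), you can search it for the panic signature shown above rather than reading the whole dump.  A minimal sketch follows; the function name and the log file name are my own placeholders.

```shell
# Sketch: succeed if a saved console log contains the MCE panic signature.
# has_mce_panic is a hypothetical helper name; pass the path to your own log.
has_mce_panic() {
    grep -q 'Xen BUG at amd_nonfatal\.c' "$1"
}

# Example use (xen-console.log is a placeholder file name):
#   if has_mce_panic xen-console.log; then
#       echo "MCE bug signature found"
#   fi
```

The pattern deliberately leaves out the line number (165), since that may differ between Xen builds.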

Xen Community Discussion

Since these CPUs are just starting to hit the market, I was not able to find a lot of details on this issue.  The only real discussion I was able to find was this thread on the [Xen-devel] mailing list:

[Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board

There is a patch attached in the thread and it works.  For those not wanting to get into building the package from source, I have created some patched RPMs for all the platforms I currently have access to.

Binary Package With Fix

I have built RPMs for the current stable version of OpenSUSE (11.2) and will be building packages for SLES 11 shortly.  I will then be back-porting to OpenSUSE 11.1 and posting those RPMs within a week.  Maybe I will do OpenSUSE 11.0 and SLES 10.  Leave a comment if you need one of these platforms and, if there is demand, I will build packages for them.

Using The Patch

Untar the archive with the proper RPMs for your platform.  Install the packages with 'zypper'; it will fetch any required dependencies from your configured repositories:

# zypper install xen-3.4.1_20360_04-2.1.x86_64.rpm \
    xen-tools-3.4.1_20360_04-2.1.x86_64.rpm \
    xen-libs-3.4.1_20360_04-2.1.x86_64.rpm

After you have installed the patched RPMs, you need to add the 'nomce' boot option to your Xen entry in GRUB.  It belongs on the line that reads 'kernel /xen.gz ...'.  Below is an example from one of my patched hosts:

title Xen -- openSUSE 11.2 - 2.6.31.12-0.2
 root (hd0,0)
 kernel /xen.gz nomce noreboot com2=115200,8n1 console=com2
 module /vmlinuz-2.6.31.12-0.2-xen root=/dev/rootvg/rootlv
 module /initrd-2.6.31.12-0.2-xen
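If you have several Xen entries, the edit can be scripted.  The sketch below is only an illustration and assumes GRUB legacy with the menu in a file such as /boot/grub/menu.lst, a hypervisor image named xen.gz, and GNU sed; the function names are my own.  Review the result by hand before rebooting.

```shell
# Sketch: find and fix Xen 'kernel' lines that lack the 'nomce' option.
# Both function names are hypothetical helpers, not part of any package.

# Print (with line numbers) the xen.gz kernel lines that lack 'nomce'.
xen_lines_missing_nomce() {
    grep -nE '^[[:space:]]*kernel[[:space:]]+[^[:space:]]*xen\.gz' "$1" \
        | grep -vw 'nomce'
}

# Insert 'nomce' right after xen.gz on those lines; keeps a .bak copy.
add_nomce() {
    sed -i.bak -E '/^[[:space:]]*kernel[[:space:]]+[^[:space:]]*xen\.gz/{/(^|[[:space:]])nomce([[:space:]]|$)/!s/xen\.gz/xen.gz nomce/;}' "$1"
}

# Example use (path is an assumption, adjust for your system):
#   xen_lines_missing_nomce /boot/grub/menu.lst
#   add_nomce /boot/grub/menu.lst
```

Running add_nomce a second time leaves the file unchanged, because lines that already carry 'nomce' are skipped.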

OpenSUSE 11.2 Packages

These RPMs are based on the latest (as of 2010-09-06) source RPM package for 'xen'.  I have artificially inflated the version number so that they will be applied as an upgrade to an existing install if necessary.  This may interfere with future official OpenSUSE updates to this package; if it does, the patched RPMs will need to be removed manually and the official packages re-installed.

I will try to maintain up-to-date versions of these RPMs as long as official OpenSUSE updates come out that do not include the patch.  Check back often, as I will set up a mailing list this week to send notifications when I update these packages.

Simply install the included RPMs with 'zypper'; it will fetch the dependencies from your other repositories.

Xen 3.4.1 for OpenSUSE 11.2 with 'nomce' option fix.

Novell SUSE Linux Enterprise Server 11 Packages

My day job runs SLES 11, so I will have an opportunity to build SLES 11 packages with the fix this week.  Check back after 2010-09-09 for these packages.

UPDATE: Things have been busy at work and I have not gotten around to building a SLES 11 package.  I am probably not going to build one unless someone asks for it, so if you need it, leave a comment!
