FRU(1M)								       FRU(1M)


NAME

     fru - Field replacement unit analyzer for Challenge/Onyx systems

SYNOPSIS

     fru [-a] namelist corefile

DESCRIPTION

     fru is a hardware state analyzer that provides board replacement
     information based on system crash dumps.  Its output shows which system
     boards, if any, are the most likely to have induced a hardware failure.

     fru can be	run on any namelist and	corefile specified on the command
     line.  namelist contains symbol table information needed for symbolic
     access to the system memory image being examined.	This will typically be
     the unix.N	kernel copied into /var/adm/crash, where N is the number of
     the crash dump you	are analyzing.	corefile is a file containing the
     system memory image.  This	will typically be the vmcore.N.comp file
     copied into /var/adm/crash	by savecore(1) when the	machine	reboots	after
     a system panic.  If the memory image being	analyzed is from a system core
     dump (vmcore.N.comp), then	namelist must be a copy	of the unix file that
     was executing at the time (unix.N).
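
     As a minimal sketch of the pairing rule above, the following shell
     function prints the namelist and corefile that belong to crash dump
     number N.  The name crash_pair and the CRASH_DIR override are
     hypothetical and are not part of IRIX; only the file naming and the
     fru command line are taken from this page.

```shell
# Hypothetical helper: print the namelist/corefile pair for crash dump N.
# CRASH_DIR (an assumed override) defaults to /var/adm/crash.
crash_pair() {
    n=$1
    dir=${CRASH_DIR:-/var/adm/crash}
    printf '%s %s\n' "$dir/unix.$n" "$dir/vmcore.$n.comp"
}

# Possible use: analyze dump 0 only if both files are present.
#   set -- $(crash_pair 0)
#   [ -f "$1" ] && [ -f "$2" ] && fru "$1" "$2"
```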

     Note that fru cannot be run against live systems, as there	is no system
     board replacement information available while the system is running
     properly.

     The fru command accepts the following option.  By default, all
     information is sent to the standard output:

     -a		Print the entirety of the error	dump buffer, which might not
		be complete in the kernel console buffer of the	core dump.

NOTES

     If fru finds a hardware error state, it will try to report a confidence
     level for each system board (and in some cases, the components on a
     board).  When fru reports a confidence level, it means that it has some
     measure of confidence that the board reported has a problem.  Typically
     each board in the system will be assigned a 10% confidence level if it
     reports anything in the hardware error state.  Note that there are only a
     few levels of confidence, and it is important to recognize what the
     percentages mean:

	 10%	  The board was	witnessed in the hardware error	state only.
	 30%	  The board has	a possible error, with a low likelihood.
	 40%	  The board has	a possible error, with a medium	likelihood.
	 70%	  The board has	a *probable* error, with a high	likelihood.
	 90%	  The board is a *definite* problem.

     Given that	there is the possibility of multiple boards being reported,
     care should be taken before replacing a board on the system.  For
     example, if two boards are reported at 10%, that is not enough confidence
     that the boards listed are bad.  If there is one board at 70% or better,
     however, there is a good likelihood that the board	listed is a problem,
     and should	be replaced. Boards at 30% to 40% are questionable, and	should
     be	reviewed based on the frequency	of the failure of the specific board
     (in the same slot)	between	system crashes.

     The objective is to catch real hardware problems, rather than just
     replacing boards on systems where there isn't a problem.
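
     The guidance above can be summarized as a small decision rule.  The
     following sketch is an illustration only: fru_action is a hypothetical
     name, and treating every value from 30% up to (but not including) 70%
     as "review" is an assumption, since fru reports only the discrete
     levels listed earlier.

```shell
# Hypothetical helper: map a fru confidence percentage to the action
# suggested in the text (70% or better: replace; 30% to 40%: review
# against the board's failure history; 10%: not enough confidence).
fru_action() {
    pct=${1%\%}          # accept "70" or "70%"
    if [ "$pct" -ge 70 ]; then
        echo replace
    elif [ "$pct" -ge 30 ]; then
        echo review
    else
        echo insufficient
    fi
}
```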

     Here is some sample output	from a fru analysis on a system	crash dump:

	 # fru -a /var/adm/crash/unix.0	/var/adm/crash/vmcore.0.comp
	 ---------------------------------------------------------------
	     FRU ANALYZER (2.2):
	     ++	MEMORY BANK: leaf 1 bank 0 (B)
	     ++	  on the MC3 board in slot 3: 90% confidence.
	     ++	END OF ANALYSIS
	 ---------------------------------------------------------------

	 HARDWARE ERROR	STATE:
	 +  IP19 in slot 2
	 +    CC in IP19 Slot 2, cpu 3
	 +	CC ERTOIP  Register: 0x10
	 +	  4: Parity Error on Data from D-chip
	 +  MC3	in slot	3
	 +	MA EBus	Error register:	0x4
	 +	  2: My	EBus Data Error
	 +	MA Leaf	1 Error	Status Register: 0x2
	 +	  1: Read Uncorrectable	(Multiple Bit) Error
	 +	MA Leaf	1 Bad Memory Address: 0x3fb27380
	 +	  Slot 3, leaf 1, bank 0 (B)
	 +  IO4	board in slot 15
	 +	IA EBUS	Error Register:	0x201
	 +	   0: Sticky Error
	 +	   9: DATA_ERROR Received

     In this example, it would be a good idea to have the memory in leaf 1,
     bank 0 (B) changed, and have the MC3 examined (unless the memory and
     board in that slot have been replaced before, in which case further
     analysis of the hardware on the machine should be completed).
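
     The confidence lines in the sample output above can be pulled out
     mechanically.  The following is a minimal sketch that assumes the line
     format shown in that sample (FRU ANALYZER 2.2); other versions of fru
     may format their output differently.  fru_confidence is a hypothetical
     name.

```shell
# Hypothetical filter: read fru output on standard input and print
# "<percent> <description>" for each line of the assumed form
#   ++ <description>: <percent>% confidence.
fru_confidence() {
    sed -n 's/^[[:space:]]*++[[:space:]]*\(.*\): \([0-9][0-9]*\)% confidence\.$/\2 \1/p'
}

# Possible use:
#   fru /var/adm/crash/unix.0 /var/adm/crash/vmcore.0.comp | fru_confidence
```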

     Please also note that the system problem being reported might be
     something unknown to the version of fru you are currently running.
     There might also be bugs within fru that SGI is unaware of that keep
     field replacement unit analysis from being completed.

