sched_stat(8)



NAME

       sched_stat  -  Displays  CPU  usage and process-scheduling
       statistics for SMP and NUMA platforms

SYNOPSIS

       /usr/sbin/sched_stat [-l] [-s]  [-f]  [-u]  [-R]  [command
       [cmd_arg]...]

OPTIONS

       -f      Prints the count of calls that are not multiprocessor
               safe and are therefore funneled to the master CPU.
               For example:

                  Funnelling counts

                  unix master calls 11174   resulting blocks 2876

               The impact of funneled calls on the master CPU needs
               to be taken into account when evaluating statistics
               for the master CPU.
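
               For example, a command such as the following could
               be used to sample only these counts for a 60-second
               interval:

                    # /usr/sbin/sched_stat -f sleep 60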

       -l      Prints scheduler load-balancing statistics. For
               example:

                                   Scheduler Load Balancing

                                                           |    5-second averages
                        steal              idle   desired  |  current  interrupt    RT
                 cpu |   trys    steals   steals    load   |   load        %         %
                -----+-------------------------------------+---------------------------
                   0 |    288       3      20609    0.000  |   0.000     0.454     0.156
                   1 |    615       6      21359    0.000  |   0.000     0.002     0.203
                   2 |    996       4      20135    0.000  |   0.001     0.000     0.237
                   3 |   1302       4      16195    0.000  |   0.001     0.000     0.330
                   6 |      5       0       3029    0.000  |   0.000     0.000     0.034

               .  .  .

               In the displayed table, each row contains per-CPU
               information as follows:

               cpu            The number identifier of the CPU.

               steal trys     The number of attempts made to steal
                              processes/threads from other CPUs
                              when the CPU was not idle.

               steals         The number of processes/threads
                              actually stolen from other CPUs when
                              the CPU was not idle.

               idle steals    The number of processes/threads
                              stolen from other CPUs when the CPU
                              was idle.

               desired load   The number of time slices that should
                              be used on this CPU for running
                              timeshare threads. This information
                              is calculated by comparing the
                              current load, interrupt %, and RT %
                              statistics obtained for this CPU with
                              those obtained for other CPUs in the
                              same PAG.

                              When current load is less than
                              desired load, the scheduler will
                              attempt to migrate timeshare threads
                              to this CPU in order to better
                              balance the timeshare workload among
                              CPUs in the same PAG.

                              See DESCRIPTION for information about
                              PAGs.

               current load   Over the last five seconds, the
                              average number of time slices used to
                              run timeshare threads on this CPU.

               interrupt %    Over the last five seconds, the
                              average percentage of time slices
                              that this CPU spent in interrupt
                              context.

               RT %           Over the last five seconds, the
                              average percentage of time slices
                              that this CPU used to run threads
                              according to FIFO or round-robin
                              policy.
       -R      Prints information about CPU locality in two tables:

               Radtab    Shows the order-of-preference (in terms of
                         memory affinity) that exists between a CPU
                         and different RADs.  Order-of-preference
                         indicates, for a given home RAD, the
                         ranking of other RADs in terms of
                         increasing physical distance from that
                         home RAD.  If a process or thread needs
                         more memory or needs to be scheduled on a
                         RAD other than its home RAD, the kernel
                         automatically searches RADs for additional
                         memory or CPU cycles in the order of
                         preference shown in this table.

               Hoptab    Shows the distance (number of hops)
                         between different RADs and, by
                         association, between CPUs. The information
                         in this table is coarser-grained than in
                         the preceding Radtab table and more
                         relevant to NUMA programming choices. For
                         example, the expression RAD_DIST_LOCAL + 2
                         indicates RADs that are no more than two
                         hops from a thread's home RAD.

              For example (a small, switchless mesh NUMA system):


               Radtab (rads in order of preference)

                                    CPU #
               Preference      0      1      2      3
               --------------------------------------
               0               0      1      2      3
               1               1      0      3      2
               2               2      3      0      1
               3               3      2      1      0


               Hoptab (hops indexed by rad)

                                    CPU #
               To rad #        0      1      2      3
               --------------------------------------
               0               0      1      1      2
               1               1      0      2      1
               2               1      2      0      1
               3               2      1      1      0

               In these tables, the CPU identifiers are listed
               across the top from left to right and the RAD
               identifiers are listed on the left from top to
               bottom.  For example, if a process running on CPU 2
               needs additional memory, Radtab indicates that the
               kernel will search for that memory first in RAD 2,
               then in RAD 3, then in RAD 0, and last in RAD 1.
               Hoptab shows the basis of this preference in that
               RAD 2 is CPU 2's local RAD, RADs 0 and 3 are one hop
               away, and RAD 1 is two hops away.

               The -R option is useful only on NUMA platforms, such
               as GS1280 and ES80 AlphaServer systems, in which
               memory latency times vary from one RAD to another.
               The information in these tables is less useful for
               GS80, GS160, and GS320 AlphaServer systems because
               both coarse- and finer-grained memory affinity are
               the same from any CPU in one RAD to any CPU in
               another RAD; however, the displays can tell you
               which CPUs are in which RAD.
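
               Because the locality tables describe hardware
               topology rather than accumulated counters, a command
               such as the following could be used to print only
               these tables, without specifying a timing interval:

                    # /usr/sbin/sched_stat -R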

               Make sure that you both maximize the size of your
               terminal emulator window and minimize the font size
               before using the -R option; otherwise, line-wrapping
               will render the tables very difficult to read on
               systems that have many CPUs.

       -s      Prints scheduling-dispatch (processor-usage)
               statistics for each CPU. For example:

                              Scheduler Dispatch Statistics


               cpu  0        local      global        idle    remote  |       total   percent
               ------------------------------------------------------------------------------
               hot            60827       12868    19158991         0 |    19232686      91.6
               warm              78          21     1542019         0 |     1542118       7.3
               cold             315       27289      184784      7855 |      220243       1.0
               ------------------------------------------------------------------------------
               total          61220       40178    20885794      7855 |    20995047
               percent          0.3         0.2        99.5       0.0


               cpu  1        local      global        idle    remote  |       total   percent
               ------------------------------------------------------------------------------
               hot            33760       11788    16412544         0 |    16458092      89.5
               warm              66          24     1707014         0 |     1707104       9.3
               cold             201       26191      203513         0 |      229905       1.2
               ------------------------------------------------------------------------------

              .  .  .


               These statistics show the count and percentage of
               thread context switches (times that the kernel
               switches to a new thread) for the following
               categories:

               local    Threads scheduled from the CPU's Local Run
                        Queue

               global   Threads scheduled from the Global Run Queue
                        of the PAG to which the CPU belongs

               idle     Threads scheduled from the Idle CPU Queue
                        of the PAG to which the CPU belongs

               remote   Threads stolen from Global or Local Run
                        Queues in another PAG

              Note  that  these  statistics do not count CPU time
              slices that were used to re-run the same thread.

               Each SMP unit (or RAD on a NUMA system) has a
               Processor Affinity Group (PAG). Each PAG contains
               the following queues:

                  A Global Run Queue, from which processes or
                  threads are scheduled on the first available CPU

                  One or more Local Run Queues, from which
                  processes or threads are scheduled on a specific
                  CPU

                  A queue that contains idle CPUs

              A thread  that  is  handed  to  an  idle  CPU  goes
              directly  to that CPU without first being placed on
              the other queues.

              If there is insufficient  work  queued  locally  to
              keep  the PAG's CPUs busy, threads are stolen first
              from the Global and then the Local Run Queues in  a
              remote PAG.

              For   each  of  these  categories,  statistics  are
              grouped into hot,  warm,  and  cold  subcategories.
              The hot statistics show context switches to threads
              that last ran on the CPU only  a  very  short  time
              before.  The  warm statistics show context switches
              to threads that last ran  on  the  CPU  a  somewhat
              longer  time  before.  The cold statistics indicate
              context switches to threads that never ran  on  the
              CPU  before.  These statistics are a measure of how
              well cache affinity is being maintained;  that  is,
               how likely it is that the data used by threads when
               they last ran is still in the cache when the threads
               are rescheduled.  You cannot evaluate this information
              without knowledge of the type of work being done on
              the  system;  maintenance  of cache affinity can be
              very important on systems (or processor sets)  that
              are dedicated to running certain applications (such
               as those doing high performance technical computing)
               but is less critical for systems serving a variety
               of applications and users.

       -u      Prints processor-usage statistics for each CPU. For
               example:

                                      Processor Usage

                cpu |  user   nice  system   idle  widle |   scalls       intr        csw    tbsyc
               -----+-------------------------------------+----------------------------------------
                  0 |   0.0    0.0     0.7   99.2    0.1  |  3327337   50861486   41885424   317108
                  1 |   0.0    0.0     0.4   99.5    0.1  |  3514438          0   36710149   268667
                  2 |   0.0    0.0     0.4   99.5    0.1  |  3182064          0   37384120   257749
                  3 |   0.0    0.0     0.4   99.5    0.1  |  3528519          0   36468319   249492
                  6 |   0.0    0.0     0.1   99.9    0.0  |   668892      11664   11793053   352294
                  7 |   0.0    0.0     0.1   99.9    0.0  |   772821          0    9341527   352319
                  8 |   0.0    0.0     0.0  100.0    0.0  |   529050      11724    5717059   347267
                  9 |   0.0    0.0     0.0  100.0    0.0  |   492386          0    6603681   351509

              .  .  .

               In this table:

               cpu      The number identifier of the CPU.

               user     The percentage of time slices spent running
                        threads in user context.

               nice     The percentage of time slices in which
                        lower-priority threads were scheduled.
                        These are user-context threads whose
                        priority was explicitly lowered by using an
                        interface such as the nice command or the
                        class-scheduling software.

               system   The percentage of time slices spent running
                        threads in system context. This work
                        includes servicing of interrupts and system
                        calls that are made on behalf of user
                        processes.  An unusually high percentage in
                        the system category might indicate a system
                        bottleneck.  Running kprofile and lockinfo
                        provides more specific information about
                        where system time is being spent. See
                        uprofile(1) and lockinfo(8), respectively,
                        for information about these utilities.

               idle     The percentage of time slices in which no
                        threads were scheduled.

               widle    The percentage of time slices in which
                        available threads were blocked by pending
                        I/O and the CPU was idle. If this count is
                        unusually high, it suggests that a
                        bottleneck in an I/O channel might be
                        causing suboptimal performance.

               scalls   The count of system calls that were
                        serviced.

               intr     The count of interrupts that were serviced.

               csw      The count of thread context switches
                        (thread scheduling changes) that completed.

               tbsyc    The number of times that the translation
                        buffer was synchronized.

OPERANDS

        command   The command to be executed by sched_stat.

        cmd_arg   Any arguments to the preceding command.

       The command and cmd_arg operands are  used  to  limit  the
       length  of  time  in  which sched_stat gathers statistics.
       Typically, sleep is specified for command and some  number
       of seconds is specified for cmd_arg.

        If you do not specify a command to set a time interval for
        statistics gathering, the statistics reflect what has
        occurred since the system was last booted.
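
        For example, a command such as the following could be used
        to display statistics accumulated since the last boot:

        # /usr/sbin/sched_stat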

DESCRIPTION

       The  sched_stat  utility  helps you determine how well the
       system load is distributed among CPUs, what kinds of  jobs
       are  getting  (or  not  getting) sufficient cycles on each
       CPU, and how well cache affinity is being  maintained  for
       these jobs.

        Answers to the following questions influence how a process
        and its threads are scheduled:

        Is the request to be serviced multiprocessor-safe?

              If  not, the kernel funnels the request to the master
 CPU. The master CPU must reside in the  default
              processor  set  (which  contains all system CPUs if
              none were assigned to user-defined processor  sets)
              and  is  typically  CPU  0; however, some platforms
              permit CPUs other than CPU 0 to be the master  CPU.
              Few requests generated by software distributed with
              the operating system need to  be  funneled  to  the
              master  CPU  and  most of these are associated with
              certain device drivers. However, if the system runs
              many  third-party  drivers,  the number of requests
              that must be funneled to the master  CPU  might  be
               higher.

        What is the job priority?

              Job  priority influences how frequently a thread is
              scheduled. Realtime requests  and  interrupts  have
              higher priority than time-share jobs, which include
               the majority of user-mode threads. So, if a
               significant number of CPU cycles are spent servicing
              realtime requests and interrupts, there  are  fewer
              cycles available for time-share jobs.

              Default  priority  for  time-share jobs can also be
              changed by using the  nice  command,  the  runclass
              command, or through class-scheduling software. On a
              busy system, cache affinity is less  likely  to  be
              maintained for a thread from a time-share job whose
              priority was lowered because more time is likely to
              elapse  between  rescheduling  operations  for each
              thread. Conversely, cache affinity is  more  likely
              to  be  maintained for threads of a higher-priority
              time-share job because less  time  elapses  between
              rescheduling  operations.  Note  that the scheduler
              always  prioritizes  the  need  for  low   response
              latency  (as  demanded  by interrupts and real-time
              requests) higher than maintenance of  cache  affinity,
               regardless of the priority assigned to a time-share
               job.

        Are there user-defined restrictions that limit where a
        process may run?

              If so, the kernel must schedule all threads of that
              process on CPUs in  the  restricted  set.  In  some
              cases,  user-defined  restrictions are explicit RAD
              or CPU bindings specified either in an  application
              or  by  a  command (such as runon) that was used to
              launch the program or reassign one of its  threads.

               The set of CPUs where the kernel can schedule a
               thread is also influenced by the presence of
               user-defined processor sets. If the process was not
               explicitly started in or reassigned to a
               user-defined processor set, the kernel must run it
               and all of its threads only on CPUs in the default
               processor set.

        Are any CPUs idle?

              The scheduler is very aggressive in its attempts to
              steal jobs from other CPUs to run on an  idle  CPU.
               This means that the scheduler will migrate processes
               or threads across RAD boundaries to give an idle CPU
               work to do unless one of the preceding restrictions
               is in place to prevent that. For example, the
               scheduler does not cross processor set
              boundaries when stealing  work  from  another  CPU,
              even  when  a CPU is idle. In general, keeping CPUs
              busy with work has higher priority than maintaining
              memory  or  cache  affinity  during  load-balancing
              operations.

       Explicit memory-allocation advice provided in  application
       code  influences  scheduling  only  to the extent that the
       preceding factors do not override  that  advice.  However,
       explicit  memory-allocation  advice does make a difference
       (and thereby can improve performance)  when  CPUs  in  the
       processor  set  where the program is running are kept busy
       but are not overloaded.

        To gather statistics with sched_stat, you typically follow
        these steps:

        1. Start up a system workload and wait for it to get to a
           steady state.

        2. Start sched_stat with sleep as the specified command and
           some number of seconds as the specified cmd_arg. This
           causes sched_stat to gather statistics for the length of
           time it takes the sleep command to execute.


       For example, the following command  causes  sched_stat  to
       collect statistics for 60 seconds and then print a report:
       # /usr/sbin/sched_stat sleep 60

        If you include options on the command line, only statistics
        for the specified options are reported.

       If  you  specify  the  command  without  any  options, all
       options except for -R are assumed. (See  the  descriptions
       of the -f, -l, -s, and -u options in the OPTIONS section.)
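
        For example, a command such as the following could be used
        to report only the load-balancing and processor-usage
        statistics for a 30-second sample:

        # /usr/sbin/sched_stat -l -u sleep 30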


NOTES

        Running the sched_stat command has minimal impact on system
        performance.

RESTRICTIONS

       The  sched_stat  utility  is  subject  to  change, without
       advance notice, from one release to another.  The  utility
       is  intended mainly for use by other software applications
       included in the operating system product, kernel  developers,
        and software support representatives. Therefore,
       sched_stat should be used only interactively; any customer
       scripts  or  programs written to depend on its output data
       or display format might be broken  by  changes  in  future
       versions  of  the  utility  or  by  patches  that might be
       applied to it.

EXIT STATUS

        0      Success.

        >0     An error occurred.

FILES

       The pseudo driver that is opened by the sched_stat utility
       for RAD-related statistics gathering.

SEE ALSO

      
      
       Commands:  iostat(1), netstat(1), nice(1), renice(1), runclass(1), runon(1), uprofile(1), vmstat(1),  advfsstat(8),
       collect(8), lockinfo(8), nfsstat(8), sys_check(8)

       Others:    numa_intro(3),   class_scheduling(4),   processor_sets(4)



                                                    sched_stat(8)