cpr(1) cpr(1)
NAME
cpr - checkpoint and restart processes; info query; delete statefiles
cview - graphical user interface for checkpoint and restart (CPR)
SYNOPSIS
cpr -c pathname -p id[:type],[id[:type]...] [ -fgku ]
cpr -i pathname ...
cpr [ -j ] -r pathname ...
cpr -D pathname ...
cview [ -display XwindowDisplay ]
DESCRIPTION
IRIX Checkpoint and Restart (CPR) offers a set of user-transparent
software management tools that allow system administrators, operators, and
users with suitable privileges to suspend a job or a set of jobs in
mid-execution and restart them later. The jobs may be running on a
single machine or on an array of network-connected machines. CPR may
be used to enhance system availability, to provide load and resource
control or balancing, and to facilitate simulation or modeling.
The cview command provides an X Window System interface to CPR and is
composed of two decks: the Checkpoint Control Panel and the Restart Control
Panel. See the cview Help menu for more information.
Use the -c, -i, -r, and -D options to create, query, restart, and
delete checkpoints, respectively.
Create Checkpoint
-c Checkpoint the process or set of processes identified by -p, and
create a statefile directory at pathname.
-f Force an overwrite of an existing pathname, replacing the existing
statefiles with those produced by the new checkpoint.
-g Let the checkpoint target processes continue running (go) after
this checkpoint is finished. This overrides both the default WILL
policy and any WILL policy specified in the user's CPR attribute file.
-k Kill the checkpoint target processes after this checkpoint is
finished. This is the default WILL policy (EXIT); specifying -k
explicitly overrides a CONT setting in the user's CPR attribute file
(see below).
-u Use this option only when issuing a checkpoint immediately before an
operating system upgrade. It forces a save of all executable files
and DSO libraries used by the current processes, so that the target
processes can be restarted in the upgraded environment. If the
restarted processes are to be checkpointed again in the new
environment, this flag must be specified again.
-p Specifies the process or set of processes to checkpoint. Each id
may be of any of the following types:
PID for Unix process and POSIX pthread ID (the default type)
GID for Unix process group ID
SID for Unix process session ID; see termio(7)
ASH for IRIX Array Session ID; see array_services(5)
HID for process hierarchy (tree) rooted at that PID
SGP for IRIX sproc shared group; see sproc(2)
If no type is given in a checkpoint request, id is interpreted as a PID,
the default type. Here are some examples:
cpr -c ckpt01 -p 1111
cpr -c ckpt02 -p 2222:GID
The first example checkpoints a process with PID 1111 to the statefile
directory ./ckpt01. The second example checkpoints all processes with
process group ID 2222 to the statefile directory ./ckpt02.
Users may checkpoint an arbitrary set of processes into one statefile by
specifying multiple comma-separated ids (each with an optional type) after
the -p flag, as in this example:
cpr -c ckpt03 -p 111:GID,222,333:SID
This saves all processes with process group ID 111, process ID 222, and
process session ID 333 into the statefile directory ./ckpt03.
Only the super user and the owner of a process or set of processes (the
checkpoint owner) can checkpoint the targeted processes.
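The id[:type] list syntax accepted by -p can be illustrated with a small
parser. This is a hypothetical Python helper for illustration only; cpr
does its own argument parsing, and parse_target_list and VALID_TYPES are
names invented here. It assumes every id is a plain decimal number and
that type names are written in uppercase, as in the examples above.

```python
from __future__ import annotations

# Hypothetical helper (not part of CPR): the id types listed for -p.
VALID_TYPES = {"PID", "GID", "SID", "ASH", "HID", "SGP"}


def parse_target_list(arg: str) -> list[tuple[int, str]]:
    """Split a -p argument into (id, type) pairs; type defaults to PID."""
    targets = []
    for item in arg.split(","):
        # Each item is either "id" or "id:type".
        id_text, sep, type_text = item.partition(":")
        id_type = type_text if sep else "PID"
        if id_type not in VALID_TYPES:
            raise ValueError(f"unknown id type {id_type!r} in {item!r}")
        if not id_text.isdigit():
            raise ValueError(f"id must be a decimal number: {item!r}")
        targets.append((int(id_text), id_type))
    return targets


print(parse_target_list("111:GID,222,333:SID"))
# [(111, 'GID'), (222, 'PID'), (333, 'SID')]
```

The output corresponds to the ckpt03 example: process group 111, process
222 (default type PID), and session 333.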
Checkpoint Info
-i statefile ...
Provides information about existing CPR statefile(s): the statefile
revision number, process name(s), credential information of the
process, current working directory, open file information, and the
time when the checkpoint was performed.
Restart Checkpoint
-r statefile ...
Restarts a process or set of processes from the statefile. If a
restart involves more than one process, every process must be
restored successfully before any of them starts running; otherwise,
the entire restart is aborted.
-j Make processes interactive and job controllable. If a checkpoint is
issued against an interactive process or a group of processes rooted
at an interactive process, it can be restarted interactively with
the -j option. The restarted job runs in the foreground, even if the
original process ran in the background. Users may issue job control
signals to background the process if desired. An interactive job is
defined as a process with a controlling terminal; see termio(7). Only
one controlling terminal is restored even if the original process had
multiple controlling terminals.
Note that statefiles remain unchanged after a restart unless the -D
option is used to delete them.
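The all-or-nothing rule for -r can be pictured as a two-phase procedure:
restore every process in a held state, then release them together only if
every restore succeeded. The Python sketch below models that rule only; it
is not how CPR is implemented, and restore_one, release, and discard are
hypothetical callbacks standing in for the real work.

```python
def restart_all_or_nothing(images, restore_one, release, discard):
    """Model of an all-or-nothing restart.

    restore_one(image) returns a held (not yet running) process or raises.
    Returns True if every image was restored and released, else False.
    """
    held = []  # restored processes that are not yet allowed to run
    for image in images:
        try:
            held.append(restore_one(image))
        except Exception:
            # One failure aborts the whole restart: nothing restored may run.
            for proc in held:
                discard(proc)
            return False
    # Every restore succeeded, so all processes are released together.
    for proc in held:
        release(proc)
    return True
```

In this model no process is released until the loop over all images has
finished, which is what prevents a partial restart from running.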
A restart may fail for a number of reasons, including:
Resource Limitation: The original PID is not available and the
application may not use another PID; certain application-related files,
binaries, or libraries are no longer available on the system and the
REPLACE or SUBSTITUTE option was not set for them at checkpoint time; or
other system resources, such as memory or disk space, run out during the
restart.
Security and Data Integrity: Restart fails if the restarting user lacks
the proper permission to restart the statefile, or if the restart would
destroy or replace data without proper permission. The basic rule is
that only the superuser and the checkpoint owner can restart the
processes. This implies that if the superuser checkpoints a process owned
by a regular user, only the superuser has permission to restart it.
Other Fatal Failures: Restart fails if important parts of the original
processes cannot be restored for any other reason.
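Two of the failure conditions above reduce to simple checks. The Python
below is an illustrative model with hypothetical names (may_restart,
choose_pid, allocate), not CPR code. It reads "checkpoint owner" as the
user who issued the checkpoint, which is what the superuser example
implies, and it ties PID availability to the FORK policy (ORIGINAL or ANY)
described for the attribute file.

```python
ROOT_UID = 0  # the superuser


def may_restart(requester_uid: int, checkpoint_owner_uid: int) -> bool:
    """Only the superuser and the checkpoint owner may restart a statefile."""
    return requester_uid == ROOT_UID or requester_uid == checkpoint_owner_uid


def choose_pid(original_pid, pids_in_use, fork_action="ORIGINAL", allocate=None):
    """Pick the PID for a restarted process under a FORK policy.

    ORIGINAL: the saved PID is required, so the restart fails if it is taken.
    ANY: fall back to allocate(), a hypothetical source of a free PID.
    """
    if original_pid not in pids_in_use:
        return original_pid
    if fork_action == "ANY" and allocate is not None:
        return allocate()
    raise RuntimeError(f"restart fails: PID {original_pid} is not available")
```

For example, may_restart(500, 0) is False: a regular user cannot restart
a statefile that the superuser created, even for the user's own process.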
Delete Checkpoint
-D statefile ...
Delete one or more statefiles. After a successful restart,
statefiles might no longer be needed, and may be removed. The
delete option removes all files associated with the statefile,
including saved open files, mapped files, pipe data, etc. Only the
superuser and checkpoint owner may delete a statefile directory.
Cview Window
How to Checkpoint: Under the STEP I button, select a process or set of
processes from the list. To checkpoint a process group, a session group,
an IRIX array session, a process hierarchy, or an sproc shared group,
select a category from the Individual Process drop-down menu. In the
filename field below, enter the name of a directory for storing the
statefile. Click the STEP II button if you want to change checkpoint
options, such as whether to exit or continue the process, or control open
file and mapped file dispositions. Click the STEP III OK button to
initiate the checkpoint, or the Cancel Checkpoint button to discontinue.
How to Restart: Click the Restart Control Panel tab at the bottom of the
cview window. From the scrolling list of files and directories, select a
statefile to restart. Note that all files and directories are shown, not
just statefile directories. If a statefile is located somewhere other
than your home directory, change directories using the icon finder at the
top. Select any options you want, such as whether to retain the original
process ID, whether to restore the original working directory, or whether
to restore the original root directory. Click the OK Go Restart button
to initiate the restart.
Querying a Statefile: From the scrolling list of files and directories,
select a statefile to query. At the bottom of the cview window, click
the Tell Me More About This Statefile button.
Deleting a Statefile: From the scrolling list of files and directories,
select a statefile to delete. At the bottom of the cview window, click
the Remove This Statefile button.
SIGNALS AND EVENT HANDLING
Two signals, SIGCKPT and SIGRESTART, are designed to give application
programs adequate warning to take special action at checkpoint or
restart time. Both signals are ignored by default unless the application
catches them; see signal(2). By catching them, an application can prepare
for checkpoint or restart: it can clean up files, flush buffers, close or
reconnect socket connections, and so on.
The main CPR process waits as long as necessary for the application to
finish handling SIGCKPT before cpr proceeds with further checkpoint
activity. At restart, the first thing an application runs is its
SIGRESTART signal handler, if the application is catching that signal.
However, applications that wish to be checkpointed are not advised to use
these two signals (SIGCKPT and SIGRESTART) directly. Instead, they should
call atcheckpoint(3C) and atrestart(3C) to register event handlers for
checkpoint and restart and to activate signal handling. This is
especially important for applications that need to register multiple
callback handlers for checkpoint or restart events.
Warning: if applications catch the two CPR signals directly, they may undo
all of the CPR handler registrations made through atcheckpoint(3C) and
atrestart(3C), including handlers that some libraries register without
the application programmer's knowledge.
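The danger described in the warning can be seen in a toy model: a signal
has a single handler slot, while atcheckpoint(3C)-style registration keeps
many callbacks behind one shared dispatcher. The Python below is a model
only; signal_table, deliver, register_checkpoint_handler, and
register_restart_handler are hypothetical names, not the IRIX API, and
running callbacks in registration order is an assumption of the model.

```python
# Model only: one handler slot per signal, as with signal(2).
signal_table = {}

# Callbacks kept behind the shared dispatcher (hypothetical registry).
_callbacks = {"SIGCKPT": [], "SIGRESTART": []}


def _dispatch(signame):
    # Run every registered callback; order is an assumption of this model.
    for func in _callbacks[signame]:
        func()


def _register(signame, func):
    _callbacks[signame].append(func)
    # (Re)install the shared dispatcher in the single handler slot.
    signal_table[signame] = lambda: _dispatch(signame)


def register_checkpoint_handler(func):  # hypothetical stand-in for atcheckpoint(3C)
    _register("SIGCKPT", func)


def register_restart_handler(func):  # hypothetical stand-in for atrestart(3C)
    _register("SIGRESTART", func)


def deliver(signame):
    handler = signal_table.get(signame)
    if handler is not None:  # no handler installed: the signal is ignored
        handler()
```

If application code later assigns its own function to
signal_table["SIGCKPT"], the dispatcher is replaced and every callback a
library registered through the registry silently stops running.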
FILES
statefile Directory containing images of checkpointed processes
$HOME/.cpr User-configurable options for checkpoint and restart
/etc/cpr_proto Attribute file prototype for creating $HOME/.cpr
/usr/lib/X11/app-defaults/Cview Application defaults file
/usr/lib/images/Cview.icon Image for minimized window
The $HOME/.cpr file controls CPR behavior and consists of one or more
CKPT attribute definitions, each of the following form:
CKPT IDtype IDvalue {
policy: instance: action
...
}
The IDtype is one of the id types accepted by the -p option; see above.
The IDvalue is the process or process set ID. Either can be given as a
star (*) to represent any IDtype or IDvalue.
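The CKPT block syntax can be read with a short parser such as the
following hypothetical Python sketch (parse_cpr_attributes is an invented
name, not part of CPR). It assumes one rule per line, treats an empty
instance field as "no instance" (apply to all instances), and splits on
the first and last colons so that a filename containing a colon survives;
this page does not specify these details.

```python
import re

# "CKPT IDtype IDvalue {" on a line of its own (assumed layout).
HEADER = re.compile(r"^CKPT\s+(\S+)\s+(\S+)\s*\{$")


def parse_cpr_attributes(text):
    """Return a list of {'idtype', 'idvalue', 'rules'} dicts.

    Each rule is (policy, instance, action); instance is None when empty.
    """
    blocks, current = [], None
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line:
            continue
        if current is None:
            match = HEADER.match(line)
            if not match:
                raise ValueError(f"line {lineno}: expected 'CKPT IDtype IDvalue {{'")
            current = {"idtype": match.group(1), "idvalue": match.group(2), "rules": []}
        elif line == "}":
            blocks.append(current)
            current = None
        else:
            if line.count(":") < 2:
                raise ValueError(f"line {lineno}: expected 'policy: instance: action'")
            # Policy is before the first colon, action after the last one.
            policy, rest = line.split(":", 1)
            instance, action = rest.rsplit(":", 1)
            current["rules"].append(
                (policy.strip(), instance.strip() or None, action.strip())
            )
    if current is not None:
        raise ValueError("unterminated CKPT block")
    return blocks
```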
Here are the policy keywords and what they control:
FILE policy for handling open files
WILL action taken on the original process after checkpoint
CDIR policy on the original working directory; see chdir(2)
RDIR policy on the original root directory; see chroot(2)
FORK policy on original process ID
FILE takes an instance, which is the filename. FORK can take a PID as
its instance. If no instance is specified, the specified action is
applied to all instances.
FILE offers the following action keywords:
MERGE upon restart, reopen the file and seek to the previous offset
IGNORE upon restart, reopen the file as originally opened
APPEND upon restart, reopen the file for appending
REPLACE save file at checkpoint; replace the original file at restart
SUBSTITUTE save file at checkpoint; at restart, open the saved file as
an anonymous substitute, not touching the original file
WILL offers the following action keywords:
EXIT the original process exits after checkpoint (default action)
CONT the original process continues to run after checkpoint
CDIR and RDIR offer the following action keywords:
REPLACE restore original current working directory or root directory
(default action)
IGNORE ignore original current working directory or root directory;
restart according to new process environment
FORK offers the following action keywords:
ORIGINAL attempt to recover the original process ID (default action)
ANY it is acceptable to restart using any process ID
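The keyword tables above, together with the -g and -k overrides described
under Create Checkpoint, can be summarized in code. This Python sketch is
illustrative; ACTIONS, DEFAULTS, is_valid, and effective_will are
hypothetical names. FILE has no default action listed on this page, so the
sketch gives it none.

```python
# Allowed actions per policy keyword, as listed above.
ACTIONS = {
    "FILE": {"MERGE", "IGNORE", "APPEND", "REPLACE", "SUBSTITUTE"},
    "WILL": {"EXIT", "CONT"},
    "CDIR": {"REPLACE", "IGNORE"},
    "RDIR": {"REPLACE", "IGNORE"},
    "FORK": {"ORIGINAL", "ANY"},
}

# Documented default actions (none is listed for FILE).
DEFAULTS = {"WILL": "EXIT", "CDIR": "REPLACE", "RDIR": "REPLACE", "FORK": "ORIGINAL"}


def is_valid(policy, action):
    """True if action is one of the keywords listed for policy."""
    return action in ACTIONS.get(policy, set())


def effective_will(cli_flag=None, attr_file_action=None):
    """-g/-k on the command line win, then the .cpr WILL action, then EXIT."""
    if cli_flag == "-g":
        return "CONT"
    if cli_flag == "-k":
        return "EXIT"
    if cli_flag is not None:
        raise ValueError(f"not a WILL flag: {cli_flag!r}")
    if attr_file_action is not None:
        if not is_valid("WILL", attr_file_action):
            raise ValueError(f"invalid WILL action: {attr_file_action!r}")
        return attr_file_action
    return DEFAULTS["WILL"]
```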
Because of the nature of checkpoint and restart on UNIX, CPR cannot
guarantee that everything a process owns or connects to will be restored.
The lists below give what is supported and what is known not to be
supported. For system objects not covered below, application programmers
and users must decide for themselves whether checkpointing is safe.
The following system objects are checkpoint-safe:
o UNIX processes, process groups, terminal control sessions, IRIX
array sessions, process hierarchies, sproc(2) groups, POSIX pthreads
(pthread_create(3P)) and random process sets
o all user memory areas, including user stack and data regions
o system states, including process and user information, signal
disposition and signal mask, scheduling information, owner
credentials, accounting data, resource limits, current directory,
root directory, locked memory, and user semaphores
o system calls, if applications handle return values and error numbers
correctly, although slow system calls may return partial results
o undelivered and queued signals are saved at checkpoint and delivered
at restart
o open files (including NFS-mounted files), mapped files, file locks,
and inherited file descriptors
o special files /dev/tty, /dev/console, /dev/zero, /dev/null,
ccsync(7M)
o open pipes, pipeline data and streams pipe read and write message
modes
o System V shared memory
o POSIX semaphores (psema(D3X))
o semaphore and lock arenas (usinit(3P))
o jobs started with CHALLENGEarray services, provided they have a
unique ASH number; see array_services(5)
o applications using node-lock licenses; see the IRIX Checkpoint and
Restart Operation Guide for what to do with applications using
floating licenses
o applications using the prctl() PR_ATTACHADDR option; see prctl(2)
o applications using blockproc and unblockproc; see blockproc(2)
o R10000 counters; see libperfex(3C) and perfex(1)
The following system objects are not checkpoint-safe:
o network socket connections; see socket(2)
o X terminals and X11 client sessions
o special devices such as tape drives and CDROM drives
o files opened with setuid credentials that cannot be reestablished
o System V semaphores and messages; see semop(2) and msgop(2)
o memory mapped files using the /dev/mmem file; see mmap(2)
o open directories
SEE ALSO
atcheckpoint(3C), atrestart(3C), ckpt_create(3), ckpt_remove(3),
ckpt_restart(3), ckpt_stat(3)
IRIX Checkpoint and Restart Operation Guide