NAME
uvm - virtual memory system external interface
SYNOPSIS
#include <sys/param.h>
#include <uvm/uvm.h>
DESCRIPTION
The UVM virtual memory system manages access to the computer's memory resources.
User processes and the kernel access these resources through
UVM's external interface, which includes functions that:
- initialise UVM sub-systems
- manage virtual address spaces
- resolve page faults
- memory map files and devices
- perform uio-based I/O to virtual memory
- allocate and free kernel virtual memory
- allocate and free physical memory
In addition to exporting these services, UVM has two kernel-level processes:
pagedaemon and swapper. The pagedaemon process sleeps until
physical memory becomes scarce. When that happens, pagedaemon is awoken.
It scans physical memory, paging out and freeing memory that has not been
recently used. The swapper process swaps in runnable processes that are
currently swapped out, if there is room.
There are also several miscellaneous functions.
INITIALISATION
void
uvm_init(void);
void
uvm_init_limits(struct proc *p);
void
uvm_setpagesize(void);
void
uvm_swap_init(void);
The uvm_init() function sets up the UVM system at system boot time, after
the copyright has been printed. It initialises global state, the page,
map, kernel virtual memory state, machine-dependent physical map, kernel
memory allocator, pager and anonymous memory sub-systems, and then enables
paging of kernel objects. uvm_init() must be called after machine-dependent
code has registered some free RAM with the uvm_page_physload() function.
The uvm_init_limits() function initialises process limits
for the named
process. This is for use by the system startup for process
zero, before
any other processes are created.
The uvm_setpagesize() function initialises the uvmexp members pagesize
(if not already done by machine-dependent code), pageshift
and pagemask.
It should be called by machine-dependent code early in the
pmap_init(9)
call.
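The relationship between the three uvmexp members set by uvm_setpagesize()
can be sketched in ordinary C. This is a minimal userland illustration,
assuming that pagemask is pagesize - 1 and pageshift is the base-2 logarithm
of pagesize. The names sketch_uvmexp and sketch_setpagesize() are
hypothetical, and the kernel's actual code may differ.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three page-size members of uvmexp. */
struct sketch_uvmexp {
	int pagesize;	/* size of a page: must be a power of 2 */
	int pagemask;	/* assumed to be pagesize - 1 */
	int pageshift;	/* assumed to be log2(pagesize) */
};

/*
 * Derive pagemask and pageshift from an already-set pagesize.
 * Returns false if pagesize is not a positive power of two.
 */
static bool
sketch_setpagesize(struct sketch_uvmexp *ue)
{
	int shift = 0;

	if (ue->pagesize <= 0 || (ue->pagesize & (ue->pagesize - 1)) != 0)
		return false;
	while ((1 << shift) < ue->pagesize)
		shift++;
	ue->pagemask = ue->pagesize - 1;
	ue->pageshift = shift;
	return true;
}
```

With a pagesize of 4096 this sketch yields a pagemask of 4095 and a
pageshift of 12.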
The uvm_swap_init() function initialises the swap sub-system.
VIRTUAL ADDRESS SPACE MANAGEMENT
int
uvm_map(vm_map_t map, vaddr_t *startp, vsize_t size,
struct uvm_object *uobj, voff_t uoffset, vsize_t alignment,
uvm_flag_t flags);
int
uvm_map_pageable(vm_map_t map, vaddr_t start, vaddr_t end,
boolean_t new_pageable, int lockflags);
int
uvm_map_pageable_all(vm_map_t map, int flags, vsize_t limit);
boolean_t
uvm_map_checkprot(vm_map_t map, vaddr_t start, vaddr_t end,
vm_prot_t protection);
int
uvm_map_protect(vm_map_t map, vaddr_t start, vaddr_t end,
vm_prot_t new_prot, boolean_t set_max);
int
uvm_deallocate(vm_map_t map, vaddr_t start, vsize_t size);
struct vmspace *
uvmspace_alloc(vaddr_t min, vaddr_t max, int pageable);
void
uvmspace_exec(struct proc *p, vaddr_t start, vaddr_t end);
struct vmspace *
uvmspace_fork(struct vmspace *vm);
void
uvmspace_free(struct vmspace *vm1);
void
uvmspace_share(struct proc *p1, struct proc *p2);
void
uvmspace_unshare(struct proc *p);
int
UVM_MAPFLAG(vm_prot_t prot, vm_prot_t maxprot, vm_inherit_t inh,
int advice, int flags);
The uvm_map() function establishes a valid mapping in map map, which must
be unlocked. The new mapping has size size, which must be in PAGE_SIZE
units. If alignment is non-zero, it describes the required alignment of
the mapping, in power-of-two notation. The uobj and uoffset arguments can
have four meanings. When uobj is NULL and uoffset is UVM_UNKNOWN_OFFSET,
uvm_map() does not use the machine-dependent PMAP_PREFER function. If
uoffset is any other value, it is used as the hint to PMAP_PREFER. When
uobj is not NULL and uoffset is UVM_UNKNOWN_OFFSET, uvm_map() finds the
offset based upon the virtual address, passed as startp. If uoffset is
any other value, uvm_map() performs a normal mapping at this offset. The start
address of the mapping will be returned in startp.
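Because size must be a multiple of PAGE_SIZE, a caller would typically round
a byte count up before calling uvm_map(). A minimal sketch of that rounding
follows, assuming a 4096-byte page. SKETCH_PAGE_SIZE and sketch_round_page()
are illustrative names only; kernel code would use the system's own
page-size definitions.

```c
#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, a power of 2 */
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1)

/* Round a byte count up to the next multiple of the page size. */
static unsigned long
sketch_round_page(unsigned long size)
{
	return (size + SKETCH_PAGE_MASK) & ~SKETCH_PAGE_MASK;
}
```

The mask trick only works because the page size is a power of two.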
The flags passed to uvm_map() are typically created using the UVM_MAPFLAG()
macro, which uses the following values. The prot and maxprot arguments can take
the following values:
#define UVM_PROT_MASK 0x07 /* protection mask */
#define UVM_PROT_NONE 0x00 /* protection none */
#define UVM_PROT_ALL 0x07 /* everything */
#define UVM_PROT_READ 0x01 /* read */
#define UVM_PROT_WRITE 0x02 /* write */
#define UVM_PROT_EXEC 0x04 /* exec */
#define UVM_PROT_R 0x01 /* read */
#define UVM_PROT_W 0x02 /* write */
#define UVM_PROT_RW 0x03 /* read-write */
#define UVM_PROT_X 0x04 /* exec */
#define UVM_PROT_RX 0x05 /* read-exec */
#define UVM_PROT_WX 0x06 /* write-exec */
#define UVM_PROT_RWX 0x07 /* read-write-exec */
The values that inh can take are:
#define UVM_INH_MASK 0x30 /* inherit mask */
#define UVM_INH_SHARE 0x00 /* "share" */
#define UVM_INH_COPY 0x10 /* "copy" */
#define UVM_INH_NONE 0x20 /* "none" */
#define UVM_INH_DONATE 0x30 /* "donate" << not used */
The values that advice can take are:
#define UVM_ADV_NORMAL 0x0 /* 'normal' */
#define UVM_ADV_RANDOM 0x1 /* 'random' */
#define UVM_ADV_SEQUENTIAL 0x2 /* 'sequential' */
#define UVM_ADV_MASK 0x7 /* mask */
The values that flags can take are:
#define UVM_FLAG_FIXED 0x010000 /* find space */
#define UVM_FLAG_OVERLAY 0x020000 /* establish overlay */
#define UVM_FLAG_NOMERGE 0x040000 /* don't merge map entries */
#define UVM_FLAG_COPYONW 0x080000 /* set copy_on_write flag */
#define UVM_FLAG_AMAPPAD 0x100000 /* bss: pad amap to reduce malloc() */
#define UVM_FLAG_TRYLOCK 0x200000 /* fail if we can not lock map */
The UVM_MAPFLAG macro arguments can be combined with an or
operator.
There are several special purpose macros for checking protection combinations,
e.g., the UVM_PROT_WX macro. There are also some additional
macros to extract bits from the flags. The UVM_PROTECTION,
UVM_INHERIT,
UVM_MAXPROTECTION and UVM_ADVICE macros return the protection, inheritance,
maximum protection and advice, respectively.
uvm_map() returns a
standard UVM return value.
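How the five UVM_MAPFLAG() arguments might share one integer can be sketched
as follows. The constant values are copied from the listings above, but the
bit positions used for maxprot and advice are an assumption of this sketch
and are not stated in this page. The SKETCH_ macros are hypothetical
stand-ins, not the kernel's definitions.

```c
/* Values copied from the listings above. */
#define UVM_PROT_MASK		0x07
#define UVM_PROT_RW		0x03
#define UVM_PROT_RWX		0x07
#define UVM_INH_MASK		0x30
#define UVM_INH_COPY		0x10
#define UVM_ADV_MASK		0x7
#define UVM_ADV_RANDOM		0x1
#define UVM_FLAG_COPYONW	0x080000

/*
 * Assumed layout (not stated in this page): prot in bits 0-2,
 * inherit in bits 4-5, maxprot in bits 8-10, advice in bits 12-14,
 * remaining flags from bit 16 upwards.
 */
#define SKETCH_MAPFLAG(prot, maxprot, inh, advice, flags) \
	((unsigned int)((prot) | (inh) | ((maxprot) << 8) | \
	    ((advice) << 12) | (flags)))

/* Extract each field again; inherit is returned unshifted. */
#define SKETCH_PROTECTION(f)	((f) & UVM_PROT_MASK)
#define SKETCH_INHERIT(f)	((f) & UVM_INH_MASK)
#define SKETCH_MAXPROTECTION(f)	(((f) >> 8) & UVM_PROT_MASK)
#define SKETCH_ADVICE(f)	(((f) >> 12) & UVM_ADV_MASK)
```

Under these assumptions the fields do not overlap, so each extraction
macro recovers exactly the value that was packed.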
The uvm_map_pageable() function changes the pageability of the pages in
the range from start to end in map map to new_pageable. The
uvm_map_pageable_all() function changes the pageability of all mapped regions.
If limit is non-zero and pmap_wired_count() is implemented,
KERN_NO_SPACE is returned if the number of wired pages exceeds limit. The
map is locked on entry if lockflags contains UVM_LK_ENTER, and locked on
exit if lockflags contains UVM_LK_EXIT. uvm_map_pageable() and
uvm_map_pageable_all() return a standard UVM return value.
The uvm_map_checkprot() function checks the protection of
the range from
start to end in map map against protection. This returns
either TRUE or
FALSE.
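The kind of check uvm_map_checkprot() describes can be illustrated with a
simplified model. This is only a sketch: struct sketch_entry and
sketch_checkprot() are hypothetical, the entries are assumed to be sorted and
non-overlapping, and the real map data structures and locking are not shown.

```c
#include <stdbool.h>
#include <stddef.h>

#define UVM_PROT_READ	0x01	/* values from the listing above */
#define UVM_PROT_WRITE	0x02

/* Hypothetical, simplified map entry: [start, end) with a protection. */
struct sketch_entry {
	unsigned long start;
	unsigned long end;
	int protection;
};

/*
 * Return true if every address in [start, end) is covered by the
 * sorted, non-overlapping entries and each covering entry grants all
 * bits in protection.
 */
static bool
sketch_checkprot(const struct sketch_entry *e, size_t n,
    unsigned long start, unsigned long end, int protection)
{
	size_t i;

	for (i = 0; i < n && start < end; i++) {
		if (e[i].end <= start)
			continue;		/* entry lies before the range */
		if (e[i].start > start)
			return false;		/* hole in the range */
		if ((e[i].protection & protection) != protection)
			return false;		/* missing a requested bit */
		start = e[i].end;		/* advance past this entry */
	}
	return start >= end;
}
```

A range that runs past the last entry, or that spans an entry lacking one
of the requested bits, fails the check.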
The uvm_map_protect() function changes the protection of the range from start
to end in map map to new_prot, also setting the maximum protection of the
region to new_prot if set_max is non-zero. This function returns a
standard UVM return value.
The uvm_deallocate() function deallocates kernel memory in
map map from
address start to start + size.
The uvmspace_alloc() function allocates and returns a new
address space,
with ranges from min to max, setting the pageability of the
address space
to pageable.
The uvmspace_exec() function either reuses the address space
of process p
if there are no other references to it, or creates a new one
with
uvmspace_alloc(). The range of valid addresses in the address space is
reset to start through end.
The uvmspace_fork() function creates and returns a new address space
based upon the vm address space, typically used when allocating an address
space for a child process.
The uvmspace_free() function lowers the reference count on the address
space vm1, freeing the data structures if there are no other references.
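The reference-counting pattern described for uvmspace_free() can be sketched
in userland C. The structure and function names below are hypothetical, and
the sketch omits the locking a kernel implementation would need.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an address space with a reference count. */
struct sketch_vmspace {
	int refcnt;
};

/* Allocate a new object; the creator holds the first reference. */
static struct sketch_vmspace *
sketch_alloc(void)
{
	struct sketch_vmspace *vm = malloc(sizeof(*vm));

	if (vm != NULL)
		vm->refcnt = 1;
	return vm;
}

/*
 * Drop one reference and free the structure when the last one goes.
 * Returns the number of references remaining.
 */
static int
sketch_free(struct sketch_vmspace *vm)
{
	int remaining = --vm->refcnt;

	if (remaining == 0)
		free(vm);
	return remaining;
}
```

A second holder would increment refcnt when it starts sharing the object,
so that the first call to sketch_free() leaves the structure intact.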
The uvmspace_share() function causes process p2 to share the
address
space of p1.
The uvmspace_unshare() function ensures that process p has its own, unshared
address space, creating a new one if necessary by calling
uvmspace_fork().
PAGE FAULT HANDLING
int
uvm_fault(vm_map_t orig_map, vaddr_t vaddr, vm_fault_t fault_type,
vm_prot_t access_type);
The uvm_fault() function is the main entry point for faults. It takes
orig_map as the map the fault originated in, vaddr as the offset into the map
at which the fault occurred, fault_type describing the type of fault, and
access_type describing the type of access requested. uvm_fault() returns
a standard UVM return value.
MEMORY MAPPING FILES AND DEVICES
struct uvm_object *
uvn_attach(void *arg, vm_prot_t accessprot);
void
uvm_vnp_setsize(struct vnode *vp, voff_t newsize);
void
uvm_vnp_sync(struct mount *mp);
void
uvm_vnp_terminate(struct vnode *vp);
boolean_t
uvm_vnp_uncache(struct vnode *vp);
The uvn_attach() function attaches a UVM object to vnode
arg, creating
the object if necessary. The object is returned.
The uvm_vnp_setsize() function sets the size of vnode vp to
newsize.
Caller must hold a reference to the vnode. If the vnode
shrinks, pages
no longer used are discarded. This function will be removed
when the
file system and VM buffer caches are merged.
The uvm_vnp_sync() function flushes dirty vnodes from either
the mount
point passed in mp, or all dirty vnodes if mp is NULL. This
function
will be removed when the file system and VM buffer caches
are merged.
The uvm_vnp_terminate() function frees all VM resources allocated to vnode
vp. If the vnode still has references, it will not be
destroyed;
however all future operations using this vnode will fail.
This function
will be removed when the file system and VM buffer caches
are merged.
The uvm_vnp_uncache() function disables vnode vp from persisting when all
references are freed. This function will be removed when
the file-system
and UVM caches are unified. Returns true if there is no active vnode.
VIRTUAL MEMORY I/O
int
uvm_io(vm_map_t map, struct uio *uio);
The uvm_io() function performs the I/O described in uio on
the memory described
in map.
ALLOCATION OF KERNEL MEMORY
vaddr_t
uvm_km_alloc(vm_map_t map, vsize_t size);
vaddr_t
uvm_km_zalloc(vm_map_t map, vsize_t size);
vaddr_t
uvm_km_alloc1(vm_map_t map, vsize_t size, boolean_t zeroit);
vaddr_t
uvm_km_kmemalloc(vm_map_t map, struct uvm_object *obj, vsize_t size,
int flags);
vaddr_t
uvm_km_valloc(vm_map_t map, vsize_t size);
vaddr_t
uvm_km_valloc_wait(vm_map_t map, vsize_t size);
struct vm_map *
uvm_km_suballoc(vm_map_t map, vaddr_t *min, vaddr_t *max, vsize_t size,
int flags, boolean_t fixed, vm_map_t submap);
void
uvm_km_free(vm_map_t map, vaddr_t addr, vsize_t size);
void
uvm_km_free_wakeup(vm_map_t map, vaddr_t addr, vsize_t size);
The uvm_km_alloc() and uvm_km_zalloc() functions allocate
size bytes of
wired kernel memory in map map. In addition to allocation,
uvm_km_zalloc() zeros the memory. Both of these functions
are defined as
macros in terms of uvm_km_alloc1(), and should almost always
be used in
preference to uvm_km_alloc1().
The uvm_km_alloc1() function allocates and returns size
bytes of wired
memory in the kernel map, zeroing the memory if the zeroit
argument is
non-zero.
The uvm_km_kmemalloc() function allocates and returns size
bytes of wired
kernel memory into obj. The flags can be any of:
#define UVM_KMF_NOWAIT 0x1 /* matches M_NOWAIT */
#define UVM_KMF_VALLOC 0x2 /* allocate VA only */
#define UVM_KMF_TRYLOCK UVM_FLAG_TRYLOCK /* try locking only */
The UVM_KMF_NOWAIT flag causes uvm_km_kmemalloc() to return
immediately
if no memory is available. UVM_KMF_VALLOC causes no pages
to be allocated,
only a virtual address. UVM_KMF_TRYLOCK causes
uvm_km_kmemalloc() to
use simple_lock_try() when locking maps.
The uvm_km_valloc() and uvm_km_valloc_wait() functions return a newly allocated
zero-filled address in the kernel map of size size.
uvm_km_valloc_wait() will also wait for kernel memory to become available,
if there is a memory shortage.
The uvm_km_suballoc() function allocates submap (with the
specified
flags, as described above) from map, creating a new map if
submap is
NULL. The addresses of the submap can be specified exactly
by setting
the fixed argument to non-zero, which causes the min argument to specify
the beginning of the address in the submap. If fixed is zero, any address
of size size will be allocated from map and the start
and end addresses
returned in min and max.
The uvm_km_free() and uvm_km_free_wakeup() functions free
size bytes of
memory in the kernel map, starting at address addr.
uvm_km_free_wakeup()
calls thread_wakeup() on the map before unlocking the map.
ALLOCATION OF PHYSICAL MEMORY
struct vm_page *
uvm_pagealloc(struct uvm_object *uobj, voff_t off, struct vm_anon *anon,
int flags);
void
uvm_pagerealloc(struct vm_page *pg, struct uvm_object *newobj,
voff_t newoff);
void
uvm_pagefree(struct vm_page *pg);
int
uvm_pglistalloc(psize_t size, paddr_t low, paddr_t high,
paddr_t alignment, paddr_t boundary, struct pglist *rlist,
int nsegs, int waitok);
void
uvm_pglistfree(struct pglist *list);
void
uvm_page_physload(vaddr_t start, vaddr_t end, vaddr_t avail_start,
vaddr_t avail_end, int free_list);
The uvm_pagealloc() function allocates a page of memory at virtual address
off in either the object uobj or the anonymous memory anon, which
must be locked by the caller. Only one of anon and uobj can be non-NULL.
The flags can be any of:
#define UVM_PGA_USERESERVE 0x0001 /* ok to use reserve pages */
#define UVM_PGA_ZERO 0x0002 /* returned page must be zeroed */
The UVM_PGA_USERESERVE flag means to allocate a page even if
that will
result in the number of free pages being lower than
uvmexp.reserve_pagedaemon (if the current thread is the
pagedaemon) or
uvmexp.reserve_kernel (if the current thread is not the
pagedaemon). The
UVM_PGA_ZERO flag causes the returned page to be filled with
zeroes, either
by allocating it from a pool of pre-zeroed pages or by
zeroing it
in-line as necessary.
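The effect of UVM_PGA_USERESERVE described above can be illustrated with a
small decision function. This is a sketch of the stated rule only:
sketch_page_available() and its parameters are hypothetical, and the
kernel's actual allocation path may apply further conditions.

```c
#include <stdbool.h>

#define UVM_PGA_USERESERVE	0x0001	/* value from the listing above */

/*
 * Decide whether one more page may be handed out. Without
 * UVM_PGA_USERESERVE the allocation must leave at least the applicable
 * reserve free; with it, any free page may be used.
 */
static bool
sketch_page_available(int free_pages, int reserve_pagedaemon,
    int reserve_kernel, bool is_pagedaemon, int flags)
{
	int reserve = is_pagedaemon ? reserve_pagedaemon : reserve_kernel;

	if (free_pages <= 0)
		return false;		/* nothing left to allocate */
	if (free_pages - 1 >= reserve)
		return true;		/* reserve stays intact */
	return (flags & UVM_PGA_USERESERVE) != 0;
}
```

The reserve that applies depends on the caller: reserve_pagedaemon for the
pagedaemon, reserve_kernel for everything else.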
The uvm_pagerealloc() function reallocates page pg to a new
object
newobj, at a new offset newoff, and returns NULL when no
page can be
found.
The uvm_pagefree() function frees the physical page pg.
The uvm_pglistalloc() function allocates a list of pages for size size
bytes under various constraints. low and high describe the lowest and
highest addresses acceptable for the list. If alignment is non-zero, it
describes the required alignment of the list, in power-of-two notation.
If boundary is non-zero, no segment of the list may cross this power-of-two
boundary, relative to zero. The nsegs and waitok arguments are currently
ignored.
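The constraints above can be expressed as a predicate on a candidate
physical range. The following is a minimal sketch, assuming that alignment
and boundary are given as power-of-two byte values and that high is
inclusive. sketch_segment_ok() is a hypothetical helper, not part of UVM.

```c
#include <stdbool.h>

/*
 * Check whether the physical range [addr, addr + size) satisfies the
 * constraints described above. alignment and boundary are either zero
 * (no constraint) or a power of two; size must be non-zero.
 */
static bool
sketch_segment_ok(unsigned long addr, unsigned long size,
    unsigned long low, unsigned long high,
    unsigned long alignment, unsigned long boundary)
{
	unsigned long last = addr + size - 1;	/* last byte of the range */

	if (addr < low || last > high)
		return false;		/* outside [low, high] */
	if (alignment != 0 && (addr & (alignment - 1)) != 0)
		return false;		/* start is misaligned */
	if (boundary != 0 && ((addr ^ last) & ~(boundary - 1)) != 0)
		return false;		/* range crosses a boundary */
	return true;
}
```

The boundary test compares the bits above the boundary in the first and
last byte addresses; if they differ, the range straddles a multiple of
boundary.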
The uvm_pglistfree() function frees the list of pages pointed to by list.
The uvm_page_physload() function loads physical memory segments into VM
space on the specified free_list. uvm_page_physload() must
be called at
system boot time to set up physical memory management pages.
The arguments
describe the start and end of the physical addresses
of the segment,
and the available start and end addresses of pages not
already in
use.
PROCESSES
void
uvm_pageout(void *arg);
void
uvm_scheduler(void);
void
uvm_swapin(struct proc *p);
The uvm_pageout() function is the main loop for the page
daemon. The arg
argument is ignored.
The uvm_scheduler() function is the process zero main loop,
which is to
be called after the system has finished starting other processes.
uvm_scheduler() handles the swapping in of runnable, swapped
out processes
in priority order.
The uvm_swapin() function swaps in the named process.
MISCELLANEOUS FUNCTIONS
struct uvm_object *
uao_create(vsize_t size, int flags);
void
uao_detach(struct uvm_object *uobj);
void
uao_reference(struct uvm_object *uobj);
boolean_t
uvm_chgkprot(caddr_t addr, size_t len, int rw);
void
uvm_kernacc(caddr_t addr, size_t len, int rw);
void
uvm_vslock(struct proc *p, caddr_t addr, size_t len,
vm_prot_t access_type);
void
uvm_vsunlock(struct proc *p, caddr_t addr, size_t len);
void
uvm_meter();
int
uvm_sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp,
void *newp, size_t newlen, struct proc *p);
void
uvm_fork(struct proc *p1, struct proc *p2, boolean_t shared, void *stack,
size_t stacksize, void (*func)(void *arg), void *arg);
int
uvm_grow(struct proc *p, vaddr_t sp);
int
uvm_coredump(struct proc *p, struct vnode *vp, struct ucred *cred,
struct core *chdr);
The uao_create(), uao_detach() and uao_reference() functions
operate on
anonymous memory objects, such as those used to support System V shared
memory. uao_create() returns an object of size size with
flags:
#define UAO_FLAG_KERNOBJ 0x1 /* create kernel object */
#define UAO_FLAG_KERNSWAP 0x2 /* enable kernel swap */
which can only be used once each at system boot time.
uao_reference()
creates an additional reference to the named anonymous memory object.
uao_detach() removes a reference from the named anonymous
memory object,
destroying it if removing the last reference.
The uvm_chgkprot() function changes the protection of kernel
memory from
addr to addr + len to the value of rw. This is primarily
useful for debuggers,
for setting breakpoints. This function is only
available with
options KGDB.
The uvm_kernacc() function checks the access at address addr
to addr +
len for rw access, in the kernel address space.
The uvm_vslock() and uvm_vsunlock() functions control the
wiring and unwiring
of pages for process p from addr to addr + len. The
access_type
argument of uvm_vslock() is passed to uvm_fault(). These
functions are
normally used to wire memory for I/O.
The uvm_meter() function calculates the load average and
wakes up the
swapper if necessary.
The uvm_sysctl() function provides support for the CTL_VM domain of the
sysctl(3) hierarchy. uvm_sysctl() handles the VM_LOADAVG, VM_METER and
VM_UVMEXP calls, which return the current load averages, calculate current
VM totals, and return the uvmexp structure, respectively. The load
averages are accessed from userland using the getloadavg(3) function.
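From userland, the load averages mentioned above might be read as in the
following sketch. It relies only on getloadavg(3); the wrapper name
sketch_print_loadavg() is hypothetical, and on some systems a feature-test
macro may be needed for getloadavg() to be declared in <stdlib.h>.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Fetch up to three load averages (1, 5 and 15 minutes) and print
 * them. Returns the number of samples retrieved, or -1 on failure.
 */
static int
sketch_print_loadavg(void)
{
	double avg[3];
	int i, n;

	n = getloadavg(avg, 3);
	for (i = 0; i < n; i++)
		printf("loadavg[%d] = %.2f\n", i, avg[i]);
	return n;
}
```

The return value should be checked, since fewer than three samples, or
none at all, may be available.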
The uvmexp structure has all global state of the UVM system,
and has the
following members:
/* vm_page constants */
int pagesize; /* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask; /* page mask */
int pageshift; /* page shift */
/* vm_page counters */
int npages; /* number of pages we manage */
int free; /* number of free pages */
int active; /* number of active pages */
int inactive; /* number of pages that we free'd but may want back */
int paging; /* number of pages in the process of being paged out */
int wired; /* number of wired pages */
int reserve_pagedaemon; /* number of pages reserved for pagedaemon */
int reserve_kernel; /* number of pages reserved for kernel */
/* pageout params */
int freemin; /* min number of free pages */
int freetarg; /* target number of free pages */
int inactarg; /* target number of inactive pages */
int wiredmax; /* max number of wired pages */
/* swap */
int nswapdev; /* number of configured swap devices in system */
int swpages; /* number of PAGE_SIZE'ed swap pages */
int swpginuse; /* number of swap pages in use */
int nswget; /* number of times fault calls uvm_swap_get() */
int nanon; /* total number of anons in system */
int nfreeanon; /* number of free anons */
/* stat counters */
int faults; /* page fault count */
int traps; /* trap count */
int intrs; /* interrupt count */
int swtch; /* context switch count */
int softs; /* software interrupt count */
int syscalls; /* system calls */
int pageins; /* pagein operation count */
/* pageouts are in pdpageouts below */
int swapins; /* swapins */
int swapouts; /* swapouts */
int pgswapin; /* pages swapped in */
int pgswapout; /* pages swapped out */
int forks; /* forks */
int forks_ppwait; /* forks where parent waits */
int forks_sharevm; /* forks where vmspace is shared */
/* fault subcounters */
int fltnoram; /* number of times fault was out of ram */
int fltnoanon; /* number of times fault was out of anons */
int fltpgwait; /* number of times fault had to wait on a page */
int fltpgrele; /* number of times fault found a released page */
int fltrelck; /* number of times fault relock called */
int fltrelckok; /* number of times fault relock is a success */
int fltanget; /* number of times fault gets anon page */
int fltanretry; /* number of times fault retries an anon get */
int fltamcopy; /* number of times fault clears "needs copy" */
int fltnamap; /* number of times fault maps a neighbor anon page */
int fltnomap; /* number of times fault maps a neighbor obj page */
int fltlget; /* number of times fault does a locked pgo_get */
int fltget; /* number of times fault does an unlocked get */
int flt_anon; /* number of times fault anon (case 1a) */
int flt_acow; /* number of times fault anon cow (case 1b) */
int flt_obj; /* number of times fault is on object page (2a) */
int flt_prcopy; /* number of times fault promotes with copy (2b) */
int flt_przero; /* number of times fault promotes with zerofill (2b) */
/* daemon counters */
int pdwoke; /* number of times daemon woke up */
int pdrevs; /* number of times daemon rev'd clock hand */
int pdswout; /* number of times daemon called for swapout */
int pdfreed; /* number of pages daemon freed since boot */
int pdscans; /* number of pages daemon scanned since boot */
int pdanscan; /* number of anonymous pages scanned by daemon */
int pdobscan; /* number of object pages scanned by daemon */
int pdreact; /* number of pages daemon reactivated since boot */
int pdbusy; /* number of times daemon found a busy page */
int pdpageouts; /* number of times daemon started a pageout */
int pdpending; /* number of times daemon got a pending pageout */
int pddeact; /* number of pages daemon deactivates */
The uvm_fork() function forks a virtual address space for processes (old)
p1 and (new) p2. If the shared argument is non-zero, p1 shares its address
space with p2, otherwise a new address space is created. The
stack, stacksize, func and arg arguments are passed to the machine-dependent
cpu_fork() function. The uvm_fork() function currently has no return
value, and thus cannot fail.
The uvm_grow() function increases the stack segment of process p to include
sp.
The uvm_coredump() function generates a coredump on vnode vp
for process
p with credentials cred and core header description in chdr.
STANDARD UVM RETURN VALUES
This section documents the standard return values that callers of UVM
functions can expect. They are derived from the Mach VM values of the
same function. The full list of values can be seen below.
#define KERN_SUCCESS 0
#define KERN_INVALID_ADDRESS 1
#define KERN_PROTECTION_FAILURE 2
#define KERN_NO_SPACE 3
#define KERN_INVALID_ARGUMENT 4
#define KERN_FAILURE 5
#define KERN_RESOURCE_SHORTAGE 6
#define KERN_NOT_RECEIVER 7
#define KERN_NO_ACCESS 8
#define KERN_PAGES_LOCKED 9
Note that KERN_NOT_RECEIVER and KERN_PAGES_LOCKED values are
not actually
returned by the UVM code.
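When debugging, it can help to turn these numeric values back into their
names. The lookup below is a hypothetical userland helper built only from
the list above; sketch_uvm_strerror() is not a UVM function.

```c
#include <stddef.h>

/* Names indexed by the return values listed above. */
static const char *const sketch_kern_names[] = {
	"KERN_SUCCESS",
	"KERN_INVALID_ADDRESS",
	"KERN_PROTECTION_FAILURE",
	"KERN_NO_SPACE",
	"KERN_INVALID_ARGUMENT",
	"KERN_FAILURE",
	"KERN_RESOURCE_SHORTAGE",
	"KERN_NOT_RECEIVER",
	"KERN_NO_ACCESS",
	"KERN_PAGES_LOCKED",
};

/* Map a standard UVM return value to its symbolic name. */
static const char *
sketch_uvm_strerror(int rv)
{
	size_t n = sizeof(sketch_kern_names) / sizeof(sketch_kern_names[0]);

	if (rv < 0 || (size_t)rv >= n)
		return "unknown UVM return value";
	return sketch_kern_names[rv];
}
```

Values outside the documented range map to a fixed fallback string rather
than indexing past the end of the table.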
NOTES
The structure and types whose names begin with ``vm_'' were named so UVM
could coexist with BSD VM during the early development stages. They will
be renamed to ``uvm_''.
SEE ALSO
getloadavg(3), kvm(3), sysctl(3), ddb(4), options(4), pmap(9)
HISTORY
UVM is a new VM system developed at Washington University in St. Louis
(Missouri). UVM's roots lie partly in the Mach-based 4.4BSD VM system,
the FreeBSD VM system, and the SunOS4 VM system. UVM's basic structure
is based on the 4.4BSD VM system. UVM's new anonymous memory system is
based on the anonymous memory system found in the SunOS4 VM (as described
in papers published by Sun Microsystems, Inc.). UVM also includes a number
of features new to BSD including page loanout, map entry passing,
simplified copy-on-write, and clustered anonymous memory pageout. UVM is
also further documented in an August 1998 dissertation by
Charles D. Cranor.
UVM appeared in OpenBSD 2.9.
AUTHORS
Charles D. Cranor designed and implemented UVM.
Matthew Green wrote the swap-space management code.
Chuck Silvers implemented the aobj pager, thus allowing
UVM to support System V shared memory and process swapping.
Artur Grabowski handled the logistical issues involved
with merging UVM into the OpenBSD source tree.
BUGS
The uvm_fork() function should be able to fail in low memory conditions.
OpenBSD 3.6 March 26, 2000