vnode, vcount, vref, VREF, vrele, vget, vput, vhold, VHOLD, holdrele,
HOLDRELE, getnewvnode, ungetnewvnode, vrecycle, vgone, vflush, vaccess,
checkalias, bdevvp, cdevvp, vfinddev, vdevgone, vwakeup, vflushbuf,
vinvalbuf, vtruncbuf, vprint - kernel representation of a file or directory
#include <sys/param.h>
#include <sys/vnode.h>
int
vcount(struct vnode *vp);
void
vref(struct vnode *vp);
void
VREF(struct vnode *vp);
void
vrele(struct vnode *vp);
int
vget(struct vnode *vp, int lockflag);
void
vput(struct vnode *vp);
void
vhold(struct vnode *vp);
void
VHOLD(struct vnode *vp);
void
holdrele(struct vnode *vp);
void
HOLDRELE(struct vnode *vp);
int
getnewvnode(enum vtagtype tag, struct mount *mp, int (**vops)(void *),
struct vnode **vpp);
void
ungetnewvnode(struct vnode *vp);
int
vrecycle(struct vnode *vp, struct simplelock *inter_lkp, struct proc *p);
void
vgone(struct vnode *vp);
int
vflush(struct mount *mp, struct vnode *skipvp, int flags);
int
vaccess(enum vtype type, mode_t file_mode, uid_t uid, gid_t gid,
mode_t acc_mode, struct ucred *cred);
struct vnode *
checkalias(struct vnode *vp, dev_t nvp_rdev, struct mount *mp);
int
bdevvp(dev_t dev, struct vnode **vpp);
int
cdevvp(dev_t dev, struct vnode **vpp);
int
vfinddev(dev_t dev, enum vtype, struct vnode **vpp);
void
vdevgone(int maj, int minl, int minh, enum vtype type);
void
vwakeup(struct buf *bp);
void
vflushbuf(struct vnode *vp, int sync);
int
vinvalbuf(struct vnode *vp, int flags, struct ucred *cred,
struct proc *p, int slpflag, int slptimeo);
int
vtruncbuf(struct vnode *vp, daddr_t lbn, int slpflag, int slptimeo);
void
vprint(char *label, struct vnode *vp);
The vnode is the focus of all file activity in NetBSD. There is a unique
vnode allocated for each active file, directory, mounted-on file, fifo,
domain socket, symbolic link and device. The kernel has no concept of a
file's structure and so it relies on the information stored in the vnode
to describe the file. Thus, the vnode associated with a file holds all
the administration information pertaining to it.
When a process requests an operation on a file, the vfs interface passes
control to a file system type dependent function to carry out the operation.
If the file system type dependent function finds that a vnode representing
the file is not in main memory, it dynamically allocates a new
vnode from the system main memory pool. Once allocated, the vnode is
attached to the data structure pointer associated with the cause of the
vnode allocation and it remains resident in the main memory until the
system decides that it is no longer needed and can be recycled.
The vnode has the following structure:
struct vnode {
struct uvm_object v_uobj; /* uvm object */
#define v_usecount v_uobj.uo_refs
#define v_interlock v_uobj.vmobjlock
voff_t v_size; /* size of file */
int v_flag; /* flags */
int v_numoutput; /* num pending writes */
long v_writecount; /* ref count of writers */
long v_holdcnt; /* page buffer refs */
daddr_t v_lastr; /* last read */
u_long v_id; /* capability id */
struct mount *v_mount; /* ptr to vfs we are in */
int (**v_op)(void *); /* vnode ops vector */
TAILQ_ENTRY(vnode) v_freelist; /* vnode freelist */
LIST_ENTRY(vnode) v_mntvnodes; /* vnodes for mount pt */
struct buflists v_cleanblkhd; /* clean blocklist head */
struct buflists v_dirtyblkhd; /* dirty blocklist head */
LIST_ENTRY(vnode) v_synclist; /* dirty vnodes */
union {
struct mount *vu_mountedhere;/* ptr to mounted vfs */
struct socket *vu_socket; /* unix ipc (VSOCK) */
struct specinfo *vu_specinfo; /* device (VCHR, VBLK) */
struct fifoinfo *vu_fifoinfo; /* fifo (VFIFO) */
} v_un;
#define v_mountedhere v_un.vu_mountedhere
#define v_socket v_un.vu_socket
#define v_specinfo v_un.vu_specinfo
#define v_fifoinfo v_un.vu_fifoinfo
struct nqlease *v_lease; /* Soft ref to lease */
enum vtype v_type; /* vnode type */
enum vtagtype v_tag; /* underlying data type */
struct lock v_lock; /* lock for this vnode */
struct lock *v_vnlock; /* ptr to vnode lock */
void *v_data; /* private data for fs */
};
Most members of the vnode structure should be treated as opaque and only
manipulated using the proper functions. There are some rather common
exceptions detailed throughout this page.
Files and file systems are inextricably linked with the virtual memory
system and v_uobj contains the data maintained by the virtual memory system.
For compatibility with code written before the integration of
uvm(9) into NetBSD, C-preprocessor directives are used to alias the members
of v_uobj.
Vnode flags are recorded by v_flag. Valid flags are:
VROOT This vnode is the root of its file system.
VTEXT This vnode is a pure text prototype.
VEXECMAP This vnode has executable mappings.
VSYSTEM This vnode is being used by the kernel; only used so that
vflush() can skip quota files.
VISTTY This vnode represents a tty; used when reading dead
vnodes.
VXLOCK This vnode is currently locked to change underlying
type.
VXWANT A process is waiting for this vnode.
VBWAIT Waiting for output associated with this vnode to complete.
VALIASED This vnode has an alias.
VDIROP This vnode is involved in a directory operation. This
flag is used exclusively by LFS.
VLAYER This vnode is on a layer filesystem.
VONWORKLST This vnode is on syncer work-list.
VDIRTY This vnode possibly has dirty pages.
The VXLOCK flag is used to prevent multiple processes from entering the
vnode reclamation code. It is also used as a flag to indicate that reclamation
is in progress. The VXWANT flag is set by threads that wish to be
awakened when reclamation is finished. Before v_flag can be modified, the
v_interlock simplelock must be acquired. See lock(9) for details on the
kernel locking API.
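The VXLOCK/VXWANT handshake described above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the code in vfs_subr.c: struct toy_vnode, the TOY_* bit values and the toy_reclaim_*() functions are hypothetical names invented for this example, and the v_interlock acquisition and the actual sleep and wakeup are left out.

```c
#include <assert.h>

/* Hypothetical bit values chosen for this sketch only. */
#define TOY_VXLOCK 0x1
#define TOY_VXWANT 0x2

/* Hypothetical stand-in for struct vnode; only the flag word is modelled. */
struct toy_vnode {
	int v_flag;
};

/*
 * Try to enter the reclamation code.  Returns 1 if the caller now owns
 * the reclaim, 0 if another thread is already reclaiming (in which case
 * TOY_VXWANT records that someone wants to be woken).
 */
static int
toy_reclaim_enter(struct toy_vnode *vp)
{
	if (vp->v_flag & TOY_VXLOCK) {
		vp->v_flag |= TOY_VXWANT;
		return 0;
	}
	vp->v_flag |= TOY_VXLOCK;
	return 1;
}

/*
 * Finish reclamation.  Returns 1 if a waiter asked to be woken,
 * and clears both flags.
 */
static int
toy_reclaim_exit(struct toy_vnode *vp)
{
	int wake;

	assert(vp->v_flag & TOY_VXLOCK);
	wake = (vp->v_flag & TOY_VXWANT) != 0;
	vp->v_flag &= ~(TOY_VXLOCK | TOY_VXWANT);
	return wake;
}
```

In this model a second caller of toy_reclaim_enter() is turned away while the first is still reclaiming, and toy_reclaim_exit() reports whether anyone was waiting.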
Each vnode has three reference counts: v_usecount, v_writecount and
v_holdcnt. The first is the number of active references within the kernel
to the vnode. This count is maintained by vref(), vrele(), and
vput(). The second is the number of active references within the kernel
to the vnode performing write access to the file. It is maintained by
the open(2) and close(2) system calls. The third is the number of references
within the kernel requiring the vnode to remain active and not be
recycled. This count is maintained by vhold() and holdrele(). When both
the v_usecount and v_holdcnt reach zero, the vnode is recycled to the
freelist and may be reused for another file. The transition to and from
the freelist is handled by getnewvnode(), ungetnewvnode() and vrecycle().
Access to v_usecount, v_writecount and v_holdcnt is also protected by the
v_interlock simplelock.
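How the two counts decide where a vnode lives can be sketched in user space. This is an illustrative model only: struct toy_vnode, enum toy_list and the toy_*() functions are hypothetical names, the interlock is omitted, and the holdlist placement follows the description of vrele() later in this page.

```c
#include <assert.h>

/* Hypothetical user-space model; not the kernel's struct vnode. */
enum toy_list { TOY_ACTIVE, TOY_HOLDLIST, TOY_FREELIST };

struct toy_vnode {
	int  usecount;	/* models v_usecount */
	long holdcnt;	/* models v_holdcnt */
};

/* Which list a vnode with these counts belongs on. */
static enum toy_list
toy_placement(const struct toy_vnode *vp)
{
	if (vp->usecount > 0)
		return TOY_ACTIVE;
	return (vp->holdcnt > 0) ? TOY_HOLDLIST : TOY_FREELIST;
}

/* Take an active reference. */
static void
toy_vref(struct toy_vnode *vp)
{
	vp->usecount++;
}

/* Drop an active reference and report the resulting placement. */
static enum toy_list
toy_vrele(struct toy_vnode *vp)
{
	assert(vp->usecount > 0);
	vp->usecount--;
	return toy_placement(vp);
}

/* Take a hold reference. */
static void
toy_vhold(struct toy_vnode *vp)
{
	vp->holdcnt++;
}

/* Drop a hold reference and report the resulting placement. */
static enum toy_list
toy_holdrele(struct toy_vnode *vp)
{
	assert(vp->holdcnt > 0);
	vp->holdcnt--;
	return toy_placement(vp);
}
```

A vnode that has been both referenced and held moves to the holdlist when its last active reference is dropped, and reaches the freelist only once the hold is released as well.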
The number of pending synchronous and asynchronous writes on the vnode
is recorded in v_numoutput. It is used by fsync(2) to wait for all
writes to complete before returning to the user. Its value must only be
modified at splbio. See spl(9). It does not track the number of dirty
buffers attached to the vnode.
Every time a vnode is reassigned to a new file, the vnode capability
identifier v_id is changed. It is used to maintain the name lookup cache
consistency by providing a unique <vnode *,v_id> tuple without requiring
the cache to hold a reference. The name lookup cache can later compare
the vnode's capability identifier to its copy and see if the vnode still
points to the same file. See namecache(9) for details on the name lookup
cache.
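The <vnode *,v_id> comparison can be sketched as follows. This is a simplified user-space illustration, not the namecache(9) implementation: struct toy_cache_entry and the toy_*() functions are hypothetical, and reassignment is modelled simply by incrementing the identifier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vnode; only the id is modelled. */
struct toy_vnode {
	unsigned long v_id;	/* models the capability identifier */
};

/* Hypothetical cache entry: the pointer plus the id seen at entry time. */
struct toy_cache_entry {
	struct toy_vnode *vp;
	unsigned long	  vpid;
};

/* Remember the vnode without taking a reference on it. */
static void
toy_cache_enter(struct toy_cache_entry *ce, struct toy_vnode *vp)
{
	assert(vp != NULL);
	ce->vp = vp;
	ce->vpid = vp->v_id;
}

/* Return the vnode if it still names the same file, otherwise NULL. */
static struct toy_vnode *
toy_cache_lookup(const struct toy_cache_entry *ce)
{
	if (ce->vp != NULL && ce->vp->v_id == ce->vpid)
		return ce->vp;
	return NULL;
}

/* Model the vnode being reassigned to a new file. */
static void
toy_reassign(struct toy_vnode *vp)
{
	vp->v_id++;
}
```

After toy_reassign() the stale entry no longer matches, so the lookup fails even though the pointer value is unchanged.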
The link to the file system which owns the vnode is recorded by v_mount.
See vfsops(9) for further information on file system mount status.
The v_op pointer points to its vnode operations vector. This vector
describes what operations can be done to the file associated with the
vnode. The system maintains one vnode operations vector for each file
system type configured into the kernel. The vnode operations vector contains
a pointer to a function for each operation supported by the file
system. See vnodeops(9) for a description of vnode operations.
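Dispatch through an operations vector can be sketched in user space. The v_op member below has the same shape as in the structure shown earlier, but everything else is an assumption made for the example: the TOY_OP_* indexes, the toyfs_*() functions and toy_vcall() are hypothetical, and the vnode itself is passed as the argument where the kernel passes a per-operation argument structure (see vnodeops(9)).

```c
#include <assert.h>

/* Hypothetical operation indexes; the kernel's real offsets differ. */
enum { TOY_OP_OPEN, TOY_OP_CLOSE, TOY_OP_COUNT };

struct toy_vnode {
	int (**v_op)(void *);	/* same shape as the v_op member */
	int   opens;		/* toy per-file state */
};

/* File system specific "open": count the open. */
static int
toyfs_open(void *v)
{
	struct toy_vnode *vp = v;

	vp->opens++;
	return 0;
}

/* File system specific "close": undo one open. */
static int
toyfs_close(void *v)
{
	struct toy_vnode *vp = v;

	assert(vp->opens > 0);
	vp->opens--;
	return 0;
}

/* One vector per file system type, shared by all of its vnodes. */
static int (*toyfs_vnodeop_p[TOY_OP_COUNT])(void *) = {
	[TOY_OP_OPEN]  = toyfs_open,
	[TOY_OP_CLOSE] = toyfs_close,
};

/* Generic code dispatches through the vector without knowing the fs. */
static int
toy_vcall(struct toy_vnode *vp, int op)
{
	assert(op >= 0 && op < TOY_OP_COUNT);
	return (*vp->v_op[op])(vp);
}
```

A vnode is bound to a file system by pointing its v_op at that file system's vector; toy_vcall(vp, TOY_OP_OPEN) then reaches toyfs_open() without the caller naming it.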
When not in use, vnodes are kept on the freelist through v_freelist. The
vnodes still reference valid files but may be reused to refer to a new
file at any time. Often, these vnodes are also held in caches in the
system, such as the name lookup cache. When a valid vnode which is on
the freelist is used again, the user must call vget() to increment the
reference count and retrieve it from the freelist. When a user wants a
new vnode for another file, getnewvnode() is invoked to remove a vnode
from the freelist and initialise it for the new file.
The type of object the vnode represents is recorded by v_type. It is
used by generic code to perform checks to ensure operations are performed
on valid file system objects. Valid types are:
VNON The vnode has no type.
VREG The vnode represents a regular file.
VDIR The vnode represents a directory.
VBLK The vnode represents a block special device.
VCHR The vnode represents a character special device.
VLNK The vnode represents a symbolic link.
VSOCK The vnode represents a socket.
VFIFO The vnode represents a pipe.
VBAD The vnode represents a bad file (not currently used).
Vnode tag types are used by external programs only (e.g. pstat(8)), and
should never be inspected by the kernel. The use of v_tag is deprecated
since new v_tag values cannot be defined for loadable file systems. The
v_tag member is read-only. Valid tag types are:
VT_NON non file system
VT_UFS universal file system
VT_NFS network file system
VT_MFS memory file system
VT_MSDOSFS FAT file system
VT_LFS log-structured file system
VT_LOFS loopback file system
VT_FDESC file descriptor file system
VT_PORTAL portal daemon
VT_NULL null file system layer
VT_UMAP sample file system layer
VT_KERNFS kernel interface file system
VT_PROCFS process interface file system
VT_AFS AFS file system
VT_ISOFS ISO file system(s)
VT_UNION union file system
VT_ADOSFS Amiga file system
VT_EXT2FS Linux's EXT2 file system
VT_CODA Coda file system
VT_FILECORE filecore file system
VT_NTFS Microsoft NT's file system
VT_VFS virtual file system
VT_OVERLAY overlay file system
All vnode locking operations use v_vnlock. This lock is acquired by
calling vn_lock(9) and released by calling vn_unlock(9). The vnode locking
operation is complicated because it is used for many purposes. Sometimes
it is used to bundle a series of vnode operations (see vnodeops(9))
into an atomic group. Many file systems rely on it to prevent race conditions
in updating file system type specific data structures rather than
using their own private locks. The vnode lock operates as a
multiple-reader (shared-access) or single-writer (exclusive-access)
lock. The lock may be held while sleeping. While the v_vnlock is
acquired, the holder is guaranteed that the vnode will not be reclaimed
or invalidated. Most file system functions require that you hold the
vnode lock on entry. See lock(9) for details on the kernel locking API.
For leaf file systems (such as ffs, lfs, msdosfs, etc), v_vnlock will
point to v_lock. For stacked filesystems, v_vnlock will generally point
to v_lock of the lowest file system. Additionally, the implementation
of the vnode lock is the responsibility of the individual file systems
and v_vnlock may also be NULL indicating that a leaf node does not export
a lock for vnode locking. In this case, stacked file systems (such as
nullfs) must call the underlying file system directly for locking.
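The relationship between a vnode's own lock and its v_vnlock pointer can be sketched in user space. This is a toy model under stated assumptions: struct toy_lock is a plain flag rather than a lock(9) lock, and toy_leaf_init(), toy_layer_init(), toy_lock() and toy_unlock() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock: a single "held" flag instead of a real lock(9). */
struct toy_lock {
	int held;
};

struct toy_vnode {
	struct toy_lock	 v_lock;	/* the vnode's own lock */
	struct toy_lock	*v_vnlock;	/* the lock actually used */
};

/* A leaf file system vnode locks itself. */
static void
toy_leaf_init(struct toy_vnode *vp)
{
	vp->v_lock.held = 0;
	vp->v_vnlock = &vp->v_lock;
}

/* A stacked vnode shares the lock exported by the vnode below it. */
static void
toy_layer_init(struct toy_vnode *upper, struct toy_vnode *lower)
{
	upper->v_lock.held = 0;
	upper->v_vnlock = lower->v_vnlock;
}

/* Take the (exclusive) lock through v_vnlock. */
static void
toy_lock(struct toy_vnode *vp)
{
	assert(vp->v_vnlock != NULL);
	assert(!vp->v_vnlock->held);
	vp->v_vnlock->held = 1;
}

/* Release the lock through v_vnlock. */
static void
toy_unlock(struct toy_vnode *vp)
{
	assert(vp->v_vnlock != NULL && vp->v_vnlock->held);
	vp->v_vnlock->held = 0;
}
```

Because both layers end up pointing at the same lock, locking the upper vnode in this model also locks the leaf vnode beneath it. The NULL case, where the lower file system exports no lock, is not modelled.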
Each file system underlying a vnode allocates its own private area and
hangs it from v_data. If non-null, this area is freed by getnewvnode().
Most functions discussed in this page that operate on vnodes cannot be
called from interrupt context. The members v_numoutput, v_holdcnt,
v_dirtyblkhd, v_cleanblkhd, v_freelist, and v_synclist are modified in
interrupt context and must be protected by splbio(9) unless it is certain
that there is no chance an interrupt handler will modify them. The vnode
lock must not be acquired within interrupt context.
vcount(vp)
Calculate the total number of references to a special
device with vnode vp.
vref(vp)
Increment v_usecount of the vnode vp. Any kernel thread or subsystem
which uses a vnode (e.g. during the operation of some algorithm
or to store in a data structure) should call vref().
VREF(vp)
This function is an alias for vref().
vrele(vp)
Decrement v_usecount of unlocked vnode vp. Any code in the system
which is using a vnode should call vrele() when it is finished
with the vnode. If v_usecount of the vnode reaches zero
and v_holdcnt is greater than zero, the vnode is placed on the
holdlist. If both v_usecount and v_holdcnt are zero, the vnode
is placed on the freelist.
vget(vp, lockflag)
Reclaim vnode vp from the freelist, increment its reference
count and lock it. The argument lockflag specifies the
lockmgr(9) flags used to lock the vnode. If VXLOCK is set
in vp's v_flag, vnode vp is being recycled in vgone() and the
calling thread sleeps until the transition is complete. When it
is awakened, an error is returned to indicate that the vnode is
no longer usable (possibly having been recycled to a new file
system type).
vput(vp)
Unlock vnode vp and decrement its v_usecount. Depending on the
reference counts, move the vnode to the holdlist or the freelist.
This operation is functionally equivalent to calling
VOP_UNLOCK(9) followed by vrele().
vhold(vp)
Mark the vnode vp as active by incrementing vp->v_holdcnt and
moving the vnode from the freelist to the holdlist. Once on the
holdlist, the vnode will not be recycled until it is released
with holdrele().
VHOLD(vp)
This function is an alias for vhold().
holdrele(vp)
Mark the vnode vp as inactive by decrementing vp->v_holdcnt and
moving the vnode from the holdlist to the freelist.
HOLDRELE(vp)
This function is an alias for holdrele().
getnewvnode(tag, mp, vops, vpp)
Retrieve the next vnode from the freelist. getnewvnode() must
choose whether to allocate a new vnode or recycle an existing
one. The criterion for allocating a new one is that the total
number of vnodes is less than the number desired or there are no
vnodes on either free list. Generally only vnodes that have no
buffers associated with them are recycled and the next vnode
from the freelist is retrieved. If the freelist is empty,
vnodes on the holdlist are considered. The new vnode is
returned in the address specified by vpp.
The argument mp is the mount point for the file system requesting
the new vnode. Before retrieving the new vnode, the file system
is checked to see whether it is busy (such as currently
unmounting). An error is returned if the file system is unmounted.
The argument tag is the vnode tag assigned to (*vpp)->v_tag. The
argument vops is the vnode operations vector of the file system
requesting the new vnode. If a vnode is successfully retrieved,
zero is returned; otherwise an appropriate error code is
returned.
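The allocate-or-recycle criterion stated for getnewvnode() can be written down as a small decision function. This is a user-space sketch of that criterion only: toy_choose(), enum toy_source and the four counters passed in are hypothetical, and the preference for vnodes without buffers is not modelled.

```c
#include <assert.h>

/* Where the next vnode would come from in this toy model. */
enum toy_source { TOY_ALLOCATE, TOY_FROM_FREELIST, TOY_FROM_HOLDLIST };

/*
 * numvnodes:     vnodes currently in existence
 * desiredvnodes: the number of vnodes the system would like to have
 * nfree, nhold:  vnodes currently on the freelist and the holdlist
 */
static enum toy_source
toy_choose(int numvnodes, int desiredvnodes, int nfree, int nhold)
{
	assert(numvnodes >= 0 && nfree >= 0 && nhold >= 0);

	/* Allocate while under the target, or when nothing can be recycled. */
	if (numvnodes < desiredvnodes || (nfree == 0 && nhold == 0))
		return TOY_ALLOCATE;

	/* Otherwise recycle; the holdlist is only used if the freelist is empty. */
	return (nfree > 0) ? TOY_FROM_FREELIST : TOY_FROM_HOLDLIST;
}
```

For example, with the target reached and an empty freelist, toy_choose(100, 100, 0, 2) selects the holdlist.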
ungetnewvnode(vp)
Undo the operation of getnewvnode(). The argument vp is the
vnode to return to the freelist. This function is needed for
VFS_VGET(9) which may need to push back a vnode in case of a
locking race condition.
vrecycle(vp, inter_lkp, p)
Recycle the unused vnode vp to the front of the freelist.
vrecycle() is a null operation if the reference count is greater
than zero.
vgone(vp)
Eliminate all activity associated with the vnode vp in preparation
for recycling.
vflush(mp, skipvp, flags)
Remove any vnodes in the vnode table belonging to mount point
mp. If skipvp is not NULL it is exempt from being flushed. The
argument flags is a set of flags modifying the operation of
vflush(). If MNT_NOFORCE is specified, there should not be any
active vnodes and an error is returned if any are found (this is
a user error, not a system error). If MNT_FORCE is specified,
active vnodes that are found are detached.
vaccess(type, file_mode, uid, gid, acc_mode, cred)
Do access checking. The arguments file_mode, uid, and gid are
from the vnode to check. The arguments acc_mode and cred are
passed directly to VOP_ACCESS(9).
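A conventional owner/group/other permission check of the kind vaccess() performs can be sketched in user space. This is not the kernel's implementation: struct toy_cred carries a single group instead of a full struct ucred, the TOY_V* mode bits are hypothetical values, the type argument is omitted, and the superuser is simply granted everything.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical access-mode bits for this sketch only. */
#define TOY_VREAD  04
#define TOY_VWRITE 02
#define TOY_VEXEC  01

/* Hypothetical credential: one user id and one group id. */
struct toy_cred {
	uid_t uid;
	gid_t gid;
};

/* Return 0 if every requested bit is permitted, otherwise EACCES. */
static int
toy_access(mode_t file_mode, uid_t uid, gid_t gid, int acc_mode,
    const struct toy_cred *cred)
{
	int shift;

	assert((acc_mode & ~(TOY_VREAD | TOY_VWRITE | TOY_VEXEC)) == 0);

	if (cred->uid == 0)
		return 0;	/* simplification: superuser always passes */

	if (cred->uid == uid)
		shift = 6;	/* owner permission bits */
	else if (cred->gid == gid)
		shift = 3;	/* group permission bits */
	else
		shift = 0;	/* other permission bits */

	return (((int)(file_mode >> shift) & acc_mode) == acc_mode) ?
	    0 : EACCES;
}
```

With a file of mode 0640, the owner may read and write but not execute, a group member may only read, and everyone else is refused.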
checkalias(vp, nvp_rdev, mp)
Check to see if the new vnode vp represents a special device for
which another vnode represents the same device. If such an
alias exists, the existing contents and the aliased vnode are
deallocated. The caller is responsible for filling the new
vnode with its new contents.
bdevvp(dev, vpp)
Create a vnode for a block device. bdevvp() is used for root
file systems, swap areas and for memory file system special
devices.
cdevvp(dev, vpp)
Create a vnode for a character device. cdevvp() is used for the
console and kernfs special devices.
vfinddev(dev, vtype, vpp)
Lookup a vnode by device number. The vnode is returned in the
address specified by vpp.
vdevgone(maj, minl, minh, type)
Reclaim all vnodes that correspond to the specified minor number
range minl to minh (endpoints inclusive) of the specified major
maj.
vwakeup(bp)
Update the outstanding I/O count vp->v_numoutput for the vnode
vp = bp->b_vp and do a wakeup if requested, that is, if
vp->v_flag has VBWAIT set.
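The bookkeeping described for vwakeup() can be modelled in user space. This sketch is an assumption about the intent of the text rather than a copy of the kernel code: struct toy_vnode, TOY_VBWAIT and toy_vwakeup() are hypothetical, the function takes the vnode directly instead of a struct buf, and it returns a flag where the kernel would wake sleeping threads.

```c
#include <assert.h>

/* Hypothetical bit value for this sketch only. */
#define TOY_VBWAIT 0x1

struct toy_vnode {
	int v_flag;		/* models v_flag */
	int v_numoutput;	/* models the pending write count */
};

/*
 * Account for one completed write.  Returns 1 if waiters should be
 * woken, which in this model means the last write has finished and
 * someone set TOY_VBWAIT beforehand.
 */
static int
toy_vwakeup(struct toy_vnode *vp)
{
	assert(vp->v_numoutput > 0);

	if (--vp->v_numoutput == 0 && (vp->v_flag & TOY_VBWAIT)) {
		vp->v_flag &= ~TOY_VBWAIT;
		return 1;
	}
	return 0;
}
```

With two writes outstanding and TOY_VBWAIT set, only the second completion reports that a wakeup is due.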
vflushbuf(vp, sync)
Flush all dirty buffers to disk for the file with the locked
vnode vp. The argument sync specifies whether the I/O should be
synchronous and vflushbuf() will sleep until vp->v_numoutput is
zero and vp->v_dirtyblkhd is empty.
vinvalbuf(vp, flags, cred, p, slpflag, slptimeo)
Flush out and invalidate all buffers associated with locked
vnode vp. The arguments p and cred specify the calling process
and its credentials. The arguments flags, slpflag and slptimeo
are ignored in the present implementation. If the operation is
successful, zero is returned; otherwise an appropriate error
code is returned.
vtruncbuf(vp, lbn, slpflag, slptimeo)
Destroy any in-core buffers past the file truncation length for
the locked vnode vp. The truncation length is specified by lbn.
vtruncbuf() will sleep while the I/O is performed. The sleep(9)
flag and timeout are specified by the arguments slpflag and
slptimeo respectively. If the operation is successful, zero is
returned; otherwise an appropriate error code is returned.
vprint(label, vp)
This function is used by the kernel to dump vnode information
during a panic. It is only used if kernel option DIAGNOSTIC is
compiled into the kernel. The argument label is a string to
prefix the information dump of vnode vp.
This section describes places within the NetBSD source tree where actual
code implementing or utilising the vnode framework can be found. All
pathnames are relative to /usr/src.
The vnode framework is implemented within the file sys/kern/vfs_subr.c.
intro(9), lock(9), namecache(9), namei(9), uvm(9), vattr(9), vfs(9),
vfsops(9), vnodeops(9), vnsubr(9)
The locking protocol is inconsistent. Many vnode operations are passed
locked vnodes on entry but release the lock before they exit. The locking
protocol is used in some places to attempt to make a series of operations
atomic (e.g. access check then operation). This does not work for
non-local file systems that do not support locking (e.g. NFS). The vnode
interface would benefit from a simpler locking protocol.
BSD September 22, 2001 BSD