Command: volrec | Section: 4 | Source: Digital UNIX | File: volrec.4.gz
volrec(4) Kernel Interfaces Manual volrec(4)
NAME
volrec - Structure defining a volume record
SYNOPSIS
#include <sys/types.h>
#include <sys/vol.h>
#define NAME_LEN    14
#define COMMENT_LEN 40
#define UTIL_NUM    3
#define UTIL_LEN    14
#define NAME_SZ     (NAME_LEN + 1)
#define COMMENT_SZ  (COMMENT_LEN + 1)
#define UTIL_SZ     (UTIL_LEN + 1)
struct volseqno {
        ulong_t seqno_lo, seqno_hi;
};
typedef struct volseqno volseqno_t;
typedef struct volseqno volrid_t;
struct volrec {
        struct v_tmp  v_tmp;    /* non-persistent fields */
        struct v_perm v_perm;   /* persistent fields */
};
Fields for the v_perm structure:
char           v_name[NAME_SZ];             /* record name */
char           v_use_type[NAME_SZ];         /* volume usage type name */
char           v_fstype[FSTYPE_SZ];         /* guess of volume's fstype */
char           v_comment[COMMENT_SZ];       /* comment field */
char           v_putil[UTIL_NUM][UTIL_SZ];  /* persistent util fields */
char           v_state[STATE_SZ];           /* utility state of volume */
char           v_pref_name[NAME_SZ];        /* plex name if V_PREFER */
char           v_start_opts[V_STOPTS_SZ];   /* volume start options */
enum vol_r_pol v_read_pol;                  /* method of plex selection */
minor_t        v_minor;                     /* minor number in disk group */
uid_t          v_uid;                       /* owner of /dev/vol/name */
gid_t          v_gid;                       /* group of /dev/vol/name */
mode_t         v_mode;                      /* mode of /dev/vol/name */
ulong_t        v_pflag;                     /* persistent volume flags */
long           v_pl_num;                    /* associated plex count */
volseqno_t     v_update_tid;                /* trans id of last update */
voff_t         v_len;                       /* byte length of volume */
voff_t         v_log_len;                   /* length of log area */
volrid_t       v_rid;                       /* unique identifier */
volrid_t       v_pref_plex_rid;             /* preferred plex record ID */
volseqno_t     v_detach_tid;                /* trans id of kernel detach */
Fields for the v_tmp structure:
char            v_tutil[UTIL_NUM][UTIL_SZ]; /* non-persistent util fields */
long            v_rec_lock;                 /* 1 if record is locked */
long            v_data_lock;                /* 1 if volume is data locked */
enum vol_kstate v_kstate;                   /* relation to file space */
enum vol_except v_r_all;                    /* if all plex reads fail */
enum vol_except v_r_some;                   /* if some plex reads fail */
enum vol_except v_w_all;                    /* if all plex writes fail */
enum vol_except v_w_some;                   /* if some plex writes fail */
long            v_lasterr;                  /* last volume error or 0 */
ulong_t         v_tflag;                    /* non-persistent volume flags */
long            v_log_serial_lo;            /* log serial number/low part */
long            v_log_serial_hi;            /* log serial number/hi part */
dev_t           v_bdev;                     /* block dev for volume */
dev_t           v_cdev;                     /* char dev for volume */
size_t          v_iosize;                   /* minimum size for raw I/Os */
voff_t          v_rwback_offset;            /* read/write-back offset */
DESCRIPTION
The volrec structure is used internally by LSM. This structure is used
to communicate volume record information between the volume configura-
tion daemon, vold, and programs using the Logical Storage Manager li-
brary to query for configurations and to make configuration changes.
The two structures contained in the volrec structure separate the
elements of the volume record that are persistent from those that are
non-persistent. The division of fields between the v_tmp and v_perm
structures is somewhat historical; however, the v_perm structure
contains information that is stored persistently (for example, fields
that are recovered unchanged after a system reboot), or that is
directly derivable from persistent volume record information. The
v_tmp structure, on the other hand, contains fields that can be
modified without the changes being stored persistently.
The uses of the various volume fields are defined as follows:
v_name
       The volume name.
v_rid
       A 64-bit record ID assigned to the volume record. The ID is
       unique within the disk group for as long as the disk group
       exists.
v_use_type
       The usage type associated with the volume. This is used to
       select a utility set that maintains state and plex consistency
       in a manner appropriate to the usage of the volume.
v_fstype
       The file system type of any file system residing on the volume.
       A usage type may choose to use or ignore this field.
v_comment
       A null-terminated comment string associated with the record.
       The contents are arbitrary except that they cannot contain a
       newline.
v_putil
       An array of three null-terminated strings that can be used as
       scratch pads by utilities. These fields are preserved across
       reboots. By convention, the first field is reserved for usage
       types; the second field for higher-level applications, such as
       the Visual Administrator; and the third field for local site
       administrators.
v_state
       A null-terminated state field that is reserved specifically for
       use by usage types.
v_pref_name
       The name of the preferred plex for use when the v_read_pol field
       is set to V_PREFER. This field is derived from the
       v_pref_plex_rid field.
v_start_opts
       An arbitrary string that is reserved for usage-type utilities.
       The intention is that this field be used to store options that
       apply to the volume, such as for the volume start operation.
       This is normally a comma-separated list of flag names and
       option=value pairs. See the gen and fsgen versions of volume(8)
       for information on how this field is used by the gen and fsgen
       utilities.
v_read_pol
       The policy for selecting plexes to satisfy volume read
       operations. This can have one of the following values:
       o  Candidate plexes are selected in sequence for each sequential
          volume read operation. This is known as a round-robin
          approach.
       o  The plex named by the v_pref_name field is used if it can
          satisfy the read request. If the preferred plex cannot
          satisfy the read request, then this policy becomes equivalent
          to the round-robin policy.
       o  A default policy is selected based on the current
          configuration of the volume. If the volume has two or more
          active plexes, and exactly one of those plexes is striped,
          then the striped plex is preferred; otherwise, the
          round-robin read policy is used.
v_minor
       The minor number of the block and character volume devices
       associated with the volume record. The volume minor number is
       assigned when the volume is created. This is a read-only field.
       Conditions may force the actual volume device minor number to
       differ from the v_minor field. This can happen in disk groups
       other than rootdg, if a conflict occurs. This can also happen in
       the rootdg disk group if the V_PFLAG_FORCEMINOR flag is used to
       force a particular value for v_minor, even if the indicated
       number is unavailable.
v_uid, v_gid, v_mode
       The user ID, group ID, and permission modes for the volume's
       block and character device nodes, and for the device nodes for
       the associated plexes.
v_pflag
       Flags associated with the volume that are preserved across
       reboots. The set of persistent flags that can be set is:
       o  The write-back-on-read-failure flag. If set, then an attempt
          is made to fix a read error from a participating plex (that
          is, one without the noerror flag). The method used to fix the
          read error is to read from another plex associated with the
          volume and write back to the plex with the read error. The
          read operation is then retried to verify that the error is
          fixed. This requires at least two associated, enabled,
          participating, read-mode plexes.
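
The write-back sequence described above (read from another plex, write back to the failing plex, then re-read to verify) can be illustrated with a small user-space simulation. This is only a sketch: struct demo_plex, BLK, and the demo_* functions are hypothetical names invented for the example and are not part of the LSM kernel interfaces, and the simulated write clears the error to mimic a driver that revectors blocks on write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 8   /* illustrative block size, not an LSM constant */

/* Hypothetical in-memory stand-in for a plex; not an LSM structure. */
struct demo_plex {
    int  enabled;   /* plex is associated and enabled */
    int  noerror;   /* plex does not participate in error handling */
    int  bad;       /* simulated media error: reads fail until rewritten */
    char data[BLK];
};

/* Simulated read: fails while the block is marked bad. */
static int demo_read(struct demo_plex *p, char *buf)
{
    if (!p->enabled || p->bad)
        return -1;
    memcpy(buf, p->data, BLK);
    return 0;
}

/* Simulated write: success clears the error, as if the block were revectored. */
static int demo_write(struct demo_plex *p, const char *buf)
{
    if (!p->enabled)
        return -1;
    memcpy(p->data, buf, BLK);
    p->bad = 0;
    return 0;
}

/*
 * Try to repair a failed read on plex[failed]: read the block from another
 * participating plex, write it back, then re-read to verify.
 * Returns 0 if the block was repaired, -1 otherwise.
 */
static int demo_writeback_fix(struct demo_plex *plex, size_t n, size_t failed)
{
    char buf[BLK], check[BLK];
    size_t i;

    assert(failed < n);
    if (plex[failed].noerror)
        return -1;                      /* non-participating plex */
    for (i = 0; i < n; i++) {
        if (i == failed || plex[i].noerror)
            continue;
        if (demo_read(&plex[i], buf) != 0)
            continue;                   /* this source cannot supply data */
        if (demo_write(&plex[failed], buf) != 0)
            return -1;
        if (demo_read(&plex[failed], check) != 0)
            return -1;                  /* retry still fails */
        return memcmp(buf, check, BLK) == 0 ? 0 : -1;
    }
    return -1;                          /* no second plex was available */
}
```

With a single plex the function always fails, which mirrors the requirement for at least two participating plexes.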

          This is an effective way of handling device drivers that can
          revector blocks on write failures, and can be used to handle
          the majority of media failures on many disk drives. For this
          operation to be effective, the underlying device driver must
          not revector blocks on read errors.
       o  If set (volmake and volassist set this by default), then some
          writes to mirrored volumes that use block change logging will
          be copied into an allocated kernel buffer before being
          written to disk. The reason for doing such a copy is that
          write requests given to the volume device driver can point to
          pages of memory that are still undergoing change. Without
          doing a copy, the blocks written to each plex might be
          different. If you are sure that your application does not
          modify pages while they are written, or if you are certain
          that mirrors with differing contents do not represent a
          problem, then you can turn off this flag.
       o  This flag is set on a reboot if the volume was open at the
          time of a system crash, and the volume had been written at
          least once. This implies that the volume, if it is mirrored,
          requires recovery to ensure consistency between plexes.
       o  If this flag is set, the value of v_minor specified on
          creation of the volume record is forced. If this flag is not
          set, v_minor might be remapped to an unused value. This flag
          is required to set minor numbers less than 5. This does not
          guarantee that the actual volume device node will have the
          indicated minor number; however, if the volume is in rootdg,
          then the volume will be given that minor number (if no other
          volume in the disk group has that minor number) after a
          reboot.
       o  This is a bit-mask that specifies bits in the v_pflag field
          that indicate the logging type for the volume. The bits
          masked out by this macro can have one of the following
          values:
          o  The logging type is undefined. Volumes that were created
             in Release 1.0 of the Logical Storage Manager have this
             type. This value is effectively identical to V_PFLAG_NONE
             except that utilities are able to use the V_PFLAG_LOGUNDEF
             flag as a license to default the logging type to something
             else.
          o  No logging is performed for the volume. Even if a logging
             subdisk is defined for a plex, the logging subdisk is not
             used.
          o  A block change log is written periodically to each log
             subdisk associated with an associated, enabled, write-only
             plex. This log lists all blocks which have been received
             but which have not yet been written to disk. These logs
             are exactly one sector in length. All writes to the volume
             are first written into the log, and are not removed from
             the log until the write to disk has been confirmed, or has
             failed. If the log fills up, then some writes are delayed
             until entries in the log are freed.
v_pl_num
       The number of plexes associated with the volume.
v_update_tid
       The transaction ID of the last update to this record. This field
       is assigned when changes to a disk group are committed.
v_len
       The length of the volume. This can be set arbitrarily, even if
       it is longer or shorter than some or all of the associated
       plexes. This value is in sectors.
v_log_len
       The length for a volume log. For the block-change-logging log
       type, this value must always be 1. However, future logging types
       may support larger log lengths. The length for all subdisk logs
       associated with the volume must be at least this long. This
       value is in sectors.
v_pref_plex_rid
       The record ID of the preferred plex for the volume. This field
       is used only if v_read_pol is set to V_PREFER.
v_tutil
       An array of three null-terminated strings that can be used as
       scratch pads by utilities. These fields are cleared on reboot.
       By convention, the first field is reserved for usage types; the
       second field for higher-level applications, such as the Visual
       Administrator; and the third field for local site
       administrators.
v_rec_lock
       A boolean value that is 1 if the volume record is locked in the
       caller's current transaction, and 0 otherwise. This is a
       read-only field.
v_data_lock
       A boolean value that is 1 if the volume is data-locked in the
       caller's current transaction, and 0 otherwise. This is a
       read-only field.
v_kstate
       The accessibility of the volume. This field can have one of the
       following values:
       o  The volume block device can be used, and reads and writes to
          the block or character volume device are accepted.
       o  The volume block device cannot be used, and reads or writes
          to the character device are rejected. Volume ioctls are still
          usable, and the plex devices for associated plexes can be
          used, within the bounds of the plex pl_kstate fields.
       o  The volume cannot be used for any operations, and neither can
          the plex devices for any of the associated plexes.

       This field is set to V_DISABLED after a reboot.
v_r_all, v_r_some, v_w_all, v_w_some
       Exception policies for the volume. The exception conditions are
       classified by the following types:
       v_r_all    Read failure on all plexes
       v_r_some   Read failure on some plexes
       v_w_all    Write failure on all plexes
       v_w_some   Write failure on some plexes
       If one of these exception conditions is encountered, then the
       corresponding action is taken. The possible actions are:
       o  Takes no action. However, if the operation fails for all
          candidate plexes, then the operation still fails.
       o  Fails the operation, but takes no further action.
       o  Detaches the plex with the failure. The operation fails only
          if the operation fails for all candidate plexes.
       o  Detaches the plex with the failure and returns a failure for
          the operation, even if the operation can be satisfied by
          another plex.
       o  Detaches the volume but does not fail the operation.
       o  Detaches the volume and fails the operation.
       o  A higher-level error policy which detaches failing plexes.
          However, if detaching a complete plex would result in no
          complete plexes remaining, then V_GEN_DET detaches the volume
          rather than detaching the failing plexes. A complete plex is
          one that has the PL_TFLAG_COMPLETE flag set in the plex
          pl_tflag field.
       o  A higher-level error policy which detaches failing plexes.
          However, if detaching a plex results in no complete plexes
          remaining, then V_GEN_DET_SPARSE leaves exactly one complete
          plex enabled, and detaches all incomplete plexes that have
          volume blocks mapped to subdisks in the region of the
          failure. This policy allows the volume to continue operating
          on a failing plex, and does not disable mirrored regions that
          are unaffected by the failing operation.

          In the case of a logging volume, the volume is detached if a
          write failure occurs to all enabled log subdisks associated
          with the volume.
       o  Detaches the failing plexes, and the volume, and returns a
          failure for the operation. This policy can be used by
          applications that wish to make decisions about changing the
          Logical Storage Manager configuration based on failures. The
          detached state of a plex can be used as an indication of
          which plexes failed, and making the volume detached prevents
          future I/Os from succeeding until the problem is resolved.
       o  This operates exactly like the V_GEN_DET error policy, except
          that it detaches the volume if the number of complete plexes
          would drop below two. This ensures that a volume is either
          mirrored to at least two plexes, or is non-operational until
          the situation is repaired.
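
The difference between the V_GEN_DET policy and the stricter variant described last comes down to how many complete plexes must remain after a detach. The following is a minimal sketch of that decision only, assuming the caller already knows how many complete plexes exist and how many of them are failing. The enum values and demo_gen_det() are hypothetical names for illustration, not LSM definitions, and the sketch does not model V_GEN_DET_SPARSE.

```c
#include <assert.h>

/* Hypothetical outcome codes for this sketch; not LSM constants. */
enum demo_action { DEMO_DETACH_PLEX, DEMO_DETACH_VOLUME };

/*
 * complete:     number of complete plexes before the failure is handled
 * failing:      how many of those complete plexes would be detached
 * min_complete: 1 models V_GEN_DET as described above;
 *               2 models the variant that requires two complete plexes
 */
static enum demo_action demo_gen_det(int complete, int failing,
                                     int min_complete)
{
    assert(complete >= 0 && failing >= 0 && failing <= complete);
    if (complete - failing < min_complete)
        return DEMO_DETACH_VOLUME;  /* too few complete plexes would remain */
    return DEMO_DETACH_PLEX;        /* safe to detach only the failing plexes */
}
```

For example, with two complete plexes and one failing, a threshold of 1 detaches only the plex, while a threshold of 2 detaches the volume.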

       Not all plexes are taken into account in the exception policy
       selection or actions. A plex is ignored under any of the
       following conditions:
       o  The plex is not enabled.
       o  The plex does not have a read or write mode appropriate for
          the operation.
       o  The plex has the PL_PFLAG_NOERROR flag set.
       o  The plex does not have mapped subdisk blocks that are
          appropriate for the range of the requested operation.
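
The four conditions above amount to a filter that decides whether a plex counts as a candidate for an operation. A minimal sketch of such a filter follows. The structure and its members are hypothetical stand-ins (only the noerror member corresponds to a flag named in the text, PL_PFLAG_NOERROR), and the range test assumes a single contiguous mapped region per plex and treats any overlap with the request as "appropriate", which is an interpretation rather than something this page specifies.

```c
#include <assert.h>

/* Hypothetical per-plex summary for this sketch; not the plexrec layout. */
struct demo_plex_state {
    int  enabled;    /* plex is enabled */
    int  mode_ok;    /* read or write mode suits the operation */
    int  noerror;    /* stands in for PL_PFLAG_NOERROR */
    long map_start;  /* first mapped sector (assumed contiguous mapping) */
    long map_len;    /* number of mapped sectors */
};

/* Returns 1 if the plex is considered for the request, 0 if it is ignored. */
static int demo_plex_counts(const struct demo_plex_state *p,
                            long off, long len)
{
    assert(p != 0 && len > 0);
    if (!p->enabled)
        return 0;
    if (!p->mode_ok)
        return 0;
    if (p->noerror)
        return 0;
    /* Ignore the plex when its mapped region misses the request entirely. */
    if (off >= p->map_start + p->map_len || p->map_start >= off + len)
        return 0;
    return 1;
}
```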

       The exception policies are normally set implicitly by the
       operational utilities. The utilities provided by the Logical
       Storage Manager set all the exception policies to
       V_GEN_DET_SPARSE and do not provide a means for changing the
       policies to something else.
v_lasterr
       A sequence number for the last I/O error to be encountered on
       the volume. This is a read-only field.
v_tflag
       A bitmask of flags that is cleared after a reboot. Flags defined
       in this field are:
       o  A flag that can be turned on to request read/writeback mode.
          In read/writeback mode, a read request for a mirrored volume
          will write back to all other plexes the resulting data from
          the read. The operation is affected by the v_rwback_offset
          field. This mode is intended for volume recovery operations.
       o  A status flag which indicates that the read/writeback mode
          operation is still in effect. This flag is set when
          V_TFLAG_RWBACK is set. If the read/writeback offset (see
          v_rwback_offset) reaches the end of the volume, then the
          kernel will turn off this flag.
       o  A status flag that indicates that the volume device that
          corresponds to the volume record is open or mounted as a file
          system.
       o  A status flag which indicates the volume has a logging type
          of VOL_PFLAG_LOGBLKNO, is enabled, and has at least one
          enabled, associated plex with an enabled log subdisk. This
          flag is not cleared when exception policies are invoked that
          detach a volume or its plexes.
       o  An error has rendered the volume unusable. The volume cannot
          be started.
v_log_serial_lo, v_log_serial_hi
       These values, taken together, yield a unique monotonically
       increasing value that is changed for every log write that occurs
       to a volume with logging enabled. These two numbers are cleared
       by a reboot, but are normally set explicitly by a volume start
       operation. The value in v_log_serial_lo is incremented by one
       for every log write.
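
A two-part serial number of this kind is typically advanced and compared as one value with the high part most significant. The sketch below shows one way to do that. The text states only that the low part is incremented by one per log write; the carry into the high part when the low part reaches LONG_MAX is an assumption of this example, and struct demo_log_serial and the demo_* functions are hypothetical names, not part of the LSM interfaces.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical pair mirroring v_log_serial_lo and v_log_serial_hi. */
struct demo_log_serial {
    long lo;
    long hi;
};

/* Advance the serial number by one log write. */
static void demo_serial_bump(struct demo_log_serial *s)
{
    assert(s != 0);
    if (s->lo == LONG_MAX) {   /* assumed carry rule; avoids signed overflow */
        s->lo = 0;
        s->hi++;
    } else {
        s->lo++;
    }
}

/* Compare two serial numbers: negative, zero, or positive like strcmp(). */
static int demo_serial_cmp(const struct demo_log_serial *a,
                           const struct demo_log_serial *b)
{
    if (a->hi != b->hi)
        return a->hi < b->hi ? -1 : 1;
    if (a->lo != b->lo)
        return a->lo < b->lo ? -1 : 1;
    return 0;
}
```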

       Unlike all other fields, the values of the log serial number
       fields cannot always be trusted within a transaction. The reason
       for this is that data-locks are not obtained by vold until after
       a utility has completely described a transaction for vold to
       transmit to the kernel. Other fields that can be changed by the
       kernel are checked at the time of a vol_commit to ensure that
       the fields have not changed, and if any kernel-modifiable fields
       have changed since the corresponding vol_trans call, then the
       utility is asked to retry the transaction.

       However, a volume with significant I/O activity is likely to
       change the value of the serial number fields often enough that
       transactions on such volumes might have to be retried an
       unacceptable number of times, so these fields are not checked.

       Utilities must be prepared to ensure that volume logs are in a
       quiescent state (normally by setting the volume to V_DETACHED or
       by disabling logging) before using the value of a log within a
       transaction. The existing utility set uses the log serial number
       fields only to set the serial number for a volume.
v_bdev, v_cdev
       The device numbers for the volume block and character device
       nodes. Normally, these are computed from the v_minor number.
       However, in cases of collision, they may have different minor
       numbers.
v_iosize
       The largest sector size of any disk associated (through a
       subdisk) with the volume. At the present time, only one sector
       size (normally 512 bytes) is supported, so this field will
       always match the single system sector size.
v_rwback_offset
       When read/writeback mode is turned on, this field is loaded into
       the kernel as the current read/writeback offset pointer. Reads
       that occur before this offset into the volume will not invoke
       read/writeback recovery. If a read occurs on the boundary, then
       the kernel will increase the pointer to the end of that read,
       after a successful result from the operation. This automatically
       increasing pointer causes the degradation from the
       read/writeback mode to decrease as volume recovery progresses.
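
The way the read/writeback offset advances can be sketched as a small state machine. This is an illustration under stated assumptions, not the kernel's implementation: struct demo_rwback and demo_rwback_read() are hypothetical, offsets and lengths are in sectors, every read is assumed to succeed, a read that starts beyond the current offset is assumed to be written back without moving the pointer, and the mode is assumed to end once the pointer reaches the end of the volume.

```c
#include <assert.h>

/* Hypothetical tracking state; not an LSM structure. */
struct demo_rwback {
    long offset;   /* current read/writeback offset pointer */
    long vol_len;  /* volume length in sectors */
    int  active;   /* nonzero while read/writeback mode is in effect */
};

/*
 * Model one successful read of len sectors starting at start.
 * Returns 1 if the read would be written back to the other plexes,
 * 0 if it is an ordinary read.
 */
static int demo_rwback_read(struct demo_rwback *r, long start, long len)
{
    long end = start + len;

    assert(r != 0 && start >= 0 && len > 0 && end <= r->vol_len);
    if (!r->active || end <= r->offset)
        return 0;               /* region already recovered: plain read */
    if (start <= r->offset) {   /* read touches the boundary */
        r->offset = end;        /* advance pointer to the end of the read */
        if (r->offset >= r->vol_len)
            r->active = 0;      /* whole volume recovered */
    }
    return 1;
}
```

Reading the volume sequentially from the initial offset therefore moves the pointer forward with each read, so progressively fewer reads pay the write-back cost.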
RELATED INFORMATION
volintro(8), vold(8), voliod(8), volmake(4), plexrec(4), sdrec(4)
volrec(4)