SOSPLICE(9) FreeBSD Kernel Developer's Manual SOSPLICE(9)
NAME
sosplice, somove - splice two sockets for zero-copy data transfer
SYNOPSIS
int
sosplice(struct socket *so, int fd, off_t max, struct timeval *tv);
int
somove(struct socket *so, int wait);
DESCRIPTION
The function sosplice() is used to splice together a source and a drain
socket. The source socket is passed as the so argument; the file
descriptor of the drain is passed in fd. If fd is negative, an existing
splicing gets dissolved. If max is positive, at most that many bytes
will get transferred. If tv is not NULL, a timeout(9) is scheduled to
dissolve splicing in the case when no data can be transferred for the
specified period of time. Socket splicing can be invoked from userland
via the setsockopt(2) system call at the SOL_SOCKET level with the socket
option SO_SPLICE.
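From userland, the call described above might look like the following
sketch. It assumes the struct splice layout (sp_fd, sp_max, sp_idle)
that setsockopt(2) accepts on systems defining SO_SPLICE; the helper name
splice_sockets() is hypothetical. Where SO_SPLICE is not defined, the
helper simply fails with ENOTSUP.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: splice source socket `src` to drain socket `drain`.
 * A `max` of 0 means no byte limit; `idle` may be NULL for no idle timeout.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
splice_sockets(int src, int drain, off_t max, const struct timeval *idle)
{
	/* sosplice() rejects a negative maximum with EINVAL. */
	assert(max >= 0);
#ifdef SO_SPLICE
	/* Assumed layout: sp_fd, sp_max, sp_idle. */
	struct splice sp;

	memset(&sp, 0, sizeof(sp));
	sp.sp_fd = drain;
	sp.sp_max = max;
	if (idle != NULL)
		sp.sp_idle = *idle;
	return setsockopt(src, SOL_SOCKET, SO_SPLICE, &sp, sizeof(sp));
#else
	/* No socket splicing on this system. */
	(void)src;
	(void)drain;
	(void)max;
	(void)idle;
	errno = ENOTSUP;
	return -1;
#endif
}
```

A caller would pass two connected sockets, for example
splice_sockets(client_fd, server_fd, 0, NULL) for an unlimited splice
without an idle timeout.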
Before connecting both sockets, several checks are executed. See the
ERRORS section for possible failures. The connection between both
sockets is implemented by setting these additional fields in the struct
sosplice *so_sp field in struct socket:
- struct socket *ssp_socket links from the source to the drain
socket.
- struct socket *ssp_soback links back from the drain to the
source socket.
- off_t ssp_len counts the number of bytes spliced so far from
this socket.
- off_t ssp_max specifies the maximum number of bytes to splice
from this socket if non-zero.
- struct timeval ssp_idletv specifies the maximum idle time if
non-zero.
- struct timeout ssp_idleto provides storage for the kernel
timeout if idle time is used.
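The fields above can be pictured with a simplified userland model. The
names model_socket, model_sosplice and model_link() are hypothetical
stand-ins, not the kernel definitions; ssp_idleto is left out because
struct timeout exists only in the kernel.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <assert.h>
#include <stddef.h>

struct model_socket;

/* Simplified stand-in for struct sosplice (ssp_idleto omitted). */
struct model_sosplice {
	struct model_socket *ssp_socket;	/* source -> drain */
	struct model_socket *ssp_soback;	/* drain -> source */
	off_t		     ssp_len;		/* bytes spliced so far */
	off_t		     ssp_max;		/* byte limit, 0 = none */
	struct timeval	     ssp_idletv;	/* idle limit, 0 = none */
};

/* Simplified stand-in for struct socket. */
struct model_socket {
	struct model_sosplice so_sp;	/* embedded here for brevity */
};

/* Record a splice from src to drain in both directions. */
static void
model_link(struct model_socket *src, struct model_socket *drain,
    off_t max, const struct timeval *tv)
{
	assert(src != NULL && drain != NULL && max >= 0);
	src->so_sp.ssp_socket = drain;
	drain->so_sp.ssp_soback = src;
	src->so_sp.ssp_len = 0;
	src->so_sp.ssp_max = max;
	if (tv != NULL) {
		src->so_sp.ssp_idletv = *tv;
	} else {
		src->so_sp.ssp_idletv.tv_sec = 0;
		src->so_sp.ssp_idletv.tv_usec = 0;
	}
}
```

The forward link lives on the source and the back link on the drain, so
either socket can find its partner when the splice is dissolved.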
After connecting both sockets, sosplice() calls somove() to transfer the
mbufs already in the source receive buffer to the drain send buffer.
Finally the socket buffer flag SB_SPLICE is set on both socket buffers,
to indicate that the protocol layer has to call somove() whenever data or
space is available.
The function somove() transfers data from the source's receive buffer to
the drain's send buffer. It must be called at splsoftnet(9), and the so
argument must be a spliced source socket. It may be necessary to split an mbuf to
handle out-of-band data inline or when the maximum splice length has been
reached. If wait is M_WAIT, splitting mbufs will always succeed. For
M_DONTWAIT the out-of-band property might get lost or a short splice
might happen. In the latter case, less than the given maximum number of
bytes are transferred and userland has to cope with this. Note that a
short splice cannot happen if somove() was called by sosplice(). So a
second setsockopt(2) after a short splice pointing to the same maximum
will always succeed.
Before transferring data, somove() checks both sockets for errors and
that the drain socket is connected. If the drain cannot send anymore, an
EPIPE error is set on the source socket. The data length to move is
limited by the optional maximum splice length and the space in the
drain's send socket buffer. Up to this amount of data is taken out of
the source's receive socket buffer. To avoid splicing loops created by
userland, the number of times an mbuf may be moved between sockets is
limited to 128.
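The length limit and the loop guard described above can be sketched as
plain arithmetic. This is a userland model, not the kernel code: the
names model_move_len(), model_count_move() and MODEL_MAXLOOP are
hypothetical, and the exact boundary behaviour of the counter is an
assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAXLOOP 128	/* move limit named in the text */

/*
 * Bytes one pass may move: the data available in the source's receive
 * buffer, capped by what is left of the optional maximum (0 = no limit)
 * and by the free space in the drain's send buffer.
 */
static long
model_move_len(long avail, long space, long max, long len)
{
	long n = avail;

	assert(avail >= 0 && space >= 0 && max >= 0 && len >= 0);
	if (max != 0) {
		long left = (max > len) ? max - len : 0;

		if (n > left)
			n = left;
	}
	if (n > space)
		n = space;
	return n;
}

/*
 * Count one more move of a buffer between sockets. Returns 1 if the move
 * is allowed, 0 once the buffer has already been moved MODEL_MAXLOOP times.
 */
static int
model_count_move(unsigned int *loopcnt)
{
	assert(loopcnt != NULL);
	if (*loopcnt >= MODEL_MAXLOOP)
		return 0;
	(*loopcnt)++;
	return 1;
}
```

With 80 bytes queued, 50 bytes of drain space and 10 bytes left of the
maximum, the model moves 10 bytes; the tightest of the three limits wins.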
For atomic protocols, either one complete packet is taken out, or nothing
is taken at all if: the packet is bigger than the drain's send buffer
size, in which case the splicing gets aborted with an EMSGSIZE error; the
packet does not fit into the drain's current send buffer space, in which
case it is left in the source's receive buffer for later processing; or
the maximum splice length is located within a packet, in which case
splicing gets dissolved like a short splice. All address or control
mbufs associated with the taken packet are dropped.
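The cases for atomic protocols can be summarised as a small decision
function. This is an illustrative model only: the enum, the function name
atomic_decide() and its parameters are hypothetical, and the order in
which the conditions are tested simply follows the text, which is an
assumption about precedence when several conditions hold at once.

```c
#include <assert.h>

enum atomic_action {
	ATOMIC_TAKE,		/* move the complete packet */
	ATOMIC_ABORT_EMSGSIZE,	/* packet can never fit: abort splicing */
	ATOMIC_LEAVE,		/* no room right now: retry later */
	ATOMIC_DISSOLVE		/* maximum falls inside the packet */
};

/*
 * pktlen:	length of the next packet in the source's receive buffer
 * drain_size:	total size of the drain's send buffer
 * drain_space:	space currently free in the drain's send buffer
 * max, len:	maximum splice length (0 = none) and bytes spliced so far
 */
static enum atomic_action
atomic_decide(long pktlen, long drain_size, long drain_space,
    long max, long len)
{
	assert(pktlen >= 0 && drain_size >= 0 && drain_space >= 0);
	assert(max >= 0 && len >= 0);
	if (pktlen > drain_size)
		return ATOMIC_ABORT_EMSGSIZE;
	if (pktlen > drain_space)
		return ATOMIC_LEAVE;
	if (max != 0 && len + pktlen > max)
		return ATOMIC_DISSOLVE;
	return ATOMIC_TAKE;
}
```

A packet is never split in this model: every outcome other than
ATOMIC_TAKE leaves it whole in the source's receive buffer.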
If the maximum splice length has been reached, an mbuf may get split for
non-atomic protocols. Otherwise an mbuf is either moved completely to
the send buffer or left in the receive buffer for later processing. If
SO_OOBINLINE is set, out-of-band data will get moved as such although
this might not be reliable. The data is sent out to the drain socket via
the protocol function. If that fails and the drain socket cannot send
anymore, an EPIPE error is set on the source socket.
For packet oriented protocols somove() iterates over the next packet
queue.
If a maximum splice length was specified and at least this amount of data
has been received from the drain socket, splicing gets dissolved. In
this case, an EFBIG error is set on the source socket if the maximum
amount of data has been transferred. Userland can process this error to
distinguish the full splice from a short splice or to react to the
completed maximum splice immediately. If an idle timeout was specified
and no data has been transferred for that period of time, the handler
soidle() dissolves splicing and sets an ETIMEDOUT error on the source
socket.
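Userland can tell these outcomes apart by inspecting the pending error on
the source socket once splicing has ended. The sketch below reads it with
getsockopt(2) and SO_ERROR, which is assumed here to be how the error is
retrieved; the enum and the helper names are hypothetical. Note that
ETIMEDOUT can also be reported by the protocol itself, so the mapping is
a hint rather than proof.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <errno.h>

enum splice_end {
	SPLICE_END_NO_ERROR,	/* e.g. end of data or a short splice */
	SPLICE_END_MAX,		/* EFBIG: maximum fully transferred */
	SPLICE_END_IDLE,	/* ETIMEDOUT: idle timeout fired */
	SPLICE_END_DRAIN,	/* EPIPE: drain cannot send anymore */
	SPLICE_END_OTHER	/* any other socket error */
};

/* Fetch and clear the pending error of a socket; -1 if the call fails. */
static int
fetch_socket_error(int fd)
{
	int soerr = 0;
	socklen_t len = sizeof(soerr);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) == -1)
		return -1;
	return soerr;
}

/* Map an error number from the source socket to a splice outcome. */
static enum splice_end
classify_splice_error(int soerr)
{
	assert(soerr >= 0);
	switch (soerr) {
	case 0:
		return SPLICE_END_NO_ERROR;
	case EFBIG:
		return SPLICE_END_MAX;
	case ETIMEDOUT:
		return SPLICE_END_IDLE;
	case EPIPE:
		return SPLICE_END_DRAIN;
	default:
		return SPLICE_END_OTHER;
	}
}
```

A relay might call fetch_socket_error() when the source socket becomes
readable and, on SPLICE_END_MAX, immediately continue with the next part
of its protocol instead of treating the event as a failure.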
The function sounsplice() is called to dissolve the socket splicing if
the source socket cannot receive anymore and its receive buffer is empty;
or if the drain socket cannot send anymore; or if the maximum has been
reached; or if an error occurred; or if the idle timeout has fired.
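The conditions listed above amount to a single predicate. The following
is a userland model with hypothetical names (struct model_state and
model_should_unsplice()); the kernel tracks this state in the sockets
themselves rather than in a separate structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the state that matters for dissolving. */
struct model_state {
	int	src_cantrcvmore;	/* source cannot receive anymore */
	long	src_rcv_bytes;		/* bytes left in source receive buffer */
	int	drain_cantsendmore;	/* drain cannot send anymore */
	int	maxreached;		/* maximum splice length reached */
	int	error;			/* non-zero if an error occurred */
	int	idle_fired;		/* idle timeout has fired */
};

/* Returns 1 if the splice should be dissolved, 0 if it continues. */
static int
model_should_unsplice(const struct model_state *st)
{
	assert(st != NULL && st->src_rcv_bytes >= 0);
	if (st->src_cantrcvmore && st->src_rcv_bytes == 0)
		return 1;
	if (st->drain_cantsendmore)
		return 1;
	if (st->maxreached || st->error != 0 || st->idle_fired)
		return 1;
	return 0;
}
```

Note that a source which cannot receive anymore keeps the splice alive
until its receive buffer has been drained, so queued data is not lost.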
If the socket buffer flag SB_SPLICE is set, the functions sorwakeup() and
sowwakeup() will call somove() to trigger the transfer when new data or
buffer space is available. While socket splicing is active, any read(2)
from the source socket will block. Neither read nor write wakeups will
be delivered to the file descriptors. After dissolving, a read event or
a socket error is signaled to userland on the source socket. If space is
available, a write event will be signaled on the drain socket.
RETURN VALUES
sosplice() returns 0 on success and otherwise the error number. somove()
returns 0 if socket splicing has been finished and 1 if it continues.
ERRORS
sosplice() will succeed unless:
[EBADF] The given file descriptor fd is not an active
descriptor.
[EBUSY] The source or the drain socket is already spliced.
[EINVAL] The given maximum value max is negative.
[ENOTCONN] The source socket requires a connection and is neither
connected nor in the process of connecting to a peer.
[ENOTCONN] The drain socket is neither connected nor in the
process of connecting to a peer.
[EPROTO] The source socket is not spliced with the drain
socket.
[ENOTSOCK] The given file descriptor fd is not a socket.
[EOPNOTSUPP] The source or the drain socket is a listen socket.
[EPROTONOSUPPORT] The source socket's protocol layer does not have the
PR_SPLICE flag set. Only TCP and UDP socket splicing
is supported.
[EPROTONOSUPPORT] The drain socket's protocol does not have the same
pr_usrreq function as the source.
[EWOULDBLOCK] The source socket is non-blocking and the receive
buffer is already locked.
SEE ALSO
setsockopt(2), options(4), timeout(9)
HISTORY
Socket splicing for TCP first appeared in OpenBSD 4.9; support for UDP
was added in OpenBSD 5.3.
AUTHORS
The idea for socket splicing originally came from Markus Friedl, and
Alexander Bluhm implemented it. Mike Belopuhov added the timeout
feature.
FreeBSD 14.1-RELEASE-p8 January 9, 2025 FreeBSD 14.1-RELEASE-p8