mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-01-12 01:20:14 +00:00
The namespace tree is, among other things, currently used to support
file handles for namespaces. When a namespace is created it is placed on
the namespace trees and when it is destroyed it is removed from the
namespace trees.
While a namespace is on the namespace trees with a valid reference count
it is possible to reopen it through a namespace file handle. This is all
fine but has some issues that should be addressed.
On current kernels a namespace is visible to userspace in the
following cases:
(1) The namespace is in use by a task.
(2) The namespace is persisted through a VFS object (namespace file
descriptor or bind-mount).
Note that (2) only cares about direct persistence of the namespace
itself not indirectly via e.g., file->f_cred file references or
similar.
(3) The namespace is a hierarchical namespace type and is the parent of
one or more child namespaces.
Case (3) is interesting because it is possible that a parent namespace
might not fulfill any of (1) or (2), i.e., is invisible to userspace but
it may still be resurrected through the NS_GET_PARENT ioctl().
Currently namespace file handles allow much broader access to namespaces
than what is currently possible via (1)-(3). The reason is that
namespaces may remain pinned for completely internal reasons yet are
inaccessible to userspace.
For example, a user namespace may remain pinned by get_cred() calls that
stash the opener's credentials into file->f_cred. As it stands, file
handles allow resurrecting such a user namespace even though this
should not be possible via (1)-(3). This is a fundamental uapi change
that we shouldn't do if we don't have to.
Consider the following insane case: Various architectures support the
CONFIG_MMU_LAZY_TLB_REFCOUNT option which uses lazy TLB destruction.
When this option is set a userspace task's struct mm_struct may be used
for kernel threads such as the idle task and will only be destroyed once
the cpu's runqueue switches back to another task. But because of ptrace()
permission checks struct mm_struct stashes the user namespace of the
task that struct mm_struct originally belonged to. The kernel thread
will take a reference on the struct mm_struct and thus pin it.
So on an idle system user namespaces can be persisted for arbitrary
amounts of time which also means that they can be resurrected using
namespace file handles. That makes no sense whatsoever. The problem is
of course exacerbated on large systems with a huge number of cpus.
To handle this nicely we introduce an active reference count which
tracks (1)-(3). This is easy to do as all of these things are already
managed centrally. Only (1)-(3) will count towards the active reference
count and only namespaces which are active may be opened via namespace
file handles.
The problem is that namespaces may be resurrected, which means that they
can become temporarily inactive and will be reactivated some time later.
Currently the only example of this is the SIOCGSKNS socket ioctl. The
SIOCGSKNS ioctl allows opening a network namespace file descriptor based
on a socket file descriptor.
If a socket is tied to a network namespace that subsequently becomes
inactive but that socket is persisted by another process in another
network namespace (e.g., via SCM_RIGHTS or pidfd_getfd()) then the
SIOCGSKNS ioctl will resurrect this network namespace.
So calls to open_related_ns() and open_namespace() will end up
resurrecting the corresponding namespace tree.
Note that the active reference count does not regulate the lifetime of
the namespace itself. This is still done by the normal reference count.
The active reference count can only be elevated if the regular reference
count is elevated.
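
The relationship between the two counters can be sketched with a small
userspace model. Everything here is an assumption made for illustration:
struct ns_model and its helpers are hypothetical names, the counters are
plain integers rather than atomics, and none of this is the code added
by this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a namespace with two independent counters. */
struct ns_model {
	int refcount;	/* regular count: regulates lifetime */
	int active;	/* active count: regulates visibility */
};

/*
 * Take an active reference. This is only allowed while a regular
 * reference is held, mirroring the rule stated above.
 */
bool ns_model_get_active(struct ns_model *ns)
{
	if (ns->refcount <= 0)
		return false;
	ns->active++;
	return true;
}

/* Drop an active reference; the object itself stays alive. */
void ns_model_put_active(struct ns_model *ns)
{
	assert(ns->active > 0);
	ns->active--;
}

/* A file handle may only reopen a namespace that is currently active. */
bool ns_model_may_open_by_handle(const struct ns_model *ns)
{
	return ns->refcount > 0 && ns->active > 0;
}
```

In this model a namespace that is only pinned internally has a nonzero
refcount but an active count of zero, so it cannot be opened by handle.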
The active reference count also doesn't regulate the presence of a
namespace on the namespace trees. It only regulates its visibility to
namespace file handles (and in later patches to listns()).
A namespace remains on the namespace trees from creation until its
actual destruction. This will allow the kernel to always reach any
namespace trivially and it will also enable subsystems like bpf to walk
the namespace lists on the system for tracing or general introspection
purposes.
Note that different namespaces have different visibility lifetimes on
current kernels. While most namespaces are immediately released when the
last task using them exits, the user- and pid namespace are persisted
and thus both remain accessible via /proc/<pid>/ns/<ns_type>.
The user namespace lifetime is aligned with struct cred and is only
released through exit_creds(). However, it becomes inaccessible to
userspace once the last task using it is reaped, i.e., when
release_task() is called and all proc entries are flushed. Similarly,
the pid namespace is also visible until the last task using it has been
reaped and the associated pid numbers are freed.
The active reference counts of the user- and pid namespace are
decremented once the task is reaped.
Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-11-2e6f823ebdc0@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
691 lines
17 KiB
C
// SPDX-License-Identifier: GPL-2.0-or-later
/* Task credentials management - see Documentation/security/credentials.rst
 *
 * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved.
 * Written by David Howells (dhowells@redhat.com)
 */

#define pr_fmt(fmt) "CRED: " fmt

#include <linux/export.h>
#include <linux/cred.h>
#include <linux/slab.h>
#include <linux/sched.h>
#include <linux/sched/coredump.h>
#include <linux/key.h>
#include <linux/keyctl.h>
#include <linux/init_task.h>
#include <linux/security.h>
#include <linux/binfmts.h>
#include <linux/cn_proc.h>
#include <linux/uidgid.h>

#if 0
#define kdebug(FMT, ...) \
	printk("[%-5.5s%5u] " FMT "\n", \
	       current->comm, current->pid, ##__VA_ARGS__)
#else
#define kdebug(FMT, ...) \
do { \
	if (0) \
		no_printk("[%-5.5s%5u] " FMT "\n", \
			  current->comm, current->pid, ##__VA_ARGS__); \
} while (0)
#endif

static struct kmem_cache *cred_jar;

/* init to 2 - one for init_task, one to ensure it is never freed */
static struct group_info init_groups = { .usage = REFCOUNT_INIT(2) };

/*
 * The initial credentials for the initial task
 */
struct cred init_cred = {
	.usage = ATOMIC_INIT(4),
	.uid = GLOBAL_ROOT_UID,
	.gid = GLOBAL_ROOT_GID,
	.suid = GLOBAL_ROOT_UID,
	.sgid = GLOBAL_ROOT_GID,
	.euid = GLOBAL_ROOT_UID,
	.egid = GLOBAL_ROOT_GID,
	.fsuid = GLOBAL_ROOT_UID,
	.fsgid = GLOBAL_ROOT_GID,
	.securebits = SECUREBITS_DEFAULT,
	.cap_inheritable = CAP_EMPTY_SET,
	.cap_permitted = CAP_FULL_SET,
	.cap_effective = CAP_FULL_SET,
	.cap_bset = CAP_FULL_SET,
	.user = INIT_USER,
	.user_ns = &init_user_ns,
	.group_info = &init_groups,
	.ucounts = &init_ucounts,
};

/*
 * The RCU callback to actually dispose of a set of credentials
 */
static void put_cred_rcu(struct rcu_head *rcu)
{
	struct cred *cred = container_of(rcu, struct cred, rcu);

	kdebug("put_cred_rcu(%p)", cred);

	if (atomic_long_read(&cred->usage) != 0)
		panic("CRED: put_cred_rcu() sees %p with usage %ld\n",
		      cred, atomic_long_read(&cred->usage));

	security_cred_free(cred);
	key_put(cred->session_keyring);
	key_put(cred->process_keyring);
	key_put(cred->thread_keyring);
	key_put(cred->request_key_auth);
	if (cred->group_info)
		put_group_info(cred->group_info);
	free_uid(cred->user);
	if (cred->ucounts)
		put_ucounts(cred->ucounts);
	put_user_ns(cred->user_ns);
	kmem_cache_free(cred_jar, cred);
}

/**
 * __put_cred - Destroy a set of credentials
 * @cred: The record to release
 *
 * Destroy a set of credentials on which no references remain.
 */
void __put_cred(struct cred *cred)
{
	kdebug("__put_cred(%p{%ld})", cred,
	       atomic_long_read(&cred->usage));

	BUG_ON(atomic_long_read(&cred->usage) != 0);
	BUG_ON(cred == current->cred);
	BUG_ON(cred == current->real_cred);

	if (cred->non_rcu)
		put_cred_rcu(&cred->rcu);
	else
		call_rcu(&cred->rcu, put_cred_rcu);
}
EXPORT_SYMBOL(__put_cred);

/*
 * Clean up a task's credentials when it exits
 */
void exit_creds(struct task_struct *tsk)
{
	struct cred *real_cred, *cred;

	kdebug("exit_creds(%u,%p,%p,{%ld})", tsk->pid, tsk->real_cred, tsk->cred,
	       atomic_long_read(&tsk->cred->usage));

	real_cred = (struct cred *) tsk->real_cred;
	tsk->real_cred = NULL;

	cred = (struct cred *) tsk->cred;
	tsk->cred = NULL;

	if (real_cred == cred) {
		put_cred_many(cred, 2);
	} else {
		put_cred(real_cred);
		put_cred(cred);
	}

#ifdef CONFIG_KEYS_REQUEST_CACHE
	key_put(tsk->cached_requested_key);
	tsk->cached_requested_key = NULL;
#endif
}

/**
 * get_task_cred - Get another task's objective credentials
 * @task: The task to query
 *
 * Get the objective credentials of a task, pinning them so that they can't go
 * away. Accessing a task's credentials directly is not permitted.
 *
 * The caller must also make sure task doesn't get deleted, either by holding a
 * ref on task or by holding tasklist_lock to prevent it from being unlinked.
 */
const struct cred *get_task_cred(struct task_struct *task)
{
	const struct cred *cred;

	rcu_read_lock();

	do {
		cred = __task_cred((task));
		BUG_ON(!cred);
	} while (!get_cred_rcu(cred));

	rcu_read_unlock();
	return cred;
}
EXPORT_SYMBOL(get_task_cred);

/*
 * Allocate blank credentials, such that the credentials can be filled in at a
 * later date without risk of ENOMEM.
 */
struct cred *cred_alloc_blank(void)
{
	struct cred *new;

	new = kmem_cache_zalloc(cred_jar, GFP_KERNEL);
	if (!new)
		return NULL;

	atomic_long_set(&new->usage, 1);
	if (security_cred_alloc_blank(new, GFP_KERNEL_ACCOUNT) < 0)
		goto error;

	return new;

error:
	abort_creds(new);
	return NULL;
}

/**
 * prepare_creds - Prepare a new set of credentials for modification
 *
 * Prepare a new set of task credentials for modification. A task's creds
 * shouldn't generally be modified directly, therefore this function is used to
 * prepare a new copy, which the caller then modifies and then commits by
 * calling commit_creds().
 *
 * Preparation involves making a copy of the objective creds for modification.
 *
 * Returns a pointer to the new creds-to-be if successful, NULL otherwise.
 *
 * Call commit_creds() or abort_creds() to clean up.
 */
struct cred *prepare_creds(void)
{
	struct task_struct *task = current;
	const struct cred *old;
	struct cred *new;

	new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
	if (!new)
		return NULL;

	kdebug("prepare_creds() alloc %p", new);

	old = task->cred;
	memcpy(new, old, sizeof(struct cred));

	new->non_rcu = 0;
	atomic_long_set(&new->usage, 1);
	get_group_info(new->group_info);
	get_uid(new->user);
	get_user_ns(new->user_ns);

#ifdef CONFIG_KEYS
	key_get(new->session_keyring);
	key_get(new->process_keyring);
	key_get(new->thread_keyring);
	key_get(new->request_key_auth);
#endif

#ifdef CONFIG_SECURITY
	new->security = NULL;
#endif

	new->ucounts = get_ucounts(new->ucounts);
	if (!new->ucounts)
		goto error;

	if (security_prepare_creds(new, old, GFP_KERNEL_ACCOUNT) < 0)
		goto error;

	return new;

error:
	abort_creds(new);
	return NULL;
}
EXPORT_SYMBOL(prepare_creds);

/*
 * Prepare credentials for current to perform an execve()
 * - The caller must hold ->cred_guard_mutex
 */
struct cred *prepare_exec_creds(void)
{
	struct cred *new;

	new = prepare_creds();
	if (!new)
		return new;

#ifdef CONFIG_KEYS
	/* newly exec'd tasks don't get a thread keyring */
	key_put(new->thread_keyring);
	new->thread_keyring = NULL;

	/* inherit the session keyring; new process keyring */
	key_put(new->process_keyring);
	new->process_keyring = NULL;
#endif

	new->suid = new->fsuid = new->euid;
	new->sgid = new->fsgid = new->egid;

	return new;
}

/*
 * Copy credentials for the new process created by fork()
 *
 * We share if we can, but under some circumstances we have to generate a new
 * set.
 *
 * The new process gets the current process's subjective credentials as its
 * objective and subjective credentials
 */
int copy_creds(struct task_struct *p, u64 clone_flags)
{
	struct cred *new;
	int ret;

#ifdef CONFIG_KEYS_REQUEST_CACHE
	p->cached_requested_key = NULL;
#endif

	if (
#ifdef CONFIG_KEYS
		!p->cred->thread_keyring &&
#endif
		clone_flags & CLONE_THREAD
	    ) {
		p->real_cred = get_cred_many(p->cred, 2);
		kdebug("share_creds(%p{%ld})",
		       p->cred, atomic_long_read(&p->cred->usage));
		inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
		get_cred_namespaces(p);
		return 0;
	}

	new = prepare_creds();
	if (!new)
		return -ENOMEM;

	if (clone_flags & CLONE_NEWUSER) {
		ret = create_user_ns(new);
		if (ret < 0)
			goto error_put;
		ret = set_cred_ucounts(new);
		if (ret < 0)
			goto error_put;
	}

#ifdef CONFIG_KEYS
	/* new threads get their own thread keyrings if their parent already
	 * had one */
	if (new->thread_keyring) {
		key_put(new->thread_keyring);
		new->thread_keyring = NULL;
		if (clone_flags & CLONE_THREAD)
			install_thread_keyring_to_cred(new);
	}

	/* The process keyring is only shared between the threads in a process;
	 * anything outside of those threads doesn't inherit.
	 */
	if (!(clone_flags & CLONE_THREAD)) {
		key_put(new->process_keyring);
		new->process_keyring = NULL;
	}
#endif

	p->cred = p->real_cred = get_cred(new);
	inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
	get_cred_namespaces(p);

	return 0;

error_put:
	put_cred(new);
	return ret;
}

static bool cred_cap_issubset(const struct cred *set, const struct cred *subset)
{
	const struct user_namespace *set_ns = set->user_ns;
	const struct user_namespace *subset_ns = subset->user_ns;

	/* If the two credentials are in the same user namespace see if
	 * the capabilities of subset are a subset of set.
	 */
	if (set_ns == subset_ns)
		return cap_issubset(subset->cap_permitted, set->cap_permitted);

	/* The credentials are in a different user namespaces
	 * therefore one is a subset of the other only if a set is an
	 * ancestor of subset and set->euid is owner of subset or one
	 * of subsets ancestors.
	 */
	for (;subset_ns != &init_user_ns; subset_ns = subset_ns->parent) {
		if ((set_ns == subset_ns->parent) &&
		    uid_eq(subset_ns->owner, set->euid))
			return true;
	}

	return false;
}

/**
 * commit_creds - Install new credentials upon the current task
 * @new: The credentials to be assigned
 *
 * Install a new set of credentials to the current task, using RCU to replace
 * the old set. Both the objective and the subjective credentials pointers are
 * updated. This function may not be called if the subjective credentials are
 * in an overridden state.
 *
 * This function eats the caller's reference to the new credentials.
 *
 * Always returns 0 thus allowing this function to be tail-called at the end
 * of, say, sys_setgid().
 */
int commit_creds(struct cred *new)
{
	struct task_struct *task = current;
	const struct cred *old = task->real_cred;

	kdebug("commit_creds(%p{%ld})", new,
	       atomic_long_read(&new->usage));

	BUG_ON(task->cred != old);
	BUG_ON(atomic_long_read(&new->usage) < 1);

	get_cred(new); /* we will require a ref for the subj creds too */

	/* dumpability changes */
	if (!uid_eq(old->euid, new->euid) ||
	    !gid_eq(old->egid, new->egid) ||
	    !uid_eq(old->fsuid, new->fsuid) ||
	    !gid_eq(old->fsgid, new->fsgid) ||
	    !cred_cap_issubset(old, new)) {
		if (task->mm)
			set_dumpable(task->mm, suid_dumpable);
		task->pdeath_signal = 0;
		/*
		 * If a task drops privileges and becomes nondumpable,
		 * the dumpability change must become visible before
		 * the credential change; otherwise, a __ptrace_may_access()
		 * racing with this change may be able to attach to a task it
		 * shouldn't be able to attach to (as if the task had dropped
		 * privileges without becoming nondumpable).
		 * Pairs with a read barrier in __ptrace_may_access().
		 */
		smp_wmb();
	}

	/* alter the thread keyring */
	if (!uid_eq(new->fsuid, old->fsuid))
		key_fsuid_changed(new);
	if (!gid_eq(new->fsgid, old->fsgid))
		key_fsgid_changed(new);

	/* do it
	 * RLIMIT_NPROC limits on user->processes have already been checked
	 * in set_user().
	 */
	if (new->user != old->user || new->user_ns != old->user_ns)
		inc_rlimit_ucounts(new->ucounts, UCOUNT_RLIMIT_NPROC, 1);

	rcu_assign_pointer(task->real_cred, new);
	rcu_assign_pointer(task->cred, new);
	if (new->user != old->user || new->user_ns != old->user_ns)
		dec_rlimit_ucounts(old->ucounts, UCOUNT_RLIMIT_NPROC, 1);
	if (new->user_ns != old->user_ns)
		switch_cred_namespaces(old, new);

	/* send notifications */
	if (!uid_eq(new->uid, old->uid) ||
	    !uid_eq(new->euid, old->euid) ||
	    !uid_eq(new->suid, old->suid) ||
	    !uid_eq(new->fsuid, old->fsuid))
		proc_id_connector(task, PROC_EVENT_UID);

	if (!gid_eq(new->gid, old->gid) ||
	    !gid_eq(new->egid, old->egid) ||
	    !gid_eq(new->sgid, old->sgid) ||
	    !gid_eq(new->fsgid, old->fsgid))
		proc_id_connector(task, PROC_EVENT_GID);

	/* release the old obj and subj refs both */
	put_cred_many(old, 2);
	return 0;
}
EXPORT_SYMBOL(commit_creds);

/**
 * abort_creds - Discard a set of credentials and unlock the current task
 * @new: The credentials that were going to be applied
 *
 * Discard a set of credentials that were under construction and unlock the
 * current task.
 */
void abort_creds(struct cred *new)
{
	kdebug("abort_creds(%p{%ld})", new,
	       atomic_long_read(&new->usage));

	BUG_ON(atomic_long_read(&new->usage) < 1);
	put_cred(new);
}
EXPORT_SYMBOL(abort_creds);

/**
 * cred_fscmp - Compare two credentials with respect to filesystem access.
 * @a: The first credential
 * @b: The second credential
 *
 * cred_fscmp() will return zero if both credentials have the same
 * fsuid, fsgid, and supplementary groups. That is, if they will both
 * provide the same access to files based on mode/uid/gid.
 * If the credentials are different, then either -1 or 1 will
 * be returned depending on whether @a comes before or after @b
 * respectively in an arbitrary, but stable, ordering of credentials.
 *
 * Return: -1, 0, or 1 depending on comparison
 */
int cred_fscmp(const struct cred *a, const struct cred *b)
{
	struct group_info *ga, *gb;
	int g;

	if (a == b)
		return 0;
	if (uid_lt(a->fsuid, b->fsuid))
		return -1;
	if (uid_gt(a->fsuid, b->fsuid))
		return 1;

	if (gid_lt(a->fsgid, b->fsgid))
		return -1;
	if (gid_gt(a->fsgid, b->fsgid))
		return 1;

	ga = a->group_info;
	gb = b->group_info;
	if (ga == gb)
		return 0;
	if (ga == NULL)
		return -1;
	if (gb == NULL)
		return 1;
	if (ga->ngroups < gb->ngroups)
		return -1;
	if (ga->ngroups > gb->ngroups)
		return 1;

	for (g = 0; g < ga->ngroups; g++) {
		if (gid_lt(ga->gid[g], gb->gid[g]))
			return -1;
		if (gid_gt(ga->gid[g], gb->gid[g]))
			return 1;
	}
	return 0;
}
EXPORT_SYMBOL(cred_fscmp);

int set_cred_ucounts(struct cred *new)
{
	struct ucounts *new_ucounts, *old_ucounts = new->ucounts;

	/*
	 * This optimization is needed because alloc_ucounts() uses locks
	 * for table lookups.
	 */
	if (old_ucounts->ns == new->user_ns && uid_eq(old_ucounts->uid, new->uid))
		return 0;

	if (!(new_ucounts = alloc_ucounts(new->user_ns, new->uid)))
		return -EAGAIN;

	new->ucounts = new_ucounts;
	put_ucounts(old_ucounts);

	return 0;
}

/*
 * initialise the credentials stuff
 */
void __init cred_init(void)
{
	/* allocate a slab in which we can store credentials */
	cred_jar = KMEM_CACHE(cred,
			      SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
}

/**
 * prepare_kernel_cred - Prepare a set of credentials for a kernel service
 * @daemon: A userspace daemon to be used as a reference
 *
 * Prepare a set of credentials for a kernel service. This can then be used to
 * override a task's own credentials so that work can be done on behalf of that
 * task that requires a different subjective context.
 *
 * @daemon is used to provide a base cred, with the security data derived from
 * that; if this is "&init_task", they'll be set to 0, no groups, full
 * capabilities, and no keys.
 *
 * The caller may change these controls afterwards if desired.
 *
 * Returns the new credentials or NULL if out of memory.
 */
struct cred *prepare_kernel_cred(struct task_struct *daemon)
{
	const struct cred *old;
	struct cred *new;

	if (WARN_ON_ONCE(!daemon))
		return NULL;

	new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
	if (!new)
		return NULL;

	kdebug("prepare_kernel_cred() alloc %p", new);

	old = get_task_cred(daemon);

	*new = *old;
	new->non_rcu = 0;
	atomic_long_set(&new->usage, 1);
	get_uid(new->user);
	get_user_ns(new->user_ns);
	get_group_info(new->group_info);

#ifdef CONFIG_KEYS
	new->session_keyring = NULL;
	new->process_keyring = NULL;
	new->thread_keyring = NULL;
	new->request_key_auth = NULL;
	new->jit_keyring = KEY_REQKEY_DEFL_THREAD_KEYRING;
#endif

#ifdef CONFIG_SECURITY
	new->security = NULL;
#endif
	new->ucounts = get_ucounts(new->ucounts);
	if (!new->ucounts)
		goto error;

	if (security_prepare_creds(new, old, GFP_KERNEL_ACCOUNT) < 0)
		goto error;

	put_cred(old);
	return new;

error:
	put_cred(new);
	put_cred(old);
	return NULL;
}
EXPORT_SYMBOL(prepare_kernel_cred);

/**
 * set_security_override - Set the security ID in a set of credentials
 * @new: The credentials to alter
 * @secid: The LSM security ID to set
 *
 * Set the LSM security ID in a set of credentials so that the subjective
 * security is overridden when an alternative set of credentials is used.
 */
int set_security_override(struct cred *new, u32 secid)
{
	return security_kernel_act_as(new, secid);
}
EXPORT_SYMBOL(set_security_override);

/**
 * set_security_override_from_ctx - Set the security ID in a set of credentials
 * @new: The credentials to alter
 * @secctx: The LSM security context to generate the security ID from.
 *
 * Set the LSM security ID in a set of credentials so that the subjective
 * security is overridden when an alternative set of credentials is used. The
 * security ID is specified in string form as a security context to be
 * interpreted by the LSM.
 */
int set_security_override_from_ctx(struct cred *new, const char *secctx)
{
	u32 secid;
	int ret;

	ret = security_secctx_to_secid(secctx, strlen(secctx), &secid);
	if (ret < 0)
		return ret;

	return set_security_override(new, secid);
}
EXPORT_SYMBOL(set_security_override_from_ctx);

/**
 * set_create_files_as - Set the LSM file create context in a set of credentials
 * @new: The credentials to alter
 * @inode: The inode to take the context from
 *
 * Change the LSM file creation context in a set of credentials to be the same
 * as the object context of the specified inode, so that the new inodes have
 * the same MAC context as that inode.
 */
int set_create_files_as(struct cred *new, struct inode *inode)
{
	if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
		return -EINVAL;
	new->fsuid = inode->i_uid;
	new->fsgid = inode->i_gid;
	return security_kernel_create_files_as(new, inode);
}
EXPORT_SYMBOL(set_create_files_as);