1
0
mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-01-11 17:10:13 +00:00

517 Commits

Author SHA1 Message Date
Tetsuo Handa
2704024d83 loop: add missing bd_abort_claiming in loop_set_status
Commit 08e136ebd193 ("loop: don't change loop device under exclusive
opener in loop_set_status") forgot to call bd_abort_claiming() when
mutex_lock_killable() failed.

Fixes: 08e136ebd193 ("loop: don't change loop device under exclusive opener in loop_set_status")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-07 08:04:42 -07:00
Raphael Pinsonneault-Thibeault
08e136ebd1 loop: don't change loop device under exclusive opener in loop_set_status
loop_set_status() is allowed to change the loop device while there
are other openers of the device, even exclusive ones.

In this case, it causes a KASAN: slab-out-of-bounds Read in
ext4_search_dir(), since when looking for an entry in an inlined
directory, e_value_offs is changed underneath the filesystem by
loop_set_status().

Fix the problem by forbidding loop_set_status() from modifying the loop
device while there are exclusive openers of the device. This is similar
to the fix in loop_configure() by commit 33ec3e53e7b1 ("loop: Don't
change loop device under exclusive opener") alongside commit ecbe6bc0003b
("block: use bd_prepare_to_claim directly in the loop driver").

Reported-by: syzbot+3ee481e21fd75e14c397@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3ee481e21fd75e14c397
Tested-by: syzbot+3ee481e21fd75e14c397@syzkaller.appspotmail.com
Tested-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Signed-off-by: Raphael Pinsonneault-Thibeault <rpthibeault@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-06 05:30:18 -07:00
Yongpeng Yang
54891a96b7 loop: use READ_ONCE() to read lo->lo_state without locking
When lo->lo_mutex is not held, direct access may read stale data. This
patch uses READ_ONCE() to read lo->lo_state and data_race() to silence
code checkers, and changes all assignments to use WRITE_ONCE().

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-15 09:32:42 -07:00
Chaitanya Kulkarni
71075d25ca blk-mq: add blk_rq_nr_bvec() helper
Add a new helper function blk_rq_nr_bvec() that returns the number of
bvecs in a request. This count represents the number of iterations
rq_for_each_bvec() would perform on a request.

Drivers need to pre-allocate bvec arrays before iterating through
a request's bvecs. Currently, they manually count bvecs using
rq_for_each_bvec() in a loop, which is repetitive. The new helper
centralizes this logic.

This pattern exists in loop and zloop drivers, where multi-bio requests
require copying bvecs into a contiguous array before creating
an iov_iter for file operations.

Update loop and zloop drivers to use the new helper, eliminating
duplicate code.

This patch also provides a clear API to avoid any potential misuse of
blk_nr_phys_segments() for calculating the bvecs since, one bvec can
have more than one segments and use of blk_nr_phys_segments() can
lead to extra memory allocation :-

[ 6155.673749] nullb_bio: 128K bio as ONE bvec: sector=0, size=131072
[ 6155.673846] null_blk: #### null_handle_data_transfer:1375
[ 6155.673850] null_blk: nr_bvec=1 blk_rq_nr_phys_segments=2
[ 6155.674263] null_blk: #### null_handle_data_transfer:1375
[ 6155.674267] null_blk: nr_bvec=1 blk_rq_nr_phys_segments=1

Reviewed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-04 07:19:26 -07:00
Jens Axboe
96f03c8cb2 Revert "Merge branch 'loop-aio-nowait' into for-6.19/block"
This reverts commit f43fdeb9a368a5ff56b088b46edc245bd4b52cde, reversing
changes made to 2c6d792d4b7676e2b340df05425330452fee1f40.

There are concerns that doing inline submits can cause excessive
stack usage, particularly when going back into the filesystem. Revert
the loop dio nowait change for now.

Link: https://lore.kernel.org/linux-block/aSP3SG_KaROJTBHx@infradead.org/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-24 20:53:19 -07:00
Chaitanya Kulkarni
b11e483a1c loop: clear nowait flag in workqueue context
The loop driver advertises REQ_NOWAIT support through BLK_FEAT_NOWAIT
(enabled by default for all blk-mq devices), and honors the nowait
behavior throughout loop_queue_rq().

However, actual I/O to the backing file is performed in a workqueue,
where blocking is allowed.

To avoid imposing unnecessary non-blocking constraints in this blocking
context, clear the REQ_NOWAIT flag before processing the request in the
workqueue context.

Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-20 07:42:32 -07:00
Ming Lei
837ed30396 loop: add hint for handling aio via IOCB_NOWAIT
Add hint for using IOCB_NOWAIT to handle loop aio command for avoiding
to cause write(especially randwrite) perf regression on sparse backed file.

Try IOCB_NOWAIT in the following situations:

- backing file is block device

OR

- READ aio command

OR

- there isn't any queued blocking async WRITEs, because NOWAIT won't cause
contention with blocking WRITE, which often implies exclusive lock

With this simple policy, perf regression of randwrite/write on sparse
backing file is fixed.

Link: https://lore.kernel.org/dm-devel/7d6ae2c9-df8e-50d0-7ad6-b787cb3cfab4@redhat.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-18 06:49:52 -07:00
Ming Lei
0ba93a906d loop: try to handle loop aio command via NOWAIT IO first
Try to handle loop aio command via NOWAIT IO first, then we can avoid to
queue the aio command into workqueue. This is usually one big win in
case that FS block mapping is stable, Mikulas verified [1] that this way
improves IO perf by close to 5X in 12jobs sequential read/write test,
in which FS block mapping is just stable.

Fallback to workqueue in case of -EAGAIN. This way may bring a little
cost from the 1st retry, but when running the following write test over
loop/sparse_file, the actual effect on randwrite is obvious:

```
truncate -s 4G 1.img    #1.img is created on XFS/virtio-scsi
losetup -f 1.img --direct-io=on
fio --direct=1 --bs=4k --runtime=40 --time_based --numjobs=1 --ioengine=libaio \
	--iodepth=16 --group_reporting=1 --filename=/dev/loop0 -name=job --rw=$RW
```

- RW=randwrite: obvious IOPS drop observed
- RW=write: a little drop(%5 - 10%)

This perf drop on randwrite over sparse file will be addressed in the
following patch.

BLK_MQ_F_BLOCKING has to be set for calling into .read_iter() or .write_iter()
which might sleep even though it is NOWAIT, and the only effect is that rcu read
lock is replaced with srcu read lock.

Link: https://lore.kernel.org/linux-block/a8e5c76a-231f-07d1-a394-847de930f638@redhat.com/ [1]
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-18 06:49:52 -07:00
Ming Lei
f4788ae9d7 loop: move command blkcg/memcg initialization into loop_queue_work
Move loop command blkcg/memcg initialization into loop_queue_work,
and prepare for supporting to handle loop io command by IOCB_NOWAIT.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-18 06:49:52 -07:00
Ming Lei
c66e9708f9 loop: add lo_submit_rw_aio()
Refactor lo_rw_aio() by extracting the I/O submission logic into a new
helper function lo_submit_rw_aio(). This further improves code organization
by separating the I/O preparation, submission, and completion handling into
distinct phases.

Prepare for using NOWAIT to improve loop performance.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-18 06:49:51 -07:00
Ming Lei
fd858d1ca9 loop: add helper lo_rw_aio_prep()
Add helper lo_rw_aio_prep() to separate the preparation phase(setting up bio
vectors and initializing the iocb structure) from the actual I/O execution
in the loop block driver.

Prepare for using NOWAIT to improve loop performance.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-18 06:49:51 -07:00
Ming Lei
c3e6c11147 loop: add helper lo_cmd_nr_bvec()
Add lo_cmd_nr_bvec() and prepare for refactoring lo_rw_aio().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-18 06:49:51 -07:00
Pedro Demarchi Gomes
455281c0ef loop: remove redundant __GFP_NOWARN flag
GFP_NOWAIT already includes __GFP_NOWARN, so let's remove the
redundant __GFP_NOWARN.

Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-10-08 06:27:53 -06:00
Li Chen
98b7bf5433 loop: fix backing file reference leak on validation error
loop_change_fd() and loop_configure() call loop_check_backing_file()
to validate the new backing file. If validation fails, the reference
acquired by fget() was not dropped, leaking a file reference.

Fix this by calling fput(file) before returning the error.

Cc: stable@vger.kernel.org
Cc: Markus Elfring <Markus.Elfring@web.de>
CC: Yang Erkun <yangerkun@huawei.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Yu Kuai <yukuai1@huaweicloud.com>
Fixes: f5c84eff634b ("loop: Add sanity check for read/write_iter")
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-10-02 15:25:47 -06:00
Yu Kuai
d14469ed7c loop: fix zero sized loop for block special file
By default, /dev/sda is block special file from devtmpfs, getattr will
return file size as zero, causing loop failed for raw block device.

We can add bdev_statx() to return device size, however this may
introduce changes that are not acknowledged by user. Fix this problem by
reverting changes for block special file, file mapping host is set to
bdev inode while opening, and use i_size_read() directly to get device
size.

Fixes: 47b71abd5846 ("loop: use vfs_getattr_nosec for accurate file size")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202508200409.b2459c02-lkp@intel.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250825093205.3684121-1-yukuai1@huaweicloud.com
[axboe: fix spelling error]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-25 07:46:57 -06:00
Rajeev Mishra
47b71abd58 loop: use vfs_getattr_nosec for accurate file size
Use vfs_getattr_nosec() in lo_calculate_size() for getting the file
size, rather than just read the cached inode size via i_size_read().
This provides better results than cached inode data, particularly for
network filesystems where metadata may be stale.

Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20250818184821.115033-3-rajeevm@hpe.com
[axboe: massage commit message]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-18 13:10:35 -06:00
Rajeev Mishra
8aa5a3b68a loop: Consolidate size calculation logic into lo_calculate_size()
Renamed get_size to lo_calculate_size and merged the logic from get_size
and get_loop_size into a single function. Update all callers to use
lo_calculate_size. This is done in preparation for improving the size
detection logic.

Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20250818184821.115033-2-rajeevm@hpe.com
[axboe: massage commit message]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-18 13:10:25 -06:00
Linus Torvalds
6e11664f14 for-6.17/block-20250728
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmiHdZ8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptRED/9o3dQ1QHL5yNM/AyCCGox0V4zra8qGS/Vc
 cBWpAVrmPGRw0IYlLZENtN9PdwKcbMzJq3l6cxeC7dBnAZP0AxTzP4YYJYUNVsqo
 WtJ3d/k5+cVp0OyOp4uabaqNeMeLoPk9/JXe1Ml2KxtDmHtj5yee0JRh7zlPZmZj
 tsrpIUTeHgAPn6yR1EI+0ybx/mjCb05Mv2Y8gF5hkUPA2PuON+MTFixJmqoy2ySh
 n+22mz/prqlyOSYh/VVv1+9jcQ94wMjcW0JIpg9lM3Kg8BCPU4IetvO1UiX6X33v
 154zEh2aJJDBx+yORS4BM4JMXjRZI7lYea2dkHM8Cajctu1Wpja9bNwnK9ibXvEc
 WtyBwztleLbAZef25fA/W87JE23fGa/r3nwIb2cF4QqkAFslCvhjA93WkOzNJCgQ
 qsWOrlCh3IK2NUu4b1Ncs3ZHOPvc51+zzjMzC6SUr54xhrxDK+gngDPhRy7XDqWJ
 DTMpIlr366o8GdJqnib0/e/CPBrThS6Vl6u0tgLnNbwdpK1svgo/uHW5ksKvDqHX
 kGEIhyRRJJC+4wyl4dsYKXa2twcyFrlWdAE+pZguEC2nZRYqYl9uXftOtvfp1x0y
 /skDX0FIDjvyjRqCLcqF03FSGqwCGS8WuWXZjPhVhcfz47NvbHeFDh1G/jMzsbpj
 S9zrPve/DQ==
 =e86T
 -----END PGP SIGNATURE-----

Merge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - MD pull request via Yu:
      - call del_gendisk synchronously (Xiao)
      - cleanup unused variable (John)
      - cleanup workqueue flags (Ryo)
      - fix faulty rdev can't be removed during resync (Qixing)

 - NVMe pull request via Christoph:
      - try PCIe function level reset on init failure (Keith Busch)
      - log TLS handshake failures at error level (Maurizio Lombardi)
      - pci-epf: do not complete commands twice if nvmet_req_init()
        fails (Rick Wertenbroek)
      - misc cleanups (Alok Tiwari)

 - Removal of the pktcdvd driver

   This has been more than a decade coming at this point, and some
   recently revealed breakages that had it causing issues even for cases
   where it isn't required made me re-pull the trigger on this one. It's
   known broken and nobody has stepped up to maintain the code

 - Series for ublk supporting batch commands, enabling the use of
   multishot where appropriate

 - Speed up ublk exit handling

 - Fix for the two-stage elevator fixing which could leak data

 - Convert NVMe to use the new IOVA based API

 - Increase default max transfer size to something more reasonable

 - Series fixing write operations on zoned DM devices

 - Add tracepoints for zoned block device operations

 - Prep series working towards improving blk-mq queue management in the
   presence of isolated CPUs

 - Don't allow updating of the block size of a loop device that is
   currently under exclusively ownership/open

 - Set chunk sectors from stacked device stripe size and use it for the
   atomic write size limit

 - Switch to folios in bcache read_super()

 - Fix for CD-ROM MRW exit flush handling

 - Various tweaks, fixes, and cleanups

* tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits)
  block: restore two stage elevator switch while running nr_hw_queue update
  cdrom: Call cdrom_mrw_exit from cdrom_release function
  sunvdc: Balance device refcount in vdc_port_mpgroup_check
  nvme-pci: try function level reset on init failure
  dm: split write BIOs on zone boundaries when zone append is not emulated
  block: use chunk_sectors when evaluating stacked atomic write limits
  dm-stripe: limit chunk_sectors to the stripe size
  md/raid10: set chunk_sectors limit
  md/raid0: set chunk_sectors limit
  block: sanitize chunk_sectors for atomic write limits
  ilog2: add max_pow_of_two_factor()
  nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails
  nvme-tcp: log TLS handshake failures at error level
  docs: nvme: fix grammar in nvme-pci-endpoint-target.rst
  nvme: fix typo in status code constant for self-test in progress
  nvmet: remove redundant assignment of error code in nvmet_ns_enable()
  nvme: fix incorrect variable in io cqes error message
  nvme: fix multiple spelling and grammar issues in host drivers
  block: fix blk_zone_append_update_request_bio() kernel-doc
  md/raid10: fix set but not used variable in sync_request_write()
  ...
2025-07-28 16:43:54 -07:00
Ming Lei
c4706c5058 loop: use kiocb helpers to fix lockdep warning
The lockdep tool can report a circular lock dependency warning in the loop
driver's AIO read/write path:

```
[ 6540.587728] kworker/u96:5/72779 is trying to acquire lock:
[ 6540.593856] ff110001b5968440 (sb_writers#9){.+.+}-{0:0}, at: loop_process_work+0x11a/0xf70 [loop]
[ 6540.603786]
[ 6540.603786] but task is already holding lock:
[ 6540.610291] ff110001b5968440 (sb_writers#9){.+.+}-{0:0}, at: loop_process_work+0x11a/0xf70 [loop]
[ 6540.620210]
[ 6540.620210] other info that might help us debug this:
[ 6540.627499]  Possible unsafe locking scenario:
[ 6540.627499]
[ 6540.634110]        CPU0
[ 6540.636841]        ----
[ 6540.639574]   lock(sb_writers#9);
[ 6540.643281]   lock(sb_writers#9);
[ 6540.646988]
[ 6540.646988]  *** DEADLOCK ***
```

This patch fixes the issue by using the AIO-specific helpers
`kiocb_start_write()` and `kiocb_end_write()`. These functions are
designed to be used with a `kiocb` and manage write sequencing
correctly for asynchronous I/O without introducing the problematic
lock dependency.

The `kiocb` is already part of the `loop_cmd` struct, so this change
also simplifies the completion function `lo_rw_aio_do_completion()` by
using the `iocb` from the `cmd` struct directly, instead of retrieving
the loop device from the request queue.

Fixes: 39d86db34e41 ("loop: add file_start_write() and file_end_write()")
Cc: Changhui Zhong <czhong@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250716114808.3159657-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-16 06:33:44 -06:00
Jan Kara
7e49538288 loop: Avoid updating block size under exclusive owner
Syzbot came up with a reproducer where a loop device block size is
changed underneath a mounted filesystem. This causes a mismatch between
the block device block size and the block size stored in the superblock
causing confusion in various places such as fs/buffer.c. The particular
issue triggered by syzbot was a warning in __getblk_slow() due to
requested buffer size not matching block device block size.

Fix the problem by getting exclusive hold of the loop device to change
its block size. This fails if somebody (such as filesystem) has already
an exclusive ownership of the block device and thus prevents modifying
the loop device under some exclusive owner which doesn't expect it.

Reported-by: syzbot+01ef7a8da81a975e1ccd@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: syzbot+01ef7a8da81a975e1ccd@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20250711163202.19623-2-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-11 20:39:45 -06:00
Ming Lei
4bb08cf974 loop: move lo_set_size() out of queue freeze
It isn't necessary to freeze queue for updating disk size given submit_bio()
doesn't grab queue usage counter for checking eod.

Also many driver won't freeze queue for calling set_capacity_and_notify().

Move lo_set_size() out of queue freeze for fixing many lockdep warning
report.

Link: https://lore.kernel.org/linux-block/67ea99e0.050a0220.3c3d88.0042.GAE@google.com/
Reported-by: syzbot+9dd7dbb1a4b915dee638@syzkaller.appspotmail.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250611084938.108829-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-11 06:41:45 -06:00
Ming Lei
39d86db34e loop: add file_start_write() and file_end_write()
file_start_write() and file_end_write() should be added around ->write_iter().

Recently we switch to ->write_iter() from vfs_iter_write(), and the
implied file_start_write() and file_end_write() are lost.

Also we never add them for dio code path, so add them back for covering
both.

Cc: Jeff Moyer <jmoyer@redhat.com>
Fixes: f2fed441c69b ("loop: stop using vfs_iter_{read,write} for buffered I/O")
Fixes: bc07c10a3603 ("block: loop: support DIO & AIO")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250527153405.837216-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-27 10:14:26 -06:00
Christoph Hellwig
355341e435 loop: don't require ->write_iter for writable files in loop_configure
Block devices can be opened read-write even if they can't be written to
for historic reasons.  Remove the check requiring file->f_op->write_iter
when the block devices was opened in loop_configure. The call to
loop_check_backing_file just below ensures the ->write_iter is present
for backing files opened for writing, which is the only check that is
actually needed.

Fixes: f5c84eff634b ("loop: Add sanity check for read/write_iter")
Reported-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250520135420.1177312-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20 09:16:23 -06:00
Lizhi Xu
f5c84eff63 loop: Add sanity check for read/write_iter
Some file systems do not support read_iter/write_iter, such as selinuxfs
in this issue.
So before calling them, first confirm that the interface is supported and
then call it.

It is releavant in that vfs_iter_read/write have the check, and removal
of their used caused szybot to be able to hit this issue.

Fixes: f2fed441c69b ("loop: stop using vfs_iter__{read,write} for buffered I/O")
Reported-by: syzbot+6af973a3b8dfd2faefdc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6af973a3b8dfd2faefdc
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250428143626.3318717-1-lizhi.xu@windriver.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05 07:18:05 -06:00
Christoph Hellwig
f2fed441c6 loop: stop using vfs_iter_{read,write} for buffered I/O
vfs_iter_{read,write} always perform direct I/O when the file has the
O_DIRECT flag set, which breaks disabling direct I/O using the
LOOP_SET_STATUS / LOOP_SET_STATUS64 ioctls.

This was recenly reported as a regression, but as far as I can tell
was only uncovered by better checking for block sizes and has been
around since the direct I/O support was added.

Fix this by using the existing aio code that calls the raw read/write
iter methods instead.  Note that despite the comments there is no need
for block drivers to ever call flush_dcache_page themselves, and the
call is a left-over from prehistoric times.

Fixes: ab1cb278bc70 ("block: loop: introduce ioctl command of LOOP_SET_DIRECT_IO")
Reported-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20250409130940.3685677-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15 18:59:15 -06:00
Thomas Weißschuh
0dba7a05b9 loop: LOOP_SET_FD: send uevents for partitions
Remove the suppression of the uevents before scanning for partitions.
The partitions inherit their suppression settings from their parent device,
which lead to the uevents being dropped.

This is similar to the same changes for LOOP_CONFIGURE done in
commit bb430b694226 ("loop: LOOP_CONFIGURE: send uevents for partitions").

Fixes: 498ef5c777d9 ("loop: suppress uevents while reconfiguring the device")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20250415-loop-uevent-changed-v3-1-60ff69ac6088@linutronix.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15 13:15:21 -06:00
Thomas Weißschuh
e7bc0010ce loop: properly send KOBJ_CHANGED uevent for disk device
The original commit message and the wording "uncork" in the code comment
indicate that it is expected that the suppressed event instances are
automatically sent after unsuppressing.
This is not the case, instead they are discarded.
In effect this means that no "changed" events are emitted on the device
itself by default.
While each discovered partition does trigger a changed event on the
device, devices without partitions don't have any event emitted.

This makes udev miss the device creation and prompted workarounds in
userspace. See the linked util-linux/losetup bug.

Explicitly emit the events and drop the confusingly worded comments.

Link: https://github.com/util-linux/util-linux/issues/2434
Fixes: 498ef5c777d9 ("loop: suppress uevents while reconfiguring the device")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://lore.kernel.org/r/20250415-loop-uevent-changed-v2-1-0c4e6a923b2a@linutronix.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15 08:58:42 -06:00
Yunlong Xing
1fdb8188c3 loop: aio inherit the ioprio of original request
Set cmd->iocb.ki_ioprio to the ioprio of loop device's request.
The purpose is to inherit the original request ioprio in the aio
flow.

Signed-off-by: Yunlong Xing <yunlong.xing@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250414030159.501180-1-yunlong.xing@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15 08:58:38 -06:00
Ming Lei
86947bdc28 loop: move vfs_fsync() out of loop_update_dio()
If vfs_flush() is called with queue frozen, the queue freeze lock may be
connected with FS internal lock, and lockdep warning can be triggered
because the queue freeze lock is connected with too many global or
sub-system locks.

Fix the warning by moving vfs_fsync() out of loop_update_dio():

- vfs_fsync() is only needed when switching to dio

- only loop_change_fd() and loop_configure() may switch from buffered
IO to direct IO, so call vfs_fsync() directly here. This way is safe
because either loop is in unbound, or new file isn't attached

- for the other two cases of set_status and set_block_size, direct IO
can only become off, so no need to call vfs_fsync()

Cc: Christoph Hellwig <hch@infradead.org>
Reported-by: Kun Hu <huk23@m.fudan.edu.cn>
Reported-by: Jiaji Qin <jjtan24@m.fudan.edu.cn>
Closes: https://lore.kernel.org/linux-block/359BC288-B0B1-4815-9F01-3A349B12E816@m.fudan.edu.cn/T/#u
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250318072955.3893805-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-18 08:03:30 -06:00
Zhu Yanjun
3aab938c93 loop: Remove struct loop_func_table
The struct is introduced in the commit 754d96798fab
("loop: remove loop.h"), but it is not used now.
So remove it.

No functional changes.

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250227163343.55952-1-yanjun.zhu@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-04 07:14:47 -07:00
Christoph Hellwig
f4774e92aa loop: take the file system minimum dio alignment into account
The loop driver currently uses the logical block size of the underlying
bdev as the lower bound of the loop device block size.  While this works
for many cases, it fails for file systems made up of multiple devices
with different logical block sizes (e.g. XFS with a RT device that has a
larger logical block size), or when the file systems doesn't support
direct I/O writes at the sector size granularity (e.g. because it does
out of place writes with a file system block size larger than the sector
size).

Fix this by querying the minimum direct I/O alignment from statx when
available.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250131120120.1315125-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-24 16:17:56 -07:00
Christoph Hellwig
f6f9e32fe1 loop: check in LO_FLAGS_DIRECT_IO in loop_default_blocksize
We can't go below the minimum direct I/O size no matter if direct I/O is
enabled by passing in an O_DIRECT file descriptor or due to the explicit
flag.  Now that LO_FLAGS_DIRECT_IO is set earlier after assigning a
backing file, loop_default_blocksize can check it instead of the
O_DIRECT flag to handle both conditions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250131120120.1315125-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-24 16:17:56 -07:00
Christoph Hellwig
984c2ab4b8 loop: set LO_FLAGS_DIRECT_IO in loop_assign_backing_file
Assigning LO_FLAGS_DIRECT_IO from the O_DIRECT flag is related to
assigning a new backing file.  Move the assignment in preparation
of using the flag more and earlier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250131120120.1315125-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-24 16:17:56 -07:00
Christoph Hellwig
d278164832 loop: factor out a loop_assign_backing_file helper
Split the code for setting up a backing file into a helper in preparation
of adding more code to this path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250131120120.1315125-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-24 16:17:52 -07:00
Zhaoyang Huang
02b3c61aab Revert "driver: block: release the lo_work_lock before queue_work"
This reverts commit ad934fc1784802fd1408224474b25ee5289fadfc.

loop_queue_work should be strictly serialized to loop_process_work since
the lo_worker could be freed without noticing new work has been queued
again.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Link: https://lore.kernel.org/r/20250218065835.19503-1-zhaoyang.huang@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-18 09:28:26 -07:00
Zhaoyang Huang
3bee991f2b loop: release the lo_work_lock before queue_work
queue_work could spin on wq->cpu_pwq->pool->lock which could lead to
concurrent loop_process_work failed on lo_work_lock contention and
increase the request latency. Remove this combination by moving the
lock release ahead of queue_work.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Link: https://lore.kernel.org/r/20250207091942.3966756-1-zhaoyang.huang@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-11 07:14:28 -07:00
Christoph Hellwig
1e1a9cecfa block: force noio scope in blk_mq_freeze_queue
When block drivers or the core block code perform allocations with a
frozen queue, this could try to recurse into the block device to
reclaim memory and deadlock.  Thus all allocations done by a process
that froze a queue need to be done without __GFP_IO and __GFP_FS.
Instead of tying to track all of them down, force a noio scope as
part of freezing the queue.

Note that nvme is a bit of a mess here due to the non-owner freezes,
and they will be addressed separately.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250131120352.1315351-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-31 07:20:08 -07:00
Christoph Hellwig
5aa21b0495 loop: don't clear LO_FLAGS_PARTSCAN on LOOP_SET_STATUS{,64}
LOOP_SET_STATUS{,64} can set a lot more flags than it is supposed to
clear (the LOOP_SET_STATUS_CLEARABLE_FLAGS vs
LOOP_SET_STATUS_SETTABLE_FLAGS defines should have been a hint..).

Fix this by only clearing the bits in LOOP_SET_STATUS_CLEARABLE_FLAGS.

Fixes: ae074d07a0e5 ("loop: move updating lo_flag s out of loop_set_status_from_info")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250127143045.538279-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-27 09:06:26 -07:00
Christoph Hellwig
afd69d5c4a loop: remove the use_dio field in struct loop_device
This field duplicate the LO_FLAGS_DIRECT_IO flag in lo_flags.  Remove it
to have a single source of truth about using direct I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10 07:31:50 -07:00
Christoph Hellwig
0cd719aa63 loop: don't freeze the queue in loop_update_dio
All callers of loop_update_dio except for loop_configure already have the
queue frozen, and loop_configure works on an unbound device.  Remove the
superfluous recursive freezing in loop_update_dio and add asserts for the
locking and freezing state instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10 07:31:50 -07:00
Christoph Hellwig
3a693110af loop: allow loop_set_status to re-enable direct I/O
Unlike all other calls of (__)loop_update_dio, loop_set_status never
looks at the O_DIRECT flag of the backing file, and thus doesn't
re-enable direct I/O on an O_DIRECT backing file if e.g. the new block
size would allow it.  Fix that and remove the need for the separate
__loop_update_dio flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10 07:31:50 -07:00
Christoph Hellwig
dc909525da loop: open code the direct I/O flag update in loop_set_dio
loop_set_dio is different from the other (__)loop_update_dio callers in
that it doesn't take any implicit conditions into account and wants to
update the direct I/O flag to the user passed in value and fail if that
can't be done.

Open code the logic here to prepare for simplifying the other direct I/O
flag updates and to make the error handling less convoluted.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10 07:31:50 -07:00
Christoph Hellwig
09ccf5549d loop: only write back pagecache when starting to to use direct I/O
There is no point in doing an fdatasync to write out pages when switching
away from direct I/O, as there won't be any.  The writeback is only
needed when switching to direct I/O, which would have to invalidate the
pagecache less efficiently from the I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10 07:31:50 -07:00
Christoph Hellwig
781fc49a0e loop: create a lo_can_use_dio helper
Factor out a part of __loop_update_dio in preparation for further
refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10 07:31:50 -07:00
Christoph Hellwig
4155adb01e loop: update commands in loop_set_status still referring to transfers
The concept of transfers is gone since commit 47e9624616c8 ("block:
remove support for cryptoloop and the xor transfer").

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10 07:31:50 -07:00
Christoph Hellwig
ae074d07a0 loop: move updating lo_flags out of loop_set_status_from_info
While loop_configure simplify assigns the flags passed in by userspace,
loop_set_status only looks at the two changeable flags, and currently
has to do a complicate dance to implement that.

Move assign lo->lo_flags out of loop_set_status_from_info into the
callers and thus drastically simplify the lo_flags handling in
loop_set_status.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10 07:31:50 -07:00
Christoph Hellwig
b03732a9c0 loop: fix queue freeze vs limits lock order
Match the locking order used by the core block code by only freezing
the queue after taking the limits lock using the
queue_limits_commit_update_frozen helper and document the callers that
do not freeze the queue at all.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20250110054726.1499538-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10 07:29:24 -07:00
Christoph Hellwig
b38c8be255 loop: refactor queue limits updates
Replace loop_reconfigure_limits with a slightly less encompassing
loop_update_limits that expects the caller to acquire and commit the
queue limits to prepare for sorting out the freeze vs limits lock
ordering.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Link: https://lore.kernel.org/r/20250110054726.1499538-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10 07:29:24 -07:00
Christoph Hellwig
cc76ace465 block: remove BLK_MQ_F_SHOULD_MERGE
BLK_MQ_F_SHOULD_MERGE is set for all tag_sets except those that purely
process passthrough commands (bsg-lib, ufs tmf, various nvme admin
queues) and thus don't even check the flag.  Remove it to simplify the
driver interface.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23 08:17:23 -07:00
OGAWA Hirofumi
b49125574c loop: Fix ABBA locking race
Current loop calls vfs_statfs() while holding the q->limits_lock. If
FS takes some locking in vfs_statfs callback, this may lead to ABBA
locking bug (at least, FAT fs has this issue actually).

So this patch calls vfs_statfs() outside q->limits_locks instead,
because looks like no reason to hold q->limits_locks while getting
discord configs.

Chain exists of:
  &sbi->fat_lock --> &q->q_usage_counter(io)#17 --> &q->limits_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&q->limits_lock);
                               lock(&q->q_usage_counter(io)#17);
                               lock(&q->limits_lock);
  lock(&sbi->fat_lock);

 *** DEADLOCK ***

Reported-by: syzbot+a5d8c609c02f508672cc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a5d8c609c02f508672cc
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-19 07:54:56 -07:00