torvalds-linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-01-17 20:10:49 +00:00

Author	SHA1	Message	Date
Jaegeuk Kim	9d5c4f5c7a	f2fs: fix wrong block mapping for multi-devices Assuming the disk layout as below, disk0: 0 --- 0x00035abfff disk1: 0x00035ac000 --- 0x00037abfff disk2: 0x00037ac000 --- 0x00037ebfff and we want to read data from offset=13568 having len=128 across the block devices, we can illustrate the block addresses like below. 0 .. 0x00037ac000 ------------------- 0x00037ebfff, 0x00037ec000 ------- \| ^ ^ ^ \| fofs 0 13568 13568+128 \| ------------------------------------------------------ \| LBA 0x37e8aa9 0x37ebfa9 0x37ec029 --- map 0x3caa9 0x3ffa9 In this example, we should give the relative map of the target block device ranging from 0x3caa9 to 0x3ffa9 where the length should be calculated by 0x37ebfff + 1 - 0x37ebfa9. In the below equation, however, map->m_pblk was supposed to be the original address instead of the one from the target block address. - map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk); Cc: stable@vger.kernel.org Fixes: 71f2c8206202 ("f2fs: multidevice: support direct IO") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-10-13 23:55:44 +00:00
Mateusz Guzik	1ee889fdf4	f2fs: don't call iput() from f2fs_drop_inode() iput() calls the problematic routine, which does a ->i_count inc/dec cycle. Undoing it with iput() recurses into the problem. Note f2fs should not be playing games with the refcount to begin with, but that will be handled later. Right now solve the immediate regression. Fixes: bc986b1d756482a ("fs: stop accessing ->i_count directly in f2fs and gfs2") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509301450.138b448f-lkp@intel.com Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-10-13 23:55:44 +00:00
Linus Torvalds	86d563ac5f	f2fs-for-6.18-rc1 This release focuses on two primary updates for Android devices. First, it sets hash-based file name lookup as the default method to improve performance, while retaining an option to fall back to a linear lookup. Second, it resolves a persistent issue with the checkpoint=enable feature. The update further boosts performance by prefetching node blocks, merging FUA writes more efficiently, and optimizing block allocation policies. The release is rounded out by a comprehensive set of bug fixes that address memory safety, data integrity, and potential system hangs, along with minor documentation and code clean-ups. Enhancement: - add mount option and sysfs entry to tune the lookup mode - dump more information and add a timeout when enabling/disabling checkpoints - readahead node blocks in F2FS_GET_BLOCK_PRECACHE mode - merge FUA command with the existing writes - allocate HOT_DATA for IPU writes - Use allocate_section_policy to control write priority in multi-devices setups - add reserved nodes for privileged users - Add bggc_io_aware to adjust the priority of BG_GC when issuing IO - show the list of donation files Bug fix: - add missing dput() when printing the donation list - fix UAF issue in f2fs_merge_page_bio() - add sanity check on ei.len in __update_extent_tree_range() - fix infinite loop in __insert_extent_tree() - fix zero-sized extent for precache extents - fix to mitigate overhead of f2fs_zero_post_eof_page() - fix to avoid migrating empty section - fix to truncate first page in error path of f2fs_truncate() - fix to update map->m_next_extent correctly in f2fs_map_blocks() - fix wrong layout information on 16KB page - fix to do sanity check on node footer for non inode dnode - fix to avoid NULL pointer dereference in f2fs_check_quota_consistency() - fix to detect potential corrupted nid in free_nid_list - fix to clear unusable_cap for checkpoint=enable - fix to zero data after EOF for compressed file correctly - fix to avoid overflow while left shift operation - fix condition in __allow_reserved_blocks() -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmjgDGsACgkQQBSofoJI UNICWQ//VJPl1HPhdvWB1QhGIL/kt0/9yxhmgdz3NAeU399NfE9rTvMQC9gunLV0 EW0o0EUhI/nOM+m/bOKlqwvklYe6AcO4RglXDzE3eq13k3Z3g3phM+YUwXQib/m5 jRcDWnHwSd9YY5iTHcJlxsVlWBe8nEQXJlHjo6+Iq70bLfT50hTiqPbgYwjoBy+B ISolj70XIFXlPsciG9AW7VOGjJBPMsNsRqrd08neYxVycIhC8rcolTLm+8hUQkLc 9y/E+wYypYlaHrN8jBqYLNOXBffql+9qOFDKAXRwDvfVxt4nIlLUHzcLvtVLDGC3 hTMPIcKm8D3EwqxY4SjpQH66EkC63XrquFm9zveU4ckJhs4++Kb9uwuKUofNhCWj 8gw9OKafb8SSoBimjnCpQpXecvfwMbIoTUPJ5ytpNV+q27eBs+pe3lkDcA2O4Xdu SEMGeBlrxvOAgrRbnE65uIv/GjXcUK9LqXERuErjNs/YJOrj/ByDT2wJH5yqASwH 9csO/3fKc91EAGy+Kd49z3E8S2wuoI+22noir/AB7WKyRg5ZO7q3ZiZxqsrc1iJN Z/gh0QrWVQVVnn23z8VPArQX2fMZQ8iOMvcM54G+05ipj3mUBNT5eZlyEPb3FcUe o4XvTtKkcFhEIawf+WgED07PBpdzz5w1f8hx3EWCLda0LacHILQ= =cIQy -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This focuses on two primary updates for Android devices. First, it sets hash-based file name lookup as the default method to improve performance, while retaining an option to fall back to a linear lookup. Second, it resolves a persistent issue with the 'checkpoint=enable' feature. The update further boosts performance by prefetching node blocks, merging FUA writes more efficiently, and optimizing block allocation policies. The release is rounded out by a comprehensive set of bug fixes that address memory safety, data integrity, and potential system hangs, along with minor documentation and code clean-ups. Enhancements: - add mount option and sysfs entry to tune the lookup mode - dump more information and add a timeout when enabling/disabling checkpoints - readahead node blocks in F2FS_GET_BLOCK_PRECACHE mode - merge FUA command with the existing writes - allocate HOT_DATA for IPU writes - Use allocate_section_policy to control write priority in multi-devices setups - add reserved nodes for privileged users - Add bggc_io_aware to adjust the priority of BG_GC when issuing IO - show the list of donation files Bug fixes: - add missing dput() when printing the donation list - fix UAF issue in f2fs_merge_page_bio() - add sanity check on ei.len in __update_extent_tree_range() - fix infinite loop in __insert_extent_tree() - fix zero-sized extent for precache extents - fix to mitigate overhead of f2fs_zero_post_eof_page() - fix to avoid migrating empty section - fix to truncate first page in error path of f2fs_truncate() - fix to update map->m_next_extent correctly in f2fs_map_blocks() - fix wrong layout information on 16KB page - fix to do sanity check on node footer for non inode dnode - fix to avoid NULL pointer dereference in f2fs_check_quota_consistency() - fix to detect potential corrupted nid in free_nid_list - fix to clear unusable_cap for checkpoint=enable - fix to zero data after EOF for compressed file correctly - fix to avoid overflow while left shift operation - fix condition in __allow_reserved_blocks()" * tag 'f2fs-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (43 commits) f2fs: add missing dput() when printing the donation list f2fs: fix UAF issue in f2fs_merge_page_bio() f2fs: readahead node blocks in F2FS_GET_BLOCK_PRECACHE mode f2fs: add sanity check on ei.len in __update_extent_tree_range() f2fs: fix infinite loop in __insert_extent_tree() f2fs: fix zero-sized extent for precache extents f2fs: fix to mitigate overhead of f2fs_zero_post_eof_page() f2fs: fix to avoid migrating empty section f2fs: fix to truncate first page in error path of f2fs_truncate() f2fs: fix to update map->m_next_extent correctly in f2fs_map_blocks() f2fs: fix wrong layout information on 16KB page f2fs: clean up error handing of f2fs_submit_page_read() f2fs: avoid unnecessary folio_clear_uptodate() for cleanup f2fs: merge FUA command with the existing writes f2fs: allocate HOT_DATA for IPU writes f2fs: Use allocate_section_policy to control write priority in multi-devices setups Documentation: f2fs: Reword title Documentation: f2fs: Indent compression_mode option list Documentation: f2fs: Wrap snippets in literal code blocks Documentation: f2fs: Span write hint table section rows ...	2025-10-03 14:05:12 -07:00
Jaegeuk Kim	4e715744bf	f2fs: add missing dput() when printing the donation list We missed to call dput() on the grabbed dentry. Fixes: f1a49c1b112b ("f2fs: show the list of donation files") Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-10-03 03:16:10 +00:00
Linus Torvalds	56e7b31071	vfs-6.18-rc1.inode -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQQgAKCRCRxhvAZXjc oud9AQD5IG4sNnzCjsvcTDpQkbX5eZW+LFIiAiiN+nztZ+OcRQEAvC2N7YovfqM3 TWpVoNDKvEPdtDc9ttFMUKqBZYvxvgE= =sEaL -----END PGP SIGNATURE----- Merge tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "This contains a series I originally wrote and that Eric brought over the finish line. It moves out the i_crypt_info and i_verity_info pointers out of 'struct inode' and into the fs-specific part of the inode. So now the few filesytems that actually make use of this pay the price in their own private inode storage instead of forcing it upon every user of struct inode. The pointer for the crypt and verity info is simply found by storing an offset to its address in struct fsverity_operations and struct fscrypt_operations. This shrinks struct inode by 16 bytes. I hope to move a lot more out of it in the future so that struct inode becomes really just about very core stuff that we need, much like struct dentry and struct file, instead of the dumping ground it has become over the years. On top of this are a various changes associated with the ongoing inode lifetime handling rework that multiple people are pushing forward: - Stop accessing inode->i_count directly in f2fs and gfs2. They simply should use the __iget() and iput() helpers - Make the i_state flags an enum - Rework the iput() logic Currently, if we are the last iput, and we have the I_DIRTY_TIME bit set, we will grab a reference on the inode again and then mark it dirty and then redo the put. This is to make sure we delay the time update for as long as possible We can rework this logic to simply dec i_count if it is not 1, and if it is do the time update while still holding the i_count reference Then we can replace the atomic_dec_and_lock with locking the ->i_lock and doing atomic_dec_and_test, since we did the atomic_add_unless above - Add an icount_read() helper and convert everyone that accesses inode->i_count directly for this purpose to use the helper - Expand dump_inode() to dump more information about an inode helping in debugging - Add some might_sleep() annotations to iput() and associated helpers" * tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: add might_sleep() annotation to iput() and more fs: expand dump_inode() inode: fix whitespace issues fs: add an icount_read helper fs: rework iput logic fs: make the i_state flags an enum fs: stop accessing ->i_count directly in f2fs and gfs2 fsverity: check IS_VERITY() in fsverity_cleanup_inode() fs: remove inode::i_verity_info btrfs: move verity info pointer to fs-specific part of inode f2fs: move verity info pointer to fs-specific part of inode ext4: move verity info pointer to fs-specific part of inode fsverity: add support for info in fs-specific part of inode fs: remove inode::i_crypt_info ceph: move crypt info pointer to fs-specific part of inode ubifs: move crypt info pointer to fs-specific part of inode f2fs: move crypt info pointer to fs-specific part of inode ext4: move crypt info pointer to fs-specific part of inode fscrypt: add support for info in fs-specific part of inode fscrypt: replace raw loads of info pointer with helper function	2025-09-29 09:42:30 -07:00
Chao Yu	edf7e9040f	f2fs: fix UAF issue in f2fs_merge_page_bio() As JY reported in bugzilla [1], Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 pc : [0xffffffe51d249484] f2fs_is_cp_guaranteed+0x70/0x98 lr : [0xffffffe51d24adbc] f2fs_merge_page_bio+0x520/0x6d4 CPU: 3 UID: 0 PID: 6790 Comm: kworker/u16:3 Tainted: P B W OE 6.12.30-android16-5-maybe-dirty-4k #1 5f7701c9cbf727d1eebe77c89bbbeb3371e895e5 Tainted: [P]=PROPRIETARY_MODULE, [B]=BAD_PAGE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Workqueue: writeback wb_workfn (flush-254:49) Call trace: f2fs_is_cp_guaranteed+0x70/0x98 f2fs_inplace_write_data+0x174/0x2f4 f2fs_do_write_data_page+0x214/0x81c f2fs_write_single_data_page+0x28c/0x764 f2fs_write_data_pages+0x78c/0xce4 do_writepages+0xe8/0x2fc __writeback_single_inode+0x4c/0x4b4 writeback_sb_inodes+0x314/0x540 __writeback_inodes_wb+0xa4/0xf4 wb_writeback+0x160/0x448 wb_workfn+0x2f0/0x5dc process_scheduled_works+0x1c8/0x458 worker_thread+0x334/0x3f0 kthread+0x118/0x1ac ret_from_fork+0x10/0x20 [1] https://bugzilla.kernel.org/show_bug.cgi?id=220575 The panic was caused by UAF issue w/ below race condition: kworker - writepages - f2fs_write_cache_pages - f2fs_write_single_data_page - f2fs_do_write_data_page - f2fs_inplace_write_data - f2fs_merge_page_bio - add_inu_page : cache page #1 into bio & cache bio in io->bio_list - f2fs_write_single_data_page - f2fs_do_write_data_page - f2fs_inplace_write_data - f2fs_merge_page_bio - add_inu_page : cache page #2 into bio which is linked in io->bio_list write - f2fs_write_begin : write page #1 - f2fs_folio_wait_writeback - f2fs_submit_merged_ipu_write - f2fs_submit_write_bio : submit bio which inclues page #1 and #2 software IRQ - f2fs_write_end_io - fscrypt_free_bounce_page : freed bounced page which belongs to page #2 - inc_page_count( , WB_DATA_TYPE(data_folio), false) : data_folio points to fio->encrypted_page the bounced page can be freed before accessing it in f2fs_is_cp_guarantee() It can reproduce w/ below testcase: Run below script in shell #1: for ((i=1;i>0;i++)) do xfs_io -f /mnt/f2fs/enc/file \ -c "pwrite 0 32k" -c "fdatasync" Run below script in shell #2: for ((i=1;i>0;i++)) do xfs_io -f /mnt/f2fs/enc/file \ -c "pwrite 0 32k" -c "fdatasync" So, in f2fs_merge_page_bio(), let's avoid using fio->encrypted_page after commit page into internal ipu cache. Fixes: 0b20fcec8651 ("f2fs: cache global IPU bio") Reported-by: JY <JY.Ho@mediatek.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-28 20:07:53 +00:00
Yunji Kang	72bdca6231	f2fs: readahead node blocks in F2FS_GET_BLOCK_PRECACHE mode In f2fs_precache_extents(), For large files, It requires reading many node blocks. Instead of reading each node block with synchronous I/O, this patch applies readahead so that node blocks can be fetched in advance. It reduces the overhead of repeated sync reads and improves efficiency when precaching extents of large files. I created a file with the same largest extent and executed the test. For this experiment, I set the file's largest extent with an offset of 0 and a size of 1GB. I configured the remaining area with 100MB extents. 5GB test file: dd if=/dev/urandom of=test1 bs=1m count=5120 cp test1 test2 fsync test1 dd if=test1 of=test2 bs=1m skip=1024 seek=1024 count=100 conv=notrunc dd if=test1 of=test2 bs=1m skip=1224 seek=1224 count=100 conv=notrunc ... dd if=test1 of=test2 bs=1m skip=5024 seek=5024 count=100 conv=notrunc reboot I also created 10GB and 20GB files with large extents using the same method. ioctl(F2FS_IOC_PRECACHE_EXTENTS) test results are as follows: +-----------+---------+---------+-----------+ \| File size \| Before \| After \| Reduction \| +-----------+---------+---------+-----------+ \| 5GB \| 101.8ms \| 37.0ms \| 72.1% \| \| 10GB \| 222.9ms \| 56.0ms \| 74.9% \| \| 20GB \| 446.2ms \| 116.4ms \| 73.9% \| +-----------+---------+---------+-----------+ Tested on a 256GB mobile device with an SM8750 chipset. Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com> Signed-off-by: Yunji Kang <yunji0.kang@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-28 20:04:41 +00:00
Chao Yu	45b70947a4	f2fs: add sanity check on ei.len in __update_extent_tree_range() Add a sanity check in __update_extent_tree_range() to detect any zero-sized extent update. Signed-off-by: wangzijie <wangzijie1@honor.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-28 20:04:29 +00:00
wangzijie	23361bd549	f2fs: fix infinite loop in __insert_extent_tree() When we get wrong extent info data, and look up extent_node in rb tree, it will cause infinite loop (CONFIG_F2FS_CHECK_FS=n). Avoiding this by return NULL and print some kernel messages in that case. Signed-off-by: wangzijie <wangzijie1@honor.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-17 16:00:18 +00:00
wangzijie	8175c86439	f2fs: fix zero-sized extent for precache extents Script to reproduce: f2fs_io write 1 0 1881 rand dsync testfile f2fs_io fallocate 0 7708672 4096 testfile f2fs_io write 1 1881 1 rand buffered testfile fsync testfile umount mount f2fs_io precache_extents testfile When the data layout is something like this: dnode1: dnode2: [0] A [0] NEW_ADDR [1] A+1 [1] 0x0 ... [1016] A+1016 [1017] B (B!=A+1017) [1017] 0x0 During precache_extents, we map the last block(valid blkaddr) in dnode1: map->m_flags \|= F2FS_MAP_MAPPED; map->m_pblk = blkaddr(valid blkaddr); map->m_len = 1; then we goto next_dnode, meet the first block in dnode2(hole), goto sync_out: map->m_flags & F2FS_MAP_MAPPED == true, and we make zero-sized extent: map->m_len = 1 ofs = start_pgofs - map->m_lblk = 1882 - 1881 = 1 ei.fofs = start_pgofs = 1882 ei.len = map->m_len - ofs = 1 - 1 = 0 Rebased on patch[1], this patch can cover these cases to avoid zero-sized extent: A,B,C is valid blkaddr case1: dnode1: dnode2: [0] A [0] NEW_ADDR [1] A+1 [1] 0x0 ... .... [1016] A+1016 [1017] B (B!=A+1017) [1017] 0x0 case2: dnode1: dnode2: [0] A [0] C (C!=B+1) [1] A+1 [1] C+1 ... .... [1016] A+1016 [1017] B (B!=A+1017) [1017] 0x0 case3: dnode1: dnode2: [0] A [0] C (C!=B+2) [1] A+1 [1] C+1 ... .... [1015] A+1015 [1016] B (B!=A+1016) [1017] B+1 [1017] 0x0 [1] https://lore.kernel.org/linux-f2fs-devel/20250912081250.44383-1-chao@kernel.org/ Fixes: c4020b2da4c9 ("f2fs: support F2FS_IOC_PRECACHE_EXTENTS") Signed-off-by: wangzijie <wangzijie1@honor.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-17 16:00:10 +00:00
Chao Yu	c2f7c32b25	f2fs: fix to mitigate overhead of f2fs_zero_post_eof_page() f2fs_zero_post_eof_page() may cuase more overhead due to invalidate_lock and page lookup, change as below to mitigate its overhead: - check new_size before grabbing invalidate_lock - lookup and invalidate pages only in range of [old_size, new_size] Fixes: ba8dac350faf ("f2fs: fix to zero post-eof page") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-16 04:23:43 +00:00
Chao Yu	d625a2b08c	f2fs: fix to avoid migrating empty section It reports a bug from device w/ zufs: F2FS-fs (dm-64): Inconsistent segment (173822) type [1, 0] in SSA and SIT F2FS-fs (dm-64): Stopped filesystem due to reason: 4 Thread A Thread B - f2fs_expand_inode_data - f2fs_allocate_pinning_section - f2fs_gc_range - do_garbage_collect w/ segno #x - writepage - f2fs_allocate_data_block - new_curseg - allocate segno #x The root cause is: fallocate on pinning file may race w/ block allocation as above, result in do_garbage_collect() from fallocate() may migrate segment which is just allocated by a log, the log will update segment type in its in-memory structure, however GC will get segment type from on-disk SSA block, once segment type changes by log, we can detect such inconsistency, then shutdown filesystem. In this case, on-disk SSA shows type of segno #173822 is 1 (SUM_TYPE_NODE), however segno #173822 was just allocated as data type segment, so in-memory SIT shows type of segno #173822 is 0 (SUM_TYPE_DATA). Change as below to fix this issue: - check whether current section is empty before gc - add sanity checks on do_garbage_collect() to avoid any race case, result in migrating segment used by log. - btw, it fixes misc issue in printed logs: "SSA and SIT" -> "SIT and SSA". Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices") Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-16 02:12:11 +00:00
Chao Yu	9251a9e6e8	f2fs: fix to truncate first page in error path of f2fs_truncate() syzbot reports a bug as below: loop0: detected capacity change from 0 to 40427 F2FS-fs (loop0): Wrong SSA boundary, start(3584) end(4096) blocks(3072) F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock F2FS-fs (loop0): invalid crc value F2FS-fs (loop0): f2fs_convert_inline_folio: corrupted inline inode ino=3, i_addr[0]:0x1601, run fsck to fix. ------------[ cut here ]------------ kernel BUG at fs/inode.c:753! RIP: 0010:clear_inode+0x169/0x190 fs/inode.c:753 Call Trace: <TASK> evict+0x504/0x9c0 fs/inode.c:810 f2fs_fill_super+0x5612/0x6fa0 fs/f2fs/super.c:5047 get_tree_bdev_flags+0x40e/0x4d0 fs/super.c:1692 vfs_get_tree+0x8f/0x2b0 fs/super.c:1815 do_new_mount+0x2a2/0x9e0 fs/namespace.c:3808 do_mount fs/namespace.c:4136 [inline] __do_sys_mount fs/namespace.c:4347 [inline] __se_sys_mount+0x317/0x410 fs/namespace.c:4324 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f During f2fs_evict_inode(), clear_inode() detects that we missed to truncate all page cache before destorying inode, that is because in below path, we will create page #0 in cache, but missed to drop it in error path, let's fix it. - evict - f2fs_evict_inode - f2fs_truncate - f2fs_convert_inline_inode - f2fs_grab_cache_folio : create page #0 in cache - f2fs_convert_inline_folio : sanity check failed, return -EFSCORRUPTED - clear_inode detects that inode->i_data.nrpages is not zero Fixes: 92dffd01790a ("f2fs: convert inline_data when i_size becomes large") Reported-by: syzbot+90266696fe5daacebd35@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/68c09802.050a0220.3c6139.000e.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-16 02:11:25 +00:00
Chao Yu	869833f54e	f2fs: fix to update map->m_next_extent correctly in f2fs_map_blocks() Script to reproduce: mkfs.f2fs -O extra_attr,compression /dev/vdb -f mount /dev/vdb /mnt/f2fs -o mode=lfs,noextent_cache cd /mnt/f2fs f2fs_io write 1 0 1024 rand dsync testfile xfs_io testfile -c "fsync" f2fs_io write 1 0 512 rand dsync testfile xfs_io testfile -c "fsync" cd / umount /mnt/f2fs mount /dev/vdb /mnt/f2fs f2fs_io precache_extents /mnt/f2fs/testfile umount /mnt/f2fs Tracepoint output: f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 0, len = 512, blkaddr = 1055744, c_len = 0 f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 513, len = 351, blkaddr = 17921, c_len = 0 f2fs_update_read_extent_tree_range: dev = (253,16), ino = 4, pgofs = 864, len = 160, blkaddr = 18272, c_len = 0 During precache_extents, there is off-by-one issue, we should update map->m_next_extent to pgofs rather than pgofs + 1, if last blkaddr is valid and not contiguous to previous extent. Fixes: c4020b2da4c9 ("f2fs: support F2FS_IOC_PRECACHE_EXTENTS") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-16 02:10:54 +00:00
Mateusz Guzik	f99b391778	fs: rename generic_delete_inode() and generic_drop_inode() generic_delete_inode() is rather misleading for what the routine is doing. inode_just_drop() should be much clearer. The new naming is inconsistent with generic_drop_inode(), so rename that one as well with inode_ as the suffix. No functional changes. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-15 16:09:42 +02:00
Jaegeuk Kim	a33be64b98	f2fs: fix wrong layout information on 16KB page This patch fixes to support different block size. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-09 19:33:43 +00:00
Chao Yu	d0236266cb	f2fs: clean up error handing of f2fs_submit_page_read() Below two functions should never fail, clean up error handling in their callers: 1) f2fs_grab_read_bio() in f2fs_submit_page_read() 2) bio_add_folio() in f2fs_submit_page_read() Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-09 03:26:41 +00:00
Chao Yu	2f84e99d61	f2fs: avoid unnecessary folio_clear_uptodate() for cleanup In error path of __get_node_folio(), if the folio is not uptodate, let's avoid unnecessary folio_clear_uptodate() for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-09 03:26:41 +00:00
Jaegeuk Kim	44749759d5	f2fs: merge FUA command with the existing writes FUA writes can be merged to the existing write IOs. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-09 03:26:36 +00:00
Jaegeuk Kim	c872b6279c	f2fs: allocate HOT_DATA for IPU writes Let's split IPU writes in hot data area to improve the GC efficiency. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-09-02 20:16:04 +00:00
Liao Yuanhong	b639c20e74	f2fs: Use allocate_section_policy to control write priority in multi-devices setups Introduces two new sys nodes: allocate_section_hint and allocate_section_policy. The allocate_section_hint identifies the boundary between devices, measured in sections; it defaults to the end of the device for single storage setups, and the end of the first device for multiple storage setups. The allocate_section_policy determines the write strategy, with a default value of 0 for normal sequential write strategy. A value of 1 prioritizes writes before the allocate_section_hint, while a value of 2 prioritizes writes after it. This strategy addresses the issue where, despite F2FS supporting multiple devices, SOC vendors lack multi-devices support (currently only supporting zoned devices). As a workaround, multiple storage devices are mapped to a single dm device. Both this workaround and the F2FS multi-devices solution may require prioritizing writing to certain devices, such as a device with better performance or when switching is needed due to performance degradation near a device's end. For scenarios with more than two devices, sort them at mount time to utilize this feature. When using this feature with a single storage device, it has almost no impact. However, for configurations where multiple storage devices are mapped to the same dm device using F2FS, utilizing this feature can provide some optimization benefits. Therefore, I believe it should not be limited to just multi-devices usage. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-29 20:48:47 +00:00
Chao Yu	c18ecd99e0	f2fs: fix to do sanity check on node footer for non inode dnode As syzbot reported below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/file.c:1243! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5354 Comm: syz.0.0 Not tainted 6.17.0-rc1-syzkaller-00211-g90d970cade8e #0 PREEMPT(full) RIP: 0010:f2fs_truncate_hole+0x69e/0x6c0 fs/f2fs/file.c:1243 Call Trace: <TASK> f2fs_punch_hole+0x2db/0x330 fs/f2fs/file.c:1306 f2fs_fallocate+0x546/0x990 fs/f2fs/file.c:2018 vfs_fallocate+0x666/0x7e0 fs/open.c:342 ksys_fallocate fs/open.c:366 [inline] __do_sys_fallocate fs/open.c:371 [inline] __se_sys_fallocate fs/open.c:369 [inline] __x64_sys_fallocate+0xc0/0x110 fs/open.c:369 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f1e65f8ebe9 w/ a fuzzed image, f2fs may encounter panic due to it detects inconsistent truncation range in direct node in f2fs_truncate_hole(). The root cause is: a non-inode dnode may has the same footer.ino and footer.nid, so the dnode will be parsed as an inode, then ADDRS_PER_PAGE() may return wrong blkaddr count which may be 923 typically, by chance, dn.ofs_in_node is equal to 923, then count can be calculated to 0 in below statement, later it will trigger panic w/ f2fs_bug_on(, count == 0 \|\| ...). count = min(end_offset - dn.ofs_in_node, pg_end - pg_start); This patch introduces a new node_type NODE_TYPE_NON_INODE, then allowing passing the new_type to sanity_check_node_footer in f2fs_get_node_folio() to detect corruption that a non-inode dnode has the same footer.ino and footer.nid. Scripts to reproduce: mkfs.f2fs -f /dev/vdb mount /dev/vdb /mnt/f2fs touch /mnt/f2fs/foo touch /mnt/f2fs/bar dd if=/dev/zero of=/mnt/f2fs/foo bs=1M count=8 umount /mnt/f2fs inject.f2fs --node --mb i_nid --nid 4 --idx 0 --val 5 /dev/vdb mount /dev/vdb /mnt/f2fs xfs_io /mnt/f2fs/foo -c "fpunch 6984k 4k" Cc: stable@kernel.org Reported-by: syzbot+b9c7ffd609c3f09416ab@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/68a68e27.050a0220.1a3988.0002.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-28 00:09:01 +00:00
Josef Bacik	bc986b1d75	fs: stop accessing ->i_count directly in f2fs and gfs2 Instead of accessing ->i_count directly in these file systems, use the appropriate __iget and iput helpers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/b8e6eb8a3e690ce082828d3580415bf70dfa93aa.1755806649.git.josef@toxicpanda.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-08-27 13:12:48 +02:00
Eric Biggers	1f66cef4a9	f2fs: move verity info pointer to fs-specific part of inode Move the fsverity_info pointer into the filesystem-specific part of the inode by adding the field f2fs_inode_info::i_verity_info and configuring fsverity_operations::inode_info_offs accordingly. This is a prerequisite for a later commit that removes inode::i_verity_info, saving memory and improving cache efficiency on filesystems that don't support fsverity. Co-developed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/20250810075706.172910-11-ebiggers@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-08-21 13:58:08 +02:00
Eric Biggers	7afb71ee92	f2fs: move crypt info pointer to fs-specific part of inode Move the fscrypt_inode_info pointer into the filesystem-specific part of the inode by adding the field f2fs_inode_info::i_crypt_info and configuring fscrypt_operations::inode_info_offs accordingly. This is a prerequisite for a later commit that removes inode::i_crypt_info, saving memory and improving cache efficiency with filesystems that don't support fscrypt. Co-developed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/20250810075706.172910-5-ebiggers@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-08-21 13:58:07 +02:00
Jaegeuk Kim	f1a49c1b11	f2fs: show the list of donation files This patch introduces a proc entry to show the currently enrolled donation files. - "File path" indicates a file. - "Status" a. "Donated" means the file is registed in the donation list by fadvise(offset, length, POSIX_FADV_NOREUSE) b. "Evicted" means the donated pages were reclaimed. - "Offset (kb)" and "Length (kb) show the registered donation range. - "Cached pages (kb)" shows the amount of cached pages in the inode page cache. For example, # of files : 2 File path Status Donation offset (kb) Donation size (kb) File cached size (kb) --- /local/test2 Donated 0 1048576 2097152 /local/test Evicted 0 1048576 1048576 Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-20 17:45:00 +00:00
Chao Yu	ff11d8701b	f2fs: fix to allow removing qf_name The mount behavior changed after commit d18535132523 ("f2fs: separate the options parsing and options checking"), let's fix it. [Scripts] mkfs.f2fs -f /dev/vdb mount -t f2fs -o usrquota /dev/vdb /mnt/f2fs quotacheck -uc /mnt/f2fs umount /mnt/f2fs mount -t f2fs -o usrjquota=aquota.user,jqfmt=vfsold /dev/vdb /mnt/f2fs mount\|grep f2fs mount -t f2fs -o remount,usrjquota=,jqfmt=vfsold /dev/vdb /mnt/f2fs mount\|grep f2fs dmesg [Before commit] mount#1: ...,quota,jqfmt=vfsold,usrjquota=aquota.user,... mount#2: ...,quota,jqfmt=vfsold,... kmsg: no output [After commit] mount#1: ...,quota,jqfmt=vfsold,usrjquota=aquota.user,... mount#2: ...,quota,jqfmt=vfsold,usrjquota=aquota.user,... kmsg: "user quota file already specified" [After patch] mount#1: ...,quota,jqfmt=vfsold,usrjquota=aquota.user,... mount#2: ...,quota,jqfmt=vfsold,... kmsg: "remove qf_name aquota.user" Fixes: d18535132523 ("f2fs: separate the options parsing and options checking") Cc: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-20 17:45:00 +00:00
Chao Yu	930a9a6ee8	f2fs: fix to avoid NULL pointer dereference in f2fs_check_quota_consistency() syzbot reported a f2fs bug as below: Oops: gen[ 107.736417][ T5848] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 UID: 0 PID: 5848 Comm: syz-executor263 Tainted: G W 6.17.0-rc1-syzkaller-00014-g0e39a731820a #0 PREEMPT_{RT,(full)} RIP: 0010:strcmp+0x3c/0xc0 lib/string.c:284 Call Trace: <TASK> f2fs_check_quota_consistency fs/f2fs/super.c:1188 [inline] f2fs_check_opt_consistency+0x1378/0x2c10 fs/f2fs/super.c:1436 __f2fs_remount fs/f2fs/super.c:2653 [inline] f2fs_reconfigure+0x482/0x1770 fs/f2fs/super.c:5297 reconfigure_super+0x224/0x890 fs/super.c:1077 do_remount fs/namespace.c:3314 [inline] path_mount+0xd18/0xfe0 fs/namespace.c:4112 do_mount fs/namespace.c:4133 [inline] __do_sys_mount fs/namespace.c:4344 [inline] __se_sys_mount+0x317/0x410 fs/namespace.c:4321 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The direct reason is f2fs_check_quota_consistency() may suffer null-ptr-deref issue in strcmp(). The bug can be reproduced w/ below scripts: mkfs.f2fs -f /dev/vdb mount -t f2fs -o usrquota /dev/vdb /mnt/f2fs quotacheck -uc /mnt/f2fs/ umount /mnt/f2fs mount -t f2fs -o usrjquota=aquota.user,jqfmt=vfsold /dev/vdb /mnt/f2fs mount -t f2fs -o remount,usrjquota=,jqfmt=vfsold /dev/vdb /mnt/f2fs umount /mnt/f2fs So, before old_qname and new_qname comparison, we need to check whether they are all valid pointers, fix it. Reported-by: syzbot+d371efea57d5aeab877b@syzkaller.appspotmail.com Fixes: d18535132523 ("f2fs: separate the options parsing and options checking") Closes: https://lore.kernel.org/linux-f2fs-devel/689ff889.050a0220.e29e5.0037.GAE@google.com Cc: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-20 17:45:00 +00:00
Chao Yu	4978f0a5ee	f2fs: clean up w/ get_left_section_blocks() Introduce get_left_section_blocks() for cleanup, no logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-20 17:44:10 +00:00
Chunhai Guo	2141879369	f2fs: add reserved nodes for privileged users This patch allows privileged users to reserve nodes via the 'reserve_node' mount option, which is similar to the existing 'reserve_root' option. "-o reserve_node=<N>" means <N> nodes are reserved for privileged users only. Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-20 17:44:10 +00:00
Liao Yuanhong	00798cd24f	f2fs: Add bggc_io_aware to adjust the priority of BG_GC when issuing IO Currently, we have encountered some issues while testing ZUFS. In situations near the storage limit (e.g., 50GB remaining), and after simulating fragmentation by repeatedly writing and deleting data, we found that application installation and startup tests conducted after idling for a few minutes take significantly longer several times that of traditional UFS. Tracing the operations revealed that the majority of I/Os were issued by background GC, which blocks normal I/O operations. Under normal circumstances, ZUFS indeed requires more background GC and employs a more aggressive GC strategy. However, I aim to find a way to minimize the impact on regular I/O operations under these near-limit conditions. To address this, I have introduced a bggc_io_aware feature, which controls the prioritization of background GC in the presence of I/Os. This switch can be adjusted at the framework level to implement different strategies. If set to AWARE_ALL_IO, all background GC operations will be skipped during active I/O issuance. The default option remains consistent with the current strategy, ensuring no change in behavior. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-20 17:44:10 +00:00
Chao Yu	80b6d1d253	f2fs: dump more information for f2fs_{enable,disable}_checkpoint() Changes as below: - print more logs for f2fs_{enable,disable}_checkpoint() - account and dump time stats for f2fs_enable_checkpoint() Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-20 17:44:10 +00:00
Chao Yu	4bc3477796	f2fs: add timeout in f2fs_enable_checkpoint() During f2fs_enable_checkpoint() in remount(), if we flush a large amount of dirty pages into slow device, it may take long time which will block write IO, let's add a timeout machanism during dirty pages flush to avoid long time block in f2fs_enable_checkpoint(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-20 17:44:09 +00:00
Chao Yu	8fc6056dcf	f2fs: fix to detect potential corrupted nid in free_nid_list As reported, on-disk footer.ino and footer.nid is the same and out-of-range, let's add sanity check on f2fs_alloc_nid() to detect any potential corruption in free_nid_list. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-20 17:44:09 +00:00
Chao Yu	2e8f4c2b2b	f2fs: fix to clear unusable_cap for checkpoint=enable mount -t f2fs -o checkpoint=disable:10% /dev/vdb /mnt/f2fs/ mount -t f2fs -o remount,checkpoint=enable /dev/vdb /mnt/f2fs/ kernel log: F2FS-fs (vdb): Adjust unusable cap for checkpoint=disable = 204440 / 10% If we has assigned checkpoint=enable mount option, unusable_cap{,_perc} parameters of checkpoint=disable should be reset, then calculation and log print could be avoid in adjust_unusable_cap_perc(). Fixes: 1ae18f71cb52 ("f2fs: fix checkpoint=disable:%u%%") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-20 17:44:09 +00:00
Chao Yu	cbba5038ee	f2fs: clean up f2fs_truncate_partial_cluster() Clean up codes as below: - avoid unnecessary "err > 0" check condition - use "1 << log_cluster_size" instead of F2FS_I(inode)->i_cluster_size No logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-11 17:03:55 +00:00
Chao Yu	0b2cd50921	f2fs: fix to zero data after EOF for compressed file correctly generic/091 may fail, then it bisects to the bad commit ba8dac350faf ("f2fs: fix to zero post-eof page"). What will cause generic/091 to fail is something like below Testcase #1: 1. write 16k as compressed blocks 2. truncate to 12k 3. truncate to 20k 4. verify data in range of [12k, 16k], however data is not zero as expected Script of Testcase #1 mkfs.f2fs -f -O extra_attr,compression /dev/vdb mount -t f2fs -o compress_extension=* /dev/vdb /mnt/f2fs dd if=/dev/zero of=/mnt/f2fs/file bs=12k count=1 dd if=/dev/random of=/mnt/f2fs/file bs=4k count=1 seek=3 conv=notrunc sync truncate -s $((121024)) /mnt/f2fs/file truncate -s $((201024)) /mnt/f2fs/file dd if=/mnt/f2fs/file of=/mnt/f2fs/data bs=4k count=1 skip=3 od /mnt/f2fs/data umount /mnt/f2fs Analisys: in step 2), we will redirty all data pages from #0 to #3 in compressed cluster, and zero page #3, in step 3), f2fs_setattr() will call f2fs_zero_post_eof_page() to drop all page cache post eof, includeing dirtied page #3, in step 4) when we read data from page #3, it will decompressed cluster and extra random data to page #3, finally, we hit the non-zeroed data post eof. However, the commit ba8dac350faf ("f2fs: fix to zero post-eof page") just let the issue be reproduced easily, w/o the commit, it can reproduce this bug w/ below Testcase #2: 1. write 16k as compressed blocks 2. truncate to 8k 3. truncate to 12k 4. truncate to 20k 5. verify data in range of [12k, 16k], however data is not zero as expected Script of Testcase #2 mkfs.f2fs -f -O extra_attr,compression /dev/vdb mount -t f2fs -o compress_extension=* /dev/vdb /mnt/f2fs dd if=/dev/zero of=/mnt/f2fs/file bs=12k count=1 dd if=/dev/random of=/mnt/f2fs/file bs=4k count=1 seek=3 conv=notrunc sync truncate -s $((81024)) /mnt/f2fs/file truncate -s $((121024)) /mnt/f2fs/file truncate -s $((20*1024)) /mnt/f2fs/file echo 3 > /proc/sys/vm/drop_caches dd if=/mnt/f2fs/file of=/mnt/f2fs/data bs=4k count=1 skip=3 od /mnt/f2fs/data umount /mnt/f2fs Anlysis: in step 2), we will redirty all data pages from #0 to #3 in compressed cluster, and zero page #2 and #3, in step 3), we will truncate page #3 in page cache, in step 4), expand file size, in step 5), hit random data post eof w/ the same reason in Testcase #1. Root Cause: In f2fs_truncate_partial_cluster(), after we truncate partial data block on compressed cluster, all pages in cluster including the one post eof will be dirtied, after another tuncation, dirty page post eof will be dropped, however on-disk compressed cluster is still valid, it may include non-zero data post eof, result in exposing previous non-zero data post eof while reading. Fix: In f2fs_truncate_partial_cluster(), let change as below to fix: - call filemap_write_and_wait_range() to flush dirty page - call truncate_pagecache() to drop pages or zero partial page post eof - call f2fs_do_truncate_blocks() to truncate non-compress cluster to last valid block Fixes: 3265d3db1f16 ("f2fs: support partial truncation on compressed inode") Reported-by: Jan Prusakowski <jprusakowski@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-11 17:03:55 +00:00
Chao Yu	0fe1c6bec5	f2fs: fix to avoid overflow while left shift operation Should cast type of folio->index from pgoff_t to loff_t to avoid overflow while left shift operation. Fixes: 3265d3db1f16 ("f2fs: support partial truncation on compressed inode") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-11 17:03:54 +00:00
Daniel Lee	1bd119da0b	f2fs: add sysfs entry for effective lookup mode This commit introduces a new read-only sysfs entry at /sys/fs/f2fs/<device>/effective_lookup_mode. This entry displays the actual directory lookup mode F2FS is currently using. This is needed for debugging and verification, as the behavior is determined by both on-disk flags and mount options. Signed-off-by: Daniel Lee <chullee@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-11 17:03:53 +00:00
Daniel Lee	632f0b6c3e	f2fs: add lookup_mode mount option For casefolded directories, f2fs may fall back to a linear search if a hash-based lookup fails. This can cause severe performance regressions. While this behavior can be controlled by userspace tools (e.g. mkfs, fsck) by setting an on-disk flag, a kernel-level solution is needed to guarantee the lookup behavior regardless of the on-disk state. This commit introduces the 'lookup_mode' mount option to provide this kernel-side control. The option accepts three values: - perf: (Default) Enforces a hash-only lookup. The linear fallback is always disabled. - compat: Enables the linear search fallback for compatibility with directory entries from older kernels. - auto: Determines the mode based on the on-disk flag, preserving the userspace-based behavior. Signed-off-by: Daniel Lee <chullee@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-11 17:03:27 +00:00
mason.zhang	76bb6a72bc	f2fs: add error checking in do_write_page() Otherwise, the filesystem may unaware of potential file corruption. Signed-off-by: mason.zhang <masonzhang.linuxer@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-11 17:03:27 +00:00
Chao Yu	e75ce11790	f2fs: fix condition in __allow_reserved_blocks() If reserve_root mount option is not assigned, __allow_reserved_blocks() will return false, it's not correct, fix it. Fixes: 7e65be49ed94 ("f2fs: add reserved blocks for root user") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-11 17:03:27 +00:00
Chao Yu	57e74035ad	f2fs: add time stats of checkpoint for debug checkpoint was blocked for 18643 ms Step 0: 0 ms Step 1: 38 ms Step 2: 63 ms Step 3: 4 ms Step 4: 0 ms Step 5: 0 ms Step 6: 9 ms Step 7: 0 ms Step 8: 18277 ms Step 9: 249 ms Cc: Jan Prusakowski <jprusakowski@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-11 17:03:27 +00:00
Chao Yu	3fcf228b64	f2fs: dump more information when checkpoint was blocked for long time generic/299 w/ mode=lfs will cause long time latency of checkpoint, let's dump more information once we hit case. CP merge: - Queued : 0 - Issued : 1 - Total : 1 - Cur time : 9765(ms) - Peak time : 9765(ms) F2FS-fs (vdc): blocked on checkpoint for 9765 ms CPU: 11 UID: 0 PID: 237 Comm: kworker/u128:29 Tainted: G O 6.16.0-rc3+ #409 PREEMPT(voluntary) Tainted: [O]=OOT_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Workqueue: writeback wb_workfn (flush-253:32) Call Trace: <TASK> dump_stack_lvl+0x6e/0xa0 f2fs_issue_checkpoint+0x268/0x280 f2fs_write_node_pages+0x6a/0x2c0 do_writepages+0xd0/0x170 __writeback_single_inode+0x56/0x4c0 writeback_sb_inodes+0x22a/0x550 __writeback_inodes_wb+0x4c/0xf0 wb_writeback+0x300/0x400 wb_workfn+0x3de/0x500 process_one_work+0x230/0x5c0 worker_thread+0x1da/0x3d0 kthread+0x10d/0x250 ret_from_fork+0x164/0x190 ret_from_fork_asm+0x1a/0x30 Cc: Jan Prusakowski <jprusakowski@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-08-11 17:03:27 +00:00
Linus Torvalds	0974f486f3	f2fs-for-6.17-rc1 In this round, we've mainly updated three parts: 1) folio conversion by Matthew, 2) switch to a new mount API by Hongbo and Eric, and 3) several sysfs entries to tune GCs for ZUFS with finer granularity by Daeho. There are also patches to address bugs and issues in the existing features such as GCs, file pinning, write-while-dio-read, contingous block allocation, and memory access violations. Enhancement: - switch to new mount API and folio conversion - add sysfs nodes to controle F2FS GCs for ZUFS - improve performance on the nat entry cache - drop inode from the donation list when the last file is closed - avoid splitting bio when reading multiple pages Bug fix: - fix to trigger foreground gc during f2fs_map_blocks() in lfs mode - make sure zoned device GC to use FG_GC in shortage of free section - fix to calculate dirty data during has_not_enough_free_secs() - fix to update upper_p in __get_secs_required() correctly - wait for inflight dio completion, excluding pinned files read using dio - don't break allocation when crossing contiguous sections - vm_unmap_ram() may be called from an invalid context - fix to avoid out-of-boundary access in dnode page - fix to avoid panic in f2fs_evict_inode - fix to avoid UAF in f2fs_sync_inode_meta() - fix to use f2fs_is_valid_blkaddr_raw() in do_write_page() - fix UAF of f2fs_inode_info in f2fs_free_dic - fix to avoid invalid wait context issue - fix bio memleak when committing super block - handle nat.blkaddr corruption in f2fs_get_node_info() In addition, there are also clean-ups and minor bug fixes. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmiRJ+oACgkQQBSofoJI UNInMA//ekJJCf/0UyMYiPA9ag4KBb/VA0VaVJbw6BA/DoT5ZII6+lCIfllyELbk 78+ZppTrKq5OyImwiajcNijEwyDbh/asfUu+uNVsC85fjoboiBgDGVHbUEtSQ20Q 5JVXIL5PhDDVGdVNPh57ijYK/PxhzBPaFNuaGECYrqnWhkQEb//HmN20KRfzcOjZ 19QnOyEh0HED/izMjLhtZaCBQP53kfB7VjhTxMdY86l6IZ22gJHPRrnqBQHRTfyb iHcMJj4WRd7SpvbD/6bSdnUfpxOYPIm3GwQHdG46cHBEH1scnyQxx2OULlSLUbz6 yeiG36jcuQQWOev8ikBjNzfAozD0VvUAulPpfIbAoHc5jBYkA1sP3N7JOiao1H4Z FnPgw/FyIQE+d9NkbyeVW+6f9WfmKlJlIJ4zKoURbZvARYCZKmiPiI9vPWWe18qV nchWniQMJ45TYsABUGmGJwTEe/SFaOkgLpLjAlzCy7ZY9/6LKVUlnxR0E1ZDcjSp 5/E5fXQhds0Nn7F1jQXV3afxkECW+MNOLS/31ggL+ym6Pce3HPJCxBeRU4XaKrvA O0wP7n3g5jhVVWce0PBghF0mwTVVBwohTaUhL7lIIJMxKGkr4A8kH1j8tLLBdD3b hqcesDCtqqOZhogbwHXEgUDSikak4/1R1gDXnK0KhL1gg0Z6wR4= =XIPU -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "Three main updates: folio conversion by Matthew, switch to a new mount API by Hongbo and Eric, and several sysfs entries to tune GCs for ZUFS with finer granularity by Daeho. There are also patches to address bugs and issues in the existing features such as GCs, file pinning, write-while-dio-read, contingous block allocation, and memory access violations. Enhancements: - switch to new mount API and folio conversion - add sysfs nodes to controle F2FS GCs for ZUFS - improve performance on the nat entry cache - drop inode from the donation list when the last file is closed - avoid splitting bio when reading multiple pages Bug fixes: - fix to trigger foreground gc during f2fs_map_blocks() in lfs mode - make sure zoned device GC to use FG_GC in shortage of free section - fix to calculate dirty data during has_not_enough_free_secs() - fix to update upper_p in __get_secs_required() correctly - wait for inflight dio completion, excluding pinned files read using dio - don't break allocation when crossing contiguous sections - vm_unmap_ram() may be called from an invalid context - fix to avoid out-of-boundary access in dnode page - fix to avoid panic in f2fs_evict_inode - fix to avoid UAF in f2fs_sync_inode_meta() - fix to use f2fs_is_valid_blkaddr_raw() in do_write_page() - fix UAF of f2fs_inode_info in f2fs_free_dic - fix to avoid invalid wait context issue - fix bio memleak when committing super block - handle nat.blkaddr corruption in f2fs_get_node_info() In addition, there are also clean-ups and minor bug fixes" * tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (109 commits) f2fs: drop inode from the donation list when the last file is closed f2fs: add gc_boost_gc_greedy sysfs node f2fs: add gc_boost_gc_multiple sysfs node f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs mode f2fs: fix to calculate dirty data during has_not_enough_free_secs() f2fs: fix to update upper_p in __get_secs_required() correctly f2fs: directly add newly allocated pre-dirty nat entry to dirty set list f2fs: avoid redundant clean nat entry move in lru list f2fs: zone: wait for inflight dio completion, excluding pinned files read using dio f2fs: ignore valid ratio when free section count is low f2fs: don't break allocation when crossing contiguous sections f2fs: remove unnecessary tracepoint enabled check f2fs: merge the two conditions to avoid code duplication f2fs: vm_unmap_ram() may be called from an invalid context f2fs: fix to avoid out-of-boundary access in dnode page f2fs: switch to the new mount api f2fs: introduce fs_context_operation structure f2fs: separate the options parsing and options checking f2fs: Add f2fs_fs_context to record the mount options f2fs: Allow sbi to be NULL in f2fs_printk ...	2025-08-04 16:27:21 -07:00
Jaegeuk Kim	078cad8212	f2fs: drop inode from the donation list when the last file is closed Let's drop the inode from the donation list when there is no other open file. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-07-30 17:13:12 +00:00
Daeho Jeong	c8705cefce	f2fs: add gc_boost_gc_greedy sysfs node Add this to control GC algorithm for boost GC. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-07-29 15:02:36 +00:00
Daeho Jeong	1d4c5dbba1	f2fs: add gc_boost_gc_multiple sysfs node Add a sysfs knob to set a multiplier for the background GC migration window when F2FS Garbage Collection is boosted. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-07-29 15:02:33 +00:00
Linus Torvalds	57fcb7d930	vfs-6.17-rc1.fileattr -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCpgAKCRCRxhvAZXjc oqfFAQDcy3rROUF3W34KcSi7rDmaKVSX53d1tUoqH+1zDRpSlwEAriKDNC1ybudp YAnxVzkRHjHs1296WIuwKq5lfhJ60Q4= =geAl -----END PGP SIGNATURE----- Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get\|set] return -EOPNOTSUPP selinux: implement inode_file_[g\|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file	2025-07-28 15:24:14 -07:00
Linus Torvalds	7031769e10	vfs-6.17-rc1.mmap_prepare -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCgQAKCRCRxhvAZXjc os+nAP9LFHUwWO6EBzHJJGEVjJvvzsbzqeYrRFamYiMc5ulPJwD+KW4RIgJa/MWO pcYE40CacaekD8rFWwYUyszpgmv6ewc= =wCwp -----END PGP SIGNATURE----- Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mmap_prepare updates from Christian Brauner: "Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). This is preferred to the existing f_op->mmap() hook as it does require a VMA to be established yet, thus allowing the mmap logic to invoke this hook far, far earlier, prior to inserting a VMA into the virtual address space, or performing any other heavy handed operations. This allows for much simpler unwinding on error, and for there to be a single attempt at merging a VMA rather than having to possibly reattempt a merge based on potentially altered VMA state. Far more importantly, it prevents inappropriate manipulation of incompletely initialised VMA state, which is something that has been the cause of bugs and complexity in the past. The intent is to gradually deprecate f_op->mmap, and in that vein this series coverts the majority of file systems to using f_op->mmap_prepare. Prerequisite steps are taken - firstly ensuring all checks for mmap capabilities use the file_has_valid_mmap_hooks() helper rather than directly checking for f_op->mmap (which is now not a valid check) and secondly updating daxdev_mapping_supported() to not require a VMA parameter to allow ext4 and xfs to be converted. Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems") handles the nasty edge-case of nested file systems like overlayfs, which introduces a compatibility shim to allow f_op->mmap_prepare() to be invoked from an f_op->mmap() callback. This allows for nested filesystems to continue to function correctly with all file systems regardless of which callback is used. Once we finally convert all file systems, this shim can be removed. As a result, ecryptfs, fuse, and overlayfs remain unaltered so they can nest all other file systems. We additionally do not update resctl - as this requires an update to remap_pfn_range() (or an alternative to it) which we defer to a later series, equally we do not update cramfs which needs a mixed mapping insertion with the same issue, nor do we update procfs, hugetlbfs, syfs or kernfs all of which require VMAs for internal state and hooks. We shall return to all of these later" * tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: update porting, vfs documentation to describe mmap_prepare() fs: replace mmap hook with .mmap_prepare for simple mappings fs: convert most other generic_file_mmap() users to .mmap_prepare() fs: convert simple use of generic_file__mmap() to .mmap_prepare() mm/filemap: introduce generic_file_*_mmap_prepare() helpers fs/xfs: transition from deprecated .mmap hook to .mmap_prepare fs/ext4: transition from deprecated .mmap hook to .mmap_prepare fs/dax: make it possible to check dev dax support without a VMA fs: consistently use can_mmap_file() helper mm/nommu: use file_has_valid_mmap_hooks() helper mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare	2025-07-28 13:43:25 -07:00

1 2 3 4 5 ...

4691 Commits