Skip to content

Commit

Permalink
f2fs: Squash commits from f2fs-stable
Browse files Browse the repository at this point in the history
f2fs: report error of f2fs_create_root_stats

f2fs_create_root_stats can fail due to no memory, report it to user.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: commit atomic written page in LFS mode

We should always commit atomic written pages in LFS mode, otherwise data
will become corrupted if we encounter suddent power cut after partial
pages committed in IPU mode.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: support file defragment

This patch introduces a new ioctl F2FS_IOC_DEFRAGMENT to support file
defragment in a specified range of regular file.

This ioctl can be used in very limited workload: if user expects high
sequential read performance in randomly written file, this interface
can be used for defragmentation, after that file can be written as
continuous as possible in the device.

Meanwhile, it has side-effect, it will make holes in segments where
blocks located originally, so it's better to trigger GC to eliminate
fragment in segments.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix memory leak of kobject in error path of fill_super

f2fs_sb_info::s_kobj should be released in error path of fill_super,
otherwise it will lead to memory leak.

This bug was found by kmemleak:

dmesg:
kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800838dc358 (size 8):
  comm "mount", pid 4154, jiffies 4297482839 (age 1911.412s)
  hex dump (first 8 bytes):
    7a 72 61 6d 31 00 ff ff                          zram1...
  backtrace:
    [<ffffffff817a3668>] kmemleak_alloc+0x28/0x50
    [<ffffffff811dc99f>] __kmalloc_track_caller+0xef/0x1c0
    [<ffffffff8119d1c5>] kstrdup+0x45/0x80
    [<ffffffff8119d228>] kstrdup_const+0x28/0x30
    [<ffffffff813d2333>] kvasprintf_const+0x63/0xa0
    [<ffffffff813c59fc>] kobject_set_name_vargs+0x3c/0xa0
    [<ffffffff813c5a85>] kobject_add_varg+0x25/0x60
    [<ffffffff813c5b13>] kobject_init_and_add+0x53/0x70
    [<ffffffffa07ced19>] f2fs_fill_super+0x9d9/0xc40 [f2fs]
    [<ffffffff811ff0b2>] mount_bdev+0x192/0x1d0
    [<ffffffffa07cd3e5>] f2fs_mount+0x15/0x20 [f2fs]
    [<ffffffff811fec93>] mount_fs+0x43/0x170
    [<ffffffff81220256>] vfs_kern_mount+0x76/0x160
    [<ffffffff812211c8>] do_mount+0x258/0xdc0
    [<ffffffff81221dab>] SyS_mount+0x7b/0xc0
    [<ffffffff817aecd7>] entry_SYSCALL_64_fastpath+0x12/0x6f
...

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix to enable missing ioctl interfaces in ->compat_ioctl

In 64-bit kernel f2fs can supports 32-bit ioctl system call by identifying
encoded code which is converted from 32-bit one to 64-bit one in
->compat_ioctl.

When we introduced new interfaces in ->ioctl, we forgot to enable them in
->compat_ioctl, so enable them for fixing.

Signed-off-by: Chao Yu <[email protected]>
[Jaegeuk Kim: fix wrongly added spaces together]
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix to remove directory inode from dirty list

If last dirty dentry page was writebacked in reclaim path, we should
remove its directory inode from global dirty list to avoid unnecessary
flush for this inode when doing checkpoint.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: optimize __find_rev_next_bit

1. Skip __reverse_ulong if the bitmap is empty.
2. Reduce branches and codes.
According to my test, the performance of this new version is 5% higher on
an empty bitmap of 64bytes, and remains about the same in the worst scenario.

Signed-off-by: Fan li <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: clear page uptodate when dropping cache for atomic write

We should clear uptodate flag for all pages atomic written when we drop
them, otherwise before these cached pages were reclaimed or invalidated
eventually, we will see invalid data when hitting them again.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix to report error in f2fs_readdir

get_lock_data_page in f2fs_readdir can fail due to a lot of reasons (i.e.
no memory or IO error...), it's better to report this kind of error to
user rather than ignoring it.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: avoid deadlock in f2fs_shrink_extent_tree

While handling extent trees, we can enter into a reclaiming path anytime.
If it tries to release some extent nodes in the same extent tree,
write_lock(&et->lock) would be hanged.
In order to avoid the deadlock, we can just skip it.

Note that, if it is an unreferenced tree, we should get write_lock(&et->lock)
successfully and release all of therein nodes.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: do not recover from previous remained wrong dnodes

If device does not support discard, some obsolete dnodes can be recovered
by roll-forward. This patch enhances the recovery flow.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: clean up error path in f2fs_readdir

No logic changes, just clean up the error path.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

 Conflicts:
	fs/f2fs/dir.c

f2fs: clean up code with __has_cursum_space

Clean up codes in lookup_journal_in_cursum() with __has_cursum_space().

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: clean up argument of recover_data

In recover_data, value of argument 'type' will be CURSEG_WARM_NODE all
the time, remove it for cleanup.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: kill f2fs_drop_largest_extent

For direct IO, f2fs only allocate new address for the block which is not
exist in the disk before, its mapping info should not exist in extent
cache previously, so here we do not need to call f2fs_drop_largest_extent
to drop related cache.

Due to no more callers for f2fs_drop_largest_extent now, kill it.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: use sbi->blocks_per_seg to avoid unnecessary calculation

Use sbi->blocks_per_seg directly to avoid unnecessary calculation when using
1 << sbi->log_blocks_per_seg.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix to convert inline inode in ->setattr

In commit 3c4541452748 ("f2fs: do not trim preallocated blocks when
truncating after i_size"), in order to follow the regulation: "truncate(x)
where x > i_size will not trim all blocks past i_size." like other file
systems, in ->setattr we invoked truncate_setsize instead of f2fs_truncate
to avoid unneeded block trimming in such case, but forgot to call
f2fs_convert_inline_inode keep consistency of inline data conversion rule.

This patch fixes to convert inline data if necessary.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: enhance the bit operation for SSR

This patch enhances the existing bit operation when f2fs allocates SSR
blocks.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: refactor f2fs_commit_super

Previously, f2fs_commit_super hacks the bh->blocknr to write the broken
alternate superblock.
Instead of it, we should use the correct logic to retrieve its buffer head
with locking it appropriately.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: use lock_buffer when changing superblock

When modifying sb contents, we need to use lock its buffer.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: clean up node page updating flow

If read_node_page return LOCKED_PAGE, in its caller it's better a) skip
unneeded 'Update' flag and mapping info verfication; b) check nid value
stored in footer structure of node page.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

 Conflicts:
	fs/f2fs/node.c

f2fs: write only the pages in range during defragment

@LenD of filemap_write_and_wait_range is supposed to be a "offset
in bytes where the range ends (inclusive)". Subtract 1 to avoid
writing an extra page.

Signed-off-by: Fan li <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix to update variable correctly when skip a unmapped block

map.m_len should be reduced after skip a block

Signed-off-by: Fan li <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: do more integrity verification for superblock

Do more sanity check for superblock during ->mount.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: add symbol to avoid any confusion with tools

This patch adds MAX_VOLUME_NAME to sync with f2fs-tools.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: rename {add,remove,release}_dirty_inode to {add,remove,release}_ino_entry

remove_dirty_dir_inode will be renamed to remove_dirty_inode as a generic
function in following patch for removing directory/regular/symlink inode
in global dirty list.

Here rename ino management related functions for readability, also in
order to avoid name conflict.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: introduce dirty list node in inode info

Add a new dirt list node member in inode info for linking the inode to
global dirty list in superblock, instead of old implementation which
allocate slab cache memory as an entry to inode.

It avoids memory pressure due to slab cache allocation, and also makes
codes more clean.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: introduce __remove_dirty_inode

Introduce __remove_dirty_inode to clean up codes in remove_dirty_dir_inode.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix to reset variable correctlly

f2fs_map_blocks will set m_flags and m_len to 0, so we don't need to
reset m_flags ourselves, but have to reset m_len to correct value
before use it again.

Signed-off-by: Fan li <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: backup raw_super in sbi

f2fs use fields of f2fs_super_block struct directly in a grabbed buffer.

Once the buffer happen to be destroyed (e.g. through dd), it may bring
in unpredictable effect on f2fs.

This patch fixes to allocate additional buffer to store datas of super
block rather than using grabbed block buffer directly.

Signed-off-by: Yunlei He <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: don't grab super block buffer header all the time

We have already got one copy of valid super block in memory, do not grab
buffer header of super block all the time.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: relocate tracepoint of write_checkpoint

It needs to relocate its location to see exact trace logs.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: introduce __f2fs_commit_super

Introduce __f2fs_commit_super to include duplicated codes in
f2fs_commit_super for cleanup.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: record dirty status of regular/symlink inode

Maintain regular/symlink inode which has dirty pages in global dirty list
and record their total dirty pages count like the way of handling directory
inode.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: introduce new option for controlling data flush

Add a new option 'data_flush' to enable data flush functionality.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

 Conflicts:
	Documentation/filesystems/f2fs.txt

f2fs: stat dirty regular/symlink inodes

Add to stat dirty regular and symlink inode for showing in debugfs.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: support data flush in background

Previously, when finishing a checkpoint, we have persisted all fs meta
info including meta inode, node inode, dentry page of directory inode, so,
after a sudden power cut, f2fs can recover from last checkpoint with full
directory structure.

But during checkpoint, we didn't flush dirty pages of regular and symlink
inode, so such dirty datas still in memory will be lost in that moment of
power off.

In order to reduce the chance of lost data, this patch enables
f2fs_balance_fs_bg with the ability of data flushing. It will try to flush
user data before starting a checkpoint. So user's data written after last
checkpoint which may not be fsynced could be saved.

When we mount with data_flush option, after every period of cp_interval
(could be configured in sysfs: /sys/fs/f2fs/device/cp_interval) seconds
user data could be flushed into device once f2fs_balance_fs_bg was called
in kworker thread or gc thread.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: optimize the flow of f2fs_map_blocks

check map->m_len right after it changes to avoid excess call
to update dnode_of_data.

Signed-off-by: Fan li <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: add a tracepoint for sync_dirty_inodes

This patch adds a tracepoint for sync_dirty_inodes.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: use atomic variable for total_extent_tree

It would be better to use atomic variable for total_extent_tree.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: speed up shrinking extent tree entries

If there is no candidates for shrinking slab entries, we don't need to traverse
any trees at all.

Reviewed-by: Chao Yu <[email protected]>
[Jaegeuk Kim: fix missing initialization reported by Yunlei He]
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: check inline_data flag at converting time

We can check inode's inline_data flag  when calling to convert it.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: avoid unnecessary f2fs_gc for dir operations

The f2fs_balance_fs doesn't need to cover f2fs_new_inode or f2fs_find_entry
works.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

 Conflicts:
	fs/f2fs/namei.c

f2fs: record node block allocation in dnode_of_data

This patch introduces recording node block allocation in dnode_of_data.
This information helps to figure out whether any node block is allocated during
specific file operations.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: reduce covered region of sbi->cp_rwsem in f2fs_map_blocks

Only cover sbi->cp_rwsem on one dnode page's allocation and modification
instead of multiple's in f2fs_map_blocks, it can reduce the covered region
of cp_rwsem, then we can avoid potential long time delay for concurrent
checkpointer.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: call f2fs_balance_fs only when node was changed

If user tries to update or read data, we don't need to call f2fs_balance_fs
which triggers f2fs_gc, which increases unnecessary long latency.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

 Conflicts:
	fs/f2fs/file.c

f2fs: report error of do_checkpoint

do_checkpoint and write_checkpoint can fail due to reasons like triggering
in a readonly fs or encountering IO error of storage device.

So it's better to report such error info to user, let user be aware of
failure of doing checkpoint.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: don't convert inline inode when inline_data option is disable

If inline_data option is disable, when truncating an inline inode with
size which is not exceed maxinum inline size, we should not convert
inline inode to regular one to avoid the overhead of synchronizing
conversion.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: introduce prepare_write_begin to clean up

This patch adds prepare_write_begin to clean f2fs_write_begin.
The major role of this function is to convert any inline_data and allocate
or find block address.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: return early when trying to read null nid

If get_node_page() gets zero nid, we can return early without getting a wrong
page. For example, get_dnode_of_data() can try to do that.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: avoid f2fs_lock_op in f2fs_write_begin

If f2fs_write_begin is to update data, we can bypass calling f2fs_lock_op() in
order to avoid the checkpoint latency in the write syscall.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: declare static function

The __f2fs_commit_super is static.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: add missing f2fs_balance_fs in __recover_dot_dentries

__recover_do_dentries will try to grab free space in storage, so fix to
add missing f2fs_balance_fs here.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: let user being aware of IO error

Sometimes we keep dumb when IO error occur in lower layer device, so user
will not receive any error return value for some operation, but actually,
the operation did not succeed.

This sould be avoided, so this patch reports such kind of error to user.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

 Conflicts:
	fs/f2fs/data.c

f2fs: fix bugs and simplify codes of f2fs_fiemap

fix bugs:
1. len could be updated incorrectly when start+len is beyond isize.
2. If there is a hole consisting of more than two blocks, it could
   fail to add FIEMAP_EXTENT_LAST flag for the last extent.
3. If there is an extent beyond isize, when we search extents in a range
   that ends at isize, it will also return the extent beyond isize,
   which is outside the range.

Signed-off-by: Fan li <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: add a max block check for get_data_block_bmap

This patch adds a max block check for get_data_block_bmap.

Trinity test program will send a block number as parameter into
ioctl_fibmap, which will be used in get_node_path(), when the block
number large than f2fs max blocks, it will trigger kernel bug.

Signed-off-by: Yunlei He <[email protected]>
Signed-off-by: Xue Liu <[email protected]>
[Jaegeuk Kim: fix missing condition, pointed by Chao Yu]
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: clean up f2fs_ioc_write_checkpoint

Use f2fs_sync_fs to clean up codes in f2fs_ioc_write_checkpoint.

Signed-off-by: Chao Yu <[email protected]>
[Jaegeuk Kim: remove unused err variable]
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: early check broken symlink length in the encrypted case

If link is broken, its len is zero, and we don't need to move forward.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: use i_size_read to get i_size

We need to use i_size_read() to get inode->i_size.

Signed-off-by: Jaegeuk Kim <[email protected]>

 Conflicts:
	fs/f2fs/data.c

f2fs: load largest extent all the time

Otherwise, we can get mismatched largest extent information.

One example is:
1. mount f2fs w/ extent_cache
2. make a small extent
3. umount
4. mount f2fs w/o extent_cache
5. update the largest extent
6. umount
7. mount f2fs w/ extent_cache
8. get the old extent made by #2

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix to skip recovering dot dentries in a readonly fs

If filesystem is readonly, leave user message info instead of recovering
inline dot inode.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix f2fs_ioc_abort_volatile_write

There are two rules to handle aborting volatile or atomic writes.

1. drop atomic writes
 - we don't need to keep any stale db data.

2. write journal data
 - we should keep the journal data with fsync for db recovery.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: remove f2fs_bug_on in terms of max_depth

There is no report on this bug_on case, but if malicious attacker changed this
field intentionally, we can just reset it as a MAX value.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: write pending bios when cp_error is set

When testing ioc_shutdown, put_super is able to be hanged by waiting for
writebacking pages as follows.

INFO: task umount:2723 blocked for more than 120 seconds.
      Tainted: G           O    4.4.0-rc3+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount          D ffff88000859f9d8     0  2723   2110 0x00000000
 ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540
 ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff
 ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c
Call Trace:
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff8182310c>] schedule+0x3c/0x90
 [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430
 [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0
 [<ffffffff8111614d>] ? ktime_get+0x7d/0x140
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30
 [<ffffffff8111617c>] ? ktime_get+0xac/0x140
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110
 [<ffffffff81823a25>] bit_wait_io+0x35/0x50
 [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90
 [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0
 [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840
 [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60
 [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs]
 [<ffffffff812639bc>] evict+0xbc/0x190
 [<ffffffff81263d19>] iput+0x229/0x2c0
 [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs]
 [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0
 [<ffffffff812478f7>] kill_block_super+0x27/0x70
 [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs]
 [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70
 [<ffffffff81247f4c>] deactivate_super+0x5c/0x60
 [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90
 [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20
 [<ffffffff810ac463>] task_work_run+0x73/0xa0
 [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0
 [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0
 [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: use IPU for fdatasync

This patch fixes missing IPU condition when fdatasync is called.
With this patch, fdatasync is able to avoid additional node writes for recovery.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: monitor zombie_tree count

This patch adds an entry to show the number of zombie extent_tree.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: introduce zombie list for fast shrinking extent trees

This patch removes refcount, and instead, adds zombie_list to shrink directly
without radix tree traverse.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink

Add missed CONFIG_F2FS_FS_XATTR for encrypted symlink inode in order
to avoid unneeded registry of ->{get,set,remove}xattr.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: introduce max_file_blocks in sbi

Introduce max_file_blocks in sbi to store max block index of file in f2fs,
it could be used to avoid unneeded calculation of max block index in
runtime.

Signed-off-by: Chao Yu <[email protected]>
[Jaegeuk Kim: fix overflow of sbi->max_file_blocks]
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: cover more area with nat_tree_lock

There was a subtle bug on nat cache management which incurs wrong nid allocation
or wrong block addresses when try_to_free_nats is triggered heavily.
This patch enlarges the previous coverage of nat_tree_lock to avoid data race.

Signed-off-by: Jaegeuk Kim <[email protected]>

Revert "f2fs: check the node block address of newly allocated nid"

Original issue is fixed by:

  f2fs: cover more area with nat_tree_lock

This reverts commit 24928634f81b1592e83b37dcd89ed45c28f12feb.

f2fs: read isize while holding i_mutex in fiemap

make sure the isize we read doesn't change during the process.

Signed-off-by: Fan li <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: check node id earily when readaheading node page

Add node id check in ra_node_page and get_node_page_ra like get_node_page.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: introduce __get_node_page to reuse common code

There are duplicated code in between get_node_page and get_node_page_ra,
introduce __get_node_page to includes common parts of these two, and
export get_node_page and get_node_page_ra by reusing __get_node_page.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

 Conflicts:
	fs/f2fs/node.c

f2fs: check the page status filled from disk

After reading a page, we need to check whether there is any error.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: avoid unnecessary f2fs_balance_fs calls

Only when node page is newly dirtied, it needs to check whether we need to do
f2fs_gc.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: remove redundant calls

This patch removes redundant calls.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: clean up f2fs_balance_fs

This patch adds one parameter to clean up all the callers of f2fs_balance_fs.

Signed-off-by: Jaegeuk Kim <[email protected]>

 Conflicts:
	fs/f2fs/namei.c

f2fs: recognize encrypted data in f2fs_fiemap

This patch fixes to teach f2fs_fiemap to recognize encrypted data.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: use atomic type for node count in extent tree

1. rename field in struct extent_tree from count to node_cnt for
   readability.
2. alter to use atomic type for node_cnt.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: skip releasing nodes in chindless extent tree

If there are no nodes in extent tree, let's skip releasing step to avoid
any overhead of grabbing/releasing extent tree lock.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: introduce time and interval facility

This patch adds time and interval arrays to store some timing variables.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: detect idle time depending on user behavior

This patch adds last time that user requested filesystem operations.
This information is used to detect whether system is idle or not later.

Signed-off-by: Jaegeuk Kim <[email protected]>

 Conflicts:
	Documentation/ABI/testing/sysfs-fs-f2fs

Change-Id: I978b33092a8554f5eb3dd1044a44ca6fd476331e

f2fs: monitor the number of background checkpoint

This patch adds to show the number of background checkpoint.

Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix wrong memory condition check

This patch fixes wrong decision for avaliable_free_memory.
The return valus is already set as false, so we should consider true condition
below only.

Signed-off-by: Jaegeuk Kim <[email protected]>

 Conflicts:
	fs/f2fs/node.c

f2fs: should unset atomic flag after successful commit

If there is an error during commit, we should keep the flag in order to
abort it.

Signed-off-by: Jaegeuk Kim <[email protected]>
  • Loading branch information
chaseyu authored and kerneltoast committed Mar 22, 2016
1 parent 7f81b45 commit 2f30023
Show file tree
Hide file tree
Showing 22 changed files with 1,253 additions and 723 deletions.
4 changes: 3 additions & 1 deletion Documentation/filesystems/f2fs.txt
Original file line number Diff line number Diff line change
Expand Up @@ -145,10 +145,12 @@ extent_cache Enable an extent cache based on rb-tree, it can cache
as many as extent which map between contiguous logical
address and physical address per inode, resulting in
increasing the cache hit ratio. Set by default.
noextent_cache Diable an extent cache based on rb-tree explicitly, see
noextent_cache Disable an extent cache based on rb-tree explicitly, see
the above extent_cache mount option.
noinline_data Disable the inline data feature, inline data feature is
enabled by default.
data_flush Enable data flushing before checkpoint in order to
persist data of regular and symlink.

================================================================================
DEBUGFS ENTRIES
Expand Down
177 changes: 87 additions & 90 deletions fs/f2fs/checkpoint.c
Original file line number Diff line number Diff line change
Expand Up @@ -238,7 +238,7 @@ static int f2fs_write_meta_page(struct page *page,
dec_page_count(sbi, F2FS_DIRTY_META);
unlock_page(page);

if (wbc->for_reclaim)
if (wbc->for_reclaim || unlikely(f2fs_cp_error(sbi)))
f2fs_submit_merged_bio(sbi, META, WRITE);
return 0;

Expand Down Expand Up @@ -411,13 +411,13 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
spin_unlock(&im->ino_lock);
}

void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
void add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
{
/* add new dirty ino entry into list */
__add_ino_entry(sbi, ino, type);
}

void remove_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
void remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
{
/* remove dirty ino entry from list */
__remove_ino_entry(sbi, ino, type);
Expand All @@ -435,7 +435,7 @@ bool exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode)
return e ? true : false;
}

void release_dirty_inode(struct f2fs_sb_info *sbi)
void release_ino_entry(struct f2fs_sb_info *sbi)
{
struct ino_entry *e, *tmp;
int i;
Expand Down Expand Up @@ -723,118 +723,109 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi)
return -EINVAL;
}

static int __add_dirty_inode(struct inode *inode, struct inode_entry *new)
static void __add_dirty_inode(struct inode *inode, enum inode_type type)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct f2fs_inode_info *fi = F2FS_I(inode);
int flag = (type == DIR_INODE) ? FI_DIRTY_DIR : FI_DIRTY_FILE;

if (is_inode_flag_set(F2FS_I(inode), FI_DIRTY_DIR))
return -EEXIST;
if (is_inode_flag_set(fi, flag))
return;

set_inode_flag(F2FS_I(inode), FI_DIRTY_DIR);
F2FS_I(inode)->dirty_dir = new;
list_add_tail(&new->list, &sbi->dir_inode_list);
stat_inc_dirty_dir(sbi);
return 0;
set_inode_flag(fi, flag);
list_add_tail(&fi->dirty_list, &sbi->inode_list[type]);
stat_inc_dirty_inode(sbi, type);
}

static void __remove_dirty_inode(struct inode *inode, enum inode_type type)
{
struct f2fs_inode_info *fi = F2FS_I(inode);
int flag = (type == DIR_INODE) ? FI_DIRTY_DIR : FI_DIRTY_FILE;

if (get_dirty_pages(inode) ||
!is_inode_flag_set(F2FS_I(inode), flag))
return;

list_del_init(&fi->dirty_list);
clear_inode_flag(fi, flag);
stat_dec_dirty_inode(F2FS_I_SB(inode), type);
}

void update_dirty_page(struct inode *inode, struct page *page)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct inode_entry *new;
int ret = 0;
enum inode_type type = S_ISDIR(inode->i_mode) ? DIR_INODE : FILE_INODE;

if (!S_ISDIR(inode->i_mode) && !S_ISREG(inode->i_mode) &&
!S_ISLNK(inode->i_mode))
return;

if (!S_ISDIR(inode->i_mode)) {
inode_inc_dirty_pages(inode);
goto out;
}

new = f2fs_kmem_cache_alloc(inode_entry_slab, GFP_NOFS);
new->inode = inode;
INIT_LIST_HEAD(&new->list);

spin_lock(&sbi->dir_inode_lock);
ret = __add_dirty_inode(inode, new);
spin_lock(&sbi->inode_lock[type]);
__add_dirty_inode(inode, type);
inode_inc_dirty_pages(inode);
spin_unlock(&sbi->dir_inode_lock);
spin_unlock(&sbi->inode_lock[type]);

if (ret)
kmem_cache_free(inode_entry_slab, new);
out:
SetPagePrivate(page);
f2fs_trace_pid(page);
}

void add_dirty_dir_inode(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct inode_entry *new =
f2fs_kmem_cache_alloc(inode_entry_slab, GFP_NOFS);
int ret = 0;

new->inode = inode;
INIT_LIST_HEAD(&new->list);

spin_lock(&sbi->dir_inode_lock);
ret = __add_dirty_inode(inode, new);
spin_unlock(&sbi->dir_inode_lock);

if (ret)
kmem_cache_free(inode_entry_slab, new);
spin_lock(&sbi->inode_lock[DIR_INODE]);
__add_dirty_inode(inode, DIR_INODE);
spin_unlock(&sbi->inode_lock[DIR_INODE]);
}

void remove_dirty_dir_inode(struct inode *inode)
void remove_dirty_inode(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct inode_entry *entry;

if (!S_ISDIR(inode->i_mode))
return;
struct f2fs_inode_info *fi = F2FS_I(inode);
enum inode_type type = S_ISDIR(inode->i_mode) ? DIR_INODE : FILE_INODE;

spin_lock(&sbi->dir_inode_lock);
if (get_dirty_pages(inode) ||
!is_inode_flag_set(F2FS_I(inode), FI_DIRTY_DIR)) {
spin_unlock(&sbi->dir_inode_lock);
if (!S_ISDIR(inode->i_mode) && !S_ISREG(inode->i_mode) &&
!S_ISLNK(inode->i_mode))
return;
}

entry = F2FS_I(inode)->dirty_dir;
list_del(&entry->list);
F2FS_I(inode)->dirty_dir = NULL;
clear_inode_flag(F2FS_I(inode), FI_DIRTY_DIR);
stat_dec_dirty_dir(sbi);
spin_unlock(&sbi->dir_inode_lock);
kmem_cache_free(inode_entry_slab, entry);
spin_lock(&sbi->inode_lock[type]);
__remove_dirty_inode(inode, type);
spin_unlock(&sbi->inode_lock[type]);

/* Only from the recovery routine */
if (is_inode_flag_set(F2FS_I(inode), FI_DELAY_IPUT)) {
clear_inode_flag(F2FS_I(inode), FI_DELAY_IPUT);
if (is_inode_flag_set(fi, FI_DELAY_IPUT)) {
clear_inode_flag(fi, FI_DELAY_IPUT);
iput(inode);
}
}

void sync_dirty_dir_inodes(struct f2fs_sb_info *sbi)
int sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type)
{
struct list_head *head;
struct inode_entry *entry;
struct inode *inode;
struct f2fs_inode_info *fi;
bool is_dir = (type == DIR_INODE);

trace_f2fs_sync_dirty_inodes_enter(sbi->sb, is_dir,
get_pages(sbi, is_dir ?
F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA));
retry:
if (unlikely(f2fs_cp_error(sbi)))
return;
return -EIO;

spin_lock(&sbi->dir_inode_lock);
spin_lock(&sbi->inode_lock[type]);

head = &sbi->dir_inode_list;
head = &sbi->inode_list[type];
if (list_empty(head)) {
spin_unlock(&sbi->dir_inode_lock);
return;
spin_unlock(&sbi->inode_lock[type]);
trace_f2fs_sync_dirty_inodes_exit(sbi->sb, is_dir,
get_pages(sbi, is_dir ?
F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA));
return 0;
}
entry = list_entry(head->next, struct inode_entry, list);
inode = igrab(entry->inode);
spin_unlock(&sbi->dir_inode_lock);
fi = list_entry(head->next, struct f2fs_inode_info, dirty_list);
inode = igrab(&fi->vfs_inode);
spin_unlock(&sbi->inode_lock[type]);
if (inode) {
filemap_fdatawrite(inode->i_mapping);
iput(inode);
Expand Down Expand Up @@ -869,11 +860,9 @@ static int block_operations(struct f2fs_sb_info *sbi)
/* write all the dirty dentry pages */
if (get_pages(sbi, F2FS_DIRTY_DENTS)) {
f2fs_unlock_all(sbi);
sync_dirty_dir_inodes(sbi);
if (unlikely(f2fs_cp_error(sbi))) {
err = -EIO;
err = sync_dirty_inodes(sbi, DIR_INODE);
if (err)
goto out;
}
goto retry_flush_dents;
}

Expand All @@ -886,10 +875,9 @@ static int block_operations(struct f2fs_sb_info *sbi)

if (get_pages(sbi, F2FS_DIRTY_NODES)) {
up_write(&sbi->node_write);
sync_node_pages(sbi, 0, &wbc);
if (unlikely(f2fs_cp_error(sbi))) {
err = sync_node_pages(sbi, 0, &wbc);
if (err) {
f2fs_unlock_all(sbi);
err = -EIO;
goto out;
}
goto retry_flush_nodes;
Expand Down Expand Up @@ -920,7 +908,7 @@ static void wait_on_all_pages_writeback(struct f2fs_sb_info *sbi)
finish_wait(&sbi->cp_wait, &wait);
}

static void do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
{
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
Expand All @@ -946,7 +934,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
while (get_pages(sbi, F2FS_DIRTY_META)) {
sync_meta_pages(sbi, META, LONG_MAX);
if (unlikely(f2fs_cp_error(sbi)))
return;
return -EIO;
}

next_free_nid(sbi, &last_nid);
Expand Down Expand Up @@ -1031,7 +1019,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
/* need to wait for end_io results */
wait_on_all_pages_writeback(sbi);
if (unlikely(f2fs_cp_error(sbi)))
return;
return -EIO;

/* write out checkpoint buffer at block 0 */
update_meta_page(sbi, ckpt, start_blk++);
Expand Down Expand Up @@ -1059,7 +1047,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
wait_on_all_pages_writeback(sbi);

if (unlikely(f2fs_cp_error(sbi)))
return;
return -EIO;

filemap_fdatawait_range(NODE_MAPPING(sbi), 0, LONG_MAX);
filemap_fdatawait_range(META_MAPPING(sbi), 0, LONG_MAX);
Expand All @@ -1082,37 +1070,45 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
invalidate_mapping_pages(META_MAPPING(sbi), discard_blk,
discard_blk);

release_dirty_inode(sbi);
release_ino_entry(sbi);

if (unlikely(f2fs_cp_error(sbi)))
return;
return -EIO;

clear_prefree_segments(sbi, cpc);
clear_sbi_flag(sbi, SBI_IS_DIRTY);

return 0;
}

/*
* We guarantee that this checkpoint procedure will not fail.
*/
void write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
{
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
unsigned long long ckpt_ver;
int err = 0;

mutex_lock(&sbi->cp_mutex);

if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
(cpc->reason == CP_FASTBOOT || cpc->reason == CP_SYNC ||
(cpc->reason == CP_DISCARD && !sbi->discard_blks)))
goto out;
if (unlikely(f2fs_cp_error(sbi)))
if (unlikely(f2fs_cp_error(sbi))) {
err = -EIO;
goto out;
if (f2fs_readonly(sbi->sb))
}
if (f2fs_readonly(sbi->sb)) {
err = -EROFS;
goto out;
}

trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "start block_ops");

if (block_operations(sbi))
err = block_operations(sbi);
if (err)
goto out;

trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish block_ops");
Expand All @@ -1134,7 +1130,7 @@ void write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
flush_sit_entries(sbi, cpc);

/* unlock all the fs_lock[] in do_checkpoint() */
do_checkpoint(sbi, cpc);
err = do_checkpoint(sbi, cpc);

unblock_operations(sbi);
stat_inc_cp_count(sbi->stat_info);
Expand All @@ -1144,10 +1140,11 @@ void write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
"checkpoint: version = %llx", ckpt_ver);

/* do checkpoint periodically */
sbi->cp_expires = round_jiffies_up(jiffies + HZ * sbi->cp_interval);
f2fs_update_time(sbi, CP_TIME);
trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
out:
mutex_unlock(&sbi->cp_mutex);
trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
return err;
}

void init_ino_entry_info(struct f2fs_sb_info *sbi)
Expand Down
Loading

0 comments on commit 2f30023

Please sign in to comment.