Skip to content

Conversation

@xjia1
Copy link
Member

@xjia1 xjia1 commented Jan 29, 2016

Signed-off-by: Xiao Jia [email protected]

Review on Reviewable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good to do like same as other hijacked calls to handle host fds (below is the example of setsockopt():

    CHECK_HOST_CALL(getsockopt);
    if (!is_lklfd(fd))
        return host_setsockopt(fd, level, optname, optval, optlen);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of CHECK_HOST_CALL? Isn't the host_xxx already initialized in init2_host_##name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to make a PR for this, but I think that either the __attribute__((constructor))s or the CHECK_HOST_CALLs should be removed --- preferably the former because afaik constructor functions are run once per thread rather than once per process (so spawning a new worker thread in hijacked code will trigger a wave of dlsyms).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about concurrent executions of CHECK_HOST_CALLs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stfairy That should be safe, right? Worst comes to worst, two threads try to dlsym it at once and it gets set twice, but the setting is atomic so it doesn't break anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dlsym may not be atomic. It is MT safe according to POSIX, but that doesn't provide much guarantee: http://man7.org/linux/man-pages/man7/attributes.7.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw setting a non-atomic variable is also not guaranteed to be atomic on some architectures.

From my POV the CHECK_HOST_CALL should be changed to something like:

if (!acquire_read(host_calls...)) {
lock; local_var = dlsym...; unlock;
release_write(host_calls..., local_var);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So MT safe != reentrant? I figured dlsym was thread safe and then setting a pointer to the result of dlsym would be one movl (i.e. a word-sized write), I'm surprised that there are architectures that don't guarantee that to be atomic.

Depending on what we believe to be atomic I think we can get away with less locking, anyway, something like:

if(!host_foo) {
   acquire(host_foo_lock);
       if(!host_foo) {
         local_foo = dlsym('foo');
         host_foo = local_foo;
      }
  release(host_foo_lock);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but regardless --- eventfd will never need to touch the host, since it creates a new fd; there's no way for the user to ask it to hit the host (except perhaps by adding our own flag). This is a useful discussion but it doesn't seem to apply here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pscollins you're right. eventfd will never touch the host.

epoll_create also has the same issue: no way to distinguish the destination without additional information. I currently have no idea how to deal with it.

This is a useful discussion but it doesn't seem to apply here.

right . my bad.
my comment only applies to eventfd_read and wirte.

@thehajime
Copy link
Member

@stfairy @pscollins __attribute__((constructor)) is somehow useless.

we need to take care for applications who have own constructor calls hijacked funcs of lkl.
see related discussion here.

4c57f95#diff-6754865b3e4a1cc9d24ba88bd5fddfb1R465

@ghost
Copy link

ghost commented Feb 4, 2016

@stfairy can you please add the checks @thehajime mentioned, so I can merge this?

We can do the constructor cleanup in a separate patch.

@xjia1
Copy link
Member Author

xjia1 commented Feb 4, 2016

Pushed a new version. Please take a look.

ghost pushed a commit that referenced this pull request Feb 4, 2016
lkl: turn on CONFIG_EVENTFD
@ghost ghost merged commit 8dd38af into lkl:master Feb 4, 2016
octaviansoldea pushed a commit to octaviansoldea/lkl-linux that referenced this pull request Nov 10, 2017
The below commit added a call to ->destroy() on init failure, but multiq
still frees ->queues on error in init, but ->queues is also freed by
->destroy() thus we get double free and corrupted memory.

Very easy to reproduce (eth0 not multiqueue):
$ tc qdisc add dev eth0 root multiq
RTNETLINK answers: Operation not supported
$ ip l add dumdum type dummy
(crash)

Trace log:
[ 3929.467747] general protection fault: 0000 [libos-nuse#1] SMP
[ 3929.468083] Modules linked in:
[ 3929.468302] CPU: 3 PID: 967 Comm: ip Not tainted 4.13.0-rc6+ lkl#56
[ 3929.468625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 3929.469124] task: ffff88003716a700 task.stack: ffff88005872c000
[ 3929.469449] RIP: 0010:__kmalloc_track_caller+0x117/0x1be
[ 3929.469746] RSP: 0018:ffff88005872f6a0 EFLAGS: 00010246
[ 3929.470042] RAX: 00000000000002de RBX: 0000000058a59000 RCX: 00000000000002df
[ 3929.470406] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff821f7020
[ 3929.470770] RBP: ffff88005872f6e8 R08: 000000000001f010 R09: 0000000000000000
[ 3929.471133] R10: ffff88005872f730 R11: 0000000000008cdd R12: ff006d75646d7564
[ 3929.471496] R13: 00000000014000c0 R14: ffff88005b403c00 R15: ffff88005b403c00
[ 3929.471869] FS:  00007f0b70480740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
[ 3929.472286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3929.472677] CR2: 00007ffcee4f3000 CR3: 0000000059d45000 CR4: 00000000000406e0
[ 3929.473209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3929.474109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3929.474873] Call Trace:
[ 3929.475337]  ? kstrdup_const+0x23/0x25
[ 3929.475863]  kstrdup+0x2e/0x4b
[ 3929.476338]  kstrdup_const+0x23/0x25
[ 3929.478084]  __kernfs_new_node+0x28/0xbc
[ 3929.478478]  kernfs_new_node+0x35/0x55
[ 3929.478929]  kernfs_create_link+0x23/0x76
[ 3929.479478]  sysfs_do_create_link_sd.isra.2+0x85/0xd7
[ 3929.480096]  sysfs_create_link+0x33/0x35
[ 3929.480649]  device_add+0x200/0x589
[ 3929.481184]  netdev_register_kobject+0x7c/0x12f
[ 3929.481711]  register_netdevice+0x373/0x471
[ 3929.482174]  rtnl_newlink+0x614/0x729
[ 3929.482610]  ? rtnl_newlink+0x17f/0x729
[ 3929.483080]  rtnetlink_rcv_msg+0x188/0x197
[ 3929.483533]  ? rcu_read_unlock+0x3e/0x5f
[ 3929.483984]  ? rtnl_newlink+0x729/0x729
[ 3929.484420]  netlink_rcv_skb+0x6c/0xce
[ 3929.484858]  rtnetlink_rcv+0x23/0x2a
[ 3929.485291]  netlink_unicast+0x103/0x181
[ 3929.485735]  netlink_sendmsg+0x326/0x337
[ 3929.486181]  sock_sendmsg_nosec+0x14/0x3f
[ 3929.486614]  sock_sendmsg+0x29/0x2e
[ 3929.486973]  ___sys_sendmsg+0x209/0x28b
[ 3929.487340]  ? do_raw_spin_unlock+0xcd/0xf8
[ 3929.487719]  ? _raw_spin_unlock+0x27/0x31
[ 3929.488092]  ? __handle_mm_fault+0x651/0xdb1
[ 3929.488471]  ? check_chain_key+0xb0/0xfd
[ 3929.488847]  __sys_sendmsg+0x45/0x63
[ 3929.489206]  ? __sys_sendmsg+0x45/0x63
[ 3929.489576]  SyS_sendmsg+0x19/0x1b
[ 3929.489901]  entry_SYSCALL_64_fastpath+0x23/0xc2
[ 3929.490172] RIP: 0033:0x7f0b6fb93690
[ 3929.490423] RSP: 002b:00007ffcee4ed588 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 3929.490881] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f0b6fb93690
[ 3929.491198] RDX: 0000000000000000 RSI: 00007ffcee4ed5d0 RDI: 0000000000000003
[ 3929.491521] RBP: ffff88005872ff98 R08: 0000000000000001 R09: 0000000000000000
[ 3929.491801] R10: 00007ffcee4ed350 R11: 0000000000000246 R12: 0000000000000002
[ 3929.492075] R13: 000000000066f1a0 R14: 00007ffcee4f5680 R15: 0000000000000000
[ 3929.492352]  ? trace_hardirqs_off_caller+0xa7/0xcf
[ 3929.492590] Code: 8b 45 c0 48 8b 45 b8 74 17 48 8b 4d c8 83 ca ff 44
89 ee 4c 89 f7 e8 83 ca ff ff 49 89 c4 eb 49 49 63 56 20 48 8d 48 01 4d
8b 06 <49> 8b 1c 14 48 89 c2 4c 89 e0 65 49 0f c7 08 0f 94 c0 83 f0 01
[ 3929.493335] RIP: __kmalloc_track_caller+0x117/0x1be RSP: ffff88005872f6a0

Fixes: 87b60cf ("net_sched: fix error recovery at qdisc creation")
Fixes: f07d150 ("multiq: Further multiqueue cleanup")
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
thehajime referenced this pull request in thehajime/linux Aug 18, 2024
The following warning was triggered by missing srcu locks around
the storage key handling functions.

=============================
WARNING: suspicious RCU usage
4.12.0+ #56 Not tainted
-----------------------------
./include/linux/kvm_host.h:572 suspicious rcu_dereference_check() usage!
rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by live_migration/4936:
  #0:  (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
kvm_arch_vm_ioctl+0x6b8/0x22d0

 CPU: 8 PID: 4936 Comm: live_migration Not tainted 4.12.0+ #56
 Hardware name: IBM 2964 NC9 704 (LPAR)
 Call Trace:
 ([<000000000011378a>] show_stack+0xea/0xf0)
  [<000000000055cc4c>] dump_stack+0x94/0xd8
  [<000000000012ee70>] gfn_to_memslot+0x1a0/0x1b8
  [<0000000000130b76>] gfn_to_hva+0x2e/0x48
  [<0000000000141c3c>] kvm_arch_vm_ioctl+0x714/0x22d0
  [<000000000013306c>] kvm_vm_ioctl+0x11c/0x7b8
  [<000000000037e2c0>] do_vfs_ioctl+0xa8/0x6c8
  [<000000000037e984>] SyS_ioctl+0xa4/0xb8
  [<00000000008b20a4>] system_call+0xc4/0x27c
 1 lock held by live_migration/4936:
  #0:  (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
kvm_arch_vm_ioctl+0x6b8/0x22d0

Signed-off-by: Christian Borntraeger <[email protected]>
Reviewed-by: Pierre Morel<[email protected]>
thehajime referenced this pull request in thehajime/linux Oct 23, 2024
BUG: KASAN: slab-use-after-free in gsm_cleanup_mux+0x77b/0x7b0
drivers/tty/n_gsm.c:3160 [n_gsm]
Read of size 8 at addr ffff88815fe99c00 by task poc/3379
CPU: 0 UID: 0 PID: 3379 Comm: poc Not tainted 6.11.0+ #56
Hardware name: VMware, Inc. VMware Virtual Platform/440BX
Desktop Reference Platform, BIOS 6.00 11/12/2020
Call Trace:
 <TASK>
 gsm_cleanup_mux+0x77b/0x7b0 drivers/tty/n_gsm.c:3160 [n_gsm]
 __pfx_gsm_cleanup_mux+0x10/0x10 drivers/tty/n_gsm.c:3124 [n_gsm]
 __pfx_sched_clock_cpu+0x10/0x10 kernel/sched/clock.c:389
 update_load_avg+0x1c1/0x27b0 kernel/sched/fair.c:4500
 __pfx_min_vruntime_cb_rotate+0x10/0x10 kernel/sched/fair.c:846
 __rb_insert_augmented+0x492/0xbf0 lib/rbtree.c:161
 gsmld_ioctl+0x395/0x1450 drivers/tty/n_gsm.c:3408 [n_gsm]
 _raw_spin_lock_irqsave+0x92/0xf0 arch/x86/include/asm/atomic.h:107
 __pfx_gsmld_ioctl+0x10/0x10 drivers/tty/n_gsm.c:3822 [n_gsm]
 ktime_get+0x5e/0x140 kernel/time/timekeeping.c:195
 ldsem_down_read+0x94/0x4e0 arch/x86/include/asm/atomic64_64.h:79
 __pfx_ldsem_down_read+0x10/0x10 drivers/tty/tty_ldsem.c:338
 __pfx_do_vfs_ioctl+0x10/0x10 fs/ioctl.c:805
 tty_ioctl+0x643/0x1100 drivers/tty/tty_io.c:2818

Allocated by task 65:
 gsm_data_alloc.constprop.0+0x27/0x190 drivers/tty/n_gsm.c:926 [n_gsm]
 gsm_send+0x2c/0x580 drivers/tty/n_gsm.c:819 [n_gsm]
 gsm1_receive+0x547/0xad0 drivers/tty/n_gsm.c:3038 [n_gsm]
 gsmld_receive_buf+0x176/0x280 drivers/tty/n_gsm.c:3609 [n_gsm]
 tty_ldisc_receive_buf+0x101/0x1e0 drivers/tty/tty_buffer.c:391
 tty_port_default_receive_buf+0x61/0xa0 drivers/tty/tty_port.c:39
 flush_to_ldisc+0x1b0/0x750 drivers/tty/tty_buffer.c:445
 process_scheduled_works+0x2b0/0x10d0 kernel/workqueue.c:3229
 worker_thread+0x3dc/0x950 kernel/workqueue.c:3391
 kthread+0x2a3/0x370 kernel/kthread.c:389
 ret_from_fork+0x2d/0x70 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:257

Freed by task 3367:
 kfree+0x126/0x420 mm/slub.c:4580
 gsm_cleanup_mux+0x36c/0x7b0 drivers/tty/n_gsm.c:3160 [n_gsm]
 gsmld_ioctl+0x395/0x1450 drivers/tty/n_gsm.c:3408 [n_gsm]
 tty_ioctl+0x643/0x1100 drivers/tty/tty_io.c:2818

[Analysis]
gsm_msg on the tx_ctrl_list or tx_data_list of gsm_mux
can be freed by multi threads through ioctl,which leads
to the occurrence of uaf. Protect it by gsm tx lock.

Signed-off-by: Longlong Xia <[email protected]>
Cc: stable <[email protected]>
Suggested-by: Jiri Slaby <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
rodionov pushed a commit to rodionov/lkl that referenced this pull request Jan 2, 2025
…ress

If we specify a valid CQ ring address but an invalid SQ ring address,
we'll correctly spot this and free the allocated pages and clear them
to NULL. However, we don't clear the ring page count, and hence will
attempt to free the pages again. We've already cleared the address of
the page array when freeing them, but we don't check for that. This
causes the following crash:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Oops [lkl#1]
Modules linked in:
CPU: 0 PID: 20 Comm: kworker/u2:1 Not tainted 6.6.0-rc5-dirty lkl#56
Hardware name: ucbbar,riscvemu-bare (DT)
Workqueue: events_unbound io_ring_exit_work
epc : io_pages_free+0x2a/0x58
 ra : io_rings_free+0x3a/0x50
 epc : ffffffff808811a2 ra : ffffffff80881406 sp : ffff8f80000c3cd0
 status: 0000000200000121 badaddr: 0000000000000000 cause: 000000000000000d
 [<ffffffff808811a2>] io_pages_free+0x2a/0x58
 [<ffffffff80881406>] io_rings_free+0x3a/0x50
 [<ffffffff80882176>] io_ring_exit_work+0x37e/0x424
 [<ffffffff80027234>] process_one_work+0x10c/0x1f4
 [<ffffffff8002756e>] worker_thread+0x252/0x31c
 [<ffffffff8002f5e4>] kthread+0xc4/0xe0
 [<ffffffff8000332a>] ret_from_fork+0xa/0x1c

Check for a NULL array in io_pages_free(), but also clear the page counts
when we free them to be on the safer side.

Reported-by: [email protected]
Fixes: 03d89a2 ("io_uring: support for user allocated memory for rings/sqes")
Cc: [email protected]
Reviewed-by: Jeff Moyer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants