Skip to content

Conversation

@thehajime
Copy link
Member

@thehajime thehajime commented Feb 9, 2017

This patch changes the API of lkl_start_kernel(). The memsize parameter
is now able to specify by string, instead of integer value, by using
built-in memparse() function.

Additionally hijack lib has new env variable, LKL_HIJACK_MEMSIZE to
change from the default 64MB memory size.

Signed-off-by: Hajime Tazaki [email protected]


This change is Reviewable

lkl_ops = ops;
mem_size = _mem_size;
mem_size = memparse(_mem_size, NULL);
if (mem_size < 16 * 1024 * 1024) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why adding this checking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, this can be removed.

in fact less than 5M memsize makes my lkl apps out of memory during an execution so there is a limit around there. but I'm okay to remove this check.

Copy link
Member

@tavip tavip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could remove the parameter altogether and use mem= from command line to parse this?

@thehajime
Copy link
Member Author

I wonder if we could remove the parameter altogether and use mem= from command line to parse this?

could you provide me an example of command line ?

./bin/lkl-hijack.sh mem=100M sleep 1

or like this ?

LKL_HIJACK_PARAMS="mem=100M;loglevel=9;eth0=tap,tap0" ./bin/lkl-hijack.sh sleep 1

@tavip
Copy link
Member

tavip commented Feb 9, 2017

@thehajime Sorry for not being clear. What I meant was that we can remove the mem parameter in lkl_start_kernel, since we can pass it in the cmd_line parameter.

And then in hijack we could use it like:

LKL_HIJACK_PARAMS="mem=100M loglevel=9 "

just like regular Linux command line arguments.

@thehajime
Copy link
Member Author

And then in hijack we could use it like:

LKL_HIJACK_PARAMS="mem=100M loglevel=9 "

just like regular Linux command line arguments.

This was my original idea but I couldn't find built-in mem parameter that I can use for LKL: that's why I introduced a specific env variable for memsize.

do we already have mem command line parameter ? or will we introduce new param handler ?

@tavip
Copy link
Member

tavip commented Feb 9, 2017

This was my original idea but I couldn't find built-in mem parameter that I can use for LKL: that's why I introduced a specific env variable for memsize.

do we already have mem command line parameter ? or will we introduce new param handler ?

We don't have one for LKL but we could introduce it. If we do that, we should also call parse_early_param() in setup_arch().

@thehajime
Copy link
Member Author

We don't have one for LKL but we could introduce it. If we do that, we should also call parse_early_param() in setup_arch().

I see.

I will come up with this only for the memsize parameter. others (LKL_HIJACK_abc) can be applied with this new way but I may skip those for later PRs.

@thehajime thehajime force-pushed the feature-conf-memsize branch from 52fd164 to aa8c8f6 Compare February 10, 2017 04:22
@thehajime thehajime changed the title lkl: change memsize arg of lkl_start_kernel() lkl: add LKL_HIJACK_BOOT_CMDLINE option to configure boot cmdline Feb 10, 2017
@thehajime
Copy link
Member Author

added LKL_HIJACK_BOOT_CMDLINE and some test. two incompatibilities are introduced, described in the commit message.

I hope to have one more review.

@thehajime thehajime force-pushed the feature-conf-memsize branch from aa8c8f6 to 0e00d99 Compare February 10, 2017 06:35
This patch changes the API of lkl_start_kernel().  The memsize parameter
is now able to specify via this env variable with "mem=100M" string.

Also this patch includes 2 API incompatibility.

1. lkl_start_kernel() now doesn't have memsize argument
2. LKL_HIJACK_NET_IP=dhcp is now obsolted - use ip=dhpc in
LKL_HIJACK_BOOT_CMDLINE instead.

Signed-off-by: Hajime Tazaki <[email protected]>
@thehajime thehajime force-pushed the feature-conf-memsize branch from 0e00d99 to bc3fe96 Compare February 10, 2017 08:46
@thehajime thehajime merged commit 4a42464 into lkl:master Feb 10, 2017
@thehajime thehajime deleted the feature-conf-memsize branch February 10, 2017 09:00
octaviansoldea pushed a commit to octaviansoldea/lkl-linux that referenced this pull request Nov 10, 2017
Eric report a oops when booting the system after applying
the commit a99b646 ("PCI: Disable PCIe Relaxed..."):

[    4.241029] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[    4.247001] IP: pci_find_pcie_root_port+0x62/0x80
[    4.253011] PGD 0
[    4.253011] P4D 0
[    4.253011]
[    4.258013] Oops: 0000 [libos-nuse#1] SMP DEBUG_PAGEALLOC
[    4.262015] Modules linked in:
[    4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV lkl#316
[    4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
[    4.279002] task: ffffa2ee38cfa040 task.stack: ffffa51ec0004000
[    4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
[    4.290012] RSP: 0000:ffffa51ec0007ab8 EFLAGS: 00010246
[    4.295003] RAX: 0000000000000000 RBX: ffffa2ee36bae000 RCX: 0000000000000006
[    4.303002] RDX: 000000000000081c RSI: ffffa2ee38cfa8c8 RDI: ffffa2ee36bae000
[    4.310013] RBP: ffffa51ec0007b58 R08: 0000000000000001 R09: 0000000000000000
[    4.317001] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa51ec0007ad0
[    4.324005] R13: ffffa2ee36bae098 R14: 0000000000000002 R15: ffffa2ee37204818
[    4.331002] FS:  0000000000000000(0000) GS:ffffa2ee3fcc0000(0000) knlGS:0000000000000000
[    4.339002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.345001] CR2: 0000000000000050 CR3: 000000401000f000 CR4: 00000000001406e0
[    4.351002] Call Trace:
[    4.354012]  ? pci_configure_device+0x19f/0x570
[    4.359002]  ? pci_conf1_read+0xb8/0xf0
[    4.363002]  ? raw_pci_read+0x23/0x40
[    4.366011]  ? pci_read+0x2c/0x30
[    4.370014]  ? pci_read_config_word+0x67/0x70
[    4.374012]  pci_device_add+0x28/0x230
[    4.378012]  ? pci_vpd_f0_read+0x50/0x80
[    4.382014]  pci_scan_single_device+0x96/0xc0
[    4.386012]  pci_scan_slot+0x79/0xf0
[    4.389001]  pci_scan_child_bus+0x31/0x180
[    4.394014]  acpi_pci_root_create+0x1c6/0x240
[    4.398013]  pci_acpi_scan_root+0x15f/0x1b0
[    4.402012]  acpi_pci_root_add+0x2e6/0x400
[    4.406012]  ? acpi_evaluate_integer+0x37/0x60
[    4.411002]  acpi_bus_attach+0xdf/0x200
[    4.415002]  acpi_bus_attach+0x6a/0x200
[    4.418014]  acpi_bus_attach+0x6a/0x200
[    4.422013]  acpi_bus_scan+0x38/0x70
[    4.426011]  acpi_scan_init+0x10c/0x271
[    4.429001]  acpi_init+0x2fa/0x348
[    4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
[    4.437001]  do_one_initcall+0x43/0x169
[    4.441001]  kernel_init_freeable+0x1d0/0x258
[    4.445003]  ? rest_init+0xe0/0xe0
[    4.449001]  kernel_init+0xe/0x150

====================== cut here =============================

It looks like the pci_find_pcie_root_port() was trying to
find the Root Port for the PCI device which is the Root
Port already, it will return NULL and trigger the problem,
so check the highest_pcie_bridge to fix thie problem.

Fixes: a99b646 ("PCI: Disable PCIe Relaxed Ordering if unsupported")
Fixes: c56d445 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Ding Tianhong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
thehajime pushed a commit to libos-nuse/lkl-linux that referenced this pull request Mar 19, 2018
With latest kernel I get below bug while testing kdump:

  BUG: unable to handle kernel paging request at ffffea00034b1040
  IP: zero_resv_unavail+0xbd/0x126
  PGD 37b98067 P4D 37b98067 PUD 37b97067 PMD 0
  Oops: 0002 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.15.0-rc1+ lkl#316
  Hardware name: LENOVO 20ARS1BJ02/20ARS1BJ02, BIOS GJET92WW (2.42 ) 03/03/2017
  task: ffffffff81a0e4c0 task.stack: ffffffff81a00000
  RIP: 0010:zero_resv_unavail+0xbd/0x126
  RSP: 0000:ffffffff81a03d88 EFLAGS: 00010006
  RAX: 0000000000000000 RBX: ffffea00034b1040 RCX: 0000000000000010
  RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffffea00034b1040
  RBP: 00000000000d2c41 R08: 00000000000000c0 R09: 0000000000000a0d
  R10: 0000000000000002 R11: 0000000000007f01 R12: ffffffff81a03d90
  R13: ffffea0000000000 R14: 0000000000000063 R15: 0000000000000062
  FS:  0000000000000000(0000) GS:ffffffff81c73000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffea00034b1040 CR3: 0000000037609000 CR4: 00000000000606b0
  Call Trace:
   ? free_area_init_nodes+0x640/0x664
   ? zone_sizes_init+0x58/0x72
   ? setup_arch+0xb50/0xc6c
   ? start_kernel+0x64/0x43d
   ? secondary_startup_64+0xa5/0xb0
  Code: c1 e8 0c 48 39 d8 76 27 48 89 de 48 c1 e3 06 48 c7 c7 7a 87 79 81 e8 b0 c0 3e ff 4c 01 eb b9 10 00 00 00 31 c0 48 89 df 49 ff c6 <f3> ab eb bc 6a 00 49 c7 c0 f0 93 d1 81 31 d2 83 ce ff 41 54 49
  RIP: zero_resv_unavail+0xbd/0x126 RSP: ffffffff81a03d88
  CR2: ffffea00034b1040
  ---[ end trace f5ba9e8f73c7ee26 ]---

This is introduced by commit a4a3ede ("mm: zero reserved and
unavailable struct pages").

The reason is some efi reserved boot ranges is not reported in E820 ram.
In my case it is a bgrt buffer:

  efi: mem00: [Boot Data          |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x00000000d2c41000-0x00000000d2c85fff] (0MB)

Use "add_efi_memmap" can workaround the problem with another fix:

  http://lkml.kernel.org/r/[email protected]

In zero_resv_unavail it would be better to check pfn_valid first before
zero the page struct.  This fixes the problem and potential other
similar problems.  Also as Pavel Tatashin suggested checks pfn_valid at
the beginning of the section.

The range is backed by real memory.  The memory range is efi "Boot
Service Data", that means after ExitBootServices() these ranges can be
used as system ram.  But some of them need to be reserved, for example
the bgrt image address in an acpi table, if the image memory is freed
then kexec reboot will fail because kexec inherit same acpi table to
initialize the driver.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: a4a3ede ("mm: zero reserved and unavailable struct pages")
Signed-off-by: Dave Young <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Myriad-Dreamin pushed a commit to Myriad-Dreamin/lkl-linux that referenced this pull request Aug 9, 2022
While I was doing zram testing, I found sometimes decompression failed
since the compression buffer was corrupted.  With investigation, I found
below commit calls cond_resched unconditionally so it could make a
problem in atomic context if the task is reschedule.

  BUG: sleeping function called from invalid context at mm/vmalloc.c:108
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog
  3 locks held by memhog/946:
   #0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160
   lkl#1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160
   lkl#2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0
  CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty lkl#316
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
  Call Trace:
    unmap_kernel_range_noflush+0x2eb/0x350
    unmap_kernel_range+0x14/0x30
    zs_unmap_object+0xd5/0xe0
    zram_bvec_rw.isra.0+0x38c/0x8e0
    zram_rw_page+0x90/0x101
    bdev_write_page+0x92/0xe0
    __swap_writepage+0x94/0x4a0
    pageout+0xe3/0x3a0
    shrink_page_list+0xb94/0xd60
    shrink_inactive_list+0x158/0x460

We can fix this by removing the ZSMALLOC_PGTABLE_MAPPING feature (which
contains the offending calling code) from zsmalloc.

Even though this option showed some amount improvement(e.g., 30%) in
some arm32 platforms, it has been headache to maintain since it have
abused APIs[1](e.g., unmap_kernel_range in atomic context).

Since we are approaching to deprecate 32bit machines and already made
the config option available for only builtin build since v5.8, lastly it
has been not default option in zsmalloc, it's time to drop the option
for better maintenance.

[1] http://lore.kernel.org/linux-mm/[email protected]

Fixes: e47110e ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
Signed-off-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Tony Lindgren <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Harish Sriram <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants