Sync fork #1

Barracuda09 · 2014-07-17T09:24:47Z

No description provided.

commit d319fe2 upstream. We don't expect to get errors from the hypervisor when reading the rng, but if we do we should pass the error up to the hwrng driver. Otherwise the hwrng driver will continue calling us forever. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 9180ac5 upstream. The LTR is the handshake between the device and the root complex about the latency allowed when the bus exits power save. This configuration was missing and this led to high latency in the link power up. The end user could experience high latency in the network because of this. Cc: <stable@vger.kernel.org> [3.10+] Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit c406515 upstream. zram could kunmap_atomic() a NULL pointer in a rare situation: a zram page becomes a full-zeroed page after a partial write io. The current code doesn't handle this case and performs kunmap_atomic() on a NULL pointer, which panics the kernel. This patch fixes this issue. Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang.kh@gmail.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 307fd54 upstream. Replace equivalent (and partially incorrect) scatter-gather functions with ones from crypto-API. The replacement is motivated by page-faults in sg_copy_part triggered by successive calls to crypto_hash_update. The following fault appears after calling crypto_ahash_update twice, first with 13 and then with 285 bytes: Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xf9bf9a8c Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=8 CoreNet Generic Modules linked in: tcrypt(+) caamhash caam_jr caam tls CPU: 6 PID: 1497 Comm: cryptomgr_test Not tainted 3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2 #75 task: e9308530 ti: e700e000 task.ti: e700e000 NIP: f9bf9a8c LR: f9bfcf28 CTR: c0019ea0 REGS: e700fb80 TRAP: 0300 Not tainted (3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2) MSR: 00029002 <CE,EE,ME> CR: 44f92024 XER: 20000000 DEAR: 00000008, ESR: 00000000 GPR00: f9bfcf28 e700fc30 e9308530 e70b1e55 00000000 ffffffdd e70b1e54 0bebf888 GPR08: 902c7ef5 c0e771e2 00000002 00000888 c0019ea0 00000000 00000000 c07a4154 GPR16: c08d0000 e91a8f9c 00000001 e98fb400 00000100 e9c83028 e70b1e08 e70b1d48 GPR24: e992ce10 e70b1dc8 f9bfe4f4 e70b1e55 ffffffdd e70b1ce0 00000000 00000000 NIP [f9bf9a8c] sg_copy+0x1c/0x100 [caamhash] LR [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash] Call Trace: [e700fc30] [f9bf9c50] sg_copy_part+0xe0/0x160 [caamhash] (unreliable) [e700fc50] [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash] [e700fcb0] [f954e19c] crypto_tls_genicv+0x13c/0x300 [tls] [e700fd10] [f954e65c] crypto_tls_encrypt+0x5c/0x260 [tls] [e700fd40] [c02250ec] __test_aead.constprop.9+0x2bc/0xb70 [e700fe40] [c02259f0] alg_test_aead+0x50/0xc0 [e700fe60] [c02241e4] alg_test+0x114/0x2e0 [e700fee0] [c022276c] cryptomgr_test+0x4c/0x60 [e700fef0] [c004f658] kthread+0x98/0xa0 [e700ff40] [c000fd04] ret_from_kernel_thread+0x5c/0x64 Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Cristian Stoica <cristian.stoica@freescale.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 24c65bc upstream. The add_early_randomness() function in drivers/char/hw_random/core.c passes a 16-byte buffer to pseries_rng_data_read(). Unfortunately, plpar_hcall() returns four 64-bit values and trashes 16 bytes on the stack. This bug has been lying around for a long time. It got unveiled by: commit d3cc799 Author: Amit Shah <amit.shah@redhat.com> Date: Thu Jul 10 15:42:34 2014 +0530 hwrng: fetch randomness only after device init It may trig a oops while loading or unloading the pseries-rng module for both PowerVM and PowerKVM guests. This patch does two things: - pass an intermediate well sized buffer to plpar_hcall(). This is acceptalbe since we're not on a hot path. - move to the new read API so that we know the return buffer size for sure. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit a8f9bfd upstream. When VLAN acceleration is in use on the xmit path, we end up setting csum_start to the wrong place. The result is that the whoever ends up doing the checksum setting will corrupt the packet instead of writing the checksum to the expected location, usually this means writing the checksum with an offset of -4. This patch fixes this by adjusting csum_start when VLAN acceleration is detected. Fixes: 6680ec6 ("tuntap: hardware vlan tx support") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 81f49a8 upstream. is_compat_task() is the wrong check for audit arch; the check should be is_ia32_task(): x32 syscalls should be AUDIT_ARCH_X86_64, not AUDIT_ARCH_I386. CONFIG_AUDITSYSCALL is currently incompatible with x32, so this has no visible effect. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/a0138ed8c709882aec06e4acc30bfa9b623b8717.1409954077.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 799b601 upstream. Audit rules disappear when an inode they watch is evicted from the cache. This is likely not what we want. The guilty commit is "fsnotify: allow marks to not pin inodes in core", which didn't take into account that audit_tree adds watches with a zero mask. Adding any mask should fix this. Fixes: 90b1e7a ("fsnotify: allow marks to not pin inodes in core") Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 690000b upstream. This patch adds the AHCI-mode SATA Device IDs for the Intel Sunrise Point PCH. Signed-off-by: James Ralston <james.d.ralston@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 66a7cbc upstream. Samsung pci-e SSDs on macbooks failed miserably on NCQ commands, so 67809f8 ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks") disabled NCQ on them. It turns out that NCQ is fine as long as MSI is not used, so let's turn off MSI and leave NCQ on. Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=60731 Tested-by: <dorin@i51.org> Tested-by: Imre Kaloz <kaloz@openwrt.org> Fixes: 67809f8 ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks") Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 1a29058 upstream. M-audio FastTrack Ultra quirk doesn't release the kzalloc'ed memory. This patch adds the private_free callback to release it properly. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 2651cc6 upstream. Userspace actually passes single parameter (path name) to the umount syscall, so new umount just fails. Fix it by requesting old umount syscall implementation and re-wiring umount to it. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit aaef317 upstream. Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth tickets will have their buffers vmalloc'ed, which leads to the following crash in crypto: [ 28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0 [ 28.686032] IP: [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80 [ 28.686032] PGD 0 [ 28.688088] Oops: 0000 [#1] PREEMPT SMP [ 28.688088] Modules linked in: [ 28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305 [ 28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 28.688088] Workqueue: ceph-msgr con_work [ 28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000 [ 28.688088] RIP: 0010:[<ffffffff81392b42>] [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80 [ 28.688088] RSP: 0018:ffff8800d903f688 EFLAGS: 00010286 [ 28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0 [ 28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750 [ 28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880 [ 28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010 [ 28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000 [ 28.688088] FS: 00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 [ 28.688088] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0 [ 28.688088] Stack: [ 28.688088] ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32 [ 28.688088] ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020 [ 28.688088] ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010 [ 28.688088] Call Trace: [ 28.688088] [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40 [ 28.688088] [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40 [ 28.688088] [<ffffffff81395d32>] blkcipher_walk_done+0x182/0x220 [ 28.688088] [<ffffffff813990bf>] crypto_cbc_encrypt+0x15f/0x180 [ 28.688088] [<ffffffff81399780>] ? crypto_aes_set_key+0x30/0x30 [ 28.688088] [<ffffffff8156c40c>] ceph_aes_encrypt2+0x29c/0x2e0 [ 28.688088] [<ffffffff8156d2a3>] ceph_encrypt2+0x93/0xb0 [ 28.688088] [<ffffffff8156d7da>] ceph_x_encrypt+0x4a/0x60 [ 28.688088] [<ffffffff8155b39d>] ? ceph_buffer_new+0x5d/0xf0 [ 28.688088] [<ffffffff8156e837>] ceph_x_build_authorizer.isra.6+0x297/0x360 [ 28.688088] [<ffffffff8112089b>] ? kmem_cache_alloc_trace+0x11b/0x1c0 [ 28.688088] [<ffffffff8156b496>] ? ceph_auth_create_authorizer+0x36/0x80 [ 28.688088] [<ffffffff8156ed83>] ceph_x_create_authorizer+0x63/0xd0 [ 28.688088] [<ffffffff8156b4b4>] ceph_auth_create_authorizer+0x54/0x80 [ 28.688088] [<ffffffff8155f7c0>] get_authorizer+0x80/0xd0 [ 28.688088] [<ffffffff81555a8b>] prepare_write_connect+0x18b/0x2b0 [ 28.688088] [<ffffffff81559289>] try_read+0x1e59/0x1f10 This is because we set up crypto scatterlists as if all buffers were kmalloc'ed. Fix it. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 3ce9b20 upstream. When VLAN is in use in macvtap_put_user, we end up setting csum_start to the wrong place. The result is that the whoever ends up doing the checksum setting will corrupt the packet instead of writing the checksum to the expected location, usually this means writing the checksum with an offset of -4. This patch fixes this by adjusting csum_start when VLAN tags are detected. Fixes: f09e224 ("macvtap: restore vlan header on user read") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>

commit 805dbe1 upstream. The driver is not released when ieee80211_register_hw fails in mac80211_hwsim_create_radio, leading to the access to the unregistered (and possibly freed) device in platform_driver_unregister: [ 0.447547] mac80211_hwsim: ieee80211_register_hw failed (-2) [ 0.448292] ------------[ cut here ]------------ [ 0.448854] WARNING: CPU: 0 PID: 1 at ../include/linux/kref.h:47 kobject_get+0x33/0x50() [ 0.449839] CPU: 0 PID: 1 Comm: swapper Not tainted 3.17.0-00001-gdd46990-dirty #2 [ 0.450813] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 0.451512] 00000000 00000000 78025e38 7967c6c6 78025e68 7905e09b 7988b480 00000000 [ 0.452579] 00000001 79887d62 0000002f 79170bb3 79170bb3 78397008 79ac9d74 00000001 [ 0.453614] 78025e78 7905e15d 00000009 00000000 78025e84 79170bb3 78397000 78025e8c [ 0.454632] Call Trace: [ 0.454921] [<7967c6c6>] dump_stack+0x16/0x18 [ 0.455453] [<7905e09b>] warn_slowpath_common+0x6b/0x90 [ 0.456067] [<79170bb3>] ? kobject_get+0x33/0x50 [ 0.456612] [<79170bb3>] ? kobject_get+0x33/0x50 [ 0.457155] [<7905e15d>] warn_slowpath_null+0x1d/0x20 [ 0.457748] [<79170bb3>] kobject_get+0x33/0x50 [ 0.458274] [<7925824f>] get_device+0xf/0x20 [ 0.458779] [<7925b5cd>] driver_detach+0x3d/0xa0 [ 0.459331] [<7925a3ff>] bus_remove_driver+0x8f/0xb0 [ 0.459927] [<7925bf80>] ? class_unregister+0x40/0x80 [ 0.460660] [<7925bad7>] driver_unregister+0x47/0x50 [ 0.461248] [<7925c033>] ? class_destroy+0x13/0x20 [ 0.461824] [<7925d07b>] platform_driver_unregister+0xb/0x10 [ 0.462507] [<79b51ba0>] init_mac80211_hwsim+0x3e8/0x3f9 [ 0.463161] [<79b30c58>] do_one_initcall+0x106/0x1a9 [ 0.463758] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.464393] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.465001] [<79071935>] ? parse_args+0x2f5/0x480 [ 0.465569] [<7906b41e>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 0.466345] [<79b30dd9>] kernel_init_freeable+0xde/0x17d [ 0.466972] [<79b304d6>] ? do_early_param+0x7a/0x7a [ 0.467546] [<79677b1b>] kernel_init+0xb/0xe0 [ 0.468072] [<79075f42>] ? schedule_tail+0x12/0x40 [ 0.468658] [<79686580>] ret_from_kernel_thread+0x20/0x30 [ 0.469303] [<79677b10>] ? rest_init+0xc0/0xc0 [ 0.469829] ---[ end trace ad8ac403ff8aef5c ]--- [ 0.470509] ------------[ cut here ]------------ [ 0.471047] WARNING: CPU: 0 PID: 1 at ../kernel/locking/lockdep.c:3161 __lock_acquire.isra.22+0x7aa/0xb00() [ 0.472163] DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS) [ 0.472774] CPU: 0 PID: 1 Comm: swapper Tainted: G W 3.17.0-00001-gdd46990-dirty #2 [ 0.473815] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 0.474492] 78025de0 78025de0 78025da0 7967c6c6 78025dd0 7905e09b 79888931 78025dfc [ 0.475515] 00000001 79888a93 00000c59 7907f33a 7907f33a 78028000 fffe9d09 00000000 [ 0.476519] 78025de8 7905e10e 00000009 78025de0 79888931 78025dfc 78025e24 7907f33a [ 0.477523] Call Trace: [ 0.477821] [<7967c6c6>] dump_stack+0x16/0x18 [ 0.478352] [<7905e09b>] warn_slowpath_common+0x6b/0x90 [ 0.478976] [<7907f33a>] ? __lock_acquire.isra.22+0x7aa/0xb00 [ 0.479658] [<7907f33a>] ? __lock_acquire.isra.22+0x7aa/0xb00 [ 0.480417] [<7905e10e>] warn_slowpath_fmt+0x2e/0x30 [ 0.480479] [<7907f33a>] __lock_acquire.isra.22+0x7aa/0xb00 [ 0.480479] [<79078aa5>] ? sched_clock_cpu+0xb5/0xf0 [ 0.480479] [<7907fd06>] lock_acquire+0x56/0x70 [ 0.480479] [<7925b5e8>] ? driver_detach+0x58/0xa0 [ 0.480479] [<79682d11>] mutex_lock_nested+0x61/0x2a0 [ 0.480479] [<7925b5e8>] ? driver_detach+0x58/0xa0 [ 0.480479] [<7925b5e8>] ? driver_detach+0x58/0xa0 [ 0.480479] [<7925b5e8>] driver_detach+0x58/0xa0 [ 0.480479] [<7925a3ff>] bus_remove_driver+0x8f/0xb0 [ 0.480479] [<7925bf80>] ? class_unregister+0x40/0x80 [ 0.480479] [<7925bad7>] driver_unregister+0x47/0x50 [ 0.480479] [<7925c033>] ? class_destroy+0x13/0x20 [ 0.480479] [<7925d07b>] platform_driver_unregister+0xb/0x10 [ 0.480479] [<79b51ba0>] init_mac80211_hwsim+0x3e8/0x3f9 [ 0.480479] [<79b30c58>] do_one_initcall+0x106/0x1a9 [ 0.480479] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.480479] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.480479] [<79071935>] ? parse_args+0x2f5/0x480 [ 0.480479] [<7906b41e>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 0.480479] [<79b30dd9>] kernel_init_freeable+0xde/0x17d [ 0.480479] [<79b304d6>] ? do_early_param+0x7a/0x7a [ 0.480479] [<79677b1b>] kernel_init+0xb/0xe0 [ 0.480479] [<79075f42>] ? schedule_tail+0x12/0x40 [ 0.480479] [<79686580>] ret_from_kernel_thread+0x20/0x30 [ 0.480479] [<79677b10>] ? rest_init+0xc0/0xc0 [ 0.480479] ---[ end trace ad8ac403ff8aef5d ]--- [ 0.495478] BUG: unable to handle kernel paging request at 00200200 [ 0.496257] IP: [<79682de5>] mutex_lock_nested+0x135/0x2a0 [ 0.496923] *pde = 00000000 [ 0.497290] Oops: 0002 [#1] [ 0.497653] CPU: 0 PID: 1 Comm: swapper Tainted: G W 3.17.0-00001-gdd46990-dirty #2 [ 0.498659] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 0.499321] task: 78028000 ti: 78024000 task.ti: 78024000 [ 0.499955] EIP: 0060:[<79682de5>] EFLAGS: 00010097 CPU: 0 [ 0.500620] EIP is at mutex_lock_nested+0x135/0x2a0 [ 0.501145] EAX: 00200200 EBX: 78397434 ECX: 78397460 EDX: 78025e70 [ 0.501816] ESI: 00000246 EDI: 78028000 EBP: 78025e8c ESP: 78025e54 [ 0.502497] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 [ 0.503076] CR0: 8005003b CR2: 00200200 CR3: 01b9d000 CR4: 00000690 [ 0.503773] Stack: [ 0.503998] 00000000 00000001 00000000 7925b5e8 78397460 7925b5e8 78397474 78397460 [ 0.504944] 00200200 11111111 78025e70 78397000 79ac9d74 00000001 78025ea0 7925b5e8 [ 0.505451] 79ac9d74 fffffffe 00000001 78025ebc 7925a3ff 7a251398 78025ec8 7925bf80 [ 0.505451] Call Trace: [ 0.505451] [<7925b5e8>] ? driver_detach+0x58/0xa0 [ 0.505451] [<7925b5e8>] ? driver_detach+0x58/0xa0 [ 0.505451] [<7925b5e8>] driver_detach+0x58/0xa0 [ 0.505451] [<7925a3ff>] bus_remove_driver+0x8f/0xb0 [ 0.505451] [<7925bf80>] ? class_unregister+0x40/0x80 [ 0.505451] [<7925bad7>] driver_unregister+0x47/0x50 [ 0.505451] [<7925c033>] ? class_destroy+0x13/0x20 [ 0.505451] [<7925d07b>] platform_driver_unregister+0xb/0x10 [ 0.505451] [<79b51ba0>] init_mac80211_hwsim+0x3e8/0x3f9 [ 0.505451] [<79b30c58>] do_one_initcall+0x106/0x1a9 [ 0.505451] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.505451] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.505451] [<79071935>] ? parse_args+0x2f5/0x480 [ 0.505451] [<7906b41e>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 0.505451] [<79b30dd9>] kernel_init_freeable+0xde/0x17d [ 0.505451] [<79b304d6>] ? do_early_param+0x7a/0x7a [ 0.505451] [<79677b1b>] kernel_init+0xb/0xe0 [ 0.505451] [<79075f42>] ? schedule_tail+0x12/0x40 [ 0.505451] [<79686580>] ret_from_kernel_thread+0x20/0x30 [ 0.505451] [<79677b10>] ? rest_init+0xc0/0xc0 [ 0.505451] Code: 89 d8 e8 cf 9b 9f ff 8b 4f 04 8d 55 e4 89 d8 e8 72 9d 9f ff 8d 43 2c 89 c1 89 45 d8 8b 43 30 8d 55 e4 89 53 30 89 4d e4 89 45 e8 <89> 10 8b 55 dc 8b 45 e0 89 7d ec e8 db af 9f ff eb 11 90 31 c0 [ 0.505451] EIP: [<79682de5>] mutex_lock_nested+0x135/0x2a0 SS:ESP 0068:78025e54 [ 0.505451] CR2: 0000000000200200 [ 0.505451] ---[ end trace ad8ac403ff8aef5e ]--- [ 0.505451] Kernel panic - not syncing: Fatal exception Fixes: 9ea9277 ("mac80211_hwsim: Register and bind to driver") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Junjie Mao <eternal.n08@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 4623884 upstream. When an interface is deleted, an ongoing hardware scan is canceled and the driver must abort the scan, at the very least reporting completion while the interface is removed. However, if it scheduled the work that might only run after everything is said and done, which leads to cfg80211 warning that the scan isn't reported as finished yet; this is no fault of the driver, it already did, but mac80211 hasn't processed it. To fix this situation, flush the delayed work when the interface being removed is the one that was executing the scan. Reported-by: Sujith Manoharan <sujith@msujith.org> Tested-by: Sujith Manoharan <sujith@msujith.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit ff1e417 upstream. Due to the time it takes to process the beacon that started the CSA process, we may be late for the switch if we try to reach exactly beacon 0. To avoid that, use count - 1 when calculating the switch time. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit b8fff40 upstream. Upon receiving the last fragment, all but the first fragment are freed, but the multicast check for statistics at the end of the function refers to the current skb (the last fragment) causing a use-after-free bug. Since multicast frames cannot be fragmented and we check for this early in the function, just modify that check to also do the accounting to fix the issue. Reported-by: Yosef Khyal <yosefx.khyal@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit dc4edad upstream. CE ram size is 32k/0k/0k for GFX/CS0/CS1 with CIK Ported from amdgpu driver. Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 8efe82c upstream. The power management code calls into the display code for certain things. If certain power management sysfs attributes are called before the driver has finished initializing all of the hardware we can run into problems with uninitialized modesetting state. Add a check to make sure modesetting init has completed to the bandwidth update callbacks to fix this. Can be triggered by the tlp and laptop start up scripts depending on the timing. bugs: https://bugzilla.kernel.org/show_bug.cgi?id=83611 https://bugs.freedesktop.org/show_bug.cgi?id=85771 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit f0d7bfb upstream. Need to unlock the crtc after updating the blanking state. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 08b964f upstream. The kuser helpers page is not set up on non-MMU systems, so it does not make sense to allow CONFIG_KUSER_HELPERS to be enabled when CONFIG_MMU=n. Allowing it to be set on !MMU results in an oops in set_tls (used in execve and the arm_syscall trap handler): Unhandled exception: IPSR = 00000005 LR = fffffff1 CPU: 0 PID: 1 Comm: swapper Not tainted 3.18.0-rc1-00041-ga30465a #216 task: 8b838000 ti: 8b82a000 task.ti: 8b82a000 PC is at flush_thread+0x32/0x40 LR is at flush_thread+0x21/0x40 pc : [<8f00157a>] lr : [<8f001569>] psr: 4100000b sp : 8b82be20 ip : 00000000 fp : 8b83c000 r10: 00000001 r9 : 88018c84 r8 : 8bb85000 r7 : 8b838000 r6 : 00000000 r5 : 8bb77400 r4 : 8b82a000 r3 : ffff0ff0 r2 : 8b82a000 r1 : 00000000 r0 : 88020354 xPSR: 4100000b CPU: 0 PID: 1 Comm: swapper Not tainted 3.18.0-rc1-00041-ga30465a #216 [<8f002bc1>] (unwind_backtrace) from [<8f002033>] (show_stack+0xb/0xc) [<8f002033>] (show_stack) from [<8f00265b>] (__invalid_entry+0x4b/0x4c) As best I can tell this issue existed for the set_tls ARM syscall before commit fbfb872 "ARM: 8148/1: flush TLS and thumbee register state during exec" consolidated the TLS manipulation code into the set_tls helper function, but now that we're using it to flush register state during execve, !MMU users encounter the oops at the first exec. Prevent CONFIG_MMU=n configurations from enabling CONFIG_KUSER_HELPERS. Fixes: fbfb872 (ARM: 8148/1: flush TLS and thumbee register state during exec) Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com> Reported-by: Stefan Agner <stefan@agner.ch> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 238962a upstream. To speed up decompression, the decompressor sets up a flat, cacheable mapping of memory. However, when there is insufficient space to hold the page tables for this mapping, we don't bother to enable the caches and subsequently skip all the cache maintenance hooks. Skipping the cache maintenance before jumping to the relocated code allows the processor to predict the branch and populate the I-cache with stale data before the relocation loop has completed (since a bootloader may have SCTLR.I set, which permits normal, cacheable instruction fetches regardless of SCTLR.M). This patch moves the cache maintenance check into the maintenance routines themselves, allowing the v6/v7 versions to invalidate the I-cache regardless of the MMU state. Reported-by: Marc Carino <marc.ceeeee@gmail.com> Tested-by: Julien Grall <julien.grall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit c822ed9 upstream. Avoids normal IO racing with discard. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 97fc154 upstream. ARM64 currently doesn't fix up faults on the single-byte (strb) case of __clear_user... which means that we can cause a nasty kernel panic as an ordinary user with any multiple PAGE_SIZE+1 read from /dev/zero. i.e.: dd if=/dev/zero of=foo ibs=1 count=1 (or ibs=65537, etc.) This is a pretty obscure bug in the general case since we'll only __do_kernel_fault (since there's no extable entry for pc) if the mmap_sem is contended. However, with CONFIG_DEBUG_VM enabled, we'll always fault. if (!down_read_trylock(&mm->mmap_sem)) { if (!user_mode(regs) && !search_exception_tables(regs->pc)) goto no_context; retry: down_read(&mm->mmap_sem); } else { /* * The above down_read_trylock() might have succeeded in * which * case, we'll have missed the might_sleep() from * down_read(). */ might_sleep(); if (!user_mode(regs) && !search_exception_tables(regs->pc)) goto no_context; } Fix that by adding an extable entry for the strb instruction, since it touches user memory, similar to the other stores in __clear_user. Signed-off-by: Kyle McMartin <kyle@redhat.com> Reported-by: Miloš Prchlík <mprchlik@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit eaca2d8 upstream. Found by the UC-KLEE tool: A user could supply less input to firewire-cdev ioctls than write- or write/read-type ioctl handlers expect. The handlers used data from uninitialized kernel stack then. This could partially leak back to the user if the kernel subsequently generated fw_cdev_event_'s (to be read from the firewire-cdev fd) which notably would contain the _u64 closure field which many of the ioctl argument structures contain. The fact that the handlers would act on random garbage input is a lesser issue since all handlers must check their input anyway. The fix simply always null-initializes the entire ioctl argument buffer regardless of the actual length of expected user input. That is, a runtime overhead of memset(..., 40) is added to each firewirew-cdev ioctl() call. [Comment from Clemens Ladisch: This part of the stack is most likely to be already in the cache.] Remarks: - There was never any leak from kernel stack to the ioctl output buffer itself. IOW, it was not possible to read kernel stack by a read-type or write/read-type ioctl alone; the leak could at most happen in combination with read()ing subsequent event data. - The actual expected minimum user input of each ioctl from include/uapi/linux/firewire-cdev.h is, in bytes: [0x00] = 32, [0x05] = 4, [0x0a] = 16, [0x0f] = 20, [0x14] = 16, [0x01] = 36, [0x06] = 20, [0x0b] = 4, [0x10] = 20, [0x15] = 20, [0x02] = 20, [0x07] = 4, [0x0c] = 0, [0x11] = 0, [0x16] = 8, [0x03] = 4, [0x08] = 24, [0x0d] = 20, [0x12] = 36, [0x17] = 12, [0x04] = 20, [0x09] = 24, [0x0e] = 4, [0x13] = 40, [0x18] = 4. Reported-by: David Ramos <daramos@stanford.edu> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 8c393f9 upstream. For pNFS direct writes, layout driver may dynamically allocate ds_cinfo.buckets. So we need to take care to free them when freeing dreq. Ideally this needs to be done inside layout driver where ds_cinfo.buckets are allocated. But buckets are attached to dreq and reused across LD IO iterations. So I feel it's OK to free them in the generic layer. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 4837927 upstream. Setups that use the blk-mq I/O path can lock up if a host with a single device that has its door locked enters EH. Make sure to only send the command to re-lock the door to devices that actually were reset and thus might have lost their state. Otherwise the EH code might be get blocked on blk_get_request as all requests for non-reset devices might be in use. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Meelis Roos <meelis.roos@ut.ee> Tested-by: Meelis Roos <meelis.roos@ut.ee> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 7ff4d90 upstream. Today there are 3 instances of setgroups and due to an oversight their permission checking has diverged. Add a common function so that they may all share the same permission checking code. This corrects the current oversight in the current permission checks and adds a helper to avoid this in the future. A user namespace security fix will update this new helper, shortly. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

…ppings. commit 0542f17 upstream. The rule is simple. Don't allow anything that wouldn't be allowed without unprivileged mappings. It was previously overlooked that establishing gid mappings would allow dropping groups and potentially gaining permission to files and directories that had lesser permissions for a specific group than for all other users. This is the rule needed to fix CVE-2014-8989 and prevent any other security issues with new_idmap_permitted. The reason for this rule is that the unix permission model is old and there are programs out there somewhere that take advantage of every little corner of it. So allowing a uid or gid mapping to be established without privielge that would allow anything that would not be allowed without that mapping will result in expectations from some code somewhere being violated. Violated expectations about the behavior of the OS is a long way to say a security issue. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 273d2c6 upstream. setgroups is unique in not needing a valid mapping before it can be called, in the case of setgroups(0, NULL) which drops all supplemental groups. The design of the user namespace assumes that CAP_SETGID can not actually be used until a gid mapping is established. Therefore add a helper function to see if the user namespace gid mapping has been established and call that function in the setgroups permission check. This is part of the fix for CVE-2014-8989, being able to drop groups without privilege using user namespaces. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit be7c6db upstream. As any gid mapping will allow and must allow for backwards compatibility dropping groups don't allow any gid mappings to be established without CAP_SETGID in the parent user namespace. For a small class of applications this change breaks userspace and removes useful functionality. This small class of applications includes tools/testing/selftests/mount/unprivilged-remount-test.c Most of the removed functionality will be added back with the addition of a one way knob to disable setgroups. Once setgroups is disabled setting the gid_map becomes as safe as setting the uid_map. For more common applications that set the uid_map and the gid_map with privilege this change will have no affect. This is part of a fix for CVE-2014-8989. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

…ping commit 80dd00a upstream. setresuid allows the euid to be set to any of uid, euid, suid, and fsuid. Therefor it is safe to allow an unprivileged user to map their euid and use CAP_SETUID privileged with exactly that uid, as no new credentials can be obtained. I can not find a combination of existing system calls that allows setting uid, euid, suid, and fsuid from the fsuid making the previous use of fsuid for allowing unprivileged mappings a bug. This is part of a fix for CVE-2014-8989. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit f95d791 upstream. If you did not create the user namespace and are allowed to write to uid_map or gid_map you should already have the necessary privilege in the parent user namespace to establish any mapping you want so this will not affect userspace in practice. Limiting unprivileged uid mapping establishment to the creator of the user namespace makes it easier to verify all credentials obtained with the uid mapping can be obtained without the uid mapping without privilege. Limiting unprivileged gid mapping establishment (which is temporarily absent) to the creator of the user namespace also ensures that the combination of uid and gid can already be obtained without privilege. This is part of the fix for CVE-2014-8989. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit f0d62ae upstream. Generalize id_map_mutex so it can be used for more state of a user namespace. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 9cc4651 upstream. - Expose the knob to user space through a proc file /proc/<pid>/setgroups A value of "deny" means the setgroups system call is disabled in the current processes user namespace and can not be enabled in the future in this user namespace. A value of "allow" means the segtoups system call is enabled. - Descendant user namespaces inherit the value of setgroups from their parents. - A proc file is used (instead of a sysctl) as sysctls currently do not allow checking the permissions at open time. - Writing to the proc file is restricted to before the gid_map for the user namespace is set. This ensures that disabling setgroups at a user namespace level will never remove the ability to call setgroups from a process that already has that ability. A process may opt in to the setgroups disable for itself by creating, entering and configuring a user namespace or by calling setns on an existing user namespace with setgroups disabled. Processes without privileges already can not call setgroups so this is a noop. Prodcess with privilege become processes without privilege when entering a user namespace and as with any other path to dropping privilege they would not have the ability to call setgroups. So this remains within the bounds of what is possible without a knob to disable setgroups permanently in a user namespace. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

…sabled commit 66d2f33 upstream. Now that setgroups can be disabled and not reenabled, setting gid_map without privielge can now be enabled when setgroups is disabled. This restores most of the functionality that was lost when unprivileged setting of gid_map was removed. Applications that use this functionality will need to check to see if they use setgroups or init_groups, and if they don't they can be fixed by simply disabling setgroups before writing to gid_map. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit db86da7 upstream. A security fix in caused the way the unprivileged remount tests were using user namespaces to break. Tweak the way user namespaces are being used so the test works again. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 041d7b9 upstream. A regression was caused by commit 780a765: audit: Make testing for a valid loginuid explicit. (which in turn attempted to fix a regression caused by e1760bd) When audit_krule_to_data() fills in the rules to get a listing, there was a missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID. This broke userspace by not returning the same information that was sent and expected. The rule: auditctl -a exit,never -F auid=-1 gives: auditctl -l LIST_RULES: exit,never f24=0 syscall=all when it should give: LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all Tag it so that it is reported the same way it was set. Create a new private flags audit_krule field (pflags) to store it that won't interact with the public one from the API. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 7e77bde upstream. If a request is backlogged, it's complete() handler will get called twice: once with -EINPROGRESS, and once with the final error code. af_alg's complete handler, unlike other users, does not handle the -EINPROGRESS but instead always completes the completion that recvmsg() is waiting on. This can lead to a return to user space while the request is still pending in the driver. If userspace closes the sockets before the requests are handled by the driver, this will lead to use-after-frees (and potential crashes) in the kernel due to the tfm having been freed. The crashes can be easily reproduced (for example) by reducing the max queue length in cryptod.c and running the following (from http://www.chronox.de/libkcapi.html) on AES-NI capable hardware: $ while true; do kcapi -x 1 -e -c '__ecb-aes-aesni' \ -k 00000000000000000000000000000000 \ -p 00000000000000000000000000000000 >/dev/null & done Signed-off-by: Rabin Vincent <rabin.vincent@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit a682e9c upstream. If some error happens in NCP_IOC_SETROOT ioctl, the appropriate error return value is then (in most cases) just overwritten before we return. This can result in reporting success to userspace although error happened. This bug was introduced by commit 2e54eb9 ("BKL: Remove BKL from ncpfs"). Propagate the errors correctly. Coverity id: 1226925. Fixes: 2e54eb9 ("BKL: Remove BKL from ncpfs") Signed-off-by: Jan Kara <jack@suse.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 24c037e upstream. alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it fails because disable_pid_allocation() was called by the exiting child_reaper. We could simply move get_pid_ns() down to successful return, but this fix tries to be as trivial as possible. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit a1d47b2 upstream. UDF specification allows arbitrarily large symlinks. However we support only symlinks at most one block large. Check the length of the symlink so that we don't access memory beyond end of the symlink block. Reported-by: Carl Henrik Lunde <chlunde@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 332b122 upstream. The ecryptfs_encrypted_view mount option greatly changes the functionality of an eCryptfs mount. Instead of encrypting and decrypting lower files, it provides a unified view of the encrypted files in the lower filesystem. The presence of the ecryptfs_encrypted_view mount option is intended to force a read-only mount and modifying files is not supported when the feature is in use. See the following commit for more information: e77a56d [PATCH] eCryptfs: Encrypted passthrough This patch forces the mount to be read-only when the ecryptfs_encrypted_view mount option is specified by setting the MS_RDONLY flag on the superblock. Additionally, this patch removes some broken logic in ecryptfs_open() that attempted to prevent modifications of files when the encrypted view feature was in use. The check in ecryptfs_open() was not sufficient to prevent file modifications using system calls that do not operate on a file descriptor. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Priya Bansal <p.bansal@samsung.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 9420806 upstream. Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the end of the allocated buffer during encrypted filename decoding. This fix corrects the issue by getting rid of the unnecessary 0 write when the current bit offset is 2. Signed-off-by: Michael Halcrow <mhalcrow@google.com> Reported-by: Dmitry Chernenkov <dmitryc@google.com> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit a280469 upstream. We use the modified list to keep track of which extents have been modified so we know which ones are candidates for logging at fsync() time. Newly modified extents are added to the list at modification time, around the same time the ordered extent is created. We do this so that we don't have to wait for ordered extents to complete before we know what we need to log. The problem is when something like this happens log extent 0-4k on inode 1 copy csum for 0-4k from ordered extent into log sync log commit transaction log some other extent on inode 1 ordered extent for 0-4k completes and adds itself onto modified list again log changed extents see ordered extent for 0-4k has already been logged at this point we assume the csum has been copied sync log crash On replay we will see the extent 0-4k in the log, drop the original 0-4k extent which is the same one that we are replaying which also drops the csum, and then we won't find the csum in the log for that bytenr. This of course causes us to have errors about not having csums for certain ranges of our inode. So remove the modified list manipulation in unpin_extent_cache, any modified extents should have been added well before now, and we don't want them re-logged. This fixes my test that I could reliably reproduce this problem with. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 678886b upstream. When we abort a transaction we iterate over all the ranges marked as dirty in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them from those trees, add them back (unpin) to the free space caches and, if the fs was mounted with "-o discard", perform a discard on those regions. Also, after adding the regions to the free space caches, a fitrim ioctl call can see those ranges in a block group's free space cache and perform a discard on the ranges, so the same issue can happen without "-o discard" as well. This causes corruption, affecting one or multiple btree nodes (in the worst case leaving the fs unmountable) because some of those ranges (the ones in the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are referred by the last committed super block - breaking the rule that anything that was committed by a transaction is untouched until the next transaction commits successfully. I ran into this while running in a loop (for several hours) the fstest that I recently submitted: [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim The corruption always happened when a transaction aborted and then fsck complained like this: _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent *** fsck.btrfs output *** Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 read block failed check_tree_block Couldn't open file system In this case 94945280 corresponded to the root of a tree. Using frace what I observed was the following sequence of steps happened: 1) transaction N started, fs_info->pinned_extents pointed to fs_info->freed_extents[0]; 2) node/eb 94945280 is created; 3) eb is persisted to disk; 4) transaction N commit starts, fs_info->pinned_extents now points to fs_info->freed_extents[1], and transaction N completes; 5) transaction N + 1 starts; 6) eb is COWed, and btrfs_free_tree_block() called for this eb; 7) eb range (94945280 to 94945280 + 16Kb) is added to fs_info->pinned_extents (fs_info->freed_extents[1]); 8) Something goes wrong in transaction N + 1, like hitting ENOSPC for example, and the transaction is aborted, turning the fs into readonly mode. The stack trace I got for example: [112065.253935] [<ffffffff8140c7b6>] dump_stack+0x4d/0x66 [112065.254271] [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98 [112065.254567] [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs] [112065.261674] [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50 [112065.261922] [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs] [112065.262211] [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs] [112065.262545] [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs] [112065.262771] [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs] [112065.263105] [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs] (...) [112065.264493] ---[ end trace dd7903a975a31a08 ]--- [112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left [112065.264997] BTRFS info (device sdc): forced readonly 9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in fs_info->fs_state and calls btrfs_cleanup_transaction(), which in turn calls btrfs_destroy_pinned_extent(); 10) Then btrfs_destroy_pinned_extent() iterates over all the ranges marked as dirty in fs_info->freed_extents[], and for each one it calls discard, if the fs was mounted with "-o discard", and adds the range to the free space cache of the respective block group; 11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path, sees the free space entries and performs a discard; 12) After an umount and mount (or fsck), our eb's location on disk was full of zeroes, and it should have been untouched, because it was marked as dirty in the fs_info->pinned_extents tree, and therefore used by the trees that the last committed superblock points to. Fix this by not performing a discard and not adding the ranges to the free space caches - it's useless from this point since the fs is now in readonly mode and we won't write free space caches to disk anymore (otherwise we would leak space) nor any new superblock. By not adding the ranges to the free space caches, it prevents other code paths from allocating that space and write to it as well, therefore being safer and simpler. This isn't a new problem, as it's been present since 2011 (git commit acce952). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit 871c3cf upstream. The least significat byte of the GPIO value read register on the STMPE24xx series is on addres 0xA4 not 0xA5. Correct against datasheet and tested on the STMPE2401 hardware. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit b668422 upstream. Allow more than one viperboard to be connected by registering with PLATFORM_DEVID_AUTO instead of PLATFORM_DEVID_NONE. The subdevices are currently registered with PLATFORM_DEVID_NONE, which will cause a name collision on the platform bus when a second viperboard is plugged in: viperboard 1-2.4:1.0: version 0.00 found at bus 001 address 004 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 181 at /home/johan/work/omicron/src/linux/fs/sysfs/dir.c:31 sysfs_warn_dup+0x74/0x84() sysfs: cannot create duplicate filename '/bus/platform/devices/viperboard-gpio' Modules linked in: i2c_viperboard viperboard netconsole [last unloaded: viperboard] CPU: 0 PID: 181 Comm: bash Tainted: G W 3.17.0-rc6 #1 [<c0016bf4>] (unwind_backtrace) from [<c0013860>] (show_stack+0x20/0x24) [<c0013860>] (show_stack) from [<c04305f8>] (dump_stack+0x24/0x28) [<c04305f8>] (dump_stack) from [<c0040fb4>] (warn_slowpath_common+0x80/0x98) [<c0040fb4>] (warn_slowpath_common) from [<c004100c>] (warn_slowpath_fmt+0x40/0x48) [<c004100c>] (warn_slowpath_fmt) from [<c016f1bc>] (sysfs_warn_dup+0x74/0x84) [<c016f1bc>] (sysfs_warn_dup) from [<c016f548>] (sysfs_do_create_link_sd.isra.2+0xcc/0xd0) [<c016f548>] (sysfs_do_create_link_sd.isra.2) from [<c016f588>] (sysfs_create_link+0x3c/0x48) [<c016f588>] (sysfs_create_link) from [<c02867ec>] (bus_add_device+0x12c/0x1e0) [<c02867ec>] (bus_add_device) from [<c0284820>] (device_add+0x410/0x584) [<c0284820>] (device_add) from [<c0289440>] (platform_device_add+0xd8/0x26c) [<c0289440>] (platform_device_add) from [<c02a5ae4>] (mfd_add_device+0x240/0x344) [<c02a5ae4>] (mfd_add_device) from [<c02a5ce0>] (mfd_add_devices+0xb8/0x110) [<c02a5ce0>] (mfd_add_devices) from [<bf00d1c8>] (vprbrd_probe+0x160/0x1b0 [viperboard]) [<bf00d1c8>] (vprbrd_probe [viperboard]) from [<c030c000>] (usb_probe_interface+0x1bc/0x2a8) [<c030c000>] (usb_probe_interface) from [<c028768c>] (driver_probe_device+0x14c/0x3ac) [<c028768c>] (driver_probe_device) from [<c02879e4>] (__driver_attach+0xa4/0xa8) [<c02879e4>] (__driver_attach) from [<c0285698>] (bus_for_each_dev+0x70/0xa4) [<c0285698>] (bus_for_each_dev) from [<c0287030>] (driver_attach+0x2c/0x30) [<c0287030>] (driver_attach) from [<c030a288>] (usb_store_new_id+0x170/0x1ac) [<c030a288>] (usb_store_new_id) from [<c030a2f8>] (new_id_store+0x34/0x3c) [<c030a2f8>] (new_id_store) from [<c02853ec>] (drv_attr_store+0x30/0x3c) [<c02853ec>] (drv_attr_store) from [<c016eaa8>] (sysfs_kf_write+0x5c/0x60) [<c016eaa8>] (sysfs_kf_write) from [<c016dc68>] (kernfs_fop_write+0xd4/0x194) [<c016dc68>] (kernfs_fop_write) from [<c010fe40>] (vfs_write+0xb4/0x1c0) [<c010fe40>] (vfs_write) from [<c01104a8>] (SyS_write+0x4c/0xa0) [<c01104a8>] (SyS_write) from [<c000f900>] (ret_fast_syscall+0x0/0x48) ---[ end trace 98e8603c22d65817 ]--- viperboard 1-2.4:1.0: Failed to add mfd devices to core. viperboard: probe of 1-2.4:1.0 failed with error -17 Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit f72e7dc upstream. Trinity has reported: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1)) CPU: 6 PID: 16173 Comm: trinity-c364 Tainted: G W 3.15.0-rc1-next-20140415-sasha-00020-gaa90d09 #398 lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) remove_migration_pte (mm/migrate.c:137) rmap_walk (mm/rmap.c:1628 mm/rmap.c:1699) remove_migration_ptes (mm/migrate.c:224) migrate_pages (mm/migrate.c:922 mm/migrate.c:960 mm/migrate.c:1126) migrate_misplaced_page (mm/migrate.c:1733) __handle_mm_fault (mm/memory.c:3762 mm/memory.c:3812 mm/memory.c:3925) handle_mm_fault (mm/memory.c:3948) __get_user_pages (mm/memory.c:1851) __mlock_vma_pages_range (mm/mlock.c:255) __mm_populate (mm/mlock.c:711) SyS_mlockall (include/linux/mm.h:1799 mm/mlock.c:817 mm/mlock.c:791) I believe this comes about because, whereas collapsing and splitting THP functions take anon_vma lock in write mode (which excludes concurrent rmap walks), faulting THP functions (write protection and misplaced NUMA) do not - and mostly they do not need to. But they do use a pmdp_clear_flush(), set_pmd_at() sequence which, for an instant (indeed, for a long instant, given the inter-CPU TLB flush in there), leaves *pmd neither present not trans_huge. Which can confuse a concurrent rmap walk, as when removing migration ptes, seen in the dumped trace. Although that rmap walk has a 4k page to insert, anon_vmas containing THPs are in no way segregated from 4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to cope with that instant when a trans_huge pmd is temporarily absent. I don't think we need strengthen the locking at the THP end: it's easily handled with an ACCESS_ONCE() before testing both conditions. And since mm_find_pmd() had only one caller who wanted a THP rather than a pmd, let's slightly repurpose it to fail when it hits a THP or non-present pmd, and open code split_huge_page_address() again. Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Lameter <cl@gentwo.org> Cc: Dave Jones <davej@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit c9b0888 upstream. The current code simply assumes Intel Arch PerfMon v2+ to have the IA32_PERF_CAPABILITIES MSR; the SDM specifies that we should check CPUID[1].ECX[15] (aka, FEATURE_PDCM) instead. This was found by KVM which implements v2+ but didn't provide the capabilities MSR. Change the code to DTRT; KVM will also implement the MSR and return 0. Cc: pbonzini@redhat.com Reported-by: "Michael S. Tsirkin" <mst@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140203132903.GI8874@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

commit b5a8cad upstream. [stable 3.12 note] This commit was supposed to fix a completely other issue. But in 3.12, with commit f72e7dc (mm: let mm_find_pmd fix buggy race with THP fault), we need this commit as well (it fixes the issue as a by-product). Hugh Dickins writes: <== citation starts here> Fine for this to go in, but there is one catch, which I discovered when backporting to v3.11: it needed one more hunk. I haven't checked your base tree, but if this applies then I believe you need it - most of the time no problem, but it can case page migration to fail to find a migration entry it inserted earlier, then BUG_ON(!PageLocked(p)) in migration_entry_to_page() soon after. Here's what I wrote back then: Note on rebase to v3.11: added a hunk to replace the use of mm_find_pmd() in page_check_address_pmd(). This call had been similarly replaced by the time of my v3.16 commit, in Kirill Shutemov's v3.15 b5a8cad ("thp: close race between split and zap huge pages"): which we do not need as such, since it's fixing v3.13 117b079 ("mm, thp: move ptl taking inside page_check_address_pmd()"), from a split page-table-lock series we are not backporting. But without this additional hunk, rmap sometimes broke when the new semantic for mm_find_pmd() was used here. <== end of citation> But instead of appending hunks to commits, I am taking a full, backported version of commit b5a8cad with this note prepended. So the changelog of b5a8cad is left below, but does not apply to 3.12 yet. [=== stable 3.12 note ends here] Sasha Levin has reported two THP BUGs[1][2]. I believe both of them have the same root cause. Let's look to them one by one. The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!". It's BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page(). From my testing I see that page_mapcount() is higher than mapcount here. I think it happens due to race between zap_huge_pmd() and page_check_address_pmd(). page_check_address_pmd() misses PMD which is under zap: CPU0 CPU1 zap_huge_pmd() pmdp_get_and_clear() __split_huge_page() anon_vma_interval_tree_foreach() __split_huge_page_splitting() page_check_address_pmd() mm_find_pmd() /* * We check if PMD present without taking ptl: no * serialization against zap_huge_pmd(). We miss this PMD, * it's not accounted to 'mapcount' in __split_huge_page(). */ pmd_present(pmd) == 0 BUG_ON(mapcount != page_mapcount(page)) // CRASH!!! page_remove_rmap(page) atomic_add_negative(-1, &page->_mapcount) The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!". It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd(). This happens in similar way: CPU0 CPU1 zap_huge_pmd() pmdp_get_and_clear() page_remove_rmap(page) atomic_add_negative(-1, &page->_mapcount) __split_huge_page() anon_vma_interval_tree_foreach() __split_huge_page_splitting() page_check_address_pmd() mm_find_pmd() pmd_present(pmd) == 0 /* The same comment as above */ /* * No crash this time since we already decremented page->_mapcount in * zap_huge_pmd(). */ BUG_ON(mapcount != page_mapcount(page)) /* * We split the compound page here into small pages without * serialization against zap_huge_pmd() */ __split_huge_page_refcount() VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!! So my understanding the problem is pmd_present() check in mm_find_pmd() without taking page table lock. The bug was introduced by me commit with commit 117b079. Sorry for that. :( Let's open code mm_find_pmd() in page_check_address_pmd() and do the check under page table lock. Note that __page_check_address() does the same for PTE entires if sync != 0. I've stress tested split and zap code paths for 36+ hours by now and don't see crashes with the patch applied. Before it took <20 min to trigger the first bug and few hours for second one (if we ignore first). [1] https://lkml.kernel.org/g/<53440991.9090001@oracle.com> [2] https://lkml.kernel.org/g/<5310C56C.60709@oracle.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Bob Liu <lliubbo@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michel Lespinasse <walken@google.com> Cc: Dave Jones <davej@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

2 main changes: - check for timeouts in the bcm2708_bsc_setup function as indicated by this comment: /* poll for transfer start bit (should only take 1-20 polls) */ This implies that the setup function can now fail so account for this everywhere it's called - Removed the clk_get_rate call from inside the setup function as it locks a mutex and that's not ok since we call it from under a spin lock. removed dead code and update comment fixed typo in comment

Fix grabbing lock from atomic context in i2c driver

…o rpi-3.12.y

popcornmix force-pushed the rpi-3.12.y branch 2 times, most recently from 32fb6ab to fd7d7a3 Compare October 17, 2014 16:37

mpe and others added 28 commits November 19, 2014 17:02

drm/radeon: add missing crtc unlock when setting up the MC

984e580

commit f0d7bfb upstream. Need to unlock the crtc after updating the blanking state. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

dm thin: grab a virtual cell before looking up the mapping

2c49ae5

commit c822ed9 upstream. Avoids normal IO racing with discard. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

ebiederm and others added 30 commits January 7, 2015 17:55

Linux 3.12.36

f610195

Merge remote-tracking branch 'stable/linux-3.12.y' into rpi-3.12.y

9301f9a

config: Add CONFIG_DMA_CMA to quick config to avoid USB coherency issues

90fa5df

Merge pull request #780 from jeanleflambeur/patch-2

ee9b8c7

Fix grabbing lock from atomic context in i2c driver

Merge branch 'rpi-3.12.y' of https://github.com/raspberrypi/linux int…

102fb4e

…o rpi-3.12.y

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync fork #1

Sync fork #1

Barracuda09 commented Jul 17, 2014

Sync fork #1

Are you sure you want to change the base?

Sync fork #1

Conversation

Barracuda09 commented Jul 17, 2014