android_kernel_xiaomi_sdm845

Author	SHA1	Message	Date
Sage Weil	3d7ded4d81	ceph: release cap on import if we don't have the inode If we get an IMPORT that give us a cap, but we don't have the inode, queue a release (and try to send it immediately) so that the MDS doesn't get stuck waiting for us. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-10 13:30:07 -07:00
Sage Weil	9dbd412f56	ceph: fix misleading/incorrect debug message Nothing is released here: the caps message is simply ignored in this case. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-10 13:29:59 -07:00
Jeff Mahoney	00d5643e7c	ceph: fix atomic64_t initialization on ia64 bdi_seq is an atomic_long_t but we're using ATOMIC_INIT, which causes build failures on ia64. This patch fixes it to use ATOMIC_LONG_INIT. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-10 13:29:50 -07:00
Catalin Marinas	1082345290	sata_sil24: Use memory barriers before issuing commands The data in the cmd_block buffers may reach the main memory after the writel() to the device ports. This patch introduces two calls to wmb() to ensure the relative ordering. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Colin Tuckley <colin.tuckley@arm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-06-10 16:06:48 -04:00
Dan Carpenter	14e45c15e1	sata_sil24: memset() overflow cb->atapi.cdb is an array of 16 u8 elements. The call to memset() would set the first part of the sge array to zero as well. It's not a packed struct. This one has been around for five years. I found it with Smatch. I think the reason no one has seen it before is because we normally call sil24_fill_sg() and that overwrites sge with proper information? Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-06-10 16:06:33 -04:00
Linus Torvalds	7908a9e5fc	Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: read apic->irr with ioapic lock held KVM: ia64: Add missing spin_unlock in kvm_arch_hardware_enable() KVM: Fix order passed to iommu_unmap KVM: MMU: Remove user access when allowing kernel access to gpte.w=0 page KVM: MMU: invalidate and flush on spte small->large page size change KVM: SVM: Implement workaround for Erratum 383 KVM: SVM: Handle MCEs early in the vmexit process KVM: powerpc: fix init/exit annotation	2010-06-10 10:53:14 -07:00
Marcelo Tosatti	07dc7263b9	KVM: read apic->irr with ioapic lock held Read ioapic->irr inside ioapic->lock protected section. KVM-Stable-Tag Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-06-10 20:29:03 +03:00
Linus Torvalds	8fade6aff7	Merge branch 'for-linus2' of git://git.kernel.dk/linux-2.6-block * 'for-linus2' of git://git.kernel.dk/linux-2.6-block: pipe: fix check in "set size" fcntl pipe: fix pipe buffer resizing block: remove duplicate BUG_ON() in bd_finish_claiming() block: bd_start_claiming cleanup block: bd_start_claiming fix module refcount	2010-06-10 10:26:42 -07:00
Miklos Szeredi	6db40cf047	pipe: fix check in "set size" fcntl As it stands this check compares the number of pages to the page size. This makes no sense and makes the fcntl fail in almost any sane case. Fix it by checking if nr_pages is not zero (it can become zero only if arg is too big and round_pipe_size() overflows). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Miklos Szeredi	1d862f4122	pipe: fix pipe buffer resizing pipe_set_size() needs to copy pipe bufs from the old circular buffer to the new. The current code gets this wrong in multiple ways, resulting in oops. Test program is available here: http://www.kernel.org/pub/linux/kernel/people/mszeredi/piperesize/ Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Jens Axboe	3e6c05052c	block: remove duplicate BUG_ON() in bd_finish_claiming() We do the same BUG_ON() just a line later when calling into __bd_abort_claiming(). Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Nick Piggin	b0018361c3	block: bd_start_claiming cleanup I don't like the subtle multi-context code in bd_claim (ie. detects where it has been called based on bd_claiming). It seems clearer to just require a new function to finish a 2-part claim. Also improve commentary in bd_start_claiming as to how it should be used. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Nick Piggin	cf3425707e	block: bd_start_claiming fix module refcount bd_start_claiming has an unbalanced module_put introduced in `6b4517a79`. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Linus Torvalds	e1f38e2cea	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: sound/spi: patch for the unuseful variable removal ALSA: hda - Add SSID table for iMac7,1. ALSA: hda - Add SSID table for MacBookAir1,1 ALSA: hda - Add SSID table for MacBookAir2,1 ALSA: atmel: set "channel A event" output to debug	2010-06-10 09:34:15 -07:00
Linus Torvalds	85ca7886f5	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Fix null pointer deref with SEND_SIG_FORCED perf: Fix signed comparison in perf_adjust_period() powerpc/oprofile: fix potential buffer overrun in op_model_cell.c perf symbols: Set the DSO long name when using symbol_conf.vmlinux_name	2010-06-10 09:30:09 -07:00
Linus Torvalds	7c8d20d40f	Merge master.kernel.org:/home/rmk/linux-2.6-arm * master.kernel.org:/home/rmk/linux-2.6-arm: ARM: 6164/1: Add kto and kfrom to input operands list. ARM: 6166/1: Proper prefetch abort handling on pre-ARMv6 ARM: 6165/1: trap overflows on highmem pages from kmap_atomic when debugging ARM: 6152/1: ux500 make it possible to disable localtimers [ARM] pxa/spitz: Correctly register WM8750 [ARM] pxa/palmtc: storage class should be before const qualifier ARM: 6146/1: sa1111: Prevent deadlock in resume path ARM: 6145/1: ux500 MTU clockrate correction ARM: 6144/1: TCM memory bug freeing bug ARM: VFP: Fix vfp_put_double() for d16-d31	2010-06-10 07:35:41 -07:00
Michal Marek	607b30fcf2	kbuild: Create output directory in Makefile.modbuiltin Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michal Marek <mmarek@suse.cz>	2010-06-10 12:23:08 +02:00
Takashi Iwai	2d0a1dbf57	Merge branch 'fix/misc' into for-linus	2010-06-10 11:08:53 +02:00
Eric Dumazet	00d9d6a185	ipv6: fix ICMP6_MIB_OUTERRORS In commit `1f8438a853` (icmp: Account for ICMP out errors), I did a typo on IPV6 side, using ICMP6_MIB_OUTMSGS instead of ICMP6_MIB_OUTERRORS Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-09 18:39:27 -07:00
Timo Teräs	81a95f0499	r8169: fix mdio_read and update mdio_write according to hw specs Realtek confirmed that a 20us delay is needed after mdio_read and mdio_write operations. Reduce the delay in mdio_write, and add it to mdio_read too. Also add a comment that the 20us is from hw specs. Signed-off-by: Timo Teräs <timo.teras@iki.fi> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-09 17:31:48 -07:00
David S. Miller	ebedb22d2b	Merge branch 'num_rx_queues' of git://kernel.ubuntu.com/rtg/net-2.6	2010-06-09 16:28:25 -07:00
Anton Vorontsov	619baba195	gianfar: Revive the driver for eTSEC devices (disable timestamping) Since commit `cc772ab7cd` ("gianfar: Add hardware RX timestamping support"), the driver no longer works on at least MPC8313ERDB and MPC8568EMDS boards (and possibly much more boards as well). That's how MPC8313 Reference Manual describes RCTRL_TS_ENABLE bit: Timestamp incoming packets as padding bytes. PAL field is set to 8 if the PAL field is programmed to less than 8. Must be set to zero if TMR_CTRL[TE]=0. I see that the commit above sets this bit, but it doesn't handle TMR_CTRL. Manfred probably had this bit set by the firmware for his boards. But obviously this isn't true for all boards in the wild. Also, I recall that Freescale BSPs were explicitly disabling the timestamping because of a performance drop. For now, the best way to deal with this is just disable the timestamping, and later we can discuss proper device tree bindings and implement enabling this feature via some property. Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-09 16:27:08 -07:00
Dan Carpenter	aea34e7ae7	caif: fix a couple range checks The extra ! character means that these conditions are always false. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-09 16:18:53 -07:00
Richard Cochran	e13647c158	phylib: Add support for the LXT973 phy. This patch implements a work around for Erratum 5, "3.3 V Fiber Speed Selection." If the hardware wiring does not respect this erratum, then fiber optic mode will not work properly. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-09 16:17:02 -07:00
Michal Marek	2da30e703c	kbuild: Generate modules.builtin in make modules Generating the file in make modules_install was broken as well, because it didn't work in a readonly filesystem and otherwise it generated a root-owned file which is not wanted. Reported-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Michal Marek <mmarek@suse.cz>	2010-06-09 22:40:05 +02:00
Tim Gardner	08c801f8d4	net: Print num_rx_queues imbalance warning only when there are allocated queues BugLink: http://bugs.launchpad.net/bugs/591416 There are a number of network drivers (bridge, bonding, etc) that are not yet receive multi-queue enabled and use alloc_netdev(), so don't print a num_rx_queues imbalance warning in that case. Also, only print the warning once for those drivers that _are_ multi-queue enabled. Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>	2010-06-09 13:46:03 -06:00
Linus Torvalds	63a07cb64c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits) mac80211: fix deauth before assoc iwlwifi: add missing rcu_read_lock mac80211: fix function pointer check wireless: remove my name from the maintainer list ath5k: fix NULL pointer in antenna configuration p54usb: Add device ID for Dell WLA3310 USB wl1251: fix a memory leak in probe ipmr: dont corrupt lists 8139too: fix buffer overrun in rtl8139_init_board asix: check packet size against mtu+ETH_HLEN instead of ETH_FRAME_LEN r8169: fix random mdio_write failures ip6mr: fix a typo in ip6mr_for_each_table() iwlwifi: move sysfs_create_group to post request firmware iwlwifi: add name to Maintainers list iwl3945: fix internal scan iwl3945: enable stuck queue detection on 3945 ipv6: avoid high order allocations ath5k: retain promiscuous setting ath5k: depend on CONFIG_PM_SLEEP for suspend/resume functions mac80211: process station blockack action frames from work ...	2010-06-09 12:44:19 -07:00
Linus Torvalds	b95a568093	Merge branch 'for-2.6.35' of git://linux-nfs.org/~bfields/linux * 'for-2.6.35' of git://linux-nfs.org/~bfields/linux: nfsd4: shut down callback queue outside state lock nfsd: nfsd_setattr needs to call commit_metadata	2010-06-09 12:43:04 -07:00
David Howells	a7f5378e24	FRV: Reinstate null behaviour for the GDB remote protocol 'p' command Reinstate the null behaviour that the in-kernel gdbstub had for the GDB remote protocol 'p' command (retrieve a single register value) prior to commit `7ca8b9c0da` ("frv: extend gdbstub to support more features of gdb"). Before that, the 'p' command just returned an empty reply, which causes gdb to then go and use the 'g' command. However, since that commit, the 'p' command returns an error string, which causes gdb to abort its connection to the target. Not all gdb versions are affected, some try 'g' first, and if that works, don't bother with 'p', and so don't see the error. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-09 12:42:44 -07:00
David S. Miller	327723edeb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-06-09 11:13:23 -07:00
Linus Torvalds	9aad9c0d93	Merge branch 'msm-urgent' of git://codeaurora.org/quic/kernel/dwalker/linux-msm * 'msm-urgent' of git://codeaurora.org/quic/kernel/dwalker/linux-msm: mmc: msm: fix compile error on MSM7x30 msm: dma: add completion.h header	2010-06-09 09:45:46 -07:00
Linus Torvalds	e411f2dda4	Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze * 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Fix sg_dma_len() regression microblaze: Define ARCH_SLAB_MINALIGN to fix slab crash	2010-06-09 08:52:03 -07:00
Daniel Walker	f3d56144c8	mmc: msm: fix compile error on MSM7x30 MSM7x30 isn't supported in this driver yet. If ones tried to compile it in with MSM7x30 configure you get, linux-2.6/drivers/mmc/host/msm_sdcc.c: In function 'msmsdcc_fifo_addr': linux-2.6/drivers/mmc/host/msm_sdcc.c:165: error: 'MSM_SDC1_PHYS' undeclared (first use in this function) linux-2.6/drivers/mmc/host/msm_sdcc.c:165: error: (Each undeclared identifier is reported only once linux-2.6/drivers/mmc/host/msm_sdcc.c:165: error: for each function it appears in.) linux-2.6/drivers/mmc/host/msm_sdcc.c:167: error: 'MSM_SDC2_PHYS' undeclared (first use in this function) linux-2.6/drivers/mmc/host/msm_sdcc.c:169: error: 'MSM_SDC3_PHYS' undeclared (first use in this function) linux-2.6/drivers/mmc/host/msm_sdcc.c:171: error: 'MSM_SDC4_PHYS' undeclared (first use in this function) So we add a Kconfig check to prevent this. Signed-off-by: Daniel Walker <dwalker@codeaurora.org>	2010-06-09 08:51:31 -07:00
Alan Cox	79907d89c3	misc: Fix allocation 'borrowed' by vhost_net 10, 233 is allocated officially to /dev/kmview which is shipping in Ubuntu and Debian distributions. vhost_net seem to have borrowed it without making a proper request and this causes regressions in the other distributions. vhost_net can use a dynamic minor so use that instead. Also update the file with a comment to try and avoid future misunderstandings. cc: stable@kernel.org Signed-off-by: Alan Cox <device@lanana.org> [ We should have caught this before 2.6.34 got released. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-09 08:50:31 -07:00
Julia Lawall	3499f4d0d1	KVM: ia64: Add missing spin_unlock in kvm_arch_hardware_enable() Add a spin_unlock missing on the error path. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E1; @@ * spin_lock(E1,...); <+... when != E1 if (...) { ... when != E1 * return ...; } ...+> * spin_unlock(E1,...); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-06-09 18:48:40 +03:00
Jan Kiszka	05b782ab95	KVM: Fix order passed to iommu_unmap This is obviously a left-over from the old interface taking the size. Apparently a mostly harmless issue with the current iommu_unmap implementation. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-06-09 18:48:38 +03:00
Avi Kivity	69325a1225	KVM: MMU: Remove user access when allowing kernel access to gpte.w=0 page If cr0.wp=0, we have to allow the guest kernel access to a page with pte.w=0. We do that by setting spte.w=1, since the host cr0.wp must remain set so the host can write protect pages. Once we allow write access, we must remove user access otherwise we mistakenly allow the user to write the page. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-06-09 18:48:37 +03:00
Marcelo Tosatti	3be2264be3	KVM: MMU: invalidate and flush on spte small->large page size change Always invalidate spte and flush TLBs when changing page size, to make sure different sized translations for the same address are never cached in a CPU's TLB. Currently the only case where this occurs is when a non-leaf spte pointer is overwritten by a leaf, large spte entry. This can happen after dirty logging is disabled on a memslot, for example. Noticed by Andrea. KVM-Stable-Tag Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-06-09 18:48:36 +03:00
Joerg Roedel	67ec660777	KVM: SVM: Implement workaround for Erratum 383 This patch implements a workaround for AMD erratum 383 into KVM. Without this erratum fix it is possible for a guest to kill the host machine. This patch implements the suggested workaround for hypervisors which will be published by the next revision guide update. [jan: fix overflow warning on i386] [xiao: fix unused variable warning] Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-06-09 18:47:20 +03:00
Joerg Roedel	fe5913e4e1	KVM: SVM: Handle MCEs early in the vmexit process This patch moves handling of the MC vmexits to an earlier point in the vmexit. The handle_exit function is too late because the vcpu might already have changed its physical cpu. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-06-09 18:39:10 +03:00
Jean Delvare	a06cdb5676	KVM: powerpc: fix init/exit annotation kvmppc_e500_exit() is a module_exit function, so it should be tagged with __exit, not __init. The incorrect annotation was added by commit `2986b8c72c`. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: stable@kernel.org Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-06-09 18:39:09 +03:00
Peter Zijlstra	89275d59b5	powerpc: Exclude arch_sd_sibiling_asym_packing() on UP Only SMP systems care about load-balance features, plus this saves some .text space on UP and also fixes the build. Reported-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Neuling <mikey@neuling.org> LKML-Reference: <tip-76cbd8a8f8b0dddbff89a6708bd5bd13c0d21a00@git.kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 16:31:39 +02:00
FUJITA Tomonori	fcdcddbcbb	microblaze: Fix sg_dma_len() regression The commit "asm-generic: add NEED_SG_DMA_LENGTH to define sg_dma_len()" `18e98307de` broke microblaze compilation. dma_direct_map_sg() sets sg->dma_length, however microblaze doesn't set NEED_SG_DMA_LENGTH so scatterlist structures don't include dma_length. sg->dma_length is always equal to sg->length on microblaze. So we don't need to set dma_length, that is, microblaze can simply use sg->length. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Michal Simek <monstr@monstr.eu>	2010-06-09 16:20:54 +02:00
Michal Simek	ffe57d02b2	microblaze: Define ARCH_SLAB_MINALIGN to fix slab crash The commit "mm: Move ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN to <linux/slab_def.h>" `1f0ce8b3dd` which moved the ARCH_SLAB_MINALIGN default into the global header broke FLAT for Microblaze. Error message: slab error in verify_redzone_free(): cache `idr_layer_cache': memory outside object was overwritten Signed-off-by: Michal Simek <monstr@monstr.eu>	2010-06-09 16:20:43 +02:00
Michael Neuling	76cbd8a8f8	powerpc: Enable asymmetric SMT scheduling on POWER7 The POWER7 core has dynamic SMT mode switching which is controlled by the hypervisor. There are 3 SMT modes: SMT1 uses thread 0 SMT2 uses threads 0 & 1 SMT4 uses threads 0, 1, 2 & 3 When in any particular SMT mode, all threads have the same performance as each other (ie. at any moment in time, all threads perform the same). The SMT mode switching works such that when linux has threads 2 & 3 idle and 0 & 1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the idle loop and the hypervisor will automatically switch to SMT2 for that core (independent of other cores). The opposite is not true, so if threads 0 & 1 are idle and 2 & 3 are active, we will stay in SMT4 mode. Similarly if thread 0 is active and threads 1, 2 & 3 are idle, we'll go into SMT1 mode. If we can get the core into a lower SMT mode (SMT1 is best), the threads will perform better (since they share less core resources). Hence when we have idle threads, we want them to be the higher ones. This adds a feature bit for asymmetric packing to powerpc and then enables it on POWER7. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@ozlabs.org LKML-Reference: <20100608045702.31FB5CC8C7@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 11:13:14 +02:00
Michael Neuling	532cb4c401	sched: Add asymmetric group packing option for sibling domain Check to see if the group is packed in a sched domain. This is primarily intended to be used at the sibling level. Some cores like POWER7 prefer to use lower numbered SMT threads. In the case of POWER7, it can move to lower SMT modes only when higher threads are idle. When in lower SMT modes, the threads will perform better since they share less core resources. Hence when we have idle threads, we want them to be the higher ones. This adds a hook into f_b_g() called check_asym_packing() to check the packing. This packing function is run on idle threads. It checks to see if the busiest CPU in this domain (core in the P7 case) has a higher CPU number than the one the packing function is being run on. If it is, calculate the imbalance and return the higher busier thread as the busiest group to f_b_g(). Here we are assuming a lower CPU number will be equivalent to a lower SMT thread number. It also creates a new SD_ASYM_PACKING flag to enable this feature at any scheduler domain level. It also creates an arch hook to enable this feature at the sibling level. The default function doesn't enable this feature. Based heavily on patch from Peter Zijlstra. Fixes from Srivatsa Vaddagiri. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <20100608045702.2936CCC897@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 10:34:55 +02:00
Srivatsa Vaddagiri	9d5efe05eb	sched: Fix capacity calculations for SMT4 Handle cpu capacity being reported as 0 on cores with more number of hardware threads. For example on a Power7 core with 4 hardware threads, core power is 1177 and thus power of each hardware thread is 1177/4 = 294. This low power can lead to capacity for each hardware thread being calculated as 0, which leads to tasks bouncing within the core madly! Fix this by reporting capacity for hardware threads as 1, provided their power is not scaled down significantly because of frequency scaling or real-time tasks usage of cpu. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@linux.intel.com> LKML-Reference: <20100608045702.21D03CC895@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 10:34:54 +02:00
Venkatesh Pallipadi	83cd4fe27a	sched: Change nohz idle load balancing logic to push model In the new push model, all idle CPUs indeed go into nohz mode. There is still the concept of idle load balancer (performing the load balancing on behalf of all the idle cpu's in the system). Busy CPU kicks the nohz balancer when any of the nohz CPUs need idle load balancing. The kickee CPU does the idle load balancing on behalf of all idle CPUs instead of the normal idle balance. This addresses the below two problems with the current nohz ilb logic: * the idle load balancer continued to have periodic ticks during idle and wokeup frequently, even though it did not have any rebalancing to do on behalf of any of the idle CPUs. * On x86 and CPUs that have APIC timer stoppage on idle CPUs, this periodic wakeup can result in a periodic additional interrupt on a CPU doing the timer broadcast. Also currently we are migrating the unpinned timers from an idle to the cpu doing idle load balancing (when all the cpus in the system are idle, there is no idle load balancing cpu and timers get added to the same idle cpu where the request was made. So the existing optimization works only on semi idle system). And In semi idle system, we no longer have periodic ticks on the idle load balancer CPU. Using that cpu will add more delays to the timers than intended (as that cpu's timer base may not be uptodate wrt jiffies etc). This was causing mysterious slowdowns during boot etc. For now, in the semi idle case, use the nearest busy cpu for migrating timers from an idle cpu. This is good for power-savings anyway. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1274486981.2840.46.camel@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 10:34:52 +02:00
Venkatesh Pallipadi	fdf3e95d39	sched: Avoid side-effect of tickless idle on update_cpu_load tickless idle has a negative side effect on update_cpu_load(), which in turn can affect load balancing behavior. update_cpu_load() is supposed to be called every tick, to keep track of various load indices. With tickless idle, there are no scheduler ticks called on the idle CPUs. Idle CPUs may still do load balancing (with idle_load_balance CPU) using the stale cpu_load. It will also cause problems when all CPUs go idle for a while and become active again. In this case loads would not degrade as expected. This is how rq->nr_load_updates change looks like under different conditions: <cpu_num> <nr_load_updates change> All CPUS idle for 10 seconds (HZ=1000) 0 1621 10 496 11 139 12 875 13 1672 14 12 15 21 1 1472 2 2426 3 1161 4 2108 5 1525 6 701 7 249 8 766 9 1967 One CPU busy rest idle for 10 seconds 0 10003 10 601 11 95 12 966 13 1597 14 114 15 98 1 3457 2 93 3 6679 4 1425 5 1479 6 595 7 193 8 633 9 1687 All CPUs busy for 10 seconds 0 10026 10 10026 11 10026 12 10026 13 10025 14 10025 15 10025 1 10026 2 10026 3 10026 4 10026 5 10026 6 10026 7 10026 8 10026 9 10026 That is update_cpu_load works properly only when all CPUs are busy. If all are idle, all the CPUs get way lower updates. And when few CPUs are busy and rest are idle, only busy and ilb CPU does proper updates and rest of the idle CPUs will do lower updates. The patch keeps track of when a last update was done and fixes up the load avg based on current time. On one of my test system SPECjbb with warehouse 1..numcpus, patch improves throughput numbers by ~1% (average of 6 runs). On another test system (with different domain hierarchy) there is no noticeable change in perf. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <AANLkTilLtDWQsAUrIxJ6s04WTgmw9GuOODc5AOrYsaR5@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 10:34:51 +02:00
Oleg Nesterov	246d86b518	sched: Simplify the reacquire_kernel_lock() logic - Contrary to what `6d558c3a` says, there is no need to reload prev = rq->curr after the context switch. You always schedule back to where you came from, prev must be equal to current even if cpu/rq was changed. - This also means reacquire_kernel_lock() can use prev instead of current. - No need to reassign switch_count if reacquire_kernel_lock() reports need_resched(), we can just move the initial assignment down, under the "need_resched_nonpreemptible:" label. - Try to update the comment after context_switch(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100519125711.GA30199@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-09 10:34:50 +02:00


200412 Commits