Commit Graph

8090 Commits

Author SHA1 Message Date
Paul Menage
bbcb81d091 Task Control Groups: add tasks file interface
Add the per-directory "tasks" file for cgroupfs mounts; this allows the
user to determine which tasks are members of a cgroup by reading a
cgroup's "tasks", and to move a task into a cgroup by writing its pid to
its "tasks".

Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:36 -07:00
Paul Menage
ddbcc7e8e5 Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------

There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.

This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.

The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:

- the userspace APIs are (somewhat) normalised

- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.

- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel

This patch:

Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.

Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:36 -07:00
Randy Dunlap
8f731f7d83 kernel-api docbook: fix content problems
Fix kernel-api docbook contents problems.

docproc: linux-2.6.23-git13/include/asm-x86/unaligned_32.h: No such file or directory
Warning(linux-2.6.23-git13//include/linux/list.h:482): bad line: 			of list entry
Warning(linux-2.6.23-git13//mm/filemap.c:864): No description found for parameter 'ra'
Warning(linux-2.6.23-git13//block/ll_rw_blk.c:3760): No description found for parameter 'req'
Warning(linux-2.6.23-git13//include/linux/input.h:1077): No description found for parameter 'private'
Warning(linux-2.6.23-git13//include/linux/input.h:1077): No description found for parameter 'cdev'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:35 -07:00
Jeff Mahoney
cb680c1be6 reiserfs: ignore on disk s_bmap_nr value
Implement support for file systems larger than 8 TiB.

The reiserfs superblock contains a 16 bit value for counting the number of
bitmap blocks.  The rest of the disk format supports file systems up to 2^32
blocks, but the bitmap block limitation artificially limits this to 8 TiB with
a 4KiB block size.

Rather than trust the superblock's 16-bit bitmap block count, we calculate it
dynamically based on the number of blocks in the file system.  When an
incorrect value is observed in the superblock, it is zeroed out, ensuring that
older kernels will not be able to mount the file system.

Userspace support has already been implemented and shipped in reiserfsprogs
3.6.20.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:35 -07:00
Jeff Mahoney
4d20851d37 reiserfs: remove first_zero_hint
The first_zero_hint metadata caching was never actually used, and it's of
dubious optimization quality.  This patch removes it.

It doesn't actually shrink the size of the reiserfs_bitmap_info struct, since
that doesn't work with block sizes larger than 8K.  There was a big fixme in
there, and with all the work lately in allowing block size > page size, I
might as well kill the fixme as well.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:35 -07:00
Jeff Mahoney
3ee1667042 reiserfs: fix usage of signed ints for block numbers
Do a quick signedness check for block numbers.  There are a number of places
where signed integers are used for block numbers, which limits the usable file
system size to 8 TiB.  The disk format, excepting a problem which will be
fixed in the following patch, supports file systems up to 16 TiB in size.
This patch cleans up those sites so that we can enable the full usable size.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:35 -07:00
Jose R. Santos
c2a9159cdd jbd: config_jbd_debug cannot create /proc entry
The jbd-debug file used to be located in /proc/sys/fs/jbd-debug, but
create_proc_entry() does not do lookups on file names that are more that
one directory deep.  This causes the entry creation to fail and hence, no
proc file is created.

Instead of fixing this on procfs might as well move the jbd2-debug file to
debugfs which would be the preferred location for this kind of tunable.
The new location is now /sys/kernel/debug/jbd/jbd-debug.

[akpm@linux-foundation.org: zillions of cleanups]
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:35 -07:00
Chris Snook
1c09924448 jbd: remove printk() from J_ASSERT macros
Remove printk from J_ASSERT to preserve registers during BUG.

Signed-off-by: Chris Snook <csnook@redhat.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:34 -07:00
Samuel Thibault
b293d75847 Console events and accessibility
Some external modules like Speakup need to monitor console output.

This adds a VT notifier that such modules can use to get console output events:
allocation, deallocation, writes, other updates (cursor position, switch, etc.)

[akpm@linux-foundation.org: fix headers_check]
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:34 -07:00
Alexey Dobriyan
fe9d4f5763 Add kernel/notifier.c
There is separate notifier header, but no separate notifier .c file.

Extract notifier code out of kernel/sys.c which will remain for
misc syscalls I hope. Merge kernel/die_notifier.c into kernel/notifier.c.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:34 -07:00
Samuel Thibault
41ab4396e1 Console keyboard events and accessibility
Some blind people use a kernel engine called Speakup which uses hardware
synthesis to speak what gets displayed on the screen.  They use the
PC keyboard to control this engine (start/stop, accelerate, ...) and
also need to get keyboard feedback (to make sure to know what they are
typing, the caps lock status, etc.)

Up to now, the way it was done was very ugly.  Below is a patch to add a
notifier list for permitting a far better implementation, see ChangeLog
above for details.

You may wonder why this can't be done at the input layer.  The problem
is that what people want to monitor is the console keyboard, i.e. all
input keyboards that got attached to the console, and with the currently
active keymap (i.e. keysyms, not only keycodes).

This adds a keyboard notifier that such modules can use to get the keyboard
events and possibly eat them, at several stages:

- keycodes: even before translation into keysym.
- unbound keycodes: when no keysym is bound.
- unicode: when the keycode would get translated into a unicode character.
- keysym: when the keycode would get translated into a keysym.
- post_keysym: after the keysym got interpreted, so as to see the result
  (caps lock, etc.)

This also provides access to k_handler so as to permit simulation of
keypresses.

[akpm@linux-foundation.org: various fixes]
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:33 -07:00
Miklos Szeredi
c18479fe01 put declaration of put_filesystem() in fs.h
Declarations go into headers.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Ram Pai <linuxram@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:33 -07:00
Linus Torvalds
4fa4d23fa2 Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
  pcnet32: remove private net_device_stats structure
  vortex_up should initialize "err"
  pcnet32: remove compile warnings in non-napi mode
  pcnet32: fix non-napi packet reception
  fix EMAC driver for proper napi_synchronize API
  sky2: shutdown cleanup
  napi_synchronize: waiting for NAPI
  forcedeth msi bugfix
  gianfar: fix obviously wrong #ifdef CONFIG_GFAR_NAPI placement
  fs_enet: Update for API changes
  gianfar: remove orphan struct.
  forcedeth: fix rx-work condition in nv_rx_process_optimized() too
2007-10-18 19:31:54 -07:00
Linus Torvalds
a9e82d3a02 Merge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6: (37 commits)
  ide: set drive->autotune in ide_pci_setup_ports()
  triflex: always tune PIO
  opti621: always tune PIO
  cy82c693: always tune PIO
  cs5520: always tune PIO
  alim15x3: always tune PIO
  ide: add IDE_HFLAG_LEGACY_IRQS host flag
  ide: add IDE_HFLAG_SERIALIZE host flag
  ide: add IDE_HFLAG_ERROR_STOPS_FIFO host flag
  piix: add DECLARE_ICH_DEV() macro
  pdc202xx_old: add DECLARE_PDC2026X_DEV() macro
  pdc202xx_new: add DECLARE_PDCNEW_DEV() macro
  aec62xx: no need to disable UDMA in ->init_hwif method for ATP850UF
  ide: remove .init_setup from ide_pci_device_t
  serverworks: remove ->init_setup
  scc_pata: remove ->init_setup
  pdc202xx_old: remove ->init_setup
  pdc202xx_new: remove ->init_setup
  hpt366: remove ->init_setup
  cmd64x: remove ->init_setup
  ...
2007-10-18 16:00:02 -07:00
Bartlomiej Zolnierkiewicz
3985ee3b4c ide: add IDE_HFLAG_LEGACY_IRQS host flag
Add IDE_HFLAG_LEGACY_IRQS host flag to tell ide_pci_setup_ports() to set
hwif->irq to legacy IRQ 14/15 (iff hwif->irq is not already set) and convert
atiixp, piix, serverworks, sis5513 and slc90e66 host drivers to use it.

While at it:

* In piix.c add IDE_HFLAGS_PIIX define and don't use ->init_hwif for MPIIX.

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-19 00:30:11 +02:00
Bartlomiej Zolnierkiewicz
1c51361a98 ide: add IDE_HFLAG_SERIALIZE host flag
Add IDE_HFLAG_SERIALIZE host flag to tell ide_pci_setup_ports() to set
hwif/mate->serialized and convert aec62xx, cs5530 and sc1200 host drivers
to use it.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-19 00:30:10 +02:00
Bartlomiej Zolnierkiewicz
ed67b92385 ide: add IDE_HFLAG_ERROR_STOPS_FIFO host flag
Add IDE_HFLAG_ERROR_STOPS_FIFO host flag and use it instead
of hwif->err_stops_fifo.  As a side-effect this change fixes
hwif->err_stops_fifo not being restored by ide_hwif_restore().

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-19 00:30:10 +02:00
Bartlomiej Zolnierkiewicz
942278ef64 ide: remove .init_setup from ide_pci_device_t
Now that all users were fixed we can safely remove it.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-19 00:30:09 +02:00
Bartlomiej Zolnierkiewicz
5f8b6c3485 ide: add ->mwdma_mask and ->swdma_mask to ide_pci_device_t (take 2)
* Add ->mwdma_mask and ->swdma_mask to ide_pci_device_t.

* Set ide_hwif_t DMA masks using DMA masks from ide_pci_device_t in
  setup-pci.c::ide_pci_setup_ports() (iff DMA base is valid and ->init_hwif
  method may still override them).

* Convert IDE PCI host drivers to use ide_pci_device_t DMA masks.

While at it:

* Use ATA_{UDMA,MWDMA,SWDMA}* defines.

* hpt34x.c: add separate ide_pci_device_t instances for HPT343 and HPT345.

* serverworks.c: fix DMA masks being set before checking DMA base.

v2:
* Add missing masks to DECLARE_GENERIC_PCI_DEV() macro.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-19 00:30:07 +02:00
Bartlomiej Zolnierkiewicz
238e4f142c ide: add IDE_HFLAG_NO_LBA48 and IDE_HFLAG_NO_LBA48_DMA host flags
Add IDE_HFLAG_NO_LBA48[_DMA] host flags, use it instead of hwif->no_lba48[_dma]
and then remove no longer needed hwif->no_lba48[_dma].  As a side-effect
this change fixes hwif->no_lba48_dma not being restored by ide_hwif_restore().

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-19 00:30:07 +02:00
Bartlomiej Zolnierkiewicz
9ffcf364f9 ide: remove ->init_setup_dma from ide_pci_device_t (take 2)
* Make ide_pci_device_t.host_flags u32 and add IDE_HFLAG_CS5520 host flag.

* Pass ide_pci_device_t *d to setup-pci.c::ide_get_or_set_dma_base()
  and use d->name instead of hwif->cds->name.

* Set IDE_HFLAG_CS5520 host flag in cs5520 host driver and use it in
  ide_get_or_set_dma_base() to find out which PCI BAR to use, remove no longer
  needed cs5520.c::cs5520_init_setup_dma() and ide_pci_device_t.init_setup_dma.

  This fixes PCI bus-mastering not being checked for CS5510/CS5520 hosts.

v2:
* It is wrong to check simplex bits on CS5510/CS5520 as v1 did.
  (Noticed by Alan).

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-19 00:30:07 +02:00
Bartlomiej Zolnierkiewicz
47b687882c ide: add IDE_HFLAG_NO_{DMA,AUTODMA} host flags
Add IDE_HFLAG_NO_{DMA,AUTODMA} host flags.  Convert all host drivers using
ide_pci_device_t to use these flags instead of d->autodma and then remove no
longer needed d->autodma.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-19 00:30:06 +02:00
Bartlomiej Zolnierkiewicz
7cab14a799 ide: add IDE_HFLAG_BOOTABLE host flag
Add IDE_HFLAG_BOOTABLE host flag and IDE_HFLAG_OFF_BOARD define.  Convert
all host drivers using ide_pci_device_t to use IDE_HFLAG_{BOOTABLE,OFF_BOARD}
instead of d->bootable and then remove no longer needed d->bootable.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-19 00:30:06 +02:00
Bartlomiej Zolnierkiewicz
33c1002ed9 ide: add IDE_HFLAG_NO_ATAPI_DMA host flag
Add IDE_HFLAG_NO_ATAPI_DMA host flag and set it in host drivers which
don't support ATAPI DMA.  Then remove no longer needed hwif->atapi_dma.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-19 00:30:06 +02:00
Benjamin Herrenschmidt
1b67834712 ide: Add ide_get_paired_drive() helper
This adds a helper to get to the "other" drive on a pair connected
to a given hwif.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andrew Morton <akpm@osdl.org>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-10-19 00:30:05 +02:00
Linus Torvalds
58f9b52ee8 Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt
* ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
  hrtimer: hook compat_sys_nanosleep up to high res timer code
  hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier
2007-10-18 15:12:41 -07:00
Linus Torvalds
2af170dd24 Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  [libata] kill ata_sg_is_last()
  Update libata driver for bf548 atapi controller against the 2.6.24 tree.
  libata-sff: Correct use of check_status()
  drivers/ata: add support to Freescale 3.0Gbps SATA Controller
  pata_acpi: fix build breakage if !CONFIG_PM
2007-10-18 15:08:35 -07:00
Linus Torvalds
54e840dd50 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
  sched: reduce schedstat variable overhead a bit
  sched: add KERN_CONT annotation
  sched: cleanup, make struct rq comments more consistent
  sched: cleanup, fix spacing
  sched: fix return value of wait_for_completion_interruptible()
2007-10-18 14:54:03 -07:00
Linus Torvalds
a57793651f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (51 commits)
  [IPV6]: Fix again the fl6_sock_lookup() fixed locking
  [NETFILTER]: nf_conntrack_tcp: fix connection reopening fix
  [IPV6]: Fix race in ipv6_flowlabel_opt() when inserting two labels
  [IPV6]: Lost locking in fl6_sock_lookup
  [IPV6]: Lost locking when inserting a flowlabel in ipv6_fl_list
  [NETFILTER]: xt_sctp: fix mistake to pass a pointer where array is required
  [NET]: Fix OOPS due to missing check in dev_parse_header().
  [TCP]: Remove lost_retrans zero seqno special cases
  [NET]: fix carrier-on bug?
  [NET]: Fix uninitialised variable in ip_frag_reasm()
  [IPSEC]: Rename mode to outer_mode and add inner_mode
  [IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP
  [IPSEC]: Use the top IPv4 route's peer instead of the bottom
  [IPSEC]: Store afinfo pointer in xfrm_mode
  [IPSEC]: Add missing BEET checks
  [IPSEC]: Move type and mode map into xfrm_state.c
  [IPSEC]: Fix length check in xfrm_parse_spi
  [IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi
  [IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi
  [IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input
  ...
2007-10-18 14:40:30 -07:00
Linus Torvalds
9cf52b2921 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC/64]: Consolidate of_register_driver
  [SPARC] Videopix Frame Grabber: Convert device_lock_sem to mutex
  [SPARC]: Support for new termios.
  [SPARC64]: Check of_get_property() return in pci_determine_mem_io_space().
  [SPARC64]: Fix boot failures due to bootmem.
  [SPARC64]: Implement atomic backoff.
2007-10-18 14:39:44 -07:00
Corey Minyard
d8c98618f4 IPMI: add 0.9 support
Add support for IPMI 0.9 systems to the IPMI driver.  Just handle a shorter
get device ID command with less information.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Stian Jordet <liste@jordet.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:32 -07:00
Corey Minyard
fcfa472411 IPMI: add polled interface
Currently the IPMI watchdog timer sets the watchdog timeout on a panic, but it
doesn't actually poll the interface to make sure the message goes out.

Add an interface for polling the IPMI driver, and add code to the IPMI
watchdog timer to poll the interface when the timer is set from a panic.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:32 -07:00
Ralf Baechle
e8c44319c6 Replace __attribute_pure__ with __pure
To be consistent with the use of attributes in the rest of the kernel
replace all use of __attribute_pure__ with __pure and delete the definition
of __attribute_pure__.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:32 -07:00
Miklos Szeredi
0e9663ee45 fuse: add blksize field to fuse_attr
There are cases when the filesystem will be passed the buffer from a single
read or write call, namely:

 1) in 'direct-io' mode (not O_DIRECT), read/write requests don't go
    through the page cache, but go directly to the userspace fs

 2) currently buffered writes are done with single page requests, but
    if Nick's ->perform_write() patch goes it, it will be possible to
    do larger write requests.  But only if the original write() was
    also bigger than a page.

In these cases the filesystem might want to give a hint to the app
about the optimal I/O size.

Allow the userspace filesystem to supply a blksize value to be returned by
stat() and friends.  If the field is zero, it defaults to the old
PAGE_CACHE_SIZE value.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:31 -07:00
Miklos Szeredi
f33321141b fuse: add support for mandatory locking
For mandatory locking the userspace filesystem needs to know the lock
ownership for read, write and truncate operations.

This patch adds the necessary fields to the protocol.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:31 -07:00
Miklos Szeredi
b25e82e567 fuse: add helper for asynchronous writes
This patch adds a new helper function fuse_write_fill() which makes it
possible to send WRITE requests asynchronously.

A new flag for WRITE requests is also added which indicates that this a write
from the page cache, and not a "normal" file write.

This patch is in preparation for writable mmap support.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:31 -07:00
Miklos Szeredi
a9ff4f8705 fuse: support BSD locking semantics
It is trivial to add support for flock(2) semantics to the existing protocol,
by setting the lock owner field to the file pointer, and passing a new
FUSE_LK_FLOCK flag with the locking request.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:31 -07:00
Miklos Szeredi
6ff958edbf fuse: add atomic open+truncate support
This patch allows fuse filesystems to implement open(..., O_TRUNC) as a single
request, instead of separate truncate and open requests.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:31 -07:00
Miklos Szeredi
17637cbaba fuse: improve utimes support
Add two new flags for setattr: FATTR_ATIME_NOW and FATTR_MTIME_NOW.  These
mean, that atime or mtime should be changed to the current time.

Also it is now possible to update atime or mtime individually, not just
together.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:30 -07:00
Miklos Szeredi
d139d7ffd0 VFS: allow filesystems to implement atomic open+truncate
Add a new attribute flag ATTR_OPEN, with the meaning: "truncation was
initiated by open() due to the O_TRUNC flag".

This way filesystems wanting to implement truncation within their ->open()
method can ignore such truncate requests.

This is a quick & dirty hack, but it comes for free.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:30 -07:00
Miklos Szeredi
c79e322f63 fuse: add file handle to getattr operation
Add necessary protocol changes for supplying a file handle with the getattr
operation.  Step the API version to 7.9.

This patch doesn't actually supply the file handle, because that needs some
kind of VFS support, which we haven't yet been able to agree upon.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:30 -07:00
Takashi Sato
0f0a89ebe1 ext3: support large blocksize up to PAGESIZE
This patch set supports large block size(>4k, <=64k) in ext3 just enlarging
the block size limit.  But it is NOT possible to have 64kB blocksize on
ext3 without some changes to the directory handling code.  The reason is
that an empty 64kB directory block would have a rec_len == (__u16)2^16 ==
0, and this would cause an error to be hit in the filesystem.  The proposed
solution is treat 64k rec_len with a an impossible value like rec_len =
0xffff to handle this.

The Patch-set consists of the following 2 patches.
  [1/2]  ext3: enlarge blocksize
         - Allow blocksize up to pagesize

  [2/2]  ext3: fix rec_len overflow
         - prevent rec_len from overflow with 64KB blocksize

Now on 64k page ppc64 box runs with this patch set we could create a 64k
block size ext3, and able to handle empty directory block.

Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:29 -07:00
Nick Piggin
b8dc93cbe9 bit_spin_lock: use lock bitops
Convert bit_spin_lock to new locking bitops.  Slub can use the non-atomic
store version to clear (Christoph?)

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:29 -07:00
Satyam Sharma
761bb43190 Redefine {un}register_hotcpu_notifier() !HOTPLUG_CPU stubs
The return of the present "do {} while" based stub definition of
register_hotcpu_notifier() cannot be checked.  This makes the stub
asymmetric w.r.t.  the real HOTPLUG_CPU=y implementation that is
int-returning.  So let us redefine this to be consistent with the full
version.  Also do the same for unregister_hotcpu_notifier().

We cannot define these as static inline functions due to an existing GCC
bug (#33172).  So define as macros that return appropriately instead (int
'0' for the register_hotcpu_notifier case and void for
unregister_hotcpu_notifier).

Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:28 -07:00
Michael Neuling
f494f8fcb1 add-scaled-time-to-taskstats-based-process-accounting fix
This moves the new items to the end of the taskstats struct as
requested by Balbir and yourself.

Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:28 -07:00
Michael Neuling
c66f08be7e Add scaled time to taskstats based process accounting
This adds items to the taststats struct to account for user and system
time based on scaling the CPU frequency and instruction issue rates.

Adds account_(user|system)_time_scaled callbacks which architectures
can use to account for time using this mechanism.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:28 -07:00
Jiri Slaby
65f76a82ec Char: cyclades, fix some -W warnings
Most of them are signedness, the rest unused function parameters.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:26 -07:00
Jiri Slaby
ebafeeff0f Char: cyclades, remove bottom half processing
The work done in bottom half doesn't cost much cpu time (e.g.  tty_hangup
itself schedules its own bottom half), it's possible to do the work in isr
directly and save hence some .text.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:26 -07:00
Andrew Morgan
72c2d5823f V3 file capabilities: alter behavior of cap_setpcap
The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1,
can change the capabilities of another process, p2.  This is not the
meaning that was intended for this capability at all, and this
implementation came about purely because, without filesystem capabilities,
there was no way to use capabilities without one process bestowing them on
another.

Since we now have a filesystem support for capabilities we can fix the
implementation of CAP_SETPCAP.

The most significant thing about this change is that, with it in effect, no
process can set the capabilities of another process.

The capabilities of a program are set via the capability convolution
rules:

   pI(post-exec) = pI(pre-exec)
   pP(post-exec) = (X(aka cap_bset) & fP) | (pI(post-exec) & fI)
   pE(post-exec) = fE ? pP(post-exec) : 0

at exec() time.  As such, the only influence the pre-exec() program can
have on the post-exec() program's capabilities are through the pI
capability set.

The correct implementation for CAP_SETPCAP (and that enabled by this patch)
is that it can be used to add extra pI capabilities to the current process
- to be picked up by subsequent exec()s when the above convolution rules
are applied.

Here is how it works:

Let's say we have a process, p. It has capability sets, pE, pP and pI.
Generally, p, can change the value of its own pI to pI' where

   (pI' & ~pI) & ~pP = 0.

That is, the only new things in pI' that were not present in pI need to
be present in pP.

The role of CAP_SETPCAP is basically to permit changes to pI beyond
the above:

   if (pE & CAP_SETPCAP) {
      pI' = anything; /* ie., even (pI' & ~pI) & ~pP != 0  */
   }

This capability is useful for things like login, which (say, via
pam_cap) might want to raise certain inheritable capabilities for use
by the children of the logged-in user's shell, but those capabilities
are not useful to or needed by the login program itself.

One such use might be to limit who can run ping. You set the
capabilities of the 'ping' program to be "= cap_net_raw+i", and then
only shells that have (pI & CAP_NET_RAW) will be able to run
it. Without CAP_SETPCAP implemented as described above, login(pam_cap)
would have to also have (pP & CAP_NET_RAW) in order to raise this
capability and pass it on through the inheritable set.

Signed-off-by: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:24 -07:00
Eric W. Biederman
fc6cd25b73 sysctl: Error on bad sysctl tables
After going through the kernels sysctl tables several times it has become
clear that code review and testing is just not effective in prevent
problematic sysctl tables from being used in the stable kernel.  I certainly
can't seem to fix the problems as fast as they are introduced.

Therefore this patch adds sysctl_check_table which is called when a sysctl
table is registered and checks to see if we have a problematic sysctl table.

The biggest part of the code is the table of valid binary sysctl entries, but
since we have frozen our set of binary sysctls this table should not need to
change, and it makes it much easier to detect when someone unintentionally
adds a new binary sysctl value.

As best as I can determine all of the several hundred errors spewed on boot up
now are legitimate.

[bunk@kernel.org: kernel/sysctl_check.c must #include <linux/string.h>]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:23 -07:00