Commit Graph

2440 Commits

Author SHA1 Message Date
Nick Piggin
617d2214ee [PATCH] mm: optimise page_count
Optimise page_count compound page test and make it consistent with similar
functions.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:02 -08:00
Nick Piggin
7835e98b2e [PATCH] remove set_page_count() outside mm/
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().

This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:02 -08:00
Nick Piggin
84097518d1 [PATCH] mm: nommu use compound pages
Now that compound page handling is properly fixed in the VM, move nommu
over to using compound pages rather than rolling their own refcounting.

nommu vm page refcounting is broken anyway, but there is no need to have
divergent code in the core VM now, nor when it gets fixed.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: David Howells <dhowells@redhat.com>

(Needs testing, please).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:01 -08:00
Nick Piggin
0f8053a509 [PATCH] mm: make __put_page internal
Remove __put_page from outside the core mm/.  It is dangerous because it does
not handle compound pages nicely, and misses 1->0 transitions.  If a user
later appears that really needs the extra speed we can reevaluate.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:01 -08:00
Andrew Morton
69e05944af [PATCH] vmscan: use unsigned longs
Turn basically everything in vmscan.c into `unsigned long'.  This is to avoid
the possibility that some piece of code in there might decide to operate upon
more than 4G (or even 2G) of pages in one hit.

This might be silly, but we'll need it one day.

Cc: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:00 -08:00
Andrew Morton
78eef01b0f [PATCH] on_each_cpu(): disable local interrupts
When on_each_cpu() runs the callback on other CPUs, it runs with local
interrupts disabled.  So we should run the function with local interrupts
disabled on this CPU, too.

And do the same for UP, so the callback is run in the same environment on both
UP and SMP.  (strictly it should do preempt_disable() too, but I think
local_irq_disable is sufficiently equivalent).

Also uninlines on_each_cpu().  softirq.c was the most appropriate file I could
find, but it doesn't seem to justify creating a new file.

Oh, and fix up that comment over (under?) x86's smp_call_function().  It
drives me nuts.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:59 -08:00
Christoph Lameter
ac2b898ca6 [PATCH] slab: Remove SLAB_NO_REAP option
SLAB_NO_REAP is documented as an option that will cause this slab not to be
reaped under memory pressure.  However, that is not what happens.  The only
thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
slab elements that were allocated in batch in cache_reap().  Cache_reap()
is run every few seconds independently of memory pressure.

Could we remove the whole thing?  Its only used by three slabs anyways and
I cannot find a reason for having this option.

There is an additional problem with SLAB_NO_REAP.  If set then the recovery
of objects from alien caches is switched off.  Objects not freed on the
same node where they were initially allocated will only be reused if a
certain amount of objects accumulates from one alien node (not very likely)
or if the cache is explicitly shrunk.  (Strangely __cache_shrink does not
check for SLAB_NO_REAP)

Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:59 -08:00
Adrian Bunk
b50ec7d807 [PATCH] kcalloc(): INT_MAX -> ULONG_MAX
Since size_t has the same size as a long on all architectures, it's enough
for overflow checks to check against ULONG_MAX.

This change could allow a compiler better optimization (especially in the
n=1 case).

The practical effect seems to be positive, but quite small:

    text           data     bss      dec            hex filename
21762380        5859870 1848928 29471178        1c1b1ca vmlinux-old
21762211        5859870 1848928 29471009        1c1b121 vmlinux-patched

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:58 -08:00
Nick Piggin
9d41415221 [PATCH] mm: page_state comment more
Clarify that preemption needs to be guarded against with the
__xxx_page_state functions.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:58 -08:00
Nick Piggin
8dfcc9ba27 [PATCH] mm: split highorder pages
Have an explicit mm call to split higher order pages into individual pages.
 Should help to avoid bugs and be more explicit about the code's intention.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
8dc04efbfb [PATCH] mm: de-skew page refcounting
atomic_add_unless (atomic_inc_not_zero) no longer requires an offset refcount
to function correctly.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
7c8ee9a863 [PATCH] mm: simplify vmscan vs release refcounting
The VM has an interesting race where a page refcount can drop to zero, but it
is still on the LRU lists for a short time.  This was solved by testing a 0->1
refcount transition when picking up pages from the LRU, and dropping the
refcount in that case.

Instead, use atomic_add_unless to ensure we never pick up a 0 refcount page
from the LRU, thus a 0 refcount page will never have its refcount elevated
until it is allocated again.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
f205b2fe62 [PATCH] mm: slab less atomics
Atomic operation removal from slab

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
5e9dace8d3 [PATCH] mm: page_alloc less atomics
More atomic operation removal from page allocator

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
674539115c [PATCH] mm: less atomic ops
In the page release paths, we can be sure that nobody will mess with our
page->flags because the refcount has dropped to 0.  So no need for atomic
operations here.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
4c84cacfa4 [PATCH] mm: PageActive no testset
PG_active is protected by zone->lru_lock, it does not need TestSet/TestClear
operations.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
8d438f96d2 [PATCH] mm: PageLRU no testset
PG_lru is protected by zone->lru_lock. It does not need TestSet/TestClear
operations.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:56 -08:00
Joe Korty
4024ce5e0f [PATCH] rtc.h broke strace(1) builds
Git patch 52dfa9a64c

	[PATCH] move rtc_interrupt() prototype to rtc.h

broke strace(1) builds.  The below moves the kernel-only additions lower,
under the already provided #ifdef __KERNEL__ statement.

Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:55 -08:00
Linus Torvalds
ec1248e70e Merge master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  [CRYPTO] aes: Fixed array boundary violation
  [CRYPTO] tcrypt: Fix key alignment
  [CRYPTO] all: Add missing cra_alignmask
  [CRYPTO] all: Use kzalloc where possible
  [CRYPTO] api: Align tfm context as wide as possible
  [CRYPTO] twofish: Use rol32/ror32 where appropriate
2006-03-21 09:33:19 -08:00
Linus Torvalds
3d1f337b3e Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (235 commits)
  [NETFILTER]: Add H.323 conntrack/NAT helper
  [TG3]: Don't mark tg3_test_registers() as returning const.
  [IPV6]: Cleanups for net/ipv6/addrconf.c (kzalloc, early exit) v2
  [IPV6]: Nearly complete kzalloc cleanup for net/ipv6
  [IPV6]: Cleanup of net/ipv6/reassambly.c
  [BRIDGE]: Remove duplicate const from is_link_local() argument type.
  [DECNET]: net/decnet/dn_route.c: fix inconsequent NULL checking
  [TG3]: make drivers/net/tg3.c:tg3_request_irq() static
  [BRIDGE]: use LLC to send STP
  [LLC]: llc_mac_hdr_init const arguments
  [BRIDGE]: allow show/store of group multicast address
  [BRIDGE]: use llc for receiving STP packets
  [BRIDGE]: stp timer to jiffies cleanup
  [BRIDGE]: forwarding remove unneeded preempt and bh diasables
  [BRIDGE]: netfilter inline cleanup
  [BRIDGE]: netfilter VLAN macro cleanup
  [BRIDGE]: netfilter dont use __constant_htons
  [BRIDGE]: netfilter whitespace
  [BRIDGE]: optimize frame pass up
  [BRIDGE]: use kzalloc
  ...
2006-03-21 09:31:48 -08:00
Linus Torvalds
2bf2154c6b Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6: (81 commits)
  [PATCH] USB: omninet: fix up debugging comments
  [PATCH] USB serial: add navman driver
  [PATCH] USB: Fix irda-usb use after use
  [PATCH] USB: rtl8150 small fix
  [PATCH] USB: ftdi_sio: add Icom ID1 USB product and vendor ids
  [PATCH] USB: cp2101: add new device IDs
  [PATCH] USB: fix check_ctrlrecip to allow control transfers in state ADDRESS
  [PATCH] USB: vicam.c: fix a NULL pointer dereference
  [PATCH] USB: ZC0301 driver bugfix
  [PATCH] USB: add support for Creativelabs Silvercrest USB keyboard
  [PATCH] USB: storage: new unusual_devs.h entry: Mitsumi 7in1 Card Reader
  [PATCH] USB: storage: unusual_devs.h entry 0420:0001
  [PATCH] USB: storage: another unusual_devs.h entry
  [PATCH] USB: storage: sandisk unusual_devices entry
  [PATCH] USB: fix initdata issue in isp116x-hcd
  [PATCH] USB: usbcore: usb_set_configuration oops (NULL ptr dereference)
  [PATCH] USB: usbcore: Don't assume a USB configuration includes any interfaces
  [PATCH] USB: ub 03 drop stall clearing
  [PATCH] USB: ub 02 remove diag
  [PATCH] USB: ub 01 remove first_open
  ...
2006-03-21 09:25:47 -08:00
Linus Torvalds
08a4ecee98 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (23 commits)
  [PATCH] sysfs: fix a kobject leak in sysfs_add_link on the error path
  [PATCH] sysfs: don't export dir symbols
  [PATCH] get_cpu_sysdev() signedness fix
  [PATCH] kobject_add_dir
  [PATCH] debugfs: Add debugfs_create_blob() helper for exporting binary data
  [PATCH] sysfs: fix problem with duplicate sysfs directories and files
  [PATCH] Kobject: kobject.h: fix a typo
  [PATCH] Kobject: provide better warning messages when people do stupid things
  [PATCH] Driver core: add macros notice(), dev_notice()
  [PATCH] firmware: fix BUG: in fw_realloc_buffer
  [PATCH] sysfs: kzalloc conversion
  [PATCH] fix module sysfs files reference counting
  [PATCH] add EXPORT_SYMBOL_GPL_FUTURE() to USB subsystem
  [PATCH] add EXPORT_SYMBOL_GPL_FUTURE() to RCU subsystem
  [PATCH] add EXPORT_SYMBOL_GPL_FUTURE()
  [PATCH] Clean up module.c symbol searching logic
  [PATCH] kobj_map semaphore to mutex conversion
  [PATCH] kref: avoid an atomic operation in kref_put()
  [PATCH] handle errors returned by platform_get_irq*()
  [PATCH] driver core: platform_get_irq*(): return -ENXIO on error
  ...
2006-03-21 09:25:15 -08:00
Linus Torvalds
28c006c1f0 Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] Fix cosmetic typo in asm/irq.h
  [ARM] 3367/1: CLCD mode no longer supported on the RealView boards
  [ARM] 3366/1: Allow the 16bpp mode configuration in the CLCD control register
2006-03-21 09:20:47 -08:00
Linus Torvalds
cbe037b46f Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (112 commits)
  [libata] sata_mv: fix irq port status usage
  [PATCH] libata: move IDENTIFY info printing from ata_dev_read_id() to ata_dev_configure()
  [PATCH] libata: use local *id instead of dev->id in ata_dev_configure()
  [PATCH] libata: check Word 88 validity in ata_id_xfer_mask()
  [PATCH] libata: fix class handling in ata_bus_probe()
  [PATCH] ahci: enable prefetching for PACKET commands
  libata: turn on ATAPI by default
  [PATCH] sata_sil24: lengthen softreset timeout
  [PATCH] sata_sil24: exit early from softreset if SStatus reports no device
  [PATCH] libata: fix missing classes[] initialization in ata_bus_probe()
  [PATCH] libata: kill unused xfer_mode functions
  [PATCH] libata: reimplement ata_set_mode() using xfer_mask helpers
  [PATCH] libata: use xfer_mask helpers in ata_dev_set_mode()
  [PATCH] libata: use ata_id_xfermask() in ata_dev_configure()
  [PATCH] libata: add xfer_mask handling functions
  [PATCH] libata: improve xfer mask constants and update ata_mode_string()
  [PATCH] libata: rename ATA_FLAG_FLUSH_PIO_TASK to ATA_FLAG_FLUSH_PORT_TASK
  [PATCH] libata: kill unused pio_task and packet_task
  [PATCH] libata: convert pio_task and packet_task to port_task
  [PATCH] libata: implement port_task
  ...
2006-03-21 09:20:12 -08:00
Linus Torvalds
f0481730c8 Merge kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb.git with fixups
This merges the DVB tree, but fixes up the history that had gotten
screwed up by a broken commit.

The history is fixed up by re-doing the commit properly (taking the
resolve from the final result of the original), and then cherry-picking
the commits that followed the broken merge.

* dvb: (190 commits)
  V4L/DVB (3545): Fixed no_overlay option and quirks on saa7134 driver
  V4L/DVB (3543): Fix Makefile to adapt to bt8xx/ conversion
  V4L/DVB (3538): Bt8xx documentation update
  V4L/DVB (3537a): Whitespace cleanup
  V4L/DVB (3533): Add WSS (wide screen signalling) module parameters
  V4L/DVB (3532): Moved duplicated code of ALPS BSRU6 tuner to a standalone file.
  V4L/DVB (3530): Kconfig: remove VIDEO_AUDIO_DECODER
  V4L/DVB (3529): Kconfig: add menu items for cs53l32a and wm8775 A/D converters
  V4L/DVB (3528): Kconfig: fix ATSC frontend menu item names by manufacturer
  V4L/DVB (3527): VIDEO_CPIA2 must depend on USB
  V4L/DVB (3525): Kconfig: remove VIDEO_DECODER
  V4L/DVB (3524): Kconfig: add menu items for saa7115 and saa7127
  V4L/DVB (3494): Kconfig: select VIDEO_MSP3400 to build msp3400.ko
  V4L/DVB (3522): Fixed a trouble with other PAL standards
  V4L/DVB (3521): Avoid warnings at video-buf.c
  V4L/DVB (3514): SAA7113 doesn't have auto std chroma detection mode
  V4L/DVB (3513): Remove saa711x driver
  V4L/DVB (3509): Make a needlessly global function static.
  V4L/DVB (3506): Cinergy T2 dmx cleanup on disconnect
  V4L/DVB (3504): Medion 7134: Autodetect second bridge chip
  ...

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-21 09:01:08 -08:00
Linus Torvalds
b05005772f Merge branch 'origin'
Conflicts:
	Documentation/video4linux/CARDLIST.cx88
	drivers/media/video/cx88/Kconfig
	drivers/media/video/em28xx/em28xx-video.c
	drivers/media/video/saa7134/saa7134-dvb.c

Resolved as in the original merge by Mauro Carvalho Chehab
2006-03-21 08:52:18 -08:00
Herbert Xu
f10b7897ee [CRYPTO] api: Align tfm context as wide as possible
Since tfm contexts can contain arbitrary types we should provide at least
natural alignment (__attribute__ ((__aligned__))) for them.  In particular,
this is needed on the Xscale which is a 32-bit architecture with a u64 type
that requires 64-bit alignment.  This problem was reported by Ronen Shitrit.

The crypto_tfm structure's size was 44 bytes on 32-bit architectures and
80 bytes on 64-bit architectures.  So adding this requirement only means
that we have to add an extra 4 bytes on 32-bit architectures.

On i386 the natural alignment is 16 bytes which also benefits the VIA
Padlock as it no longer has to manually align its context structure to
128 bits.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-03-21 20:14:08 +11:00
Jing Min Zhao
5e35941d99 [NETFILTER]: Add H.323 conntrack/NAT helper
Signed-off-by: Jing Min Zhao <zhaojignmin@hotmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 23:41:17 -08:00
Stephen Hemminger
fdeabdefb2 [BRIDGE]: netfilter inline cleanup
Move nf_bridge_alloc from header file to the one place it is
used and optimize it.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:58:21 -08:00
Arnaldo Carvalho de Melo
a4bf390242 [DCCP] minisock: Rename struct dccp_options to struct dccp_minisock
This will later be included in struct dccp_request_sock so that we can
have per connection feature negotiation state while in the 3way
handshake, when we clone the DCCP_ROLE_LISTEN socket (in
dccp_create_openreq_child) we'll just copy this state from
dreq_minisock to dccps_minisock.

Also the feature negotiation and option parsing code will mostly touch
dccps_minisock, which will simplify some stuff.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:50:58 -08:00
Dmitry Mishin
3fdadf7d27 [NET]: {get|set}sockopt compatibility layer
This patch extends {get|set}sockopt compatibility layer in order to
move protocol specific parts to their place and avoid huge universal
net/compat.c file in the future.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:45:21 -08:00
Herbert Xu
cbb042f9e1 [NET]: Replace skb_pull/skb_postpull_rcsum with skb_pull_rcsum
We're now starting to have quite a number of places that do skb_pull
followed immediately by an skb_postpull_rcsum.  We can merge these two
operations into one function with skb_pull_rcsum.  This makes sense
since most pull operations on receive skb's need to update the
checksum.

I've decided to make this out-of-line since it is fairly big and the
fast path where hardware checksums are enabled need to call
csum_partial anyway.

Since this is a brand new function we get to add an extra check on the
len argument.  As it is most callers of skb_pull ignore its return
value which essentially means that there is no check on the len
argument.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:43:56 -08:00
Steven Whitehouse
c4ea94ab37 [DECnet]: Endian annotation and fixes for DECnet.
The typedef for dn_address has been removed in favour of using __le16
or __u16 directly as appropriate. All the DECnet header files are
updated accordingly.

The byte ordering of dn_eth2dn() and dn_dn2eth() are both changed
since just about all their callers wanted network order rather than
host order, so the conversion is now done in the functions themselves.

Several missed endianess conversions have been picked up during the
conversion process. The nh_gw field in struct dn_fib_info has been
changed from a 32 bit field to 16 bits as it ought to be.

One or two cases of using htons rather than dn_htons in the routing
code have been found and fixed.

There are still a few warnings to fix, but this patch deals with the
important cases.

Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:42:39 -08:00
Catherine Zhang
2c7946a7bf [SECURITY]: TCP/UDP getpeersec
This patch implements an application of the LSM-IPSec networking
controls whereby an application can determine the label of the
security association its TCP or UDP sockets are currently connected to
via getsockopt and the auxiliary data mechanism of recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of an IPSec security association a particular TCP or
UDP socket is using.  The application can then use this security
context to determine the security context for processing on behalf of
the peer at the other end of this connection.  In the case of UDP, the
security context is for each individual packet.  An example
application is the inetd daemon, which could be modified to start
daemons running at security contexts dependent on the remote client.

Patch design approach:

- Design for TCP
The patch enables the SELinux LSM to set the peer security context for
a socket based on the security context of the IPSec security
association.  The application may retrieve this context using
getsockopt.  When called, the kernel determines if the socket is a
connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
cache on the socket to retrieve the security associations.  If a
security association has a security context, the context string is
returned, as for UNIX domain sockets.

- Design for UDP
Unlike TCP, UDP is connectionless.  This requires a somewhat different
API to retrieve the peer security context.  With TCP, the peer
security context stays the same throughout the connection, thus it can
be retrieved at any time between when the connection is established
and when it is torn down.  With UDP, each read/write can have
different peer and thus the security context might change every time.
As a result the security context retrieval must be done TOGETHER with
the packet retrieval.

The solution is to build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).

Patch implementation details:

- Implementation for TCP
The security context can be retrieved by applications using getsockopt
with the existing SO_PEERSEC flag.  As an example (ignoring error
checking):

getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
printf("Socket peer context is: %s\n", optbuf);

The SELinux function, selinux_socket_getpeersec, is extended to check
for labeled security associations for connected (TCP_ESTABLISHED ==
sk->sk_state) TCP sockets only.  If so, the socket has a dst_cache of
struct dst_entry values that may refer to security associations.  If
these have security associations with security contexts, the security
context is returned.

getsockopt returns a buffer that contains a security context string or
the buffer is unmodified.

- Implementation for UDP
To retrieve the security context, the application first indicates to
the kernel such desire by setting the IP_PASSSEC option via
getsockopt.  Then the application retrieves the security context using
the auxiliary data mechanism.

An example server application for UDP should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
        cmsg_hdr->cmsg_level == SOL_IP &&
        cmsg_hdr->cmsg_type == SCM_SECURITY) {
        memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
}

ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow
a server socket to receive security context of the peer.  A new
ancillary message type SCM_SECURITY.

When the packet is received we get the security context from the
sec_path pointer which is contained in the sk_buff, and copy it to the
ancillary message space.  An additional LSM hook,
selinux_socket_getpeersec_udp, is defined to retrieve the security
context from the SELinux space.  The existing function,
selinux_socket_getpeersec does not suit our purpose, because the
security context is copied directly to user space, rather than to
kernel space.

Testing:

We have tested the patch by setting up TCP and UDP connections between
applications on two machines using the IPSec policies that result in
labeled security associations being built.  For TCP, we can then
extract the peer security context using getsockopt on either end.  For
UDP, the receiving end can retrieve the security context using the
auxiliary data mechanism of recvmsg.

Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:41:23 -08:00
Rick Jones
15d99e02ba [TCP]: sysctl to allow TCP window > 32767 sans wscale
Back in the dark ages, we had to be conservative and only allow 15-bit
window fields if the window scale option was not negotiated.  Some
ancient stacks used a signed 16-bit quantity for the window field of
the TCP header and would get confused.

Those days are long gone, so we can use the full 16-bits by default
now.

There is a sysctl added so that we can still interact with such old
stacks

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:40:29 -08:00
Neil Horman
abd596a4b6 [IPV4] ARP: Alloc acceptance of unsolicited ARP via netdevice sysctl.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:39:47 -08:00
Ingo Molnar
57b47a53ec [NET]: sem2mutex part 2
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:35:41 -08:00
Stephen Hemminger
1533306186 [NET]: dev_put/dev_hold cleanup
Get rid of the old __dev_put macro that is just a hold over from pre 2.6
kernel.  And turn dev_hold into an inline instead of a macro.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:32:28 -08:00
Michael Chan
d9ab5ad12b [TG3]: Add 5787 and 5754 basic support
Add basic support for 2 new chips 5787 and 5754.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:27:35 -08:00
Alpt
99cae7fca1 [NET] rtnetlink: Add RTPROT entry for Netsukuku.
The Netsukuku daemon is using the same number to mark its routes, you
can see it here:
http://hinezumilabs.org/cgi-bin/viewcvs.cgi/netsukuku/src/krnl_route.h?rev=HEAD&content-type=text/vnd.viewcvs-markup

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:26:17 -08:00
Stephen Hemminger
6756ae4b4e [NET]: Convert RTNL to mutex.
This patch turns the RTNL from a semaphore to a new 2.6.16 mutex and
gets rid of some of the leftover legacy.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:23:58 -08:00
David Basden
0ac81ae34e [IRDA]: TOIM3232 dongle support
Here goes a patch for supporting TOIM3232 based serial IrDA dongles.
The code is based on the tekram dongle code.

It's been tested with a TOIM3232 based IRWave 320S dongle. It may work
for TOIM4232 dongles, although it's not been tested.

Signed-off-by: David Basden <davidb-irda@rcpt.to>
Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:21:10 -08:00
John Heffner
0e7b13685f [TCP] mtu probing: move tcp-specific data out of inet_connection_sock
This moves some TCP-specific MTU probing state out of
inet_connection_sock back to tcp_sock.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:32:58 -08:00
Jörn Engel
231d06ae82 [NET]: Uninline kfree_skb and allow NULL argument
o Uninline kfree_skb, which saves some 15k of object code on my notebook.

o Allow kfree_skb to be called with a NULL argument.

  Subsequent patches can remove conditional from drivers and further
  reduce source and object size.

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:28:35 -08:00
Jamal Hadi Salim
4bf07ef3fd [XFRM]: Rearrange struct xfrm_aevent_id for better compatibility.
struct xfrm_aevent_id needs to be 32-bit + 64-bit align friendly.

Based upon suggestions from Yoshifuji.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:25:50 -08:00
Arnaldo Carvalho de Melo
e55d912f5b [DCCP] feat: Introduce sysctls for the default features
[root@qemu ~]# for a in /proc/sys/net/dccp/default/* ; do echo $a ; cat $a ; done
/proc/sys/net/dccp/default/ack_ratio
2
/proc/sys/net/dccp/default/rx_ccid
3
/proc/sys/net/dccp/default/send_ackvec
1
/proc/sys/net/dccp/default/send_ndp
1
/proc/sys/net/dccp/default/seq_window
100
/proc/sys/net/dccp/default/tx_ccid
3
[root@qemu ~]#

So if wanting to test ccid3 as the tx CCID one can just do:

[root@qemu ~]# echo 3 > /proc/sys/net/dccp/default/tx_ccid
[root@qemu ~]# echo 2 > /proc/sys/net/dccp/default/rx_ccid
[root@qemu ~]# cat /proc/sys/net/dccp/default/[tr]x_ccid
2
3
[root@qemu ~]#

Of course we also need the setsockopt for each app to tell its preferences, but
for testing or defining something other than CCID2 as the default for apps that
don't explicitely set their preference the sysctl interface is handy.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:25:02 -08:00
Arnaldo Carvalho de Melo
93ce20928f [DCCP]: Make CCID2 be the default
As per the draft. This fixes the build when netfilter dccp components
are built and dccp isn't. Thanks to Reuben Farrelly for reporting
this.

The following changesets will introduce /proc/sys/net/dccp/defaults/
to give more flexibility to DCCP developers and testers while apps
doesn't use setsockopt to specify the desired CCID, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:23:58 -08:00
Andrea Bittau
60fe62e789 [DCCP]: sparse endianness annotations
This also fixes the layout of dccp_hdr short sequence numbers, problem
was not fatal now as we only support long (48 bits) sequence numbers.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:23:32 -08:00
Patrick McHardy
a193a4abdd [NETFILTER]: Fix skb->nf_bridge lifetime issues
The bridge netfilter code simulates the NF_IP_PRE_ROUTING hook and skips
the real hook by registering with high priority and returning NF_STOP if
skb->nf_bridge is present and the BRNF_NF_BRIDGE_PREROUTING flag is not
set. The flag is only set during the simulated hook.

Because skb->nf_bridge is only freed when the packet is destroyed, the
packet will not only skip the first invocation of NF_IP_PRE_ROUTING, but
in the case of tunnel devices on top of the bridge also all further ones.
Forwarded packets from a bridge encapsulated by a tunnel device and sent
as locally outgoing packet will also still have the incorrect bridge
information from the input path attached.

We already have nf_reset calls on all RX/TX paths of tunnel devices,
so simply reset the nf_bridge field there too. As an added bonus,
the bridge information for locally delivered packets is now also freed
when the packet is queued to a socket.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:23:05 -08:00
Arnaldo Carvalho de Melo
91f0ebf7b6 [DCCP] CCID: Improve CCID infrastructure
1. No need for ->ccid_init nor ->ccid_exit, this is what module_{init,exit}
   does and anynways neither ccid2 nor ccid3 were using it.

2. Rename struct ccid to struct ccid_operations and introduce struct ccid
   with a pointer to ccid_operations and rigth after it the rx or tx
   private state.

3. Remove the pointer to the state of the half connections from struct
   dccp_sock, now its derived thru ccid_priv() from the ccid pointer.

Now we also can implement the setsockopt for changing the CCID easily as
no ccid init routines can affect struct dccp_sock in any way that prevents
other CCIDs from working if a CCID switch operation is asked by apps.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:21:44 -08:00