Commit Graph

2044 Commits

Author SHA1 Message Date
David S. Miller
b0e1e6462d netdev: Move rest of qdisc state into struct netdev_queue
Now qdisc, qdisc_sleeping, and qdisc_list also live there.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 17:42:10 -07:00
David S. Miller
7c3ceb4a40 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/iwlwifi/iwl-3945.c
	net/mac80211/mlme.c
2008-07-08 16:30:17 -07:00
Andrey Vagin
b223856640 ipv6: fix race between ipv6_del_addr and DAD timer
Consider the following scenario:

ipv6_del_addr(ifp)
  ipv6_ifa_notify(RTM_DELADDR, ifp)
    ip6_del_rt(ifp->rt)

after returning from the ipv6_ifa_notify and enabling BH-s
back, but *before* calling the addrconf_del_timer the 
ifp->timer fires and:

addrconf_dad_timer(ifp)
  addrconf_dad_completed(ifp)
    ipv6_ifa_notify(RTM_NEWADDR, ifp)
      ip6_ins_rt(ifp->rt)

then return back to the ipv6_del_addr and:

in6_ifa_put(ifp)
  inet6_ifa_finish_destroy(ifp)
    dst_release(&ifp->rt->u.dst)

After this we have an ifp->rt inserted into fib6 lists, but 
queued for gc, which in turn can result in oopses in the
fib6_run_gc. Maybe some other nasty things, but we caught 
only the oops in gc so far.

The solution is to disarm the ifp->timer before flushing the
rt from it.

Signed-off-by: Andrey Vagin <avagin@parallels.com>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 15:13:31 -07:00
Alexey Dobriyan
43de9dfeaa netfilter: ip6table_filter in netns for real
One still needs to remove checks in nf_hook_slow() and nf_sockopt_find()
to test this, though.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 02:36:18 -07:00
Alexey Dobriyan
d2789312cc netfilter: use correct namespace in ip6table_security
Signed-off-by: Alexey Dobriyan <adobriyan@parallels.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 02:34:52 -07:00
Pavel Emelyanov
ef28d1a20f MIB: add struct net to UDP6_INC_STATS_BH
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:19:40 -07:00
Pavel Emelyanov
235b9f7ac5 MIB: add struct net to UDP6_INC_STATS_USER
As simple as the patch #1 in this set.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 21:19:20 -07:00
YOSHIFUJI Hideaki
e0835f8fa5 ipv4,ipv6 mroute: Add some helper inline functions to remove ugly ifdefs.
ip{,v6}_mroute_{set,get}sockopt() should not matter by optimization but
it would be better not to depend on optimization semantically.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:57 +09:00
Wang Chen
623d1a1af7 ipv6: Do cleanup for ip6_mr_init.
If do not do it, we will get following issues:
1. Leaving junks after inet6_init failing halfway.
2. Leaving proc and notifier junks after ipv6 modules unloading.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:56 +09:00
YOSHIFUJI Hideaki
dd3abc4ef5 ipv6 route: Prefer outgoing interface with source address assigned.
Outgoing interface is selected by the route decision if unspecified.
Let's prefer routes via interface(s) with the address assigned if we
have multiple routes with same cost.
With help from Naohiro Ooiwa <nooiwa@miraclelinux.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:56 +09:00
YOSHIFUJI Hideaki
1b34be74cb ipv6 addrconf: add accept_dad sysctl to control DAD operation.
- If 0, disable DAD.
- If 1, perform DAD (default).
- If >1, perform DAD and disable IPv6 operation if DAD for MAC-based
  link-local address has been failed (RFC4862 5.4.5).

We do not follow RFC4862 by default.  Refer to the netdev thread entitled
"Linux IPv6 DAD not full conform to RFC 4862 ?"
	http://www.spinics.net/lists/netdev/msg52027.html

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:56 +09:00
YOSHIFUJI Hideaki
778d80be52 ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:55 +09:00
YOSHIFUJI Hideaki
5ce83afaac ipv6: Assume the loopback address in link-local scope.
Handle interface property strictly when looking up a route
for the loopback address (RFC4291 2.5.3).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:55 +09:00
YOSHIFUJI Hideaki
f81b2e7d8c ipv6: Do not forward packets with the unspecified source address.
RFC4291 2.5.2.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:55 +09:00
YOSHIFUJI Hideaki
d68b82705a ipv6: Do not assign non-valid address on interface.
Check the type of the address when adding a new one on interface.
- the unspecified address (::) is always disallowed (RFC4291 2.5.2)
- the loopback address is disallowed unless the interface is (one of)
  loopback (RFC4291 2.5.3).
- multicast addresses are disallowed.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-07-03 17:51:55 +09:00
Stephen Hemminger
6dbf4bcac9 icmp: fix units for ratelimit
Convert the sysctl values for icmp ratelimit to use milliseconds instead
of jiffies which is based on kernel configured HZ.
Internal kernel jiffies are not a proper unit for any userspace API.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-01 19:29:07 -07:00
David S. Miller
1b63ba8a86 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/iwlwifi/iwl4965-base.c
2008-06-28 01:19:40 -07:00
YOSHIFUJI Hideaki
d420895efb ipv6 route: Convert rt6_device_match() to use RT6_LOOKUP_F_xxx flags.
The commit 77d16f450a ("[IPV6] ROUTE:
Unify RT6_F_xxx and RT6_SELECT_F_xxx flags") intended to pass various
routing lookup hints around RT6_LOOKUP_F_xxx flags, but conversion was
missing for rt6_device_match().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27 20:14:54 -07:00
Pavel Emelyanov
9a375803fe inet fragments: fix race between inet_frag_find and inet_frag_secret_rebuild
The problem is that while we work w/o the inet_frags.lock even
read-locked the secret rebuild timer may occur (on another CPU, since
BHs are still disabled in the inet_frag_find) and change the rnd seed
for ipv4/6 fragments.

It was caused by my patch fd9e63544c
([INET]: Omit double hash calculations in xxx_frag_intern) late 
in the 2.6.24 kernel, so this should probably be queued to -stable.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27 20:06:08 -07:00
Stephen Hemminger
7be87351a1 tcp: /proc/net/tcp rto,ato values not scaled properly (v2)
I found another case where we are sending information to userspace
in the wrong HZ scale.  This should have been fixed back in 2.5 :-(

This means an ABI change but as it stands there is no way for an application
like ss to get the right value.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27 20:00:19 -07:00
Patrick McHardy
88a6f4ad76 netfilter: ip6table_mangle: don't reroute in LOCAL_IN
Rerouting should only happen in LOCAL_OUT, in INPUT its useless
since the packet has already chosen its final destination.

Noticed by Alexey Dobriyan <adobriyan@gmail.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-24 13:30:45 -07:00
YOSHIFUJI Hideaki
f630e43a21 ipv6: Drop packets for loopback address from outside of the box.
[ Based upon original report and patch by Karsten Keil.  Karsten
  has verified that this fixes the TAHI test case "ICMPv6 test
  v6LC.5.1.2 Part F". -DaveM ]

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:33:57 -07:00
Shan Wei
aea7427f70 ipv6: Remove options header when setsockopt's optlen is 0
Remove the sticky Hop-by-Hop options header by calling setsockopt()
for IPV6_HOPOPTS with a zero option length, per RFC3542.

Routing header and Destination options header does the same as
Hop-by-Hop options header.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:29:39 -07:00
Ben Hutchings
4497b0763c net: Discard and warn about LRO'd skbs received for forwarding
Add skb_warn_if_lro() to test whether an skb was received with LRO and
warn if so.

Change br_forward(), ip_forward() and ip6_forward() to call it) and
discard the skb if it returns true.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:22:28 -07:00
Ben Hutchings
0187bdfb05 net: Disable LRO on devices that are forwarding
Large Receive Offload (LRO) is only appropriate for packets that are
destined for the host, and should be disabled if received packets may be
forwarded.  It can also confuse the GSO on output.

Add dev_disable_lro() function which uses the appropriate ethtool ops to
disable LRO if enabled.

Add calls to dev_disable_lro() in br_add_if() and functions that enable
IPv4 and IPv6 forwarding.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:15:47 -07:00
Rami Rosen
dd574dbfcc ipv6: minor cleanup in net/ipv6/tcp_ipv6.c [RESEND ].
In net/ipv6/tcp_ipv6.c:

  - Remove unneeded tcp_v6_send_check() declaration.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-18 00:51:09 -07:00
Eric Dumazet
cb61cb9b8b udp: sk_drops handling
In commits 33c732c361 ([IPV4]: Add raw
drops counter) and a92aa318b4 ([IPV6]:
Add raw drops counter), Wang Chen added raw drops counter for
/proc/net/raw & /proc/net/raw6

This patch adds this capability to UDP sockets too (/proc/net/udp &
/proc/net/udp6).

This means that 'RcvbufErrors' errors found in /proc/net/snmp can be also
be examined for each udp socket.

# grep Udp: /proc/net/snmp
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 23971006 75 899420 16390693 146348 0

# cat /proc/net/udp
 sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt  ---
uid  timeout inode ref pointer drops
 75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
  0        0 2358 2 ffff81082a538c80 0
111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000  ---
  0        0 2286 2 ffff81042dd35c80 146348

In this example, only port 111 (0x006F) was flooded by messages that
user program could not read fast enough. 146348 messages were lost.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 21:04:56 -07:00
David S. Miller
caea902f72 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/rt2x00/Kconfig
	drivers/net/wireless/rt2x00/rt2x00usb.c
	net/sctp/protocol.c
2008-06-16 18:25:48 -07:00
Pavel Emelyanov
33de014c63 inet6: add struct net argument to inet6_ehashfn
Same as for inet_hashfn, prepare its ipv6 incarnation.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:13:48 -07:00
Pavel Emelyanov
2086a65078 inet: add struct net argument to inet_lhashfn
Listening-on-one-port sockets in many namespaces produce long 
chains in the listening_hash-es, so prepare the inet_lhashfn to 
take struct net into account.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:13:08 -07:00
Pavel Emelyanov
19c7578fb2 udp: add struct net argument to udp_hashfn
Every caller already has this one. The new argument is currently 
unused, but this will be fixed shortly.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:12:29 -07:00
Pavel Emelyanov
e31634931d udp: provide a struct net pointer for __udp[46]_lib_mcast_deliver
They both calculate the hash chain, but currently do not have
a struct net pointer, so pass one there via additional argument,
all the more so their callers already have such.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:12:11 -07:00
Pavel Emelyanov
d6266281f8 udp: introduce a udp_hashfn function
Currently the chain to store a UDP socket is calculated with
simple (x & (UDP_HTABLE_SIZE - 1)). But taking net into account
would make this calculation a bit more complex, so moving it into
a function would help.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 17:11:50 -07:00
YOSHIFUJI Hideaki
2b4743bd6b ipv6 sit: Avoid extra need for compat layer in PRL management.
We've introduced extra need of compat layer for ip_tunnel_prl{}
for PRL (Potential Router List) management.  Though compat_ioctl
is still missing in ipv4/ipv6, let's make the interface more
straight-forward and eliminate extra need for nasty compat layer
anyway since the interface is new for 2.6.26.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-16 16:48:20 -07:00
Brian Haley
7d06b2e053 net: change proto destroy method to return void
Change struct proto destroy function pointer to return void.  Noticed
by Al Viro.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-14 17:04:49 -07:00
David S. Miller
4ae127d1b6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/smc911x.c
2008-06-13 20:52:39 -07:00
Linus Torvalds
51558576ea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  tcp: Revert 'process defer accept as established' changes.
  ipv6: Fix duplicate initialization of rawv6_prot.destroy
  bnx2x: Updating the Maintainer
  net: Eliminate flush_scheduled_work() calls while RTNL is held.
  drivers/net/r6040.c: correct bad use of round_jiffies()
  fec_mpc52xx: MPC52xx_MESSAGES_DEFAULT: 2nd NETIF_MSG_IFDOWN => IFUP
  ipg: fix receivemode IPG_RM_RECEIVEMULTICAST{,HASH} in ipg_nic_set_multicast_list()
  netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info()
  netfilter: Make nflog quiet when no one listen in userspace.
  ipv6: Fail with appropriate error code when setting not-applicable sockopt.
  ipv6: Check IPV6_MULTICAST_LOOP option value.
  ipv6: Check the hop limit setting in ancillary data.
  ipv6 route: Fix route lifetime in netlink message.
  ipv6 mcast: Check address family of gf_group in getsockopt(MS_FILTER).
  dccp: Bug in initial acknowledgment number assignment
  dccp ccid-3: X truncated due to type conversion
  dccp ccid-3: TFRC reverse-lookup Bug-Fix
  dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets
  dccp: Fix sparse warnings
  dccp ccid-3: Bug-Fix - Zero RTT is possible
2008-06-13 07:34:47 -07:00
David S. Miller
f23d60de71 ipv6: Fix duplicate initialization of rawv6_prot.destroy
In changeset 22dd485022
("raw: Raw socket leak.") code was added so that we
flush pending frames on raw sockets to avoid leaks.

The ipv4 part was fine, but the ipv6 part was not
done correctly.  Unlike the ipv4 side, the ipv6 code
already has a .destroy method for rawv6_prot.

So now there were two assignments to this member, and
what the compiler does is use the last one, effectively
making the ipv6 parts of that changeset a NOP.

Fix this by removing the:

	.destroy	   = inet6_destroy_sock,

line, and adding an inet6_destroy_sock() call to the
end of raw6_destroy().

Noticed by Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 16:34:34 -07:00
David S. Miller
e6e30add6b Merge branch 'net-next-2.6-misc-20080612a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-next 2008-06-11 22:33:59 -07:00
Adrian Bunk
0b04082995 net: remove CVS keywords
This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-11 21:00:38 -07:00
YOSHIFUJI Hideaki
1717699cd5 ipv6: Fail with appropriate error code when setting not-applicable sockopt.
IPV6_MULTICAST_HOPS, for example, is not valid for stream sockets.
Since they are virtually unavailable for stream sockets,
we should return ENOPROTOOPT instead of EINVAL.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 09:19:09 +09:00
YOSHIFUJI Hideaki
28d4488216 ipv6: Check IPV6_MULTICAST_LOOP option value.
Only 0 and 1 are valid for IPV6_MULTICAST_LOOP socket option,
and we should return an error of EINVAL otherwise, per RFC3493.

Based on patch from Shan Wei <shanwei@cn.fujitsu.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 09:19:09 +09:00
Shan Wei
e8766fc86b ipv6: Check the hop limit setting in ancillary data.
When specifing the outgoing hop limit as ancillary data for sendmsg(),
the kernel doesn't check the integer hop limit value as specified in
[RFC-3542] section 6.3.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 09:19:08 +09:00
YOSHIFUJI Hideaki
36e3deae8b ipv6 route: Fix route lifetime in netlink message.
1) We may have route lifetime larger than INT_MAX.
In that case we had wired value in lifetime.
Use INT_MAX if lifetime does not fit in s32.

2) Lifetime is valid iif RTF_EXPIRES is set.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 09:19:08 +09:00
YOSHIFUJI Hideaki
20c61fbd8d ipv6 mcast: Check address family of gf_group in getsockopt(MS_FILTER).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 09:19:08 +09:00
YOSHIFUJI Hideaki
9501f97229 tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().
As we do for other socket/timewait-socket specific parameters,
let the callers pass appropriate arguments to
tcp_v{4,6}_do_calc_md5_hash().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 03:46:30 +09:00
YOSHIFUJI Hideaki
8d26d76dd4 tcp md5sig: Share most of hash calcucaltion bits between IPv4 and IPv6.
We can share most part of the hash calculation code because
the only difference between IPv4 and IPv6 is their pseudo headers.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:20 +09:00
YOSHIFUJI Hideaki
076fb72233 tcp md5sig: Remove redundant protocol argument.
Protocol is always TCP, so remove useless protocol argument.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:19 +09:00
YOSHIFUJI Hideaki
7d5d5525bd tcp md5sig: Share MD5 Signature option parser between IPv4 and IPv6.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:18 +09:00
Benjamin Thery
3de232554a ipv6 netns: Address labels per namespace
This pacth makes IPv6 address labels per network namespace.
It keeps the global label tables, ip6addrlbl_table, but
adds a 'net' member to each ip6addrlbl_entry.
This new member is taken into account when matching labels.

Changelog
=========
* v1: Initial version
* v2:
  * Minize the penalty when network namespaces are not configured:
      *  the 'net' member is added only if CONFIG_NET_NS is
         defined. This saves space when network namespaces are not
         configured.
      * 'net' value is retrieved with the inlined function
         ip6addrlbl_net() that always return &init_net when
         CONFIG_NET_NS is not defined.
  * 'net' member in ip6addrlbl_entry renamed to the less generic
    'lbl_net' name (helps code search).

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:15 +09:00
YOSHIFUJI Hideaki
2b5ead4644 ipv6 addrconf: Introduce addrconf_is_prefix_route() helper.
This inline function, for readability, returns if the route
is a "prefix" route regardless if it was installed by RA or by
hand.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:14 +09:00
Rami Rosen
7d120c55df ipv6 mroute: Use MRT6_VERSION instead of MRT_VERSION in ip6mr.c.
MRT6_VERSION should be used instead of MRT_VERSION in ip6mr.c.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:13 +09:00
Rami Rosen
9cba632e24 ipv6 mcast: Remove unused macro (MLDV2_QQIC) from mcast.c.
This patch removes  MLDV2_QQIC macro from mcast.c
as it is unused.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-12 02:38:12 +09:00
Linus Torvalds
f7f866eed0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  net: Fix routing tables with id > 255 for legacy software
  sky2: Hold RTNL while calling dev_close()
  s2io iomem annotations
  atl1: fix suspend regression
  qeth: start dev queue after tx drop error
  qeth: Prepare-function to call s390dbf was wrong
  qeth: reduce number of kernel messages
  qeth: Use ccw_device_get_id().
  qeth: layer 3 Oops in ip event handler
  virtio: use callback on empty in virtio_net
  virtio: virtio_net free transmit skbs in a timer
  virtio: Fix typo in virtio_net_hdr comments
  virtio_net: Fix skb->csum_start computation
  ehea: set mac address fix
  sfc: Recover from RX queue flush failure
  add missing lance_* exports
  ixgbe: fix typo
  forcedeth: msi interrupts
  ipsec: pfkey should ignore events when no listeners
  pppoe: Unshare skb before anything else
  ...
2008-06-11 08:39:51 -07:00
Arnaldo Carvalho de Melo
ce4a7d0d48 inet{6}_request_sock: Init ->opt and ->pktopts in the constructor
Wei Yongjun noticed that we may call reqsk_free on request sock objects where
the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc
where we initialize ->opt to NULL and set ->pktopts to NULL in
inet6_reqsk_alloc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-10 12:39:35 -07:00
David S. Miller
65b53e4cc9 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/tg3.c
	drivers/net/wireless/rt2x00/rt2x00dev.c
	net/mac80211/ieee80211_i.h
2008-06-10 02:22:26 -07:00
Rami Rosen
e64bda89b8 netfilter: {ip,ip6,nfnetlink}_queue: misc cleanups
- No need to perform data_len = 0 in the switch command, since data_len
  is initialized to 0 in the beginning of the ipq_build_packet_message()
  method.

- {ip,ip6}_queue: We can reach nlmsg_failure only from one place; skb is
  sure to be NULL when getting there; since skb is NULL, there is no need
  to check this fact and call kfree_skb().

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 16:00:45 -07:00
Fabian Hugelshofer
718d4ad98e netfilter: nf_conntrack: properly account terminating packets
Currently the last packet of a connection isn't accounted when its causing
abnormal termination.

Introduces nf_ct_kill_acct() which increments the accounting counters on
conntrack kill. The new function was necessary, because there are calls
to nf_ct_kill() which don't need accounting:

nf_conntrack_proto_tcp.c line ~847:
Kills ct and returns NF_REPEAT. We don't want to count twice.

nf_conntrack_proto_tcp.c line ~880:
Kills ct and returns NF_DROP. I think we don't want to count dropped
packets.

nf_conntrack_netlink.c line ~824:
As far as I can see ctnetlink_del_conntrack() is used to destroy a
conntrack on behalf of the user. There is an sk_buff, but I don't think
this is an actual packet. Incrementing counters here is therefore not
desired.

Signed-off-by: Fabian Hugelshofer <hugelshofer2006@gmx.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:59:40 -07:00
Patrick McHardy
51091764f2 netfilter: nf_conntrack: add nf_ct_kill()
Encapsulate the common

	if (del_timer(&ct->timeout))
		ct->timeout.function((unsigned long)ct)

sequence in a new function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:59:06 -07:00
James Morris
17e6e59f0a netfilter: ip6_tables: add ip6tables security table
This is a port of the IPv4 security table for IPv6.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:58:05 -07:00
Pavel Emelyanov
2e761e0532 ipv6 netns: init net is used to set bindv6only for new sock
The bindv6only is tuned via sysctl. It is already on a struct net
and per-net sysctls allow for its modification (ipv6_sysctl_net_init).

Despite this the value configured in the init net is used for the
rest of them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-09 15:53:30 -07:00
Linus Torvalds
3e387fcdc4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
  l2tp: Fix possible oops if transmitting or receiving when tunnel goes down
  tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits.
  tcp: Increment OUTRSTS in tcp_send_active_reset()
  raw: Raw socket leak.
  lt2p: Fix possible WARN_ON from socket code when UDP socket is closed
  USB ID for Philips CPWUA054/00 Wireless USB Adapter 11g
  ssb: Fix context assertion in ssb_pcicore_dev_irqvecs_enable
  libertas: fix command size for CMD_802_11_SUBSCRIBE_EVENT
  ipw2200: expire and use oldest BSS on adhoc create
  airo warning fix
  b43legacy: Fix controller restart crash
  sctp: Fix ECN markings for IPv6
  sctp: Flush the queue only once during fast retransmit.
  sctp: Start T3-RTX timer when fast retransmitting lowest TSN
  sctp: Correctly implement Fast Recovery cwnd manipulations.
  sctp: Move sctp_v4_dst_saddr out of loop
  sctp: retran_path update bug fix
  tcp: fix skb vs fack_count out-of-sync condition
  sunhme: Cleanup use of deprecated calls to save_and_cli and restore_flags.
  xfrm: xfrm_algo: correct usage of RIPEMD-160
  ...
2008-06-04 17:39:33 -07:00
Denis V. Lunev
22dd485022 raw: Raw socket leak.
The program below just leaks the raw kernel socket

int main() {
        int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP);
        struct sockaddr_in addr;

        memset(&addr, 0, sizeof(addr));
        inet_aton("127.0.0.1", &addr.sin_addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(2048);
        sendto(fd,  "a", 1, MSG_MORE, &addr, sizeof(addr));
        return 0;
}

Corked packet is allocated via sock_wmalloc which holds the owner socket,
so one should uncork it and flush all pending data on close. Do this in the
same way as in UDP.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 15:16:12 -07:00
David S. Miller
aed5a833fb Merge branch 'net-2.6-misc-20080605a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-fix 2008-06-04 12:10:21 -07:00
Denis V. Lunev
9596cc826e [IPV6]: Do not change protocol for UDPv6 sockets with pending sent data.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:38 +09:00
Denis V. Lunev
36d926b94a [IPV6]: inet_sk(sk)->cork.opt leak
IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data
actually. In this case ip_flush_pending_frames should be called instead
of ip6_flush_pending_frames.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:38 +09:00
Denis V. Lunev
49d074f400 [IPV6]: Do not change protocol for raw IPv6 sockets.
It is not allowed to change underlying protocol for
   int fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP);

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:37 +09:00
YOSHIFUJI Hideaki
91e1908f56 [IPV6] NETNS: Handle ancillary data in appropriate namespace.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:36 +09:00
YOSHIFUJI Hideaki
187e38384c [IPV6]: Check outgoing interface even if source address is unspecified.
The outgoing interface index (ipi6_ifindex) in IPV6_PKTINFO
ancillary data, is not checked if the source address (ipi6_addr)
is unspecified.  If the ipi6_ifindex is the not-exist interface,
it should be fail.

Based on patch from Shan Wei <shanwei@cn.fujitsu.com> and
Brian Haley <brian.haley@hp.com>.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:35 +09:00
Yang Hongyang
95b496b666 [IPV6]: Fix the data length of get destination options with short length
If get destination options with length which is not enough for that
option,getsockopt() will still return the real length of the option,
which is larger then the buffer space.
 This is because ipv6_getsockopt_sticky() returns the real length of
the option.

This patch fix this problem.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:35 +09:00
Yang Hongyang
05335c2220 [IPV6]: Fix the return value of get destination options with NULL data pointer
If we pass NULL data buffer to getsockopt(), it will return 0,
and the option length is set to -EFAULT:
    getsockopt(sk, IPPROTO_IPV6, IPV6_DSTOPTS, NULL, &len);

This is because ipv6_getsockopt_sticky() will return -EFAULT or
-EINVAL if some error occur.

This patch fix this problem.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:34 +09:00
YOSHIFUJI Hideaki
4bed72e4f5 [IPV6] ADDRCONF: Allow longer lifetime on 64bit archs.
- Allow longer lifetimes (>= 0x7fffffff/HZ) on 64bit archs
  by using unsigned long.
- Shadow this arithmetic overflow workaround by introducing
  helper functions: addrconf_timeout_fixup() and
  addrconf_finite_timeout().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:34 +09:00
Colin
8283637231 [IPV6] TUNNEL6: Fix incoming packet length check for inter-protocol tunnel.
I discover a strange behavior in [ipv4 in ipv6] tunnel. When IPv6 tunnel
payload is less than 40(0x28), packet can be sent to network, received in
physical interface, but not seen in IP tunnel interface. No counter increase
in tunnel interface.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:32 +09:00
Thomas Graf
24ef0da7b8 [IPV6] ADDRCONF: Check range of prefix length
As of now, the prefix length is not vaildated when adding or deleting
addresses. The value is passed directly into the inet6_ifaddr structure
and later passed on to memcmp() as length indicator which relies on
the value never to exceed 128 (bits).

Due to the missing check, the currently code allows for any 8 bit
value to be passed on as prefix length while using the netlink
interface, and any 32 bit value while using the ioctl interface.

[Use unsigned int instead to generate better code - yoshfuji]

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-06-05 04:02:31 +09:00
YOSHIFUJI Hideaki
a3c960899e [IPV6] UDP: Possible dst leak in udpv6_sendmsg.
ip6_sk_dst_lookup returns held dst entry. It should be released
on all paths beyond this point. Add missed release when up->pending
is set.

Bug report and initial patch by Denis V. Lunev <den@openvz.org>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Denis V. Lunev <den@openvz.org>
2008-06-05 04:02:31 +09:00
Jarek Poplawski
b9c6989646 netfilter: nf_conntrack_ipv6: fix inconsistent lock state in nf_ct_frag6_gather()
[   63.531438] =================================
[   63.531520] [ INFO: inconsistent lock state ]
[   63.531520] 2.6.26-rc4 #7
[   63.531520] ---------------------------------
[   63.531520] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[   63.531520] tcpsic6/3864 [HC0[0]:SC1[1]:HE1:SE0] takes:
[   63.531520]  (&q->lock#2){-+..}, at: [<c07175b0>] ipv6_frag_rcv+0xd0/0xbd0
[   63.531520] {softirq-on-W} state was registered at:
[   63.531520]   [<c0143bba>] __lock_acquire+0x3aa/0x1080
[   63.531520]   [<c0144906>] lock_acquire+0x76/0xa0
[   63.531520]   [<c07a8f0b>] _spin_lock+0x2b/0x40
[   63.531520]   [<c0727636>] nf_ct_frag6_gather+0x3f6/0x910
 ...

According to this and another similar lockdep report inet_fragment
locks are taken from nf_ct_frag6_gather() with softirqs enabled, but
these locks are mainly used in softirq context, so disabling BHs is
necessary.

Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-04 09:58:27 -07:00
Al Viro
d430a227d2 bogus format in ip6mr
ptrdiff_t is %t..., not %Z...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-04 08:06:02 -07:00
David S. Miller
43154d08d6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/cpmac.c
	net/mac80211/mlme.c
2008-05-25 23:26:10 -07:00
Rami Rosen
071f92d059 net: The world is not perfect patch.
Unless there will be any objection here, I suggest consider the
following patch which simply removes the code for the
-DI_WISH_WORLD_WERE_PERFECT in the three methods which use it.

The compilation errors we get when using -DI_WISH_WORLD_WERE_PERFECT
show that this code was not built and not used for really a long time.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-21 17:47:54 -07:00
Pavel Emelyanov
dc58c78c04 ip6mr: Use on-device stats instead of private ones.
Similar to ipmr.

[ Fix build failures -DaveM ]

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-21 14:43:46 -07:00
Pavel Emelyanov
3dca02af38 ip6tnl: Use on-device stats instead of private ones.
This tunnel uses its own private structure and requires separate
patch to switch from private stats to on-device ones.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-21 14:17:05 -07:00
Pavel Emelyanov
4eecc107a8 sit: Use on-device stats instead of private ones.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-21 14:15:46 -07:00
Herbert Xu
1ac06e0306 ipsec: Use the correct ip_local_out function
Because the IPsec output function xfrm_output_resume does its
own dst_output call it should always call __ip_local_output
instead of ip_local_output as the latter may invoke dst_output
directly.  Otherwise the return values from nf_hook and dst_output
may clash as they both use the value 1 but for different purposes.

When that clash occurs this can cause a packet to be used after
it has been freed which usually leads to a crash.  Because the
offending value is only returned from dst_output with qdiscs
such as HTB, this bug is normally not visible.

Thanks to Marco Berizzi for his perseverance in tracking this
down.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-20 14:32:14 -07:00
YOSHIFUJI Hideaki
6f704992d3 ipv6 addrconf: Allow infinite prefix lifetime.
We need to handle infinite prefix lifetime specially.
With help from original reporter "Bonitch, Joseph"
<Joseph.Bonitch@xerox.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-19 16:56:11 -07:00
YOSHIFUJI Hideaki
69cdf8f92a ipv6 route: Fix lifetime in netlink.
We could not see appropriate lifetime if the route had been scheduled
to expired at 0 (in jiffies).  We should check rt6i_flags instead of
rt6i_expires to determine whether lifetime is valid or not.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-19 16:55:13 -07:00
YOSHIFUJI Hideaki
a3264435b4 ipv6 addrconf: Fix route lifetime setting in corner case.
Because of arithmetic overflow avoidance, the actual lifetime setting
(vs the value given by RA) did not increase monotonically around
0x7fffffff/HZ.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-19 16:54:29 -07:00
YOSHIFUJI Hideaki
0686caa35e ndisc: Add missing strategies for per-device retrans timer/reachable time settings.
Noticed from Al Viro <viro@ftp.linux.org.uk> via David Miller
<davem@davemloft.net>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-19 16:25:42 -07:00
Pavel Emelyanov
34ac2573e8 ipv6: Register some net/ipv6/ core sysctls at read-only root.
There are some sysctls left to be switched to read-only,
but they are all in ipv6, so complete with them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-19 13:53:30 -07:00
Pavel Emelyanov
7d291ebb83 inet: Register fragmentation some ctls at read-only root.
Parts of fragments-related sysctls are read-only, but this is
done by cloning all the tables and dropping write-bits from
mode. Do the same but with read-only root.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-19 13:53:02 -07:00
Pavel Emelyanov
0002c630c4 ipv6: In fragmentation code, handle error returned from register_pernet_subsys.
The error code is ignored now, but ipv6 is a module and one can
be loaded under memory pressure, so the error may occur (in theory).

Besides, I'm going to handle error returned from registering a
read-only part of the table, so ignoring this one, while handing
the other one would look strange.

(However, this possibility of error is rather small, so I'm not 
 sure whether this is a candidate for current net tree).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-19 13:52:28 -07:00
Pavel Emelyanov
0a64b4b811 inet: Rename fragmentation sysctl-related functions/variables.
The fragments sysctls also contains some, that are to be
visible, but read-only in net namespaces.

The naming in net/core/sysctl_net_core.c is - tables, that are
to be registered in namespaces have a "ns" word in their names.
So rename ones in ipv4/ip_fragment.c and ipv6/reassembly.c to
fit this.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-19 13:51:29 -07:00
Johannes Berg
f5184d267c net: Allow netdevices to specify needed head/tailroom
This patch adds needed_headroom/needed_tailroom members to struct
net_device and updates many places that allocate sbks to use them. Not
all of them can be converted though, and I'm sure I missed some (I
mostly grepped for LL_RESERVED_SPACE)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-12 20:48:31 -07:00
David S. Miller
36ca34cc3b sit: Add missing kfree_skb() on pskb_may_pull() failure.
Noticed by Paul Marks <paul@pmarks.net>.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08 23:40:26 -07:00
Satoru SATOH
5ffc02a158 ip: Use inline function dst_metric() instead of direct access to dst->metric[]
There are functions to refer to the value of dst->metric[THE_METRIC-1]
directly without use of a inline function "dst_metric" defined in
net/dst.h.

The following patch changes them to use the inline function
consistently.

Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-04 22:14:42 -07:00
Daniel Lezcano
4ac2ccd016 netns: Fix reassembly timer to use the right namespace
This trivial fix retrieves the network namespace from frag queue
and use it to get the network device in the right namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-02 17:02:03 -07:00
Denis V. Lunev
0bb53a66fe ipv6: assign PDE->data before gluing PDE into /proc tree
Simply replace proc_create and further data assigned with proc_create_data.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-02 02:46:55 -07:00
Linus Torvalds
ccc7518415 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  ipv6: Compilation fix for compat MCAST_MSFILTER sockopts.
2008-04-30 20:13:22 -07:00
Pavel Emelyanov
8099179031 ipv6: Compilation fix for compat MCAST_MSFILTER sockopts.
The last hunk from the commit dae50295 (ipv4/ipv6 compat: Fix SSM
applications on 64bit kernels.) escaped from the compat_ipv6_setsockopt
to the ipv6_getsockopt (I guess due to patch smartness wrt searching
for context) thus breaking 32-bit and 64-bit-without-compat compilation.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-30 14:49:54 -07:00
Linus Torvalds
95dfec6ae1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
  tcp: Overflow bug in Vegas
  [IPv4] UFO: prevent generation of chained skb destined to UFO device
  iwlwifi: move the selects to the tristate drivers
  ipv4: annotate a few functions __init in ipconfig.c
  atm: ambassador: vcc_sf semaphore to mutex
  MAINTAINERS: The socketcan-core list is subscribers-only.
  netfilter: nf_conntrack: padding breaks conntrack hash on ARM
  ipv4: Update MTU to all related cache entries in ip_rt_frag_needed()
  sch_sfq: use del_timer_sync() in sfq_destroy()
  net: Add compat support for getsockopt (MCAST_MSFILTER)
  net: Several cleanups for the setsockopt compat support.
  ipvs: fix oops in backup for fwmark conn templates
  bridge: kernel panic when unloading bridge module
  bridge: fix error handling in br_add_if()
  netfilter: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets
  netfilter: x_tables: fix net namespace leak when reading /proc/net/xxx_tables_names
  netfilter: xt_TCPOPTSTRIP: signed tcphoff for ipv6_skip_exthdr() retval
  tcp: Limit cwnd growth when deferring for GSO
  tcp: Allow send-limited cwnd to grow up to max_burst when gso disabled
  [netdrvr] gianfar: Determine TBIPA value dynamically
  ...
2008-04-30 08:45:48 -07:00
David L Stevens
42908c69f6 net: Add compat support for getsockopt (MCAST_MSFILTER)
This patch adds support for getsockopt for MCAST_MSFILTER for
both IPv4 and IPv6. It depends on the previous setsockopt patch,
and uses the same method.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 03:23:22 -07:00
Arnaud Ebalard
9a732ed6d0 netfilter: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets
While reinjecting *bigger* modified versions of IPv6 packets using
libnetfilter_queue, things work fine on a 2.6.24 kernel (2.6.22 too)
but I get the following on recents kernels (2.6.25, trace below is
against today's net-2.6 git tree):

skb_over_panic: text:c04fddb0 len:696 put:632 head:f7592c00 data:f7592c00 tail:0xf7592eb8 end:0xf7592e80 dev:eth0
------------[ cut here ]------------
invalid opcode: 0000 [#1] PREEMPT 
Process sendd (pid: 3657, ti=f6014000 task=f77c31d0 task.ti=f6014000)
Stack: c071e638 c04fddb0 000002b8 00000278 f7592c00 f7592c00 f7592eb8 f7592e80 
       f763c000 f6bc5200 f7592c40 f6015c34 c04cdbfc f6bc5200 00000278 f6015c60 
       c04fddb0 00000020 f72a10c0 f751b420 00000001 0000000a 000002b8 c065582c 
Call Trace:
 [<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0
 [<c04cdbfc>] ? skb_put+0x3c/0x40
 [<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0
 [<c04fd115>] ? nfnetlink_rcv_msg+0xf5/0x160
 [<c04fd03e>] ? nfnetlink_rcv_msg+0x1e/0x160
 [<c04fd020>] ? nfnetlink_rcv_msg+0x0/0x160
 [<c04f8ed7>] ? netlink_rcv_skb+0x77/0xa0
 [<c04fcefc>] ? nfnetlink_rcv+0x1c/0x30
 [<c04f8c73>] ? netlink_unicast+0x243/0x2b0
 [<c04cfaba>] ? memcpy_fromiovec+0x4a/0x70
 [<c04f9406>] ? netlink_sendmsg+0x1c6/0x270
 [<c04c8244>] ? sock_sendmsg+0xc4/0xf0
 [<c011970d>] ? set_next_entity+0x1d/0x50
 [<c0133a80>] ? autoremove_wake_function+0x0/0x40
 [<c0118f9e>] ? __wake_up_common+0x3e/0x70
 [<c0342fbf>] ? n_tty_receive_buf+0x34f/0x1280
 [<c011d308>] ? __wake_up+0x68/0x70
 [<c02cea47>] ? copy_from_user+0x37/0x70
 [<c04cfd7c>] ? verify_iovec+0x2c/0x90
 [<c04c837a>] ? sys_sendmsg+0x10a/0x230
 [<c011967a>] ? __dequeue_entity+0x2a/0xa0
 [<c011970d>] ? set_next_entity+0x1d/0x50
 [<c0345397>] ? pty_write+0x47/0x60
 [<c033d59b>] ? tty_default_put_char+0x1b/0x20
 [<c011d2e9>] ? __wake_up+0x49/0x70
 [<c033df99>] ? tty_ldisc_deref+0x39/0x90
 [<c033ff20>] ? tty_write+0x1a0/0x1b0
 [<c04c93af>] ? sys_socketcall+0x7f/0x260
 [<c0102ff9>] ? sysenter_past_esp+0x6a/0x91
 [<c05f0000>] ? snd_intel8x0m_probe+0x270/0x6e0
 =======================
Code: 00 00 89 5c 24 14 8b 98 9c 00 00 00 89 54 24 0c 89 5c 24 10 8b 40 50 89 4c 24 04 c7 04 24 38 e6 71 c0 89 44 24 08 e8 c4 46 c5 ff <0f> 0b eb fe 55 89 e5 56 89 d6 53 89 c3 83 ec 0c 8b 40 50 39 d0 
EIP: [<c04ccdfc>] skb_over_panic+0x5c/0x60 SS:ESP 0068:f6015bf8


Looking at the code, I ended up in nfq_mangle() function (called by
nfqnl_recv_verdict()) which performs a call to skb_copy_expand() due to
the increased size of data passed to the function. AFAICT, it should ask
for 'diff' instead of 'diff - skb_tailroom(e->skb)'. Because the
resulting sk_buff has not enough space to support the skb_put(skb, diff)
call a few lines later, this results in the call to skb_over_panic().

The patch below asks for allocation of a copy with enough space for
mangled packet and the same amount of headroom as old sk_buff. While
looking at how the regression appeared (e2b58a67), I noticed the same
pattern in ipq_mangle_ipv6() and ipq_mangle_ipv4(). The patch corrects
those locations too.

Tested with bigger reinjected IPv6 packets (nfqnl_mangle() path), things
are ok (2.6.25 and today's net-2.6 git tree).

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29 03:16:34 -07:00
Linus Torvalds
77a50df2b1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  iwlwifi: Allow building iwl3945 without iwl4965.
  wireless: Fix compile error with wifi & leds
  tcp: Fix slab corruption with ipv6 and tcp6fuzz
  ipv4/ipv6 compat: Fix SSM applications on 64bit kernels.
  [IPSEC]: Use digest_null directly for auth
  sunrpc: fix missing kernel-doc
  can: Fix copy_from_user() results interpretation
  Revert "ipv6: Fix typo in net/ipv6/Kconfig"
  tipc: endianness annotations
  ipv6: result of csum_fold() is already 16bit, no need to cast
  [XFRM] AUDIT: Fix flowlabel text format ambibuity.
2008-04-28 09:44:11 -07:00
David L Stevens
dae5029548 ipv4/ipv6 compat: Fix SSM applications on 64bit kernels.
Add support on 64-bit kernels for seting 32-bit compatible MCAST*
socket options.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-27 14:26:53 -07:00
David S. Miller
5c5d6dabb7 Revert "ipv6: Fix typo in net/ipv6/Kconfig"
This reverts commit 5b3f129c55.

As requested by Maciej W. Rozycki.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-27 14:26:50 -07:00
Al Viro
ec6b486fa9 ipv6: result of csum_fold() is already 16bit, no need to cast
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-27 14:26:49 -07:00
Linus Torvalds
2e561c7b7e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits)
  net: Fix wrong interpretation of some copy_to_user() results.
  xfrm: alg_key_len & alg_icv_len should be unsigned
  [netdrvr] tehuti: move ioctl perm check closer to function start
  ipv6: Fix typo in net/ipv6/Kconfig
  via-velocity: fix vlan receipt
  tg3: sparse cleanup
  forcedeth: realtek phy crossover detection
  ibm_newemac: Increase MDIO timeouts
  gianfar: Fix skb allocation strategy
  netxen: reduce stack usage of netxen_nic_flash_print
  smc911x: test after postfix decrement fails in smc911x_{reset,drop_pkt}
  net drivers: fix platform driver hotplug/coldplug
  forcedeth: new backoff implementation
  ehea: make things static
  phylib: Add support for board-level PHY fixups
  [netdrvr] atlx: code movement: move atl1 parameter parsing
  atlx: remove flash vendor parameter
  korina: misc cleanup
  korina: fix misplaced return statement
  WAN: Fix confusing insmod error code for C101 too.
  ...
2008-04-25 12:28:28 -07:00
Michael Beasley
5b3f129c55 ipv6: Fix typo in net/ipv6/Kconfig
Two is used in the wrong context here, as you are connecting to an
IPv6 network over IPv4; not connecting two IPv6 networks to an IPv4
one.

Signed-off-by: Michael Beasley <youvegotmoxie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-24 23:50:30 -07:00
YOSHIFUJI Hideaki
1a98d05f59 ipv6 RAW: Disallow IPPROTO_IPV6-level IPV6_CHECKSUM socket option on ICMPv6 sockets.
RFC3542 tells that IPV6_CHECKSUM socket option in the IPPROTO_IPV6
level is not allowed on ICMPv6 sockets.  IPPROTO_RAW level
IPV6_CHECKSUM socket option (a Linux extension) is still allowed.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-24 21:30:38 -07:00
Linus Torvalds
b0d19a378a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  iwlwifi: Fix built-in compilation of iwlcore
  net: Unexport move_addr_to_{kernel,user}
  rt2x00: Select LEDS_CLASS.
  iwlwifi: Select LEDS_CLASS.
  leds: Do not guard NEW_LEDS with HAS_IOMEM
  [IPSEC]: Fix catch-22 with algorithm IDs above 31
  time: Export set_normalized_timespec.
  tcp: Make use of before macro in tcp_input.c
  hamradio: Remove unneeded and deprecated cli()/sti() calls in dmascc.c
  [NETNS]: Remove empty ->init callback.
  [DCCP]: Convert do_gettimeofday() to getnstimeofday().
  [NETNS]: Don't initialize err variable twice.
  [NETNS]: The ip6_fib_timer can work with garbage on net namespace stop.
  [IPV4]: Convert do_gettimeofday() to getnstimeofday().
  [IPV4]: Make icmp_sk_init() static.
  [IPV6]: Make struct ip6_prohibit_entry_template static.
  tcp: Trivial fix to correct function name in a comment in net/ipv4/tcp.c
  [NET]: Expose netdevice dev_id through sysfs
  skbuff: fix missing kernel-doc notation
  [ROSE]: Fix soft lockup wrt. rose_node_list_lock
2008-04-23 12:23:45 -07:00
Pavel Emelyanov
92998dd495 [NETNS]: Remove empty ->init callback.
The netns start-stop engine can happily live with any of
init or exit callbacks set to NULL.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-21 14:33:16 -07:00
Pavel Emelyanov
633d424bf3 [NETNS]: Don't initialize err variable twice.
The ip6_route_net_init() performs some unneeded actions.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-21 14:25:23 -07:00
Pavel Emelyanov
2aed2827df [NETNS]: The ip6_fib_timer can work with garbage on net namespace stop.
The del_timer() function doesn't guarantee, that the timer callback
is not active by the time it exits.

Thus, the fib6_net_exit() may kfree() all the data, that is required
by the fib6_run_gc(). The race window is tiny, but slab poisoning can
trigger this bug.

Using del_timer_sync() will cure this.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-21 14:23:03 -07:00
Adrian Bunk
280a34c87f [IPV6]: Make struct ip6_prohibit_entry_template static.
This patch makes the needlessly global struct
ip6_prohibit_entry_template static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-21 02:29:32 -07:00
Matthew Wilcox
5f090dcb4d net: Remove unnecessary inclusions of asm/semaphore.h
None of these files use any of the functionality promised by
asm/semaphore.h.  It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-18 22:15:50 -04:00
David S. Miller
3c051235a7 [IPV6]: Fix dangling references on error in fib6_add().
Fixes bugzilla #8895

If a super-tree leaf has 'rt' assigned to it and we
get an error from fib6_add_rt2node(), we'll leave
a reference to 'rt' in pn->leaf and then do an
unconditional dst_free().

We should prune such references.

Based upon a report by Vincent Perrier.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-18 01:46:19 -07:00
Pavel Emelyanov
e56d8b8a2e [INET]: Drop the inet_inherit_port() call.
As I can see from the code, two places (tcp_v6_syn_recv_sock and
dccp_v6_request_recv_sock) that call this one already run with
BHs disabled, so it's safe to call __inet_inherit_port there.

Besides (in case I missed smth with code review) the calltrace
tcp_v6_syn_recv_sock
 `- tcp_v4_syn_recv_sock
     `- __inet_inherit_port
and the similar for DCCP are valid, but assumes BHs to be disabled.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-17 23:17:34 -07:00
Denis V. Lunev
48115becf6 [NETNS]: Add netns refcnt debug for dst ops.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 02:01:34 -07:00
Pavel Emelyanov
554eb27782 [IP6TUNNEL]: Allow to create IP6 tunnels in net namespaces.
And no need in some IPPROTO_XXX enabling, since ipv6 code
doesn't have any filtering.

So, just set proper net and mark device with NETNS_LOCAL.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:24:13 -07:00
Pavel Emelyanov
2f7f54b725 [IP6TUNNEL]: Use proper net instead of init_net stubs.
All the ip_route_output_key(), dev_get_by_...() and ipv6_chk_addr()
calls are now stubbed with init_net.

Fortunately, all the places already have where to get the proper
net from.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:23:44 -07:00
Pavel Emelyanov
3e6c9fb5f5 [IP6TUNNEL]: Make tunnels hashes per-net.
Move hashes in the struct ip6_tnl_net, replace tnls_xxx[] with 
ip6n->tnlx_xxx[] and handle init and exit appropriately.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:23:22 -07:00
Pavel Emelyanov
15820e1290 [IP6TUNNEL]: Make the fallback tunnel device per-net.
All the code, that reference it already has the ip6_tnl_net pointer,
so s/ip6_fb_tnl_dev/ip6n->fb_tnl_dev/ and move creation/releasing
code into net init/exit ops.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:23:02 -07:00
Pavel Emelyanov
8704ca7e91 [IP6TUNNEL]: Use proper net in hash-lookup functions.
Calls to ip6_tnl_lookup were stubbed with init_net - give them
a proper one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:22:43 -07:00
Pavel Emelyanov
2dd02c897d [IP6TUNNEL]: Add (ip6_tnl_)net argument to some calls.
Hashes and fallback device used in them will be per-net.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:22:23 -07:00
Pavel Emelyanov
13eeb8e92c [IP6TUNNEL]: Introduce empty ip6_tnl_net structure and net ops.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:22:02 -07:00
Pavel Emelyanov
7a97146cc6 [SIT]: Allow to create SIT tunnels in net namespaces.
Set proper net and mark a new device as NETNS_LOCAL before registering.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:17:18 -07:00
Pavel Emelyanov
907a08c402 [SIT]: Use proper net in routing calls.
I.e. replace init_net stubs in ip_route_output_key() calls.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:16:58 -07:00
Pavel Emelyanov
291821766b [SIT]: Make tunnels hashes per-net.
Just move all the hashes on the sit_net structure and
patch the rest of the code appropriately.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:16:38 -07:00
Pavel Emelyanov
cd3dbc194d [SIT]: Make the fallback tunnel device per-net
Allocate and register one in sit_init_net, use sitn->fb_tunnel_dev
over the code and unregister one in sit_exit_net.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:16:18 -07:00
Pavel Emelyanov
fcee5ec9fd [SIT]: Use proper net in hash-lookup functions.
Replace introduced in the previous patch init_net stubs 
with the proper net pointer.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:15:59 -07:00
Pavel Emelyanov
ca8def1483 [SIT]: Add net/sit_net argument to some functions.
... to make them prepared for future hashes and fallback device
move on the struct sit_net.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:15:39 -07:00
Pavel Emelyanov
8190d9009a [SIT]: Introduce empty struct sit_net and init/exit net ops.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:15:17 -07:00
David S. Miller
334f8b2afd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.26 2008-04-14 03:50:43 -07:00
Pavel Emelyanov
7477fd2e6b [SOCK]: Add some notes about per-bind-bucket sock lookup.
I was asked about "why don't we perform a sk_net filtering in
bind_conflict calls, like we do in other sock lookup places"
for a couple of times.

Can we please add a comment about why we do not need one?

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 02:42:27 -07:00
David S. Miller
df39e8ba56 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ehea/ehea_main.c
	drivers/net/wireless/iwlwifi/Kconfig
	drivers/net/wireless/rt2x00/rt61pci.c
	net/ipv4/inet_timewait_sock.c
	net/ipv6/raw.c
	net/mac80211/ieee80211_sta.c
2008-04-14 02:30:23 -07:00
Jan Engelhardt
3c9fba656a [NETFILTER]: nf_conntrack: replace NF_CT_DUMP_TUPLE macro indrection by function call
Directly call IPv4 and IPv6 variants where the address family is
easily known.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:54 +02:00
Jan Engelhardt
09f263cd39 [NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_l4proto
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:53 +02:00
Jan Engelhardt
8ce8439a31 [NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_l3proto
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Patrick McHardy
d63a650736 [NETFILTER]: Add partial checksum validation helper
Move the UDP-Lite conntrack checksum validation to a generic helper
similar to nf_checksum() and make it fall back to nf_checksum()
in case the full packet is to be checksummed and hardware checksums
are available. This is to be used by DCCP conntrack, which also
needs to verify partial checksums.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:49 +02:00
Patrick McHardy
544473c166 [NETFILTER]: {ip,ip6,arp}_tables: return EAGAIN for invalid SO_GET_ENTRIES size
Rule dumping is performed in two steps: first userspace gets the
ruleset size using getsockopt(SO_GET_INFO) and allocates memory,
then it calls getsockopt(SO_GET_ENTRIES) to actually dump the
ruleset. When another process changes the ruleset in between the
sizes from the first getsockopt call doesn't match anymore and
the kernel aborts. Unfortunately it returns EAGAIN, as for multiple
other possible errors, so userspace can't distinguish this case
from real errors.

Return EAGAIN so userspace can retry the operation.

Fixes (with current iptables SVN version) netfilter bugzilla #104.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:45 +02:00
Jan Engelhardt
58c0fb0ddd [NETFILTER]: annotate rest of nf_conntrack_* with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:42 +02:00
Jan Engelhardt
5452e425ad [NETFILTER]: annotate {arp,ip,ip6,x}tables with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:35 +02:00
Jan Engelhardt
3cf93c96af [NETFILTER]: annotate xtables targets with const and remove casts
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:05 +02:00
Robert P. J. Day
fdccecd0cc [NETFILTER]: Use non-deprecated __RW_LOCK_UNLOCKED macro
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:03 +02:00
Patrick McHardy
36e2a1b0f7 [NETFILTER]: {ip,ip6}t_LOG: print MARK value in log output
Dump the mark value in log messages similar to nfnetlink_log. This
is useful for debugging complex setups where marks are used for
routing or traffic classification.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:01 +02:00
Rami Rosen
0912ea38de [IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().
This patches adds a call to increment IPSTATS_MIB_OUTFORWDATAGRAMS
when forwarding the packet in ip6_mr_forward() in the IPv6 multicast
routing module (net/ipv6/ip6mr.c).

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:59:13 -07:00
YOSHIFUJI Hideaki
9625ed72e8 [IPV6] ADDRCONF: Don't generate temporary address for ip6-ip6 interface.
As far as I can remember, I was going to disable privacy extensions
on all "tunnel" interfaces.  Disable it on ip6-ip6 interface as well.

Also, just remove ifdefs for SIT for simplicity.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:47:11 -07:00
YOSHIFUJI Hideaki
b077d7abab [IPV6] ADDRCONF: Ensure disabling multicast RS even if privacy extensions are disabled.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:42:18 -07:00
Jan Engelhardt
0b18542b7f [NET]: Sink IPv6 menuoptions into its own submenu
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:30:47 -07:00
YOSHIFUJI Hideaki
e7712f1a7c [IPV6]: Share common code-paths for sticky socket options.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:21:52 -07:00
YOSHIFUJI Hideaki
cee8947338 [IPV6] MROUTE: Do not call ipv6_find_idev() directly.
Since NETDEV_REGISTER notifier chain is responsible for creating
inet6_dev{}, we do not need to call ipv6_find_idev() directly here.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:21:16 -07:00
David S. Miller
b45e9189c0 [IPV6]: Fix ipv6 address fetching in raw6_icmp_error().
Fixes kernel bugzilla 10437

Based almost entirely upon a patch by Dmitry Butskoy.

When deciding what raw sockets to deliver the ICMPv6
to, we should use the addresses in the ICMPv6 quoted
IPV6 header, not the top-level one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 23:14:15 -07:00
Denis V. Lunev
5f4472c5a6 [TCP]: Remove owner from tcp_seq_afinfo.
Move it to tcp_seq_afinfo->seq_fops as should be.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:13:53 -07:00
Denis V. Lunev
68fcadd16c [TCP]: Place file operations directly into tcp_seq_afinfo.
No need to have separate never-used variable.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:13:30 -07:00
Denis V. Lunev
9427c4b36b [TCP]: Move seq_ops from tcp_iter_state to tcp_seq_afinfo.
No need to create seq_operations for each instance of 'netstat'.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:12:13 -07:00
YOSHIFUJI Hideaki
05f175cdcf [IPV6]: Fix IPV6_RECVERR for connected raw sockets.
Based on patch from Dmitry Butskoy <buc@odusz.so-cdu.ru>.

Closes: 10437
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:28 +09:00
Brian Haley
876c7f4196 [IPv6]: Change IPv6 unspecified destination address to ::1 for raw and un-connected sockets
This patch fixes a difference between IPv4 and IPv6 when sending packets
to the unspecified address (either 0.0.0.0 or ::) when using raw or
un-connected UDP sockets.  There are two cases where IPv6 either fails
to send anything, or sends with the destination address set to ::.  For
example:

--> ping -c1 0.0.0.0
PING 0.0.0.0 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.032 ms

--> ping6 -c1 ::
PING ::(::) 56 data bytes
ping: sendmsg: Invalid argument

Doing a sendto("0.0.0.0") reveals:

10:55:01.495090 IP localhost.32780 > localhost.7639: UDP, length 100

Doing a sendto("::") reveals:

10:56:13.262478 IP6 fe80::217:8ff:fe7d:4718.32779 > ::.7639: UDP, length 100

If you issue a connect() first in the UDP case, it will be sent to ::1,
similar to what happens with TCP.

This restores the BSD-ism.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:27 +09:00
Rami Rosen
6ac7eb0868 [IPV6] MROUTE: Adjust IPV6 multicast routing module to use mroute6 header declarations.
- This patch adjusts IPv6 multicast routing module, net/ipv6/ip6mr.c,
to use mroute6 header definitions instead of mroute.
 (MFC6_LINES instead of MFC_LINES, MAXMIFS instead of MAXVIFS, mifi_t
instead of vifi_t.)

- In addition, inclusion of some headers was removed as it is not needed.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:26 +09:00
YOSHIFUJI Hideaki
b2a9d7c2f8 [IPV6]: Check length of int/boolean optval provided by user in setsockopt().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:24 +09:00
Wang Chen
a28398ba61 [IPV6]: Check length of optval provided by user in setsockopt().
Check length of setsockopt's optval, which provided by user, before copy it
from user space.
For POSIX compliant, return -EINVAL for setsockopt of short lengths.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:23 +09:00
YOSHIFUJI Hideaki
7f1eced8b0 [IPV6] MIP6: Use our standard definitions for paddings.
MIP6_OPT_PAD_X are actually for paddings in destination
option header.  Replace them with our standard IPV6_TLV_PADX.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:22 +09:00
YOSHIFUJI Hideaki
d7aabf22ef [IPV6]: Use in6addr_any where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:20 +09:00
YOSHIFUJI Hideaki
f3ee4010e8 [IPV6]: Define constants for link-local multicast addresses.
- Define link-local all-node / all-router multicast addresses.
- Remove ipv6_addr_all_nodes() and ipv6_addr_all_routers().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:19 +09:00
YOSHIFUJI Hideaki
9acd9f3ae9 [IPV6]: Make address arguments const.
- net/ipv6/addrconf.c:
	ipv6_get_ifaddr(), ipv6_dev_get_saddr()
- net/ipv6/mcast.c:
	ipv6_sock_mc_join(), ipv6_sock_mc_drop(),
	inet6_mc_check(),
	ipv6_dev_mc_inc(), __ipv6_dev_mc_dec(), ipv6_dev_mc_dec(),
	ipv6_chk_mcast_addr()
- net/ipv6/route.c:
	rt6_lookup(), icmp6_dst_alloc()
- net/ipv6/ip6_output.c:
	ip6_nd_hdr()
- net/ipv6/ndisc.c:
	ndisc_send_ns(), ndisc_send_rs(), ndisc_send_redirect(),
	ndisc_get_neigh(), __ndisc_send()

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:18 +09:00
YOSHIFUJI Hideaki
dfd982baff [IPV6] ADDRCONF: Uninline ipv6_isatap_eui64().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:17 +09:00
YOSHIFUJI Hideaki
3eb84f4929 [IPV6] ADDRCONF: Uninline ipv6_addr_hash().
The function is only used in net/ipv6/addrconf.c.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-12 13:43:15 +09:00
YOSHIFUJI Hideaki
caad295fed [IPV6]: Use ipv6_addr_equal() instead of !ipv6_addr_cmp().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-11 19:47:55 +09:00
YOSHIFUJI Hideaki
ff4e1fb0be [IPV6] FIB_RULE: Sparse: fib6_rules_cleanup() is of void.
| net/ipv6/fib6_rules.c:319:2: warning: returning void-valued expression

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-11 19:47:53 +09:00
YOSHIFUJI Hideaki
a9f83bf385 [IPV6]: Sparse: Reuse previous delaration where appropriate.
| net/ipv6/ipv6_sockglue.c:162:16: warning: symbol 'net' shadows an earlier one
| net/ipv6/ipv6_sockglue.c:111:13: originally declared here
| net/ipv6/ipv6_sockglue.c:175:16: warning: symbol 'net' shadows an earlier one
| net/ipv6/ipv6_sockglue.c:111:13: originally declared here
| net/ipv6/ip6mr.c:1241:10: warning: symbol 'ret' shadows an earlier one
| net/ipv6/ip6mr.c:1163:6: originally declared here

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-11 19:47:52 +09:00
YOSHIFUJI Hideaki
02e10b90cd [IPV6] SIT: Sparse: Use NULL pointer instead of 0.
| net/ipv6/sit.c:382:42: warning: Using plain integer as NULL pointer

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-11 19:47:51 +09:00
YOSHIFUJI Hideaki
aba6096b21 [IPV6]: Kill several warnings without CONFIG_IPV6_MROUTE.
Pointed out by Andrew Morton <akpm@linux-foundation.org>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-11 19:47:49 +09:00
Florian Westphal
4dfc281702 [Syncookies]: Add support for TCP options via timestamps.
Allow the use of SACK and window scaling when syncookies are used
and the client supports tcp timestamps. Options are encoded into
the timestamp sent in the syn-ack and restored from the timestamp
echo when the ack is received.

Based on earlier work by Glenn Griffin.
This patch avoids increasing the size of structs by encoding TCP
options into the least significant bits of the timestamp and
by not using any 'timestamp offset'.

The downside is that the timestamp sent in the packet after the synack
will increase by several seconds.

changes since v1:
 don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c
 and have ipv6/syncookies.c use it.
 Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if ()

Reviewed-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-10 03:12:40 -07:00
David S. Miller
8eefca4888 Merge branch 'net-2.6.26-isatap-20080403' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-dev 2008-04-08 02:33:36 -07:00
YOSHIFUJI Hideaki
549e028d01 [IPV6] MROUTE: Use skb_tail_pointer(skb) instead of skb->tail.
This bug resulted in compilation error on 64bit machines.
Pointed out by Rami Rosen <roszenrami@gmail.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-05 22:35:14 +09:00
YOSHIFUJI Hideaki
14fb64e1f4 [IPV6] MROUTE: Support PIM-SM (SSM).
Based on ancient patch by Mickael Hoerdt
<hoerdt@clarinet.u-strasbg.fr>, which is available at
<http://www-r2.u-strasbg.fr/~hoerdt/dev/linux_ipv6_mforwarding/patch-linux-ipv6-mforwarding-0.1a>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-05 22:33:39 +09:00
YOSHIFUJI Hideaki
7bc570c8b4 [IPV6] MROUTE: Support multicast forwarding.
Based on ancient patch by Mickael Hoerdt
<hoerdt@clarinet.u-strasbg.fr>, which is available at
<http://www-r2.u-strasbg.fr/~hoerdt/dev/linux_ipv6_mforwarding/patch-linux-ipv6-mforwarding-0.1a>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-05 22:33:38 +09:00
YOSHIFUJI Hideaki
f6a07b293f [IPV6] ADDRCONF: Fix array size for sysctls.
We have been using __NET_IPV6_MAX for adjusting the size of array
for sysctl table, but it does not work any longer because of the
deprecation of NET_IPV6_xxx constants.  Let's use DEVCONF_MAX
instead.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-04 10:44:41 +09:00
Denis V. Lunev
1ed8516f09 [IPV6]: Simplify IPv6 control sockets creation.
Do this by replacing sock_create_kern with inet_ctl_sock_create.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:31:03 -07:00
Denis V. Lunev
5677242f43 [NETNS]: Inet control socket should not hold a namespace.
This is a generic requirement, so make inet_ctl_sock_create namespace
aware and create a inet_ctl_sock_destroy wrapper around
sk_release_kernel.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:28:30 -07:00
Denis V. Lunev
eee4fe4ded [INET]: Let inet_ctl_sock_create return sock rather than socket.
All upper protocol layers are already use sock internally.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:27:58 -07:00
Denis V. Lunev
3d58b5fa8e [INET]: Rename inet_csk_ctl_sock_create to inet_ctl_sock_create.
This call is nothing common with INET connection sockets code. It
simply creates an unhashes kernel sockets for protocol messages.

Move the new call into af_inet.c after the rename.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 14:22:32 -07:00
Denis V. Lunev
84f59370c5 [IPV6]: Fix refcounting for anycast dst entries.
Anycast DST entries allocated inside ipv6_dev_ac_inc are leaked when
network device is stopped without removing IPv6 addresses from it. The
bug has been observed in the reality on 2.6.18-rhel5 kernel.

In the above case addrconf_ifdown marks all entries as obsolete and
ip6_del_rt called from __ipv6_dev_ac_dec returns ENOENT. The
referrence is not dropped.

The fix is simple. DST entry should not keep referrence when stored in
the FIB6 tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 13:33:00 -07:00
Denis V. Lunev
eb86757931 [IPV6]: inet6_dev on loopback should be kept until namespace stop.
In the other case it will be destroyed when last address will be removed
from lo inside a namespace. This will break IPv6 in several places. The
most obvious one is ip6_dst_ifdown.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 13:31:53 -07:00
Denis V. Lunev
439e23857a [IPV6]: Event type in addrconf_ifdown is mis-used.
addrconf_ifdown is broken in respect to the usage of how
parameter. This function is called with (event != NETDEV_DOWN) and (2)
on the IPv6 stop.  It the latter case inet6_dev from loopback device
should be destroyed.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 13:30:17 -07:00
Herbert Xu
af2681828a [ICMP]: Ensure that ICMP relookup maintains status quo
The ICMP relookup path is only meant to modify behaviour when
appropriate IPsec policies are in place and marked as requiring
relookups.  It is certainly not meant to modify behaviour when
IPsec policies don't exist at all.

However, due to an oversight on the error paths existing behaviour
may in fact change should one of the relookup steps fail.

This patch corrects this by redirecting all errors on relookup
failures to the previous code path.  That is, if the initial
xfrm_lookup let the packet pass, we will stand by that decision
should the relookup fail due to an error.

This should be safe from a security point-of-view because compliant
systems must install a default deny policy so the packet would'nt
have passed in that case.

Many thanks to Julian Anastasov for pointing out this error.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-03 12:52:19 -07:00
David S. Miller
e1ec1b8ccd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/s2io.c
2008-04-02 22:35:23 -07:00
YOSHIFUJI Hideaki
de357cc013 [IPV6] NDISC: Don't rely on node-type hint from L2 unless required.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-03 10:06:01 +09:00
YOSHIFUJI Hideaki
52eeeb8481 [IPV6]: Unify ip6_onlink() and ipip6_onlink().
Both are identical, let's create ipv6_chk_prefix() and use it
in both places.
2008-04-03 10:06:00 +09:00
YOSHIFUJI Hideaki
6294e00073 [IPV6] NDISC: Ignore route information with /0 prefix from interior router.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-03 10:06:00 +09:00
YOSHIFUJI Hideaki
300aaeeaab [IPV6] SIT: Add SIOCGETPRL ioctl to get/dump PRL.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-03 10:06:00 +09:00
YOSHIFUJI Hideaki
0009ae1f50 [IPV6] SIT: Disallow 0.0.0.0 in PRL and Flush PRL if given for DEL.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-03 10:05:59 +09:00
YOSHIFUJI Hideaki
3fcfa12904 [IPV6] SIT: Fix locking issues in PRL management.
To protect PRL list, use ipip6_lock.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-03 10:05:59 +09:00
Templin, Fred L
fadf6bf060 [IPV6] SIT: Add PRL management for ISATAP.
This patch updates the Linux the Intra-Site Automatic Tunnel Addressing
Protocol (ISATAP) implementation. It places the ISATAP potential router
list (PRL) in the kernel and adds three new private ioctls for PRL
management.

[Add several changes of structure name, constant names etc. - yoshfuji]

Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-04-03 10:05:58 +09:00
Herbert Xu
f32c5f2c38 [IPV6]: Fix ICMP relookup error path dst leak
When we encounter an error while looking up the dst the second
time we need to drop the first dst.  This patch is pretty much
the same as the one for IPv4.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-02 00:06:09 -07:00
Benoit Boissinot
eac55bf970 IPv6: do not create temporary adresses with too short preferred lifetime
From RFC341:
A temporary address is created only if this calculated Preferred
Lifetime is greater than REGEN_ADVANCE time units.  In particular, an
implementation must not create a temporary address with a zero
Preferred Lifetime.

Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-02 00:01:35 -07:00
Benoit Boissinot
c6fbfac2e6 IPv6: only update the lifetime of the relevant temporary address
When receiving a prefix information from a routeur, only update the
lifetimes of the temporary address associated with that prefix.

Otherwise if one deprecated prefix is advertized, all your temporary
addresses will become deprecated.

Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-02 00:00:58 -07:00
YOSHIFUJI Hideaki
f0bdb7ba5a [IPV6] RAW: Remove ancient comment.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-01 23:57:36 -07:00
Pavel Emelyanov
dfb12eb70f [IPV6][NETNS]: Display per-net info in sockstat6 file.
Do with the sockstat6 file what we've already done for the sockstat. 
Same good side effect - ipv6 reassembling stats are now shown per-net.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-31 19:43:43 -07:00
Pavel Emelyanov
d0538ca355 [SOCK][NETNS]: Register sockstat(6) files in each net.
Currently they live in init_net only, but now almost all the info
they can provide is available per-net.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-31 19:42:37 -07:00
Pavel Emelyanov
c29a0bc4df [SOCK][NETNS]: Add a struct net argument to sock_prot_inuse_add and _get.
This counter is about to become per-proto-and-per-net, so we'll need 
two arguments to determine which cell in this "table" to work with.

All the places, but proc already pass proper net to it - proc will be
tuned a bit later.

Some indentation with spaces in proc files is done to keep the file
coding style consistent.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-31 19:41:46 -07:00
YOSHIFUJI Hideaki
4c7966b86b [IPV6] MCAST: Ensure to check multicast listener(s).
In ip6_mc_input(), we need to check whether we have listener(s) for
the packet.

After commit ae7bf20a63, all packets
for multicast destinations are delivered to upper layer if
IFF_PROMISC or IFF_ALLMULTI is set.

In fact, bug was rather ancient; the original (before the commit)
intent of the dev->flags check was to skip the ipv6_chk_mcast_addr()
call, assuming L2 filters packets appropriately, but it was even not
true.

Let's explicitly check our multicast list.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-31 19:30:45 -07:00
Denis V. Lunev
4ad96d39a2 [UDP]: Remove owner from udp_seq_afinfo.
Move it to udp_seq_afinfo->seq_fops as should be.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 18:25:53 -07:00
Denis V. Lunev
3ba9441bdf [UDP]: Place file operations directly into udp_seq_afinfo.
No need to have separate never-used variable.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 18:25:32 -07:00
Denis V. Lunev
dda61925f8 [UDP]: Move seq_ops from udp_iter_state to udp_seq_afinfo.
No need to create seq_operations for each instance of 'netstat'.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 18:24:26 -07:00
David S. Miller
e8e16b706e [INET]: inet_frag_evictor() must run with BH disabled
Based upon a lockdep trace from Dave Jones.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 17:30:18 -07:00
Pavel Emelyanov
bdcde3d71a [SOCK]: Drop inuse pcounter from struct proto (v2).
An uppercut - do not use the pcounter on struct proto.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 16:39:33 -07:00
Joe Perches
bc578a54f0 [NET]: Rename inet_frag.h identifiers COMPLETE, FIRST_IN, LAST_IN to INET_FRAG_*
On Fri, 2008-03-28 at 03:24 -0700, Andrew Morton wrote:
> they should all be renamed.

Done for include/net and net

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 16:35:27 -07:00
YOSHIFUJI Hideaki
0736ffc04e [IPV6] NEIGH: Optimize is_router check.
Our interest is not the whole entry of proxy neighbor but the
NTF_ROUTER flag.  Let's test it explicitly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-28 14:00:06 +09:00
David S. Miller
8e8e43843b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/usb/rndis_host.c
	drivers/net/wireless/b43/dma.c
	net/ipv6/ndisc.c
2008-03-27 18:48:56 -07:00
Denis V. Lunev
8eeee8b152 [NETFILTER]: Replate direct proc_fops assignment with proc_create call.
This elliminates infamous race during module loading when one could lookup
proc entry without proc_fops assigned.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 16:55:53 -07:00
Thomas Graf
920fc941a9 [ESP]: Ensure IV is in linear part of the skb to avoid BUG() due to OOB access
ESP does not account for the IV size when calling pskb_may_pull() to
ensure everything it accesses directly is within the linear part of a
potential fragment. This results in a BUG() being triggered when the
both the IPv4 and IPv6 ESP stack is fed with an skb where the first
fragment ends between the end of the esp header and the end of the IV.

This bug was found by Dirk Nehring <dnehring@gmx.net> .

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 16:08:03 -07:00
Benjamin Thery
5983a3dff0 [NETNS][IPV6] flowlabels - make proc per namespace
Make /proc/net/ip6_flowlabel show only flow labels belonging to the
current network namespace.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26 16:53:30 -07:00
Benjamin Thery
60e8fbc4c5 [NETNS][IPV6] flowlabels - make flowlabels per namespace
This patch introduces a new member, fl_net, in struct ip6_flowlabel.
This allows to create labels with the same value in different namespaces.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26 16:53:08 -07:00
Daniel Lezcano
6ab57e7e7f [NETNS][IPV6] anycast - handle several network namespace
Make use of the network namespace information to have this protocol to
handle several network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26 16:52:32 -07:00
Herbert Xu
732c8bd590 [IPSEC]: Fix BEET output
The IPv6 BEET output function is incorrectly including the inner
header in the payload to be protected.  This causes a crash as
the packet doesn't actually have that many bytes for a second
header.

The IPv4 BEET output on the other hand is broken when it comes
to handling an inner IPv6 header since it always assumes an
inner IPv4 header.

This patch fixes both by making sure that neither BEET output
function touches the inner header at all.  All access is now
done through the protocol-independent cb structure.  Two new
attributes are added to make this work, the IP header length
and the IPv4 option length.  They're filled in by the inner
mode's output function.

Thanks to Joakim Koskela for finding this problem.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26 16:51:09 -07:00
Pavel Emelyanov
a233352506 [IPV6]: Fix potential net leak and oops in ipv6 routing code.
The commits f3db4851 ([NETNS][IPV6] ip6_fib - fib6_clean_all handle several 
network namespaces) and 69ddb805 ([NETNS][IPV6] route6 - Make proc entry 
/proc/net/rt6_stats per namespace) made some proc files per net.

Both of them introduced potential OOPS - get_proc_net can return NULL, but
this check is lost - and a struct net leak - in case single_open() fails the
previously got net is not put.

Kill all these bugs with one patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-26 16:49:40 -07:00
YOSHIFUJI Hideaki
878628fbf2 [NET] NETNS: Omit namespace comparision without CONFIG_NET_NS.
Introduce an inline net_eq() to compare two namespaces.
Without CONFIG_NET_NS, since no namespace other than &init_net
exists, it is always 1.

We do not need to convert 1) inline vs inline and
2) inline vs &init_net comparisons.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:40:00 +09:00
YOSHIFUJI Hideaki
1218854afa [NET] NETNS: Omit seq_net_private->net without CONFIG_NET_NS.
Without CONFIG_NET_NS, no namespace other than &init_net exists,
no need to store net in seq_net_private.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:56 +09:00
YOSHIFUJI Hideaki
3b1e0a655f [NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.
Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:55 +09:00
YOSHIFUJI Hideaki
c346dca108 [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.
Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-26 04:39:53 +09:00
YOSHIFUJI Hideaki
7cbca67c07 [IPV6]: Support Source Address Selection API (RFC5014).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25 10:24:01 +09:00
YOSHIFUJI Hideaki
6b75d09081 [IPV6]: Optimize hop-limit determination.
Last part of hop-limit determination is always:
    hoplimit = dst_metric(dst, RTAX_HOPLIMIT);
    if (hoplimit < 0)
        hoplimit = ipv6_get_hoplimit(dst->dev).

Let's consolidate it as ip6_dst_hoplimit(dst).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25 10:24:00 +09:00
YOSHIFUJI Hideaki
c8cdaf998d [IPV4,IPV6]: Share cork.rt between IPv4 and IPv6.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25 10:23:59 +09:00
YOSHIFUJI Hideaki
a9b05723ff [IPV6] ADDRCONF: Clean-up ipv6_dev_get_saddr().
old:
|    text	   data	    bss	    dec	    hex	filename
|   28599	   1416	     96	  30111	   759f	net/ipv6/addrconf.o

new:
|    text	   data	    bss	    dec	    hex	filename
|   28007	   1416	     96	  29519	   734f	net/ipv6/addrconf.o

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25 10:23:58 +09:00
YOSHIFUJI Hideaki
df8ea19b5d [XFRM] IPV6: Optimize __xfrm_tunnel_alloc_spi().
| % size old/net/ipv6/xfrm6_tunnel.o new/net/ipv6/xfrm6_tunnel.o
|    text	   data	    bss	    dec	    hex	filename
|    1606	     40	   2080	   3726	    e8e	old/net/ipv6/xfrm6_tunnel.o
|    1574	     40	   2080	   3694	    e6e	new/net/ipv6/xfrm6_tunnel.o

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25 10:23:57 +09:00
YOSHIFUJI Hideaki
a002c6fd71 [XFRM] IPV6: Optimize xfrm6_input_addr().
| % size old/net/ipv6/xfrm6_input.o new/net/ipv6/xfrm6_input.o
|    text	   data	    bss	    dec	    hex	filename
|    1026	      0	      0	   1026	    402	old/net/ipv6/xfrm6_input.o
|     947	      0	      0	    947	    3b3	new/net/ipv6/xfrm6_input.o

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25 10:23:56 +09:00
YOSHIFUJI Hideaki
3b6cdf94cd [XFRM] IPV6: Use distribution counting sort for xfrm_state/xfrm_tmpl chain.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-25 10:23:56 +09:00
Pavel Emelyanov
84c375af0f [NETNS][UDP-Lite]: Register /proc/net/udplite(6) in a namespace.
UDP-Lite sockets are displayed in another files, rather than
UDP ones, so make the present in namespaces as well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24 14:56:57 -07:00
Kazunori MIYAZAWA
df9dcb4588 [IPSEC]: Fix inter address family IPsec tunnel handling.
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24 14:51:51 -07:00
Pavel Emelyanov
fa86d322d8 [NEIGH]: Fix race between pneigh deletion and ipv6's ndisc_recv_ns (v3).
Proxy neighbors do not have any reference counting, so any caller
of pneigh_lookup (unless it's a netlink triggered add/del routine)
should _not_ perform any actions on the found proxy entry. 

There's one exception from this rule - the ipv6's ndisc_recv_ns() 
uses found entry to check the flags for NTF_ROUTER.

This creates a race between the ndisc and pneigh_delete - after 
the pneigh is returned to the caller, the nd_tbl.lock is dropped 
and the deleting procedure may proceed.

One of the fixes would be to add a reference counting, but this
problem exists for ndisc only. Besides such a patch would be too 
big for -rc4.

So I propose to introduce a __pneigh_lookup() which is supposed
to be called with the lock held and use it in ndisc code to check
the flags on alive pneigh entry.


Changes from v2:
As David noticed, Exported the __pneigh_lookup() to ipv6 module. 
The checkpatch generates a warning on it, since the EXPORT_SYMBOL 
does not follow the symbol itself, but in this file all the 
exports come at the end, so I decided no to break this harmony.

Changes from v1:
Fixed comments from YOSHIFUJI - indentation of prototype in header
and the pndisc_check_router() name - and a compilation fix, pointed
by Daniel - the is_routed was (falsely) considered as uninitialized
by gcc.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-24 14:48:59 -07:00
Florian Westphal
2051f11fb8 [TCP]: Shrink syncookie_secret by 8 byte.
the first u32 copied from syncookie_secret is overwritten by the
minute-counter four lines below.  After adjusting the destination
address, the size of syncookie_secret can be reduced accordingly.

AFAICS, the only other user of syncookie_secret[] is the ipv6
syncookie support.  Because ipv6 syncookies only grab 44 bytes from
syncookie_secret[], this shouldn't affect them in any way.

With fixes from Glenn Griffin.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-23 22:21:28 -07:00
Rami Rosen
061964fb98 [IPV6]: Remove unused code in ndisc_send_redirect().
This patches removes unused code in ndisc_send_redirect() method in
net/ipv6/ndisc.c.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-23 21:58:44 -07:00
Julia Lawall
421f099bc5 [IPV6] net/ipv6/ndisc.c: remove unused variable
The variable hlen is initialized but never used otherwise.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
type T;
identifier i;
constant C;
@@

(
extern T i;
|
- T i;
  <+... when != i
- i = C;
  ...+>
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22 18:04:16 -07:00
Pavel Emelyanov
fc8717baa8 [RAW]: Add raw_hashinfo member on struct proto.
Sorry for the patch sequence confusion :| but I found that the similar
thing can be done for raw sockets easily too late.

Expand the proto.h union with the raw_hashinfo member and use it in
raw_prot and rawv6_prot. This allows to drop the protocol specific
versions of hash and unhash callbacks.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22 16:56:51 -07:00
Pavel Emelyanov
6ba5a3c52d [UDP]: Make full use of proto.h.udp_hash innovation.
After this we have only udp_lib_get_port to get the port and two 
stubs for ipv4 and ipv6. No difference in udp and udplite except
for initialized h.udp_hash member.

I tried to find a graceful way to drop the only difference between
udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison 
routine), but adding one more callback on the struct proto didn't 
appear such :( Maybe later.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22 16:51:21 -07:00
Pavel Emelyanov
39d8cda76c [SOCK]: Add udp_hash member to struct proto.
Inspired by the commit ab1e0a13 ([SOCK] proto: Add hashinfo member to 
struct proto) from Arnaldo, I made similar thing for UDP/-Lite IPv4 
and -v6 protocols.

The result is not that exciting, but it removes some levels of
indirection in udpxxx_get_port and saves some space in code and text.

The first step is to union existing hashinfo and new udp_hash on the
struct proto and give a name to this union, since future initialization 
of tcpxxx_prot, dccp_vx_protinfo and udpxxx_protinfo will cause gcc 
warning about inability to initialize anonymous member this way.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-22 16:50:58 -07:00
Daniel Lezcano
6f8b13bcb3 [NETNS][IPV6] tcp6 - make proc per namespace
Make the proc for tcp6 to be per namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 04:14:45 -07:00
Daniel Lezcano
0c96d8c50b [NETNS][IPV6] udp6 - make proc per namespace
The proc init/exit functions take a new network namespace parameter in
order to register/unregister /proc/net/udp6 for a namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 04:14:17 -07:00
Daniel Lezcano
ea82edf704 [NETNS][IPV6] mcast - fix compilation warning when procfs is not compiled in
When CONFIG_PROC_FS=no, the out_sock_create label is not used because
the code using it is disabled and that leads to a warning at compile
time.

This patch fix that by making a specific function to initialize proc
for igmp6, and remove the annoying CONFIG_PROC_FS sections in
init/exit function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-21 04:10:53 -07:00
David S. Miller
a25606c845 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-03-21 03:42:24 -07:00
YOSHIFUJI Hideaki
38fe999e22 [IPV6] KCONFIG: Fix description about IPV6_TUNNEL.
Based on notice from "Colin" <colins@sjtu.edu.cn>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-20 16:13:58 -07:00
Robert P. J. Day
938b93adb2 [NET]: Add debugging names to __RW_LOCK_UNLOCKED macros.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-18 00:59:23 -07:00
Daniel Lezcano
b8ad0cbc58 [NETNS][IPV6] mcast - handle several network namespace
This patch make use of the network namespace information at the right
places to handle the multicast for several network namespaces.  It
makes the socket control to be per namespace too.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:16:55 -08:00
Daniel Lezcano
e504799276 [NETNS][IPV6] tcp6 - handle several network namespace
We have the right network namespace at the right place now.
So make use of this information to make tcp6 per network namespace

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:16:26 -08:00
Daniel Lezcano
93ec926b07 [NETNS][IPV6] tcp6 - make socket control per namespace
Instead of having a tcp6_socket global to all the namespace, there is
tcp6 socket control per namespace. That is consistent with which
namespace sent a RST and allows to pass the socket to the underlying
function to retrieve the network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:16:02 -08:00
Daniel Lezcano
1762f7e88e [NETNS][IPV6] ndisc - make socket control per namespace
Make ndisc socket control per namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:15:34 -08:00
Daniel Lezcano
a18bc6959d [NETNS][IPV6] ndisc - make ndisc handle multiple network namespaces
Make ndisc handle multiple network namespaces: 
Remove references to init_net, add network namespace parameters and add 
pernet_operations for ndisc

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:14:49 -08:00
Daniel Lezcano
8a3edd800d [NETNS][IPV6] fix some missing namespace
This patch adds some missing namespace

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:14:16 -08:00
David S. Miller
db8dac20d5 [UDP]: Revert udplite and code split.
This reverts commit db1ed684f6 ("[IPV6]
UDP: Rename IPv6 UDP files."), commit
8be8af8fa4 ("[IPV4] UDP: Move
IPv4-specific bits to other file.") and commit
e898d4db27 ("[UDP]: Allow users to
configure UDP-Lite.").

First, udplite is of such small cost, and it is a core protocol just
like TCP and normal UDP are.

We spent enormous amounts of effort to make udplite share as much code
with core UDP as possible.  All of that work is less valuable if we're
just going to slap a config option on udplite support.

It is also causing build failures, as reported on linux-next, showing
that the changeset was not tested very well.  In fact, this is the
second build failure resulting from the udplite change.

Finally, the config options provided was a bool, instead of a modular
option.  Meaning the udplite code does not even get build tested
by allmodconfig builds, and furthermore the user is not presented
with a reasonable modular build option which is particularly needed
by distribution vendors.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-06 16:22:02 -08:00
Harvey Harrison
0dc47877a3 net: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 20:47:47 -08:00
Daniel Lezcano
a05c44f6d5 [IPV6]: Remove commented lines.
Remove commented lines from netns patchset.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 12:37:29 -08:00
David S. Miller
255333c1db Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	net/mac80211/rc80211_pid_algo.c
2008-03-05 12:26:41 -08:00
Benjamin Thery
9a43b709a2 [NETNS][IPV6] icmp6 - make icmpv6_socket per namespace
This patch make the changes necessary to support network namespaces in
ICMPv6.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:49:18 -08:00
Daniel Lezcano
da6bb5c0c5 [NETNS][IPV6] ip6_input - enable ipv6_rcv to handle several network namespace
The different subsystem of ipv6 are ready for namespaces, so let's
activate it for ipv6_rcv.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:48:56 -08:00
Daniel Lezcano
c20121ae87 [NETNS][IPV6] route6 - pass always a valid socket to ip6_dst_lookup
The ip6_dst_lookup receive a socket as parameter. In some part of the code
it is called with a NULL socket parameter. We want to rely on the socket
to retrieve the network namespace, so we always pass a valid socket in all
cases.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:48:35 -08:00
Daniel Lezcano
4591db4f37 [NETNS][IPV6] route6 - add netns parameter to ip6_route_output
Add an netns parameter to ip6_route_output. That will allow to access
to the right routing table for outgoing traffic.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:48:10 -08:00
Benjamin Thery
6fda735005 [NETNS][IPV6] addrconf - make addrconf per namespace
All the infrastructure to propagate the network namespace information
is ready. Make use of it.

There is a special case here between the initial network namespace and
the other namespaces:

* When ipv6 is initialized at boot time (aka in the init_net), it
registers to the notifier callback. So addrconf_notify will be called
as many time as there are network devices setup on the system and the
function will add ipv6 addresses to the network devices. But the first
device which needs to have its ipv6 address setup is the loopback,
unfortunatly this is not the case. So the loopback address is setup
manually in the ipv6 init function.

* With the network namespace, this ordering problem does not appears
because notifier is already setup and active, so as soon as we
register the loopback the ipv6 address is setup and it will be the
first device.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:47:47 -08:00
Daniel Lezcano
af2849377e [NETNS][IPV6] addrconf - Pass the proper network namespace parameters to addrconf
This patch propagates the network namespace pointer to the address
configuration routines which need it, which means adding a new
parameter to these functions, and make them use it instead of using
the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:46:57 -08:00
Daniel Lezcano
300bf591de [NETNS][IPV6] proc - protect snmp6 from non-init_net calls
This patchset avoids creation of the /proc entry for snmp6 when
the call is made from a network namespace different from the init_net.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:46:31 -08:00
Benjamin Thery
075de93957 [NETNS][IPV6] af_inet6 - allow socket creation per namespace
Allow creation of IPv6 raw and datagram sockets in network namespaces
other than init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:45:59 -08:00
Benjamin Thery
94911fe317 [NETNS][IPV6] Move sysctl initialization later on in the IPv6 init sequence
This patch moves initialization of IPv6 sysctl stuff at the end of
IPv6 initialization.

This will be helpful for network namespaces where some sysctl entries
depend on per-namespace variables, that need to be allocated and
initialized before they are referenced by sysctl.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 10:45:36 -08:00
Herbert Xu
ed58dd41f3 [ESP]: Add select on AUTHENC
Now the ESP uses the AEAD interface even for algorithms which are
not combined mode, we need to select CONFIG_CRYPTO_AUTHENC as
otherwise only combined mode algorithms will work.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 14:29:21 -08:00
Daniel Lezcano
7019b78e14 [NETNS][IPV6] route6 - Make ip6_dst_gc simpler
This patches improves the readibility of the ip6_dst_gc() routine.
It simplifies long lines which grow a lot due to the introduction
of network namespaces support.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:50:14 -08:00
Benjamin Thery
6891a346c3 [NETNS][IPV6] route6 - make garbage collection work with multiple network namespaces
This patch makes the necessary changes to make IPv6 dst_entry garbage
collection work with multiple network namespaces.

In ip6_dst_gc(), static local variables are now declared
per-namespace.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:49:47 -08:00
Benjamin Thery
f2fc6a5458 [NETNS][IPV6] route6 - move ip6_dst_ops inside the network namespace
The ip6_dst_ops is moved inside the network namespace structure.  All
references to this structure are now relative to the initial network
namespace.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:49:23 -08:00
Daniel Lezcano
9a7ec3a94d [NETNS][IPV6] route6 - dynamically allocate ip6_dst_ops
ip6_dst_ops is dynamically allocated in init and exit functions.  That
provides the ability to do multiple instanciations of this structure.

This will be needed for network namespaces, indeed dst_ops stores data
that are required to be per namespace: entries and gc_thresh.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:48:53 -08:00
Daniel Lezcano
8ed6778967 [NETNS][IPV6] rt6_info - move rt6_info structure inside the namespace
The rt6_info structures are moved inside the network namespace
structure. All references to these structures are now relative to the
initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:48:30 -08:00
Daniel Lezcano
bdb3289f73 [NETNS][IPV6] rt6_info - make rt6_info accessed as a pointer
This patch make mindless changes and prepares the code to use dynamic
allocation for rt6_info structure. The code accesses the rt6_info
structure as a pointer instead of a global static variable.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:48:10 -08:00
Daniel Lezcano
5578689a4e [NETNS][IPV6] route6 - make route6 per namespace
This patch makes the routing engine use the network namespaces to
access routing informations: Add a network namespace parameter to
ipv6_route_ioctl and propagate the network namespace value to all the
routing code that have not yet been changed.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:47:47 -08:00
Daniel Lezcano
7b4da53229 [NETNS][IPV6] route6 - Pass the network namespace parameter to rt6_purge_dflt_routers
Add a network namespace parameter to rt6_purge_dflt_routers.  This is
needed to call fib6_get_table with the appropriate network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:47:14 -08:00
Daniel Lezcano
efa2cea0d9 [NETNS][IPV6] route6 - Pass network namespace to rt6_add_route_info and rt6_get_route_info
Add a network namespace parameter to rt6_add_route_info() and
rt6_get_route_info to enable them to handle multiple network
namespaces.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:46:48 -08:00
Daniel Lezcano
69ddb80562 [NETNS][IPV6] route6 - Make proc entry /proc/net/rt6_stats per namespace
Make the proc entry /proc/net/rt6_stats work in all network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:46:23 -08:00
Daniel Lezcano
606a2b4862 [NETNS][IPV6] route6 - Pass the network namespace parameter to rt6_lookup
Add a network namespace parameter to rt6_lookup().

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:45:59 -08:00
Daniel Lezcano
cdb1876192 [NETNS][IPV6] route6 - create route6 proc files for the namespace
Make /proc/net/ipv6_route and /proc/net/rt6_stats to be per namespace.
These proc files are now created when the network namespace is
initialized.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-04 13:45:33 -08:00
Benjamin Thery
c572872f89 [NETNS][IPV6] rt6_stats - make the stats per network namespace
The rt6_stats is now per namespace with this patch. It is allocated
when a network namespace is created and freed when the network
namespace exits and references are relative to the network namespace.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:34:17 -08:00
Daniel Lezcano
6cc118bd50 [NETNS][IPV6] rt6_stats - dynamically allocate the routes statistics
This patch allocates the rt6_stats struct dynamically when the fib6 is
initialized. That provides the ability to create several instances of
this structure for the network namespaces.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:33:43 -08:00
Daniel Lezcano
dcabb819a6 [NETNS][IPV6] fib6_rules - handle several network namespaces
The fib6_rules_ops is moved to the network namespace structure.  All
references are changed to have it relatively to it.

Each time a network namespace is created a new fib6_rules_ops is
allocated, initialized and stored into the network namespace
structure.

The common part of the fib rules is namespace aware, so it is quite
easy to retrieve the network namespace from the rules and use it in
the different callbacks.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:33:08 -08:00
Daniel Lezcano
eb5564b853 [NETNS][IPV6] fib6 rule - dynamic allocation of the rules struct ops
The fib6_rules_ops structure is dynamically allocated, so that allows
to make several instances of it per network namespace.

The global static fib6_rules_ops structure is renamed to
fib6_rules_ops_template in order to quickly memcopy it for the
structure initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:32:30 -08:00
Benjamin Thery
ec7d43c291 [NETNS][IPV6] ip6_fib - clean node use namespace
The fib6_clean_node function should have the network namespace it is
working on. The fib6_cleaner_t structure is extended with the network
namespace field to be passed to the fib6_clean_node function.

The different functions calling the fib6_clean_node function are
extended with the netns parameter when needed to propagate the netns
pointer.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:31:57 -08:00
Daniel Lezcano
63152fc0de [NETNS][IPV6] ip6_fib - gc timer per namespace
Move the timer initialization at the network namespace creation and
store the network namespace in the timer argument.

That enables multiple timers (one per network namespace) to do garbage
collecting.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:31:11 -08:00
Daniel Lezcano
450d19f8ab [NETNS][IPV6] ip6_fib - dynamically allocate gc-timer
The ip6_fib_timer gc timer is dynamically allocated and initialized in
the ip6 fib init function. There are no more references to a static
global variable. That will allow to make multiple instance of the
garbage collecting timer and make them per namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:29:33 -08:00
Daniel Lezcano
5b7c931dff [NETNS][IPV6] ip6_fib - add net to gc timer parameter
The fib tables are now relative to the network namespace. When the
garbage collector timer expires, we must have a network namespace
parameter in order to retrieve the tables. For now this is the
init_net, but we should be able to have a timer per namespace and use
the timer callback parameter to pass the network namespace from the
expired timer.

The timer callback, fib6_run_gc, is actually used to be called
synchronously by some functions and asynchronously when the timer
expires.

When the timer expires, the delay specified for fib6_run_gc parameter
is always zero. So, I changed fib6_run_gc to not be a timer callback
but a function called by the timer callback and I added a timer
callback where its work is just to retrieve from the data arg of the
timer the network namespace and call fib6_run_gc with zero expiring
time and the network namespace parameters. That makes the code cleaner
for the fib6_run_gc callers.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:28:58 -08:00
Daniel Lezcano
f3db48517f [NETNS][IPV6] ip6_fib - fib6_clean_all handle several network namespaces
The function fib6_clean_all takes the network namespace as
parameter. That allows to flush the routes related to a specific
network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:27:06 -08:00
Daniel Lezcano
58f09b78b7 [NETNS][IPV6] ip6_fib - make it per network namespace
The fib table for ipv6 are moved to the network namespace structure.
All references to them are made relatively to the network namespace.

All external calls to the ip6_fib functions taking the network
namespace parameter are made using the init_net variable, so the
ip6_fib engine is ready for the namespaces but the callers not yet.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:25:27 -08:00
Daniel Lezcano
e0b85590bc [NETNS][IPV6] ip6_fib - dynamically allocate the fib tables
This patch changes the fib6 tables to be dynamically allocated.  That
provides the ability to make several instances of them when a new
network namespace is created.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 23:24:31 -08:00
YOSHIFUJI Hideaki
4192717807 [IPV6] MCAST: Use standard path for sending MLD/MLDv2 messages.
This is changing the paths for sending MLD/MLDv2 messages
from dev_queue_xmit() to standard dst_output().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:24 +09:00
YOSHIFUJI Hideaki
3b00944c5c [IPV6]: Make ndisc_dst_alloc() common for later use.
For later use, this patch is renaming ndisc_dst_alloc()
(and related function/structures) to icmp6_dst_alloc()
(and so on).  This patch also removing unused function-
pointer argument for it.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:24 +09:00
YOSHIFUJI Hideaki
95e41e93e1 [IPV6]: Make ndisc_flow_init() common for later use.
For later use, this patch is renaming ndisc_flow_init() to
icmpv6_flow_init() and putting it in common place.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:24 +09:00
YOSHIFUJI Hideaki
5e5f3f0f80 [IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().
Since most users of ipv6_get_saddr() pass non-NULL as
dst argument, use ipv6_dev_get_saddr() directly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:23 +09:00
YOSHIFUJI Hideaki
5ee0910509 [IPV6] SYSCTL: complete initialization for sysctl table in subsystem code.
Move initialization bits for subsystem sysctl tables to
appropriate functions.
 - route
 - icmp

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:23 +09:00
YOSHIFUJI Hideaki
662397fd7a [IPV6]: Move packet_type{} related bits to af_inet6.c.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:23 +09:00
YOSHIFUJI Hideaki
db1ed684f6 [IPV6] UDP: Rename IPv6 UDP files.
Rename net/ipv6/udp.c to net/ipv6/udp_ipv6.c
Rename net/ipv6/udplite.c to net/ipv6/udplite_ipv6.c.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:22 +09:00
YOSHIFUJI Hideaki
e898d4db27 [UDP]: Allow users to configure UDP-Lite.
Let's give users an option for disabling UDP-Lite (~4K).

old:
|    text	   data	    bss	    dec	    hex	filename
|  286498	  12432	   6072	 305002	  4a76a	net/ipv4/built-in.o
|  193830	   8192	   3204	 205226	  321aa	net/ipv6/ipv6.o

new (without UDP-Lite):
|    text	   data	    bss	    dec	    hex	filename
|  284086	  12136	   5432	 301654	  49a56	net/ipv4/built-in.o
|  191835	   7832	   3076	 202743	  317f7	net/ipv6/ipv6.o

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:22 +09:00
Glenn Griffin
c6aefafb7e [TCP]: Add IPv6 support to TCP SYN cookies
Updated to incorporate Eric's suggestion of using a per cpu buffer
rather than allocating on the stack.  Just a two line change, but will
resend in it's entirety.

Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-03-04 15:18:21 +09:00
Alexey Dobriyan
8ed7edce82 ipv6: fix inet6_init/icmpv6_cleanup sections mismatch
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-03 12:02:54 -08:00
Denis V. Lunev
fd80eb942a [INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack.
It looks like dst parameter is used in this API due to historical
reasons.  Actually, it is really used in the direct call to
tcp_v4_send_synack only.  So, create a wrapper for tcp_v4_send_synack
and remove dst from rtx_syn_ack.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:43:03 -08:00
Denis V. Lunev
98c6d1b261 [NETNS]: Make icmpv6_sk per namespace.
All preparations are done. Now just add a hook to perform an
initialization on namespace startup and replace icmpv6_sk macro with
proper inline call.  Actual namespace the packet belongs too will be
passed later along with the one for the routing.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:21:22 -08:00
Denis V. Lunev
5c8cafd65e [NETNS]: icmp(v6)_sk should not pin a namespace.
So, change icmp(v6)_sk creation/disposal to the scheme used in the
netlink for rtnl, i.e. create a socket in the context of the init_net
and assign the namespace without getting a referrence later.

Also use sk_release_kernel instead of sock_release to properly destroy
such sockets.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:19:22 -08:00
Denis V. Lunev
79c9115953 [ICMP]: Allocate data for __icmp(v6)_sk dynamically.
Own __icmp(v6)_sk should be present in each namespace. So, it should be
allocated dynamically. Though, alloc_percpu does not fit the case as it
implies additional dereferrence for no bonus.

Allocate data for pointers just like __percpu_alloc_mask does and place
pointers to struct sock into this array.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:17:11 -08:00
Denis V. Lunev
405666db84 [ICMP]: Pass proper ICMP socket into icmp(v6)_xmit_(un)lock.
We have to get socket lock inside icmp(v6)_xmit_lock/unlock. The socket
is get from global variable now. When this code became namespaces, one
should pass a namespace and get socket from it.

Though, above is useless. Socket is available in the caller, just pass
it inside. This saves a bit of code now and saves more later.

add/remove: 0/0 grow/shrink: 1/3 up/down: 1/-169 (-168)
function                                     old     new   delta
icmp_rcv                                     718     719      +1
icmpv6_rcv                                  2343    2303     -40
icmp_send                                   1566    1518     -48
icmp_reply                                   549     468     -81

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:16:46 -08:00
Denis V. Lunev
b7e729c4b4 [ICMP]: Store sock rather than socket for ICMP flow control.
Basically, there is no difference, what to store: socket or sock. Though,
sock looks better as there will be 1 less dereferrence on the fast path.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:16:08 -08:00
Denis V. Lunev
9b0f976f27 [INET]: Remove struct net_proto_family* from _init calls.
struct net_proto_family* is not used in icmp[v6]_init, ndisc_init,
igmp_init and tcp_v4_init. Remove it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29 11:13:15 -08:00
Adrian Bunk
1e04d53070 [IPV6]: Unexport ip6_find_1stfragopt
This patch removes the no longer used 
EXPORT_SYMBOL_GPL(ip6_find_1stfragopt).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 21:27:35 -08:00
Juha-Matti Tapio
99cd07a537 [IPV6]: Fix source address selection for ORCHID addresses
Skip the prefix length matching in source address selection for
orchid -> non-orchid addresses.

Overlay Routable Cryptographic Hash IDentifiers (RFC 4843,
2001:10::/28) are currenty not globally reachable. Without this
check a host with an ORCHID address can end up preferring those over
regular addresses when talking to other regular hosts in the 2001::/16
range thus breaking non-orchid connections.

Signed-off-by: Juha-Matti Tapio <jmtapio@verkkotelakka.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 20:55:46 -08:00
Juha-Matti Tapio
5fe47b8a65 [IPV6]: Add ORCHID prefix to address label table
Add a new label for Overlay Routable Cryptographic Hash Identifiers
(RFC 4843) prefix 2001:10::/28 to help proper source address
selection.

ORCHID addresses are used by for example Host Identity Protocol. They are 
global and routable, but they currently need support from both endpoints 
and therefore mixing regular and ORCHID addresses for source and 
destination is a bad idea in general case.

Signed-off-by: Juha-Matti Tapio <jmtapio@verkkotelakka.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 20:55:02 -08:00
Wang Chen
4436f4cbfa [IPV6]: Use proc_create() to setup ->proc_fops first
Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 14:13:46 -08:00
Herbert Xu
21e43188f2 [IPCOMP]: Disable BH on output when using shared tfm
Because we use shared tfm objects in order to conserve memory,
(each tfm requires 128K of vmalloc memory), BH needs to be turned
off on output as that can occur in process context.

Previously this was done implicitly by the xfrm output code.
That was lost when it became lockless.  So we need to add the
BH disabling to IPComp directly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-28 11:23:17 -08:00
YOSHIFUJI Hideaki
3bdfe7ec08 [IPV6] SYSCTL: Fix possible memory leakage in error path.
In error path, we do need to free memory just allocated.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-27 12:06:38 -08:00
Pavel Emelyanov
b37d428b24 [INET]: Don't create tunnels with '%' in name.
Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a
pre-defined name for a device from the userspace.  Since these drivers
call the register_netdevice() (rtnl_lock, is held), which does _not_
generate the device's name, this name may contain a '%' character.

Not sure how bad is this to have a device with a '%' in its name, but
all the other places either use the register_netdev(), which call the
dev_alloc_name(), or explicitly call the dev_alloc_name() before
registering, i.e. do not allow for such names.

This had to be prior to the commit 34cc7b, but I forgot to number the
patches and this one got lost, sorry.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-26 23:51:04 -08:00
Benjamin Thery
f1243c2db6 [IPV6]: Add missing initializations of the new nl_info.nl_net field
Add some more missing initializations of the new nl_info.nl_net field
in IPv6 stack. This field will be used when network namespaces are
fully supported.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-26 18:42:37 -08:00
Pavel Emelyanov
34cc7ba639 [IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly.
Use the added dev_alloc_name() call to create tunnel device name,
rather than iterate in a hand-made loop with an artificial limit.

Thanks Patrick for noticing this.

[ The way this works is, when the device is actually registered,
  the generic code noticed the '%' in the name and invokes
  dev_alloc_name() to fully resolve the name.  -DaveM ]

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-23 20:19:20 -08:00
Patrick McHardy
e2b58a67b9 [NETFILTER]: {ip,ip6,nfnetlink}_queue: fix SKB_LINEAR_ASSERT when mangling packet data
As reported by Tomas Simonaitis <tomas.simonaitis@gmail.com>,
inserting new data in skbs queued over {ip,ip6,nfnetlink}_queue
triggers a SKB_LINEAR_ASSERT in skb_put().

Going back through the git history, it seems this bug is present since
at least 2.6.12-rc2, probably even since the removal of
skb_linearize() for netfilter.

Linearize non-linear skbs through skb_copy_expand() when enlarging
them.  Tested by Thomas, fixes bugzilla #9933.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-19 17:17:52 -08:00
Pavel Emelyanov
2df96af03d [IPV6]: Use BUG_ON instead of if + BUG in fib6_del_route.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-18 20:50:42 -08:00
Denis V. Lunev
9937ded8e4 [IPV6]: dst_entry leak in ip4ip6_err. (resend)
The result of the ip_route_output is not assigned to skb. This means that
- it is leaked
- possible OOPS below dereferrencing skb->dst
- no ICMP message for this case

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-18 20:49:36 -08:00
Wang Chen
5ee46e562c [IPV6]: Fix hardcoded removing of old module code
Rusty hardcoded the old module code.
We can remove it now.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-17 22:34:53 -08:00
Herbert Xu
b5c15fc004 [IPV6]: Fix reversed local_df test in ip6_fragment
I managed to reverse the local_df test when forward-porting this
patch so it actually makes things worse by never fragmenting at
all.

Thanks to David Stevens for testing and reporting this bug.

Bill Fink pointed out that the local_df setting is also the wrong
way around.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-14 23:49:37 -08:00
Herbert Xu
b318e0e4ef [IPSEC]: Fix bogus usage of u64 on input sequence number
Al Viro spotted a bogus use of u64 on the input sequence number which
is big-endian.  This patch fixes it by giving the input sequence number
its own member in the xfrm_skb_cb structure.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:50:35 -08:00
Herbert Xu
28a89453b1 [IPV6]: Fix IPsec datagram fragmentation
This is a long-standing bug in the IPsec IPv6 code that breaks
when we emit a IPsec tunnel-mode datagram packet.  The problem
is that the code the emits the packet assumes the IPv6 stack
will fragment it later, but the IPv6 stack assumes that whoever
is emitting the packet is going to pre-fragment the packet.

In the long term we need to fix both sides, e.g., to get the
datagram code to pre-fragment as well as to get the IPv6 stack
to fragment locally generated tunnel-mode packet.

For now this patch does the second part which should make it
work for the IPsec host case.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 18:07:27 -08:00
Rami Rosen
238fc7eac8 [IPV6]: Replace using the magic constant "1024" with IP6_RT_PRIO_USER for fc_metric.
This patch replaces the explicit usage of the magic constant "1024"
with IP6_RT_PRIO_USER in the IPV6 tree.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-09 23:43:11 -08:00
Herbert Xu
8cf229437f [ICMP]: Restore pskb_pull calls in receive function
Somewhere along the development of my ICMP relookup patch the header
length check went AWOL on the non-IPsec path.  This patch restores the
check.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:15:50 -08:00
Pavel Emelyanov
5d8c0aa943 [INET]: Fix accidentally broken inet(6)_hash_connect's port offset calculations.
The port offset calculations depend on the protocol family, but, as
Adrian noticed, I broke this logic with the commit

	5ee31fc1ec
	[INET]: Consolidate inet(6)_hash_connect.

Return this logic back, by passing the port offset directly into the
consolidated function.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Noticed-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-05 03:14:44 -08:00
Arnaldo Carvalho de Melo
ab1e0a13d7 [SOCK] proto: Add hashinfo member to struct proto
This way we can remove TCP and DCCP specific versions of

sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
sk->sk_prot->hash:     inet_hash is directly used, only v6 need
                       a specific version to deal with mapped sockets
sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly

struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
that inet_csk_get_port can find the per family routine.

Now only the lookup routines receive as a parameter a struct inet_hashtable.

With this we further reuse code, reducing the difference among INET transport
protocols.

Eventually work has to be done on UDP and SCTP to make them share this
infrastructure and get as a bonus inet_diag interfaces so that iproute can be
used with these protocols.

net-2.6/net/ipv4/inet_hashtables.c:
  struct proto			     |   +8
  struct inet_connection_sock_af_ops |   +8
 2 structs changed
  __inet_hash_nolisten               |  +18
  __inet_hash                        | -210
  inet_put_port                      |   +8
  inet_bind_bucket_create            |   +1
  __inet_hash_connect                |   -8
 5 functions changed, 27 bytes added, 218 bytes removed, diff: -191

net-2.6/net/core/sock.c:
  proto_seq_show                     |   +3
 1 function changed, 3 bytes added, diff: +3

net-2.6/net/ipv4/inet_connection_sock.c:
  inet_csk_get_port                  |  +15
 1 function changed, 15 bytes added, diff: +15

net-2.6/net/ipv4/tcp.c:
  tcp_set_state                      |   -7
 1 function changed, 7 bytes removed, diff: -7

net-2.6/net/ipv4/tcp_ipv4.c:
  tcp_v4_get_port                    |  -31
  tcp_v4_hash                        |  -48
  tcp_v4_destroy_sock                |   -7
  tcp_v4_syn_recv_sock               |   -2
  tcp_unhash                         | -179
 5 functions changed, 267 bytes removed, diff: -267

net-2.6/net/ipv6/inet6_hashtables.c:
  __inet6_hash |   +8
 1 function changed, 8 bytes added, diff: +8

net-2.6/net/ipv4/inet_hashtables.c:
  inet_unhash                        | +190
  inet_hash                          | +242
 2 functions changed, 432 bytes added, diff: +432

vmlinux:
 16 functions changed, 485 bytes added, 492 bytes removed, diff: -7

/home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
  tcp_v6_get_port                    |  -31
  tcp_v6_hash                        |   -7
  tcp_v6_syn_recv_sock               |   -9
 3 functions changed, 47 bytes removed, diff: -47

/home/acme/git/net-2.6/net/dccp/proto.c:
  dccp_destroy_sock                  |   -7
  dccp_unhash                        | -179
  dccp_hash                          |  -49
  dccp_set_state                     |   -7
  dccp_done                          |   +1
 5 functions changed, 1 bytes added, 242 bytes removed, diff: -241

/home/acme/git/net-2.6/net/dccp/ipv4.c:
  dccp_v4_get_port                   |  -31
  dccp_v4_request_recv_sock          |   -2
 2 functions changed, 33 bytes removed, diff: -33

/home/acme/git/net-2.6/net/dccp/ipv6.c:
  dccp_v6_get_port                   |  -31
  dccp_v6_hash                       |   -7
  dccp_v6_request_recv_sock          |   +5
 3 functions changed, 5 bytes added, 38 bytes removed, diff: -33

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-03 04:28:52 -08:00
Jim Paris
23717795be [IPV6]: Update MSS even if MTU is unchanged.
This is needed because in ndisc.c, we have:

  static void ndisc_router_discovery(struct sk_buff *skb)
  {
  // ...
  	if (ndopts.nd_opts_mtu) {
  // ...
  			if (rt)
  				rt->u.dst.metrics[RTAX_MTU-1] = mtu;

  			rt6_mtu_change(skb->dev, mtu);
  // ...
  }

Since the mtu is set directly here, rt6_mtu_change_route thinks that
it is unchanged, and so it fails to update the MSS accordingly.  This
patch lets rt6_mtu_change_route still update MSS if old_mtu == new_mtu.

Signed-off-by: Jim Paris <jim@jtan.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:21 -08:00
Pavel Emelyanov
fa4d3c6210 [NETNS]: Udp sockets per-net lookup.
Add the net parameter to udp_get_port family of calls and
udp_lookup one and use it to filter sockets.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:21 -08:00
Pavel Emelyanov
d86e0dac2c [NETNS]: Tcp-v6 sockets per-net lookup.
Add a net argument to inet6_lookup and propagate it further.
Actually, this is tcp-v6 implementation of what was done for
tcp-v4 sockets in a previous patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:20 -08:00
Pavel Emelyanov
5ee31fc1ec [INET]: Consolidate inet(6)_hash_connect.
These two functions are the same except for what they call
to "check_established" and "hash" for a socket.

This saves half-a-kilo for ipv4 and ipv6.

 add/remove: 1/0 grow/shrink: 1/4 up/down: 582/-1128 (-546)
 function                                     old     new   delta
 __inet_hash_connect                            -     577    +577
 arp_ignore                                   108     113      +5
 static.hint                                    8       4      -4
 rt_worker_func                               376     372      -4
 inet6_hash_connect                           584      25    -559
 inet_hash_connect                            586      25    -561

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:17 -08:00
Pavel Emelyanov
535174efbe [IPV6]: Introduce the INET6_TW_MATCH macro.
We have INET_MATCH, INET_TW_MATCH and INET6_MATCH to test sockets and
twbuckets for matching, but ipv6 twbuckets are tested manually.

Here's the INET6_TW_MATCH to help with it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:17 -08:00
Patrick McHardy
c392a74018 [NETFILTER]: {ip,ip6}_queue: fix build error
Reported by Ingo Molnar:

 net/built-in.o: In function `ip_queue_init':
 ip_queue.c:(.init.text+0x322c): undefined reference to `net_ipv4_ctl_path'

Fix the build error and also handle CONFIG_PROC_FS=n properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:14 -08:00
Jan Engelhardt
32948588ac [NETFILTER]: nf_conntrack: annotate l3protos with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:13 -08:00
Jan Engelhardt
7cc3864d39 [NETFILTER]: nf_{conntrack,nat}_icmp: constify and annotate
Constify a few data tables use const qualifiers on variables where
possible in the nf_conntrack_icmp* sources.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:12 -08:00
Alexey Dobriyan
3cb609d57c [NETFILTER]: x_tables: create per-netns /proc/net/*_tables_*
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:28:06 -08:00
Patrick McHardy
c88130bcd5 [NETFILTER]: nf_conntrack: naming unification
Rename all "conntrack" variables to "ct" for more consistency and
avoiding some overly long lines.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:59 -08:00
Patrick McHardy
99fa5f5397 [NETFILTER]: nf_conntrack_ipv6: fix sparse warnings
CHECK   net/ipv6/netfilter/nf_conntrack_reasm.c
  net/ipv6/netfilter/nf_conntrack_reasm.c:77:18: warning: symbol 'nf_ct_ipv6_sysctl_table' was not declared. Should it be static?
  net/ipv6/netfilter/nf_conntrack_reasm.c:586:16: warning: symbol 'nf_ct_frag6_gather' was not declared. Should it be static?
  net/ipv6/netfilter/nf_conntrack_reasm.c:662:6: warning: symbol 'nf_ct_frag6_output' was not declared. Should it be static?
  net/ipv6/netfilter/nf_conntrack_reasm.c:683:5: warning: symbol 'nf_ct_frag6_kfree_frags' was not declared. Should it be static?
  net/ipv6/netfilter/nf_conntrack_reasm.c:698:5: warning: symbol 'nf_ct_frag6_init' was not declared. Should it be static?
  net/ipv6/netfilter/nf_conntrack_reasm.c:717:6: warning: symbol 'nf_ct_frag6_cleanup' was not declared. Should it be static?

Based on patch by Stephen Hemminger with suggestions by Yasuyuki KOZAKAI.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:49 -08:00
Patrick McHardy
b0a6363c24 [NETFILTER]: {ip,arp,ip6}_tables: fix sparse warnings in compat code
CHECK   net/ipv4/netfilter/ip_tables.c
net/ipv4/netfilter/ip_tables.c:1453:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1453:8:    expected int *size
net/ipv4/netfilter/ip_tables.c:1453:8:    got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1458:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1458:44:    expected int *size
net/ipv4/netfilter/ip_tables.c:1458:44:    got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1603:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1603:2:    expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1603:2:    got int *<noident>
net/ipv4/netfilter/ip_tables.c:1627:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1627:8:    expected int *size
net/ipv4/netfilter/ip_tables.c:1627:8:    got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1634:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1634:40:    expected int *size
net/ipv4/netfilter/ip_tables.c:1634:40:    got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1653:8: warning: incorrect type in argument 5 (different signedness)
net/ipv4/netfilter/ip_tables.c:1653:8:    expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1653:8:    got int *<noident>
net/ipv4/netfilter/ip_tables.c:1666:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1666:2:    expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1666:2:    got int *<noident>
  CHECK   net/ipv4/netfilter/arp_tables.c
net/ipv4/netfilter/arp_tables.c:1285:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1285:40:    expected int *size
net/ipv4/netfilter/arp_tables.c:1285:40:    got unsigned int *size
net/ipv4/netfilter/arp_tables.c:1543:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1543:44:    expected int *size
net/ipv4/netfilter/arp_tables.c:1543:44:    got unsigned int [usertype] *size
  CHECK   net/ipv6/netfilter/ip6_tables.c
net/ipv6/netfilter/ip6_tables.c:1481:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1481:8:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1481:8:    got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1486:44: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1486:44:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1486:44:    got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1631:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1631:2:    expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1631:2:    got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1655:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1655:8:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1655:8:    got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1662:40: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1662:40:    expected int *size
net/ipv6/netfilter/ip6_tables.c:1662:40:    got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1680:8: warning: incorrect type in argument 5 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1680:8:    expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1680:8:    got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1693:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1693:2:    expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1693:2:    got int *<noident>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:49 -08:00
Alexey Dobriyan
df200969b1 [NETFILTER]: netns: put table module on netns stop
When number of entries exceeds number of initial entries, foo-tables code
will pin table module. But during table unregister on netns stop,
that additional pin was forgotten.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:41 -08:00
Alexey Dobriyan
8280aa6182 [NETFILTER]: ip6_tables: per-netns IPv6 FILTER, MANGLE, RAW
Now it's possible to list and manipulate per-netns ip6tables rules.
Filtering decisions are based on init_net's table so far.

P.S.: remove init_net check in inet6_create() to see the effect

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:39 -08:00
Alexey Dobriyan
336b517fdc [NETFILTER]: ip6_tables: netns preparation
* Propagate netns from userspace down to xt_find_table_lock()
* Register ip6 tables in netns (modules still use init_net)

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:39 -08:00
Alexey Dobriyan
44d34e721e [NETFILTER]: x_tables: return new table from {arp,ip,ip6}t_register_table()
Typical table module registers xt_table structure (i.e. packet_filter)
and link it to list during it. We can't use one template for it because
corresponding list_head will become corrupted. We also can't unregister
with template because it wasn't changed at all and thus doesn't know in
which list it is.

So, we duplicate template at the very first step of table registration.
Table modules will save it for use during unregistration time and actual
filtering.

Do it at once to not screw bisection.

P.S.: renaming i.e. packet_filter => __packet_filter is temporary until
      full netnsization of table modules is done.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:36 -08:00
Alexey Dobriyan
8d87005207 [NETFILTER]: x_tables: per-netns xt_tables
In fact all we want is per-netns set of rules, however doing that will
unnecessary complicate routines such as ipt_hook()/ipt_do_table, so
make full xt_table array per-netns.

Every user stubbed with init_net for a while.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:35 -08:00
Alexey Dobriyan
a98da11d88 [NETFILTER]: x_tables: change xt_table_register() return value convention
Switch from 0/-E to ptr/PTR_ERR convention.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:35 -08:00
Jan Engelhardt
ecb6f85e11 [NETFILTER]: Use const in struct xt_match, xt_target, xt_table
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:28 -08:00
Denis V. Lunev
3046d76746 [RAW]: Wrong content of the /proc/net/raw6.
The address of IPv6 raw sockets was shown in the wrong format, from
IPv4 ones.  The problem has been introduced by the commit
42a73808ed ("[RAW]: Consolidate proc
interface.")

Thanks to Adrian Bunk who originally noticed the problem.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:26 -08:00
Denis V. Lunev
377cf82d66 [RAW]: Family check in the /proc/net/raw[6] is extra.
Different hashtables are used for IPv6 and IPv4 raw sockets, so no
need to check the socket family in the iterator over hashtables. Clean
this out.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:24 -08:00
Eric Dumazet
e242297055 [NET]: should explicitely initialize atomic_t field in struct dst_ops
All but one struct dst_ops static initializations miss explicit
initialization of entries field.

As this field is atomic_t, we should use ATOMIC_INIT(0), and not
rely on atomic_t implementation.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:23 -08:00
Eric Dumazet
533cb5b0a6 [XFRM]: constify 'struct xfrm_type'
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:20 -08:00
Benjamin Thery
2216b48376 [NETNS]: Add missing initialization of nl_info.nl_net in rtm_to_fib6_config()
Add missing initialization of the new nl_info.nl_net field in
rtm_to_fib6_config(). This will be needed the store network namespace
associated to the fib6_config struct.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:20 -08:00
Laszlo Attila Toth
4a19ec5800 [NET]: Introducing socket mark socket option.
A userspace program may wish to set the mark for each packets its send
without using the netfilter MARK target. Changing the mark can be used
for mark based routing without netfilter or for packet filtering.

It requires CAP_NET_ADMIN capability.

Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:19 -08:00
Herbert Xu
2614fa59fa [IPCOMP]: Fetch nexthdr before ipch is destroyed
When I moved the nexthdr setting out of IPComp I accidently moved
the reading of ipch->nexthdr after the decompression.  Unfortunately
this means that we'd be reading from a stale ipch pointer which
doesn't work very well.

This patch moves the reading up so that we get the correct nexthdr
value.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:11 -08:00
Herbert Xu
29ffe1a5c5 [INET]: Prevent out-of-sync truesize on ip_fragment slow path
When ip_fragment has to hit the slow path the value of skb->truesize
may go out of sync because we would have updated it without changing
the packet length.  This violates the constraints on truesize.

This patch postpones the update of skb->truesize to prevent this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:07 -08:00
Herbert Xu
1a6509d991 [IPSEC]: Add support for combined mode algorithms
This patch adds support for combined mode algorithms with GCM being
the first algorithm supported.

Combined mode algorithms can be added through the xfrm_user interface
using the new algorithm payload type XFRMA_ALG_AEAD.  Each algorithms
is identified by its name and the ICV length.

For the purposes of matching algorithms in xfrm_tmpl structures,
combined mode algorithms occupy the same name space as encryption
algorithms.  This is in line with how they are negotiated using IKE.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:03 -08:00
Herbert Xu
38320c70d2 [IPSEC]: Use crypto_aead and authenc in ESP
This patch converts ESP to use the crypto_aead interface and in particular
the authenc algorithm.  This lays the foundations for future support of
combined mode algorithms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31 19:27:02 -08:00
Linus Torvalds
44c45eb911 Make !NETFILTER_ADVANCED enable IP6_NF_MATCH_IPV6HEADER
We want IPV6HEADER matching for the non-advanced default netfilter
configuration, since it's part of the standard netfilter setup of at
least some distributions (eg Fedora).

Otherwise NETFILTER_ADVANCED loses much of its point, since even
non-advanced users would have to enable all the advanced options just to
get a working IPv6 netfilter setup.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-31 00:26:10 +11:00
YOSHIFUJI Hideaki
85040bcb46 [IPV6] ADDRLABEL: Fix double free on label deletion.
If an entry is being deleted because it has only one reference, 
we immediately delete it and blindly register the rcu handler for it,
This results in oops by double freeing that object.

This patch fixes it by consolidating the code paths for the deletion;
let its rcu handler delete the object if it has no more reference.

Bug was found by Mitsuru Chinen <mitch@linux.vnet.ibm.com>

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:46:02 -08:00
Denis V. Lunev
f206351a50 [NETNS]: Add namespace parameter to ip_route_output_key.
Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:07 -08:00
Pavel Emelyanov
81566e8322 [NETNS][FRAGS]: Make the pernet subsystem for fragments.
On namespace start we mainly prepare the ctl variables.

When the namespace is stopped we have to kill all the fragments that
point to this namespace.  The inet_frags_exit_net() handles it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:40 -08:00
Pavel Emelyanov
3140c25c82 [NETNS][FRAGS]: Make the LRU list per namespace.
The inet_frags.lru_list is used for evicting only, so we have
to make it per-namespace, to evict only those fragments, who's
namespace exceeded its high threshold, but not the whole hash.
Besides, this helps to avoid long loops  in evictor.

The spinlock is not per-namespace because it protects the
hash table as well, which is global.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:39 -08:00
Pavel Emelyanov
3b4bc4a2bf [NETNS][FRAGS]: Isolate the secret interval from namespaces.
Since we have one hashtable to lookup the fragment, having
different secret_interval-s for hash rebuild doesn't make
sense, so move this one to inet_frags.

The inet_frags_ctl becomes empty after this, so remove it.
The appropriate ctl table is kept read-only in namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:39 -08:00
Pavel Emelyanov
e31e0bdc7e [NETNS][FRAGS]: Make thresholds work in namespaces.
This is the same as with the timeout variable.

Currently, after exceeding the high threshold _all_
the fragments are evicted, but it will be fixed in
later patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:38 -08:00
Pavel Emelyanov
b2fd5321dd [NETNS][FRAGS]: Make the net.ipv4.ipfrag_timeout work in namespaces.
Move it to the netns_frags, adjust the usage and
make the appropriate ctl table writable.

Now fragment, that live in different namespaces can
live for different times.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:37 -08:00
Pavel Emelyanov
e4a2d5c2bc [NETNS][FRAGS]: Duplicate sysctl tables for new namespaces.
Each namespace has to have own tables to tune their
different parameters, so duplicate the tables and
register them.

All the tables in sub-namespaces are temporarily made
read-only.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:37 -08:00
Pavel Emelyanov
6ddc082223 [NETNS][FRAGS]: Make the mem counter per-namespace.
This is also simple, but introduces more changes, since
then mem counter is altered in more places.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:36 -08:00
Pavel Emelyanov
e5a2bb842c [NETNS][FRAGS]: Make the nqueues counter per-namespace.
This is simple - just move the variable from struct inet_frags
to struct netns_frags and adjust the usage appropriately.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:35 -08:00
Pavel Emelyanov
ac18e7509e [NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces.
Since fragment management code is consolidated, we cannot have the
pointer from inet_frag_queue to struct net, since we must know what
king of fragment this is.

So, I introduce the netns_frags structure. This one is currently
empty, but will be eventually filled with per-namespace
attributes. Each inet_frag_queue is tagged with this one.

The conntrack_reasm is not "netns-izated", so it has one static
netns_frags instance to keep working in init namespace.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:34 -08:00
Pavel Emelyanov
8d8354d2fb [NETNS][FRAGS]: Move ctl tables around.
This is a preparation for sysctl netns-ization.
Move the ctl tables to the files, where the tuning
variables reside. Plus make the helpers to register
the tables.

This will simplify the later patches and will keep
similar things closer to each other.

ipv4, ipv6 and conntrack_reasm are patched differently,
but the result is all the tables are in appropriate files.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:10:34 -08:00
YOSHIFUJI Hideaki
61cf46ad58 [IPV6] NDISC: Sparse: Use different variable name for local use.
Fix the following sparse warnings:
| net/ipv6/ndisc.c:1300:21: warning: symbol 'opt' shadows an earlier one
| net/ipv6/ndisc.c:1078:7: originally declared here

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:28 -08:00
YOSHIFUJI Hideaki
5d5619b40c [IPV6] ADDRCONF: Sparse: Make inet6_dump_addr() code paths more straight-forward.
Fix the following sparse warning:
| net/ipv6/addrconf.c:3384:2: warning: context imbalance in 'inet6_dump_addr' - different lock contexts for basic block

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:28 -08:00
YOSHIFUJI Hideaki
2334ecbdb2 [IPV6]: Sparse: Declare non-static ipv6_{route,icmp,frag}_sysctl_init() in header.
Fix the following sparse warnings:
| net/ipv6/route.c:2491:18: warning: symbol 'ipv6_route_sysctl_init' was not declared. Should it be static?
| net/ipv6/icmp.c:922:18: warning: symbol 'ipv6_icmp_sysctl_init' was not declared. Should it be static?
| net/ipv6/reassembly.c:628:6: warning: symbol 'ipv6_frag_sysctl_init' was not declared. Should it be static?

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:27 -08:00
YOSHIFUJI Hideaki
40fee36e11 [IPV6] ADDRLABEL: Sparse: Make several functions static.
Fix following sparse warnings:
| net/ipv6/addrlabel.c:172:25: warning: symbol 'ip6addrlbl_alloc' was not declared. Should it be static?
| net/ipv6/addrlabel.c:219:5: warning: symbol '__ip6addrlbl_add' was not declared. Should it be static?
| net/ipv6/addrlabel.c:260:5: warning: symbol 'ip6addrlbl_add' was not declared. Should it be static?
| net/ipv6/addrlabel.c:285:5: warning: symbol '__ip6addrlbl_del' was not declared. Should it be static?
| net/ipv6/addrlabel.c:311:5: warning: symbol 'ip6addrlbl_del' was not declared. Should it be static?

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:26 -08:00
YOSHIFUJI Hideaki
5e8b9df6e8 [IPV6] UDPLITE: Sparse: Declare non-static symbols in header.
Fix the following sparse warnings:
| net/ipv6/udplite.c:45:14: warning: symbol 'udplitev6_prot' was not declared. Should it be static?
| net/ipv6/udplite.c:80:12: warning: symbol 'udplitev6_init' was not declared. Should it be static?
| net/ipv6/udplite.c:99:6: warning: symbol 'udplitev6_exit' was not declared. Should it be static?

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:26 -08:00
YOSHIFUJI Hideaki
77d0d350e9 [IPV6] UDP,UDPLITE: Sparse: {__udp6_lib,udp,udplite}_err() are of void.
Fix following sparse warnings:
| net/ipv6/udp.c:262:2: warning: returning void-valued expression
| net/ipv6/udplite.c:29:2: warning: returning void-valued expression

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2008-01-28 15:10:25 -08:00
Stephen Hemminger
d20b3109e9 [IPV6]: addrconf sparse warnings
Get rid of a couple of sparse warnings in IPV6 addrconf code.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:37 -08:00
Denis V. Lunev
9e3a548781 [NETNS]: FIB rules API cleanup.
Remove struct net from fib_rules_register(unregister)/notify_change
paths and diet code size a bit.

add/remove: 0/0 grow/shrink: 10/12 up/down: 35/-100 (-65)
function                                     old     new   delta
notify_rule_change                           273     280      +7
trie_show_stats                              471     475      +4
fn_trie_delete                               473     477      +4
fib_rules_unregister                         144     148      +4
fib4_rule_compare                            119     123      +4
resize                                      2842    2845      +3
fn_trie_select_default                       515     518      +3
inet_sk_rebuild_header                       836     838      +2
fib_trie_seq_show                            764     766      +2
__devinet_sysctl_register                    276     278      +2
fn_trie_lookup                              1124    1123      -1
ip_fib_check_default                         133     131      -2
devinet_conf_sysctl                          223     221      -2
snmp_fold_field                              126     123      -3
fn_trie_insert                              2091    2086      -5
inet_create                                  876     870      -6
fib4_rules_init                              197     191      -6
fib_sync_down                                452     444      -8
inet_gso_send_check                          334     325      -9
fib_create_info                             3003    2991     -12
fib_nl_delrule                               568     553     -15
fib_nl_newrule                               883     852     -31

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:13 -08:00
Denis V. Lunev
0359238333 [FIB]: Add netns to fib_rules_ops.
The backward link from FIB rules operations to the network namespace
will allow to simplify the API a bit.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:13 -08:00
Denis V. Lunev
b7c6ba6eb1 [NETNS]: Consolidate kernel netlink socket destruction.
Create a specific helper for netlink kernel socket disposal. This just
let the code look better and provides a ground for proper disposal
inside a namespace.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:07 -08:00
Daniel Lezcano
7d460db953 [IPV6]: Fix ip6_frag ctl
Alexey Dobriyan reported an oops when unsharing the network
indefinitely inside a loop. This is because the ip6_frag is not per
namespace while the ctls are.

That happens at the fragment timer expiration:
inet_frag_secret_rebuild function is called and this one restarts the
timer using the value stored inside the sysctl field.

        "mod_timer(&f->secret_timer, now + f->ctl->secret_interval);"

When the network is unshared, ip6_frag.ctl is initialized with the new
sysctl instances, but ip6_frag has only one instance. A race in this
case will appear because f->ctl can be modified during the read access
in the timer callback.

Until the ip6_frag is not per namespace, I discard the assignation to
the ctl field of ip6_frags in ip6_frag_sysctl_init when the network
namespace is not the init net.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:08:04 -08:00
Daniel Lezcano
569d36452e [NETNS][DST] dst: pass the dst_ops as parameter to the gc functions
The garbage collection function receive the dst_ops structure as
parameter. This is useful for the next incoming patchset because it
will need the dst_ops (there will be several instances) and the
network namespace pointer (contained in the dst_ops).

The protocols which do not take care of the namespaces will not be
impacted by this change (expect for the function signature), they do
just ignore the parameter.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:46 -08:00
Patrick McHardy
c71e916708 [NETFILTER]: nf_conntrack: make print_conntrack function optional for l4protos
Allows to remove five empty implementations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:42 -08:00
Patrick McHardy
c56cc9c07b [NETFILTER]: nf_conntrack: remove print_conntrack function from l3protos
Its unused and unlikely to ever be used.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:41 -08:00
Denys Vlasenko
022748a935 [NETFILTER]: {ip,ip6}_tables: remove some inlines
This patch removes inlines except those which are used
by packet matching code and thus are performance-critical.

Before:

$ size */*/*/ip*tables*.o
   text    data     bss     dec     hex filename
   6402     500      16    6918    1b06 net/ipv4/netfilter/ip_tables.o
   7130     500      16    7646    1dde net/ipv6/netfilter/ip6_tables.o

After:

$ size */*/*/ip*tables*.o
   text    data     bss     dec     hex filename
   6307     500      16    6823    1aa7 net/ipv4/netfilter/ip_tables.o
   7010     500      16    7526    1d66 net/ipv6/netfilter/ip6_tables.o

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:29 -08:00
Jan Engelhardt
2ae15b64e6 [NETFILTER]: Update modules' descriptions
Updates the MODULE_DESCRIPTION() tags for all Netfilter modules,
actually describing what the module does and not just
"netfilter XYZ target".

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:26 -08:00
Patrick McHardy
8ce22fcab4 [NETFILTER]: Remove some EXPERIMENTAL dependencies
Most of the netfilter modules are not considered experimental anymore,
the only ones I want to keep marked as EXPERIMENTAL are:

- TCPOPTSTRIP target, which is brand new.

- SANE helper, which is quite new.

- CLUSTERIP target, which I believe hasn't had much testing despite
  being in the kernel for quite a long time.

- SCTP match and conntrack protocol, which are a mess and need to
  be reviewed and cleaned up before I would trust them.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:16 -08:00
Pavel Emelyanov
a308da1627 [NETNS][RAW]: Create the /proc/net/raw(6) in each namespace.
To do so, just register the proper subsystem and create files in
->init callbacks.

No other special per-namespace handling for raw sockets is required.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:07 -08:00
Pavel Emelyanov
e5ba31f11f [NETNS][RAW]: Eliminate explicit init_net references.
Happily, in all the rest places (->bind callbacks only), that require the
struct net, we have a socket, so get the net from it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:06 -08:00
Pavel Emelyanov
f51d599fbe [NETNS][RAW]: Make /proc/net/raw(6) show per-namespace socket list.
Pull the struct net pointer up to the showing functions
to filter the sockets depending on their namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:06 -08:00
Pavel Emelyanov
be185884b3 [NETNS][RAW]: Make ipv[46] raw sockets lookup namespaces aware.
This requires just to pass the appropriate struct net pointer
into __raw_v[46]_lookup and skip sockets that do not belong
to a needed namespace.

The proper net is get from skb->dev in all the cases.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:02:05 -08:00
Ilpo Järvinen
50eb431d6e [IPV6] route: kill some bloat
net/ipv6/route.c:
  ip6_pkt_prohibit_out | -130
  ip6_pkt_discard      | -261
  ip6_pkt_discard_out  | -130
  ip6_pkt_prohibit     | -261
 4 functions changed, 782 bytes removed, diff: -782

net/ipv6/route.c:
  ip6_pkt_drop | +300
 1 function changed, 300 bytes added, diff: +300

net/ipv6/route.o:
 5 functions changed, 300 bytes added, 782 bytes removed, diff: -482

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:49 -08:00
Daniel Lezcano
389f661224 [NETNS][IPV6]: inet6_addr - make ipv6_chk_home_addr namespace aware
Looks if the address is belonging to the network namespace, otherwise
discard the address for the check.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:46 -08:00
Daniel Lezcano
1cab3da6be [NETNS][IPV6]: inet6_addr - ipv6_get_ifaddr namespace aware
The inet6_addr_lst is browsed taking into account the network
namespace specified as parameter. If an address does not belong
to the specified namespace, it is ignored.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:45 -08:00
Daniel Lezcano
06bfe655e7 [NETNS][IPV6]: inet6_addr - ipv6_chk_same_addr namespace aware
This patch makes ipv6_chk_same_addr function to be aware of the
network namespace. The addresses not belonging to the network
namespace are discarded.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:45 -08:00
Daniel Lezcano
bfeade0870 [NETNS][IPV6]: inet6_addr - check ipv6 address per namespace
When a new address is added, we must check if the new address does not
already exists.  This patch makes this check to be aware of a network
namespace, so the check will look if the address already exists for
the specified network namespace. While the addresses are browsed, the
addresses which do not belong to the namespace are discarded.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:44 -08:00
Daniel Lezcano
3c40090a0f [NETNS][IPV6]: inet6_addr - isolate inet6 addresses from proc file
Make /proc/net/if_inet6 show only inet6 addresses belonging to the
namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:43 -08:00
Pavel Emelyanov
e186932b3d [NETNS]: Use the per-net ipv6_devconf(_all) in sysctl handlers
Actually the net->ipv6.devconf_all can be used in a few places,
but to keep the /proc/sys/net/ipv6/conf/ sysctls work consistently
in the namespace we should use the per-net devconf_all in the
sysctl "forwarding" handler.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:41 -08:00
Pavel Emelyanov
441fc2a239 [NETNS]: Use the per-net ipv6_devconf_dflt
All its users are in net/ipv6/addrconf.c's sysctl handlers.
Since they already have the struct net to get from, the
per-net ipv6_devconf_dflt can already be used.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:40 -08:00
Pavel Emelyanov
e0da5a480c [NETNS]: Create ipv6 devconf-s for namespaces
This is the core. Declare and register the pernet subsys for
addrconf. The init callback the will create the devconf-s.

The init_net will reuse the existing statically declared confs,
so that accessing them from inside the ipv6 code will still
work.

The register_pernet_subsys() is moved above the ipv6_add_dev()
call for loopback, because this function will need the
net->devconf_dflt pointer to be already set.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:40 -08:00
Pavel Emelyanov
bff16c2f99 [NETNS]: Make the ctl-tables per-namespace
This includes passing the net to __addrconf_sysctl_register
and saving this on the ctl_table->extra2 to be used in
handlers (those, needing it).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:39 -08:00
Pavel Emelyanov
9589731220 [NETNS]: Make the __addrconf_sysctl_register return an error
This error code will be needed to abort the namespace
creation if needed.

Probably, this is to be checked when a new device is
created (currently it is ignored).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:38 -08:00
Pavel Emelyanov
408c4768cd [NETNS]: Clean out the ipv6-related sysctls creation/destruction
The addrconf sysctls and neigh sysctls are registered and
unregistered always in pairs, so they can be joined into
one (well, two) functions, that accept the struct inet6_dev
and do all the job.

This also get rids of unneeded ifdefs inside the code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:38 -08:00
Denis V. Lunev
4d1169c1e7 [NETNS]: Add netns to nl_info structure.
nl_info is used to track the end-user destination of routing change
notification. This is a natural object to hold a namespace on. Place
it there and utilize the context in the appropriate places.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:29 -08:00
Eric W. Biederman
6b175b26c1 [NETNS]: Add netns parameter to inet_(dev_)add_type.
The patch extends the inet_addr_type and inet_dev_addr_type with the
network namespace pointer. That allows to access the different tables
relatively to the network namespace.

The modification of the signature function is reported in all the
callers of the inet_addr_type using the pointer to the well known
init_net.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:27 -08:00
Denis V. Lunev
868d13ac81 [NETNS]: Pass fib_rules_ops into default_pref method.
fib_rules_ops contains operations and the list of configured rules. ops will
become per/namespace soon, so we need them to be known in the default_pref
callback.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:22 -08:00
Denis V. Lunev
f8c26b8d58 [NETNS]: Add netns parameter to fib_rules_(un)register.
The patch extends the different fib rules API in order to pass the
network namespace pointer. That will allow to access the different
tables from a namespace relative object. As usual, the pointer to the
init_net variable is passed as parameter so we don't break the
network.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:21 -08:00
Daniel Lezcano
41a76906b3 [NETNS][IPV6]: Make icmpv6_time sysctl per namespace.
This patch moves the icmpv6_time sysctl to the network namespace
structure.

Because the ipv6 protocol is not yet per namespace, the variable is
accessed relatively to the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:20 -08:00
Daniel Lezcano
4990509f19 [NETNS][IPV6]: Make sysctls route per namespace.
All the sysctl concerning the routes are moved to the network
namespace structure. A helper function is called to initialize the
variables.

Because the ipv6 protocol is not yet per namespace, the variables are
accessed relatively from the network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:20 -08:00
Daniel Lezcano
7c76509d0d [NETNS][IPV6]: Make mld_max_msf readonly in other namespaces.
The mld_max_msf protects the system with a maximum allowed multicast
source filters. Making this variable per namespace can be potentially
an problem if someone inside a namespace set it to a big value, that
will impact the whole system including other namespaces.

I don't see any benefits to have it per namespace for now, so in order
to keep a directory entry in a newly created namespace, I make it
read-only when we are not in the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:19 -08:00
Daniel Lezcano
e71e0349eb [NETNS][IPV6]: Make ip6_frags per namespace.
The ip6_frags is moved to the network namespace structure.  Because
there can be multiple instances of the network namespaces, and the
ip6_frags is no longer a global static variable, a helper function has
been added to facilitate the initialization of the variables.

Until the ipv6 protocol is not per namespace, the variables are
accessed relatively from the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:18 -08:00
Daniel Lezcano
99bc9c4e45 [NETNS][IPV6]: Make bindv6only sysctl per namespace.
This patch moves the bindv6only sysctl to the network namespace
structure. Until the ipv6 protocol is not per namespace, the sysctl
variable is always from the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:18 -08:00
Daniel Lezcano
760f2d0186 [NETNS][IPV6]: Make multiple instance of sysctl tables.
Each network namespace wants its own set of sysctl value, eg. we
should not be able from a namespace to set a sysctl value for another
namespace , especially for the initial network namespace.

This patch duplicates the sysctl table when we register a new network
namespace for ipv6. The duplicated table are postfixed with the
"template" word to notify the developper the table is cloned.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:17 -08:00
Daniel Lezcano
89918fc270 [NETNS][IPV6]: Make the ipv6 sysctl to be a netns subsystem.
The initialization of the sysctl for the ipv6 protocol is changed to a
network namespace subsystem. That means when a new network namespace
is created the initialization function for the sysctl will be called.

That do not change the behavior of the sysctl in case of the kernel
with the network namespace disabled.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:16 -08:00
Daniel Lezcano
81c1c17804 [NETNS][IPV6]: Make a subsystem for af_inet6.
This patch add a network namespace subsystem for the af_inet6 module.
It does nothing right now, but one of its purpose is to receive the
different variables for sysctl in order to initialize them.

When the sysctl variable will be moved to the network namespace
structure, they will be no longer initialized as global static
variables, so we must find a place to initialize them. Because the
sysctl can be disabled, it has no sense to store them in the
sysctl_net_ipv6 file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:15 -08:00
Daniel Lezcano
291480c09a [NETNS][IPV6]: Make ipv6_sysctl_register to return a value.
This patch makes the function ipv6_sysctl_register to return a
value. The af_inet6 init function is now able to handle an error and
catch it from the initialization of the sysctl.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:14 -08:00
Pavel Emelyanov
3d7cc2ba62 [NETFILTER]: Switch to using ctl_paths in nf_queue and conntrack modules
This includes the most simple cases for netfilter.

The first part is tne queue modules for ipv4 and ipv6,
on which the net/ipv4/ and net/ipv6/ paths are reused
from the appropriate ipv4 and ipv6 code.

The conntrack module is also patched, but this hunk is
very small and simple.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:01:10 -08:00
Eric Dumazet
65f7651788 [NET]: prot_inuse cleanups and optimizations
1) Cleanups (all functions are prefixed by sock_prot_inuse)

sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_inuse()       -> sock_prot_inuse_get()

New functions :

sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use.

2) if CONFIG_PROC_FS=n, we can zap 'inuse' member from "struct proto",
since nobody wants to read the inuse value.

This saves 1372 bytes on i386/SMP and some cpu cycles.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:36 -08:00
Eric Dumazet
9a429c4983 [NET]: Add some acquires/releases sparse annotations.
Add __acquires() and __releases() annotations to suppress some sparse
warnings.

example of warnings :

net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong
count at exit
net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' -
unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:31 -08:00
Hideo Aoki
95766fff6b [UDP]: Add memory accounting.
Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:19 -08:00
Gui Jianfeng
a06b494b61 [IPV6]: Remove useless code from fib6_del_route().
There are useless codes in fib6_del_route(). The following patch has
been tested, every thing looks fine, as usual.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:17 -08:00
Herbert Xu
9dd3245a2a [IPSEC]: Move all calls to xfrm_audit_state_icvfail to xfrm_input
Let's nip the code duplication in the bud :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:10 -08:00
Herbert Xu
0883ae0e55 [IPSEC]: Fix transport-mode async resume on intput without netfilter
When netfilter is off the transport-mode async resumption doesn't work
because we don't push back the IP header.  This patch fixes that by
moving most of the code outside of ifdef NETFILTER since the only part
that's not common is the short-circuit in the protocol handler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:10 -08:00
Eric W. Biederman
426b5303eb [NETNS]: Modify the neighbour table code so it handles multiple network namespaces
I'm actually surprised at how much was involved.  At first glance it
appears that the neighbour table data structures are already split by
network device so all that should be needed is to modify the user
interface commands to filter the set of neighbours by the network
namespace of their devices.

However a couple things turned up while I was reading through the
code.  The proxy neighbour table allows entries with no network
device, and the neighbour parms are per network device (except for the
defaults) so they now need a per network namespace default.

So I updated the two structures (which surprised me) with their very
own network namespace parameter.  Updated the relevant lookup and
destroy routines with a network namespace parameter and modified the
code that interacts with users to filter out neighbour table entries
for devices of other namespaces.

I'm a little concerned that we can modify and display the global table
configuration and from all network namespaces.  But this appears good
enough for now.

I keep thinking modifying the neighbour table to have per network
namespace instances of each table type would should be cleaner.  The
hash table is already dynamically sized so there are it is not a
limiter.  The default parameter would be straight forward to take care
of.  However when I look at the how the network table is built and
used I still find some assumptions that there is only a single
neighbour table for each type of table in the kernel.  The netlink
operations, neigh_seq_start, the non-core network users that call
neigh_lookup.  So while it might be doable it would require more
refactoring than my current approach of just doing a little extra
filtering in the code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:03 -08:00
Paul Moore
afeb14b490 [XFRM]: RFC4303 compliant auditing
This patch adds a number of new IPsec audit events to meet the auditing
requirements of RFC4303.  This includes audit hooks for the following events:

 * Could not find a valid SA [sections 2.1, 3.4.2]
   . xfrm_audit_state_notfound()
   . xfrm_audit_state_notfound_simple()

 * Sequence number overflow [section 3.3.3]
   . xfrm_audit_state_replay_overflow()

 * Replayed packet [section 3.4.3]
   . xfrm_audit_state_replay()

 * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
   . xfrm_audit_state_icvfail()

While RFC4304 deals only with ESP most of the changes in this patch apply to
IPsec in general, i.e. both AH and ESP.  The one case, integrity check
failure, where ESP specific code had to be modified the same was done to the
AH code for the sake of consistency.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:01 -08:00
YOSHIFUJI Hideaki
9cb5734e5b [TCP]: Convert several length variable to unsigned.
Several length variables cannot be negative, so convert int to
unsigned int.  This also allows us to do sane shift operations
on those variables.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:56 -08:00
Masahide NAKAMURA
0aa647746e [XFRM]: Support to increment packet dropping statistics.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:39 -08:00
Masahide NAKAMURA
9473e1f631 [XFRM] MIPv6: Fix to input RO state correctly.
Disable spin_lock during xfrm_type.input() function.
Follow design as IPsec inbound does.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:37 -08:00
Masahide NAKAMURA
a1b051405b [XFRM] IPv6: Fix dst/routing check at transformation.
IPv6 specific thing is wrongly removed from transformation at net-2.6.25.
This patch recovers it with current design.

o Update "path" of xfrm_dst since IPv6 transformation should
  care about routing changes. It is required by MIPv6 and
  off-link destined IPsec.
o Rename nfheader_len which is for non-fragment transformation used by
  MIPv6 to rt6i_nfheader_len as IPv6 name space.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:36 -08:00
Herbert Xu
195ad6a3ac [IPSEC]: Rename tunnel-mode functions to avoid collisions with tunnels
It appears that I've managed to create two different functions both
called xfrm6_tunnel_output.  This is because we have the plain tunnel
encapsulation named xfrmX_tunnel as well as the tunnel-mode encapsulation
which lives in the files xfrmX_mode_tunnel.c.

This patch renames functions from the latter to use the xfrmX_mode_tunnel
prefix to avoid name-space conflicts.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:18 -08:00
Patrick McHardy
33b8e77605 [NETFILTER]: Add CONFIG_NETFILTER_ADVANCED option
The NETFILTER_ADVANCED option hides lots of the rather obscure netfilter
options when disabled and provides defaults (M) that should allow to
run a distribution firewall without further thinking.

Defaults to 'y' to avoid breaking current configurations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:12 -08:00
Jan Engelhardt
e79ec50b95 [NETFILTER]: Parenthesize macro parameters
Parenthesize macro parameters.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:08 -08:00
Patrick McHardy
1e796fda00 [NETFILTER]: constify nf_afinfo
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:05 -08:00
Patrick McHardy
7b2f9631e7 [NETFILTER]: nf_log: constify struct nf_logger and nf_log_packet loginfo arg
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:59 -08:00
Patrick McHardy
f01ffbd6e7 [NETFILTER]: nf_log: move logging stuff to seperate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:58 -08:00
Patrick McHardy
77236b6e33 [NETFILTER]: ctnetlink: use netlink attribute helpers
Use NLA_PUT_BE32, nla_get_be32() etc.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:54 -08:00
Patrick McHardy
da4d0f6b3d [NETFILTER]: ip6_tables: use raw_smp_processor_id() in do_add_counters()
Use raw_smp_processor_id() in do_add_counters() as in ip_tables.c.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:41 -08:00
Patrick McHardy
b5dd674b2a [NETFILTER]: ip6_tables: fix stack leagage
Fix leakage of local variable on stack. This already got fixed in
ip_tables silently by the compat patches.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:40 -08:00
Patrick McHardy
c9d8fe1317 [NETFILTER]: {ip,ip6}_tables: fix format strings
Use %zu for sizeof() and remove casts.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:39 -08:00
Patrick McHardy
9c54795950 [NETFILTER]: {ip,ip6}_tables: reformat to eliminate differences
Reformat ip_tables.c and ip6_tables.c in order to eliminate non-functional
differences and minimize diff output.

This allows to get a view of the real differences using:

sed -e 's/IP6T/IPT/g' \
    -e 's/IP6/IP/g' \
    -e 's/INET6/INET/g' \
    -e 's/ip6t/ipt/g' \
    -e 's/ip6/ip/g' \
    -e 's/ipv6/ip/g' \
    -e 's/icmp6/icmp/g' \
    net/ipv6/netfilter/ip6_tables.c | \
    diff -wup /dev/stdin net/ipv4/netfilter/ip_tables.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:39 -08:00
Patrick McHardy
3bc3fe5eed [NETFILTER]: ip6_tables: add compat support
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:36 -08:00
Patrick McHardy
d924357c50 [NETFILTER]: ip6_tables: resync get_entries() with ip_tables
Resync get_entries() with ip_tables.c by moving the checks from the
setsockopt handler to the function itself.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:35 -08:00
Patrick McHardy
433665c9d1 [NETFILTER]: ip6_tables: move IP6T_SO_GET_INFO handling to seperate function
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:35 -08:00
Patrick McHardy
ed1a6f5e77 [NETFILTER]: ip6_tables: move counter allocation to seperate function
More resyncing with ip_tables.c as preparation for compat support.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:34 -08:00
Patrick McHardy
3b84e92b0d [NETFILTER]: ip6_tables: use vmalloc_node()
Consistently use vmalloc_node for all counter allocations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:33 -08:00
Patrick McHardy
f173c8a1f2 [NETFILTER]: ip6_tables: move entry, match and target checks to seperate functions
Resync with ip_tables.c as preparation for compat support.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:32 -08:00
Patrick McHardy
72f36ec14f [NETFILTER]: ip6_tables: kill a few useless defines/forward declarations
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:32 -08:00
Herbert Xu
9055e051b8 [UDP]: Move udp_stats_in6 into net/ipv4/udp.c
Now that external users may increment the counters directly, we need
to ensure that udp_stats_in6 is always available.  Otherwise we'd
either have to requrie the external users to be built as modules or
ipv6 to be built-in.

This isn't too bad because udp_stats_in6 is just a pair of pointers
plus an EXPORT, e.g., just 40 (16 + 24) bytes on x86-64.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:06 -08:00
Michal Schmidt
8a4a50f98b [IPV6] sit: Rebinding of SIT tunnels to other interfaces
This is similar to the change already done for IPIP tunnels.

Once created, a SIT tunnel can't be bound to another device.
To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode sit remote 10.0.0.1 dev eth0
# try to change the bounding device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: ipv6/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:56 -08:00
Denis V. Lunev
528c4ceb42 [IPV6]: Always pass a valid nl_info to inet6_rt_notify.
This makes the code in the inet6_rt_notify more straightforward and provides
groud for namespace passing.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:55 -08:00
Daniel Lezcano
09f7709f49 [IPV6]: fix section mismatch warnings
Removed useless and buggy __exit section in the different
ipv6 subsystems. Otherwise they will be called inside an
init section during rollbacking in case of an error in the
protocol initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:46 -08:00
Herbert Xu
aebcf82c1f [IPSEC]: Do not let packets pass when ICMP flag is off
This fixes a logical error in ICMP policy checks which lets
packets through if the state ICMP flag is off.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:43 -08:00
Herbert Xu
bb72845e69 [IPSEC]: Make callers of xfrm_lookup to use XFRM_LOOKUP_WAIT
This patch converts all callers of xfrm_lookup that used an
explicit value of 1 to indiciate blocking to use the new flag
XFRM_LOOKUP_WAIT.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:42 -08:00
Herbert Xu
7233b9f33e [IPSEC]: Fix reversed ICMP6 policy check
The policy check I added for ICMP on IPv6 is reversed.  This
patch fixes that.

It also adds an skb->sp check so that unprotected packets that
fail the policy check do not crash the machine.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:41 -08:00
Herbert Xu
8b7817f3a9 [IPSEC]: Add ICMP host relookup support
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch implements this
for ICMP traffic that originates from or terminates on localhost.

This is activated on outbound with the new policy flag XFRM_POLICY_ICMP,
and on inbound by the new state flag XFRM_STATE_ICMP.

On inbound the policy check is now performed by the ICMP protocol so
that it can repeat the policy check where necessary.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:23 -08:00
Herbert Xu
d5422efe68 [IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:22 -08:00
Daniel Lezcano
7f4e4868f3 [IPV6]: make the protocol initialization to return an error code
This patchset makes the different protocols to return an error code, so
the af_inet6 module can check the initialization was correct or not.

The raw6 was taken into account to be consistent with the rest of the
protocols, but the registration is at the same place.
Because the raw6 has its own init function, the proto and the ops structure
can be moved inside the raw6.c file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:13 -08:00
Daniel Lezcano
87c3efbfdd [IPV6]: make inet6_register_protosw to return an error code
This patch makes the inet6_register_protosw to return an error code.
The different protocols can be aware the registration was successful or
not and can pass the error to the initial caller, af_inet6.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:12 -08:00
Daniel Lezcano
853cbbaaa4 [IPV6]: make frag to return an error at initialization
This patch makes the frag_init to return an error code, so the af_inet6
module can handle the error.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:11 -08:00
Daniel Lezcano
248b238dc9 [IPV6]: make extended headers to return an error at initialization
This patch factorize the code for the differents init functions for rthdr,
nodata, destopt in a single function exthdrs_init.
This function returns an error so the af_inet6 module can check correctly
the initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:10 -08:00
Daniel Lezcano
0a3e78ac2c [IPV6]: make flowlabel to return an error
This patch makes the flowlab subsystem to return an error code and makes
some cleanup with procfs ifdefs.
The af_inet6 will use the flowlabel init return code to check the initialization
was correct.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:10 -08:00
YOSHIFUJI Hideaki
c69bce20dd [NET]: Remove unused "mibalign" argument for snmp_mib_init().
With fixes from Arnaldo Carvalho de Melo.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:02 -08:00
Daniel Lezcano
7e5449c215 [IPV6]: route6 remove ifdef for fib_rules
The patch defines the usual static inline functions when the code is
disabled for fib6_rules. That's allow to remove some ifdef in route.c
file and make the code a little more clear.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:59 -08:00
Daniel Lezcano
c35b7e72cd [IPV6]: remove ifdef in route6 for xfrm6
The following patch create the usual static inline functions to disable
the xfrm6_init and xfrm6_fini function when XFRM is off.
That's allow to remove some ifdef and make the code a little more clear.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:59 -08:00
Daniel Lezcano
75314fb383 [IPV6]: create route6 proc init-fini functions
Make the proc creation/destruction to be a separate function. That
allows to remove the #ifdef CONFIG_PROC_FS in the init/fini function
and make them more readable.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:58 -08:00
Daniel Lezcano
f845ab6b7d [IPV6] route6/fib6: Don't panic a kmem_cache_create.
If the kmem_cache_creation fails, the kernel will panic. It is
acceptable if the system is booting, but if the ipv6 protocol is
compiled as a module and it is loaded after the system has booted, do
we want to panic instead of just failing to initialize the protocol ?

The init function is now returning an error and this one is checked
for protocol initialization. So the ipv6 protocol will safely fails.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:48 -08:00
Daniel Lezcano
e2fddf5e96 [IPV6]: Make af_inet6 to check ip6_route_init return value.
The af_inet6 initialization function does not check the return code of
the route initilization, so if something goes wrong, the protocol
initialization will continue anyway.  This patch takes into account
the modification made in the different route's initialization
subroutines to check the return value and to make the protocol
initialization to fail.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:47 -08:00
Daniel Lezcano
433d49c3bb [IPV6]: Make ip6_route_init to return an error code.
The route initialization function does not return any value to notify
if the initialization is successful or not. This patch checks all
calls made for the initilization in order to return a value for the
caller.

Unfortunately, proc_net_fops_create will return a NULL pointer if
CONFIG_PROC_FS is off, so we can not check the return code without an
ifdef CONFIG_PROC_FS block in the ip6_route_init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:47 -08:00
Daniel Lezcano
9eb87f3f7e [IPV6]: Make fib6_rules_init to return an error code.
When the fib_rules initialization finished, no return code is provided
so there is no way to know, for the caller, if the initialization has
been successful or has failed. This patch fix that.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:46 -08:00
Daniel Lezcano
0013cabab3 [IPV6]: Make xfrm6_init to return an error code.
The xfrm initialization function does not return any error code, so if
there is an error, the caller can not be advise of that.  This patch
checks the return code of the different called functions in order to
return a successful or failed initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:45 -08:00
Daniel Lezcano
d63bddbe90 [IPV6]: Make fib6_init to return an error code.
If there is an error in the initialization function, nothing is
followed up to the caller. So I add a return value to be set for the
init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:45 -08:00
Denis V. Lunev
5a3e55d68e [NET]: Multiple namespaces in the all dst_ifdown routines.
Move dst entries to a namespace loopback to catch refcounting leaks.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:44 -08:00
Herbert Xu
a59322be07 [UDP]: Only increment counter on first peek/recv
The previous move of the the UDP inDatagrams counter caused each
peek of the same packet to be counted separately.  This may be
undesirable.

This patch fixes this by adding a bit to sk_buff to record whether
this packet has already been seen through skb_recv_datagram.  We
then only increment the counter when the packet is seen for the
first time.

The only dodgy part is the fact that skb_recv_datagram doesn't have
a good way of returning this new bit of information.  So I've added
a new function __skb_recv_datagram that does return this and made
skb_recv_datagram a wrapper around it.

The plan is to eventually replace all uses of skb_recv_datagram with
this new function at which time it can be renamed its proper name.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:34 -08:00
Herbert Xu
1781f7f580 [UDP]: Restore missing inDatagrams increments
The previous move of the the UDP inDatagrams counter caused the
counting of encapsulated packets, SUNRPC data (as opposed to call)
packets and RXRPC packets to go missing.

This patch restores all of these.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:33 -08:00
Herbert Xu
27ab256864 [UDP]: Avoid repeated counting of checksum errors due to peeking
Currently it is possible for two processes to peek on the same socket
and end up incrementing the error counter twice for the same packet.

This patch fixes it by making skb_kill_datagram return whether it
succeeded in unlinking the packet and only incrementing the counter
if it did.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:32 -08:00
Pavel Emelyanov
c8fecf2242 [IPV6]: Eliminate difference in actions of sysctl and proc handler for conf.all.forwarding
The only difference in this case is that updating all.forwarding
causes the update in default.forwarding when done via proc, but
not via the system call.

Besides, this consolidates a good portion of code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:32 -08:00
Pavel Emelyanov
4d43b78ac2 [IPV6]: Use sysctl paths to register ipv6 sysctl tables
I have already done this for core, ipv4 and tr tables, so repeat this
for the ipv6 ones.

This makes the ipv6.ko smaller and creates the ground needed for net
namespaces support in ipv6.ko ssctls.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:30 -08:00
Pavel Emelyanov
4a61b586cd [IPV6]: Make the ipv6/sysctl_net_ipv6.c compilation cleaner
Since this file is entirely enclosed with the
#ifdef CONFIG_SYSCTL/#endif pair, it's OK to move this
CONFIG_ into a Makefile.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:29 -08:00
Patrick McHardy
4b3d15ef4a [NETFILTER]: {nfnetlink,ip,ip6}_queue: kill issue_verdict
Now that issue_verdict doesn't need to free the queue entries anymore,
all it does is disable local BHs and call nf_reinject. Move the BH
disabling to the okfn invocation in nf_reinject and kill the
issue_verdict functions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:15 -08:00
Patrick McHardy
02f014d888 [NETFILTER]: nf_queue: move list_head/skb/id to struct nf_info
Move common fields for queue management to struct nf_info and rename it
to struct nf_queue_entry. The avoids one allocation/free per packet and
simplifies the code a bit.

Alternatively we could add some private room at the tail, but since
all current users use identical structs this seems easier.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:14 -08:00
Patrick McHardy
7a6c6653b3 [NETFILTER]: ip6_queue: resync dev-index based flushing
Resync dev_cmp to take bridge devices into account.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:14 -08:00
Patrick McHardy
171b7fc4fc [NETFILTER]: ip6_queue: deobfuscate entry lookups
A queue entry lookup currently looks like this:

ipq_find_dequeue_entry -> __ipq_find_dequeue_entry ->
	__ipq_find_entry -> cmpfn -> id_cmp

Use simple open-coded list walking and kill the cmpfn for
ipq_find_dequeue_entry. Instead add it to ipq_flush (after
similar cleanups) and use ipq_flush for both complete flushes
and flushing entries related to a device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:13 -08:00
Patrick McHardy
0ac41e8146 [NETFILTER]: {nf_netlink,ip,ip6}_queue: use list_for_each_entry
Use list_add_tail/list_for_each_entry instead of list_add and
list_for_each_prev as a preparation for switching to RCU.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:11 -08:00
Patrick McHardy
c01cd429fc [NETFILTER]: nf_queue: move queueing related functions/struct to seperate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy
f9d8928f83 [NETFILTER]: nf_queue: remove unused data pointer
Remove the data pointer from struct nf_queue_handler. It has never been used
and is useless for the only handler that really matters, nfnetlink_queue,
since the handler is shared between all instances.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy
e3ac529815 [NETFILTER]: nf_queue: make queue_handler const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:09 -08:00
Patrick McHardy
1999414a4e [NETFILTER]: Mark hooks __read_mostly
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:07 -08:00
Patrick McHardy
1841a4c7ae [NETFILTER]: nf_ct_h323: remove ipv6 module dependency
nf_conntrack_h323 needs ip6_route_output for the call forwarding filter.
Add a ->route function to nf_afinfo and use that to avoid pulling in the
ipv6 module.

Fix the #ifdef for the IPv6 code while I'm at it - the IPv6 support is
only needed when IPv6 conntrack is enabled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:05 -08:00
Maciej Soltysiak
17dfc93f6d [NETFILTER]: {ip,ip6}t_LOG: log GID
Log GID in addition to UID

Signed-off-by: Maciej Soltysiak <maciej.soltysiak@ae.poznan.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:03 -08:00
Jan Engelhardt
4c37799ccf [NETFILTER]: Use lowercase names for matches in Kconfig
Unify netfilter match kconfig descriptions

Consistently use lowercase for matches in kconfig one-line
descriptions and name the match module.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:57 -08:00
Jan Engelhardt
0265ab44ba [NETFILTER]: merge ipt_owner/ip6t_owner in xt_owner
xt_owner merges ipt_owner and ip6t_owner, and adds a flag to match
on socket (non-)existence.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:55 -08:00
Patrick McHardy
9e67d5a739 [NETFILTER]: x_tables: remove obsolete overflow check
We're not multiplying the size with the number of CPUs anymore, so the
check is obsolete.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:54 -08:00
Eric Dumazet
259d4e41f3 [NETFILTER]: x_tables: struct xt_table_info diet
Instead of using a big array of NR_CPUS entries, we can compute the size
needed at runtime, using nr_cpu_ids

This should save some ram (especially on David's machines where NR_CPUS=4096 :
32 KB can be saved per table, and 64KB for dynamically allocated ones (because
of slab/slub alignements) )

In particular, the 'bootstrap' tables are not any more static (in data
section) but on stack as their size is now very small.

This also should reduce the size used on stack in compat functions
(get_info() declares an automatic variable, that could be bigger than kernel
stack size for big NR_CPUS)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:54 -08:00
Jan Engelhardt
d3c5ee6d54 [NETFILTER]: x_tables: consistent and unique symbol names
Give all Netfilter modules consistent and unique symbol names.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:53 -08:00
Herbert Xu
2fcb45b6b8 [IPSEC]: Use the correct family for input state lookup
When merging the input paths of IPsec I accidentally left a hard-coded
AF_INET for the state lookup call.  This broke IPv6 obviously.  This
patch fixes by getting the input callers to specify the family through
skb->cb.

Credit goes to Kazunori Miyazawa for diagnosing this and providing an
initial patch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:49 -08:00
Wang Chen
b2bf1e2659 [UDP]: Clean up for IS_UDPLITE macro
Since we have macro IS_UDPLITE, we can use it.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:48 -08:00
Wang Chen
cb75994ec3 [UDP]: Defer InDataGrams increment until recvmsg() does checksum
Thanks dave, herbert, gerrit, andi and other people for your
discussion about this problem.

UdpInDatagrams can be confusing because it counts packets that
might be dropped later.
Move UdpInDatagrams into recvmsg() as allowed by the RFC.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:47 -08:00
Pavel Emelyanov
1dab62226d [IPV6]: Use ctl paths to register addrconf sysctls
This looks very much like the patch for ipv4's devinet.

This is also intended to help us with the net namespaces
and saves the ipv6.ko size by ~320 bytes.

The difference from the first version is just the patch
offsets, that changed due to changes in the patch #2.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:38 -08:00
Pavel Emelyanov
f52295a9c5 [IPV6]: Unify and cleanup calls to addrconf_sysctl_register
Currently this call is (ab)used similar to devinet one - it
registers sysctls for devices and for the "default" confs, while
the "all" sysctls are registered separately. But unlike its
devinet brother, the passed inet6_device is needed.

The fix is to make a __addrconf_sysctl_register(), which registers
sysctls for all "devices" we need, including "default" and "all" :)

The original addrconf_sysctl_register() calls the introduced
function, passing the inet6_device, device name and ifindex (to
be used as procname and ctl_name) into it.

Thanks to Herbert again for pointing out, that we can shrink the
argument list to 1 :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:38 -08:00
Pavel Emelyanov
f68635e627 [IPV6]: Cleanup the addconf_sysctl_register
This only includes fixing the space-indented lines and
removing one unneeded else after the goto.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:26 -08:00
Fred L. Templin
c7dc89c0ac [IPV6]: Add RFC4214 support
This patch includes support for the Intra-Site Automatic Tunnel
Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
module, and is configured using extensions to the "iproute2"
utility. The diffs are specific to the Linux 2.6.24-rc2 kernel
distribution.

This version includes the diff for ./include/linux/if.h which was
missing in the v2.4 submission and is needed to make the
patch compile. The patch has been installed, compiled and
tested in a clean 2.6.24-rc2 kernel build area.

Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:09 -08:00
Pavel Emelyanov
f126734735 [IPV6]: Correct the comment concerning inetsw6 table
It seems that net/ipv6/af_inet6.c was copied from net/ipv4/af_inet.c,
but one comment was not fixed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:49 -08:00
Pavel Emelyanov
42a73808ed [RAW]: Consolidate proc interface.
Both ipv6/raw.c and ipv4/raw.c use the seq files to walk
through the raw sockets hash and show them.

The "walking" code is rather huge, but is identical in both
cases. The difference is the hash table to walk over and
the protocol family to check (this was not in the first
virsion of the patch, which was noticed by YOSHIFUJI)

Make the ->open store the needed hash table and the family
on the allocated raw_iter_state and make the start/next/stop
callbacks work with it.

This removes most of the code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:32 -08:00
Pavel Emelyanov
ab70768ec7 [RAW]: Consolidate proto->unhash callback
Same as the ->hash one, this is easily consolidated.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:31 -08:00
Pavel Emelyanov
65b4c50b47 [RAW]: Consolidate proto->hash callback
Having the raw_hashinfo it's easy to consolidate the
raw[46]_hash functions.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:31 -08:00
Pavel Emelyanov
b673e4dfc8 [RAW]: Introduce raw_hashinfo structure
The ipv4/raw.c and ipv6/raw.c contain many common code (most
of which is proc interface) which can be consolidated.

Most of the places to consolidate deal with the raw sockets
hashtable, so introduce a struct raw_hashinfo which describes
the raw sockets hash.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:30 -08:00
Pavel Emelyanov
69d6da0b0f [IPv6] RAW: Compact the API for the kernel
Same as in the previous patch for ipv4, compact the
API and hide hash table and rwlock inside the raw.c
file.

Plus fix some "bad" places from checkpatch.pl point
of view (assignments inside if()).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:29 -08:00
Denis V. Lunev
97c53cacf0 [NET]: Make rtnetlink infrastructure network namespace aware (v3)
After this patch none of the netlink callback support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.

Changes from v2:
- IPv6 addrlabel processing

Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:25 -08:00
Denis V. Lunev
b854272b3c [NET]: Modify all rtnetlink methods to only work in the initial namespace (v2)
Before I can enable rtnetlink to work in all network namespaces I need
to be certain that something won't break.  So this patch deliberately
disables all of the rtnletlink methods in everything except the
initial network namespace.  After the methods have been audited this
extra check can be disabled.

Changes from v1:
- added IPv6 addrlabel protection

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-01-28 14:54:24 -08:00
YOSHIFUJI Hideaki
2a8cc6c890 [IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.
Policy table is implemented as an RCU linear list since we do not expect
large list nor frequent updates.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:58 -08:00
YOSHIFUJI Hideaki
303065a854 [IPV6] ADDRCONF: Allow address selection policy with ifindex.
This patch allows ifindex to be a key for address selection policy table.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:57 -08:00
YOSHIFUJI Hideaki
c1ee656ccb [IPV6] ADDRCONF: Rename ipv6_saddr_label() to ipv6_addr_label().
This patch renames ipv6_saddr_label() to ipv6_addr_label() because
address label is used for both of source address and destination
address.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:56 -08:00
David S. Miller
294b4baf29 [IPSEC]: Kill afinfo->nf_post_routing
After changeset:

	[NETFILTER]: Introduce NF_INET_ hook values

It always evaluates to NF_INET_POST_ROUTING.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:55 -08:00
Patrick McHardy
6e23ae2a48 [NETFILTER]: Introduce NF_INET_ hook values
The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:55 -08:00
Herbert Xu
1bf06cd2e3 [IPSEC]: Add async resume support on input
This patch adds support for async resumptions on input.  To do so, the
transform would return -EINPROGRESS and subsequently invoke the
function xfrm_input_resume to resume processing.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:54 -08:00
Herbert Xu
60d5fcfb19 [IPSEC]: Remove nhoff from xfrm_input
The nhoff field isn't actually necessary in xfrm_input.  For tunnel
mode transforms we now throw away the output IP header so it makes no
sense to fill in the nexthdr field.  For transport mode we can now let
the function transport_finish do the setting and it knows where the
nexthdr field is.

The only other thing that needs the nexthdr field to be set is the
header extraction code.  However, we can simply move the protocol
extraction out of the generic header extraction.

We want to minimise the amount of info we have to carry around between
transforms as this simplifies the resumption process for async crypto.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:53 -08:00
Herbert Xu
d26f398400 [IPSEC]: Make x->lastused an unsigned long
Currently x->lastused is u64 which means that it cannot be
read/written atomically on all architectures.  David Miller observed
that the value stored in it is only an unsigned long which is always
atomic.

So based on his suggestion this patch changes the internal
representation from u64 to unsigned long while the user-interface
still refers to it as u64.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:52 -08:00
Herbert Xu
0ebea8ef35 [IPSEC]: Move state lock into x->type->input
This patch releases the lock on the state before calling
x->type->input.  It also adds the lock to the spots where they're
currently needed.

Most of those places (all except mip6) are expected to disappear with
async crypto.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:52 -08:00
Herbert Xu
668dc8af31 [IPSEC]: Move integrity stat collection into xfrm_input
Similar to the moving out of the replay processing on the output, this
patch moves the integrity stat collectin from x->type->input into
xfrm_input.

This would eventually allow transforms such as AH/ESP to be lockless.

The error value EBADMSG (currently unused in the crypto layer) is used
to indicate a failed integrity check.  In future this error can be
directly returned by the crypto layer once we switch to aead
algorithms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:51 -08:00
Herbert Xu
716062fd4c [IPSEC]: Merge most of the input path
As part of the work on asynchronous cryptographic operations, we need
to be able to resume from the spot where they occur.  As such, it
helps if we isolate them to one spot.

This patch moves most of the remaining family-specific processing into
the common input code.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:50 -08:00
Herbert Xu
862b82c6f9 [IPSEC]: Merge most of the output path
As part of the work on asynchrnous cryptographic operations, we need
to be able to resume from the spot where they occur.  As such, it
helps if we isolate them to one spot.

This patch moves most of the remaining family-specific processing into
the common output code.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:48 -08:00
Herbert Xu
ef76bc23ef [IPV6]: Add ip6_local_out
Most callers of the LOCAL_OUT chain will set the IP packet length
before doing so.  They also share the same output function dst_output.

This patch creates a new function called ip6_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:47 -08:00
Herbert Xu
227620e295 [IPSEC]: Separate inner/outer mode processing on input
With inter-family transforms the inner mode differs from the outer
mode.  Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.

This patch separates the two parts on the input path so that each
function deals with one family only.

In particular, the functions xfrm4_extract_inut/xfrm6_extract_inut
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb.  This is then used by the inner mode
input functions to modify the inner IP header.  In this way the input
function no longer has to know about the outer address family.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:46 -08:00
Herbert Xu
36cf9acf93 [IPSEC]: Separate inner/outer mode processing on output
With inter-family transforms the inner mode differs from the outer
mode.  Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.

This patch separates the two parts on the output path so that each
function deals with one family only.

In particular, the functions xfrm4_extract_output/xfrm6_extract_output
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb.  This is then used by the outer mode
output functions to write the outer IP header.  In this way the output
function no longer has to know about the inner address family.

Since the extract functions are only called by tunnel modes (the only
modes that can support inter-family transforms), I've also moved the
xfrm*_tunnel_check_size calls into them.  This allows the correct ICMP
message to be sent as opposed to now where you might call icmp_send
with an IPv6 packet and vice versa.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:45 -08:00
Herbert Xu
29bb43b4ec [INET]: Give outer DSCP directly to ip*_copy_dscp
This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so
that they directly take the outer DSCP rather than the outer IP header.
This will help us to unify the code for inter-family tunnels.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:45 -08:00
Herbert Xu
a2deb6d26f [IPSEC]: Move x->outer_mode->output out of locked section
RO mode is the only one that requires a locked output function.  So
it's easier to move the lock into that function rather than requiring
everyone else to run under the lock.

In particular, this allows us to move the size check into the output
function without causing a potential dead-lock should the ICMP error
somehow hit the same SA on transmission.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:44 -08:00
Herbert Xu
e40b328615 [IPSEC]: Forbid BEET + ipcomp for now
While BEET can theoretically work with IPComp the current code can't
do that because it tries to construct a BEET mode tunnel type which
doesn't (and cannot) exist.  In fact as it is it won't even attach a
tunnel object at all for BEET which is bogus.

To support this fully we'd also need to change the policy checks on
input to recognise a plain tunnel as a legal variant of an optional
BEET transform.

This patch simply fails such constructions for now.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:43 -08:00
Herbert Xu
25ee3286dc [IPSEC]: Merge common code into xfrm_bundle_create
Half of the code in xfrm4_bundle_create and xfrm6_bundle_create are
common.  This patch extracts that logic and puts it into
xfrm_bundle_create.  The rest of it are then accessed through afinfo.

As a result this fixes the problem with inter-family transforms where
we treat every xfrm dst in the bundle as if it belongs to the top
family.

This patch also fixes a long-standing error-path bug where we may free
the xfrm states twice.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:43 -08:00
Herbert Xu
66cdb3ca27 [IPSEC]: Move flow construction into xfrm_dst_lookup
This patch moves the flow construction from the callers of
xfrm_dst_lookup into that function.  It also changes xfrm_dst_lookup
so that it takes an xfrm state as its argument instead of explicit
addresses.

This removes any address-specific logic from the callers of
xfrm_dst_lookup which is needed to correctly support inter-family
transforms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:42 -08:00
Herbert Xu
f04e7e8d7f [IPSEC]: Replace x->type->{local,remote}_addr with flags
The functions local_addr and remote_addr are more than what they're
needed for.  The same thing can be done easily with flags on the type
object.  This patch does that and simplifies the wrapper functions in
xfrm6_policy accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:41 -08:00
Herbert Xu
fff6938880 [IPSEC]: Make sure idev is consistent with dev in xfrm_dst
Previously we took the device from the bottom route and idev from the
top route.  This is bad because idev may well point to a different
device.  This patch changes it so that we get the idev from the device
directly.

It also makes it an error if either dev or idev is NULL.  This is
consistent with the rest of the routing code which also treats these
cases as errors.

I've removed the err initialisation in xfrm6_policy.c because it
achieves no purpose and hid a bug when an initial version of this
patch neglected to set err to -ENODEV (fortunately the IPv4 version
warned about it).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:40 -08:00
Herbert Xu
45ff5a3f9a [IPSEC]: Set dst->input to dst_discard
The input function should never be invoked on IPsec dst objects.  This
is because we don't apply IPsec on input until after we've made the
routing decision.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:40 -08:00
Herbert Xu
8ce68ceb55 [IPSEC]: Only set neighbour on top xfrm dst
The neighbour field is only used by dst_confirm which only ever happens on
the top-most xfrm dst.  So it's a waste to duplicate for every other xfrm
dst.  This patch moves its setting out of the loop so that only the top one
gets set.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:39 -08:00
Herbert Xu
352e512c32 [NET]: Eliminate duplicate copies of dst_discard
We have a number of copies of dst_discard scattered around the place
which all do the same thing, namely free a packet on the input or
output paths.

This patch deletes all of them except dst_discard and points all the
users to it.

The only non-trivial bit is decnet where it returns an error.
However, conceptually this is identical to the blackhole functions
used in IPv4 and IPv6 which do not return errors.  So they should
either all return errors or all return zero.  For now I've stuck with
the majority and picked zero as the return value.

It doesn't really matter in practice since few if any driver would
react differently depending on a zero return value or NET_RX_DROP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:37 -08:00
Herbert Xu
b4ce92775c [IPV6]: Move nfheader_len into rt6_info
The dst member nfheader_len is only used by IPv6.  It's also currently
creating a rather ugly alignment hole in struct dst.  Therefore this patch
moves it from there into struct rt6_info.

It also reorders the fields in rt6_info to minimize holes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:37 -08:00
Herbert Xu
0148894223 [IPV6]: Only set nfheader_len for top xfrm dst
We only need to set nfheader_len in the top xfrm dst.  This is because
we only ever read the nfheader_len from the top xfrm dst.

It is also easier to count nfheader_len as part of header_len which
then lets us remove the ugly wrapper functions for incrementing and
decrementing header lengths in xfrm6_policy.c.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:35 -08:00
Pavel Emelyanov
b24b8a247f [NET]: Convert init_timer into setup_timer
Many-many code in the kernel initialized the timer->function
and  timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:35 -08:00
Wang Chen
a92aa318b4 [IPV6]: Add raw6 drops counter.
Add raw drops counter for IPv6 in /proc/net/raw6 .

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:34 -08:00
Jens Axboe
a0974dd3da [TCP] splice: add tcp_splice_read() to IPV6
Thanks to YOSHIFUJI Hideaki for the hint!

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:32 -08:00
Rolf Manderscheid
a9e527e3f9 IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Herbert Xu
f945fa7ad9 [INET]: Fix truesize setting in ip_append_data
As it is ip_append_data only counts page fragments to the skb that
allocated it.  As such it means that the first skb gets hit with a
4K charge even though it might have only used a fraction of it while
all subsequent skb's that use the same page gets away with no charge
at all.

This bug was exposed by the UDP accounting patch.

[ The wmem_alloc bumping needs to be moved with the truesize,
  noticed by Takahiro Yasui.  -DaveM ]

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-23 03:11:43 -08:00
Wang Chen
fa95c28322 [IPV6]: RFC 2011 compatibility broken
The snmp6 entry name was changed, and it broke compatibility
to RFC 2011.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-21 03:05:43 -08:00
Wang Chen
c964ff4ffb [IPV6]: ICMP6_MIB_OUTMSGS increment duplicated
icmpv6_send() calls ip6_push_pending_frames() indirectly.
Both ip6_push_pending_frames() and icmpv6_send() increment
counter ICMP6_MIB_OUTMSGS.

This patch remove the increment from icmpv6_send.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-21 03:05:20 -08:00
YOSHIFUJI Hideaki
398bcbebb6 [IPV6] ROUTE: Make sending algorithm more friendly with RFC 4861.
We omit (or delay) sending NSes for known-to-unreachable routers (in
NUD_FAILED state) according to RFC 4191 (Default Router Preferences
and More-Specific Routes).  But this is not fully compatible with RFC
4861 (Neighbor Discovery Protocol for IPv6), which does not remember
unreachability of neighbors.

So, let's avoid mixing sending algorithm of RFC 4191 and that of RFC
4861, and make the algorithm more friendly with RFC 4861 if RFC 4191
is disabled.

Issue was found by IPv6 Ready Logo Core Self_Test 1.5.0b2 (by TAHI
Project), and has been tracked down by Mitsuru Chinen
<mitch@linux.vnet.ibm.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:40 -08:00
Pavel Emelyanov
b3652b2dc5 [IPV6]: Mischecked tw match in __inet6_check_established.
When looking for a conflicting connection the !sk->sk_bound_dev_if
check is performed only for live sockets, but not for timewait-ed.

This is not the case for ipv4, for __inet6_lookup_established in
both ipv4 and ipv6 and for other places that check for tw-s.

Was this missed accidentally? If so, then this patch fixes it and
besides makes use if the dif variable declared in the function.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-20 20:31:36 -08:00
Yasuyuki Kozakai
8f41f01786 [NETFILTER]: ip6t_eui64: Fixes calculation of Universal/Local bit
RFC2464 says that the next to lowerst order bit of the first octet
of the Interface Identifier is formed by complementing
the Universal/Local bit of the EUI-64. But ip6t_eui64 uses OR not XOR.

Thanks Peter Ivancik for reporing this bug and posting a patch
for it.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-10 22:40:39 -08:00
Brian Haley
1ac4f00885 [IPV6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()
Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:52:21 -08:00
Joe Perches
bea8519547 [IPV6]: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-20 14:01:35 -08:00
Wei Yongjun
cf6fc4a924 [IPV6]: Fix the return value of ipv6_getsockopt
If CONFIG_NETFILTER if not selected when compile the kernel source code, 
ipv6_getsockopt will returen an EINVAL error if optname is not supported by
the kernel. But if CONFIG_NETFILTER is selected, ENOPROTOOPT error will 
be return.

This patch fix to always return ENOPROTOOPT error if optname argument of 
ipv6_getsockopt is not supported by the kernel.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-16 13:39:57 -08:00
Thomas Graf
95a02cfd4d [IPv6] ESP: Discard dummy packets introduced in rfc4303
RFC4303 introduces dummy packets with a nexthdr value of 59
to implement traffic confidentiality. Such packets need to
be dropped silently and the payload may not be attempted to
be parsed as it consists of random chunk.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-11 02:45:27 -08:00
YOSHIFUJI Hideaki
1df2e44560 [IPV6] XFRM: Fix auditing rt6i_flags; use RTF_xxx flags instead of RTCF_xxx.
RTCF_xxx flags, defined in include/linux/in_route.h) are available for
IPv4 route (rtable) entries only.  Use RTF_xxx flags instead, defined
in include/linux/ipv6_route.h, for IPv6 route entries (rt6_info).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-11 02:45:24 -08:00
Mitsuru Chinen
ca46f9c834 [IPv6] SNMP: Increment OutNoRoutes when connecting to unreachable network
IPv6 stack doesn't increment OutNoRoutes counter when IP datagrams
is being discarded because no route could be found to transmit them
to their destination. IPv6 stack should increment the counter.
Incidentally, IPv4 stack increments that counter in such situation.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-07 01:06:30 -08:00
Evgeniy Polyakov
d31c7b8fa3 [IPV6]: Restore IPv6 when MTU is big enough
Avaid provided test application, so bug got fixed.

IPv6 addrconf removes ipv6 inner device from netdev each time cmu
changes and new value is less than IPV6_MIN_MTU (1280 bytes).
When mtu is changed and new value is greater than IPV6_MIN_MTU,
it does not add ipv6 addresses and inner device bac.

This patch fixes that.

Tested with Avaid's application, which works ok now.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-11-30 23:36:08 +11:00
YOSHIFUJI Hideaki
77adefdc98 [IPV6] TCPMD5: Fix deleting key operation.
Due to the bug, refcnt for md5sig pool was leaked when
an user try to delete a key if we have more than one key.
In addition to the leakage, we returned incorrect return
result value for userspace.

This fix should close Bug #9418, reported by <ming-baini@163.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-20 17:31:23 -08:00
YOSHIFUJI Hideaki
aacbe8c880 [IPV6] TCPMD5: Check return value of tcp_alloc_md5sig_pool().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-20 17:30:56 -08:00
Joe Perches
3b6d821c4f [IPV6]: Add missing "space"
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19 23:47:25 -08:00
Pierre Ynard
dbb2ed2485 [IPV6]: Add ifindex field to ND user option messages.
Userland neighbor discovery options are typically heavily involved with
the interface on which thay are received: add a missing ifindex field to
the original struct. Thanks to Rémi Denis-Courmont.

Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-12 17:58:35 -08:00