Commit Graph

344 Commits

Author SHA1 Message Date
Roland Dreier
400ddc11eb IB/mthca: Factor out setting WQE remote address and atomic segment entries
Factor code to set remote address and atomic segment entries out of the
work request posting functions into inline functions set_raddr_seg()
and set_atomic_seg().  This doesn't change the generated code in any
significant way, and makes the source easier on the eyes.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 12:55:42 -07:00
Roland Dreier
80885456e8 IB/mthca: Factor out setting WQE data segment entries
Factor code to set data segment entries out of the work request
posting functions into inline functions mthca_set_data_seg() and
mthca_set_data_seg_inval().  This makes the code more readable and
also allows the compiler to do a better job -- on x86_64:

add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-69 (-69)
function                                     old     new   delta
mthca_arbel_post_srq_recv                    373     369      -4
mthca_arbel_post_receive                     570     562      -8
mthca_tavor_post_srq_recv                    520     508     -12
mthca_tavor_post_send                       1344    1330     -14
mthca_arbel_post_send                       1481    1467     -14
mthca_tavor_post_receive                     792     775     -17

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18 11:30:34 -07:00
Roland Dreier
6d7d080e9f IB/mthca: Use uninitialized_var() for f0
Commit 9db48926 ("drivers/infiniband/hw/mthca/mthca_qp: kill uninit'd
var warning") added "= 0" to the declarations of f0 to shut up gcc
warnings.  However, there's no point in making the code bigger by
initializing f0 to a random value just to get rid of a warning;
setting f0 to 0 is no safer than just using uninitialized_var(), which
documents the situation better and gives smaller code too.  For example, 
on x86_64:

add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-16 (-16)
function                                     old     new   delta
mthca_tavor_post_send                       1352    1344      -8
mthca_arbel_post_send                       1489    1481      -8

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 19:30:51 -07:00
Roland Dreier
e4daf73868 IB/mthca: Fix printk format used for firmware version in warning
When warning about out-of-date firmware, current mthca code messes up
the formatting of the version if the subminor doesn't have three
digits.  It doesn't fill the field with 0s so we end up with:

    ib_mthca 0000:0b:00.0: HCA FW version 1.1.  0 is old (1.2.  0 is current).

Change the format from "%3d" to "%03d" to get the right thing printed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:42 -07:00
Roland Dreier
f6be6fbe26 IB/mthca: Schedule MSI support for removal
The mthca driver supports both MSI and MSI-X.  However, MSI-X works with
all hardware that the driver handles, and provides a superset of what
MSI does, so there's no point in having code for both.  Schedule MSI
support for removal in 2008 to give anyone who actually needs MSI and
who can't use MSI time to speak up.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:41 -07:00
Jeff Garzik
9db4892620 drivers/infiniband/hw/mthca/mthca_qp: kill uninit'd var warning
drivers/infiniband/hw/mthca/mthca_qp.c: In function
  ‘mthca_tavor_post_send’:
drivers/infiniband/hw/mthca/mthca_qp.c:1594: warning: ‘f0’ may be used
  uninitialized in this function
drivers/infiniband/hw/mthca/mthca_qp.c: In function
  ‘mthca_arbel_post_send’:
drivers/infiniband/hw/mthca/mthca_qp.c:1949: warning: ‘f0’ may be used
  uninitialized in this function

Initializing 'f0' is not strictly necessary in either case, AFAICS.

I was considering use of uninitialized_var(), but looking at the
complex flow of control in each function, I feel it is wiser and
safer to simply zero the var and be certain of ourselves.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-17 16:18:00 -04:00
Shani Moideen
8909c571fa IB/mthca: Replace memset(<addr>, 0, PAGE_SIZE) with clear_page(<addr>)
Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
----
2007-07-10 12:28:05 -07:00
Jan Engelhardt
06cc85086e IB: Use menuconfig for InfiniBand menu
Change Kconfig objects from "menu, config" into "menuconfig" so
that the user can disable the whole feature without having to
enter the menu first.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Roland Dreier
3e1db334dc IB/mthca, mlx4_core: Fix typo in comment
s/signifant/significant/

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 11:51:59 -07:00
Michael S. Tsirkin
8b7e15772a IB/mthca: Fix handling of send CQE with error for QPs connected to SRQ
mthca_free_err_wqe() currently treats both send and receive CQEs
identically if a QP is using an SRQ.  But for Tavor hardware, send
CQEs with error can be chained together even if the RQ is part of SRQ,
so we may miss some CQEs.

Fix by following the WQE chain for all send CQEs even for non-SRQ QPs.

This fixes crashes in IPoIB CM:
<https://bugs.openfabrics.org//show_bug.cgi?id=604>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29 16:07:09 -07:00
Linus Torvalds
8aee74c8ee Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/cm: Improve local id allocation
  IPoIB/cm: Fix SRQ WR leak
  IB/ipoib: Fix typos in error messages
  IB/mlx4: Check if SRQ is full when posting receive
  IB/mlx4: Pass send queue sizes from userspace to kernel
  IB/mlx4: Fix check of opcode in mlx4_ib_post_send()
  mlx4_core: Fix array overrun in dump_dev_cap_flags()
  IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions
  IB/mthca: Fix RESET to ERROR transition
  IB/mlx4: Set GRH:HopLimit when sending globally routed MADs
  IB/mthca: Set GRH:HopLimit when building MLX headers
  IB/mlx4: Fix check of max_qp_dest_rdma in modify QP
  IB/mthca: Fix use-after-free on device restart
  IB/ehca: Return proper error code if register_mr fails
  IPoIB: Handle P_Key table reordering
  IB/core: Use start_port() and end_port()
  IB/core: Add helpers for uncached GID and P_Key searches
  IB/ipath: Fix potential deadlock with multicast spinlocks
  IB/core: Free umem when mm is already gone
2007-05-21 16:19:32 -07:00
Alexey Dobriyan
e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
Michael S. Tsirkin
b18aad7150 IB/mthca: Fix RESET to ERROR transition
According to the IB spec, a QP can be moved from RESET to the ERROR 
state, but mthca firmware does not support this and returns an error if 
we try.  Work around this FW limitation by moving the QP from RESET to
INIT with dummy parameters and then transitioning from INIT to ERROR.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:57 -07:00
Rolf Manderscheid
3f37cae694 IB/mthca: Set GRH:HopLimit when building MLX headers
Global CM packets used by rmda_cm were being sent with a GRH:hopLimit
of zero, causing them to be dropped by the router.  The problem is a
missing initialization of the hop_limit field in mthca_read_ah(),
which was called by build_mlx_header() when sending a MAD on QP1.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:56 -07:00
Ali Ayoub
de57c9f102 IB/mthca: Fix use-after-free on device restart
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:56 -07:00
Michael S. Tsirkin
bd18c11277 IB/mthca: Set cleaned CQEs back to HW ownership when cleaning CQ
mthca_cq_clean() updates the CQ consumer index without moving CQEs
back to HW ownership.  As a result, the same WRID might get reported
twice, resulting in a use-after-free.  This was observed in IPoIB CM.
Fix by moving all freed CQEs to HW ownership.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=617>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 14:10:34 -07:00
Michael S. Tsirkin
3e28c56b9b IB/mthca: Fix posting >255 recv WRs for Tavor
Fix posting lists of > 255 receive WRs for Tavor: rq.next_ind must
be updated each doorbell, otherwise the next doorbell will use an
incorrect index.

Found by Ronni Zimmermann at Mellanox.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 14:10:34 -07:00
Roland Dreier
f7c6a7b5d5 IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.

Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.

This has a number of advantages:
 - It is better design from the standpoint of making generic code a
   library that can be used or overridden by device-specific code as
   the details of specific devices dictate.
 - Drivers that do not need to pin userspace memory regions do not
   need to take the performance hit of calling ib_mem_get().  For
   example, although I have not tried to implement it in this patch,
   the ipath driver should be able to avoid pinning memory and just
   use copy_{to,from}_user() to access userspace memory regions.
 - Buffers that need special mapping treatment can be identified by
   the low-level driver.  For example, it may be possible to solve
   some Altix-specific memory ordering issues with mthca CQs in
   userspace by mapping CQ buffers with extra flags.
 - Drivers that need to pin and DMA map userspace memory for things
   other than memory regions can use ib_umem_get() directly, instead
   of hacks using extra parameters to their reg_phys_mr method.  For
   example, the mlx4 driver that is pending being merged needs to pin
   and DMA map QP and CQ buffers, but it does not need to create a
   memory key for these buffers.  So the cleanest solution is for mlx4
   to call ib_umem_get() in the create_qp and create_cq methods.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
Linus Torvalds
972d45fb43 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Convert to NAPI
  IB: Return "maybe missed event" hint from ib_req_notify_cq()
  IB: Add CQ comp_vector support
  IB/ipath: Fix a race condition when generating ACKs
  IB/ipath: Fix two more spin lock problems
  IB/fmr_pool: Add prefix to all printks
  IB/srp: Set proc_name
  IB/srp: Add orig_dgid sysfs attribute to scsi_host
  IPoIB/cm: Don't crash if remote side uses one QP for both directions
  RDMA/cxgb3: Support for new abort logic
  RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message
  RDMA/cxgb3: Fail qp creation if the requested max_inline is too large
  RDMA/cxgb3: Fix TERM codes
  IPoIB/cm: Fix error handling in ipoib_cm_dev_open()
  IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed
  IB/mthca: Work around kernel QP starvation
  IB/ipath: Don't put QP in timeout queue if waiting to send
  IB/ipath: Don't call spin_lock_irq() from interrupt context
2007-05-07 12:18:21 -07:00
Roland Dreier
ed23a72778 IB: Return "maybe missed event" hint from ib_req_notify_cq()
The semantics defined by the InfiniBand specification say that
completion events are only generated when a completions is added to a
completion queue (CQ) after completion notification is requested.  In
other words, this means that the following race is possible:

	while (CQ is not empty)
		ib_poll_cq(CQ);
	// new completion is added after while loop is exited
	ib_req_notify_cq(CQ);
	// no event is generated for the existing completion

To close this race, the IB spec recommends doing another poll of the
CQ after requesting notification.

However, it is not always possible to arrange code this way (for
example, we have found that NAPI for IPoIB cannot poll after
requesting notification).  Also, some hardware (eg Mellanox HCAs)
actually will generate an event for completions added before the call
to ib_req_notify_cq() -- which is allowed by the spec, since there's
no way for any upper-layer consumer to know exactly when a completion
was really added -- so the extra poll of the CQ is just a waste.

Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
ib_req_notify_cq() so that it can return a hint about whether the a
completion may have been added before the request for notification.
The return value of ib_req_notify_cq() is extended so:

	 < 0	means an error occurred while requesting notification
	== 0	means notification was requested successfully, and if
		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
		events were missed and it is safe to wait for another
		event.
	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
		passed in.  It means that the consumer must poll the
		CQ again to make sure it is empty to avoid the race
		described above.

We add a flag to enable this behavior rather than turning it on
unconditionally, because checking for missed events may incur
significant overhead for some low-level drivers, and consumers that
don't care about the results of this test shouldn't be forced to pay
for the test.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Michael S. Tsirkin
f4fd0b224d IB: Add CQ comp_vector support
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Jean Delvare
6473d160b4 PCI: Cleanup the includes of <linux/pci.h>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.

In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.

My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:

arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c

I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.

Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
  [PATCH] scatterlist.h needs types.h
  http://lkml.org/lkml/2007/3/01/141

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Michael S. Tsirkin
9ba6d5529d IB/mthca: Work around kernel QP starvation
With mthca, RC QPs can starve each other and even UD QPs on the same
hardware schedule queue.  As a result, userspace MPI can starve
e.g. IPoIB traffic, with netdev watchdog warnings getting printed out,
and TCP connections getting stuck or failing.

Reduce the chance of this happening by using three separate hardware
schedule queues: one for userspace RC QPs, one for kernel RC QPs, and
one for all other QPs.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Joachim Fenkes
1912ffbb88 IB: Set class_dev->dev in core for nice device symlink
All RDMA drivers except ehca set class_dev->dev to their dma_device
value (ehca leaves this unset).  dma_device is the only value that
makes any sense, so move this assignment to core/sysfs.c.  This reduce
the duplicated code in the rest of the drivers and gives ehca a nice
/sys/class/infiniband/ehcaX/device symlink.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:38 -07:00
Roland Dreier
30c00986f3 IB/mthca: Simplify CQ cleaning in mthca_free_qp()
mthca_free_qp() already has local variables to hold the QP's send_cq
and recv_cq, so we can slightly clean up the calls to mthca_cq_clean()
by using those local variables instead of expressions like
to_mcq(qp->ibqp.send_cq).

Also, by cleaning the recv_cq first, we can avoid worrying about
whether the QP is attached to an SRQ for the second call, because we
would only clean send_cq if send_cq is not equal to recv_cq, and that
means send_cq cannot have any receive completions from the QP being
destroyed.

All this work even improves the generated code a bit:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
function                                     old     new   delta
mthca_free_qp                                510     505      -5

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:11 -07:00
Roland Dreier
532c3b5817 IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory
Commit b2875d4c ("IB/mthca: Always fill MTTs from CPU") causes a crash
in mthca_write_mtt() with non-memfree HCAs that have their memory
hidden (that is, have only two PCI BARs instead of having a third BAR
that allows access to the RAM attached to the HCA) on 64-bit
architectures.  This is because the commit just before, c20e20ab
("IB/mthca: Merge MR and FMR space on 64-bit systems") makes
dev->mr_table.fmr_mtt_buddy equal to &dev->mr_table.mtt_buddy and
hence mthca_write_mtt() tries to write directly into the HCA's MTT
table.  However, since that table is in the HCA's memory, this is
impossible without the PCI BAR that gives access to that memory.

This causes a crash because mthca_tavor_write_mtt_seg() basically
tries to dereference some offset of a NULL pointer.  Fix this by
adding a test of MTHCA_FLAG_FMR in mthca_write_mtt() so that we always
use the WRITE_MTT firmware command rather than writing directly if
FMRs are not enabled.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:04 -07:00
Roland Dreier
3f114853d4 IB/mthca: Update HCA firmware revisions
Update the driver's list of current firmware versions with Mellanox's
latest releases.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:02 -07:00
Michael S. Tsirkin
608d8268be IB/mthca: Fix data corruption after FMR unmap on Sinai
In mthca_arbel_fmr_unmap(), the high bits of the key are masked off.
This gets rid of the effect of adjust_key(), which makes sure that
bits 3 and 23 of the key are equal when the Sinai throughput
optimization is enabled, and so it may happen that an FMR will end up
with bits 3 and 23 in the key being different.  This causes data
corruption, because when enabling the throughput optimization, the
driver promises the HCA firmware that bits 3 and 23 of all memory keys
will always be equal.

Fix by re-applying adjust_key() after masking the key.

Thanks to Or Gerlitz for reproducing the problem, and Ariel Shahar for
help in debug.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-16 14:10:55 -07:00
Michael S. Tsirkin
0264d88531 IB/mthca: Fix thinko in init_mr_table()
Commit c20e20ab ("IB/mthca: Merge MR and FMR space on 64-bit systems")
swapped the number of MTTs and MPTs when initializing the MR table. As
a result, we get a kernel oops when the number of MTT segments
allocated exceeds 0x20000.

Noted by Troy Benjegerdes <troy@scl.ameslab.gov>, and reproduced by
Dotan Barak <dotanb@mellanox.co.il>.  This fixes
https://bugs.openfabrics.org/show_bug.cgi?id=490

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-26 15:59:32 -07:00
Roland Dreier
88171cfed5 IB/mthca: Fix error path in mthca_alloc_memfree()
The garbled logic in mthca_alloc_memfree() causes it to return 0, even
if it fails to allocate all doorbell records.  Fix it to return -ENOMEM
when it fails.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-01 13:17:14 -08:00
Adrian Bunk
c9add6ec56 IB/mthca: Make 2 functions static
This patch makes the needlessly global functions mthca_tavor_write_mtt_seg()
and mthca_arbel_write_mtt_seg() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-20 20:12:02 -08:00
Roland Dreier
11282b32a4 IB/mthca: Fix allocation of ICM chunks in coherent memory
The change to allow allocating ICM chunks from coherent memory did not
increment the count of sg entries properly, so a chunk that required
more than allocation would not be mapped properly by the HCA.

Fix this by adding the missing increment of chunk->nsg.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:33 -08:00
Dotan Barak
fc89afce34 IB/mthca: Allow the QP state transition RESET->RESET
RESET->RESET is an allowed QP state transition, so mthca should handle
it correctly, by just returning success without involving the firmware.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:32 -08:00
Michael S. Tsirkin
b2875d4c39 IB/mthca: Always fill MTTs from CPU
Speed up memory registration by filling in MTTs directly when the CPU
can write directly to the whole table (all mem-free cards, and to
Tavor mode on 64-bit systems with the patch I posted earlier).  This
reduces the number of FW commands needed to register an MR by at least
a factor of 2 and speeds up memory registration significantly.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
c20e20ab0f IB/mthca: Merge MR and FMR space on 64-bit systems
For Tavor, we currently reserve separate MPT and MTT space for FMRs to
avoid abusing the vmalloc space on 32 bit kernels. No such problem
exists on 64 bit kernels so let's not do it there.

This way we have a shared pool for MR and FMR resources, used on
demand.  This will also make it possible to write MTTs for regular
regions directly from driver.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
391e4dea71 IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
We allocate the MTT table with alloc_pages() and then do pci_map_sg(),
so we must call pci_dma_sync_sg() after the CPU writes to the MTT
table.  This works since the device will never write MTTs on mem-free
HCAs, once we get rid of the use of the WRITE_MTT firmware command.
This change is needed to make that work, and is an improvement for
now, since it gives FMRs a chance at working.

For MPTs, both the device and CPU might write there, so we must
allocate DMA coherent memory for these.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
1d1f19cfce IB/mthca: Give reserved MTTs a separate cache line
MTTs are allocated in non-cache-coherent memory, so we must give
reserved MTTs their own cache line, to prevent both device and
CPU from writing into the same cache line at the same time.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
Michael S. Tsirkin
c7d204e8fd IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
The reserved_mtts field has different meaning in Tavor and Arbel, so
we are wasting mtt entries on memfree. Fix the Arbel case to match
Tavor semantics.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-12 16:16:29 -08:00
David Howells
6bdd61d876 IB/mthca: Work around gcc bug on sparc64
For some reason gcc-3.4.5 on sparc64 does:

 WARNING: "____ilog2_NaN" [drivers/infiniband/hw/mthca/ib_mthca.ko] undefined!

Points to note:

 (1) The asm volatile flush/flushw are just markers for viewing what comes out
     in the assembly; removing them has no effect on the result.

 (2) Changing almost anything else in dwh__mthca_arbel_init_srq_context() or
     dwh__mthca_alloc_srq() causes the problem to go away.

The compiler command line issued by the kernel build is:

/opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -fno-strict-aliasing -fno-common -Os -m64 -mno-fpu -mcpu=ultrasparc -mcmodel=medlow -ffixed-g4 -ffixed-g5 -fcall-used-g7 -Wa,--undeclared-regs -pg -fno-omit-frame-pointer -fno-optimize-sibling-calls -fasynchronous-unwind-tables -g  -c -o drivers/infiniband/hw/mthca/.tmp_mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c

This can be reduced to this whilst still retaining the problem:

/opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -m64 -c -o drivers/infiniband/hw/mthca/mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c -Os

Removing -Os or changing it to -O or -O0 thru -O6 gets rid of the problem.

This patch to the kernel code fixes the problem:

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:49 -08:00
Roland Dreier
99d4f22e91 IB/mthca: Use correct structure size in call to memset()
When clearing the ib_ah_attr parameter in to_ib_ah_attr(), use sizeof
*ib_ah_attr instead of sizeof *path.

Pointed out by Jack Morgenstein <jackm@mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:47 -08:00
Michael S. Tsirkin
062dbb69f3 IB: Return qp pointer as part of ib_wc
struct ib_wc currently only includes the local QP number: this matches
the IB spec, but seems mostly useless. The following patch replaces
this with the pointer to qp itself, and updates all low level drivers
and all users.

This has the following advantages:
- Ability to get a per-qp context through wc->qp->qp_context
- Existing drivers already have the qp pointer ready in poll cq, so
  this change actually saves a tiny bit (extra memory read) on data path
  (for ehca it would actually be expensive to find the QP pointer when
  polling a CQ, but ehca does not support SRQ so we can leave wc->qp as
  NULL for ehca)
- Users that need the QP number can still get it through wc->qp->qp_num

Use case:

In IPoIB connected mode code, I have a common CQ shared by multiple
QPs.  To track connection usage, I need a way to get at some per-QP
context upon the completion, and I would like to avoid allocating
context object per work request just to stick a QP pointer into it.
With this code, I can just use wc->qp->qp_context.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:55 -08:00
Dotan Barak
f5e10529a9 IB/mthca: Don't execute QUERY_QP firmware command for QP in RESET state
If a QP being queried is in the RESET state, don't execute the
QUERY_QP firmware command (because it will fail).

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-09 14:14:28 -08:00
Jack Morgenstein
98714cb161 IB/mthca: Fix PRM compliance problem in atomic-send completions
According to the Tavor and Arbel programmer's reference manuals, the
number of bytes transferred is not provided in the byte_cnt field of
the CQ entry for atomic operation completions.  For atomic operations,
the number of bytes transferred is always 8 (when the status is
"success"), and this constant value should always be used by the
driver in the ib_wc entry returned, rather than using the CQE.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:25:24 -08:00
Michael S. Tsirkin
46707e96b7 IB/mthca: Fix off-by-one in FMR handling on memfree
mthca_table_find() will return the wrong address when the table entry
being searched for is exactly at the beginning of a sglist entry
(other than the first), because it uses >= when it should use >.

Example: assume we have 2 entries in scatterlist, 4K each, offset is
4K.  The current code will return first entry + 4K when we really want
the second entry.

In particular this means mapping an FMR on a memfree HCA may end up
writing the page table into the wrong place, leading to memory
corruption and also causing the HCA to use an incorrect address
translation table.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-04 19:46:32 -08:00
Michael S. Tsirkin
9d79f1b467 [PATCH] IB/mthca: Fix FMR breakage caused by kmemdup() conversion
Commit bed8bdfd ("IB: kmemdup() cleanup") introduced one bad conversion to
kmemdup() in mthca_alloc_fmr(), where the structure allocated and the
structure copied are not the same size.  Revert this back to the original
kmalloc()/memcpy() code.

Reported-by: Dotan Barak <dotanb@mellanox.co.il>.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@digitalvampire.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:55:55 -08:00
Roland Dreier
0b0df6f207 IB/mthca: Use DEFINE_MUTEX() instead of mutex_init()
mthca_device_mutex() can be initialized automatically with
DEFINE_MUTEX() rather than explicitly calling mutex_init().  This
saves a bit of text and shrinks the source by a line, so we may as
well do it....

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-15 20:55:28 -08:00
Leonid Arsh
82da703ee6 IB/mthca: Add HCA profile module parameters
Add module parameters that enable settting some of the HCA
profile values, such as the number of QPs, CQs, etc.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-15 20:46:31 -08:00
David Howells
f0d1b0b30d [PATCH] LOG2: Implement a general integer log2 facility in the kernel
This facility provides three entry points:

	ilog2()		Log base 2 of unsigned long
	ilog2_u32()	Log base 2 of u32
	ilog2_u64()	Log base 2 of u64

These facilities can either be used inside functions on dynamic data:

	int do_something(long q)
	{
		...;
		y = ilog2(x)
		...;
	}

Or can be used to statically initialise global variables with constant values:

	unsigned n = ilog2(27);

When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant.  They treat negative numbers as
unsigned.

When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.

[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:51 -08:00
Christoph Lameter
54e6ecb239 [PATCH] slab: remove SLAB_ATOMIC
SLAB_ATOMIC is an alias of GFP_ATOMIC

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
David Howells
4c1ac1b491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/infiniband/core/iwcm.c
	drivers/net/chelsio/cxgb2.c
	drivers/net/wireless/bcm43xx/bcm43xx_main.c
	drivers/net/wireless/prism54/islpci_eth.c
	drivers/usb/core/hub.h
	drivers/usb/input/hid-core.c
	net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 14:37:56 +00:00
Jack Morgenstein
7013696a5f IB/mthca: Fix initial SRQ logsize for mem-free HCAs
When initializing an mthca SRQ, the log_srq_size field should be the
log of the number of SRQ WQEs, not the log of the number of bytes in
the SRQ.

This affects only mthca drivers for memfree HCAs which set the initial
srq wqe counter (in the SW2HW transition) to a non-zero value.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Roland Dreier
f4f3d0f0ec IB/mthca: Fix section mismatches
Commit b3b30f5e ("IB/mthca: Recover from catastrophic errors")
introduced some section mismatch breakage, because the error recovery
code tears down and reinitializes the device, which calls into lots of
code originally marked __devinit and __devexit from regular .text.

Fix this by getting rid of these now-incorrect section markers.

Reported by Randy Dunlap <randy.dunlap@oracle.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:06 -08:00
Eric Sesterhenn
bed8bdfddd IB: kmemdup() cleanup
Replace open coded kmemdup() to save some screen space, and allow
inlining/not inlining to be triggered by gcc.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:06 -08:00
David Howells
c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
Michael S. Tsirkin
68586b67ab IB/mthca: Fix MAD extended header format for MAD_IFC firmware command
Several fields in an incoming MAD extended info header were passed
into the MAD_IFC firmware command at incorrect offsets (mostly off by
4 bytes).  As the result, the HCA will fail to generate traps in which
this info is needed (e.g. traps which include the GRH of the incoming
packet), in violation of the IB spec.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-31 10:51:26 -08:00
Arthur Kepner
1f5c23e2c1 IB/mthca: Use mmiowb after doorbell ring
We discovered a problem when running IPoIB applications on multiple
CPUs on an Altix system. Many messages such as:

ib_mthca 0002:01:00.0: SQ 000014 full (19941644 head, 19941707 tail, 64 max, 0 nreq)

appear in syslog, and the driver wedges up.

Apparently this is because writes to the doorbells from different CPUs
reach the device out of order. The following patch adds mmiowb() calls
after doorbell rings to ensure the doorbell writes are ordered.

Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-16 20:22:35 -07:00
Michael S. Tsirkin
2cbe19d48a IB/mthca: Fix off-by-one in mthca SRQ creation
All HCAs (not just mem-free) need a spare SRQ entry, so bump srq->max
by 1 in all cases.

Noted by Jack Morgenstein <jackm@mellanox.co.il>

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:38 -07:00
Jack Morgenstein
a8bf4e7717 IB/mthca: Query port fix
Fill in "max_vl_num" (encoded according to VLCap field in the PortInfo MAD)
and "init_type_reply" values in the ib_query_port() verb.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:36 -07:00
David Howells
7d12e780e0 IRQ: Maintain regs pointer globally rather than passing to IRQ handlers
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.

The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around.  On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).

Where appropriate, an arch may override the generic storage facility and do
something different with the variable.  On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.

Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions.  Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller.  A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.

I've build this code with allyesconfig for x86_64 and i386.  I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.

This will affect all archs.  Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:

	struct pt_regs *old_regs = set_irq_regs(regs);

And put the old one back at the end:

	set_irq_regs(old_regs);

Don't pass regs through to generic_handle_irq() or __do_IRQ().

In timer_interrupt(), this sort of change will be necessary:

	-	update_process_times(user_mode(regs));
	-	profile_tick(CPU_PROFILING, regs);
	+	update_process_times(user_mode(get_irq_regs()));
	+	profile_tick(CPU_PROFILING);

I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().

Some notes on the interrupt handling in the drivers:

 (*) input_dev() is now gone entirely.  The regs pointer is no longer stored in
     the input_dev struct.

 (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking.  It does
     something different depending on whether it's been supplied with a regs
     pointer or not.

 (*) Various IRQ handler function pointers have been moved to type
     irq_handler_t.

Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
2006-10-05 15:10:12 +01:00
Roland Dreier
d35cc330a2 IB/mthca: Simplify calls to mthca_cq_clean()
If a QP has separate send and receive CQs, then the send CQ will never
have receive completions from that QP in it.  So when cleaning the
send CQ, there's no need to pass in an SRQ pointer, even if the QP is
attached to an SRQ.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:55 -07:00
Jack Morgenstein
b3b30f5e8a IB/mthca: Recover from catastrophic errors
Trigger device remove and then add when a catastrophic error is
detected in hardware.  This, in turn, will cause a device reset, which
we hope will recover from the catastrophic condition.

Since this might interefere with debugging the root cause, add a
module option to suppress this behaviour.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:54 -07:00
Tom Tucker
07ebafbaaa RDMA: iWARP Core Changes.
Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
 - Hook iWARP CM into the build system and use it in rdma_cm.
 - Convert enum ib_node_type to enum rdma_node_type, which includes
   the possibility of RDMA_NODE_RNIC, and update everything for this.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:47 -07:00
Roland Dreier
3cd965646b IB: Whitespace fixes
Remove some trailing whitespace that has snuck in despite the best
efforts of whitespace=error-all.  Also fix a few other whitespace
bogosities.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:46 -07:00
Jack Morgenstein
9e583b85c2 IB/mthca: Return correct number of bits for static rate in query_qp
Incorrect number of bits was taken for static_rate field.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:42 -07:00
Jack Morgenstein
f6f76725b5 IB/mthca: Return port number for unconnected QPs in query_qp
port_num was not being returned for unconnected QPs.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:41 -07:00
Jack Morgenstein
b046a04e16 IB/mthca: Fix default static rate returned for Tavor in AV
When default static rate is returned for Tavor, need to translate it
to an ib rate value.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:40 -07:00
Ralph Campbell
9bc57e2d19 IB/uverbs: Pass userspace data to modify_srq and modify_qp methods
Pass a struct ib_udata to the low-level driver's ->modify_srq() and
->modify_qp() methods, so that it can get to the device-specific data
passed in by the userspace driver.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:25 -07:00
James Lentini
2a214182d2 IB/mthca: Include the header we really want
Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:17:20 -07:00
Michael S. Tsirkin
9fd558f454 IB/mthca: Don't use privileged UAR for kernel access
Make kernel use UAR2 instead of UAR1 for hardware access: this adds
sanity checking from the hardware side, without any performance cost.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:17:18 -07:00
Jack Morgenstein
b27075735e IB/mthca: Fix lid used for sending traps
The SM LID used to send traps to is incorrectly set to port LID.  This
is a regression from 2.6.17 -- after a PortInfo MAD is received, no
traps are sent to the SM LID.  The traps go to the loopback interface
instead, and are dropped there.  The SM LID should be taken from the
sm_lid of the PortInfo response.

The bug was introduced by commit 12bbb2b7be:
	IB/mthca: Add client reregister event generation

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:17:17 -07:00
Roland Dreier
5a4e6dccbc IB/mthca: Use IRQ safe locks to protect allocation bitmaps
It is supposed to be OK to call mthca_create_ah() and mthca_destroy_ah()
from any context.  However, for mem-full HCAs, these functions use the
mthca_alloc() and mthca_free() bitmap helpers, and those helpers use
non-IRQ-safe spin_lock() internally.  Lockdep correctly warns that
this could lead to a deadlock.  Fix this by changing mthca_alloc() and
mthca_free() to use spin_lock_irqsave().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-31 17:25:56 -07:00
Michael S. Tsirkin
834ac73d4b IB/mthca: Update HCA firmware revisions
Update the driver's list of HCA firmware revisions to make sure people
running Sinai firmware older than 1.1.0 get a message suggesting a
firmware upgrade.  Update the Arbel versions as well while we are at it.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-23 13:29:08 -07:00
Roland Dreier
5beba53230 IB/mthca: No userspace SRQs if HCA doesn't have SRQ support
Leave all SRQ methods out of the device's uverbs_cmd_mask if the
device doesn't have SRQ support (because of ancient firmware) so that
we don't allow userspace to call the driver's create_srq method.  This
fixes a userspace-triggerable oops caused by ib_uverbs_create_srq()
following the device's ->create_srq function pointer, which will be
NULL if the device doesn't support SRQs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-18 10:41:46 -07:00
Roland Dreier
a19aa5c5fd IB/mthca: Fix potential AB-BA deadlock with CQ locks
When destroying a QP, mthca locks both the QP's send CQ and receive
CQ.  However, the following scenario is perfectly valid:

    QP_a: send_cq == CQ_x, recv_cq == CQ_y
    QP_b: send_cq == CQ_y, recv_cq == CQ_x

The old mthca code simply locked send_cq and then recv_cq, which in
this case could lead to an AB-BA deadlock if QP_a and QP_b were
destroyed simultaneously.

We can fix this by changing the locking code to lock the CQ with the
lower CQ number first, which will create a consistent lock ordering.
Also, the second CQ is locked with spin_lock_nested() to tell lockdep
that we know what we're doing with the lock nesting.

This bug was found by lockdep.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-11 08:56:57 -07:00
Michael S. Tsirkin
e54b82d739 IB/mthca: Make fence flag work for send work requests
The fence bit needs to be set in the doorbell too, not just the WQE.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-10 10:50:52 -07:00
Roland Dreier
69e9fbb460 IB/mthca: Clean up mthca array index mask
Define a constant MTHCA_ARRAY_MASK to replace repeated uses of
(PAGE_SIZE / sizeof (void *) - 1) in mthca array code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-03 09:44:22 -07:00
Michael S. Tsirkin
bf74c7479e IB/mthca: Fix mthca_array_clear() thinko
mthca_array_clear() does not clear the slot if the used count is
positive. This leads to crashes in mthca_qp_event() since that uses
mthca_array_get() to check that the qp is valid.

Discovered by Ali Ayoub.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-03 09:44:21 -07:00
Roland Dreier
8fdf679fdb IB/mthca: Initialize max_cmds before debug code prints it
Read the max_cmds value from the response to the QUERY_FW command
before printing out the value, so that the real value goes into the
debug output.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 09:36:50 -07:00
Dotan Barak
1252c517cf IB/mthca: Fix SRQ limit event range check
Mem-free HCAs always keep one spare SRQ WQE, so the SRQ limit cannot
be set beyond srq->max - 1.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 07:20:32 -07:00
Michael S. Tsirkin
0964d91618 [PATCH] IB/mthca: comment fix
After recent changes, mthca_wq_init does not actually initialize the WQ as it
used to - it simply resets all index fields to their initial values.  So,
let's rename it to mthca_wq_reset.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Jack Morgenstein
2290d2c9f5 [PATCH] IB/mthca: fix static rate returned by mthca_ah_query
mthca_ah_query returs the static rate of the address handle in internal mthc
format.  fix it to use rate encoding from enum ib_rate, which is what users
expect.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Zach Brown
a46f9484f8 [PATCH] mthca: initialize send and receive queue locks separately
mthca: initialize send and receive queue locks separately

lockdep identifies a lock by the call site of its initialization.  By
initializing the send and receive queue locks in mthca_wq_init() we confuse
lockdep.  It warns that that the ordered acquiry of both locks in
mthca_modify_qp() is recursive acquiry of one lock:

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  modprobe/1192 is trying to acquire lock:
   (&wq->lock){....}, at: [<f892b4db>] mthca_modify_qp+0x60/0xa7b [ib_mthca]
  but task is already holding lock:
   (&wq->lock){....}, at: [<f892b4ce>] mthca_modify_qp+0x53/0xa7b [ib_mthca]

Initializing the locks separately in mthca_alloc_qp_common() stops the
warning and will let lockdep enforce proper ordering on paths that acquire
both locks.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-04 10:24:57 -07:00
Thomas Gleixner
dace145374 [PATCH] irq-flags: misc drivers: Use the new IRQF_ constants
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-02 13:58:50 -07:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Greg Kroah-Hartman
e29419fffc [PATCH] 64bit resource: fix up printks for resources in misc drivers
This is needed if we wish to change the size of the resource structures.

Based on an original patch from Vivek Goyal <vgoyal@in.ibm.com>

Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-27 09:23:59 -07:00
Roland Dreier
c93b6fbaa9 IB/mthca: Make all device methods truly reentrant
Documentation/infiniband/core_locking.txt says:

  All of the methods in struct ib_device exported by a low-level
  driver must be fully reentrant.  The low-level driver is required to
  perform all synchronization necessary to maintain consistency, even
  if multiple function calls using the same object are run
  simultaneously.

However, mthca's modify_qp, modify_srq and resize_cq methods are
currently not reentrant.  Add a mutex to the QP, SRQ and CQ structures
so that these calls can be properly serialized.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:41 -07:00
Roland Dreier
c9c5d9feef IB/mthca: Fix memory leak on modify_qp error paths
Some error paths after the mthca_alloc_mailbox() call in mthca_modify_qp()
just do a "return -EINVAL" without freeing the mailbox.  Convert these
returns to "goto out" to avoid leaking the mailbox storage.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:41 -07:00
Or Gerlitz
d4cb0784fd IB/mthca: Fill in max_map_per_fmr device attribute
Report the true max_map_per_fmr value from mthca_query_device(),
taking into account the change in FMR remapping introduced by the
Sinai performance optimization.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:37 -07:00
Leonid Arsh
12bbb2b7be IB/mthca: Add client reregister event generation
Change the mthca snoop of MADs that set PortInfo to check if the SM
has set the client reregister bit, and if it has, generate a client
reregister event.  If the bit is not set, just generate a LID change
event as usual.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:36 -07:00
Roland Dreier
e9cd59418f IB/mthca: Convert FW commands to use wait_for_completion_timeout()
The kernel has had wait_for_completion_timeout() for a long time now.
mthca should use it to handle FW commands timing out, instead of
implementing the same thing in a much more complicated way by using
wait_for_completion() along with a timer that does complete().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Michael S. Tsirkin
a26026c122 IB/mthca: Remove dead code
Kill some dead code in mthca_eq.c

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
Michael S. Tsirkin
4e56ea794e IB/mthca: memfree completion with error FW bug workaround
Memfree firmware is in rare cases reporting WQE index == base - 1 in
receive completion with error, instead of (rq size - 1); base is 0 in
mthca.  Here is a patch to avoid kernel crash and report a correct WR
id in this case.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:20 -07:00
Michael S. Tsirkin
13aa6ecb47 IB/mthca: restore missing PCI registers after reset
mthca does not restore the following PCI-X/PCI Express registers after reset:
  PCI-X device: PCI-X command register
  PCI-X bridge: upstream and downstream split transaction registers
  PCI Express : PCI Express device control and link control registers

This causes instability and/or bad performance on systems where one of
these registers is set to a non-default value by BIOS.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:00 -07:00
Michael S. Tsirkin
ab28b171ea IB/mthca: Fix posting lists of 256 receive requests to SRQ for Tavor
If we post a list of length exactly a multiple of 256, nreq in
doorbell gets set to 256 which is wrong: it should be encoded by 0.
This is because we only zero it out on the next WR, which may not be
there.  The solution is to ring the doorbell after posting a WQE, not
before posting the next one.

This is the same bug that we just fixed for QPs with non-shared RQ.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-24 13:43:37 -07:00
Michael S. Tsirkin
23f3bc0f2c IB/mthca: Fix posting lists of 256 receive requests for Tavor
If we post a list of length 256 exactly, nreq in doorbell gets set to
256 which is wrong: it should be encoded by 0.  This is because we
only zero it out on the next WR, which may not be there.  The solution
is to ring the doorbell after posting a WQE, not before posting the
next one.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-18 11:37:03 -07:00
Roland Dreier
1db76c14d2 IB/mthca: Make fw_cmd_doorbell default to 0
Setting fw_cmd_doorbell allows FW command to be queued using posted
writes instead of requiring polling on a "go" bit, so it should be a
performance boost.  However, the option causes problems with at least
some device/firmware combinations, so set the default to 0 until we
understand what's going on better.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 07:48:07 -07:00
Michael S. Tsirkin
ce477ae4f8 IB/mthca: FMR ioremap fix
Addresses for ioremap must be calculated off of pci_resource_start;
we can't directly use the bus address as seen by the HCA.  Fix the
code that remaps device memory for FMR access.

Based on patch by Klaus Smolin.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-10 15:16:57 -07:00
Roland Dreier
a3285aa4ee IB/mthca: Fix race in reference counting
Fix races in in destroying various objects.  If a destroy routine
waits for an object to become free by doing

	wait_event(&obj->wait, !atomic_read(&obj->refcount));
	/* now clean up and destroy the object */

and another place drops a reference to the object by doing

	if (atomic_dec_and_test(&obj->refcount))
		wake_up(&obj->wait);

then this is susceptible to a race where the wait_event() and final
freeing of the object occur between the atomic_dec_and_test() and the
wake_up().  And this is a use-after-free, since wake_up() will be
called on part of the already-freed object.

Fix this in mthca by replacing the atomic_t refcounts with plain old
integers protected by a spinlock.  This makes it possible to do the
decrement of the reference count and the wake_up() so that it appears
as a single atomic operation to the code waiting on the wait queue.

While touching this code, also simplify mthca_cq_clean(): the CQ being
cleaned cannot go away, because it still has a QP attached to it.  So
there's no reason to be paranoid and look up the CQ by number; it's
perfectly safe to use the pointer that the callers already have.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 10:50:29 -07:00
Roland Dreier
254abfd33a IB/mthca: Fix offset in query_gid method
GuidInfo records have 8 byte GUIDs in them, so an index should be
multiplied by 8 to get an offset.  mthca_query_gid() was incorrectly
multiplying by 16.

Noticed by Leonid Keller <leonid@mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 10:40:23 -07:00
Adrian Bunk
415dcd95b2 IB/mthca: make a function static
This patch makes the needlessly global mthca_update_rate() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:12 -07:00