views:

162

answers:

6

Disclaimer: I am (mostly) hardware ignorant. This is probably my problem. However I find it hard to accept that it is not possible to debug hardware so therefore I just wanted to get some second opinions.

We have an issue. Where certain actions (swapping Usb devices in and out at run-time) can blow either the Usb hub or chip on our Usb board (it's custom hardware). It's a fuzzy problem (it appears that the degree of "blownness" can vary a bit) and the problem manifests itself in intermittent fashions with various symptoms that are very difficult to reliably reproduce (typically random corruption of packets).

This results in difficulty in ascertaining if a newly reported problem is due to this hardware fault or is actually a bug in the software. We have since implemented protection on these devices but if an unprotected device is used with a protected device it has a possibility of then tainting the (now protected) device. One of the ports is also not protected meaning that someone could still "kill off" a unit that should be safe by accidentally using the wrong port.

The upshot of this is that it is impossible to tell which of our devices suffer this issue without completely replacing ALL the hardware (we've bitten the bullet for most of our production hardware but there is still a lot of dev and QA hardware out there with this issue).

I would imagine that it could be possible, given a piece of hardware that one could use some kind of hardware diagnostics tools to determine whether the kit is faulty or not. Am I living in a dream world? My hardware department tell me that the only tests that can prove the fault would be software tests... but as I have stated the symptoms are very difficult to reproduce. As I'm not that experienced with hardware I don't know if this is the only answer or not. I therefore ask the world.

+3  A: 

No it's not.

A lot of hardware manufacturers begin with hardware testing. Inputs and outputs (IO) is just a matter of evaluating where circuit flow is going. Consider the abstraction that both software and hardware deal in boolean operations.

Hardware is just a little less human readable!

Generic Prodigy
+1  A: 

When it comes down to it, hardware's line of communication is (at its most basic) HIGH and LOW through various pins.

I have a brother (in the automobile tech industry) who has used and electrometer to measure voltage on pins to isolate where the problem is (I'm not really smart enough in that field to go into more detail on how he does it).

Matt
+1  A: 

Your problem is that the only known symptom is so hard to detect (packet corruption in USB stream), that you're going to need software (at some level) to detect it.

If you can work out why packets are getting corrupted (bad voltages?) then maybe you could detect that with hardware?

Otherwise you need some kind of robust testing kit, and software to send/receive lots of packets to look for corruption?

Douglas Leeder
Yeah this is currently my only option. Take units, soak em for about 10 days on a test script and see if they fail. It is quite a consuming process due to stretched resources (as the test rooms are rarely free). I could automate it but that would take further time. This is basically plan B and i'm hoping to avoid it.
Quibblesome
+1  A: 

No. That's what oscilloscopes and logic analyzers are for. Also there is more specialized equipment such as USB testers.

Brian Carlton
+6  A: 

Built In Test Equipment is used for performing a Built In Test

BITE for BIT

(No bytes involved.)

It is completely, utterly normal for military/aerospace equipment to have extra hardware to test itself with.

The original IBM PC hard a surprising quantity of test hardware built in.

In the case of your equipment, a test device and some statistical analysis would do the trick. This could be done in hardware in a dongle, but frankly would be easier to with some software. Use two back-to-back USB to RS232 serial converters to make a USB loopback device. Send lots of data , checksum packets and measure error rates.

I'm assuming your errors occur on the in->out as well as the out-<in side.

Really, your hardware guys need to look at some application notes; USB IS hotplug-safe IF done according to the book. There is a cool example out on the net of opto-coupling a USB chip's connection to the board it's onto prevent this sort of thing. The USB chip is connected to the host, powered from the host, and the interface to the USB chip is SPI, which is opto-coupled back to the rest of the board.

As for you, the chips are failing partially. Injured devices may work fine for months then die. An electro-static discharge ("a static zap") can do the same thing that you describe. A device can be injured by shocks too small for you to feel.

The wires and features in semiconductors are microscopic, and easily damaged by stray electricity. If the hardware design is mostly right probably the liekly cause of the problems you've been experiancing is ESD when the devices are handled to plug/unplug. Your devie has it's own power supply and it's ground voltage floats relative to the other end of the USB cable, until it is connected.

Hope this helps.

Tim Williscroft
+1  A: 

The simpler the hardware is, and the more access you have to the signals, the more likely you are to be able to diagnose it in a 'purely hardware' kind of way. For example if you had a simple parallel port card plugged into a PCI slot, it would be relatively straightforward to put a bus analyzer on the PCI bus, and the adapter's output, and see if the outputs did the right thing when the card was addressed. But note you'd still need to attempt to access that card from the PCI bus, which would mean either (A) some kind of PCI bus simulation, which would be one heck of a big pile of test hardware, or (B) a cheap off-the-shelf PC with a few lines of test code.

But then at the other end of the spectrum, suppose you're dealing with a large FPGA. You can get one heck of a lot of logic into an FPGA, and you won't necessarily have access to all the test points you'd like. I've personally encountered a bug with a serial port embedded in an FPGA, where a race condition with the shift register preload register would occasionally corrupt a byte. Hypothetically the VHDL could have been reworked to bring out test points, and a pile of scopes and analyzers gathered, but from a management standpoint it was much more cost effective to try to tease the problem out with software. Under normal usage, the bug in question would have turned up once every blue moon. We iterated through speculation about the conditions that would elicit the bug, and refining the test code, until we had test software that could reproduce the bug 2-3 times a minute. At that point we could actually provide clues to the VHDL guys that helped them fix the problem quickly.

Long story short, inside of a week a hardware bug was smoked out via software, whereas starting with the same information and going 'hardware only' would likely have not been any faster, and would have required a lot of expensive test equipment. So, yeah, you probably can do it without software, but as usual it's a trade-off, and you have to find the right balance point between the amount of software vs hardware for the job.

JustJeff