For Maximum System Test Value, Take it to the Limit!

When we go to an automobile race such as the Indianapolis 500, watching those cars circle the track can get fairly boring. What goes unspoken is that everyone observing the race is watching for a race car to find, and sometimes exceed, a limit, producing a sudden discontinuity. The limit could be a part of the engine reaching its fatigue limit, or the speed at which the car enters a curve and the forces acting on it exceed what the tires' coefficient of friction can hold, so that the car spins or slides.

Of course no one wants to see a driver injured or killed at these events, but the race team learns the most about the capability of an engine, tires, or transmission when there is a failure at the extreme stresses of a competition.

It is also true that some of the most useful information for developing reliable electronic and electromechanical products comes from finding strength limits in Highly Accelerated Life Testing (HALT) of electronics hardware.

HALT can be a relatively boring process until an empirical operational limit, a functional discontinuity, is discovered. Fortunately, test engineers are not injured or killed discovering empirical stress limits by using HALT for electronics systems. In the case of electronics systems, only the samples tested are at risk of operational failure, and that is significant enough.

It is at the empirical operational limits, not the theoretical or specified ones, that the weaknesses that put reliability at risk are found. Using stress limits to precipitate a latent weakness into a patent, or observable, failure mode allows us to determine the relevance of that failure mode to field failures.

Additionally, observing wide differences in operational limits between samples of the same product provides evidence that some design weakness or inconsistent manufacturing process is producing significant lot-to-lot variation in the strength distribution. Without finding limits, you have no data with which to compare strength capabilities.

Discovery of variable empirical limits across multiple samples can be a good discriminator for the consistency and control of component quality and assembly processes. Wide deviation of operational limits between identical system samples is a good indicator of uncontrolled, possibly unknown, process variation that, if wide enough, will lead to failures in the intended use environment. Even if the number of units compared is not sufficient to be statistically significant, wide differences in limits are good qualitative indicators of reliability risks.
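
As a rough illustration of comparing limits across samples, the following minimal Python sketch summarizes the spread of upper thermal operating limits found for several units of the same product. The sample values, the function names, and the 10 °C threshold are hypothetical and chosen only for illustration; what counts as a "wide" difference remains a qualitative engineering judgment.

```python
from statistics import mean, stdev


def limit_spread(limits):
    """Summarize the spread of empirical operating limits across samples."""
    if len(limits) < 2:
        raise ValueError("need at least two samples to compare limits")
    lo, hi = min(limits), max(limits)
    return {
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "mean": mean(limits),
        "stdev": stdev(limits),
    }


def flag_wide_spread(limits, max_range):
    """Return True when the range of limits exceeds a chosen threshold."""
    return limit_spread(limits)["range"] > max_range


# Hypothetical upper thermal operating limits (deg C) for five samples.
upper_limits_c = [118.0, 121.0, 119.0, 96.0, 120.0]

summary = limit_spread(upper_limits_c)
print(summary)
# The assumed 10 deg C threshold is illustrative, not a standard value.
print(flag_wide_spread(upper_limits_c, max_range=10.0))
```

In this made-up data set one unit stops operating about 25 °C below the others, which is the kind of outlier that would prompt a closer look at that sample's components and assembly.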

Stress testing to arbitrary "good enough" stress levels, though they may be well beyond the end-use specifications, provides only very limited data on the product’s strength capability. Testing only to those “margins above spec”, if they are not close to the empirical stress limit, is just like watching the Indianapolis 500 with a 120 mph speed limit.

For sure there is a probability that a race car in this speed-limited type of race could have a failure, and some cars could have latent-defect failures and “lose” the race. Still, failures would likely be rare, most of the vehicles would tie for the win, and little differentiating information would be available for improving handling, durability or reliability over the competing cars.

In the reliability development of a new product we are somewhat like a person in an unfamiliar dark room. We really don’t know how big the room is until we bump into a wall, and actually several walls, to define the available usable space in the room.

In electronics testing, until we find the actual empirical limits of stress, we do not know what the actual “stress” space is that can be used to find marginal functional or material issues. In subsequent reliability testing we can use that known and observed strength to find the one or two weaknesses in the product that can be improved. Knowing the empirical strength allows us to fully capitalize on that strength to find the exceptions, outliers and variations in potential reliability much faster.

Why not find the boundaries with prototypes of an electronics system before market release? Why do so many engineers resist testing electronics systems to empirical stress limits?

Here are just some of the common reasons given not to test to limits:

  1. Product failures above component stress specifications are “foolish failures”
  2. Products in the field will never be subjected to those stress levels
  3. The product is too expensive to destroy samples

In response:

  1. All components have margins above specification, and functional margins depend heavily on a component's application in the design, not on individual component specifications. Why assume any failure is foolish before finding it? Not testing to the operational strength of the actual product leaves what could be valuable data (and ultimately money) on the table.
  2. The product may not see the instantaneous stress levels used in the tests, but the cumulative fatigue damage from lower field stresses has a high probability of failing the same weakness in the design that is found at the system's HALT operating or destruct limits.
  3. The bottom line is this: what are the costs and risks, to the company and its customers, of a new product failing?
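
The second response rests on cumulative fatigue damage. The article does not name a damage model, so the sketch below assumes one purely for illustration: Miner's linear damage rule combined with a Basquin-type S-N curve. The constants `c` and `b`, the stress values, and the cycle counts are hypothetical, not measured data.

```python
def cycles_to_failure(stress, c=1.0e12, b=4.0):
    """Assumed Basquin-type S-N curve: N = c * stress**(-b).

    c and b are hypothetical constants, not properties of a real part.
    """
    return c * stress ** (-b)


def miner_damage(blocks, c=1.0e12, b=4.0):
    """Miner's rule: sum n_i / N_i over (stress, cycles) blocks.

    Under this assumed model, failure is predicted as the sum nears 1.0.
    """
    return sum(n / cycles_to_failure(s, c, b) for s, n in blocks)


# Hypothetical loading: many cycles at a low field stress versus
# far fewer cycles at a tenfold higher test stress (arbitrary units).
field = [(10.0, 5.0e7)]
halt = [(100.0, 5.0e3)]

print(miner_damage(field))
print(miner_damage(halt))
```

Under these assumed numbers, both loadings consume about the same fraction of fatigue life, which is the sense in which a limit test can expose in a short time the same weakness that lower field stresses would eventually reach.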

Depending on the product, finding design weaknesses in a test lab almost always costs less than the warranty costs and potential lost market share that follow when many customers have failures early on. With today's social media and online reviews, knowledge about a product's reliability spreads exponentially faster than it once did.

There is a risk of catastrophic damage when doing HALT and limit tests, yet destruction is not always necessary. In digital systems, for instance, it is very difficult to destroy a system with thermal stress, because parametric shifts cause signal integrity failures, and therefore an empirical operating limit, before the hardware is damaged. Vibration, on the other hand, may eventually cause a destruct failure if applied long enough, in which case the operational limit is also a destruct limit. Many times the unit can be repaired and reused for HALT or additional testing.

The decision to use HASS (Highly Accelerated Stress Screens) to find and remove the causes of latent manufacturing defects must be based on the product's cost and the costs of failure. As with any inspection process in manufacturing, it should not be relied on to "scrape the toast after it's burned". It should be used to quickly mature the manufacturing processes and eliminate the causes of latent defects.

Just as in the title of the song by the rock group the Eagles, in testing we should “Take it to the Limit” to fully benefit from each sample of an electronics system we test. You will find it takes fewer units and less time and money to find the few elements in a design that really could impact field reliability.

Kirk Gray has over twenty-five years of experience teaching, developing, writing and directing HALT and other reliability tests for rapid reliability development. If you like this post, you can find more about HALT and HASS empirical limit testing, along with contact information, at Accelerated Reliability Solutions L.L.C.

Oleg Ivanov (Freelance):

The second problem is how we can accelerate the test uniformly for many components. The number of accelerating factors is limited, and they influence components differently. We need to calculate all this.
Oleg Ivanov (Freelance):

It is a good method for mechanical products too. There is a risk of provoking failures that cannot occur in operation. A car race remains the same operation. A change of destruction mechanism will not add new information, but it will extend development and increase product weight and cost. We need to calculate all this.
