Sunday 1 February 2015

Enhanced Electronic System Reliability

ABSTRACT

Using telecommunication as an example, it is argued that the electronics industry badly needs a change in attitude towards reliability thinking. The role of thermal design and reliability qualification is discussed in the context of current industrial needs for short design cycles and rapid implementation of new technologies. Current and future practices are reviewed against newly emerging reliability standards. Finally, two multi-company projects aimed at improving reliability through better temperature-related information are described.

INTRODUCTION

To introduce the reliability requirements we face in the future, we have chosen to focus on telecommunication as an example. Personal telecommunication is becoming increasingly integrated into our daily activities. Broadband mobile networks promise high-speed web access from anywhere in the world. Ubiquitous computing brings us connection to local networks and will in future connect the various sensor systems in our homes. However, the full advantage of such technology cannot be realised unless the telecommunication system is as dependable as a car. Just as turning the ignition key should produce the right engine response first time, every time, so should pressing the ‘connect’ button on a mobile phone. If the connection is lost several times a day, the system will not be a pleasure to use, and users will not be convinced that financial transactions are being processed securely. In the future, Personal Trusted Devices (PTDs) combining all the functions of a phone, organiser, secure web browser for shopping and personal finance, electronic cash, credit card, ID card, driver’s license, and keys to the car, home, and workplace will be technically possible. To gain widespread acceptance, how dependable will such a device, and the infrastructure that supports it, need to be?

Telecommunication system integrators now have to push the limits of currently known technologies to create new products. To date the computer industry has usually been the first sector to adopt new technologies, but the telecommunication industry is increasingly taking the lead, and the distinction between the two is becoming blurred. Heavy competition and short design cycles force the use of technologies before there is adequate experience of their field reliability. Increasing component power consumption and higher clock frequencies of digital circuits push designs towards smaller tolerances, and drive the demand for methods to predict technology and system reliability through simulation, augmented by accelerated laboratory tests. Even in consumer electronics, where audio and video products in particular have enjoyed relatively large design margins with respect to reliability and performance, products are being designed closer to their limits, forced by device miniaturisation and reduction in system volume. Reliability prediction has to be linked to overall risk management, providing estimates of how big a reliability risk is taken when a new technology without previous field experience is used. From this, system integrators can weigh the monetary benefits of increased sales and market share against the possible warranty and maintenance costs, as sketched below.
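
As a purely illustrative sketch of this trade-off, the expected warranty cost of adopting an unproven technology can be set against the extra margin it is expected to generate. The shipment volume, failure probabilities, repair cost and margin figures below are hypothetical assumptions, not values from the article.

```python
# Illustrative sketch only: compares the expected warranty cost of adopting a
# new, unproven technology with the extra margin it is expected to generate.
# All numbers below are hypothetical assumptions, not data from the article.

def expected_warranty_cost(units, failure_prob, cost_per_failure):
    """Expected number of in-warranty failures multiplied by the cost of each."""
    return units * failure_prob * cost_per_failure

units_shipped = 1_000_000      # assumed shipment volume
extra_margin_per_unit = 4.0    # assumed additional margin enabled by the new technology
repair_cost = 60.0             # assumed cost per warranty return

# Predicted in-warranty failure probability, e.g. from a reliability assessment,
# bracketed by optimistic and pessimistic estimates.
for p_fail in (0.002, 0.01, 0.05):
    cost = expected_warranty_cost(units_shipped, p_fail, repair_cost)
    benefit = units_shipped * extra_margin_per_unit
    print(f"p_fail={p_fail:.3f}: warranty cost {cost:,.0f}, "
          f"extra margin {benefit:,.0f}, net {benefit - cost:,.0f}")
```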

Often the use of new technologies makes a substantial contribution to the overall risk involved in the product creation process. The use of chip-scale packages and flip chips in hand-held devices such as mobile phones and palmtop computers is driven by the demand for miniaturisation. This in turn drives the changes in industry infrastructure needed to produce and assemble boards capable of supporting such high-density interconnect technologies in high volume and at low cost. Where there is no field experience in the use of a new technology and short design cycles prevent extensive testing, Physics-of-Failure (PoF) reliability prediction provides estimates of the reliability.

Currently emerging virtual prototyping and qualification tools simulate the effect of mechanical and thermomechanical stresses on reliability. For a subsystem or a single part, the reliability in a specified environment can be predicted rather accurately. However, the prediction accuracy decreases when the whole system is considered, or when the geometry, material properties, use profile or operating environment are not properly known. More work is needed in simulation tool development, tool integration, model improvement, and in capturing data on the reliability loads the system will encounter [1]. This article discusses the ‘knowledge gap’ between temperature/stress analysis and system lifetime assessment.

The technical aspects are mainly based on work performed by the CALCE Electronic Products and Systems Center at the University of Maryland [e.g. 2-4]; this article examines how those principles can be applied in practice.

1. IS COOLER ALWAYS BETTER?

Knowledge of the effects of temperature on electronic device failure has mainly been obtained through accelerated testing, during which the temperature, and in some cases the power, is substantially increased to keep the test duration manageable. This data is then correlated with actual field failures. MIL-HDBK-217, “Reliability Prediction of Electronic Equipment”, which contains failure rate models for different electronic components, is based on this correlated data. Overall reliability calculations are then performed by either ‘parts count’ or ‘part stress’ analysis (see the sketch below). The handbook has been updated many times; the final version, MIL-HDBK-217F Notice 2, was published in 1995. Although now defunct, its basic methodology is the foundation for many in-house reliability programs still in use, and has been adapted by Bellcore for telecommunications applications.
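
As a minimal sketch of the ‘parts count’ approach, the equipment failure rate is the quantity-weighted sum of generic part failure rates adjusted by quality factors. The part list, failure rates and quality factors below are hypothetical placeholders, not values taken from the handbook.

```python
# Minimal sketch of a MIL-HDBK-217-style 'parts count' calculation.
# Generic failure rates (failures per million hours) and quality factors
# below are hypothetical placeholders, not values from the handbook.

parts = [
    # (part type, quantity, generic failure rate, quality factor)
    ("microcircuit, digital", 12, 0.025, 1.0),
    ("resistor, film",        85, 0.002, 3.0),
    ("capacitor, ceramic",    60, 0.004, 3.0),
    ("connector",              4, 0.050, 2.0),
]

# Parts count: lambda_equip = sum over part types of N_i * lambda_g_i * pi_Q_i
lambda_equip = sum(n * lam_g * pi_q for _, n, lam_g, pi_q in parts)

print(f"Predicted equipment failure rate: {lambda_equip:.3f} failures per 10^6 h")
print(f"Implied MTBF: {1e6 / lambda_equip:,.0f} h")
```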

The basis of the handbook is the assumption that many of the chip-level failure mechanisms observed under accelerated test conditions are diffusion-dominated physical or chemical processes, represented by an Arrhenius-like exponential equation. This relationship is then used to predict failures under operational conditions. Doing so assumes that the failure mechanisms active under test conditions are also active during operation, and that the relationship holds at lower temperatures, giving a direct link between steady-state temperature and reliability. This is substantially incorrect, since some failure mechanisms have a temperature threshold below which they are not active, whilst others are suppressed at elevated temperatures. In the temperature range –55°C to +150°C, most of the reported failure mechanisms are not driven by high steady-state temperature; they depend instead on temperature gradients, temperature cycle magnitude, or the rate of change of temperature [5]. Considerable care is needed to ensure that the test conditions accelerate the principal failure mechanisms expected to be present during use, without suppressing or introducing others to the point where the results of the test are invalid, and to ensure that material property limits are not exceeded.
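
For reference, the Arrhenius-type relationship behind this approach is normally applied as an acceleration factor between test and use temperatures, as in the sketch below; the activation energy and temperatures shown are illustrative assumptions rather than handbook values.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, t_use_c, t_test_c):
    """Arrhenius acceleration factor between use and test temperatures.

    AF = exp( (Ea / k) * (1/T_use - 1/T_test) ), with temperatures in kelvin.
    """
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_test))

# Illustrative values only: 0.7 eV activation energy, 55 C use, 125 C test.
af = arrhenius_af(0.7, 55.0, 125.0)
print(f"Acceleration factor: {af:.1f}")
# A failure observed after t hours of testing would then be extrapolated
# to roughly af * t hours under the assumed use conditions.
```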

Straightforward application of the Arrhenius model has led to widespread misconceptions that are then followed blindly. An example is the ‘10°C rule’, which states that the life of a component doubles for every 10°C drop in steady-state temperature. Although this holds for some failure mechanisms, in reality the life of the part is more likely to depend on the number of power on/power off cycles it experiences. Even when the exact failure mechanism is known, use of the Arrhenius model involves uncertainty because it is very sensitive to the value of the activation energy used in the exponential term. The activation energy for the same failure mechanism can vary by more than a factor of two, depending on the part design, materials and fabrication processes; due to the exponential relationship, the predicted Mean-Time-To-Failure (MTTF) can then vary by a factor of 20, as illustrated below.
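
Continuing the sketch above with the same assumed temperatures, the sensitivity to activation energy can be checked numerically; the two activation energies and the test time are again illustrative assumptions rather than data from the article.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, t_use_c, t_test_c):
    """Arrhenius acceleration factor between use and test temperatures."""
    t_use, t_test = t_use_c + 273.15, t_test_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_test))

t_test_hours = 1000.0  # assumed time-to-failure observed in the accelerated test

# The same mechanism quoted with activation energies differing by a factor of 2.
for ea in (0.4, 0.8):
    af = arrhenius_af(ea, 55.0, 125.0)
    mttf = af * t_test_hours  # extrapolated MTTF under the assumed use conditions
    print(f"Ea = {ea:.1f} eV -> AF = {af:6.1f}, extrapolated MTTF {mttf:,.0f} h")
# The factor-of-2 spread in Ea produces roughly an order-of-magnitude spread
# in the predicted MTTF for these assumed temperatures.
```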
An additional fundamental difficulty in using MIL-HDBK-217-type models for new and emerging technologies and components is the lack of a broad, environmentally relevant database of test data and field failure experience. MTTF calculations for such parts are therefore based on many assumptions whose validity is not known.

In 1993 the US Army Materiel Systems Analysis Activity and CALCE began working with the IEEE Reliability Society to develop an IEEE Reliability Prediction Standard for commercial and military use. The standard is based on PoF approaches to reliability and life-cycle prediction [2]. Space restrictions prevent a lengthy description, but Figure 1 shows how the environmental stress on a system and its operational performance are combined to provide lifetime information, using data acquired from a spectrum of stakeholders.
