Battery Reliability and How to Improve it
Bombshells and the Meaning of Life
We have all learned from photographs of laptop fires as well as aircraft fires that Lithium ion energy cells can be potential incendiary devices. They are based on highly reactive chemistry which under certain circumstances could result in thermal runaway and serious consequential physical damage. This can be particularly dangerous with large format traction batteries. Catastrophic failures, which are not caused by abuse, are mercifully extremely rare and tend to be random events such as contamination of the active chemicals. (See Failure Modes below.) They can be characterised by a very low but fairly constant failure rate over the whole of the battery population.
Since they were first introduced, much work has been done to improve the safety of Lithium batteries, both by the adoption of safer cell chemistries and better control of the cell manufacturing process as well as by external cell protection electronics incorporated into the battery packs. But, though the safety of the cells may have been improved and the failure rates reduced, even if they never catch fire, Lithium ion battery packs can still be potential financial time bombs with fuses of indeterminate length. This is because battery failures are also defined in terms of the lifetime of the battery which is notoriously difficult to predict. (See Graph and Definition of Lifetime). The inability of a battery to fully meet its specified performance is considered a failure even if the battery is still functioning albeit with slightly decreased performance. This performance degradation is known as "wearout" and is not due to rare random events, but to the constant, gradual deterioration of the active chemicals to which all batteries are subject. The actual failure rate is not constant but varies with time depending on the nature of the chemicals used in the cells and on the operating conditions experienced by the battery over its lifetime.
Lithium Fire (UPS Flight 1307)
Source US National Transportation Safety Board
Because traction batteries may cost as much as the vehicle in which they are used, customers expect them to last for the lifetime of the vehicle. This is typically eight to ten years, which unfortunately is longer than high power Lithium technology itself has been around. To meet customer expectations, manufacturers are thus compelled to make predictions about battery performance for periods exceeding their experience. Without concrete and reliable data about how the deterioration of the chemical contents of the cell affects cell performance, predictions about future performance and battery lifetime are likely to be inaccurate. Since performance and lifetime are both affected by the operating environment and usage pattern, the possible margin of error is wider still.
Basing performance warranties on inaccurate lifetime assumptions can give rise to ruinous warranty liabilities.
Unexpectedly high product return rates or catastrophic failures during the lifetime of the battery can destroy both the pack maker’s and the customer’s reputation and business.
Manufacturers’ liabilities in many countries are defined by statute but in practice customers demand much better protection than the minimal statutory responsibilities provide and performance guarantees are negotiated between the parties concerned. In the case of batteries, there are normally two steps in the supply chain so that two contracts are involved, one between the cell maker and the pack maker and the other between the pack maker and the end user. Fundamental to both of these contracts is an agreement on the expected life of the product. Despite the importance of this issue, battery lifetimes are not well defined and are thus subject to confusion and uncertainty leading to misinformation, wishful thinking and possibly disaster. The following questions should help to clarify matters.
The Meaning of Life
What is life?
Is it calendar life or cycle life? Or is it the "characteristic life" as defined for Weibull lifetime probability predictions, that is, "the time by which 63.2% of the units fail" (see later)? Ignoring the consequences of this latter definition could result in huge warranty costs for the supplier.
When was the birthday?
Was it the day of cell formation? Or was it when the battery was first placed into service?
What is death?
Does a fatality mean sudden death in which case the unit ceases to function or does it mean wearout when the performance of functioning units is merely impaired and now outside of agreed tolerance limits?
Death and Disability?
How does the death of associated components such as those in the Battery Management System (BMS) affect the performance of the battery?
Published lifetime specifications normally show performance at nominal temperature and usage rates. What lifetime can be expected over the full range of operating conditions (temperature and rate)? If abuse is excluded, how is it defined?
Does the battery have an easy life (shallow cycles, intermittent use, comfortable conditions) or is it subject to hard labour, (deep cycles, high rates, continuous use, harsh environments)? Are there any usage restrictions?
Whole life or part life?
What is guaranteed, the cells or the battery? No impairment or no death?
What is regarded as an acceptable death rate before the specified lifetime is reached (1%, 10%)? What is the standard deviation of the times to failure?
How is ageing defined (capacity drop off, impedance growth)? How many samples were used to determine the published ageing characteristic? Are the test samples truly representative of the entire population? How many failures occurred during the cycle life testing? Are ageing curves available for batteries operating at extremes of temperature range?
Hope and experience?
Is the life expectancy greater than the period for which evidence is available? (Were life tests carried out for the full duration of the expected lifetime of the product or are lifetimes based on extrapolating test data collected over a shorter period?)
Is there life after death? Second use? Scrapped? Recycled? New low power applications?
See also Battery Life (and Death)
Problems in Estimating Battery Lifetimes
Determining battery lifetimes is beset with difficulties. Performance data are not generally available and are costly to generate since large numbers of batteries must be tested to destruction. Furthermore, the test period required to verify the predictions is often greater than the commercial decision lead time. Charge-discharge times for high capacity batteries are very long, and using accelerated life testing to shorten the tests is likely to give misleading results. Battery life depends on temperature, rate and depth of discharge, and the test conditions used to accelerate the occurrence of failures are quite likely to introduce new and unrepresentative failure modes.
Warning: While the reliability predictions in the following section are very useful, they are designed for applications involving constant controlled environments, operating conditions and loads. Many batteries however, particularly those used in automotive applications, operate under a wide variety of operating conditions with wide variations in the loads they must supply. Predicting the lifetime of these batteries is much more complicated and is treated in the section Automotive Battery Lifetime Predictions below.
The most commonly quoted measure of cell or battery life is the cycle life. It is usually defined as the number of cycles completed before the capacity falls to less than 80% of the capacity when new. Alternatively, for high power batteries, the useful cycle life is sometimes defined as the number of cycles completed before the internal impedance increases to double the value it was when new. It is important that the test conditions specify the depth of discharge for the test cycles since battery cycle life increases exponentially as the depth of discharge is reduced. See graph of Life vs DOD. Specification sheets typically show a series of cell capacity measurements over time indicating a fairly linear reduction of capacity with age or cycles completed. The lifetime is specified as the time at which the capacity line crosses the 80% mark, (10 years in the example below).
Since cell ageing is reasonably linear during the measurement period, it is tempting to extrapolate long term performance from short term results to reach early conclusions, avoiding prolonged or impractical test programmes. However, linear performance degradation with age only implies that wearout during the measurement period is due to a single ageing mechanism. Unless the tests are continued until all the test samples have failed, there is no guarantee that a second failure mode will not come into play at a later time, accelerating the ageing rate. Furthermore, the published graphs of capacity against time generally do not show the dispersion of the results, nor do they show the dependence on temperature and rate.
The diagram above also shows that basing performance warranties on a single line ageing characteristic could be very dangerous since by the target or quoted lifetime, over half of the cells will have failed.
Similar arguments apply when the specified lifetime is based on the growth of the cell's internal impedance rather than its capacity reduction.
Reasonably accurate failure predictions can be made for light bulbs and capacitors which are subject to simple, consistent failure modes and for which copious failure data are available. Unfortunately batteries are not simple devices and several failure mechanisms may exist simultaneously.
Fatalities may be due to short circuits resulting from contaminated materials, mechanical tolerance problems, burrs, dendrites, and Lithium plating. They may also be due to open circuits caused by broken welds, loose connections, or cracks. External faults such as BMS failures can also cause failures in the cells they are supposed to protect.
Latent defects in the components used in the construction of the battery or manufacturing and workmanship defects may cause early failures or "wear in" faults, more commonly called "infant mortalities". Other latent defects such as the contamination of the active materials may cause a series of random failures which could result in the sudden death of the battery rather than gradual wearout. These failures are more difficult to characterise and tend to be random in nature and thankfully occur with a very low frequency. Of particular concern is the occurrence of internal short circuits which could result in fires.
Wearout failures are due to the gradual deterioration of the cell chemicals which may be caused by the breakdown or loss of active chemicals causing a reduction in cell capacity. These failures may in turn result in a fatal condition, such as a short circuit, or they may simply cause out of tolerance performance of the cell. Wearout failures may be initiated or accelerated by the usage pattern to which the battery is subject.
The diagram below shows cell failure distributions due to a variety of failure modes. More details on failure modes can be found in the sections on Why Batteries Fail and Lithium Battery Failures.
The curves represent histograms showing the number of units failing at different lifetimes.
Wearout failures may occur over a short period or be spread over longer time or they may come into play after different periods as indicated by the three wearout distributions shown in blue in the above diagram.
Examples of wearout failures are dendrite growth, Lithium plating of the anode, loss of electrolyte due to chemical breakdown or leaks, electrolyte dry-out, dissolving of cathode material, moisture ingress due to vent failure or case seal failure, or cracks in the active materials or the cell case. Each of these wearout failures has its own characteristic distribution.
The failure distribution for the cell is the sum of the distributions of all of the contributing factors as shown in the diagram below. The top curve shows the variation of the instantaneous failure rate over time due to the combination of all the active failure mechanisms. The result is the characteristic “bathtub curve” which is typical for electronic components. The lower curve is the cumulative failure distribution corresponding to the instantaneous failure rates. The diagram also indicates a possible lifetime specification which the manufacturer may choose to apply.
Components designated for high reliability applications are often subject to “burn in” to weed out the infant mortalities. In cell manufacturing, all cells must go through one or more charge–discharge cycles as part of the formation process and this can serve the dual purpose of identifying early failures.
See more on Ageing in the section on Battery Life
To predict the cumulative number of components surviving or failing within a large population based on the failure rate of a representative sample it is useful to have a mathematical expression to calculate the probability of failures at any given time.
The following graphs show typical wearout and random reliability behaviours represented by time varying functions F(t), R(t), f(t) and h(t).
- Example 1
- An 80 cell battery built from cells with an 8 year specified life
- Failure of 1 cell will cause the battery to fail
- Typical usage 300 cycles per year with 8 hours per cycle = 2,400 hours per year
Thus the expected cell lifetime is 8 × 2,400 = 19,200 hours, or approximately 20,000 hours, and the failure rate per cell will be 1 in 20,000 hours
BUT - A typical 80 cell battery will NOT have 80 failures spread evenly over this period, equivalent to 1 failure every 240 hours (19,200 / 80), or roughly 1 per month of use
This would only occur if the failure rates were constant (typically random failures)
The 80 failures will only start to occur somewhere towards the end of the 20,000 hours when the onset of wearout failures begins to take effect and the battery will be relatively failure free until then.
The battery lifetime also depends on the definition of a failure of the battery and on its cell configuration (Series or parallel cells)
- A single cell with less than 80% capacity (failed) in a series chain will not necessarily cause the capacity of the chain to be less than 80%.
- A battery with parallel chains could keep functioning at a lower capacity if a cell in one of the parallel chains failed.
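The contrast drawn in Example 1 can be sketched numerically. The snippet below is only an illustration: it models the "random" case as a constant failure rate of 1 per 19,200 hours and the wearout case with the Weibull form introduced in the next section. The shape value of 8 is an assumption chosen for illustration and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 300 * 8               # 300 cycles/year x 8 hours/cycle = 2,400 h
CELL_LIFE_HOURS = 8 * HOURS_PER_YEAR   # 19,200 h, roughly 20,000 h
N_CELLS = 80

def fraction_failed_constant_rate(t_hours, life_hours=CELL_LIFE_HOURS):
    """Cumulative fraction failed for a constant failure rate of 1/life_hours."""
    return 1.0 - math.exp(-t_hours / life_hours)

def fraction_failed_wearout(t_hours, life_hours=CELL_LIFE_HOURS, beta=8.0):
    """Cumulative fraction failed for wearout; beta=8 is an assumed shape value."""
    return 1.0 - math.exp(-((t_hours / life_hours) ** beta))

for year in (1, 2, 4, 6, 8):
    t = year * HOURS_PER_YEAR
    random_cells = N_CELLS * fraction_failed_constant_rate(t)
    wearout_cells = N_CELLS * fraction_failed_wearout(t)
    print(f"year {year}: constant rate {random_cells:5.1f} cells, "
          f"wearout {wearout_cells:5.1f} cells")
```

Under these assumptions the constant rate model predicts failures from the first year onwards, while the wearout model predicts almost none until the last years of the specified life.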
Weibull Life Distribution Model
One problem with predicting battery lifetimes is that several failure modes exist, each with its own characteristic shape and lifetime, and this requires a different expression for each failure type. Here Dr. Waloddi Weibull, the Swedish engineer, came to the rescue. In 1939 he suggested a simple mathematical distribution, now named after him, which could represent a wide range of failure characteristics simply by changing two parameters or constants which are fairly easy to determine.
The Weibull failure distribution does not apply to every failure mechanism but it is a useful tool for analysing many of the most common reliability problems.
Cumulative Failure Distribution
For statistically independent failures of a given type the Weibull distribution is given by
$F(t) = 1 - \exp\left[-(t/\alpha)^{\beta}\right]$
Where F(t) is the cumulative percent failing after time t
α is the Characteristic life of the components (Also known as the Scale factor)
β is the Shape parameter describing the failure distribution curve
The parameters α and β are determined graphically from measured data gathered from life tests on a relatively small number of samples (see below). The expression is simply a mathematical model representing the shape of the distribution and does not imply any cause and effect.
The characteristic life is defined as the time when the cumulative failure percent of the population reaches 63.2%. It is given by making t = α in the above equation. Thus when t = α the cumulative failure percent is given by
$F(\alpha) = 1 - e^{-1} = 63.2\%$ regardless of the value of β
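As a quick check of the statement above, the two parameter cumulative distribution can be written as a short function and evaluated at t = α for several shape values. This is a minimal sketch; the function name and the sample values of α and β are illustrative assumptions.

```python
import math

def weibull_cdf(t, alpha, beta):
    """Cumulative fraction failed by time t for a two parameter Weibull."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / alpha) ** beta))

# At t = alpha the result is 1 - 1/e whatever the shape parameter.
for beta in (0.5, 1.0, 4.0, 8.0):
    print(beta, round(weibull_cdf(8.0, alpha=8.0, beta=beta), 4))  # 0.6321 each time
```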
If more than one failure mechanism exists within the population, each with different characteristics, the appropriate α and β corresponding to each failure mode must be applied separately to obtain the total failure percent. Using the shape and scale parameters developed for similar products is not justified and is likely to lead to erroneous results.
More complex Weibull distributions have been developed with more variables to allow for other factors such as γ (gamma), known as the Location factor, which represents the time delay before the effect of the failures becomes manifest.
More generally the Weibull distribution is given by:
$F(t) = 1 - \exp\left[-((t-\gamma)/\alpha)^{\beta}\right]$
Replacing the time duration t in the two parameter Weibull equation with (t-γ) effectively moves the Weibull lifetime distribution curve γ periods to the right. Unfortunately, unlike the practical determination of α and β, the determination of a reliable value for γ is much more difficult. Fortunately the location factor γ is zero for most cases since failures may start occurring immediately at time zero, so that the two parameter distribution is usually sufficient for analysing most common problems.
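The effect of the location factor can be illustrated with a small sketch. The function name is hypothetical, and the treatment of times earlier than γ (no failures at all) is the usual reading of the formula rather than something stated in the text.

```python
import math

def weibull_cdf_3p(t, alpha, beta, gamma=0.0):
    """Cumulative fraction failed with a location factor gamma.

    No failures are predicted before t = gamma; after that the curve is
    the two parameter curve shifted gamma periods to the right.
    """
    if t <= gamma:
        return 0.0
    return 1.0 - math.exp(-(((t - gamma) / alpha) ** beta))

# With gamma = 2 the value at t = 10 equals the unshifted value at t = 8.
shifted = weibull_cdf_3p(10.0, alpha=8.0, beta=4.0, gamma=2.0)
unshifted = weibull_cdf_3p(8.0, alpha=8.0, beta=4.0)
print(shifted, unshifted)
```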
Probability Density Function of Component Lifetimes
The distribution of component lifetimes within the total population is given by the time derivative of the cumulative failure distribution. Thus
$f(t) = dF(t)/dt = (\beta/\alpha^{\beta})\, t^{\beta-1} \exp\left[-(t/\alpha)^{\beta}\right]$
Examples of distributions with different shape and scale factors are given below. The probability density curves are histograms which represent the distribution of lifetimes of the components within the total population.
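The relationship between f(t) and F(t) can be checked numerically. The sketch below compares the closed form density with a finite difference derivative of the cumulative distribution; the function names, step size and sample parameters are illustrative assumptions.

```python
import math

def weibull_cdf(t, alpha, beta):
    """Cumulative fraction failed by time t (two parameter Weibull)."""
    return 1.0 - math.exp(-((t / alpha) ** beta)) if t > 0 else 0.0

def weibull_pdf(t, alpha, beta):
    """Closed form density for t > 0 (returns 0 for t <= 0 in this sketch)."""
    if t <= 0:
        return 0.0
    return (beta / alpha ** beta) * t ** (beta - 1) * math.exp(-((t / alpha) ** beta))

def numerical_derivative(func, t, step=1e-5):
    """Central difference approximation to d(func)/dt."""
    return (func(t + step) - func(t - step)) / (2 * step)

t = 6.0
closed_form = weibull_pdf(t, alpha=8.0, beta=4.0)
finite_diff = numerical_derivative(lambda x: weibull_cdf(x, 8.0, 4.0), t)
print(closed_form, finite_diff)  # the two values should agree closely
```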
Weibull Probability Density Function f(t) Descriptors
In batteries which are constructed from a series string of “N” components from the same distribution with independent failures, where the failure of one component causes the failure of the string, the shape factor of the string is the same as that of the individual components (β) and the scale factor of the string is given by
$\alpha_N = \alpha_1 / N^{1/\beta}$
For random failures (β = 1), the characteristic life of a battery with a string of N cells is 1/N times the characteristic life of the cells, or conversely, the failure rate of the battery is N times the failure rate of the individual cells.
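A short sketch can confirm the string formula by computing the string failure fraction directly from the cell survival probability. The cell life of 8 years and the string length of 80 cells are borrowed from the examples in this section; the shape values and function names are illustrative assumptions.

```python
import math

def string_characteristic_life(cell_alpha, beta, n_cells):
    """Characteristic life of a series string of identical, independent cells."""
    return cell_alpha / n_cells ** (1.0 / beta)

def string_failure_fraction(t, cell_alpha, beta, n_cells):
    """The string survives only if every cell survives: R_string = R_cell ** N."""
    cell_survival = math.exp(-((t / cell_alpha) ** beta))
    return 1.0 - cell_survival ** n_cells

for beta in (1.0, 4.0, 8.0):
    alpha_n = string_characteristic_life(8.0, beta, 80)
    check = string_failure_fraction(alpha_n, 8.0, beta, 80)
    print(f"beta={beta}: string characteristic life {alpha_n:.2f} years, "
          f"fraction failed at that time {check:.3f}")
```

For β = 1 the string life is exactly 1/80 of the cell life, as stated above. For larger β the penalty for adding cells is smaller but still substantial.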
The hazard rate h(t), also called the failure rate, is given by
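$h(t) = f(t)/R(t) = (\beta/\alpha^{\beta})\, t^{\beta-1}$ where $R(t) = 1 - F(t)$ is the fraction of the population surviving at time t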
For a constant failure rate, β = 1, the mean time between failures (MTBF) is equivalent to the characteristic life and can be deduced from the above equation.
$\beta = 1$, $\alpha = \mathrm{MTBF}$ and $\mathrm{MTBF} = 1/h$
Thus the MTBF is the reciprocal of the failure rate.
Note: The concept of MTBF only applies to constant failure rates. This does not apply to wearout failures as found in batteries (and many other things)
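The difference between a constant and a time varying failure rate can be seen by evaluating the hazard function for different shape values. This is a minimal sketch with a hypothetical function name and illustrative parameters.

```python
def weibull_hazard(t, alpha, beta):
    """Instantaneous failure rate h(t) = (beta / alpha**beta) * t**(beta - 1), t > 0."""
    return (beta / alpha ** beta) * t ** (beta - 1)

# beta = 1: the hazard is constant and equal to 1/alpha, so MTBF = alpha.
print([weibull_hazard(t, alpha=8.0, beta=1.0) for t in (1.0, 4.0, 8.0)])

# beta = 4: the hazard climbs steeply with age (wearout), so no single
# MTBF figure describes the population.
print([weibull_hazard(t, alpha=8.0, beta=4.0) for t in (1.0, 4.0, 8.0)])
```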
Weibull Probability Plot
To determine the α and β parameters for independent failures of a given type within a given population, it is necessary to conduct a life test on a small representative sample of units. The cumulative percent of the sample failing is then plotted against the time of failure, or number of cycles completed, on Weibull probability paper. (See the example opposite)
The characteristic life α of the population is defined as the time when 63.2% of the sample or population has failed and this is obtained directly from the graph.
The slope β of the graph is given by drawing a parallel line on the β scale outlined on the graph and corresponds to the shape factor of the distribution.
If the results of the sample tests do not show a distinct trend line, but instead are scattered across the chart, then the Weibull distribution is unsuitable for modeling the failure characteristics of the products concerned.
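The graphical procedure has a simple numerical equivalent. Taking logarithms twice turns the Weibull equation into a straight line, ln(-ln(1 - F)) = β ln(t) - β ln(α), so α and β can be estimated by a least squares fit. The sketch below assumes Bernard's median rank approximation, (i - 0.3)/(n + 0.4), for the plotting positions, which is a common convention but is not specified in the text. The failure times are synthetic, generated from a known distribution purely to show that the fit recovers it.

```python
import math

def median_rank(i, n):
    """Bernard's approximation for the plotting position of the i-th failure."""
    return (i - 0.3) / (n + 0.4)

def fit_weibull(failure_times):
    """Least squares fit on Weibull probability axes. Returns (alpha, beta, r_squared)."""
    times = sorted(failure_times)
    n = len(times)
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - median_rank(i, n))) for i in range(1, n + 1)]
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope of the line
    intercept = y_mean - beta * x_mean    # equals -beta * ln(alpha)
    alpha = math.exp(-intercept / beta)
    return alpha, beta, sxy ** 2 / (sxx * syy)

# Synthetic failure times lying exactly on a Weibull line (alpha=8, beta=4).
n = 10
synthetic = [8.0 * (-math.log(1.0 - median_rank(i, n))) ** (1.0 / 4.0)
             for i in range(1, n + 1)]
alpha_hat, beta_hat, r2 = fit_weibull(synthetic)
print(alpha_hat, beta_hat, r2)
```

A low value of r_squared on real data plays the same role as scattered points on the probability paper: it suggests the Weibull model is not a good description.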
Cell and Battery Failures
The reliability of a battery pack can never be as high as the quoted reliability of a single cell. The more components (cells) used in a battery, the less its reliability since the incorporation of more components creates more opportunities to fail.
Additionally, interactions between cells can also cause small production variations between the cells to be magnified resulting in over stress and an increase in failure rates resulting in premature failures. Cell balancing can reduce but not eliminate this. These failures are not independent and are not considered here.
Wearout failures are generally due to the gradual deterioration of, or reduction in, the active chemicals resulting in reduced cell capacity. Cell lifetime is defined as the age when the capacity reduction, or the increase in internal impedance, reaches a predetermined, unacceptable level.
- Example 2 - Wearout Failures (Time Varying Failure Rates)
The graphs below illustrate Weibull failure predictions for three different cell types with different characteristic lives α and shape parameters β relating to wearout of three batches of 80,000 identical cells with each batch used to construct 1000 batteries each containing 80 cells.
The parameters α and β must be determined experimentally.
The curves opposite represent the cumulative percentage of failed cells for the three types of cells with different failure characteristics. The cumulative percentage of failed cells in time t is given by
$F(t) = 1 - \exp\left[-(t/\alpha)^{\beta}\right]$
(If α and β are known, the curves can be drawn using the Weibull function provided in the Excel spreadsheet)
Note that for each cell type, the cumulative percent which have failed when the elapsed time corresponds to the characteristic life of the cell is 63.2%.
The curves could also represent 3 (or more) separate, simultaneous failure mechanisms active in the same battery. For example, one for failures due to dendrite growth, another for electrode plating and another for electrolyte breakdown. In such cases the cumulative cell failures in any year would be the sum of the failures due to each factor.
Differentiating the above function with respect to time gives the corresponding distribution of cell lifetimes shown in the curves below.
The curves opposite show the distribution of lifetimes for the above three cell types, that is, the percentage of cells failing in each period.
The blue and red curves show the distribution of lifetimes for cells with an eight year characteristic cell life (α=8).
- The blue curve indicates components with a wide spread of production tolerances (β=4) which cause a high number of early failures.
- The red curve shows that the initial failure rate can be reduced by using tighter production control to produce components with a narrower tolerance spread (β=8) but in this case the failure rate increases later and all the cells fail relatively quickly.
Note that in all cases, more than half of the cells (63%) fail before the characteristic life.
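The behaviour of the blue and red curves can be reproduced by computing the fraction of cells failing in each year for α = 8 with β = 4 and β = 8. This is a sketch only; the function names are hypothetical and the 12 year horizon is an arbitrary choice.

```python
import math

def weibull_cdf(t, alpha, beta):
    """Cumulative fraction failed by time t (two parameter Weibull)."""
    return 1.0 - math.exp(-((t / alpha) ** beta)) if t > 0 else 0.0

def failures_per_period(alpha, beta, periods):
    """Fraction of the population failing during each whole period."""
    return [weibull_cdf(k, alpha, beta) - weibull_cdf(k - 1, alpha, beta)
            for k in range(1, periods + 1)]

wide = failures_per_period(8.0, 4.0, 12)    # wide tolerance spread (blue)
tight = failures_per_period(8.0, 8.0, 12)   # narrow tolerance spread (red)

for year, (w, t) in enumerate(zip(wide, tight), start=1):
    print(f"year {year:2d}: beta=4 {100 * w:5.1f}%   beta=8 {100 * t:5.1f}%")

# Cumulative failures by the characteristic life are the same in both cases.
print(sum(wide[:8]), sum(tight[:8]))
```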
The wearout failure of a single cell will not itself result in a battery failure since all the cells in the battery, including the "failed" cell, will continue to function and the gradual effect of reduced capacity, or increased impedance, of a single cell on the battery will not show until many of the cells are below the rated capacity or above the impedance tolerance. This does not apply to fatal cell faults which will always cause a battery failure. The example below shows two scenarios of the effect of cell failures on the corresponding lifetime of batteries built from 80 cells, each with a 12 year characteristic life as in the green example above.
- The green curve shows the cell lifetime distribution.
- The grey curve shows the battery lifetime distribution resulting from the failure of the cells. It assumes that the presence of a single failed cell in a battery chain is sufficient to cause the battery to fail. The cell failures occur randomly throughout the batteries so that some batteries will have more than one failed cell (but can only fail once) while other batteries have no failed cells.
- The purple curve shows the more likely case when early cell failures due to ageing will have minimal effect on the overall battery performance. In this case there are no battery failures during the initial periods. The Weibull location factor (γ = 2) represents the delayed onset of the failures and essentially moves the lifetime distribution to the right by 2 periods.
- Resist the temptation to manipulate the assumptions, particularly the location factor, to get a desirable outcome. Use only measured, real data.
- Remember that the Weibull curves are only mathematical expressions designed to approximate the shapes of reliability distributions and do not imply "cause and effect" just as actuarial lifetime tables only tell you what the population lifetime distributions are and not how or why the specific numbers are what they are. The curves merely represent or characterise what has happened or what has been achieved. Their value is that they can be used to extrapolate the occurrence of similar events in the future, but only so long as the environmental and operating conditions do not change.
- Manufacturers generally aim for the highest reliability and the longest life for their products. It is unlikely that they would design products to meet a specific set of Weibull reliability curves, however the curves can be used as a measure of their performance.
- Accelerated lifetime testing, particularly for batteries, does not necessarily give reliable results since the usual methods of stressing the products by increasing the operating temperature or voltage can introduce new and unrepresentative ageing and failure mechanisms. Accurate reliability data can therefore only be generated by observing the cells under normal operating conditions over a prolonged period, longer than the characteristic life of the cells.
Accelerated lifetime testing, also including pressure, vibration and shock tests, is however still useful for identifying possible design weaknesses but not necessarily for estimating the time at which any failures may occur.
- Don't expect the majority of cells to survive past the characteristic life.
- In order to achieve an acceptably low number of cell failures within a specific period, the characteristic life of the cells must be much longer than the desired lifetime.
- The characteristic lifetime of high voltage batteries which of necessity use many cells is dramatically reduced below the characteristic lifetime of the constituent cells because the large number of failures associated with the larger cell population is spread over a much smaller number of batteries.
- Although every cell wearout failure does not necessarily cause a battery failure, battery lifetimes are still many times worse than lifetimes of the cells from which they are constructed.
- Extreme care must be exercised when deriving acceptable battery warranty limits based on the characteristic lifetimes quoted by cell manufacturers.
- Because of the difficulty and cost of generating basic reliability data, the best source of such information is the cell manufacturers themselves. If they make claims about their product lifetimes, they must have corroborating evidence to prove their claims. Unfortunately, unless you have a special relationship with the manufacturer, it is highly unlikely that you will have access to this information.
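The point that the cell characteristic life must be much longer than the desired battery lifetime can be made concrete with a short calculation. The sketch below takes the pessimistic assumption of the grey curve, that a single failed cell fails the battery, and asks what cell characteristic life is needed to keep battery failures below a chosen fraction. The target of 10% failures in 8 years and the shape value of 8 are illustrative assumptions, and the function names are hypothetical.

```python
import math

def battery_failure_fraction(t, cell_alpha, beta, n_cells):
    """Fraction of batteries failed by time t if any one cell failure is fatal."""
    cell_survival = math.exp(-((t / cell_alpha) ** beta))
    return 1.0 - cell_survival ** n_cells

def required_cell_alpha(target_life, max_battery_failures, beta, n_cells):
    """Cell characteristic life needed to meet the battery failure target."""
    failure_budget = -math.log(1.0 - max_battery_failures)
    return target_life * (n_cells / failure_budget) ** (1.0 / beta)

# Assumed target: no more than 10% of 80 cell batteries failed after 8 years.
alpha_needed = required_cell_alpha(8.0, 0.10, beta=8.0, n_cells=80)
print(f"required cell characteristic life: {alpha_needed:.1f} years")
print(f"check: {battery_failure_fraction(8.0, alpha_needed, 8.0, 80):.3f}")
```

Under these assumptions the required cell characteristic life is more than double the 8 year target, and it grows further for smaller values of β.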
Automotive Battery Lifetime Predictions
As noted above, Weibull reliability predictions can be very useful but they are not easily adapted for use in automotive applications which operate under a wide variety of changing operating conditions with wide variations in the loads they must supply. For such applications, a performance model must be developed to represent real-world operating conditions. In practice this is a three stage process. First the stress factors which affect battery lifetime must be identified. Then a series of experiments must be set up to measure the deterioration in battery performance caused by each of these stresses over time. Finally, once these basic relationships have been established they can be combined into a composite model representing the battery performance when subject to multiple stresses. These steps are described in more depth as follows.
Stresses Affecting Battery Performance
Batteries may be subject to a wide range of stresses, from the minor to the serious. Some of these are outlined below. Many of them may be the result of overlapping causes. For instance overcharging, aggressive driving and high ambient temperatures all affect the battery temperature which in turn affects the battery life. It may not be necessary to quantify every identified stress since the essential battery performance in the desired application could possibly be characterised by fewer than ten key parameters.
Battery Electrochemistry Characteristics and Limitations
- Calendar Life
Batteries deteriorate whether they are used or not. Time "t" and temperature "T" related irreversible chemical and physical changes to the active chemicals can cause the internal impedance to increase and the energy storage capacity to fall. These relationships are relatively easy to model mathematically since empirical evidence suggests that the increase in internal impedance follows a $t^{1/2}$ (square root of time) relationship and that the rate of deterioration doubles with every 10°C increase in temperature T (a rule of thumb derived from the Arrhenius law). See more details about Calendar Life
- Cycle Life
Usage related performance degradations are related to the number "N" of cycles completed and also to the time "t" but most of the relationships are more complex and non linear. The only practical way of characterising the battery performance is by experiment. The following are some of the factors affecting the cycle life.
- Operating the battery at very high or very low temperatures can cause irreversible chemical changes which reduce the battery life. See Lithium Batteries Cycle Life
- Subjecting the battery to very high charge and discharge rates has a similar effect. See also Charging Times and Lithium Plating
- Charging to high voltages also affects battery lifetime. See Charging Level
- Similarly keeping the battery at a very high state of charge (SOC) also has a deleterious effect on life.
- Battery lifetime also depends on the Depth of Discharge (DOD) to which it is subject and the SOC swing at various SOC levels. See more details about Depth of Discharge
- Lifetime can also be curtailed by loss of electrolyte or water ingress and subsequent reaction with the electrolyte due to leakage or the failure of seals.
See more details about Cycle Life and Lithium Battery Failures
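The two calendar life relationships described above (square root of time, and doubling for every 10°C) can be combined into a simple sketch. The reference temperature of 25°C, the coefficient k_ref and the choice to apply the temperature factor to the coefficient are all assumptions made for illustration; real constants must come from measurement.

```python
import math

def temperature_acceleration(temp_c, ref_temp_c=25.0):
    """Rule of thumb: the deterioration rate doubles for every 10 degC rise."""
    return 2.0 ** ((temp_c - ref_temp_c) / 10.0)

def impedance_growth(t_years, temp_c, k_ref=0.05, ref_temp_c=25.0):
    """Fractional impedance increase after t_years of storage at temp_c.

    k_ref (growth after one year at the reference temperature) is a
    hypothetical constant, not a measured value.
    """
    return k_ref * temperature_acceleration(temp_c, ref_temp_c) * math.sqrt(t_years)

for temp in (15.0, 25.0, 35.0, 45.0):
    print(f"{temp:4.0f} degC: {100 * impedance_growth(4.0, temp):5.1f}% after 4 years")
```

The same acceleration factor shows why a 10°C gradient across a pack matters: under this rule the warmest cells age at twice the rate of the coolest.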
The previous paragraph indicated some of the basic ageing factors inherent in the battery chemistry. From this we can see that certain external environmental and usage factors, such as those following, can be considered as ageing accelerators.
- High and very low temperatures
- High energy throughput rate (Charge and discharge rates)
- Mechanical stress or vibration which can give rise to open or short circuits or seal failures.
In addition to the ageing accelerators noted above there are some less obvious environmental factors which can influence battery life
- A temperature gradient across the battery can increase the rate of battery ageing. From Arrhenius we know that, with a 10°C difference in temperature across the battery, some cells will age at twice the rate of others giving rise to unbalanced stresses on the cells resulting in premature failure. See also Interactions Between Cells.
- High pressure or cyclic pressure changes can cause mechanical failures of the cells.
- High humidity can give rise to corrosion causing increased contact resistance at the battery terminals.
Battery ageing depends not only on how the battery is treated by the user but also on where the user lives and works.
- A battery in a typical family car may be used for less than 2 hours per day, whereas a public transport bus may be in use 18 to 20 hours per day.
- Driving styles also affect ageing. The energy throughput rate of the battery depends on the user's driving profile which may range from mild to aggressive.
- Similarly the typical journey routes taken by the user, whether predominantly rural, urban or highway, will affect the energy throughput rate and hence the wearout of the battery.
- Location is also a key factor influencing battery life. The prevailing climatic conditions where the user lives could range from a hot, arid desert to the Arctic.
Reliability Testing and Data Collection
Once the stress factors which could affect battery life have been identified, it is necessary to carry out a series of experiments to quantify their effect. This section deals with the issues involved.
Reference Point - Total Energy Throughput
Using a fixed time, or a fixed number of cycles, as a reference for comparing performance does not necessarily give comparable results since both calendar life and cycle life are affected by many, if not all, of the stress factors we are trying to quantify. This could result in some batteries being under-used during some tests while being over-used in others. One constant which should apply to all of the cycle tests is the total energy throughput of the battery during its lifetime. Defined as the nameplate cycle life multiplied by the watt-hour capacity of the battery, it provides the reference standard for testing batteries of a particular type under different operating conditions.
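The reference quantity defined above is a simple product, sketched below. The function names and the sample figures (a 24 kWh pack rated for 1000 cycles) are hypothetical and serve only to illustrate the arithmetic.

```python
def total_energy_throughput_wh(nameplate_cycle_life: int, capacity_wh: float) -> float:
    """Lifetime energy throughput: nameplate cycle life x watt-hour capacity."""
    return nameplate_cycle_life * capacity_wh


def fraction_of_life_used(energy_delivered_wh: float, nameplate_cycle_life: int,
                          capacity_wh: float) -> float:
    """Share of the reference throughput already consumed in a test."""
    return energy_delivered_wh / total_energy_throughput_wh(nameplate_cycle_life, capacity_wh)


if __name__ == "__main__":
    # Hypothetical pack: 24 kWh capacity, 1000 nameplate cycles.
    budget = total_energy_throughput_wh(1000, 24_000.0)
    print(f"reference throughput: {budget / 1e6:.1f} MWh")
    print(f"life used after 6 MWh: {fraction_of_life_used(6e6, 1000, 24_000.0):.0%}")
```

Using the same throughput budget for every test condition lets shallow-cycle and deep-cycle tests be compared on an equal footing.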
Lifetime Test Duration
The time required to verify the lifetime of a battery can be problematical. Ideally the duration of the tests should be the same as the specified lifetime of the battery, but with expected lifetimes of eight to ten years, this would impose an intolerable delay in the auto manufacturer's product design cycle. Testing for shorter periods involves making predictions and taking some risk. Fortunately there are ways of mitigating this risk.
As noted above, calendar life can be represented by two well established mathematical relationships whose constants can be determined by relatively short test periods.
The test duration required for cycle life testing is just the time needed to complete the total energy throughput and these tests can usually be completed well within the specified calendar life. This is because, in most applications, the batteries are only being used for part of the day so that it may be possible to run several consecutive test cycles per day by eliminating the idle time between the cycles in the typical usage pattern. It is however advisable to maintain a short rest period between cycles to allow the chemical transformations in the battery to stabilise between cycles.
Further reductions in test times result from the nature of tests themselves since most stress testing involves speeding up the energy throughput rate by increasing the charge and discharge rates or increasing the chemical reaction rate by increasing the operating temperature.
Overall, it may be possible to predict a ten year lifetime with just a year of testing.
Stress Factors - What to Test
For completeness, separate tests should be run for each stressful condition, such as those noted above. However, it should be possible to obtain a reasonable idea of the battery lifetime from monitoring the effect of just five, but preferably more, of the key stress contributors. The stress conditions to be measured could also include more than one stress factor if the two stresses regularly occur simultaneously, so reducing the number of tests and simplifying the subsequent construction of a representative performance model.
Tests involve cycling the battery through its charge - discharge cycle while subjecting the battery to one or more of the identified stress conditions such as:
- Deterioration during storage
- Working at high and low temperatures
- Aggressive driving cycles
- High average charge and discharge currents
- Deep depth of discharge
The battery capacity and internal impedance should be recorded after each test cycle so that graphs showing capacity fade or impedance growth against the number of completed cycles can be plotted.
Performing reliability tests is extremely costly and time consuming. The batteries are all very expensive and must be tested to destruction. It is tempting to save money by running the tests on the individual cells alone and extrapolating battery performance from the cell performance, however this will give erroneous results. To get a true representation of battery ageing or wearout, the tests must be run with complete batteries, including the associated thermal management and BMS just as they would be in the planned application.
Similarly, extrapolating the performance of the whole population of batteries from a sample of one can also give unreliable results and erroneous conclusions. For reliable predictions, testing for each stress factor should be performed on several samples.
In addition to the cost of the batteries, the cost of the comprehensive test facilities used to test them must be taken into account. This includes large environmental chambers, high capacity programmable power supplies and the energy to run them, simulated loads, data logging and a wide range of test instrumentation as well as the staff to run the facilities 24 hours per day for the duration of the tests which could be up to a year.
Safety testing adds considerably to this.
Thus the cost of testing and approving a battery for use in a passenger vehicle could run into several million dollars.
Battery Lifetime Model
Test results and Lifetime estimation
From the test results a composite ageing characteristic or lifetime model for the battery can be developed by combining all the lifetime graphs of capacity fade or internal impedance rise for both calendar and cycle life, due to each of the identified stress factors, into a single curve. The example below shows how the capacity fade results due to two stress factors, one mild and one severe, can be combined into a single graph by applying each stress factor for the estimated percentage of the battery lifetime for which it applies. The process can be repeated until all the test results have been incorporated.
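As a rough illustration of the combination step described above, the sketch below blends the fade rates measured under two stress conditions, weighting each by the share of the battery's life spent under it. It assumes that capacity fade is linear in the number of cycles, which real cells only approximate. The function names, the fade rates, the 80/20 split and the 20% end-of-life threshold are all hypothetical values chosen for illustration.

```python
def composite_fade_rate(conditions):
    """Blend per-cycle fade rates, weighted by the fraction of life spent
    under each condition.

    conditions: list of (fraction_of_lifetime, fade_percent_per_cycle).
    Assumes linear fade, a simplification not taken from the source.
    """
    total_fraction = sum(fraction for fraction, _ in conditions)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("lifetime fractions must sum to 1")
    return sum(fraction * rate for fraction, rate in conditions)


def cycles_to_end_of_life(conditions, allowed_fade_percent=20.0):
    """Cycles until the allowed capacity fade is reached (hypothetical threshold)."""
    return allowed_fade_percent / composite_fade_rate(conditions)


if __name__ == "__main__":
    # Hypothetical inputs: 80% of life under mild stress, 20% under severe stress.
    profile = [(0.8, 0.01), (0.2, 0.04)]
    print(f"composite fade: {composite_fade_rate(profile):.4f} % per cycle")
    print(f"estimated life: {cycles_to_end_of_life(profile):.0f} cycles")
```

Further stress factors can be added as extra entries in the list, provided the lifetime fractions still sum to one.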
During the extended period necessary for testing the batteries, it would be expected that the vehicles in which the batteries will be used will also undergo performance testing in a parallel series of tests. By monitoring battery performance of these field test vehicles in practical daily use and comparing the results with the model's predictions, the validity of the model can be verified or refined if necessary.
Random faults are amenable to simpler analysis. They tend to demonstrate a fairly constant failure rate, not time varying as in wearout faults.
- Example 3 - Random Failures (Constant Failure Rate)
Consider a multi-component system such as a series chain consisting of n components in which the failure of any one component results in the failure of the system.
If the probability of survival of component i after time t is $R_i(t)$ and its failure rate per hour is $\lambda_i$, the system reliability $R_{system}(t)$ is given by:
$R_{system}(t) = R_1(t) \times R_2(t) \times R_3(t) \times \dots \times R_n(t)$
For components with constant failure rates, $R_i(t) = e^{-\lambda_i t}$ and the system reliability is given by:
$R_{system}(t) = e^{-\lambda_1 t} \times e^{-\lambda_2 t} \times e^{-\lambda_3 t} \times \dots \times e^{-\lambda_n t} = e^{-(\lambda_1 + \lambda_2 + \lambda_3 + \dots + \lambda_n) t}$
Thus the system failure rate $\lambda_{system}$ is given by:
$\lambda_{system} = \lambda_1 + \lambda_2 + \lambda_3 + \dots + \lambda_n$
In a battery the components (cells) making up the series chain are identical so that:
$\lambda_1 = \lambda_2 = \lambda_3 = \dots = \lambda_n = \lambda$
and the system failure rate is given by:
$\lambda_{system} = n \times \lambda$
The system reliability is then given by:
$R_{system}(t) = e^{-n \lambda t}$
But beware - these last conclusions apply to components with constant failure rates and they ignore the effect of time varying wearout failures.
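The series-chain result can be checked numerically. The sketch below assumes constant failure rates, as in the example; the function names and the sample values (8 cells, each with a failure rate of 1e-6 per hour, over 10,000 hours) are hypothetical.

```python
import math


def series_failure_rate(failure_rates):
    """System failure rate of a series chain: the sum of the component rates."""
    return sum(failure_rates)


def series_reliability(failure_rates, hours):
    """Probability that every component in the chain survives `hours`."""
    return math.exp(-series_failure_rate(failure_rates) * hours)


if __name__ == "__main__":
    # Hypothetical example: 8 identical cells, each with 1e-6 failures per hour.
    cell_rates = [1e-6] * 8
    hours = 10_000
    # Multiplying the individual survival probabilities should give the same answer.
    product_of_cells = math.prod(math.exp(-rate * hours) for rate in cell_rates)
    print(f"system failure rate: {series_failure_rate(cell_rates):.1e} per hour")
    print(f"system reliability : {series_reliability(cell_rates, hours):.6f}")
    print(f"product of cells   : {product_of_cells:.6f}")
```

The two reliability figures agree, which is simply the statement that the exponents add when the survival probabilities are multiplied.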
Oft quoted industry references are prime examples of wishful thinking. It is claimed that the rate of “incidents” with consumer cells (18650s), which are used in their millions, is 1 in 5 million, equivalent to a failure rate of 0.00002%. Assuming this to be true, the same failure rate is then applied to large format cells used in EVs and HEVs. With an average of 80 cells per vehicle, this translates to about 1 incident or fire in 60,000 cars (5 million / 80 = 62,500) or 16 fires per year for 1 million cars on the road. This is of the same order of magnitude as engine compartment fires due to electrical or other faults in conventional cars and, while not strictly acceptable, it is considered to be tolerable for the time being.
[Figure: Engine Compartment Fire]
This completely ignores the fundamental differences between consumer cells and large format cells, and the environment in which they operate, which make such comparisons invalid. The probability of the occurrence of a short circuit due to contamination between the electrodes is quite possibly proportional to the area of the electrodes. The area of the electrodes used in 200 Ah cells commonly used in EVs will be about 100 times the area of the electrodes used in 2 Ah 18650 cells used in laptop computers. This means we could expect 1 incident in 600 cars unless much stricter process controls were implemented by the cell manufacturers. Besides this, there are most likely to be differences in the active chemical mixes used in the low and high power cells. In addition, EV batteries carry much higher currents and as we know, the Joule heating effect is proportional to the square of the current, creating increased thermal stress within the cells and, following Arrhenius Law, increasing the rate at which both the wanted and unwanted chemical reactions occur in the cells. Furthermore, on a daily basis, automotive batteries are subject to much greater temperature extremes and mechanical stresses such as vibration and shock. As already noted, using the shape and scale parameters developed for one product on other, similar products is not justified.
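The arithmetic behind these incident rates is easy to reproduce. The sketch below takes the figures quoted in the text (1 incident per 5 million cells, 80 cells per vehicle, an electrode area ratio of about 100) and assumes, as the text suggests, that incident probability scales in direct proportion to electrode area. The function name is hypothetical. The exact results, 62,500 and 625 cars per incident, correspond to the rounded figures of 60,000 and 600 used above.

```python
def cars_per_incident(cells_per_incident: float, cells_per_car: int,
                      area_ratio: float = 1.0) -> float:
    """Number of cars per expected incident.

    area_ratio scales the per-cell incident probability, assuming it is
    proportional to electrode area (an assumption, as noted in the text).
    """
    return cells_per_incident / (cells_per_car * area_ratio)


if __name__ == "__main__":
    small_cell_basis = cars_per_incident(5_000_000, 80)
    large_cell_basis = cars_per_incident(5_000_000, 80, area_ratio=100.0)
    print(f"consumer-cell rate applied directly: 1 in {small_cell_basis:,.0f} cars")
    print(f"incidents per million cars         : {1_000_000 / small_cell_basis:.0f}")
    print(f"scaled by electrode area           : 1 in {large_cell_basis:,.0f} cars")
```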
The very low failure rates due to these random defects mean that the corresponding characteristic life will be over 1000 years, making Weibull predictions impractical for such low fault levels.
Battery Safety is considered in more depth in a separate section.
Like quality, reliability has to be built in to the product as part of the design process. Beware of depending on published lifetimes or MTBFs. These data may have been determined under precisely controlled conditions which most likely will not apply directly to the application being considered. The actual thermal, mechanical, electrical and other stresses on the components used in the application could be significantly different from the operating conditions which apply to those in the published data, rendering the use of these data questionable. The object of reliability calculations should be to highlight the potential for failure and to identify failure prevention actions.
Failure Modes and Effects Analysis (FMEA)
An important tool for failure prevention is Failure Modes and Effects Analysis. This is a formalised design review process which takes place as part of the product design and qualification process. It involves using multi-disciplined teams to identify possible failures in the product and to classify the probability of occurrence of each failure and the severity of its consequences, followed by an action plan to design out potential failure modes. Carrying out a Failure Modes and Effects Analysis (FMEA) during the development process is mandatory for components such as batteries and electronic circuits used in the automotive and aerospace industries and prudent in most others.
System Reliability Improvements
The overall system reliability can be improved by adopting design and operating principles to minimise the stress on the battery.
- The obvious policy is to use the most reliable cells available.
- Carry out thorough cell qualification which is representative of the expected battery operating environment. The tests should include mechanical stress (vibration and shock) as well as abuse, temperature and electrical stress. The BMS may also be susceptible to conditions of high humidity.
- Burn in can improve cell reliability by ensuring that the infant mortalities occur in the cell or pack maker's plant and not in the customer's battery.
- In general, lower voltage designs will be more reliable than high voltage designs. This applies at the cell level and the system level.
- At the cell level, operating cells slightly below their maximum specified level reduces the stress on the cell and can significantly increase the cell life time. See the effect of reducing the cell charging cut off level.
- At the system level, reliability can be increased by reducing the system voltage but maintaining the system power by increasing the corresponding current. This allows fewer cells in the series chain but it needs cells with higher current carrying capacity or more parallel cells. With constant cell failure rates, the system failure rate is proportional to the number of cells in the series chain, so the system MTBF is inversely proportional to it.
- Another way of increasing cycle life by reducing the stress on the cells is by specifying cells with a slightly higher capacity than absolutely necessary. This small capacity reserve reduces the effective maximum operating DOD. The graph of DOD vs Cycle life shows the potential for improvement.
- Instead of large cells, use parallel strings of smaller cells. This has the following benefits.
- Smaller cells tend to be less stressed and consequently have a lower failure rate.
- Because less energy is stored in smaller cells, the energy released in case of the catastrophic failure of an individual cell will be less. Any failure will thus be easier to contain and less likely to cause fault propagation throughout the battery.
- The failure of an individual cell in a parallel configuration will not cause the failure of the whole battery which could possibly continue functioning at lower power.
- In multi-cell batteries, manufacturing tolerance spreads of cells tend to increase as the cells age causing the weaker cells to fail. Sorting the cells to be used in each battery into narrower tolerance bands before assembly can help to minimise these premature failures. See also cell balancing.
- Control the operating environment. Both high and low temperatures are cell killers. The system should incorporate thermal management with heating and cooling circuits, where necessary, to keep the cells operating within their temperature sweet spot.
- HEV batteries suffer the harshest environmental conditions but at least for temperature control there are more options since the thermal management can be combined with the vehicle's conventional engine cooling and passenger climate control systems.
- The main stress on EV batteries comes from the requirement to operate with deep discharge levels.
- Carry out regular maintenance and use the battery management system (BMS) to monitor the state of health (SOH) of each of the cells to identify any weaklings for replacement.
- Provide redundancy so that the failure of a single cell does not incapacitate the battery allowing it to continue working in emergency situations. (See - Reliability Improvement through Redundancy - next item below)
- Use parallel cell strings
- Provide standby or cycling redundancy
- Divide the battery into two or more sections, each with bypass paths which can be switched in to enable a section with a failed cell to be circumvented allowing the battery to continue to function at low power.
- See also Battery Safety
Reliability Improvement Through Redundancy
In any system constructed from independent and identical components, each with the same reliability R in a given period, if the failure of any one of the components causes the failure of the system, then (as in example 3 above) the system reliability $R_{system}$ is given by:
$R_{system} = R^n$
where n is the number of components in the system.
If however the system is designed with more components than strictly necessary to ensure normal functionality, the life of the system can be prolonged by arranging that any of the surplus components can take the place of failed components. The principle of incorporating one or more extra "back-up" components which only come into play when another component has failed is called redundancy and the extra components are called redundant components. The greater the number of redundant components, the greater the reliability improvement, enabling dramatic improvements in system reliability to be achieved. The trade-off is that the system will be more bulky, more complex and more costly.
Adding redundancy to the system needs an appropriate method of substituting redundant components in place of failed components when required. The extra redundant components needed increase the system component count to n of which any k could keep the system fully operational.
A system with n identical components of which only k are necessary for normal functionality thus has (n - k) redundant components.
The reliability of such a system with redundancy is given by:
$R_{system} = \sum_{i=k}^{n} \binom{n}{i} R^{i} (1-R)^{n-i}$
- Example 4
Consider a system built from k essential components each with a reliability R of 0.85 per unit time. (Constant failure rate)
In a system built from 4 identical components, all of which are required to deliver full functionality, k=4 and the system reliability Rsystem will be given by:
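$R_{system} = 0.85^4 = 0.5220$
With 2 redundant components added to improve the reliability, the total component count is increased to n = 6 and the system reliability is given by:
$R_{system} = \sum_{i=4}^{6} \binom{6}{i} (0.85)^{i} (0.15)^{6-i} = 15 (0.85)^4 (0.15)^2 + 6 (0.85)^5 (0.15) + (0.85)^6 \approx 0.9527$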
Thus in this case, by adding 2 redundant components, the reliability of the bare system of 4 components without redundancy will be almost doubled.
The table opposite presents the data derived from the above example in a slightly different way showing how reliability improves as the level of redundancy increases.
Starting with a total of n=6 available components, each with a reliability of 0.85, it shows the system reliability of 6 different system configurations where k is the minimum number of components necessary to keep the system functioning and the balance of the components are used to provide redundancy.
When all 6 components are required for system functionality, the system has no redundancy and the reliability is only 0.37715.
At the other extreme, a system consisting of a single component with 5 possible back-up components would have a reliability of 0.999989.
Reliability Improvement with Redundancy
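The figures quoted for this example can be reproduced with a few lines of code. The sketch below assumes independent, identical components and a system that keeps working as long as at least k of its n components survive; the function name is hypothetical.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent components survive,
    each with survival probability r."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n, r = 6, 0.85
    print("k needed | redundant | system reliability")
    for k in range(n, 0, -1):
        print(f"{k:>8} | {n - k:>9} | {k_out_of_n_reliability(k, n, r):.6f}")
```

The first and last rows correspond to the two extremes quoted in the text: no redundancy (k = 6) and a single component with five back-ups (k = 1).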
- Example 5
Consider a battery system consisting of 8 cells connected in series, each with a probability of surviving for 1000 cycles of 99%.
The probability of the series chain surviving for 1000 cycles will be $0.99^8 = 0.9227$ or 92.3%
By adding a single back-up cell, which can automatically replace, or take over the function of, any failed cell, a system with 1 for 8 redundancy is created.
The probability of this system surviving for 1000 cycles is given by:
$R_{system} = 9 \times 0.99^8 \times (1 - 0.99) + 0.99^9$
This works out to be 0.9966 or 99.7%. Thus by adding a single extra cell, the system reliability is improved from 92.3% to 99.7% which is better than the reliability of the individual cells (99.0%) making up the system.
- Example 6
More generally, in a system having 1 extra redundant component to protect n system components each with reliability R, from the above, the system reliability is given by:
$R_{system} = (n+1) \times R^n \times (1-R) + R^{(n+1)}$
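The single-spare formula can be checked against the 8 cell battery of Example 5. This is a minimal sketch, assuming independent cells and a perfectly reliable switchover; the function name is hypothetical.

```python
def one_spare_reliability(n: int, r: float) -> float:
    """Reliability of n required components protected by 1 redundant spare.

    The system survives if all n + 1 components survive, or if exactly
    one of the n + 1 fails.
    """
    return (n + 1) * r**n * (1 - r) + r ** (n + 1)


if __name__ == "__main__":
    cells, cell_reliability = 8, 0.99
    print(f"series chain, no spare : {cell_reliability ** cells:.4f}")
    print(f"with 1 for 8 redundancy: {one_spare_reliability(cells, cell_reliability):.4f}")
```

The two printed values correspond to the 92.3% and 99.7% figures quoted in Example 5.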
Because of the possibility of uneven ageing between the active cells in the battery and the redundant cells, steps must be taken to keep the redundant cells at the same State of Health (SOH) as the active cells so that, if a redundant cell is called into play, it will not unbalance the battery. (See also Cell Balancing). In principle this is simply accomplished by exchanging a different cell from the series chain with the previously redundant cell, every charge - discharge cycle, but it requires some complex electronics to implement. There is however a compensating upside to this action. The number of cycles completed by each cell will be reduced in proportion to the ratio of active cells to the total cells. As well as improving reliability, cyclic redundancy effectively increases the battery cycle life by utilising the excess capacity normally associated with the idle redundant cells to provide extra load cycles spread over the lifetime of the battery, thus sharing the load between all of the cells on a temporal basis.
In the 1 for 8 redundancy example above, cyclic redundancy would increase the battery cycle life from 1000 cycles to 1125 cycles, while none of the 9 cells in the battery would exceed its specified 1000 cycle charge - discharge limits.
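The cycle life gain from cyclic redundancy can be illustrated with a simple round-robin rotation, in which a different cell rests during each battery cycle. This is only a sketch of the bookkeeping, assuming ideal switching and identical cells; the function names and the rotation scheme are hypothetical, not a description of any particular BMS.

```python
def battery_cycle_life(cell_cycle_limit: int, active_cells: int, total_cells: int) -> float:
    """Battery cycles available when the load is shared by all cells in rotation."""
    return cell_cycle_limit * total_cells / active_cells


def cycles_per_cell(battery_cycles: int, active_cells: int, total_cells: int) -> list:
    """Count the cycles worked by each cell under round-robin rotation."""
    counts = [0] * total_cells
    spares = total_cells - active_cells
    for cycle in range(battery_cycles):
        # Cells resting during this battery cycle.
        resting = {(cycle * spares + j) % total_cells for j in range(spares)}
        for cell in range(total_cells):
            if cell not in resting:
                counts[cell] += 1
    return counts


if __name__ == "__main__":
    life = battery_cycle_life(1000, active_cells=8, total_cells=9)
    print(f"battery cycle life: {life:.0f} cycles")
    print(f"cycles per cell   : {cycles_per_cell(int(life), 8, 9)}")
```

With 8 active cells out of 9, every cell rests for one cycle in nine, so none exceeds its 1000 cycle limit over the 1125 battery cycles.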
See also Cycle life and depth of discharge (DOD).
Beware: System reliability in redundant systems can be completely compromised if the equipment or switch used to disconnect the failed component and to replace it with a back-up component is itself unreliable.
- Personal anecdote
I once had to investigate the failure of a major international communications link. Each remote repeater station was powered by a massive diesel engine charging a huge battery bank, with an identical system on standby in case of emergencies. I discovered that when a fault occurred in one of the power systems, a tiny defective relay in the battery management system failed to initiate the switchover to the expensive back-up system rendering it useless.
Emergency Battery Power Through Segregation - A Compromise Solution
For electric vehicles, a Limp Home mode, which provides emergency power in case of a single cell failure, can be implemented by dividing the battery into two sections. This allows the failed cell to be isolated, preventing it from disabling the entire battery.
A switch associated with each section enables the failed section to be bypassed allowing the battery to continue to function, but only at half power. Complete system failure only occurs if both sections fail. This solution needs two expensive heavy duty circuit breakers which are capable of switching the full battery current.
- Example 7
To demonstrate the scale of the reliability improvement using the above scheme, for simplicity we can assume a constant failure rate. (The reality will however be a time varying wearout rate)
Consider an 80 cell battery divided into two 40 cell sections A and B, each cell having an MTBF of 10,000 hours, equivalent to a failure rate of $10^{-4}$ failures/hour.
The MTBF of the 80 cell battery (A plus B) will be 10,000/80 = 125 hours and the failure rate will be $80 \times 10^{-4}$ = 0.008 failures per hour.
The MTBF of each 40 cell section (A or B) will be 10,000/40 = 250 hours and the failure rate will be $40 \times 10^{-4}$ = 0.004 failures per hour.
Because the second section effectively provides redundancy, both sections would have to fail for the battery to fail.
The probability of both sections A and B failing is given by the product of their failure probabilities (or by the equation given in example 6 above), that is 0.004 x 0.004 = 0.000016 = $1.6 \times 10^{-5}$ failures per hour.
Thus the MTBF of a 2 x 40 cell battery with redundancy = 1/0.000016 = 62,500 hours which is over 6 times better than the MTBF of the individual cells.
Battery Warranty Policy
To develop a suitable policy we need to know
- The warranty period - This is negotiated with the customer and the expectation may be longer than the current generation of battery technology has existed.
- The probability of failure - The percentage of the population which will fail for reasons other than abuse before the proposed warranty period expires.
- The magnitude of the risk event - The consequences of the failure for any reason.
- The immunity from abuse - This is provided by the BMS.
- Record of abuse - If it can be shown by data logging in the BMS that the battery has been abused, this can be grounds for voiding the warranty. Abuse is not considered a warranty issue; however, it could result in catastrophic failures and it’s no comfort to claim “It’s not my fault”. Cells should be designed to fail in a benign way as the result of abuse and battery packs should be designed to prevent abuse.
- External factors - The warranty issue is complicated very much by the possibility of battery failures caused by vehicle system malfunctions. The vehicle's on board diagnostics (OBD) should indicate whether such a failure has occurred.
Once the possible failure causes have been identified and failure rates have been estimated, the cost of honouring the agreed warranty liabilities such as repairs or replacements can be calculated. The cost of consequential damage and the damage to the supplier’s reputation caused by selling unreliable or dangerous products is incalculable.
Who is responsible when something goes wrong? -- The blame game!
Pinning the blame for battery failures on the cell maker could be very difficult. In addition to possible abuse by the end user, the design of the battery pack as well as the design of the product in which the battery is used both have an influence on battery performance and lifetime.
The usage of the battery needs to be closely monitored and component purchasing contracts need to be drawn up very carefully to tie down the responsibilities for failures where they belong.
Battery Warranty Contractual Issues
Warranty conditions must be negotiated between five interested parties each with conflicting objectives.
- Customers require lifetime performance guarantees in order to justify their purchase of an expensive battery. They expect the warranty to cover the lifetime of the packs, not the cells, and they want to use the cells in a relatively uncontrolled environment. (The BMS can provide a degree of control at the expense of performance flexibility). So long as the battery has not been abused, the customer's risk is minimal, usually only inconvenience, since they are usually protected by consumer law which holds the seller responsible. (See Retailers below)
- Cell Manufacturers are only prepared to provide limited performance guarantees for the cells. They set strict limits on the acceptable cell usage and operating environment. The guaranteed cell lifetime performance will most likely be less than the performance they quote in their specification sheets, or only applicable under strictly controlled operating conditions. There is always a risk that the cell supplier will not honour the guarantee.
- Pack Makers are in the middle and subject to claims from the manufacturer of the product in which the cells are used. By putting the cells into a battery pack the responsibility for lifetime performance is unavoidably muddied since the cells' operating conditions are determined by the pack design, possibly voiding the cell maker's guarantees.
Because of the mismatch between what the customer wants and what the cell supplier is prepared to guarantee, the pack maker is often obliged to assume responsibility for the uncovered risk between the two parties.
- OEM Systems Providers such as automotive manufacturers who incorporate the batteries into their designs muddy the water even more. The OEM determines the battery's overall operating environment. Furthermore, as noted above, malfunctions in the vehicle's ancillary systems could cause failure of the battery.
The OEMs' warranty risks are the greatest. They must accept liability for the overall system performance and bear the total warranty costs. (See Retailers following). To protect themselves, they need to identify the root cause of the failure and determine who was responsible for it so that they can attempt to recover their costs from the relevant party. Was it the user, the cell maker, the pack maker or was the failure due to malfunctions in the OEM's own systems?
- Retailers usually have the first responsibility for a battery which fails within the guarantee period even though they do not add to, or modify, the product in any way. In most countries the legal responsibility for honouring the warranty of a product lies with the organisation which sold the product to the end user or customer, usually a retailer in a distribution chain, and this responsibility includes all the components within the product. This is because the contractual agreement is between the seller and the customer. Customers cannot be expected to know who all the component suppliers are, nor can they be expected to pursue the manufacturer of failed components for a replacement or damages since they have no contractual arrangement with the component manufacturers.
In the same way, OEM suppliers will be expected to take responsibility for the products they deliver to the retailers and the retailer's distribution contract will reflect this back to back arrangement. The retailer is thus held harmless by the OEM's contractual obligations.
Such is Life