PhilonNet Logo     ANSYS Certified Reseller   LSTC
Design of Experiments
Silvia Poles, ES.TEC.O. Research Labs, Padova, Italy

"Design of experiments" originated around 1920 with Ronald A. Fisher, a British scientist who studied and proposed a more systematic approach to maximizing the knowledge gained from experimental data. Before this, the traditional approach was to test one factor at a time: during the experimental phase, the first factor was varied while the other factors were held constant, then the next factor was examined, and so on. With this older technique, many evaluations were usually needed to get sufficient information, which made it a time-consuming process. Fisher's approach surpassed the traditional one because it considered all variables simultaneously and changed more than one variable at a time, thus obtaining the most relevant information with minimal effort.

Since then, design of experiments has become an important methodology that maximizes the knowledge gained from experimental data through a smart positioning of points in the design space. This methodology provides a strong tool to design and analyze experiments; it eliminates redundant observations and reduces the time and resources needed to run experiments.

The intention of this article is to demonstrate particular features of modeFRONTIER that make design of experiments easy to use and implement. When using an optimization tool such as modeFRONTIER, there are at least four good reasons to apply "design of experiments" (DOE from now on):

  1. To get a good statistical understanding of the problem by identifying the sources of variation;
  2. To provide points which can be used to create a meta-model by involving a smart exploration of the design space;
  3. To provide a good starting point for an optimization algorithm, such as Simplex or Genetic Algorithm (GA);
  4. To check for robust solutions.

In general, a good distribution of points achieved through a DOE technique extracts as much information as possible from a system using as few data points as possible. Ideally, a set of points made with an appropriate DOE should have a good distribution of input parameter configurations, which equates to having a low correlation between inputs. The latter becomes obvious when we look at the left diagram in the figure below, where a case with 3 input variables is shown. Clearly, if our combinations of inputs were all in the same part of the design space, the correlation between them would be high, and we would be learning very little about the other parts of the input parameter space, and hence about those parts of the system.

[Figure: distribution of DOE points for a case with 3 input variables]
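The link between point distribution and input correlation can be illustrated with a small sketch. It is not taken from modeFRONTIER; the helper names `pearson` and `max_abs_correlation` and the two sample designs are assumptions made for illustration. It compares a 3-level full factorial on 3 inputs with 27 points that all lie close to one diagonal of the design space.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def max_abs_correlation(points):
    """Largest absolute correlation over all pairs of input columns."""
    columns = list(zip(*points))
    return max(abs(pearson(a, b))
               for a, b in itertools.combinations(columns, 2))


# Well-spread design: 3-level full factorial on 3 inputs (27 points).
levels = [0.0, 0.5, 1.0]
spread = list(itertools.product(levels, repeat=3))

# Poorly spread design (illustrative): 27 points near one diagonal of the cube.
rng = random.Random(0)
clustered = []
for _ in range(27):
    t = rng.random()
    clustered.append(tuple(t + rng.uniform(-0.05, 0.05) for _ in range(3)))

print("full factorial:", round(max_abs_correlation(spread), 3))
print("clustered     :", round(max_abs_correlation(clustered), 3))
```

The factorial design has uncorrelated inputs, while the points concentrated along the diagonal show a correlation close to 1 and leave most of the cube unexplored.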

The right choice of DOE starts with a decision on the number of experiments. For example, in a statistical analysis where each experiment costs $1000, we likely have two goals to pursue: a good set of samples and a cost-effective campaign of experiments. The problem is similar when the sample is no longer an experiment to be executed in the laboratory but a numerical analysis. Suppose, for example, that each point corresponds to a numerical analysis that lasts one day or more, which can be the case with very heavy numerical tools such as PAM-CRASH, LS-DYNA or complex CFD analyses with millions of cells. Obviously, the calculation time will depend on the hardware available and on the number of available licenses of the software used.

The objective in this case is to obtain a good set of calculated designs in order to construct a meta-model that speeds up the optimization by acting as a virtual solver. We need to use the tools available in modeFRONTIER to check the (non-)linearity of the system. If we think that we have obtained a reliable meta-model, we can use it for "virtual" optimization, that is, to obtain the best result(s). Of course, this always needs to be followed by careful validation: the combination of inputs that produced any virtual optimum should be fed back into the analysis to verify the "real" result. The situation is thus quite similar to the laboratory case, the only difference being that here the DOE technique reduces the time rather than the cost of the experiments.
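The meta-model workflow described above (sample, fit, optimize virtually, validate) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: `expensive_solver` is a hypothetical stand-in for a long-running analysis, and the meta-model is a simple parabola through three DOE points, not one of the response surface methods offered by modeFRONTIER.

```python
def expensive_solver(x):
    """Hypothetical stand-in for a long-running numerical analysis."""
    return (x - 2.0) ** 2 + 0.05 * x ** 3


def fit_quadratic(xs, ys):
    """Coefficients (a, b, c) of the parabola a*x^2 + b*x + c through 3 points."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    d01 = (y1 - y0) / (x1 - x0)        # first divided differences
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)        # second divided difference
    b = d01 - a * (x0 + x1)
    c = y0 - a * x0 ** 2 - b * x0
    return a, b, c


# 1. Evaluate the "real" solver only at the DOE points.
doe_points = [0.0, 2.0, 4.0]
responses = [expensive_solver(x) for x in doe_points]

# 2. Build the meta-model (a non-zero 'a' indicates a non-linear response).
a, b, c = fit_quadratic(doe_points, responses)


def meta_model(x):
    return a * x ** 2 + b * x + c


# 3. Virtual optimization: minimum of the parabola (valid because a > 0).
x_virtual = -b / (2.0 * a)
predicted = meta_model(x_virtual)

# 4. Validation: feed the virtual optimum back into the real solver.
verified = expensive_solver(x_virtual)

print(f"virtual optimum at x = {x_virtual:.4f}")
print(f"meta-model prediction = {predicted:.4f}")
print(f"real solver result    = {verified:.4f}")
```

The gap between the predicted and the verified value is exactly what the validation step is meant to reveal; if it is too large, more DOE points or a richer meta-model are needed.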

This article aims to stress that, before employing a search strategy, it may be useful or even essential to carry out a preliminary exploration of the design space. This might be done to provide an initial population of candidate designs or to let the user build some understanding of the behavior of the objectives and constraints before deciding which further search method to use. For this purpose, a range of reasonable ways exists for positioning a set of N points in the design space. modeFRONTIER provides all the tools for measuring the quality of a DOE in terms of statistical reliability. Moreover, modeFRONTIER offers a set of reasonable DOE methods to tackle different kinds of problems. These methods include:

  1. User-chosen set, based on the user's previous experience.
  2. Exploration DOEs (Random Sequence, Sobol, Latin Hypercube, Monte Carlo, Cross-Validation), useful for getting information about the problem. These methods eliminate subjective bias and allow good sampling of a configuration space. Exploration DOEs can serve as the starting point for a subsequent optimization process, as a database for response surface training, or for checking the response sensitivity of a candidate solution.
  3. Factorial DOEs (Full and Reduced Factorial, Cubic Face Centered, Box-Behnken and Latin Square), a large family of techniques essential for performing good statistical analyses of the problem and for studying main effects and higher-order interactions between variables.
  4. Orthogonal DOEs (Taguchi Matrix, Plackett-Burman), useful if the purpose is to identify the main effects when all the interactions are negligible, or to control noise factors.
  5. Special-purpose DOEs (D-Optimal, Constraint Satisfaction Problem), suitable for particular tasks in design planning and whenever other methods do not apply.

The figure below shows a subset of 3-dimensional DOEs. Some are clearly very structured, such as the Full Factorial or Latin Square, while others look more like a cloud of points, such as Sobol and Random Sequence. The choice of technique depends very much on the number of calculations that can be performed and on the kind of investigation to be carried out.

[Figure: a subset of 3-dimensional DOEs]
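The contrast between a structured design and a cloud of points can be reproduced with a short sketch. The function names `full_factorial` and `random_sequence` are illustrative assumptions and do not refer to the modeFRONTIER implementation; the random sequence here is plain uniform pseudo-random sampling in the unit cube.

```python
import itertools
import random


def full_factorial(levels_per_variable):
    """All combinations of the given levels: a fully structured design."""
    return list(itertools.product(*levels_per_variable))


def random_sequence(n_points, n_variables, seed=0):
    """Uniform pseudo-random points in the unit hypercube: a 'cloud' design."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(n_variables))
            for _ in range(n_points)]


# Three variables with three levels each: 3**3 = 27 runs.
levels = [[0.0, 0.5, 1.0]] * 3
structured = full_factorial(levels)

# Same budget of runs, but scattered without any grid structure.
cloud = random_sequence(len(structured), 3)

print(len(structured), "factorial points, e.g.", structured[:3])
print(len(cloud), "random points, e.g.",
      [tuple(round(v, 2) for v in p) for p in cloud[:3]])
```

The number of full factorial runs grows as levels to the power of variables, which is why the affordable number of calculations drives the choice of technique.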

DOE for statistical analysis

The original use of DOE, as planned by Fisher, refers to methods for obtaining the most relevant qualitative information from a database of experiments while performing the smallest possible number of experiments. Fisher proposed a new method for conducting experiments that eliminates redundant observations and reduces the number of tests while still providing information on the major interactions between the variables.

The DOE approach is important for determining the behavior of the objective function under examination because it can identify which factors are more important. The choice of DOE depends mainly on the type of objectives and on the number of variables involved. Usually, only linear or quadratic relations are detected. Fortunately, however, higher-order interactions are rarely important, and for most purposes it is only necessary to evaluate the main effects of each variable. This can be done with just a fraction of the runs, using only a "high" and a "low" setting for each factor and some center points when necessary.
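As a minimal sketch of how main effects are evaluated from "high" and "low" settings, the example below uses a two-level full factorial on three factors. The `response` function is a hypothetical objective invented for illustration; the main effect of a factor is computed as the mean response at its high level minus the mean response at its low level.

```python
import itertools


def response(x1, x2, x3):
    """Hypothetical objective: x1 dominates, x3 has almost no influence."""
    return 5.0 * x1 + 2.0 * x2 + 0.1 * x3 + 1.5 * x1 * x2


# Two-level full factorial: every factor takes the values -1 (low) and +1 (high).
design = list(itertools.product([-1, 1], repeat=3))
results = [response(*run) for run in design]


def main_effect(factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(design, results) if run[factor] == 1]
    low = [y for run, y in zip(design, results) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


effects = [main_effect(i) for i in range(3)]
for i, effect in enumerate(effects, start=1):
    print(f"main effect of x{i}: {effect:.2f}")
```

In this sketch the third factor has a main effect close to zero, so it would be a natural candidate to drop before the optimization phase.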

For example, a reduced (or fractional) factorial design attempts to provide reasonable coverage of the experimental space while requiring significantly fewer experiments. With reduced factorial designs, we do not create designs at all possible level combinations, only at a fraction of them. The figure below shows an example of a reduced factorial DOE on a 3-dimensional space. The set of black points represents a reduced factorial and is composed of exactly half of the total number of points. Even so, with this DOE the information related to the binary interaction between variables is kept and main effects are still visible. Hence, a fractional factorial plan makes it possible to understand the most important characteristics of the problem at hand faster.

Figure 1: 2-level full factorial (black and white points) compared with a reduced factorial (black points) on a 3-dimensional space [left]. Cubic face centered design on a 3-dimensional space; this method is equivalent to a full factorial with two levels plus the mid-points of the design space hypercube [right].
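A half fraction like the one in Figure 1 can be generated with a few lines of code. The sketch below is an assumption about how such a fraction may be built (it keeps the runs whose coded levels multiply to +1) and is not necessarily the selection rule used by modeFRONTIER.

```python
import itertools

# Two-level full factorial on 3 variables, coded as -1 (low) and +1 (high).
full = list(itertools.product([-1, 1], repeat=3))

# Half fraction: keep only the runs with x1 * x2 * x3 == +1.
half = [run for run in full if run[0] * run[1] * run[2] == 1]

# Balance check: each variable is at its high level as often as at its low level.
column_sums = [sum(run[i] for run in half) for i in range(3)]

# Orthogonality check: every pair of columns has a zero dot product.
dot_products = [sum(run[i] * run[j] for run in half)
                for i, j in itertools.combinations(range(3), 2)]

print("full factorial runs:", len(full))
print("half fraction runs :", len(half), half)
print("column sums        :", column_sums)
print("column dot products:", dot_products)
```

The fraction stays balanced and orthogonal, so main effects remain visible with half the runs. In this particular fraction x3 coincides with the product x1*x2 on every run, which is the price paid for halving the number of experiments and should be kept in mind when interpreting the estimated effects.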

Several different design of experiments techniques are available: reduced factorial, Plackett-Burman, Box-Behnken and cubic face centered, just to mention a few. These kinds of DOEs are useful for determining the most important design variables and for finding the most favorable region for the objective functions. Hence, whenever DOE is used during the optimization process, it should always be applied before the actual optimization phase, as it can help to identify the main effects, to reduce the number of variables and/or to shrink the range of variation.

Several statistical tools make modeFRONTIER a powerful instrument for analyzing experimental data: main effects plots, correlation matrices, Student charts and many others. All these tools guide the user to the real essence of the problem at hand.

DOE for robustness

The presence of uncertainty makes the traditional approaches to design optimization insufficient. Robust optimization and the related field of optimization under uncertainty are well known in economics, where the importance of controlling variability, as opposed to just optimizing the expected value, is well recognized in portfolio management. In fact, when constructing a balanced bond portfolio, one must deal with uncertainty in the future price of bonds. Robust optimization has recently started to gain attention within the engineering and scientific communities, as many real-world optimization problems in numerous disciplines and application areas contain uncertainty. This uncertainty may be due to errors in measurement or difficulties in sampling, or it may depend on future events and effects that cannot be known with certainty (e.g. uncontrollable disturbances and forecasting errors).

In many engineering design problems, the design parameters may only be known to within some tolerance or, in some cases, may be described by a probability distribution. Moreover, designing a product for a specific environmental scenario does not guarantee good performance in other environments: there is a risk associated with the chosen design, and another design may have a lower risk.
Deterministic approaches to optimization do not consider the impact of such variations, and as a result, a design solution may be very sensitive to these variations. These uncertainties should be included in the optimization procedures, so that prescribed robustness can be achieved in the optimal designs.
To select the best parameter set, we need an algorithm that explores the parameter space efficiently and chooses solutions that not only correspond to the best average but are also located on broad peaks or high flat regions of the function. Such requirements ensure that small variations in the model parameters keep the system in the high-performance region.
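The preference for broad peaks over narrow ones can be made concrete with a small sketch. The `performance` function, the noise level `sigma` and the sample size are all hypothetical choices for illustration; the sketch compares the nominal value of two candidate designs with their average performance under small Gaussian perturbations of the input.

```python
import math
import random


def performance(x):
    """Hypothetical objective: narrow tall peak at x = 1, broad lower peak at x = 4."""
    narrow = 1.0 * math.exp(-((x - 1.0) / 0.1) ** 2)
    broad = 0.8 * math.exp(-((x - 4.0) / 1.0) ** 2)
    return narrow + broad


def mean_performance(x_nominal, sigma=0.2, n_samples=2000, seed=0):
    """Average performance when the input scatters around its nominal value."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += performance(rng.gauss(x_nominal, sigma))
    return total / n_samples


for x in (1.0, 4.0):
    print(f"x = {x}: nominal = {performance(x):.3f}, "
          f"mean under uncertainty = {mean_performance(x):.3f}")
```

The design on the narrow peak wins in a deterministic comparison, but the design on the broad peak keeps a much higher average once the input is allowed to vary, which is the behavior a robust optimization looks for.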
As the input variables are uncertain, these uncertainties can be approximated in terms of a probability density function. Probability density functions can be used to generate a series of sample points, which are used to evaluate the model and create a corresponding output probability density function. To capture the diverse nature of uncertainty, different distributions can be used. For example, a normal (or Gaussian) distribution reflects a symmetric probability of the parameter lying above or below the mean value. A uniform distribution, on the other hand, represents an equal likelihood over the range of inputs. Other distributions, like the lognormal and the triangular, can be skewed, so that values on one side of the most likely value are more probable than those on the other.
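The propagation of input uncertainty through a model can be sketched with the Python standard library. The `model` function and all distribution parameters below are illustrative assumptions; the sketch draws samples from the four distributions mentioned above and summarizes the resulting output distribution by its mean and standard deviation.

```python
import random
import statistics

rng = random.Random(42)

# Input distributions (all parameters are illustrative assumptions).
samplers = {
    "normal": lambda: rng.gauss(10.0, 1.0),              # mean, std deviation
    "uniform": lambda: rng.uniform(8.0, 12.0),           # lower, upper bound
    "lognormal": lambda: rng.lognormvariate(2.3, 0.1),   # mu, sigma of log(x)
    "triangular": lambda: rng.triangular(8.0, 12.0, 9.0),  # low, high, mode
}


def model(x):
    """Hypothetical system response to the uncertain input x."""
    return x ** 2


def propagate(sampler, n=5000):
    """Sample the input, evaluate the model, summarize the output distribution."""
    outputs = [model(sampler()) for _ in range(n)]
    return statistics.mean(outputs), statistics.stdev(outputs)


results = {name: propagate(sampler) for name, sampler in samplers.items()}
for name, (mean, std) in results.items():
    print(f"{name:10s} output mean = {mean:7.2f}  output std = {std:6.2f}")
```

A histogram of the outputs would give an estimate of the output probability density function; the mean and standard deviation are used here only to keep the sketch short.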

One of the most widely used techniques for sampling from a probability distribution is Monte Carlo sampling, which is based on a pseudo-random generator: the generator approximates uniformly distributed random numbers, which are then mapped onto the prescribed distribution with its specific mean and variance. The advantage of this method is that the results can be treated with classical statistical methods (statistical estimation and inference). Unfortunately, this widely used method does not consider constraints on the input parameters and can result in large error bounds and variance. Other sampling methods have been studied in order to improve the computational efficiency of the sampling. One efficient sampling methodology is Latin hypercube sampling, which uses an optimal design scheme for placing the m points in the n-dimensional hypercube. With this method, the sample set is more representative of the population even for small sample sizes. Considering that the number of samples plays an important role in robust design optimization, it has been computed that this method can speed up design under uncertainty, being from 3 to 100 times faster than Monte Carlo techniques.
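The difference between the two sampling schemes can be seen in a basic sketch. The implementation below is a plain stratified Latin hypercube written for illustration (one point per equal-width stratum in every dimension); it is an assumption and not the optimized scheme used in modeFRONTIER. The helper `occupied_strata` counts how many of the m strata along one dimension contain at least one point.

```python
import random


def latin_hypercube(m, n, seed=0):
    """m points in [0, 1)^n with exactly one point per stratum in each dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n):
        strata = list(range(m))
        rng.shuffle(strata)                      # random pairing across dimensions
        columns.append([(s + rng.random()) / m for s in strata])
    return list(zip(*columns))


def monte_carlo(m, n, seed=0):
    """m independent uniform pseudo-random points in [0, 1)^n."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(n)) for _ in range(m)]


def occupied_strata(points, dim, m):
    """Number of the m equal-width strata along 'dim' that contain a point."""
    return len({int(p[dim] * m) for p in points})


m, n = 10, 3
lhs = latin_hypercube(m, n)
mc = monte_carlo(m, n)

for dim in range(n):
    print(f"dimension {dim}: LHS covers {occupied_strata(lhs, dim, m)} of {m} strata, "
          f"Monte Carlo covers {occupied_strata(mc, dim, m)}")
```

By construction the Latin hypercube covers every stratum of every variable, which is why a small sample can already be representative of the whole range.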

Figure 2: The contour plot on the left shows a multivariate statistical distribution of two variables, x and y. Warmer colors indicate higher values of the probability distribution, i.e. the region with the greatest probability of generating points is the red peak. A probability distribution function of more than one variable is called a joint probability function. The two charts on the right show the two marginal probabilities: in this particular example, the probability function along x is a Normal distribution centered at 0 with standard deviation equal to 1; the marginal probability function along y is a Cauchy distribution centered at -1 with scale factor equal to 1.

Both Monte Carlo and Latin hypercube sampling are available in modeFRONTIER to support the search for robust solutions.

Conclusions

The websites www.esteco.com and www.network.modefrontier.eu, the portal of the European modeFRONTIER Network, provide several examples of how to use DOE techniques.

For any questions about this article, or to request further examples or information, please email the author or info@philonnet.gr.

Silvia Poles
ES.TEC.O. - Research Labs
scientific@esteco.com




 
© PhilonNet All Rights Reserved