Why use a Design-of-Experiment (DOE) Matrix

The term Design-of-Experiment, or DOE, has become very popular among engineers. However, the term is often misused and misunderstood. The principles of DOE are based on the capabilities and limitations of the analysis tools that will be used to process the data and determine cause and effect. Sometimes a proper DOE matrix is followed but adequate analysis is not performed. Other times, a proper DOE matrix is not followed, yet we try to use analysis methods that require one. Even more confusing is when we use the word DOE when neither a DOE matrix nor a DOE analysis method was followed.

First, it is important to acknowledge that DOE results are intended to be analyzed using regression analysis and the "response surface method", or RSM. These methods rely on a proper DOE matrix to keep the variables independent (orthogonal) and to support statistical significance.

Let's take a look at the following example. Say you have 4 variables in a process (A, B, C, and D) and you want to understand their impact on your process outcome. You conduct the following study, where -1, 0, and 1 represent each variable's minimum, middle, and maximum value:


You may be tempted to say that you have conducted a DOE. However, the table above does not follow any proper DOE matrix design. With this table, it will be difficult to understand even the effect of each individual variable (the main effects), since some variables are strongly correlated. For example, let's take a look at the relationship between Parameter A and Parameter B:


Note from the graph above that the coefficient of determination, R^2, is quite high, indicating that Parameter A and Parameter B are confounded. When Parameter A goes up, Parameter B often goes up too. It will be difficult for regression analysis to decouple the effect of Parameter A from that of Parameter B. If you created a table of R^2 for all pairs of variables, it would look like this (correlation table):




In the table above, you expect to have "1" along the diagonal, but you want to minimize the off-diagonal terms (ideally they would be zero). If you are a MATLAB user, you can use the function "corrplot(X)" to obtain the table graphically or "corrcoef(X)" to get the table in a variable. Please note that MATLAB reports R and not R^2. You can also use the DOE Diagnostic, Evaluate Design option in JMP to do the same. Other tools such as R and Python have similar functionality. Some of them also report R instead of R^2, but the essence of the message is the same.
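For Python users, the correlation table can be sketched with NumPy's `corrcoef`. The 6-run matrix below is a made-up stand-in, not the study table from this post, and the function name `r_squared_table` is only illustrative.

```python
import numpy as np

def r_squared_table(X):
    """Return the table of pairwise R^2 values between the columns of X."""
    R = np.corrcoef(X, rowvar=False)  # Pearson R; each column is one variable
    return R ** 2

# Hypothetical 6-run study with 4 factors (A, B, C, D) coded as -1, 0, 1.
# These numbers are invented for illustration only.
X = np.array([
    [-1, -1,  0,  1],
    [-1,  0, -1,  0],
    [ 0,  0,  1, -1],
    [ 0,  1,  0,  1],
    [ 1,  1, -1,  0],
    [ 1,  1,  1, -1],
], dtype=float)

R2 = r_squared_table(X)
np.set_printoptions(precision=2, suppress=True)
print(R2)  # 1 on the diagonal; large off-diagonal values flag confounding
```

In this made-up matrix, columns A and B rise together, so their off-diagonal R^2 entry comes out high, which is the same symptom described above.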

The study matrix above also presents problems if there are significant variable interactions. For example, let's take a look at the correlation between the Parameter A*C interaction and the Parameter A*D interaction:




Note that the Parameter A*C and Parameter A*D interactions are also strongly correlated. It will be difficult for the statistical tools to tell the difference between these two interaction effects. The complete correlation table (these are R^2 values), including two-factor interactions and squared terms, is shown below. By exploring the table, you will see that there is a lot of variable confounding in this study.
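One way to build such an extended correlation table is to add a column for every two-factor interaction and squared term before computing R^2. The sketch below assumes a made-up 6-run matrix (not the study table from this post); `model_terms` and the 0.5 threshold are illustrative choices.

```python
from itertools import combinations
import numpy as np

def model_terms(X, names):
    """Return (columns, labels) for main effects, two-factor interactions, and squares."""
    n = X.shape[1]
    cols = [X[:, i] for i in range(n)]
    labels = list(names)
    for i, j in combinations(range(n), 2):
        cols.append(X[:, i] * X[:, j])
        labels.append(f"{names[i]}*{names[j]}")
    for i in range(n):
        cols.append(X[:, i] ** 2)
        labels.append(f"{names[i]}^2")
    return np.column_stack(cols), labels

# Hypothetical 6-run study, invented for illustration only.
X = np.array([
    [-1, -1,  0,  1],
    [-1,  0, -1,  0],
    [ 0,  0,  1, -1],
    [ 0,  1,  0,  1],
    [ 1,  1, -1,  0],
    [ 1,  1,  1, -1],
], dtype=float)

terms, labels = model_terms(X, "ABCD")
R2 = np.corrcoef(terms, rowvar=False) ** 2

# List the term pairs whose R^2 exceeds an arbitrary threshold of 0.5.
confounded = [(labels[i], labels[j], round(float(R2[i, j]), 2))
              for i, j in combinations(range(len(labels)), 2)
              if R2[i, j] > 0.5]
for pair in confounded:
    print(pair)
```

With only 6 runs and 14 model terms, many pairs are expected to show up in this list, since there are far more terms than runs.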




Ideally, if you have 4 parameters at 3 levels, you would want to run a Full Factorial (FF) DOE, which results in 3^4 = 81 runs (# runs = #Levels^#Factors). If you did this, you would have perfect decoupling and the DOE would be perfectly orthogonal. The correlation matrix (now in absolute value of R) would look something like this, with 1 along the diagonal and zero everywhere else. This design is ideal for finding all the main effects and interactions as well as the squared terms.
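The run count and the orthogonality claim can be checked numerically. The following is a minimal sketch that enumerates the 3^4 full factorial with `itertools.product` and computes the correlation between all main effects, two-factor interactions, and squared terms; the variable names are illustrative.

```python
from itertools import combinations, product
import numpy as np

levels = (-1, 0, 1)
n_factors = 4

# Every combination of 3 levels across 4 factors.
design = np.array(list(product(levels, repeat=n_factors)), dtype=float)
print(design.shape)  # (81, 4)

# Main effects, two-factor interactions, and squared terms.
cols = [design[:, i] for i in range(n_factors)]
cols += [design[:, i] * design[:, j] for i, j in combinations(range(n_factors), 2)]
cols += [design[:, i] ** 2 for i in range(n_factors)]

R = np.corrcoef(np.column_stack(cols), rowvar=False)
off_diag = R[~np.eye(R.shape[0], dtype=bool)]
print(np.abs(off_diag).max())  # zero up to floating-point rounding
```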




However, 81 runs is a lot. You may be limited to a much smaller number of runs. This is where you can choose another DOE matrix design, such as a central composite design (CCD). A CCD for this case looks like this, with a total of 26 runs:


The corresponding correlation plot looks like this. Note that only the squared terms (X1^2, X2^2, X3^2, and X4^2) are confounded with one another; the two-factor interactions are not.
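A CCD of this size can be sketched as 16 factorial points plus 8 axial points plus center points. The post only states the total of 26 runs, so the two center points and the face-centered axial distance (`alpha = 1`, which keeps the levels at -1, 0, and 1) used below are assumptions made for illustration.

```python
from itertools import combinations, product
import numpy as np

k = 4
alpha = 1.0   # assumption: face-centered CCD, so levels stay at -1, 0, 1
n_center = 2  # assumption: chosen so the total comes to 26 runs

factorial = np.array(list(product((-1.0, 1.0), repeat=k)))  # 2^4 = 16 runs
axial = np.vstack([sign * alpha * np.eye(k)[i]
                   for i in range(k) for sign in (-1.0, 1.0)])  # 2*4 = 8 runs
center = np.zeros((n_center, k))
ccd = np.vstack([factorial, axial, center])
print(ccd.shape)  # (26, 4)

# Columns 0-3: main effects, 4-9: two-factor interactions, 10-13: squares.
inter = np.column_stack([ccd[:, i] * ccd[:, j]
                         for i, j in combinations(range(k), 2)])
squares = ccd ** 2
R = np.abs(np.corrcoef(np.column_stack([ccd, inter, squares]), rowvar=False))

np.set_printoptions(precision=2, suppress=True)
print(R[10:, 10:])  # the squared terms correlate with each other
```

Under these assumptions, the main-effect and interaction columns are mutually uncorrelated, and the only non-zero off-diagonal entries are between the squared terms.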



You may still be limited to a much smaller number of runs. After all, the original study had only 6 runs. This is where you must understand the trade-offs and limitations of what you can do with such a small number of runs and a large number of factors and levels. You can use a D-optimal design to find an optimal design under a constraint on the number of runs, given that perhaps you are most interested in the main effects. For example, the table below is from a D-optimal design focused on the main effects:







In the correlation matrix above, one can see that the main effects have been decoupled as much as possible, but some interactions are still coupled to main effects. This matrix was created to decouple the variables as much as possible given only 6 runs.
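To give a feel for how such a design can be found, here is a minimal sketch of a coordinate-exchange search that maximizes det(X'X) for a main-effects model (intercept plus A, B, C, D) with 6 runs. It is a simple heuristic written for illustration, not the algorithm used by JMP or any other package, and it is not how the table above was generated. The function names, number of random starts, and seed are all assumptions.

```python
import numpy as np

LEVELS = (-1.0, 0.0, 1.0)

def d_criterion(design):
    """det(X'X) for a main-effects model with an intercept column."""
    M = np.column_stack([np.ones(len(design)), design])
    return np.linalg.det(M.T @ M)

def coordinate_exchange(n_runs=6, n_factors=4, n_starts=20, seed=0):
    """Heuristic search: change one cell at a time, keep changes that raise det(X'X)."""
    rng = np.random.default_rng(seed)
    best, best_det = None, -np.inf
    for _ in range(n_starts):
        design = rng.choice(LEVELS, size=(n_runs, n_factors))
        current = d_criterion(design)
        improved = True
        while improved:
            improved = False
            for i in range(n_runs):
                for j in range(n_factors):
                    keep = design[i, j]
                    for level in LEVELS:
                        design[i, j] = level
                        trial = d_criterion(design)
                        if trial > current + 1e-9:
                            current, keep, improved = trial, level, True
                    design[i, j] = keep  # retain the best level found for this cell
        if current > best_det:
            best, best_det = design.copy(), current
    return best, best_det

design, det_value = coordinate_exchange()
print(design)
print(det_value)
print(np.abs(np.corrcoef(design, rowvar=False)).round(2))  # main-effect |R|
```

Because the criterion here only includes main effects, the search does nothing to protect against interactions being coupled to main effects, which matches the trade-off described above.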



In summary:

1. Don't use the word DOE if you have not followed a proper DOE matrix design and/or used adequate regression modeling to evaluate your results for significant cause and effect.

2. Do spend some time comparing DOE matrix designs to understand the trade-off between the number of runs and variable confounding, and thus the analysis capabilities.

3. Be aware of the modeling limitations you may have (linear, second order, main effects, interactions, main effect second order, etc.).

4. Always use your engineering judgement when performing DOE studies. Follow statistical recommendations as long as they make engineering sense for your application.
