Why use a Design-of-Experiment (DOE) Matrix

The word Design-of-Experiment, or DOE, has become very popular among engineers. However, often times this word is misused and misunderstood. The principles of DOE are based on the capabilities and limitations of the analysis tools that will be used to process the data and determine cause and effect. Sometimes proper DOE matrix are followed but adequate analysis is not performed. Other times, a proper DOE matrix is not followed but yet we try to use analysis methods that would have required a proper DOE matrix. Even more confusing is when we use the word DOE when neither a DOE matrix or analysis method was followed. 

First, it is important to acknowledge that DOE results are intended to be analyzed using regression analysis and something called "response surface method" or RSM. These methods rely on using a proper DOE matrix to help with variable independence or orthogonality and statistical significance. 

Let's take a look at the following example. Say you have 4 variables in a process (A, B, C and D) and you want to understand their impact in your process outcome. You conduct the following study where -1, 0, and 1 represent each variable min, middle, and max value:


You may be tempted to say that you have conducted a DOE. However, the table above has not followed any proper DOE matrix design. Even with the table above, it will be difficult to understand the effect of each variable (main effects) since some variables are strongly correlated. For example, let's take a look at the relationship between Parameter A and Parameter B:


Note from the graph above that the coefficient of determination, R^2, is quite high, indicating that Parameter A and Parameter B are confounded. When parameter A goes up, Parameter B also often goes up. It will be difficult for regression analysis to decouple the effect of parameter A vs. Parameter B. If you created a table of R^2 for all the variables, it will look like this (correlation table):




In the table above, you expect to have "1" along the diagonal, but you want to minimize the non-diagonal terms (ideally they would be zero). If you are a MATLAB user, you can use the function "corrplot(X)" to graphically obtain the table above or "corrcoef(X)" to get the table in a variable. Please note that MATLAB will plot R and not R^2. You can also use the DOE Diagnostic, Evaluate design option in JMP to do the same. Other programs like R and Python have similar functionality. Some programs plot R instead of R^2 but the essence of the message is the same.

The study matrix above also present troubles if there are significant variable interactions. For example, let's take a look at the correlation between Parameter A*C Interaction vs. and Parameter A*D Interaction:




Note that the interaction between Parameter A*C and Parameter A*D are also strongly correlated. It will be difficult for the statistical tools to tell the difference between these two interaction effects. The complete correlation table (these are R^2 Values) including 2-level interactions and square terms is shown below. By exploring the table below you will see that there is a lot of variable confounding in this study.




Ideally, if you have 4 parameters at 3 Levels, you would want to do a Full Factorial (FF) DOE which will result in 3^4 = 81 runs (#Levels^#Factors = # runs). If you did this, you would have perfect decoupling and the DOE would be perfectly orthogonal.  The correlation matrix (now in absolute value of R) would look something like this with 1 along the diagonal and zero everywhere else. This design is ideal to find all the main effects and interactions as well as squared terms.




However, 81 runs is a lot of runs. You may perhaps be limited to a much smaller number of runs. Here is where you can choose some other DOE matrix designs such as a central composite design (CCD). A CCD design will look like this having a total of 26 runs.


The corresponding correlation plot will look like this. Note that only the square terms (X1^2, X2^2, X3^2, and X4^2) are confounded but not the first level interactions.



You may still be limited to a much smaller number of runs. After all, the original study only had 6 runs. Here is where one must understand the trade-offs and limitations of what you can do with such a small number of runs and large number of factors and levels. You can use something called D-optimal design to find the an optimal design with a constrain in the number of runs given that perhaps you are most interested in the main effects. For example, the table below is from a D-optimal design focusing around the main effects:







In the correlation matrix above one can see that the main effects have been decoupled as much as possible but some interactions are stilled coupled to main effects. This matrix was created to decouple the variables as much as possible given only 6 runs. 



In summary:

1. Don't use the word DOE if you have not followed proper DOE matrix designs and/or used adequate regression modeling to evaluate your results for significant cause and effect

2. Do spend some time comparing DOE matrix design tables to understand the trade-off with number of runs and variable confounding and thus analysis capabilities

3. Be aware of the modeling limitations you may have (linear, second order, main effects, interactions, main effect second order, etc.)

4. Always use your engineering judgement when performing DOE studies. Follow statistical recommendations as long as they make engineering sense for your application

Monte Carlo Analysis: Right Triangle

Here is a very simple problem. Assume you want to build a right triangle of known base (10) and height (3) dimension. However, the sigma tolerance of these dimensions are 1% of its value. Your goal is to find the variability that you will obtain for the dimension of the hypotenuse and the largest of the two complementary angles. It is straight forward to do some simple math to figure out the nominal value for the two variables of interest. This is shown in the figure below:


Ruander Right Triangle Monte Carlo




Note that the variables of interest are not linear with respect to the known variables. Therefore, it is not possible to simply add the variances of the two known dimensions. It is possible to use the principles of propagation of uncertainty to propagate the known variation through the non-linear equations. This exercise will be left to the reader. Here, I will show how you can quickly use my Excel Monte Carlo Workbook to solve this problem. First, download the free version of this workbook or if you wish, you can also purchase the non free version here.  Set up the problem with 2 input variables of normal distribution with the given mean and sigma values. The inputs will look like this: 

Monte Carlo Excel



After setting up the problem, simply  run a large enough number of simulations by entering the number of simulations in cell F6 and then click the "2. Simulate" bottom. I recommend running at least 500K simulations. For this example, I will run 1M simulations. After the simulation is completed, just type in the formula to compute the hypotenuse and the angle as shown in the first figure above. Drag this equation down for all the simulations performed. 



Finally, simply compute the average and standard deviation for the variables of interest. The results are the following:




As mentioned earlier, if you follow the principles of uncertainty propagation using partial derivatives, you will arrive at this same result (Rule No. 5 here). This is shown in the table above under the Analytical column. Using the Monte Carlo Excel you can also visualize the distribution of these two variables.



I hope this example is useful to help identify other relevant applications where a Monte Carlo Analysis can help you understand variation of non-linear problems.

Monte Carlo: Roll 2 Dice

Sometime ago I posted about the Sum of Two Dice. In that post, I explained the math behind it and I also performed a Monte Carlo Simulation in R. I would like to go back to this problem to show how my free Monte Carlo Simulation Excel file can be used in this situation.

Assuming a fair die, each die has six different possible output values (1, 2, 3, 4, 5, and 6) and the probability of any of these values should be equal (1/6). The probability function that describes this behavior is the uniform discrete probability function with a lower bound of 1 and an upper bound of 6. The reason that this is a discrete probability is that in between values that are not integers (such as 1.5) are not an option.

Using my Monte Carlo Simulation Excel file to simulate the sum of two dice is very straight forward. First, click "0. Reset"  in the Simulator sheet to make sure you are starting with a fresh and clean sheet. Indicate that the number of inputs is 2 by typing this value in cell F5. Remember, each input will represent one die. Now, click on "1. Define Inputs" and select the Uniform_Discrete probability function for both inputs with 1 as the lower bound (parameter 1) and 6 as the upper bound (parameter 2). Leave the other parameters empty. Your Simulator sheet should look like this:
Now indicate the number of simulations to perform by typing the number of simulations in cell F6. For this example, I will perform 1 million simulations knowing that this large number of simulations will likely take some time to complete so I will have to be patient. It took ~30 mins in a very slow computer on battery. Note that if I would have done 100K simulations it would have only taken about 3 minutes (time is approximately linear with number of simulations). Faster computers will finish faster.

Once the simulation is completed, there are a couple of things to do. First, in the "Y=" column, you will want to type the equation which is basically the sum of die 1 and die 2. This should look like this:


First, let's explore the output of each die. My Monte Carlo Simulation Excel file automatically outputs a histogram of your X1 variable if you have excel 2016 or higher. However, for discrete outcomes, I like making my own using the Excel Function COUNTIF. Doing this I obtained very similar outputs for both X1 and X2. Below I show the output for X1. Note that the number of occurrences for each number is about 1/6 of the total as expected.


In the same way, we can plot the histogram of our Y= X1 + X2. Note that in accordance with my previous post, the number 7 occurred approximately 16.6667% of the time. It is pretty interesting how the sum of two variables that have a uniform discrete probability function results in an output variable with a discrete triangular distribution function.



Now that we have looked into the sum of 2 dice, how about the sum of 3 dice, or 7 dice? Can you use my Monte Carlo Simulation Excel file to do this with more than 2 dice?

Monte Carlo Simulation Excel


If you are looking for a FREE excel calculator to perform Monte Carlo Simulations you have arrived at the right place. Just download my FREE Excel Monte Carlo Simulator and start doing Monte Carlo simulations right in your already familiar Excel Environment.  I will provide a short intro below, but feel free to download my Excel Monte Carlo Simulator to get started right away.

ruander.com


If you are searching the Word Wide Web for “Monte Carlo Excel” you probably already know or have a general idea of how Monte Carlo Simulations work. The user typically has one or more variables with known probability distributions as well as a transfer function (basically an equation) to translate these input variables into the desired output variable. The goal in a Monte Carlo Simulation is to randomly retrieve a value for each of the input variables according to their probability distribution. Do this over and over multiple times (1k, 100k, 500k, 1 million times, and so on) and you can now understand what your output distribution (and statistics) will look like in the limit as the number of iterations go to infinity.

Monte Carlo Simulations can be performed in a number of different software tools including Python, R, MATLAB, JMP, and many more. However, from my engineering experience, there is a broad audience of Excel users who would like to do this right within the familiar excel environment and without the need to code.  It is for this reason that I created my Free Excel Monte Carlo Simulator. This is a simple to use Excel Workbook where the user can perform Monte Carlo Simulations and use the results in any way they wish within Excel.

Disclaimer:

Every time a user runs a simulation using my Free Excel Monte Carlo Simulator, this website will open in their browser. I have done this to help support the development of tools just like this. However, if you would prefer for this not to occur, I have also included a for-purchase version here that will not open this website upon performing a simulation.

For Purchase:

Free:





Excel Gage R&R Macro

A Gage R&R is a measurement system capability study whose purpose is to assess how much variation is associated with a measurement system and to compare it to the total process variation or tolerance.  The gage variation is divided into two main groups: Repeatability and Reproducibility, thus the name Gage R&R.  Repeatability is the variation obtained by the same person using the same instrument on the same product for repeated measurements (within operator). Reproducibility is the variation obtained due to differences in people who are taking the measurements (between operators). There are multiple software tools that would run a Gage R&R analysis including QuantumXL, Minitab, JMP, MATLAB, R, and many others. However, it has been my experience that often engineers do not feel comfortable with the results due to a lack of clarity on how the variance components were computed. While it is possible to perform a Gage R&R in a tool as simple as Excel using ANOVA, these calculations are numerous and tedious. It is for this reason that I created an Excel Gage R&R Macro, which does not only compute the variance components but also shows the math used to compute them within the excel file. These calculations can be used by engineers to get more familiar with the variance component calculations.



This link will take you to my personal one drive where you can download the file "Anova_GageR&R_Rev0"