Matrix calculations in Excel
Contents
IntroductionThis tutorial is devoted to Excel which is a popular environment for calculations, analysis, and graphic representation of data. Our goal is to present main Excel features used for data processing. The main attention is concentrated on operation with multivariate data and matrix calculations. A specials section is devoted to the Addins programs which extend Excel possibilities. This text is not a comprehensive Excel manual but only a short introduction. The detailed Excel features are explained elsewhere in numerous books and articles. Information presented in the tutorial refers to the basic properties of Excel 2003 and also Excel 2007. This tutorial is written mainly for Excel 2003 users. Features essential for Excel 2007 are specially underlined in the text. Text is accompanied by the examples presented in workbook Excel.xls. Important information on Excel.xls application Links to the examples are embedded as Excel objects in the tutorial. 1. Basic information1.1. Regional and Language settingsExcel software may be used in different countries and languages. Regional and Language settings change the software appearance, names and syntax of standard worksheet functions. For example, recently, in Russian version, the comma (,) was used as a delimiter for the decimal part of a number, therefore the semicolon (;) played the role of list separator (particularly for the function arguments). Luckily, this is becoming the thing of the past and in scientific calculations the point (.) is widely used as the fraction separator.
Fig. 1 Customizing regional options One can use Fig. 1) for changing and button for changing data appearance (see.Names of standard Excel functions depend on the Language settings chosen for the computer. For example, in English version the summation function is =SUM(A1:A9). The same function in the Russian version appears as follows =СУММ(A1:A9). The list of the English and local function names is presented in file FUNCS.XLS that is ordinary located in folder here. File Excel_Functions.xls contains the names of all Excel function in 16 languages. . is the name of the Excel version, e.g. , and is the number of the local version, e.g. for Russian, and for English. Details may be readIn this tutorial we use the English version of Excel 2003 with the point as the fraction separator and the comma for the list delimiter. 1.2. Workbook, worksheet, and cellAn Excel file with extension XLS (XLSX in Excel 2007) is called a workbook. When starting an Excel program, .e.g. by clicking an icon , on a desktop, a new workbook will be opened.
Fig. 2 New Excel workbook If a workbook exists, you can open it using by clicking on the file icon.
Fig. 3 Opening an Excel workbook with the help of Explorer A workbook consists of several worksheets. Worksheet names are shown in the bottom of the open window. One can delete, add, and rename worksheets. For this purpose rightclick on the Sheet name and select an operation in the popup menu.
Fig. 4 Manipulations with worksheets A standard worksheet name is Sheet1, but you can give it any name you like, e.g. Data. Various objects can be inserted in the sheet, e.g. charts, pictures, etc Each worksheet consists of cells. These cells form a table with 256 columns and 65536 rows (In Excel 2007 there are 16384 columns and 1048576 rows). Rows are labeled numerically as 1, 2, 3…, and columns are labeled alphabetically as A, B, …,Z, AA, AB .., etc till column IV (In Excel 2007 till XFD). This reference style is called . Another reference style is called , in which the columns are also labeled numerically. Latter style is used rarely and we will not employ it below. Details can be read here. Columns and rows can be deleted, added, and hided. It is also possible to change columns and rows height and width. A cell can contain different content such as number, text, or formula. For visualization purposes a cell can be formatted by changing font, color, border, and etc. All Excel operations are executed with the help of the Menu items located at the top of the window. (Fig. 2). Excel 2007 Menu essentially differs from previous versions. A special ribbon substitute habitual icons. We will not explore these differences in details and redirect an interested reader to these instructions. 1.3. AddressingIn Excel, each cell has it own address comprised of the corresponding row and columns headings. For example, the address of the first cell in a worksheet is A1. The cell located at the intersection of the third column and fifth row has address C5. Active cell is indicated by the bold frame and its address is displayed in the window (see Fig. 2). To copy the content a cell (e.g. cell A1) into another cell (e.g. cell F1), the formula = address should be employed in the destination cell (e.g. = A1). Addressing (referencing) can be absolute, relative, or mixed. For example, the first cell has absolute address $A$1, relative address A1, and two mixed addresses $A1 and A$1. Differences in the addressing types are manifested when the cell formula is copied or moved into another cell. The following example illustrates this issue.
Fig. 5 Absolute and relative addressing Fragment of the worksheet with data highlighted in yellow is presented in the top panel of Fig. 5. Different types of references to cell A1 (orange cell) are presented in the green regions (column F and row 6). Reference type is indicated near each cell. Let's copy (one by one) the cells from the green region and past them in columns G and H and rows 7 and 8 (Fig. 5, middle panel). One can see that the result depends on the reference method. The reference to cell A1 is not changed for the absolute reference. For relative addressing, the reference shifts right and down keeping the relative distance between the cells. For mixed addressing, the results depend on the location of the cell with reference and on the fixed (invariant) part of the reference indicated by symbol $. Bottom panel shows the results after formula replications. For addressing a cell in another worksheet of the same workbook, the reference should include the worksheet name, e.g. Data!B2. Exclamation point (!) separates a worksheet name from a cell address. If a worksheet name includes blank, the name should be enclosed in the single quotes, e.g. 'Raw Spectra'!C6. For referencing the cell in another workbook, the workbook name is enclosed in the brackets, e.g. [Other.xls]Results!P24. More about the methods of addressing is written here. 1.4. RangeA matrix occupies a range (i.e. a set of cells) in a worksheet. Fig. 6 illustrates the matrix with 9 rows (from row 2 till row 10) and 3 columns (from B till D). Fig. 6 A range on a worksheet The addresses of the upper left and lower right cells jointed by the semicolon are used for a matrix referencing. For example B2:D10 or $B$2:$D$10. Often it is useful to name a matrix range while operating on the matrices. There are two ways to do this. The most straightforward way is to select the area in a worksheet, then click into the Fig. 2), delete the current address and input a name, e.g. Data (Fig. 6). The other way is to use the menu. window (seeA range name may be global, i.e. accessible from any worksheet in a book, or local. A local name is accessible only for the given worksheet. In the latter case the name should be defined in the following form SheetName!RangeName. See details here 1.5. Simple calculationsFor performing calculations Excel uses different formulas. A formula is started with the equal sign (=) and may include references, operators, constants, and functions. Operators provided the simple arithmetic calculations are presented in Fig. 7
h
Fig. 7 Simple arithmetic operations 1.6. FunctionsA function is a standard formula, which operates using given values, called arguments .Examples of some useful functions are presented in Fig. 8.
Fig. 8 Simple functions A function consists of a name and a list of arguments embraced in parentheses and separated by commas (or by another list separator). For example, function in Fig. 9 calculates the value of cumulative (cumulative=TRUE), standard (mean=0, standard_dev =1) normal distribution for the argument value given in cell A1. Fig. 9 Entering the function by means of There are different ways for entering a formula. The most straightforward way is to type a formula in the Fig. 2). Beforehand it is necessary to make the formula window visible with the help of the menu item. This method is suitable in case you remember the function syntax. A formula may be entered very quickly as arguments are inputted by clicking the cells with the corresponding values. window (see.The second method helps when we do not remember exactly a function name and/or the list of arguments. In this case one can use button Fig. 2). This button opens dialog window (Fig. 10) with the list of accessible functions. (
Fig. 10 Entering function by means of Insert Function, first step As soon as a specific function is selected, the second dialog box appears. This box displays the list of the function arguments.
Fig. 11 Entering specific function by means of Insert Function, second step Details may be studied here. 1.7. Some important functionsExcel includes numerous standard worksheet functions, which cannot be outlined here. Only functions that are repeatedly used in chemometric applications will be considered. SUM Summarizes all values in the argument list, or in the range, and returns the sum. Syntax: SUM(number1 [,number2] [,...]) Fig. 12 Function SUM Returns the sum of the squares of the arguments, or the cells in the range. Syntax: SUMSQ(number1 [,number2] [,...]) Fig. 13 Function SUMSQ SUMPRODUCT Pairwise multiplies the corresponding components in the given ranges, and returns the sum of those products. Syntax: SUMPRODUCT (array1, array2, ...) Fig. 14 Function SUMPRODUCT Returns the arithmetic mean of the arguments, or the cell values in the range. Syntax: AVERAGE(number1 [,number2] [,...]) Fig. 15 Function AVERAGE VAR Estimates variance based on a sample . Syntax: VAR(number1 ,number2, ...) Fig. 16 Function VAR Estimates standard deviation based on a sample . Syntax: STDEV(number1 ,number2, ...) Fig. 17 Function STDEV CORREL Returns the correlation coefficient between the cell ranges array1 and array2. Syntax: CORREL(array1, array2) Fig. 18 Function CORREL Functions may be combined in one formula, see example in Fig. 19. Fig. 19 Composite formula 1.8. Errors in formulasErrors may occur on typing a formula and further in the course of a worksheet modification. In such a case the cell does not contain an expected result but the special symbols indicating an error of different type. Description of error types is presented in Fig. 20. Fig. 20 Errors in formulas Menu item in the menu helps to reveal the source of an error.Details regarding different errors may be read here. 1.9. Dragging formulasOften, while working with matrices, there is a necessity to input not one but a range of formulae. For example, performing SNV preprocessing of spectral data it is necessary to calculate the mean and standard deviation values for each row. It would be tiresome to input the same formula repeatedly with changes only in one argument even for a rather small example as in Fig. 22. In a real dataset the number of rows may be about several thousands. Fortunately Excel has a special dragging mechanism for such operations. Let us explain the dragging technique with the following example. We start with typing formula in cell J3.
Fig. 21 Fill handle Formulas to the adjacent cells may be entered with the help of the fill handle, a small black point in the bottom righthand corner of the selection. This handle changes to the black cross on cursor hit. Afterwards the active cell may be dragged to the adjustment region. Dragging direction may be vertical (down), as in Fig. 22, and horizontal (right). Fig. 22 Formula dragging A formula may be duplicated in the other way. On the first step we copy the cell with a formula. Then we select a range of the cells where this formula should be entered. Afterwards we use operation with option selected.Fig. 23 Duplicating a set of similar formulas It may be seen that regardless of the duplicate method we yield a proper references in the function arguments. This is due to utilizing the relative references, B3:I3, in the formula arguments. Details may be read here. 1.10. Create a chartThe charts of different types can be created in Excel. Two types are of the most importance for use. They are the scatter and line plots. Fig. 24 presents an example of the scatter. Fig. 24 Scatter chart Such charts are used for presenting of the scores plots, predicted versus measured dependences, and etc. There is an essential difference between the Scatter and Line charts. The Scatter chart has two peer value axes. In the Line chart xaxis is used only as the category axes in which just the order of categories is displayed but not the scaled numeric values. Therefore the line charts are suitable for presenting dependences on the number of principal components, e.g. diagrams of RMSEC and RMSEP versus model complexity. Chart creating is different in versions Excel 2003 and 2007. We will not stop on this issue and leave it for the selfinstruction study. 2. Matrix operations2.1. Array formulasMany matrix operations are performed with the help of special, socalled array functions. The result of calculation of such a function is not a single value (number) but a set of numbers, e.g. array, even it has a single value. Array formulas are created in the same way as any other Excel formula with one difference. To enter an array formula it is necessary to press for its completion.Let us explain the application of array formulas with a simple example. Suppose we need to perform autoscaling (columnwise centering and scaling) for the data in matrix X. For this purpose the mean values m_{j} and the standard deviation values s_{j} should be calculated for each column j of matrix X. Afterwards, we should subtract value m_{j} from each column and divide the result by s_{j}
The autoscaling procedure may be performed by ordinary (not array) Excel functions as it is shown in Fig. 25 . Fig. 25 Ordinary formulas In this case it is important to mind symbol $ before the rows with number 9 (m) and 11 (s), to fix the position of the corresponding values in the formulas, For large X matrix it is convenient to apply the array formula. (Fig. 26). Let us name by X, m and s the corresponding ranges in the worksheet. At first we select the empty rangy N3:R7 that has the size of anticipated result. Then we enter formula =(Xm)/s in the . Finally we terminate operation by pressing . If all these actions were performed properly, we yield { =(Xm)/s} in the . Braces, {}, indicate the array formula. Fig. 26 Array formula 2.2. Create and edit array formulasTo enter an array formula properly it is necessary to select a region, which size corresponds to the size of the expected result. In case the selected area is larger, the redundant cells are filled with error symbol #N/A after calculation. If the selected region is smaller, the part of the result values is missed. After selection of the resulting region, a formula is placed in the and the whole operation is terminated by pressing . Alternatively, you may input a formula in one cell, then select the whole region with the cell containing the formula as the first in the area (right and down), switch to the , and press .For changing an array formula, it is necessary to select the whole range with the formula result and switch to the . Braces that frame the array formula will disappear. After that one can correct the formula and press to confirm the result.For extending the output region of an array formula, it is sufficient to select a new region, then go to the #N/A symbols. First of all it is recommended to select one cell from the resulting region, switch to the , and copy the formula pressing . Then you should clear the old resulting region and select a smaller one. Afterwards you should switch to the , paste the formula pressing , and terminate with . , and press . It is a little bit more complicated to reduce the resulting region, i.e. to delete extra cells that contain theChanges in the cells inside the resulting region are forbidden. On attempting to change the content of such a cell the following warning box is displayed .
Fig. 27 Warning regarding the forbidden operations with array formula 2.3. Simplest matrix operationsIt is easy to sum up matrices or multiply matrix by a number with the help of array functions in Excel. Fig. 28 Matrices addition and multiplication by a number For matrices multiplication array function MMULT is used. 2.4. Access to the part of a matrixTwo standard worksheet functions are useful for access and manipulation with a part of a matrix. Returns a reference to a region that is specified by number of rows and columns from a cell or a range of cells. Syntax OFFSET(reference, rows, cols [, height] [, width])
Remarks
Fig. 29 Function OFFSET OFFSET is an array function and must be terminated by . Returns the values in an array selected by indices of rows and columns. Syntax INDEX (reference [, row_num] [, column_num])
Remarks
Fig. 30 Function INDEX INDEX is an array function which must be terminated by . 2.5. Unary operationsThe following unary operations can be performed on matrices. Returns the inverse matrix. Syntax MINVERSE (array)
Remarks
Fig. 31 Function MINVERSE MINVERSE is an array function and must be terminated by . Returns the transposed matrix. Syntax TRANSPOSE (array)
Fig. 32 Function TRANSPOSE TRANSPOSE is an array function and must be terminated by . Returns the matrix determinant. Syntax MDETERM (array)
Remarks
MDETERM is not an array function and must be entered by pressing 2.6. Binary operationsThe following binary operations can be performed on matrices. Return the results of multiplication of two matrices. Syntax MMULT (array1, array2)
Remarks
Fig. 33 Function MMULT MMULT is an array function and must be terminated by . 2.7. RegressionSeveral standard worksheet functions involve linear regression . Builds the linear regression y=b+m_{1} x_{1}+…+m_{J} x_{J}+e Approximates known response values of vector known_y's for given predictor matrix known_x's and returns response values for given matrix new_x's. Syntax TREND(known_y's [,known_x's] [,new_x's] [,const])
Remarks
Fig. 34 Function TREND TREND is an array function and must be terminated by . LINEST Calculates statistics for the linear regression y=b+m_{1} x_{1}+…+m_{J} x_{J}+e Syntax LINEST(known_y's [,known_x's] [,new_x's] [,const] [,stats])
Fig. 35 Table provided by LINEST function m_{J}, …, m_{2}, m_{1} and b are the estimates of the regression parameters; s_{J}, …, s_{2}, s_{1} and s_{b} are the standard errors for the regression parameters estimates; R^{2} is the coefficient of determination; s_{y} is the standard error for the y estimate; F is the Fstatistics; DoF is the number of the degrees of freedom; SS_{reg} is the regression sum of squares; SS_{res} is the residual sum of squares. Remarks
Fig. 36 Function LINEST LINEST is an array function and must be terminated by . 2.8. Critical bug in Excel 2003Functions TERND and LINEST in Excel 2003 provide wrong results in special circumstances. Wrong results are yielded when simultaneously
Fig. 37 demonstrates such a case. Mean values for all columns of X_{c} equal zero but mean value for vector Y_{c} does not equal zero. Fig. 37 Bug in regression functions in Excel 2003 The situation may be corrected by a special trick. You can apply TREND function to the centered response values and afterwards correct the result. For this purpose the following formula =TREND(Ycym, Xc)+ym, is used (see the same figure). Strange that this bug was not mentioned by users; however in version Excel 2007 this error is corrected. 2.9. Virtual arrayOften in the course of data processing there is a problem of storing the intermediate computations, which are not important by themselves, but have to be calculated for yielding the final result. For example, residuals in the PCA decomposition are rarely analyzed per se, but used for calculation of the explained variance, orthogonal distances, and etc. At the same time, such intermediate arrays may be very large and must be calculated for the various numbers of principal components. They cause flooding the worksheets by the unnecessary, intermediate information. One can avoid this situation applying virtual arrays. Let us explain this with a simple example. Fig. 38 Example of application of a virtual array Suppose that we have matrix A and we want to calculate determinate of matrix A^{t}A. Fig. 38 demonstrated two methods of calculations. The first method uses a sequence of intermediate calculations indicated by the red arrows. The second method uses only one formula indicated by green arrow. Both methods provide the same final result, but the first one occupies a lot of space in the worksheet, and the latter one uses several virtual intermediate arrays. All these virtual arrays are the same which are calculated by the first method, but they are not output explicitly. The first virtual array is the transposed matrix A^{t} calculated by function TRANSPOSE(A). The second virtual array is the result of multiplication of the first virtual array and matrix A using function MMULT(TRANSPOSE(A), A). Function MDETERM applied to the second virtual array gives the final result. Virtual arrays are very useful for calculation of various auxiliary characteristics in multivariate data analysis, such as residuals, eigenvalues, leverages, etc. This is explained in details in tutorial Extension of Chemometrics AddIn 3. Extension of Excel facilities3.1. VBA programmingSometimes, the Excel facilities are not enough and we have to add some userdefined features to Excel. A specially designed programming tool, Microsoft Visual Basic for Applications (VBA) is used for these purposes. VBA helps to create macros (a special set of instruction for performing a sequence of operation) and userdefined functions (a special set of instructions for performing calculations in a worksheet). Macros are used for automation of standard procedures. Once a macro is created it can be repeatedly used for a routine operation. To run a macro from menu, select the item. Sometimes it is more convenient to assign a macro to a special new button, placed in the item, or in a worksheet.Userdefined functions are run in the same way as the standard Excel worksheet functions via .To make the macros and userdefined functions accessible it is necessary to set up a special level of security via menu item (Excel 2003)
Fig. 39 Selection of security level in Excel 2003 For Excel 2007 the selection of the security level is performed via Fig. 40 Selection of the security level in Excel 2007 On each opening of an Excel file, the system will ask the permission for running macros, if level Fig. 39 and Fig. 40, but mind the reliable antivirus software for testing the alien Excel files. (Excel 2003) or (Excel 2007) has been selected. We recommend selection of a security level as it is shown inThe capabilities of the VBA application are essentially limited on the initial installation of Excel 2007. To restore these properties you should follow the sequence and switch on option .Details can be read here. 3.2. ExampleLet us consider the application of VBA by an example. For modeling the nonisothermic kinetics (DSC, TGA, etc.) it is necessary to calculate the integral exponential function E_{1}(x). By definition,
An infinite series can be used for E_{1}(x) calculation
Each series term can be placed in its own cell in a worksheet. Afterwards one can sum up all these cells. This approach is presented in Fig. 41. Fig. 41 Calculation of the E_{1}(x) function in a worksheet We cannot consider this approach as a wise one. First of all, these calculations occupy rather large area in the worksheet. But the main drawback is that we do not know in advance how many terms should be included in the summation. Sometimes 10 terms are sufficient, but in other cases 50 terms are not enough. 3.3. Macro exampleThe second way of calculation is application of a recurrent formula that links two neighboring terms of the series
For application of this formula it is necessary to manage the recurrence calculations in a worksheet. The example is shown in Fig. 42. Fig. 42 Calculation of the E_{1}(x) function by recurrence method One iteration step transfers the values from range J2:J4 into range L2:L4. To start the next iteration step it is necessary to copy the values from range L2:L4 and paste them into range J2:J4. This copypaste operation should insert only values without formulas, i.e. must be used. Range H2:H4 contains initial values for starting the iteration process. Numerously repeating the operation, one can yield the target value in cell L4, though this is boring. It is better to make a macro for the procedure automation. The simplest way for designing a macro is recording the operations performed in a worksheet. Go to the menu item . In the dialog window (Fig. 43) one can name a macro and point the place where this macro will be stored.
Fig. 43 Dialog window for the macro recording After the Visual Basic editor. button is pressed, all operations performed in a worksheet are recorded in the macro procedure. When recording is over, it must be terminated by the command. The result may be seen in the
Fig. 44 Visual Basic editor Fig. 44 illustrates the macro after the minor correction. We added a cycle for repeating the nIter times. Value nIter is determined in cell J6 in the worksheet. Cell J6 has a local name n. For finalizing of the automation, we use button with macro Iteration assigned to this button. operation forMore about VBA macros can be read here. 3.4. Userdefined function exampleAt last, the wisest decision for calculation of the integral exponential function is the designing of the userdefined function with the help of VBA
Fig. 45 IntExp function Fig. 45 illustrates the VBA code and the function reference for this example. We will not concern the VBA programming as this is a very large and complex issue. For selfstudy one can read a numerous literature elsewhere in the internet, for example here. VBA macros are rather slow and they badly fit for the large calculations. For example, we do not recommend programming the PCA decomposition procedure in VBA. For the large arrays such a procedure will work very slowly. Better to consider Excel and VBA as a front end tool for input and output of data, which are further passed to the dynamic link library (DLL), coded in a fast programming language, such as C++ (back end). This specific approach was used in Fitter and Chemometrics AddIn. More about the userdefined functions can be read here. 3.5. AddinThe VBA programs created by a user are stored in the same Excel workbook in which they were created. To make the macros available for other workbooks one can either copy the macros into these workbooks, or create an AddIn application. AddIn is a special Excel file containing several VBA modules linked to the required DLL libraries. AddIn may be added to Excel to extend its capabilities. Standard Excel configuration includes several AddIns. The most interesting among them are: Solver and Analysis Toolpak AddIns. Solver is designed to optimize a value in the objective cell. The result is calculated with respect to other cells functionally linked to the objective one. Analysis Toolpak contains a set of statistical functions for data analysis There are various Excel AddIns in the Internet. Some of them may be downloaded for free and others are commercial. Bellow are some of these programs. The JWalk Chart Tools AddIn is a free utility for the chart management (add the legend, names, etc. to the chart elements); all the features that are absent in the standard Excel package XLStat is a large and expensive package for statistical analysis, including PLS regression The Multivariate Analysis AddIn is a shareware package for the multivariate data analysis from Bristol University Fitter is an AddIn for nonlinear regression analysis Chemometrics is a set of worksheet functions for multivariate data analysis More about the AddIns can be read here. 3.6. Addin installationBefore application of an AddIn program one must perform an installation consisting of two steps. At the first step all files comprised the AddIn package must be placed in the computer. Some AddIns provide a special Setup program for automatic installation. In other cases, the files should be located by a user. We will explain how this should be performed. A package always includes a file with extension XLA and several additional files with extension DLL, HLP, and etc. All these additional files should be placed in folder , or , or .The main XLA file may be located in any folder. But two following folders are preferable. Microsoft recommends that XLA file should be placed in folder update the links continuously. (here is the login name under which the Windows is currently running on). Placing XLA file in this directory provide an easy access to the file at the second step of installation. At the same time when the workbooks are run on several computers under the different names, the links to the XLA files are lost. Therefore, it is necessary toWe recommend to place file Chemometrics.xla in a folder that has the same name at different computers, e.g. Chemometrics AddIn is given here. . Automated installation ofThe second step of installation is performed from an open Excel workbook. For Excel 2003 it is necessary to go to , and in Excel 2007 to . In the open window (see Fig. 46) you must press and find a pertinent XLA file.
Fig. 46 Addin installation After an AddIn has been installed, it may be loaded and unloaded by selecting the corresponding option in front of the name.To uninstall an AddIn program, it is necessary to delete a check near its in name, close Excel, and delete from the computer all files related to this AddIn. ConclusionWe consider only the main features of application of Excel for matrix manipulations. A lot of details left out of this textbook. Tutorial called Projection methods in Excel may partly fill this gap.
