Matrix calculations in Excel

© 2010 Alexey Pomerantsev, Oxana Rodionova

Contents

Introduction
1. Basic information
1.1. Regional and Language settings
1.2. Workbook, worksheet, cell
1.3. Addressing
1.4. Range
1.5. Simple calculations
1.6. Functions
1.7. Some important functions
1.8. Errors in formulas
1.9. Dragging formulas
1.10. Create a chart
2. Matrix operations
2.1. Array formulas
2.2. Create and edit array formulas
2.3. Simplest matrix operations
2.4. Access to the part of a matrix
2.5. Unary operations
2.6. Binary operations
2.7. Regression
2.8. Critical bug in Excel 2003
2.9. Virtual array
3. Extension of Excel facilities
3.1. VBA programming
3.2. Example
3.3. Macro example
3.4. User-defined function example
3.5. Add-Ins
3.6. Add-Ins installation
Conclusion

Introduction

This tutorial is devoted to Excel which is a popular environment for calculations, analysis, and graphic representation of data. Our goal is to present main Excel features used for data processing. The main attention is concentrated on operation with multivariate data and matrix calculations. A specials section is devoted to the Add-ins programs which extend Excel possibilities. This text is not a comprehensive Excel manual but only a short introduction. The detailed Excel features are explained elsewhere in numerous books and articles.

Information presented in the tutorial refers to the basic properties of Excel 2003 and also Excel 2007. This tutorial is written mainly for Excel 2003 users. Features essential for Excel 2007 are specially underlined in the text.

Text is accompanied by the examples presented in workbook Excel.xls.

Important information on Excel.xls application

Links to the examples are embedded as Excel objects in the tutorial.

Chemometrics tutorials

Contents

1. Basic information 

1.1. Regional and Language settings

Excel software may be used in different countries and languages. Regional and Language settings change the software appearance, names and syntax of standard worksheet functions. For example, recently, in Russian version, the comma (,) was used as a delimiter for the decimal part of a number, therefore the semicolon (;) played the role of list separator (particularly for the function arguments). Luckily, this is becoming the thing of the past and in scientific calculations the point (.) is widely used as the fraction separator.

Fig. 1 Customizing regional options

One can use Control Panel for changing Regional and Language Options and Customize button for changing data appearance (see. Fig. 1)

Names of standard Excel functions depend on the Language settings chosen for the computer. For example, in English version the summation function is

 =SUM(A1:A9).

The same function in the Russian version appears as follows

=СУММ(A1:A9).

The list of the English and local function names is presented in file FUNCS.XLS that is ordinary located in folder C:\Program Files\Microsoft Office\OfficeVer\Local. OfficeVer is the name of the Excel version, e.g. Office 11, and Local is the number of the local version, e.g. 1049 for Russian, and 1033 for English. Details may be read here. File Excel_Functions.xls contains the names of all Excel function in 16 languages.

In this tutorial we use the English version of Excel 2003 with the point as the fraction separator and the comma for the list delimiter.

Contents

1.2. Workbook, worksheet, and cell

An Excel file with extension XLS (XLSX in Excel 2007) is called a workbook. When starting an Excel program, .e.g. by clicking an icon , on a desktop, a new workbook will be opened. 

Fig. 2 New Excel workbook

If a workbook exists, you can open it using Explorer by clicking on the file icon.  

Fig. 3 Opening an Excel workbook with the help of Explorer

A workbook consists of several worksheets. Worksheet names are shown in the bottom of the open window. One can delete, add, and rename worksheets. For this purpose right-click on the Sheet name and select an operation in the pop-up menu.

Fig. 4 Manipulations with worksheets

A standard worksheet name is Sheet1, but you can give it any name you like, e.g. Data. Various objects can be inserted in the sheet, e.g. charts, pictures, etc

Each worksheet consists of cells. These cells form a table with 256 columns and 65536 rows (In Excel 2007 there are 16384 columns and 1048576 rows). Rows are labeled numerically as 1, 2, 3…, and columns are labeled alphabetically as A, B, …,Z, AA, AB .., etc till column  IV (In Excel 2007 till XFD). This reference style is called A1. Another reference style is called R1C1, in which the columns are also labeled numerically. Latter style is used rarely and we will not employ it below. Details can be read here.

Columns and rows can be deleted, added, and hided. It is also possible to change columns and rows height and width.

A cell can contain different content such as number, text, or formula. For visualization purposes a cell can be formatted by changing font, color, border, and etc.

All Excel operations are executed with the help of the Menu items located at the top of the window. (Fig. 2). Excel 2007 Menu essentially differs from previous versions. A special ribbon substitute habitual icons. We will not explore these differences in details and redirect an interested reader to these instructions.

Contents

1.3. Addressing

In Excel, each cell has it own address comprised of the corresponding row and columns headings. For example, the address of the first cell in a worksheet is A1. The cell located at the intersection of the third column and fifth row has address C5. Active cell is indicated by the bold frame and its address is displayed in the Name Box window (see Fig. 2). To copy the content a cell (e.g. cell A1) into another cell (e.g. cell F1),  the formula = address should be employed in the destination cell (e.g. = A1). Addressing (referencing) can be absolute, relative, or mixed.

For example, the first cell has absolute address $A$1, relative address A1, and two mixed addresses $A1 and A$1. Differences in the addressing types are manifested when the cell formula is copied or moved into another cell. The following example illustrates this issue.

Fig. 5 Absolute and relative addressing

Fragment of the worksheet with data highlighted in yellow is presented in the top panel of Fig. 5. Different types of references to cell A1 (orange cell) are presented in the green regions (column F and row 6). Reference type is indicated near each cell. Let's copy (one by one) the cells from the green region and past them in columns G and H and rows 7 and 8 (Fig. 5, middle panel). One can see that the result depends on the reference method. The reference to cell A1 is not changed for the absolute reference. For relative addressing, the reference shifts right and down keeping the relative distance between the cells. For mixed addressing, the results depend on the location of the cell with reference and on the fixed (invariant) part of the reference indicated by symbol $. Bottom panel shows the results after formula replications.

For addressing a cell in another worksheet of the same workbook, the reference should include the worksheet name, e.g. Data!B2. Exclamation point (!) separates a worksheet name from a cell address. If a worksheet name includes blank, the name should be enclosed in the single quotes, e.g. 'Raw Spectra'!C6. For referencing the cell in another workbook, the workbook name is enclosed in the brackets, e.g. [Other.xls]Results!P24.

More about the methods of addressing is written here.

Contents

1.4. Range

A matrix occupies a range (i.e. a set of cells) in a worksheet. Fig. 6 illustrates the matrix with 9 rows (from row 2 till row 10) and 3 columns (from B till D).  

Fig. 6 A range on a worksheet

The addresses of the upper left and lower right cells jointed by the semicolon are used for a matrix referencing. For example B2:D10 or $B$2:$D$10.

Often it is useful to name a matrix range while operating on the matrices. There are two ways to do this. The most straightforward way is to select the area in a worksheet, then click into the Name Box window (see Fig. 2), delete the current address and input a name, e.g. Data (Fig. 6). The other way is to use the Insert-Name-Define menu.

A range name may be global, i.e. accessible from any worksheet in a book, or local. A local name is accessible only for the given worksheet. In the latter case the name should be defined in the following form SheetName!RangeName.

See details here  

Contents

1.5. Simple calculations

For performing calculations Excel uses different formulas. A formula is started with the equal sign (=) and may include references, operators, constants, and functions.

Operators provided the simple arithmetic calculations are presented in Fig. 7    

h

Fig. 7 Simple arithmetic operations

More about this topic. 

Contents

1.6. Functions

A function is a standard formula, which operates using given values, called arguments .Examples of some useful functions are presented in Fig. 8.


Fig. 8 Simple functions

A function consists of a name and a list of arguments embraced in parentheses and separated by commas (or by another list separator).

For example, function in Fig. 9 calculates the value of cumulative (cumulative=TRUE), standard (mean=0, standard_dev =1) normal distribution for the argument value given in cell A1.

Fig. 9 Entering the function by means of Formula Bar

There are different ways for entering a formula. The most straightforward way is to type a formula in the Formula Bar window (see. Fig. 2). Beforehand it is necessary to make the formula window visible with the help of the View-Formula Bar menu item. This method is suitable in case you remember the function syntax. A formula may be entered very quickly as arguments are inputted by clicking the cells with the corresponding values.

The second method helps when we do not remember exactly a function name and/or the list of arguments. In this case one can use button Insert Function  (Fig. 2). This button opens dialog window (Fig. 10) with the list of accessible functions.

 

Fig. 10 Entering function by means of Insert Function, first step

As soon as a specific function is selected, the second dialog box appears. This box displays the list of the function arguments. 

Fig. 11 Entering specific function by means of Insert Function, second step

Details may be studied here

Contents

1.7. Some important functions

Excel includes numerous standard worksheet functions, which cannot be outlined here. Only functions that are repeatedly used in chemometric applications will be considered.

SUM

Summarizes all values in the argument list, or in the range, and returns the sum.

Syntax:

SUM(number1 [,number2] [,...]) 

Example

Fig. 12 Function SUM

Description

SUMSQ

Returns the sum of the squares of the arguments, or the cells in the range.

Syntax:

SUMSQ(number1 [,number2] [,...]) 

Example

Fig. 13 Function SUMSQ

Description

SUMPRODUCT

Pair-wise multiplies the corresponding components in the given ranges, and returns the sum of those products.

Syntax:

SUMPRODUCT (array1, array2, ...)  

Example

 

Fig. 14 Function SUMPRODUCT

Description

AVERAGE

Returns the arithmetic mean of the arguments, or the cell values in the range.

Syntax:

AVERAGE(number1 [,number2] [,...]) 

Example

Fig. 15 Function AVERAGE

Description

VAR

Estimates variance based on a sample .

Syntax:

VAR(number1 ,number2, ...

Example 

Fig. 16 Function VAR

Description

STDEV

Estimates standard deviation based on a sample .

Syntax:

STDEV(number1 ,number2, ...

Example

 

Fig. 17 Function STDEV

Description

CORREL

Returns the correlation coefficient between the cell ranges array1 and array2.

Syntax:

CORREL(array1, array2)

Example

 

Fig. 18 Function CORREL

Description

Functions may be combined in one formula, see example in Fig. 19.

 

Fig. 19 Composite formula

Contents

1.8. Errors in formulas

Errors may occur on typing a formula and further in the course of a worksheet modification. In such a case the cell does not contain an expected result but the special symbols indicating an error of different type. Description of error types is presented in Fig. 20. 

Fig. 20 Errors in formulas

Menu item Formula Auditing in the Tool menu helps to reveal the source of an error.

Details regarding different errors may be read here.

Contents

1.9. Dragging formulas

Often, while working with matrices, there is a necessity to input not one but a range of formulae. For example, performing SNV pre-processing of spectral data it is necessary to calculate the mean and standard deviation values for each row. It would be tiresome to input the same formula repeatedly with changes only in one argument even for a rather small example as in Fig. 22. In a real dataset the number of rows may be about several thousands. Fortunately Excel has a special dragging mechanism for such operations.

Let us explain the dragging technique with the following example. We start with typing formula in cell J3

 

Fig. 21 Fill handle

Formulas to the adjacent cells may be entered with the help of the fill handle, a small black point in the bottom right-hand corner of the selection. This handle changes to the black cross on cursor hit. Afterwards the active cell may be dragged to the adjustment region. Dragging direction may be vertical (down), as in Fig. 22, and horizontal (right).

 

Fig. 22 Formula dragging

A formula may be duplicated in the other way. On the first step we copy the cell with a formula. Then we select a range of the cells where this formula should be entered. Afterwards we use Paste Special operation with Formulas option selected.

 

Fig. 23 Duplicating a set of similar formulas

It may be seen that regardless of the duplicate method we yield a proper references in the function arguments. This is due to utilizing the relative references, B3:I3, in the formula arguments.

Details may be read here.

Contents

1.10. Create a chart

The charts of different types can be created in Excel. Two types are of the most importance for use. They are the scatter and line plots. Fig. 24 presents an example of the scatter.

 

Fig. 24 Scatter chart

Such charts are used for presenting of the scores plots, predicted versus measured dependences, and etc. There is an essential difference between the Scatter and Line charts. The Scatter chart has two peer value axes. In the Line chart x-axis is used only as the category axes in which just the order of categories is displayed but not the scaled numeric values. Therefore the line charts are suitable for presenting dependences on the number of principal components, e.g. diagrams of RMSEC and RMSEP versus model complexity.

Chart creating is different in versions Excel 2003 and 2007. We will not stop on this issue and leave it for the self-instruction study.

Contents

2. Matrix operations

2.1. Array formulas

Many matrix operations are performed with the help of special, so-called array functions. The result of calculation of such a function is not a single value (number) but a set of numbers, e.g. array, even it has a single value. Array formulas are created in the same way as any other Excel formula with one difference. To enter an array formula it is necessary to press CTRL+SHIFT+ENTER for its completion.

Let us explain the application of array formulas with a simple example. Suppose we need to perform autoscaling (column-wise centering and scaling) for the data in matrix X. For this purpose the mean values mj and the standard deviation values sj should be calculated for each column j of matrix X. Afterwards, we should subtract value mj from each column and divide the result by sj   

 

The autoscaling procedure may be performed by ordinary (not array) Excel functions as it is shown in Fig. 25 .

 

Fig. 25 Ordinary formulas

In this case it is important to mind symbol $ before the rows with number 9 (m) and 11 (s), to fix the position of the corresponding values in the formulas,  

For large X matrix it is convenient to apply the array formula. (Fig. 26). Let us name by X, m and s the corresponding ranges in the worksheet. At first we select the empty rangy N3:R7 that has the size of anticipated result. Then we enter formula  =(X-m)/s in the Formula Bar. Finally we terminate operation by pressing CTRL+SHIFT+ENTER. If all these actions were performed properly, we yield { =(X-m)/s} in the Formula Bar. Braces, {}, indicate the array formula.

 

Fig. 26 Array formula

Additional information 

Contents

2.2. Create and edit array formulas

To enter an array formula properly it is necessary to select a region, which size corresponds to the size of the expected result.

In case the selected area is larger, the redundant cells are filled with error symbol #N/A after calculation. If the selected region is smaller, the part of the result values is missed. After selection of the resulting region, a formula is placed in the Formula Bar  and the whole operation is terminated by pressing  CTRL+SHIFT+ENTER.

Alternatively, you may input a formula in one cell, then select the whole region with the cell containing the formula as the first in the area (right and down), switch to the Formula Bar, and press CTRL+SHIFT+ENTER.

For changing an array formula, it is necessary to select the whole range with the formula result and switch to the Formula Bar. Braces that frame the array formula will disappear. After that one can correct the formula and press CTRL+SHIFT+ENTER to confirm the result.

For extending the output region of an array formula, it is sufficient to select a new region, then go to the Formula Bar, and press CTRL+SHIFT+ENTER. It is a little bit more complicated to reduce the resulting region, i.e. to delete extra cells that contain the #N/A symbols. First of all it is recommended to select one cell from the resulting region, switch to the Formula Bar, and copy the formula pressing Ctrl+Ins. Then you should clear the old resulting region and select a smaller one. Afterwards you should switch to the Formula Bar, paste the formula pressing Shift+Ins, and terminate with CTRL+SHIFT+ENTER.

Changes in the cells inside the resulting region are forbidden. On attempting to change the content of such a cell the following warning box is displayed .

 

Fig. 27 Warning regarding the forbidden operations with array formula

Contents

2.3. Simplest matrix operations

It is easy to sum up matrices or multiply matrix by a number with the help of array functions in Excel.

 

Fig. 28 Matrices addition and multiplication by a number

For matrices multiplication array function MMULT is used. 

Contents

2.4. Access to the part of a matrix

Two standard worksheet functions are useful for access and manipulation with a part of a matrix.

OFFSET

Returns a reference to a region that is specified by number of rows and columns from a cell or a range of cells.

Syntax

OFFSET(reference, rows, cols [, height] [, width])

reference is a reference to the first cell in the region from which offset starts;

rows is the number of rows for which the shift should be done (if positive, the shift direction is below the starting reference, if negative, the shift direction is above the starting cell);

cols is the same but for columns  (if positive, the shift is to the right from the starting reference, if negative, the shift is to the left from the starting cell);

height is an optional argument, must be a positive number. This is a number of rows in the returned reference;

width is an optional argument, must be a positive number. This is a number of columns in the returned reference  

Remarks

  • If arguments height and/or width are omitted, it is assumed to be the same height or width as reference;

  • Argument reference is a reference to a real (located in a worksheet) range, but not to a virtual array.  

Example

 

Fig. 29 Function OFFSET

OFFSET is an array function and must be terminated by  CTRL+SHIFT+ENTER.

Description

INDEX

Returns the values in an array selected by indices of rows and columns.

Syntax

INDEX (reference [, row_num] [, column_num])

reference is a numerical array (a matrix);

row_num  is an optional argument, a row number from which values are selected;

col_num is an optional argument, a column number from which values are selected;

Remarks

  • if argument row_num is omitted then the whole column is selected; 

  • if argument col_num is omitted then the whole row is selected;

  • if both arguments are used, the function return a cell located in the intercept of the corresponding row and column;

  • argument reference can be both a reference to a region located in a worksheet and to a virtual array.

Example

Fig. 30 Function INDEX

INDEX is an array function which must be terminated by CTRL+SHIFT+ENTER.

Description

Contents 

2.5. Unary operations

The following unary operations can be performed on matrices.

MINVERSE

Returns the inverse matrix.

Syntax

MINVERSE (array)

array is a numerical array (a matrix).

Remarks

  • Matrix array must be a square one;

  • For a singular matrix symbol #VALUE is returned.

Example  

 

Fig. 31 Function  MINVERSE

MINVERSE is an array function and must be terminated by CTRL+SHIFT+ENTER.

Description

TRANSPOSE

Returns the transposed matrix. 

Syntax

TRANSPOSE (array)

array is a numerical array (a matrix).

Example

Fig. 32 Function TRANSPOSE

TRANSPOSE is an array function and must be terminated by CTRL+SHIFT+ENTER.

Description

MDETERM

Returns the matrix determinant.

Syntax

MDETERM (array)

array is a numerical array (a matrix).

Remarks

  • Matrix array must be a square one.

MDETERM is not an array function and must be entered by pressing ENTER.

Description

Contents

2.6. Binary operations

The following binary operations can be performed on matrices.

MMULT

Return the results of multiplication of two matrices.

Syntax

MMULT (array1, array2)

array1, array2 are numerical arrays (matrices).

Remarks

  • The number of columns in array1 must be equal to the number of rows in array2, otherwise symbol #VALUE! is returned;

  • The number of elements in the resulting matrix must be less than or equal to 5461 (Excel 2003).

Example

Fig. 33 Function MMULT

MMULT is an array function and must be terminated by CTRL+SHIFT+ENTER.

Description

Contents

2.7. Regression

Several standard worksheet functions involve linear regression .

TREND

Builds the linear regression

y=b+m1 x1+…+mJ xJ+e

Approximates known response values of vector known_y's  for given predictor matrix known_x's and returns response values for given matrix new_x's.

Syntax

TREND(known_y's [,known_x's] [,new_x's] [,const])

known_y's is a vector of known response values y (calibration set);

known_x's is an optional argument. Matrix X of known values of predictors (calibration set);

new_x's is an optional argument. Matrix Xnew of new predictor values (test set or new matrix) for which the results are calculated and returned;

const is an optional argument. Boolean value which indicates whether parameter b must be equal to zero or not. If const equals TRUE or omitted then b is calculated in an ordinary way. Otherwise b=0.

Remarks

  • Vector known_y's must be located in a single column. In this case each column of array known_x's is treated as a separate variable;

  • If argument known_x's is omitted, it is supposed to be the vector of numbers {1;2;3;...} having the same size as known_y's;

  • Matrix of new values new_x's must have the same number of columns (variables) as matrix known_x's;

  • If argument new_x's is omitted, it is assumed to be the same as known_x's. The result is a vector with the same number of rows as in array new_x's.

Example  

 

Fig. 34 Function TREND

TREND is an array function and must be terminated by CTRL+SHIFT+ENTER. 

Description

LINEST

Calculates statistics for the linear regression   

y=b+m1 x1+…+mJ xJ+e

Syntax

LINEST(known_y's [,known_x's] [,new_x's] [,const] [,stats])

known_y's is a vector of known response values y (calibration set);

known_x's is an optional argument. Matrix X of known values of predictors (calibration set);

new_x's is an optional argument. Matrix Xnew of new predictor values (test set or new matrix) for which the results are calculated and returned;

const is an optional argument. Boolean value which indicates whether parameter b must be equal to zero or not. If const equals TRUE or omitted then b is calculated in an ordinary way. Otherwise b=0.

stats  is an optional argument. Boolean value which indicates the necessity to output additional statistics information. If stats is equal to FALSE or omitted, the estimates of regression parameters mJ, …, m2m1 and b are returned. Otherwise the following table is output  

 

Fig. 35 Table provided by LINEST function

mJ, …, m2m1 and b are the estimates of the regression parameters;

sJ, …, s2s1 and sb are the standard errors for the regression parameters estimates;

R2  is the coefficient of determination;

sy is the standard error for the y estimate;

F is the F-statistics;

DoF is the number of the degrees of freedom;

SSreg is the regression sum of squares;

SSres is the residual sum of squares.

Remarks

  • LINEST is badly designed function and very inconvenient in application;

  • Remarks included in TREND description are applicable for LINEST function.

Example  

 

Fig. 36 Function LINEST

LINEST is an array function and must be terminated by CTRL+SHIFT+ENTER. 

Description

Contents

2.8. Critical bug in Excel 2003

Functions TERND and LINEST in Excel 2003 provide wrong results in special circumstances.

Wrong results are yielded when simultaneously

  • mean value for each predictor variable in X is equal to zero;

  • mean value of the response vector Y does not equal zero. 

Fig. 37 demonstrates such a case. Mean values for all columns of Xc equal zero but mean value for vector Yc does not equal zero.

Example  

 

Fig. 37 Bug in regression functions in Excel 2003

The situation may be corrected by a special trick. You can apply TREND function to the centered response values and afterwards correct the result. For this purpose the following formula   

=TREND(Yc-ym, Xc)+ym,

is used (see the same figure).

Strange that this bug was not mentioned by users; however in version Excel 2007 this error is corrected.

Contents

2.9. Virtual array

Often in the course of data processing there is a problem of storing the intermediate computations, which are not important by themselves, but have to be calculated for yielding the final result. For example, residuals in the PCA decomposition are rarely analyzed per se, but used for calculation of the explained variance, orthogonal distances, and etc. At the same time, such intermediate arrays may be very large and must be calculated for the various numbers of principal components. They cause flooding the worksheets by the unnecessary, intermediate information. One can avoid this situation applying virtual arrays. Let us explain this with a simple example. 

 

Fig. 38 Example of application of a virtual array

Suppose that we have matrix A and we want to calculate determinate of matrix AtA. Fig. 38 demonstrated two methods of calculations. The first method uses a sequence of intermediate calculations indicated by the red arrows. The second method uses only one formula indicated by green arrow. Both methods provide the same final result, but the first one occupies a lot of space in the worksheet, and the latter one uses several virtual intermediate arrays. All these virtual arrays are the same which are calculated by the first method, but they are not output explicitly.

The first virtual array is the transposed matrix At calculated by function TRANSPOSE(A).

The second virtual array is the result of multiplication of the first virtual array and matrix A using function MMULT(TRANSPOSE(A), A).

Function MDETERM applied to the second virtual array gives the final result.

Virtual arrays are very useful for calculation of various auxiliary characteristics in multivariate data analysis, such as residuals, eigenvalues, leverages, etc. This is explained in details in tutorial Extension of Chemometrics Add-In

Contents

3. Extension of Excel facilities

3.1. VBA programming 

Sometimes, the Excel facilities are not enough and we have to add some user-defined features to Excel. A specially designed programming tool, Microsoft Visual Basic for Applications (VBA) is used for these purposes. VBA helps to create macros (a special set of instruction for performing a sequence of operation) and user-defined functions (a special set of instructions for performing calculations in a worksheet). Macros are used for automation of standard procedures. Once a macro is created it can be repeatedly used for a routine operation. To run a macro from menu, select the Tools-Macro-Macros item. Sometimes it is more convenient to assign a macro to a special new button, placed in the Tool item, or in a worksheet.

User-defined functions are run in the same way as the standard Excel worksheet functions via Formula Bar.

To make the macros and user-defined functions accessible it is necessary to set up a special level of security via menu item Tools-Macro-Security (Excel 2003)  

 

Fig. 39 Selection of security level in Excel 2003

For Excel 2007 the selection of the security level is performed via Office Button-Excel Options-Trust Center.  

Fig. 40 Selection of the security level in Excel 2007

On each opening of an Excel file, the system will ask the permission for running macros, if level Medium (Excel 2003) or Disable all macros with notification (Excel 2007) has been selected. We recommend selection of a security level as it is shown in Fig. 39 and Fig. 40, but mind the reliable antivirus software for testing the alien Excel files.

The capabilities of the VBA application are essentially limited on the initial installation of Excel 2007. To restore these properties you should follow the sequence Office Button– Excel Options–Popular and switch on option Show Developer Tab in the Ribbon.

Details can be read here

Contents

3.2. Example

Let us consider the application of VBA by an example. For modeling the nonisothermic kinetics (DSC, TGA, etc.) it is necessary to calculate the integral exponential function E1(x). By definition,

 

An infinite series can be used for E1(x) calculation

 

Each series term can be placed in its own cell in a worksheet. Afterwards one can sum up all these cells. This approach is presented in Fig. 41. 

 

Fig. 41 Calculation of the E1(x) function in a worksheet

We cannot consider this approach as a wise one. First of all, these calculations occupy rather large area in the worksheet. But the main drawback is that we do not know in advance how many terms should be included in the summation. Sometimes 10 terms are sufficient, but in other cases 50 terms are not enough.

Contents

3.3. Macro example

The second way of calculation is application of a recurrent formula that links two neighboring terms of the series

 

For application of this formula it is necessary to manage the recurrence calculations in a worksheet. The example is shown in Fig. 42.   

 

Fig. 42 Calculation of the E1(x) function by recurrence method

One iteration step transfers the values from range J2:J4 into range L2:L4. To start the next iteration step it is necessary to copy the values from range L2:L4 and paste them into range J2:J4. This copy-paste operation should insert only values without formulas, i.e. Copy-Paste Special must be used. Range H2:H4 contains initial values for starting the iteration process. Numerously repeating the Copy-Paste Special operation, one can yield the target value in cell L4, though this is boring.

It is better to make a macro for the procedure automation. The simplest way for designing a macro is recording the operations performed in a worksheet. Go to the menu item Tools–Macro–Record New Macro. In the dialog window (Fig. 43) one can name a macro and point the place where this macro will be stored.  

 

Fig. 43 Dialog window for the macro recording

After the OK button is pressed, all operations performed in a worksheet are recorded in the macro procedure. When recording is over, it must be terminated by the Tools–Macro–Stop Recording command. The result may be seen in the Visual Basic editor.   

 

Fig. 44 Visual Basic editor

Fig. 44 illustrates the macro after the minor correction. We added a cycle for repeating the Copy-Paste operation for nIter times. Value nIter is determined in cell J6 in the worksheet. Cell J6 has a local name n. For finalizing of the automation, we use button Repeat with macro Iteration assigned to this button.

More about VBA macros can be read here.

Contents

3.4. User-defined function example

At last, the wisest decision for calculation of the integral exponential function is the designing of the user-defined function with the help of VBA   

Function IntExp(dX)
     GammaConst = 0.5772156
     Eps = 0.00000000001
     IntExp = -GammaConst - Log(dX)
     nFact = 1
     For n = 1 To 1000
         nFact = nFact * n
         dR = (-dX) ^ n / n / nFact
         IntExp = IntExp - dR
         If (Abs(dR) < Eps) Then Exit Function
     Next n
End Function  

 

Fig. 45 IntExp function

Fig. 45 illustrates the VBA code and the function reference for this example.

We will not concern the VBA programming as this is a very large and complex issue. For self-study one can read a numerous literature elsewhere in the internet, for example here.

VBA macros are rather slow and they badly fit for the large calculations. For example, we do not recommend programming the PCA decomposition procedure in VBA. For the large arrays such a procedure will work very slowly. Better to consider Excel and VBA as a front end tool for input and output of data, which are further passed to the dynamic link library (DLL), coded in a fast programming language, such as C++ (back end). This specific approach was used in Fitter and Chemometrics Add-In.

More about the user-defined functions can be read here.

Contents

3.5. Add-in

The VBA programs created by a user are stored in the same Excel workbook in which they were created. To make the macros available for other workbooks one can either copy the macros into these workbooks, or create an Add-In application.

Add-In is a special Excel file containing several VBA modules linked to the required DLL libraries. Add-In may be added to Excel to extend its capabilities.

Standard Excel configuration includes several Add-Ins. The most interesting among them are: Solver  and Analysis Toolpak Add-Ins.

Solver is designed to optimize a value in the objective cell. The result is calculated with respect to other cells functionally linked to the objective one.

Analysis Toolpak contains a set of statistical functions for data analysis

There are various Excel Add-Ins in the Internet. Some of them may be downloaded for free and others are commercial. Bellow are some of these programs.

The J-Walk Chart Tools Add-In is a free utility for the chart management (add the legend, names, etc. to the chart elements); all the features that are absent in the standard Excel package

XLStat is a large and expensive package for statistical analysis, including PLS regression

The Multivariate Analysis Add-In is a shareware package for the multivariate data analysis from Bristol University

Fitter is an Add-In for nonlinear regression analysis

Chemometrics is a set of worksheet functions for multivariate data analysis

More about the Add-Ins can be read here.

Contents

3.6. Add-in installation

Before application of an Add-In program one must perform an installation consisting of two steps.

At the first step all files comprised the Add-In package must be placed in the computer. Some Add-Ins provide a special  Setup program for automatic installation. In other cases, the files should be located by a user. We will explain how this should be performed.

A package always includes a file with extension XLA and several additional files with extension DLL, HLP, and etc. All these additional files should be placed in folder C:\Windows, or C:\Windows\System, or C:\Windows\System32.

The main XLA file may be located in any folder. But two following folders are preferable. Microsoft recommends that XLA file should be placed in folder C:\Documents and Settings\User\Application Data\Microsoft\AddIns (here User is the login name under which the Windows is currently running on). Placing XLA file in this directory provide an easy access to the file at the second step of installation. At the same time when the workbooks are run on several computers under the different User names, the links to the XLA files are lost. Therefore, it is necessary to update the links continuously. 

We recommend to place file Chemometrics.xla in a folder that has the same name at different computers, e.g.  C:\Program Files\Chemometrics. Automated installation of Chemometrics Add-In is given here.

The second step of installation is performed from an open Excel workbook. For Excel 2003 it is necessary to go to Tools-Add-Ins, and in Excel 2007 to Office Button– Excel Options– Add-Ins–Go. In the open window (see Fig. 46) you must press Browse and find a pertinent XLA file.  

 

Fig. 46 Add-in installation

After an Add-In has been installed, it may be loaded and unloaded by selecting the corresponding option in front of the Add-In name.

To uninstall an Add-In program, it is necessary to delete a check near its in name, close Excel, and delete from the computer all files related to this Add-In.

Contents

Conclusion

We consider only the main features of application of Excel for matrix manipulations. A lot of details left out of this text-book. Tutorial called Projection methods in Excel may partly fill this gap.

Contents