Part I – Using R in Excel – Descriptive Statistics

Wait 5 sec.

[This article was first published on Adam's Software Lab, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.IntroductionThe purpose of this series of posts is to demonstrate some use-cases for R in Excel using the ExcelRAddIn component (disclaimer: I am the developer of this add-in: ExcelRAddIn). The fundamental rationale for the add-in is that it allows access to the extensive R ecosystem within an Excel worksheet. Excel provides many excellent facilities for data wrangling and analysis. However, for certain types of statistical data analysis, the limitations of the built-in functions even alongside the Analysis ToolPak is not sufficient, and R provides superior facilities (for example, for performing LDA, PCA, forecasting and time series analysis to mention a few).This series of posts demonstrates four main areas where R is useful in Excel: descriptive statistics, linear regression, forecasting, and accessing Python. Along the way, we will see that using R in Excel is no more difficult than writing a formula and calling the ExcelRAddIn to evaluate it. The ‘trick’, if there is one, is unpacking the results into a form that Excel understands and which can be used in a worksheet. We will see several examples of how to do this.Installing and setting up the ExcelRAddIn is described here. Each part of the series is accompanied by an Excel workbook with the R scripts. The workbooks depend on the ExcelRAddIn-AddIn64.dll, so this should be loaded first, and the “R AddIn” menu should appear on the right-hand side of the menu bar.The task pane is empty until the first script is evaluated. This will initialise R using the directories in the “Settings”. The default packages I use are tidyverse, dplyr, forecast, ggplot2, ggthemes, as shown below:The workbook for this part of the series is: Part I – R in Excel – Descriptive Statistics.xlsx. The workbooks all have a similar structure just to keep things organised. The ‘References’ worksheet lists any links to external references. The ‘Libraries’ worksheet loads additional (non-default) packages. The ‘Datasets’ worksheet contains any data referenced in the worksheets.Descriptive statisticsLoading DataThe first step is to load some data. This dataset comes from “Linear Models with R” by Julian Faraway. In this example, I have loaded the data into Excel from a csv file (GalapagosData.csv) using Power Query. The data has been tidied up and I have created a table (tableGalapagos) that can be referenced in the workbook.In the Descriptive Statistics worksheet, we first create a data frame using the CreateDataFrame function.=RScript.CreateDataFrame("galapagos", tableGalapagosData, tableGalapagosData[#Headers])This function is part of the add-in and simplifies the creation of dataframes. There are also functions to create vectors and matrices. We pass in a name (which will appear in the R environment) and the data and headers. The final parameter (‘Type’ => character, complex, integer, logical, numeric) is optional; the RType is now determined from the data if possible. This makes it somewhat easier to create objects to pass to R from Excel.This copies the data into the R environment. There are a number of alternatives to this approach. We could have loaded the csv file directly into R using:galapagos