While the statistics produced by PROC TABULATE can be calculated with other procedures such as PROC MEANS or PROC FREQ, the main advantage of PROC TABULTE is the flexibility available in creating a wide variety of output tables. In this article, we will present a number of examples on how PROC TABULATE can be used to present easy to read one-dimensional, two-dimensional and three-dimensional tables.

Using examples with one, two and three dimensional tables, the following components of PROC TABULATE will be covered:

- Adding multiple statistics to your tables
- Using parentheses to shorten and simplify your code
- Adding classification variables to one or more dimensions
- Nesting classification variables to report on multiple by variables
- Adding subtotals and totals to multi-dimensional tables
- Adding row and column percentages
- Applying formatting, adding or dropping row and column labels

**Data Sets**

**One-Dimensional Tables**

The most basic usage of PROC TABULATE, a one-dimensional table, requires at least two components: (1) a VAR statement to specify an analysis variable and (2) a TABLE statement to specify which variables to include in the output table and where to put them in the table.

For this first set of examples, we are going to compute some statistics on MSRP (Manufacturers Suggested Retail Price) of the vehicles found in the SASHELP.CARS dataset. To use the default settings with PROC TABULATE in this example, we simply need to specify which dataset we are analyzing with the data argument (data=SASHELP.CARS), use the VAR statement to specify MSRP as the analysis variable (VAR MSRP) and the use the TABLE statement to specify that MSRP should be included in the output table (TABLE MSRP). The complete PROC TABULATE syntax is as follows:

var msrp;

table msrp;

run;

There is a wide variety of statistics that can be included with PROC TABULATE, of which a complete list can be found in the SAS documentation. Some of the most commonly used descriptive statistics which will be used in this article include MIN, MAX, MEAN, N and SUM.

So, to determine the max MSRP in the SASHELP.CARS dataset, we can add an asterisks(*) followed by MAX after MSRP in our TABLE statement:

var msrp;

table msrp*max;

run;

var msrp;

table msrp*mean msrp*min msrp*max;

run;

var msrp;

table msrp*(mean min max);

run;

First, a CLASS statement needs to be added to tell PROC TABULATE that you are adding the ORIGIN variable as a classification variable. Next, the TABLE statement needs to be modified to indicate that the mean MSRP values should be grouped by Origin. To add ORIGIN, simply include a second asterisks operator (*) after the chosen statistic (MEAN in this case) as shown below:

class origin;

var msrp;

table msrp*mean*origin;

run;

Now, the output table includes the mean MSRP values by Origin for all vehicles in the SASHELP.CARS dataset:

**Adding Labels and Formatting Output**

As with other SAS procedures, you can add formats and labels to enhance your output. In addition, PROC TABULATE also gives you the ability to remove row or column labels to reduce clutter in your output.

If for example in the table above you did not want to include the mean row label and also wanted to apply a label to the MSRP name, you can do so by adding an equal sign (=) after the variable name in the TABLE statement followed by the desired label in quotations. To remove a row or column label, simply leave a blank between the quotation marks. To format the mean numbers with dollar signs, commas and no decimals places, you could add the dollar8 format to the PROC TABULATE statement:

class origin;

var msrp;

table msrp="Manufacturer's Suggested Retail Price"*mean=' '*origin ;

run;

## Do you have a hard time learning SAS?

Take our Practical SAS Training Course for **Absolute Beginners** and learn how to write your first SAS program!

**Two Dimensional Table**

Let’s start with a simple example to illustrate how to manipulate the rows and columns of the PROC TABULATE output. Note that by default, PROC TABULATE will put the statistics requested into columns.

Recall from our previous example, the following code will calculate the mean MSRP by Origin for all the vehicles in the SASHELP.CARS dataset and place the results into 3 columns.

class origin;

var msrp;

table msrp*mean*origin;

run

class origin;

var msrp;

table origin, msrp*mean;

run;

While it is possible to produce these results as a one-dimensionak table without the comma, the table is not very elegant in appearance. To produce these results as a one-dimensional table, the table statement would start with MSRP (the variable of interest), followed by mean and max (the desired statistics) and finally the classification variable, origin:

class origin;

var msrp;

table msrp*(mean max)*origin;

run;

class origin;

var msrp;

table origin,msrp*(mean max);

run;

## Become a Certified SAS Specialist

Get access to two SAS base certification prep courses and 150+ practice exercises

**Two Classification Variables**

You can continue to build up your two-dimensional tables by adding an additional classification variable. For this example, let’s say you would like know the Mean MSRP for all vehicles in the SASHELP.CARS dataset by Origin as before, but you would also like the Mean MSRP’s broken down by vehicle type.

First, since we are adding a new classification variable, TYPE, we need to add the TYPE variable to the CLASS statement. Next, the table statement includes ORIGIN as the row variable before the comma. After the comma, MSRP, MEAN and TYPE variables are added to indicate that we would like to include them as column values:

class origin type;

var msrp;

table origin,msrp*mean*type;

run;

**Nesting Classification Variables**

For this next example, you would like to have the mean MSRP values for all vehicles types not only broken down by Origin, but also by Drivetrain (DRIVETRAIN). That is, for each origin, you would like to see how the mean MSRPs compare by Drivetrain for all vehicle types.

To generate this new table, there are two modifications that need to made to the SAS syntax from the previous example. First, since we are introducing a third classification variable, DRIVETRAIN, we need to add DRIVETRAIN to the CLASS statement. Next, DRIVETRAIN needs to be added after origin with the asterisks (*) operator to specify that you would like DRIVETRAIN as an additional classification variable for ORIGIN in the rows of the table:

class origin type drivetrain;

var msrp;

table origin*drivetrain,msrp*type*mean;

run;

**Additional Formatting Tips**

As you continue to add layers to the table, they can become quite busy and difficult to read. Earlier we discussed how to add labels to variables by adding an equal sign followed by the label in quotation marks after each variable in the TABLE statement. From our previous example, adding labels can be done as follows:

class origin type drivetrain;

var msrp;

table origin = "Continent of Origin"*drivetrain = "Vehicle DriveTrain",msrp = "Mean MSRP"*type = "Vehicle Type"*mean = " ";

run;

To make the code easier to read, you can add a LABEL statement to label variables and the KEYLABEL statement to label statistics so you don’t need to clutter up the TABLE statement. After the LABEL or KEYLABEL statements, simply specify the variable or statistic, followed by an equal sign and the desired label in quotation mark. Recall that by setting the mean statistic equal to a blank label, we can hide the row heading with “Mean” to tidy up the output table.

We can also add a label to the large empty box on the top left of the table using the BOX option with the TABLE statement, which can be used to give the entire table a title.

Individual statistics can also have formats applied. By adding an F=

*formatname*to the desired statistic in the table statement, you can apply any standard format such as the Dollar8. format to the mean statistic.

These formatting changes are added to the syntax below:

class origin type drivetrain;

var msrp;

table origin*drivetrain,msrp*type*mean*F=Dollar8. / box="Mean MSRP all Vehicle Types by Origin & DriveTrain";

label origin = "Continent of Origin"

drivetrain = "Vehicle DriveTrain"

msrp = "Mean MSRP"

type = "Vehicle Type"

;

keylabel mean = " ";

run;

**Totals, Subtotals and Percentages**

Similar to the cross-tabulation output that you can produce with PROC FREQ, PROC TABULATE also has the flexibility to include totals, subtotals, row and column percentages in your table.

In this next example, we will determine how many vehicles (N) of each drivetrain (DRIVETRAIN) in SASHELP.CARS are from each continent (ORIGIN). In addition, we will also add the total number of cars for each DRIVETRAIN, regardless of origin.

To add these subtotals to your PROC TABULATE output, you simply need to include the ALL keyword in your TABLE statement. The important part is where you place the ALL statement. If you add ALL to the left side of the comma like in the example below, you will get the column subtotals:

class origin drivetrain;

var msrp;

table origin all,msrp*drivetrain*n;

run;

class origin drivetrain;

var msrp;

table origin all,msrp*drivetrain*n all;

run;

To include a percentage in your table, simply add one of the percentage keywords to your TABLE statement in the same way you would add a statistic. For example, to include the row, column, and total percentages in addition to the count of drivetrain to your table, you would use the following syntax:

class origin drivetrain;

var msrp;

table origin all,drivetrain*(n rowpctn colpctn pctn) all;

run

**Three Dimensional Table**

Despite the name, three-dimensional tables created with PROC TABULATE are not truly three dimensional cubes by traditional definition. However, what PROC TABULATE does give you is the ability to easily create multiple two-dimensional tables for different classes or categorical variables without having to run PROC TABULATE multiple times or create multiple input datasets.

First, let’s start with a simple two-dimensional table. For this example, we are interested in the mean MSRP of vehicles by the number of cylinders in the engine (CYLINDERS) and the continent of origin (ORIGIN):

class cylinders origin;

var msrp;

table origin,msrp*cylinders*mean;

run

class cylinders origin drivetrain;

var msrp;

table origin*drivetrain,msrp*cylinders*mean;

run;

The position for third dimension classification variable is also important. To produce a table for each drivetrain that has the origins as rows and the cylinders as columns, the TABLE statement needs to start with drivetrain (our “third dimension”) followed by a comma, then origin and finally a second comma before MSRP, cylinders and mean, as shown below:

class cylinders origin drivetrain;

var msrp;

table drivetrain,origin,msrp*cylinders*mean;

run;

As discussed previously, the output can be further improved by applying some additional formatting. By applying the formatting highlighted in the syntax below, our input can be enhanced by:

- Removing the unnecessary MSRP, Origin and Mean headings
- Applying a more informative label to the Cylinders heading
- Formatting the mean MSRP values as Dollar8.
- Adding a title to the upper left box

class cylinders origin drivetrain;

var msrp;

table drivetrain,origin=' ',msrp=' '*cylinders='# of Cylinders'*mean=' '*F=dollar8. /box="Mean MSRPs by Origin";

run;