PQL Procedures

Home	Starting	Environment	DBMS	VisualPQL	Procedures	SQL	Forms	Host/API	Index

PQL Procedures

Tabulate

The TABULATE procedure selects and summarises data. It then produces cross-tabulations with summary data, statistics or percentages. See syntax. A wide variety of cross-tabulations can be produced:

One, two and three dimensional tables can be produced; the first dimension being columns across a page, the second being rows down a page and the third being different sections. These three dimensions are known as headers (for columns), stubs (for rows) and wafers (for sections).
Control variables, with discrete values, and observation variables, with continuous values, can be used.
Totals, means, minimums, maximums, standard deviations, medians, quantiles and percentages can be specified with flexibility in choosing percent 'base' cells, for various combinations of wafer, row and column percentages.
Wafer, stub and header labeling can be controlled with use of variable labels and value labels. There is automatic segmentation of labels and adjustment of table size parameters to fit. Tables too large to fit on a single physical page are broken across pages and page headings can be specified.
Output can be in standard format or produced as html suitable for viewing and printing by software that uses that format such as browsers or Microsoft Word^®.

Syntax

TABULATE
     FILENAME = filename
     HEADER   = (expression)
    [STUB     = (expression)]
    [WAFER    = (expression)]

Record Filtering

  BOOLEAN     = (boolean expression)
  SAMPLE      = fraction
  WEIGHT      = varname

Cell Statistics

  TOTAL      = varname ['label']...
  COUNT      = varname ['label']...
  CSS        = varname ['label']...
  CV         = varname ['label']...
  CVERR      = varname ['label']...
  MAXIMUM    = varname ['label']...
  MEAN       = varname ['label']...
  MEDIAN     = varname ['label']...
  MINIMUM    = varname ['label']...
  MISSING    = varname ['label']...
  NORMALIZED = varname ['label'](n1)['label1'],(n2)['label2']...
  PERCENT    = varname ['label']...
  QUANTILE   = varname ['label'] n ['label1','label2'...]...
  QUANTILE   = varname ['label'](n1)['label1'],(n2)['label2']...
  RANGE      = varname (lo,hi) ['label']...
  STDERR     = varname ['label']...
  STDEV      = varname ['label']...
  TSTATISTIC = varname ['label']...
  USS        = varname ['label']...
  ISDNUMBER = n

Data Print Formatting

  PRINTFORMATS= varlist (option)...

Page Formatting

  PAGETITLE   = 'string'
  PAGELENGTH  = n
  PAGEWIDTH   = n
  COLLAPSE

Header Formatting

  HEADERWIDTH       = n
  HEADERINDENTATION = n
  HEADERDIVIDER     = 'character'
  NODIVIDERS
  NOHEADERCENTER

Stub Formatting

  STUBTITLE       = option 'string'
  STUBWIDTH       = n
  STUBINDENTATION = n
  STUBCONTINUATION= n
  STUBHEIGHT      = n
  STUBFILLER      = 'character'
  STUBDIVIDER     = 'character'

Wafer Formatting

  WAFERTITLE        = option 'string'
  NOWAFERTITLE
  WAFERCENTER
  WAFERSTART        = n
  WAFERINDENTATION  = n
  WAFERCONTINUATION = n
  WAFERHEIGHT       = n

Other Options

  HTML
  NOBOTTOMBORDER
  NOLEFTBORDER
  NORIGHTBORDER
  NOTES= 'string'
  FOOTNOTES= 'string'
  WAFERNOTES= 'string'
  NOZEROS
  ZEROCHARACTER= 'character'
  NOEMPTYROWS
  DELETEMISSING
  UPPERCASE
  SPANNERS
  DUMMY  = varname 'label' ...
  LANGUAGE = DUTCH | GERMAN

The FILENAME and the header expression are the only required clauses.

FILENAME Specify the filename created by the procedure. If a filename is not specified on a subsequent TABULATE procedure, the output is appended to the last TABULATE output file.
If the filename is CGI then, if the procedure is run when SIR/XS is operating from the common gateway interface, the output is returned to the user's browser. When run locally, a filename of CGI results in the file sircgi.htm. It is normal to specify the HTML keyword when using CGI as the filename.

expressions

The three possible expressions (HEADER, STUB & WAFER) define the data in one of the dimensions of the table. Each expression consists of from one to n variables and/or statistical keywords all enclosed in brackets. If multiple variables are specified on a single expression they are related by the keywords BY or THEN.

Percentages can also be used in expressions.

The type of variable, Variable Modifiers, and statistics all effect the Cell contents.

HEADER = (expression) Defines the columns of the tabulation. This clause is required.
STUB = (expression) Defines the rows of the tabulation.
WAFER = (expression) Defines the sections of the tabulation. Sections are a number of two dimensional sections specified by the stub and header.
BY Nest a variable within another thus producing combinations of the variable values within one dimension of the table. The asterisk (*) symbol may be used instead of the keyword.
THEN Appends a second variable or statistical keyword to the set of variables displayed within one dimension of the table. The plus (+) symbol may be used instead of the keyword.
BY and THEN Can be used in combination to create complex structures within a single dimension or expression.

Errors may occur, not only if the table is misspecified, but also if it cannot fit the output page.

Expressions

The TABULATE command has the general form:

TABULATE  HEADER = (expression) [STUB = (expression) [WAFER = (expression)]]

A table consists of up to three components or dimensions: header, stub and wafer. Each dimension is defined by an expression. Each expression is identical in format but applies to a different dimension of the table. These expressions name the variables to tabulate and their relation to each other in the table. The different keywords (HEADER, STUB & WAFER) define the dimension being specified. The HEADER is required; WAFER can only be specified if a STUB is also specified. The sequence of the three expressions in the command is irrelevant.

The header defines variables whose values correspond to columns; the stub defines variables whose values correspond to rows. The rows and columns define a two-dimensional array called the wafer. The wafer is a higher level categorisation resulting in multiple individual row/column combinations. The table description consists of one to three expressions.

Define an expression with variable names and keywords. At the most basic an expression consists of a single variable name. Multiple variables or statistics may be defined in a single expression separated with the keywords BY and THEN. BY specifies a level of nesting within a single dimension; THEN specifies multiple variables within a single dimension.

The structure of the table is defined by the values of the control variables specified in the various expressions. The values in the variables named in these expressions are used as the labels for the particular dimension being specified. The number of categories of any control variable is taken from the schema definition which contains information on variable ranges, valid values and value labels.

The body of the table consists of Cells. The content of a cell is a summary of data which applies to that particular combination of variables in the expressions. This might be a count of occurrences of data, a sum of values, a percentage or some other statistic.

If the command has a single header expression, a table containing a single value for each column is produced. That is one line of data is produced with one entry per column. For example:

TABULATE  HEADER = (REGION)
TABULATE  HEADER = (AGE BY SEX)
TABULATE  HEADER = (SALARY BY (MIN THEN MEAN THEN MAX))

If the tabulate statement contains a STUB expression, then a two-dimensional table is produced. The stub defines the rows of the table going down the page. The header defines the columns going across the page. For example,

TABULATE HEADER = (RACE)  STUB = (REGION)
TABULATE HEADER = (RACE THEN SEX) STUB = (REGION)
TABULATE HEADER = (SALARY BY (MEAN THEN MEDIAN)) STUB = (REGION BY RACE)

By specifying variables in two expressions, a cross-tabulation is produced with one cell for each value of each variable. For example, if a cross tabulation for two variables is specified, where each variable has two values, four cells are produced. The more values, the more cells.

Nesting variables with the BY operator also produces a cross-tabulation within a single dimension. If the same variables are used as a Stub and header or as a header with a BY clause, the same cells are produced but the table has a different shape.

If the tabulate statement contains a WAFER expression, a set of two-dimensional tables is produced. There must be a stub expression for a WAFER expression to be valid. When more than one wafer is generated a table of contents is also produced. If a particular wafer contains no data, it is not printed and is marked deleted in the table of contents. If only one wafer is produced, it is printed even if it is empty. For example:

TABULATE HEADER = (SEX)           STUB = (AGE)         WAFER = (REGION)
TABULATE HEADER = (SEX)           STUB = (AGE)         WAFER = (REGION BY RACE)
TABULATE HEADER = (AGE BY REGION) STUB = (RACE BY SEX) WAFER = (COUNT THEN PCT)

BY

Specifying BY between two variables nests each of the values of the second variable in each of the values of the first variable. If a second BY is specified, a third level of nesting takes place. BY defines a structure within a particular dimension. Each variable specifies a further level of nesting. The number of cells produced is the product of the number of values for each variable. There is no limit to the number of levels allowed within one expression. The asterisk (*) symbol can be used instead of BY.

Example: BY

HEADER = (AGE BY INCOME)

Young		Old
Rich	Poor	Rich	Poor
6	4	8	3

HEADER = (SEX BY AGE BY INCOME)

Male				Female
Young		Old		Young		Old
Rich	Poor	Rich	Poor	Rich	Poor	Rich	Poor
4	1	6	2	2	3	2	1

THEN

Specifying THEN between two variables or two clauses in effect concatenates two independent tables with all values of the first table followed by all values of the table. The plus (+) symbol can be used instead of THEN.

Example: THEN

The following header expression results in the independent tabulation of individual occurrences of the two values for each of the two variables:

HEADER = (AGE THEN SEX)

Young	Old	Male	Female
10	11	13	8

When two variables are joined by THEN, each value of each variable defines a cell. The number of cells produced is the sum of the number of values for each variable.

Combining BY and THEN

BY takes precedence over THEN when the two operators are used in combination in an expression. Use parentheses to specify a different precedence.

Example: BY and THEN

The following examples show various combinations of the same variables in a header expression. In the first example, since nesting takes precedence, SEX is nested with INCOME, and the result of that nesting operation is concatenated with AGE.

HEADER = (AGE THEN INCOME BY SEX)

Young	Old	Rich		Poor
Young	Old	Male	Female	Male	Female
10	11	10	4	3	4

The next example concatenates AGE and INCOME and nests SEX within both AGE and INCOME, using parentheses to cause the concatenation operator to be applied first.

HEADER = ((AGE THEN INCOME) BY SEX)

Young		Old		Young		Old
Male	Female	Male	Female	Male	Female	Male	Female
5	5	8	3	10	4	3	4

Other combinations might be:

HEADER = (AGE BY INCOME THEN SEX)

Young		Old		Male	Female
Rich	Poor	Rich	Poor	Male	Female
6	4	8	3	13	8

HEADER = (AGE BY (INCOME THEN SEX))

Young				Old
Rich	Poor	Male	Female	Rich	Poor	Male	Female
6	4	5	5	8	3	8	3

Nesting and concatenation can also be used in the stub and wafer expressions.

Variable Types

Tabulate uses numeric variables only. This includes categorical variables, which are numeric codes representing discrete string values. In a VisualPQL program, string variables can be recoded into numeric categorical values. There are two types of numeric variables; variables with discrete categories, known as CONTROL variables and variables with continuous values, known as OBSERVATION variables.

Tabulate determines the tabulation type of a variable from the information in the variable schema according to the following rules:

Variables that are CATEGORICAL, or have VALID VALUES or have VALUE LABELS are control variables. All other variables are observation variables.
Database and table variables may be explicitly declared as CONTROL VARS or OBSERVATION VARS in the data dictionary.
Variables implicitly defined with GET VARS maintain their tabulation type.
The CONTROL VARS and OBSERVATION VARS commands in a VisualPQL program assign an explicit tabulation type to local variables created in the program.
The tabulation type of variables can be modified in the tabulate statement.
Variables declared as CONTROL variables must have ranges specified.

Most variables are either control or observation. For example, salary would be an observation variable and department would be a control variable. However, some variables may be reasonably used in either way. A variable containing number of persons per family might be used as a control variable in one table but might be aggregated in another table along with a count of families, in order to get the average persons per family. Such a variable can be defined as observation type in the VisualPQL program or database and then reclassified in the TABULATE statement as a control variable.

If a control variable contains missing values, the record is normally excluded from the body of the tabulation. Categories can be generated to show cells for missing control variables. If an observation variable contains missing values, the record is normally included but the variable is not computed into the cell contents. This can give rise to situations where overall totals do not agree with counts in the body of the table. Specify the DELETEMISSING keyword to exclude records where any cross-tabulated variable has a missing or undefined value.

Cell Contents

Different types of variables combine to give different cell contents and a different table.

Control variables provide classifications which determine the structure of the table and the presence of a particular cell. Observation variables give the result that appears in the cell.

When two control variables are cross tabulated or nested, the content of the cell is a count of the number of occurrences of data which match that combination of values.

When an observation variable is cross tabulated or nested with a control variable, the total (sum) for the observation variable is the default. That is, the structure of the table is unaltered by the inclusion of an observation variable but, instead of a count of data appearing in the cell contents, a total of the observation variable is calculated and an appropriate header label printed. Other statistics can be specified for observation variables.

Two observation variables cannot be cross tabulated or nested within each other and therefore observation variables may be specified in one and only one expression (dimension) of the TABULATE statement.

Different types of information are reported in the cell contents for Control and Observation variables. For Control variables, which have discrete values, counts or percentages are reported. For Observation variables, which have continuous values, a variety of cell statistics are reported, the default being a sum of the values. For any cell that cross tabulates a control and an observation variable, the observation variable statistics are reported.

Example: Observation Variables

In the following three examples, the observation variable SALARY is included in the expression. The number in any given cell is the sum of the SALARY values from each record having the combination of values for the control variables for that particular cell. The placement of the observation variable in the header expression determines the placement of the label, but does not alter the content of the table. The observation variable does not alter the number of cells. Observation variables and counts may be specified in a single expression in various ways:

HEADER = (SALARY BY AGE BY SEX) STUB = (REGION)

	SALARY
	Young		Old
	Male	Female	Male	Female
North.......... South..........	2800 2200	5650 11700	20100 4950	3200 5000

HEADER = (SEX THEN (SALARY BY SEX) STUB = (AGE)

	Male	Female	Current Monthly Salary
			Male	Female
Young... Old.....	5 8	5 3	13050 21900	14450 8200

HEADER = (AGE BY (SEX THEN SALARY)) STUB = (REGION)

	Young			Old
	Male	Female	Current Monthly Salary	Male	Female	Current Monthly Salary
North.... South....	3 2	2 3	13600 13900	6 2	1 2	20150 9950

Variable Modifiers

Modifiers change the attributes of a variable in an expression. These have the form:

variable[.modifier][.modifier]...

Several modifiers can be appended to a single variable. The modifiers and their abbreviations are:

.CONTROL   (.C)
.OBS       (.O)
.FIRST     (.F)
.TOTAL     (.T)
.MISSING   (.M)
.UNDEFINED (.U)
.SPAN      (.S)
.NOSPAN    (.NOS)

The .CONTROL and .OBS modifiers alter the default tabulation type of a variable (i.e. control or observation). The .CONTROL modifier may only be specified for an observation variable if valid ranges have been specified for it.

The .FIRST modifier is used in conjunction with the percent base marker as described in the section on percentages. It specifies that the first occurrence of the percent base marker is used.

The .TOTAL modifier automatically concatenates a TOTAL control variable as described in the section on TOTAL. The TOTAL is displayed before the variable being modified.

When a control variable has a missing or undefined value, it is normally excluded from the detail cells in the table. Specify the .MISSING or .UNDEFINED modifiers to generate additional cells in the table. The .UNDEFINED modifier groups all missing and undefined values into a single cell. The .MISSING modifier displays undefined and each defined missing value as separate cells. These modifiers override any DELETEMISSINGspecification for that variable.

Example: Undefined Modifier

HEADER = (AGE.T.U)

TOTAL	Young	Old	AGE= UNDEFINED
22	10	11	1

By default, the rows and columns of a table are labeled with value labels (if they exist) or with the "variable=value" notation. The .SPAN modifier generates a second level of heading for a variable. This heading is either the variable label (if it exists) or the variable name. This new heading "spans" all the categories of the variable, hence the name .SPAN.

Example: SPAN Modifier

The next two tables illustrate the effect of .SPAN. The only difference between them is the .SPAN modifier in the second table.

HEADER = (AGE BY SEX) STUB = (REGION)

	Young		Old
	Male	Female	Male	Female
North.......... South..........	3 2	2 3	6 2	1 2

HEADER = (AGE.S BY SEX.S) STUB = (REGION.S)

	AGE
	Young		Old
	Gender
	Male	Female	Male	Female
REGION North.......... South..........	3 2	2 3	6 2	1 2

The SPANNERS option can be used to turn on spanning labels for all variables. The .NOSPAN modifier turns off spanning labels for a particular variable when the SPANNERS option is in effect.

Statistics

Other than TOTAL, statistics are used with observation variables. Statistics are specified with the BY keyword to associate them to an observation variable.

There are two methods of specifying statistics.

The first method is to specify a pseudo-variable in the expression and to use one of the statistical option clauses to specify the statistic and label to produce in the form:

keyword = pseudo-variable 'label'

The second method is to use a statistical keyword directly. In effect, these are pre-defined, convenient pseudo-variables with standard labels. (This method cannot be used with NORMALIZED, PERCENT, QUANTILE and RANGE, since additional data is required on the specification.)

The statistical clauses are:

COUNT A count of records in this category.
CSS Corrected Sum of Squares, where the
cell = sum of squares - ((sum**2)/n).
CV Coefficient of Variance, where the
cell = (standard deviation/mean) * 100.
CVERR Coefficient of Error, where the
cell = (standard error/mean) * 100.
MAXIMUM Maximum value of the variable.
MEAN Mean value of the variable.
MEDIAN Median value of the variable.
MINIMUM Minimum value of the variable.
MISSING Count of missing values in the variable.
NORMALIZED* Normalized quantiles (by median value) where cell = (quantile/median) * 100.
PERCENT* Percentage, where cell = (cell/base) * 100.
QUANTILE* Quantile, equal or unequal quantiles can be specified.
RANGE* Range produces the range of the variable expressed as the difference between the lowest and the highest values encountered. The format of the range clause is:
RANGE = variable (lo,hi) ['label']
The specification of the lo and hi values defines the outer limits of any values to be included in the calculation. This can be used, for example, to exclude negative or zero values or very high numbers used as some coding convention.
STDERR Standard Error where cell = standard deviation/sqrt(n).
STDEV Standard Deviation about the mean.
TOTAL Count of records. When cross tabulated with an observation variable, gives sum of values of the observation variable.
TSTATISTIC T-Statistic where cell = mean/standard error.
USS Uncorrected Sum of Squares where cell = sum of squares.
* Cannot be used as keywords directly in an expression.

Example Statistical Clauses

The TOTAL clause specifies that the pseudo-variable N is a total, with the label 'Valid Cases'. The MEAN and STDEV clauses assign labels for these pseudo-variables to 'X Bar' and 'Sigma'.

TABULATE HEADER = (N THEN SALARY BY (MEAN THEN STDEV)) STUB = (SEX)
         TOTAL  = N 'Valid Cases'
         MEAN   = MEAN 'X Bar'
         STDEV  = STDEV 'Sigma'

	Valid Cases	Current monthly salary
		X Bar	Sigma
Male........... Female.........	13 8	2688 2831	486 746

The pseudo-variables created can be used anywhere that is valid. If the pseudo-variable is a TOTAL it is a control variable, otherwise it is an observation variable. Multiple pseudo-variables can be created and assigned to the same keyword. In the next example, both N and SUM are totals, N is the count of records and SUM is the total SALARY.

TABULATE HEADER = (N THEN SALARY BY (SUM THEN MEAN THEN STDEV)) STUB = (SEX)
         TOTAL = N 'Valid Cases'  SUM 'Sum'
         MEAN  = MEAN 'X Bar'
         STDEV = STDEV 'Sigma'

	Valid Cases	Current monthly salary
	Valid Cases	Sum	X Bar	Sigma
Male........... Female.........	13 8	34950 22650	2688 2831	486 746

Example: Statistical keywords

HEADER = (SALARY BY (MIN THEN MAX)) STUB = (SEX)

	Current monthly salary
	MINIMUM	MAXIMUM
Male........... Female.........	2000 1650	3600 4000

Keywords must be nested within an observation variable. Keywords may not be nested within each other; they may be concatenated. For example, SALARY BY MEAN BY STDEV is not allowed; SALARY BY (MEAN THEN STDEV) is allowed.

Accuracy and ISDNUMBER

Medians and quantiles are estimates of the true values. The accuracy of the estimates is governed by a value called the interval size designator or ISD number . This number is initially set to 4 which means that the maximum relative error in a median or quantile estimate should not exceed 12.5%. The value is usually more accurate if the values are distributed smoothly. Set the ISD number explicitly by using the clause:

ISDNUMBER = n /

that sets the ISD number to n. Increasing the ISD number by 1 cuts the maximum relative error in half while doubling the internal storage required to compute the medians or quantiles. The ISDNUMBER clause must precede the clauses it is to affect. For example,

TABULATE HEADER    = (SALARY BY (MEAN THEN Q)) STUB = (TOTAL THEN REGION)
         ISDNUMBER = 5
         QUANTILES = Q 'Quantiles' 4

Total

TOTAL can be specified as an independent control variable which is a count of records. It is specified in exactly the same way as other statistics (using either pseudo-variables or a keyword). However, since it is a control variable rather than an observation variable, it operates in a different manner.

Example: TOTAL

HEADER = (AGE THEN TOTAL)

Young	Old	TOTAL
10	11	22

The control variable, AGE, with two classifications is concatenated with TOTAL. The total column equals the count of all occurrences at the appropriate level of nesting. (Note that records with undefined values have still been counted.)

HEADER = (SALARY THEN TOTAL)

Current monthly salary	TOTAL
57600	22

SALARY is an observation variable and thus the sum is displayed by default. The keyword TOTAL is a control variable and displays a count. Cross tabulating or nesting TOTAL with either control or observation variables is allowed since it is a control variable. As per the standard rules for cell contents, if TOTAL is nested with a control variable, a count is produced; if it is nested with an observation variable, a sum of the observation variable is produced.

Example: TOTAL with BY and THEN

The following header expression uses the observation variable SALARY and the control variables AGE and SEX:

HEADER = (SALARY BY AGE BY (TOTAL THEN SEX))

Current Monthly Salary
Young			Old
TOTAL	Male	Female	TOTAL	Male	Female
27500	13050	14450	30100	21900	8200

Since TOTAL is nested within an observation variable, this column contains the total SALARY for each AGE group.

TOTAL shorthand notation

There is a shorthand notation for TOTAL. Appending a single T to the variable name separated by a period specifies that a TOTAL is produced before the variable. The following two expressions are identical:

TOTAL THEN AGE
AGE.T

A tabulate statement can be shortened as follows:

TABULATE  HEADER = (SALARY BY AGE BY (TOTAL THEN SEX))
TABULATE  HEADER = (SALARY BY AGE BY SEX.T)

Normalized and Quantiles

Normalized and Quantile specifications are very similar to other statistics but have to supply some additional information. This can take one of two forms:

In the first form, the quantity "n" specifies the number of equal-sized quantiles to be produced. Of these, n-1 are printed. Thus for n=4, TABULATE prints output for the 25%, 50% and 75% quantiles. In the second form a number of non-equal sized quantiles are produced and each n specifies the quantiles to produce. For example, specifying Ns of 15 and 85 produces the 15th and 85th percentage quantiles. Both forms allow specification of labels. In the first form, when labels are used for individual quantiles, the commas separating 'label1', 'label2', etc. are required.

1)  keyword = variable ['label'] n ['label1', ...]
2)  keyword = variable ['label'] (n1) ['label'] (n2)...

Example: Quantiles

TABULATE HEADER = (SALARY BY (MEAN THEN Q)) STUB = (REGION.T)
         QUANTILES = Q 'Quantiles' 4

	Current monthly salary
	MEAN	Quantiles
		QUANTILE- =25	QUANTILE- =50	QUANTILE- =75
TOTAL....... North........ South........	2743 2813 2650	2311 2238 2369	2678 2825 2538	3222 3315 2913

A specification in the first form , with multiple labels:

TABULATE  HEADER    = (SALARY BY (MEAN THEN Q)) STUB = (REGION.T)
          QUANTILES = Q 'Quantiles' 4 '25%', 'Median', '75%'

	Current monthly salary
	MEAN	Quantiles
		25%	Median	75%
TOTAL........ North........ South........	2743 2813 2650	2311 2238 2369	2678 2825 2538	3222 3315 2913

A specification in the second form:

TABULATE HEADER    = (SALARY BY (MEAN THEN Q)) STUB = (REGION.T)
         QUANTILES = D 'Deciles' (10) '10th' (50) '50th' (90) '90th'

	Current monthly salary
	MEAN	Deciles
		10th	50th	90th
TOTAL........ North........ South........	2743 2813 2650	2051 1973 2267	2678 2825 2538	3407 3647 3265

Percentages

TABULATE can calculate and display percents. Tables can be specified which display just percentages or which also include the original values.

Tables can be specified which contain row percentages, column percentages or overall percentages. Multiple percentages can be displayed in a single table. A single table can have percentages computed from more than one base value. However a table cannot have any cell which is an intersect of two percentage calculations. This means that all percentage cells must appear in a single expression; there cannot be percentages in both a row and column expression.

To display percentages, specify where the percents appear and the base of each percent calculation. Use the PERCENT option to define a pseudo-variable and then reference this in an expression to display the percent. Specify the base variable to use to calculate the percentage by appending a percentage symbol (%) and the same pseudo-variable name to a base variable. The only valid base variables are TOTAL control variables, typically cross tabulated against the observation variable that is the percentage variable.

Shorthand Notation for Base Marker Specification

If a percent base marker is appended to a variable, it specifies that a TOTAL precedes the variable and that this is the base. For example, the following expressions are equivalent:

TOTAL%X THEN REGION
REGION%X

This shorthand notation can simplify the tabulate expressions. For example, both the following statements are equivalent:

TABULATE HEADER = (TOTAL%X THEN REGION) STUB = (X BY (TOTAL%X THEN AGE))
TABULATE HEADER = (REGION%X)            STUB = (X BY AGE%X)

The table output is the same in both cases. With the shorthand form, the label of the created TOTAL category is always "TOTAL".

A brief review of .T and %X notation: An expression such as AGE.T, produces three rows or columns - TOTAL,Young and Old. An expression such as REGION%X, produces three categories - TOTAL, North and South. It also specifies the TOTAL column as the percent base. The sub-expression, X.T, is the same as TOTAL THEN X and produces two cells for each category.

The basics of producing percentages in tables are covered in simple percentages. You can also get percentages with observation variables and tables with values and percentages. Tables can have percentages for row totals and can be displayed in different wafers. There are numerous examples of different tables using percentages in percentage examples.

Percentages with Counts

The simplest percentage tables are produced with control variables and cell contents are therefore counts. Consider the following standard table:

HEADER = (TOTAL THEN REGION) STUB = (TOTAL THEN AGE)

	TOTAL	North	South
TOTAL.......... Young.......... Old............	22 10 11	12 5 7	9 5 4

To add percentages to this table, define a pseudo-variable, say X, and reference this on PERCENT clause, for example:

PERCENT = X 'Percent'

Choose the base row or column for the percentage computations. Base cells are indicated within an expression by appending a percentage symbol (%) and the pseudo-variable to the base variable. For row percentages, there must be a total column for the base or 100% column. In the example, the TOTAL variable in the header expression is the total column. Therefore, append %X to TOTAL to make it the base for percentage calculations.

Decide where to place the percent variable X, ensuring that it is nested correctly. Specifying it in the header expression or in the stub expression only affects the labeling of the table.

Row Percents

TABULATE HEADER  = (X BY (TOTAL%X THEN REGION)) STUB = (TOTAL THEN AGE)
         PERCENT =  X 'Percent'

	Percent
	TOTAL	North	South
TOTAL.......... Young.......... Old............	100 100 100	55 50 64	41 50 36

The total (100%) and regional percentages for each age group (row) are displayed in the table. The table contains only percentages because the percent variable, X, is at the highest level of the header expression nesting. The remainder of the header, (TOTAL THEN REGION), is nested within X. The percent marker, %X, specifies the header column TOTAL as the 'set' of base cells for this table's percent calculations. Each cell value for TOTAL is the base for the percents in its row. For example, the calculations for the first row of the table are:

TOTAL  = 2200/2200 * 100 = 100
         North  = 1200/2200 * 100 = 55
         South  =  800/2200 * 100 =  41

Note: The above illustrates the effect of missing values in a control variable. Some records had missing values in REGION and thus the columns do not reflect these records. If required, specify the MISSING modifier on REGION to generate an extra column for those records.

Moving X to the stub expression changes the labeling of the table but not its contents:

TABULATE HEADER = (TOTAL%X THEN REGION) STUB = (X BY (TOTAL THEN AGE))
         PERCENT = X 'Percent'

	TOTAL	North	South
Percent TOTAL.......... Young.......... Old............	100 100 100	55 50 64	41 50 36

Column Percents

To display column percentages instead of row percentages, use a total row to be the base or 100% row.

In the example, move the base marker, %X, from TOTAL in the header and append it to TOTAL in the stub:

TABULATE HEADER = (X BY (TOTAL THEN REGION)) STUB = (TOTAL%X THEN AGE)
         PERCENT = X 'Percent'

	Percent
	TOTAL	North	South
TOTAL.......... Young.......... Old............	100 45 50	100 42 58	100 56 44

As in the preceding table, the percent variable, X, is nested with all cells of the table so that all cells contain percents. In this table, however, the percent marker, %X, specifies the TOTAL row in the stub expression as the set of base cells for this table's percent calculations. For example, the calculations for the first column of the table are:

          TOTAL            = 2200/2200 * 100 = 100
          Young            = 600/2200 * 100 =  45
          Old              = 1400/2200 * 100 =  50

Base Markers in Both the Header and the Stub

If the percent base marker is appended to a total column in the header expression, row percentages are produced.
If the percent base marker is appended to a total row in the stub expression, column percentages are produced.

To get overall or grand total percentages, place the percent base markers in both the stub and the header. By convention, the cell (or cells) at the intersection of the base row and column are considered the base cells for the table.

TABULATE  HEADER = (X BY (TOTAL%X THEN REGION)) STUB = (TOTAL%X THEN AGE)
          PERCENT = X 'Percent'

	Percent
	TOTAL	North	South
TOTAL.......... Young.......... Old............	100 45 50	55 23 32	41 23 18

In this table, all percentages are measured using the first cell of the table as the base. This cell is at the intersection of the stub TOTAL row and the header TOTAL column.

Percentages with Observation Variables

Percentages of observation variables can be produced but at least one control variable must be nested with the percent variable.

Example: Percentages for Observation Variables

The following shows salary by age group:

TABULATE  HEADER = (SALARY BY (TOTAL THEN AGE))

Monthly salary
TOTAL	Young	Old
57600	27500	30100

The corresponding percentage table is:

TABULATE  HEADER = (X BY SALARY BY (TOTAL%X THEN AGE))
PERCENT = X 'Percent' /

Percent
Monthly salary
TOTAL	Young	Old
100	48	52

Moving the clause (TOTAL%X THEN AGE) to the front of the expression, results in the same data in the table with different labeling:

TABULATE HEADER = ((TOTAL%X THEN AGE) BY X BY SALARY)
         PERCENT = X 'Percent'

TOTAL	Young	Old
Percent	Percent
Monthly salary	Monthly salary	Monthly salary
100	48	52

Original Values and Percents

Original values can be included as well as the percents. In the examples that follow, all of the manipulations are performed in the header expression. However, they could equally be done in stub or wafer expressions.

The following has alternating columns of original values and percents. Note only the cells nested within the percent variable X contain percents.

TABULATE HEADER  = (REGION%X BY X.T) STUB = (AGE.T)
         PERCENT =  X 'Pct' /

	TOTAL		North		South
	TOTAL	Pct	TOTAL	Pct	TOTAL	Pct
TOTAL...... Young...... Old........	22 10 11	100 100 100	12 5 7	55 50 64	9 5 4	41 50 36

Instead of presenting the data in alternating columns, it could be grouped into separate blocks of frequencies and percents by inverting the order of the header expression.

TABULATE HEADER  = (X.T BY REGION%X) STUB = (AGE.T)
         PERCENT =  X 'Pct' /

	TOTAL			Pct
	TOTAL	North	South	TOTAL	North	South
TOTAL..... Young..... Old.......	22 10 11	12 5 7	9 5 4	100 100 100	55 50 64	41 50 36

Percents for Row Totals.

TABULATE HEADER  = (TOTAL THEN (X BY REGION%X)) STUB = (AGE.T)
         PERCENT =  X 'Pct' /

	TOTAL	Pct
	TOTAL	TOTAL	North	South
TOTAL.......... Young.......... Old............	22 10 11	100 100 100	55 50 64	41 50 36

An analogous set of tables can be designed for percentages and totals of an observation variable. The following table displays alternating columns of total salary and percentage.

TABULATE HEADER  = (SALARY BY SEX%X BY X.T STUB) = (AGE.T)
         PERCENT =  X 'Pct'

	Current monthly salary
	TOTAL		Male		Female
	TOTAL	Pct	TOTAL	Pct	TOTAL	Pct
TOTAL..... Young..... Old.......	57600 27500 30100	100 100 100	34950 13050 21900	61 47 73	22650 14450 8200	39 53 27

This table groups all salary columns followed by all the percent columns.

TABULATE HEADER  = (SALARY BY X.T BY SEX%X) STUB = (AGE.T)
         PERCENT =  X 'Pct'

	Current monthly salary
	TOTAL			Pct
	TOTAL	Male	Female	TOTAL	Male	Female
TOTAL..... Young..... Old.......	57600 27500 30100	34950 13050 21900	22650 14450 8200	100 100 100	61 47 73	39 53 27

The following shows total salary for both sexes and percentage by sex.

TABULATE HEADER  = (SALARY BY (TOTAL THEN (X BY SEX%X))) STUB = (AGE.T)
         PERCENT =  X 'Pct'

	Current monthly salary
	TOTAL	Pct
		TOTAL	Male	Female
TOTAL..... Young..... Old.......	57600 27500 30100	100 100 100	61 47 73	39 53 27

Percentages in Wafers

It is possible to display frequencies and percentages in different wafers of the same table. For example:

TABULATE HEADER  = (REGION%X) STUB = (AGE%X) WAFER = (X.T)
         PERCENT =  X 'Percent'

The wafer expression is X.T which stands for TOTAL THEN X. Thus the table consists of two wafers, the first wafer contains totals and the second contains percentages. Each wafer is a two-dimensional table of AGE (in the stub) by REGION (in the header). Since the base marker, %X, appears in both the stub and the header, the percentage wafer contains overall percentages. The following two wafers are the output of the previous tabulate statement:

HEADER  = (REGION%X) STUB = (AGE%X) WAFER = (X.T)
Table of Contents:
TOTAL........................................  page 1
Percent......................................  page 2

HEADER  = (REGION%X) STUB = (AGE%X) WAFER = (X.T)
TOTAL..............

	TOTAL	North	South
TOTAL.......... Young.......... Old............	22 10 11	12 5 7	9 5 4

(On next page)

HEADER = (REGION%X) STUB = (AGE%X) WAFER = (X.T) Percent............

	TOTAL	North	South
TOTAL.......... Young.......... Old............	100 45 50	55 23 32	41 23 18

Example percentage tables

The following examples illustrate some of the different ways to display percentages. The examples are all based on the following table of frequency counts.

TABULATE HEADER = (REGION.T) STUB = (AGE.T BY SEX.T)

	TOTAL	North	South
TOTAL TOTAL........... Male............ Female.......... Young TOTAL........... Male............ Female.......... Old TOTAL........... Male............ Female..........	22 13 8 10 5 5 11 8 3	12 9 3 5 3 2 7 6 1	9 4 5 5 2 3 4 2 2

Base marker appended to AGE

This table contains nine base (100%) cells. The TOTAL for AGE is the base cell (AGE%X). Since


SEX.T

is nested within AGE, there are three base cell rows, and since the header contains three columns (REGION.T) there is a total of nine base cells (three columns times three rows).

TABULATE HEADER = (X BY REGION.T)  STUB = (AGE%X BY SEX.T)  PERCENT = X 'Percent'

	Percent
	TOTAL	North	South
TOTAL TOTAL.......... Male........... Female......... Young TOTAL.......... Male........... Female......... Old TOTAL.......... Male........... Female.........	100 100 100 45 38 63 50 62 38	100 100 100 42 33 67 58 67 33	100 100 100 56 50 60 44 50 40

Base marker appended to SEX

Note that the nine base cells now occur where TOTAL for

SEX

appears.

TABULATE HEADER = (X BY REGION.T) STUB = (AGE BY SEX%X) PERCENT = X 'Percent'

	Percent
	TOTAL	North	South
Young TOTAL.......... Male........... Female......... Old TOTAL.......... Male........... Female.........	100 50 50 100 73 27	100 60 40 100 86 14	100 40 60 100 50 50

Use of FIRST

In the previous table, base rows appeared at every occurrence of the TOTAL variable for SEX. The next table uses .FIRST to restrict the base row to the first occurrence of TOTAL for SEX.

TABULATE HEADER  = (X BY REGION.T) STUB = (AGE BY SEX%X.F)
         PERCENT =  X 'Percent'

	Percent
	TOTAL	North	South
Young TOTAL.......... Male........... Female......... Old TOTAL....... ... Male........... Female.........	100 50 50 110 80 30	100 60 40 140 120 20	100 40 60 80 40 40

Multiple Base Markers

The next three tables are similar to the first three with the addition of a second base marker appended to REGION in the header. The base cells occur at the intersections of the base rows and columns.

TABULATE HEADER  = (X BY REGION%X) STUB = (AGE%X BY SEX.T)
         PERCENT =  X 'Percent'

	Percent
	TOTAL	North	South
TOTAL TOTAL.......... Male........... Female......... Young TOTAL..... ..... Male........... Female......... Old TOTAL.......... Male........... Female.........	100 100 100 45 38 63 50 62 38	55 69 38 23 23 25 32 46 13	41 31 63 23 15 38 18 15 25

TABULATE HEADER  = (X BY REGION%X) STUB = (AGE BY SEX%X)
         PERCENT = X 'Percent'

	Percent
	TOTAL	North	South
Young TOTAL.......... Male........... Female......... Old TOTAL....... ... Male........... Female.........	100 50 50 100 73 27	50 30 20 64 55 9	50 20 30 36 18 18

TABULATE HEADER  = (X BY REGION%X) STUB = (AGE BY SEX%X.F)
         PERCENT = X 'Percent'

	Percent
	TOTAL	North	South
Young TOTAL.......... Male........... Female......... Old TOTAL....... ... Male........... Female.........	100 50 50 110 80 30	50 30 20 70 60 10	50 20 30 40 20 20

Multiple Percent Variables Within a Table

More than one percent distribution can be displayed in a single table by specifying more than one percent variable. For example, the following two variables could be specified representing row and column percentages:

PERCENT=  PR 'Row Pct' PC 'Col Pct' /

Since PR represents row percentages, the base marker associated with PR must be in the header expression. Similarly, since PC, represents column percentages, the base marker associated with PC must be in the stub expression. In the example below, the results are presented in two tables concatenated together. The first table contains row percentages, the second contains column percentages.

TABULATE HEADER  = ((PR + PC) BY REGION%PR) STUB = (AGE%PC)
         PERCENT = PR 'Row Pct' PC 'Col Pct'

	Row Pct			Col Pct
	TOTAL	North	South	TOTAL	North	South
TOTAL..... Young..... Old.......	100 100 100	55 50 64	41 50 36	100 45 50	100 42 58	100 56 44

Multiple percents in Rows

The next example presents the same data in a more conventional form. Notice (PR + PC) is moved to the stub expression.

TABULATE HEADER  = (REGION%PR) STUB = (AGE%PC BY (PR + PC))
         PERCENT = PR 'Row Pct' /
         PERCENT = PC 'Col Pct' /

	TOTAL	North	South
TOTAL Row Pct......... Col Pct......... Young Row Pct......... Col Pct......... Old Row Pct......... Col Pct.........	100 100 100 45 100 50	55 100 50 42 64 58	41 100 50 56 36 44

Example: Overall Percents

Overall percentages can be included in the same table by adding a third percent variable, PT. Since PT is to represent overall percentages, append the PT base marker to both the stub and the header variables.

TABULATE HEADER =  (REGION%PR%PT) STUB = (AGE%PC%PT BY (PR + PC + PT))
         PERCENT = PR 'Row Pct'
         PERCENT = PC 'Col Pct'
         PERCENT = PT 'Tot Pct'

	TOTAL	North	South
TOTAL Row Pct......... Col Pct......... Tot Pct......... Young Row Pct......... Col Pct......... Tot Pct......... Old Row Pct......... Col Pct......... Tot Pct.........	100 100 100 100 45 45 100 50 50	55 100 55 50 42 23 64 58 32	41 100 41 50 56 23 36 44 18

Counts and Multiple Percents

Counts can be included by adding the TOTAL keyword:

TABULATE HEADER = (REGION%PR%PT) STUB = (AGE%PC%PT BY (TOTAL + PR + PC + PT))
         TOTAL = TOTAL 'Count'
         PERCENT = PR 'Row Pct'
         PERCENT = PC'Col Pct'
         PERCENT = PT 'Tot Pct'

	TOTAL	North	South
TOTAL Count........... Row Pct......... Col Pct......... Tot Pct......... Young Count........... Row Pct......... Col Pct......... Tot Pct......... Old Count........... Row Pct......... Col Pct......... Tot Pct.........	22 100 100 100 10 100 45 45 11 100 50 50	12 55 100 55 5 50 42 23 7 64 58 32	9 41 100 41 5 50 56 23 4 36 44 18

Percentages in both Columns and Rows

It is not possible to produce a table which has a column of percentages on the total and a row of percentages on the same total. The solution to this is to produce a second table on the same file with a single row - the final percentages.

Suppose the table is a breakdown of males and females under and over forty. The basic table (without percentages) is easily produced with a table statement such as:

TABULATE HEADER = (AGEGROUP.T)  STUB = (GENDER.T)

However it is impossible to produce both sets of percentages in the same table. The required output is the following:

	Total Young and Old	Under 40	40+	% BY AGE
Total Males and Females	20	6	14	100%
Male	12	2	10	60%
Female	8	4	4	40%
% BY GENDER	100%	30%	70%

It is simple to add percentages to either the row totals, giving percentages of males and females or the column totals giving percentages of age groups but not both at once. The solution is to produce two tables in the one run:

TABULATE HEADER =(TOTY THEN AGEGROUP THEN X) STUB = (TOTX%X THEN GENDER)
         FILENAME = A
         TOTAL = TOTX 'Total Males and Females'
         PERCENT = X '% BY AGE'
TABULATE HEADER = (TOTY%Y THEN AGEGROUP) STUB = (Y)
         TOTAL = TOTY 'Total Young and Old'
         PERCENT = Y '% BY GENDER'

Note that the second tabulate does not have a FILE clause and thus writes the output to the same file. This second tabulate produces one row which is the required set of final percentages.

Record Filtering

BOOLEAN  = (logical expression)
SAMPLE   = fraction
WEIGHT   = varname

`BOOLEAN`	Specifies which procedure table records are used by the procedure. The procedure table records for which the logical expression is true are used by the procedure. If this option is not specified, all procedure table records are used.
`SAMPLE`	Specifies that a random sample of the procedure table records are used by the procedure. The fraction specifies the percent of records used and is specified as a positive decimal number less than or equal to 1 (one). .25, for example specifies that a 25% sample be used.
`WEIGHT`	Specifies an integer procedure variable used to weight the frequency counts and aggregations. By default, each record adds a count of 1 to frequency counts. Specifying a weight variable, adds the value of that variable rather than 1. For example, if `WEIGHT = FAMSIZE` were specified, then, in a table of `RACE` by `REGION`, a procedure table record would contribute a count of 5 to the `RACE=1`, `REGION=2` cell of the table if it contained the data: RACE=1 REGION=2 FAMSIZE =5

Print Formatting

PRINTFORMATS = var list (options,...) .../

By default, cell contents are printed as integers. The PRINTFORMATS option is used to alter the defaults. The options and their abbreviations are:

COMMAS (C) Separates groups of 3 digits by commas.
DECIMAL=n (n) Specifies the number of decimal places to be printed. A decimal point is only printed when the number of decimal places is non-zero.
DOLLAR (D) Places a floating dollar sign before the number.
PERCENT (P) Places a trailing percent sign after the number.
By default the DOLLAR, PERCENT and COMMAS options are not in effect and DECIMAL= 0 (i.e. numbers are printed as integers).

Use of the PRINTFORMATS clause.

TABULATE HEADER = (SALARY BY (TOTAL THEN (X BY SEX%X))) STUB = (AGE.T)
         PERCENT = X 'Pct' /
         PRINTFORMATS = SALARY (D,C) X (P,2) /

	Current monthly salary
	TOTAL	Pct
		TOTAL	Male	Female
TOTAL.......... Young.......... Old............	$57,600 $27,500 $30,100	100.00% 100.00% 100.00%	60.68% 47.45% 72.76%	39.32% 52.55% 27.24%

TOTAL under SALARY is printed with dollar signs and commas. However, TOTAL under PCT is printed with a percent sign and two decimal places because of the precedence rules for print formats.

Precedence rules for print formats

Except for percents, variables that have no assigned PRINTFORMAT are not considered in the precedence analysis.
PRINTFORMATS for percentages have the highest precedence. If a percent variable is not assigned an explicit PRINTFORMAT, it is printed as an integer.
In an expression containing an observation variable nested with a control variable, the observation variable's PRINTFORMAT takes precedence.
When a statistical keyword (e.g. MEAN, MAX, etc.) is nested with an observation variable, the observation variable's PRINTFORMAT is used. When nesting variables, the PRINTFORMAT of the variable lower in the nesting takes precedence.

For example:

TABULATE HEADER  = (SALARY BY REGION BY (MIN THEN MAX))
         MINIMUM = MIN 'Lowest'
         MAXIMUM = MAX 'Highest'
         PRINTFORMATS = SALARY (D) MIN (2)MAX (2)

The salaries are listed with 2 decimal points and no dollar sign. To print the salaries with dollar signs and no decimal points the expression is:

TABULATE HEADER = (REGION BY (MIN THEN MAX) BY SALARY)

When an expression contains two or more nested control variables, the PRINTFORMAT of the lowest variable in the nesting takes precedence.
For the purposes of PRINTFORMAT precedence, the header expression has highest precedence followed by the stub and then the wafer.

Labels

When control variable values are printed, they are labeled by their value labels. If there are no value labels, the name of the variable and the individual value is displayed.

Observation variables are displayed in a summary cell labeled with the variable label. If there is no variable label, the variable name is used.

When not enough room is provided in a table for label or title information, the label or title is broken into two or more segments so that it can fit in the allotted space. The VALUE LABELS command defines value labels for missing values including BLANK and UNDEFINED.

Page Formatting

PAGETITLE = 'string'['string'['string']]
PAGELENGTH = n
PAGEWIDTH = n
COLLAPSE

The following options alter the default formatting of the page.

PAGETITLE Specifies the title printed left justified at the top of the page. By default, the page title is the string "SIR/XS Tabulate Procedure" and the page number is printed at the top right-hand side of the page. The page title may consist of up to 3 lines. The first line contains the page number. The second and third lines are printed if there are wafers and/or if the wafers must be broken into chunks (i.e. the wafer does not fit on a single page).
PAGELENGTH The maximum number of lines that can appear on a page. Specifying PAGELENGTH = NOEJECT turns off paging. The default PAGELENGTH value is the current output file page length. This must include space for:
the page title and page number line(s),
the wafer title, wafer label, and the blank lines between,
the table (wafer) itself and any WAFERNOTES specified.
PAGEWIDTH Controls the width of the printed page. By default, tables are printed centred within PAGEWIDTH print positions. The default value is the current output file page width.
COLLAPSE If a wafer exceeds either PAGELENGTH or PAGEWIDTH, an attempt is made to break the wafer into multiple 'chunks', each printed on a separate page. The chunks appear in the output in order from left to right, then top to bottom. The wafer and stub labels appear as needed. The COLLAPSE option allows chunks of broken up wafers to be printed on the same page. No more than 2 chunks are printed per page. The default is not to collapse chunks into a single page.

Header Formatting

HEADERWIDTH = n
HEADERINDENTATION = n
HEADERDIVIDER = 'character'
NODIVIDERS
NOHEADERCENTER

The following options alter the default formatting of the column header.

HEADERWIDTH Specifies the width to be used in printing each column of the table. This width does not include the column divider. The default header width is 10 print positions.
If HEADERWIDTH is set too small, there may be insufficient room to print a number in the table, in which case the field is filled with Xs. Also, if HEADERWIDTH is too small, there may not be enough room for the header labels to be printed and they may be severely segmented. Conversely, if HEADERWIDTH is set too large, the table header may contain too many print positions for one page. When this occurs, the page is broken into chunks. Chunks are printed on successive pages left to right, top to bottom.
HEADERINDENTATION Specifies the number of print lines for each level of the header nesting. The default is 3 and it cannot be set smaller.
HEADERDIVIDER Specifies the character used as the column divider. The default is a vertical bar (|). This character is used for both the label area of the header as well as between the columns of data.
NODIVIDERS Sets the divider character between columns of data to blank. It does not affect the label area of the header.
NOHEADERCENTER Left justifies header labels in a column. By default header labels are centred in the column.

Stub Formatting

STUBTITLE = option 'string'
STUBWIDTH = n
STUBINDENTATION = n
STUBCONTINUATION = n
STUBHEIGHT = n
STUBFILLER = 'character'
STUBDIVIDER = 'character'

The following options alter the default formatting of the stub:

STUBTITLE Specifies a title which appears in the boxed in area to the left of the header and directly above the stub labels. The options allowed are CENTER, LEFT and RIGHT and control the justification of the stub label. If the option is omitted, it is assumed to be CENTER. If the stub title is too long, it is automatically segmented over two or more lines. There is no default stub title.
STUBWIDTH Specifies the number of print positions for stub labels. The default is 20 print positions.
If "too much" stub label segmentation occurs, TABULATE aborts with a WAFER/STUB OUTPUT FORMATTING ERROR. The solution is to increase the STUBWIDTH.
STUBINDENTATION Specifies the indentation for each level of nesting. The default is 2 print positions.
STUBCONTINUATION Specifies the indentation for continuation lines. A label which exceeds the stub width is segmented into two or more lines. Continuation lines are indented by the parameter. The default indentation is 3 print positions.
STUBHEIGHT Specifies the number of print lines used for each stub label. The default is 1 line. Label segmentation works independently from the STUBHEIGHT parameter and, therefore, some stub labels may occupy more than STUBHEIGHT lines.
STUBFILLER Stub labels (on lines which contain data) are filled with the character to the full width of the stub. The default stub filler is a period (.).
STUBDIVIDER Specifies the horizontal divider character. The default is the dash (-).

Stub Formatting

TABULATE STUB = (AGE BY SEX) HEADER = (REGION.T)
         STUBTITLE = 'Age by Sex'
         STUBWIDTH = 20

Age by Sex	TOTAL	North	South
Young Male............ Female.......... Old Male............ Female..........	5 5 8 3	3 2 6 1	2 3 2 2

TABULATE STUB = (AGE BY SEX) HEADER = (REGION.T
         STUBFILLER = ' '

	TOTAL	North	South
Young Male Female Old Male Female	5 5 8 3	3 2 6 1	2 3 2 2

Wafer Formatting

WAFERTITLE = option 'string'
NOWAFERTITLE
WAFERCENTER
WAFERSTART = n
WAFERINDENTATION = n
WAFERCONTINUATION = n
WAFERHEIGHT = n

When a tabulate statement contains a wafer expression, each wafer produced contains a wafer label in the upper left corner below the wafer title. If there is no wafer expression in the tabulate statement, a wafer label is not printed.

Wafer labels are formed in the same way that header and stub labels are produced.

WAFERTITLE Specifies a wafer title printed above each wafer. By default, the tabulate statement is printed as the wafer title, centred and one line above the wafer (table). The option may be CENTER, LEFT or RIGHT to position the wafer title over the wafer.
NOWAFERTITLE Suppresses printing of the wafer title.
WAFERCENTER Centres the wafer on the page. By default wafers are left justified on the page.
WAFERSTART Specifies the starting print position for the wafer. Does not apply when WAFERCENTER is specified.
WAFERINDENTATION Specifies the indentation used for each level of nesting in the wafer title. The default is 2 print positions.
WAFERCONTINUATION Specifies the indentation for continuation lines when a wafer label is broken up into multiple lines. The default is 3 print positions.
WAFERHEIGHT Specifies the number of print lines for the wafer label. The default is 1. This height is automatically adjusted if a label does not fit in the width of the wafer title and has to be segmented.
The number of print positions available for printing the wafer label is the same as the width of the stub titles. If the wafer title does not fit, TABULATE reports a WAFER/STUB OUTPUT FORMATTING ERROR. The solution is to increase the size of STUBWIDTH.

Wafers

The following table would consist of four wafers labeled as follows:

TABULATE  WAFER = (AGE BY REGION) STUB = (SEX) HEADER = (RACE)
         Under 21
          North.......

         Under 21
          South.......

         21 and Over
           North.......

         21 and Over
           South.......

Other Options

HTML 
NOBOTTOMBORDER 
NOLEFTBORDER 
NORIGHTBORDER 
NOTES     = 'string'
FOOTNOTES = 'string' 
WAFERNOTES = 'string' 
NOZEROS 
ZEROCHARACTER = 'character' 
NOEMPTYROWS 

DELETEMISSING 
UPPERCASE 
SPANNERS 
DUMMY   = variable 'label' ...
LANGUAGE

HTML

Specifies that the output file contains data in html format for viewing by a browser or other package which expects this format.

When output is in html format, then the print formatting, page formatting and many of the general output control options can be specified but have no effect. There is no control over the detailed appearance of the table. The general shape of the table is dictated by the table expressions and the specifics of the table appearance depend on the software used to view and/or print the resulting html.

Border Generation

By default all borders are generated. The NOBOTTOMBORDER, NOLEFTBORDER and


NORIGHTBORDER

options alter the production of the left, right and bottom borders of each wafer. They can be used in conjunction with the HEADERDIVIDER and STUBDIVIDER.

Example: Dummy Spanner Labels

TABULATE HEADER = (J BY (SEX THEN REGION))
         DUMMY  = J 'Sex and region Data'

	Sex and region Data
	Male	Female	North	South
Dummy..........	13	8	12	9

Language

If a language is specified, the effect is to produce translated versions of the following words or phrases:
SIR/XS Tabulate Procedure, Footnotes, Notes,Page, empty/deleted, Table of Contents, cont'd, of, Wafer, and, Chunk.
Recognised language keywords are HEBREW, DUTCH and GERMAN.

String Specifications

Whenever an option includes the specification of strings or labels, a string can be specified in multiple segments using the following format:

'segment' option 'segment'

Option can be a plus (+), blank or minus(-); the different characters specify where strings can be broken across lines:

Plus (+) simply concatenates two segments; blank forces a line break between the two segments; minus (-) specifies optional segmentation as per the following rules.

Label Segmentation

If a label does not fit, it is broken into segments by splitting the text at an appropriate breakpoint. The breakpoint is chosen by searching the label from right to left looking for one of the following conditions taken in sequence:

Forced break.
Conditional break.
Space, hyphen, or underscore.
Vowels. If a vowel is found, the break occurs after the vowel, with that segment followed by a hyphen when printed.

If no breakpoint is found, the label is truncated at a point which allows a hyphen to be printed.

After the first line segment has been formed, all the remaining characters of the original label to the right of the break character are considered next. If the remaining segment fits in the allocated space, it is printed, otherwise, the first n characters of this portion of the label (where n is the space available) are also scanned from right to left and segmented as described above. This process continues until the entire label has been printed.

Error Processing

There are two stages at which errors in the TABULATE command are detected, during compilation and during execution. Errors detected during compilation cause error messages to be displayed and the VisualPQL program is not executed. Errors detected during execution cause the program to terminate.

If during compilation, a CELL CONTENTS ERROR error occurs, check that:

Not more than 1 observation variable or statistic has been nested into the same table cell.
That there is an observation variable nested with each cell that contains statistics such as mean and median.
The base marker(s) have been specified for a percentage table.

The table is formatted after the program is run, and some problems are diagnosed at this stage. The error WAFER/STUB OUTPUT FORMATTING ERROR implies that there is a problem inserting the wafer and/or stub labels within their allocated space in each wafer. Usually there are too many levels of nesting, within the wafer or stub, for the width of the STUBWIDTH.

If there is not enough room on a page to print an entire row or column, TABULATE breaks the wafer into + "chunks". If a BY clause has been specified, all of the rows or columns beneath the highest level in a nesting must fit on a single page, otherwise a CHUNK FORMATTING ERROR is reported. This implies that a wafer is too large to print on a single page and that it cannot be broken at any 'clean' place to produce multiple 'chunks' of output. A wafer can be broken into chunks only along the highest variable nested within the stub or header. The solutions are to:

Reorganise the TABULATE statement, or
Change the size formatting options to fit more of each wafer onto each page.

`HEADER = (expression)`	Defines the columns of the tabulation. This clause is required.
`STUB = (expression)`	Defines the rows of the tabulation.
`WAFER = (expression)`	Defines the sections of the tabulation. Sections are a number of two dimensional sections specified by the stub and header.
BY	Nest a variable within another thus producing combinations of the variable values within one dimension of the table. The asterisk (*) symbol may be used instead of the keyword.
THEN	Appends a second variable or statistical keyword to the set of variables displayed within one dimension of the table. The plus (+) symbol may be used instead of the keyword.
BY and THEN	Can be used in combination to create complex structures within a single dimension or expression.

`COUNT`	A count of records in this category.
`CSS`	Corrected Sum of Squares, where the cell = sum of squares - ((sum**2)/n).
`CV`	Coefficient of Variance, where the cell = (standard deviation/mean) * 100.
`CVERR`	Coefficient of Error, where the cell = (standard error/mean) * 100.
`MAXIMUM`	Maximum value of the variable.
`MEAN`	Mean value of the variable.
`MEDIAN`	Median value of the variable.
`MINIMUM`	Minimum value of the variable.
`MISSING`	Count of missing values in the variable.
`NORMALIZED*`	Normalized quantiles (by median value) where cell = (quantile/median) * 100.
`PERCENT*`	Percentage, where cell = (cell/base) * 100.
`QUANTILE*`	Quantile, equal or unequal quantiles can be specified.
`RANGE*`	Range produces the range of the variable expressed as the difference between the lowest and the highest values encountered. The format of the range clause is: `RANGE = variable (lo,hi) ['label']` The specification of the lo and hi values defines the outer limits of any values to be included in the calculation. This can be used, for example, to exclude negative or zero values or very high numbers used as some coding convention.
`STDERR`	Standard Error where cell = standard deviation/sqrt(n).
`STDEV`	Standard Deviation about the mean.
`TOTAL`	Count of records. When cross tabulated with an observation variable, gives sum of values of the observation variable.
`TSTATISTIC`	T-Statistic where cell = mean/standard error.
`USS`	Uncorrected Sum of Squares where cell = sum of squares.

`COMMAS (C)`	Separates groups of 3 digits by commas.
`DECIMAL=n (n)`	Specifies the number of decimal places to be printed. A decimal point is only printed when the number of decimal places is non-zero.
`DOLLAR (D)`	Places a floating dollar sign before the number.
`PERCENT (P)`	Places a trailing percent sign after the number.

`PAGETITLE`	Specifies the title printed left justified at the top of the page. By default, the page title is the string "SIR/XS Tabulate Procedure" and the page number is printed at the top right-hand side of the page. The page title may consist of up to 3 lines. The first line contains the page number. The second and third lines are printed if there are wafers and/or if the wafers must be broken into chunks (i.e. the wafer does not fit on a single page).
`PAGELENGTH`	The maximum number of lines that can appear on a page. Specifying `PAGELENGTH = NOEJECT` turns off paging. The default `PAGELENGTH` value is the current output file page length. This must include space for: the page title and page number line(s), the wafer title, wafer label, and the blank lines between, the table (wafer) itself and any `WAFERNOTES` specified.
`PAGEWIDTH`	Controls the width of the printed page. By default, tables are printed centred within `PAGEWIDTH` print positions. The default value is the current output file page width.
`COLLAPSE`	If a wafer exceeds either `PAGELENGTH` or `PAGEWIDTH`, an attempt is made to break the wafer into multiple 'chunks', each printed on a separate page. The chunks appear in the output in order from left to right, then top to bottom. The wafer and stub labels appear as needed. The `COLLAPSE` option allows chunks of broken up wafers to be printed on the same page. No more than 2 chunks are printed per page. The default is not to collapse chunks into a single page.

`HEADERWIDTH`	Specifies the width to be used in printing each column of the table. This width does not include the column divider. The default header width is 10 print positions. If `HEADERWIDTH` is set too small, there may be insufficient room to print a number in the table, in which case the field is filled with `X`s. Also, if `HEADERWIDTH` is too small, there may not be enough room for the header labels to be printed and they may be severely segmented. Conversely, if `HEADERWIDTH` is set too large, the table header may contain too many print positions for one page. When this occurs, the page is broken into chunks. Chunks are printed on successive pages left to right, top to bottom.
`HEADERINDENTATION`	Specifies the number of print lines for each level of the header nesting. The default is 3 and it cannot be set smaller.
`HEADERDIVIDER`	Specifies the character used as the column divider. The default is a vertical bar (\|). This character is used for both the label area of the header as well as between the columns of data.
`NODIVIDERS`	Sets the divider character between columns of data to blank. It does not affect the label area of the header.
`NOHEADERCENTER`	Left justifies header labels in a column. By default header labels are centred in the column.

`STUBTITLE`	Specifies a title which appears in the boxed in area to the left of the header and directly above the stub labels. The options allowed are `CENTER, LEFT` and `RIGHT` and control the justification of the stub label. If the option is omitted, it is assumed to be `CENTER`. If the stub title is too long, it is automatically segmented over two or more lines. There is no default stub title.
`STUBWIDTH`	Specifies the number of print positions for stub labels. The default is 20 print positions. If "too much" stub label segmentation occurs, `TABULATE` aborts with a `WAFER/STUB OUTPUT FORMATTING ERROR.` The solution is to increase the `STUBWIDTH`.
`STUBINDENTATION`	Specifies the indentation for each level of nesting. The default is 2 print positions.
`STUBCONTINUATION`	Specifies the indentation for continuation lines. A label which exceeds the stub width is segmented into two or more lines. Continuation lines are indented by the parameter. The default indentation is 3 print positions.
`STUBHEIGHT`	Specifies the number of print lines used for each stub label. The default is 1 line. Label segmentation works independently from the `STUBHEIGHT` parameter and, therefore, some stub labels may occupy more than `STUBHEIGHT` lines.
`STUBFILLER`	Stub labels (on lines which contain data) are filled with the character to the full width of the stub. The default stub filler is a period (.).
`STUBDIVIDER`	Specifies the horizontal divider character. The default is the dash (-).

`WAFERTITLE`	Specifies a wafer title printed above each wafer. By default, the tabulate statement is printed as the wafer title, centred and one line above the wafer (table). The option may be CENTER, LEFT or `RIGHT` to position the wafer title over the wafer.
`NOWAFERTITLE`	Suppresses printing of the wafer title.
`WAFERCENTER`	Centres the wafer on the page. By default wafers are left justified on the page.
`WAFERSTART`	Specifies the starting print position for the wafer. Does not apply when `WAFERCENTER` is specified.
`WAFERINDENTATION`	Specifies the indentation used for each level of nesting in the wafer title. The default is 2 print positions.
`WAFERCONTINUATION`	Specifies the indentation for continuation lines when a wafer label is broken up into multiple lines. The default is 3 print positions.
`WAFERHEIGHT`	Specifies the number of print lines for the wafer label. The default is 1. This height is automatically adjusted if a label does not fit in the width of the wafer title and has to be segmented.

Tabulate

Syntax

expressions

Expressions

BY

Example: BY

THEN

Example: THEN

Combining BY and THEN

Example: BY and THEN

Variable Types

Cell Contents

Example: Observation Variables

Variable Modifiers

Statistics

Example: Statistical keywords

Accuracy and ISDNUMBER

Total

Example: TOTAL

Example: TOTAL with BY and THEN

TOTAL shorthand notation

Normalized and Quantiles

Example: Quantiles

Percentages

Percentages with Counts

Row Percents

Column Percents

Base Markers in Both the Header and the Stub

Percentages with Observation Variables

Example: Percentages for Observation Variables

Original Values and Percents

Percents for Row Totals.

Percentages in Wafers

Example percentage tables

Base marker appended to AGE

Base marker appended to SEX

Use of FIRST

Multiple Base Markers

Multiple Percent Variables Within a Table

Multiple percents in Rows

Example: Overall Percents

Counts and Multiple Percents

Percentages in both Columns and Rows

Record Filtering

Print Formatting

Use of the PRINTFORMATS clause.

Precedence rules for print formats

Page Formatting

Header Formatting

Stub Formatting

Wafer Formatting

Other Options

HTML

Footnotes

Zero Printing

Converting Text to Uppercase

Language

Error Processing