|
SQL | SELECT |
SELECT [ DISTINCT ] variable_list FROM [ tabfile. ] table_name, | [ tabfile. ] view_name , | [ database.] record_name, | [ database.] CIR , | path_name , | [AS] alias_name,Keywords:
CASELIM n COMPILE_ONLY DBMS [ filename ] OUTPUT filename RECLIM n SAMPLE proportion [ , seed ] SELLIM nClauses:
FORMAT
GROUP BY
ON
ORDER BY
OUTER
UNION
WHERE
The SELECT
command takes data from one or
more existing records or rows and creates a new table with the selected data in
it. The definition of data in the new table is copied from the
existing definition. The number of rows in the new table depends on the number
of rows in the input and the particular clauses specified.
(See Output from SELECT.)
The variable list and the FROM
clause are required. All other
clauses are optional.
The result of the SELECT
is a new table containing the
selected variables. Tables created with the SELECT
command
are exactly the same as tables created in any other way. The
SELECT
command also controls the display format of the
variables in the created table. The display format of the table can be modified
with the display processor. Formatting can be specified on the
SELECT
with the FORMAT
option.
The basic form of the command is:
SELECT variable_list FROM record_list WHERE conditionThe variable list specifies the variables to be saved in the new table; the FROM clause, lists the records and tables which contain the input data and the WHERE clause, specifies the conditions under which records or rows are selected.
SELECT
affect the
way in which the output table is constructed. If variables are selected from a
single record or table, one output row is written for every individual data record
or row
which satisfies the WHERE
clause.If variables are selected from multiple records or rows, one output row is written for each individual data record or row which exists and which meets the selection criteria. For example the following results in one output row per combination of employee and review:
SELECT NAME POSITION RATING - FROM EMPLOYEE REVIEW - WHERE EMPLOYEE.ID = REVIEW.IDThis is sometimes referred to as an inner join. There is no output for instances where a record of one type exists but not the other. A join which produces an output row regardless of whether the joined record or row exists is known as an outer join. Specify the
OUTER
keyword following the FROM
clause.
Follow OUTER
with one or more record or table names. For
example;
SELECT .... FROM A B OUTER B SELECT .... FROM A B OUTER A SELECT .... FROM A B OUTER A BAll these examples generate an output row for every combination of A and B; the first example adds all occurrences of A where B does not exist; the second example adds all occurrences of B where A does not exist; the third example takes both occurrences of A with no B and occurrences of B with no A. These are sometimes referred to as a
RIGHT
outer join, a
LEFT
outer join and a SYMMETRIC
outer
join.
DISTINCT
keyword specifies that one output row is written
for each distinct or unique set of values selected. For example;
SELECT DISTINCT SALARY FROM EMPLOYEElists each salary that one or more people are earning. There is only one output row per value of salary, regardless of how many people earn that salary. The keyword
UNIQUE
(abbreviation UNQ
) is a
synonym for DISTINCT
.
SELECT AVG(SALARY) FROM EMPLOYEEThis returns one value, the average salary of all the employees; correspondingly, only one row is created in the output table.
The following functions are aggregation functions and alter the number of output rows produced:
SUM
AVG
STD
MAX
MIN
FIRST
LAST
COUNT
SUM, AVG
and STD
operate only on numeric data.
Other aggregation functions operate on any type of data.GROUP BY
The GROUP BY
clause specifies that one output row is written
for each unique value of the variables specified . For example,SELECT ..... FROM EMPLOYEE GROUP BY CURRPOS
produces one output row per value in the CURRPOS
variable.
Since this is a grouping of data, individual data items cannot be selected; the
only legal expressions that can be used in the variable list when GROUP
BY
is specified are aggregate data or the variables in the
GROUP BY
clause. For example, to produce a count of people
and a total salary in each position.SELECT CURRPOS COUNT(SALARY) SUM(SALARY) FROM EMPLOYEE -
GROUP BY CURRPOS
The table that results from aggregation has one entry per aggregation level. To
produce a table with individual values for a column plus subtotals and totals
(sums, averages, counts, or other statistics) , use the
Format clauses.
SELECT
is a list of variable names and
expressions. When a variable name is specified, the definition of that variable
is copied from the existing definition. If variables have the same name on
different records or tables, qualify the variable name by the name of the
record or table. Qualified names are separated by a period.
An asterisk (*) specifies all of the variables in all of the records or
rows referenced in the
Numeric expressions may use numeric constants, numeric variables, the arithmetic operators
( + , - , / , * , ** ), SQL functions, and parentheses to denote the order of operations.
Character expressions may use quoted strings, variables which are strings, the
"+" character to join strings, SQL string functions, and parentheses to
denote the order of operation. Enclose strings in single quotes.
FROM
clause. For
example,
SELECT ID NAME FROM EMPLOYEE
SELECT * FROM REVIEW
SELECT EMPLOYEE.NAME ..... FROM EMPLOYEE .....
Computing New Values
Expressions may be specified in the variable list. For example, the
following query computes the weekly salary by multiplying monthly salary
(SALARY
) by 12 and dividing by 52.SELECT ID NAME SALARY*12/52 FROM EMPLOYEE
The name of the computed variable in the output table is the first 32 characters
of the expression used to calculate it.Functions
Functions are specified
as a keyword followed by one or more arguments enclosed in
parentheses. Arguments may be variable names, constants or expressions. The
function operates on the arguments and returns a single value for each record
selected.
For example, RND
is a function which returns a number rounded
to the nearest whole integer:
SELECT NAME RND(SALARY*12/52) FROM EMPLOYEE
FROM
clause is required on the SELECT
and specifies the records,
paths, views and tables to be accessed. Database records may be specified
by name or number. If more than one record or table is
specified on the FROM
clause, a join is performed.
FROM
clause references more than
one source for the data.
There must be common values in some corresponding columns between the data
sources in the FROM
clause. For example, to create a table of
employee name and position level for each position ever held by the employee:
SELECT NAME POSITION FROM EMPLOYEE OCCUP - WHERE EMPLOYEE.ID = OCCUP.IDThis uses
ID
to form the relationship between the records
in the WHERE
clause. Use the record name as prefix to
differentiate between column names which are the same in different records or
tables. That is, EMPLOYEE.ID
refers to the value of
ID
in the Employee record, while OCCUP.ID
refers to the value of ID
in the Occup record.
In a case structured database
with case mode enabled (the default),
there is an automatic relationship between records. A query automatically
joins records for the same case. The automatic relationship has the same effect
as the WHERE
clauses matching each record on the case ID. For
example, referencing data from multiple records on the COMPANY
database, has the following implicit WHERE
clause:
SELECT ... FROM EMPLOYEE OCCUP REVIEW - WHERE EMPLOYEE.ID = OCCUP.ID - AND EMPLOYEE.ID = REVIEW.IDThere is no need to specify the
WHERE
clause to join on the
case identifier.
PATH
links one record (or table) to another record
(or table) and specifies the manner in which they are to be joined. System
defined paths are automatically created by SQL between records in a database
which have common keyfields. Common keyfields mean the same variable names of
the same type in the same sequence in different records. For example, on
the company database there is a system defined path between
OCCUP
and REVIEW
which is equivalent to:
... WHERE OCCUP.POSITION = REVIEW.POSITIONEach path has a name. Paths are explicitly invoked by naming a record (or table) and a path name on the
FROM
clause. The path is
invoked implicitly by naming both records (or tables) in the
FROM
clause. If there are multiple paths between two record
types, SQL uses the earliest defined path. SQL never automatically creates
more than one path between any two records.
The SHOW PATH
command displays the path definitions in order.Create paths with the CREATE PATH command. This names the path, the two records (or tables) to be joined and the variable(s) used to join them. For example,
CREATE PATH MYPATH - FROM COMPANY.OCCUP TO COMPANY.REVIEW - VIA POSITIONUse the path by naming the path rather than the second record on the
FROM
clause. For example,
SELECT .... FROM OCCUP MYPATH
FROM
clause. The way in which records are joined is
determined by the WHERE
clause, by any PATH
s
that are referenced in the FROM
clause and by the current
settings of CASE
mode and PATH
mode.
If CASE
mode is set, records are joined when the case
identifier variable in one record is equal to the case identifier in
another. This is equivalent to specifying a WHERE
clause such
as:
... WHERE RECONE.CASEID = RECTWO.CASEIDIf
PATH
mode is set, any paths
between the records referenced in the FROM
clause are
automatically used. The system defined paths imply joins based on the case
identifier and relationships implied in the keyfields.
For example, the default path joining the OCCUP
and
REVIEW
records in the sample database is equivalent to the
WHERE
clause:
... WHERE OCCUP.ID EQ REVIEW.ID AND OCCUP.POSITION EQ REVIEW.POSITIONThe setting of
CASE
has no effect on joins if
PATH
mode is set. (However if you are using a subquery, case
mode must be off). The setting of CASE
does have an
effect if PATH
s are cleared (records are still joined within
case).
If PATHS
are explicitly referenced in the
FROM
clause, the setting of CASE
and
PATH
have no effect, the PATH
is always
applied as defined.
A join condition can specify other relationships between columns, such as "greater than" (GT or >), "less than" (LT or <), etc. These are referred to as Non-Equi-Joins. For example, suppose a table is created of minimum and maximum starting salaries by division:
SELECT DIVISION MIN(STARTSAL) MAX(STARTSAL) - FROM OCCUP - ON DIVSAL - GROUP BY DIVISION - FORMAT COL 2 NAME MINSAL - FORMAT COL 3 NAME MAXSALThen, select anyone whose current salary is greater than the maximum or less than the minimum starting salaries.
SELECT ID NAME SALARY FROM EMPLOYEE REVIEW DIVSAL - WHERE SALARY GT MAXSAL OR SALARY LT MINSALNote that this non-equi-join is a join and not just a test on a column value. This means that a row is produced for every matching condition between the joined rows. Therefore, a row appears once for each time that salary is greater than one of the division maxsal columns or less than one of the division minsal columns.
Non-equi-joins on large tables can produce a tremendous number of rows and the
WHERE
conditions should be carefully examined to limit the
output to the required combinations.
FROM
clause. Use an alias to qualify a variable name when it
cannot be done by using the unique record or table name. A record or table
name is not unique if the same record or table is on two different
databases or tables you are joining, or if you join a record or table to itself.
The alias is defined in the record or table specification in the FROM
clause and is
used wherever the table or record name would be used in other parts of the SELECT
statement.
When an alias is defined, it follows the record name or the optional keyword AS
and
must be followed by a
comma (,) to separate it from any other record names.
For example, suppose a genealogical database where everybody is in a
PERSON
record, and each person has an ID
.
In each person's record are the ID
s of their father and
mother. Because this involves joining a record type to itself, use an alias:
SELECT CHILD.ID FATHER.ID MOTHER.ID - FROM PERSON AS CHILD, PERSON AS FATHER, PERSON AS MOTHER - WHERE CHILD.FATHERID EQ FATHER.ID AND - CHILD.MOTHERID EQ MOTHER.ID
CASELIM n
COMPILE_ONLY
DBMS filename
OUTPUT filename
PRINT
commands. This can also be set by the SET
command and by the
DISPLAY
command.
RECLIM n
RLIM
SAMPLE proportion [,seed]
SELLIM n
SLIM
FORMAT COLUMN column_list column_formatUse
FORMAT
clause(s) on a SELECT command to specify
the appearance of particular columns.
When a table is created, the default settings are used to format columns
unless specifically overridden with FORMAT
clauses.
Specify multiple format clauses on a single select command by
repeating the complete FORMAT
clause.
The first part of a format clause specifies one or more columns
to be formatted. Specify the keyword COLUMN
followed by the
column_list.
The next part of the FORMAT clause specifies the
Column Format for the particular columns
which is typically a keyword (e.g. WIDTH
) and a setting.
You can specify multiple options on a single format clause where the options apply to the specified column(s). For example:
SELECT ID SALARY*1.1 FROM EMPLOYEE - FORMAT COLUMN 1 NAME EMPLOYEE_ID WIDTH 12 - FORMAT COLUMN 2 NAME NEWSALARY
Specify a list or range of consecutive columns by specifying the start and end columns separated by a colon (:).
Examples of column specifications might be as follows:
FORMAT COLUMN SALARY option FORMAT COLUMN 1 option FORMAT COLUMN 3:7 option FORMAT COLUMN 1 4 8 option FORMAT COLUMN SALARY option FORMAT COLUMN BIRTHDAY option
The column formats are specified as a keyword and a setting as follows:
[ DATE 'date_map' ] [ DPLACES n ] [ EXPONENT n ] [ LABEL ON | OFF ] [ MISSCHAR 'c' ] [ NAME column_name ] [ NULL 'string'] [ SEPARATOR 'string' | n BLANKS ] [ TIME'time_map' ] [ VALLAB ON | OFF ] [ WIDTH n ] [ ZEROS ON | OFF | 'string']
DATE 'date map'
DPLACES n
DPL
EXPONENT n
EXPONENT 3
.EXPONENT 6
.
LABEL
ON
specifies that the variable label is used as
the column name. OFF
specifies the variable name is
the column name and is the default. Abbreviation: LAB
MISSCHAR
MISS
NAME
FORMAT col 3 NAME ANNUAL_SALARY
NULL
NULL
takes precedence over any
MISSCHAR
specified.
SEPARATOR
BLANKS
. Abbreviations: SEP
and
BL, BLANK
. For example:FORMAT COLUMN salary SEPARATOR 4 BLANKS
TIME
FORMAT COLUMN startime TIME 'HH:MM'
VALLAB
VALLAB ON
to display value labels instead
of data values. Specify VALLAB OFF
to display values. Use
SET
and CLEAR
to change the
VALLAB
setting for all variables in the
table. For example:FORMAT COLUMN marstat VALLAB ON
WIDTH
WID
. For example:FORMAT COLUMN salary WIDTH 12
ZEROS
ZEROS OFF
is the default. You can also specify a
string to be displayed if the value is zero. For example:FORMAT COLUMN salary ZEROS ON
GROUP BY
specifies that all sets of values selected are
grouped together according to their unique values in the value list. This
produces a summary table with one entry per group of records. For example, to
calculate the average salary for male and female employees:
SELECT VALLAB(GENDER) AVG(SALARY) FROM EMPLOYEE - GROUP BY GENDERThis produces a table with two entries:
VALLAB(GENDER) AVG(SALARY) Male 2745.83 Female 2831.25The aggregation can be done at additional levels by adding further variables to the
GROUP BY
clause:
SELECT VALLAB(GENDER) VALLAB(EDUC) AVG(SALARY) - FROM EMPLOYEE GROUP BY GENDER EDUC VALLAB(GENDER) VALLAB(EDUC) AVG(SALARY) Male Elementary 2533.33 Male High School 2500.00 Male Some University 2550.00 Male B.Sc. or B.A. 3050.00 Male M.S. 2625.00 Male Ph.D. 3350.00 Female High School 2600.00 Female Some University 2700.00 Female B.Sc. or B.A. 3533.33 Female M.S. 1650.00 Female Ph.D. 2400.00Where there are no records for a level, no row is created in the retrieval. For example, there are no female employees with only elementary education.
HAVING
clause with the GROUP BY
clause to select groups according to some condition. For example, to select
groups with an average salary greater than 2500.
SELECT VALLAB(EDUC) VALLAB(GENDER) COUNT(SALARY) AVG(SALARY) - FROM EMPLOYEE GROUP BY EDUC GENDER HAVING AVG(SALARY) > 2500
CLEAR CASE
command. This turns off case mode for the rest of
the session, or until it is re-enabled. Use the SET CASE
command to re-enable case mode. For example; with case mode on, to select the
average starting salary for all the positions an employee has had:
SELECT ID AVG(STARTSAL) FROM OCCUPThe result is one row for each employee with any
OCCUP
records. This gives the average of the starting salaries for each position the
employee has had. This same query with cases cleared gives a very different
result:
CLEAR CASE SELECT AVG(STARTSAL) FROM OCCUPThe result is one row, giving the average starting salary for all the positions held by all the employees in the company.
Case mode is equivalent to a GROUP BY
clause on the case id
for the aggregation functions.
For example, the following query computes the mean salary and the number of all
male employees.
If an output table is not specified with the
All tables are on tabfiles. If a tabfile is not specified, the default is
used. If a default tabfile has not been explicitly
If all defaults are used, the results of a
You cannot select
The sort keys may contain variables and expressions. The sort key variables do
not have to be in the variable list of the
Normally, a join operation creates a row for the resultant table if there exists
a record for every record name in the
The main
As many
If sequence of the output table is important, use the
The
The
The
The logical operator
The data comes from multiple cases and case structure must be off (
Subqueries can be used wherever a
The previous example shows the most basic use of a subquery - one that returns a
single value. If a subquery can return more than one value, specify how the
returned values are treated in the
Where a subquery can return more than one value, this is equivalent to a list.
For example, to select people whose salary is greater than anyone whose current
position is in division 1:
A
The logical function
Note: This query could have been performed more easily and
more efficiently as a simple join although this would return multiple rows for
people who had received multiple ratings of 5:
Aggregations with Missing or Undefined Values
Missing or undefined values are ignored in the computation of aggregation
functions. For example, if the value of SALARY
is missing for
an employee, the record is ignored in the computation of the average. The
average is a true average of actual values.COUNT
COUNT
counts the number of values selected in a query. It is
often used in conjunction with other aggregation functions to count how many
records were used in computing the aggregated value.SELECT AVG(SALARY) COUNT(SALARY) FROM EMPLOYEE -
WHERE GENDER = 1
This counts only those records which have a non-missing salary.
COUNT
can also have the argument * which specifies a count of all selected records
regardless of whether the values are valid, missing, or undefined. ON
ON
specifies the name of the new table to contain the results
of the query.ON
clause, a
table called PREVIOUS_SELECT
is used.SET
, a
tabfile called $SYSTEM
is used.SELECT
are stored
on the table $SYSTEM.PREVIOUS_SELECT
. This table is overwritten
as necessary without prompting for confirmation. If you specify a table as
output which already exists, you are prompted for confirmation that you want it
overwritten before proceeding with the SELECT
.ON
to a table you are selecting
FROM
. ORDER BY
Specify ORDER BY
to sort the selected rows. The first variable
in the ORDER BY
list is the major sort key. Specify keys in
sequence from major to minor. By default, variables sort in ascending sequence.
Follow the variable with the DESC
keyword to sort in
descending sequence. This only applies to the variable immediately preceding
DESC
. Missing values sort to the beginning of a set of values regardless as to
whether ascending or descending is specified. (The MISS
function can be used to retrieve original values for missing values.)SELECT
.DISPLAY
does not resequence rows and ORDER
BY
must be specified on the SELECT
to create the
table in a particular sequence if this differs from the source data. For
example:SELECT ID NAME SALARY FROM EMPLOYEE -
ORDER BY SALARY NAME
Synonyms: ORDER, SORT, SORT BY
OUTER
When joining records on a case structured database, OUTER
specifies that, if no matching record exist, the SELECT
operates as if a record containing undefined values for all variables did exist.FROM
clause.
The OUTER
option allows the retrieval of some data even when
some records may not exist. This operation is called an outer join. The
keyword OUTER
is specified after all records in the
FROM
clause and specifies all record names that the
OUTER
applies to.SELECT NAME REVDATE FROM EMPLOYEE REVIEW OUTER REVIEW
UNION
UNION
adds the result of a second SELECT
clause to the table created by the main SELECT
command. With
the UNION
clause, each SELECT
must result in
the same number of output columns and each column must correspond in type. The
first SELECT
command determines the names and the types of the
output columns. Numeric variables must correspond to numeric variables, string
variables with string variables.SELECT
command defines the table that is produced
when using the UNION
statement. For example, assume two
tables, one for current employees and one for ex-employees. A single output
table for all employees could be produced with the UNION
clause:SELECT NAME SALARY FROM EMPLOYEE ON ALLEMPLOYEES -
UNION SELECT NAME SALARY FROM EXEMPLOYEE
This creates a new table with two columns and a row per employee.SELECT
clauses as required may be
UNIONed
together as long as the rules on number and type of
variables are followed.ORDER BY
clause to specify it. Any such ORDER BY
should follow the
last clause of the last SELECT
in the command. WHERE
WHERE
specifies the logical conditions used to select records
or rows. Only records or rows meeting the conditions are selected. The
WHERE
clause can reference any variables, regardless as to
whether the variables are in the SELECT
variable list or not.WHERE
clause can reference expressions .
Expressions are a combination of variables and
operators which produce a new value.WHERE
clause may contain compound conditions connected by
the logical operators AND, OR, XOR
and NOT
.
AND
means both expressions must be true;
OR
means either expression must be true;
XOR
means one expression must be true but not both;
NOT
means the expression must not be true.WHERE
clause is evaluated in the following order of
precedence (parentheses ( ) can be used to denote an explicit order of
evaluation):
Relational operators
The WHERE
clause may include the following operators:
EQ or = or IS
NE or ><
LT or <
LE or <=
GT or >
GE or >=
BETWEEN expr AND expr
IN (expr,......)
LIKE
EQ NULL
NE NULL
NOT
can be used to test for the opposite
of any condition.EQ, NE, LT, LE, GT, GE & BETWEEN
These operators
test the relationship between two values. If the specified condition is true,
the data is selected. NE
is provided as a convenient
shorthand; it is identical to NOT EQ
.
BETWEEN
means equal to the end values or any value in
between.IN
Selects records or rows when the value that matches one or more values in a
list. For example, to select data for employees 1, 5, and 7.SELECT ... FROM EMPLOYEE WHERE ID IN (1,5,7)
To select all records except those in the list use NOT
:SELECT .
... WHERE NOT ( ID IN (1, 5, 7))
LIKE Pattern Matching
A pattern
is a partial string where symbols are used to indicate how
that position is to be treated. The pattern consists of symbols
plus the string you want to match. Subqueries
Subqueries are used to select rows from a table based on data in other rows.
The rows returned by one SELECT
statement are used in the
WHERE
clause of another SELECT
statement.
The subquery executes first and returns one or more values which are then used
by the main SELECT
as if it were given a set of constant
values. For example, to select the name, gender and education of all employees
who have the same education as Mary Black.SELECT NAME GENDER EDUC FROM EMPLOYEE -
WHERE EDUC = -
(SELECT EDUC FROM EMPLOYEE WHERE NAME = 'Mary Black')
The subquery (the one enclosed in parentheses) returns the value of Mary Black's
education. This value is then used as the object of the WHERE
clause for the main SELECT
. The information selected by the
main SELECT
consists of the name and education for all
employees with the same education as Mary. This set of employees naturally
includes Mary.CLEAR
CASE
) for this query to operate as required. With a case structured
database and subqueries which retrieve data from one case which is used to
SELECT
other cases, CLEAR CASE
mode.WHERE
can be specified.WHERE
clause with the
IN, ANY
and ALL
functions.
ANY
tests against any returned value; ALL
tests against every returned value. These can be used with the relational
operators (EQ, NE, LT, GT, LE, GE
). IN
tests a value to be equal to a value in the list of returned values and is
equivalent to EQ ANY
.SELECT ID NAME SALARY FROM EMPLOYEE -
WHERE SALARY GT ALL -
(SELECT SALARY FROM EMPLOYEE OCCUP -
WHERE CURRPOS EQ POSITION AND DIVISION EQ 1)
The subquery finds the division in the OCCUP record which matches the current
position and tests to be in division 1; the salary of all employees where this
is true is retrieved and these are now equivalent to a list of salaries. The
salary for each employee is tested against this list to be greater than all
entries in the list. A subquery must only return one column to be used to
construct the list.WHERE
clause can contain a combination of conditions and
subqueries. Any subquery either returns a single value or a list and can be
treated as equivalent to a value or list specified by expressions which are not
subqueries. The logical operators (AND, OR, XOR, NOT
) are
used to connect separate clauses in the WHERE
expression.EXISTS
tests that the subquery returns at
least one row. EXISTS
returns "True" if at least one row
exists, "False" if not. The test can be reversed with the NOT
logical operator. For example:SELECT ID NAME SALARY FROM 1 WHERE -
EXISTS (SELECT * FROM REVIEW WHERE RATING = 5)
When using the EXISTS
function, the column returned by the
subquery is irrelevant and must be specified as an asterisk (*).SELECT ID NAME SALARY FROM EMPLOYEE REVIEW -
WHERE RATING EQ 5