DataSet-SQL Select Statement

SELECT Statement

Retrieves rows from the DataSet and allows the selection of one or many rows or columns from one or many tables.

Syntax
SELECT statement ::=
    < query_expression >
    [ ORDER BY { order_by_expression | column_position [ ASC | DESC ] } [ ,...n ] ]

< query expression > ::=
    { < query specification > | ( < query expression > ) }
    [ UNION [ ALL ] < query specification | ( < query expression > ) [...n ] ]

< query specification > ::=
    SELECT [ ALL | DISTINCT ]
    [ { TOP integer } ]
    < select_list >
    [ INTO new_table ]
    [ FROM { < table_source > } [ ,...n ] ]
    [ WHERE < search_condition > ]
    [ GROUP BY group_by_expression [ ,...n ] ]
    [ HAVING < search_condition > ]

The UNION operator can be used between queries to combine their results into a single result set.

The -- (double hyphen) and /* ... */ comment syntaxes are supported and can be embedded anywhere in a SQL statement.

SELECT Clause

Specifies the columns to be returned by the query.

Syntax
SELECT [ ALL | DISTINCT ]
    [ TOP n ]
    < select_list >

< select_list > ::=
    { *
    | { table_name }.*
    | { column_name | expression }
    [ [ AS ] column_alias ]
    | column_alias = expression
    } [ ,...n ]

Arguments
ALL

Specifies that duplicate rows can appear in the result set. ALL is the default.

DISTINCT

Specifies that only unique rows can appear in the result set. Null values are considered equal for the purposes of the DISTINCT keyword.

TOP n

Specifies that only the first n rows are to be output from the query result set. n is an integer between 0 and 4294967295.  PERCENT is not supported.

< select_list >

The columns to be selected for the result set. The select list is a series of expressions separated by commas.

*
Specifies that all columns from all tables in the FROM clause should be returned. The columns are returned by table as specified in the FROM clause, and in the order in which they exist in the table.

table_name.*
Limits the scope of the * to the specified table.

column_name
Is the name of a column to return. Qualify column_name to prevent an ambiguous reference, such as occurs when two tables in the FROM clause have columns with duplicate names.

expression
Is a column name, constant, function, any combination of column names, constants, and functions connected by an operator(s), or a sub-query.

column_alias
Is an alternative name to replace the column name in the query result set. Aliases are used also to specify names for the results of expressions, for example:

SELECT AVG(Price) AS 'Average Price' FROM Product

 

INTO Clause

Creates a new DataTable and inserts the resulting rows from the query into it.  The DataTable is associated with the returned DataView.

You can use SELECT...INTO to create an identical table definition (different table name) with no data by having a FALSE condition in the WHERE clause.

Syntax
[ INTO new_table ]

Arguments
new_table

Specifies the name of a new table to be created, based on the columns in the select list and the rows chosen by the WHERE clause. The format of new_table is determined by evaluating the expressions in the select list. The columns in new_table are created in the order specified by the select list. Each column in new_table has the same name, data type, and value as the corresponding expression in the select list.

When a computed column is included in the select list, the corresponding column in the new table is not a computed column. The values in the new column are the values that were computed at the time SELECT...INTO was executed.

FROM Clause

Specifies the table(s) from which to retrieve rows. The FROM clause is required except when the select list contains only constants, variables, and arithmetic expressions (no column names).

Syntax
[ FROM < table_source > [ ,...n ] ]

< table_source > ::=
    table_name [ [ AS ] table_alias ]
    | < joined_table >

< joined_table > ::=
    < table_source > < join_type > < table_source > ON < search_condition >
    | < table_source > CROSS JOIN < table_source >
    | < joined_table >
    | <Pivoted Table>
    | <Unpivoted Table>
    | <OpenRowSet>

< join_type > ::=
    [ INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } ] JOIN

<Pivoted Table> ::= <table source> PIVOT <pivot_clause> <alias>

<pivot_clause> ::= '(' AggregateFunction '(' ValueColumnName ')' FOR PivotColumnName IN '(' <column list> ')' ')'
    | '(' AggregateFunction '(' ValueColumnName ')' FOR PivotColumnName ')'
    | '(' AggregateFunction '(' ValueColumnName ')' FOR PivotColumnName IN '(' <Sub Query> ')' ')'

<Unpivoted Table> ::= <table source> UNPIVOT <unpivot_clause> <alias>

<unpivot_clause> ::= '(' Identifier FOR Identifier IN '(' <column list> ')' ')'

<OpenRowSet> ::= OPENROWSET '(' <source> ',' {table_name | query} ')' [As table_alias]

<source> ::= provider_name ',' connection_string
          | registered_runtime_connection_alias

Arguments
< table_source >

Specifies tables and joined tables for the SELECT statement.

table_name [ [ AS ] table_alias ]
Specifies the name of a table and an optional alias.

< joined_table >

Is a result set that is the product of two or more tables. For example:

SELECT *
FROM tab1 LEFT OUTER JOIN tab2 ON tab1.c3 = tab2.c3
RIGHT OUTER JOIN tab3 LEFT OUTER JOIN tab4
ON tab3.c1 = tab4.c1
ON tab2.c3 = tab4.c3

For multiple CROSS joins, use parentheses to change the natural order of the joins.

< join_type >

Specifies the type of join operation.

INNER
Specifies that all matching pairs of rows are returned. Discards unmatched rows from both tables. This is the default if no join type is specified.

LEFT [ OUTER ]
Specifies that all rows from the left table not meeting the specified condition are included in the result set in addition to all rows returned by the inner join. Output columns from the left table are set to NULL.

RIGHT [ OUTER ]
Specifies that all rows from the right table not meeting the specified condition are included in the result set in addition to all rows returned by the inner join. Output columns from the right table are set to NULL.

FULL [ OUTER ]
If a row from either the left or right table does not match the selection criteria, specifies the row be included in the result set, and output columns that correspond to the other table be set to NULL. This is in addition to all rows usually returned by the inner join.

JOIN
Indicates that the specified tables or views should be joined.

ON < search_condition >

Specifies the condition on which the join is based. The condition can specify any predicate, although columns and comparison operators are often used. For example:

SELECT *
FROM tab1 JOIN tab2
ON tab1.c1 = tab2.c2

When the condition specifies columns, the columns do not have to have the same name or same data type. However, if the data types are not identical, they must be either compatible or types that .NET can implicitly convert. If the data types cannot be implicitly converted, the condition must explicitly convert the data type using the CAST function.

For more information about search conditions and predicates, see Search Condition.

CROSS JOIN

Specifies the cross-product of two tables. Returns the same rows as if the tables to be joined were simply listed in the FROM clause and no WHERE clause was specified. For example, both of these queries return a result set that is a cross join of all the rows in T1 and T2:

SELECT * FROM T1, T2
SELECT * FROM T1 CROSS JOIN T2

PIVOT <pivot_clause>

Specifies that the table_source is pivoted based on the PivotColumnName. table_source is a table or sub-query. The output is a table that contains all columns of the table_source except the pivot_column and value_column. PIVOT performs a grouping operation on the input table with regard to the value and pivot columns and returns one row for each group.

The output contains one column for each value specified in the column_list that appears in the pivot_column of the input_table.

AggregateFunction
One of the standard COUNT, AVG, SUM, MIN, MAX, COUNT, STDEV, or VAR aggregate functions.

ValueColumnName
Is the value column of the PIVOT operator. When used with UNPIVOT, this column cannot be the name of an existing column in the input table_source.

FOR PivotColumnName
For PIVOT, this column must be of a type implicitly convertible to a string.

For UNPIVOT, this is the name of the output column that becomes narrowed from the table_source. There cannot be an existing column in table_source with that name.

IN ( column_list )
For PIVOT, this lists the values in the pivot_column that will become the column names of the output table. The list cannot specify any column names that already exist in the input table_source that is being pivoted. When ommitted, each distinct value encountered in the pivot_column will produce a new column in the output table.

In the UNPIVOT clause, lists the columns in table_source that will be narrowed into a single pivot_column.

IN ( Sub Query )
For PIVOT, a sub-query returns the list of values in the pivot_column that will become the column names of the output table. A DISTINCT list of the result-set is implicitly generated from the sub-query. A result from the sub-query must be of a type that is implicitly convertible to a string.

UNPIVOT < unpivot_clause >
Specifies that the input table is narrowed from multiple columns in column_list into a single column called pivot_column.

OPENROWSET
Specified a .NET Provider, database connection and query that can be used as a table source.  Use it for joining data in a DataSet with data in an external database.  This is useful when part of the data is too large or practical to cache in a DataSet.   It is also useful when the underlying database does not support a feature of QueryADataset such as PIVOT/UNPIVOT. 

Openrowset has two options for connecting to a  database.  When a provider_name and connection_string is specified, QueryADataset will load the provider and open the connection.  An alternative is to register an existing open connection and provide a registered_runtime_connection_alias name.  For example:

Dim cn As New SqlConnection(ConnectionString)
cn.Open()
DsCommand.Registry.DbConnections.Add("myconnection", cn)
 
Dim sql As String = "select * from openrowset('myconnection', 'select * from titles')"
DsCommand.Execute(sql, Nothing)

The provider_name is the invariant name of a .NET data provider.  These can be found in machine.config.  The following are some examples:

<DbProviderFactories>
  <add name="Odbc Data Provider" invariant="System.Data.Odbc" description=".Net Framework Data Provider for Odbc"/>
  <add name="OleDb Data Provider" invariant="System.Data.OleDb" description=".Net Framework Data Provider for OleDb"/>
  <add name="OracleClient Data Provider" invariant="System.Data.OracleClient" description=".Net Framework Data Provider for Oracle"/>
  <add name="SqlClient Data Provider" invariant="System.Data.SqlClient" description=".Net Framework Data Provider for SqlServer"/>
  <add name="SQL Server CE Data Provider" invariant="Microsoft.SqlServerCe.Client" description=".NET Framework Data Provider for Microsoft SQL Server 2005 Mobile Edition"/>
  <add name="XML Data Provider" invariant="QueryADataset.XmlClient" description=".NET Data Provider for XML"/>
  <add name="SQLite Data Provider" invariant="System.Data.SQLite" description=".Net Framework Data Provider for SQLite"/>
</DbProviderFactories>

connection_string is the .NET provider connection string to the database.  It is provider specific, but usually contains properties for Data Source, Initial Catalog, Integrated Security or User Name and password.

OpenRowSet accepts either a table_name or SELECT query.  When table_name is specified, all the rows and columns of the table are returned as a DataTable.  Note that the underlying provider (such as Access) must support the CommandType.TableDirect feature, otherwise an exception is thrown.  The following is an example of a opening a table.

select * from openrowset('System.Data.OleDb', 'Provider=Microsoft.Jet.OLEDB.4.0;Data Source=northwind.mdb', shippers)

When query is specified, it much confirm to the syntax of the external database.  The returned result-set is readonly.  The following are some example queries:

select * from openrowset('System.Data.SqlClient', 'Data Source=.;Initial Catalog=pubs;Integrated Security=True', 'select * from titles')
 
SELECT [1] AS 'January',[2] AS 'February',[3] AS 'March',[4] AS 'April',[5] AS 'May',[6] AS 'June',[7] AS 'July',[8] AS 'August',[9] AS 'September',[10] AS 'October',[11] AS 'November',[12] AS 'December'
FROM openrowset('myconnection',
'SELECT MONTH(OrderDate) AS ''MonthNumber'', COUNT(*) AS ''OrderCount'' FROM Sales.SalesOrderHeader WHERE Year(OrderDate) = 2002 GROUP BY MONTH(OrderDate)') AS Data
PIVOT(SUM(OrderCount) FOR MonthNumber IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) AS PVT

WHERE Clause

Specifies a search condition to restrict the rows returned.  Note the old outer join *= or =* syntax is not supported.

Syntax
[ WHERE < search_condition > ]

Arguments
< search_condition >

Restricts the rows returned in the result set through the use of predicates. There is no limit to the number of predicates that can be included in a search condition. Search condition is a combination of one or more predicates using the logical operators AND, OR, and NOT.

Syntax
< search_condition > ::=
    { [ NOT ] < predicate > | ( < search_condition > ) }
    [ { AND | OR } [ NOT ] { < predicate > | ( < search_condition > ) } ] } [ ,...n ]

< predicate > ::=
    { expression { = | < > | ! = | > | > = | < | < = } expression
    | string_expression [ NOT ] LIKE string_expression
    | expression [ NOT ] BETWEEN expression AND expression
    | expression IS [ NOT ] NULL
    | expression [ NOT ] IN ( subquery | expression [ ,...n ] )
    | EXISTS ( subquery ) }

Arguments
< search_condition >

Specifies the conditions for the rows returned in the result set for a SELECT statement, query expression, or subquery. There is no limit to the number of predicates that can be included in a SQL statement search condition.

NOT
Negates the Boolean expression specified by the predicate.

AND
Combines two conditions and evaluates to TRUE when both of the conditions are TRUE.

OR
Combines two conditions and evaluates to TRUE when either condition is TRUE.

< predicate >

Is an expression that returns TRUE, FALSE, or UNKNOWN.

expression
Is a column name, a constant, a function, a variable, a scalar subquery, or any combination of column names, constants, and functions connected by an operator(s) or a subquery. The expression can also contain the CASE function.

=
Is the operator used to test the equality between two expressions.

<>
Is the operator used to test the condition of two expressions not being equal to each other.

!=
Is the operator used to test the condition of two expressions not being equal to each other.

>
Is the operator used to test the condition of one expression being greater than the other.

>=
Is the operator used to test the condition of one expression being greater than or equal to the other expression.

<
Is the operator used to test the condition of one expression being less than the other.

<=
Is the operator used to test the condition of one expression being less than or equal to the other expression.

string_expression
Is a string of characters and wildcard characters.

[ NOT ] LIKE
Indicates that the subsequent character string is to be used with pattern matching.

[ NOT ] BETWEEN
Specifies an inclusive range of values. Use AND to separate the beginning and ending values.

IS [ NOT ] NULL
Specifies a search for null values, or for values that are not null, depending on the keywords used. An expression with a bitwise or arithmetic operator evaluates to NULL if any of the operands is NULL.

[ NOT ] IN
Specifies the search for an expression, based on the expression's inclusion in or exclusion from a list. The search expression can be a constant or a column name, and the list can be a set of constants or, more commonly, a subquery. Enclose the list of values in parentheses.

subquery
Can be considered a restricted SELECT statement and is similar to <query_expresssion> in the SELECT statement. The ORDER BY clause and the INTO keyword are not allowed.

EXISTS
Used with a subquery to test for the existence of rows returned by the subquery.

GROUP BY Clause

Specifies the groups into which output rows are to be placed and, if aggregate functions are included in the SELECT clause <select list>, calculates a summary value for each group. When GROUP BY is specified, either each column in any non-aggregate expression in the select list should be included in the GROUP BY list, or the GROUP BY expression must match exactly the select list expression.

Note  If the ORDER BY clause is not specified, groups returned using the GROUP BY clause are not in any particular order. It is recommended that you always use the ORDER BY clause to specify a particular ordering of the data.

Syntax
 [ GROUP BY group_by_expression [ ,...n ] ]
 
Arguments
 
group_by_expression
 
Is an expression on which grouping is performed. group_by_expression is also known as a grouping column. group_by expression can be a column or a non-aggregate expression that references a column. A column alias that is defined in the select list cannot be used to specify a grouping column.

Note  .NET byte array columns containing documents or images cannot be used in a group_by_expression.

Important  Distinct aggregates, for example, AVG(DISTINCT column_name), COUNT(DISTINCT column_name), and SUM(DISTINCT column_name), are not supported.  However, the same result can be achieved using a sub-query, for example:

SELECT count(column_name) FROM (SELECT DISTINCT column_name FROM tbl)

HAVING Clause

Specifies a search condition for a group or an aggregate. HAVING is usually used with the GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause.

Syntax
[ HAVING < search_condition > ]

Arguments
< search_condition >

Specifies the search condition for the group or the aggregate to meet. When HAVING is used with GROUP BY ALL, the HAVING clause overrides ALL. For more information, see WHERE Clause.

HAVING cannot be used for .NET byte array columns.

ORDER BY Clause

Specifies the sort for the result set.

Syntax
[ ORDER BY { order_by_expression [ ASC | DESC ] } [ ,...n] ]

Arguments
order_by_expression

Specifies a column on which to sort. A sort column can be specified as a name or column alias (which can be qualified by the table name), an expression, or a non-negative integer representing the position of the column in the select list.

Multiple sort columns can be specified. The sequence of the sort columns in the ORDER BY clause defines the organization of the sorted result set.

The ORDER BY clause must include sort columns appearing in the select list.

Furthermore, when the SELECT statement includes a UNION operator, the column names or column aliases must be those specified in the first select list.

Note  .NET byte array columns cannot be used in an ORDER BY clause.

ASC

Specifies that the values in the specified column should be sorted in ascending order, from lowest value to highest value.

DESC

Specifies that the values in the specified column should be sorted in descending order, from highest value to lowest value.

Null values are treated as the lowest possible values.

There is no limit to the number of items in the ORDER BY clause.

UNION Operator

Combines the results of two or more queries into a single result set consisting of all the rows belonging to all queries in the union.

Two basic rules for combining the result sets of two queries with UNION are:

1.      The number and the order of the columns must be identical in all queries.

2.      The data types must be compatible.

Syntax
{ < query specification > | ( < query expression > ) }
    UNION [ ALL ]
    < query specification | ( < query expression > )
        [ UNION [ ALL ] < query specification | ( < query expression > ) [ ...n ] ]

Arguments
< query_specification > | ( < query_expression > )

Is a query specification or query expression that returns data to be combined with the data from another query specification or query expression. The definitions of the columns that are part of a UNION operation do not have to be identical, but they must be compatible through implicit conversion.

UNION

Specifies that multiple result sets are to be combined and returned as a single result set.

ALL

Incorporates all rows into the results, including duplicates. If not specified, duplicate rows are removed.