Base de Données Sybase IQ

Introduction

Stockage colonnes et Sybase IQ

* Sybase IQ stocke les données "en colonnes" et non "en lignes"
* Sybase IQ compresse les données
* Sybase IQ dispose de procédés d'indexation "bitmap" et "bitwise"
* Sybase IQ dispose d'un grand nombre de méthodes d'accès et de jointures (hash join, sort merge, ...)
* Sybase IQ n'utilise pas un journal de transactions pour assurer la cohérence de la base lors des mises à jour, mais conserve les différentes versions des pages de données modifies (les mises à jour ensemblistes sont alors plus rapides mais la concurrence d'accès sur une même table est très limitée)

Sybase IQ a pour objectif d'exécuter très rapidement des requêtes d'interrogation sur des grosses volumétries, mais ne supporte pas des mises à jour concurrentes sur une même table. La mise à jour d'une base Sybase IQ s'effectuera donc soit périodiquement par un traitement batch, ou en continu par un seul flot d'alimentation.

Sybase IQ supporte deux langages de programmation :
* le langage WATCOM SQL, langage natif de la base de données Sybase Anywhere utilisée par Sybase IQ pour gérer son catalogue,
* le langage Transact SQL de Sybase ASE, et ceci à des fins de compatibilité. Ce document décrit les différences de langages.

===== Load Table =====

<code>

LOAD TABLE IQTRESO.table_import

(

Col1 ';',
Col2 ';',
Col3 ';',
Col4 '\x0a'
)

FROM '/home/import_files/file_to_import.csv'
QUOTES OFF
ESCAPES OFF
SKIP 1;

</code>

==== Types de Données 

=== NUMERIC DATA TYPES ===

|Data Type Name|Raw Type|Length (Range)|Precision (Range)|Scale (Range)|Description|    

|BIGINT|NUMERIC|Signed between -9223372036854775808 and 9223372036854775807\\ Unsigned 0 to 18446744073709551615|X|X|A signed 64-bit integer, requiring 8 bytes of storage.\\ You can specify integers as UNSIGNED. By default the data type is signed. Its range is between -9223372036854775808 and 9223372036854775807 (signed) or from 0 to 18446744073709551615 (unsigned).|

|BIT|NUMERIC|0 or 1|X|X|for Boolean values - By default, columns of BIT data type do not allow NULL.|

|DECIMAL|NUMERIC|X|15|5|A signed decimal number with precision total digits and with scale of the digits after the decimal point. The precision can equal 1 to 126, and the scale can equal 0 up to precision value. The defaults are scale = 38 and precision = 126. Results are calculated based on the actual data type of the column to ensure accuracy, but you can set the maximum scale of the result returned to the application using the MAX_CLIENT_NUMERIC_SCALE option.|

|DOUBLE|NUMERIC|Between 2.2250738585072014e-308 and 1.797693134862315708e+308|X|X|A signed double-precision floating-point number stored in 8 bytes. The range of absolute, nonzero values is between 2.2250738585072014e-308 and 1.797693134862315708e+308. Values held as DOUBLE are accurate to 15 significant digits, but might be subject to rounding errors beyond the fifteenth digit.\\ The DOUBLE data type is an approximate numeric data type; it is subject to rounding errors after arithmetic operations.|

|FLOAT|NUMERIC|1.175494351e-38 to 3.402823466e+38|X|X|If precision is not supplied, the FLOAT data type is the same as the REAL data type. If precision supplied, then the FLOAT data type is the same as the REAL or DOUBLE data type, depending on the value of the precision. The cutoff between REAL and DOUBLE is platform-dependent, and it is the number of bits used in the mantissa of single-precision floating point number on the platform.\\ When a column is created using the FLOAT data type, columns on all platforms are guaranteed to hold the values to at least the specified minimum precision. In contrast, REAL and DOUBLE do not guarantee a platform-independent minimum precision.\\ The FLOAT data type is an approximate numeric data type; it is subject to rounding errors after arithmetic operations.|

|INT|NUMERIC|Between -2147483648 and 2147483647|X|X|A signed 32-bit integer with a range of values between -2147483648 and 2147483647 requiring 4 bytes of storage.\\ The INTEGER data type is an exact numeric data type; its accuracy is preserved after arithmetic operations.\\ You can specify integers as UNSIGNED; by default the data type is signed. The range of values for an unsigned integer is between 0 and 4294967295.|

|INTEGER|NUMERIC|Signed between -2147483648 and 2147483647\\ Unsigned 0 to 4294967295|X|X|The range for signed integers is 31 to 2 31 1, or 2147483648 to 2147483647.\\ The range for unsigned integers is 0 to 2 32 1, or 0 to 4294967295.|

|MONEY|NUMERIC|X|X|X|Adaptive Server IQ includes money as a  user-defined data type, MONEY, which is implemented as NUMERIC(19,4).  Provided primarily for compatibility with Adaptive Server Enterprise.|

|NUMERIC|NUMERIC|X|15|5|Same as DECIMAL.|

|OLDBIT|BIT|X|X|X|OLDBIT|

|REAL|NUMERIC|Between 1.175494351e-38 and 3.402823466e+38.|X|X|A signed single-precision floating-point number stored in 4 bytes. The range of absolute, nonzero values is 1.175494351e-38 to 3.402823466e+38. Values held as REAL are accurate to 6 significant digits, but might be subject to rounding errors beyond the sixth digit.\\ The REAL data type is an approximate numeric data type; it is subject to rounding errors after arithmetic operations.|

|SMALLINT|NUMERIC|Between -32768 and 32767|X|X|A signed 16-bit integer with a range between -32768 and 32767, requiring 2 bytes of storage.\\ The SMALLINT data type is an exact numeric data type; its accuracy is preserved after arithmetic operations.|

|SMALLMONEY|NUMERIC|X|X|X|Adaptive Server IQ includes smallmoney as a user-defined data type, SMALLMONEY, which is implemented NUMERIC(10,4).  Provided primarily for compatibility with Adaptive Server Enterprise.|

|TINYINT|NUMERIC|Between 0 and 255|X|X|An unsigned 8-bit integer with a range between 0 and 255, requiring 1 byte of storage.\\ The TINYINT data type is an exact numeric data type; its accuracy is preserved after arithmetic operations.|

|UNIQUEIDENTIFIER|CHARACTER|X|X|X|The UNIQUEIDENTIFIER data type is used for storage of UUID (also known as GUID) values.|

|UNIQUEIDENTIFIERSTR| |X|X|X|Domain implemented as CHAR( 36 ). This data type is used for remote data access, when mapping Microsoft SQL Server uniqueidentifier columns.|


== CHARACTER DATA TYPES           ==

 

 CHAR   CHARACTER 1 through 254   X   X

Character data of maximum length max-length bytes. If max-length is omitted, the default is 1. The maximum size allowed is 32KB – 1. See Notes for restrictions on CHAR data greater than 255 bytes.


See the notes below on character data representation in the database, and on storage of long strings.


All CHAR values are blank padded up to max-length, regardless of whether the BLANK PADDING option is specified. When multibyte character strings are held as a CHAR type, the maximum length is still in bytes, not characters.

 LONGSYSNAME   CHARACTER   varchar(255) "not null"   X   X longsysname is a user-defined datatype that is distributed on the Adaptive Server installation tape and used in the system tables.  varchar(255) "not null

 SYSNAME   CHARACTER   varchar(30) "not null"   X   X sysname is a user-defined datatype that is distributed on the Adaptive Server installation tape and used in the system tables.  varchar(30) "not null

 VARCHAR   CHARACTER 1 through 254   X   X Same as CHAR, except that no blank padding is added to the storage of these strings, and VARCHAR strings can have a maximum length of (32KB – 1). See Notes for restrictions on VARCHAR data greater than 255 bytes.

DATE & TIME DATA TYPES           

 DATE   DATETIME 0001-01-01 to 9999-12-31   X   X A calendar date, such as a year, month and day. The year can be from 0001 to 9999. The day must be a nonzero value, so that the minimum date is 0001-01-01. A DATE value requires 4 bytes of storage.

 DATETIME   TIMESTAMP   X   X   X A domain, implemented as TIMESTAMP. DATETIME is provided primarily for compatibility with Adaptive Server Enterprise.

 SMALLDATETIME   DATETIME   X   X   X A domain, implemented as TIMESTAMP. SMALLDATETIME is provided primarily for compatibility with Adaptive Server Enterprise.

 TIME   TIME 00:00:00.000000 to 23:59:59.999999   X   X Time of day, containing hour, minute, second, and fraction of a second. The fraction is stored to 6 decimal places. ATIME value requires 8 bytes of storage. (ODBC standards restrict TIME data type to an accuracy of seconds. For this reason, do not use TIME data types in WHERE clause comparisons that rely on a higher accuracy than seconds.)

 TIMESTAMP   TIMESTAMP 0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999   X   X Point in time, containing year, month, day, hour, minute, second, and fraction of a second. The fraction is stored to 6 decimal places. The day must be a nonzero value. A TIMESTAMP value requires 8 bytes of storage.

BINARY & LARGE OBJECT DATA TYPES           

 BINARY   BINARY   1   X   X Binary data of length length bytes. If length is omitted, the default is 1 byte. The maximum size allowed is 255 bytes. Use the fixed-length binary type BINARY for data in which all entries are expected to be approximately equal in length. Because entries in BINARY columns are zero-padded to the column length length, they might require more storage space than entries in VARBINARY columns.

 BLOB   BINARY   X   X   X

Binary Large Object (BLOB) data in Sybase IQ is stored in columns of data type LONG BINARY or BLOB.

An individual LONG BINARY data value can have a length ranging from zero (0) to 512TB (terabytes) for an IQ page size of 128KB or 2PB (petabytes) for an IQ page size of 512KB. (The maximum length is equal to 4GB multiplied by the database page size.) The IQ database must be created with an IQ page size of at least 128KB (131072 bytes) in order to accommodate a table with LONG BINARY data.

 

A table or database can contain any number of LONG BINARY columns up to the supported maximum columns per table and maximum columns per database, respectively.

LONG BINARY columns can be either NULL or NOT NULL and can store zero-length values. The domain BLOB is a LONG BINARY data type that allows NULL.

A non-FP index or join index cannot be constructed on a LONG BINARY column.


A LONG BINARY column can be modified using the UPDATE, INSERT, LOAD TABLE, DELETE, TRUNCATE, SELECT...INTO and INSERT...LOCATION SQL statements. Positioned updates and deletes are not supported on LONG BINARY columns.

 

An Adaptive Server® Enterprise IMAGE column can be inserted into a LONG BINARY column using the INSERT...LOCATION command. All IMAGE data inserted is silently right truncated at 2147483648 bytes (2 gigabytes).

 

 CLOB   BINARY   X   X   X

Character Large Object (CLOB) data in Sybase IQ is stored in columns of data type LONG VARCHAR or CLOB.

An individual LONG VARCHAR data value can have a length ranging from zero (0) to 512TB (terabytes) for an IQ page size of 128KB or 2PB (petabytes) for an IQ page size of 512KB. (The maximum length is equal to 4GB multiplied by the database page size.) The IQ database must be created with an IQ page size of at least 128KB (131072 bytes) in order to accommodate a table with LONG VARCHAR data.

A table or database can contain any number of LONG VARCHAR columns up to the supported maximum columns per table and maximum columns per database, respectively.

 

Sybase IQ supports both single byte and multibyte LONG VARCHAR data.

 

LONG VARCHAR columns can be either NULL or NOT NULL and can store zero-length values. The domain CLOB is a LONG VARCHAR data type that allows NULL. To create a non-null LONG VARCHAR column, explicitly specify NOT NULL in the column definition.

You can create a LONG VARCHAR column using the domain CLOB, when you create a table or add a column to an existing table. For example:

CREATE TABLE lvtab (c1 INTEGER, c2 CLOB,

                     c3 CLOB NOT NULL);

ALTER TABLE lvtab ADD c4 CLOB;

A WORD (WD) index can be created on a LONG VARCHAR column. Other non-FP index types and join indexes cannot be constructed on a LONG VARCHAR column.


A LONG VARCHAR column can be modified using the UPDATE, INSERT...VALUES, INSERT...SELECT, LOAD TABLE, DELETE, TRUNCATE, SELECT...INTO and INSERT...LOCATION SQL statements. Positioned updates and deletes are not supported on LONG VARCHAR columns.

 

An Adaptive Server Enterprise TEXT column can be inserted into a LONG VARCHAR column using the INSERT...LOCATION command. All TEXT data inserted is silently right truncated at 2147483648 bytes (2 gigabytes).

 IMAGE   BLOB   X   X   X

Image is a synonym for LONG BINARY data allowing NULL

 LONG BINARY   BINARY   X   X   X   Binary data greater than 32K and less than 64K bytes for databases with IQ page size 128KB or larger. Use the binary data type LONG BINARY for data over 32KB in length. IQ handles LONG BINARY columns as VARBINARY columns.

 LONG VARCHAR   CHARACTER   X   X   X Arbitrary length character data. The maximum size is limited by the maximum size of the database file (currently 2 gigabytes).

 TEXT   CHARACTER   X   X   X This is a user-defined data type. It is implemented as a LONG VARCHAR allowing NULL.

 VARBINARY   BINARY   1   X   X Binary data up to a length of max-length bytes. If max-length is omitted, the default is 1 byte. The maximum size allowed is (32K – 1) bytes. Use the variable-length binary type VARBINARY for data that is expected to vary greatly in length.

 XML     X   X   X   XML data

===== Indexes =====


- FP (Fast Projection) : Il est créé systématiquement pour toutes les colonnes. 

- LF (Low Fast) : C’est l’indexe le plus rapide sur IQ. Cet index est fait pour les colonnes ayant très peu de valeurs différentes (<1500). Cet indexe est structuré possède une structure bit map spécifique à IQ. Au-delà de 1500 valeurs il faudra choisir un autre indexe sous peine de perte de performance. Physiquement cet indexe ne peut pas avoir plus de 9 999 valeurs différentes.

- HG : High Group : Performant pour des colonnes à plus de 1500 valeurs distinctes. C’est un indexe particulier ment efficace pour des jointures et des ordres de group by. L’indexe est basé sur un Btree, c’est pour cette raison que cet index est plus volumineux qu’un indexe HNG (cf. plus loin).  Cet indexe est implicitement créé lors de déclaration de clés uniques ou de foreignkey.

- HNG  High Non Group: C’est l’indexe qui permet d’être très efficace lors de requêtes cherchant un intervalle sur une colonne. Sa grande particularité est son taux de compression important.

Il faut utiliser cet indexe pour des fonctions d’agrégat sur une colonne comme SUM et AVG.

- CMP : compare Cet indexe est particulier et ne s’utilise que dans un cas particulier. Il permet de comparer très rapidement deux colonnes d’une même table. Il faut que les deux colonnes soient de type identique. 

Il est impossible de créer un index compare sur une colonne qui a l’unicité

Notes : un indexe compare peut être créé entre numeric et decimal.

- WD : Word : C’est un index qui est très utile pour les requêtes manipulant des chaînes de caractères. Il va être utilisé lors d’une requête avec like ou contains. Là où un index ASE n’est pas efficace lors d’une recherche like ‘%…’ l’indexe WD va pouvoir être utilisé pour ce type de recherche. En revanche cet index peut être volumineux.


Notes : la chaîne ne doit pas dépasser 32Ko.

L’option UNIQUE n’est pas utilisable avec l’indexe WORD 


- DATE, TIME ou DTTM

Ces indexe permet des recherches efficaces avec respectivement les types DATE, DATETIME et TIME. Ils vont être plus efficaces que le HNG pour les intervalles (très utilisé avec les dates).

Ressources

http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38159.1540/doc/html/title.html

Commentaires

Posts les plus consultés de ce blog

Sécurité des Applications

Principes de la Programmation Orientée Objet