http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html
11.6 Data Type Storage Requirements
The storage requirements for data vary, according to the storage
engine being used for the table in question. Different storage
engines use different methods for recording the raw data and
different data types. In addition, some engines may compress the
information in a given row, either on a column or entire row
basis, making calculation of the storage requirements for a given
table or column structure.
However, all storage engines must communicate and exchange
information on a given row within a table using the same
structure, and this information is consistent, irrespective of the
storage engine used to write the information to disk.
This sections includes some guideliness and information for the
storage requirements for each data type supported by MySQL,
including details for the internal format and the sizes used by
storage engines that used a fixed size representation for
different types. Information is listed by category or storage
engine.
The internal representation of a table has a maximum row size of
65,535 bytes, even if the storage engine is capable of supporting
larger rows. This figure excludes
BLOB
or
TEXT
columns, which contribute only
9 to 12 bytes toward this size. For
BLOB
and
TEXT
data, the information is
stored internally in a different area of memory than the row
buffer. Different storage engines handle the allocation and
storage of this data in different ways, according to the method
they use for handling the corresponding types. For more
information, see Chapter 14, Storage Engines, and
Section E.7.4, “Limits on Table Column Count and Row Size”.
Storage Requirements for InnoDB
Tables
See Section 14.6.3.12.5, “Physical Row Structure” for information about
storage requirements for InnoDB
tables.
Storage Requirements for NDBCLUSTER
Tables
Important
For tables using the NDBCLUSTER
storage engine, there is the factor of 4-byte
alignment to be taken into account when calculating
storage requirements. This means that all
NDB
data storage is done in
multiples of 4 bytes. Thus, a column value that would take 15
bytes in a table using a storage engine other than
NDB
requires 16 bytes in an
NDB
table. This requirement applies
in addition to any other considerations that are discussed in
this section. For example, in
NDBCLUSTER
tables, the
TINYINT
,
SMALLINT
,
MEDIUMINT
, and
INTEGER
(INT
) column types each require 4
bytes storage per record due to the alignment factor.
An exception to this rule is the
BIT
type, which is
not 4-byte aligned. In MySQL Cluster
tables, a BIT(M
)
column takes M
bits of storage space.
However, if a table definition contains 1 or more
BIT
columns (up to 32
BIT
columns), then
NDBCLUSTER
reserves 4 bytes (32
bits) per row for these. If a table definition contains more
than 32 BIT
columns (up to 64
such columns), then NDBCLUSTER
reserves 8 bytes (that is, 64 bits) per row.
In addition, while a NULL
itself does not
require any storage space,
NDBCLUSTER
reserves 4 bytes per row
if the table definition contains any columns defined as
NULL
, up to 32 NULL
columns. (If a MySQL Cluster table is defined with more than 32
NULL
columns up to 64 NULL
columns, then 8 bytes per row is reserved.)
When calculating storage requirements for MySQL Cluster tables,
you must also remember that every table using the
NDBCLUSTER
storage engine requires a
primary key; if no primary key is defined by the user, then a
“hidden” primary key will be created by
NDB
. This hidden primary key consumes
31-35 bytes per table record.
You may find the ndb_size.pl
utility to be
useful for estimating NDB
storage
requirements. This Perl script connects to a current MySQL
(non-Cluster) database and creates a report on how much space that
database would require if it used the
NDBCLUSTER
storage engine. See
Section 17.4.23, “ndb_size.pl — NDBCLUSTER Size Requirement Estimator”, for more
information.
Storage Requirements for Numeric Types
Values for DECIMAL
(and
NUMERIC
) columns are represented
using a binary format that packs nine decimal (base 10) digits
into four bytes. Storage for the integer and fractional parts of
each value are determined separately. Each multiple of nine digits
requires four bytes, and the “leftover” digits
require some fraction of four bytes. The storage required for
excess digits is given by the following table.
Storage Requirements for Date and Time Types
For details about internal representation of temporal values, see
MySQL Internals:
Important Algorithms and Structures.
Storage Requirements for String Types
In the following table, M
represents
the declared column length in characters for nonbinary string
types and bytes for binary string types.
L
represents the actual length in bytes
of a given string value.
Variable-length string types are stored using a length prefix plus
data. The length prefix requires from one to four bytes depending
on the data type, and the value of the prefix is
L
(the byte length of the string). For
example, storage for a MEDIUMTEXT
value requires L
bytes to store the
value plus three bytes to store the length of the value.
To calculate the number of bytes used to store a particular
CHAR
,
VARCHAR
, or
TEXT
column value, you must take
into account the character set used for that column and whether
the value contains multi-byte characters. In particular, when
using the utf8
Unicode character set, you must
keep in mind that not all characters use the same number of bytes
and can require up to three bytes per character. For a breakdown
of the storage used for different categories of
utf8
characters, see
Section 10.1.10, “Unicode Support”.
VARCHAR
,
VARBINARY
, and the
BLOB
and
TEXT
types are variable-length
types. For each, the storage requirements depend on these factors:
The actual length of the column value
The column's maximum possible length
The character set used for the column, because some character
sets contain multi-byte characters
For example, a VARCHAR(255)
column can hold a
string with a maximum length of 255 characters. Assuming that the
column uses the latin1
character set (one byte
per character), the actual storage required is the length of the
string (L
), plus one byte to record the
length of the string. For the string 'abcd'
,
L
is 4 and the storage requirement is
five bytes. If the same column is instead declared to use the
ucs2
double-byte character set, the storage
requirement is 10 bytes: The length of 'abcd'
is eight bytes and the column requires two bytes to store lengths
because the maximum length is greater than 255 (up to 510 bytes).
The effective maximum number of bytes that
can be stored in a VARCHAR
or
VARBINARY
column is subject to the
maximum row size of 65,535 bytes, which is shared among all
columns. For a VARCHAR
column that
stores multi-byte characters, the effective maximum number of
characters is less. For example,
utf8
characters can require up to three bytes
per character, so a VARCHAR
column
that uses the utf8
character set can be
declared to be a maximum of 21,844 characters. See
Section E.7.4, “Limits on Table Column Count and Row Size”.
The NDBCLUSTER
storage engine in
MySQL 5.1 supports variable-width columns. This means that a
VARCHAR
column in a MySQL Cluster
table requires the same amount of storage as it would using any
other storage engine, with the exception that such values are
4-byte aligned. Thus, the string 'abcd'
stored
in a VARCHAR(50)
column using the
latin1
character set requires 8 bytes (rather
than 6 bytes for the same column value in a
MyISAM
table). This represents a change in
behavior from earlier versions of
NDBCLUSTER
, where a
VARCHAR(50)
column would require 52 bytes
storage per record regardless of the length of the string being
stored.
TEXT
and
BLOB
columns are implemented
differently in the NDB Cluster storage engine, wherein each row in
a TEXT
column is made up of two
separate parts. One of these is of fixed size (256 bytes), and is
actually stored in the original table. The other consists of any
data in excess of 256 bytes, which is stored in a hidden table.
The rows in this second table are always 2,000 bytes long. This
means that the size of a TEXT
column is 256 if size
<= 256 (where
size
represents the size of the row);
otherwise, the size is 256 + size
+
(2000 – (size
– 256) %
2000).
The size of an ENUM
object is
determined by the number of different enumeration values. One byte
is used for enumerations with up to 255 possible values. Two bytes
are used for enumerations having between 256 and 65,535 possible
values. See Section 11.4.4, “The ENUM
Type”.
The size of a SET
object is
determined by the number of different set members. If the set size
is N
, the object occupies
(N
+7)/8
bytes,
rounded up to 1, 2, 3, 4, or 8 bytes. A
SET
can have a maximum of 64
members. See Section 11.4.5, “The SET
Type”.