Data types

class odps.types.Boolean(*args, **kwargs)

Represents boolean type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.boolean) instead.

class odps.types.Tinyint(*args, **kwargs)

Represents tinyint type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.tinyint) instead.

You need to set options.sql.use_odps2_extension = True to enable full functionality.

class odps.types.Smallint(*args, **kwargs)

Represents smallint type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.smallint) instead.

You need to set options.sql.use_odps2_extension = True to enable full functionality.

class odps.types.Int(*args, **kwargs)

Represents int type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.int_) instead.

You need to set options.sql.use_odps2_extension = True to enable full functionality.

class odps.types.Bigint(*args, **kwargs)

Represents bigint type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.bigint) instead.
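
The integer types above differ only in width: MaxCompute TINYINT, SMALLINT, INT and BIGINT are signed 8-, 16-, 32- and 64-bit integers. The sketch below is a hypothetical helper (int_range and fits are illustrative names, not PyODPS APIs) that checks whether a Python int fits one of these types. It does not import odps.

```python
from typing import Tuple

# Hypothetical table, not a PyODPS API: bit widths of MaxCompute signed integer types.
INT_BITS = {"tinyint": 8, "smallint": 16, "int": 32, "bigint": 64}


def int_range(type_name: str) -> Tuple[int, int]:
    """Return the inclusive (min, max) range of a signed integer type."""
    bits = INT_BITS[type_name.lower()]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(value: int, type_name: str) -> bool:
    """Return True if value can be stored in a column of type_name."""
    low, high = int_range(type_name)
    return low <= value <= high


print(int_range("tinyint"))  # (-128, 127)
print(fits(300, "tinyint"), fits(300, "smallint"))  # False True
```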

class odps.types.Decimal(*args, **kwargs)

Represents decimal type with precision and scale in MaxCompute.

Parameters:
  • precision (int) – The precision (or total digits) of the decimal type.

  • scale (int) – The scale (or decimal digits) of the decimal type.

Example:

>>> from odps.types import Decimal
>>> decimal_type = Decimal(18, 6)
>>> print(decimal_type)
decimal(18, 6)
>>> print(decimal_type.precision, decimal_type.scale)
18 6

Note:

You need to set options.sql.use_odps2_extension = True to enable full functionality when setting precision or scale.

precision

Precision (or total digits) of the decimal type.

scale

Decimal scale (or decimal digits) of the decimal type.
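
A value fits decimal(precision, scale) when it has at most scale digits after the decimal point and at most precision - scale digits before it. The sketch below checks this with the standard decimal module. fits_decimal is a hypothetical helper, not part of PyODPS, and it assumes you want to reject values that would need rounding rather than round them.

```python
from decimal import Context, Decimal as PyDecimal

# Wide enough that normalize() never rounds values of the sizes discussed here.
_EXACT = Context(prec=100)


def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Hypothetical check: does value fit decimal(precision, scale) without rounding?"""
    d = PyDecimal(value)  # construction from a string is exact
    if not d.is_finite():
        return False  # NaN and Infinity cannot be stored as decimals
    if d == 0:
        return True  # zero fits any precision and scale
    # normalize() drops trailing zeros, so "1.50" counts as one fractional digit.
    _, digits, exponent = d.normalize(_EXACT).as_tuple()
    frac_digits = max(0, -exponent)
    int_digits = max(0, len(digits) + exponent)
    return frac_digits <= scale and int_digits <= precision - scale


print(fits_decimal("123.456789", 18, 6))  # True
print(fits_decimal("0.0000001", 18, 6))   # False: 7 fractional digits
```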

class odps.types.Float(*args, **kwargs)

Represents float type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.float_) instead.

You need to set options.sql.use_odps2_extension = True to enable full functionality.

class odps.types.Double(*args, **kwargs)

Represents double type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.double) instead.
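
Float and Double differ in precision: MaxCompute FLOAT is a 32-bit single-precision value, while DOUBLE, like a Python float, is 64-bit. The sketch below uses only the standard library, not PyODPS. It round-trips a value through both widths with struct to show the precision lost when a Python float is narrowed to single precision.

```python
import struct


def as_float32(value: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision (4 bytes)."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def as_float64(value: float) -> float:
    """Round-trip through double precision (8 bytes); lossless for Python floats."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


print(as_float32(0.1))  # 0.10000000149011612
print(as_float64(0.1))  # 0.1
```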

class odps.types.Binary(*args, **kwargs)

Represents binary type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.binary) instead.

You need to set options.sql.use_odps2_extension = True to enable full functionality.

class odps.types.Char(*args, **kwargs)

Represents char type with size limit in MaxCompute.

Parameters:

size_limit (int) – The size limit of char type.

Example:

>>> from odps.types import Char
>>> char_type = Char(255)
>>> print(char_type)
char(255)
>>> print(char_type.size_limit)
255

Note:

You need to set options.sql.use_odps2_extension = True to enable full functionality.

size_limit

Size limit of the char type.

class odps.types.String(*args, **kwargs)

Represents string type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.string) instead.

class odps.types.Varchar(*args, **kwargs)

Represents varchar type with size limit in MaxCompute.

Parameters:

size_limit (int) – The size limit of varchar type.

Example:

>>> from odps.types import Varchar
>>> varchar_type = Varchar(65535)
>>> print(varchar_type)
varchar(65535)
>>> print(varchar_type.size_limit)
65535

Note:

You need to set options.sql.use_odps2_extension = True to enable full functionality.

size_limit

Size limit of the varchar type.
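
Both Char and Varchar carry a size_limit. The sketch below is hypothetical: parse_sized_string and within_limit are illustrative names, not PyODPS APIs. It reads the limit out of a char(n) or varchar(n) type string and checks a value's length against it, assuming the limit counts characters rather than encoded bytes.

```python
import re
from typing import Tuple

# Hypothetical sketch, not a PyODPS API: matches "char(n)" or "varchar(n)".
_SIZED = re.compile(r"^\s*(char|varchar)\s*\(\s*(\d+)\s*\)\s*$", re.IGNORECASE)


def parse_sized_string(type_str: str) -> Tuple[str, int]:
    """Return (type_name, size_limit) for a char/varchar type string."""
    match = _SIZED.match(type_str)
    if match is None:
        raise ValueError("not a char/varchar type: %r" % type_str)
    return match.group(1).lower(), int(match.group(2))


def within_limit(value: str, type_str: str) -> bool:
    # Assumption: the size limit counts characters, not encoded bytes.
    _, size_limit = parse_sized_string(type_str)
    return len(value) <= size_limit


print(parse_sized_string("varchar(65535)"))  # ('varchar', 65535)
print(within_limit("hello!", "varchar(5)"))  # False
```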

class odps.types.Json(*args, **kwargs)

Represents json type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.json) instead.

You need to set options.sql.use_odps2_extension = True to enable full functionality.

class odps.types.Date(*args, **kwargs)

Represents date type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.date) instead.

You need to set options.sql.use_odps2_extension = True to enable full functionality.

class odps.types.Datetime(*args, **kwargs)

Represents datetime type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.datetime) instead.

class odps.types.Timestamp(*args, **kwargs)

Represents timestamp type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.timestamp) instead.

You need to set options.sql.use_odps2_extension = True to enable full functionality.

class odps.types.TimestampNTZ(*args, **kwargs)

Represents timestamp_ntz type in MaxCompute.

Note:

This class should not be used directly. Use its singleton instance (odps.types.timestamp_ntz) instead.

You need to set options.sql.use_odps2_extension = True to enable full functionality.
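
The NTZ in TimestampNTZ stands for "no time zone". Python's standard library draws the same line between aware and naive datetime objects, as the sketch below shows. It only illustrates the distinction in Python terms and does not show how PyODPS converts values of either type.

```python
from datetime import datetime, timedelta, timezone

# An aware value pins a single instant; here it is expressed in UTC+8.
aware = datetime(2024, 1, 1, 8, 0, tzinfo=timezone(timedelta(hours=8)))

# The same instant expressed in UTC.
in_utc = aware.astimezone(timezone.utc)

# A naive value keeps only the wall-clock fields and carries no zone.
naive = aware.replace(tzinfo=None)

print(in_utc)  # 2024-01-01 00:00:00+00:00
print(naive)   # 2024-01-01 08:00:00
```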

class odps.types.Array(*args, **kwargs)

Represents array type in MaxCompute.

Parameters:

value_type – type of elements in the array

Example:

>>> from odps import types as odps_types
>>>
>>> array_type = odps_types.Array(odps_types.bigint)
>>> print(array_type)
array<bigint>
>>> print(array_type.value_type)
bigint

Note:

You need to set options.sql.use_odps2_extension = True to enable full functionality.

value_type

Type of elements in the array.

class odps.types.Map(*args, **kwargs)

Represents map type in MaxCompute.

Parameters:
  • key_type – type of keys in the map

  • value_type – type of values in the map

Example:

>>> from odps import types as odps_types
>>>
>>> map_type = odps_types.Map(odps_types.string, odps_types.Array(odps_types.bigint))
>>> print(map_type)
map<string, array<bigint>>
>>> print(map_type.key_type)
string
>>> print(map_type.value_type)
array<bigint>

Note:

You need to set options.sql.use_odps2_extension = True to enable full functionality.

key_type

Type of keys in the map.

value_type

Type of values in the map.

class odps.types.Struct(*args, **kwargs)

Represents struct type in MaxCompute.

Parameters:

field_types – types of all fields, given either as a list of (field_name, field_type) tuples or as a dict with field names as keys and field types as values.

Example:

>>> from odps import types as odps_types
>>>
>>> struct_type = odps_types.Struct([("a", "bigint"), ("b", "array<string>")])
>>> print(struct_type)
struct<`a`:bigint, `b`:array<string>>
>>> print(struct_type.field_types)
OrderedDict([("a", "bigint"), ("b", "array<string>")])
>>> print(struct_type.field_types["b"])
array<string>

Note:

You need to set options.sql.use_odps2_extension = True to enable full functionality.

field_types

Types of fields in the struct, as an OrderedDict.

Example:

The example below extracts field types of a struct.

import odps.types as odps_types

# build a struct type from a dict, then iterate over its field types
struct_type = odps_types.Struct(
    {"a": odps_types.bigint, "b": odps_types.string}
)
for field_name, field_type in struct_type.field_types.items():
    print("field_name:", field_name, "field_type:", field_type)
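
The Struct constructor accepts field types either as a list of (name, type) tuples or as a dict. The hypothetical sketch below normalizes both forms into an OrderedDict and renders the struct type string in the format shown in the Struct example. normalize_fields and struct_ddl are illustrative names, not PyODPS APIs, and unlike Struct they keep types as plain strings instead of type instances.

```python
from collections import OrderedDict
from collections.abc import Mapping
from typing import Iterable, Tuple, Union

# Hypothetical input form: a dict, or a list of (field_name, field_type) pairs.
FieldSpec = Union[Mapping, Iterable[Tuple[str, str]]]


def normalize_fields(field_types: FieldSpec) -> "OrderedDict[str, str]":
    """Return the fields as an OrderedDict, preserving their order."""
    items = field_types.items() if isinstance(field_types, Mapping) else field_types
    return OrderedDict((name, type_str) for name, type_str in items)


def struct_ddl(field_types: FieldSpec) -> str:
    """Render a struct type string such as struct<`a`:bigint, `b`:string>."""
    fields = normalize_fields(field_types)
    return "struct<%s>" % ", ".join("`%s`:%s" % pair for pair in fields.items())


print(struct_ddl([("a", "bigint"), ("b", "array<string>")]))
```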

odps.types.validate_data_type(data_type)

Parses a data type instance from a type string in MaxCompute DDL syntax.

Example:

>>> from odps.types import validate_data_type
>>> field_type = validate_data_type("array<int>")
>>> print(field_type)
array<int>
>>> print(field_type.value_type)
int
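
validate_data_type turns a DDL type string into nested type objects. The sketch below is a simplified, hypothetical recursive parser, not the PyODPS implementation: it returns plain tuples, splits only on commas that are not nested inside <...> or (...), and keeps primitive types such as bigint or decimal(18,6) as lowercase strings.

```python
from typing import List, Tuple, Union

# Parsed form: a primitive type name, or a tuple such as ("array", element_type).
Parsed = Union[str, Tuple]


def split_top_level(s: str) -> List[str]:
    """Split on commas that are not nested inside <...> or (...)."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i].strip())
            start = i + 1
    parts.append(s[start:].strip())
    return parts


def parse_type(s: str) -> Parsed:
    s = s.strip()
    lower = s.lower()
    if lower.startswith("array<") and s.endswith(">"):
        return ("array", parse_type(s[6:-1]))
    if lower.startswith("map<") and s.endswith(">"):
        key, value = split_top_level(s[4:-1])
        return ("map", parse_type(key), parse_type(value))
    if lower.startswith("struct<") and s.endswith(">"):
        fields = []
        for part in split_top_level(s[7:-1]):
            # The first ":" separates the (optionally backquoted) name from its type.
            name, _, type_str = part.partition(":")
            fields.append((name.strip().strip("`"), parse_type(type_str)))
        return ("struct", tuple(fields))
    # Primitive names such as bigint, decimal(18,6) or varchar(10) are kept as-is.
    return lower


print(parse_type("map<string,array<bigint>>"))
```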

class odps.types.Column(name=None, typo=None, comment=None, label=None, nullable=True, generate_expression=None, **kw)

Represents a column in a table schema.

Parameters:
  • name (str) – column name

  • typo (str) – column type. The keyword type can be used instead of typo.

  • comment (str) – comment of the column, None by default

  • nullable (bool) – whether the column is nullable, True by default

Example:

>>> from odps.types import Column
>>> col = Column("col1", "bigint")
>>> print(col.name)
col1
>>> print(col.type)
bigint

name

Name of the column.

type

Type of the column.

nullable

True if the column is nullable.
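
The nullable flag records whether a column accepts null values. The sketch below uses a hypothetical ColumnSpec dataclass that mirrors the documented name, type, comment and nullable attributes (it is not odps.types.Column). It shows how a row of values could be checked against those flags before writing.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass
class ColumnSpec:
    """Hypothetical stand-in mirroring the documented Column attributes."""
    name: str
    type: str
    comment: Optional[str] = None
    nullable: bool = True  # same default as the documented Column


def null_violations(columns: Sequence[ColumnSpec], values: Sequence[Any]) -> List[str]:
    """Return names of non-nullable columns whose value is None."""
    return [
        col.name
        for col, value in zip(columns, values)
        if value is None and not col.nullable
    ]


cols = [ColumnSpec("id", "bigint", nullable=False), ColumnSpec("name", "string")]
print(null_violations(cols, [None, None]))  # ['id']
```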

class odps.types.Partition(name=None, typo=None, comment=None, label=None, nullable=True, generate_expression=None, **kw)

Represents a partition column in a table schema.

Parameters:
  • name (str) – column name

  • typo (str) – column type. The keyword type can be used instead of typo.

  • comment (str) – comment of the column, None by default

  • nullable (bool) – whether the column is nullable, True by default

Example:

>>> from odps.types import Partition
>>> col = Partition("col1", "bigint")
>>> print(col.name)
col1
>>> print(col.type)
bigint

name

Name of the column.

type

Type of the column.

nullable

True if the column is nullable.

class odps.models.Record(columns=None, schema=None, values=None, max_field_size=None)

A record represents the data of a single row in a table. It can be created from a schema, or with odps.models.Table.new_record() or odps.tunnel.TableUploadSession.new_record().

Hints on getting or setting different types of data can be seen here.

Example:

>>> from odps.models import Record, TableSchema
>>> schema = TableSchema.from_lists(['name', 'id'], ['string', 'string'])
>>> record = Record(schema=schema, values=['test', 'test2'])
>>> record[0] = 'test'
>>> record[0]
'test'
>>> record['name']
'test'
>>> record[0:2]
('test', 'test2')
>>> record[0, 1]
('test', 'test2')
>>> record['name', 'id']
('test', 'test2')
>>> for field in record:
...     print(field)
...
('name', 'test')
('id', 'test2')
>>> len(record)
2
>>> 'name' in record
True
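
The access patterns in the Record example can be summarized by a small stand-in class. MiniRecord below is a hypothetical sketch, not the PyODPS Record implementation: it supports indexing by position, by name, by slice and by tuple, and iterates over (name, value) pairs.

```python
from typing import Any, Iterator, List, Sequence, Tuple


class MiniRecord:
    """Hypothetical sketch of Record-style access; not PyODPS code."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        self._names = list(names)
        self._index = {name: i for i, name in enumerate(self._names)}
        self._values: List[Any] = list(values)

    def _pos(self, key: Any) -> int:
        # Column names map to positions; integers are used directly.
        return self._index[key] if isinstance(key, str) else key

    def __getitem__(self, key: Any) -> Any:
        if isinstance(key, slice):
            return tuple(self._values[key])
        if isinstance(key, tuple):
            return tuple(self._values[self._pos(k)] for k in key)
        return self._values[self._pos(key)]

    def __setitem__(self, key: Any, value: Any) -> None:
        self._values[self._pos(key)] = value

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        # Iteration yields (name, value) pairs.
        return iter(zip(self._names, self._values))

    def __len__(self) -> int:
        return len(self._values)

    def __contains__(self, name: object) -> bool:
        return name in self._index


r = MiniRecord(["name", "id"], ["test", "test2"])
print(r["name", "id"], len(r))
```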

class odps.models.TableSchema(**kwargs)

A schema holds the column and partition information of an odps.models.Table.

There are two ways to initialize a TableSchema object: pass columns and partitions to the constructor, or call the class method from_lists. See the examples below:

Example:

>>> from odps.models import TableSchema
>>> from odps.types import Column, Partition
>>> columns = [Column(name='num', type='bigint', comment='the column')]
>>> partitions = [Partition(name='pt', type='string', comment='the partition')]
>>> schema = TableSchema(columns=columns, partitions=partitions)
>>> schema.columns
[<column num, type bigint>, <partition pt, type string>]
>>>
>>> schema = TableSchema.from_lists(['num'], ['bigint'], ['pt'], ['string'])
>>> schema.columns
[<column num, type bigint>, <partition pt, type string>]

property columns

List of columns and partition columns as a list of Column.

property partitions

List of partition columns as a list of Partition.

property simple_columns

List of columns as a list of Column. Partition columns are excluded.

classmethod from_lists(names, types, partition_names=None, partition_types=None)

Create a schema from lists of column names and types.

Parameters:
  • names – List of column names.

  • types – List of column types.

  • partition_names – List of partition names.

  • partition_types – List of partition types.

Example:

>>> schema = TableSchema.from_lists(['id', 'name'], ['bigint', 'string'])
>>> print(schema.columns)
[<column id, type bigint>, <column name, type string>]
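
The relationship between columns, partitions and simple_columns can be sketched without odps. MiniSchema and Col below are hypothetical stand-ins for TableSchema, Column and Partition: from_lists pairs names with types, and columns lists the regular columns followed by the partition columns, matching the output shown in the TableSchema examples.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for Column / Partition."""
    name: str
    type: str
    is_partition: bool = False


class MiniSchema:
    """Hypothetical sketch of TableSchema's column properties; not PyODPS code."""

    def __init__(self, columns: Sequence[Col], partitions: Sequence[Col] = ()) -> None:
        self._simple = list(columns)
        self._partitions = list(partitions)

    @classmethod
    def from_lists(
        cls,
        names: Sequence[str],
        types: Sequence[str],
        partition_names: Optional[Sequence[str]] = None,
        partition_types: Optional[Sequence[str]] = None,
    ) -> "MiniSchema":
        partition_names = partition_names or []
        partition_types = partition_types or []
        if len(names) != len(types) or len(partition_names) != len(partition_types):
            raise ValueError("each name needs exactly one type")
        columns = [Col(n, t) for n, t in zip(names, types)]
        partitions = [Col(n, t, True) for n, t in zip(partition_names, partition_types)]
        return cls(columns, partitions)

    @property
    def simple_columns(self) -> List[Col]:
        # Regular columns only; partition columns are excluded.
        return list(self._simple)

    @property
    def partitions(self) -> List[Col]:
        return list(self._partitions)

    @property
    def columns(self) -> List[Col]:
        # Regular columns first, then partition columns.
        return self._simple + self._partitions


schema = MiniSchema.from_lists(["num"], ["bigint"], ["pt"], ["string"])
print([c.name for c in schema.columns])  # ['num', 'pt']
```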