Mars Reference

odps.mars_extension.create_mars_cluster(odps, worker_num=1, worker_cpu=8, worker_mem=32, cache_mem=None, min_worker_num=None, disk_num=1, disk_size=100, supervisor_num=1, supervisor_cpu=None, supervisor_mem=None, with_notebook=False, notebook_cpu=None, notebook_mem=None, with_graphscope=False, coordinator_cpu=None, coordinator_mem=None, timeout=None, extra_modules=None, resources=None, instance_id=None, name='default', if_exists='reuse', project=None, **kw)

Create a Mars cluster and set a Mars session on it as the default session, so that all subsequent tasks are submitted to the cluster.

Parameters:

worker_num – number of Mars workers in the cluster
worker_cpu – number of CPU cores on each Mars worker
worker_mem – memory size on each Mars worker
cache_mem – cache memory size on each Mars worker
disk_num – number of mounted disks
min_worker_num – return as soon as the number of started workers reaches min_worker_num
resources – names of resources
extra_modules – path of user-defined modules
supervisor_num – number of supervisors, default is 1
with_notebook – whether to launch a Jupyter notebook, default is False
instance_id – instance ID of an existing Mars cluster
name – cluster name, ‘default’ by default
if_exists – one of ‘reuse’, ‘raise’ or ‘ignore’. If ‘reuse’, reuse the first cluster created with the same name, or create a new one if none exists; if ‘raise’, fail if a cluster with the same name has already been created; if ‘ignore’, always create a new cluster
project – project name

Returns:

MarsClient
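The if_exists options described in the parameters above can be modeled in a few lines. This is a minimal sketch, not PyODPS code: ClusterRegistry, get_or_create and the generated instance IDs are hypothetical names used only to illustrate the documented ‘reuse’, ‘raise’ and ‘ignore’ behavior.

```python
# Hypothetical model of the documented if_exists semantics; not PyODPS code.
from typing import Dict, List


class ClusterRegistry:
    """In-memory stand-in for already-created clusters, keyed by name."""

    def __init__(self) -> None:
        self._clusters: Dict[str, List[str]] = {}
        self._counter = 0

    def _create(self, name: str) -> str:
        # Simulate launching a new cluster and return a fake instance ID.
        self._counter += 1
        instance_id = f"{name}-{self._counter}"
        self._clusters.setdefault(name, []).append(instance_id)
        return instance_id

    def get_or_create(self, name: str = "default", if_exists: str = "reuse") -> str:
        existing = self._clusters.get(name, [])
        if if_exists == "reuse":
            # Reuse the first cluster created with this name, else create one.
            return existing[0] if existing else self._create(name)
        if if_exists == "raise":
            if existing:
                raise ValueError(f"cluster {name!r} already exists")
            return self._create(name)
        if if_exists == "ignore":
            # Always launch a new cluster, regardless of existing ones.
            return self._create(name)
        raise ValueError(f"unknown if_exists value: {if_exists!r}")


registry = ClusterRegistry()
first = registry.get_or_create("default")            # creates "default-1"
again = registry.get_or_create("default")            # reuses "default-1"
fresh = registry.get_or_create("default", "ignore")  # creates "default-2"
print(first, again, fresh)  # default-1 default-1 default-2
```

With the real function, passing if_exists='raise' is the way to fail fast instead of silently attaching to a cluster that another job already created under the same name.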

odps.mars_extension.to_mars_dataframe(odps, table_name, shape=None, partition=None, chunk_bytes=None, sparse=False, columns=None, add_offset=None, calc_nrows=True, index_type='chunk_incremental', use_arrow_dtype=False, string_as_binary=None, chunk_size=None, memory_scale=None, runtime_endpoint=None, append_partitions=False, with_split_meta_on_tile=False, tunnel_quota_name=None, extra_params=None, **kw)

Read an ODPS table into a Mars DataFrame.

Parameters:

table_name – table name
shape – table shape, as a tuple such as (1000, 3), meaning the table has 1000 rows and 3 columns.
partition – partition spec.
chunk_bytes – bytes to read for each chunk. Default value is ‘16M’.
chunk_size – desired number of rows in each chunk.
sparse – whether to read as a sparse DataFrame.
columns – columns to select.
add_offset – whether to standardize the DataFrame’s index to a RangeIndex. False by default.
index_type – type of the retrieved index.
calc_nrows – whether to calculate the number of rows when shape is not specified.
use_arrow_dtype – whether to read data with Arrow dtypes, which reduces memory usage in some cases.
string_as_binary – read string columns as binary type.
memory_scale – ratio of actual memory occupation to raw file size.
append_partitions – whether to append partition names when reading partitioned tables.
tunnel_quota_name – name of tunnel quota

Returns:

Mars DataFrame.
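The interplay of shape and chunk_size can be illustrated by splitting the row count into fixed-size row ranges. This is a minimal sketch under the assumption that chunks are contiguous row ranges of at most chunk_size rows; split_rows is a hypothetical helper, not the tiling logic PyODPS actually uses.

```python
# Hypothetical sketch: split nrows into row ranges of at most chunk_size rows.
from typing import List, Tuple


def split_rows(nrows: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) row ranges covering [0, nrows)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, nrows))
            for start in range(0, nrows, chunk_size)]


# shape=(1000, 3) means 1000 rows and 3 columns; use 400-row chunks.
nrows, ncols = (1000, 3)
chunks = split_rows(nrows, 400)
print(chunks)  # [(0, 400), (400, 800), (800, 1000)]
```

The last chunk is simply shorter when nrows is not a multiple of chunk_size.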

odps.mars_extension.persist_mars_dataframe(odps, df, table_name, overwrite=False, partition=None, write_batch_size=None, unknown_as_string=None, as_type=None, drop_table=False, create_table=True, drop_partition=False, create_partition=None, lifecycle=None, tunnel_quota_name=None, runtime_endpoint=None, **kw)

Write a Mars DataFrame to an ODPS table.

Parameters:

df – Mars DataFrame.
table_name – table to write.
overwrite – whether to overwrite existing data. False by default.
partition – partition spec.
write_batch_size – number of records written per batch. 1024 by default.
unknown_as_string – set columns whose dtype is object to string type.
as_type – specify column types, e.g. {‘a’: ‘string’} sets column a to string type.
drop_table – drop the table if it exists, False by default
create_table – create the table first if it does not exist, True by default
drop_partition – drop the partition if it exists, False by default
create_partition – create the partition if it does not exist, None by default
lifecycle – table lifecycle. If absent, options.lifecycle will be used.
tunnel_quota_name – name of tunnel quota

Returns:

None
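One way to read the unknown_as_string and as_type parameters together is as a per-column type resolution step. This is a minimal sketch under stated assumptions: resolve_column_types and the KNOWN_TYPES mapping from dtypes to table types are hypothetical and illustrative, not PyODPS's actual conversion rules.

```python
# Hypothetical sketch: resolve output column types from dtypes,
# unknown_as_string and as_type. KNOWN_TYPES is an illustrative
# assumption, not PyODPS's actual conversion table.
from typing import Dict, Optional

KNOWN_TYPES = {
    "int64": "bigint",
    "float64": "double",
    "bool": "boolean",
}


def resolve_column_types(
    dtypes: Dict[str, str],
    unknown_as_string: bool = False,
    as_type: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    as_type = as_type or {}
    result: Dict[str, str] = {}
    for col, dtype in dtypes.items():
        if col in as_type:
            # An explicit as_type entry always takes precedence.
            result[col] = as_type[col]
        elif dtype in KNOWN_TYPES:
            result[col] = KNOWN_TYPES[dtype]
        elif dtype == "object" and unknown_as_string:
            # Object columns fall back to string only when requested.
            result[col] = "string"
        else:
            raise TypeError(f"cannot infer a table type for column {col!r} ({dtype})")
    return result


dtypes = {"a": "object", "b": "int64"}
print(resolve_column_types(dtypes, as_type={"a": "string"}))  # {'a': 'string', 'b': 'bigint'}
print(resolve_column_types(dtypes, unknown_as_string=True))   # {'a': 'string', 'b': 'bigint'}
```

In this model an object column with neither option set is rejected, which is the situation the two parameters are meant to resolve.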

odps.mars_extension.list_mars_instances(odps, project=None, days=3, return_task_name=False)

List running Mars instances in the project.

Parameters:

project – project name; the default project is used if not specified
days – time range, in days, used to filter instances
return_task_name – whether to also return task names

Returns:

Instances.
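The days and return_task_name parameters can be pictured as a time-window filter over instance records. This is a minimal sketch, not the real implementation: InstanceRecord, its fields, filter_recent_running and the (instance_id, task_name) tuple shape are all hypothetical assumptions made for illustration.

```python
# Hypothetical sketch: filter instance records by a `days` window.
# InstanceRecord and its fields are illustrative, not PyODPS types.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple, Union


@dataclass
class InstanceRecord:
    instance_id: str
    task_name: str
    start_time: datetime
    running: bool


def filter_recent_running(
    records: List[InstanceRecord],
    days: int = 3,
    now: Optional[datetime] = None,
    return_task_name: bool = False,
) -> List[Union[str, Tuple[str, str]]]:
    now = now or datetime.now()
    # Keep only running instances that started within the last `days` days.
    cutoff = now - timedelta(days=days)
    selected = [r for r in records if r.running and r.start_time >= cutoff]
    if return_task_name:
        return [(r.instance_id, r.task_name) for r in selected]
    return [r.instance_id for r in selected]


now = datetime(2024, 1, 10)
records = [
    InstanceRecord("a", "task_a", datetime(2024, 1, 9), running=True),
    InstanceRecord("b", "task_b", datetime(2024, 1, 1), running=True),   # too old for days=3
    InstanceRecord("c", "task_c", datetime(2024, 1, 9), running=False),  # not running
]
print(filter_recent_running(records, now=now))  # ['a']
print(filter_recent_running(records, days=10, now=now, return_task_name=True))
# [('a', 'task_a'), ('b', 'task_b')]
```

Passing a fixed now keeps the example deterministic; a real listing would compare against the current time.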