Mars Reference

odps.mars_extension.create_mars_cluster(odps, worker_num=1, worker_cpu=8, worker_mem=32, cache_mem=None, min_worker_num=None, disk_num=1, disk_size=100, supervisor_num=1, supervisor_cpu=None, supervisor_mem=None, with_notebook=False, notebook_cpu=None, notebook_mem=None, with_graphscope=False, coordinator_cpu=None, coordinator_mem=None, timeout=None, extra_modules=None, resources=None, instance_id=None, name='default', if_exists='reuse', project=None, **kw)

Create a Mars cluster and set a Mars session on it as the default session, so that all subsequent tasks are submitted to the cluster.

Parameters:
  • worker_num – number of workers in the Mars cluster

  • worker_cpu – number of CPU cores on each Mars worker

  • worker_mem – memory size on each Mars worker

  • cache_mem – cache memory size on each Mars worker

  • disk_num – number of mounted disks

  • min_worker_num – return once the number of cluster workers reaches min_worker_num

  • resources – resources name

  • extra_modules – user defined module path

  • supervisor_num – the number of supervisors, default is 1

  • with_notebook – whether to launch a Jupyter notebook, default is False

  • instance_id – instance ID of an existing Mars cluster

  • name – cluster name; defaults to ‘default’

  • if_exists – ‘reuse’, ‘raise’ or ‘ignore’. With ‘reuse’, the first cluster created with the same name is reused, or a new one is created if none exists; with ‘raise’, the call fails if a cluster with the same name already exists; with ‘ignore’, a new cluster is always created

  • project – project name

Returns:

MarsClient
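
As a hedged illustration of the parameters above, the sketch below collects options for create_mars_cluster and checks if_exists against the three documented values before making the call. The helper names build_cluster_options and start_cluster are hypothetical, `o` is assumed to be an already-constructed ODPS entry object, and the import is deferred so that the option-building part runs without PyODPS installed.

```python
ALLOWED_IF_EXISTS = ("reuse", "raise", "ignore")  # values documented above


def build_cluster_options(worker_num=1, worker_cpu=8, worker_mem=32,
                          name="default", if_exists="reuse", **extra):
    """Hypothetical helper: gather keyword arguments for create_mars_cluster."""
    if if_exists not in ALLOWED_IF_EXISTS:
        raise ValueError("if_exists must be one of %r, got %r"
                         % (ALLOWED_IF_EXISTS, if_exists))
    options = dict(worker_num=worker_num, worker_cpu=worker_cpu,
                   worker_mem=worker_mem, name=name, if_exists=if_exists)
    options.update(extra)  # e.g. min_worker_num, with_notebook, project
    return options


def start_cluster(o, **overrides):
    """Hypothetical wrapper: create or reuse a cluster for ODPS entry `o`."""
    # Deferred import: only needed when a cluster is actually requested.
    from odps.mars_extension import create_mars_cluster
    return create_mars_cluster(o, **build_cluster_options(**overrides))
```

Under these assumptions, start_cluster(o, worker_num=4, min_worker_num=2, name='etl') would request four workers and return once two of them are available.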

odps.mars_extension.to_mars_dataframe(odps, table_name, shape=None, partition=None, chunk_bytes=None, sparse=False, columns=None, add_offset=None, calc_nrows=True, index_type='chunk_incremental', use_arrow_dtype=False, string_as_binary=None, chunk_size=None, memory_scale=None, runtime_endpoint=None, append_partitions=False, with_split_meta_on_tile=False, tunnel_quota_name=None, extra_params=None, **kw)

Read a table into a Mars DataFrame.

Parameters:
  • table_name – table name

  • shape – table shape. A tuple like (1000, 3) means the table has 1000 rows and 3 columns.

  • partition – partition spec.

  • chunk_bytes – Bytes to read for each chunk. Default value is ‘16M’.

  • chunk_size – desired chunk size in rows.

  • sparse – whether to read as a sparse DataFrame.

  • columns – selected columns.

  • add_offset – whether to standardize the DataFrame’s index to a RangeIndex. False by default.

  • index_type – type of retrieved index

  • calc_nrows – whether to calculate the number of rows when shape is not specified.

  • use_arrow_dtype – read columns with Arrow dtypes. Reduces memory usage in some cases.

  • string_as_binary – read string columns as binary type.

  • memory_scale – ratio of actual memory usage to raw file size.

  • append_partitions – append partition name when reading partitioned tables.

  • tunnel_quota_name – name of tunnel quota

Returns:

Mars DataFrame.
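
The reading parameters above can be sketched as follows. This is a minimal example rather than the library's own usage pattern: read_options and read_table are hypothetical helpers, `o` is assumed to be an existing ODPS entry object, and the partition value is made up. Options left as None are dropped so that the function's own defaults apply.

```python
def read_options(**kw):
    """Hypothetical helper: drop options left as None so defaults apply."""
    return {key: value for key, value in kw.items() if value is not None}


def read_table(o, table_name, **kw):
    """Hypothetical wrapper around to_mars_dataframe for ODPS entry `o`."""
    from odps.mars_extension import to_mars_dataframe  # deferred import
    return to_mars_dataframe(o, table_name, **read_options(**kw))


# Example option set: one partition of a table whose shape is already known.
# The partition value is illustrative only.
example = read_options(partition="pt=20240101",
                       shape=(1000, 3),
                       columns=None)
```

Here example contains only partition and shape, because columns was left unset.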

odps.mars_extension.persist_mars_dataframe(odps, df, table_name, overwrite=False, partition=None, write_batch_size=None, unknown_as_string=None, as_type=None, drop_table=False, create_table=True, drop_partition=False, create_partition=None, lifecycle=None, tunnel_quota_name=None, runtime_endpoint=None, **kw)

Write a Mars DataFrame to a table.

Parameters:
  • df – Mars DataFrame.

  • table_name – table to write.

  • overwrite – whether to overwrite existing data. False by default.

  • partition – partition spec.

  • write_batch_size – batch size of records to write. 1024 as default.

  • unknown_as_string – set columns to string type if their type is Object.

  • as_type – specify column dtypes. {‘a’: ‘string’} will set column a as string type.

  • drop_table – drop the table if it exists, False by default

  • create_table – create the table first if it does not exist, True by default

  • drop_partition – drop the partition if it exists, False by default

  • create_partition – create the partition if it does not exist, None by default

  • lifecycle – table lifecycle. If absent, options.lifecycle will be used.

  • tunnel_quota_name – name of tunnel quota

Returns:

None
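
To make the as_type parameter concrete, the sketch below builds a mapping of the documented form {'a': 'string'} for several columns and passes it on when writing. The names string_columns and write_table are hypothetical, and `o` and `df` are assumed to be an existing ODPS entry object and a Mars DataFrame.

```python
def string_columns(names):
    """Hypothetical helper: as_type mapping that forces columns to string."""
    return {name: "string" for name in names}


def write_table(o, df, table_name, string_cols=(), **kw):
    """Hypothetical wrapper around persist_mars_dataframe."""
    from odps.mars_extension import persist_mars_dataframe  # deferred import
    if string_cols:
        kw["as_type"] = string_columns(string_cols)
    return persist_mars_dataframe(o, df, table_name, **kw)
```

A call such as write_table(o, df, 'result_table', string_cols=['a'], overwrite=True) would then forward as_type={'a': 'string'} together with overwrite=True; the table name is illustrative.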

odps.mars_extension.list_mars_instances(odps, project=None, days=3, return_task_name=False)

List all running Mars instances in your project.

Parameters:
  • project – default project name

  • days – range, in days, used to filter instances

  • return_task_name – whether to return task names

Returns:

Instances.
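
A small sketch of calling list_mars_instances with the parameters above follows. The helper recent_instances and its list_fn argument are hypothetical; list_fn exists only so the helper can be tried with a stand-in function. The sketch also assumes the return value is iterable, which the reference does not state.

```python
def recent_instances(o, project=None, days=3, list_fn=None):
    """Hypothetical helper: return running Mars instances as a plain list."""
    if list_fn is None:
        # Deferred import so the helper can be exercised with a stand-in.
        from odps.mars_extension import list_mars_instances as list_fn
    # list() accepts either a list or a generator; the exact return type
    # is an assumption, not something the reference specifies.
    return list(list_fn(o, project=project, days=days))
```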