Resources

Resources are most commonly used by UDFs and MapReduce jobs on MaxCompute.

PyODPS mainly supports two resource types: file resources and table resources. Both types are iterated over and deleted in the same way, but they differ slightly in how they are created and modified. The following sections describe the operations for each type.

Basic operations

You can use list_resources to list all resources and exist_resource to check whether a resource exists. To delete a resource, call delete_resource or call the drop method of a resource object.

For instance, to iterate through all resources in a project, use the code below.

for res in o.list_resources():
    print(res.name)

To iterate through resources whose names start with a given prefix, use the code below.

for res in o.list_resources(prefix="prefix"):
    print(res.name)

To check whether a resource with a given name exists, use the code below.

o.exist_resource("resource_name.tar.gz")

To delete a resource, use the delete_resource method of the ODPS entry object or the drop method of the Resource object.

# use ODPS.delete_resource method
o.delete_resource("resource_name.tar.gz")
# use Resource.drop method
o.get_resource("resource_name.tar.gz").drop()
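
The listing, existence check, and deletion calls above can be combined into small helpers. The following is a minimal sketch rather than part of PyODPS: drop_resource_if_exists and drop_resources_with_prefix are hypothetical names, and o is assumed to be an already-initialized ODPS entry object.

```python
def drop_resource_if_exists(o, name):
    """Delete the named resource if present; return True when one was deleted."""
    if o.exist_resource(name):
        o.delete_resource(name)
        return True
    return False


def drop_resources_with_prefix(o, prefix):
    """Drop every resource whose name starts with prefix; return the dropped names."""
    # Materialize the listing first so that dropping does not interfere with iteration.
    resources = list(o.list_resources(prefix=prefix))
    dropped = []
    for res in resources:
        res.drop()
        dropped.append(res.name)
    return dropped
```

Checking exist_resource first keeps the deletion from failing on a name that is already gone, which is convenient in cleanup scripts that may run more than once.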

File resources

File resources include the basic file type as well as the py, jar, and archive types.

Create a file resource

You can create a file resource by specifying the resource name, file type, and a file-like object (or a string object), as shown in the following example:

# File-like objects as file content. Use binary mode to read the source file.
resource = o.create_resource('test_file_resource', 'file', fileobj=open('/to/path/file', 'rb'))
# Strings as file content.
resource = o.create_resource('test_py_resource', 'py', fileobj='import this')

You can pass the argument temp=True to create a temporary resource.

resource = o.create_resource('test_file_resource', 'file', fileobj=open('/to/path/file', 'rb'), temp=True)

Note

When fileobj is a string, the content of the created resource is the string itself, not the content of the file the string points to.

If the file to upload exceeds a certain size (for instance, 64 MB), PyODPS might upload it in parts, which is not supported in old releases of PyODPS. In that case, you can set options.upload_resource_in_chunks = False.
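
Because a string passed as fileobj is uploaded verbatim, a local file has to be opened before it is handed to create_resource. A minimal sketch of that pattern follows; create_resource_from_path is a hypothetical helper name, not a PyODPS function, and o is assumed to be an initialized ODPS entry object.

```python
def create_resource_from_path(o, name, path, res_type='file', **kwargs):
    """Upload the content of the local file at path as a file resource."""
    # Open in binary mode so the bytes are uploaded unchanged, and close the
    # local file as soon as create_resource returns.
    with open(path, 'rb') as fileobj:
        return o.create_resource(name, res_type, fileobj=fileobj, **kwargs)
```

Extra keyword arguments such as temp=True are passed through unchanged. Passing the path itself as fileobj would instead create a resource whose content is the path string.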

Read and modify a file resource

You can call the open method of a file resource, or call open_resource on the MaxCompute entry object, to open a file resource. The opened object is a file-like object. As with Python's built-in open function, you can specify an open mode. For example:

>>> with resource.open('r') as fp:  # open a resource in read mode
>>>     content = fp.read()  # read all content
>>>     fp.seek(0)  # return to the start of the resource
>>>     lines = fp.readlines()  # read multiple lines
>>>     fp.write('Hello World')  # an error will be raised as resources cannot be written in read mode
>>>
>>> with o.open_resource('test_file_resource', mode='r+') as fp:  # enable read/write mode
>>>     fp.read()
>>>     fp.tell()  # current position
>>>     fp.seek(10)
>>>     fp.truncate()  # discard everything after the current position
>>>     fp.writelines(['Hello\n', 'World\n'])  # write multiple lines
>>>     fp.write('Hello World')
>>>     fp.flush()  # calling flush manually submits the update to MaxCompute
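
The read, seek, truncate, and write calls shown above are enough to edit a text resource in place. The sketch below is one hedged way to combine them: replace_in_resource is a hypothetical helper name, o is assumed to be an initialized ODPS entry object, and the code relies on the update being submitted when the with block closes the resource.

```python
def replace_in_resource(o, name, old, new):
    """Replace every occurrence of old with new in a text file resource.

    Returns the number of occurrences that were replaced.
    """
    with o.open_resource(name, mode='r+') as fp:
        content = fp.read()
        count = content.count(old)
        if count:
            fp.seek(0)     # return to the start of the resource
            fp.truncate()  # discard the old content from this position onward
            fp.write(content.replace(old, new))
    return count
```

The resource is left untouched when old does not occur, so no unnecessary update is written.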

The following open modes are supported:

  • r: Read mode. The file can be read but not written.

  • w: Write mode. The file can be written but not read. Note that opening a file in write mode clears its content first.

  • a: Append mode. Content can be added to the end of the file.

  • r+: Read/write mode. You can read and write any content.

  • w+: Similar to r+, but file content is cleared first.

  • a+: Similar to r+, but writes can only add content to the end of the file.
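
As a local analogy only, the sketch below exercises the w, a, r+, and w+ modes with Python's built-in open on a temporary file, not on a MaxCompute resource. It assumes, as stated above, that resource open modes behave like their built-in counterparts; demo_open_modes is a hypothetical name.

```python
import os
import tempfile


def demo_open_modes():
    """Show how w, a, r+ and w+ affect existing content of a local file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'w') as fp:   # w: start from an empty file
            fp.write('Hello\n')
        with open(path, 'a') as fp:   # a: writes go to the end
            fp.write('World\n')
        with open(path, 'r') as fp:
            after_append = fp.read()

        with open(path, 'r+') as fp:  # r+: existing content is kept
            fp.write('J')             # overwrites the first character only
        with open(path, 'r') as fp:
            after_rplus = fp.read()

        with open(path, 'w+') as fp:  # w+: content is cleared on open
            fp.write('New')
            fp.seek(0)
            after_wplus = fp.read()
        return after_append, after_rplus, after_wplus
    finally:
        os.remove(path)
```

The three returned strings show appended content, an in-place overwrite that preserves the rest of the file, and a file that was cleared before writing.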

In PyODPS, file resources can also be opened in binary mode, which some files, such as compressed files, require. rb opens a file in binary read mode, and r+b opens it in binary read/write mode.

For large file resources, you can read or write them as streams by passing stream=True when opening the resource.
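
Binary read mode is what you would use to copy a resource such as an archive to local disk unchanged. A minimal sketch, assuming o is an initialized ODPS entry object; download_resource is a hypothetical helper name, not a PyODPS function.

```python
def download_resource(o, name, local_path):
    """Copy a file resource to local_path byte for byte; return the byte count."""
    # rb returns bytes, so no text decoding is applied to the content.
    with o.open_resource(name, mode='rb') as src:
        data = src.read()
    with open(local_path, 'wb') as dst:
        dst.write(data)
    return len(data)
```

This reads the whole resource into memory, so it suits small and medium-sized resources.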

>>> with o.open_resource('test_file_resource', mode='w', stream=True) as fp:  # open resource in write mode as a stream
>>>     fp.writelines(['Hello\n', 'World\n'])  # write multiple lines
>>>     fp.write('Hello World')
>>>     fp.flush()  # calling flush manually submits the content to MaxCompute immediately
>>>
>>> with resource.open('r', stream=True) as fp:  # open resource in read mode
>>>     content = fp.read()  # read all contents
>>>     line = fp.readline()  # read one single line
>>>     lines = fp.readlines()  # read multiple lines

When stream=True is specified, only the r, rb, w, and wb modes are supported.
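
Streaming is most useful when a resource is processed piece by piece instead of being loaded whole. The sketch below counts lines with readline in stream mode; count_resource_lines is a hypothetical helper name, o is assumed to be an initialized ODPS entry object, and the loop assumes readline returns an empty value at the end of the data, as Python file objects do.

```python
def count_resource_lines(o, name):
    """Count the lines of a text file resource without loading it all at once."""
    total = 0
    with o.open_resource(name, mode='r', stream=True) as fp:
        while True:
            line = fp.readline()
            if not line:  # assumed end-of-data signal
                break
            total += 1
    return total
```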

Table resources

Create a table resource

>>> o.create_resource('test_table_resource', 'table', table_name='my_table', partition='pt=test')

Update a table resource

>>> table_resource = o.get_resource('test_table_resource')
>>> table_resource.update(partition='pt=test2', project_name='my_project2')

Obtain associated table and partition

>>> table_resource = o.get_resource('test_table_resource')
>>> table = table_resource.table
>>> print(table.name)
>>> partition = table_resource.partition
>>> print(partition.spec)

Read and write table

>>> table_resource = o.get_resource('test_table_resource')
>>> with table_resource.open_writer() as writer:
>>>     writer.write([0, 'aaaa'])
>>>     writer.write([1, 'bbbbb'])
>>> with table_resource.open_reader() as reader:
>>>     for rec in reader:
>>>         print(rec)
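
The writer and reader calls above can be wrapped for reuse. This is a minimal sketch using only open_writer, write, open_reader, and iteration as shown in the example; write_rows and read_rows are hypothetical helper names, and table_resource is assumed to be an object returned by o.get_resource for a table resource.

```python
def write_rows(table_resource, rows):
    """Write each row (a list of column values) to the associated table."""
    with table_resource.open_writer() as writer:
        for row in rows:
            writer.write(row)


def read_rows(table_resource, limit=None):
    """Return records from the associated table, at most limit if given."""
    rows = []
    with table_resource.open_reader() as reader:
        for rec in reader:
            if limit is not None and len(rows) >= limit:
                break
            rows.append(rec)
    return rows
```

The limit argument avoids pulling an entire large table into memory when only a preview is needed.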