Resources
Resources are commonly used by UDFs and MapReduce jobs on MaxCompute.
PyODPS mainly supports two resource types: file resources and table resources. Both types are listed and deleted in the same way, while creation and modification differ slightly between them. The following sections describe the operations for each type.
Basic operations
You can use list_resources to list all resources and use exist_resource to check whether a resource exists. You can call delete_resource to delete resources or directly call the drop method for a resource object.
For instance, to iterate through all resources in a project, use the code below.
for res in o.list_resources():
    print(res.name)
To iterate through resources whose names start with a given prefix, use the code below.
for res in o.list_resources(prefix="prefix"):
    print(res.name)
To check whether a resource with a given name exists, use the code below.
o.exist_resource("resource_name.tar.gz")
To delete a resource, use the delete_resource method of the ODPS entry object, or the drop method of the Resource object.
# use ODPS.delete_resource method
o.delete_resource("resource_name.tar.gz")
# use Resource.drop method
o.get_resource("resource_name.tar.gz").drop()
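The calls above can be combined into small helpers. The following is a minimal sketch, assuming `o` is an already configured ODPS entry object. The helper names `drop_if_exists` and `drop_by_prefix` are hypothetical and are not part of PyODPS.

```python
def drop_if_exists(o, name):
    """Delete the resource called `name` if it exists; return True if deleted."""
    if o.exist_resource(name):
        o.delete_resource(name)
        return True
    return False


def drop_by_prefix(o, prefix):
    """Drop every resource whose name starts with `prefix`; return the dropped names."""
    # Collect the resources first so that deletion does not interfere with iteration.
    resources = list(o.list_resources(prefix=prefix))
    dropped = []
    for res in resources:
        res.drop()
        dropped.append(res.name)
    return dropped
```

Checking with exist_resource first avoids an error when the resource is already gone, at the cost of one extra request.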
File resources
File resources include the basic file type, as well as the py, jar, and archive types.
Create a file resource
You can create a file resource by specifying the resource name, file type, and a file-like object (or a string object), as shown in the following example:
# File-like objects as file content. Use binary mode to read source file.
resource = o.create_resource('test_file_resource', 'file', fileobj=open('/to/path/file', 'rb'))
# Strings as file content.
resource = o.create_resource('test_py_resource', 'py', fileobj='import this')
You can pass the argument temp=True to create a temporary resource.
resource = o.create_resource('test_file_resource', 'file', fileobj=open('/to/path/file', 'rb'), temp=True)
Note
When fileobj is a string, the content of the created resource is the string itself, not the content of the file the string points to.
If the file to upload exceeds a certain size (for instance, 64 MB), PyODPS might upload the file in parts, which is not supported in old releases of PyODPS. In this case you can set options.upload_resource_in_chunks = False.
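The difference between a file-like object and a string can be made explicit with two small wrappers. This is a sketch only, assuming `o` is an ODPS entry object. `upload_file` and `upload_text` are hypothetical names introduced for illustration.

```python
def upload_file(o, name, path, res_type="file", temp=False):
    """Create resource `name` from the bytes stored in the local file `path`."""
    # Only forward `temp` when it is requested, as in the example above.
    kwargs = {"temp": True} if temp else {}
    # Open in binary mode and close the handle once the upload call returns.
    with open(path, "rb") as src:
        return o.create_resource(name, res_type, fileobj=src, **kwargs)


def upload_text(o, name, text, res_type="py"):
    """Create resource `name` whose content is the string `text` itself."""
    # `text` becomes the content verbatim; it is never interpreted as a path.
    return o.create_resource(name, res_type, fileobj=text)
```

Calling `upload_text(o, 'r', '/to/path/file')` would therefore store the path string as content, whereas `upload_file(o, 'r', '/to/path/file')` stores what the file contains.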
Read and modify a file resource
You can call the open method of a file resource, or call open_resource on the MaxCompute entry object, to open a file resource. The opened object is a file-like object. As with Python's built-in open function, you can specify an open mode. For example:
>>> with resource.open('r') as fp:  # open a resource in read mode
>>>     content = fp.read()  # read all content
>>>     fp.seek(0)  # return to the start of the resource
>>>     lines = fp.readlines()  # read multiple lines
>>>     fp.write('Hello World')  # an error will be raised as resources cannot be written in read mode
>>>
>>> with o.open_resource('test_file_resource', mode='r+') as fp:  # enable read/write mode
>>>     fp.read()
>>>     fp.tell()  # current position
>>>     fp.seek(10)
>>>     fp.truncate()  # truncate the following content
>>>     fp.writelines(['Hello\n', 'World\n'])  # write multiple lines
>>>     fp.write('Hello World')
>>>     fp.flush()  # manual call submits the update to MaxCompute
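As an illustration of read/write mode, the sketch below replaces the whole text of a file resource and returns what was there before. It uses only the calls shown in the example above, and assumes `o` is an ODPS entry object. The helper name `replace_text` is hypothetical.

```python
def replace_text(o, name, new_text):
    """Overwrite the text of file resource `name`; return the previous text."""
    with o.open_resource(name, mode="r+") as fp:
        old_text = fp.read()   # read the current content
        fp.seek(0)             # go back to the start of the resource
        fp.truncate()          # discard everything from the start onwards
        fp.write(new_text)     # write the replacement text
        fp.flush()             # submit the update explicitly
    return old_text
```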
The following open modes are supported:
r: Read mode. The file can be read but cannot be written.
w: Write mode. The file can be written but cannot be read. Note that file content is cleared first if the file is opened in write mode.
a: Append mode. Content can be added to the end of the file.
r+: Read/write mode. You can read and write any content.
w+: Similar to r+, but file content is cleared first.
a+: Similar to r+, but written content can only be added to the end of the file.
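Because these modes mirror those of Python's built-in open, their effect can be tried on a local temporary file without touching MaxCompute. The sketch below is an analogy only: it exercises the built-in open, not PyODPS, and the function name `demo_modes` is hypothetical.

```python
import os
import tempfile


def demo_modes():
    """Show how w, a, r+ and w+ behave on a local file (an analogy only)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    results = {}
    try:
        with open(path, "w") as fp:    # w: content is cleared, write only
            fp.write("Hello\n")
        with open(path, "a") as fp:    # a: writes land at the end of the file
            fp.write("World\n")
        with open(path, "r") as fp:    # r: read only
            results["after_append"] = fp.read()
        with open(path, "r+") as fp:   # r+: read and write, content is kept
            fp.write("J")              # overwrite the first character
            fp.seek(0)
            results["after_r_plus"] = fp.read()
        with open(path, "w+") as fp:   # w+: like r+, but content is cleared first
            results["after_w_plus_open"] = fp.read()
    finally:
        os.remove(path)
    return results
```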
In PyODPS, file resources can also be opened in binary mode, which is required for content such as compressed files. rb opens a file in binary read mode, and r+b opens a file in binary read/write mode.
For large file resources, you can read or write them as streams by adding a stream=True argument to open_resource calls.
>>> with o.open_resource('test_file_resource', mode='w') as fp:  # open resource in write mode
>>>     fp.writelines(['Hello\n', 'World\n'])  # write multiple lines
>>>     fp.write('Hello World')
>>>     fp.flush()  # if called manually, will submit contents into MaxCompute immediately
>>>
>>> with resource.open('r', stream=True) as fp:  # open resource in read mode
>>>     content = fp.read()  # read all contents
>>>     line = fp.readline()  # read one single line
>>>     lines = fp.readlines()  # read multiple lines
When stream=True is specified, only the r, rb, w and wb modes are supported.
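Stream mode is useful for copying a large text resource to local disk without holding it in memory. The following is a minimal sketch, assuming `o` is an ODPS entry object and that readline returns an empty value at the end of the resource, as Python file objects do. The helper name `download_text_resource` is hypothetical.

```python
def download_text_resource(o, name, local_path):
    """Stream file resource `name` into `local_path` line by line; return the line count."""
    count = 0
    with o.open_resource(name, mode="r", stream=True) as src, \
            open(local_path, "w") as dst:
        while True:
            line = src.readline()
            if not line:  # assumption: an empty result marks the end of the resource
                break
            dst.write(line)
            count += 1
    return count
```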
Table resources
Create a table resource
>>> o.create_resource('test_table_resource', 'table', table_name='my_table', partition='pt=test')
Update a table resource
>>> table_resource = o.get_resource('test_table_resource')
>>> table_resource.update(partition='pt=test2', project_name='my_project2')
Obtain associated table and partition
>>> table_resource = o.get_resource('test_table_resource')
>>> table = table_resource.table
>>> print(table.name)
>>> partition = table_resource.partition
>>> print(partition.spec)
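The two attributes can be wrapped in a helper that summarizes what a table resource points at. This is a sketch under stated assumptions: `o` is an ODPS entry object, and a resource that references a whole table is assumed to expose `partition` as None. The helper name `describe_table_resource` is hypothetical.

```python
def describe_table_resource(o, name):
    """Return (table name, partition spec as text) for table resource `name`."""
    table_resource = o.get_resource(name)
    table = table_resource.table
    partition = table_resource.partition
    # Assumption: `partition` is None when the resource covers the whole table.
    spec = str(partition.spec) if partition is not None else None
    return table.name, spec
```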
Read and write table
>>> table_resource = o.get_resource('test_table_resource')
>>> with table_resource.open_writer() as writer:
>>>     writer.write([0, 'aaaa'])
>>>     writer.write([1, 'bbbbb'])
>>> with table_resource.open_reader() as reader:
>>>     for rec in reader:
>>>         print(rec)
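The same reader and writer calls can be wrapped for reuse. The following is a sketch only, assuming `o` is an ODPS entry object and that each row is a list of values matching the table's columns. `write_rows` and `read_rows` are hypothetical helper names.

```python
def write_rows(o, name, rows):
    """Write each row in `rows` to the table behind table resource `name`; return the row count."""
    table_resource = o.get_resource(name)
    count = 0
    with table_resource.open_writer() as writer:
        for row in rows:
            writer.write(list(row))  # one call per record, as in the example above
            count += 1
    return count


def read_rows(o, name):
    """Return all records of the table behind table resource `name` as a list."""
    table_resource = o.get_resource(name)
    with table_resource.open_reader() as reader:
        return [rec for rec in reader]
```

Collecting all records into a list is only reasonable for small tables; for larger ones, iterate over the reader directly.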