Schema
Note
Schema is a beta feature of MaxCompute. You need to apply for a trial of new features before you can use it. A PyODPS version above 0.11.3 is also required.
A schema is a level between a project and objects such as tables, resources, or functions. It groups these objects into categories.
Basic operations
You may use exist_schema to check whether a schema with a given name exists.
print(o.exist_schema("test_schema"))
Use create_schema to create a schema object.
schema = o.create_schema("test_schema")
print(schema)
Use delete_schema to delete a schema object.
o.delete_schema("test_schema")
Use get_schema to obtain a schema object and print its owner.
schema = o.get_schema("test_schema")
print(schema.owner)
Use list_schema to list all schemas in a project and print their names.
for schema in o.list_schema():
    print(schema.name)
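The calls above can be combined into a small create-if-missing helper. This is a minimal sketch: ensure_schema is a hypothetical name, not part of PyODPS, and it assumes that o is an ODPS entrance object providing the exist_schema, create_schema, and get_schema methods shown above.

```python
def ensure_schema(o, name):
    """Return the schema called `name`, creating it first if it is missing.

    `ensure_schema` is a hypothetical helper for illustration; `o` is assumed
    to be an ODPS entrance object with the methods used in this section.
    """
    if o.exist_schema(name):
        # The schema already exists, so just fetch it.
        return o.get_schema(name)
    # Otherwise create it; create_schema returns the new schema object.
    return o.create_schema(name)
```

With such a helper, calling ensure_schema(o, "test_schema") twice should create the schema at most once.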
Handling objects in Schema
After schemas are enabled, calls on your MaxCompute entrance object affect only objects in the schema named DEFAULT by default. To handle objects in other schemas, you need to provide the name of the schema. For instance, you can pass it when creating the entrance object:
import os
from odps import ODPS
# Make sure the environment variable ALIBABA_CLOUD_ACCESS_KEY_ID is set to the user's Access Key ID
# and the environment variable ALIBABA_CLOUD_ACCESS_KEY_SECRET is set to the user's Access Key Secret.
# Hardcoding the Access Key ID or Access Key Secret in your code is not recommended.
o = ODPS(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='**your-project**',
    endpoint='**your-endpoint**',
    schema='**your-schema-name**',
)
You can also specify the name of a schema when handling MaxCompute objects. For instance, the code below lists all tables under the schema test_schema.
for table in o.list_tables(schema='test_schema'):
    print(table)
You can also specify the name of the default schema when executing SQL statements.
o.execute_sql("SELECT * FROM dual", default_schema="test_schema")
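Combining list_schema with the schema argument of list_tables, the tables of every schema in a project could be collected as sketched below. The helper name tables_by_schema is hypothetical, and o is assumed to be an ODPS entrance object for a project with schemas enabled.

```python
def tables_by_schema(o):
    """Map each schema name to the names of the tables it contains.

    Hypothetical helper for illustration; `o` is assumed to be an ODPS
    entrance object for a schema-enabled project.
    """
    result = {}
    for schema in o.list_schema():
        # Pass the schema name explicitly so the call does not fall back
        # to the default schema.
        tables = o.list_tables(schema=schema.name)
        result[schema.name] = [table.name for table in tables]
    return result
```

Passing the schema name on each call keeps the result independent of whichever schema the entrance object was created with.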
For tables, if schema is not enabled in the project, get_table treats x.y as project.table. When odps.namespace.schema is enabled for the current tenant, get_table treats x.y as schema.table; otherwise it is still treated as project.table. If the option is not specified, you may set options.always_enable_schema = True in your Python code, after which all table names like x.y are treated as schema.table.
from odps import options
options.always_enable_schema = True
print(o.get_table("myschema.mytable"))
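The resolution rule for names like x.y can be summarized in a few lines of plain Python. This only illustrates the rule as stated in this section and is not PyODPS's actual implementation; resolve_two_part_name and its schema_enabled parameter are hypothetical.

```python
def resolve_two_part_name(name, schema_enabled):
    """Interpret a two-part table name of the form x.y.

    Hypothetical illustration of the rule described in the text, not
    PyODPS code. `schema_enabled` stands for "odps.namespace.schema is
    enabled for the tenant, or options.always_enable_schema is True".
    """
    parts = name.split(".")
    if len(parts) != 2:
        raise ValueError("this sketch only covers names of the form x.y")
    first, table = parts
    if schema_enabled:
        # x.y is read as schema.table
        return {"schema": first, "table": table}
    # x.y is read as project.table
    return {"project": first, "table": table}


print(resolve_two_part_name("myschema.mytable", schema_enabled=True))
print(resolve_two_part_name("myproject.mytable", schema_enabled=False))
```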