-
Notifications
You must be signed in to change notification settings - Fork 77
Closed
Description
According to previous discussions, we'd like to put job management, autoscaling, job priority scheduler to "controller", move the python implementation to go.
This change can make the website slimmer, also make the user management pluggable. Modules we should have after this changes:
- Static web pages for general information.
- Simple account management.
paddlectlsimply cache k8s keys and storage service keys (like ceph) and talk directly to k8s api-server using "TrainingJob" resource.controllerparse "TrainingJob" resource to paddle job, including master, pserver and trainer.- autoscaler runs in background and scale the current jobs in cluster.
- scheduler determine how much each job can consume using some GPU priority algorithm.
Reactions are currently unavailable