data-job

DataJob CRD, GVK: data.demo.orkestra.io/v1alpha1, Kind=DataJob

healthy data-platform 0.1.0 uptime 52h44m37s

Overview

data-job is a custom resource managed by the data-platform operator running on Orkestra. Resources of this type are namespace-scoped.

Field         Value
API Version   data.demo.orkestra.io/v1alpha1
Kind          DataJob
GVR (plural)  data.demo.orkestra.io/v1alpha1, Resource=datajobs
Scope         Namespaced

Reconcile Mode

This CRD runs in dynamic mode. Orkestra works directly with the raw, unstructured Kubernetes object, so no Go code is required. Reconcile logic is expressed declaratively in the Katalog YAML using template expressions such as {{ .spec.field }}.
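As a rough illustration of how an expression such as {{ .spec.field }} can resolve against a raw, unstructured object, the sketch below walks a dotted path through a nested dictionary. It is not Orkestra's template engine: the lookup and render helpers, the sample instance, and its spec.field value are hypothetical, and only simple dotted paths are handled.

```python
import re

# Matches expressions of the form {{ .a.b.c }} (simple dotted paths only).
_EXPR = re.compile(r"\{\{\s*((?:\.[A-Za-z_][A-Za-z0-9_]*)+)\s*\}\}")


def lookup(obj, path):
    """Walk a dotted path such as '.spec.field' through nested dicts."""
    current = obj
    for key in path.lstrip(".").split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"path {path!r} not found at {key!r}")
        current = current[key]
    return current


def render(template, obj):
    """Replace every {{ .path }} expression with the value found in obj."""
    return _EXPR.sub(lambda m: str(lookup(obj, m.group(1))), template)


# Hypothetical DataJob instance, shaped like the raw object a reconciler sees.
data_job = {
    "apiVersion": "data.demo.orkestra.io/v1alpha1",
    "kind": "DataJob",
    "metadata": {"name": "nightly-export", "namespace": "analytics"},
    "spec": {"field": "s3://bucket/in"},
}

rendered = render("{{ .metadata.name }}-config reads {{ .spec.field }}", data_job)
print(rendered)  # nightly-export-config reads s3://bucket/in
```

Because the object is handled as plain nested data, a missing path surfaces as a lookup error at render time rather than as a compile-time type error.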

The Generic Reconciler manages the full CR lifecycle: it ensures managed labels are set on every reconcile, adds and removes finalizers, runs onCreate and onReconcile template blocks, emits events, increments metrics, and reports health status.
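The label and finalizer handling described above can be sketched on a raw object as follows. This is a simplified model under stated assumptions, not the Generic Reconciler's code: the finalizer name is hypothetical, the label key is the one used in the kubectl filter on this page, and template blocks, events, metrics, and health reporting are left out.

```python
MANAGED_LABEL = "orkestra.orkspace.io/managed"  # label key from the kubectl filter on this page
FINALIZER = "example.com/cleanup"               # hypothetical finalizer name


def reconcile(obj, log):
    """Run one simplified reconcile pass over a raw CR dict, recording steps in log."""
    meta = obj.setdefault("metadata", {})
    finalizers = meta.setdefault("finalizers", [])

    # Deletion path: Kubernetes sets deletionTimestamp on delete; release the finalizer.
    if meta.get("deletionTimestamp"):
        if FINALIZER in finalizers:
            finalizers.remove(FINALIZER)
            log.append("finalizer removed")
        return obj

    # Normal path: ensure the managed label, then ensure the finalizer.
    labels = meta.setdefault("labels", {})
    if labels.get(MANAGED_LABEL) != "true":
        labels[MANAGED_LABEL] = "true"
        log.append("managed label set")
    if FINALIZER not in finalizers:
        finalizers.append(FINALIZER)
        log.append("finalizer added")
    return obj


job = {"metadata": {"name": "nightly-export"}}
steps = []
reconcile(job, steps)
reconcile(job, steps)  # second pass finds nothing to change
job["metadata"]["deletionTimestamp"] = "2024-01-01T00:00:00Z"
reconcile(job, steps)
print(steps)  # ['managed label set', 'finalizer added', 'finalizer removed']
```

Each step checks the current state before acting, so repeating a pass on an unchanged object is a no-op.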


Configuration

The operator maintains 3 worker goroutines to process reconcile events concurrently. Each worker dequeues one CR key at a time, reconciles it, and returns. The queue has a maximum depth of 100 events.
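The worker model above resembles a bounded queue drained by a fixed pool. The sketch below uses Python threads in place of goroutines. The reconcile placeholder and the namespace/name keys are hypothetical, and the behaviour when the queue is full (here, put blocks) is an assumption, since this page does not say what Orkestra does at the depth limit.

```python
import queue
import threading

WORKERS = 3      # worker count stated above
MAX_DEPTH = 100  # maximum queue depth stated above

work_queue = queue.Queue(maxsize=MAX_DEPTH)
reconciled = []
reconciled_lock = threading.Lock()


def reconcile(key):
    """Placeholder for the real reconcile; records the key it handled."""
    with reconciled_lock:
        reconciled.append(key)


def worker():
    """Dequeue one CR key at a time, reconcile it, then take the next."""
    while True:
        key = work_queue.get()
        try:
            if key is None:  # sentinel: stop this worker
                return
            reconcile(key)
        finally:
            work_queue.task_done()


threads = [threading.Thread(target=worker, daemon=True) for _ in range(WORKERS)]
for t in threads:
    t.start()

# Hypothetical namespace/name keys.
for i in range(10):
    work_queue.put(f"analytics/job-{i}")

work_queue.join()         # wait until every key has been reconciled
for _ in threads:
    work_queue.put(None)  # one sentinel per worker
for t in threads:
    t.join()

print(sorted(reconciled))
```

With three workers, at most three keys are being reconciled at any moment; the rest wait in the queue.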

Orkestra resyncs all managed resources every 15s by re-enqueueing every CR key. This ensures drift caused by external changes is corrected even without a Kubernetes watch event.
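A periodic resync of this kind can be sketched as a loop that re-enqueues every known key at a fixed interval. The function names are hypothetical, and collecting keys that do not fit into a full queue is an assumption made for the sketch; this page does not describe how Orkestra handles overflow.

```python
import queue
import threading

RESYNC_INTERVAL_SECONDS = 15  # resync period stated above


def resync_once(keys, work_queue):
    """Re-enqueue every known CR key; return the keys that did not fit."""
    overflow = []
    for key in keys:
        try:
            work_queue.put_nowait(key)
        except queue.Full:
            overflow.append(key)
    return overflow


def resync_loop(list_keys, work_queue, stop, interval=RESYNC_INTERVAL_SECONDS):
    """Call resync_once every `interval` seconds until the stop event is set."""
    while not stop.wait(interval):
        resync_once(list_keys(), work_queue)


# One resync pass over three hypothetical keys and a queue of depth 100.
demo_queue = queue.Queue(maxsize=100)
leftover = resync_once(["analytics/a", "analytics/b", "analytics/c"], demo_queue)
print(demo_queue.qsize(), leftover)  # 3 []
```

To run the loop in the background, start it with threading.Thread(target=resync_loop, args=(list_keys, demo_queue, stop)) and call stop.set() to end it.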


Child Resources

When the operator reconciles a data-job instance it creates and manages the following Kubernetes resources on its behalf. These are owned by the CR via owner references and are deleted automatically when the CR is deleted (unless deletion protection is active).
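Ownership is recorded in the child's metadata.ownerReferences, which is what lets Kubernetes garbage-collect the child when the owner is deleted. The sketch below builds a minimal child manifest carrying such a reference. The owned_child helper, the sample names, and the UID are hypothetical, and setting controller and blockOwnerDeletion to true is an assumption about how Orkestra fills in the reference.

```python
def owned_child(owner, kind, name, api_version="v1"):
    """Build a minimal child manifest carrying an owner reference to `owner`."""
    return {
        "apiVersion": api_version,
        "kind": kind,
        "metadata": {
            "name": name,
            # A namespaced owner can only own objects in its own namespace.
            "namespace": owner["metadata"]["namespace"],
            "ownerReferences": [{
                "apiVersion": owner["apiVersion"],
                "kind": owner["kind"],
                "name": owner["metadata"]["name"],
                "uid": owner["metadata"]["uid"],
                "controller": True,          # assumption
                "blockOwnerDeletion": True,  # assumption
            }],
        },
    }


# Hypothetical DataJob instance; the uid is normally assigned by the API server.
data_job = {
    "apiVersion": "data.demo.orkestra.io/v1alpha1",
    "kind": "DataJob",
    "metadata": {
        "name": "nightly-export",
        "namespace": "analytics",
        "uid": "00000000-0000-0000-0000-000000000000",
    },
}

config_map = owned_child(data_job, "ConfigMap", "nightly-export-config")
print(config_map["metadata"]["ownerReferences"][0]["kind"])  # DataJob
```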

Resources listed under onCreate are created on the first reconcile. Resources listed under onReconcile are re-applied on every reconcile cycle. A resource appearing in both phases is created once and kept in sync thereafter.

Kind        Count  Lifecycle phases
ConfigMap   1      onCreate
ReplicaSet  1      onReconcile
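The phase rules can be read as a small selection function: onReconcile resources apply on every cycle, and onCreate resources apply only on the first. The sketch below encodes the two rows listed for this CRD. The kinds_to_apply helper is hypothetical, and treating the first reconcile as running both phases is this sketch's reading of the description above.

```python
# Lifecycle phases per child kind, as listed for this CRD.
PHASES = {
    "ConfigMap": {"onCreate"},
    "ReplicaSet": {"onReconcile"},
}


def kinds_to_apply(first_reconcile):
    """Return the child kinds applied in this reconcile cycle."""
    active = {"onReconcile"}
    if first_reconcile:
        active.add("onCreate")
    return sorted(kind for kind, phases in PHASES.items() if phases & active)


print(kinds_to_apply(first_reconcile=True))   # ['ConfigMap', 'ReplicaSet']
print(kinds_to_apply(first_reconcile=False))  # ['ReplicaSet']
```

Under this reading, the ReplicaSet is kept in sync on every cycle, while the ConfigMap is written once and not re-applied afterwards.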

To see the actual child resources created for a running instance, navigate to the instance's detail page from the data-platform control panel.


kubectl Reference

Use the commands below to interact with data-job resources from the command line.

List resources

kubectl get datajob -n <namespace>

Describe a resource

kubectl describe datajob <name> -n <namespace>

Get YAML

kubectl get datajob <name> -n <namespace> -o yaml

Watch for changes

kubectl get datajob -n <namespace> -w

Delete a resource

kubectl delete datajob <name> -n <namespace>

Filter by Orkestra managed label

kubectl get datajob -l orkestra.orkspace.io/managed=true -n <namespace>

Access Control

The operator holds the following RBAC permissions to manage data-job resources. The rules cover three resources: configmaps, datajobs, and datajobs/status.

API Groups             Resources        Verbs
data.demo.orkestra.io  datajobs         get list watch create update patch delete
data.demo.orkestra.io  datajobs/status  get update patch
core                   configmaps       get list watch create update patch delete
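To check quickly whether a given action falls within these rules, the table can be transcribed into a lookup. This is only a sketch of the listed rules, not how Kubernetes evaluates RBAC; the allowed helper is hypothetical.

```python
ALL_VERBS = {"get", "list", "watch", "create", "update", "patch", "delete"}

# (API group, resource) -> verbs, transcribed from the table above.
# "core" is the table's label; RBAC manifests write the core group as "".
RULES = {
    ("data.demo.orkestra.io", "datajobs"): ALL_VERBS,
    ("data.demo.orkestra.io", "datajobs/status"): {"get", "update", "patch"},
    ("core", "configmaps"): ALL_VERBS,
}


def allowed(group, resource, verb):
    """Return True if the listed rules grant `verb` on `resource` in `group`."""
    return verb in RULES.get((group, resource), set())


print(allowed("data.demo.orkestra.io", "datajobs/status", "patch"))   # True
print(allowed("data.demo.orkestra.io", "datajobs/status", "delete"))  # False
```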