Commit 69a9538

Consistently Capitalize Fair Sharing - Docs (#4649)
1 parent c7decaf commit 69a9538

11 files changed

+39
-39
lines changed

CHANGELOG/CHANGELOG-0.7.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -72,7 +72,7 @@ Changes since `v0.6.0`:
7272
- Add configuration to register Kinds as being managed by an external Kueue-compatible controller (#2059, @dgrove-oss)
7373
- Add fair sharing when borrowing unused resources from other ClusterQueues in a cohort.
7474

75-
Fair sharing is based on DRF for usage above nominal quotas.
75+
Fair Sharing is based on DRF for usage above nominal quotas.
7676
When fair sharing is enabled, Kueue prefers to admit workloads from ClusterQueues with the lowest share first.
7777
Administrators can enable and configure fair sharing preemption using a combination of two policies: `LessThanOrEqualtoFinalShare`, `LessThanInitialShare`.
7878

README.md

+1-1
@@ -18,7 +18,7 @@ Read the [overview](https://kueue.sigs.k8s.io/docs/overview/) and watch the Kueu
1818
## Features overview
1919

2020
- **Job management:** Support job queueing based on [priorities](https://kueue.sigs.k8s.io/docs/concepts/workload/#priority) with different [strategies](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#queueing-strategy): `StrictFIFO` and `BestEffortFIFO`.
21-
- **Advanced Resource management:** Comprising: [resource flavor fungibility](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#flavorfungibility), [fair sharing](https://kueue.sigs.k8s.io/docs/concepts/preemption/#fair-sharing), [cohorts](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#cohort) and [preemption](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#preemption) with a variety of policies between different tenants.
21+
- **Advanced Resource management:** Comprising: [resource flavor fungibility](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#flavorfungibility), [Fair Sharing](https://kueue.sigs.k8s.io/docs/concepts/preemption/#fair-sharing), [cohorts](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#cohort) and [preemption](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#preemption) with a variety of policies between different tenants.
2222
- **Integrations:** Built-in support for popular jobs, e.g. [BatchJob](https://kueue.sigs.k8s.io/docs/tasks/run/jobs/), [Kubeflow training jobs](https://kueue.sigs.k8s.io/docs/tasks/run/kubeflow/), [RayJob](https://kueue.sigs.k8s.io/docs/tasks/run/rayjobs/), [RayCluster](https://kueue.sigs.k8s.io/docs/tasks/run/rayclusters/), [JobSet](https://kueue.sigs.k8s.io/docs/tasks/run/jobsets/), [plain Pod and Pod Groups](https://kueue.sigs.k8s.io/docs/tasks/run/plain_pods/).
2323
- **System insight:** Build-in [prometheus metrics](https://kueue.sigs.k8s.io/docs/reference/metrics/) to help monitor the state of the system, and on-demand visibility endpoint for [monitoring of pending workloads](https://kueue.sigs.k8s.io/docs/tasks/manage/monitor_pending_workloads/pending_workloads_on_demand/).
2424
- **AdmissionChecks:** A mechanism for internal or external components to influence whether a workload can be [admitted](https://kueue.sigs.k8s.io/docs/concepts/admission_check/).

keps/1714-fair-sharing/README.md

+21-21
@@ -27,7 +27,7 @@
2727
<!-- /toc -->
2828

2929
## Summary
30-
This KEP introduces weight-based fair sharing of unused resources across
30+
This KEP introduces weight-based Fair Sharing of unused resources across
3131
needing ClusterQueues, respecting borrowing and lending limits, multi-level
3232
hierarchy and preferences of cohorts to distribute unused resources
3333
internally.
@@ -44,7 +44,7 @@ use the highest one, breaking the whole idea altogether). Thus a more comprehens
4444
solution of distributing unused resources is needed.
4545

4646
### Goals
47-
* Create a mechanism to enforce fair sharing of resources. Two equally important
47+
* Create a mechanism to enforce Fair Sharing of resources. Two equally important
4848
sub-organizations (Cohorts or ClusterQueues), placed in the similar spots in the
4949
whole organization (hierarchy of Cohorts), actively competing for the same
5050
resources (having workloads needing more than nominal quota), should
@@ -59,10 +59,10 @@ under the same parent.
5959
sub-organization and only after they have been fulfilled, proceed with distribution
6060
outside of the suborganization.
6161

62-
* When enforcing fair sharing, ignore workload priorities unless:
62+
* When enforcing Fair Sharing, ignore workload priorities unless:
6363

6464
* The workload's priority is above admin-defined high priority. Super high
65-
priority workloads overrule fair sharing and are treated according to KEP [#1337](https://github.com/kubernetes-sigs/kueue/tree/main/keps/1337-preempt-within-cohort-while-borrowing).
65+
priority workloads overrule Fair Sharing and are treated according to KEP [#1337](https://github.com/kubernetes-sigs/kueue/tree/main/keps/1337-preempt-within-cohort-while-borrowing).
6666

6767
* There is a need to preempt some non top priority workload from a ClusterQueue.
6868
Then the lowest priority workloads from a CQ that is over its fair share should
@@ -76,30 +76,30 @@ in particular:
7676
* Guaranteed/nominal quota
7777
* Hierarchical cohorts
7878

79-
* Fair sharing should not limit Kueue scalability. Kueue, with fair sharing enabled,
79+
* Fair Sharing should not limit Kueue scalability. Kueue, with Fair Sharing enabled,
8080
should be able to handle >1k ClusterQueues, >100 Cohorts and >10k workloads (that
8181
are either running or queued) within a single hierarchical organization.
8282

8383
* The proposed system should be hard to game, for example by creating big workloads
8484
that consume all capacity.
8585

86-
* Fair sharing enforcement should not significantly decrease overall utilization,
86+
* Fair Sharing enforcement should not significantly decrease overall utilization,
8787
however, pathological situations (like a single workload consuming all otherwise
88-
unused capacity) should be resolved in favor of fair sharing than maximizing the
88+
unused capacity) should be resolved in favor of Fair Sharing than maximizing the
8989
utilization (the big greedy workload should be preempted to admit smaller
9090
workloads from other CQ that consume only their fair share).
9191

9292
### Non-Goals
9393
* Use historical data (for example CQ A used a lot of shared capacity for the last
9494
week, so now it should get less because others, who didn't need anything then, have
95-
pending workloads). Fair sharing should be based on point-in-time situation, although,
96-
ideally it should be expandable to support history-based fair sharing(for example with #26)
95+
pending workloads). Fair Sharing should be based on point-in-time situation, although,
96+
ideally it should be expandable to support history-based Fair Sharing(for example with #26)
9797
without major redesign.
9898

99-
* Enable fair sharing only for some part of the resources or Cohort hierarchy.
100-
Fair sharing will be a global switch (at least initially).
99+
* Enable Fair Sharing only for some part of the resources or Cohort hierarchy.
100+
Fair Sharing will be a global switch (at least initially).
101101

102-
* Maximize utilization at the cost of fair sharing.
102+
* Maximize utilization at the cost of Fair Sharing.
103103

104104
## Proposal
105105

@@ -108,7 +108,7 @@ on top of the given nominal quota, that doesn't justify "complains" against
108108
any other similar CQ about excessive extra resources that CQ was given. Basically
109109
the sharing of unused resources is fair, if no CQ can say it is grossly unfair.
110110

111-
Introduce a global fair sharing mechanism that is based on preemptions. As long
111+
Introduce a global Fair Sharing mechanism that is based on preemptions. As long
112112
as there are some free and accessible resources in the cohort hierarchy, Kueue
113113
will admit workloads without any limits. However, once the capacity is gone,
114114
new workloads from Cohorts/CQ that have not received their fair share yet will
@@ -126,7 +126,7 @@ We will add an optional weight field to both Cohorts and CQ. The weight will
126126
indicate how to fair share resources between sub-organizations (CQs or Cohorts)
127127
under the same Cohort.
128128

129-
Fair sharing will be configured for the whole cluster, using the configuration
129+
Fair Sharing will be configured for the whole cluster, using the configuration
130130
file. In Alpha it will be just a feature gate.
131131

132132
### User Stories (Optional)
@@ -176,13 +176,13 @@ to execute. I want to distribute the resources from CS in the following way:
176176

177177
### Risks and Mitigations
178178

179-
* Fair sharing may increase the number of preemptions in the system vs current
179+
* Fair Sharing may increase the number of preemptions in the system vs current
180180
state where the first workload to acquire unused resources keeps them until it
181181
finishes. Mitigations include:
182-
* Introduce minimum execution time before workloads can be preempted for fair sharing.
182+
* Introduce minimum execution time before workloads can be preempted for Fair Sharing.
183183
* Introduce delayed fair share enforcement - new workloads have to wait a bit before preempting others to get their share.
184184

185-
* Fair sharing may decrease utilization of the unused resources while attempting to
185+
* Fair Sharing may decrease utilization of the unused resources while attempting to
186186
distribute them fairly, vs provide the tighties bin-packing. To avoid these scenarios,
187187
users should prefer to run massive workloads under nominal quotas.
188188

@@ -255,7 +255,7 @@ When a CQ x fails to admit a workload w, one of the following scenarios may occu
255255

256256
* [S2] A sub-organization to which CQ x belongs is borrowing, however it seems that it is
257257
borrowing too little compared to other sub-organizations that are also borrowing, so some
258-
action is needed to enforce fair sharing. We should compare how much the sub-orgs are
258+
action is needed to enforce Fair Sharing. We should compare how much the sub-orgs are
259259
borrowing compared to each other, and preempt some workloads up to the point when its fair
260260
share would be smaller than the sub-org for which preemptions are executed.
261261

@@ -291,7 +291,7 @@ For each workload z in y we check whether if:
291291

292292
[S2-a] value of AlmostLCA(y,x) **without z** is still higher (or equal) than value of AlmostLCA(x,y)
293293
with admitted workload w. Y’s sub-orgs will still be better than X’s sub-org after we
294-
preempt z and admit w, thus z is a reasonable candidate to re-balance fair sharing.
294+
preempt z and admit w, thus z is a reasonable candidate to re-balance Fair Sharing.
295295

296296

297297
[S2-b] value of AlmostLCA(y,x) (with z) is strictly higher than AlmostLCA(x,y) with admitted workload w.
@@ -355,8 +355,8 @@ This is a very complex feature so there will be lots of unit, integration and e2
355355
#### Alpha
356356

357357
The alpha will be split among two releases.
358-
Release v0.7 will only implement fair sharing in the existing flat structure (cohorts don’t have parents).
359-
Release v0.8 will incorporate fair sharing with arbitrary hierarchies (KEP #79)
358+
Release v0.7 will only implement Fair Sharing in the existing flat structure (cohorts don’t have parents).
359+
Release v0.8 will incorporate Fair Sharing with arbitrary hierarchies (KEP #79)
360360

361361
The following metrics will be added:
362362
* ClusterQueue fairness value

keps/1714-fair-sharing/kep.yaml

+1-1
@@ -1,4 +1,4 @@
1-
title: Fair sharing
1+
title: Fair Sharing
22
kep-number: 1714
33
authors:
44
- "@mwielgus"

keps/79-hierarchical-cohorts/README.md

+1-1
@@ -43,7 +43,7 @@ team and quota/budget structures.
4343

4444
* Change the existing API and mechanics in a not backward-compatible way.
4545
* Introduce an alternative API to ClusterQueue.
46-
* Introduce new ways of fair sharing, like ratio-based sharing (at least
46+
* Introduce new ways of Fair Sharing, like ratio-based sharing (at least
4747
not in this KEP).
4848
* Introduce additional preemption models (this will be in a separate KEP).
4949

site/content/en/docs/concepts/_index.md

+2-2
@@ -22,7 +22,7 @@ resources such as availability, pricing, architecture, models, etc.
2222
### [Cluster Queue](/docs/concepts/cluster_queue)
2323

2424
A cluster-scoped resource that governs a pool of resources, defining usage
25-
limits and fair sharing rules.
25+
limits and Fair Sharing rules.
2626

2727
### [Local Queue](/docs/concepts/local_queue)
2828

@@ -75,7 +75,7 @@ A _cohort_ is a group of ClusterQueues that can borrow unused quota from each ot
7575

7676
_Queueing_ is the state of a Workload since the time it is created until Kueue admits it on a ClusterQueue.
7777
Typically, the Workload will compete with other Workloads for available
78-
quota based on the fair sharing rules of the ClusterQueue.
78+
quota based on the Fair Sharing rules of the ClusterQueue.
7979

8080
### [Preemption](/docs/concepts/preemption)
8181

site/content/en/docs/concepts/cluster_queue.md

+2-2
@@ -3,15 +3,15 @@ title: "Cluster Queue"
33
date: 2023-03-14
44
weight: 3
55
description: >
6-
A cluster-scoped resource that governs a pool of resources, defining usage limits and fair sharing rules.
6+
A cluster-scoped resource that governs a pool of resources, defining usage limits and Fair Sharing rules.
77
---
88

99
A ClusterQueue is a cluster-scoped object that governs a pool of resources
1010
such as pods, CPU, memory, and hardware accelerators. A ClusterQueue defines:
1111

1212
- The quotas for the [resource _flavors_](/docs/concepts/resource_flavor) that the ClusterQueue manages,
1313
with usage limits and order of consumption.
14-
- Fair sharing rules across the multiple ClusterQueues in the cluster.
14+
- Fair Sharing rules across the multiple ClusterQueues in the cluster.
1515

1616
Only [batch administrators](/docs/tasks#batch-administrator) should create `ClusterQueue` objects.
1717

site/content/en/docs/concepts/preemption.md

+5-5
@@ -66,7 +66,7 @@ already above the nominal quota. The algorithms are:
6666

6767
This algorithm is the most lightweight of the two.
6868

69-
- **[Fair sharing](#fair-sharing)**: ClusterQueues with pending Workloads can preempt other Workloads in their cohort
69+
- **[Fair Sharing](#fair-sharing)**: ClusterQueues with pending Workloads can preempt other Workloads in their cohort
7070
until the preempting ClusterQueue obtains an equal or weighted share of the borrowable resources.
7171
The borrowable resources are the unused nominal quota of all the ClusterQueues in the cohort.
7272

@@ -118,10 +118,10 @@ admitted when accounting back the quota usage of the target Workload.
118118

119119
## Fair Sharing
120120

121-
Fair sharing introduces the concepts of ClusterQueue share values and preemption
121+
Fair Sharing introduces the concepts of ClusterQueue share values and preemption
122122
strategies. These work together with the preemption policies set in
123123
`withinClusterQueue` and `reclaimWithinCohort` (but __not__ `borrowWithinCohort`) to determine if a pending
124-
Workload can preempt an admitted Workload in Fair sharing. Fair sharing uses preemptions to
124+
Workload can preempt an admitted Workload in Fair Sharing. Fair Sharing uses preemptions to
125125
achieve an equal or weighted share of the borrowable resources between the
126126
tenants of a cohort.
127127

@@ -132,7 +132,7 @@ parent) as of v0.11. Using these features together in V0.9 and V0.10 is
132132
unsupported, and results in undefined behavior.
133133
{{% /alert %}}
134134

135-
To enable fair sharing, [use a Kueue Configuration](/docs/installation#install-a-custom-configured-release-version) similar to the following:
135+
To enable Fair Sharing, [use a Kueue Configuration](/docs/installation#install-a-custom-configured-release-version) similar to the following:
136136

137137
```yaml
138138
apiVersion: config.kueue.x-k8s.io/v1beta1
@@ -146,7 +146,7 @@ The attributes in this Kueue Configuration are described in the following sectio
146146

147147
### ClusterQueue share value
148148

149-
When you enable fair sharing, Kueue assigns a numeric share value to each ClusterQueue to summarize
149+
When you enable Fair Sharing, Kueue assigns a numeric share value to each ClusterQueue to summarize
150150
the usage of borrowed resources in a ClusterQueue, in comparison to others in the same cohort.
151151
The share value is weighted by the `.spec.fairSharing.weight` defined in a ClusterQueue.
152152

site/content/en/docs/overview/_index.md

+2-2
@@ -17,15 +17,15 @@ You can install Kueue on top of a vanilla Kubernetes cluster. Kueue does not rep
1717

1818
Kueue APIs allow you to express:
1919

20-
* Quotas and policies for fair sharing among tenants.
20+
* Quotas and policies for Fair Sharing among tenants.
2121
* Resource fungibility: if a resource flavor is fully utilized, Kueue can admit the job using a different flavor.
2222

2323
A core design principle for Kueue is to avoid duplicating mature functionality in Kubernetes components and well-established third-party controllers. Autoscaling, pod-to-node scheduling and job lifecycle management are the responsibility of cluster-autoscaler, kube-scheduler and kube-controller-manager, respectively. Advanced admission control can be delegated to controllers such as gatekeeper.
2424

2525
## Features overview
2626

2727
- **Job management:** Support job queueing based on [priorities](/docs/concepts/workload/#priority) with different [strategies](/docs/concepts/cluster_queue/#queueing-strategy): `StrictFIFO` and `BestEffortFIFO`.
28-
- **Advanced Resource management:** Comprising: [resource flavor fungibility](/docs/concepts/cluster_queue/#flavorfungibility), [fair sharing](/docs/concepts/preemption/#fair-sharing), [cohorts](/docs/concepts/cluster_queue/#cohort) and [preemption](/docs/concepts/cluster_queue/#preemption) with a variety of policies between different tenants.
28+
- **Advanced Resource management:** Comprising: [resource flavor fungibility](/docs/concepts/cluster_queue/#flavorfungibility), [Fair Sharing](/docs/concepts/preemption/#fair-sharing), [cohorts](/docs/concepts/cluster_queue/#cohort) and [preemption](/docs/concepts/cluster_queue/#preemption) with a variety of policies between different tenants.
2929
- **Integrations:** Built-in support for popular jobs, e.g. [BatchJob](/docs/tasks/run/jobs/), [Kubeflow training jobs](/docs/tasks/run/kubeflow/), [RayJob](/docs/tasks/run/rayjobs/), [RayCluster](/docs/tasks/run/rayclusters/), [JobSet](/docs/tasks/run/jobsets/), [AppWrappers](/docs/tasks/run/appwrappers/), [plain Pod and Pod Groups](/docs/tasks/run/plain_pods/).
3030
- **System insight:** Build-in [prometheus metrics](/docs/reference/metrics/) to help monitor the state of the system, and on-demand visibility endpoint for [monitoring of pending workloads](/docs/tasks/manage/monitor_pending_workloads/pending_workloads_on_demand/).
3131
- **AdmissionChecks:** A mechanism for internal or external components to influence whether a workload can be [admitted](/docs/concepts/admission_check/).

site/content/en/docs/reference/metrics.md

+1-1
@@ -37,7 +37,7 @@ Use the following metrics to monitor the status of your ClusterQueues:
3737
| `kueue_cluster_queue_status` | Gauge | Reports the status of the ClusterQueue | `cluster_queue`: The name of the ClusterQueue<br> `status`: Possible values are `pending`, `active` or `terminated`. For a ClusterQueue, the metric only reports a value of 1 for one of the statuses. |
3838
| `kueue_reserving_active_workloads` | Gauge | The number of Workloads that are reserving quota, per `cluster_queue`. | `cluster_queue`: the name of the ClusterQueue |
3939
| `kueue_admission_cycle_preemption_skips` | Gauge | The number of Workloads in the ClusterQueue that got preemption candidates but had to be skipped because other ClusterQueues needed the same resources in the same cycle | `cluster_queue`: the name of the ClusterQueue |
40-
| `kueue_preempted_workloads_total` | Counter | The number of preempted workloads per `preempting_cluster_queue` | `preempting_cluster_queue`: the name of the ClusterQueue<br> `reason`: possible values are `InClusterQueue` means that the workload was preempted by a workload in the same ClusterQueue; `InCohortReclamation` means that the workload was preempted by a workload in the same cohort due to reclamation of nominal quota; `InCohortFairSharing` means that the workload was preempted by a workload in the same cohort due to fair sharing; `InCohortReclaimWhileBorrowing` means that the workload was preempted by a workload in the same cohort due to reclamation of nominal quota while borrowing |
40+
| `kueue_preempted_workloads_total` | Counter | The number of preempted workloads per `preempting_cluster_queue` | `preempting_cluster_queue`: the name of the ClusterQueue<br> `reason`: possible values are `InClusterQueue` means that the workload was preempted by a workload in the same ClusterQueue; `InCohortReclamation` means that the workload was preempted by a workload in the same cohort due to reclamation of nominal quota; `InCohortFairSharing` means that the workload was preempted by a workload in the same cohort due to Fair Sharing; `InCohortReclaimWhileBorrowing` means that the workload was preempted by a workload in the same cohort due to reclamation of nominal quota while borrowing |
4141

4242
## LocalQueue Status (alpha)
4343
