docs: answers to the woke police
fstagni committed Dec 9, 2024
1 parent 6b374aa commit e9302bb
Showing 16 changed files with 80 additions and 80 deletions.
@@ -16,7 +16,7 @@ Normally, services are always exposed on the same port, which is defined in the

As a general rule, services can be duplicated,
meaning you can have the same service running on multiple hosts, thus reducing the load.
There are only 2 cases of DIRAC services that have a "master/slave" concept, and these are the Configuration Service
There are only 2 cases of DIRAC services that have a "controller/worker" concept, and these are the Configuration Service
and the Accounting/DataStore service.
The WorkloadManagement/Matcher service should also not be duplicated.

@@ -443,7 +443,7 @@ operation is the registration of the new host in the already functional Configur

#
# These options define the DIRAC components being installed on "this" DIRAC server.
# The simplest option is to install a slave of the Configuration Server and a
# The simplest option is to install a worker of the Configuration Server and a
# SystemAdministrator for remote management.
#
# The following options defined components to be installed
@@ -90,11 +90,11 @@ Services
+--------------------+---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+
| **System** | **Component** |**Duplicate**| **Remarque** | **HTTPs** +
+--------------------+---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+
| Accounting | :mod:`DataStore <DIRAC.AccountingSystem.Service.DataStoreHandler>` | PARTIAL | One master and helpers (See :ref:`datastorehelpers`) | +
| Accounting | :mod:`DataStore <DIRAC.AccountingSystem.Service.DataStoreHandler>` | PARTIAL | One controller and helpers (See :ref:`datastorehelpers`) | +
+ +---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+
| | :mod:`ReportGenerator <DIRAC.AccountingSystem.Service.ReportGeneratorHandler>` | | | +
+--------------------+---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+
| Configuration | :mod:`Configuration <DIRAC.ConfigurationSystem.Service.ConfigurationHandler>` | PARTIAL | One master (rw) and slaves (ro). It's advised to have several CS slaves | YES +
| Configuration | :mod:`Configuration <DIRAC.ConfigurationSystem.Service.ConfigurationHandler>` | PARTIAL | One controller (rw) and workers (ro). Should have several CS workers | YES +
+--------------------+---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+
| DataManagement | :mod:`DataIntegrity <DIRAC.DataManagementSystem.Service.DataIntegrityHandler>` | YES | | YES +
+ +---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+
20 changes: 10 additions & 10 deletions docs/source/AdministratorGuide/Systems/Configuration/index.rst
@@ -5,22 +5,22 @@ Configuration System
====================

The configuration system serves the configuration to any other client (be it another server or a standard client).
The infrastructure is master/slave based.
The infrastructure is controller/worker based.

******
Master
******
**********
Controller
**********

The master Server holds the central configuration in a local file. This file is then served to the clients, and synchronized with the slave servers.
The controller Server holds the central configuration in a local file. This file is then served to the clients, and synchronized with the worker servers.

the master server also regularly pings the slave servers to make sure they are still alive. If not, they are removed from the list of CS.
The controller server also regularly pings the worker servers to make sure they are still alive. If they are not, they are removed from the list of CS servers.

When changes are committed to the master, a backup of the existing configuration file is made in ``etc/csbackup``.
When changes are committed to the controller, a backup of the existing configuration file is made in ``etc/csbackup``.

*******
Slaves
Workers
*******

Slave server registers themselves to the master when starting.
Worker servers register themselves with the controller when starting.
They synchronize their configuration on a regular basis (every 5 minutes by default).
Note that the slave CS do not hold the configuration in a local file, but only in memory.
Note that worker CS servers do not hold the configuration in a local file, only in memory.
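The worker-side behaviour described above (register on start-up, then periodically pull the configuration and keep it only in memory) can be illustrated with a small simulation. This is a hedged sketch, not DIRAC code: `ToyController`, `ToyWorker` and `get_data_if_newer` are hypothetical names (the last is only loosely modelled on `export_getCompressedDataIfNewer`), versions are plain integers, and the current time is passed in explicitly so the 5-minute interval can be exercised without waiting.

```python
from dataclasses import dataclass, field
from typing import Optional

REFRESH_INTERVAL = 300  # seconds; mirrors the documented 5-minute default


@dataclass
class ToyController:
    """Hypothetical controller: owns the authoritative configuration and version."""

    version: int = 1
    data: dict = field(default_factory=dict)
    workers: set = field(default_factory=set)

    def register(self, url: str) -> None:
        # Workers announce themselves when they start
        self.workers.add(url)

    def commit(self, new_data: dict) -> None:
        # Every commit produces a new configuration version
        self.data = dict(new_data)
        self.version += 1

    def get_data_if_newer(self, client_version: int) -> Optional[tuple]:
        # Only ship the payload if the caller is out of date
        if client_version < self.version:
            return self.version, dict(self.data)
        return None


class ToyWorker:
    """Hypothetical worker: keeps its copy only in memory, never on disk."""

    def __init__(self, url: str, controller: ToyController, now: float) -> None:
        self.url = url
        self.controller = controller
        self.version = 0
        self.data: dict = {}
        self.last_refresh = float("-inf")
        controller.register(url)
        self.maybe_refresh(now)  # initial pull right after registering

    def maybe_refresh(self, now: float) -> bool:
        """Pull from the controller if the refresh interval has elapsed.

        Returns True only when newer data was actually applied.
        """
        if now - self.last_refresh < REFRESH_INTERVAL:
            return False
        self.last_refresh = now
        update = self.controller.get_data_if_newer(self.version)
        if update is None:
            return False
        self.version, self.data = update
        return True
```

Passing `now` explicitly instead of calling `time.time()` inside the class is a choice made for testability; a real service would drive `maybe_refresh` from a timer.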
2 changes: 1 addition & 1 deletion docs/source/DeveloperGuide/Overview/index.rst
@@ -132,7 +132,7 @@ Configuration Service
The Configuration Service is built in the DISET framework to provide static configuration parameters to
all the distributed DIRAC components. This is the backbone of the whole system and necessitates excellent
reliability. Therefore, it is organized as a single master service where all the parameter
updates are done and multiple read-only slave services which are distributed geographically, on VO-boxes
updates are done and multiple read-only worker services which are distributed geographically, on VO-boxes
at Tier-1 LCG sites in the case of LHCb. All the servers are queried by clients in a load balancing way.
This arrangement ensures configuration data consistency together with very good scalability properties.

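The load-balancing sentence above can be made concrete with a small client-side helper. This is an assumption-laden sketch rather than DIRAC's actual client logic: `query_any` and its `call` parameter are hypothetical, and an unreachable server is assumed to signal itself by raising `ConnectionError`. The helper tries the known configuration servers in random order and falls through to the next one on failure, so requests spread across replicas and one dead server does not break the client.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def query_any(
    urls: Sequence[str],
    call: Callable[[str], T],
    rng: Optional[random.Random] = None,
) -> T:
    """Return the first successful answer, trying servers in random order.

    Hypothetical helper: ``call`` stands in for whatever RPC the client makes
    and is assumed to raise ConnectionError when a server is unreachable.
    """
    if rng is None:
        rng = random.Random()
    order = list(urls)
    rng.shuffle(order)  # spread requests across the replicas
    errors: List[str] = []
    for url in order:
        try:
            return call(url)
        except ConnectionError as exc:
            # Remember the failure and fall through to the next replica
            errors.append(f"{url}: {exc}")
    raise ConnectionError("all configuration servers failed: " + "; ".join(errors))
```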
@@ -201,7 +201,7 @@ The agent will try to execute request as a whole in one go.
:alt: Treating of Request in the RequestExecutionAgent.
:align: center

The `RequestExecutingAgent` is using the `ProcessPool` utility to create slave workers (subprocesses running `RequestTask`)
The `RequestExecutingAgent` is using the `ProcessPool` utility to create workers (subprocesses running `RequestTask`)
designated to execute requests read from `ReqDB`. Each worker executes a request using the following steps:

* downloading and setting up request's owner proxy
4 changes: 2 additions & 2 deletions src/DIRAC/AccountingSystem/Service/DataStoreHandler.py
@@ -1,6 +1,6 @@
""" DataStore is the service for inserting accounting reports (rows) in the Accounting DB
This service CAN be duplicated iff the first is a "master" and all the others are slaves.
This service CAN be duplicated iff the first is a "controller" and all the others are workers.
See the information about :ref:`datastorehelpers`.
.. literalinclude:: ../ConfigTemplate.cfg
@@ -171,7 +171,7 @@ def export_compactDB(self):
"""
Compact the db by grouping buckets
"""
# if we are running slaves (not only one service) we can redirect the request to the master
# if we are running workers (not only one service) we can redirect the request to the controller
# For more information please read the Administrative guide Accounting part!
# ADVICE: If you want to trigger the bucketing, please make sure the bucketing is not running!!!!
if self.runBucketing:
@@ -63,7 +63,7 @@ class ConfigurationClient(Client):

def __init__(self, **kwargs):
# By default we use Configuration/Server as url; the client does the resolution
# In some case url has to be static (when a slave register to the master server for example)
# In some cases the url has to be static (for example, when a worker registers with the master server)
# This is why 'url' can be passed as a keyword argument
if "url" not in kwargs:
kwargs["url"] = "Configuration/Server"
@@ -59,9 +59,9 @@ def export_getCompressedDataIfNewer(self, sClientVersion):

def export_publishSlaveServer(self, sURL):
"""
Used by slave server to register as a slave server.
Used by a worker server to register itself as a worker server.
:param sURL: The url of the slave server.
:param sURL: The url of the worker server.
"""
self.ServiceInterface.publishSlaveServer(sURL)
return S_OK()
2 changes: 1 addition & 1 deletion src/DIRAC/ConfigurationSystem/private/Refresher.py
@@ -70,7 +70,7 @@ def autoRefreshAndPublish(self, sURL):
"""
gLogger.debug("Setting configuration refresh as automatic")
if not gConfigurationData.getAutoPublish():
gLogger.debug("Slave server won't auto publish itself")
gLogger.debug("Worker server won't auto publish itself")
if not gConfigurationData.getName():
import DIRAC

2 changes: 1 addition & 1 deletion src/DIRAC/ConfigurationSystem/private/RefresherBase.py
@@ -112,7 +112,7 @@ def _refreshAndPublish(self):
gLogger.error("Can't publish to master server", dRetVal["Message"])
return True
else:
gLogger.warn("No master server is specified in the configuration, trying to get data from other slaves")
gLogger.warn("No master server is specified in the configuration, trying to get data from other workers")
return self._refresh()["OK"]

def _refresh(self, fromMaster=False):
16 changes: 8 additions & 8 deletions src/DIRAC/ConfigurationSystem/private/ServiceInterface.py
@@ -11,31 +11,31 @@

class ServiceInterface(ServiceInterfaceBase, threading.Thread):
"""
Service interface, manage Slave/Master server for CS
Service interface, manage Worker/Controller server for CS
Thread components
"""

def __init__(self, sURL):
threading.Thread.__init__(self)
ServiceInterfaceBase.__init__(self, sURL)

def _launchCheckSlaves(self):
def _launchCheckWorkers(self):
"""
Start loop which check if slaves are alive
Start a loop that checks whether workers are alive
"""
gLogger.info("Starting purge slaves thread")
gLogger.info("Starting purge workers thread")
self.daemon = True
self.start()

def run(self):
while True:
iWaitTime = gConfigurationData.getSlavesGraceTime()
time.sleep(iWaitTime)
self._checkSlavesStatus()
self._checkWorkersStatus()

def _updateServiceConfiguration(self, urlSet, fromMaster=False):
"""
Update configuration of a set of slave services in parallel
Update configuration of a set of worker services in parallel
:param set urlSet: a set of service URLs
:param fromMaster: flag to force updating from the master CS
@@ -49,6 +49,6 @@ def _updateServiceConfiguration(self, urlSet, fromMaster=False):
url = futureUpdate[future]
result = future.result()
if result["OK"]:
gLogger.info("Successfully updated slave configuration", url)
gLogger.info("Successfully updated worker configuration", url)
else:
gLogger.error("Failed to update slave configuration", url)
gLogger.error("Failed to update worker configuration", url)
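The `futureUpdate` loop in the hunk above maps each future back to the URL of the server it updates. Assuming the standard `concurrent.futures` thread pool, the same pattern looks roughly like this; `update_workers_in_parallel` and `update_one` are hypothetical names, and the result dictionaries only mimic DIRAC's `{"OK": ...}` convention.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def update_workers_in_parallel(
    url_set: Iterable[str],
    update_one: Callable[[str], dict],
    max_workers: int = 8,
) -> Dict[str, bool]:
    """Push a configuration refresh to every worker concurrently.

    ``update_one`` is a hypothetical callable returning a DIRAC-style result
    dictionary, e.g. {"OK": True} or {"OK": False, "Message": "..."}.
    """
    outcome: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Keep a future -> URL map so each result can be reported per server
        future_to_url = {executor.submit(update_one, url): url for url in url_set}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            result = future.result()
            outcome[url] = bool(result["OK"])
    return outcome
```

Note that `future.result()` re-raises any exception raised inside `update_one`, so a callable that can throw should be wrapped if one failing server must not abort the whole loop.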
82 changes: 41 additions & 41 deletions src/DIRAC/ConfigurationSystem/private/ServiceInterfaceBase.py
@@ -1,4 +1,4 @@
"""Service interface is the service which provide config for client and synchronize Master/Slave servers"""
"""Service interface is the service which provides configuration to clients and synchronizes master/worker servers"""

import os
import time
@@ -17,27 +17,27 @@


class ServiceInterfaceBase:
"""Service interface is the service which provide config for client and synchronize Master/Slave servers"""
"""Service interface is the service which provides configuration to clients and synchronizes master/worker servers"""

def __init__(self, sURL):
self.sURL = sURL
gLogger.info("Initializing Configuration Service", f"URL is {sURL}")
self.__modificationsIgnoreMask = ["/DIRAC/Configuration/Servers", "/DIRAC/Configuration/Version"]
gConfigurationData.setAsService()
if not gConfigurationData.isMaster():
gLogger.info("Starting configuration service as slave")
gLogger.info("Starting configuration service as worker")
gRefresher.autoRefreshAndPublish(self.sURL)
else:
gLogger.info("Starting configuration service as master")
gRefresher.disable()
self.__loadConfigurationData()
self.dAliveSlaveServers = {}
self._launchCheckSlaves()
self.dAliveWorkerServers = {}
self._launchCheckWorkers()

def isMaster(self):
return gConfigurationData.isMaster()

def _launchCheckSlaves(self):
def _launchCheckWorkers(self):
raise NotImplementedError("Should be implemented by the children class")

def __loadConfigurationData(self):
@@ -75,50 +75,50 @@ def __generateNewVersion(self):
gConfigurationData.generateNewVersion()
gConfigurationData.writeRemoteConfigurationToDisk()

def publishSlaveServer(self, sSlaveURL):
def publishSlaveServer(self, sWorkerURL):
"""
Called by the slave server via service, it register a new slave server
Called by a worker server via the service to register itself as a new worker server
:param sSlaveURL: url of slave server
:param sWorkerURL: URL of the worker server
"""

if not gConfigurationData.isMaster():
return S_ERROR("Configuration modification is not allowed in this server")
gLogger.info(f"Pinging slave {sSlaveURL}")
rpcClient = ConfigurationClient(url=sSlaveURL, timeout=10, useCertificates=True)
gLogger.info(f"Pinging worker {sWorkerURL}")
rpcClient = ConfigurationClient(url=sWorkerURL, timeout=10, useCertificates=True)
retVal = rpcClient.ping()
if not retVal["OK"]:
gLogger.info(f"Slave {sSlaveURL} didn't reply")
gLogger.info(f"Worker {sWorkerURL} didn't reply")
return
if retVal["Value"]["name"] != "Configuration/Server":
gLogger.info(f"Slave {sSlaveURL} is not a CS serveR")
gLogger.info(f"Worker {sWorkerURL} is not a CS server")
return
bNewSlave = False
if sSlaveURL not in self.dAliveSlaveServers:
bNewSlave = True
gLogger.info("New slave registered", sSlaveURL)
self.dAliveSlaveServers[sSlaveURL] = time.time()
if bNewSlave:
gConfigurationData.setServers(", ".join(self.dAliveSlaveServers))
bNewWorker = False
if sWorkerURL not in self.dAliveWorkerServers:
bNewWorker = True
gLogger.info("New worker registered", sWorkerURL)
self.dAliveWorkerServers[sWorkerURL] = time.time()
if bNewWorker:
gConfigurationData.setServers(", ".join(self.dAliveWorkerServers))
self.__generateNewVersion()

def _checkSlavesStatus(self, forceWriteConfiguration=False):
def _checkWorkersStatus(self, forceWriteConfiguration=False):
"""
Check if Slaves server are still availlable
Check whether worker servers are still available
:param forceWriteConfiguration: (default False) Force rewriting configuration after checking slaves
:param forceWriteConfiguration: (default False) Force rewriting configuration after checking workers
"""

gLogger.info("Checking status of slave servers")
gLogger.info("Checking status of worker servers")
iGraceTime = gConfigurationData.getSlavesGraceTime()
bModifiedSlaveServers = False
for sSlaveURL in list(self.dAliveSlaveServers):
if time.time() - self.dAliveSlaveServers[sSlaveURL] > iGraceTime:
gLogger.warn("Found dead slave", sSlaveURL)
del self.dAliveSlaveServers[sSlaveURL]
bModifiedSlaveServers = True
if bModifiedSlaveServers or forceWriteConfiguration:
gConfigurationData.setServers(", ".join(self.dAliveSlaveServers))
bModifiedWorkerServers = False
for sWorkerURL in list(self.dAliveWorkerServers):
if time.time() - self.dAliveWorkerServers[sWorkerURL] > iGraceTime:
gLogger.warn("Found dead worker", sWorkerURL)
del self.dAliveWorkerServers[sWorkerURL]
bModifiedWorkerServers = True
if bModifiedWorkerServers or forceWriteConfiguration:
gConfigurationData.setServers(", ".join(self.dAliveWorkerServers))
self.__generateNewVersion()

@staticmethod
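The liveness bookkeeping in `publishSlaveServer` and `_checkWorkersStatus` (record a timestamp per worker URL, drop any worker not heard from within the grace time, then rewrite the servers list) can be isolated into a small class. This is a sketch under stated assumptions: `WorkerRegistry` and its methods are hypothetical, and the clock is injectable so the expiry logic can be tested without sleeping.

```python
import time
from typing import Callable, Dict, List


class WorkerRegistry:
    """Hypothetical registry: worker URL -> time of its last registration."""

    def __init__(self, grace_time: float, clock: Callable[[], float] = time.time) -> None:
        self.grace_time = grace_time
        self.clock = clock
        self.alive: Dict[str, float] = {}

    def publish(self, url: str) -> bool:
        """Record a heartbeat; return True if the worker was not known before."""
        is_new = url not in self.alive
        self.alive[url] = self.clock()
        return is_new

    def purge(self) -> List[str]:
        """Drop and return workers whose last heartbeat is older than the grace time."""
        now = self.clock()
        dead = [url for url, seen in self.alive.items() if now - seen > self.grace_time]
        for url in dead:
            del self.alive[url]
        return dead

    def servers_option(self) -> str:
        # Same ", ".join(...) format the hunk above passes to setServers()
        return ", ".join(self.alive)
```

As in the original code, a new version only needs to be generated when `publish` returns True or `purge` returns a non-empty list.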
@@ -147,18 +147,18 @@ def _updateServiceConfiguration(self, urlSet, fromMaster=False):
"""
raise NotImplementedError("Should be implemented by the children class")

def forceSlavesUpdate(self):
def forceWorkersUpdate(self):
"""
Force updating configuration on all the slave configuration servers
Force updating configuration on all the worker configuration servers
:return: Nothing
"""
gLogger.info("Updating configuration on slave servers")
gLogger.info("Updating configuration on worker servers")
iGraceTime = gConfigurationData.getSlavesGraceTime()
urlSet = set()
for slaveURL in self.dAliveSlaveServers:
if time.time() - self.dAliveSlaveServers[slaveURL] <= iGraceTime:
urlSet.add(slaveURL)
for workerURL in self.dAliveWorkerServers:
if time.time() - self.dAliveWorkerServers[workerURL] <= iGraceTime:
urlSet.add(workerURL)
self._updateServiceConfiguration(urlSet, fromMaster=True)

def forceGlobalUpdate(self):
@@ -233,14 +233,14 @@ def updateConfiguration(self, sBuffer, committer="", updateVersionOption=False):
gConfigurationData.unlock()
gLogger.info("Generating new version")
gConfigurationData.generateNewVersion()
# self.__checkSlavesStatus( forceWriteConfiguration = True )
# self.__checkWorkersStatus( forceWriteConfiguration = True )
gLogger.info("Writing new version to disk")
retVal = gConfigurationData.writeRemoteConfigurationToDisk(f"{committer}@{gConfigurationData.getVersion()}")
gLogger.info("New version", gConfigurationData.getVersion())

# Attempt to update the configuration on currently registered slave services
# Attempt to update the configuration on currently registered worker services
if gConfigurationData.getAutoSlaveSync():
self.forceSlavesUpdate()
self.forceWorkersUpdate()

return retVal

10 changes: 5 additions & 5 deletions src/DIRAC/ConfigurationSystem/private/ServiceInterfaceTornado.py
@@ -17,20 +17,20 @@ class ServiceInterfaceTornado(ServiceInterfaceBase):
def __init__(self, sURL):
ServiceInterfaceBase.__init__(self, sURL)

def _launchCheckSlaves(self):
def _launchCheckWorkers(self):
"""
Start loop to check if slaves are alive
Start loop to check if workers are alive
"""
IOLoop.current().spawn_callback(self.run)
gLogger.info("Starting purge slaves thread")
gLogger.info("Starting purge workers thread")

def run(self):
"""
Check if slaves are alive
Check if workers are alive
"""
while True:
yield gen.sleep(gConfigurationData.getSlavesGraceTime())
self._checkSlavesStatus()
self._checkWorkersStatus()

def _updateServiceConfiguration(self, urlSet, fromMaster=False):
"""
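In the Tornado variant, `_launchCheckWorkers` hands `run` to the IOLoop instead of starting a thread, and the loop waits with a non-blocking sleep rather than `time.sleep`. A rough equivalent in plain `asyncio` (which Tornado's IOLoop wraps since Tornado 5) might look like the following; `check_workers_periodically` and its `max_rounds` argument are hypothetical, the latter added only so the sketch can stop.

```python
import asyncio
from typing import Callable, Optional


async def check_workers_periodically(
    check: Callable[[], None],
    grace_time: float,
    max_rounds: Optional[int] = None,
) -> int:
    """Sleep for the grace time, then run the liveness check, repeatedly.

    ``await asyncio.sleep`` yields to the event loop, so other requests keep
    being served while waiting. ``max_rounds`` is a hypothetical knob for
    testing; None means loop forever, as the service does.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        await asyncio.sleep(grace_time)
        check()
        rounds += 1
    return rounds
```

Inside a running loop, `asyncio.get_running_loop().create_task(check_workers_periodically(...))` plays roughly the role that `spawn_callback` plays in the hunk above.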