diff --git a/docs/source/AdministratorGuide/Introduction/configurationbasics.rst b/docs/source/AdministratorGuide/Introduction/configurationbasics.rst index eee88d19b27..2c2b85f5c1e 100644 --- a/docs/source/AdministratorGuide/Introduction/configurationbasics.rst +++ b/docs/source/AdministratorGuide/Introduction/configurationbasics.rst @@ -16,7 +16,7 @@ Normally, services are always exposed on the same port, which is defined in the As a general rule, services can be duplicated, meaning you can have the same service running on multiple hosts, thus reducing the load. -There are only 2 cases of DIRAC services that have a "master/slave" concept, and these are the Configuration Service +There are only 2 cases of DIRAC services that have a "controller/worker" concept, and these are the Configuration Service and the Accounting/DataStore service. The WorkloadManagement/Matcher service should also not be duplicated. diff --git a/docs/source/AdministratorGuide/ServerInstallations/InstallingDiracServer.rst b/docs/source/AdministratorGuide/ServerInstallations/InstallingDiracServer.rst index 551e82b3dc0..9b8316e5b4d 100644 --- a/docs/source/AdministratorGuide/ServerInstallations/InstallingDiracServer.rst +++ b/docs/source/AdministratorGuide/ServerInstallations/InstallingDiracServer.rst @@ -443,7 +443,7 @@ operation is the registration of the new host in the already functional Configur # # These options define the DIRAC components being installed on "this" DIRAC server. - # The simplest option is to install a slave of the Configuration Server and a + # The simplest option is to install a worker of the Configuration Server and a # SystemAdministrator for remote management. 
# # The following options defined components to be installed diff --git a/docs/source/AdministratorGuide/ServerInstallations/scalingAndLimitations.rst b/docs/source/AdministratorGuide/ServerInstallations/scalingAndLimitations.rst index c6a5dc6b1ff..cddc9fd585f 100644 --- a/docs/source/AdministratorGuide/ServerInstallations/scalingAndLimitations.rst +++ b/docs/source/AdministratorGuide/ServerInstallations/scalingAndLimitations.rst @@ -90,11 +90,11 @@ Services +--------------------+---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+ | **System** | **Component** |**Duplicate**| **Remarque** | **HTTPs** + +--------------------+---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+ -| Accounting | :mod:`DataStore ` | PARTIAL | One master and helpers (See :ref:`datastorehelpers`) | + +| Accounting | :mod:`DataStore ` | PARTIAL | One controller and helpers (See :ref:`datastorehelpers`) | + + +---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+ | | :mod:`ReportGenerator ` | | | + +--------------------+---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+ -| Configuration | :mod:`Configuration ` | PARTIAL | One master (rw) and slaves (ro). It's advised to have several CS slaves | YES + +| Configuration | :mod:`Configuration ` | PARTIAL | One controller (rw) and workers (ro). 
Several CS workers are advised | YES + +--------------------+---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+ | DataManagement | :mod:`DataIntegrity ` | YES | | YES + + +---------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------+-----------+ diff --git a/docs/source/AdministratorGuide/Systems/Configuration/index.rst b/docs/source/AdministratorGuide/Systems/Configuration/index.rst index 22a02e7c5e7..0323b27c119 100644 --- a/docs/source/AdministratorGuide/Systems/Configuration/index.rst +++ b/docs/source/AdministratorGuide/Systems/Configuration/index.rst @@ -5,22 +5,22 @@ Configuration System ==================== The configuration system serves the configuration to any other client (be it another server or a standard client). -The infrastructure is master/slave based. +The infrastructure is controller/worker based. -****** -Master -****** +********** +Controller +********** -The master Server holds the central configuration in a local file. This file is then served to the clients, and synchronized with the slave servers. +The controller server holds the central configuration in a local file. This file is then served to the clients and synchronized with the worker servers. -the master server also regularly pings the slave servers to make sure they are still alive. If not, they are removed from the list of CS. +The controller server also regularly pings the worker servers to make sure they are still alive. If not, they are removed from the list of CS servers. -When changes are committed to the master, a backup of the existing configuration file is made in ``etc/csbackup``. +When changes are committed to the controller, a backup of the existing configuration file is made in ``etc/csbackup``.
-****** -Slaves -****** +******* +Workers +******* -Slave server registers themselves to the master when starting. +Worker servers register themselves with the controller when starting. -They synchronize their configuration on a regular bases (every 5 minutes by default). +They synchronize their configuration on a regular basis (every 5 minutes by default). -Note that the slave CS do not hold the configuration in a local file, but only in memory. +Note that the worker CS servers do not hold the configuration in a local file, but only in memory. diff --git a/docs/source/DeveloperGuide/Overview/index.rst b/docs/source/DeveloperGuide/Overview/index.rst index 22c4c8353a9..89be55b3fb9 100644 --- a/docs/source/DeveloperGuide/Overview/index.rst +++ b/docs/source/DeveloperGuide/Overview/index.rst @@ -132,7 +132,7 @@ Configuration Service The Configuration Service is built in the DISET framework to provide static configuration parameters to all the distributed DIRAC components. This is the backbone of the whole system and necessitates excellent reliability. Therefore, it is organized as a single master service where all the parameter -updates are done and multiple read-only slave services which are distributed geographically, on VO-boxes +updates are done and multiple read-only worker services which are distributed geographically, on VO-boxes at Tier-1 LCG sites in the case of LHCb. All the servers are queried by clients in a load balancing way. This arrangement ensures configuration data consistency together with very good scalability properties. diff --git a/docs/source/DeveloperGuide/Systems/RequestManagement/index.rst b/docs/source/DeveloperGuide/Systems/RequestManagement/index.rst index 8dfc4a73d10..393cee33079 100644 --- a/docs/source/DeveloperGuide/Systems/RequestManagement/index.rst +++ b/docs/source/DeveloperGuide/Systems/RequestManagement/index.rst @@ -201,7 +201,7 @@ The agent will try to execute request as a whole in one go. :alt: Treating of Request in the RequestExecutionAgent.
:align: center -The `RequestExecutingAgent` is using the `ProcessPool` utility to create slave workers (subprocesses running `RequestTask`) +The `RequestExecutingAgent` uses the `ProcessPool` utility to create workers (subprocesses running `RequestTask`) designated to execute requests read from `ReqDB`. Each worker is processing request execution using following steps: * downloading and setting up request's owner proxy diff --git a/src/DIRAC/AccountingSystem/Service/DataStoreHandler.py b/src/DIRAC/AccountingSystem/Service/DataStoreHandler.py index 67864dc3d1b..9ce8a7b2498 100644 --- a/src/DIRAC/AccountingSystem/Service/DataStoreHandler.py +++ b/src/DIRAC/AccountingSystem/Service/DataStoreHandler.py @@ -1,6 +1,6 @@ """ DataStore is the service for inserting accounting reports (rows) in the Accounting DB - This service CAN be duplicated iff the first is a "master" and all the others are slaves. + This service CAN be duplicated iff the first is a "controller" and all the others are workers. See the information about :ref:`datastorehelpers`. .. literalinclude:: ../ConfigTemplate.cfg @@ -171,7 +171,7 @@ def export_compactDB(self): """ Compact the db by grouping buckets """ - # if we are running slaves (not only one service) we can redirect the request to the master + # if we are running workers (not only one service) we can redirect the request to the controller # For more information please read the Administrative guide Accounting part! # ADVICE: If you want to trigger the bucketing, please make sure the bucketing is not running!!!!
if self.runBucketing: diff --git a/src/DIRAC/ConfigurationSystem/Client/ConfigurationClient.py b/src/DIRAC/ConfigurationSystem/Client/ConfigurationClient.py index f0b60e8f81c..5a7ec4f1ca7 100644 --- a/src/DIRAC/ConfigurationSystem/Client/ConfigurationClient.py +++ b/src/DIRAC/ConfigurationSystem/Client/ConfigurationClient.py @@ -63,7 +63,7 @@ class ConfigurationClient(Client): def __init__(self, **kwargs): # By default we use Configuration/Server as url, client do the resolution - # In some case url has to be static (when a slave register to the master server for example) + # In some cases the url has to be static (e.g. when a worker registers with the master server) # It's why we can use 'url' as keyword arguments if "url" not in kwargs: kwargs["url"] = "Configuration/Server" diff --git a/src/DIRAC/ConfigurationSystem/Service/TornadoConfigurationHandler.py b/src/DIRAC/ConfigurationSystem/Service/TornadoConfigurationHandler.py index add704586ad..970f2526bda 100644 --- a/src/DIRAC/ConfigurationSystem/Service/TornadoConfigurationHandler.py +++ b/src/DIRAC/ConfigurationSystem/Service/TornadoConfigurationHandler.py @@ -59,9 +59,9 @@ def export_publishSlaveServer(self, sURL): """ - Used by slave server to register as a slave server. + Used by a worker server to register itself as a worker server. - :param sURL: The url of the slave server. + :param sURL: The URL of the worker server.
""" self.ServiceInterface.publishSlaveServer(sURL) return S_OK() diff --git a/src/DIRAC/ConfigurationSystem/private/Refresher.py b/src/DIRAC/ConfigurationSystem/private/Refresher.py index 9da1603b50f..395a5b8f7cc 100755 --- a/src/DIRAC/ConfigurationSystem/private/Refresher.py +++ b/src/DIRAC/ConfigurationSystem/private/Refresher.py @@ -70,7 +70,7 @@ def autoRefreshAndPublish(self, sURL): """ gLogger.debug("Setting configuration refresh as automatic") if not gConfigurationData.getAutoPublish(): - gLogger.debug("Slave server won't auto publish itself") + gLogger.debug("Worker server won't auto publish itself") if not gConfigurationData.getName(): import DIRAC diff --git a/src/DIRAC/ConfigurationSystem/private/RefresherBase.py b/src/DIRAC/ConfigurationSystem/private/RefresherBase.py index 5bb2dc0b511..c6c907ecb67 100644 --- a/src/DIRAC/ConfigurationSystem/private/RefresherBase.py +++ b/src/DIRAC/ConfigurationSystem/private/RefresherBase.py @@ -112,7 +112,7 @@ def _refreshAndPublish(self): gLogger.error("Can't publish to master server", dRetVal["Message"]) return True else: - gLogger.warn("No master server is specified in the configuration, trying to get data from other slaves") + gLogger.warn("No master server is specified in the configuration, trying to get data from other workers") return self._refresh()["OK"] def _refresh(self, fromMaster=False): diff --git a/src/DIRAC/ConfigurationSystem/private/ServiceInterface.py b/src/DIRAC/ConfigurationSystem/private/ServiceInterface.py index c9a2d09d8ad..5f10ac94bb3 100755 --- a/src/DIRAC/ConfigurationSystem/private/ServiceInterface.py +++ b/src/DIRAC/ConfigurationSystem/private/ServiceInterface.py @@ -11,7 +11,7 @@ class ServiceInterface(ServiceInterfaceBase, threading.Thread): """ - Service interface, manage Slave/Master server for CS + Service interface, manages Worker/Controller servers for CS Thread components """ @@ -19,11 +19,11 @@ def __init__(self, sURL): threading.Thread.__init__(self)
ServiceInterfaceBase.__init__(self, sURL) - def _launchCheckSlaves(self): + def _launchCheckWorkers(self): """ - Start loop which check if slaves are alive + Start a loop that checks whether workers are alive """ - gLogger.info("Starting purge slaves thread") + gLogger.info("Starting purge workers thread") self.daemon = True self.start() @@ -31,11 +31,11 @@ def run(self): while True: iWaitTime = gConfigurationData.getSlavesGraceTime() time.sleep(iWaitTime) - self._checkSlavesStatus() + self._checkWorkersStatus() def _updateServiceConfiguration(self, urlSet, fromMaster=False): """ - Update configuration of a set of slave services in parallel + Update configuration of a set of worker services in parallel :param set urlSet: a set of service URLs :param fromMaster: flag to force updating from the master CS @@ -49,6 +49,6 @@ def _updateServiceConfiguration(self, urlSet, fromMaster=False): url = futureUpdate[future] result = future.result() if result["OK"]: - gLogger.info("Successfully updated slave configuration", url) + gLogger.info("Successfully updated worker configuration", url) else: - gLogger.error("Failed to update slave configuration", url) + gLogger.error("Failed to update worker configuration", url) diff --git a/src/DIRAC/ConfigurationSystem/private/ServiceInterfaceBase.py b/src/DIRAC/ConfigurationSystem/private/ServiceInterfaceBase.py index 0c4bbab811d..15d60586f08 100644 --- a/src/DIRAC/ConfigurationSystem/private/ServiceInterfaceBase.py +++ b/src/DIRAC/ConfigurationSystem/private/ServiceInterfaceBase.py @@ -1,4 +1,4 @@ -"""Service interface is the service which provide config for client and synchronize Master/Slave servers""" +"""Service interface is the service which provides configuration to clients and synchronizes Controller/Worker servers""" import os import time @@ -17,7 +17,7 @@ class ServiceInterfaceBase: - """Service interface is the service which provide config for client and synchronize Master/Slave servers""" + """Service interface is the service which provides
configuration to clients and synchronizes Controller/Worker servers""" def __init__(self, sURL): self.sURL = sURL @@ -25,19 +25,19 @@ def __init__(self, sURL): self.__modificationsIgnoreMask = ["/DIRAC/Configuration/Servers", "/DIRAC/Configuration/Version"] gConfigurationData.setAsService() if not gConfigurationData.isMaster(): - gLogger.info("Starting configuration service as slave") + gLogger.info("Starting configuration service as worker") gRefresher.autoRefreshAndPublish(self.sURL) else: gLogger.info("Starting configuration service as master") gRefresher.disable() self.__loadConfigurationData() - self.dAliveSlaveServers = {} - self._launchCheckSlaves() + self.dAliveWorkerServers = {} + self._launchCheckWorkers() def isMaster(self): return gConfigurationData.isMaster() - def _launchCheckSlaves(self): + def _launchCheckWorkers(self): raise NotImplementedError("Should be implemented by the children class") def __loadConfigurationData(self): @@ -75,50 +75,50 @@ def __generateNewVersion(self): gConfigurationData.generateNewVersion() gConfigurationData.writeRemoteConfigurationToDisk() - def publishSlaveServer(self, sSlaveURL): + def publishSlaveServer(self, sWorkerURL): """ - Called by the slave server via service, it register a new slave server + Called by a worker server via the service to register itself as a new worker server - :param sSlaveURL: url of slave server + :param sWorkerURL: URL of the worker server """ if not gConfigurationData.isMaster(): return S_ERROR("Configuration modification is not allowed in this server") - gLogger.info(f"Pinging slave {sSlaveURL}") - rpcClient = ConfigurationClient(url=sSlaveURL, timeout=10, useCertificates=True) + gLogger.info(f"Pinging worker {sWorkerURL}") + rpcClient = ConfigurationClient(url=sWorkerURL, timeout=10, useCertificates=True) retVal = rpcClient.ping() if not retVal["OK"]: - gLogger.info(f"Slave {sSlaveURL} didn't reply") + gLogger.info(f"Worker {sWorkerURL} didn't reply") return if retVal["Value"]["name"] != "Configuration/Server": -
gLogger.info(f"Slave {sSlaveURL} is not a CS serveR") + gLogger.info(f"Worker {sWorkerURL} is not a CS server") return - bNewSlave = False - if sSlaveURL not in self.dAliveSlaveServers: - bNewSlave = True - gLogger.info("New slave registered", sSlaveURL) - self.dAliveSlaveServers[sSlaveURL] = time.time() - if bNewSlave: - gConfigurationData.setServers(", ".join(self.dAliveSlaveServers)) + bNewWorker = False + if sWorkerURL not in self.dAliveWorkerServers: + bNewWorker = True + gLogger.info("New worker registered", sWorkerURL) + self.dAliveWorkerServers[sWorkerURL] = time.time() + if bNewWorker: + gConfigurationData.setServers(", ".join(self.dAliveWorkerServers)) self.__generateNewVersion() - def _checkSlavesStatus(self, forceWriteConfiguration=False): + def _checkWorkersStatus(self, forceWriteConfiguration=False): """ - Check if Slaves server are still availlable + Check whether the worker servers are still available - :param forceWriteConfiguration: (default False) Force rewriting configuration after checking slaves + :param forceWriteConfiguration: (default False) Force rewriting configuration after checking workers """ - gLogger.info("Checking status of slave servers") + gLogger.info("Checking status of worker servers") iGraceTime = gConfigurationData.getSlavesGraceTime() - bModifiedSlaveServers = False - for sSlaveURL in list(self.dAliveSlaveServers): - if time.time() - self.dAliveSlaveServers[sSlaveURL] > iGraceTime: - gLogger.warn("Found dead slave", sSlaveURL) - del self.dAliveSlaveServers[sSlaveURL] - bModifiedSlaveServers = True - if bModifiedSlaveServers or forceWriteConfiguration: - gConfigurationData.setServers(", ".join(self.dAliveSlaveServers)) + bModifiedWorkerServers = False + for sWorkerURL in list(self.dAliveWorkerServers): + if time.time() - self.dAliveWorkerServers[sWorkerURL] > iGraceTime: + gLogger.warn("Found dead worker", sWorkerURL) + del self.dAliveWorkerServers[sWorkerURL] + bModifiedWorkerServers = True + if bModifiedWorkerServers or
forceWriteConfiguration: + gConfigurationData.setServers(", ".join(self.dAliveWorkerServers)) self.__generateNewVersion() @staticmethod @@ -147,18 +147,18 @@ def _updateServiceConfiguration(self, urlSet, fromMaster=False): """ raise NotImplementedError("Should be implemented by the children class") - def forceSlavesUpdate(self): + def forceWorkersUpdate(self): """ - Force updating configuration on all the slave configuration servers + Force updating the configuration on all the worker configuration servers :return: Nothing """ - gLogger.info("Updating configuration on slave servers") + gLogger.info("Updating configuration on worker servers") iGraceTime = gConfigurationData.getSlavesGraceTime() urlSet = set() - for slaveURL in self.dAliveSlaveServers: - if time.time() - self.dAliveSlaveServers[slaveURL] <= iGraceTime: - urlSet.add(slaveURL) + for workerURL in self.dAliveWorkerServers: + if time.time() - self.dAliveWorkerServers[workerURL] <= iGraceTime: + urlSet.add(workerURL) self._updateServiceConfiguration(urlSet, fromMaster=True) def forceGlobalUpdate(self): @@ -233,14 +233,14 @@ def updateConfiguration(self, sBuffer, committer="", updateVersionOption=False): gConfigurationData.unlock() gLogger.info("Generating new version") gConfigurationData.generateNewVersion() - # self.__checkSlavesStatus( forceWriteConfiguration = True ) + # self._checkWorkersStatus( forceWriteConfiguration = True ) gLogger.info("Writing new version to disk") retVal = gConfigurationData.writeRemoteConfigurationToDisk(f"{committer}@{gConfigurationData.getVersion()}") gLogger.info("New version", gConfigurationData.getVersion()) - # Attempt to update the configuration on currently registered slave services + # Attempt to update the configuration on currently registered worker services if gConfigurationData.getAutoSlaveSync(): - self.forceSlavesUpdate() + self.forceWorkersUpdate() return retVal diff --git a/src/DIRAC/ConfigurationSystem/private/ServiceInterfaceTornado.py
b/src/DIRAC/ConfigurationSystem/private/ServiceInterfaceTornado.py index 8152be6449e..a0a04c0bcae 100644 --- a/src/DIRAC/ConfigurationSystem/private/ServiceInterfaceTornado.py +++ b/src/DIRAC/ConfigurationSystem/private/ServiceInterfaceTornado.py @@ -17,20 +17,20 @@ class ServiceInterfaceTornado(ServiceInterfaceBase): def __init__(self, sURL): ServiceInterfaceBase.__init__(self, sURL) - def _launchCheckSlaves(self): + def _launchCheckWorkers(self): """ - Start loop to check if slaves are alive + Start loop to check if workers are alive """ IOLoop.current().spawn_callback(self.run) - gLogger.info("Starting purge slaves thread") + gLogger.info("Starting purge workers thread") def run(self): """ - Check if slaves are alive + Check if workers are alive """ while True: yield gen.sleep(gConfigurationData.getSlavesGraceTime()) - self._checkSlavesStatus() + self._checkWorkersStatus() def _updateServiceConfiguration(self, urlSet, fromMaster=False): """ diff --git a/src/DIRAC/ConfigurationSystem/private/TornadoRefresher.py b/src/DIRAC/ConfigurationSystem/private/TornadoRefresher.py index 2ce15174696..1b49ac6b39a 100644 --- a/src/DIRAC/ConfigurationSystem/private/TornadoRefresher.py +++ b/src/DIRAC/ConfigurationSystem/private/TornadoRefresher.py @@ -42,7 +42,7 @@ def autoRefreshAndPublish(self, sURL): """ gLogger.debug("Setting configuration refresh as automatic") if not gConfigurationData.getAutoPublish(): - gLogger.debug("Slave server won't auto publish itself") + gLogger.debug("Worker server won't auto publish itself") if not gConfigurationData.getName(): import DIRAC diff --git a/src/DIRAC/Core/Utilities/ProcessPool.py b/src/DIRAC/Core/Utilities/ProcessPool.py index 9e4de45a911..ad86fa65788 100644 --- a/src/DIRAC/Core/Utilities/ProcessPool.py +++ b/src/DIRAC/Core/Utilities/ProcessPool.py @@ -547,14 +547,14 @@ class ProcessPool: Pool depth - The :ProcessPool: is keeping required number of active workers all the time: slave workers are only created + The :ProcessPool: is 
keeping the required number of active workers at all times: workers are only created when pendingQueue is being filled with tasks, not exceeding defined min and max limits. When pendingQueue is empty, active workers will be cleaned up by themselves, as each worker has got built in self-destroy mechanism after 10 idle loops. Processing and communication - The communication between :ProcessPool: instance and slaves is performed using two :multiprocessing.Queues: + The communication between the :ProcessPool: instance and the workers is performed using two :multiprocessing.Queues: * pendingQueue, used to push tasks to the workers, - * resultsQueue for revert direction; + * resultsQueue for the reverse direction;