A tool to quantify and report the carbon footprint of machine learning computations and communication

vincenzopecorella/cumulator

 
 

CUMULATOR

A tool to quantify and report the carbon footprint of machine learning computations and communication in academia and healthcare

Aim

Raise awareness of the carbon footprint of machine learning methods and encourage further optimization and the rational use of AI-powered tools. This work advocates for sustainable AI and the rational use of IT systems.

Key Carbon Indicators

  • One hour of GPU load is equivalent to 112 gCO2eq
  • 1 GB of data traffic through a data center is equivalent to 31 gCO2eq
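The two indicators above can be reproduced from the default assumptions listed later in this README. The following is a minimal arithmetic sketch, not code from the package; the constant names are illustrative, and it assumes decimal units (1 GB = 1e6 kB).

```python
# Default assumptions quoted in this README (constant names are illustrative).
GPU_POWER_W = 250             # typical GPU power draw, in watts
ENERGY_PER_KB_KWH = 6.894e-8  # data center traffic energy, in kWh per kB
CARBON_INTENSITY = 447        # gCO2eq per kWh (EU average, 2014)

# One hour of GPU load: 250 W for 1 h is 0.25 kWh.
gpu_hour_kwh = GPU_POWER_W / 1000 * 1
gpu_hour_g = gpu_hour_kwh * CARBON_INTENSITY

# 1 GB of traffic, taking 1 GB = 1e6 kB (decimal units, an assumption).
gb_kwh = ENERGY_PER_KB_KWH * 1e6
gb_g = gb_kwh * CARBON_INTENSITY

print(f"One GPU hour: about {round(gpu_hour_g)} gCO2eq")
print(f"1 GB of traffic: about {round(gb_g)} gCO2eq")
```

Both results round to the figures quoted in the list (112 and 31 gCO2eq).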

Install and use

Free software: MIT license

pip install cumulator <- installs CUMULATOR

from cumulator import base <- imports the script

cumulator = base.Cumulator() <- creates a Cumulator instance

Measure cost of computations.

  • Start and stop the chronometer with cumulator.on() and cumulator.off() around each ML computation (typically within each iteration). Each duration is automatically recorded in cumulator.time_list and summed in cumulator.cumulated_time(). cumulator.computation_costs() then returns the carbon footprint of all recorded computations.
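To illustrate the on()/off() pattern without installing the package, here is a self-contained sketch. ChronoSketch is a hypothetical stand-in written for this example, not the package's implementation; it only mirrors the names used above and applies the default power and carbon intensity values from this README.

```python
import time


class ChronoSketch:
    """Hypothetical stand-in for the on()/off() pattern; not the package's code."""

    def __init__(self, power_w=250, carbon_intensity=447):
        self.hardware_load = power_w / 3.6e6       # kWh per second of GPU load
        self.carbon_intensity = carbon_intensity   # gCO2eq per kWh
        self.time_list = []                        # one duration per on()/off() pair
        self._t0 = None

    def on(self):
        # Start the chronometer.
        self._t0 = time.perf_counter()

    def off(self):
        # Stop the chronometer and record the elapsed duration in seconds.
        if self._t0 is None:
            raise RuntimeError("off() called before on()")
        self.time_list.append(time.perf_counter() - self._t0)
        self._t0 = None

    def cumulated_time(self):
        return sum(self.time_list)

    def computation_costs(self):
        # seconds * (kWh per second) * (gCO2eq per kWh) = gCO2eq
        return self.cumulated_time() * self.hardware_load * self.carbon_intensity


chrono = ChronoSketch()
for _ in range(3):
    chrono.on()
    sum(i * i for i in range(10_000))  # placeholder for one training iteration
    chrono.off()

print(len(chrono.time_list), "iterations timed")
print(f"{chrono.computation_costs():.3e} gCO2eq")
```

Wrapping only the compute step in on()/off() keeps data loading and logging out of the measurement; whether that is the right boundary depends on what you want to report.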

Measure cost of communications.

  • Each time your model sends a data file to another node of the network, record the size of the transferred file (in kilobytes) using cumulator.data_transferred(file_size). Each size is automatically recorded in cumulator.file_size_list and accumulated in cumulator.cumulated_data_traffic. cumulator.communication_costs() then returns the carbon footprint of all recorded communications.
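As a rough illustration of the communication estimate, the sketch below converts a list of transfer sizes in kilobytes into gCO2eq using the default values from this README. The helper names size_in_kb and communication_costs_g are hypothetical and written for this example, and 1 kB is assumed to be 1000 bytes.

```python
ENERGY_PER_KB_KWH = 6.894e-8  # kWh per kB, default quoted in this README
CARBON_INTENSITY = 447        # gCO2eq per kWh, default quoted in this README


def size_in_kb(payload: bytes) -> float:
    """Size of a serialized payload in kilobytes (1 kB = 1000 bytes, an assumption)."""
    return len(payload) / 1000


def communication_costs_g(file_sizes_kb):
    """Footprint in gCO2eq of all recorded transfers: kB * kWh/kB * gCO2eq/kWh."""
    return sum(file_sizes_kb) * ENERGY_PER_KB_KWH * CARBON_INTENSITY


file_size_list = []
for round_idx in range(5):
    payload = bytes(2_000_000)  # placeholder for 2 MB of model weights sent per round
    file_size_list.append(size_in_kb(payload))

print(sum(file_size_list), "kB transferred")
print(f"{communication_costs_g(file_size_list):.3e} gCO2eq")
```

With the real package, the value appended to the list here is what you would pass to cumulator.data_transferred(file_size).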

Display your total carbon footprint

  • Display the carbon footprint of your recorded actions with cumulator.display_carbon_footprint():
>>> cumulator.display_carbon_footprint()
########
Overall carbon footprint: 3.14e+02 gCO2eq
########
Carbon footprint due to computations: 2.78e+02 gCO2eq
Carbon footprint due to communications: 3.60e+01 gCO2eq
  • You can also return the total carbon footprint as a number using cumulator.total_carbon_footprint().

Default assumptions (can be manually modified for better estimation):

self.hardware_load = 250 / 3.6e6 <- computation costs: power consumption of a typical GPU in Watts converted to kWh/s

self.one_byte_model = 6.894E-8 <- communication costs: average energy impact of traffic in a typical data center, kWh/kB

self.carbon_intensity = 447 <- conversion to carbon footprint: average carbon intensity value in gCO2eq/kWh in the EU in 2014

self.n_gpu = 1 <- number of GPUs used in parallel
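To see how much these assumptions matter, the sketch below recomputes an estimate with overridden values. estimate_footprint is a hypothetical helper written for this example, not part of the package. It assumes that the computation cost scales linearly with the number of GPUs, which this README does not state explicitly, and the custom values (300 W, 4 GPUs, 100 gCO2eq/kWh) are made up for illustration.

```python
def estimate_footprint(compute_seconds, traffic_kb, *, power_w=250.0, n_gpu=1,
                       energy_per_kb=6.894e-8, carbon_intensity=447.0):
    """Return computation, communication and total footprint in gCO2eq.

    Assumption: computation cost scales linearly with n_gpu.
    """
    hardware_load = power_w / 3.6e6  # watts converted to kWh per second
    computation = compute_seconds * hardware_load * n_gpu * carbon_intensity
    communication = traffic_kb * energy_per_kb * carbon_intensity
    return {
        "computation": computation,
        "communication": communication,
        "total": computation + communication,
    }


# One hour of compute and 1e6 kB of traffic with the README defaults.
default = estimate_footprint(3600, 1e6)

# Same workload on four 300 W GPUs in a lower-carbon grid (illustrative values).
custom = estimate_footprint(3600, 1e6, power_w=300, n_gpu=4, carbon_intensity=100)

for name, result in (("default", default), ("custom", custom)):
    print(f"{name}: {result['total']:.1f} gCO2eq")
```

In this example the lower carbon intensity more than offsets the extra hardware, which is why replacing the defaults with values for your own hardware and grid tends to give a better estimate.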

Project Structure

src/
└── cumulator
    ├── base.py           <- implementation of the Cumulator class
    └── bonus.py          <- Impact Statement Protocol

Cite

@article{cumulator,
  title={A tool to quantify and report the carbon footprint of machine learning computations and communication in academia and healthcare},
  author={Tristan Trebaol and Mary-Anne Hartley and Martin Jaggi and Hossein Shokri Ghadikolaei},
  journal={Infoscience EPFL: record 278189},
  year={2020}
}

ChangeLog

  • 18.06.2020: 0.0.6 update README.rst
  • 11.06.2020: 0.0.5 add number of processors (0.0.4 failed)
  • 08.06.2020: 0.0.3 added bonus.py carbon impact statement
  • 07.06.2020: 0.0.2 added communication costs and cleaned src/
  • 21.05.2020: 0.0.1 deployment on PyPI and integration with Alg-E
