A comparative study of AI and human programming on environmental sustainability
Abstract:
Despite rising concerns over AI’s environmental impact, a recent article claimed that human writers emit 130 to 1,500 times more greenhouse gases than AI. While that study used life cycle assessment methodology, it overlooked a critical factor: output quality. Unlike AI, which generates content almost instantly, human writers work for months and produce superior results. To provide a more objective comparison, we analyzed the environmental impacts of human and AI programmers producing functionally equivalent code. Using the USA Computing Olympiad problem database, we developed infrastructure to evaluate multiple GPT-based models. To our knowledge, this is the first study to control for correctness while quantitatively assessing AI’s environmental impact in code generation. To address AI’s inaccuracies, we built a multi-round correction process that iteratively fixes incorrect responses. We calculated AI emissions from both usage and embodied impacts, while human emissions were estimated from average computing power consumption. Our case study shows that smaller models can match the environmental impact of human programmers when they succeed, though they often fail. Standard, widely used models, however, are far more environmentally costly: GPT-4, for example, emitted between 5 and 19 times more $\text{CO}_2$eq than humans, underscoring a much greater trade-off between efficiency and environmental cost than previously understood.
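The comparison described above can be sketched as a simple accounting model: AI emissions combine per-query usage energy with amortized embodied (hardware manufacturing) impacts, scaled by the number of correction rounds needed to reach a correct solution, while human emissions come from the computing power a programmer draws over the working time. The function names and all numeric parameters below are hypothetical placeholders for illustration, not values measured in the study.

```python
def ai_emissions_gco2eq(energy_per_query_kwh: float,
                        grid_intensity_g_per_kwh: float,
                        embodied_g_per_query: float,
                        attempts: int) -> float:
    """Usage emissions plus amortized embodied emissions,
    scaled by the number of correction rounds needed."""
    usage = energy_per_query_kwh * grid_intensity_g_per_kwh
    return attempts * (usage + embodied_g_per_query)


def human_emissions_gco2eq(hours: float,
                           workstation_power_kw: float,
                           grid_intensity_g_per_kwh: float) -> float:
    """Emissions from average computing power drawn while a
    programmer works on the same task."""
    return hours * workstation_power_kw * grid_intensity_g_per_kwh


# Hypothetical numbers for illustration only.
ai = ai_emissions_gco2eq(0.005, 400.0, 0.5, attempts=3)  # 3 correction rounds
human = human_emissions_gco2eq(2.0, 0.1, 400.0)          # 2 h at 100 W
print(f"AI: {ai:.1f} g CO2eq, human: {human:.1f} g CO2eq")
```

Under these placeholder inputs the AI side comes out far lower; the abstract's GPT-4 result shows the ratio inverts once real per-query energy, embodied impacts, and failure-driven retries for large models are plugged in.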