
Benchmarks in Design Research

Design research has come a long way over several decades of development. Nevertheless, there is still room for improvement, and such improvements must be well documented and fairly evaluated. The time for claims like “results seem to be better than existing ones from the literature” is over. We are responsible for making quantitative judgments and fair comparisons to the state of the art.

We may think that the validation of design research is much more complicated than the validation of computer science research. Still, arguments such as "we would need several years to get evidence", "our design situation is really special", or "we cannot replicate the simulation results as we do not have the computing resources" are no different from those heard in other disciplines. For instance, climate research, a discipline that requires far more computing resources to answer research questions about highly complex situations over very long periods, created the Coupled Model Intercomparison Project 6 (CMIP6), an internationally adopted shared infrastructure that provides benchmarks against which to compare improvements in models and prediction quality. The same holds when researchers argue that "the influence of designers' knowledge is too important to get comparable results": disciplines like Human-Computer Interaction face the same challenges but strive to adopt the scientific method.

This version-controlled and community-based open science platform is a collaborative space gathering benchmarks of engineering design theories, processes, methods and tools. Each benchmark is a sustainable ecosystem where a community of researchers can engage in an asynchronous collaboration for (1) the co-definition of fundamental and practical research problems, goals and solutions, (2) the fair and systematic evaluation of claimed contributions on open benchmark exercises following standardised measurement protocols, and (3) the comparison of competing solutions based on agreed-upon qualitative or quantitative measures of performance aligned with research goals.

What is a scientific benchmark?

The "What" and "Why" of Scientific Benchmarks are mostly extracts from the paper ''Using benchmarking to advance research: a challenge to software engineering'' (Sim, 2003).

Researchers commonly communicate their research results in papers, and a wide variety of papers often address the same research topic. Comparing research results by reading papers is becoming increasingly hard, because results cannot be compared meaningfully if the metrics, protocols, datasets and ground truth differ. Benchmarking in the form of organised challenges (the terminology differs per field: competitions, benchmarking, shared or common tasks, etc.) is one way to address this. In this case, a benchmark is a standardised validation framework that allows for the direct comparison of different solutions addressing the same research problem. Participants are invited to submit their solutions to a benchmark, after which their submissions are assessed using a predefined set of evaluation criteria.
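To make this concrete, here is a minimal, purely illustrative sketch of such a standardised comparison: every submission is run on the same exercise and scored with the same predefined metric, which is what makes the results directly comparable. The names, data and metric below are hypothetical assumptions for illustration only, not an API or exercise of this platform.

```python
# Hypothetical sketch: scoring several submissions on one shared benchmark
# exercise with a single predefined metric. Names and data are illustrative.

from typing import Callable, Dict, List

# A shared exercise: inputs and the agreed-upon ground truth (open data).
exercise_inputs: List[int] = [2, 7, 11, 15]
ground_truth: List[int] = [4, 14, 22, 30]

def score(predictions: List[int], truth: List[int]) -> float:
    """Predefined evaluation criterion: fraction of cases answered exactly right."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)

# Competing solutions submitted by different (fictional) research teams.
submissions: Dict[str, Callable[[int], int]] = {
    "team_a_doubling": lambda x: 2 * x,   # a correct approach
    "team_b_increment": lambda x: x + 2,  # a weaker approach
}

if __name__ == "__main__":
    # Every submission is evaluated on the same inputs with the same metric,
    # so the resulting scores can be compared directly and reproduced elsewhere.
    for name, solve in submissions.items():
        predictions = [solve(x) for x in exercise_inputs]
        print(f"{name}: score = {score(predictions, ground_truth):.2f}")
```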

A scientific benchmark is more than a dataset or set of datasets composed of tests and metrics used to compare the performance of alternative tools or techniques. A benchmark operationalises a paradigm; it takes an abstract concept and makes it concrete so it can serve as a guide for action. Indeed, within research communities, benchmarks are a statement of the discipline's research goals, and they emerge through a synergetic process of technical knowledge and social consensus proceeding in tandem. This community-based open science platform is concerned primarily with benchmarks created and used by a technical research community, especially the design research community.

Benchmarking has a strong positive effect on a discipline's scientific maturity. It helps whenever a research area needs to become more scientific, codify technical knowledge, or become more cohesive. Appropriately deployed, benchmarking is not about winning a contest but about surveying a landscape: the more we can reframe, contextualise, and appropriately scope the datasets, the more informative they become.

Scientific benchmarks emerge through a process of scientific discovery and consensus. Both must progress together for a standard benchmark to emerge because neither alone is sufficient. The community of interest may include academia, industry, and government participants, but they are all primarily interested in scientific research. The benchmark should be specified at a high enough level of abstraction to ensure that it is portable to different tools or techniques and does not bias one technology in favour of others. Continued evolution of the benchmark is necessary to prevent researchers from making changes to optimise the performance of their contributions on a particular set of tests.

Motivations for Engaging in Design Research Benchmarking

Over the past decades, our community of researchers in engineering design has reached a general consensus:

"As foundation of good scientific practise, manuscripts submitted for publication must advance the state of the art and provide novel theoretical, numerical or mechanical insight and knowledge. Sometimes, good ideas turn out to be not as good as initially expected, i.e., the initial research hypotheses or promises turn out not to hold. This is no shame and in fact learning from other researches’ mistakes or failed attempts may be just as fruitful as learning from successes and may save a lot of time. Unfortunately, however, many manuscripts set out with high goals and claims but fail to critically evaluate the outcome at the end." [Sigmund, 2022]
"Real progress on evaluating design methods can only be expected if preconditions such as standardized theoretical constructs, measures, data bases of empirical data, and a sufficient number of studies on specific design methods are developed." [Hein and Lamé, 2020]
"Without action to increase scientific, theoretical, and methodological rigour there is a real possibility of the field being superseded and becoming obsolete through lack of impact." [Cash, 2018]
"There is this concern that design research does not live up to the standards of science: it is creating in a sense too many theories and models, which jeopardises the coherence of the discipline and which indicates that design research does not yet have the means to test and refute design theories and models." [Vermaas, 2014]
"There is in design research a general concern about the quality of the testing of design theories and models. In work reflecting on the results that design research has produced, it is complained that generally accepted and effective research methods for testing design theories and models are lacking in design research, and that the discipline is fragmented in separate research strands.” [Vermaas, 2014]
“37% of the articles reviewed did not have any validation. There needs to be more validation in the field of research in engineering design.” [Barth, A. et al. 2011]
“A lack of common terminology, benchmarked research methods, and above all, a common research methodology are the most outstanding problems in the field.” [Blessing and Chakrabarti, 2009]

Two conditions must already exist within a discipline before the construction of a benchmark can be fruitfully attempted, and design research has reached that stage. The first condition is that the discipline is established and that diverse approaches and solutions proliferate. This proliferation is desirable, since it provides various tools and techniques for the benchmarks to compare. Evidence that our community has reached the required level of maturity and is ready to move to a more rigorous scientific basis comes in many forms. Typical symptoms include an increasing concern with the validation of research results and with comparisons between solutions developed at different laboratories, attempted replication of results, use of proto-benchmarks (or at least attempts to apply solutions to a common set of sample problems), and, finally, increasing resistance to accepting speculative papers for publication. This precondition is important because there is a significant cost to developing and maintaining scientific benchmarks and a danger in committing to a benchmark too early.

Scientific benchmarks advance a discipline by improving the science and increasing the cohesiveness of the community. The design research community is sufficiently well-established and has a culture of collaboration. Evidence of the former includes an existing collection of diverse research results and an increasing concern with validating these results. Evidence of the latter includes multi-site research projects, multi-author publications, standards for reporting, file formats, and the like. From this base, a consensus-based process led by researchers can be used to construct benchmarks endorsed by the design research community. Using benchmarks results in a more rigorous examination of research contributions and an overall improvement in the tools and techniques being developed. The presence of benchmarks signals that the design research community believes contributions ought to be evaluated against clearly defined standards. We want the design research community to become more scientific and cohesive by working as a community to define benchmarks that advance the state of research.

The benchmark itself promotes collaborative, open, and public research. Creating a benchmark requires our community to examine our understanding of the field, agree on the key problems, and encapsulate this knowledge in an evaluation. Throughout the benchmarking process, greater communication and collaboration among different researchers lead to a stronger consensus on the community's research goals and methods.

Although some research (e.g., computer science research) is more obviously amenable to benchmarking because its performance measures are straightforward, no dataset will ever capture the full complexity of real design situations.

Benchmarking is far superior to merely asserting that a design theory, process, method or technology is valuable. No research method or empirical evaluation is perfect. Benchmarks are one of the few ways that the dirty details of research, such as debugging techniques, design decisions, and mistakes, are forced out into the open and shared between laboratories. Like experiments, control of the task sample reduces variability in the results—all tools and techniques are evaluated using the same tasks and experimental materials. Another advantage of benchmarking is that replication is built into the method. Since the materials are designed to be used in different laboratories, people can perform the evaluation on various tools and techniques repeatedly, if desired.

The second precondition is that there must be an ethos of collaboration within the community. In other words, there must be a willingness to work together to solve common problems.

Why should we collaborate and open our research data?

Design research is interdisciplinary, and using multiple research methods is difficult. Literature reviews have drawn up extensive lists of research methods (Barth et al., 2011; Escudero-Mancebo et al., 2023) and design research objectives (Eckert et al., 2004; Cantamessa, 2003). When mixing research methods from multiple research areas, many challenges can arise due to the individual research cultures of each discipline involved.

Collaboration in benchmarking occurs in two ways. During development, researchers work together to build consensus on what should be in the benchmark. During deployment, the results from different design philosophies, processes, methods, [...] and tools are compared, which requires researchers to look at each other’s contributions. Consequently, researchers become more aware of one another's work and ties between researchers with similar interests are strengthened. Evaluations carried out using benchmarks are, by their nature, open and public. The materials are available for general use, and often so are the results being tested. It is difficult to hide the flaws of a tool or technique or to aggrandise its strengths when there is transparency in the test procedures. Moreover, anyone could use the benchmark with the same tools or techniques and attempt to replicate the results. Together with collaboration, openness, and publicness, these factors result in frank, detailed, and technical communication among researchers. This kind of public evaluation contrasts sharply with the descriptions of tools and techniques currently found in design research conferences or journal publications.

Publishing or making our data available to others is not yet standard practice. As measured by the French Open Science Barometer, researchers in design science keep their results more confidential than those in other disciplines: between 2013 and 2021, only 10% of French publications in engineering mentioned sharing their data and 15% included a "Data Availability Statement", whereas 86% mentioned the use of data. This lack of openness is all the more regrettable given that opening up data forces researchers to guarantee data quality. We may assume that this is mainly because we are primarily focused on obtaining grant money, and because the influence of outside sponsors, such as industrial partners, limits the openness of research data. Still, it is necessary to open our research data to a scientific community that examines the same research question from multiple angles over time, because no single data collection effort can lead to a definitive answer. Research methods and results should be well documented, in enough detail that other teams can attempt to reproduce or replicate the findings and expand upon them. If those teams reach the same general results over time, all of these efforts provide evidence for the scientific truth of the findings. Benchmarks are an opportunity to share open data that serves as ground truth.

Goals

What does this project provide?

This open science project was created and is being developed to help the engineering design research community make progress.

News

What's new?

  • 20/05/2024 - 18th International Design Conference "Operationalizing Community-Based Open Scientific Design Research Benchmarks: Application to Model-Based Architecture Design Synthesis". Slides Paper
  • 04/04/2023 - Facilitation of the workshop "Co-design of a community-based ecosystem to improve validation practices in engineering research", S.mart Special Interest Group in Industry 4.0. Slides
  • 08/12/2022 - Benchmarking in design research workshop at the Academia-Industry forum of the INCOSE French chapter (AFIS). Slides
  • 21/06/2022 - Meeting of French academics whose research concentrates on systems engineering. Some of the researchers decided to start two working groups to develop two benchmarks: 1) Model-based system architecture synthesis; 2) Early validation and verification of systems. Lettre
  • 28/03/2022 - 32nd CIRP Design Conference "An Open Science Platform for Benchmarking Engineering Design Researches." Slides Paper
  • 31/03/2021 - S.mart workshop "Validation de nos recherches en Génie Industriel : Co-Construction d'une Feuille de Route." Dashboard Notes
  • 28/01/2021 - S.mart webinar "Méthodologies de recherche sur l'industrie du futur : Pourquoi et Comment ?" Replay Slides Notes

Contribution Process

Want to get involved? Follow the contributing guide!

Code of Conduct

This code of conduct outlines expectations for participation in this Open Source Benchmarking Environment for Engineering Research. By joining our community, you pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community by:

  • Being radically inclusive to existing members and newcomers looking to learn or participate.
  • Being totally respectful of each other's abilities, interests, viewpoints, experiences and personal differences.
  • Gracefully accepting constructive criticism and being exceedingly kind even in moments of disagreement while working towards consensus.
  • Educating and illuminating others with something you know more about.
  • Contacting the original contributors before any external communication.
  • Refraining from pursuing any public or private business opportunities based on the open source content without the agreement of the original authors and contributors.

People violating this code of conduct may be banned from the community.

Benchmarks

The open science benchmarking environment contains a set of benchmarks that aim at making technical progress objective and reproducible.

Benchmark methods and tools for the early validation and verification of engineered systems.

Keywords: MBSE, Validation, Verification

Discussions - Open Issues

Benchmark methods and tools for 3D modelling in virtual reality.

Keywords: Virtual Reality, Geometric Modelling, CAD

Discussions - Open Issues

Identify the most suitable Life Cycle Assessment method for a teaching population.

Keywords: Life cycle analysis, Sustainability, Competencies Evaluation

Discussions - Open Issues

Benchmark to compare different approaches for measuring the value perceived by the stakeholders: ecosystems, territorial approaches, value analysis approaches, etc.

Keywords: Value, Sustainability, Stakeholders

Discussions - Open Issues

Open benchmark exercises for comparing digital materials supporting model-based design reviews.

Keywords: Model-Based Design, Design review

Discussions - Open Issues

Open benchmark exercises for comparing concept finding in a model-based design synthesis process for system sizing.

Keywords: Model-Based Design Synthesis, Concept Finding, System Sizing

Discussions - Open Issues

Open benchmark exercises to compare and study the performance of approaches for automatically inferring a transformation model.

Keywords: Model-Based Systems Engineering, Interoperability, Data Transformation, Inference

Discussions - Open Issues

Project Lead

Project Team

These are the original creators of this project. Want to contact the Core Team? Send an e-mail to all of them!

Other Participants

Volunteers run this open-science repository. Below is a list of volunteers who have expressed an interest in this project.

If you want to be an active member of our community, open a new issue.

Related Projects

What are the existing sources that have inspired this project?

Related Papers

Want to learn more about the validation in engineering design research? Here is a list of sources to start with!

Disclaimer

Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by our community or S.mart.

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Sponsors and Partners
