FAIR4ML Metadata Schema
Release date: 2024.11.04
Version: 0.1.0
Status: Draft (under review)
This version URI: https://w3id.org/fair4ml/0.0.1
Latest version URI: https://w3id.org/fair4ml#
Authors (in alphabetical order). The list of authors is not final! Please contribute to the discussion in our GitHub repository or discussion spreadsheet.
- Leyla-Jael Castro, ZB MED
- Daniel Garijo, Universidad Politécnica de Madrid
- Dietrich Rebholz-Schuhmann, ZB MED
- Dhwani Solanki, ZB MED
- Jenifer Tabita Ciuciu-Kiss, Universidad Politécnica de Madrid
- Research Data Alliance FAIR4ML Task Force
License:
Download:
Introduction
An increasing number of machine learning (ML) models are produced and shared on the Web by research scientists, ML enthusiasts, and ML developers. This document introduces a Schema.org extension for creating machine-readable representations of trained ML models. The proposed vocabulary also reuses properties from codemeta in order to point to the code repository associated with a model. The figure below shows a high-level overview of the main metadata fields used to describe an ML model.
[Figure: high-level overview of the main metadata fields used to describe an ML model]
Namespaces used in this document
- rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
- rdfs: http://www.w3.org/2000/01/rdf-schema#
- owl: http://www.w3.org/2002/07/owl#
- schema: http://schema.org/
- codemeta: https://w3id.org/codemeta/
- fair4ml: https://w3id.org/fair4ml/
- cr: http://mlcommons.org/croissant/1.0
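The prefixes listed above could be collected into a JSON-LD `@context` so that compact names such as `fair4ml:mlTask` expand to full IRIs. The following Python sketch is illustrative only: the `expand` helper is a hypothetical name introduced here, and the prefix IRIs are copied verbatim from the list. A real application would normally delegate expansion to a JSON-LD processor.

```python
import json

# Prefix-to-IRI mapping copied from the namespace list in this document.
CONTEXT = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "schema": "http://schema.org/",
    "codemeta": "https://w3id.org/codemeta/",
    "fair4ml": "https://w3id.org/fair4ml/",
    "cr": "http://mlcommons.org/croissant/1.0",
}


def expand(compact_name: str) -> str:
    """Expand a compact name such as 'fair4ml:mlTask' to a full IRI.

    Hypothetical helper for illustration: it only concatenates the
    prefix IRI and the local name.
    """
    prefix, _, local = compact_name.partition(":")
    if prefix not in CONTEXT or not local:
        raise ValueError(f"unknown or malformed compact name: {compact_name!r}")
    return CONTEXT[prefix] + local


if __name__ == "__main__":
    # Print the context as it could appear at the top of a JSON-LD file.
    print(json.dumps({"@context": CONTEXT}, indent=2))
    print(expand("fair4ml:mlTask"))
```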
Extending Schema.org hierarchy
This Profile extends the Schema.org hierarchy as follows:
schema:Thing > schema:CreativeWork > fair4ml:MLModel
fair4ml:MLModel new properties
Property | Expected Type | Description |
---|---|---|
fair4ml:legal | schema:Text | Considerations with respect to legal aspects. |
fair4ml:ethicalSocial | schema:Text | Considerations with respect to ethical and social aspects. |
fair4ml:evaluatedOn | schema:Dataset, cr:Dataset | Dataset used for evaluating the model. The dataset used for evaluation may not have been part of the train/test/validation splits (e.g., a benchmark, extrinsic validation). |
fair4ml:fineTunedFrom | fair4ml:MLModel | Relationship to point to the source model used for fine-tuning (if this model was fine-tuned from another one). |
fair4ml:hasCO2eEmissions | schema:Text | Amount of CO2 equivalent emissions produced by training the model. The unit should be included in the field (e.g., 10 tonnes). |
fair4ml:intendedUse | schema:Text, schema:DefinedTerm, schema:URL | Purpose and intended use, stated to enable users to decide whether this creative work (e.g., lab protocol, machine learning model, software) suits their experimental problem or own use case. |
fair4ml:mlTask | schema:Text, schema:DefinedTerm | ML task addressed by this ML software or model (e.g., binary classification). |
fair4ml:modelCategory | schema:Text, schema:DefinedTerm | Category of this ML model (e.g., Supervised, Unsupervised, Semi-supervised, Reinforcement), its learning architecture (e.g., CNN, transformer), or its underlying algorithm (e.g., logistic regression, random forest, LLM). |
fair4ml:modelRisksBiasLimitations | schema:Text | Description of the risks, biases, and limitations of the model, in a human-readable manner. |
fair4ml:sharedBy | schema:Person, schema:Organization | Person or Organization who shared the model online (e.g., uploading it to HuggingFace). |
fair4ml:testedOn | schema:Dataset, cr:Dataset | Link to the dataset used to test the model (following train/test/validation splits). |
fair4ml:trainedOn | schema:Dataset, cr:Dataset | AI-ready dataset (after pre-processing) used for the training and optimization of this ML model. |
fair4ml:usageInstructions | schema:Text | Description of the instructions needed to run the model (e.g., to do inference on a task). Code snippets may be used for illustration. |
fair4ml:codeSampleSnippet | schema:Text | Code snippet with a usage example of the model. |
fair4ml:validatedOn | schema:Dataset, cr:Dataset | Link to the dataset used to validate the model. Typically the validation dataset is separate from the training and test sets. |
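To make the property list above more concrete, the following Python sketch assembles a JSON-LD description of a model using several of the fair4ml:MLModel properties. It is an illustration under stated assumptions, not a normative example: every URL, name, and value is a placeholder under `example.org`, and `dataset_ref` is a hypothetical helper.

```python
import json


def dataset_ref(url: str, dataset_type: str = "schema:Dataset") -> dict:
    """Build a reference to a dataset node (hypothetical helper)."""
    return {"@type": dataset_type, "@id": url}


# All identifiers and values below are placeholders for illustration.
model = {
    "@context": {
        "schema": "http://schema.org/",
        "fair4ml": "https://w3id.org/fair4ml/",
    },
    "@type": "fair4ml:MLModel",
    "@id": "https://example.org/models/sentiment-classifier",
    "schema:name": "Example sentiment classifier",
    "fair4ml:mlTask": "binary classification",
    "fair4ml:modelCategory": ["Supervised", "transformer"],
    "fair4ml:intendedUse": "Classifying short product reviews as positive or negative.",
    # Points to the source model this one was fine-tuned from.
    "fair4ml:fineTunedFrom": {
        "@type": "fair4ml:MLModel",
        "@id": "https://example.org/models/base-language-model",
    },
    # Separate datasets for the train/validation/test splits.
    "fair4ml:trainedOn": dataset_ref("https://example.org/datasets/reviews-train"),
    "fair4ml:validatedOn": dataset_ref("https://example.org/datasets/reviews-validation"),
    "fair4ml:testedOn": dataset_ref("https://example.org/datasets/reviews-test"),
    # The unit is included in the text value, as the description asks.
    "fair4ml:hasCO2eEmissions": "10 kg",
    "fair4ml:modelRisksBiasLimitations": "Trained on English reviews only.",
}

if __name__ == "__main__":
    print(json.dumps(model, indent=2))
```

Referring to datasets by `@id` rather than embedding them keeps the model record small; whether to embed or reference is a design choice the schema leaves open.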
fair4ml:MLModelEvaluation new properties
Property | Expected Type | Description |
---|---|---|
fair4ml:hasEvaluation | fair4ml:MLModelEvaluation | Relationship to point to an evaluation of this model. |
fair4ml:evaluatedMLModel | fair4ml:MLModel | MLModel evaluated with this evaluation. Reverse property: fair4ml:evaluatedWith. |
fair4ml:evaluationDataset | cr:Dataset | Dataset used for the evaluation. |
fair4ml:evaluationMetrics | schema:Text, schema:PropertyValue | Description of the metrics used for evaluating the ML model. Example with Text: ["Precision: 0.8", "Mean: 0.9"]. Example with PropertyValue: [{minValue: 0.0, maxValue: 1.0, value: 0.8, measurementTechnique: "Precision"} ...] |
fair4ml:evaluationResults | schema:Text | Summary of the results from the evaluation. |
fair4ml:evaluationSoftware | schema:SoftwareSourceCode, schema:SoftwareApplication | Code used to perform the evaluation. |
fair4ml:extrinsicEvaluation | schema:Boolean | Indicates whether this evaluation is extrinsic, i.e., done with an existing model, outside the model training scope, and with a totally unseen dataset. It could be done by third parties or by the authors of the model. |
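The evaluation properties above can be sketched in Python as a JSON-LD record, here using the PropertyValue form of fair4ml:evaluationMetrics from the table. This is a minimal sketch under assumptions: the identifiers are placeholders, the `schema:` prefixes on the PropertyValue fields are this example's choice, and `metric_in_range` is a hypothetical sanity check that the schema itself does not define.

```python
import json

# All identifiers and values below are placeholders for illustration.
evaluation = {
    "@context": {
        "schema": "http://schema.org/",
        "fair4ml": "https://w3id.org/fair4ml/",
        "cr": "http://mlcommons.org/croissant/1.0",
    },
    "@type": "fair4ml:MLModelEvaluation",
    "@id": "https://example.org/evaluations/1",
    "fair4ml:evaluatedMLModel": {
        "@id": "https://example.org/models/sentiment-classifier"
    },
    "fair4ml:evaluationDataset": {
        "@type": "cr:Dataset",
        "@id": "https://example.org/datasets/benchmark",
    },
    "fair4ml:evaluationMetrics": [
        {
            "@type": "schema:PropertyValue",
            "schema:measurementTechnique": "Precision",
            "schema:minValue": 0.0,
            "schema:maxValue": 1.0,
            "schema:value": 0.8,
        }
    ],
    "fair4ml:evaluationResults": "Precision of 0.8 on an unseen benchmark.",
    "fair4ml:extrinsicEvaluation": True,
}


def metric_in_range(metric: dict) -> bool:
    """Check that a metric value lies within its declared bounds."""
    low = metric["schema:minValue"]
    high = metric["schema:maxValue"]
    return low <= metric["schema:value"] <= high


if __name__ == "__main__":
    for metric in evaluation["fair4ml:evaluationMetrics"]:
        print(metric["schema:measurementTechnique"], metric_in_range(metric))
    print(json.dumps(evaluation, indent=2))
```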
Schema.org inherited Properties
Property | Expected Type | Description |
---|---|---|
schema:archivedAt | schema:URL, schema:WebPage | Indicates a page or other link involved in archival of a CreativeWork. In the case of MediaReview, the items in a MediaReviewItem may often become inaccessible, but be archived by archival, journalistic, activist, or law enforcement organizations. In such cases, the referenced page may not directly publish the content. |
schema:author | schema:Person, schema:Organization | The author of this content or rating. Please note that author is special in that HTML 5 provides a special mechanism for indicating authorship via the rel tag. That is equivalent to this and may be used interchangeably. |
schema:citation | schema:Text, schema:CreativeWork | A citation or reference to another creative work, such as another publication, web page, scholarly article, etc. |
schema:codeRepository | schema:URL | Link to the repository where the un-compiled, human readable code and related code is located (SVN, GitHub, CodePlex). |
schema:conditionsOfAccess | schema:Text | Conditions that affect the availability of, or method(s) of access to, an item. Typically used for real world items such as an ArchiveComponent held by an ArchiveOrganization. This property is not suitable for use as a general Web access control mechanism. It is expressed only in natural language. For example "Available by appointment from the Reading Room" or "Accessible only from logged-in accounts". |
schema:contributor | schema:Organization, schema:Person | A secondary contributor to the CreativeWork or Event. |
schema:copyrightHolder | schema:Organization, schema:Person | The party holding the legal copyright to the CreativeWork. |
schema:dateCreated | schema:Date, schema:DateTime | The date on which the CreativeWork was created or the item was added to a DataFeed. |
schema:dateModified | schema:Date, schema:DateTime | The date on which the CreativeWork was most recently modified or when the item's entry was modified within a DataFeed. |
schema:datePublished | schema:Date, schema:DateTime | Date of first publication or broadcast. For example the date a CreativeWork was broadcast or a Certification was issued. |
schema:description | schema:TextObject, schema:Text | A description of the item. |
schema:discussionUrl | schema:URL | A link to the page containing the comments of the CreativeWork. |
schema:distribution | schema:DataDownload | A downloadable form of this dataset, at a specific location, in a specific format. This property can be repeated if different variations are available. There is no expectation that different downloadable distributions must contain exactly equivalent information (see also [DCAT](https://www.w3.org/TR/vocab-dcat-3/#Class:Distribution) on this point). Different distributions might include or exclude different subsets of the entire dataset, for example. |
schema:funding | schema:Grant | A Grant that directly or indirectly provides funding or sponsorship for this item. See also ownershipFundingInfo. |
schema:identifier | schema:Text, schema:URL, schema:PropertyValue | The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See [background notes](https://schema.org/docs/datamodel.html#identifierBg) for more details. |
schema:inLanguage | schema:Text, schema:Language | The language of the content or performance or used in an action. Please use one of the language codes from the [IETF BCP 47 standard](http://tools.ietf.org/html/bcp47). See also availableLanguage. |
schema:isAccessibleForFree | schema:Boolean | A flag to signal that the item, event, or place is accessible for free. |
schema:keywords | schema:Text, schema:URL, schema:DefinedTerm | Keywords or tags used to describe some item. Multiple textual entries in a keywords list are typically delimited by commas, or by repeating the property. |
schema:license | schema:URL, schema:CreativeWork | A license document that applies to this content, typically indicated by URL. |
schema:maintainer | schema:Organization, schema:Person | A maintainer of a Dataset, software package (SoftwareApplication), or other Project. A maintainer is a Person or Organization that manages contributions to, and/or publication of, some (typically complex) artifact. It is common for distributions of software and data to be based on "upstream" sources. When maintainer is applied to a specific version of something e.g. a particular version or packaging of a Dataset, it is always possible that the upstream source has a different maintainer. The isBasedOn property can be used to indicate such relationships between datasets to make the different maintenance roles clear. Similarly in the case of software, a package may have dedicated maintainers working on integration into software distributions such as Ubuntu, as well as upstream maintainers of the underlying work. |
schema:memoryRequirements | schema:Text, schema:URL | Minimum memory requirements. |
schema:name | schema:Text | The name of the item. |
schema:operatingSystem | schema:Text | Operating systems supported (Windows 7, OS X 10.6, Android 1.6). |
schema:processorRequirements | schema:Text | Processor architecture required to run the application (e.g. IA64). |
schema:releaseNotes | schema:Text, schema:URL | Description of what changed in this version. |
schema:softwareHelp | schema:CreativeWork | Software application help. |
schema:softwareRequirements | schema:Text, schema:URL | Component dependency requirements for application. This includes runtime environments and shared libraries that are not included in the application distribution package, but required to run the application (examples: DirectX, Java or .NET runtime). |
schema:storageRequirements | schema:Text, schema:URL | Storage requirements (free space required). |
schema:url | schema:URL | URL of the item. |
schema:version | schema:Number, schema:Text | The version of the CreativeWork embodied by a specified resource. |
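Several inherited properties (schema:dateCreated, schema:dateModified, schema:datePublished) expect schema:Date or schema:DateTime values, which Schema.org defines in terms of ISO 8601. The following Python sketch shows one possible pre-publication check; `is_iso_date_or_datetime` and `invalid_dates` are hypothetical helper names, and Python's `fromisoformat` accepts only a subset of ISO 8601 on older interpreter versions, so this is an approximation rather than a full validator.

```python
from datetime import date, datetime

# Inherited properties from the table above that expect Date or DateTime.
DATE_PROPERTIES = (
    "schema:dateCreated",
    "schema:dateModified",
    "schema:datePublished",
)


def is_iso_date_or_datetime(value) -> bool:
    """Return True if value parses as an ISO 8601 date or date-time."""
    for parser in (date.fromisoformat, datetime.fromisoformat):
        try:
            parser(value)
            return True
        except (ValueError, TypeError):
            continue
    return False


def invalid_dates(record: dict) -> list:
    """List the date properties present in record whose values do not parse."""
    return [
        prop
        for prop in DATE_PROPERTIES
        if prop in record and not is_iso_date_or_datetime(record[prop])
    ]


if __name__ == "__main__":
    # Placeholder record: the last value uses dots and is therefore rejected.
    record = {
        "schema:dateCreated": "2024-11-04",
        "schema:dateModified": "2024-11-04T10:30:00",
        "schema:datePublished": "2024.11.04",
    }
    print(invalid_dates(record))
```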
Codemeta Properties
Property | Expected Type | Description |
---|---|---|
buildInstructions | schema:URL | Link to installation instructions/documentation. |
developmentStatus | schema:Text | Description of development status, e.g. Active, inactive, suspended. See repostatus.org. |
issueTracker | schema:URL | Link to software bug reporting or issue tracking system. |
readme | schema:URL | Link to software Readme file. |
referencePublication | schema:ScholarlyArticle | An academic publication related to the software. |
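As a rough illustration of how the codemeta properties above might sit alongside schema:codeRepository in a model description, the Python sketch below builds a small record and checks that the URL-typed values look like web links. The `codemeta:` prefix on the property names, the placeholder URLs, and the `looks_like_url` helper are assumptions of this example.

```python
from urllib.parse import urlparse

# Codemeta properties from the table above whose expected type is schema:URL.
URL_PROPERTIES = (
    "codemeta:buildInstructions",
    "codemeta:issueTracker",
    "codemeta:readme",
)

# All identifiers and values below are placeholders for illustration.
record = {
    "@context": {
        "schema": "http://schema.org/",
        "codemeta": "https://w3id.org/codemeta/",
        "fair4ml": "https://w3id.org/fair4ml/",
    },
    "@type": "fair4ml:MLModel",
    "schema:codeRepository": "https://example.org/repos/sentiment-classifier",
    "codemeta:buildInstructions": "https://example.org/repos/sentiment-classifier/INSTALL",
    "codemeta:issueTracker": "https://example.org/repos/sentiment-classifier/issues",
    "codemeta:readme": "https://example.org/repos/sentiment-classifier/README",
    "codemeta:developmentStatus": "Active",
}


def looks_like_url(value: str) -> bool:
    """Loose check: an http(s) scheme and a non-empty host."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def bad_url_properties(rec: dict) -> list:
    """List URL-typed properties present in rec whose values fail the check."""
    return [p for p in URL_PROPERTIES if p in rec and not looks_like_url(rec[p])]


if __name__ == "__main__":
    print(bad_url_properties(record))
```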
If you spot any errors or omissions, please file an issue in our GitHub.