# Thesauri and Ontologies

The use of controlled vocabularies with standardized, generally accepted and clearly assigned identifiers, definitions and semantics is necessary for smooth data integration and for the human- and machine-readable exchange, query and reproduction of information. Semantic annotation of data with controlled vocabularies is a precondition for the interoperability of data repositories. Standardized ontologies enable the integration of data and information into the Semantic Web.

To link data and ensure its reusability, it is important that thesauri are published online under an open license and provided with a URI (W3C, Best Practices for Publishing Linked Data, 2014). General guidance on developing and operating thesauri is given in ISO 25964 - Information and documentation – Thesauri and interoperability with other vocabularies, which is published in two parts:

  • Part 1:2011, Thesauri for information retrieval. This part provides recommendations for the development and maintenance of thesauri intended for information retrieval applications. It is applicable to vocabularies about all types of information resources, including knowledge bases and portals, bibliographic databases, text, etc. It provides a data model and a recommended format for the import and export of thesaurus data and can be applied to monolingual and multilingual thesauri. Based on the data model, it also includes an XML schema for data exchange.

  • Part 2:2013, Interoperability with other vocabularies. This part provides guidelines for high-quality information retrieval across networked resources that have been indexed with different vocabularies or Knowledge Organization Systems (KOS). It helps to set up mappings between concepts from different KOS (classification schemes, taxonomies, subject heading schemes, ontologies, name authority lists, terminologies and synonym rings).
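A mapping between two vocabularies, as addressed in Part 2, can be pictured as a list of concept pairs, each labeled with a mapping relation. The following Python sketch is illustrative only and not taken from the standard: the concept identifiers (`vocabA:maize` and so on) and the helper `translate_query` are hypothetical, and the SKOS mapping relation names are used merely as familiar labels.

```python
from collections import defaultdict

# SKOS mapping relation names, used here only as labels.
EXACT, CLOSE, BROAD = "skos:exactMatch", "skos:closeMatch", "skos:broadMatch"

# Hypothetical mappings: (source concept, relation, target concept).
MAPPINGS = [
    ("vocabA:maize", EXACT, "vocabB:corn"),
    ("vocabA:maize", BROAD, "vocabB:cereals"),
    ("vocabA:loam", CLOSE, "vocabB:loamy_soils"),
]


def build_index(mappings):
    """Index the mappings by source concept for fast lookup."""
    index = defaultdict(list)
    for source, relation, target in mappings:
        index[source].append((relation, target))
    return index


def translate_query(concept, index, accepted=(EXACT, CLOSE)):
    """Return target concepts that may stand in for `concept` in a search."""
    return [target for relation, target in index.get(concept, [])
            if relation in accepted]


index = build_index(MAPPINGS)
print(translate_query("vocabA:maize", index))
```

Restricting `accepted` to exact and close matches keeps a translated search precise; adding the broader relation would trade precision for recall.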


# Overview of controlled vocabularies in agricultural and environmental science

# AGROVOC

AGROVOC is a multilingual vocabulary developed by the FAO. It defines and relates more than 36,000 concepts in 33 languages. The thesaurus is updated on a monthly basis. It is published as an RDF/SKOS-XL concept scheme and is accessible as Linked Data via a SPARQL endpoint. The thesaurus is aligned with 16 other vocabularies related to agriculture. Registered users can edit it through the web-based editing tool VocBench 3. AGROVOC is released under the CC BY 3.0 IGO license and can be browsed with the web-based SKOS browser “Skosmos”. The Java command-line application “AgroTagger” assigns AGROVOC terms to textual content and serves as a keyword extractor for a set of web URLs. It indexes web documents by identifying their main topics and creating RDF triples that link a web URL to AGROVOC URIs.
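Access through a SPARQL endpoint can be sketched as follows. This is a minimal example that only builds the request URL and sends nothing over the network. The endpoint address is an assumption that should be checked against the AGROVOC documentation, and the function names `label_query` and `request_url` are hypothetical. The query relies on the SKOS-XL label structure mentioned above.

```python
from urllib.parse import urlencode

# Assumed endpoint address; verify against the AGROVOC documentation.
ENDPOINT = "https://agrovoc.fao.org/sparql"


def label_query(term, lang="en", limit=5):
    """Build a SPARQL query that finds concepts by SKOS-XL preferred label.

    `term` is inserted verbatim, so it must not contain double quotes.
    """
    return f"""
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
SELECT ?concept WHERE {{
  ?concept skosxl:prefLabel/skosxl:literalForm "{term}"@{lang} .
}} LIMIT {limit}
""".strip()


def request_url(query, endpoint=ENDPOINT):
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = request_url(label_query("maize"))
print(url)
```

The resulting URL could be fetched with any HTTP client; the SPARQL Protocol lets the client request a result format such as JSON through the `Accept` header.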


# GEMET (General Multilingual Environmental Thesaurus)

This thesaurus was developed by the European Environment Information and Observation Network (EIONET). It brings together several structured vocabularies and aims to define a common terminology for environmental terms in the European context. It is available in more than 27 languages and consists of more than 6,000 records. Metadata must contain at least one GEMET keyword to conform to the INSPIRE metadata schema for geospatial data (ISO 19115, ISO 19119, ISO 19139).
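The keyword requirement can be expressed as a simple check. The sketch below assumes a simplified, hypothetical dictionary representation of a metadata record; real INSPIRE metadata is XML, and the field names `keywords`, `term` and `thesaurus` are illustrative only.

```python
# Hypothetical, simplified metadata record; real INSPIRE metadata is XML.
record = {
    "title": "Topsoil organic carbon",
    "keywords": [
        {"term": "soil", "thesaurus": "GEMET - Concepts"},
        {"term": "carbon stock", "thesaurus": ""},
    ],
}


def gemet_keywords(record):
    """Return the keywords whose source thesaurus is named as GEMET."""
    return [kw["term"] for kw in record.get("keywords", [])
            if "GEMET" in kw.get("thesaurus", "")]


def has_required_keyword(record):
    """True if the record carries at least one GEMET keyword."""
    return len(gemet_keywords(record)) >= 1


print(has_required_keyword(record))
```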


# AgrO

The Agronomy Ontology (AgrO) is developed by CGIAR and describes agronomic practices, techniques and variables used in agronomic experiments. AgrO uses traits, ICASA variables, and other existing ontologies. In March 2019 it was in the alpha phase and not yet officially released.


# BCO

The application ontology Biological Collections Ontology (BCO) includes semantic relations on biodiversity, museum collections, environmental samples, and ecological surveys.


# CAB Thesaurus

This open-access, multilingual thesaurus is maintained by CAB International (CABI), a science-based organization, and includes almost 3 million terms from the world’s science and technical fields. It includes some 250,000 plant, animal and microorganism names.


# Crop Ontology

The Crop Ontology is a large database of ontologies of crops and crop-related terms, structured in the categories phenotype, breeding, germplasm and trait. Terms are defined by a unique combination of trait, method used and scale. It is open access and open to improvement by the crop community.
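The idea that a term is identified by a unique combination of trait, method and scale can be illustrated with a small data structure. This is a sketch under assumptions: the class `Variable`, the function `register`, the identifiers and the example values are hypothetical and not taken from the Crop Ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """A term identified by its trait, the method used and the scale."""
    trait: str
    method: str
    scale: str


# Maps each unique (trait, method, scale) combination to one identifier.
registry = {}


def register(variable, identifier):
    """Store a variable, refusing a second identifier for the same combination."""
    if variable in registry:
        raise ValueError(f"already registered as {registry[variable]}")
    registry[variable] = identifier


# Same trait and method, different scale: two distinct terms.
register(Variable("plant height", "ruler measurement", "cm"), "VAR:0001")
register(Variable("plant height", "ruler measurement", "m"), "VAR:0002")
print(len(registry))
```

A frozen dataclass is hashable and compares by value, so two entries with the same trait, method and scale are treated as the same term.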


# EUROVOC

This multilingual thesaurus is maintained by the Publications Office of the European Union for indexing documents of the European institutions. It is available in 24 languages.


# GCMD (Global Change Master Directory)

This directory was developed by NASA and can be integrated into a database as a thesaurus. Keywords are provided for different scientific disciplines such as agriculture, atmosphere and hydrology.


# ISO 11074:2015 (Soil quality, Vocabulary)

This standard summarizes all relevant terms of soil science in a glossary and is available in a trilingual edition. It defines a list of terms used in the preparation of other standards in the field of soil quality. The terms are classified under the following main headings: general terms, description of soil, sampling and assessment of soils, remediation, and soil ecotoxicology.


# NAL (National Agricultural Library)

The United States Department of Agriculture (USDA) produced this agricultural vocabulary. It contains more than 135,000 terms, is updated annually, is bilingual (English and Spanish), and is available as Linked Open Data. The download formats provided are XML and RDF-OLS. It is mainly used for indexing and for improving the retrieval of agricultural information.


# NCBI Taxon

The National Center for Biotechnology Information (NCBI) provides an extensive list of field crops, including codes.


# Ontology of Units of Measure and Related Concepts (OM), Version 2.0

This ontology models concepts and relations for units, quantities, measurements and dimensions, including conversion factors. It was developed at Wageningen University and is modelled in OWL 2.


# QUDT (Quantities, Units, Dimensions and Data Types Ontologies)

This ontology is under development by NASA and provides a first unified model of quantities, dimensions, units, and conversion factors. Each unit has its own URI and can thus be used as a unique unit identifier for data sets.
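Using unit URIs as identifiers in data set metadata might look like the following sketch. The column names are hypothetical, the two URIs follow the QUDT naming pattern but should be verified against the QUDT vocabulary before use, and `hasUnit` is merely an illustrative property name.

```python
# Column metadata for a hypothetical field-trial data set.
# The URIs follow the QUDT pattern; verify them before use.
COLUMNS = {
    "grain_yield": "http://qudt.org/vocab/unit/KiloGM-PER-HA",
    "soil_temp": "http://qudt.org/vocab/unit/DEG_C",
}


def same_unit(column_a, column_b, columns=COLUMNS):
    """Columns are directly comparable only if their unit URIs are identical."""
    return columns[column_a] == columns[column_b]


def describe(column, columns=COLUMNS):
    """Return a (column, property, unit URI) statement for a metadata record."""
    return (column, "hasUnit", columns[column])  # property name is illustrative


print(describe("grain_yield"))
print(same_unit("grain_yield", "soil_temp"))
```

Comparing URIs rather than free-text unit labels avoids ambiguity between spellings such as "kg/ha" and "kg ha-1".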



# Units Ontology (UO)


# AnaEE Thesaurus


# EngMath

This ontology was developed for mathematical modeling and is mostly used by engineers.


# EPPO Plant Protection Thesaurus

This European thesaurus includes pest-specific information as well as multilingual names and codes for plants, animals and microorganisms.


# GACS

The GACS project developed this semantic concept scheme, which integrates three important agricultural thesauri: the AGROVOC, CAB and NAL thesauri. GACS was planned to be a hub for all concepts and shared value lists related to agriculture. In November 2019 the participants agreed that GACS URIs should not be promoted for use, since GACS is not actively being maintained. The data dates from 2016 and will not be updated in the future.


# LandVoc

This vocabulary was created by the Land Portal organization and includes a set of 270 concepts about land governance. LandVoc is mainly derived from AGROVOC but also links to other vocabularies.


# PROV-O

This W3C ontology provides a set of classes, properties, and restrictions that can be used to represent and interchange provenance information generated in different systems and under different contexts. It can also be specialized to create new classes and properties to model provenance information for different applications and domains.


# Semantic Sensor Network Ontology

This ontology was developed by the W3C Semantic Sensor Networks Incubator Group and describes sensors, observations, and related concepts [1]. It provides numerous suggestions on the management of sensor data including metadata of sensor description (e.g. accuracy, detection limit).



# Overview of vocabulary portals, look-up services, registries, visualization, and gazetteers

# AgroPortal

This web portal provides access to agricultural ontologies and thesauri. Terms can be entered in a search field to look up concepts from more than 100 agricultural vocabularies. Registered users can submit new ontologies to this web service.


# GODAN Agrisemantics Map of Data Standards

This portal provides a global overview of existing vocabularies for the exchange of agricultural data. The vocabularies are grouped into 14 main categories, such as “Natural Resources, Earth and Environment”.


# GeoNames

This geographical database contains more than 11 million names of places.


# Ontobee

This ontology registry includes some 200 ontologies from the natural sciences, including agriculture. It dereferences ontology term URIs to HTML web pages (for user-friendly web browsing) and to RDF source code (for Semantic Web applications).


# Ontology Lookup Service (OLS)

OLS is a web service interface that allows queries against a registry of more than 200 ontologies. The lookup service includes more than 5 million terms. Results link to individual ontologies, concepts and output formats.


# Open Tree of Life

The Open Tree of Life was funded by the NSF. It describes and visualizes the biological taxonomic classification system and can be used to assign taxonomic species names and classes.


# Planteome

The project provides a platform to search and browse plant species, plant traits, phenotypes and gene expressions from different information systems.



# Overview of tools, specifications, initiatives, data models and ontology languages

# SKOS (Simple Knowledge Organization System)

SKOS is a W3C recommendation for the representation of thesauri, classification schemes and other controlled vocabularies. It gives guidelines to facilitate the publication and use of vocabularies as Linked Data. SKOS is part of the Semantic Web standards built upon RDF and RDFS. It was formally released by the W3C in 2009 as a new standard that connects different KOS and the Linked Data community. It defines classes and properties to represent the common features of a standard thesaurus.
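The common thesaurus features that SKOS covers (preferred labels, alternative labels and broader concepts) can be sketched with a small data class that writes a concept in Turtle syntax. This is an illustration, not a full SKOS implementation: the class `Concept`, the function `to_turtle` and the `example.org` URIs are hypothetical, and labels are assumed to contain no double quotes.

```python
from dataclasses import dataclass, field

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"


@dataclass
class Concept:
    uri: str
    pref_label: dict                                  # language tag -> label
    alt_labels: list = field(default_factory=list)    # (label, language) pairs
    broader: list = field(default_factory=list)       # URIs of broader concepts


def to_turtle(concept):
    """Serialize one concept as a Turtle statement with SKOS properties."""
    lines = [f"<{concept.uri}> a skos:Concept"]
    for lang, label in sorted(concept.pref_label.items()):
        lines.append(f'    skos:prefLabel "{label}"@{lang}')
    for label, lang in concept.alt_labels:
        lines.append(f'    skos:altLabel "{label}"@{lang}')
    for uri in concept.broader:
        lines.append(f"    skos:broader <{uri}>")
    return SKOS_PREFIX + " ;\n".join(lines) + " .\n"


maize = Concept(
    uri="http://example.org/concept/maize",
    pref_label={"en": "maize", "de": "Mais"},
    alt_labels=[("corn", "en")],
    broader=["http://example.org/concept/cereals"],
)
print(to_turtle(maize))
```

Storing preferred labels in a dictionary keyed by language tag reflects the SKOS convention of at most one preferred label per language.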


# RDF (Resource Description Framework)

RDF is a family of W3C specifications that is applied as a general method for the conceptual description or modeling of information that is implemented in web resources. Via an Application Programming Interface (e.g. an RDF API), a standardized interface can be implemented, e.g. within web-based data portals.
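RDF describes information as statements of the form subject, predicate, object. The following sketch represents such triples as plain Python tuples and writes them in the line-based N-Triples syntax. It is a simplified illustration (no blank nodes, language tags or datatypes); the `example.org` URIs and the helper names `Literal`, `term` and `to_ntriples` are hypothetical, while the two predicates are Dublin Core terms.

```python
class Literal(str):
    """Marks a string as an RDF literal rather than a URI."""


DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

# Each statement is a (subject, predicate, object) triple.
triples = [
    ("http://example.org/dataset/1", DCT_TITLE, Literal("Soil moisture 2019")),
    ("http://example.org/dataset/1", DCT_SUBJECT,
     "http://example.org/concept/soil-moisture"),
]


def term(value):
    """Format a URI in angle brackets or a literal in escaped double quotes."""
    if isinstance(value, Literal):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(triples):
    """Serialize the triples, one statement per line."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


print(to_ntriples(triples))
```

The second triple links the data set to a concept URI, which is how a thesaurus term is attached to a resource in practice.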


# ISO/IEC 13250-2:2006 (Information technology – Topic Maps)

This standard regulates the representation and interchange of knowledge, especially for information retrieval. Topic Maps enable the linkage of multiple indexes from different sources. The standard defines the abstract structure and interpretation of Topic Maps, rules for merging them and a set of fundamental subject identifiers. The purpose of the data model is to define the interpretation of the Topic Maps interchange syntax, and to serve as a foundation for the definition of supporting standards for canonicalization, querying, constraints, etc.


# OWL (Web Ontology Language)

OWL is an ontology language developed (and updated to OWL 2) by the W3C. It meets the requirements of the Semantic Web. Ontologies written in OWL 2 (e.g. OM) can be used and exchanged as RDF documents. For relations between agricultural terms, a set of 179 custom relations is provided by, e.g., Agrontology (as used in AGROVOC).


# VocBench

VocBench is an open-source, web-based multilingual vocabulary editing and workflow tool. It was originally developed and released by the FAO and the Artificial Intelligence Research Group of the University of Rome Tor Vergata to manage AGROVOC, but it now hosts a still-expanding set of vocabularies.


# Data Catalog Vocabulary (DCAT)

This RDF specification was designed by the W3C to facilitate interoperability between, and search across, different data catalogues. DCAT does not make any assumptions about the format of the datasets described in a catalogue. It incorporates terms from other vocabularies where stable terms with appropriate meanings exist (e.g., foaf:homepage or dct:title).


# RDFS (Resource Description Framework Schema)

This RDF schema can be used for semantic data models and was published by the W3C (1998). It includes several classes with certain properties using the extensible RDF representation data model, providing basic elements for the description of ontologies.


# schema.org

schema.org is a collaborative activity founded by technology companies (Google, Yahoo, Microsoft) to develop a standardized ontology for structuring web data, based on existing markup languages. It provides a standardized schema for structured data on the internet and defines entities and relationships that can be used to describe data sets or web pages. It is used to structure data in the research data search tool “Google Dataset Search”.
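A data set description based on schema.org is commonly embedded in a web page as JSON-LD. The sketch below builds such a record with Python's standard `json` module. The property names (`name`, `description`, `keywords`, `url`) are schema.org terms, whereas the function `dataset_jsonld`, the example values and the `example.org` address are hypothetical.

```python
import json


def dataset_jsonld(name, description, keywords, url):
    """Describe a data set using the schema.org type Dataset."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": list(keywords),
        "url": url,
    }


record = dataset_jsonld(
    name="Soil moisture 2019",
    description="Hourly soil moisture measured at a hypothetical field site.",
    keywords=["soil moisture", "soil water content"],
    url="http://example.org/dataset/1",
)

# JSON-LD is embedded in HTML inside a script element of this type.
markup = ('<script type="application/ld+json">\n'
          + json.dumps(record, indent=2)
          + "\n</script>")
print(markup)
```

Keywords taken from a controlled vocabulary such as AGROVOC make the record easier to match across repositories than free-text tags.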


# Use cases for applied thesauri in databases

# AGRIS (International System for Agricultural Science and Technology)

AGRIS is a global public domain database published by the FAO with more than 8 million records on agricultural science and technology. The AGRIS search system allows scientists, researchers and students to perform sophisticated searches using keywords from the AGROVOC thesaurus, specific journal titles, or names of countries, institutions and authors. AGRIS is an RDF-aware system, and the AGRIS database is exposed as RDF.


# Conflicts and solutions

The AGROVOC thesaurus is widely accepted and appreciated within the agricultural science community. However, soil terms and concepts are often missing, inadequately described or incorrectly assigned. Terms and concepts are continuously edited and improved by experts (AGROVOC editors) using the editing tool VocBench 3.

# References

[1] W3C Semantic Sensor Network Incubator Group (2009). Semantic Sensor Network Ontology (http://purl.oclc.org/NET/ssnx/ssn).