Publications Office of the EU

Cellar data

EU publications stored in Cellar can be retrieved directly through its HTTP RESTful web services dedicated to publications.

Cellar also stores and disseminates metadata about publications.

Cellar's metadata about EU publications can be retrieved directly from Cellar, using either the machine-readable SPARQL interface of the Cellar knowledge graph or the HTTP RESTful web services dedicated to retrieving metadata notices.

Cellar also provides an RSS and Atom feed API, which allows (re)users to receive information about new publications stored in the repository and about updates to existing ones.

Cellar data are represented and organized according to the model presented below.

Cellar Data Model


Cellar's data model is expressed through an ontology and also relies on the vocabularies described in the following sections.


Cellar's data model is based on the Common Data Model (CDM). CDM follows the logic of the Functional Requirements for Bibliographic Records (FRBR), published by the International Federation of Library Associations. The foundation entities of FRBR are:

  • Work: a distinct intellectual or artistic creation.
  • Expression: the realization of a Work in the form of alphanumeric, musical or choreographic notation, sound, image, object, movement, etc.
  • Manifestation: the physical embodiment of an Expression of a Work.
  • Item: a single exemplar of a Manifestation.

Hierarchy Work-Expression-Manifestation-Item

Publications and their associated metadata are represented in Cellar following the Work-Expression-Manifestation-Item (WEMI) entities and their respective hierarchy: 

  • A Work covers the W role of the WEMI hierarchy. For example, a Work can be an official journal, a preparatory document of the European Commission, a legal act or a general publication. Metadata such as authors, subjects, identifiers, types and references to other publications can be defined at the Work level. A complete list of these metadata is given on the Common Data Model page. A Work includes one or more Expressions:
  • An Expression covers the E role of the WEMI hierarchy and is defined as the realization of a Work in a specific language. Many of the publications stored in Cellar, like an official journal or a general publication, may have been published in more than one language. For example, legal documents are published in all official EU languages. In this case, a specific Work will have one Expression for each language. Expressions are linked to the language authority table using the expression_uses_language relation. The relations in the Common Data Model between Work and Expression entities are defined by the work_has_expression property or its inverse property expression_belongs_to_work. An Expression may include one or more Manifestations:
  • A Manifestation covers the M role of the WEMI hierarchy and is defined as the instantiation of a Work in the language defined by the embedding Expression, and in a specific format. For example, if the English Expression of a legal act is available in PDF and XHTML, there will be two Manifestations associated with that English Expression. There are even ‘print’ Manifestations, which have no associated digital content, only metadata, and which record that the OP keeps a printed version of that Expression (language) and Work. The Manifestation type is defined in the Common Data Model by the property manifestation_type. The relations in the Common Data Model between Expression and Manifestation entities are defined by the expression_manifested_by_manifestation property or its inverse property manifestation_manifests_expression. Finally, a Manifestation may include one or more Items:
  • Items are the content streams, or digital files, of a Manifestation. A content stream covers the I role of the WEMI hierarchy and is defined as the entity that physically carries the information of the Manifestation. The content stream is typically a document written in the language and format defined by the embedding Manifestation. For example, an official journal with many pages, such as the EU Budget, is split into several files. Metadata in RDF syntax are also stored as Items in Cellar. The relations in CDM between Manifestation and Item entities are defined by the manifestation_has_item property or its inverse property item_belongs_to_manifestation.
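
As an illustration of how the properties named above chain together, here is a minimal Python sketch that builds a SPARQL query walking from a Work down to its Items. The cdm: namespace URI, the function name wemi_query and the variable names are assumptions made for this example and are not given in the text above. Which direction of each relation is actually stored in the knowledge graph is not stated here, so the inverse properties (for example expression_belongs_to_work) may need to be substituted.

```python
# Assumption: the CDM properties live in this namespace (not stated in the text).
CDM_NS = "http://publications.europa.eu/ontology/cdm#"


def wemi_query(work_uri: str, limit: int = 50) -> str:
    """Build a SPARQL query that walks Work -> Expression -> Manifestation -> Item.

    Only property names mentioned in the data model description are used.
    """
    return f"""PREFIX cdm: <{CDM_NS}>
SELECT ?expression ?language ?manifestation ?format ?item
WHERE {{
  <{work_uri}> cdm:work_has_expression ?expression .
  ?expression cdm:expression_uses_language ?language .
  ?expression cdm:expression_manifested_by_manifestation ?manifestation .
  ?manifestation cdm:manifestation_type ?format .
  OPTIONAL {{ ?manifestation cdm:manifestation_has_item ?item . }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    # Prints the query text only; sending it to an endpoint is left out on purpose.
    print(wemi_query("http://publications.europa.eu/resource/celex/32014R0001"))
```

The Item pattern is wrapped in OPTIONAL because ‘print’ Manifestations carry metadata but no digital content.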


 

 

Cellar also supports other semantic models and ontologies.

Hierarchy Dossier-Event

The Dossier-Event hierarchy is composed of:

  • a Dossier, which covers the W role of the WEMI hierarchy. A Dossier may embed:
  • several Events, which cover the E role of the WEMI hierarchy.

Event entity

The Event hierarchy, also called the top-level event hierarchy, is composed solely of an Event. It is a newer hierarchy: an Event can now be a top-level entity and not only a child of a Dossier.

Agent entity

The Agent hierarchy is composed solely of an Agent, which covers the W role of the WEMI pattern.

The full list of CDM classes and properties can be consulted at the Metadata Registry (MDR) page.

Named Authority Lists (NALs) have been defined to harmonize and standardize the codes used in Cellar's metadata. NALs are also known as controlled vocabularies or value lists.

The NALs are a preloaded, read-only set of data, decoded by language, meant to be used by the concepts of the Cellar ontology. Each NAL is itself a concept identified by the resource URI http://publications.europa.eu/resource/authority/*, where * is the NAL-specific class.
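
The URI pattern above can be expressed as a small helper. This is only a sketch: the function name nal_uri is hypothetical, and using "language" as the NAL-specific class (for the language authority table) is an assumption made for the usage example.

```python
NAL_BASE = "http://publications.europa.eu/resource/authority/"


def nal_uri(nal_name: str) -> str:
    """Return the resource URI of a NAL, given its NAL-specific class name."""
    # Reject values that cannot be a single path segment.
    if not nal_name or "/" in nal_name or any(c.isspace() for c in nal_name):
        raise ValueError(f"not a valid NAL name: {nal_name!r}")
    return NAL_BASE + nal_name


if __name__ == "__main__":
    # "language" is assumed here to be the class name of the language NAL.
    print(nal_uri("language"))
```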

EuroVoc is a multilingual thesaurus maintained by the Publications Office of the European Union. It exists in all the official languages of the European Union. EuroVoc is used by:

  • the European Parliament,
  • the Publications Office of the European Union,
  • the national and regional parliaments in Europe, and
  • some national government departments and European organizations.

This thesaurus serves as the basis for the domain names used in the European Union's terminology database, Interactive Terminology for Europe (IATE). EuroVoc is one specific type of NAL.

The EuroVoc URL is http://eurovoc.europa.eu/100141.

Each resource in Cellar is globally identified by a URI, called the resource URI, composed as follows: http://publications.europa.eu/resource/{ps-name}/{ps-id}.

{PS-NAME}

It identifies the name of the production system. Cellar currently uses the following production system names:

cellar, celex, oj, com, genpub, ep, jurisprudence, dd, mtf, consolidation, eurostat, eesc, cor, nim, pegase, agent, uriserv, join, swd, comnat, mdr, legissum, ecli, procedure, procedure-event, eli, immc, planjo, numpub, case-event, case, person, organization, whoiswho, membership, consil, dataset, documentation, directory, distribution, schema, expression, jure, parliament, eca, wp, intproc, intproc-event, intcom, inteesc, intcor, intconsil, intep, inteca, ecb, ted, session, session-sitting, legispack, dossier and dossier-event.

This list is not exhaustive and is revised regularly.

{PS-ID}

It is the resource’s unique identifier, and it has a structure that depends on the value of {ps-name}.

If {ps-name} is ‘cellar’

cellar is the only production system name reserved for the Cellar application. Its identifiers follow these conventions:

type | {ps-id} | example
work, dossier, top-level event, agent | {work-id} | b84f49cd-750f-11e3-8e20-01aa75ed71a1
expression, event | {work-id}.{expr-id} | b84f49cd-750f-11e3-8e20-01aa75ed71a1.0003
manifestation | {work-id}.{expr-id}.{man-id} | b84f49cd-750f-11e3-8e20-01aa75ed71a1.0003.01
item | {work-id}.{expr-id}.{man-id}/{cs-id} | b84f49cd-750f-11e3-8e20-01aa75ed71a1.0003.01/DOC_1

where:

  • {work-id} is a valid Universally Unique Identifier (UUID)
  • {expr-id} is a 4-character numeric value
  • {man-id} is a 2-character numeric value
  • {cs-id} is an alphanumeric value with the pattern DOC_x, where x is an incremental numeric value that identifies the content stream.
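
The conventions above can be checked mechanically. The following Python sketch parses a cellar {ps-id} into its parts with a regular expression. The names CellarId and parse_cellar_id are hypothetical, and the level labels follow the WEMI roles only (a work-level identifier may equally denote a dossier, a top-level event or an agent, and an expression-level identifier may denote an event).

```python
import re
from typing import NamedTuple, Optional


class CellarId(NamedTuple):
    work_id: str
    expr_id: Optional[str]
    man_id: Optional[str]
    cs_id: Optional[str]

    @property
    def level(self) -> str:
        """WEMI level implied by which parts are present."""
        if self.cs_id is not None:
            return "item"
        if self.man_id is not None:
            return "manifestation"
        if self.expr_id is not None:
            return "expression"
        return "work"


_CELLAR_ID = re.compile(
    # {work-id}: a UUID in its canonical 8-4-4-4-12 hexadecimal form
    r"(?P<work>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"(?:\.(?P<expr>\d{4})"          # {expr-id}: 4 digits
    r"(?:\.(?P<man>\d{2})"           # {man-id}: 2 digits
    r"(?:/(?P<cs>DOC_\d+))?)?)?",    # {cs-id}: DOC_x
    re.IGNORECASE,
)


def parse_cellar_id(ps_id: str) -> CellarId:
    """Parse a {ps-id} whose {ps-name} is 'cellar'; raise ValueError otherwise."""
    m = _CELLAR_ID.fullmatch(ps_id)
    if m is None:
        raise ValueError(f"not a valid cellar ps-id: {ps_id!r}")
    return CellarId(m["work"], m["expr"], m["man"], m["cs"])


if __name__ == "__main__":
    print(parse_cellar_id("b84f49cd-750f-11e3-8e20-01aa75ed71a1.0003.01/DOC_1"))
```

The nested optional groups ensure that a {man-id} can only appear after an {expr-id}, and a {cs-id} only after a {man-id}.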

If {ps-name} is other than ‘cellar’

For all other production system names, the following conventions are used:

type | {ps-id} | example
work, dossier, agent | {work-id} | 32006D0241
expression | {work-id}.{expr-id} | 32006D0241.FRA
manifestation | {work-id}.{expr-id}.{man-id} | 32006D0241.FRA.fmx4
item | {work-id}.{expr-id}.{man-id}.{cs-id} | 32006D0241.FRA.fmx4.L_2006088FR.01006402.xml
event | {work-id}.{event-id} | 11260.12796

where:

  • {work-id} is an alphanumeric value
  • {expr-id} is a 3-character ISO 639-3 language code
  • {man-id} is an alphanumeric value identifying a file format (Formex, PDF, HTML, XML, etc.)
  • {cs-id} is an alphanumeric value identifying the name of the content stream
  • {event-id} is a numeric value.
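
A rough Python sketch of how such an identifier could be split into its parts is given below. It rests on several assumptions that the text does not state: the {work-id} itself contains no dot, the language code is written in upper case, the content stream name is appended after a dot (as in the example identifiers, and it may itself contain dots), and a purely numeric second part denotes an {event-id}. The function name parse_ps_id is hypothetical.

```python
import re
from typing import Dict

# Assumption: 3-letter ISO 639-3 codes appear in upper case, e.g. FRA.
_LANG = re.compile(r"[A-Z]{3}")


def parse_ps_id(ps_id: str) -> Dict[str, str]:
    """Split a non-cellar {ps-id} into its named parts."""
    work_id, _, rest = ps_id.partition(".")
    if not work_id:
        raise ValueError(f"missing work-id in {ps_id!r}")
    parts = {"work-id": work_id}
    if not rest:
        return parts                      # work, dossier or agent
    if rest.isdigit():
        parts["event-id"] = rest          # {work-id}.{event-id}
        return parts
    # At most three fields; the last one (cs-id) keeps any dots it contains.
    fields = rest.split(".", 2)
    if not _LANG.fullmatch(fields[0]):
        raise ValueError(f"expected a 3-letter language code in {ps_id!r}")
    for key, value in zip(("expr-id", "man-id", "cs-id"), fields):
        if not value:
            raise ValueError(f"empty {key} in {ps_id!r}")
        parts[key] = value
    return parts


if __name__ == "__main__":
    print(parse_ps_id("32006D0241.FRA.fmx4.L_2006088FR.01006402.xml"))
```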

Examples of valid resource URIs

The following non-exhaustive list gives examples of resource URIs that match the patterns described above:

  1. A work with ps-name of type cellar and the given ps-id: http://publications.europa.eu/resource/cellar/b84f49cd-750f-11e3-8e20-01aa75ed71a1
  2. An expression – belonging to the work at point 1) – with ps-name of type cellar and the given ps-id: http://publications.europa.eu/resource/cellar/b84f49cd-750f-11e3-8e20-01aa75ed71a1.0006
  3. A manifestation – belonging to the expression at point 2) – with ps-name of type cellar and the given ps-id: http://publications.europa.eu/resource/cellar/b84f49cd-750f-11e3-8e20-01aa75ed71a1.0006.03
  4. A content stream – belonging to the manifestation at point 3) – with ps-name of type cellar and the given ps-id: http://publications.europa.eu/resource/cellar/b84f49cd-750f-11e3-8e20-01aa75ed71a1.0006.03/DOC_1
  5. A work with ps-name of type oj and the given ps-id: http://publications.europa.eu/resource/oj/JOL_2014_001_R_0001_01
  6. A work with ps-name of type celex and the given ps-id: http://publications.europa.eu/resource/celex/32014R0001
  7. An expression – belonging to the work at point 6) – with ps-name of type celex and the given ps-id: http://publications.europa.eu/resource/celex/32014R0001.FRA
  8. A manifestation – belonging to the expression at point 7) – with ps-name of type oj and the given ps-id: http://publications.europa.eu/resource/oj/JOL_2014_001_R_0001_01.FRA.fmx4
  9. A content stream – belonging to the manifestation at point 8) – with ps-name of type oj and the given ps-id: http://publications.europa.eu/resource/oj/JOL_2014_001_R_0001_01.FRA.fmx4.L_2014001FR.01000302.xml
  10. A work with ps-name of type pegase and the given ps-id: http://publications.europa.eu/resource/pegase/11260
  11. An event with ps-name of type pegase and the given ps-id: http://publications.europa.eu/resource/pegase/11260.12796
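
To show how a resource URI decomposes into {ps-name} and {ps-id}, here is a short Python sketch. The function names split_resource_uri and resource_uri are hypothetical. The split is made on the first slash after the base, so the slash that precedes a cellar content stream identifier stays inside the {ps-id}.

```python
from typing import Tuple

RESOURCE_BASE = "http://publications.europa.eu/resource/"


def split_resource_uri(uri: str) -> Tuple[str, str]:
    """Return (ps_name, ps_id) for a Cellar resource URI."""
    if not uri.startswith(RESOURCE_BASE):
        raise ValueError(f"not a Cellar resource URI: {uri!r}")
    # Split only once: a cellar item ps-id contains a '/' before its cs-id.
    ps_name, sep, ps_id = uri[len(RESOURCE_BASE):].partition("/")
    if not ps_name or not sep or not ps_id:
        raise ValueError(f"missing ps-name or ps-id in {uri!r}")
    return ps_name, ps_id


def resource_uri(ps_name: str, ps_id: str) -> str:
    """Compose a resource URI from its two parts."""
    return f"{RESOURCE_BASE}{ps_name}/{ps_id}"


if __name__ == "__main__":
    print(split_resource_uri("http://publications.europa.eu/resource/celex/32014R0001.FRA"))
    print(resource_uri("pegase", "11260.12796"))
```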

What are the benefits of the Cellar data model?

  • Multiple views (formats & linguistic versions) of the same publication are grouped together
  • Use of open & standardized models & vocabularies to describe the publications
  • Unique identification of publications
  • Publications can be easily re-used