SemPub

NEWS: Papers are now online

ESWC-14 Challenge: Semantic Publishing – Assessing the Quality of Scientific Output

MOTIVATION AND OBJECTIVES

Scholarly publishing is increasingly enabling a new wave of applications that better support researchers in disseminating, exploiting and evaluating their results. The potential of publishing scientific papers enriched with semantic information is huge and raises interesting and challenging issues. Semantic Web technologies play a central role in this context, as they can help publishers make scientific results available in an open format from which the whole research community can benefit. The Semantic Publishing Challenge 2014 is intended to be the first in a series of events at ESWC for producing and exploiting semantic publishing data. The main focus this year is on extracting information and using it to assess the quality of scientific production. Linked open datasets about scientific production exist (e.g. DBLP), but they usually cover only basic bibliographic information, which is not sufficient to assess quality. Quality-related information is often hidden and not yet available as Linked Open Data (LOD). There is also a growing interest in alternative forms of publishing scientific data as (semantic) datasets that can be more easily shared, linked to each other, and reasoned on. Alternative metrics for scientific impact are also gaining relevance. We are seeking the most innovative and impactful applications in this emerging context.

TARGET AUDIENCE

The Challenge is open to everyone from industry and academia.

TASKS

The Challenge includes three tasks. Participants can participate in as many tasks as they like.

Further details about the organization of the Challenge are provided in SemPub/ChallengeStructure

Extraction Tasks

We ask challengers to automatically annotate a set of multi-format and multi-source input documents and to produce a Linked Open Dataset that fully describes these documents, their context, and relevant parts of their content. The produced dataset will be evaluated by running a set of queries against it to assess its correctness and completeness. The input dataset will be split into two parts: a training/testing part and an evaluation part, which will be disclosed a few days before the submission deadline. Participants will be asked to run their tool on the evaluation dataset and to produce the final Linked Open Dataset.

Task 1: Extraction and assessment of workshop proceedings information

Participants are required to extract information from a set of HTML tables of contents, partly including microformat and RDFa annotations but not necessarily being valid HTML, of selected computer science workshop proceedings published with the CEUR-WS.org open access service. The extracted information is expected to answer queries about the quality of these workshops, for instance by measuring their growth, longevity, connection with other events, distribution of papers and authors. (Details)
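Because the tables of contents are not guaranteed to be valid HTML, a tolerant, event-based parser may be a safer starting point than a strict XML parser. The following is only a minimal sketch using Python's standard html.parser module. The sample markup and the class names "title" and "authors" are hypothetical and invented for illustration; they are not the actual CEUR-WS.org markup, which participants would need to inspect themselves.

```python
from html.parser import HTMLParser


class TocParser(HTMLParser):
    """Collects one record per <li>, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.papers = []      # one dict per table-of-contents entry
        self._field = None    # field currently being filled, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li":
            # A new <li> starts a new entry, even if the previous one
            # was never closed.
            self.papers.append({"title": "", "authors": ""})
        elif (tag == "span" and self.papers
              and attributes.get("class") in ("title", "authors")):
            self._field = attributes["class"]

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and self.papers:
            self.papers[-1][self._field] += data


# Hypothetical input: the second entry has an unclosed <a> and both
# <li> elements lack closing tags.
SAMPLE = """<ul>
<li><a href="paper1.pdf"><span class="title">Linked Data for Proceedings</span></a>
    <span class="authors">A. Author, B. Writer</span>
<li><a href="paper2.pdf"><span class="title">Mining Tables of Contents</span>
    <span class="authors">C. Scholar</span>
</ul>"""

parser = TocParser()
parser.feed(SAMPLE)
parser.close()
papers = parser.papers
```

The records collected in papers could then be mapped to RDF triples with a vocabulary of the participant's choice; that step is deliberately left out of this sketch.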

Task 2: Extraction and characterization of citations

Participants are required to extract information about the citations in scientific articles and their relevance. Input documents are in XML JATS and TaxPub, an official extension of JATS customized for taxonomic treatments, and selected from the PubMedCentral Open Access Subset and the Pensoft Biodiversity Data Journal and ZooKeys archive. The extracted information is expected to be used for assessing the value of articles by analyzing citations, for instance by considering their position in the paper, their co-location with other citations or their purpose. (Details)
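As an illustration of what "position" and "co-location" of citations could mean in practice, the sketch below walks a small JATS-like fragment and records, for each bibliographic cross-reference, the section it appears in and the other references cited in the same paragraph. It relies on the JATS convention of marking in-text citations as xref elements with ref-type="bibr"; the sample document and the function name citations_by_section are made up for this example, and real articles (including TaxPub ones) will have a richer structure than is handled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified JATS fragment.
SAMPLE_JATS = """<article>
  <body>
    <sec><title>Introduction</title>
      <p>Earlier work <xref ref-type="bibr" rid="B1">[1]</xref> and
         <xref ref-type="bibr" rid="B2">[2]</xref> motivates this study.</p>
    </sec>
    <sec><title>Methods</title>
      <p>We reuse the protocol of <xref ref-type="bibr" rid="B1">[1]</xref>;
         see <xref ref-type="fig" rid="F1">Fig. 1</xref>.</p>
    </sec>
  </body>
</article>"""


def citations_by_section(jats_xml):
    """Return (section title, cited id, ids co-cited in the same paragraph)."""
    root = ET.fromstring(jats_xml)
    found = []
    for section in root.iter("sec"):
        title = section.findtext("title", default="")
        # Only paragraphs that are direct children of this section.
        for paragraph in section.findall("p"):
            cited = [
                xref.get("rid")
                for xref in paragraph.iter("xref")
                if xref.get("ref-type") == "bibr"  # skip figure/table refs
            ]
            for rid in cited:
                co_cited = [other for other in cited if other != rid]
                found.append((title, rid, co_cited))
    return found


citations = citations_by_section(SAMPLE_JATS)
```

Determining the purpose of a citation would require analysing the surrounding text and is not attempted in this sketch.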

In-use Task 3: Semantic technologies in improving scientific production

Participants are asked to submit demos that showcase the potential of Semantic Web technology for enhancing and assessing the quality of scientific production. The task has a completely open structure and is, in particular, independent from tasks 1 and 2: participants are free to decide which tool to show and which dataset to use. The evaluation will be different from other tasks and will consist of two phases: after a first round of review, a number of submissions will be invited to demo their work at ESWC. The final decision will be taken at the Conference by a jury formed of PC members present at the event and other invited experts. (Details)

EVALUATION

Extraction Tasks 1 and 2

Participants will be requested to submit the LOD that their tool produces from the evaluation dataset, as well as a paper that describes their approach. They will also be given a set of queries in natural language and will be asked to translate those queries into SPARQL queries that work on their LOD. The results of the queries on the produced LOD will be compared with the expected output, and precision and recall will be measured to identify the best performing approach. Separately, the most original approach will be selected by the Program Committee.
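For readers unfamiliar with the two measures, the sketch below shows one common way to compute precision and recall for a single query, treating the returned and expected results as sets. It is an illustration only: the function name and the sample values are hypothetical, and the exact scoring procedure used by the organizers (for example, how results are normalized or how scores are aggregated across queries) is not specified here.

```python
def precision_recall(returned, expected):
    """Compute set-based precision and recall for one query."""
    returned_set = set(returned)
    expected_set = set(expected)
    true_positives = len(returned_set & expected_set)
    # Precision: share of returned results that are correct.
    precision = true_positives / len(returned_set) if returned_set else 0.0
    # Recall: share of expected results that were returned.
    recall = true_positives / len(expected_set) if expected_set else 0.0
    return precision, recall


# Hypothetical query results: two of the four returned items are
# expected, and one expected item is missing.
returned = ["paper1", "paper2", "paper3", "paper5"]
expected = ["paper1", "paper2", "paper4"]
precision, recall = precision_recall(returned, expected)
```

In this made-up case, precision is 2/4 and recall is 2/3: the tool returned two spurious results and missed one expected result.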

In-use Task 3

Participants are required to submit a paper describing their approach, as for Tasks 1 and 2, and a demo version of the tool (open source appreciated but not mandatory). The evaluation will consist of two phases: after a first round of review, a number of submissions will be invited to demo their work at ESWC. The final decision will be taken at the Conference by a jury formed of PC members present at the event and other invited experts. The winner will be selected according to its potential impact, originality, breakthrough, the quality of the demo, and the appropriateness for ESWC.

Further details about the evaluation are provided in SemPub/ChallengeStructure.

FEEDBACK AND DISCUSSION

A discussion group is open for participants to ask questions and to receive updates about the challenge.

We invite the potential participants to subscribe to our mailing list and to communicate their intention to participate. They are also invited to use this channel to discuss problems in the input dataset and to suggest changes.

JUDGING AND PRIZES

After a first round of review, the Program Committee and the chairs will select a number of submissions that conform to the challenge requirements; their authors will be invited to present their work. Submissions accepted for presentation will receive constructive reviews from the Program Committee, will be included in the Springer LNCS post-proceedings of ESWC, and will have a presentation slot in a poster session dedicated to the challenge.

In addition, the winners will present their work in a special slot of the main program of ESWC and will be invited to submit a revised and extended paper to a dedicated Semantic Web Journal special issue.

Five winners will be selected. For each of Tasks 1 and 2 we will select:

  • best performing tool, awarded to the submission that achieves the highest score in the evaluation
  • most original approach, selected by the Challenge Committee through the reviewing process

The winner of Task 3 will be selected by the jury according to its potential impact, originality, breakthrough, the quality of the demo, and the appropriateness for ESWC.

Winners will be selected only for tasks with at least 3 participants. In any case all submissions will be reviewed and, if accepted, published in ESWC post-proceedings.

An amount of 700 Euro has already been secured for the final prizes. We are currently working on securing further funding.

HOW TO PARTICIPATE

Participants in Tasks 1 and 2 are required to submit an abstract, a 5-page description, the Linked Open Dataset produced by their tool on the evaluation dataset, a set of SPARQL queries that work on that LOD and correspond to the natural language queries provided as input, and their tool (for verification purposes).

Participants in Task 3 are required to submit an abstract, a 5-page description, and a demo version of their tool.

Further submission instructions are in SemPub/Submission

All submissions should be provided via EasyChair

IMPORTANT DATES

  • December 3, 2013: Publication of the full description of the extraction tasks 1 and 2, rules and queries; publication of the training/testing dataset
  • January 31, 2014, 23:59 CET: Deadline for submitting remarks on the task 1 and 2 training/testing datasets
  • February 5, 2014: Publication of the final task 1 and 2 training/testing datasets
  • March 18, 2014, 23:59 CET: Abstract submission (task 3 only) *** Extended ***
  • March 18, 2014: Publication of the task 1 and 2 evaluation dataset *** Extended ***
  • March 21, 2014, 23:59 CET: Submission due *** Extended ***
  • April 9, 2014: Notification of acceptance
  • May 27-29, 2014: Demo at ESWC-14, and winner selection

CHALLENGE CHAIRS

  • Angelo Di Iorio, Department of Computer Science and Engineering, University of Bologna, Italy
  • Christoph Lange, Enterprise Information Systems, University of Bonn / Fraunhofer IAIS, Germany

PROGRAM COMMITTEE

  • Sören Auer, University of Bonn / Fraunhofer IAIS, Germany (supervisor)
  • Chris Bizer, University of Mannheim, Germany
  • Sarven Capadisli, University of Leipzig, Germany
  • Alexander Constantin, University of Manchester, UK
  • Jeremy Debattista, University of Bonn / Fraunhofer IAIS, Germany
  • Alexander García Castro, Florida State University, USA
  • Leyla Jael García Castro, Bundeswehr University of Munich, Germany
  • Manfred Jeusfeld, University of Skövde / CEUR-WS.org, Sweden
  • Paul Groth, VU University of Amsterdam, Netherlands
  • Rinke Hoekstra, VU University of Amsterdam, Netherlands
  • Aidan Hogan, DCC, Universidad de Chile
  • Evangelos Milios, Dalhousie University, Canada
  • Lyubomir Penev, Pensoft Publishers, Bulgaria
  • Robert Stevens, University of Manchester, UK
  • Jun Zhao, Lancaster University, UK

We are inviting further members.

ESWC CHALLENGE COORDINATOR

  • Milan Stankovic, Sépage & Université Paris-Sorbonne (France)