SemSA

From ESWC 2014 Challenges WIKI

ESWC'14 Challenge: Concept-Level Sentiment Analysis

For the full report on the challenge (and for citing it in a scientific publication), please refer to the following manuscript: http://sentic.net/eswc14-challenge.pdf

MOTIVATION AND OBJECTIVES

Mining opinions and sentiments from natural language is an extremely difficult task, as it involves a deep understanding of most of the explicit and implicit, regular and irregular, syntactic and semantic rules of a language. Existing approaches mainly rely on parts of text in which opinions and sentiments are explicitly expressed, such as polarity terms, affect words, and their co-occurrence frequencies. However, opinions and sentiments are often conveyed implicitly through latent semantics, which makes purely syntactic approaches ineffective. To this end, concept-level sentiment analysis aims to go beyond a mere word-level analysis of text and to provide novel approaches to opinion mining and sentiment analysis that allow a more efficient passage from (unstructured) textual information to (structured) machine-processable data, in potentially any domain.

Concept-level sentiment analysis focuses on a semantic analysis of text through the use of web ontologies or semantic networks, which allow the aggregation of conceptual and affective information associated with natural language opinions. By relying on large semantic knowledge bases, concept-level sentiment analysis steps away from the blind use of keywords and word co-occurrence counts and relies instead on the implicit features associated with natural language concepts.

This Challenge focuses on the introduction, presentation, and discussion of novel approaches to concept-level sentiment analysis. Participants will have to design a concept-level opinion-mining engine that exploits common-sense knowledge bases, e.g., SenticNet, and/or Linked Data and Semantic Web ontologies, e.g., DBpedia, to perform multi-domain sentiment analysis. The main motivation for the Challenge, in particular, is to go beyond a mere word-level analysis of natural language text and provide novel concept-level tools and techniques that allow a more efficient passage from (unstructured) natural language to (structured) machine-processable data, in potentially any domain.

Systems must have a semantic flavor (e.g., by making use of Linked Data or known semantic networks within their core functionalities), and authors need to show how the introduction of semantics can be used to obtain valuable information, functionality, or performance. Existing natural language processing methods or statistical approaches can be used as well, as long as semantics plays a central role within the core approach (engines based merely on syntax/word counts will be excluded from the competition).

TARGET AUDIENCE

The Challenge is open to everyone from industry and academia.

TASKS

The Concept-Level Sentiment Analysis Challenge is defined in terms of different tasks. The first task is elementary whereas the others are more advanced. The input units of each task are sentences. Sentences are assumed to be in grammatically correct American English and have to be processed according to the input format specified at http://sentic.net/challenge/sentence.

Elementary Task: Polarity Detection The main goal of the task is polarity detection. The proposed systems will be assessed according to precision, recall and F-measure of detected binary polarity values (1=positive; 0=negative) for each input sentence of the evaluation dataset, following the same format as in http://sentic.net/challenge/task0. The problem of subjectivity detection is not addressed within this Challenge, hence participants can assume that there will be no neutral sentences. Participants are encouraged to use the Sentic API or further develop and apply sentic computing tools.
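As a minimal sketch of the scoring described above (the official scorer and input/output format are the ones defined at the challenge URLs), precision, recall, and F-measure over binary polarity labels can be computed as:

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Precision, recall, and F-measure for binary polarity labels
    (1 = positive, 0 = negative), treating `positive` as the target class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

This is only an illustration of the metric, not the official evaluation script.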

Advanced Task #1: Aspect Extraction The output of this task will be a set of aspects of the reviewed product and a binary polarity value associated with each of these aspects, in the format specified at http://sentic.net/challenge/task1. So, for example, while for the Elementary Task an overall polarity (positive or negative) is expected for a review about a mobile phone, this task requires a set of aspects (such as ‘speaker’, ‘touchscreen’, ‘camera’, etc.) and a polarity value (positive or negative) associated with each of them. Systems will be assessed on both aspect extraction and aspect polarity detection.
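One possible in-memory shape for such an aspect-level result is sketched below; the field names are illustrative assumptions, and the authoritative output format is the one at the task URL:

```python
# Hypothetical example for a mobile-phone review: each extracted aspect
# carries its own binary polarity (1 = positive, 0 = negative).
review = "The touchscreen is gorgeous but the speaker sounds tinny."
aspect_result = [
    {"aspect": "touchscreen", "polarity": 1},
    {"aspect": "speaker", "polarity": 0},
]
```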

Advanced Task #2: Semantic Parsing As the title suggests, the Challenge focuses on sentiment analysis at concept level. This means that the proposed systems are not supposed to work at word/syntax level but rather with concepts/semantics. Hence, this task will evaluate the capability of the proposed systems to deconstruct natural language text into concepts, following the same format as in http://sentic.net/challenge/task2. SenticNet will be taken as a reference to test the efficiency of the proposed parsers, but extracted concepts won't necessarily have to match SenticNet concepts. The proposed systems, for example, are supposed to be able to extract a multi-word expression like ‘buy christmas present’ from sentences such as “Today I bought a lot of very nice Christmas presents”. The concepts extracted per sentence will be assessed through precision, recall, and F-measure against the evaluation dataset.
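One hedged reading of the per-sentence evaluation above is set-based matching of extracted concepts against the gold annotation, which could look like:

```python
def concept_prf(gold_concepts, extracted_concepts):
    """Set-based precision/recall/F1 for the concepts of one sentence."""
    gold, extracted = set(gold_concepts), set(extracted_concepts)
    tp = len(gold & extracted)  # concepts present in both sets
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Whether the official scorer uses exact string matching or a looser criterion is defined by the task materials, not by this sketch.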

Advanced Task #3: Topic Spotting Input sentences will be about four different domains, namely: books, DVDs, electronics, and kitchen appliances. This task focuses on the automatic classification of sentences into one of these domains, in the format specified at http://sentic.net/challenge/task3. All sentences are assumed to belong to exactly one of the above-mentioned domains. The proposed systems are supposed to exploit the extracted concepts to infer which domain each sentence belongs to. Classification accuracy will be evaluated in terms of precision, recall, and F-measure against the evaluation dataset.
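As a toy illustration of how extracted concepts might drive topic spotting (the lexicon below is entirely hypothetical and not part of the challenge; a real system would derive it from a knowledge base such as SenticNet):

```python
# Hypothetical concept-to-domain lexicon, hard-coded only for illustration.
DOMAIN_CONCEPTS = {
    "books": {"read_novel", "turn_page", "plot_twist"},
    "dvds": {"watch_movie", "special_feature", "film_score"},
    "electronics": {"battery_life", "touch_screen", "sound_quality"},
    "kitchen appliances": {"boil_water", "nonstick_pan", "dishwasher_safe"},
}

def spot_topic(concepts):
    """Assign the domain whose lexicon overlaps most with the extracted concepts."""
    overlap = {d: len(set(concepts) & lex) for d, lex in DOMAIN_CONCEPTS.items()}
    return max(overlap, key=overlap.get)
```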

EVALUATION DATASET

We arbitrarily chose 50 electronics, book, housewares, and DVD reviews from the Blitzer dataset. Reviews were then split into sentences, and each sentence was labeled by a pool of four annotators (two native English speakers, one Chinese speaker, and one Indian speaker). The dataset can be freely downloaded (http://sentic.net/eswc14.zip); the compressed file contains the annotated dataset for each of the four tasks.

Elementary Task: Polarity Detection Annotators were asked to label sentences according to their polarity, i.e., positive or negative (neutral sentences were removed). This yielded 2,322 sentences bearing either positive or negative sentiment. Specifically, annotators were asked to empathize with the speaker; so, in a sense, the polarity associated with each sentence does not reflect the conveyed emotions but is rather an inference about the speaker’s sentiments. This is key to disambiguating sentences that refer to more than one actor, e.g., “I love the movie that you hate”. For each sentence, the polarity with the highest inter-annotator agreement was selected, yielding 1,420 negative and 902 positive sentences.

Advanced Task #1: Aspect Extraction For the aspect extraction task, annotators were asked to infer aspects and label the sentiment associated with each of them. For this task, we relied on majority voting for the selection of extracted aspects and their sentiment labels. Notably, for most sentences the inter-annotator agreement was greater than 2, i.e., most of the time at least 3 annotators extracted the same aspects and labeled them with the same sentiment. Sentences without any aspect were removed from the final corpus, leaving 1,725 sentences for this task.

Advanced Task #2: Semantic Parsing For the semantic parsing task, we manually selected 2,398 sentences and asked annotators to extract the most useful concepts from them. Majority voting was applied to the extracted concepts to produce a final list of concepts for each sentence. The guideline was to choose multiword expressions richer in semantics, so that in a sentence like “I went to the mall to buy food” the parsed concepts would be go_mall and buy_food rather than simply go, mall, buy, and food.

Advanced Task #3: Topic Spotting The topic spotting dataset was also built at sentence level. For each sentence, annotators labeled the topic, and majority voting determined the final topic label. Notably, for almost every sentence the inter-annotator agreement was 4, i.e., unanimous (mainly because the topics were predefined). The final dataset contains 1,122 sentences about electronics, 442 about books, 1,104 about DVDs, and 1,088 about housewares.
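The majority-voting step used throughout the annotation (four annotators per sentence) can be sketched as follows; the tie-handling choice here is an assumption, since the text does not say how 2-2 splits were resolved:

```python
from collections import Counter

def majority_label(labels):
    """Return the label chosen by most annotators, or None on a tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # e.g. a 2-2 split among four annotators: no majority
    return counts[0][0]
```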

EVALUATION

The evaluation will be performed by the members of the Program Committee. For systems that can be tuned with different parameters, please indicate a range of up to 4 sets of settings. The settings with the best F-measures will be considered for judgment. For each system, reviewers will give a numerical score in the range [1-10] together with details motivating their choice. Scores will be assigned for the following aspects:

1. Use of common-sense knowledge and semantics;

2. Precision, recall, and F-measure with respect to the selected task;

3. Computational time;

4. Innovative nature of the approach.

JUDGING AND PRIZES

After a first round of review, the Program Committee and the chairs will select a number of submissions conforming to the challenge requirements, whose authors will be invited to present their work. Submissions accepted for presentation will be included in the post-proceedings and will receive constructive reviews from the Program Committee. All accepted submissions will have a slot in a poster session dedicated to the challenge. In addition, the winners will present their work in a special slot of the main program of ESWC and will be invited to submit a paper to a dedicated Semantic Web Journal special issue.

For the Concept-Level Sentiment Analysis Challenge there will be two awards for each task:

Quantitative: the system with the highest average score in items 1-3 above;

Innovative: the system with the highest score in item 4 above.

A board of judges at the conference will evaluate the systems again in more detail. The judges will then meet in private to discuss the entries and determine the winners. The same system may run for both awards. Winners will be selected only for tasks with at least 3 participants. In any case, all submissions will be reviewed and, if accepted, published in the ESWC post-proceedings.

HOW TO PARTICIPATE

The following information has to be provided:

Abstract: no more than 200 words.

Description: it should contain the details of the system, including why the system is innovative, how it uses Semantic Web technologies, which features or functions the system provides, which design choices were made, and which lessons were learned. The description should also summarize how participants have addressed the evaluation tasks. Papers must be submitted in PDF format, following the style of Springer’s Lecture Notes in Computer Science (LNCS) series (http://www.springer.com/computer/lncs/lncs+authors), and must not exceed 5 pages in length.

Web Access: the application must be either accessible via the web or downloadable. If the application is not publicly accessible, a password must be provided. A short set of instructions on how to use the application should be provided as well.

All submissions should be provided via EasyChair https://www.easychair.org/conferences/?conf=eswc2014-challenges


MAILING LIST

We invite the potential participants to subscribe to our mailing list in order to be kept up to date with the latest news related to the challenge.

https://lists.sti2.org/mailman/listinfo/eswc2014-semsa-challenge

IMPORTANT DATES

EXTENDED: March 21, 2014, 23:59 CET (previously March 14): Submission due

April 9, 2014, 23:59 CET: Notification of acceptance

May 27-29, 2014: The Challenge takes place at ESWC-14

CHALLENGE CHAIRS

Erik Cambria, Nanyang Technological University (Singapore)

Diego Reforgiato, CNR STLAB Laboratory (Italy)

PROGRAM COMMITTEE

• Newton Howard, MIT Media Laboratory (USA)

• ChengXiang Zhai, University of Illinois at Urbana-Champaign (USA)

• Rada Mihalcea, University of North Texas (USA)

• Ping Chen, University of Houston-Downtown (USA)

• Yongzheng Zhang, LinkedIn Inc. (USA)

• Giuseppe Di Fabbrizio, Amazon Inc. (USA)

• Rui Xia, Nanjing University of Science and Technology (China)

• Rafal Rzepka, Hokkaido University (Japan)

• Amir Hussain, University of Stirling (UK)

• Alexander Gelbukh, National Polytechnic Institute (Mexico)

• Bjoern Schuller, Technical University of Munich (Germany)

• Amitava Das, Samsung Research India (India)

• Dipankar Das, National Institute of Technology (India)

• Carlo Strapparava, Fondazione Bruno Kessler (Italy)

• Stefano Squartini, Marche Polytechnic University (Italy)

• Cristina Bosco, University of Torino (Italy)

• Paolo Rosso, Technical University of Valencia (Spain)

ESWC CHALLENGE COORDINATOR

Milan Stankovic, Sépage & Université Paris-Sorbonne (France)

RESULTS OF SEMSA AT ESWC2014

TASK0

For this task the dataset was composed of 2,429 sentences.

System  Precision  Recall  F1 Measure  Final Position
NCU     0.78       0.57    0.66        1
IBM     0.66       0.59    0.62        2
FBK     0.42       0.47    0.44        3
NTU     disqualified (processed only 323 of 2,429 sentences)  4

TASK1

For this task the dataset was composed of 1,725 sentences.

System     Precision  Recall  F1 Measure  Final Position
FBK        0.25       0.26    0.25        1
IBM        0.24       0.14    0.18        2
UNI-NEGEV  0.12       0.05    0.07        3

TASK2

For this task the dataset was composed of 2,421 sentences.

System  Precision  Final Position
NTU     0.37       1
SINICA  0.04       2
FBK     disqualified (processed only 1,176 of 2,421 sentences)

TASK3

For this task the dataset was composed of 3,756 sentences.

System  Precision  Final Position
IBM     0.33       1
FBK     0.17       2

THE MOST INNOVATIVE APPROACH

System     Final Position
FBK        1
IBM        2
NTU        3
SINICA     4
NCU        5
UNI-NEGEV  6