The workshop will take place on Sunday, June 26, 2016, in the Marina hall of the Hyatt Regency Hotel, as part of the SIGMOD 2016 program for Sunday.
The program includes four sessions. The first session opens with an introduction, followed by a keynote speech by Xin Luna Dong. The second session is a paper session in which four papers will be presented. The third session includes an award ceremony, followed by the presentation of the best paper and then a second keynote speech by Gerhard Weikum. Finally, the fourth session is again a paper session, with four paper presentations.
Each paper is allotted 22 minutes, consisting of (at most) 18 minutes of presentation followed by (at least) 4 minutes of questions and discussion.
Detailed Program (Tentative)
- 8:45-10:00: keynote session
- Introduction
- Keynote: Xin Luna Dong. How Far Are We from Collecting the Knowledge in the World?
- 10:00-10:30: coffee break
- 10:30-12:00: paper session
- Workload-Driven Learning of Mallows Mixtures with Pairwise Preference Data. By Julia Stoyanovich, Lovro Ilijasic and Haoyue Ping.
- Accurate Fact Harvesting from Natural Language Text in Wikipedia with Lector. By Matteo Cannaviccio, Denilson Barbosa and Paolo Merialdo.
- Fast and Accurate Identification of Implicit Enterprise Users in Social Media. By Zhenni You, Diqian Wu, Yi Wang and Tieyun Qian.
- Fusing Time-Dependent Web Table Data. By Yaser Oulabi, Robert Meusel and Christian Bizer.
- 12:00-13:30: lunch (on your own)
- 13:30-15:00: award + keynote session
- Award ceremony.
- Incorporating Information Extraction in the Relational Database Model. By Yoav Nahshon, Liat Peterfreund and Stijn Vansummeren. (Best paper)
- Keynote: Gerhard Weikum. What Computers Know, Should Know, and Shouldn't Know.
- 15:30-17:00: paper session
- An Index Scheme for Fast Data Stream to Distributed Append-Only Store. By Parijat Mazumdar, Li Wang, Marianne Winslett, Zhenjie Zhang and Deokwoo Jung.
- The Ranking Game. By Ran Ben Basat and Elad Kravi.
- Web Table Column Categorisation and Profiling. By Oliver Lehmberg and Christian Bizer.
- On the Statistical Analysis of Practical SPARQL Queries. By Xingwang Han, Zhiyong Feng, Xiaowang Zhang, Xin Wang, Guozheng Rao and Shuo Jiang.
Invited Talks
Xin Luna Dong: How Far Are We from Collecting the Knowledge in the World?
Abstract. In this talk we ask the question: How far are we from collecting the knowledge in the world? We analyze the knowledge that has been collected in three categories: head knowledge in head verticals (e.g., music), long-tail knowledge in head verticals, and head knowledge in long-tail verticals (e.g., yoga poses), showing the limitations and challenges of current knowledge-collection techniques. We then present two key efforts at Google on collecting tail knowledge. The first, called Knowledge Vault, targeted tail knowledge in head verticals. It used 15 extractors to periodically extract knowledge from 1B+ web pages, obtaining 3B+ distinct (subject, predicate, object) knowledge triples. The second, called Lightweight Verticals, targets head knowledge in tail verticals. It uses a crowd-sourcing approach to collect knowledge by annotating websites, and currently attracts millions of active Google Search users every day. We present the key technologies behind both projects, namely, knowledge fusion for guaranteeing knowledge correctness, and knowledge-based trust for finding authoritative sources for knowledge curation.
Bio. Xin Luna Dong is a Senior Research Scientist at Google Inc. She is one of the major contributors to the Knowledge Vault project, and has led the Knowledge-based Trust project, which was called the "Google Truth Machine" by the Washington Post. She has co-authored the book "Big Data Integration", published 65+ papers in top conferences and journals, given 20+ keynotes/invited talks/tutorials, and received the Best Demo award at SIGMOD 2005. She is the PC co-chair for WAIM 2015 and serves as an area chair for SIGMOD 2017, SIGMOD 2015, ICDE 2013, and CIKM 2011.
Gerhard Weikum: What Computers Know, Should Know, and Shouldn't Know
Abstract. Machines with comprehensive knowledge of the world's entities and their relationships have been a long-standing vision and challenge of AI. In the last decade, huge knowledge bases (a.k.a. knowledge graphs) have been automatically constructed from web data and text sources, and have become a key asset for search, analytics, recommendations and data integration. This digital knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, contributing to natural language processing and data analytics. This talk reviews these advances, discusses recent directions such as acquiring commonsense knowledge, and identifies new opportunities and future challenges.
Bio. Gerhard Weikum is a Scientific Director at the Max Planck Institute for Informatics in Saarbruecken, Germany. His research spans transactional and distributed systems, self-tuning database systems, DB&IR integration, and the automatic construction of knowledge bases. He co-authored a comprehensive textbook on transactional systems, received the VLDB 10-Year Award for his work on automatic DB tuning, and is one of the creators of the YAGO knowledge base. Weikum is an ACM Fellow and a member of several academies in Europe. He has served on various editorial boards and as PC chair of conferences like ACM SIGMOD, ICDE and CIDR. He received the ACM SIGMOD Contributions Award in 2011 and an ERC Synergy Grant in 2013.