SemTechBiz 2012


I attended SemTechBiz 2012 in San Francisco last week. This annual conference on semantic technology, now in its eighth year, does a nice job of balancing the interests of the research and commercial communities. This year the conference tilted toward commercial vendor interests (the vendors do, after all, sponsor the event), although the product pitches were confined to a clearly identified solutions track. Here are my semantic annotations about this semantic technology conference.

Given our focus on open source platforms, I enjoyed the session on wikis and semantics. In this session, Joel Natividad of Ontodia gave an overview of NYFacets, a crowd-knowing solution built with Semantic MediaWiki. Ontodia's site won NYC BigApps, a contest started by Mayor Bloomberg as part of his grand plan to make NYC the capital of the digital world. NYFacets has a semantic data dictionary with 3.5M facts. Ontodia's vision is to socialize conversations about data and eventually build NYCpedia. I wondered why public libraries don't take this idea and run with it: Bostonpedia by the Boston Public Library, Concordpedia by the Concord Public Library, and so on.

Stephen Larson gave an overview of NeuroLex, a periodic table of elements for neuroscience built with Semantic MediaWiki under the NIF program. The team built a master table of neurons and exposed it as a SPARQL endpoint, with rows consisting of 270 neuron classes and columns consisting of 30 properties. NeuroLex demonstrates the value of a shared ontology for neuroscience by representing knowledge in a machine-understandable form.
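To make this concrete, here is a sketch of what querying such an endpoint might look like. The property names below are illustrative placeholders, not necessarily NeuroLex's actual schema:

```sparql
# Hypothetical query against a NeuroLex-style SPARQL endpoint:
# list neurons together with the neurotransmitter they release.
# (Property URIs are assumptions for illustration only.)
PREFIX nlx: <http://neurolex.org/wiki/Property-3A>

SELECT ?neuron ?transmitter
WHERE {
  ?neuron nlx:NeurotransmitterReleased ?transmitter .
}
LIMIT 10
```

The point is that once the master table lives in RDF, any of the 30 property columns becomes queryable this way without custom code.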

In the session Wikipedia's Next Big Thing, Denny Vrandecic of Wikimedia Deutschland gave an overview of the Wikidata project, which addresses the manual-maintenance deficiencies of Wikipedia by bringing a number of Semantic MediaWiki features into its fold. For instance, all infoboxes in Wikipedia will become semantic forms stored in a central repository, eliminating the need to maintain the same content duplicated across many pages of Wikipedia. Semantic search will also come to Wikipedia, to the applause of the folks who maintain Wikipedia's lists of lists (and lists of lists of lists), replacing these huge manually maintained lists with a single semantic query. One of the novelties of Wikidata is that it will be a secondary database of referenced sources for every fact. For instance, if one source says the population is 4.5M while another says 4,449,000, each source will be listed in the database, enabling belief-based inference.
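The "single semantic query" idea can be sketched roughly as follows. The class and property names here are made up for illustration; Wikidata's actual data model was still being designed at the time:

```sparql
# Hypothetical replacement for a hand-maintained
# "List of rivers of Europe" page: one query instead of
# thousands of manual edits. (Vocabulary is illustrative.)
PREFIX ex: <http://example.org/ontology/>

SELECT ?river
WHERE {
  ?river a ex:River ;
         ex:locatedIn ex:Europe .
}
ORDER BY ?river
```

A list page backed by a query like this stays current automatically as items are added to the central repository.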

It was nice to see several evangelists of linked data from the government sector at the conference. Dennis Wisnosky and Jonathan Underly of the U.S. Department of Defense gave a nice overview of EIW (Enterprise Information Web). It was refreshing to hear that the DoD is looking at linked data as a cost-reduction driver. Given the Cloud First mandate of the Defense Authorization Act of 2012, the importance of semantic technology in the government will only accelerate. In another session, Steve Harris of Garlik, now part of Experian, gave an overview of Garlik DataPatrol, a semantic store of fraudulent activities for finance. I could not help wondering if someone from the Department of Homeland Security was in attendance to hear the details of this application. Steve found no need for complex ontologies, reasoning, or NLP in this large-scale application, which records about 400M instances of personal information (e.g., a Social Security Number mentioned in an IRC channel) every day.

Matthew Perry and Xavier Lopez of Oracle gave an overview of the OGC GeoSPARQL standard, which aims to support representing and querying geospatial data on the Semantic Web. GeoSPARQL defines a vocabulary (e.g., polygon, line, point) for representing geospatial data in RDF, and it defines an extension to the SPARQL query language for processing geospatial data using functions such as distance, buffer, convex hull, intersection, union, envelope, and boundary.
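As a flavor of what the extension looks like in practice, here is a small sketch of a proximity query using the GeoSPARQL function namespace (the feature data and the point coordinates are, of course, made up):

```sparql
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>

# Find features whose geometry lies within 1 km of a given point.
SELECT ?feature
WHERE {
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER (geof:distance(?wkt,
                        "POINT(-122.41 37.77)"^^geo:wktLiteral,
                        uom:metre) < 1000)
}
```

Geometries are stored as WKT literals, and the `geof:` filter functions do the spatial computation inside the query engine.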

Linked data being essentially the plumbing of semantic infrastructure, it is hard to give engaging presentations on the topic. Two presentations bucked this trend. The presentation by Mateja Verlic from the Slovenian startup Zemanta rocked. Zemanta developed LODGrefine, a DBpedia extension for Google Refine, under the LOD2 program. Google Refine supports large-scale transformations of open data sources, and LODGrefine exposes Refine results as a SPARQL endpoint. Mateja managed to give two impressive live demos in ten minutes. The other rock-star presentation was by Bart van Leeuwen, a professional firefighter, on Real-time Emergency Response Using Semantic Web Technology. Everyone in attendance got the gist of how FIREbrary, a linked data library for fire response, can help firefighters in the real world, thanks to a presentation sprinkled with live videos of fire emergency responses. It was instructive to see how semantic technology can make a difference in managing extreme events such as a chemical fire, since by definition there are no plans for these types of events.

Bringing good user interface design practices to linked-data-enabled applications was another theme of the conference. Christian Doegl of Uma gave a demo of Semantic Skin, a whole-wall interactive visualization driven by semantics. Siemens used it to build an identity map of their company. It uses the Intel Audience Impression Metrics Suite to detect the age, gender, etc. of the person walking in front of the wall, personalizing the content driven by semantics. Pretty cool stuff.