The Semantic Web, a concept tossed around for years as a Web extension to make it easier to find and group information, is getting a critical boost Tuesday from the World Wide Web Consortium (W3C).
W3C will announce publication of SPARQL (pronounced "sparkle") query technology, a Semantic Web component enabling people to focus on what they want to know rather than on the database technology or data format used to store data, W3C said.
The potential of the Semantic Web should not be underestimated. Because software agents could scan the Web on behalf of users, even Google's ad-based business model could be affected, an analyst said.
SPARQL queries express high-level goals and are easier to extend to unanticipated data sources. The technology overcomes limitations of local searches and single formats, according to W3C.
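To illustrate what expressing a goal rather than a retrieval procedure can look like, here is a minimal Python sketch of pattern matching over RDF-style triples. It is a toy, not SPARQL itself: the sample data, the "ex:" names, and the helper functions match and query are all hypothetical, and a real SPARQL engine does far more.

```python
# Toy sketch: declarative pattern matching over RDF-style triples.
# The data, the "ex:" names, and both helper functions are hypothetical.

TRIPLES = [
    ("ex:book1", "ex:author", "ex:alice"),
    ("ex:book2", "ex:author", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
]


def match(pattern, triple, binding):
    """Return an extended copy of binding if triple fits pattern, else None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant that does not match this triple.
            return None
    return extended


def query(patterns, triples):
    """Find every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# "Which books have an author, and what is that author's name?"
PATTERNS = [("?book", "ex:author", "?person"), ("?person", "ex:name", "?name")]
for row in query(PATTERNS, TRIPLES):
    print(row["?book"], "was written by", row["?name"])
```

The caller states only the shape of the facts wanted; how the triples are stored or scanned is left to the query function, which is roughly the separation W3C describes.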
"[SPARQL is] the query language and protocol for the Semantic Web," said Lee Feigenbaum, chair of the RDF (Resource Description Framework) Data Access Working Group at W3C, which is responsible for SPARQL.
Already available in 14 known implementations, SPARQL is designed to be used at the scale of the Web to allow queries over distributed data sources independent of format. It also can be used for mashing up Web 2.0 data.
The Semantic Web, the W3C said, is intended to enable sharing, merging, and reusing of data globally. "The basic idea of the Semantic Web is [to] take the idea of the Web, which is effectively a linked set of documents around the world, and apply it to data," Feigenbaum said.
"One way to think about the Semantic Web is the Web as one big database," said Ian Jacobs, W3C representative. A database, he said, enables querying and manipulation of data. More database-like Web sites are emerging, he said.
Comparing the Semantic Web to search sites such as Google, Jacobs said Google allows for searching through document text, essentially. The Semantic Web, meanwhile, allows for automation and combining of data, he said.
While the Semantic Web concept has been talked about for several years, Feigenbaum believes momentum is building. He cited DBpedia, which extracts structured information from Wikipedia, as an example of a Web site based on the Semantic Web.
With the Semantic Web's ability to home in on just the information a user needs, companies based on a Web search advertising model such as Google may have to reconsider their plans, said analyst Jonas Lamis, executive director of SciVestor.
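As a rough sketch of how a program might put a question to a site such as DBpedia, the Python below builds (but does not send) an HTTP GET request in the style of the SPARQL protocol, which passes the query text in a "query" parameter. The endpoint URL, the query itself, and the helper name build_sparql_request are assumptions made for illustration, not details from W3C or DBpedia.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint and query, for illustration only.
ENDPOINT = "https://dbpedia.org/sparql"
QUERY = (
    "SELECT ?abstract WHERE { "
    "<http://dbpedia.org/resource/SPARQL> "
    "<http://dbpedia.org/ontology/abstract> ?abstract "
    "} LIMIT 1"
)


def build_sparql_request(endpoint, query):
    """Build a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL XML results format.
    headers = {"Accept": "application/sparql-results+xml"}
    return Request(url, headers=headers)


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would actually contact the server; that step is left out here so the sketch runs without network access.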
"They may need to rethink their business model because if I have an agent that acts on my behalf and finds things that are interesting for me, it's not necessarily going to be reading Google ads to do that," Lamis said.
The goal of the Semantic Web is to serve as a giant set of databases that can be integrated, Jacobs said. The Semantic Web has seen a lot of uptake in health care and the life sciences, he said. The drug discovery and pharmaceutical fields can use it to take clinical results and learn from data, according to Jacobs.
At pharmaceuticals company Eli Lilly, Semantic Web technology is being used for research.
"We're using it for our targeted assessment tools, which helps us to find out as much information or find out lots of information about drug targets of interest," said Susie Stephens, principal research scientist at Eli Lilly and chair of the W3C Semantic Web Education and Outreach Working Group. A drug target is a protein in the body that is to be modified with a particular drug.
"We use Semantic Web technologies to help us link to lots of information about the drug targets," she said.
The SPARQL specification works with other W3C Semantic Web technologies. These include: RDF, for representing data; RDF Schema; Web Ontology Language (OWL) for building vocabularies; and Gleaning Resource Descriptions from Dialects of Languages (GRDDL) for automatic extraction of Semantic Web data from documents.
SPARQL also can use other W3C standards such as WSDL.
The W3C RDF Data Access Working Group has produced three SPARQL recommendations being issued Tuesday: the SPARQL Query Language for RDF; SPARQL Protocol for RDF; and SPARQL Query Results XML Format.
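For a sense of what the XML results format looks like in practice, the following Python sketch reads a small, hand-written results document and turns it into rows of variable bindings. The sample document and the helper name parse_results are hypothetical; the namespace is the one defined for SPARQL results, and the sketch handles only simple URI and literal values.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the SPARQL XML results format (hypothetical data).
SAMPLE = """
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="page"/>
  </head>
  <results>
    <result>
      <binding name="name"><literal>Alice</literal></binding>
      <binding name="page"><uri>http://example.org/alice</uri></binding>
    </result>
    <result>
      <binding name="name"><literal>Bob</literal></binding>
    </result>
  </results>
</sparql>
"""

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def parse_results(xml_text):
    """Return one dict per result, mapping variable name to its text value."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the <uri> or <literal> child element
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


for row in parse_results(SAMPLE):
    print(row)
```

Variables that are unbound in a given result are simply absent from that row, as with "page" in the second result above.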
Participants in the working group include people from companies such as Agfa-Gevaert, HP, IBM, Matsushita, and Oracle. W3C released statements of support from numerous parties, including HP and Oracle.
"SPARQL is a key element for integrated information access across information silos and across business boundaries. HP customers can benefit from better information utilization by employing semantic Web technologies," said Jean-Luc Chatelain, CTO of HP Software Information Management, in the company's statement.
"HP's Jena Semantic Web framework has a complete implementation of query language, protocol, and result set processing," Chatelain said.
"As an active participant in this working group, Oracle believes the standardization of SPARQL will play an instrumental role in achieving the vision of the Semantic Web," said Don Deutsch, Oracle vice president of Standards Strategy and Architecture, in Oracle's statement.
Paul Krill is editor at large at InfoWorld.