Last year, Reuters acquired text analytics company ClearForest.
They recently launched a new free semantic web service named OpenCalais, based on ClearForest technology. The service extracts entities from a submitted text (web content, for example). Last but not least, it returns all the extracted concepts as an RDF graph. By using the service and browsing this graph, you can automatically tag any unstructured content (with RDFa, for example), provide enhanced search features based on semantics (if you know the underlying ontology well), and so on.
Below is a short example. I submitted a text found on the web to the service through this web page, then queried the returned RDF graph using this RDF graph visualization tool and a fairly simple SPARQL-like query to retrieve everything that was identified as a “Company”. It would be better if the companies found were linked by something other than their common type, for example an “acquired” relationship, but it’s already a good start.

Original Plain Text

March 16, 2004 (Computerworld) — Enterprise content management vendor Documentum Inc. has acquired a one-step content integration product line from Xerox Corp. and today unveiled a new “virtual repository” for improved organization of stored data.
In an announcement, the Pleasanton, Calif.-based company said its new Documentum Virtual Repository will allow companies to organize and store a wide range of internal and external information that will be easy to retrieve for use. The repository will allow aggregation for automated and scheduled content collection from multiple sources and will make the information available to others in compatible formats.
The new feature will be available early in the second quarter.
In a related move, Documentum acquired the AskOnce business unit of Xerox for an undisclosed price. AskOnce is a secure enterprise content integration product that searches multiple repositories and data types using a single query. AskOnce relies on a uniform query interface to connect it to existing database, document repository, Internet, corporate intranet or e-mail applications.
Financial details of the transaction weren’t disclosed.
“With the Documentum Virtual Repository solution, companies will be able to control all of their content — internal and external, structured and unstructured — regardless of where it resides,” Dave DeWalt, president of the Documentum division of EMC Corp., said in a statement.
“Most enterprises have limited knowledge of the content scattered throughout their organizations — on employee desktops, internal and external networks, Web sites and portals, or in data archives. There’s a great need in the market for technology that helps companies manage all of this content — especially with the intense public scrutiny of both government agencies and public companies.”

All identified entities

entities2.jpg

Tagged HTML sample

March 16, 2004 (Computerworld) — Enterprise content management vendor Documentum Inc. has acquired a one-step content integration product line from Xerox Corp. and today unveiled a new “virtual repository” for improved organization of stored data. In an announcement, the Pleasanton, Calif. -based company
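
To give an idea of what such tagging could look like, here is a minimal Python sketch that wraps entity mentions in RDFa-style spans. It is not the markup the service itself produces: the `entities` list, the `tag_with_rdfa` helper and the choice of the `typeof` attribute are all assumptions made for illustration. Only the Company type URI comes from the query used in this post.

```python
import html
import re

# Type URI taken from the query in this post.
COMPANY = "http://s.opencalais.com/1/type/em/e/Company"

# Hypothetical extraction result: (mention, type URI) pairs.
entities = [("Documentum Inc.", COMPANY), ("Xerox Corp.", COMPANY)]


def tag_with_rdfa(text, entities):
    """Wrap each known entity mention in a <span> with an RDFa typeof attribute."""
    by_name = dict(entities)
    if not by_name:
        return html.escape(text)
    # Try longer names first so a longer mention wins over a shorter one.
    names = sorted(by_name, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the plain text between two mentions.
        parts.append(html.escape(text[pos:match.start()]))
        mention = match.group(0)
        parts.append(
            '<span typeof="%s">%s</span>'
            % (html.escape(by_name[mention]), html.escape(mention))
        )
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


sample = "Documentum Inc. has acquired a product line from Xerox Corp."
tagged = tag_with_rdfa(sample, entities)
print(tagged)
```

Matching on the mention text is the simplest possible approach. A real integration would more likely rely on whatever position information the service returns for each entity, if it provides any.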

Global RDF graph

globalgraph.jpg

RDQL (SPARQL-like) query: What is identified as a “Company”?

SELECT ?subject ?predicate ?object WHERE
(?subject rdf:type <http://s.opencalais.com/1/type/em/e/Company>)
(?subject ?predicate ?object)
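
For readers who do not know RDQL, the two patterns above amount to a join on `?subject`: first find every resource typed as a Company, then return all triples about those resources. Here is a rough Python sketch of that logic over a plain list of triples. The `ex:` identifiers and the sample triples are made up for illustration and are not the actual output of the service. Only the Company type URI and the standard `rdf:type` URI are real.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
COMPANY = "http://s.opencalais.com/1/type/em/e/Company"

# Hypothetical triples standing in for the graph returned by the service.
triples = [
    ("ex:documentum", RDF_TYPE, COMPANY),
    ("ex:documentum", "ex:name", "Documentum Inc."),
    ("ex:xerox", RDF_TYPE, COMPANY),
    ("ex:xerox", "ex:name", "Xerox Corp."),
    ("ex:pleasanton", RDF_TYPE, "ex:City"),
    ("ex:pleasanton", "ex:name", "Pleasanton"),
]


def describe_instances(triples, type_uri):
    """Return every triple whose subject is an instance of type_uri."""
    # First pattern: (?subject rdf:type <type_uri>)
    subjects = {s for s, p, o in triples if p == RDF_TYPE and o == type_uri}
    # Second pattern: (?subject ?predicate ?object), joined on ?subject
    return [(s, p, o) for s, p, o in triples if s in subjects]


rows = describe_instances(triples, COMPANY)
for row in rows:
    print(row)
```

With this sample data, the city is filtered out and only the triples about the two companies remain, which is the sub-graph the query is meant to isolate.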

RDF graph / Query result

subgraph.jpg



3 Responses to “Semantic food for free”  

  1. Bruno:

    Thanks for noticing Calais. Great to see people trying it out.

    Try working with an article that actually discusses an acquisition or other type of equity transaction – and submitting it directly to the Calais service – perhaps through the command line tool.

    The web submission tool you used is helpful – but it doesn’t expose all the facts and events – like acquisitions, etc – that the service supports.

    Also – we’ll be deploying a web-based tool in the next week or so that fully supports entities, facts and events. This will help give a much better picture of the full set of Calais capabilities.

  2. Bruno:

    A quick follow up note. We’ve released a web-based submission tool that allows you to explore not just the entities extracted – but the events and facts as well. It’s hosted at http://sws.clearforest.com/calaisviewer. This tool will give you a much better view of the full Calais capabilities.

    Regards,

  3. Thank you.
    I will give it a try soon!

    Bruno

