Hacker News

The technology behind WolframAlpha is truly incredible. It's likely one of the most undervalued resources of our day.


yes and no. it's a tremendous amount of work to organize information and build associations between what you ask and what you get, although it certainly has a lot of gaps. it's also a neat way to look at how to combine information.

that said, it's within the grasp of many of us: natural language interfaces (NLI) plus SPARQL databases and endpoints. have a look at this semanticweb q&a:

http://answers.semanticweb.com/questions/12747/natural-langu...

some good links in there. basically: find your SPARQL endpoints, keep a list of synonyms mapped to your inputs (which you parse with NLP tools like weka, the stanford parser, or even python's nltk), and map your query onto the ontologies exposed by those endpoints. then try successive answers.
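a minimal sketch of that pipeline in plain python (the synonym table, the entity-extraction heuristic, and the DBpedia property names are illustrative assumptions, not a real system; to actually run the query you'd send it to an endpoint like DBpedia's with a SPARQL client such as SPARQLWrapper):

```python
# Toy NLI-to-SPARQL mapper: match synonyms in a question onto
# ontology properties and emit a query string for an endpoint.
# SYNONYMS and the dbo:/dbr: IRIs below are illustrative only.

SYNONYMS = {
    "capital": "dbo:capital",
    "capital city": "dbo:capital",
    "population": "dbo:populationTotal",
    "people live in": "dbo:populationTotal",
}

def question_to_sparql(question):
    """Map a colloquial question to a SPARQL query string, or None."""
    q = question.lower().rstrip("?")
    # Try longest phrases first so "capital city" wins over "capital".
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        if phrase in q:
            prop = SYNONYMS[phrase]
            # Crude entity guess: take the last word, title-cased.
            entity = q.split()[-1].title()
            return (
                "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
                "PREFIX dbr: <http://dbpedia.org/resource/>\n"
                "SELECT ?answer WHERE { dbr:%s %s ?answer }" % (entity, prop)
            )
    return None  # no synonym matched: try the next strategy/endpoint

query = question_to_sparql("what is the capital of France?")
print(query)
```

the "try successive answers" step is then just iterating this over your list of endpoints/ontologies until one returns a non-empty result.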

a good, simple interface to play around with that is quepy:

http://quepy.machinalis.com/

a few others exist.

hope that helps. despite challenges in the adoption rates of the semantic web, i think it's the future of information retrieval because it makes sense for us as users and truly organizes information.


Some of our stuff is simple database lookup (à la the Google knowledge graph), other stuff is more algorithmic and computational in nature.

The problem we've had with SPARQL and co is that we feel it isn't optimized for computational queries. Ontologies don't matter as much in that case, and inference in tuple stores costs you significantly in performance, although the technology is improving.

As often as not, however, the computationally irreducible work lies in making a domain suitable for computational consumption, not in the technology used for representation.

To analogize, UTF-8 is great, but without the notion of Unicode code points it wouldn't exist.


Do you have any reference of wolframalpha using SPARQL? I don't think they are using similar things.


... It's a parlor trick.

The computational complexity going on is minimal compared to the effort it takes to create and maintain the dataset. Accessing it with a formal query language is then straightforward.

The parlor trick is making the formal query language seem as informal and colloquial as possible. But poke at it until it breaks and you'll see you're just running searches on very specific and very specifically tagged data.
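The point can be sketched with plain dictionaries (the record, the phrasings, and the fallback message below are all made up for illustration): many colloquial surface forms are hand-mapped onto one tagged record, and any phrasing outside that mapping simply misses.

```python
# Hand-tagged dataset: curating records like this is the expensive part.
DATA = {
    ("banana", "calories"): "89 kcal per 100 g",
}

# Each colloquial phrasing is explicitly mapped to a tagged key.
# The "parlor trick" is the breadth of this table, not deep parsing.
PHRASINGS = {
    "how many calories in a banana": ("banana", "calories"),
    "banana calories": ("banana", "calories"),
    "calories of a banana": ("banana", "calories"),
}

def answer(question):
    key = PHRASINGS.get(question.lower().rstrip("?"))
    return DATA.get(key, "don't know what to make of that")

print(answer("Banana calories"))         # maps onto the tagged record
print(answer("is a banana fattening?"))  # unmapped phrasing: falls over
```

Any phrasing the curators anticipated looks like natural-language understanding; one step outside the table and the illusion breaks.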

Edit: Having thought about it more, the reason it's 'underutilized' is that it's actually not that useful. Your knowledge of its dataset is more important than its ability to provide it for you--witness the person below who knew that it could provide nutrition information.

Your index on its information is better than its index, in other words.



