tammet's comments

tammet · on Jan 26, 2021

The whole field has been dominated by research, i.e. the wish to make simple things complicated (in order to publish papers) as opposed to engineering, i.e. making complicated things simple (in order to produce usable software efficiently). As a result the standards are horrendously - and needlessly - complicated. The few major practical outcomes, like schema.org, json-ld and the google annotation system, are results of engineering, not research. Alas, json-ld has also taken a turn towards hypercomplexities.

huskyr · on Jan 26, 2021

Yeah, this is an unfortunate consequence of having the whole ecosystem mostly within academia, including the lack of tutorials and proper documentation (e.g. not a 500 page standard).

IMO the most interesting place right now for semantic web development is Wikidata. It's still pretty difficult for newcomers to contribute (as is the case for all Wikimedia projects) but at least it has many eyeballs and a very active community / ecosystem.

ivansavz · on Jan 28, 2021

+1 for WIKIDATA

There are lots of useful WIKIDATA links and demos on this page: https://www.wikidata.org/wiki/User:Daniel_Mietchen/FSCI_2017...

krallistic · on Jan 26, 2021

Maybe a good indicator that there is only minor (industry) need/benefit. The "biggest" Knowledge Graph is Google, but it is unclear, how much there is actually Semantic Web and how much search, ML, NLP etc..

They are all nice ideas, but the practical use cases are rare. I am skeptical of the often-touted use case in medicine/drug interactions. The only time I saw it in industry, it was not really used by the lab technicians, because all the questions the system could answer were trivial. The promise that "the system can infer new combinations/interactions" was never fulfilled.

cheph · on Jan 26, 2021

> The "biggest" Knowledge Graph is Google, but it is unclear, how much there is actually Semantic Web and how much search, ML, NLP etc..

The second biggest is possibly WikiData, and it is not that small.

As for practical use cases, there are many; most prominently, it is the premier way of encoding metadata for search engines: https://schema.org/docs/about.html

And the number of datasets and ontologies that exist is quite vast:

- https://lod-cloud.net/dataset

- http://obofoundry.org/

- https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_M...

I would like to understand what other options you would consider better for these datasets, for the metadata and for the ontologies?

I mean if not RDF for web metadata then what? If not semantic web for UK govt data (https://ukparliament.github.io/ontologies/, https://opendatacommunities.org/data_home, https://ckan.publishing.service.gov.uk/dataset?res_format=RD..., https://ckan.publishing.service.gov.uk/dataset?res_format=SP...) then what?

It would be nice to have something even better, but I much prefer RDF to a bunch of CSV files.

namedgraph · on Jan 26, 2021

Explain this Knowledge Graph usage by Fortune 500 companies then: http://sparql.club/

breck · on Jan 26, 2021

I agree, the research is overly complicated.

So it's a lot of extra work to sift through, but I've found a lot of gold in there.

If you're looking for a simple, noise-free way to do the semantic web, I'm very confident that Tree Notation will enable it (https://treenotation.org/).

I've played around a bit with turning Schema.org into a Tree Language, and think that would be a fruitful exercise, but plenty more on the plate first.

FWIW I've pitched this concept to W3C for 4 or 5 years to no avail yet. I think though if someone can put together a decent prototype the idea might start clicking.

Imagine a noise free way to encode the semantic web with natural 3-d positional semantics. Could be cool!

ta988 · on Jan 27, 2021

It is unclear to me what it would achieve compared to a SPOG (subject, predicate, object, graph) representation like the one used in RDF-based triplestores.
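
For readers who have not met the (subject, predicate, object, graph) layout, here is a minimal in-memory sketch of it. This is only an illustration, not how any real triplestore is implemented: the `Quad` and `QuadStore` names, the `match` helper and the `ex:` identifiers are all made up for the example.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """One statement: subject, predicate, object, plus the graph it belongs to."""
    subject: str
    predicate: str
    obj: str
    graph: str


class QuadStore:
    """Hypothetical toy store; real triplestores index quads far more cleverly."""

    def __init__(self):
        self._quads = set()

    def add(self, subject, predicate, obj, graph):
        self._quads.add(Quad(subject, predicate, obj, graph))

    def match(self, subject=None, predicate=None, obj=None, graph=None):
        """Yield every quad matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj, graph)
        for quad in self._quads:
            if all(p is None or p == v for p, v in zip(pattern, quad)):
                yield quad


# Made-up example data, using illustrative "ex:" identifiers.
store = QuadStore()
store.add("ex:EiffelTower", "ex:locatedIn", "ex:Paris", "ex:landmarks")
store.add("ex:MoulinRouge", "ex:locatedIn", "ex:Paris", "ex:landmarks")
store.add("ex:Paris", "ex:capitalOf", "ex:France", "ex:geography")

# Everything located in Paris, regardless of which graph states it.
in_paris = sorted(q.subject for q in store.match(predicate="ex:locatedIn", obj="ex:Paris"))
print(in_paris)
```

The fourth element is what lets a store keep statements from different sources apart while still querying across all of them.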

breck · on Jan 27, 2021

Yes you are right. Semantic triplets are great. I think the semantics are largely the same. Here's my work in progress argument for why this is relevant.

My take with ontologies is building consensus is hard.

Tree Notation offers a solution to the problem of: what should we agree on for the encoding? I assume that simpler is better, all else being equal. By that measure Tree Notation is the simplest, in the sense of being the thing with the fewest pieces (tokens).

To get to Tree Notation, nothing was added, only stripped. I started with an existing notation and stripped away each visible syntax token that wasn't needed. Surprisingly, not one is needed. Not one quote, parens, bracket, colon, etc.

So now if we can get consensus around going with the simplest thing, we have got a way to agree on whether we should use XML, JSON-LD, turtle, etc. The simplest thing works (which would be Tree Notation, or a close relative; someone can rebrand the notation but the idea is largely the same). This does not suffer from the 927 problem, as there are a few classes of things where we do have 1 new language that is mathematically superior and of a different kind than others (binary notation, for example).

So after you have agreement on that encoding, versioning and forking and merging schemas is dead simple (just use Git—in Tree Notation all changes are semantic and noise free).

So now we've solved what encoding to use for our ontologies, and we have a very fast and efficient way to collaborate on them (it's just plain text and git).

That brings us to a third advantage which is more theoretical. Tree Notation maps words/nodes to a 3-D representation. This means that there would be an X-Y-Z isomorphism with an ontology and the real world. I don't really know where we go from there, but at least by this point we've moved the semantic web idea a lot further and can start looking at the next realm of possibilities.

tammet · on Nov 6, 2019

A simple demo of the algorithm described in the article: http://dijkstra.cs.ttu.ee/~tammet/oligarch.html

tammet · on Sept 22, 2018

Similar to https://guardtime.com/technology

tammet · on Aug 20, 2017

A similar Scheme-to-C project I wrote last century for the SCM scheme, should still be usable: http://people.csail.mit.edu/jaffer/hobbit.pdf

One goal of the project was to produce human-readable C.

tammet · on March 27, 2015

Yep, a bug has crept into a preprocessor. Debugging right now :) thx!

drostie · on March 27, 2015

Glad to help. Sorry if I was overly negative; things like this are actually pretty darn cool.

tammet · on March 27, 2015

Fixed it: a classic js error of writing != instead of !==. Thx for noticing! Shift-reload to empty cache of the old js file :)

tammet · on Jan 16, 2014

Eiffel tower photo locations are spread over a fairly wide area, while Moulin Rouge photos are taken in a relatively small spot, hence more intense.
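
A rough sketch of why spread matters, assuming intensity is something like "photos per grid cell". This is not the actual sightsmap algorithm; the `peak_density` helper, the cell size and the coordinates are all invented for illustration.

```python
from collections import Counter


def peak_density(points, cell_size):
    """Count points per square grid cell and return the busiest cell's count."""
    cells = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return max(cells.values())


# Hypothetical data: 100 photos for each subject, coordinates in metres.
# Spread-out subject: photo locations on a 10 x 10 lattice, 100 m apart.
spread = [(i * 100.0, j * 100.0) for i in range(10) for j in range(10)]
# Compact subject: photo locations on a 10 x 10 lattice, 2 m apart.
compact = [(i * 2.0, j * 2.0) for i in range(10) for j in range(10)]

spread_peak = peak_density(spread, cell_size=50.0)
compact_peak = peak_density(compact, cell_size=50.0)
print(spread_peak, compact_peak)
```

Same number of photos in both cases, but the compact set piles into one cell while the spread-out set never puts two photos in the same cell.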

brc · on Jan 16, 2014

A very good point. The Eiffel tower photos will only show up when people are, ironically, taking photos from the tower instead of photos of the tower. It's much too big to get a good photo of with normal point and shoot lenses. There will be related hotspots such as from across the river, but lots of random photos on the streets will be of the tower from different angles.

Whereas the Moulin Rouge is tucked into a street, and that street is the only place you can snap it.

A great innovation in camera tech would be to tag the 'main' object being photographed if it is more than x m away from the camera itself. Using GPS for direction, and using some type of algorithm for detecting the main image, I'm sure you could get a decent approximation of the target as well as the shooter's location. That would make for some interesting data and you'd be able to sort on landscape vs portrait photos just by examining the GPS data.
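
A minimal sketch of the geometry, assuming the camera already knows its own position, the heading it is pointing in and an estimated distance to the subject (how the heading and distance are obtained is left open here). It uses the standard great-circle destination formula on a spherical Earth; the `estimate_target` name and the sample numbers are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def estimate_target(lat_deg, lon_deg, bearing_deg, distance_m):
    """Return (lat, lon) in degrees of the point distance_m away along bearing_deg.

    bearing_deg is measured clockwise from true north.
    """
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance travelled

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


# Hypothetical shot: camera at an arbitrary position, facing due east,
# subject estimated to be 500 m away.
target_lat, target_lon = estimate_target(48.85, 2.30, 90.0, 500.0)
print(target_lat, target_lon)
```

With both the shooter's position and the estimated target stored per photo, the two could be mapped separately.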

tammet · on Jan 16, 2014

The marked locations in high-res areas like cities are based on both most popular wikipedia articles and most popular foursquare locations in the intense area. Quite likely Alex's place is one of these highly popular foursquare locations.

tammet · on Jan 16, 2014

The "all sizes" selection has a similar effect: take "small", "tiny" or "remote" to put markers on sparsely populated locations only.

tammet · on Oct 25, 2013

One of the authors here. A few answers quickly. It does write to disk: you can either dump memory or write all changes to log (turn it on/off yourself). Sure it has a global read/write lock, with several locking strategies to select from (task-fair atomic spinlock queue or a reader-preference or a writer-preference spinlock). It is definitely meant to be a simple library. We strived to document it carefully to make usage as easy as possible. Yes, you can very easily form lists, trees or any other pointer structures. Happy to see it on the Hacker News, we never really expected that :)

bch · on Oct 25, 2013

Congratulations on your project and the attention it's getting!

Can you explain-more/rationalize GPLv3 licensing w/ the conditional alternate commercial license?

Why not just BSD, MIT, or (at least) LGPL ?

GPL is more understandable on "higher level software" (ie: complete applications), but I don't understand your intent licensing a library this way.

belorn · on Oct 25, 2013

The license page is quite clear why. The authors want applications that are distributed and marketed as database systems for use by other developers to be under GPLv3.

You might ask why they want that, and that could be an interesting read. My best guess: They are themselves developers.

tammet · on Oct 25, 2013

Making a clean cut between free-as-in-speech on one hand and free-as-in-beer on the other.

mbreese · on Oct 25, 2013

I appreciate the sentiment, but I was looking forward to using this until I saw that part. I (like I assume many others) cannot use a GPL3 library (and I'm in academia). If you want any sort of traction for a library, GPL3 is not the way to go.

This is why the LGPL was created, so that you can have modifications done on your library be free-as-in-speech, but still make the library as a whole useable for a wide variety of other projects, including closed-source versions.

Having a separate requirement to email you for a free-as-in-beer license is just overly complicated for this. The more hurdles you put up for people, the fewer that will adopt the library. I think that licensing is one of those cases where it doesn't pay to be clever. Plus, what happens when you decide to stop maintaining the code? Do you want to keep getting emails for licenses years from now?

Edit: in last paragraph, I said free-as-in-speech, but meant beer (see comment below).

tammet · on Oct 25, 2013

The default GPL is free-as-in-speech. You do not have to email for GPL. You have to email for free-as-in-beer. I assume that in case free-as-in-speech is not OK, it is also not a major hurdle to email for the free-as-in-beer version. In case emailing is a major hurdle, maybe you do not really need the free beer part.

Should we stop maintaining the code or get bored mailing free beer licences, we'll very likely change the licence to LGPL or MIT. Until then beer comes via email.

bch · on Oct 25, 2013

In the case of changing licenses, make sure over the course of your project maintainership that you have the right to relicense all the code, including patches/contributions from others.

I wish it was simply licensed MIT or BSD, but congratulations on your software and sticking to your convictions.

:)

belorn · on Oct 25, 2013

> I (like I assume many others) cannot use a GPL3 library (and I'm in academia).

Is it copyleft in general, or the patent grant that hinders your work in academia? I'm not sure why you should be using other people's work for free, but then go around and sue anyone who copies or improves on your work.

The project wrote down exactly what they wanted to do with their work on their license page. I say good for them. More people should do so and think what they themselves want.

mbreese · on Oct 26, 2013

I've had academic licensing offices balk at the GPL. I've had my fights with people over this, and lost. There are some specific clauses that they didn't like (this was GPL2). However, they rarely have problems with MIT/BSD licenses, so in general, that's what I try to use.

My stance is that since they did the work, the authors of the library can license it however they'd like. But, if they wanted to get more people using their library, I think that they should rethink their approach. LGPL is more appropriate for a library, where you can still have your copyleft approach for the code you wrote, while still promoting wider use.

Here's an extreme edge case... as they said, if they get tired of supporting the email to get a free-as-in-beer license, they will just open it up with an MIT/BSD style license and be done with it. That's great. But what if someone gets hit by a bus? Or someone leaves the project and moves to Antarctica? There would be no practical way to release an unencumbered version.

Really though, they can do what they want - it's their code. But licensing is one of those areas where you really shouldn't try to be clever.

teddyh · on Oct 27, 2013

> But, if they wanted to get more people using their library, I think that they should rethink their approach.

They actually don’t want as many people as possible using their library. That is not the goal when choosing the GPL. The goal is to maximize the number of free users in the world – that is, users who have the freedoms which define Free Software. Mere users is inconsequential. If users is what you desire, then by all means, choose a permissive license (MIT/BSD/etc).

tammet · on Oct 25, 2013

I'd just make a remark that even in the GPL world everything is not as simple as it looks. There are GPL versions with _exceptions_ endorsed by RMS, for example. A long time ago I used to work on a Hobbit scheme compiler for the scm interpreter, which was promoted by RMS and became Guile later. scm had such a GPL-with-exceptions clause by RMS, which was stated clearly incorrectly. I take every chance to boast that I convinced RMS to fix the error in his own GPL version for scm :)

bitdiddle · on Oct 26, 2013

Yes, there's a lot of subtlety to licensing. Personally I think it should be taught in computer science schools. Open source software has really changed the dynamics of corporations. Of course we wouldn't have the open source movement without free software and imho free software is more important than ever. You seem to have struck a nice balance with this exception that protects your interests in the database space.

And yes, getting RMS to change something is quite the accomplishment :) His ability to walk the talk is impressive.

otterley · on Oct 25, 2013

Why is shared memory better than mmap(2) here? With the latter, you get persistence for free.

danjayh · on Oct 25, 2013

Is there information available anywhere on the required resources? Specifically:

1) How much space does the compiled code require? Can conditional compilation be used to omit unused features?

2) What is the overhead for the various data structures?

I'm thinking that it might be interesting to use this on very limited environments (PIC microcontrollers for example) where every byte matters.

tammet · on Oct 25, 2013

Conditional compilation can quite certainly be used. The best way to find out the space requirements is to try out some of the examples provided. I cannot give the overhead exactly, but it can be read from the source with not too much effort. Send an email to tanel.tammet at gmail.com if you need help with that. In broad terms, we have been very careful with using memory, both for the reasons you state and the reason of getting more bang from the cache.

tammet · on Sept 7, 2013

Check http://www.sightsmap.com: this shows the wikipedia locations and more, geared to finding places of interest.