Hacker News

> So, besides the plug of my work (sorry) I would say that the earlier paradigm that you say has "failed" with respect to recent connectionist success is alive and well and it still gets many things right that the connectionist paradigm struggles with mightily, in particular robust generalisation from very few examples (without expensive pre-training etc), and of course any task that requires reasoning [2].

I'm going with dooglius here. You are presenting a simple regular grammar as a success story of the GOFAI paradigm (which doesn't have that much to do with Hofstadter's paradigm, anyway), while the DL approach has been producing real-world successes on the hardest of natural language tasks. Your example is "not even wrong" in terms of criticisms like Marcus's, because it is so many orders of magnitude away from being useful or reaching the point where it even could fail or be criticized on those terms.



aⁿbⁿ is a context-free language, not regular. The fact that it's simple is exactly the point: it is so simple, yet neural nets cannot learn it. They can only learn limited subsets of it, and even those from tens of thousands of examples. I showed how the same language can be learned in full from three examples. It's a task in which neural nets, despite all the successes you cite, fail dismally. You dismiss it easily, but it's an embarrassing failure and it makes clear the weaknesses of the connectionist paradigm.
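For anyone following along without the formal-languages background, here is a quick Python sketch of what membership in aⁿbⁿ means (I take n ≥ 1 here; conventions vary on whether the empty string is included):

```python
def is_anbn(s: str) -> bool:
    """Recognise the context-free language a^n b^n, n >= 1.

    A member is n copies of 'a' followed by exactly n copies of 'b'.
    No finite-state machine can do this check for unbounded n, because
    it needs to count the a's to match them against the b's.
    """
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

assert is_anbn("ab") and is_anbn("aabb")
assert not is_anbn("aab") and not is_anbn("abab")
```

The point of the benchmark is exactly that unbounded matching: any recognizer with finitely many states must eventually lose count.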

Despite myself, I am still surprised by the unwillingness of connectionists to admit this simple fact: neural nets generalise very poorly and have absolutely awful sample efficiency.

As to more complex, real-world tasks, this is a recent example on a language task using the same algorithm (but a different implementation):

Bias reformulation for one-shot function induction

https://dspace.mit.edu/handle/1721.1/102524

There's more of that in the literature.


So I think I really screwed this one up, by assuming a common background. Your comment about aⁿbⁿ being a "simple regular grammar" stuck with me and I realise that we do not have any common background and I should have done a better job of putting my example in context.

It was my mistake to agree that aⁿbⁿ is a "simple" grammar. It looks simple and it's simple for a human.

But aⁿbⁿ is a context-free language, and as such it can only be learned "in the limit", i.e. from a number of positive and negative examples approaching infinity. It cannot be learned from positive examples only, not even in the limit. This is part of Gold's result [1], a famous result from Inductive Inference, the field that preceded Computational Learning Theory. Gold's result was that anything more complex than a finite language can only be learned in the limit, and anything above regular languages requires both positive and negative examples.
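A toy illustration of the trouble with positive-only examples: the overgeneral language a*b* is consistent with every finite positive-only sample of aⁿbⁿ, and only a negative example can rule it out. (The regex hypothesis here is just for illustration, not part of Gold's construction.)

```python
import re

positives = ["ab", "aabb", "aaabbb"]  # a finite positive-only sample

def in_anbn(s):
    # the target language a^n b^n, n >= 1
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

def in_astar_bstar(s):
    # an overgeneral hypothesis: any a's followed by any b's
    return re.fullmatch(r"a*b*", s) is not None

# Both hypotheses accept every positive example, so no learner can
# choose between them from this sample alone; a negative example
# such as "aab" is what separates them.
assert all(in_anbn(s) for s in positives)
assert all(in_astar_bstar(s) for s in positives)
assert not in_anbn("aab") and in_astar_bstar("aab")
```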

Gold's result had a profound effect on machine learning [2] and was a direct cause of the paradigm shift that came with Leslie Valiant's paper introducing PAC Learning [3]. Valiant's paper placed machine learning on a new theoretical foundation where it is acceptable to learn approximately, under some assumptions (in particular, distributional consistency between the examples and the true theory). This is how machine learning works today, and it is why machine learning results are accepted in scholarly articles: not because they are useful in industry or make nice articles in the popular press.

Where aⁿbⁿ comes into all this is that it's one of a group of formal languages that are routinely used to assess machine learning algorithms. Keeping in mind that grammars for CFLs are impossible to learn precisely from finite examples, the point of the exercise is to show that some learning algorithm can learn a good approximation that generalises well from a small number of examples.
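To make that setup concrete, here is one way such a benchmark might be generated (a sketch, not the protocol of any particular paper): positives are aⁿbⁿ strings, negatives are single-character corruptions of them.

```python
import random

def make_anbn_benchmark(max_n=10, neg_per_n=1, seed=0):
    """Toy a^n b^n benchmark: (string, label) pairs.

    Positives are a^n b^n for n = 1..max_n. Negatives are copies with
    one character flipped, which always unbalances the a/b counts and
    so always falls outside the language.
    """
    rng = random.Random(seed)
    data = []
    for n in range(1, max_n + 1):
        s = "a" * n + "b" * n
        data.append((s, True))
        for _ in range(neg_per_n):
            i = rng.randrange(len(s))
            flipped = "a" if s[i] == "b" else "b"
            data.append((s[:i] + flipped + s[i + 1:], False))
    return data
```

A typical evaluation then trains on small n and tests on strings longer than anything seen in training, which is where the generalisation failures show up.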

This is how formal languages are used in neural networks research also. For a recent example from the literature see [4].

The example in my comment shows instead that our algorithm can learn aⁿbⁿ precisely, not approximately, from only three positive examples. This should be surprising, to say the least. Gold's result says that this should not be possible, at all, ever, until hell freezes over. Why it is possible in the first place is another long discussion that I was hoping to have the chance to make here, but instead I allowed myself to be drawn into an adversarial exchange I should have avoided.
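To be clear about what follows: this is not our algorithm, just a toy Python sketch of the general principle involved, namely that a strong enough inductive bias (here, a hypothetical, tiny hypothesis space of bracketing grammars S → x S y | x y) lets a learner pin down aⁿbⁿ exactly from a few positive examples:

```python
from itertools import product

positives = ["ab", "aabb", "aaabbb"]  # three positive examples

def accepts(x, y, s):
    # Does the grammar S -> x S y | x y generate s?
    # Its language is { x^n y^n : n >= 1 }.
    while s.startswith(x) and s.endswith(y) and len(s) >= len(x) + len(y):
        s = s[len(x):len(s) - len(y)]
        if s == "":
            return True
    return False

# Search the (deliberately small) hypothesis space for grammars
# consistent with all three positive examples.
candidates = [(x, y) for x, y in product("ab", repeat=2)
              if all(accepts(x, y, s) for s in positives)]
assert candidates == [("a", "b")]  # exactly one grammar survives
```

The bias does the heavy lifting: with only four candidate grammars, three examples suffice to eliminate all but the right one, and the surviving grammar is exact for every n, not an approximation.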

To put an end to it: neural nets have produced some spectacular results, and I don't want to discount them, but it is important to understand what those results mean, in the grand scheme of things. The fact that there is a machine learning paradigm that outperforms neural nets on a very hard problem (simple as it may look) means that the same paradigm may outperform neural nets on those other things that they do so well, like image recognition and speech recognition. Or it may just mean that different learning paradigms have different strengths that must somehow be combined (a more popular view for sure).

Anyway, sorry to spread noise and confusion in this thread when my goal was to do the opposite. I have a rubbish way of communicating.

_____________________

[1] Language identification in the limit:

https://www.rand.org/pubs/research_memoranda/RM4136.html

[2] And a few other fields besides. Noam Chomsky used Gold's result to argue for a Universal Grammar that enables humans to learn natural languages. This is all really groundbreaking stuff, and it's a real shame that it's not more widely known in machine learning circles.

[3] A theory of the learnable:

https://web.mit.edu/6.435/www/Valiant84.pdf

[4] LSTM networks can learn dynamic counting:

https://arxiv.org/abs/1906.03648



