I'm surprised that so much of the discussion around Copilot has centered around licensing rather than this.
You're basically asking a robot that stayed up all night reading a billion lines of questionable source code to go on a massive LSD trip and then use the resulting fever dream to fill in your for loops.
Coming from a hardware background, where you often spend 2-8x of your time and money on verification vs. the actual design, it seems obvious to me that Copilot as implemented today will either provide no value (best case), be a net negative (middling case), or be a net negative that you won't realize has surrounded you with a minefield for a few years (worst case).
Having an "autocomplete" that can suggest more lines of code isn't better, it's worse. You still have to read the result, figure out what it's doing, and figure out why it will or will not work. Figuring out that it won't work could be relatively straightforward, as it is today with normal "here's a list of methods" autocomplete. Or it could be spectacularly difficult, as it would be when Copilot decides to regurgitate "fast inverse square root" but with different constants. Do you really think you're going to be able to decipher and debug code like that repeatedly when you're tired? When it's a subtly broken block of code rather than a famous example?
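For reference, here's a Python sketch of that famous trick (the original is C, from Quake III). The point is that if the magic constant 0x5f3759df were off by even a digit, the code would still run and produce plausible-but-wrong numbers, and nothing about reading it tells you which digits are right:

```python
import struct

def fast_inv_sqrt(x: float) -> float:
    # Reinterpret the float's bits as a 32-bit integer.
    i = struct.unpack('>l', struct.pack('>f', x))[0]
    # The famous magic constant. Change one hex digit and the function
    # still runs -- it just quietly returns worse approximations.
    i = 0x5f3759df - (i >> 1)
    # Reinterpret the bits back as a float.
    y = struct.unpack('>f', struct.pack('>l', i))[0]
    # One Newton-Raphson refinement step.
    return y * (1.5 - 0.5 * x * y * y)
```

Would you catch a regurgitated variant of this with a wrong constant in code review at 6 PM? The original only became legible after people wrote papers dissecting it.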
That Easter example looks horrific, but I can absolutely see a tired developer saying "fuck it" and committing it at the end of the day, fully intending to check it later, and then either forgetting or hoping that it won't be a problem rather than ruining the next morning by attempting to look at it again.
I can't imagine ever using it, but I worry about new grads and junior developers thinking that they need to use crap like this because some thought leader praises it as the newest best practice. We already have too much modern development methodology bullshit that takes endless effort to stomp out, but this has the potential to be exceptionally disastrous.
I can't help but think that the product itself must be a PSYOP-like attempt to gaslight the entire industry. It seems so obvious to me that people are going to commit more broken code via Copilot than ever before.
IMHO they built the opposite of what's actually useful for real-world use. Copilot should have been trained to describe what a selected block of code does, not write a block of code from a description. It could be extremely useful when looking at new or under-documented codebases to have an AI that gives you a rough hint as to what some code might be doing. For example if you select some heinous spaghetti code function, press a button, and get a prompt back that says "This code looks like it's parsing HTML using regex (74.2% confidence)" it could be much easier for folks to be productive on big codebases.
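As a toy illustration of that interface (not how a real model would work, and the patterns and output string here are invented for the example), even a crude heuristic version could scan a snippet's AST for telltale signs:

```python
import ast

# Substrings that hint the code is handling HTML. Purely illustrative.
HTML_HINTS = ("<", "</", "href")

def crude_description(source: str) -> str:
    """Guess what a snippet does by scanning its AST for telltale
    patterns. A real tool would use a learned model over a large corpus;
    this just sketches the code-in, description-out interface."""
    tree = ast.parse(source)
    uses_re = any(
        isinstance(n, ast.Attribute)
        and isinstance(n.value, ast.Name)
        and n.value.id == "re"
        for n in ast.walk(tree)
    )
    mentions_html = any(
        isinstance(n, ast.Constant)
        and isinstance(n.value, str)
        and any(h in n.value for h in HTML_HINTS)
        for n in ast.walk(tree)
    )
    if uses_re and mentions_html:
        return "This code looks like it's parsing HTML using regex"
    return "No guess"
```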
No, presumably Copilot skirted that need by just analyzing the AST of the code they host and using nearby comments to identify what a section of code is meant to do. This would use the same dataset but solve the opposite problem: generate a description from a block of code's AST as input.
> copilot skirted that need by just analyzing the AST of code they host and using the nearby comments to identify what a section of code is meant to do.
I'm curious what it spills out for things like "Todo", or "this is probably broken", etc.
I'm not sure I understand how you envision this working, given the underlying technology. You'd have to have a pretty large cache of such analyses to train on, right?
GitHub has a huge amount of source code, and for Copilot they likely already had to transform it into ASTs to look at comments and nearby code. This would use the same dataset but build the opposite model: input a block of code's AST, get a guess at what the description (i.e. the comment) should be.
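The mining step being described could be sketched like this: walk a module's AST and pair each function with its documentation (using docstrings as a stand-in for "nearby comments", since they're the easiest case to extract). This is a minimal sketch, not a claim about how Copilot's pipeline actually works:

```python
import ast
import textwrap

def docstring_pairs(source: str):
    """Yield (code, description) training pairs from a module's functions.

    A toy version of the data-mining step: pair each function's code with
    its docstring, so a model could learn code -> description instead of
    description -> code.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                yield ast.unparse(node), doc.strip()  # unparse: Python 3.9+

example = textwrap.dedent('''
    def area(r):
        """Return the area of a circle of radius r."""
        return 3.14159 * r * r
''')
pairs = list(docstring_pairs(example))
```

Real comments (as opposed to docstrings) don't survive `ast.parse`, so an actual pipeline would need the token stream too, but the shape of the dataset is the same.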
This is the thing that made no sense to me about it as a premise. Doing correct program synthesis is really hard even when you have really opinionated and well-defined models of the domain (e.g. the Termite project for generating Linux device drivers). The domain model for Copilot is somewhere between non-existent and so open-ended (i.e. all the diverse code on GitHub, et al.) as to be functionally non-existent.
A bare-minimum baseline validation check for Copilot would be to see whether it gives you code that won't compile in context. If it does, that means it isn't even taking into account the well-specified semantics of your chosen programming language. And even passing that check is still miles away from taking into account the domain of the actual problem you're using software to solve.
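That baseline check is cheap to sketch for a single language. Here's a minimal Python-only version using the built-in `compile()`; a real check would need the project's full build context, imports, and type information, which is exactly why "it parses" is such a low bar:

```python
def compiles_in_context(suggestion: str, preceding_code: str = "") -> bool:
    """Crude baseline: does the suggested snippet even parse when
    appended to the code that precedes it? Catches syntax errors only --
    not wrong logic, wrong APIs, or wrong domain assumptions."""
    try:
        compile(preceding_code + "\n" + suggestion, "<suggestion>", "exec")
        return True
    except SyntaxError:
        return False
```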
The only place where the approach, as-is, makes sense to me is truly rote boilerplate code. But that raises the question: how is this machine-learning approach more effective than the targeted heuristics that existing IDE tooling already uses?
FWIW, I don't think any of this is lost on GitHub. I think Copilot is more likely a tremendously marketable half-step and a small piece of a larger long-term strategy unfolding at Microsoft/GitHub to leverage an incredible asset they're holding, i.e. practically everybody's source code. The combination of detailed changelogs, CI results (e.g. GitHub Actions), Copilot, and a couple of other key pieces makes for a pretty incredible basis for reinforcement learning to multiple ends.