Parity Bits

Has the Copilot SEO Spam War Begun?

A few days ago I wanted to pull something from an external API for testing purposes. As a modern programmer, I typed

function getSomeImage()

and then paused to let Copilot fill it out. It did great - producing a short, readable function with pixel widths and heights I could adjust if I wanted to, etc etc. What a world.

But wait! I've never heard of the API it's pulling images from. I've encountered a few such services before. I'm sure their names are floating around in my long term memory, and I'm sure this isn't one of them.

An aside - Copilot?

In case you've been living under a rock, here's my take on what Copilot does, practically speaking.

Copilot saves you the trip to the search engine in many cases. "How do I do X with data type Y in language Z" has been searched a billion times in the years since the answer to every possible (and impossible) combination was so neatly indexed by Stack Overflow. With Copilot, this search is no longer necessary. Type doX(y: Y) in anyfile.zlang and the aggregate of those neatly indexed answers will autocomplete for you. Maybe you'll need to fiddle with it for your specific case, but that was true of the Stack Overflow answers as well. It's fast. It makes writing code in unfamiliar languages easy (...at the risk of being more error prone).

Years ago I read a Hacker News comment that went something like

what I want in a natural language -> code generator is more like a search engine. I should give it prompts, and it returns a list of candidate implementations.

Well, here we are. This is exactly what Copilot does, with two notable exceptions. First, most of your inputs are code rather than natural language. Second, while Copilot can generate a list of candidate implementations, the dominant automatic use case is to display only the top candidate. The proverbial I'm feeling lucky button.
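To make the distinction concrete, here is a minimal sketch of the two modes. The Candidate shape, the scores, and the example URLs are all made up for illustration; I have no idea what Copilot's internals actually look like.

```typescript
// Hypothetical shape of a completion candidate; not Copilot's real data model.
interface Candidate {
  code: string;
  score: number; // higher means the model considers it more likely
}

// "Search engine" mode: return every candidate, best first.
// Copies the input so the caller's array is left untouched.
function rankCandidates(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => b.score - a.score);
}

// "I'm feeling lucky" mode: surface only the top candidate, if there is one.
function feelingLucky(candidates: Candidate[]): Candidate | undefined {
  return rankCandidates(candidates)[0];
}

// Made-up candidates pointing at made-up APIs.
const exampleCandidates: Candidate[] = [
  { code: "fetch('https://api-a.example/image')", score: 0.41 },
  { code: "fetch('https://api-b.example/image')", score: 0.87 },
  { code: "fetch('https://api-c.example/image')", score: 0.12 },
];

console.log(rankCandidates(exampleCandidates).map((c) => c.code));
console.log(feelingLucky(exampleCandidates)?.code);
```

Whoever sits at index 0 is the only API most users will ever see.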

I'm funneled

If I'd started my day with the task of choosing this sort of service, it's hard to guess how much time I might have spent searching, comparing price points, comparing API docs, skimming provided sample code, etc, before finally hitting some individual API with code from my own machine. Definitely more time than none.

But now, almost entirely accidentally, I have working code in my local editor hitting a new-to-me API with the desired result immediately visible to me (it fetched images). This is a huge get for this API.

Intent?

Probably not, but there will be soon. Any company with a billable API has a huge interest in being the first Copilot result, and in being the first result on the widest net of Copilot prompts.

Sound familiar?

To do what it does, Copilot has to weigh the relative importance of its various inputs somehow. I don't know anything about how it does this, but my naive assumptions are that it likes code that

  1. it sees a lot
  2. is well liked
  3. is written by celebrated accounts
  4. is imported by lots of things
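Those four guesses can be written down as a toy scoring function. To be clear, this is a sketch of my naive assumptions and nothing more: the field names, the weights, and the log damping are all invented here, and none of it is known to reflect how Copilot ranks anything.

```typescript
// Hypothetical signals mirroring the four guesses above.
interface RepoSignals {
  name: string;
  occurrences: number;     // 1. how often similar code is seen
  stars: number;           // 2. how well liked it is
  authorFollowers: number; // 3. how celebrated the author is
  dependents: number;      // 4. how many projects import it
}

// Arbitrary illustrative weights, chosen only to make the example run.
const WEIGHTS = {
  occurrences: 1.0,
  stars: 0.5,
  authorFollowers: 0.25,
  dependents: 0.75,
};

// Weighted sum of log-damped counts, so one huge number cannot dominate alone.
function naiveScore(r: RepoSignals): number {
  return (
    WEIGHTS.occurrences * Math.log(1 + r.occurrences) +
    WEIGHTS.stars * Math.log(1 + r.stars) +
    WEIGHTS.authorFollowers * Math.log(1 + r.authorFollowers) +
    WEIGHTS.dependents * Math.log(1 + r.dependents)
  );
}

// Returns the highest-scoring entry, or undefined for an empty list.
function pickSuggestion(repos: RepoSignals[]): RepoSignals | undefined {
  let best: RepoSignals | undefined;
  for (const r of repos) {
    if (best === undefined || naiveScore(r) > naiveScore(best)) {
      best = r;
    }
  }
  return best;
}

// Made-up repositories for illustration.
const wellKnown: RepoSignals = {
  name: "well-known-api-client", occurrences: 2000, stars: 1500, authorFollowers: 3000, dependents: 400,
};
const obscure: RepoSignals = {
  name: "obscure-api-client", occurrences: 20, stars: 15, authorFollowers: 30, dependents: 4,
};

console.log(pickSuggestion([obscure, wellKnown])?.name);
```

Every input to naiveScore is a count that someone outside the system can push upward.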

Gaming each in turn:

  1. Spam GitHub with lots of code, from lots of accounts, that hits your API.
  2. Luckily there's a preexisting black market for GitHub stars.
  3. Just how much would it cost to have gh/torvalds publish code that hits my API? Businesses - are you ready for pitches from self-styled codefluencers who would be willing to expose your APIs to their fans (if you'll just omit those pesky prices)?
  4. Spin up lots of projects that take your API-using code as a dependency. Wash trading for package management.
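The last item is the easiest to picture in code. Here is a small sketch, assuming a ranker that simply counts how many projects depend on a package. The graph, the package names, and the helper functions are all hypothetical.

```typescript
// Maps each project name to the packages it depends on. All names are made up.
type DependencyGraph = Map<string, string[]>;

// The "imported by lots of things" signal as a naive ranker might compute it.
function dependentCount(graph: DependencyGraph, pkg: string): number {
  let count = 0;
  graph.forEach((deps) => {
    if (deps.indexOf(pkg) !== -1) {
      count += 1;
    }
  });
  return count;
}

// Wash trading: publish n throwaway projects whose only purpose is to depend on pkg.
// Returns a new graph and leaves the original untouched.
function addSockPuppets(graph: DependencyGraph, pkg: string, n: number): DependencyGraph {
  const inflated: DependencyGraph = new Map(graph);
  for (let i = 0; i < n; i++) {
    inflated.set(`throwaway-${i}`, [pkg]);
  }
  return inflated;
}

const organic: DependencyGraph = new Map([
  ["photo-blog", ["established-images"]],
  ["cms", ["established-images"]],
  ["side-project", ["newcomer-images"]],
]);

const gamed = addSockPuppets(organic, "newcomer-images", 100);

// The newcomer goes from 1 dependent to 101; the established package stays at 2.
console.log(dependentCount(organic, "newcomer-images"), dependentCount(gamed, "newcomer-images"));
console.log(dependentCount(gamed, "established-images"));
```

A raw count cannot tell a hundred throwaway projects from a hundred real ones.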

Thankfully, and in the nick of time, tools like Copilot & GPT-4 will make most of these operations easy to automate at scale.

Worse still is the thought of spam code that incorrectly incorporates your competitor's APIs, so that Copilot's suggestions relying on it are flaky.

What follows?

Microsoft probably doesn't want GitHub to become the battleground for a new SEO spam war. Horrifyingly, one alternative is to offer a more attractive path to any companies looking to spend their marketing budgets juicing their suggestion results: sponsored results.

Uugghhhhhh.

(Come to think of it, sponsored results could plausibly have been part of Microsoft's play with Copilot from day one.)

Credit

The excellent Who Owns the Future? by Jaron Lanier has a bit about how technologists, over and over, invent compute platforms which analyze lots of human behaviour and make decisions about which humans get rich. The humans under this analysis, of course, start to game the decision processes of the compute platform. The technologists are shocked by this every time.

The Unicode exclamation point in paragraph one was my moment of shock in this retelling of the story.