Euravox is not an AI-first company.

While everyone else rushes toward AI-in-everything, I have taken the decision to go a different route with Euravox. This is why.


But but but... not jumping into generative AI with both feet is heresy!

In the current business climate, this is tantamount to sacrilege. How could any company, let alone a hopeful startup, take such a decision?

I have my reasons, and they are difficult to refute. I'll take them in turn, then sum up at the end.

Cost

This might be surprising, because AI is supposed to be cutting everyone's costs, isn't it? That's certainly what the hype says. Actually... not so much. Let's assume that every word posted by users to Euravox is going to be passed through (at least) one LLM, that we scale to about a million active users, and that each user posts about 100 words in total each day on average. That's 100 million words per day. Call it 200 million tokens per day, to take into account encoding inefficiencies and the need to add prompts, take output, and so on. This equates to needing about 2,300 tokens per second (without headroom to cope with bursts, and this will be bursty traffic). On OpenAI's pricing, that's roughly $350/day, or $128k/year. Not impossible, but not cheap. Gemini would be about $800/day, or a hair under $300k/year. And bearing in mind that OpenAI is almost certainly running at a loss here, this pricing isn't sustainable, and could increase by an order of magnitude.
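If you want to check my working, here's the arithmetic in one place. A minimal sketch: the per-million-token prices are rough blended assumptions that I've picked to match the figures above, not vendor quotes.

```python
# Back-of-the-envelope LLM API cost. The per-million-token prices are
# rough blended (input + output) assumptions, not vendor quotes.
USERS = 1_000_000
WORDS_PER_USER_PER_DAY = 100
TOKENS_PER_WORD = 2          # covers encoding overhead, prompts and output
SECONDS_PER_DAY = 86_400

tokens_per_day = USERS * WORDS_PER_USER_PER_DAY * TOKENS_PER_WORD  # 200M
print(f"sustained rate: {tokens_per_day / SECONDS_PER_DAY:,.0f} tokens/sec")  # ~2,315

for vendor, usd_per_million in (("OpenAI", 1.75), ("Gemini", 4.00)):
    per_day = tokens_per_day / 1e6 * usd_per_million
    print(f"{vendor}: ${per_day:,.0f}/day, ${per_day * 365:,.0f}/year")
```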

Running in-house hardware would, to get close to like for like, mean running something like a Llama 3 405B model, which gets about 142 tokens/sec on an NVIDIA H200. It would take 16 of those to meet the average token rate, so probably 32 to give some headroom. If you can even get hold of them as a non-hyperscaler, this would be well over $1M once you factor in the cost of the machines and the other infra needed to run them. Against Gemini-level API pricing, that breaks even after roughly three years, so it isn't a terrible investment if you have that money to burn up front.
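And the self-hosting side of the sum, with the capex figure very much an assumption:

```python
# Rough self-hosting sizing. 142 tokens/sec per H200 is the figure quoted
# above; the capex number is an assumption ('well over $1M'), not a quote.
TOKENS_PER_SEC = 200_000_000 / 86_400    # ~2,315, from the estimate above
H200_TOKENS_PER_SEC = 142

gpus_at_average = TOKENS_PER_SEC / H200_TOKENS_PER_SEC   # ~16.3
print(f"GPUs at average load: {gpus_at_average:.1f} (so ~32 with headroom)")

CAPEX_USD = 1_000_000          # assumed: GPUs + servers + supporting infra
API_COST_PER_YEAR = 292_000    # Gemini-level pricing from above
print(f"break-even vs the API: {CAPEX_USD / API_COST_PER_YEAR:.1f} years")  # ~3.4
```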

None of these things are happy-making.

Electricity usage and environmental impact

Based on a public estimate for OpenAI of 0.0001 kWh per token, processing 200 million tokens per day would use 20,000 kWh per day, equivalent to about 1,738 average households (at roughly 11.5 kWh per household per day). I'm not going to try to estimate other environmental impacts, because there aren't good numbers available, but the idea of being responsible for roughly eight times the power usage of the entire small town I live in doesn't sit well with me.
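Again, the working; the household figure is an assumption at roughly European consumption levels.

```python
# Energy under the stated per-token estimate. The household average is an
# assumption, at roughly European consumption levels.
KWH_PER_TOKEN = 0.0001
TOKENS_PER_DAY = 200_000_000
HOUSEHOLD_KWH_PER_DAY = 11.5

kwh_per_day = KWH_PER_TOKEN * TOKENS_PER_DAY            # 20,000 kWh/day
print(f"{kwh_per_day:,.0f} kWh/day, "
      f"about {kwh_per_day / HOUSEHOLD_KWH_PER_DAY:,.0f} households")
```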

Customers just don't want it

Time and time again, talking to potential future users of the Euravox platform, the one question I've been consistently asked is, "will you fill the user interface with AI bullshit, or allow AI slop on the platform?"

Though I'm also working toward many other differentiating features, this is by far the most commonly mentioned. The people I've talked to hate generative AI, and don't want it. They don't want to use it, and they don't want to have to wade through slop generated by it.

Many people have told me that they don't want their content used to train gen AI, and are concerned about our platform's ethics as regards gen AI. They are currently stuck with legacy platforms whose ethics frankly disgust them.

It's never a bad idea to listen to what your customers are telling you.

Why would I use it, anyway?

Certainly not for generating content or creating our own internal bots. It's hard to imagine anything that would annoy our users more, or that would be more likely to have them never sign up in the first place.

The one – actually good – reason to use AI in social media is trust and safety. This encompasses moderation, but is a little more general than that in scope. Most social media platforms restrict content that would be damaging to users. In the EU, this is actually a legal requirement. Platforms define what they mean by harmful content differently – the US platforms are now mostly perfectly OK with fascistic content and even content that advocates for causing harm to certain minorities. We are not intending to allow such content. Almost all platforms do block CSAM (child sexual abuse material) – this isn't optional in most jurisdictions around the world, and rightly so.

The difficult part is providing trust and safety at scale. Going back to the theoretical 1M users posting once a day, keeping up would require 10,000 human moderators handling 100 messages a day each, which is just practically and financially unviable. Moreover, it's rarely talked about publicly, but widely known inside the industry, that exposing human moderators to CSAM and other really bad material damages them psychologically. In practice, this means that moderators should only be allowed to do that work for a few months at most in a lifetime. The only practical alternative here is machine learning.

Note that I used the term machine learning, not generative AI. I did that for a reason. Generative AI gets the news column inches, but it's only a latecomer in the machine learning world, and by no means the be-all and end-all. In fact, it's not particularly effective at trust and safety tasks – it tends to have a high failure rate, with too many false positives and false negatives, for both text and image classification, to be genuinely useful. There are older, non-transformer techniques that work better for this specific use case, achieving a far lower error rate at a tiny fraction of the computational cost. These are, fortunately, small models that don't need huge, expensive NVIDIA datacenter-grade cards, and often get acceptable performance just running on a CPU or a (much smaller) integrated GPU.
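To make 'older techniques' concrete, here's a minimal sketch of the general family I mean – TF-IDF features feeding a simple linear model, runnable on any CPU. The training data is an obviously toy placeholder, and this illustrates the class of technique, not Euravox's actual pipeline.

```python
# A minimal sketch of a pre-transformer, CPU-friendly text classifier:
# TF-IDF features plus logistic regression. Toy data for illustration only;
# a real moderation corpus would be large and carefully labelled.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "have a great day everyone",
    "just posted some photos from my holiday",
    "I will hurt you if you post that again",
    "people like you deserve to be attacked",
]
labels = [0, 0, 1, 1]   # 0 = benign, 1 = flag for human review

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Probability that a new message should be flagged for review.
print(model.predict_proba(["you deserve to get hurt"])[0, 1])
```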

It's fair to point out that text classification is also extremely useful for helping to understand a user's interests. This tends to be necessary for a few reasons – showing users ads related to their interests is a big one, as is helping them to discover people, groups and posts (and, at some point, other media) that are likely to interest them. This would be the biggest use case for gen AI (and was the basis of the numbers I showed in the cost section above), but again, practical experiments have shown it to not be particularly effective, and far lighter weight approaches, with lower CPU/GPU load, lower power draw and negligible environmental impact, are far preferable.

Query-by-Example – old-skool algorithmic magic to the rescue?

I have a personal history with text retrieval, having worked on it back in the day. Pre-Google Search, at a time when the internet existed but the web had only just been invented. Since this was right in the middle of an AI winter, text retrieval was the way to do interesting things with natural language at scale.

The system I developed, back in the mid-90s, was originally aimed at being the backend for an email client that would give very fast search capabilities. Web search engines weren't a thing yet. The email client never got implemented, but the code did end up back-ending a real-time financial news and price distribution system. Frankly, it was way ahead of its time. Way too far ahead to actually be successful as a product. I still own the IP, and I still have the code sitting in an archive somewhere. I'd imagine it would still work fine if I hacked it to run on modern hardware.

One of the things that system could do was query by example (QBE). This isn't a well-known technique these days, but in effect it means doing text retrieval by giving the search engine a whole piece of text as the query, and having it return a number of similar texts, ranked by similarity.
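Here's a toy sketch of the idea, ranking a small corpus by TF-IDF cosine similarity against an example document used as the query. This is purely illustrative – my actual engine works rather differently under the hood – but it shows the shape of QBE.

```python
# Toy query-by-example: rank a small corpus by TF-IDF cosine similarity
# against an example document used as the query. Illustrative only.
import math
from collections import Counter

corpus = {
    "doc1": "the monster fled across the ice",
    "doc2": "victor pursued the creature across the frozen sea",
    "doc3": "a quiet dinner party in geneva",
}

# Document frequency: how many documents each term appears in.
df = Counter(term for text in corpus.values() for term in set(text.split()))

def tfidf(text):
    """TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    counts = Counter(text.split())
    return {t: c * math.log(len(corpus) / df[t]) for t, c in counts.items() if t in df}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def qbe(example, k=2):
    """Return the k documents most similar to the example text."""
    q = tfidf(example)
    scores = {name: cosine(q, tfidf(text)) for name, text in corpus.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(qbe("the creature escaped over the ice"))  # doc1 and doc2 outrank doc3
```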

I've implemented basically the same algorithm in the Euravox back-end. Not exactly the same – it's decades later and I've learned a few things since – but something very similar. It's much faster than the old system, which took about 200ms to index a new message (admittedly on single-processor Pentium 1 servers, but it was disk-bound, not CPU-bound, even then). With careful attention to detail, I can currently index the entire text of Mary Shelley's Frankenstein (thank you, Project Gutenberg, for my test data!) in about 16ms on a single node. A 200-word message takes on the order of microseconds, so even a single node could potentially come close to handling 1M users. Of course, I'd never attempt that with a single node, but it's comforting to see those numbers all the same.

And yes, the new implementation can also do QBE. This turns out to be surprisingly useful – rather than taking a modern approach of, say, running all of our text through (computationally expensive) sentence transformers and using a vector database to do the matching (also computationally expensive, particularly at scale), I can just throw some text at the search engine and have it tell me directly where to look for similar text. This solves (most of) the text classification problem without needing machine learning at all. My current implementation (incomplete, and lacking some caching, so slower than the production code will be) can typically do a QBE query in about 20ms. On one node. I suspect this may go down to about 5ms or less with optimization. I'd probably run a query like this per user maybe once every couple of hours. For 1M users, that equates to roughly 140 QPS, or a budget of about 7ms per query on a single node. So we're in the ballpark of one to three nodes to handle that traffic today, and likely just one after optimization. It remains to be seen whether QBE will be effective for harmful material detection, but I suspect that it might be.
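For reference, the capacity arithmetic:

```python
# QBE capacity sanity check, under the assumptions above.
USERS = 1_000_000
QUERY_INTERVAL_S = 7_200    # one QBE query per user every couple of hours
QUERY_COST_S = 0.020        # today's unoptimised single-node latency

qps = USERS / QUERY_INTERVAL_S                       # ~139 QPS
print(f"{qps:.0f} QPS, {1000 / qps:.1f}ms budget per query on one node")
print(f"serial node-equivalents today: {qps * QUERY_COST_S:.2f}")   # ~2.78
print(f"at an optimised 5ms: {qps * 0.005:.2f}")                    # ~0.69
```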

To compare like for like, let's assume we actually need 10 nodes (because reasons). Let's assume that these are fairly chunky Intel servers running close to flat out, using about 1kW each on average, so 240kWh per day. That's about 1/83rd of the power of the gen AI approach (1.2%), with no need to give money to Google or OpenAI.

I am currently evaluating the possibility of using ARM-based hardware, which can be about 5 to 10 times more power efficient again than Intel. I'll take it.

Vibe Coding

I wish I didn't have to talk about this, but this is an Oh Hell No for me. I have many reasons; here are the main ones:

  • I don't want code in my codebase written by someone I don't have access to, particularly early in the project
  • I want to know for damned sure that I actually own all the necessary copyright, and that I haven't accidentally plagiarised something that could bite me later
  • I've been coding for nearly 50 years at this point. I am better at it than the vast majority of people whose code was used to train gen AI. At best, it will generate code that's not as good as mine.
  • Bugs cost exponentially more to handle the further down the pipeline they're caught. If you catch a bug in the design, it's 10 times cheaper than catching it in development, which is 10 times cheaper again than catching it in production. Gen AI doesn't change this, and in fact tends to dump all of your bugs straight into production.
  • Gen AI generates bad code (boilerplate, not factored into libraries, often just plain weird).
  • I don't find it actually saves time. It might get you to 80% (of a far less computationally difficult project than Euravox) quickly, but that last 20% will bite your shiny metal ass. Hard.

The Euravox code is old-fashioned, corn-fed, hand-crafted artisanal goodness. And probably a tenth the size it would have been had it been vibe coded. I'm managing to go quite fast enough, thanks, and I'm not piling on tech debt that I'll have to repay later. I'm building for massive scale from the outset.

Conclusions

My feelings toward gen AI are mostly ones of disappointment. The first transformer-based chatbot I played with was an unreleased, internal-only thing from when I was working at Google. It was creepily good (the same one, I think, that led to an engineer declaring it sentient, causing a bit of a ruckus at the time). It was also obviously not ready for prime time, because it would devolve into bizarre weirdness and (to be honest, like current publicly accessible models) could make some dangerous suggestions to a gullible user. To their credit, Google didn't release it. Then OpenAI threw caution to the wind, and the current madness started, dragging the rest of the world along with it.

When people talk publicly and openly about a bubble in an industry, listen to them. I was around for the dot com crash. I was actually consulting for what I think was the first UK company to go bang ignominiously (another story for another time). It was bad then. An AI bubble popping will inevitably be much worse. I suspect that the entire tech industry will be hit with the backlash – economists are saying that the only reason the US economy isn't officially in recession is AI investment spending. A pop therefore means an immediate US recession, and a hard one, because the AI bubble is said to be about 17 times bigger than the dot com bubble, and three times bigger than subprime.

My prediction is that when it happens – not if – any business that has upended its internal processes to depend on gen AI will be in immediate and very deep shit. OpenAI is running at a huge loss, so they will be gone, Enron-style. Other AI companies that exist essentially as wrappers around OpenAI will evaporate overnight, as will any business plan valued on 'it'll be incredibly valuable eventually when blah blah AI blah blah' principles. Google will survive, though their share price will tank. They may end up with datacenter overcapacity, but that'll probably just mean they lay off a bunch of people to save money and then wait for organic growth to catch up. Companies that jumped on the AI bandwagon without a real plan for revenue will rip it out, and most likely go back to offshoring jobs and similar tactics. NVIDIA won't go bankrupt, but its share price will tank too.

And then there's Elon, with the AI-datacenters-in-space grift. I've written about this elsewhere, but at the time it wasn't clear why the idea was being talked about at all. Technically it's basically a completely stupid idea, and practically it'll likely never fly. What it actually appears to be is a grift to pump SpaceX's valuation at its forthcoming IPO by a factor of two. I was a little surprised to see Google go along with it, because they should certainly know better, but when I found out that they own 9% of SpaceX, and stand to make about an extra 30 billion dollars by overvaluing the company on the basis of that incredibly stupid idea, everything started to make sense. It's vapourware, intended specifically to fleece investors. Needless to say, if the AI bubble pops before the IPO, there will be no IPO. If it pops after the IPO, the project will be shelved with a 'too bad, so sad, the market has gone away' excuse.

Me? I will use some machine learning where appropriate – likely none of it generative. Image classification is one area where neural networks are the only game in town, and it's critically needed in order to detect CSAM. I probably won't use neural networks on text at all if QBE keeps working as well as it is in current experiments. But I am expecting the AI bubble to crash, hard, and I am setting up Euravox specifically so that it will survive and thrive through such a storm. This doesn't require rocket science – avoiding tech that will become difficult, unacceptably expensive, or unethical to access, whilst being very careful to maximize efficiency so we can run on minimal hardware, minimal electricity, and with as small an environmental impact as possible, just makes sense.
