That, in my experience, is never the bottleneck, at least not for professionals. Letting Claude code is the easy part of the job. Gathering the requirements, drafting a good story for Claude, guiding Claude through the endless mistakes it usually commits, reviewing Claude's output, steering the ship: that is the bottleneck. Unfortunately, no AI can steer the ship currently, not o3 pro, not Gemini, not Opus 4, not any of their fancy CLI agent tools, no matter how clever the .md instruction files and other gimmicks. And boy, I'd be the first one to cheer if AI could do this. But currently, it's useless without the full-time attention of a senior, experienced human.
If you’re on a Pro account, it’s common to hit the usage limit in the middle of a long-running task. Claude Code will tell you you’re out of quota, and the reset time might be something like 3 AM.
If you’re asleep by then, you miss the chance to resume right when it resets. The script is just a workaround to automatically pick up where you left off as soon as the quota is restored.
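The mechanics are simple enough to sketch. Here is a minimal version in Python (a sketch of the idea, not the actual script; it assumes you know the reset hour and that your Claude Code version supports resuming the last session with --continue, so check your CLI's flags):

    # Sketch of an auto-resume loop. Assumes the quota resets at a
    # known hour and that `claude --continue` resumes the most recent
    # session -- verify the flags your CLI version actually supports.
    import subprocess
    import time
    from datetime import datetime, timedelta

    def seconds_until(hour: int) -> float:
        """Seconds from now until the next occurrence of `hour`:00."""
        now = datetime.now()
        target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        if target <= now:
            target += timedelta(days=1)
        return (target - now).total_seconds()

    time.sleep(seconds_until(3))  # quota resets at 3 AM in this example
    subprocess.run(["claude", "--continue", "-p",
                    "Continue the task you were working on."])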
Exactly. I've spent the weekend trying my hand at making an AI assistant SaaS (I can't believe this doesn't exist yet!), and the biggest lesson I learnt is that I need to pay attention to what Claude Code does. I need to be specific in my instructions, and I need to review the output, because it will sometimes not know how to do things, and it will never say "I don't know how to do that"; it'll just endlessly write more and more code to try to appease you.
I think I'm faster with Claude Code overall, but that's because it's a tradeoff between "it makes me much faster at the stuff I'm not good at" and "it slows me down on the stuff I am good at". Maybe with better prompting skills, I'll be faster overall, but I'm definitely glad I don't have to write all the boilerplate yet again.
> I've spent the weekend trying my hand at making an AI assistant SaaS (I can't believe this doesn't exist yet!), and the biggest lesson I learnt is that I need to pay attention to what Claude Code does.
It is part of the learning curve discovering that making a non-deterministic system act deterministically based on some vague instructions is actually pretty difficult.
I don't need it to act deterministically, I just need it to be correct.
I mean, you need to deterministically produce correctness.
Most of us would be happy with the “probably approximately correct” standard for Claude code. ;)
https://en.m.wikipedia.org/wiki/Probably_approximately_corre...
> Exactly. I've spent the weekend trying my hand at making an AI assistant SaaS (I can't believe this doesn't exist yet!)
I'm not sure what this means; what exactly is an AI assistant SaaS? There are plenty of wrappers around LLMs that you can use, but I'm guessing that a wrapper around (for example) the ChatGPT or Claude API isn't what you had in mind, right?
Can you explain?
I mean an AI PA. Someone to manage my calendar, todos, personal documentation, emails, etc. Just something to help with general life admin.
> I've spent the weekend trying my hand at making an AI assistant SaaS (I can't believe this doesn't exist yet!)
you mean another wrapper ?
Do you know of any good ones?
There are plenty; just run a search on Perplexity or elsewhere. Most YC startups are AI wrappers of some sort...
I haven't been able to find any AI personal assistants, please point me to any you may know.
The entire market is focused on productivity ones that all have the same feature set, no real originality in the offerings IMHO.
There's a lot of low-hanging fruit to tackle, from helping people stay focused to managing household schedules. I honestly don't even think the product has to be SaaS; a 16GB-VRAM GPU or a well-equipped MacBook can do everything locally.
> And boy, I'd be the first one to cheer if AI could do this.
Yeah, well it would be the next major step towards human irrelevance.
Or at least, for developers.
I don't see it that way. Even if Claude could give me code without hallucinating (in my experience it has a 30-35% success rate at giving me code that actually works and doesn't use APIs it makes up as it goes), it cannot come up with real-world problems to solve. For example, it isn't going to notice that I need help managing my calendars and want an AI assistant that can read my calendar and email me my agenda for the day and the week, find scheduling conflicts, and suggest dates and times that align with social norms and my habits for get-togethers with friends. It cannot notice that my car's Bluetooth prioritized the last phone it was connected to and not my phone. It cannot notice that my 3D printer has a frame skew that needs to be corrected. It cannot notice that a set of solar panels could be optimized with a bunch of linear actuators and a cloud-tracking camera. Those are meatspace problems that Claude cannot see. It might get more capable, but it can't design a product or a service.
Coincidentally, I want this AI assistant as well. I built a proof of concept and it worked really well, so I'm building a more multi-user version so my friends can use it as well.
The really nice thing about it is that I gave it memory, so a lot of these behaviours are just things you teach it. For example, I only programmed it to be able to read my calendar and add events, and then told it "before you add an event, check for conflicts" and it learned to do that. I really like that sort of "programming by saying stuff" that LLMs enable.
I'm looking forward to seeing where this experiment goes, email me if you want access/want to discuss features. I don't know if I'll open it up to everyone, as LLMs are costly, but maybe I could do a "bring your own key" thing.
Do you have a demo of it one can check?
I only started working on it on Saturday, so nothing very useful yet. It's at https://www.askhuxley.com/ and I want to focus on making something really useful instead of on monetization, so you'll have to bring your own API key, but I'd love it if you wanted to bounce some ideas and use cases off each other. I know what things are useful to me, but I don't know what's useful to other people, and I'd love to get new ideas.
Feel free to email me (email in profile) if you'd like to try it out. Right now it only does weather and Google Calendar, but adding new integrations is easy and the interesting thing is the fact that it can learn from you, and will behave like a PA would, while also being proactive and messaging you without you having to message it first.
I did make a prototype a while ago, which I integrated with a hardware device, and that was extremely useful, being able to do things by me teaching it. For example, it only had access to my calendar and its memories, but I told it (in chat) to check for and notify me of conflicts before adding an event, and told it the usual times of some events, so then I'd say "I'm playing D&D on Thursday" and it would reply with "you can't, you have an appointment at 8PM". This sounds simple for a human, but the LLM had to already know what time D&D was, that it's something I can't do during appointments, and that I wanted to be informed of conflicts, which are all things that I didn't have to program, but just instructed the LLM to do.
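For anyone curious, the core trick is simpler than it sounds: standing instructions get stored as "memories" and injected into the system prompt on every turn. A minimal sketch in Python with the Anthropic SDK (the memories, model id, and prompt wording are my own illustrative assumptions, not the commenter's actual code):

    # "Programming by saying stuff": stored instructions are prepended
    # to the system prompt, so the model applies them without code changes.
    import anthropic

    memories = [
        "Before adding a calendar event, check for conflicts and warn me.",
        "D&D is usually on Thursdays at 7 PM.",
    ]

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

    def ask(user_message: str) -> str:
        system = (
            "You are a personal assistant with access to the user's calendar.\n"
            "Standing instructions learned from the user:\n"
            + "\n".join(f"- {m}" for m in memories)
        )
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # assumed model id
            max_tokens=1024,
            system=system,
            messages=[{"role": "user", "content": user_message}],
        )
        return response.content[0].text

    print(ask("I'm playing D&D on Thursday."))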
Yes, but product designers could talk to the AI and wouldn't need developers to implement their ideas.
For some things, yes. For a lot of things, not really. Think of it like this: if product designers could just treat human software developers as idea -> code translators, then we wouldn't have developers making quite so many crucial decisions. Just translate the spec to code. But in reality, engineers end up making the bulk of the decisions and often drive the product's direction because of what is possible. Engineers also think up the base tech that enables new products. An AI cannot stand in front of a radar dish with a chocolate bar in its pocket to discover microwave ovens.
It helps when the agent automatically uses diagnostics, logs, and tests. If it resolves two or three problems by itself, that saves you the effort of doing so.
This project is about getting around the extremely aggressive throttling Claude applies.
This over-reliance on LLMs is crazy. People are going to forget how to code. Sometimes the LLM makes up shit or uses the wrong version of the API. Sometimes it's easier to look up the documentation and write some code.
All you need is a magnetized needle and a steady hand.
Years ago I interviewed at Rackspace. They did a data-structures-and-algorithms-type interview. One of the main questions was about designing a data structure for a distributed hash table, using C or equivalent, to be used as a cache, specifically addressing cache invalidation. After outlining the basic approach, I stopped and said that I have used a system like that in several projects at my current and former jobs, and that I would use something like Redis, memcached, or even Postgres in one instance, and do a push-to-cache-on-write system rather than a cache server pulling values from the source of truth if it suspected it had stale data. They did not like that answer. I asked why, and they said it's because I'm not designing a data structure from scratch. I asked them if the job I was applying for involved creating cache servers from scratch, and they said "of course not. We use Redis." (It might have been memcached; I honestly don't remember which datastore they liked.) Needless to say, this wasn't a fit for either of us. While I am perfectly capable of creating toy versions of these kinds of services, I still stand by using existing battle-tested software over rolling your own.
If you worry about forgetting how to code, then code. You already don’t know how to code 99% of the system you are using to post this comment (Verilog, CPU microcode, GPU equivalents, probably MMU programming, CPU-specific assembly, and so on). You can get ahead of the competition by learning some of that tech. Or not. But technically all you need is a magnetized needle and a steady hand.
Heh, I remember an interview once where they wanted me to figure out if a word contained duplicate letters (e.g. the two t's in "letters").
I was like well I'd probably just make a Set in Java and insert letters until it has a duplicate.
They didn't like that. So I was like, well, I guess I can make a double for-loop and check that way, and they liked that... It is weird how people like you to reinvent the wheel as opposed to just importing one.
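For what it's worth, both answers are only a few lines each (in Python here, rather than the Java of the original exchange):

    def has_duplicate_set(word: str) -> bool:
        """The "just use a set" answer: O(n) time, O(n) space."""
        seen = set()
        for ch in word:
            if ch in seen:
                return True
            seen.add(ch)
        return False

    def has_duplicate_loops(word: str) -> bool:
        """The answer they wanted: nested loops, O(n^2) time."""
        for i in range(len(word)):
            for j in range(i + 1, len(word)):
                if word[i] == word[j]:
                    return True
        return False

    assert has_duplicate_set("letters") and has_duplicate_loops("letters")
    assert not has_duplicate_set("word") and not has_duplicate_loops("word")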
Not that I'm a fan of this kind of interview, but these answers illustrate different kinds of skill/intelligence.
One is domain knowledge, which is less important in the age of Google search and StackOverflow (and even less so in the age of LLMs, but I guess interview techniques haven't caught up yet).
The second is the ability to understand a nested for loop, and if a coder can't do that by the time they reach an interview, it probably can never be taught.
It could be argued that being able to think of using a set in this instance is also an important skill, and I agree. But nested for loops are a foundational skill; if the interviewee has problems there, it's a good thing to know about early.
It could also be argued that they should just say directly "solve this using loops" if that's what they want, and well, yeah.
A Bloom filter would be a way more fun solution. But I think the quiet part of this is that people conducting interviews just like to feel clever in knowing some puzzle and its answer. Make them feel good about their puzzle and they like you that much more as a candidate.
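For the fun of it, here is a toy version of that solution, which is absurd overkill for 26 letters (rather the point). Note that a Bloom filter can report false positives, so this can claim a duplicate that isn't there:

    class BloomFilter:
        """Tiny Bloom filter over a 64-bit integer bitmask."""
        def __init__(self, size: int = 64, num_hashes: int = 2):
            self.size, self.num_hashes, self.bits = size, num_hashes, 0

        def _positions(self, item: str):
            for i in range(self.num_hashes):
                yield hash((i, item)) % self.size

        def add(self, item: str) -> None:
            for pos in self._positions(item):
                self.bits |= 1 << pos

        def might_contain(self, item: str) -> bool:
            return all(self.bits & (1 << pos) for pos in self._positions(item))

    def has_duplicate_bloom(word: str) -> bool:
        bloom = BloomFilter()
        for ch in word:
            if bloom.might_contain(ch):
                return True  # "probably" a duplicate
            bloom.add(ch)
        return False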
My favorite way to interview people is to ask them about their work and personal projects and about what parts of those were tricky, why, and how they solved those challenges. This gets candidates to talk much more openly about what experience they have and we can discuss real world practical problems down to having them show pseudo code (or showing off their GitHub repos of specific projects) that can very efficiently tell you how they think. It’s the equivalent of “tell me about bridges you have designed” vs “here are popsicle sticks, design a toy bridge” approach.
It's the calculator all over again!
Jokes aside: while I'm almost sure that the ability to code can be lost and regained, just like training a muscle, what I'm more worried about is the rug pull and squeeze that is bound to happen sometime in the next 5 to 10 years unless LLMs go the way of Free Software, GNU style. If the latter happens, then LLMs for coding will be more or less like calculators, and personally I don't know how harmful that would be compared to the boost in productivity.
That said, if the former becomes reality (and I hope not!), then we're in for some huge existential crises when people realize they can barely materialize the labour part of their jobs after doing the thinky part and the meetings part.
I don't think the rug pull and squeeze are possible; I've had the same worry. But using an existing LLM to train or fine-tune a new one seems to be standard practice, and to work quite well. So any LLM with an API will end up training all the others, even open-source LLMs, and all will benefit. And every day that passes, Moore's law makes it less and less costly for amateurs to commit the compute necessary for fine-tuning, and eventually training from scratch.
In time, even video and embodied training may be possible for amateurs, though that's difficult to contemplate today.
There are already lots of open-source/open-weight models that can run locally on a laptop.
People into homelabs have been running AI tools on home servers for years.
> There are already lots of open-source/open-weight models that can run locally on a laptop.
And they're all too small and dumb to be useful for anything but the most basic tasks.
This rhymes with the discussion we had when higher level languages became popularized. And many of us did forget how to write assembly! What might the world have looked like otherwise?
Maybe computers wouldn't have gotten slower as time goes on.
We definitely would not have Electron and that's a world I want to live in.
Just like how people forgot how to patch phone lines and punch cards.
Just a reminder you can tell the LLM the version of an API to use.
Your code should have tests the AI can use to test the code it wrote.
And thanks to MCP, you can literally point your LLM to the documentation of your preferred tool [1].
[1]: https://context7.com/about
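On the tests point: a small pytest file gives the agent a concrete target to iterate against. A hypothetical example (parse_price is a made-up stand-in for real application code):

    import pytest

    def parse_price(text: str) -> int:
        """Parse "$3.50" into cents. Stand-in for real app code."""
        if not text.startswith("$"):
            raise ValueError(f"not a price: {text!r}")
        dollars, _, cents = text[1:].partition(".")
        return int(dollars) * 100 + int(cents or 0)

    def test_parses_dollars_and_cents():
        assert parse_price("$3.50") == 350

    def test_rejects_garbage():
        with pytest.raises(ValueError):
            parse_price("three-fifty")

The loop is then just "run pytest, feed the failures back, repeat until green."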
I think it's more that every engineer will either become like a lead or a principal, or have problems. I'm a principal. I have for years had multiple teams building things that I prototyped, designed, or worked with them on the specs for. There's a level of touch and letting go that you have to employ to not overburden them or get bogged down in details that don't matter while missing those that do.
One of the skills I've developed is spinning (back) up on problems quickly while holding the overall picture in my head. I find with AI I'm just doing that even more often, and I now have 5 direct reports (AIs) added to the 6 teams of 8 I work with through managers and roadmaps.
I think there is one big difference that will differentiate between principal/lead devs and equally experienced senior devs working with AI: AIs are not people. Lead/principal developers are good at delegating work to, and managing, people. People and AIs have very little in common, and I don't think the skills will really translate that well. I think the people who will really shine with AI are those at the principal level of skill but who are better with computers than with people. They will be able to learn the AI-system interaction skills without first having to unlearn all the people interaction skills, and I'm not sure the "leadership skills" that are prized in principal devs can even be unlearned; they seem to be more a natural affinity than a skill.
> People are going to forget how to code.
Pretty much me with some IDEs and their code inspections, refactoring capabilities, and run/profile configurations (especially in your average enterprise Java codebase). Oh well.
The future is going to be great for those of us who have been resisting going all in. Unfortunately, I feel a lot of the work will be detangling the mess these LLMs make in larger repos.
The devil is in the details, as they say. And software engineering used to be exorcism; now they want it to be summoning. I'm just hoping for the majority to realize that hell is not a great environment for business.
> software engineering used to be exorcism; now they want it to be summoning. I'm just hoping for the majority to realize that hell is not a great environment for business
With your leave, this is going up on my wall :)
I mean, it’s just like having an army of interns that works for (near) free. It’s a huge positive for productivity, and I don’t think we will forget how. I’m more concerned with how we make new senior/staff engineers from now on, since the old “do grunt work for a couple years, then do simple well defined work for a few years” is 100% not a career path that exists even now.
This is my question as well. I am already wondering how prepared college grads will be. Getting help with programming assignments used to mean going to the dungeon and collaborating with fellow students while the TA made their rounds, and overall just figuring it out. Today, an LLM knocks out the programming assignments in one shot, probably. And industry seems hellbent on hiring mostly seniors, so where are the juniors who become seniors going to come from?
I think the talent pipeline has contracted, or will, and may overcorrect. But maybe the industry's carrying capacity for devs has shrunk.
This is a problem new generations should be mindful of, but then again, how many people can whip up some assembly? Certainly not the majority of developers, and it is certainly not required for most programming tasks. We might end up in the same situation: most of the plumbing will be done with high-level coding with the help of AI agents.
> People are going to forget how to code.
Which is a problem when exactly? When civilization collapses?
It will probably be like coding in assembly after the advent of compilers. There are some people who still code in assembly, but it's rare.
Wait until only LLMs know how to build compilers. That's going to be a riot ...
so software will get even worse because nobody understands anything about how computers work?
Your average professional Python programmer knows a lot less about how computers work than the assembly/machine-level programmers of yesteryear. Software today is both worse and better. Slack uses 2 GB of RAM, but is there anyone who wants to go back?
Things will probably continue in that general direction. And just like today, a small number of people who really know what they're doing will keep everything humming so people can build on top of it, by importing 13 Python libraries they don't completely understand, or having an AI build 75% of their CRUD app.
I think it's a problem in that each 1% of slop here and there massively compounds overall.
Me and my LLM buddy together understand exactly how computers work!
Since downgrading from Max to Pro... I have been using Sonnet 4 a TON and I haven't even been limited yet. The usage allowance has been awesome since the Gemini CLI released.
As someone who could never have considered Claude Code due to the cost of Max/Opus: do you notice any differences in practice with just Sonnet?
Everything is so dependent on the type of work you do. For me Sonnet works pretty well; it's hard for me to say. Even when I used Opus nonstop last month, it's not like I was doing the exact same tasks on Sonnet. However, it did drop down to Sonnet a lot and I didn't have a problem.
Opus and Sonnet are pretty similar for mainstream stuff heavily represented in the corpus.
But when you get into dark corners, Opus remains useful, or at minimum not harmful. Sonnet (especially in Claude Code) is really useful doing something commonly done in $MAINSTREAM_STACK, but will wreck your tree in some io_uring sorcery.
Sonnet is noticeably worse in my opinion. It’s worth it to spring for Max and only use Opus
I love these workarounds and generous tiers. A bit of a tangent, but with very cheap, essentially unlimited code generation, are there any active projects that just run this for days straight with an ambitious goal like "Develop an operating system", with instructions to just make all the necessary decisions to continue work?
I would love to see what a system like Claude Code could cook up running continuously for weeks. But I imagine it would get stuck in some infinite recursive loop.
Yes, I tried pushing it as far as possible over the course of a couple of days to invent, build, and prioritize the direction of a new programming language (trying to give it as much freedom as possible and let it make its own decisions, steering it only so it didn't get stuck). After around $50 in tokens, it kinda got lost in the complexity it had created and just kept adding more and more useless trivialities while overlooking fundamental unsolved problems.
E.g. it wanted to build a data query language with temporal operations but completely forgot to keep historical data.
It currently lacks the ability to focus on the overall goal and prioritize sub-tasks accordingly and instead spirals into random side quests real quick.
I think you might be aiming too low. Tasked with writing a "perfect and most useful program" this would surely yield something more than merely writing 42 to stdout.
Current LLMs get lost fairly quickly in larger projects. They still benefit from reduced scope when prompting. Context is the biggest bottleneck right now by far. You can only summarize so much before the information is too vague to make meaningful changes.
It would probably look suspiciously like Linux.
Better that than violating someone’s IP!
There is ClaudePlaysPokemon which has been failing to beat the game for weeks (months?) now.
edit:
https://www.twitch.tv/claudeplayspokemon
Pairing this with Task Master could allow you to draft all of your tasks and effectively have Claude pick something from an endless backlog 24/7...
Doesn’t Taskmaster require an API key? It doesn’t work with a subscription.
You can technically hack the API key from the subscription, but that’s probably brittle.
Or is there some other meta I’m missing?
https://github.com/eyaltoledano/claude-task-master/pull/805
Nice, thanks!
You need some way to chain tasks. Endless tasks can't run in random order, and there is usually a link between tasks/context.
When I say Task Master[0], I'm referring to a specific bit of software that manages task dependencies and ordering.
but I agree, at least the way I use AI tools, it'd be unfeasible to review the code using this method.
[0] https://github.com/eyaltoledano/claude-task-master
Reminds me of this HN discussion (Writing Code Was Never the Bottleneck): https://news.ycombinator.com/item?id=44429789
Claude Code with an API token, even when told: "Iterate until github is green without stopping" will sometimes just stop and wait to be told to continue.
For a POC, I've wrapped it with another LLM that makes some judgements and sends the appropriate (Keep going, do xyz) messages. Worked well enough for basic tasks
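Something like this, roughly (a sketch of the idea, not the actual POC; the judge prompt, model id, and CI check are all placeholder assumptions):

    # Run Claude Code headless, let a second model read the output and
    # decide what to say next, and stop when CI is green (or on a hard cap).
    import subprocess
    import anthropic

    client = anthropic.Anthropic()

    def ci_is_green() -> bool:
        return False  # stub; e.g. shell out to `gh run list` or poll your CI API

    def judge(transcript: str) -> str:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # assumed model id
            max_tokens=256,
            system="You supervise a coding agent. Reply with one short instruction.",
            messages=[{"role": "user", "content": transcript[-8000:]}],
        )
        return response.content[0].text

    prompt = "Iterate until the GitHub checks are green."
    for _ in range(20):  # hard cap so it can't loop forever
        result = subprocess.run(["claude", "-p", prompt],
                                capture_output=True, text=True)
        if ci_is_green():
            break
        prompt = judge(result.stdout)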
Feels like we are incredibly early in pricing LLM usage. I suspect we are going to look back on the days these scripts were useful with some rose-tinted glasses.
I assume they have very peaky demand, especially when European and North American office hours overlap (though I'm assuming B2B demand is higher than B2C). I'm also assuming Asian demand is significantly less than "the West's", which I guess would be true given the reluctance to serve Chinese users (and Chinese users using locally hosted models?).
I know OpenAI and Anthropic have 'batch' pricing, but that's slightly different, as it's asynchronous and not well suited to a lot of code tasks. I think a more dynamic model for users would make a lot more sense: for example, a cheaper tier giving "Max" usage but only from 8pm-6am Eastern time; otherwise you are on Pro limits.
My VS Code extension already has auto-resume; you use a GUI to write workflows and run them right there. https://marketplace.visualstudio.com/items?itemName=Codingwo...
You can even pause. I will publish a CLI that does the same, based on the same syntax. It uses the GitHub Claude Action YAML syntax: https://github.com/codingworkflow/claude-runner/blob/main/.g...
Seems convenient. I have a couple of different Claude accounts, so I switch between them when one gets exhausted. Sometimes they both get exhausted. If other people have a couple of accounts, then that would be a nice feature to add to this: switching between accounts and then resuming when either of them becomes available again.
Why is the API's usage-based billing so much more expensive than the $20/month tier? You can literally burn through $20 of API usage in a day with the same amount of usage that's included with the $20/month plan.
Because subscription plans are subsidized by people who don't use it that much. Like gyms.
That being said, it would not surprise me if subscribers are actually losing Claude money and only the API is profitable.
How much do people use these APIs? It seems the Claude API costs $3 or $15 per 1M tokens (for different models, I suppose; it seems difficult to find first-party pricing information), so people who use these daily would spend a lot more than that?
I need this plus a little drinky-drinky bird to keep pressing 'y' for me, just as the Simpsons predicted
Don't worry, this script dangerously skips permissions (by default?) You don't need to press 'y'.