Hitler reacts to computer-using AI agents
29 views • 3/23/2025
Hitler tries to use an AI agent to order on DoorDash, expecting it to work
00:00 - 00:03 | Our computer-use agent is coming along well |
00:04 - 00:05 | We did pre-training |
00:05 - 00:07 | It knows everything |
00:08 - 00:12 | It's read the whole internet, and can understand images |
00:12 - 00:15 | And its GPQA score indicates it's a PhD-level genius |
00:17 - 00:19 | Let me see it |
00:19 - 00:21 | Have it do something simple, like order on DoorDash |
00:24 - 00:26 | My Führer |
00:27 - 00:28 | It |
00:31 - 00:33 | It can't even navigate to a webpage, much less order DoorDash |
00:34 - 00:36 | It's completely useless |
00:53 - 00:58 | Everyone who is not on our AI agents team, leave |
01:13 - 01:15 | I was told common sense was a solved problem |
01:15 - 01:17 | You told me models were at PhD-level |
01:18 - 01:23 | What the hell kind of PhD were you talking about? |
01:25 - 01:28 | That is an extremely misleading claim |
01:29 - 01:31 | Even a 5-year-old can use a computer |
01:31 - 01:34 | I was told we were on the verge of AGI |
01:34 - 01:37 | Now you're saying we're stuck at "navigating the web" |
01:37 - 01:40 | How is that harder than FrontierMath? |
01:40 - 01:42 | My Führer, maybe we can automate software engineering without computer-use |
01:42 - 01:46 | That's just cope. Software engineers don't just write code |
01:46 - 01:48 | My Führer, you're failing to see the utility of reasoning models |
01:48 - 01:52 | Reasoning models aren't going to automate all jobs |
01:53 - 01:54 | It's not enough |
01:56 - 01:57 | You can't just tell everyone |
01:57 - 02:00 | that we're 2 years away from Nobel laureate-level AIs |
02:00 - 02:03 | when AIs still can't even navigate websites |
02:04 - 02:08 | That prediction seems like complete nonsense |
02:08 - 02:13 | AIs don't have the same capability profile as humans |
02:14 - 02:16 | You can't assume that just because it's good at coding |
02:17 - 02:21 | that we're 2 years away from AI that can do everything |
02:27 - 02:29 | I thought we'd avoid doing this again |
02:30 - 02:34 | I thought we learned our lesson with chess |
02:34 - 02:36 | Deep Blue wasn't AGI |
02:41 - 02:42 | This is the same lesson |
02:43 - 02:47 | AI can be good at narrow things but still not be very useful |
02:48 - 02:53 | Now everyone is just making the same mistake again |
02:54 - 02:56 | They say there's an imminent software singularity |
02:56 - 02:59 | It'll be hard to have a software singularity without navigating websites |
03:00 - 03:02 | Did anyone ever think of that? |
03:04 - 03:07 | It's OK. Just trust the "trendlines" |
03:14 - 03:16 | Maybe compute scaling will just solve it |
03:19 - 03:23 | It did work for natural language, I suppose |
03:25 - 03:26 | There's still hope |
03:31 - 03:33 | But we only have until like 2030 |
03:40 - 03:46 | If we can't solve agency by then, I'm going to have long timelines again |
03:46 - 03:49 | That'll be the end of the investment scale-up |
03:53 - 03:56 | Make sure we solve agency in time |