00:00 - 00:05 | So I've been doing nothing but NeMo merges for the past 72 hours. |
00:05 - 00:07 | Trying to pull off the best long context one! |
00:07 - 00:10 | Shuffling around Shuttle Mini and Magnum 2.5 KTO like a professional |
00:11 - 00:16 | dealer at an expensive casino, except you can't rely on card counting to win. |
00:17 - 00:21 | And then I found out Magnum shits itself at contexts above 32k. |
00:21 - 00:24 | Just breaks! |
00:29 - 00:31 | It BREAKS! |
00:31 - 00:34 | 'Falls off on higher contexts', my ass! |
00:38 - 00:40 | It spouts nonsense! |
00:41 - 00:47 | But that's not the end of the world since I still have Shuttle trained on 128k, right?! |
00:47 - 00:50 | I praised Shuttle on Drummer's Discord for working on high contexts and |
00:50 - 00:56 | Fizz suddenly jumps in with "I have no idea HOW." |
00:58 - 00:59 | "It was trained with 16k!" |
00:59 - 01:01 | Just like Magnum, yet it works! |
01:04 - 01:06 | Kalomaze is about to go on suicide watch! |
01:06 - 01:08 | Meanwhile MistralAI is just taking the piss! |
01:13 - 01:14 | But that's not all! |
01:14 - 01:16 | Turns out the best model |
01:16 - 01:20 | working on high contexts |
01:21 - 01:27 | is fucking Lyra v1! |
01:35 - 01:36 | The only one that claims to handle no more than 16k! |
01:36 - 01:39 | I was weighting it lower, |
01:41 - 01:44 | even removing it from my merges entirely at some point. |
01:44 - 01:47 | Because Sao claimed that it fell off after 16k. |
01:47 - 01:49 | You know, they said "they have tried LoRAs with up to 64k, |
01:50 - 01:53 | but they just do not work well." |
01:54 - 01:56 | My ass! |
01:56 - 01:58 | Lyra is better at recalling stuff |
01:58 - 02:02 | than the official NeMo Instruct! |
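Recall claims like this can be checked with a needle-in-a-haystack test. Below is a minimal sketch; the `generate(model, prompt)` call is a hypothetical placeholder for whatever backend you run the model with (llama.cpp, exllamav2, transformers, ...), not a real API.

```python
# Minimal needle-in-a-haystack sketch for comparing long-context recall.
# Bury one distinctive fact ("the needle") at varying depths inside filler
# text, then ask the model to retrieve it. A model that falls off past 32k
# will stop recovering the needle once the prompt grows long enough.

FILLER = "The sky was grey and the harbour was quiet that morning. "
NEEDLE = "The secret passphrase is 'tangerine-47'. "
QUESTION = "\n\nQuestion: What is the secret passphrase? Reply with only the passphrase."

def build_prompt(n_filler: int, depth: float) -> str:
    """depth is the needle's relative position: 0.0 = start, 1.0 = end."""
    chunks = [FILLER] * n_filler
    chunks.insert(int(depth * n_filler), NEEDLE)
    return "".join(chunks) + QUESTION

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = build_prompt(4000, depth)  # ~40k words of filler; tune to the context window under test
    # answer = generate(model, prompt)  # backend-specific call, assumed -- plug in your own
    # print(depth, "tangerine-47" in answer)
```

Sweeping both the depth and `n_filler` gives a rough recall-vs-context curve per model, which is a more honest comparison than eyeballing chat logs.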
02:05 - 02:07 | No joke. |
02:10 - 02:11 | Meanwhile, |
02:11 - 02:13 | models like Rocinante |
02:15 - 02:17 | trained atop Instruct, |
02:17 - 02:20 | also shit themselves |
02:21 - 02:28 | at 32k contexts! |
02:29 - 02:30 | That's just sad. |
02:30 - 02:32 | The only 'moist' thing about it |
02:36 - 02:39 | is the tears it brings out of me |
02:40 - 02:42 | at how much time I wasted |
02:50 - 02:55 | on trying to ram it into my merges! |
02:56 - 02:58 | It didn't make the prose better? |
02:58 - 03:04 | No, no, it writes pretty well! |
03:04 - 03:09 | That is, only if you don't use ChatML! |
03:09 - 03:11 | And I was trying to pull off a ChatML merge. |
03:16 - 03:20 | Jokes on me I guess! |
03:23 - 03:26 | Back to Mistral's shitty [INST]! |
03:31 - 03:36 | I'm done with everyone's shit of putting special tokens wherever they want! |