00:00 - 00:05 | So I've been doing nothing but NeMo merges for the past 72 hours. |
00:05 - 00:07 | Trying to pull off the best long context one! |
00:07 - 00:10 | Shuffling around Shuttle Mini and Magnum 2.5 KTO like a professional |
00:11 - 00:16 | dealer at an expensive casino, except you can't rely on card counting to win. |
00:17 - 00:21 | And then I found out Magnum shits itself at contexts above 32k. |
00:21 - 00:24 | Just breaks! |
00:29 - 00:31 | It BREAKS! |
00:31 - 00:34 | 'Falls off on higher contexts', my ass! |
00:38 - 00:40 | It spouts nonsense! |
00:41 - 00:47 | But that's not the end of the world since I still have Shuttle trained on 128k, right?! |
00:47 - 00:50 | I praised Shuttle on Drummer's Discord for working on high contexts and |
00:50 - 00:56 | Fizz suddenly jumps in with "I have no idea HOW." |
00:58 - 00:59 | "It was trained with 16k!" |
00:59 - 01:01 | Just like Magnum, yet it works! |
01:04 - 01:06 | Kalomaze is about to go on suicide watch! |
01:06 - 01:08 | Meanwhile MistralAI is just taking the piss! |
01:13 - 01:14 | But that's not all! |
01:14 - 01:16 | Turns out the best model |
01:16 - 01:20 | working on high contexts |
01:21 - 01:27 | is fucking Lyra v1! |
01:35 - 01:36 | The only one that claims to handle no more than 16k! |
01:36 - 01:39 | I was weighting it lower, |
01:41 - 01:44 | even removing it from my merges entirely at some point. |
01:44 - 01:47 | Because Sao claimed that it fell off after 16k. |
01:47 - 01:49 | You know, they said "they have tried LoRAs with up to 64k, |
01:50 - 01:53 | but they just do not work well." |
01:54 - 01:56 | My ass! |
01:56 - 01:58 | Lyra is better at recalling stuff |
01:58 - 02:02 | than the official NeMo Instruct! |
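Recall claims like this can be checked with a needle-in-a-haystack test. Below is a minimal sketch; the `generate(model, prompt)` call is a hypothetical placeholder for whatever backend you run the model with (llama.cpp, exllamav2, transformers, ...), not a real API.

```python
# Minimal needle-in-a-haystack sketch for comparing long-context recall.
# Bury one distinctive fact ("the needle") at varying depths inside filler
# text, then ask the model to retrieve it. A model that falls off past 32k
# will stop recovering the needle once the prompt grows long enough.

FILLER = "The sky was grey and the harbour was quiet that morning. "
NEEDLE = "The secret passphrase is 'tangerine-47'. "
QUESTION = "\n\nQuestion: What is the secret passphrase? Reply with only the passphrase."

def build_prompt(n_filler: int, depth: float) -> str:
    """depth is the needle's relative position: 0.0 = start, 1.0 = end."""
    chunks = [FILLER] * n_filler
    chunks.insert(int(depth * n_filler), NEEDLE)
    return "".join(chunks) + QUESTION

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = build_prompt(4000, depth)  # ~40k words of filler; tune to the context window under test
    # answer = generate(model, prompt)  # backend-specific call, assumed -- plug in your own
    # print(depth, "tangerine-47" in answer)
```

Sweeping both the depth and `n_filler` gives a rough recall-vs-context curve per model, which is a more honest comparison than eyeballing chat logs.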
02:05 - 02:07 | No joke. |
02:10 - 02:11 | Meanwhile, |
02:11 - 02:13 | models like Rocinante |
02:15 - 02:17 | trained atop Instruct, |
02:17 - 02:20 | also shit themselves |
02:21 - 02:28 | at 32k contexts! |
02:29 - 02:30 | That's just sad. |
02:30 - 02:32 | The only 'moist' thing about it |
02:36 - 02:39 | is the tears it brings out of me |
02:40 - 02:42 | at how much time I wasted |
02:50 - 02:55 | on trying to ram it into my merges! |
02:56 - 02:58 | It didn't make the prose better? |
02:58 - 03:04 | No, no, it writes pretty well! |
03:04 - 03:09 | That is, only if you don't use ChatML! |
03:09 - 03:11 | And I was trying to pull off a ChatML merge. |
03:16 - 03:20 | Jokes on me I guess! |
03:23 - 03:26 | Back to Mistral's shitty [INST]! |
03:31 - 03:36 | I'm done with everyone's shit of putting special tokens wherever they want! |