XJTop post: @_xjdr “What all is involved with onboarding a new model to noumena and ncode? I added GLM support over the weekend so i thought this might be interesting for some of you. first, you have to understand the architecture and how to properly serve it . luckily GLM5.2 is close enough to DeepSeek (which i have spent nearly a year working closely with) that this part fit very well into the existing serving platforms. it took a bit of DSA tuning but other than that, more or less was able to just be deployed in my existing dsv3 harness including the FA4 work, etc i have done over the past few months so now that it is serving you have to write model specific stream parsers for the chat format, the reasoning logic and the tool call format. writing the parsers are pretty straight forward as the hugging face project usually comes with the .jinja to specify it but understanding how to parse it in a stream and what the typical generation errors look like is a bit more challenging (you cant just look for opening and close brackets as parallel tool calls stream out a few tokens at a time) . when there is an error, typically you would log this as training data and make sure the mode was more robust next time, but as this is an OSS model, and i do my best not to save any customer data on purpose EVER, you need to be more clever. this typically means exposing the poorly formatted data back to the model and saying 'this is bad, dont do this please'. now this is just the serving end to get the responses into an openai compatible format, but to add support into ncode, it means exercising every tool call available to the model and common tool call chains to make sure the prompts, tool schema contracts and the ncode side parsing all the model to understand how to use all of the tools at its disposal (and ideally use them well) . luckily GLM was very well trained on ncode shaped tool calls so it didnt take as much work as i had feared. Similarly to the serving side, as i do not store session data for training, in order to make the model behave better, the idea is to give the model context when it screws up tool calls such that it can properly format the call on the next turn. there is a ton more required on the model routing, and preview metadata , and supporting multiple models in a single session and kv caching that is less interesting, but that is less than 1/2 the hours spent getting GLM onboarded for everyone! Hopefully you found that interesting and you continue to use and enjoy GLM 5.2 on noumena with ncode”