GPT OSS 120B Fast and GLM 4.7 Fast Are Now Available on just4o.chat

The just4o.chat Team

We’ve added two new Cerebras-backed fast paths to just4o.chat: GPT OSS 120B Fast and GLM 4.7 Fast.

This is one of those launches where the headline is not just “we support two more models.” The real story is speed.

Cerebras says its hosted gpt-oss-120B runs at 3,000 tokens per second at full 128k context. For GLM-4.7, Cerebras says code generation runs at approximately 1,000 tokens per second, and in some use cases can reach 1,700 TPS. Those numbers matter because they change what an open model feels like in practice. You stop waiting around for the model to gather itself. You can iterate faster, keep multi-step workflows moving, and actually treat open-weight models like live tools instead of slow background jobs.
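To make those throughput numbers concrete, here is a rough back-of-the-envelope comparison. The response length and the slower baseline rate are illustrative assumptions, not measured figures:

```python
# Rough streaming-time estimate: seconds to emit a response of
# n_tokens at a given decode rate (tokens per second).
# Purely illustrative arithmetic, not a benchmark.

def stream_seconds(n_tokens: int, tokens_per_second: float) -> float:
    return n_tokens / tokens_per_second

# A 1,500-token answer at Cerebras' quoted 3,000 TPS, versus an
# assumed ~50 TPS baseline for a conventional GPU-backed route.
fast = stream_seconds(1500, 3000)  # 0.5 seconds
slow = stream_seconds(1500, 50)    # 30.0 seconds
print(f"fast path: {fast:.1f}s, baseline: {slow:.1f}s")
```

Half a second versus half a minute for the same answer is the difference between a conversation and a queue.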

That is exactly why we wanted these models inside just4o.chat.

Why this fits just4o.chat

just4o.chat is built for people who care about continuity, context, and honest model choice.

That means:

  • the model you select is the model you get
  • your files, projects, and memory stay in the same workspace
  • you are not dealing with silent router swaps in the middle of a long thread
  • you can move between providers without rebuilding your whole workflow each time

Fast open models make that setup more interesting. If you already like the idea of direct model choice, the next thing you notice is latency. A model can be smart and still feel hard to live with if every response drags. Cerebras-backed routes change that tradeoff.

What is now available

GPT OSS 120B Fast

On just4o.chat, GPT OSS 120B Fast is the Cerebras-backed fast path for OpenAI’s open-weight GPT OSS model.

In our current setup, it offers:

  • 131k context
  • function calling
  • 1 premium request per send

It does not support web search or image input on just4o.chat.

The point of this route is simple: keep the GPT OSS family available in a form that feels genuinely quick enough for real coding, tool use, and back-and-forth agentic work.
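As a sketch of what function calling on this route involves, here is a request body in the OpenAI-compatible chat completions shape that Cerebras-backed routes generally follow. The model id, tool name, and schema are illustrative assumptions, not documented identifiers; check the model picker and docs for the real ones:

```python
# Hypothetical chat completions request body with a tool definition.
# The model id and the get_weather tool are assumptions for
# illustration only; the overall shape follows the OpenAI-compatible
# function-calling format.

request_body = {
    "model": "gpt-oss-120b-fast",  # assumed id, not a documented one
    "messages": [
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
print(request_body["model"])
```

When the model decides to use the tool, the response carries a structured tool call instead of plain text, and your code runs the function and feeds the result back in as the next message.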

GLM 4.7 Fast

We’ve also added GLM 4.7 Fast, routed through Cerebras chat completions.

In our current setup, it offers:

  • 131k context
  • function calling
  • 2 premium requests per send

It also does not support web search or image input on just4o.chat.

Internally, we treat this route as a preview path, which lines up with Cerebras’ own positioning of GLM-4.7 as a newer high-speed option for coding, tool-driven agents, and multi-turn reasoning.

Why speed matters more than people admit

There is a weird habit in AI discourse where people talk about intelligence as if latency barely matters. But if you actually use these systems every day, latency shapes the experience almost as much as model quality does.

If a model takes too long, you ask fewer follow-up questions. You try fewer branches. You avoid exploratory workflows. You stop using it like a thought partner and start using it like a vending machine.

Very fast inference changes that. It makes:

  • coding loops tighter
  • tool-using workflows less fragile
  • long-context chats more tolerable
  • agent-style iteration feel more natural

That is especially true on just4o.chat, where the whole point is not just one isolated completion, but an ongoing workspace with memory, files, projects, personas, and explicit model choice.

Where these models fit

If you want a rough mental map:

  • GPT OSS 120B Fast looks like the stronger “fast GPT OSS” option when you want OpenAI’s open-weight line with function calling and very low latency.
  • GLM 4.7 Fast looks especially interesting for coding, tool-driven workflows, and people who want another strong open-model option with a different style and behavior profile.

Neither of these routes is trying to be “the only model you’ll ever need.” That is not really the just4o.chat worldview anyway. The point is to give you a legible, usable lineup where different models can shine for different reasons, without forcing you out of the same environment every time you want to compare them.

Why we added them now

One of the most fun things happening in AI right now is that open models are getting good enough to matter, and inference infrastructure is finally getting fast enough to make that quality feel immediate.

Cerebras is a big part of that story.

So bringing GPT OSS 120B Fast and GLM 4.7 Fast into just4o.chat felt like an obvious move. They give users two more serious options for low-latency reasoning, coding, and tool use, while keeping the things that matter most here intact: direct model choice, memory continuity, project context, and a workspace that stays coherent over time.

If you care about open models but hate the feeling that “open” has to mean “slow,” these are worth trying.