gemma-4-26b-a4b-it
Text Generation • Google
Gemma 4 is Google's most intelligent family of open models, built from Gemini 3 research to maximize intelligence-per-parameter.
| Model Info | |
|---|---|
| Context Window | 256,000 tokens |
| Terms and License | link |
| Function calling | Yes |
| Reasoning | Yes |
| Vision | Yes |
| Unit Pricing | $0.10 per M input tokens, $0.30 per M output tokens |
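The unit pricing above lends itself to a quick cost estimate. The following is a minimal sketch, assuming charges scale linearly with token counts and that no rounding, minimums, or free allowances apply; `estimateCostUSD` is a hypothetical helper, not part of any Workers AI API.

```typescript
// Prices taken from the Model Info table, in USD per million tokens.
const INPUT_USD_PER_M = 0.10;
const OUTPUT_USD_PER_M = 0.30;

// Hypothetical helper: estimates the charge for a single request.
// Assumes purely linear pricing (an assumption, not stated on this page).
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return (inputTokens * INPUT_USD_PER_M + outputTokens * OUTPUT_USD_PER_M) / 1e6;
}

// Example: a 200,000-token prompt with a 2,000-token reply.
console.log(estimateCostUSD(200000, 2000).toFixed(4));
```

Under these assumptions, a request that nearly fills the context window costs on the order of two cents, with input tokens dominating the total.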
Playground
Try out this model with the Workers AI LLM Playground. It requires no setup or authentication and offers an instant way to preview and test a model directly in the browser.
Launch the LLM Playground
Usage
```ts
// Worker: stream the response back as server-sent events.
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];

    const stream = await env.AI.run("@cf/google/gemma-4-26b-a4b-it", {
      messages,
      stream: true,
    });

    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
} satisfies ExportedHandler<Env>;
```

```ts
// Worker: wait for the complete response and return it as JSON.
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];
    const response = await env.AI.run("@cf/google/gemma-4-26b-a4b-it", { messages });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
```

```py
# Python: call the REST API with an API token read from the environment.
import os
import requests

ACCOUNT_ID = "your-account-id"
AUTH_TOKEN = os.environ.get("CLOUDFLARE_AUTH_TOKEN")

prompt = "Tell me all about PEP-8"
response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/google/gemma-4-26b-a4b-it",
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    json={
        "messages": [
            {"role": "system", "content": "You are a friendly assistant"},
            {"role": "user", "content": prompt}
        ]
    }
)
result = response.json()
print(result)
```

```sh
# curl: call the REST API directly.
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/google/gemma-4-26b-a4b-it \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good" }]}'
```

Parameters
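Several parameters documented in this section carry explicit constraints: `temperature` lies between 0 and 2, `top_logprobs` ranges from 0 to 20 and requires `logprobs=true`, and `logit_bias` values run from -100 to 100. A client-side pre-flight check might look like the sketch below. The `SamplingOptions` type and `checkSamplingOptions` function are hypothetical names introduced for illustration, and the server may enforce these limits differently.

```typescript
// Hypothetical subset of the request fields documented on this page.
interface SamplingOptions {
  temperature?: number | null;
  logprobs?: boolean | null;
  top_logprobs?: number | null;
  logit_bias?: Record<string, number> | null;
}

// Returns a list of human-readable problems.
// An empty list means the options pass these particular checks.
function checkSamplingOptions(opts: SamplingOptions): string[] {
  const problems: string[] = [];
  const { temperature, logprobs, top_logprobs, logit_bias } = opts;

  if (temperature != null && (temperature < 0 || temperature > 2)) {
    problems.push("temperature must be between 0 and 2");
  }
  if (top_logprobs != null) {
    if (!Number.isInteger(top_logprobs) || top_logprobs < 0 || top_logprobs > 20) {
      problems.push("top_logprobs must be an integer from 0 to 20");
    }
    if (logprobs !== true) {
      problems.push("top_logprobs requires logprobs=true");
    }
  }
  if (logit_bias != null) {
    for (const [tokenId, bias] of Object.entries(logit_bias)) {
      if (bias < -100 || bias > 100) {
        problems.push(`logit_bias for token ${tokenId} must be between -100 and 100`);
      }
    }
  }
  return problems;
}

// Example: top_logprobs is set but logprobs is not enabled.
console.log(checkSamplingOptions({ temperature: 0.7, top_logprobs: 5 }));
```

Returning a list rather than throwing on the first problem lets a caller report every issue at once before spending a request.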
Synchronous: Send a request and receive a complete response

Input

| Parameter | Type | Description |
|---|---|---|
| prompt | string, required, minLength: 1 | The input text prompt for the model to generate a response. |
| skip_special_tokens | boolean, default: false | |
| model | string | ID of the model to use (e.g. '@cf/zai-org/glm-4.7-flash'). |
| audio | object | Parameters for audio output. Required when modalities includes 'audio'. |
| frequency_penalty | number or null | Penalizes new tokens based on their existing frequency in the text so far. |
| logit_bias | object or null | Modify the likelihood of specified tokens appearing in the completion. Maps token IDs to bias values from -100 to 100. |
| logprobs | boolean or null | Whether to return log probabilities of the output tokens. |
| top_logprobs | integer or null | How many top log probabilities to return at each token position (0-20). Requires logprobs=true. |
| max_tokens | integer or null | Deprecated in favor of max_completion_tokens. The maximum number of tokens to generate. |
| max_completion_tokens | integer or null | An upper bound for the number of tokens that can be generated for a completion. |
| metadata | object or null | Set of 16 key-value pairs that can be attached to the object. |
| modalities | array or null | Output types requested from the model (e.g. ['text'] or ['text', 'audio']). |
| n | integer or null | How many chat completion choices to generate for each input message. |
| parallel_tool_calls | boolean, default: true | Whether to enable parallel function calling during tool use. |
| prediction | object | |
| presence_penalty | number or null | Penalizes new tokens based on whether they appear in the text so far. |
| reasoning_effort | string or null | Constrains effort on reasoning for reasoning models (o1, o3-mini, etc.). |
| chat_template_kwargs | object | |
| response_format | one of | Specifies the format the model must output. |
| seed | integer or null | If specified, the system will make a best effort to sample deterministically. |
| service_tier | string or null | Specifies the processing type used for serving the request. |
| stop | one of | |
| store | boolean or null | Whether to store the output for model distillation / evals. |
| stream | boolean or null | If true, partial message deltas will be sent as server-sent events. |
| stream_options | object | |
| temperature | number or null | Sampling temperature between 0 and 2. |
| tool_choice | one of | Controls which (if any) tool is called by the model. 'none' = no tools, 'auto' = model decides, 'required' = must call a tool. |
| tools | array | A list of tools the model may call. |
| top_p | number or null | Nucleus sampling: considers the results of the tokens with top_p probability mass. |
| user | string | A unique identifier representing your end-user, for abuse monitoring. |
| web_search_options | object | Options for the web search tool (when using built-in web search). |
| function_call | one of | |
| functions | array, minItems: 1, maxItems: 128 | |

Output

| Field | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion. |
| object | string | |
| created | integer | Unix timestamp (seconds) of when the completion was created. |
| model | string | The model used for the chat completion. |
| choices | array, minItems: 1 | |
| usage | object | |
| system_fingerprint | string or null | |
| service_tier | string or null | |

Streaming: Send a request with `stream: true` and receive server-sent events
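A streamed response arrives as `text/event-stream` text, where each event carries its payload on a `data:` line. The sketch below pulls those payloads out of a block of event text. `extractSseData` is a hypothetical helper, and the `[DONE]` end-of-stream sentinel is an assumption borrowed from OpenAI-compatible APIs, not something this page confirms. The JSON shape of each payload is deliberately left unparsed.

```typescript
// Extracts the payload of each `data:` line from server-sent-event text.
// Simplifications: assumes `text` contains only complete lines, and treats
// every `data:` line as its own event (no multi-line data joining).
function extractSseData(text: string): string[] {
  const payloads: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    // Skip blank separators, comments, and other SSE fields such as `event:`.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    // Assumed sentinel marking the end of the stream.
    if (payload === "[DONE]") break;
    payloads.push(payload);
  }
  return payloads;
}

// Example with a hand-written event stream (payload contents are made up).
const sample = 'data: {"n":1}\n\ndata: {"n":2}\n\ndata: [DONE]\n\n';
console.log(extractSseData(sample));
```

When reading from a live stream, network chunks can end mid-line, so a real consumer would buffer incoming text and only pass complete lines to a function like this.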
Input

| Parameter | Type | Description |
|---|---|---|
| prompt | string, required, minLength: 1 | The input text prompt for the model to generate a response. |
| skip_special_tokens | boolean, default: false | |
| model | string | ID of the model to use (e.g. '@cf/zai-org/glm-4.7-flash'). |
| audio | object | Parameters for audio output. Required when modalities includes 'audio'. |
| frequency_penalty | number or null | Penalizes new tokens based on their existing frequency in the text so far. |
| logit_bias | object or null | Modify the likelihood of specified tokens appearing in the completion. Maps token IDs to bias values from -100 to 100. |
| logprobs | boolean or null | Whether to return log probabilities of the output tokens. |
| top_logprobs | integer or null | How many top log probabilities to return at each token position (0-20). Requires logprobs=true. |
| max_tokens | integer or null | Deprecated in favor of max_completion_tokens. The maximum number of tokens to generate. |
| max_completion_tokens | integer or null | An upper bound for the number of tokens that can be generated for a completion. |
| metadata | object or null | Set of 16 key-value pairs that can be attached to the object. |
| modalities | array or null | Output types requested from the model (e.g. ['text'] or ['text', 'audio']). |
| n | integer or null | How many chat completion choices to generate for each input message. |
| parallel_tool_calls | boolean, default: true | Whether to enable parallel function calling during tool use. |
| prediction | object | |
| presence_penalty | number or null | Penalizes new tokens based on whether they appear in the text so far. |
| reasoning_effort | string or null | Constrains effort on reasoning for reasoning models (o1, o3-mini, etc.). |
| chat_template_kwargs | object | |
| response_format | one of | Specifies the format the model must output. |
| seed | integer or null | If specified, the system will make a best effort to sample deterministically. |
| service_tier | string or null | Specifies the processing type used for serving the request. |
| stop | one of | |
| store | boolean or null | Whether to store the output for model distillation / evals. |
| stream | boolean or null | If true, partial message deltas will be sent as server-sent events. |
| stream_options | object | |
| temperature | number or null | Sampling temperature between 0 and 2. |
| tool_choice | one of | Controls which (if any) tool is called by the model. 'none' = no tools, 'auto' = model decides, 'required' = must call a tool. |
| tools | array | A list of tools the model may call. |
| top_p | number or null | Nucleus sampling: considers the results of the tokens with top_p probability mass. |
| user | string | A unique identifier representing your end-user, for abuse monitoring. |
| web_search_options | object | Options for the web search tool (when using built-in web search). |
| function_call | one of | |
| functions | array, minItems: 1, maxItems: 128 | |

Output

| Property | Value |
|---|---|
| type | string |
| contentType | text/event-stream |
| format | binary |

Batch: Send multiple requests in a single API call
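Input

| Parameter | Type | Description |
|---|---|---|
| requests | array | |

Output

| Field | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion. |
| object | string | |
| created | integer | Unix timestamp (seconds) of when the completion was created. |
| model | string | The model used for the chat completion. |
| choices | array, minItems: 1 | |
| usage | object | |
| system_fingerprint | string or null | |
| service_tier | string or null | |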
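The batch input takes a single `requests` array. As a rough sketch of how such a body could be assembled, the helper below wraps several user prompts into one payload. It assumes each entry in `requests` has the same `messages` shape used in the Usage examples, which this page does not spell out; `buildBatchBody`, `ChatMessage`, and `BatchRequest` are hypothetical names.

```typescript
// Hypothetical types; the per-entry shape is an assumption, not confirmed here.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface BatchRequest {
  messages: ChatMessage[];
}

interface BatchBody {
  requests: BatchRequest[];
}

// Builds one batch body that pairs a shared system prompt with each user prompt.
function buildBatchBody(systemPrompt: string, userPrompts: string[]): BatchBody {
  return {
    requests: userPrompts.map((prompt): BatchRequest => ({
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: prompt },
      ],
    })),
  };
}

// Example: two prompts sent together.
const body = buildBatchBody("You are a friendly assistant", [
  "Why is pizza so good",
  "Tell me all about PEP-8",
]);
console.log(JSON.stringify(body, null, 2));
```

The resulting object would be passed as the request body in place of a single `messages` payload; how results are returned for batch calls should be checked against the official batch documentation.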