API Documentation


This endpoint generates completions based on the provided prompt and parameters.

Request: POST /v1/completion

Headers:

  • Content-Type: application/json
  • Authorization: Bearer <your_api_key>


The request accepts a JSON body with the following properties:

  • prompt (string, required): The prompt to generate from.
  • model (string or list, required): The model name(s) to use for completion; either a single model name or a list of model names.
  • max_tokens (integer, optional): Maximum number of tokens in the response.
  • temperature (float, optional): Controls randomness in the response. Ranges from 0 to 1.
  • top_p (float, optional): Controls nucleus sampling. Ranges from 0 to 1.
  • stop (list of strings, optional): Strings that signal the model to stop generating.
  • presence_penalty (float, optional): Affects the presence of tokens in the response. Ranges from -2 to 2.
  • frequency_penalty (float, optional): Affects the frequency of tokens in the response. Ranges from -2 to 2.
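As a sketch, the documented ranges above can be checked client-side before a request is sent. The helper below is illustrative only and not part of the API; the service performs its own validation.

```python
def build_completion_payload(prompt, model, **options):
    """Build a /v1/completion request body, checking the documented ranges.

    Illustrative helper, not part of the API client library.
    """
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a string")
    if not isinstance(model, (str, list)):
        raise TypeError("model must be a string or a list of model names")

    # Documented numeric ranges for the optional sampling parameters.
    ranges = {
        "temperature": (0.0, 1.0),
        "top_p": (0.0, 1.0),
        "presence_penalty": (-2.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
    }
    for name, (lo, hi) in ranges.items():
        if name in options and not lo <= options[name] <= hi:
            raise ValueError(f"{name} must be between {lo} and {hi}")

    return {"prompt": prompt, "model": model, **options}
```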

Supported Models

  • llama-2-70b-chat
  • llama-2-70b
  • llama-2-13b-chat
  • llama-2-7b
  • llama-2-7b-chat
  • gpt-4
  • gpt-4-0613
  • gpt-4-0314
  • gpt-4-32k
  • gpt-4-32k-0613
  • gpt-4-32k-0314
  • gpt-3.5-turbo
  • gpt-3.5-turbo-0613
  • gpt-3.5-turbo-0301
  • gpt-3.5-turbo-16k
  • gpt-3.5-turbo-16k-0613
  • cohere-command
  • cohere-command-light
  • cohere-command-nightly
  • text-davinci-001


  "prompt": "Once upon a time",
  "model": ["gpt-4", "llama-2-70b"],
  "max_tokens": 50,
  "temperature": 0.7,
  "top_p": 0.8,
  "stop": ["."],
  "presence_penalty": 0.5,
  "frequency_penalty": -0.5


When a request to the /v1/completion endpoint is successful, the response will contain the following fields:

  • id (UUID): A unique identifier for the completion request.
  • object (string): The type of object returned; typically a completion object.
  • created (time, ISO 8601 format): The timestamp of when the completion was generated.
  • choices (array of CompletionChoice): An array of completion choices. Each choice contains generated text and other related information.
  • usage (Usage object): Information about the number of tokens used in the prompt and completion, and the prediction time in milliseconds.
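A successful response can be navigated like any JSON object. The field values in the sketch below are invented for illustration; only the field names follow the documented schema.

```python
# Hypothetical successful response; all values are made up for illustration.
sample_response = {
    "id": "c0ffee00-1234-5678-9abc-def012345678",
    "object": "completion",
    "created": "2023-09-01T12:00:00Z",
    "choices": [
        {
            "text": ", there was a dragon",
            "created": 1693569600,
            "model": "gpt-4",
            "finish_reason": "stop token reached",
            "usage": {"prompt_tokens": 4, "completion_tokens": 5,
                      "total_tokens": 9, "prediction_time_ms": 120},
        }
    ],
    "usage": {"prompt_tokens": 4, "completion_tokens": 5,
              "total_tokens": 9, "prediction_time_ms": 120},
}

# The generated text lives inside each element of "choices".
first_text = sample_response["choices"][0]["text"]
```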


Each CompletionChoice in the choices array has the following fields:

  • text (string): The generated completion text.
  • created (int): The timestamp of when the choice was created.
  • model (string): The name of the model that was used to generate this choice.
  • finish_reason (string): The reason generation for this choice finished (e.g., "stop token reached").
  • usage (Usage object): Information about the number of tokens used for this specific choice.


The Usage object contains information about token usage and prediction time:

  • prompt_tokens (int): The number of tokens in the provided prompt.
  • prediction_time_ms (int64): The time taken, in milliseconds, to generate the completion.
  • completion_tokens (int): The number of tokens in the generated completion.
  • total_tokens (int): The total number of tokens, including both prompt and completion.
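The field descriptions imply that total_tokens is the sum of prompt_tokens and completion_tokens. A small sketch of that structure, with invented values and assuming that relationship holds:

```python
from dataclasses import dataclass

@dataclass
class Usage:
    """Mirrors the documented Usage fields; for illustration only."""
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    prediction_time_ms: int

    def is_consistent(self) -> bool:
        # Assumption: total_tokens = prompt_tokens + completion_tokens.
        return self.total_tokens == self.prompt_tokens + self.completion_tokens

# Invented example values.
usage = Usage(prompt_tokens=4, completion_tokens=5,
              total_tokens=9, prediction_time_ms=120)
```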