Fish Audio Voice Cloning

2025-09-14: this site now supports the fish-tts model.
Many built-in voices are available for cloning, such as Elon Musk, Zhao Benshan, Zhu Bajie, and Sun Wukong.
Supports emotion control.
Supports uploading your own audio files for cloning.
Fully compatible with the OpenAI TTS /v1/audio/speech interface.

Pricing

Prices are converted at a rate of 1 USD to 3 CNY.
Official site: $15.00 USD per million UTF-8 bytes
This site: ¥45.00 CNY per million UTF-8 bytes
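
As a rough illustration of byte-based pricing, the sketch below estimates the cost of a single request. It assumes that only the UTF-8 byte length of the input text is billed, which this page does not state explicitly; estimate_cost_cny is a hypothetical helper, not part of the API.

```python
PRICE_CNY_PER_MILLION_BYTES = 45.00  # rate quoted on this page


def estimate_cost_cny(text: str) -> float:
    """Estimate the cost of synthesizing `text`, in CNY.

    Assumption: only the UTF-8 byte length of the input is billed.
    """
    n_bytes = len(text.encode("utf-8"))
    return n_bytes * PRICE_CNY_PER_MILLION_BYTES / 1_000_000


if __name__ == "__main__":
    # Most Chinese characters take 3 bytes in UTF-8; ASCII characters take 1.
    print(estimate_cost_cny("师傅，我想娶媳妇"))
```

Because the unit is bytes rather than characters, Chinese text costs roughly three times as much per character as ASCII text.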

API

Convention: every request must include the header Authorization: Bearer hk-your-key

1. Clone Voice

POST https://api.openai-hk.com/v1/audio/speech

shell

curl --request POST \
  --url https://api.openai-hk.com/v1/audio/speech \
  --header 'Authorization: Bearer hk-your-key' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "fish-tts",
    "input": "(高兴) 师傅，我想娶媳妇",
    "voice": "d7900c21663f485ab63ebdb7e5905036"
}'  \
  --output speech.mp3

The output is an mp3 stream, returned with the header Content-Type: application/octet-stream.
The response headers include a task-id, for example task-id: b7f228140bd949d7b466e1c33566a2fd.
With this task-id, the mp3 is also available at https://platform.r2.fish.audio/task/b7f228140bd949d7b466e1c33566a2fd.mp3
voice is the voice clone source and can be any model id from Fish Audio; d7900c21663f485ab63ebdb7e5905036 is the Zhu Bajie voice.
More voice sources can be retrieved via the API (section 3) or created by uploading your own audio (section 2).
input is the text to speak; emotion control tags are supported in the text (see the official docs).
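
The mapping from task-id to mp3 link can be sketched as follows. It assumes the URL pattern in the example holds for every task-id; mp3_url_from_task_id is a hypothetical helper name.

```python
def mp3_url_from_task_id(task_id: str) -> str:
    """Build the hosted mp3 URL for a task-id taken from the response headers."""
    return f"https://platform.r2.fish.audio/task/{task_id}.mp3"


if __name__ == "__main__":
    print(mp3_url_from_task_id("b7f228140bd949d7b466e1c33566a2fd"))
```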
Returning a URL as JSON

Request body

json

{
  "model": "fish-tts",
  "input": "(高兴) 师傅，我想娶媳妇",
  "voice": "d7900c21663f485ab63ebdb7e5905036",
  "response_format": "url"
}

Parameter Description

Parameter	Type	Description
model	string	Voice clone model name
input	string	Text content
voice	string	Voice clone source, model id from Fish Audio
response_format	string	Response format; defaults to stream, can be set to url

Response

json

{
  "audio_url": "https://platform.r2.fish.audio/task/ad68621d26334758922403bd2d0a4bd4.mp3"
}
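
For callers who prefer Python over curl, the request and response above might be handled roughly like this, using only the standard library. The function names are hypothetical, and the sketch assumes the JSON response always carries an audio_url field when response_format is url.

```python
import json
import urllib.request

API_URL = "https://api.openai-hk.com/v1/audio/speech"


def build_speech_request(api_key: str, text: str, voice: str,
                         as_url: bool = True) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the speech endpoint."""
    payload = {"model": "fish-tts", "input": text, "voice": voice}
    if as_url:
        # Ask for a JSON body with audio_url instead of the default mp3 stream.
        payload["response_format"] = "url"
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_audio_url(request: urllib.request.Request) -> str:
    """Send the request and return audio_url from the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["audio_url"]
```

Building the request separately from sending it makes the payload easy to inspect before any credits are spent.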

2. Create Your Own Voice Clone Source

POST https://api.openai-hk.com/fish/model

shell

curl --request POST \
  --url https://api.openai-hk.com/fish/model \
  --header 'Authorization: Bearer hk-your-key' \
  --header 'Content-Type: application/json' \
  --data '{
    "title": "Teacher",
    "description": "Description",
    "voices": "https://platform.r2.fish.audio/task/604133d7b3c7430385382470f67770e8.mp3",
    "cover_image": "https://www.open-hk.com/res/img/open.png",
    "train_mode": "fast"
}'

To prevent abuse, each call deducts 50 credits

Parameter description

Parameter	Type	Description
title	string	Voice clone name
description	string	Voice clone description
voices	string	Audio file URL
cover_image	string	Cover image
train_mode	string	Voice clone training mode
is_save	bool	Whether to keep the model; defaults to false. An unsaved model is deleted after 10 minutes

Response body

The returned _id is the value to pass as the voice parameter

json

{
  "_id": "d3645891f9e742108c313e66271394ac",
  "type": "tts",
  "title": "老师",
  "description": "描述",
  "cover_image": "coverimage/d3645891f9e742108c313e66271394ac",
  "train_mode": "fast",
  "state": "trained",
  "tags": [],
  "samples": [
    {
      "title": "Default Sample",
      "text": "通过光合作用，植物能够将阳光转化为生命所需的能量。这个精密的生物化学过程不仅维持了植物的生长，更为地球上的氧气循环做出了重要贡献，展现了自然界的奇妙平衡。",
      "task_id": "1ae43d4f956d4744b7ecb6c6d54e4437",
      "audio": "task/1ae43d4f956d4744b7ecb6c6d54e4437.mp3"
    }
  ],
  "created_at": "2025-09-14T15:10:09.033883Z",
  "updated_at": "2025-09-14T15:10:09.033485Z",
  "languages": ["zh"],
  "visibility": "public",
  "lock_visibility": false,
  "default_text": "通过光合作用，植物能够将阳光转化为生命所需的能量。这个精密的生物化学过程不仅维持了植物的生长，更为地球上的氧气循环做出了重要贡献，展现了自然界的奇妙平衡。",
  "like_count": 0,
  "mark_count": 0,
  "shared_count": 0,
  "task_count": 0,
  "unliked": false,
  "liked": false,
  "marked": false
}
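
A minimal sketch of assembling the request body and picking the voice id out of a response like the one above. Both helper names are hypothetical, and the sketch assumes every field in the parameter table may be sent together; this page does not say which fields are required.

```python
import json


def build_create_model_payload(title: str, description: str, audio_url: str,
                               cover_image: str, train_mode: str = "fast",
                               is_save: bool = False) -> dict:
    """Assemble the JSON body for POST /fish/model."""
    return {
        "title": title,
        "description": description,
        "voices": audio_url,       # URL of the audio file to clone from
        "cover_image": cover_image,
        "train_mode": train_mode,
        "is_save": is_save,        # False: the model is deleted after 10 minutes
    }


def voice_id_from_response(body: str) -> str:
    """Return the _id field, which is then used as the voice parameter."""
    return json.loads(body)["_id"]
```

Setting is_save to True is worth considering for any voice that will be reused, since each creation call deducts credits.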

3. Official Voice Sources

GET https://api.openai-hk.com/fish/q/model?page_size=10&page_number=1&title=&language=zh&sort_by=task_count

Response body: the _id of each item is used as the voice parameter

json

{
  "total": 272820,
  "items": [
    {
      "_id": "54a5170264694bfc8e9ad98df7bd89c3",
      "type": "tts",
      "title": "丁真",
      "description": "",
      "cover_image": "coverimage/54a5170264694bfc8e9ad98df7bd89c3",
      "train_mode": "fast"
    },
    ......,
    {}
  ]
}
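
The query string and the response shape above can be handled with a short sketch like the following. The helper names are hypothetical; the parameter names come from the example URL, and their exact semantics (for instance, that title filters by name) are assumed rather than documented here.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openai-hk.com/fish/q/model"


def build_model_query_url(page_number: int = 1, page_size: int = 10,
                          title: str = "", language: str = "zh",
                          sort_by: str = "task_count") -> str:
    """Build the GET URL for listing official voice sources."""
    query = urlencode({
        "page_size": page_size,
        "page_number": page_number,
        "title": title,  # assumed to filter by voice title; empty means no filter
        "language": language,
        "sort_by": sort_by,
    })
    return f"{BASE_URL}?{query}"


def voices_by_title(response: dict) -> dict:
    """Map each voice title to its _id, skipping items that have no _id."""
    return {
        item["title"]: item["_id"]
        for item in response.get("items", [])
        if "_id" in item
    }
```

The values returned by voices_by_title can be passed directly as the voice parameter of the speech endpoint.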

4. Emotion and Control Tags

Official Reference (Chinese)

Examples

At the start of a sentence: (Angry) Is this how you repay me?
Mid-sentence: I trusted you so much, (Angry) is this how you repay me?
Speed up: Run! Someone is chasing us, (speed up) if we don't run now it'll be too late!
Slow down: He spoke word by word, (slow down) as if each word weighed a thousand pounds.
Lower volume: He leaned close to my ear, (lower volume) and quietly told me a secret.
Raise volume: (Loud) What did you say? I can't hear you!
Excited tone: This is unbelievable! (Excited) We actually succeeded!
Laugh: Hearing the joke, he couldn't hold it anymore, (laugh) hahaha!
Cry: She covered her face, (cry) sobbing, unable to say another word.
Sigh: How did things end up like this... (sigh) Ah.
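
When input text is generated programmatically, the tags in the examples above can be added with small helpers such as these. The function names are hypothetical, and the sketch simply reproduces the parenthesized tag syntax shown in the examples; the set of accepted tags is defined by the official reference, not by this code.

```python
def with_tag(tag: str, text: str) -> str:
    """Prefix a sentence with a control tag, e.g. "(Angry) ..."."""
    return f"({tag}) {text}"


def with_mid_tag(before: str, tag: str, after: str) -> str:
    """Place a control tag between two parts of a sentence."""
    return f"{before} ({tag}) {after}"


if __name__ == "__main__":
    print(with_tag("Angry", "Is this how you repay me?"))
    print(with_mid_tag("I trusted you so much,", "Angry", "is this how you repay me?"))
```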

5. Official /v1/tts Interface

POST https://api.openai-hk.com/fish/v1/tts

shell

curl --request POST \
  --url https://api.openai-hk.com/fish/v1/tts \
  --header 'Authorization: Bearer hk-your-key' \
  --header 'Content-Type: application/json' \
  --header 'format: url' \
  --header 'model: s2-pro' \
  --data '{
    "text": "[嘿嘿]你说什么?",
    "reference_id": "5c353fdb312f4888836a9a5680099ef0",
    "temperature": 0.7,
    "top_p": 0.7,
    "prosody": {
      "speed": 1,
      "volume": 0,
      "normalize_loudness": true
    },
    "chunk_length": 300,
    "normalize": true,
    "format": "mp3",
    "sample_rate": 44100,
    "mp3_bitrate": 128,
    "latency": "normal",
    "max_new_tokens": 1024,
    "repetition_penalty": 1.2,
    "min_chunk_length": 50,
    "condition_on_previous_chunks": true,
    "early_stop_threshold": 1,
    "references":[{
        "audio_url":"http://cos.aitutu.cc/mp4/ru-user-voice.mp3",
        "text":"Hello! Welcome to Fish Audio."
    }]
  }'

The model is selected with the model request header; s2-pro and s1 are supported.
To get a URL back, add the header format: url; by default the audio is returned as a stream.

Parameter description

Parameter	Type	Description
text	string	Text content
reference_id	string	Voice clone (timbre) id
references	array	Voice clone reference samples; ignored if reference_id is set
references[0].audio_url	string	URL of the reference audio; duration 10 s to 270 s; mp3 and wav are supported
references[0].text	string	Text spoken in the reference audio
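
The header-based options for this endpoint could be wired up in Python roughly as follows. build_tts_request is a hypothetical helper, only the two core body fields are sent, and the Content-Type header is an assumption that the endpoint accepts a body declared as JSON.

```python
import json
import urllib.request

TTS_URL = "https://api.openai-hk.com/fish/v1/tts"


def build_tts_request(api_key: str, text: str, reference_id: str,
                      model: str = "s2-pro",
                      as_url: bool = True) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to /fish/v1/tts."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumption: body is declared as JSON
        "model": model,                      # s2-pro or s1
    }
    if as_url:
        # Without this header the audio comes back as a stream.
        headers["format"] = "url"
    body = {"text": text, "reference_id": reference_id}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Note that the model and the return format travel in headers here, whereas the /v1/audio/speech endpoint takes them in the JSON body.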
