Building a Desktop Pet: Voice Edition

March 27, 2026, about 3k characters, roughly a 16-minute read

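Long-winded and still incomplete, but it is my own hard work, so I am writing it down.

I wrote a little of this in early 2024 and then left it untouched. It came back to mind recently, so I am getting the notes down first.

It started as a desktop bot paired with a nice-looking Live2D model, like the one on my blog, so that a desktop pet sits in the lower-right corner while I use the computer.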

My environment:

i5-12400F
32GB RAM
RTX 4060

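The architecture I first had in mind:

User input →
   Bot receives it →
      (call memory retrieval) →
         Combine "personality + memory + owner preferences" →
            Send to the local LLM →
               Parse the response →
                  Live2D voice + motion → Reply to the user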
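The flow above can be sketched in a few functions. This is only a rough sketch of the idea, not the actual implementation: every name here (`PetContext`, `retrieve_memories`, `call_local_llm`, the `[motion:NAME]` reply format) is hypothetical, and the LLM call is a stub that returns a fixed string.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PetContext:
    """Everything folded into the prompt (all fields are illustrative)."""
    personality: str
    owner_preferences: str
    memories: list[str] = field(default_factory=list)


def retrieve_memories(user_input: str, store: list[str], limit: int = 3) -> list[str]:
    """Naive keyword overlap standing in for a real memory retriever."""
    words = set(user_input.lower().split())
    hits = [m for m in store if words & set(m.lower().split())]
    return hits[:limit]


def build_prompt(user_input: str, ctx: PetContext) -> str:
    """Combine personality + memory + owner preferences into one prompt."""
    memory_block = "\n".join(f"- {m}" for m in ctx.memories) or "- (none)"
    return (
        f"Personality: {ctx.personality}\n"
        f"Owner preferences: {ctx.owner_preferences}\n"
        f"Relevant memories:\n{memory_block}\n"
        f"User: {user_input}\n"
    )


def call_local_llm(prompt: str) -> str:
    """Stub: a real version would send the prompt to the local model."""
    return "[motion:wave] Hello!"


def parse_reply(raw: str) -> tuple[str, str]:
    """Split an assumed '[motion:NAME] text' reply into (motion, text)."""
    if raw.startswith("[motion:") and "]" in raw:
        tag, _, text = raw.partition("]")
        return tag[len("[motion:"):], text.strip()
    return "idle", raw.strip()


def handle_turn(user_input: str, store: list[str]) -> tuple[str, str]:
    """One full turn: retrieve, build prompt, call the model, parse."""
    ctx = PetContext(
        personality="cheerful desk companion",
        owner_preferences="short replies",
        memories=retrieve_memories(user_input, store),
    )
    raw = call_local_llm(build_prompt(user_input, ctx))
    return parse_reply(raw)  # (motion for Live2D, text for the TTS step)
```

The returned motion name would go to Live2D and the text to the voice server described at the end of this post.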

Local model candidates:
llama3.1-8B, phi-3 mini (3.8B), qwen2.5-7B
In short, lightweight models come first.

For the voice model, after countless failed attempts I ended up using Kokoro

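Below is the record of my failed voice model training attempts

If you would rather not read the failure log, skip it and jump straight to the last section

Installation

Training needs prepared audio files: WAV to IPA
Since I don't have Visual Studio, I need to install openjtalk
If you don't have it either, use: openjtalk 0.3.0.dev3
Download the whl that matches your environment, then install it as usual.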

`python -m pip install .\openjtalk-0.3.0.dev2-cp310-cp310-win_amd64.whl`
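To pick the whl that matches the environment, it helps to compare the Python tag in the file name with the interpreter that is actually running. A small sketch, assuming the standard wheel naming scheme (`name-version-python-abi-platform.whl`); the helper names are my own, and it checks only the Python tag, not the platform tag.

```python
import sys


def interpreter_tag() -> str:
    """Return the CPython tag of the running interpreter, e.g. 'cp310'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def wheel_python_tag(wheel_filename: str) -> str:
    """Extract the python tag, which is the third field from the end."""
    stem = wheel_filename.rsplit(".whl", 1)[0]
    return stem.split("-")[-3]


def wheel_matches_interpreter(wheel_filename: str) -> bool:
    """True when the wheel's python tag equals the running interpreter's tag."""
    return wheel_python_tag(wheel_filename) == interpreter_tag()


name = "openjtalk-0.3.0.dev2-cp310-cp310-win_amd64.whl"
print("interpreter:", interpreter_tag())
print("wheel:", wheel_python_tag(name))
print("match:", wheel_matches_interpreter(name))
```

If the two tags differ, pip will normally refuse the file as not supported on this platform.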

Because Anaconda lacks cmake, you need to install a prebuilt whl
https://cmake.org/
Reference site

Voice model training

Here I tried: so-vits-svc

Environment requirement: python 3.9
Create a new environment:
conda create -n 39 python=3.9

Package installation

pip install -r requirements_win.txt
(for the RTX 4060)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
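After installing the CUDA build of torch, a quick check confirms whether it can actually see the GPU. A minimal sketch; `cuda_report` is a name I made up, and it imports torch lazily so the script still runs in an environment where torch is missing.

```python
import importlib.util


def cuda_report() -> str:
    """Describe whether PyTorch is installed and whether it sees a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # lazy import so the check works without torch too

    if not torch.cuda.is_available():
        return f"torch {torch.__version__} installed, but CUDA is not available"
    return (
        f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"GPU: {torch.cuda.get_device_name(0)}"
    )


print(cuda_report())
```

If CUDA is reported as unavailable, the CPU-only wheel was probably installed instead of the cu121 one.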

Error (1):

(38) C:\Users\miss2\Desktop\so-vits-svc-4.1-Stable\so-vits-svc-4.1-Stable>pip install matplotlib
ERROR: Could not find a version that satisfies the requirement matplotlib (from versions: none)
ERROR: No matching distribution found for matplotlib

Fix

`python -m pip install -U matplotlib`

Reference for the fix

Error (2):

(38) C:\Users\miss2\Desktop\so-vits-svc-4.1-Stable\so-vits-svc-4.1-Stable>pip install faiss
ERROR: Could not find a version that satisfies the requirement faiss (from versions: none)
ERROR: No matching distribution found for faiss

Fix

`pip install faiss-cpu langchain-community`

Reference for the fix

Error (3):

(38) C:\Users\miss2\Desktop\so-vits-svc-4.1-Stable\so-vits-svc-4.1-Stable>pip install yaml
ERROR: Could not find a version that satisfies the requirement yaml (from versions: none)
ERROR: No matching distribution found for yaml

Fix

pip install pyyaml
pip install rich
pip install tensorboard

At this point I modified the scripts to run on a single GPU.

Installing fairseq:

git clone https://github.com/facebookresearch/fairseq.git
cd fairseq

Error message:

(38) C:\Users\miss2\Desktop\fairseq>pip install --editable ./
Obtaining file:///C:/Users/miss2/Desktop/fairseq
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... error
error: subprocess-exited-with-error
× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> [17 lines of output]
Traceback (most recent call last):
File "D:\anaconda\envs\38\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 389, in <module>
main()
File "D:\anaconda\envs\38\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 373, in main
json_out["return_val"] = hook(**hook_input["kwargs"])
File "D:\anaconda\envs\38\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 157, in get_requires_for_build_editable
return hook(config_settings)
File "C:\Users\miss2\AppData\Local\Temp\pip-build-env-hmxy_ij0\overlay\Lib\site-packages\setuptools\build_meta.py", line 473, in get_requires_for_build_editable
return self.get_requires_for_build_wheel(config_settings)
File "C:\Users\miss2\AppData\Local\Temp\pip-build-env-hmxy_ij0\overlay\Lib\site-packages\setuptools\build_meta.py", line 331, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
File "C:\Users\miss2\AppData\Local\Temp\pip-build-env-hmxy_ij0\overlay\Lib\site-packages\setuptools\build_meta.py", line 301, in _get_build_requires
self.run_setup()
File "C:\Users\miss2\AppData\Local\Temp\pip-build-env-hmxy_ij0\overlay\Lib\site-packages\setuptools\build_meta.py", line 317, in run_setup
exec(code, locals())
File "<string>", line 246, in <module>
OSError: [WinError 1314] 用戶端沒有這項特殊權限。: '..\\examples' -> 'fairseq\\examples'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed to build 'file:///C:/Users/miss2/Desktop/fairseq' when getting requirements to build editable

After reopening Anaconda as administrator, the error message changed to:

(38) C:\Users\miss2\Desktop\fairseq>pip install --editable ./ --no-deps
Obtaining file:///C:/Users/miss2/Desktop/fairseq
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: fairseq
Building editable for fairseq (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building editable for fairseq (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [37 lines of output]
C:\Users\miss2\AppData\Local\Temp\pip-build-env-91abdbrk\overlay\Lib\site-packages\setuptools\_distutils\dist.py:289: UserWarning: Unknown distribution option: 'test_suite'
warnings.warn(msg)
C:\Users\miss2\AppData\Local\Temp\pip-build-env-91abdbrk\overlay\Lib\site-packages\setuptools\dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
!!
          ********************************************************************************
          Please consider removing the following classifiers in favor of a SPDX license expression:

          License :: OSI Approved :: MIT License

          See <https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license> for details.
          ********************************************************************************

  !!
    self._finalize_license_expression()
  running editable_wheel
  creating C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-wheel-rgr7_co9\\.tmp-jg5ensi6\\fairseq.egg-info
  writing C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-wheel-rgr7_co9\\.tmp-jg5ensi6\\fairseq.egg-info\\PKG-INFO
  writing dependency_links to C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-wheel-rgr7_co9\\.tmp-jg5ensi6\\fairseq.egg-info\\dependency_links.txt
  writing entry points to C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-wheel-rgr7_co9\\.tmp-jg5ensi6\\fairseq.egg-info\\entry_points.txt
  writing requirements to C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-wheel-rgr7_co9\\.tmp-jg5ensi6\\fairseq.egg-info\\requires.txt
  writing top-level names to C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-wheel-rgr7_co9\\.tmp-jg5ensi6\\fairseq.egg-info\\top_level.txt
  writing manifest file 'C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-wheel-rgr7_co9\\.tmp-jg5ensi6\\fairseq.egg-info\\SOURCES.txt'
  W1115 14:44:57.268483 14608 torch\\utils\\cpp_extension.py:615] Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  reading manifest file 'C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-wheel-rgr7_co9\\.tmp-jg5ensi6\\fairseq.egg-info\\SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  adding license file 'LICENSE'
  writing manifest file 'C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-wheel-rgr7_co9\\.tmp-jg5ensi6\\fairseq.egg-info\\SOURCES.txt'
  creating 'C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-wheel-rgr7_co9\\.tmp-jg5ensi6\\fairseq-0.12.2.dist-info'
  C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-build-env-91abdbrk\\overlay\\Lib\\site-packages\\setuptools\\command\\bdist_wheel.py:103: RuntimeWarning: Config variable 'Py_DEBUG' is unset, Python ABI tag may be incorrect
    if get_flag("Py_DEBUG", hasattr(sys, "gettotalrefcount"), warn=(impl == "cp")):
  creating C:\\Users\\miss2\\AppData\\Local\\Temp\\pip-wheel-rgr7_co9\\.tmp-jg5ensi6\\fairseq-0.12.2.dist-info\\WHEEL
  running build_py
  running build_ext
  W1115 14:44:57.376789 14608 torch\\utils\\cpp_extension.py:466] Error checking compiler version for cl: [WinError 2] 系統找不到指定的檔案。
  building 'fairseq.libbleu' extension
  error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": <https://visualstudio.microsoft.com/visual-cpp-build-tools/>
  [end of output]
 note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building editable for fairseq
Failed to build fairseq
ERROR: Could not build wheels for fairseq, which is required to install pyproject.toml-based projects

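Fix:
Install Microsoft C++ Build Tools
Visual Studio Build Tools

Yes, as you can tell, many of the earlier errors would have gone away had I simply installed Visual Studio. I knew that, but at the time I just kept refusing to install it.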
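Both fairseq failures can be checked for before running pip. The sketch below is my own, not part of fairseq: it tries to create a symlink (the operation that raised WinError 1314) and looks for `cl` and `ninja` on PATH. Note that `cl` is usually on PATH only inside a Developer Command Prompt, so a missing result is a hint rather than proof that the build will fail.

```python
import os
import shutil
import tempfile


def can_create_symlink() -> bool:
    """Try to create a symlink in a temp directory and report whether it worked."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w", encoding="utf-8") as fh:
            fh.write("x")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # On Windows this is where WinError 1314 would surface
            return False
        return True


def report() -> dict:
    """Collect the checks relevant to the two fairseq failures."""
    return {
        "symlink": can_create_symlink(),
        "cl": shutil.which("cl"),        # MSVC compiler, None if not on PATH
        "ninja": shutil.which("ninja"),  # optional faster build backend
    }


print(report())
```

On Windows, symlink creation generally needs either an elevated prompt or Developer Mode turned on, which matches what happened after reopening Anaconda as administrator.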

Pitch adjustment model

Port	Source	Purpose
6006	TensorBoard	Shows training status (loss, training curves)
7860	Gradio WebUI	Opens the model inference interface (voice conversion)
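When one of these pages does not load, the first thing worth checking is whether anything is listening on the port at all. A small sketch using only the standard library; `port_in_use` is a hypothetical helper name, and the host is assumed to be the local machine.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising
        return sock.connect_ex((host, port)) == 0


for port, service in {6006: "TensorBoard", 7860: "Gradio WebUI"}.items():
    state = "listening" if port_in_use(port) else "not listening"
    print(f"{port} ({service}): {state}")
```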

Official RMVPE model
Error encountered:

(38) C:\Users\miss2\Desktop\so-vits-svc-4.1-Stable\so-vits-svc-4.1-Stable>pip install parselmouth
Collecting parselmouth
  Downloading parselmouth-1.1.1.tar.gz (33 kB)
  Preparing metadata (setup.py) ... done
Collecting googleads==3.8.0 (from parselmouth)
  Downloading googleads-3.8.0.tar.gz (23 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      error in googleads setup command: use_2to3 is invalid.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

[notice] A new release of pip is available: 24.0 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip

Fix:

`pip install praat-parselmouth`
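Several of the errors so far share one cause: the name used in `import` is not the name pip knows (faiss, yaml, parselmouth). A short sketch that checks which modules are importable and prints the pip names to install; the mapping is taken from the fixes in this post, and `missing_packages` is my own helper name.

```python
import importlib.util

# import name -> name to pass to `pip install` (from the fixes in this post)
PIP_NAMES = {
    "matplotlib": "matplotlib",
    "faiss": "faiss-cpu",
    "yaml": "pyyaml",
    "parselmouth": "praat-parselmouth",
}


def missing_packages(names: dict) -> list:
    """Return the pip names for every module that cannot be found for import."""
    return [pip for mod, pip in names.items() if importlib.util.find_spec(mod) is None]


todo = missing_packages(PIP_NAMES)
print("pip install " + " ".join(todo) if todo else "all modules importable")
```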

Generate the fileset

`python resample.py`

Generate the config

`python preprocess_flist_config.py --speech_encoder vec768l12`

Preprocess (using pm) ← the official docs describe this as the worst audio quality

python preprocess_hubert_f0.py --f0_predictor pm --num_processes 4

Preprocess (using fcpe) ← the option I finally chose

python preprocess_hubert_f0.py --f0_predictor fcpe --num_processes 4

Preprocess (using RMVPE) ← the officially most recommended one, but I could not get it to run

~~python preprocess_hubert_f0.py --f0_predictor rmvpe --num_processes 4~~
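Commands copied from a web page sometimes carry a typographic en dash where the flag needs two ASCII hyphens, and argparse will not accept that. One way to sidestep it is to build the argument list in Python. This is a sketch with hypothetical helper names; the predictor list contains only the three tried in this post, and the project may accept others.

```python
import sys

# Only the predictors discussed in this post; the project may accept others.
KNOWN_PREDICTORS = ("pm", "fcpe", "rmvpe")


def preprocess_command(f0_predictor: str, num_processes: int = 4) -> list:
    """Build the argv list for preprocess_hubert_f0.py with ASCII '--' flags."""
    if f0_predictor not in KNOWN_PREDICTORS:
        raise ValueError(f"unexpected f0 predictor: {f0_predictor!r}")
    return [
        sys.executable, "preprocess_hubert_f0.py",
        "--f0_predictor", f0_predictor,
        "--num_processes", str(num_processes),
    ]


def has_typographic_dash(command_line: str) -> bool:
    """Detect en/em dashes that web pages substitute for '--'."""
    return any(ch in command_line for ch in "\u2013\u2014")


print(" ".join(preprocess_command("fcpe")))
```

Passing the returned list to `subprocess.run(..., check=True)` from inside the so-vits-svc checkout would run the preprocessing step.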

Start training

Because this machine has only a single GPU, I rewrote the whole train script to run on one device.

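Checking with the web UI

tensorboard --logdir logs/44k --port 6006
http://localhost:6006/#timeseries

This shows how training is going

During training the terminal prints progress in real time, for example:
INFO:44k:====> Epoch: 29
which means epoch 29 has finished.
You can also check the log:
logs/44k/train.log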
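Rather than scrolling through the output, the latest finished epoch can be pulled out with a regular expression. A minimal sketch, assuming the log file contains the same `====> Epoch: N` lines that the terminal shows; `last_epoch` is a hypothetical helper.

```python
import re
from pathlib import Path

EPOCH_RE = re.compile(r"====> Epoch: (\d+)")


def last_epoch(log_text: str):
    """Return the highest epoch number mentioned in the log text, or None."""
    epochs = [int(n) for n in EPOCH_RE.findall(log_text)]
    return max(epochs) if epochs else None


log_path = Path("logs/44k/train.log")
if log_path.exists():
    text = log_path.read_text(encoding="utf-8", errors="replace")
    print("last finished epoch:", last_epoch(text))
else:
    print("no log found at", log_path)
```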

Training throws a pile of warnings and makes it hard to tell how far along it is, so I patched the code to make the output more concise.

Testing the model

Error (1):

(38) C:\Users\miss2\Desktop\so-vits-svc-4.1-Stable\so-vits-svc-4.1-Stable>python webUI.py
Traceback (most recent call last):
  File "C:\Users\miss2\Desktop\so-vits-svc-4.1-Stable\so-vits-svc-4.1-Stable\webUI.py", line 14, in <module>
    import gradio as gr
ModuleNotFoundError: No module named 'gradio'

Fix:

`pip install gradio`

Error (2):

(38) C:\Users\miss2\Desktop\so-vits-svc-4.1-Stable\so-vits-svc-4.1-Stable>python webUI.py
Traceback (most recent call last):
  File "C:\Users\miss2\Desktop\so-vits-svc-4.1-Stable\so-vits-svc-4.1-Stable\webUI.py", line 14, in <module>
    import gradio as gr
  File "D:\anaconda\envs\38\lib\site-packages\gradio\__init__.py", line 3, in <module>
    import gradio._simple_templates
  File "D:\anaconda\envs\38\lib\site-packages\gradio\_simple_templates\__init__.py", line 1, in <module>
    from .simpledropdown import SimpleDropdown
  File "D:\anaconda\envs\38\lib\site-packages\gradio\_simple_templates\simpledropdown.py", line 6, in <module>
    from gradio.components.base import Component, FormComponent
  File "D:\anaconda\envs\38\lib\site-packages\gradio\components\__init__.py", line 1, in <module>
    from gradio.components.annotated_image import AnnotatedImage
  File "D:\anaconda\envs\38\lib\site-packages\gradio\components\annotated_image.py", line 14, in <module>
    from gradio.components.base import Component
  File "D:\anaconda\envs\38\lib\site-packages\gradio\components\base.py", line 20, in <module>
    from gradio.blocks import Block, BlockContext
  File "D:\anaconda\envs\38\lib\site-packages\gradio\blocks.py", line 39, in <module>
    from gradio import (
  File "D:\anaconda\envs\38\lib\site-packages\gradio\networking.py", line 15, in <module>
    from gradio.routes import App  # HACK: to avoid circular import # noqa: F401
  File "D:\anaconda\envs\38\lib\site-packages\gradio\routes.py", line 69, in <module>
    from gradio import ranged_response, route_utils, utils, wasm_utils
  File "D:\anaconda\envs\38\lib\site-packages\gradio\route_utils.py", line 53, in <module>
    from gradio.helpers import EventData
  File "D:\anaconda\envs\38\lib\site-packages\gradio\helpers.py", line 26, in <module>
    from gradio import components, oauth, processing_utils, routes, utils, wasm_utils
  File "D:\anaconda\envs\38\lib\site-packages\gradio\oauth.py", line 13, in <module>
    from huggingface_hub import HfFolder, whoami
ImportError: cannot import name 'HfFolder' from 'huggingface_hub' (D:\anaconda\envs\38\lib\site-packages\huggingface_hub\__init__.py)

Fix:

pip show gradio huggingface-hub

pip install einops

pip install fastapi==0.95.2 uvicorn==0.22.0
// My Gradio version was too new and had to be downgraded
// The so-vits-svc-4.1 WebUI only runs on Gradio 3.x.
pip install gradio==3.41.2 gradio_client==0.2.5

python webUI.py
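Since the WebUI depends on the Gradio major version, it can be worth confirming what is installed before launching. A small sketch using `importlib.metadata` from the standard library; the helper names are my own.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '3.41.2'."""
    return int(version.split(".")[0])


def check_major(package: str, expected_major: int) -> str:
    """Report whether the installed package has the expected major version."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    if major_version(installed) == expected_major:
        verdict = "ok"
    else:
        verdict = f"expected {expected_major}.x"
    return f"{package} {installed}: {verdict}"


print(check_major("gradio", 3))
```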

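Open a browser and go to:
http://127.0.0.1:7860

Load the trained model and set the parameters

F0 conversion method - rmvpe
Pitch shift (变调) - 0
Protect (保护) - 0.5
Slice threshold (切片阈值) - -40

This is where it failed. The model looked like it had finished training, but the generated voice was not what I expected and was full of noise.
I tried many times. I really wish an AI could teach me, haha.

By this point it should practically be overfitting, so why is there still so much noise...?

After that I also tried GPT-SoVITS
It ended in failure as well

In the end I had to fall back on something someone else had already built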

Using a voice server

kokorotts-webui

Create the virtual environment:
python -m venv kokoro-env

Here I added an API so that the backend can call it directly.
For this we use FastAPI + uvicorn

`pip install fastapi uvicorn`

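from fastapi import FastAPI
from fastapi.responses import FileResponse
import uvicorn

# Separate FastAPI app so the backend can call TTS without going through Gradio
api_app = FastAPI()

@api_app.post("/api/tts")
async def tts_api(request: dict):
    # The JSON body arrives as a plain dict; missing keys fall back to defaults
    text = request.get("text", "")
    voice = request.get("voice", "en_daniel")
    speed = float(request.get("speed", 1.0))
    language = request.get("language", "en-us")

    # tts_generate is not defined in this snippet; it comes from the existing WebUI code
    audio_path, status = tts_generate(text, voice, speed, language)
    if audio_path is None:
        return {"error": status}

    # Send the generated WAV file back to the caller
    return FileResponse(audio_path, media_type="audio/wav")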

And modify our main

if __name__ == "__main__":
    # Run BOTH:  Gradio UI (port 7860)  + API (port 8001)
    import threading

    def run_gradio():
        demo.launch(server_name="0.0.0.0", server_port=7860)

    def run_api():
        uvicorn.run(api_app, host="0.0.0.0", port=8001)

    # Start both servers
    threading.Thread(target=run_gradio).start()
    threading.Thread(target=run_api).start()

Pages:

API that the backend can call
http://localhost:8001/docs

The original WebUI
http://localhost:7860/

API request test

curl -X POST "http://localhost:8001/api/tts" ^
-H "Content-Type: application/json" ^
-d "{\"text\":\"helloworld\",\"voice\":\"pm_alex\",\"language\":\"en-us\",\"speed\":1.0}" ^
--output test.wav

Once the test finishes, open test.wav and listen. If you hear the voice, it worked.
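The same request can be sent from Python, which is closer to how the pet's backend would call it. A sketch using only `urllib` from the standard library; the function names are hypothetical, and the URL and JSON fields mirror the curl test. Because the endpoint returns a JSON error body when synthesis fails, the client checks the content type before writing the file.

```python
import json
import urllib.request

API_URL = "http://localhost:8001/api/tts"  # same endpoint as the curl test


def build_tts_request(text, voice="pm_alex", language="en-us", speed=1.0, url=API_URL):
    """Build the POST request carrying the same JSON body as the curl test."""
    body = json.dumps({"text": text, "voice": voice, "language": language, "speed": speed})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="test.wav", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_tts_request(text, **kwargs), timeout=60) as resp:
        payload = resp.read()
        content_type = resp.headers.get_content_type()
    if not content_type.startswith("audio/"):
        # The API answers with {"error": ...} as JSON when generation fails
        raise RuntimeError(payload.decode("utf-8", errors="replace"))
    with open(out_path, "wb") as fh:
        fh.write(payload)
    return out_path
```

With the server running, calling `synthesize("helloworld")` is meant to do the same job as the curl command.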

Run:
.\kokoro-env\Scripts\activate
python -X utf8 app.py

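Side note: at first I could not figure out which voices were available, so I wrote the following script to read the kokoro object inside the WebUI.

`notepad list_voices_from_running.py`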

from app import kokoro, voice_options

print("Voices from app.py loaded kokoro instance:")
for v in voice_options:
    print("-", v)

Running it produces the following list:

python list_voices_from_running.py
WARNING: CUDA/MPS are not available, running on CPU. This will be slow!
Error loading Kokoro: type object 'EspeakWrapper' has no attribute 'set_data_path'
Voices from app.py loaded kokoro instance:
- af_sarah
- en_erin
- en_daniel
- en_vicki
- en_brandon
- ja_akira
- ja_naomi
- de_anna
- fr_elise
- es_carlos
- it_marco
- zh_mei
- ru_ivan
- ko_mina
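To choose a value for the `voice` field of the API, it is handy to group these names by the part before the underscore. That prefix appears to encode the language or accent, which is only my assumption from the names. A small sketch; `group_by_prefix` is a hypothetical helper.

```python
from collections import defaultdict

# Voice names copied from the list above.
VOICES = [
    "af_sarah", "en_erin", "en_daniel", "en_vicki", "en_brandon",
    "ja_akira", "ja_naomi", "de_anna", "fr_elise", "es_carlos",
    "it_marco", "zh_mei", "ru_ivan", "ko_mina",
]


def group_by_prefix(voices):
    """Group voice names by the text before the first underscore."""
    groups = defaultdict(list)
    for name in voices:
        prefix, _, _ = name.partition("_")
        groups[prefix].append(name)
    return dict(groups)


for prefix, names in sorted(group_by_prefix(VOICES).items()):
    print(f"{prefix}: {', '.join(names)}")
```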
