Generated voice is not a valid speech #722

Open
cod3r0k opened this issue Feb 3, 2025 · 0 comments

cod3r0k commented Feb 3, 2025

Hi, I trained my model and then followed the inference steps below, but the generated audio does not contain any intelligible speech. Why?

Step 1: Training

python fish_speech/train.py --config-name text2semantic_finetune     project="run3"     +lora@model.model.lora_config=r_8_alpha_16

Image

Step 2: Inference

Prepare the model for inference:

python tools/llama/merge_lora.py     --lora-config r_8_alpha_16     --base-weight checkpoints/fish-speech-1.5     --lora-weight results/run3/checkpoints/step_000045900.ckpt     --output checkpoints/fish-speech-1.5-yth-lora-2A100

Then, as described in the documentation (https://speech.fish.audio/inference/#1-generate-prompt-from-voice):

python fish_speech/models/vqgan/inference.py     -i "paimon.wav"     --checkpoint-path "checkpoints/fish-speech-1.5-yth-lora-2A100/model.pth"

I cannot hear any valid speech in the result (https://drive.google.com/file/d/1w3MPQ6jL0Mc5qneBF2fgR9G7-aoTiBtP/view?usp=sharing).

The next step in the documentation (https://speech.fish.audio/inference/#2-generate-semantic-tokens-from-text) also fails to produce valid speech for me:

python fish_speech/models/text2semantic/inference.py     --text "The text you want to convert"     --prompt-text "Your reference text"     --prompt-tokens "fake.npy"     --checkpoint-path "checkpoints/fish-speech-1.5-yth-lora-2A100/"     --num-samples 2     --compile

followed by

python fish_speech/models/vqgan/inference.py     -i "codes_0.npy"     --checkpoint-path "checkpoints/fish-speech-1.5-yth-lora-2A100/model.pth"

which returns:

2025-02-03 10:52:33.867 | INFO     | __main__:main:99 - Processing precomputed indices from codes_0.npy
2025-02-03 10:52:34.328 | INFO     | __main__:main:113 - Generated audio of shape torch.Size([1, 1, 112640]), equivalent to 2.55 seconds from 55 features, features/second: 21.53
2025-02-03 10:52:34.332 | INFO     | __main__:main:120 - Saved audio to fake.wav
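
As a sanity check on the log above, the reported duration and feature rate can be reproduced from the tensor shape. This is a minimal sketch that assumes a 44.1 kHz output sample rate; the log does not print the sample rate, so `ASSUMED_SAMPLE_RATE` and the helper `summarize` are illustrative names, not part of fish-speech.

```python
# Values copied from the log line above.
num_samples = 112640   # last dimension of torch.Size([1, 1, 112640])
num_features = 55      # "from 55 features"

# Assumption: 44.1 kHz output. The log does not state the sample rate.
ASSUMED_SAMPLE_RATE = 44100


def summarize(num_samples, num_features, sample_rate):
    """Derive duration and feature-rate figures from the raw counts."""
    duration_s = num_samples / sample_rate
    return {
        "duration_s": duration_s,
        "features_per_second": num_features / duration_s,
        "samples_per_feature": num_samples / num_features,
    }


stats = summarize(num_samples, num_features, ASSUMED_SAMPLE_RATE)
print(
    f"{stats['duration_s']:.2f} s, "
    f"{stats['features_per_second']:.2f} features/s, "
    f"{stats['samples_per_feature']:.0f} samples per feature"
)
```

Under that assumption the figures agree with the log (2.55 seconds, 21.53 features/second). This only shows that the output length is consistent with the number of input codes; it says nothing about whether the codes or the checkpoint that produced the audio are correct.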

The resulting fake.wav is attached at [Google Drive link]. Could you help me understand why it does not contain valid speech?
