Can I also use mp3 files for training instead of wav files? #53

Open
Jochen-sys opened this issue Apr 9, 2021 · 32 comments

@Jochen-sys

Hello, and first of all, nice work!
I wanted to ask whether I could use mp3 files instead of wav files, and, if that works, which lines I would have to change.

@CracKCatZ

Hello @Jochen-sys I think that's not possible.

@CracKCatZ

@Jochen-sys Because the spectrograms are made from the wav files.

@Jochen-sys

@CracKCatZ Thanks for the quick answer.
But I can also create a spectrogram from an mp3 file. Admittedly, I have only tested this with TensorFlow, where I know it works.
If it doesn't work with PyTorch, would it be possible to do that one step with TensorFlow and the rest with PyTorch?
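
One way to sidestep the question of which lines to change is to leave the training code untouched and convert the mp3 files to wav beforehand. The sketch below assumes the `ffmpeg` command-line tool is installed; the helper names, the 8000 Hz default sample rate, and the mono output are illustrative assumptions, not values taken from this repository.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=8000):
    """Return the ffmpeg argument list that converts src to a mono wav file."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", str(src),            # input file (for example an mp3)
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the target rate (assumed value)
        str(dst),
    ]


def convert_folder(mp3_dir, wav_dir, sample_rate=8000):
    """Convert every *.mp3 in mp3_dir to a wav of the same name in wav_dir."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    wav_dir = Path(wav_dir)
    wav_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(Path(mp3_dir).glob("*.mp3")):
        dst = wav_dir / (src.stem + ".wav")
        subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
        converted.append(dst)
    return converted
```

Run on the training machine (for example inside a Colab session), this would let only the smaller mp3 files be uploaded, with the wav files created next to the training script.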

@CracKCatZ

@Jochen-sys Hmm, good question. I don't actually know whether this is possible, but I think yes: you can do one part with TensorFlow and the other with PyTorch. You just need to feed the spectrograms into PyTorch somehow.

@CracKCatZ

@Jochen-sys But why would you want to use mp3 files instead of wav files? Wav files are much easier to handle and format :)

@Jochen-sys

@CracKCatZ First of all, I don't have much storage on my computer.
But the bigger problem is my GPU, so I have to train with Google Colab. Wav files are too big; mp3 files are much smaller. Another idea was to upload only the spectrograms to Google Colab, so the files wouldn't be as big.
Is there a quality difference between wav and mp3? I know that wav files have better quality, but I don't think this is so important for training. Or maybe mp3 is even better, because the model has to transcribe lower-quality audio.
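
The size gap can be estimated with simple arithmetic. This is a rough sketch: the 16 kHz, 16-bit, mono wav settings, the 64 kbps mp3 bitrate, and the 5-second clip length are assumed example values, and file headers are ignored. Note that mp3 is a lossy format, so the smaller size comes from discarding part of the signal; whether that hurts or helps training is not settled in this thread.

```python
def wav_bytes(seconds, sample_rate=16000, bits_per_sample=16, channels=1):
    """Approximate size of uncompressed PCM audio (header ignored)."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels


def mp3_bytes(seconds, bitrate_kbps=64):
    """Approximate size of constant-bitrate mp3 audio."""
    return seconds * bitrate_kbps * 1000 // 8


clip_seconds = 5  # assumed clip length
print(wav_bytes(clip_seconds))  # 160000
print(mp3_bytes(clip_seconds))  # 40000
```

Under these assumptions the wav clip is four times the size of the mp3 clip; a lower wav sample rate or a higher mp3 bitrate narrows the gap.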

@CracKCatZ

@Jochen-sys mp3 files are like any normal files. Wav files (wave files) are constructed differently, and they look quite different too, because every sound is displayed as a wave; mp3s, on the other hand, are not. I don't know how it would change the performance of the model or the training. You can add me on Discord: SheeeshForce#8083
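
For context on the format difference: a typical wav file stores the waveform as a plain sequence of PCM sample values, while an mp3 stores a compressed encoding that has to be decoded back into samples before a spectrogram can be computed. The sketch below uses only the Python standard library to write and read a tiny 16-bit mono wav file; the function names and the 440 Hz test tone are illustrative assumptions.

```python
import math
import struct
import wave


def write_sine_wav(path, num_samples=80, freq=440.0, sample_rate=8000):
    """Write a short 16-bit mono sine tone as an uncompressed wav file."""
    samples = [
        int(32767 * math.sin(2 * math.pi * freq * i / sample_rate))
        for i in range(num_samples)
    ]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % num_samples, *samples))


def read_wav_samples(path):
    """Read a 16-bit mono wav file and return (sample_rate, list of samples)."""
    with wave.open(path, "rb") as wav_file:
        num_frames = wav_file.getnframes()
        raw = wav_file.readframes(num_frames)
        samples = list(struct.unpack("<%dh" % num_frames, raw))
        return wav_file.getframerate(), samples
```

The standard library has no comparable mp3 reader, which is one reason wav tends to be easier to handle; an mp3 needs an external decoder first.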

@Jochen-sys

@CracKCatZ Thanks for explaining. I wanted to test engine.py, but I got the error "ImportError: cannot import name 'imsave'". imsave is from scipy.misc, and according to Stack Overflow it should come from imageio instead. Now I'm confused, because I think it should work with imsave?! Could you help me out here, please?

@CracKCatZ

@Jochen-sys I'm actually not familiar with imsave or imageio.

@Jochen-sys

@CracKCatZ Ok but can you run engine.py without problems?

@CracKCatZ

@Jochen-sys Not at the moment, because I have to switch to Linux to install the ctcdecoder.

@Jochen-sys

@CracKCatZ Ok, I'm sorry, I'm an idiot; I fixed it. My problem was that I thought neuralnet was a regular PyPI package and not a custom module written for this project. Why did you name scripts or folders like existing packages on PyPI :-) (sadly there is no laughing smiley)?
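
A quick way to diagnose this kind of name clash is to ask Python where a module name would be loaded from before importing it. This is a minimal sketch using the standard library; `where_is` is a hypothetical helper name, and `neuralnet` is the module name mentioned above.

```python
import importlib.util


def where_is(module_name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# A path inside site-packages points to an installed package;
# a path inside the project folder points to the project's own module.
print(where_is("neuralnet"))
```

If the printed path lies in site-packages, an unrelated installed package of the same name is shadowing the project's folder.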

@Jochen-sys

@CracKCatZ Where exactly are the spectrograms produced? There is so much code dealing with spectrograms that I can't find the exact place.

@Jochen-sys

@CracKCatZ Which version of ctcdecode do you use? (Mine worked a few days ago, but then it failed.)
What is the ken_lm file for? Is this the file that produced such a good transcription in the video?

@CracKCatZ

@Jochen-sys I haven't been able to test ctcdecode yet.

@Jochen-sys

Jochen-sys commented Apr 17, 2021

@CracKCatZ Ok, got it.
Do I have to use the ckpt file to train from a checkpoint (as the argument for --load_model_from)? And how do I get a zip file or a ckpt file at the end of training? I think I need a zip file for transcription with the microphone, but I would also like a ckpt file for further training in the future.

@NoCodeAvaible

Hey @Jochen-sys, yes, you do :) The model is saved automatically as a ckpt file :) Yes, I think you need a zip file too (by the way, I also need one), because I think without the zip we get no output. Could you please add me on Discord so we can talk there and speed up communication? Name: SheeeshForce1#8083

@Jochen-sys

Jochen-sys commented Apr 20, 2021

Sorry, I don't have Discord.
Ok, thanks. I'm getting the following error when I use the argument --load_model_from speechrecognition.ckpt:
RuntimeError: Error(s) in loading state_dict for SpeechModule:
Unexpected key(s) in state_dict: "model.cnn.0.weight", "model.cnn.0.bias", "model.cnn.1.norm.weight", "model.cnn.1.norm.bias", "model.dense.0.weight", "model.dense.0.bias", "model.dense.1.weight", "model.dense.1.bias", "model.dense.4.weight", "model.dense.4.bias", "model.dense.5.weight", "model.dense.5.bias", "model.lstm.weight_ih_l0", "model.lstm.weight_hh_l0", "model.lstm.bias_ih_l0", "model.lstm.bias_hh_l0", "model.layer_norm2.weight", "model.layer_norm2.bias", "model.final_fc.weight", "model.final_fc.bias".

Does anyone know what this means?

Then I tried the argument --resume_from_checkpoint instead of --load_model_from (sorry, I don't know what this argument does). But this doesn't work either. I get the following error:
checkpoint_callbacks[-1].best_model_path = checkpoint['checkpoint_callback_best_model_path']
KeyError: 'checkpoint_callback_best_model_path'
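
On the "Unexpected key(s) in state_dict" error: PyTorch reports this when a checkpoint contains parameter names that the module being loaded does not define. Every key in the message starts with "model.", which suggests the checkpoint and the current code disagree about how the network is nested. The sketch below works on plain dictionaries to show how such a mismatch can be inspected; `strip_prefix` and `compare_keys` are hypothetical helpers, and whether renaming keys is the right fix for this repository is an assumption.

```python
def strip_prefix(state_dict, prefix="model."):
    """Return a copy of state_dict with prefix removed from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


def compare_keys(expected_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring the wording of the error."""
    expected, found = set(expected_keys), set(checkpoint_keys)
    return sorted(expected - found), sorted(found - expected)


# Toy checkpoint with two of the key names from the error message.
checkpoint = {"model.cnn.0.weight": 0.0, "model.final_fc.bias": 0.0}
expected = ["cnn.0.weight", "final_fc.bias"]

print(compare_keys(expected, checkpoint))                # both lists non-empty
print(compare_keys(expected, strip_prefix(checkpoint)))  # ([], [])
```

Comparing the key sets first shows whether the problem is only a naming prefix or a genuinely different architecture.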

@Jochen-sys

Ok, I fixed the first error. My version of pytorch_lightning was too old.
But what does the --resume_from_checkpoint argument mean?

@CracKCatZ

CracKCatZ commented Apr 21, 2021

@Jochen-sys It means that you pass in a checkpoint file, either as the default value or through the terminal (set required to false if you set it as the default), and training resumes from this checkpoint. You basically use it to resume training if you stopped it, if you want to test the checkpoint (the model that you create in optimize_graph.py), or if your PC shuts down for an unknown reason while training.
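
The two flags discussed here can be declared as optional command-line arguments as in the sketch below. Only the flag names come from this thread; the parser itself and the comments describing each flag are a hedged illustration of the usual distinction in PyTorch Lightning (loading weights versus restoring the full training state), not a copy of this repository's training script.

```python
import argparse


def build_parser():
    """Build a parser with both checkpoint flags optional (required=False)."""
    parser = argparse.ArgumentParser(description="training options (illustrative)")
    # Assumed meaning: start a new run from previously saved weights.
    parser.add_argument("--load_model_from", default=None, required=False, type=str)
    # Assumed meaning: continue an interrupted run, including optimizer state and epoch.
    parser.add_argument("--resume_from_checkpoint", default=None, required=False, type=str)
    return parser


args = build_parser().parse_args(["--resume_from_checkpoint", "speechrecognition.ckpt"])
print(args.resume_from_checkpoint)  # speechrecognition.ckpt
print(args.load_model_from)         # None
```

With both defaults set to None, the script can be started with neither flag, which corresponds to training from scratch.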

@Jochen-sys

@CracKCatZ Do you know why the loss could be "nan"? At the beginning it showed a real float, but now I only see this string. I researched this but didn't find a convincing cause.

@CracKCatZ

@Jochen-sys Yes, CUDA and cuDNN are not installed correctly. You can search YouTube for videos on installing CUDA and cuDNN correctly :)

@Jochen-sys

@CracKCatZ Oh, ok, that's interesting, thanks. I'm using my CPU.
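
Since training here runs on the CPU, it may be worth checking the data as well. Assuming the model is trained with a CTC loss (which the use of ctcdecode suggests, but this thread does not confirm), one known source of infinite or NaN loss is a transcript that needs more output frames than the model produces for that clip. The sketch below checks this condition in plain Python; the function names are hypothetical.

```python
import math


def min_ctc_input_length(target):
    """Smallest number of output frames CTC needs to emit target.

    Each label needs one frame, and each pair of identical adjacent labels
    needs one extra blank frame between them.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def is_ctc_feasible(num_frames, target):
    """True if a model output of num_frames frames can align to target."""
    return num_frames >= min_ctc_input_length(target)


def is_finite_loss(loss_value):
    """True if the loss is a usable number (neither NaN nor infinite)."""
    return math.isfinite(loss_value)


print(min_ctc_input_length("hello"))  # 6
print(is_ctc_feasible(5, "hello"))    # False
```

Logging the samples that fail such a check, or the first batch for which the loss is not finite, narrows down whether specific files cause the problem. PyTorch's `torch.nn.CTCLoss` also has a `zero_infinity` option that zeroes infinite losses instead of propagating them.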

@CracKCatZ

@Jochen-sys are you working with mp3 files now?

@Jochen-sys

On Windows, no, because mp3 files don't work there, but it works on Linux. I train on my Windows system; I only have Linux as a VM.

@Jochen-sys

@CracKCatZ Do you know what this ken_lm is for and where I could get it? Is this the file that improved the transcription in the video so much? If not, which file did?
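
Some background that may help: KenLM is a toolkit for n-gram language models, and a CTC beam-search decoder such as the one in ctcdecode can take the path to a KenLM model and use it to prefer word sequences that are likely under that model. Whether this is what improved the results in the video is not confirmed in this thread. The sketch below builds the two usual KenLM commands, assuming the KenLM binaries `lmplz` and `build_binary` are compiled and on PATH; the function names, the file names, and the 3-gram order are hypothetical choices.

```python
import shutil
import subprocess


def kenlm_commands(corpus_txt, arpa_path, binary_path, order=3):
    """Return the two command lines: estimate an ARPA model, then binarize it."""
    estimate = ["lmplz", "-o", str(order), "--text", str(corpus_txt), "--arpa", str(arpa_path)]
    binarize = ["build_binary", str(arpa_path), str(binary_path)]
    return estimate, binarize


def train_kenlm(corpus_txt, arpa_path, binary_path, order=3):
    """Run both KenLM steps; corpus_txt holds one normalized sentence per line."""
    for command in kenlm_commands(corpus_txt, arpa_path, binary_path, order):
        if shutil.which(command[0]) is None:
            raise RuntimeError(command[0] + " was not found on PATH")
        subprocess.run(command, check=True)
    return binary_path
```

The corpus would typically be a large text file in the same character set as the training transcripts. On very small corpora `lmplz` may refuse to estimate its discounts; its `--discount_fallback` option exists for that case.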

@CracKCatZ

@Jochen-sys Have you already tested the speech recognition?

@Jochen-sys

@CracKCatZ Yes, with the zip model. But it's not very good. I remember that in the video he used something else as well to get good results.

@CracKCatZ

@Jochen-sys Hold up, did you use the PortAudio library? I think this library is required and can give better results. Could you please tell me whether you already had PortAudio installed when you started working with this project, or whether you had to install it?

@Jochen-sys

@CracKCatZ Sorry for the late response. Yes, I think so.
To come back to the loss=nan problem: why isn't the loss nan when I train on the same wav file for a few epochs? Could I try training on only one wav file per run, or would the result be worse?

@Jochen-sys

Ok, I tried some things, and it seems there is a problem with capital letters, but only in the first line. The second line doesn't care about capital letters.
Have you trained a KenLM model? If yes, how? I don't understand what I have to do, I'm sorry!
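
If capital letters in the transcripts are indeed the trigger, one option is to normalize every transcript before training. The sketch below assumes the model's character set is lowercase a to z, apostrophe, and space; this is an assumption about the repository, and `normalize_transcript` is a hypothetical helper.

```python
# Assumed character set: lowercase letters, apostrophe, and space.
ALLOWED_CHARS = set("abcdefghijklmnopqrstuvwxyz' ")


def normalize_transcript(text, allowed=ALLOWED_CHARS):
    """Lowercase text, replace characters outside allowed with spaces, collapse spaces."""
    lowered = text.lower()
    kept = "".join(ch if ch in allowed else " " for ch in lowered)
    return " ".join(kept.split())


print(normalize_transcript("Hello, World!"))  # hello world
```

The same normalization would need to be applied to any text corpus used for a language model, so that both use one character set.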

@Botirjon2009

@CracKCatZ Hi. I am going to set up my mp3 files. I would like to know where my mp3 files should be set. I mean, which part should be linked to the mp3 files: scripts\common_voice_json.. ? Right?


4 participants