Training does not work on Windows #18

Open
Vubni opened this issue Jan 25, 2024 · 4 comments

Vubni commented Jan 25, 2024

I did everything according to the instructions in train.sh: downloaded the archive with the audio and marks.txt, set up the required folder, and NeMo. When I run it, I get the following series of errors:

PS E:\g++\синтез новый> sh train.sh
train.sh: line 1: #!/bin/bash: No such file or directory
train.sh: line 2: conda: command not found
fatal: destination path 'ru_g2p_ipa_bert_large' already exists and is not an empty directory.
Traceback (most recent call last):
  File "NeMo/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py", line 42, in <module>
    from helpers import ITN_MODEL, instantiate_model_and_trainer
  File "E:\g++\синтез новый\NeMo\examples\nlp\text_normalization_as_tagging\helpers.py", line 22, in <module>
    from nemo.collections.nlp.models import ThutmoseTaggerModel
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\__init__.py", line 15, in <module>
    from nemo.collections.nlp import data, losses, models, modules
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\data\__init__.py", line 42, in <module>
    from nemo.collections.nlp.data.zero_shot_intent_recognition.zero_shot_intent_dataset import (
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\data\zero_shot_intent_recognition\__init__.py", line 16, in <module>
    from nemo.collections.nlp.data.zero_shot_intent_recognition.zero_shot_intent_dataset import (
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\data\zero_shot_intent_recognition\zero_shot_intent_dataset.py", line 30, in <module>
    from nemo.collections.nlp.parts.utils_funcs import tensor2list
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\parts\__init__.py", line 17, in <module>
    from nemo.collections.nlp.parts.utils_funcs import list2str, tensor2list
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\parts\utils_funcs.py", line 28, in <module>
    from nemo.collections.nlp.modules.common.megatron.utils import erf_gelu
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\modules\__init__.py", line 16, in <module>
    from nemo.collections.nlp.modules.common import (
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\modules\common\__init__.py", line 36, in <module>
    from nemo.collections.nlp.modules.common.tokenizer_utils import get_tokenizer, get_tokenizer_list
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\modules\common\tokenizer_utils.py", line 29, in <module>
    from nemo.collections.nlp.parts.nlp_overrides import HAVE_MEGATRON_CORE
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\parts\nlp_overrides.py", line 31, in <module>
    from pytorch_lightning.overrides.base import _LightningModuleWrapperBase
ModuleNotFoundError: No module named 'pytorch_lightning.overrides.base'
Traceback (most recent call last):
  File "nemo_compatible/scripts/tts/ru_g2p_ipa/preprocess_text_before_tts.py", line 30, in <module>
    with open(args.g2p_name, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'all_words.g2p.txt'
Traceback (most recent call last):
  File "nemo_compatible/scripts/tts/utils/create_manifest_for_tts.py", line 17, in <module>
    with open(args.preprocessed_text_name, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'marks.g2p.txt'
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\scripts\dataset_processing\tts\nemo_compatible\scripts\tts\ru_ipa_fastpitch_hifigan\ds_conf' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
head: cannot open 'manifest.json' for reading: No such file or directory
TAIL: can't open 460
TAIL: can't open manifest.json
[NeMo W 2024-01-25 22:24:44 transformer_bpe_models:59] Could not import NeMo NLP collection which is required for speech translation model.
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\examples\tts\nemo_compatible\scripts\tts\ru_ipa_fastpitch_hifigan\conf' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[NeMo W 2024-01-25 22:24:56 transformer_bpe_models:59] Could not import NeMo NLP collection which is required for speech translation model.
usage: generate_mels.py [-h] --fastpitch-model-ckpt FASTPITCH_MODEL_CKPT --input-json-manifests INPUT_JSON_MANIFESTS
                        [INPUT_JSON_MANIFESTS ...] --output-json-manifest-root OUTPUT_JSON_MANIFEST_ROOT
                        [--num-workers NUM_WORKERS] [--cpu]
generate_mels.py: error: argument --fastpitch-model-ckpt: expected one argument
[NeMo W 2024-01-25 22:25:06 transformer_bpe_models:59] Could not import NeMo NLP collection which is required for speech translation model.
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\examples\tts\NeMo\examples\tts\conf\hifigan' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
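
Note: aside from the NeMo import failure, the first two lines of this log point at the shell setup itself. sh failing on the shebang with "No such file or directory" usually means train.sh starts with a stray byte such as a UTF-8 BOM, and "conda: command not found" means conda is not on PATH in that shell. A possible cleanup sketch, assuming Git Bash with GNU tools available and a standard Miniconda layout (both are assumptions, not taken from this thread):

dos2unix train.sh                            # drop CRLF line endings, if present
sed -i '1s/^\xEF\xBB\xBF//' train.sh         # drop a UTF-8 BOM, if present
source ~/miniconda3/etc/profile.d/conda.sh   # hypothetical install path; adjust to yours
bash train.sh                                # run with bash, as the shebang intends
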
bene-ges (Owner) commented
Hi @Vubni, I never tried it on Windows, but concerning the reported error, it may be a version mismatch between nemo and pytorch_lightning. See the requirements in nemo, but check them against your particular nemo version.
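
If it is the Lightning mismatch: the missing module pytorch_lightning.overrides.base was removed in Lightning 2.x, so a NeMo build from this period generally expects a 1.x Lightning. A sketch of checking and pinning the version — the exact bound below is an assumption; verify it against your NeMo release's requirements files:

pip show nemo_toolkit pytorch-lightning   # see which versions are installed
pip install "pytorch-lightning<2.0"       # example pin only; confirm against NeMo's requirements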

Vubni commented Jan 26, 2024

Thanks @bene-ges! That really helped me get rid of that error, but then I ran into others that I don't understand how to solve. I checked all the libraries and looked for a solution, but found nothing:

PS E:\g++\синтез новый> sh train.sh
train.sh: line 1: #!/bin/bash: No such file or directory
train.sh: line 2: conda: command not found
fatal: destination path 'ru_g2p_ipa_bert_large' already exists and is not an empty directory.
[NeMo W 2024-01-26 16:44:36 nemo_logging:349] C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\hydra\_internal\hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
    See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
      ret = run_job(

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
[NeMo I 2024-01-26 16:44:36 helpers:60] Restoring pretrained itn model from ru_g2p_ipa_bert_large/ru_g2p.nemo
[NeMo I 2024-01-26 16:44:37 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: DeepPavlov/rubert-base-cased, vocab_file: C:\Users\egora\AppData\Local\Temp\tmp0u87qzzr\c09b2638681e4862bdffa78433689e48_vocab.txt, merges_files: None, special_tokens_dict: {}, and use_fast: False
[NeMo W 2024-01-26 16:44:38 modelPT:251] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
[NeMo W 2024-01-26 16:44:38 nlp_overrides:454] Apex was not found. Please see the NeMo README for installation instructions: https://github.com/NVIDIA/apex
    Megatron-based models require Apex to function correctly.
[NeMo W 2024-01-26 16:44:38 nlp_overrides:462] megatron-core was not found. Please see the NeMo README for installation instructions: https://github.com/NVIDIA/NeMo#megatron-gpt.
[NeMo W 2024-01-26 16:44:38 lm_utils:91] DeepPavlov/rubert-base-cased is not in get_pretrained_lm_models_list(include_external=False), will be using AutoModel from HuggingFace.
Some weights of the model checkpoint at DeepPavlov/rubert-base-cased were not used when initializing BertModel: ['cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[NeMo W 2024-01-26 16:44:42 modelPT:251] You tried to register an artifact under config key=language_model.config_file but an artifact for it has already been registered.
[NeMo I 2024-01-26 16:44:42 save_restore_connector:249] Model ThutmoseTaggerModel was successfully restored from E:\g++\синтез новый\ru_g2p_ipa_bert_large\ru_g2p.nemo.
[NeMo I 2024-01-26 16:44:42 helpers:81] Model itn -- Device cuda:0
[NeMo I 2024-01-26 16:44:42 normalization_as_tagging_infer:59] Running inference on all_words.txt...
Error executing job with overrides: ['pretrained_model=ru_g2p_ipa_bert_large/ru_g2p.nemo', 'inference.from_file=all_words.txt', 'inference.out_file=all_words.g2p.txt', 'model.max_sequence_len=64', 'inference.batch_size=512', 'lang=ru']
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
  File "NeMo/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py", line 73, in main
    outputs = model._infer(batch)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\nemo\collections\nlp\models\text_normalization_as_tagging\thutmose_tagger.py", line 310, in _infer
    batch = next(iter(infer_datalayer))
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 438, in __iter__
    return self._get_iterator()
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 386, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 1039, in __init__
    w.start()
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\egora\AppData\Local\Programs\Python\Python38\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
NotImplementedError: object proxy must define __reduce_ex__()

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Traceback (most recent call last):
  File "nemo_compatible/scripts/tts/ru_g2p_ipa/preprocess_text_before_tts.py", line 30, in <module>
    with open(args.g2p_name, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'all_words.g2p.txt'
Traceback (most recent call last):
  File "nemo_compatible/scripts/tts/utils/create_manifest_for_tts.py", line 17, in <module>
    with open(args.preprocessed_text_name, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'marks.g2p.txt'
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\scripts\dataset_processing\tts\nemo_compatible\scripts\tts\ru_ipa_fastpitch_hifigan\ds_conf' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
head: cannot open 'manifest.json' for reading: No such file or directory
TAIL: can't open 460
TAIL: can't open manifest.json
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\examples\tts\nemo_compatible\scripts\tts\ru_ipa_fastpitch_hifigan\conf' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
usage: generate_mels.py [-h] --fastpitch-model-ckpt FASTPITCH_MODEL_CKPT --input-json-manifests INPUT_JSON_MANIFESTS
                        [INPUT_JSON_MANIFESTS ...] --output-json-manifest-root OUTPUT_JSON_MANIFEST_ROOT
                        [--num-workers NUM_WORKERS] [--cpu]
generate_mels.py: error: argument --fastpitch-model-ckpt: expected one argument
Primary config directory not found.
Check that the config directory 'E:\g++\синтез новый\NeMo\examples\tts\NeMo\examples\tts\conf\hifigan' exists and readable

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace

As I understand it, initialization now completes, but training itself still fails with errors.
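
One way to dig further: the first hard failure in this log is the g2p inference step, and it can be rerun on its own with a full Hydra trace. The command below simply replays the overrides printed in the "Error executing job with overrides" line above, together with the HYDRA_FULL_ERROR variable the log itself suggests:

HYDRA_FULL_ERROR=1 python NeMo/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py \
    pretrained_model=ru_g2p_ipa_bert_large/ru_g2p.nemo \
    inference.from_file=all_words.txt \
    inference.out_file=all_words.g2p.txt \
    model.max_sequence_len=64 \
    inference.batch_size=512 \
    lang=ru

The later FileNotFoundError messages appear to be downstream of this step failing, since all_words.g2p.txt (and therefore marks.g2p.txt) is never produced.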

bene-ges (Owner) commented
@Vubni this is some error with multiprocessing; I don't know how to solve it. Look at this discussion in NeMo. Maybe try WSL on Windows?
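
A rough sketch of the WSL route, assuming WSL 2 with an Ubuntu distro; the environment name is illustrative and conda must be installed inside WSL first:

wsl --install -d Ubuntu        # one-time setup from an admin shell, then restart
wsl                            # enter the Linux environment
cd "/mnt/e/g++/синтез новый"   # Windows drives appear under /mnt
conda activate tts_env         # hypothetical env name; create it inside WSL first
bash train.sh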

bene-ges (Owner) commented
Also see this (it suggests a patch for a similar error):
NVIDIA/NeMo#5492
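
For context on why a patch is needed: the traceback shows PyTorch's DataLoader starting worker processes, and on Windows workers are spawned, so the dataset object must be picklable; the wrapped object here is not ("object proxy must define __reduce_ex__()"). A minimal illustration of the constraint — not code from this repo — showing that with num_workers=0 batches are loaded in the main process and nothing needs to be pickled:

python - <<'EOF'
import torch
from torch.utils.data import DataLoader, TensorDataset

# num_workers > 0 on Windows spawns subprocesses that must pickle the
# dataset; num_workers=0 loads batches in the main process instead.
ds = TensorDataset(torch.arange(8))
loader = DataLoader(ds, batch_size=4, num_workers=0)
for (batch,) in loader:
    print(batch.tolist())
EOF

If the configs in this repo expose a num_workers option for that dataloader, setting it to 0 on Windows may sidestep the crash without patching NeMo, though I have not verified that such an option exists here.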
