GitHub - fokhruli/CM-seti-anlysis: Implementation for the paper titled "Data-Augmentation for Bangla-English Code-Mixed Sentiment Analysis: Enhancing Cross Linguistic Contextual Understanding", IEEE Access, 2023

Data-Augmentation for Bangla-English Code-Mixed Sentiment Analysis: Enhancing Cross-Linguistic Contextual Understanding

This code is the official implementation of the following paper:

Mohammad Tareq, Md Fokhrul Islam, Swakshar Deb, Sejuti Rahman, Abdullah Al Mahmud, "Data-augmentation for Bangla-English Code-Mixed Sentiment Analysis: Enhancing Cross Linguistic Contextual Understanding," in IEEE Access, 2023.

Figure 1: Distinct languages are represented by different colors (blue: English, brown: Bangla, green: transliterated Bangla) in a shared semantic space for CM sentiment classification. (Left) Previous studies have used existing monolingual word embeddings for CM sentiment analysis, so words from different languages cannot be related. (Right) When the proposed data augmentation is paired with existing word embeddings, cross-lingual understanding is developed, which improves CM sentiment classification performance.

Figure 2: (a) Proposed data augmentation process with multiple sampling rates. For simplicity, only sampling rates 1 and 2 are shown in the diagram. (b) Illustration of the word embedding training process. The input data is augmented with several sampling rates. (c) Training the classifier using the learned word embedding.
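The captions above describe the augmentation only at a high level, so the following is a minimal sketch of dictionary-based word substitution, not the implementation from the paper. It assumes that "sampling rate" means a stride: at rate `r`, every `r`-th token is swapped for its dictionary counterpart, and one variant is produced per starting offset. The function name `augment`, the toy dictionary entries, and the example sentence are all hypothetical.

```python
from typing import Dict, List


def augment(tokens: List[str], lookup: Dict[str, str], rate: int) -> List[List[str]]:
    """Return `rate` variants of `tokens`, one per starting offset.

    Hypothetical reading of "sampling rate": the variant for `offset` swaps
    every `rate`-th token, starting at position `offset`, for its dictionary
    counterpart. Tokens missing from `lookup` are left unchanged.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    variants = []
    for offset in range(rate):
        variant = [
            lookup.get(tok.lower(), tok) if i % rate == offset else tok
            for i, tok in enumerate(tokens)
        ]
        variants.append(variant)
    return variants


# Toy dictionary: transliterated Bangla -> English (illustrative entries only).
toy_lookup = {"khub": "very", "bhalo": "good"}
sentence = "app ta khub bhalo".split()

for r in (1, 2):
    for v in augment(sentence, toy_lookup, r):
        print(r, " ".join(v))
```

Under this assumption, rate 1 replaces every known word, while rate 2 yields two partially substituted variants in which Bangla and English words share the same context window during embedding training.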

Dataset descriptions

Dictionary_BN_EN_61208.xlsx: A large dictionary of words collected from different sources. It pairs code-mixed Bangla words with their English counterparts.
final_code_mixed_BN_app_review_dataset_csv.xlsx: The large collected dataset for code-mixed sentiment analysis.
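As a rough illustration of how the dictionary file might be turned into a lookup table, here is a minimal sketch. The column layout of the spreadsheet is not documented here, so the code assumes the first two columns hold the code-mixed Bangla word and its English counterpart; `load_pairs` and `build_lookup` are hypothetical helper names, and reading `.xlsx` with pandas typically also requires `openpyxl`.

```python
from typing import Dict, Iterable, Tuple


def load_pairs(path: str) -> Iterable[Tuple[str, str]]:
    """Read (code-mixed Bangla word, English word) pairs from an .xlsx file.

    Assumption: the first two columns hold the word pair and the first row is
    a header. Adjust the column selection to the real layout of the file.
    """
    import pandas as pd  # imported lazily so the rest of the sketch runs without it

    frame = pd.read_excel(path)
    return zip(frame.iloc[:, 0].astype(str), frame.iloc[:, 1].astype(str))


def build_lookup(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each normalised source word to its first-seen English counterpart."""
    lookup: Dict[str, str] = {}
    for source, target in pairs:
        key = source.strip().lower()
        if key and key not in lookup:
            lookup[key] = target.strip()
    return lookup


# With the repository file this would look like:
#   lookup = build_lookup(load_pairs("Dictionary_BN_EN_61208.xlsx"))
# The in-memory pairs below are illustrative and not taken from the dictionary.
demo = build_lookup([(" Khub", "very"), ("bhalo", "good"), ("khub", "much")])
print(demo)
```

Keeping the first-seen translation is one simple way to handle duplicate entries; a word with several valid counterparts could instead map to a list.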

Running instructions

To run the baseline model with our proposed data augmentation strategy, first train a word embedding model such as FastText or Word2Vec (W2V). For example, to train the FastText word embedding with our proposed data augmentation, run this command:

python align/fastext_align.py

This will save the word embedding model as "saved_model_fastex_mask_banglish_supervised". Second, load this model in the Conv1DLSTM.py and ml_method.py files to obtain the results for Conv1DLSTM and the other ML algorithms:

python align/Conv1DLSTM.py
python align/ml_method.py
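The two steps above (train and save an embedding, then load it in a downstream script) can be sketched as follows. This is not the code in `align/`: it assumes gensim 4.x is the embedding library, the hyperparameters are placeholders, and `tokenize`, `train_and_save`, and `sentence_vector` are hypothetical helpers. Averaging word vectors into a sentence feature is one common way to feed embeddings to classical ML models and is shown only as an example.

```python
from typing import List, Sequence


def tokenize(lines: Sequence[str]) -> List[List[str]]:
    """Lowercase and whitespace-split each review, dropping blank lines."""
    return [line.lower().split() for line in lines if line.strip()]


def train_and_save(corpus: List[List[str]], path: str) -> None:
    """Train a small FastText model on tokenised sentences and save it.

    Hyperparameters are illustrative, not the values used in the paper.
    """
    from gensim.models import FastText  # lazy import: gensim is assumed installed

    model = FastText(sentences=corpus, vector_size=100, window=5, min_count=1, epochs=10)
    model.save(path)


def sentence_vector(path: str, tokens: List[str]):
    """Load a saved model and average the word vectors of one sentence."""
    import numpy as np
    from gensim.models import FastText

    model = FastText.load(path)
    # FastText builds vectors from character n-grams, so unseen
    # transliterated spellings usually still receive a vector.
    return np.mean([model.wv[tok] for tok in tokens], axis=0)


# Illustrative reviews, not taken from the dataset.
corpus = tokenize(["App ta khub bhalo", "", "very good app"])
print(corpus)

# With gensim installed, the full round trip would be:
#   train_and_save(corpus, "saved_model_fastex_mask_banglish_supervised")
#   features = sentence_vector("saved_model_fastex_mask_banglish_supervised", corpus[0])
```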

Similarly, to run the baseline model without the data augmentation strategy, use these shell commands:

python non-align/fastext_mono.py
python non-align/Conv1DLSTM.py
python non-align/ml_method.py

Citation

If you use this code and the dataset for your research, please consider citing our paper:

@article{tareq2023data,
  author={Tareq, Mohammad and Islam, Md. Fokhrul and Deb, Swakshar and Rahman, Sejuti and Mahmud, Abdullah Al},
  journal={IEEE Access}, 
  title={Data-Augmentation for Bangla-English Code-Mixed Sentiment Analysis: Enhancing Cross Linguistic Contextual Understanding}, 
  year={2023},
  volume={11},
  number={},
  pages={51657-51671},
  doi={10.1109/ACCESS.2023.3277787}
}

Contact

For any questions, feel free to contact:

Swakshar Deb     : [email protected]
Md Fokhrul Islam : [email protected]

