
AmaderCAT: A Data Collection System for Machine Translation Systems

Objective & Description

AmaderCAT is short for Amader Computer Assisted Translation. It was developed to build parallel corpora for machine translation systems. The application includes a Translation Memory (TM) and a glossary, which help translators by suggesting previously translated segments and glossary terms. It is collaborative, highly configurable for translation tasks, and supports crowd translation, so it can be used by a single user or by a group/team. In the future, we plan to add a machine translation system based on neural network technologies to the application.

The system supports any language; however, we have only evaluated it for developing a Bangla-English parallel corpus.

The architecture and user guidelines are described in our paper and on our thesis site.

For a better experience, please visit the demo site: AmaderCAT: A Machine Translation Tool for Bangla.

Setup

To configure this application, the following steps must be performed:

Prerequisites

We used the CodeIgniter (v3.1.7) framework to develop this application. To extend this corpus-building application, please also see the CodeIgniter documentation.

Download

Download this repository, or clone it with the following command:

git clone https://github.com/AridHasan/Data-Collection-System-for-Machine-Translation.git

Database Configuration

Create a database named amader, or change the database configuration in application/config/database.php and in the User class in application/models/Auth.php. The table structures are in the database.sql file in the root directory; run each table definition at your MySQL command prompt.

Run the following command to create the database:

CREATE DATABASE IF NOT EXISTS `amader`;
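In CodeIgniter 3, the connection settings live in the $db['default'] array of application/config/database.php. The excerpt below is a sketch assuming a local MySQL server on port 3306; the hostname, username and password are placeholders rather than this project's actual values, and the separate settings in application/models/Auth.php are not covered here.

```php
<?php
// Excerpt of application/config/database.php (CodeIgniter 3).
// Hostname, username and password are placeholders: adjust them to your MySQL setup.
$active_group = 'default';
$query_builder = TRUE;

$db['default'] = array(
    'dsn'      => '',
    'hostname' => 'localhost',
    'port'     => 3306,          // MySQL port expected by this README
    'username' => 'root',        // placeholder user
    'password' => '',            // placeholder password
    'database' => 'amader',      // must match the database created above
    'dbdriver' => 'mysqli',      // MySQL driver shipped with CodeIgniter 3
    'dbprefix' => '',
    'pconnect' => FALSE,
    'db_debug' => TRUE,          // show database errors while setting up
    'char_set' => 'utf8',        // UTF-8 so Bangla text is stored correctly
    'dbcollat' => 'utf8_general_ci',
);
```

Keys that are left out fall back to CodeIgniter's defaults, so only the connection details usually need editing.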

Configuring the E-mail Sending Option

E-mail must be configured before users can register. To configure e-mail sending, modify the send_mail function in the application/models/Auth.php file. The configuration should look like this:

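$config = Array(
    'protocol'  => 'smtp',
    'smtp_host' => 'ssl://smtp.googlemail.com', // Gmail SMTP server over SSL
    'smtp_port' => 465,                         // SSL port for the host above
    'smtp_user' => 'your_email@gmail.com',      // placeholder: sender address
    'smtp_pass' => 'your_password',             // placeholder: sender password
    'mailtype'  => 'html',
    'charset'   => 'iso-8859-1'
);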

Run this Application

Start the Apache or NGINX server and MySQL (on port 3306) on your machine. Then open the following URL in your browser:

http://localhost/your_application_directory/
or
http://localhost:configured_port_for_server/your_application_directory/

For example:
http://localhost:8080/Data-Collection-System-for-Machine-Translation/

Administration and user guidelines are provided in our paper, or you can watch the video tutorial.

Our Developed Corpus

Our parallel Bangla-English corpus, developed with the AmaderCAT application, is in the data folder of the root directory. The Bengali and English sides are in bengali.txt and english.txt, respectively.
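A parallel corpus stored as two files is usually read line by line, pairing line i of one file with line i of the other. The sketch below assumes bengali.txt and english.txt are aligned that way (an assumption, not something stated above); load_parallel_corpus is a hypothetical helper, not part of the AmaderCAT code base.

```php
<?php

/**
 * Hypothetical helper: load a line-aligned parallel corpus, where line i of
 * $sourcePath is assumed to translate line i of $targetPath.
 *
 * @return array<int, array{0: string, 1: string}> list of [source, target] pairs
 */
function load_parallel_corpus(string $sourcePath, string $targetPath): array
{
    // file() returns one array element per line; FILE_IGNORE_NEW_LINES drops the newline.
    $source = file($sourcePath, FILE_IGNORE_NEW_LINES);
    $target = file($targetPath, FILE_IGNORE_NEW_LINES);
    if ($source === false || $target === false) {
        throw new RuntimeException('Could not read one of the corpus files.');
    }

    // Different line counts mean the two sides are not aligned.
    if (count($source) !== count($target)) {
        throw new RuntimeException(sprintf(
            'Line count mismatch: %d source lines vs %d target lines.',
            count($source),
            count($target)
        ));
    }

    // array_map with a null callback zips the two arrays into [source, target] pairs.
    return array_map(null, $source, $target);
}
```

For example, calling load_parallel_corpus('data/bengali.txt', 'data/english.txt') from the repository root would return the sentence pairs, or throw if the two files have different numbers of lines.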

Citation

If you find this application useful or use it to develop your parallel corpus, please cite our paper, A Collaborative Platform to Collect Data for Developing Machine Translation Systems:

Hasan M.A., Alam F., Noori S.R.H. (2020) A Collaborative Platform to Collect Data for Developing Machine Translation Systems. In: Uddin M., Bansal J. (eds) Proceedings of International Joint Conference on Computational Intelligence. Algorithms for Intelligent Systems. Springer

@inproceedings{hasan2020collaborative,
  title={A Collaborative Platform to Collect Data for Developing Machine Translation Systems},
  author={Hasan, Md Arid and Alam, Firoj and Noori, Sheak Rashed Haider},
  booktitle={Proceedings of International Joint Conference on Computational Intelligence},
  pages={407--416},
  year={2020},
  organization={Springer}
}
