Skip to content

Latest commit

 

History

History
70 lines (44 loc) · 3.45 KB

README_setup_local_mediawiki_and_install_dump.md

File metadata and controls

70 lines (44 loc) · 3.45 KB

Restoring a Wikipedia dump to a local MediaWiki instance involves several steps, including downloading the dump, setting up a MediaWiki environment, and importing the data. Here's a general guide to help you through the process:

1. Download Wikipedia Dump

Wikipedia provides free dumps of its content, which can be found at the Wikimedia Downloads page (https://dumps.wikimedia.org). Choose an appropriate dump file for your needs - a dump of the Wikipedia in Simple English (simplewiki) is a good starting point because it's not too large and contains relatively simple markup. For a full database dump, look for files ending with .xml.bz2.

2. Set Up MediaWiki

If you haven't already, you need to set up a local instance of MediaWiki:

  1. Install MediaWiki: Follow the official installation guide at https://www.mediawiki.org/wiki/Installation.
  2. Configure MediaWiki: Ensure that your local MediaWiki installation is properly configured, especially the database settings.

3. Prepare Your Database

The database should be ready to handle the large volume of data from the Wikipedia dump:

  • Increase the size limit for imports if necessary (in PHP configuration).
  • Optimize MySQL/MariaDB settings: Adjust settings like max_allowed_packet and innodb_buffer_pool_size for better performance during import.

4. Unzip the Dump File

Since the dump files are usually compressed, you need to decompress them:

bzip2 -dk filename.xml.bz2

This will decompress filename.xml.bz2 to filename.xml.

5. Import the Dump into MediaWiki

Use the importDump.php maintenance script provided by MediaWiki to import the XML dump:

php maintenance/importDump.php --dbpass wikidb_userpassword --quiet --wiki wikidb path-to-dumpfile/dumpfile.xml
php maintenance/rebuildrecentchanges.php

This process can be very time-consuming, especially for large dumps.

6. Import the images (Optional)

Afterwards use ImportImages.php to import the images:

php wikifolder/maintenance/importImages.php wikifolder_backup/images

7. Update Search Index

If your MediaWiki installation uses a search feature, update the search index after import:

php maintenance/rebuildtextindex.php

Tips and Considerations

  • Hardware Requirements: Importing a full Wikipedia dump requires a powerful machine with plenty of RAM and storage.
  • Partial Import: Consider importing a smaller subset of Wikipedia if you don't need the entire database.
  • Regular Updates: Wikipedia dumps are snapshots. If you want to keep your local copy up to date, you will need to regularly download and import new dumps.
  • Performance Tuning: Depending on your server's specifications, you might need to tweak your MediaWiki and database configurations for optimal performance.

Troubleshooting

  • Memory Limits: If you encounter PHP memory limit errors, increase the memory_limit in your php.ini file.
  • Execution Time: Adjust the max_execution_time in your php.ini if the script times out.
  • Database Issues: Ensure your database server is properly configured and has enough resources.

This is a complex process, and the exact steps can vary based on your server environment and the specific dump you are importing. Always refer to the official MediaWiki documentation for the most detailed and up-to-date information.