Restoring a Wikipedia dump to a local MediaWiki instance involves several steps, including downloading the dump, setting up a MediaWiki environment, and importing the data. Here's a general guide to help you through the process:
Wikipedia provides free dumps of its content on the Wikimedia Downloads page (https://dumps.wikimedia.org). Choose a dump file that fits your needs: the Simple English Wikipedia (simplewiki) is a good starting point because it is not too large and contains relatively simple markup. For a full database dump, look for files ending with .xml.bz2.
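As a rough sketch of the download step, the helper below builds the URL of a dump file and fetches it with curl. The function names dump_url and download_dump are hypothetical, and the URL layout (wiki name, then date, then a file named after both with a pages-articles suffix) is an assumption about how dumps.wikimedia.org is organized; check the directory listing for your wiki before relying on it.

```shell
# Build the URL of a pages-articles dump (assumed layout, see the note above).
# Usage: dump_url <wiki> [date]    e.g. dump_url simplewiki latest
dump_url() {
  local wiki="$1" date="${2:-latest}"
  printf 'https://dumps.wikimedia.org/%s/%s/%s-%s-pages-articles.xml.bz2\n' \
    "$wiki" "$date" "$wiki" "$date"
}

# Download the dump into the current directory, resuming a partial file if present.
download_dump() {
  local url
  url=$(dump_url "$@")
  curl --fail --location --continue-at - --remote-name "$url"
}
```

For example, download_dump simplewiki would fetch the most recent Simple English dump under these assumptions.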
If you haven't already, you need to set up a local instance of MediaWiki:
- Install MediaWiki: Follow the official installation guide at https://www.mediawiki.org/wiki/Installation.
- Configure MediaWiki: Ensure that your local MediaWiki installation is properly configured, especially the database settings.
The database should be ready to handle the large volume of data from the Wikipedia dump:
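- Increase the size limit for imports if necessary: in the PHP configuration, upload_max_filesize and post_max_size cap uploads through the web interface. The command-line import script is not subject to them.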
- Optimize MySQL/MariaDB settings: Adjust settings like max_allowed_packet and innodb_buffer_pool_size for better performance during import.
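The database settings mentioned above could be collected in a small configuration fragment. The sketch below writes one to a path of your choice; the function name write_import_cnf is hypothetical, and the values 64M and 1G are illustrative placeholders rather than recommendations, so size them to the memory your machine actually has.

```shell
# Write an illustrative MySQL/MariaDB tuning fragment to the given path.
# The values are placeholders (assumptions), not tuned recommendations.
write_import_cnf() {
  cat > "$1" <<'EOF'
[mysqld]
# Largest single packet the server accepts; long page texts need headroom.
max_allowed_packet = 64M
# Memory InnoDB uses to cache table data and indexes.
innodb_buffer_pool_size = 1G
EOF
}
```

After running, for example, write_import_cnf ./wiki-import.cnf, the file would need to be placed in your server's configuration include directory (its location varies by distribution) and the database server restarted.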
Since the dump files are usually compressed, you need to decompress them:
bzip2 -dk filename.xml.bz2
This decompresses filename.xml.bz2 to filename.xml. The -k option keeps the original compressed file.
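Because a large download can be truncated or corrupted, it may be worth testing the archive before decompressing it. The sketch below does that with bzip2 -t and skips the work if the XML file already exists; the function names decompressed_name and decompress_dump are hypothetical.

```shell
# Name of the decompressed file: strips a trailing .bz2 if present.
decompressed_name() {
  printf '%s\n' "${1%.bz2}"
}

# Test the archive's integrity, then decompress it while keeping the original.
# Does nothing if the decompressed file is already there.
decompress_dump() {
  local archive="$1" xml
  xml=$(decompressed_name "$archive")
  if [ -e "$xml" ]; then
    echo "already decompressed: $xml"
    return 0
  fi
  bzip2 -t "$archive" && bzip2 -dk "$archive"
}
```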
Use the importDump.php maintenance script provided by MediaWiki to import the XML dump, then run rebuildrecentchanges.php to rebuild the recent changes list:
php maintenance/importDump.php --dbpass wikidb_userpassword --quiet --wiki wikidb path-to-dumpfile/dumpfile.xml
php maintenance/rebuildrecentchanges.php
This process can be very time-consuming, especially for large dumps.
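Since the import runs for a long time, it can help to wrap the two commands so that the second only runs if the first succeeds, and to preview them first. The following is a minimal sketch under that assumption; the function name import_dump and the DRY_RUN variable are hypothetical, and options such as --dbpass or --wiki from the command above can be added as your setup requires.

```shell
# Run the import and the follow-up rebuild for the wiki installed in <wiki_dir>.
# With DRY_RUN set to a non-empty value, the commands are printed, not executed.
import_dump() {
  local wiki_dir="$1" dump="$2" run=
  [ -n "${DRY_RUN:-}" ] && run=echo
  # The rebuild only starts if the import exits successfully.
  $run php "$wiki_dir/maintenance/importDump.php" --quiet "$dump" &&
    $run php "$wiki_dir/maintenance/rebuildrecentchanges.php"
}
```

Calling it as DRY_RUN=1 import_dump /var/www/wiki dump.xml shows the two commands without touching the database; the paths here are placeholders.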
Afterwards, use importImages.php to import the images:
php wikifolder/maintenance/importImages.php wikifolder_backup/images
If your wiki uses MediaWiki's built-in database search, rebuild the search index after the import:
php maintenance/rebuildtextindex.php
Additional considerations:
- Hardware Requirements: Importing a full Wikipedia dump requires a powerful machine with plenty of RAM and storage.
- Partial Import: Consider importing a smaller subset of Wikipedia if you don't need the entire database.
- Regular Updates: Wikipedia dumps are snapshots. If you want to keep your local copy up to date, you will need to regularly download and import new dumps.
- Performance Tuning: Depending on your server's specifications, you might need to tweak your MediaWiki and database configurations for optimal performance.
Troubleshooting:
- Memory Limits: If you encounter PHP memory limit errors, increase memory_limit in your php.ini file.
- Execution Time: Adjust max_execution_time in your php.ini if the script times out. The PHP command line has no execution time limit by default, so this mainly matters for web-based imports.
- Database Issues: Ensure your database server is properly configured and has enough resources.
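Before editing php.ini, it may be useful to see which limits the command-line PHP binary is actually using, and to raise the memory limit for a single run with PHP's -d option. The sketch below assumes it is run from the wiki's root directory; the function names and the 2G default are illustrative assumptions.

```shell
# Print the limits the PHP command-line binary is currently using.
show_php_limits() {
  php -r 'echo "memory_limit=", ini_get("memory_limit"), "\n",
               "max_execution_time=", ini_get("max_execution_time"), "\n";'
}

# Run the import with memory_limit raised for this invocation only.
# The 2G default is a placeholder; with DRY_RUN set, the command is only printed.
import_with_memory_limit() {
  local dump="$1" limit="${2:-2G}" run=
  [ -n "${DRY_RUN:-}" ] && run=echo
  $run php -d "memory_limit=$limit" maintenance/importDump.php --quiet "$dump"
}
```

Overriding the setting per run leaves php.ini untouched, which keeps the web server's limits unchanged.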
This is a complex process, and the exact steps can vary based on your server environment and the specific dump you are importing. Always refer to the official MediaWiki documentation for the most detailed and up-to-date information.