In this post, I will present a step-by-step procedure for importing locally stored Wikipedia XML dumps into a MySQL database. By the end of this post, you will have a local copy of Wikipedia. Please note that all tools used during the import were developed by others; no contribution from my side is involved unless specified otherwise. This post is based on the original post found here.
Tools to download:
- MySQL: Can be downloaded freely from here. However, I usually find it convenient to download the WAMP server, a suite that includes Apache, MySQL, and PHP. Either option will work for this post.
- Wikipedia articles: Articles from Wikipedia can be downloaded as XML dumps. I will use the English articles, downloadable from here.
- Perl: Can be downloaded from here.
- mwimport script: The Perl script that will import the articles into MySQL can be downloaded from here.
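Before moving on, a quick sanity check (assuming the perl and mysql executables are on your PATH) is to confirm that both tools respond from the command line:

perl --version
mysql --version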
- Extract the downloaded archive (enwiki-version…) so that the XML dump is produced. An extractor like 7-Zip can be used for this purpose (see the example command after this list).
- Create the SQL schema for the database by executing the SQL found here (a sketch of this step also follows this list).
- Execute the following command to start the import:
type enwiki-<date>.xml | perl mwimport.pl | mysql -f -u<admin name> -p<admin password> --default-character-set=utf8 <database name>
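As a concrete instance of the extraction step above, assuming the command-line 7-Zip tool (7z) is installed and the dump is the usual bzip2-compressed file, something like the following would produce the XML dump in the current directory (the exact file name depends on the dump you downloaded):

7z x enwiki-<date>-pages-articles.xml.bz2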
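For the schema step, a minimal sketch could look like the following; the database name wikidb is a placeholder, and tables.sql stands for whatever the downloaded schema file is called:

mysql -u<admin name> -p<admin password> -e "CREATE DATABASE wikidb DEFAULT CHARACTER SET utf8;"
mysql -u<admin name> -p<admin password> wikidb < tables.sql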
The command can take several hours to complete. If a “MySQL server has gone away” error is encountered, you need to increase the maximum packet size in your MySQL settings: edit the my.ini file, set max_allowed_packet = 1000M, and restart MySQL. Duplicate-entry errors do not stop the import; duplicate entries are simply not added to the database!
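After restarting MySQL, you can verify that the new packet size took effect with a standard MySQL query (the value is reported in bytes):

mysql -u<admin name> -p<admin password> -e "SHOW VARIABLES LIKE 'max_allowed_packet';"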
Hopefully it works!