Cleanup Wiki from Mass Spam: Difference between revisions
Line 14: | Line 14: | ||
[[http://upload.wikimedia.org/wikipedia/commons/thumb/3/33/MediaWiki_database_schema_1-19_%28r102798%29.svg/2193px-MediaWiki_database_schema_1-19_%28r102798%29.svg.png database schema]] for this purpose. | [[http://upload.wikimedia.org/wikipedia/commons/thumb/3/33/MediaWiki_database_schema_1-19_%28r102798%29.svg/2193px-MediaWiki_database_schema_1-19_%28r102798%29.svg.png database schema]] for this purpose. | ||
My mediawiki database is mysql database wikidb. User is | My mediawiki database is mysql database wikidb. User is root. For ease of use of the mysql command I created this .my.cnf file | ||
[mysql] | [mysql] | ||
password=my_password | password=my_password | ||
user= | user=root | ||
Before any modification, do a backup (and check if the created file contains the expecteds sql commands) | Before any modification, do a backup (and check if the created file contains the expecteds sql commands) |
Revision as of 16:43, 12 January 2023
Cleanup Wiki from Mass Spam
recently I recognized, the wiki was filled with many (>2000) new pages, all containing spam. Also I noticed there were about 20000 users registered, all spammers or without contributions.
I decided to allow edit only for registered users and deactivated automatic registering. So I put this in LocalSettings.php:
$wgGroupPermissions['*']['edit'] = false; $wgGroupPermissions['*']['createaccount'] = false;
Since I found no useful plugin to remove that many users and pages I did some research on the database table level. Mediawiki provided a useful [database schema] for this purpose.
My mediawiki database is mysql database wikidb. User is root. For ease of use of the mysql command I created this .my.cnf file
[mysql] password=my_password user=root
Before any modification, do a backup (and check if the created file contains the expecteds sql commands)
mkdir -p /backup/mediawiki cd /backup/mediawiki mysqldump -u root -p wikidb >mediawiki-2023-01-11-before-cleanup.sql
Since I am the only user with valuable content and my user id is 1, I came up with these sql commands to clean up. Worked well as far as I can tell:
mysql -D wikidb -e "delete from watchlist where wl_user <> 1" mysql -D wikidb -e "delete from ipblocks" mysql -D wikidb -e "delete from logging where log_actor <> 1" mysql -D wikidb -e "delete from recentchanges where rc_actor <> 1" mysql -D wikidb -e "delete from user_properties where up_user <> 1" mysql -D wikidb -e "delete from user where user_id <> 1" mysql -D wikidb -e "delete from user_groups where ug_user <> 1" mysql -D wikidb -e "delete from user_former_groups where ufg_user <> 1" mysql -D wikidb -e "delete from actor where actor_user > 1" mysql -D wikidb -e "delete from comment where comment_id in (select log_comment_id from logging where log_actor > 1)" mysql -D wikidb -e "delete from watchlist where wl_user <> 1" # mysql -D wikidb -e "delete from text where old_id = (select ar_text_id from archive where ar_actor <> 1)" mysql -D wikidb -e "delete from archive where ar_actor <> 1" mysql -D wikidb -e "delete from uploadstash where us_user <> 1" mysql -D wikidb -e "delete from filearchive where fa_actor <> 1" mysql -D wikidb -e "delete from image where img_actor <> 1" mysql -D wikidb -e "delete from oldimage where oi_actor <> 1" # mysql -D wikidb -e "delete from text where old_id = (select rev_text_id from revision where rev_actor <> 1)" mysql -D wikidb -e "delete from revision where rev_actor <> 1" mysql -D wikidb -e "delete from page where not exists (select 1 from revision r where page.page_id = r.rev_page)"
Only trace left of spammers are log entries from my earlier manual cleanup attempts. Can live with that :)
Sometimes single pages get lost after this procedure. Check all pages after cleanup. If pages are lost: restore, save edit content of the relevant pages, repeat cleanup, restore content.
mysql -e "drop database wikidb" mysql -e "create database wikidb" cat /backup/mediawiki/mediawiki-2023-01-11-before-cleanup.sql | mysql wikidb