Cleanup Wiki from Mass Spam

From JoBaPedia
Jump to navigation Jump to search

Cleanup Wiki from Mass Spam

recently I recognized, the wiki was filled with many (>2000) new pages, all containing spam. Also I noticed there were about 20000 users registered, all spammers or without contributions.

I decided to allow edit only for registered users and deactivated automatic registering. So I put this in LocalSettings.php:

$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['*']['createaccount'] = false;

Since I found no useful plugin to remove that many users and pages I did some research on the database table level. Mediawiki provided a useful [database schema] for this purpose.

My mediawiki database is mysql database wikidb. User is root. For ease of use of the mysql command I created this .my.cnf file

[mysql]
password=my_password
user=root

Before any modification, do a backup (and check if the created file contains the expecteds sql commands)

mkdir -p /backup/mediawiki
cd /backup/mediawiki
mysqldump -u root -p wikidb >mediawiki-2023-01-11-before-cleanup.sql

Since I am the only user with valuable content and my user id is 1, I came up with these sql commands to clean up. Worked well as far as I can tell:

mysql -D wikidb -e "delete from watchlist where wl_user <> 1"
mysql -D wikidb -e "delete from ipblocks"
mysql -D wikidb -e "delete from logging where log_actor <> 1"
mysql -D wikidb -e "delete from recentchanges where rc_actor <> 1"
mysql -D wikidb -e "delete from user_properties where up_user <> 1"
mysql -D wikidb -e "delete from user where user_id <> 1"
mysql -D wikidb -e "delete from user_groups where ug_user <> 1"
mysql -D wikidb -e "delete from user_former_groups where ufg_user <> 1"
mysql -D wikidb -e "delete from actor where actor_user > 1"
mysql -D wikidb -e "delete from comment where comment_id in (select log_comment_id from logging where log_actor > 1)"
mysql -D wikidb -e "delete from watchlist where wl_user <> 1"
# mysql -D wikidb -e "delete from text where old_id = (select ar_text_id from archive where ar_actor <> 1)"
mysql -D wikidb -e "delete from archive where ar_actor <> 1"
mysql -D wikidb -e "delete from uploadstash where us_user <> 1"
mysql -D wikidb -e "delete from filearchive where fa_actor <> 1"
mysql -D wikidb -e "delete from image where img_actor <> 1"
mysql -D wikidb -e "delete from oldimage where oi_actor <> 1"
# mysql -D wikidb -e "delete from text where old_id = (select rev_text_id from revision where rev_actor <> 1)"
mysql -D wikidb -e "delete from revision where rev_actor <> 1"
mysql -D wikidb -e "delete from page where not exists (select 1 from revision r where page.page_id = r.rev_page)"

Only trace left of spammers are log entries from my earlier manual cleanup attempts. Can live with that :)

Sometimes single pages get lost after this procedure. Check all pages after cleanup. If pages are lost: restore, save edit content of the relevant pages, repeat cleanup, restore content.

mysql -e "drop database wikidb"
mysql -e "create database wikidb"
cat /backup/mediawiki/mediawiki-2023-01-11-before-cleanup.sql | mysql wikidb