Wikipedia articles carry a revision history that logs every change made to an article, along with the username or IP address that made the change. For someone gathering intelligence, the revision history and user information can reveal as much about a topic as the article itself, or more. Many Wikipedia editors are personally connected to the articles they edit, or otherwise have a stake in what is being said, and by processing the revision data it’s possible to gain some insight into those connections.

This is somewhat awkward and time-consuming to do through the web interface, so it helps to automate it. I needed to determine which revisions and users introduced certain phrases in an article, and wrote the following script to help:

To use it, first export the article(s) you want to process to XML using Wikipedia’s Special:Export page (be sure to uncheck “Include only the current revision”). Once you have the XML saved locally, usage is as follows:

./wikiadded.py <xml file> <word or phrase>

The output is comma-delimited and contains the Wikipedia timestamp, user information (username/id or IP address), and a link to the revision that introduced (or re-introduced) the phrase.
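
For readers who want a feel for how this kind of processing can work, here is a minimal sketch in Python. It is not the wikiadded.py script itself, just an illustration of the idea under a few assumptions: that the export lists each page's revisions oldest first, that a case-sensitive substring match is good enough, and that the article lives on English Wikipedia (the REVISION_URL pattern). The function names find_introductions and write_csv are made up for this example.

```python
import csv
import sys
import xml.etree.ElementTree as ET

# Assumption: English Wikipedia; adjust the host for other language editions.
REVISION_URL = "https://en.wikipedia.org/w/index.php?oldid={}"


def local(tag):
    """Strip the XML namespace, which differs between export schema versions."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """Return the first direct child with the given local name, or None."""
    for c in elem:
        if local(c.tag) == name:
            return c
    return None


def child_text(elem, name):
    """Return the text of a direct child, or '' if it is missing or empty."""
    c = child(elem, name) if elem is not None else None
    return (c.text or "") if c is not None else ""


def find_introductions(xml_source, phrase):
    """Yield (timestamp, user, link) for each revision that adds the phrase.

    A revision counts as introducing (or re-introducing) the phrase when its
    text contains the phrase and the previous revision of the same page did not.
    """
    present = False
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        name = local(elem.tag)
        if name == "revision":
            now_present = phrase in child_text(elem, "text")
            if now_present and not present:
                contributor = child(elem, "contributor")
                username = child_text(contributor, "username")
                if username:
                    user = f"{username}/{child_text(contributor, 'id')}"
                else:
                    user = child_text(contributor, "ip")
                link = REVISION_URL.format(child_text(elem, "id"))
                yield (child_text(elem, "timestamp"), user, link)
            present = now_present
            elem.clear()  # free the revision text once it has been checked
        elif name == "page":
            present = False  # start fresh for the next exported article
            elem.clear()


def write_csv(xml_source, phrase, out=sys.stdout):
    """Write one comma-delimited row per introducing revision."""
    writer = csv.writer(out)
    for row in find_introductions(xml_source, phrase):
        writer.writerow(row)
```

Calling write_csv with the path to a saved export and a phrase prints one row per revision in which the phrase appears after being absent from the previous revision. Parsing with iterparse and clearing each revision after it is checked keeps memory use modest, which matters because full-history exports of busy articles can be very large.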

Hope this is of use to someone besides myself!

 

Today, while updating this VPS, I took the opportunity to change the style/design of mcgrewsecurity.com. I’m especially proud of the new logo. It’s a combination of several out-of-copyright book scans, and my co-worker Kendall’s keen observation that the bit of the key looked like an RJ-45 Ethernet port. A bit of work later and now it very much looks like one.

Over the past several months, No Starch Press has been kind enough to send along review copies of several of their recent security-related book releases. Soon you’ll start seeing my reviews being posted. Overall, I can say I’m very impressed.

© 2012 McGrew Security