I struggled when starting my research in adversarial stylometry. This area is basically an intersection between natural language processing and machine learning. Adversarial stylometry can also be seen as another form of authorship attribution; however, don't be surprised that only a very limited amount of work has been done. If you are interested in learning about adversarial stylometry or authorship attribution, you can start by checking these websites. They provide quite extensive information and events in this area.

  • PAN lab is an annual evaluation lab on uncovering plagiarism, authorship, and social software misuse. Highly recommended if you are starting to build your authorship attribution software and need feedback and evaluation.
  • PSAL at Drexel University: this research group initiated research on adversarial stylometry, privacy, and security.
  • This video contains a talk by Michael Brennan about deceiving authorship detection.

In addition, I'm currently studying deep learning and its applications to NLP; some useful resources can be found here:

  • It's best to start with the Machine Learning course by Prof. Andrew Ng (it covers some topics on neural networks).
  • This website lists resources about deep learning, from theory to implementation (very highly recommended!). You can also find an unpublished deep learning book written by Yoshua Bengio, Ian Goodfellow and Aaron Courville.
  • Deep learning for natural language processing course from Stanford University by Richard Socher
  • Convolutional Neural Networks for Visual Recognition course from Stanford University by Fei-Fei Li and Andrej Karpathy (for me, the course notes are somehow easier to follow than Socher's notes).
  • Word2vec tutorial by Angela Chapman
  • Word2vec and Glove tutorial
  • Practical text analysis using deep learning (unfortunately, this tutorial uses GraphLab, which is not free).
  • When experimenting with deep learning, it's so frustrating when the code runs forever. So I decided to give one of the deep learning libraries, Lasagne, a try. This library is based on Theano, so it is GPU-enabled. In addition, I do love the name! It makes writing code feel like cooking lasagne (lol).
  • When the GPU server at the university can't be accessed, I usually run my code on Amazon EC2. This tutorial by Markus Beissinger helped a lot with setting up Theano on Amazon EC2 GPU instances.
  • Having spent a while learning about RNNs/LSTMs, I put together a short presentation for our group meeting (collected from various sources).
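For readers who want a concrete picture of what an LSTM actually computes, here is a minimal, illustrative sketch of a single LSTM cell step in plain Python. The weights and inputs are made-up toy values, and real implementations work with vectors and matrices rather than scalars:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM time step with scalar input and state (toy illustration).
    W holds weights for the input, forget, and output gates and the
    candidate cell value; each sees (x, h_prev) plus a bias term."""
    i = sigmoid(W['wi'][0] * x + W['wi'][1] * h_prev + W['wi'][2])    # input gate
    f = sigmoid(W['wf'][0] * x + W['wf'][1] * h_prev + W['wf'][2])    # forget gate
    o = sigmoid(W['wo'][0] * x + W['wo'][1] * h_prev + W['wo'][2])    # output gate
    g = math.tanh(W['wg'][0] * x + W['wg'][1] * h_prev + W['wg'][2])  # candidate value
    c = f * c_prev + i * g   # new cell state: forget part of the old, write the new
    h = o * math.tanh(c)     # new hidden state, gated output of the cell
    return h, c

# Toy weights; in practice these are learned by backpropagation through time.
W = {'wi': (0.5, 0.1, 0.0), 'wf': (0.5, 0.1, 1.0),
     'wo': (0.5, 0.1, 0.0), 'wg': (0.5, 0.1, 0.0)}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -1.0]:   # process a short input sequence step by step
    h, c = lstm_step(x, h, c, W)
```

The key idea the gates capture: the forget gate decides how much of the previous cell state survives, which is what lets LSTMs carry information over long sequences where plain RNNs struggle.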

Most of the time I use Python in my experiments. My favorite tools include NLTK, scikit-learn and python-weka-wrapper.
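As a flavor of what these tools get used for: a common baseline in authorship attribution is comparing character n-gram frequencies. Here is a small illustrative sketch in plain Python; the documents are made-up toy strings, and in a real experiment such features would feed into a scikit-learn classifier over full texts:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams, a classic stylometric feature."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b)

# Toy "documents"; real attribution uses much longer texts per author.
doc_author1 = "the quick brown fox jumps over the lazy dog"
doc_author2 = "she sells sea shells by the sea shore"
doc_unknown = "the quick red fox jumps over the sleepy dog"

sim1 = cosine_similarity(char_ngrams(doc_author1), char_ngrams(doc_unknown))
sim2 = cosine_similarity(char_ngrams(doc_author2), char_ngrams(doc_unknown))
# The unknown document's n-gram profile is closer to author 1's.
```

Character n-grams are popular in stylometry because they pick up on spelling, punctuation, and morphology without needing any language-specific preprocessing.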

Feel free to contact me if you find something interesting that should be on the list.


Yunita Sari
Dept of Computer Science
University of Sheffield

Room G28
Regent Court, 211 Portobello
Sheffield, S1 4DP. United Kingdom.
e-mail: y.sari[at]sheffield.ac.uk
