When disaggregating the data by profession, we observed that left-leaning politicians (e.g. Democrats in the USA or Labour in the UK) are on the receiving end of 23% more problematic and abusive mentions than their counterparts in right-leaning parties (e.g. Republicans in the USA or Conservatives in the UK). However, journalists working for right-leaning media groups such as the Daily Mail, the Sun or Breitbart were mentioned in 64% more problematic and abusive tweets than journalists working at left-leaning organisations such as the New York Times or the Guardian.

The role for machine learning in content moderation

Large social media platforms have millions of users, host vast quantities of content, and must contend with massive volumes of abuse. As a result, they are increasingly turning to automated systems to help manage abuse on their platforms. To better understand the potential and risks of using machine learning in content moderation systems, we worked with Element AI to develop a machine learning model that would attempt to automate the process of detecting violence and abuse against women online.

While it is far from perfect, the model has advanced the state of the art compared to existing models and, on some metrics, achieves results comparable to those of our digital volunteers at predicting abuse. Even so, it achieves only about 50% accuracy when compared with the judgement of our experts: it identifies 2 in every 14 tweets as abusive or problematic, whereas our experts identified 1 in every 14.
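The roughly 50% figure can be read off the two rates quoted above. The minimal Python sketch below assumes (our assumption, not a statement from the study) the best case in which every tweet the experts flag is also among the tweets the model flags. Under that assumption, the share of the model's flags that the experts agree with, i.e. its precision, is simply the ratio of the two rates.

```python
from fractions import Fraction

# Rates quoted in the text: the model flags 2 in every 14 tweets,
# while the experts flag 1 in every 14.
model_flag_rate = Fraction(2, 14)
expert_flag_rate = Fraction(1, 14)

# Assumption (best case, for illustration only): every expert-flagged tweet
# is also flagged by the model. The fraction of model flags that match the
# experts' judgement is then the ratio of the two rates.
precision_upper_bound = expert_flag_rate / model_flag_rate

print(precision_upper_bound)         # 1/2
print(float(precision_upper_bound))  # 0.5
```

If the model also missed some of the tweets the experts flagged, the agreement would be lower than this bound, not higher.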

Element AI made its model available for three weeks to demonstrate the potential and current limitations of AI technology in this field. A recording of the model is shown below.

Machine learning systems are already widely used to flag potentially problematic content for review by a human workforce. For example, Facebook and YouTube use machine learning-powered software to scan content and flag it to human moderators. Meanwhile, Perspective API, developed by Google Jigsaw, has been used to flag potentially inappropriate content for review on both Wikipedia and the New York Times comments section.

In a letter to Amnesty International, Twitter has called machine learning “one of the areas of greatest potential for tackling abusive users”. Twitter CEO Jack Dorsey has similarly said that “We think that we can reduce the amount of abuse and create technology to recognize it before a report has to be made.” Twitter has also said that it is focused on machine learning in an effort to combat spam and automated accounts, and that it has begun acting against abusive accounts that have not yet been reported.

However, the trend towards using machine learning to automate content moderation online also poses risks to human rights. For example, David Kaye, the UN Special Rapporteur on Freedom of Expression, has noted (paras 32-25) that “automation may provide value for companies assessing huge volumes of user-generated content.” He cautions, however, that in subject areas dealing with issues which require an analysis of context, such tools can be less useful, or even problematic.

We have already seen that there can be serious human rights consequences when algorithms mistakenly censor content. In June 2017, Google announced "four steps intended to fight terrorism online", among them more rigorous detection and faster removal of content related to 'violent extremism' and 'terrorism'. The automated flagging and removal of content resulted in the accidental removal of hundreds of thousands of YouTube videos uploaded by journalists, investigators, and human rights organizations.

The simple reality is that using machine learning necessarily means accepting a margin of error. For example, weighting an algorithm towards greater precision makes it more likely that the tweets it flags are genuinely abusive, at the risk of missing abusive content that is more subtle (equivalent to casting the net too narrowly). On the other hand, weighting an algorithm towards greater recall would capture a wider range of abusive content, at the risk of also capturing false positives, that is to say, content that should be protected as legitimate speech (equivalent to casting the net too wide). These trade-offs are value-based judgements with serious implications for freedom of expression and other human rights online.
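To make the trade-off concrete, here is a minimal Python sketch of a classifier that flags any tweet whose score meets a threshold. The scores and expert labels below are entirely hypothetical, invented for illustration; they are not data from this study, and the threshold mechanism is a generic example rather than the Element AI model's actual decision rule.

```python
# Hypothetical scores (0 to 1) a classifier might assign to eight tweets,
# paired with hypothetical expert labels (True = abusive). Illustrative only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, True, False, False]


def precision_recall(threshold):
    """Flag every tweet scoring at or above `threshold` and compare with the expert labels."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and l for f, l in zip(flagged, labels))          # abusive and flagged
    fp = sum(f and not l for f, l in zip(flagged, labels))      # flagged but legitimate
    fn = sum((not f) and l for f, l in zip(flagged, labels))    # abusive but missed
    # Convention for this sketch: report 1.0 when the denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


for t in (0.85, 0.5, 0.2):
    p, r = precision_recall(t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

On these made-up numbers, a strict threshold of 0.85 gives precision 1.00 but recall 0.50 (half the abusive tweets are missed), while a lenient threshold of 0.20 gives recall 1.00 but precision 0.57 (three legitimate tweets are caught). No threshold eliminates both kinds of error; picking one is a choice about which error to tolerate.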

Amnesty International and Element AI’s experience using machine learning to detect online abuse against women highlights the risks of leaving it to algorithms to determine what constitutes abuse. As it stands, automation may have a useful role to play in assessing trends or flagging content for human review, but it should, at best, be used to assist trained moderators, and certainly should not replace them. Human judgement by trained moderators remains crucial for contextual interpretation, such as examining the intent, content and form of a post, as well as assessing its compliance with policies. It is vital that companies are transparent about exactly how they use automated systems within their content moderation processes and that they publish information about the algorithms they develop.

Moving forward

Amnesty International has repeatedly asked Twitter to make available meaningful and comprehensive data regarding the scale and nature of abuse on its platform, as well as how it is addressing that abuse. Such data would be invaluable for anyone seeking to understand and combat this barrier to women’s human rights online. In light of Twitter’s refusal to do so, we hope that this project with Element AI will help shed some light on the scale and nature of abuse on the platform, and also provide tools to others who wish to conduct statistical research on this topic.

This research also provides valuable insights into the potential role of automation in content moderation processes. As companies including Twitter embrace the use of machine learning to flag content for moderation, it is more important than ever that they are transparent about the algorithms they use. They should publish information about training data, methodologies, moderation policies and technical trade-offs (such as between greater precision and greater recall) for public scrutiny. At best, automation should be part of a larger content moderation system characterized by human judgement, greater transparency, rights of appeal and other safeguards.

Amnesty International’s full set of recommendations to Twitter is available here.
