Greater efficiency
"We have to admit that human employees find it hard to handle the rising volume of pornographic content online," said Zhu, the law professor, who despite his reservations is a keen proponent of the use of hi-tech applications.
He witnessed how employees identified pornographic content when he visited internet and technology companies several years ago. "They stared at computers all day, but they were unable to view much of the large amount of illegal material online," he said. "They also had to endure watching lots of disgusting sexual acts, naked images and videos."
Some of the problems have been alleviated since the introduction of AI systems.
Wei Shi, a senior algorithm engineer at Alibaba's security department, compared the accuracy and speed of human reviewers with those of an AI system.
"A person can identify about 10,000 pictures a day. To handle 400 million online images, we would have to hire 40,000 employees," he said. "But if we do the job via our AI system, we'll only need 20 people. The system can select 200,000 pictures it is unable to classify for us to check after it has reviewed all 400 million."
When the company decided to use AI to identify online pornography three years ago, it first collected more than 13 million pornographic images from about 2,000 websites as a database to teach the AI system, according to Wei.
After the database and algorithm were updated, the system was able to identify not only pornographic photos and text, but also sexually explicit audiovisual content. For example, the system can recognize a variety of languages, including Japanese, Russian and English, as well as dialects used in the provinces of Hunan, Guangdong and Sichuan. "It can even identify (sexual) groaning," Wei said.
"The AI system has accelerated online identification and reduced our workload," he added, though he conceded that the identification of audio content remains a work in progress and some mistakes are inevitable.
Challenges remain
Despite recent advances, the use of AI does not guarantee satisfactory results, which means both e-commerce companies such as Alibaba and systems providers such as Tuputech face problems as they seek to improve the accuracy of identification.
Sometimes, for example, the AI system is unable to distinguish between photos of classic works of art, such as Michelangelo's statue of David, and pornographic images, because both feature naked human bodies, Wei said.
Jiang Zerong, director of Tuputech's operations department, said the AI system is flexible and has exceptional learning ability, but the first step is to provide it with a clear definition of pornography.
"We can update and adjust our algorithm to meet clients' demands, but we should also be able to distinguish if a picture is artistic or not," he said, adding that blurred boundaries will affect the accuracy of the AI system.
Zhu, from the China University of Political Science and Law, expressed concern about the system's excessive "blind blocking" of online content it is unable to classify accurately, such as photos of women in skimpy swimwear at a beach.
"Not all nude content must be removed from the internet, because of the definition laid down by the Criminal Law related to the blatant advertising of sex and nudity," he said. "Sometimes, what constitutes pornography depends on common sense and the times, so it is, and always will be, difficult to give a very strict definition in laws or regulations."
Tan Yicheng, a judge at Beijing Haidian District People's Court who hears cases related to the dissemination of pornography online, said miniskirts might have been regarded as pornographic in the 1970s, but not nowadays.
He noted that the search for hidden content presents a major challenge for AI systems.
"Links to pornography are often hidden in text messages or spread via instant messaging platforms such as WeChat or QQ, rather than on websites," he said, adding that it is exceptionally difficult to track such content.
Optimum defense
Given the difficulty of formulating a clear definition of pornography, Zhu suggested the anti-porn office release more information about concluded cases to aid identification by internet companies.
He said human intervention is essential to conduct secondary checks that increase the accuracy of identification, and a combination of humans and AI would provide an optimum defense against the spread of pornography online.
Tan, the judge, said he broadly agreed with Zhu's suggestions, but he added that, to keep cyberspace safe, internet companies must be prepared to shoulder the responsibility of improving identification to ensure that their sites do not host illegal content.
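The combination of AI screening and secondary human checks that Zhu recommends can be pictured as a simple routing rule. The sketch below is purely illustrative: the score, the two thresholds and the function name are hypothetical, and the article does not describe how Alibaba's or Tuputech's systems actually decide which items to pass to people.

```python
# Hypothetical cutoffs for illustration only; no thresholds are given
# in the article.
BLOCK_AT = 0.95  # at or above this score, block automatically
ALLOW_AT = 0.05  # at or below this score, allow automatically


def route(score: float) -> str:
    """Route one item given a model score between 0 and 1.

    A higher score means the (assumed) model is more confident that the
    item is pornographic. Anything between the two cutoffs goes to a
    human reviewer for a secondary check.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= BLOCK_AT:
        return "block"
    if score <= ALLOW_AT:
        return "allow"
    return "human_review"


for s in (0.99, 0.50, 0.01):
    print(s, route(s))
```

Under such a scheme, an ambiguous case like a photo of a classical statue would be more likely to land in the human-review band than to be blocked outright, provided the model's score for it is uncertain.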