Copy catcher: The Israeli startup that can spot AI-written text

As the emergence of ChatGPT raises fears of plagiarism, Israeli startup Copyleaks identifies texts created by artificial intelligence, and has already proven its capabilities to US academic institutions asking to check the originality of students’ work

Omer Kabir
13:10, 12.02.23
On the very same day that Google announced the layoff of 12,000 employees, I conducted a little experiment. I logged into ChatGPT, OpenAI's wildly successful chatbot, and asked it to write a letter from Google's CEO to the company's employees informing them of layoffs. I put the result next to the letter from the real CEO, Sundar Pichai, and asked co-workers and random people to guess which of the texts was written by a human and which by a machine. In what is perhaps a testament to the generic nature of texts coming out of corporate spokespeople, they all unhesitatingly picked the ChatGPT text as the one created by a human. Well, all of them except one: an online tool developed by Copyleaks, an Israeli startup with offices in New York and an R&D center in Kiryat Shmona. "Written by AI (artificial intelligence) with a high probability," it said.
The founder and CEO of the company, Alon Yamin, was of course not surprised. Since the company integrated the ability to recognize texts created by artificial intelligence into its plagiarism-detection system (which is also, of course, based on AI), the feature has proven itself to be a powerful tool for dealing with this new technological development, and has helped reveal the extent to which some people are already relying on AI-generated texts. "We have access to a lot of content from students and researchers," Yamin told Calcalist in an interview from the Copyleaks offices in New York. "In the last few weeks, we have started to activate AI detection on the content, to know the percentage of students who use artificial intelligence to write content. In the last three weeks, over 10% of the content submitted to the system, which is hundreds of thousands of documents, included text created by AI, and that's with ChatGPT having only just come out. The numbers will continue to rise. We were very surprised by this number."
Alon Yamin - Copyleaks CEO 
(Photo: Rotem Golan, Studio Golan)
ChatGPT's ability to generate intelligent and informative-looking texts, at a level high enough to pass certification exams in subjects such as medicine or accounting, or to score highly on entrance exams for MBA studies, has raised concerns that the days of written academic work - a central tool in the learning process today - are over. Yamin believes that the solution offered by Copyleaks deals with this crisis successfully: "Students need to know how to write content, this is an important skill that will not disappear from the world, but there is a process of figuring out how to work with these tools. Everything is very, very new."
The company was founded about eight years ago by Yamin and his partner Yehonatan Bitton, VP of R&D. "I met Yehonatan in the IDF's 8200 Intelligence Unit," said Yamin. "We were programmers. After military service he studied computer science and I studied economics and management. Soon after that we started working on Copyleaks. We are focused on AI technologies for text analysis. What is the meaning of the text, where did it come from, is it original or not, in what tone was it written, who wrote the text? 
"Our starting point was Yehonatan's family business. They sell ornamental fish. He developed a website for them when he was 11 years old, and uploaded a lot of content to get a high ranking on Google. One day he saw that the site was dropping in the search results, which affected the traffic to the website and the revenues. He saw that competitors were copying the site's content, and that Google was penalizing the family's site for it, because the search engine ranks sites with duplicated content lower and has no ability to know which is the source and which is the copy. This was the starting point. We wanted to develop a tool that would be able to identify how content is distributed on the web, and whether it is original or not. We noticed that a lot of content is not copied one to one, so we wanted something smarter that can recognize a copy even if someone has played with the text, as long as the structure, meaning and tone are similar enough.
"From there we shifted the focus to the world of education. It is very important there to know if content is original, and there are also many uses in the worlds of advertising, media and business - is someone copying or stealing your content, is sensitive content leaking onto the web? Everything is at a more sophisticated level than just copy and paste. We can also identify cases where content has been copied and translated, literally providing protection from all directions."
The emergence of ChatGPT, Yamin says, did not catch them off guard: "We saw this development coming months ago, and we were busy developing technology that could reduce the risk. There are many advantages to working with ChatGPT, but as users we do not know if a text was written by a human or by AI. Our technology knows how to recognize which it is. It may be difficult for people to recognize the difference, but at the end of the day an AI system writes in a different way, one that looks different from a statistical point of view. There are AI crumbs that technologies like ours know how to identify, and based on them determine that the content was not written by humans. The transition wasn't very easy. It's something we've been working on for a very long time. In the end it's about text, even if it's created by AI, and we're constantly working on analyzing text content. There were a lot of changes and developments we had to make, but also a lot of common parts that allowed us to base it on our existing infrastructure."
How does it work?
"Imagine you hear a knock on the door. To us it sounds like a normal knock, but if you understand Morse code, it has meaning. Our AI knows how to speak the language of AI, and to tell AI-generated text apart from text that was not generated by AI. Our system understands how an AI text is created: it is a text that is based on statistical models, on data files, it is not human. There are unique things in a text written by AI, and that is why it looks different. We know how to identify these things, to reverse engineer how the text was created."
What does the feedback look like from the user's side?
"Currently we say whether a text being tested was written by AI, with a probability of more than 99%, for the content as a whole. That only indicates whether the text includes content written by AI, without detailing which parts of the text were created by AI. In the next two to three weeks we will launch an update that will allow identification by paragraph and by sentence. It will be possible to know at the sentence and paragraph level what was written by AI and what was not, and we will attach confidence percentages to each sentence. Right now we only present things that we are 99% sure of."
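The reporting rule Yamin describes can be illustrated with a minimal sketch. It is not Copyleaks' implementation: the function name `report`, the data layout and the per-sentence scores are all hypothetical, and the scores are assumed to come from some detector that is not shown here. The sketch only shows the difference between a single document-level verdict and a sentence-level breakdown under a 99% cutoff.

```python
# Illustrative only: the names and scores below are hypothetical and do not
# reflect Copyleaks' actual system. Scores are assumed to be supplied by
# some external detector as probabilities between 0 and 1.

REPORT_THRESHOLD = 0.99  # mirrors the "99% sure" cutoff quoted in the interview


def report(scored_sentences, threshold=REPORT_THRESHOLD):
    """Summarize detector scores given as (sentence, probability) pairs.

    Returns a document-level verdict (does the text contain any sentence
    at or above the threshold?) and the sentence-level breakdown.
    """
    flagged = [(s, p) for s, p in scored_sentences if p >= threshold]
    return {
        "contains_ai_text": bool(flagged),  # document-level verdict
        "flagged_sentences": flagged,       # sentence-level detail
    }


if __name__ == "__main__":
    example = [
        ("We regret to inform you of upcoming changes.", 0.995),
        ("I wrote this line myself on the train.", 0.40),
    ]
    print(report(example))
```

In this sketch, anything scoring below the cutoff is simply not reported, which matches the stated policy of presenting only high-confidence findings.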
Since ChatGPT was released, quite a few tools have appeared that claim to recognize texts created by it. The developer of the chatbot, OpenAI, is also planning to launch its own identification tool. What's your advantage in this game?
"We are not limited to a specific platform or model. Our technology can recognize any text generated by AI, not just ChatGPT. Furthermore, the ability to recognize at the paragraph or sentence level is something unique to us, and this affects the quality and how reliable the results can be. That's why our development is part of a complete platform. We can also tell whether the text is original or not. We are the only platform that covers everything from copying to copyright infringement. We are available in five languages (English, French, Spanish, Portuguese and German) and are working on more languages."
Despite the slowdown in the global high-tech industry, Alon Yamin said this has been a successful period for the company: "We are in the midst of growth and recruitment processes, not layoffs. This is an interesting period."
It is very rare to find a startup in Kiryat Shmona.
"Yehonatan is from a kibbutz in the area, and that's why we set up there. We wanted to stay in the area, to see how we could do something with startups there. Now VC firm JVP has opened offices and there is progress."
Is it difficult to recruit employees there?
"At the stage where we are now it’s less of a problem. At first it took us a while to figure out the best way to do it. We had to figure out how to work with colleges and universities in the area. Our first employees were Druze from the area who studied engineering, and now there are five or six Druze workers. There are many workers who come from the colleges, and many workers who worked in Tel Aviv and are originally from the North, and this allowed them to return to the North."