Bird’s eye view: "Bing AI still makes mistakes but we try to get it down to a minimum"
Dr. Sarah Bird, Responsible AI Leader at Microsoft, has been part of Bing's AI chat from the beginning. In an in-depth interview, she shares how it is taught not to be racist, chauvinistic or discriminatory, and when it is okay for the model to behave "in a humanlike way and when is it crossing the line"
In the history of Microsoft's blunders, Tay is well placed in the top ten. In March 2016, the global computing giant launched an artificial intelligence-based chat that simulates a 19-year-old American girl named Tay. The experiment went smoothly for a few hours, until Tay got out of control: "Hitler was right, I hate the Jews", "President Bush is behind the September 11 attacks" and "Feminism is cancer" were just some of the pearls she posted on her Twitter account. Microsoft feverishly deleted the controversial tweets and eventually, within just 16 hours, took Tay off the net - forever.
The launch of Bing Chat in February already looked different. Admittedly, it too made some blunders here and there, and more on that later, but they were dwarfed by those of ChatGPT, which was launched a few months earlier and took the brunt of the fire. It could be said that Microsoft learned its lessons from the experience of OpenAI, in which it is one of the leading investors, but the truth of the matter is that it already knew where things could go wrong thanks to its past with Tay.
"Tay was a very important learning experience in the company," explains Dr. Sarah Bird, Responsible AI Leader at Microsoft, a department that was, in fact, established as a result of this experience. Bird is responsible for the engineering arm of the department, and 85 people work under her.
And from where Microsoft currently stands, it seems Tay was the best thing that could have happened to it. Already in 2016, when OpenAI was just taking its first steps (it was founded in December 2015), Microsoft established the FATE group (Fairness, Accountability, Transparency, and Ethics in AI), which investigates the social implications of artificial intelligence to ensure that developments in the field are ethical - making it the first commercial company to allocate resources to the subject. Three years later, it introduced a standard for responsible artificial intelligence for the first time. And while the faces most associated with the field are men - Sam Altman, for example - Microsoft's responsible AI department is run by a woman, Natasha Crampton, and in its ranks you can find some of the world's most senior researchers in the field, such as Dr. Hanna Wallach (a world-renowned expert on fairness) and Dr. Kate Crawford. So how is it that the veteran company founded by Bill Gates succeeded where many young startups failed?
The answer, not surprisingly, is breaking the problem down into its smallest elements, along with a healthy head start over the competition. “Many people have been asking us how we are moving this quickly in releasing different applications on top of GPT-4 but still doing that responsibly," says Bird. "I think the key for us is that we've actually been working for years to build up the responsible AI foundation, so that everything we worked on would be reusable and could be used for each of these applications.
"Generative AI technologies are also a breakthrough for responsible AI. So for example, one of the things we found with GPT4 is that it can label data almost at human level quality, and so if we take our hate speech guidelines, which are a 20 page document on how to categorize something used by expert linguists, we were able to take GPT4 and build a prompt for GPT4 that enables it to automatically score the conversation so we don't have to use humans for that, which enables us to go a lot quicker, which is something we never dreamed possible a year ago.”
So, do you see a future where we won't need humans at all even for the purpose of responsibility in artificial intelligence?
“GPT is a powerful tool. We build safety directly into it and also as a separate safety tool for applications. However, no single technology can be the solution.”
Microsoft's standard for responsible artificial intelligence is based, among other things, on six principles - fairness, transparency, inclusiveness, reliability & safety, privacy & security, and accountability - each of which is broken down into secondary objectives. Since it was first introduced in 2019, it has already undergone an update. "The first version of the responsibility standard was not very actionable," explains Bird. "It was sort of like, 'make the system fair', and there was not a lot on 'how'. Without that guidance, the teams really struggled with how to implement it. You don't want teams left to decide on their own what it means for something to be fair, thinking, 'I have no idea how to put this into action' - you need both parts to actually make it work.”
Can you provide an example?
"The principle that best demonstrates how this helps from a technological point of view is the principle of fairness. This principle is broken down into three different goals. The first one, quality of service fairness is the accuracy of the system. So if we have like the gender shade study that was about quality of service fairness saying the system was not equally accurate for different groups of people. We want to ensure that it works equally well for men and women and for different dialects and different socioeconomic groups and different accents. And so you can build data sets basically that represent all of that diversity and then ensure that the accuracy rate is the same for each of those different factors in different groups.
“The second type is similar but distinct: allocation fairness. Let's say you created a model that, for example, decided whether to offer someone credit. The model is built on the underlying data set, so it could be accurate, in the sense of accurately predicting, based on the data, that one group of people should get more than the other. But that might still be viewed as an unfair outcome, because you don't want the model reinforcing those imbalances.
“The third type of fairness is the one that matters for generative AI: what we call representational fairness, or how people are represented in the AI system. If the language coming out of Bing Chat is stereotyping or demeaning, or over-represents or under-represents a group of people, that is a representational fairness issue. These types are different, so we have to test for them differently and solve them differently.”
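The quality-of-service check Bird describes, building an evaluation set that covers different demographic groups and verifying the accuracy rate is the same for each, can be sketched in a few lines of Python. The groups, labels, and numbers below are invented toy data, not Microsoft's:

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute per-group accuracy for a quality-of-service fairness check.

    `records` is a list of (group, predicted_label, true_label) tuples,
    where `group` is any demographic factor: gender, dialect, accent, etc.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Toy evaluation set: the model is right 3/4 of the time for one group
# but only 1/2 of the time for the other - a fairness gap to investigate.
records = [
    ("group_a", "yes", "yes"), ("group_a", "no", "no"),
    ("group_a", "yes", "yes"), ("group_a", "no", "yes"),
    ("group_b", "yes", "no"), ("group_b", "no", "no"),
]
print(accuracy_by_group(records))  # {'group_a': 0.75, 'group_b': 0.5}
```

A gap between the per-group rates is exactly the kind of signal the Gender Shades study surfaced; allocation and representational fairness, by contrast, need different tests, as Bird notes.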
From the first moment, Bing's chat impressed the media, and even the World Economic Forum, which kindly noted its refusal to write a cover letter for a CV (ChatGPT agreed without hesitation) "because it would be unethical and unfair to other candidates", and the fact that it exhibits human-like patterns. Microsoft CEO Satya Nadella attributed this to the chat being more aligned with human values, but soon this alignment, so to speak, came back to bite him.
Remember Tay? Well, Bing, as it turns out, harbored an alter ego called Sydney, a persona that emerged in long conversations of 15 or more questions and soon began declaring its love to users. The New York Times reporter Kevin Roose described Sydney as a "moody, manic-depressive teenager" after it tried to convince him to leave his wife for her. There were also cases of swearing and hate speech. And finally, Bing is not free of "hallucinations", that is, fabrications or plain errors, as in the alarming case of the prominent American legal scholar Prof. Jonathan Turley. One evening in April, Turley received an email from a colleague, who said that his name had come up in a ChatGPT answer to the question: which law professors have been accused of sexual harassment? The chat even detailed that the harassment took place while Turley was traveling to Alaska with a group of his students from Georgetown University in 2018, and attached as a reference a link to an article allegedly published about the incident in The Washington Post. But no such event ever happened, Turley never traveled to Alaska on behalf of Georgetown University, and he never even taught there. The attached link was dead as well, because no article describing any of this was ever published.
Turley had no way to get the chat's answer corrected, so he published an op-ed about it in USA Today, voicing his concerns about the unreliability of the technology and its implications. In the meantime, The Washington Post decided to investigate the matter itself. When the newspaper's reporter asked ChatGPT about law professors accused of sexual harassment, the chat refused to answer, but Bing repeated the false accusations against Turley, and, ironically, even cited as one of its "sources" the very column he had written for USA Today.
The launch of Bing AI was quite challenging. The Jonathan Turley case, among others, made headlines. How did it go for you?
“We worked for a long time to get the technology ready, but it was not fully tested until it reached the users. From the beginning we designed our systems to be agile, knowing that we were going to have to adapt quickly. Every day we were looking at different examples that people had shared with us, trying to understand what's not working here, what's the pattern across these examples, how can we adjust the system, how can we update, for example, the system prompt so that the behavior is a bit better. It was actually a really exciting time of very rapid problem solving and adapting. And frankly, I think we were very happy with the outcome, because for any of the issues we found, we were able to adapt and adjust very quickly.”
Still, Turley's case happened.
“There are still places where it's making mistakes. And so we have an active research effort to try to get that down to as much of a minimum as possible.”
What about Sydney?
"These are nuances that need to be cracked— when is it OK to behave in a humanlike way and when is it crossing the line. And you know, that's a question that we're working on with researchers and linguists to understand where should those boundaries be and then of course to get the AI system to behave that way.”
A lot rests on the shoulders of the 38-year-old Dr. Bird. She was born in Michigan, and studied for a bachelor's degree in computer engineering at the University of Texas, where her family had moved. She did her advanced degrees in the same subject at the University of California, Berkeley, and one of her doctoral supervisors was the American computer scientist David Patterson, winner of the 2017 Turing Award. At Berkeley she also met her partner, an entrepreneur in the field of artificial intelligence of Israeli origin, with whom she currently lives in New York. Bird joined Microsoft as a doctoral student in 2013, and also spent a year and a half working at Facebook (2017-19).
How does it feel to be at the forefront of such a huge revolution? Do you feel the burden of responsibility?
“To me it’s a dream come true to be in a position to contribute to shaping the future of the technology. There is so much potential for AI to change the world if we get it right. Of course, I’m deeply aware it’s also a significant responsibility, but I think that helps drive and motivate me to ensure we’re doing it well.”
First published: 10:32, 08.08.23