Clifford Chance is one of the biggest law firms in the world. In 2017-2018, it reported revenues of £1.62 billion. It has an impressive array of clients and an equally impressive record of legal success. But, like many organisations, it finds that keeping on top of the market is not easy.
Keeping the fee earners on the winning side requires data, lots of data, lots of complex data across multiple countries. Getting the most out of that data is a non-trivial task. To that end, Clifford Chance has its own data science department headed up by Mirko Bernardoni.
At the Spark+AI Summit in Amsterdam, Bernardoni talked to Enterprise Times about his team and some of the challenges they face.
Running a Data Science start-up
Clifford Chance has its own innovation lab. It delivers new solutions and technologies for the company. According to Bernardoni: “Our data science lab is organised internally similar to a start-up inside most large enterprises. We are working with, and part of the innovation lab in Clifford Chance. There are different initiatives like helping the lawyer to do their daily work. We started years ago to use software as a service to do business intelligence.”
Bernardoni noted that any idea has to be sold to the business in order to get support. This is not just about creating a nicer UX. The business will want to know: “How the idea is going to augment the human process.” Lawyers deal with large amounts of documents and text. If Bernardoni’s team gets it right, they will change how lawyers work on a daily basis. This is why the business needs to be sure of any new idea.
Creating solutions is not just about changing how Clifford Chance works. Bernardoni pointed out that it is important to: “Understand what is the right level of investment.” This is particularly relevant if the solution will eventually be sold outside of Clifford Chance. He said: “One of our goals as a data science lab is we are looking to be profitable.” Those solutions are sold through Applied Solutions, a subsidiary of Clifford Chance that sells legal technology (lawtech) products.
Bernardoni summed up the process saying: “We define the investment, the KPI and when we have all this information we are able to take this to an internal group who give the go ahead. There are different stakeholders, different partners that need to check in. For example, if this idea has been already realised before, there is an existing product, then there is no need to re-invent the wheel. If it is something we are doing in another area, to work as one big company across the globe.”
Getting the data is not a simple task
One of the major challenges Bernardoni and his team face is getting access to the data they need. Much of the data Clifford Chance holds belongs to clients. This means that the company needs client permission to use it. Data is also linked to client contracts. Each contract refers to a separate engagement, so getting all the data from a single customer could mean dealing with tens, hundreds or even thousands of contracts. There are also regulatory concerns over where the data is stored and used, and how it is anonymised.
Another problem is the diversity of sources for the data. ET asked Bernardoni about systems to help lawyers find precedents to fight cases. He replied: “The space is so vast. If you are doing a LDR (Legal Dispute Resolution), our guys are using more than one discovery tool to find information.” Even when the case relates to a single country, there may be multiple data sets to work through.
When permission is granted, the next challenge is extracting the data from its sources. Bernardoni commented: “When you are extracting specific information from a document, for example contracts, they have a lot of information about dates, borrowers, and so on that are not in a structured format. This is because it is changing in the negotiation.”
One of the advantages of financial data is that it tends to be better structured than other data. From Bernardoni’s perspective, this means that the first step is automating the classification of documents.
Context is king but not easy to create
As with many other solutions such as voice to text and translation, context is a significant challenge for Bernardoni. ET asked how the system is taught context. Was Bernardoni using a pre-built model of language context? How much time is spent on contextual analysis? ET was particularly interested in how the team deals with industry-specific language and terminology.
According to Bernardoni: “Contextual analysis is one of the first challenges when we implement a model. It is not only contextual in UK English, in American English law you have a big difference between the two.”
ET asked if this meant Clifford Chance had built its own context engine.
Bernardoni replied: “We don’t have a context engine but what you do usually is use a concept called embeddings. When working with text you convert the text to numbers. Machine learning models only understand maths. You start to count how many times a word appears in the text. But it is not giving you the contextuality that you are looking for.”
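To make the word-counting step concrete, here is a minimal Python sketch of a bag-of-words vector built with the standard library's `collections.Counter`. The function name `bag_of_words`, the tiny vocabulary and the two sentences are hypothetical, invented for illustration; this is not Clifford Chance's pipeline.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Turn text into a vector of word counts over a fixed vocabulary.

    Hypothetical helper for illustration: lower-cases the text and
    splits on whitespace, with no handling of punctuation.
    """
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that never appear in the text.
    return [counts[word] for word in vocabulary]


vocab = ["bank", "loan", "river", "the"]
finance = "The bank approved the loan"
nature = "The river bank flooded"

print(bag_of_words(finance, vocab))  # [1, 1, 0, 2]
print(bag_of_words(nature, vocab))   # [1, 0, 1, 1]
```

Both sentences give "bank" the same count of 1. That is the limitation Bernardoni describes: counts record how often a word appears, not which sense of the word is meant.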
Creating the contextual links starts with ingesting a very large body of data. Then, Bernardoni says: “You use a number of algorithms. One of these creates a numerical vector for each word. For each word there can be 300, 700, 1000 points. Each one of these vectors represents the word with each of the variations the machine encountered inside that specific text. In that way you have the context quality.
“You are able to understand if the word BANK is a Financial institution or a river bank. Vectors allow you to use linear algebra to calculate the distance between the words and say something like ‘If I’m a man and I’m a king and she’s a woman, who is she? And you get the answer, she is a Queen.'”
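The king/queen arithmetic can be sketched in a few lines of Python. This is an illustration under loud assumptions: the three-dimensional vectors are hand-made (real embeddings are learned from a large corpus and, as Bernardoni notes, have hundreds of points per word), and cosine similarity is assumed as the distance measure, since he did not name one. The names `EMBEDDINGS`, `cosine` and `analogy` are hypothetical.

```python
import math

# Hypothetical hand-made vectors for illustration only. Real embeddings
# (e.g. word2vec or GloVe) are learned from data and have far more dimensions.
EMBEDDINGS = {
    #          royalty, maleness, femaleness (hand-assigned meanings)
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "loan":  [0.0, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors):
    """Solve 'a is to b as c is to ?' by computing b - a + c and
    returning the closest remaining word by cosine similarity."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# "man is to king as woman is to ?"  ->  king - man + woman
print(analogy("man", "king", "woman", EMBEDDINGS))  # queen
```

Excluding the three input words from the candidates is a common convention in analogy tests; without it, the vector nearest the result is often one of the inputs.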
Handling unique words and acronyms is not easy, says Bernardoni. It requires the model to identify unique instances. These are then substituted and transformed to create new vectors.
Translation brings its own challenges
Clifford Chance doesn’t build every model itself. Bernardoni explained that translating Chinese legal documents into English brings another set of problems. To tackle this, there are a number of pretrained models from different services that can be used. Many come with as many as 10,000 domain-specific sentences. The advantage of this, said Bernardoni, is that it “allows you to translate a Chinese loan to an English one with accuracy.”
Bernardoni also stressed that this is not left to the machine alone. Everything is then checked by someone who speaks both languages. He said: “The work is not about substituting a lawyer or their expertise. It is about augmenting their capability and removing all the mundane and repetitive tasks that they have to do. At the end of the journey you always have the legal services come in and check that everything is correct.”
The medical profession is taking the same approach. It is rightly concerned by suggestions that medical professionals are being replaced by technology. Patients are not ready for that step, and the same is true in legal services.
Enterprise Times: What does this mean?
That law firms such as Clifford Chance are investing heavily in data science is no surprise. The industry deals with lots of complex data and arcane processes. Rationalising those processes to remove the mundane not only cuts costs but also frees up fee earners to add value. If the systems built can also be sold on, creating them pays off twice.
There is significant demand out there for new lawtech solutions. The UK Government has just awarded a £2 million contract to Tech Nation to help create new solutions. How far that will go remains to be seen. For Clifford Chance and its competitors, this is a market they are already taking advantage of. New entrants will need to be wary of the heavy hitters.