Last year, I took a look at a new depth-sensing sensor system called Clarity from a company called Light. Originally developed for smartphone applications, Light pivoted a few years ago to automotive applications such as Advanced Driver Assistance Systems (ADAS) and autonomous driving.
There followed a long comment thread with lots of questions about how Light's technology works. The folks at Light read the whole thread and then talked to me to answer your questions.
The questions from the Ars comments fell into four themes: whether Clarity can function in low light; its similarities to human vision and parallax; Clarity's accuracy and reliability compared to other sensor modalities like lidar; and whether it resembles Tesla's vision-only approach.
Headlights are required to drive at night
As for how Clarity performs at night and in low light, the answer is pretty simple: we are required to drive with headlights on at night. “Most of the infrastructure for the entire automotive industry has the assumption that there is some exterior lighting, usually lights on the vehicle,” said Prashant Velagaleti, Light's product manager.
Similarly, there were some questions about how the sensor system handles dirt or occlusion. “One of the advantages of our approach is that we do not specify the cameras and their locations in advance. Customers decide per vehicle where they want to place them, and many passenger-car customers will place them behind the windshield,” Velagaleti told me. And of course, if your cameras are behind the windshield, it’s trivial to keep their view clear, thanks to technology that has been around since 1903 that allows drivers of non-autonomous vehicles to drive in rain or snow and still see where they’re going.
“But when we talk about commercial applications, like a Class 8 truck or even an autonomous shuttle, those have sensor capsules, and those sensor capsules have whole cleaning mechanisms, some of which are quite sophisticated. And that’s exactly the purpose: to keep that thing operational as much as possible, right? It’s not just about safety, it’s about uptime. So if you can add some cleaning system that keeps the vehicle running on the road all the time, net-net you’ve saved money,” said Velagaleti.
“Everyone just assumes the end state is the first step, right? And those of us who really tackle this from a pragmatic point of view think it’s crawl, walk, run. Why shouldn’t people take advantage of safety systems that are Level 2-plus with what Light can offer? By adding one more camera module, suddenly your car is much safer. We don’t have to wait until we reach Level 4 for people to take advantage of some of these technologies today,” Velagaleti told me.
How does it compare to Tesla?
“When it comes to Tesla and Mobileye, for example, you know that both of these are machine learning-based systems. So, as we like to say, you need to know what something is in the world before you know where it is. And if you can’t figure out what it is, you’re failing,” said Dave Grannan, Light's co-founder and CEO.
Unlike an ML-based approach, Clarity does not care whether a pixel belongs to a car or the road or a tree; that kind of perception happens further up the stack. “We’re just looking at pixels, and if two cameras can see the same object, we can measure it. It’s basic triangulation, without knowing what the object is. Later, up the stack in the perception layer, you then use both the image data and the depth data to better determine what the object is and whether you need to change your decision-making,” explained Boris Adjoin, senior director of technical product management at Light.
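To make that purely geometric idea concrete, here is a minimal sketch of how a stereo matcher can measure a pixel's horizontal shift (disparity) between two rectified image rows without any notion of what the object is. The window size, disparity range, and sample values are illustrative assumptions, not Light's implementation:

```python
# Toy block-matching stereo: for a pixel in the left row, find the
# horizontal shift (disparity) that best matches a patch in the right
# row. No semantic label ("car", "tree") is needed anywhere.

def best_disparity(left_row, right_row, x, half_win=2, max_disp=8):
    """Return the disparity minimizing sum-of-absolute-differences (SAD)."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - half_win < 0:
            break  # window would fall off the left edge of the right row
        cost = sum(
            abs(left_row[x + k] - right_row[x - d + k])
            for k in range(-half_win, half_win + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# A bright feature in the left row, and the same row shifted 3 pixels.
left = [0, 0, 10, 50, 90, 50, 10, 0, 0, 0, 0, 0]
right = left[3:] + [0, 0, 0]

print(best_disparity(left, right, 6, half_win=1))  # prints 3
```

A real matcher works over whole images and handles occlusion and ambiguity, but the core measurement is this same appearance comparison; the depth then follows from geometry alone.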
And no, that’s not to say that Light considers machine learning a waste of time. “Machine learning is a great breakthrough. If you can feed machine learning this kind of sensor data, per frame, without any assumptions, that’s when real breakthroughs start to happen, because you have the scale of every structure in the world. That’s not something every machine learning model in the field today benefits from. Maybe it’s trained on 3D data, but it typically doesn’t get very much 3D data, because as you’ve seen with lidars, they are accurate but sparse, and they don’t see very far away,” Velagaleti remarked.
Meanwhile, Tesla’s system derives depth from a single camera. “Tesla claims a billion miles of driving, and they still have these flaws that we see very often with the latest release of FSD. Well, that’s important, because you’re asking way too much of ML to have to infer things like depth and the structure of the world. It’s a bit backwards. And again, I think it made a lot of sense for people to get something on the market that does something.
“But if we really want the next step change to happen, you can either hope that there might be a lidar on the market that provides the kind of density you see here at a price everyone can afford, that’s robust in automotive environments, and that can be manufactured at volume; or we can add another camera, add some signal processing, and do it quickly. We can’t just keep asking a single camera with inferencing, or structure from motion, or another technology like that, to deal with a very complex world. And it is a complex application; driving a car is not easy, and we don’t let a four-year-old drive a car,” said Velagaleti.
“I think Tesla has done a good job of highlighting how sophisticated a training system they have, and that’s very impressive. I don’t think we’re here to criticize Tesla. They’ve made their own chip, which is in and of itself non-trivial. So there is a lot that is very impressive in Tesla’s approach. I think people just sadly assume that a Tesla does certain things that Tesla doesn’t even claim to do; Tesla does not do stereo,” Velagaleti explained.
What about Subaru’s EyeSight stereo vision?
Grannan pointed out that the principles of stereo vision have been well understood for quite some time. He admitted that Light has not done as good a job as it could have done in explaining how its system differs from Subaru’s EyeSight camera-only ADAS, which uses a pair of cameras mounted in a device that lives behind the rearview mirror at the top of the windshield.
“Really, what we’ve solved comes down to two things. One is the ability to handle these wide baselines, cameras far apart, because when your cameras are far apart, you can see farther; it’s just physics. In the Subaru EyeSight, they have to keep the cameras close together because they have not figured out how to keep them calibrated. It becomes a very difficult problem when they are far apart and not on the same piece of metal. That’s one. The other thing we’ve done: most stereo systems are very good at edge detection, seeing the silhouette of the car or the person on the bike, and then just assuming that the depth is the same throughout, right? That’s called regularization, or filling in. We developed signal-processing algorithms that allow us to get depth for every pixel in every frame. That’s much richer detail,” Grannan explained.
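The baseline geometry Grannan describes follows from the standard pinhole stereo relation Z = f·B/d: depth error grows with the square of distance and shrinks as the baseline B widens. The sketch below uses made-up focal-length and matching-error numbers purely to illustrate the trend; they are not Light's or Subaru's actual figures:

```python
# Pinhole stereo geometry: why a wider baseline sees farther accurately.
# All parameter values below are hypothetical, for illustration only.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Classic stereo relation: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

def depth_error(focal_px: float, baseline_m: float, depth_m: float, disp_err_px: float) -> float:
    """First-order depth uncertainty: dZ ~= Z^2 * dd / (f * B)."""
    return depth_m ** 2 * disp_err_px / (focal_px * baseline_m)

focal_px = 2000.0  # focal length in pixels (assumed)
disp_err = 0.25    # quarter-pixel matching error (assumed)
target = 150.0     # object 150 m away

for baseline in (0.4, 1.2):  # narrow vs. wide camera separation
    err = depth_error(focal_px, baseline, target, disp_err)
    print(f"baseline {baseline} m -> depth error ~{err:.1f} m at {target} m")
```

With these assumed numbers, tripling the baseline cuts the depth error at 150 m to a third, which is the physics behind the "farther apart sees farther" claim.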
“I think we really are the first robust implementation of stereo,” Velagaleti said. “What you will find across the board, Continental, Hitachi, and I won’t be too specific about any provider’s technology, is that they separate their cameras by only about 40 centimeters. And the reason they do that is that’s about the size of rigid assembly they can support. They have to build it very rigid in order for it to work,” Velagaleti explained.
“And if you think about it, the problem gets exponentially more difficult as you go further apart, as Dave said, because what’s the size of a pixel in a camera module today? It’s about three microns. That’s very small, right? Now we’re trying to see objects very far away. So if you place cameras far apart, the intent is to see something precisely at long range, which is what matters in most applications. But if you are now off by a few pixels, meaning you are off by just a few microns, you do not get accurate depth,” Velagaleti said.
“So what Light has solved, which is where the robustness comes from, is that for every image we figure out where the cameras really are and how the images relate to each other, and then we derive the depth very precisely. So basically we are robust, right? And that’s how you can literally place two independent cameras without anything rigid between them. And we’re still working at a sub-pixel level, which means we are sub-micron in terms of how we figure out where things are in the world. And that’s just never been done before,” Velagaleti continued.
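Light's actual method is proprietary, but "sub-pixel" disparity is a well-established concept in stereo matching. One standard public technique, shown here only to illustrate the idea and not as Light's algorithm, fits a parabola through the matching cost at the best integer disparity and its two neighbors to locate the minimum at fractional precision:

```python
# Sub-pixel disparity refinement via parabolic interpolation: a common,
# publicly known technique (not necessarily what Light does).

def subpixel_peak(cost_left: float, cost_best: float, cost_right: float) -> float:
    """Given matching costs at integer disparities d-1, d, d+1 (d being
    the integer minimum), return the fractional offset of the true
    minimum, in the open interval (-0.5, 0.5)."""
    denom = cost_left - 2.0 * cost_best + cost_right
    if denom == 0.0:
        return 0.0  # degenerate (flat) cost curve; keep the integer result
    return 0.5 * (cost_left - cost_right) / denom

# Costs sampled at disparities 9, 10, 11; the asymmetry pulls the
# estimated minimum slightly to the right of 10.
offset = subpixel_peak(4.0, 1.0, 2.0)
print(f"refined disparity: {10 + offset:.3f}")  # prints 10.250
```

Refining disparity below one pixel is what lets a matcher resolve position changes smaller than the physical pixel pitch, which is presumably the sense in which Velagaleti means "sub-micron."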
This calibration process is apparently simple to perform at the factory, but the exact details of how Light does it are a trade secret. “But by virtue of being able to solve our calibration, it gives us resilience and it gives us flexibility. That’s how I can tell you that any customer who comes to us, OEM or Tier One [supplier], can decide where they want to place their cameras, how many cameras they want to place, and what kind of cameras they want to use. That’s because we solve for calibration,” Velagaleti said.
“The other important thing that I would like to highlight, which is very different from others: we do not make assumptions. So what Dave said about edge detection and filling in, right? Basically, most stereo systems today measure a certain part of what they see and then basically guess everything in between, because they actually cannot do what we are capable of doing, which is measuring every pixel we get and deriving the depth for it,” Velagaleti told me.