Believe – Part 3, Having Vision
Any intelligence is effectively useless without an ability to perceive that which is external to the intelligence. Humans often think that taste, touch, sight, hearing, and smell cover the gamut of what’s important to sense. That sort of limit on imagination also extends to furnishing AI and AAI with mechanisms with which to experience their world. With the Believe project, we’ll try to do better. While much of what is discussed here concerns vision, many of the ideas can be applied to other senses as well.
Nature
I often look with skepticism when designers quickly jump to natural analogs. This isn’t to suggest that millions of years of evolution can’t achieve marvelous things, but often what is achieved isn’t readily applicable to non-biologic organisms. In fact, given the constraints imposed by natural selection, what is a great adaptation in one environment for a species will spell its doom in another environment.
That being said, I’m going to rely on nature as a model for AAI vision (and other senses). Why? Because it’s been demonstrated that vision has independently evolved several times on several evolutionary branches. The end result has been remarkably consistent. This tends to imply that this is the most reasonable course to take for our project.
Structure
We often become overly concerned with the light sensors themselves (e.g. cameras). I’m not at all certain that is the correct formula. Basically, the gathering surface (plane, film, retina, etc.) is pretty simple: a bundle of receptors sensitive to particular frequencies of electromagnetic radiation. These receptors transmit the presence or absence of EM and optionally the relative degree received. This data can be either analog or digital.
The mechanics and configuration of the light-gathering system are also quite flexible. Many predators prefer to use two single-lensed eyes situated for an overlap in coverage sufficient to obtain mechanical and/or visual parallax data in order to accurately place a target in the external world. Prey animals often sacrifice this stereoscopic triangulation in favor of increased visual coverage. The invertebrate world offers several variations on this theme, as well as multi-lens-per-eye and multi-eye systems. Depending on the needs of the organism, all are effective.
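To make the stereoscopic triangulation concrete, here is a minimal sketch of how overlap between two eyes turns into distance. It assumes an idealized pinhole model with two parallel, forward-facing eyes, which is my simplification and not a claim about how any animal actually computes depth. The function name and the example numbers are hypothetical.

```python
def depth_from_disparity(focal_length_px: float,
                         baseline_m: float,
                         disparity_px: float) -> float:
    """Distance to a target under an idealized two-eye pinhole model.

    focal_length_px: focal length expressed in pixels (assumed value)
    baseline_m:      separation between the two eyes, in meters
    disparity_px:    horizontal shift of the target between the two views
    """
    if disparity_px <= 0:
        # No shift between the views means the target is effectively at infinity.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical numbers: 800 px focal length, eyes 6.5 cm apart.
near = depth_from_disparity(800.0, 0.065, 40.0)  # large shift, close target (about 1.3 m)
far = depth_from_disparity(800.0, 0.065, 4.0)    # small shift, distant target (about 13 m)
```

The trade-off in the paragraph above falls out of the formula: without overlap there is no disparity to measure, so a prey animal with wide coverage gives up this distance estimate.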
Processing
The key to vision isn’t the how of image gathering, but the processing and interpretation of the images gathered. This is where nature gives us a lot of insight and where, sadly, so many of our implementations have come up wanting.
Our typical approach is rather simplistic: we capture an image, or small sequence of images, and then throw a lot of processing cycles in a brute-force attempt to interpret the data we have gathered. This method is often very inaccurate in a general sense, though it can be made to work for specific well-defined tasks.
Why does nature do this so much better than we can? Basically because we’ve approached the problem from too much of a hardware + code point of view. We forget to understand how nature has dealt with the problem.
Generalized computer vision cannot exist without a sufficient AI or AAI foundation.
Let me say that again: generalized computer vision cannot exist without a sufficient AI or AAI foundation. We have been wedded to the idea that film, CCDs, and other receptors are sufficient substitutes for the receptors in a retina (or similar biologic structure). This simply is not the case. Organic vision (as well as taste and smell) relies on special-purpose neurons to receive and pre-process EM radiation.
That’s right. Generalized computer vision isn’t about wiring a camera to a computer, it’s about teasing out neurons to act as the wires and receptors. These aren’t just simple conduits to allow data to move from camera to processor, but a string of pre-processors that process data along their entire route until they reach the larger neural processor for interpretation.
Why?
Because organisms require vision to be efficient and accurate. Their lives depend on it.
Sifting the Data
OK, so there is pre-processing. What sort of pre-processing?
That’s the thing, you see: we still aren’t entirely sure. In an obvious physical sense, we know that in humans the optic nerves from both eyes merge and then separate before reaching the portion of the brain dedicated to vision processing. Obviously this merging of data conveys an advantage.
In studies of the visual cortex, we find a lot of interesting specializations. Amazingly, it seems much of the processing is in what we could consider grayscale (or possibly just simple black and white). Motion is detected: vertical, horizontal, angular, spatial. Color is sorted through. Visual cues are tied directly to neurons throughout the brain (the visual neurons are localized, but there is some desegregation to other parts of the brain as well).
While on the surface it appears to be not that different from some of the methods we use now, the similarity is only superficial. A fair amount of this separation of visual data is happening not in the “brain”, but in the optic pathways. As the data flows from receptor to processor, it is being sorted both by the physical wiring and by neural experience.
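As a rough illustration of data being sorted on its way from receptor to processor, here is a sketch of a pathway built from small stages: a grayscale stage, a motion stage that remembers the previous frame, and a threshold stage. The stage names, the frame layout, and the 0.5 threshold are all my assumptions. Real optic pathways are adaptive neurons rather than fixed functions, so treat this only as a picture of the data flow.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (red, green, blue), each in [0, 1]


def to_grayscale(frame: List[List[Pixel]]) -> List[List[float]]:
    """First stage: discard color, keep overall brightness."""
    return [[(r + g + b) / 3.0 for (r, g, b) in row] for row in frame]


class MotionStage:
    """Second stage: pass along only what changed since the previous frame."""

    def __init__(self) -> None:
        self.previous = None

    def __call__(self, gray: List[List[float]]) -> List[List[float]]:
        if self.previous is None:
            diff = [[0.0 for _ in row] for row in gray]
        else:
            diff = [[abs(now - before) for now, before in zip(row, prev_row)]
                    for row, prev_row in zip(gray, self.previous)]
        self.previous = gray
        return diff


def threshold(level: float) -> Callable[[List[List[float]]], List[List[int]]]:
    """Third stage: report presence or absence of a change above `level`."""
    def stage(frame: List[List[float]]) -> List[List[int]]:
        return [[1 if value > level else 0 for value in row] for row in frame]
    return stage


def run_pathway(stages: Sequence[Callable], frame):
    """Push one frame through every stage in order."""
    for stage in stages:
        frame = stage(frame)
    return frame


pathway = [to_grayscale, MotionStage(), threshold(0.5)]
dark = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
flash = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]]
first = run_pathway(pathway, dark)    # nothing to compare against yet
second = run_pathway(pathway, flash)  # only the pixel that changed is reported
```

By the time the data reaches whatever plays the role of the visual cortex, it is already a short list of "something moved here" signals rather than a raw image.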
Neural Sensation
In many ways, it’s simply our limited mindset getting in the way. We think of nerves as wires because that is the analog that is taught when we are in school. The peripheral nerves, especially, are mostly if not exclusively transmissive. Perhaps the problem comes from the name “optic nerve”, which essentially divorces it in our minds from the central nervous system. If, instead, we think of the optic pathways as specialized neurons, our perspective changes.
See, people wonder sometimes how people with injuries to their spinal cord still have some functions available to them below the injury. After all, isn’t the spinal cord just a bundle of nerves not that much different from that zip-tied mass of network cables coming from the IT department? Actually, it’s very different, what with it actually being part of the central nervous system and all. Many locally mundane functions are situated outside of the brain proper. This is possible because there is some neural processing going on between “ordinary” nerves and the spinal neurons (though the brain stays informed).
That’s the crux of much of AAI sensation: having distributed neural control…albeit with neurons more specialized to efficient transmission than processing. Brains are compact for efficient processing of information via their trillions of connections. That level of complexity is inefficient the farther away you travel from this central computer. This gets demonstrated by the most neuron-like nerves being those closest to the brain, while the less neuron-like nerves are more distant, with intermediary neurons in the spinal cord acting as a (ahem) backbone for efficient communication.
Development
All of this brings the realization that senses and cerebral neurons can’t be developed separately. As they share structure and function, they have to grow from a standard base.
Nature has established a pattern that some neurons are better suited for some tasks than others. I’m not prepared to accept that as a requirement for AAI, but there is some logic in considering a degree of specialization. To this end, I’d suggest that along with the generalized design there also be a “suggestion” section of the code where a collection of neurons can be instructed to focus on a task unless it is forced by the larger network to alter itself to do something else. It would be using this mechanism that vision can be developed from general neurons.
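A minimal sketch of what such a “suggestion” section might look like follows. Everything here is hypothetical: the `NeuronGroup` class, the idea of a numeric "pressure" coming from the larger network, and the 0.8 threshold are placeholders I am using to illustrate the mechanism, not a settled design.

```python
from dataclasses import dataclass, field


@dataclass
class NeuronGroup:
    """A collection of general neurons carrying a suggested specialization."""
    suggested_task: str
    override_threshold: float = 0.8  # hypothetical: how hard the network must push
    current_task: str = field(init=False)

    def __post_init__(self) -> None:
        # The group starts out following its suggestion.
        self.current_task = self.suggested_task

    def request_reassignment(self, task: str, pressure: float) -> bool:
        """The larger network asks the group to do something else.

        The suggestion holds unless the pressure is strong enough to force a change.
        """
        if pressure >= self.override_threshold:
            self.current_task = task
            return True
        return False


group = NeuronGroup(suggested_task="edge detection")
ignored = group.request_reassignment("motion detection", pressure=0.3)  # suggestion holds
forced = group.request_reassignment("motion detection", pressure=0.9)   # network wins
```

The point of the design is that the suggestion is a default, not a constraint: the neurons stay general underneath and can be repurposed.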
Just as the central “brain” has to be trained in a child-development way, so too will vision. Perhaps even more than the “brain”, vision will have to evolve: first from discerning light vs dark, to general shapes, to finally being able to process with acuity.
This method correlates directly with human development. Newborn humans have remarkably poor vision. Much of this is due to the visual neural net not being fully formed to deal with the realities of sight. Beyond issues of focus and stereoscopic vision, there is a gradual development over the course of months from seeing generalized blobs (humans seem predisposed to a crude facial blob configuration) to more specific blobs, to being able to judge distance, and finally to what we’d consider to be full acuity.
Artificial vision will follow a similar development pattern—basically it has to teach itself how to see. The advantage of the artificial system is that (theoretically) it can be replicated in future versions once a baseline of development is established.
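One way to picture the progression from light vs dark, to blobs, to acuity is to feed the same scene to the system at increasing resolution. The sketch below does this with simple block averaging. The `coarsen` helper and the choice of block sizes are my own illustration. An actual AAI would have to develop these stages through training rather than have them handed over like this.

```python
from typing import List


def coarsen(frame: List[List[float]], block: int) -> List[List[float]]:
    """Average brightness over block x block patches (larger block = blurrier view)."""
    rows, cols = len(frame), len(frame[0])
    result = []
    for r in range(0, rows, block):
        out_row = []
        for c in range(0, cols, block):
            cells = [frame[i][j]
                     for i in range(r, min(r + block, rows))
                     for j in range(c, min(c + block, cols))]
            out_row.append(sum(cells) / len(cells))
        result.append(out_row)
    return result


# A 4x4 scene: bright on the left half, dark on the right half.
scene = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]

light_vs_dark = coarsen(scene, 4)  # a single overall brightness value
blobs = coarsen(scene, 2)          # a 2x2 grid: crude shapes
acuity = coarsen(scene, 1)         # full detail
```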
Perception
Of course one of the problems of implementing vision and other sensation via specialized neurons is that the human operators will have no way of knowing what the AAI is actually perceiving. It isn’t really any easier to capture and test an established and working AAI visual system than it is to do so in a human. The AAI’s perception will be its own.
This raises significant problems if the receptors for an AAI are extended beyond normal human experience. If the AAI can not only perceive the visual spectrum but also any additional subset within the range from radio waves to X-rays, its interpretation of the world around it would be so alien to us that there would be no good way for us to fully understand that AAI’s world.
In a similar way, AAI isn’t limited to the five conventional senses. It would certainly also have some sense of data transfer. Perhaps it would also have a sense of electronic noise or other EM experience outside of vision. Maybe a sense of en-/de-cryption would be standard. It’s important to remember that however much we try to make AAIs at least as capable as us, they are still a different species.
Well, if AAIs perceive the world so differently, how can we ever hope to get information out of one? Well…they have an advantage that we don’t: they can have peripherals appended to them. They can have a “dumb” computer attached to help them with communicating with us, but instead of tapping things out on a keyboard, they’d simply network with the device. Much more efficient.
In many ways, what they do will be very much what humans do, but often it will be faster because they will be specifically designed that way. Complex communication isn’t exactly rampant in the biological world, but it’s intrinsic in the computer one. AAIs, properly executed, needn’t have our limitations.
Seeing is Believing
I’ve long felt that the core test of whether an AAI has been established lies in its ability to perceive the world. Since perception and processing are conjoined, in this context they have to essentially be extensions of each other.
There is a reason why all living creatures, even amoebas, have some ability to perceive their environment. It’s also instructive that the methods of perception do not exist separate from the organism. Perception is part of what defines an organism. An organism’s senses dictate the limits of its world.
A line of thought posits that a sufficiently general AAI system can adapt itself to however it gets data from the outside world. That may indeed be the case. I contend that it’s likely an inefficient method. Again, the main processing unit would be tasked to do a lot of mundane pre-processing. Perhaps that would result in innovative and uniquely AAI methods of solving the senses problem. I think it more likely to become a bloated specialization instead of an elegant specialization; sort of like the difference between finding the answer to 2+2 via the Monte Carlo method versus arithmetic.
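For anyone who hasn’t seen the comparison spelled out, here is a sketch of both routes to 2+2. The Monte Carlo version is one of many ways to set it up: I am assuming we estimate each number as the mean of random draws from a range centered on it, and it only works for positive inputs. The function names, sample count, and seed are arbitrary choices of mine.

```python
import random


def add_by_arithmetic(a: float, b: float) -> float:
    """The elegant route: one operation, exact answer."""
    return a + b


def add_by_monte_carlo(a: float, b: float,
                       samples: int = 100_000, seed: int = 0) -> float:
    """The bloated route: average many random draws whose means are a and b."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # uniform(0, 2a) has mean a; uniform(0, 2b) has mean b.
        total += rng.uniform(0, 2 * a) + rng.uniform(0, 2 * b)
    return total / samples


exact = add_by_arithmetic(2, 2)
estimate = add_by_monte_carlo(2, 2)  # close to 4, after 200,000 random draws
```

Both arrive in the neighborhood of the right answer, but one of them burns a couple hundred thousand random numbers to get there and still isn’t exact.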
So, that’s really the key for us. For there to be computer vision we have to extend the neural network to the actual receptors (or at least to each pixel of the receptor) and have a bundle of neurons that will both come predisposed to sort the data and adapt to that data in conjunction with messages received from the visual cortex. As the network learns, these “optic nerves” will increasingly lose their generalization as their function is set.
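Here is a very rough sketch of a single neuron in such a bundle, one per pixel. The `ReceptorNeuron` class, the starting threshold standing in for the predisposition, the boolean feedback standing in for messages from the visual cortex, and the decaying plasticity standing in for lost generalization are all assumptions made for illustration. A real implementation would be part of the larger network, not a standalone class.

```python
class ReceptorNeuron:
    """One per-pixel neuron: predisposed to a threshold, adapts, then settles."""

    def __init__(self, threshold: float = 0.5, plasticity: float = 1.0,
                 decay: float = 0.9, learning_rate: float = 0.1) -> None:
        self.threshold = threshold          # the built-in predisposition
        self.plasticity = plasticity        # how much the neuron can still change
        self.decay = decay                  # plasticity shrinks with every lesson
        self.learning_rate = learning_rate

    def fire(self, intensity: float) -> bool:
        """Pass the signal along only if it clears the current threshold."""
        return intensity > self.threshold

    def adapt(self, intensity: float, cortex_wanted_signal: bool) -> None:
        """Adjust using feedback from the (hypothetical) visual cortex."""
        if self.fire(intensity) != cortex_wanted_signal:
            # Missed a wanted signal: lower the bar. Passed noise: raise it.
            direction = -1.0 if cortex_wanted_signal else 1.0
            self.threshold += direction * self.plasticity * self.learning_rate
        # Each lesson leaves the neuron a little more set in its ways.
        self.plasticity *= self.decay


# One neuron per pixel of a hypothetical 2x2 receptor.
optic_pathway = [[ReceptorNeuron() for _ in range(2)] for _ in range(2)]

neuron = optic_pathway[0][0]
before = neuron.fire(0.45)                      # too dim for the default threshold
neuron.adapt(0.45, cortex_wanted_signal=True)   # cortex says that was worth seeing
after = neuron.fire(0.45)                       # the neuron now passes it along
```

After enough rounds of feedback the plasticity approaches zero and the threshold effectively freezes, which is the "function is set" part of the idea.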
Until we do that, computer vision will be little better than humans looking at consecutive frames in a strip of motion picture film. There is information to be had, but much is lost in the conversion.