GROK2
Introduction
Aims of the project
The primary aim of the GROK2 robot was to build a platform suitable for experimenting with SLAM methods and 3D sensing using stereo vision. The requirements were to:
- Make the robot large enough to carry a PC motherboard or laptop, so that a significant amount of computational power could be applied to vision problems.
- Mount the cameras at approximately the eye height of a typical human adult, so that the robot has a human-like perspective on the world and can inspect things which are typically of interest to people, such as desk surfaces or door handles.
- Use as many off-the-shelf hardware and software components as possible, as with previous robots, in order to avoid re-inventing the wheel.
- Use electronics components which are known to work with a Linux operating system.
- Develop the software in as generic a manner as possible, such that it could also be applied to other kinds of robot with different physical forms, but typically assuming indoor operation on an approximately flat floor.
- Take an open source approach, as usual, such that ideally any results achieved are replicable/verifiable and others can participate if they wish.
Like previous projects GROK2 was intended to be a research purpose robot (the term originally used by Donald Michie), not having any immediate or obvious commercial applications. The project began in late 2007, following on from the earlier GROK1 telerobot.
Some background history
In the mid 2000s it looked as if PC based robotics was on the verge of emerging in a similar manner to the way that home computers did in the 1980s. The ideal scenario would be that anyone with sufficient interest could buy a generic robot platform at a low cost then begin experimenting with it using standard software tools and methodologies which are already familiar and readily available, and from an initial community of tinkerers a new industry would begin. A persistent gripe which I'd had for many years previously was that in robotics there was very little by way of standing on the shoulders of giants and too much re-invention from scratch, which meant that results were not easily comparable or verifiable. In a situation where there is little opportunity to verify, all sorts of grandiose claims and proclamations can be made by researchers, and you really just have to take their word for it as an article of faith.
A company called White Box Robotics, founded by Tom Burick, looked credibly as if it could deliver a generic PC robot platform and Burick himself did a respectable job of popularising the idea that the time for consumer robotics had come. I had been expecting (and waiting) to purchase one of these robots for experiments with SLAM and stereo vision, but unfortunately White Box never managed to deliver at the low price point initially envisaged.
So, failing a completely off-the-shelf robot equivalent of an IBM compatible PC architecture, by 2006 I knew that I'd have to build my own. I had an existing PC Rover chassis which could have been retrofitted with newer electronics, but eventually decided on a newer AL-101 chassis which was taller and therefore would provide a more sturdy platform for mounting cameras at a head height of approximately 1.5 metres.
Source code
Prerequisites
Construction
Parts list
Control system
1 x PhidgetAdvancedServo 8-Motor
1 x Household light switch (on/off switch)
7 x 5 pin male to female DIN leads
1 x Empty teabag container tin (Tetleys)
Sensor payload
1 x PhidgetInterfaceKit 8/8/8 w/6 Port Hub
1 x Microsoft Kinect
Chassis
1 x AL-101 Mobile Robot Chassis with HEDS encoders
1 x 100mm diameter drainpipe
2 x RC Servos
1 x Roll of 5 or 6 core telecomms cable
4 x Green momentary switch, normally open (buttons: start, location, learn, set, power)
1 x Red momentary switch, normally open (stop button)
2 x Terminal blocks (small)
2 x Terminal blocks (large)
Power system
1 x 20Ah deep cycle golf buggy battery
1 x 6A 12V battery charger
1 x 500W 12V DC to 220V AC inverter
1 x 12V P4 Power Cable and Molex Y Splitter
1 x 3A 12V Transformer
Information Processing system
1 x Intel D525MW Atom Mini-ITX Motherboard
1 x 2GB DDR3 1333 SODIMM
1 x 16GB USB flash drive with Linux Mint installed
1 x PC speaker
Deprecated
1 x USB to I2C communications module
Why use Phidgets?
Seasoned roboticists will by now surely be asking - why use Phidgets? Phidgets are relatively expensive compared to more traditional home-brew electronics or micro-controller based approaches. Although I could have gone down the home-brew route - for example making my own motor control hardware based upon existing documented electrical schematics - I wanted to avoid this kind of development as far as possible. I considered it better to stick to something which could be assembled relatively quickly by someone with little or no electronics expertise, even if that results in marginally more expense, since it facilitates independent reproduction of the same results and allows me to concentrate on the software aspects of the project. The Phidgets also have a very good software API which seems quite straightforward to deal with, and works without any fuss on Linux operating systems.
Head Designs
Initial version of the head
A first idea for the GROK2 head was to simply use a single stereo camera mounted on a square Y-shaped frame. A tilt servo (on the right) could tilt the camera through a 180 degree range of motion, such that the camera would be able to see forwards, backwards, and at various angles down towards the floor (including seeing the robot itself). Although a dual stereo camera design was used later, this remains a possible option which would minimise the complexity and number of cameras used. Drawbacks to this design would be excessive USB cable flexion and an uneven weight distribution, although a sufficiently robust construction would overcome the asymmetry issue and a sub-miniature sized servo could be used. With this design, protective guarding to prevent damage to the cameras might also be difficult to engineer, making it ok for research purposes but not ideal for use in any situation which might present physical challenges.
The initially built version of the head is shown here, with a single stereo camera mounted. It used Creative Webcam NX Ultra cameras, which were only USB version 1 devices and which as of 2008/9 were no longer supported by the versions of Linux which I was using. The cameras can be panned and tilted. Unlike previous robots, which were more anthropomorphic, I deliberately kept to a very minimalist mechanism, which could easily apply to other types of robot and would not be hard to independently replicate.
Dual stereo camera head
The current version of the head uses two stereo cameras (four cameras in total) looking in opposite directions. This provides a large amount of range data which can be used for localisation or mapping and is also well balanced in terms of mass distribution, which places minimal stress on the pan axis servo. Both stereo cameras are oriented using the same pan and tilt axis servos. A covering can then be placed over the top to protect the cameras from any collisions or glancing blows.
The cameras used are Logitech Quickcam 9000s, and their mounting is somewhat peculiar due to the odd shape of the circuit boards. The image quality is good compared to other webcams which have been previously tried, although the lack of manual lens focus is not ideal. The separation (baseline) between the cameras is 120mm.
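With a known baseline, range follows from disparity by the standard rectified pinhole stereo relation Z = f × B / d. The sketch below is only illustrative: the 120mm baseline is the figure given above, but the focal length in pixels is an assumed value rather than a measured property of these cameras, and the function names are hypothetical.

```python
BASELINE_MM = 120.0       # camera separation stated in the text
FOCAL_LENGTH_PX = 700.0   # ASSUMPTION: hypothetical focal length in pixels


def disparity_to_range_mm(disparity_px, focal_px=FOCAL_LENGTH_PX,
                          baseline_mm=BASELINE_MM):
    """Range in mm for a disparity in pixels, assuming rectified pinhole stereo."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def range_error_mm(disparity_px, disparity_error_px=1.0):
    """How much the range shrinks if the true disparity is larger by the given error."""
    measured = disparity_to_range_mm(disparity_px)
    shifted = disparity_to_range_mm(disparity_px + disparity_error_px)
    return measured - shifted
```

Because range is inversely proportional to disparity, a one pixel matching error matters far more for distant surfaces than for nearby ones, which is one reason a wider baseline helps at longer ranges.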
RGBD Sensor head
In January 2011 an RGBD Kinect sensor head was added. This used the simple pan and tilt concept which I originally thought of using for stereo vision. This sensor is considerably heavier than the previous stereo camera arrangement, so some extra attention needed to be paid to avoiding any wobbliness. To overcome the issue of uneven weight distribution a counterweight was initially tried, but a better solution was to add a collar made from aluminium sheet on which the pan axis could rest. This adds some small additional friction to the pan axis, but not enough to cause problems.
Another issue with this head design is that the sensor cannot look directly downwards, and viewing when tilted downwards could be partly obscured by the upper "neck" section. To give the robot a better downward view, such that it can view the floor ahead of it, the front section of the "neck" was cut away and a flat piece of aluminium substituted.
An advantage of using this sensor is that it will allow future development to leverage any contributions made by others using the ROS system. Combined effort via the open source methodology is far more likely to lead to success than the kind of bunker mentality or "not invented here" syndrome which characterised a lot of robotics research in the past.
Other experimental heads
A number of other experimental head designs have been tried. For omnidirectional vision silver Christmas decorations were used, which are very cheap and easy to acquire.
Simple omnidirectional vision
Here a camera looks at a single hemispherical mirror. This provides a good all-round view of the environment.
This is good for detecting motion in the environment which might indicate the presence of a person, but no structure of the environment can be inferred without movement of the robot. Attempting to project the image onto the ground plane and then employ dodgy heuristics to try to infer the locations of verticals may work in some environments, but doesn't seem to be a good strategy in general.
Omnidirectional stereo
Here a single camera looks at a pair of hemispherical mirrors, where a hole is cut in the lower mirror such that it allows both mirrors to be observed at the same time. This provides a vertical stereo baseline.
- View from omnidirectional stereo camera.
- Unwarped view from omnidirectional stereo camera.
- Anaglyph from omnidirectional stereo camera.
In theory this design has some promise, since it would be possible to obtain stereo ranges for the entire surroundings using only a single camera. In practice there are some limitations. Any flexibility in the tower structure can cause big optical changes which render the camera calibration useless, and the loss of resolution due to the high distortion makes accurate stereo ranging difficult. If this design is to succeed then it requires more careful engineering.
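As a rough illustration of how an unwarped view can be produced from a mirror image, the sketch below maps each pixel of a rectangular panorama back to a position in the circular mirror image using a simple polar lookup. It assumes a linear relationship between panorama row and radius, which ignores the real mirror profile, and the function name and all parameters are hypothetical rather than taken from the software actually used.

```python
import math


def unwarp_lookup(u, v, pano_width, pano_height, centre, r_inner, r_outer):
    """Source image position (x, y) for panorama pixel (u, v).

    ASSUMPTIONS: the mirror appears as a ring centred on `centre`, and radius
    varies linearly from r_inner (row 0) to r_outer (last row).
    """
    cx, cy = centre
    # Each panorama column corresponds to one bearing around the mirror.
    angle = 2.0 * math.pi * u / pano_width
    # Each panorama row corresponds to one radius within the ring.
    radius = r_inner + (r_outer - r_inner) * v / (pano_height - 1)
    return (cx + radius * math.cos(angle), cy + radius * math.sin(angle))
```

Filling the panorama then amounts to calling the lookup for every (u, v) and sampling the source image at the returned position. With two mirrors, the same lookup would be applied twice with different radius ranges.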
Omnidirectional stereo with multiple mirrors
Another possible stereo approach, rather than using multiple cameras, is to use multiple mirrors. Provided that enough resolution is available this is effectively the same as a multi-camera system, and avoids the issue of camera synchronisation.
In principle the increased number of views should help to disambiguate stereo matches (similar to trinocular vision). In practice I found that it's difficult to make this system work well. A video showing the view from this head arrangement can be seen here.
Poor Man's LIDAR
Another alternative was to try to make a LIDAR-type system, although at a much lower cost than was usually the case (at least prior to 2010). One idea involved using long range infrared sensors - the Sharp GP2Y0A700K0F - with a range of up to 5.5 metres.
This worked up to a point, and a representative video can be seen here. Like all 2D scanning LIDAR systems this delivers only a very limited amount of information about the environment per sensor sweep. The main issue is that the sensor sweep needs to be quite slow, otherwise excessive range uncertainty results (the response of the sensor is not very rapid). It's also quite noisy to have servos continually moving.
Another type of Poor Man's LIDAR which was tried is a method at least a decade old: a downward looking camera observes a laser line projected horizontally from the base of the robot. Only a small amount of experimentation was done on this. Whilst it would definitely have been a viable method of obtaining range data similar to a 2D LIDAR, the lasers which were available were not particularly safe (class 3a), and so I discounted this option on safety grounds alone. There are also limitations in terms of camera resolution for detecting the laser line. Natural illumination can also make the laser line difficult to detect, although a suitable optical filter tuned to the wavelength of the laser could perhaps have been used to reduce this problem.
Summary
After having tried various alternative head designs probably the RGBD approach will turn out to be the most practical, in terms of depth resolution, overall cost and minimisation of mechanical complexity. Although the Kinect sensor is quite large and heavy, since this is a first generation device it seems extremely likely that smaller and cheaper equivalents will become available due to competition in the games console and webcam markets and a new level of performance expectations amongst consumers.
Seeing in 3D
With stereo vision
Camera Calibration
Calibration of the stereo cameras is performed using the v4l2stereo utility, which uses the OpenCV camera calibration routines. Various camera calibration methods were tried, in an attempt to devise a procedure which was as simple as possible for the user, but ultimately the OpenCV based method seemed to be a good compromise between accuracy and ease of use.
A video showing the calibration procedure is available here. The command is typically of the form:
v4l2stereo --dev0 /dev/video-front-left --dev1 /dev/video-front-right --calibrate "6 9 24" --calibrationimages 50
Here the calibrate option specifies the number of squares across and down the pattern, as presented to the cameras, with the third parameter being the square size in millimetres. The calibration pattern can be downloaded here.
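For reference, OpenCV's chessboard routines work with interior corners rather than squares, so a pattern of 6 by 9 squares has 5 by 8 interior corners. The sketch below builds the planar reference coordinates that a chessboard calibration would use for the "6 9 24" pattern. Whether v4l2stereo performs this conversion in exactly this way is an assumption, and both function names are hypothetical.

```python
def parse_calibrate_option(text):
    """Split a '<across> <down> <size_mm>' string into typed values."""
    across, down, size_mm = text.split()
    return int(across), int(down), float(size_mm)


def chessboard_object_points(squares_across, squares_down, square_size_mm):
    """Planar (x, y, z) positions in mm of the interior corners of the pattern."""
    corners_across = squares_across - 1   # interior corners per row
    corners_down = squares_down - 1       # interior corners per column
    return [(col * square_size_mm, row * square_size_mm, 0.0)
            for row in range(corners_down)
            for col in range(corners_across)]


points = chessboard_object_points(*parse_calibrate_option("6 9 24"))
```

The square size only sets the scale of the result, which is why measuring the printed pattern accurately matters if ranges are to come out in real units.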
To roll, or not to roll
Why not mount the cameras at a 45 degree roll angle, as with the stereo cameras on SEEGRID robots? As it turns out the rolled geometry is just an artefact of the stereo correspondence algorithm used by that system. More recent dense stereo correspondence methods aren't heavily reliant upon detecting features which are oriented non-parallel to the baseline.
Point clouds
From a fixed location the robot can acquire a detailed view of its surroundings by panning and tilting the cameras in situ. The images below show a colour point cloud model generated from the forward stereo camera by combining six views at different pan and tilt angles. Using both forward and rear cameras a near complete view of the environment can be generated in a short amount of time. Some other examples can be seen here. The range accuracy is not as high as could be expected from a laser scanner, but even without grid mapping or sensor modelling the raw ranges still give a pretty good idea of the shape of the environment, and a significant percentage of image pixels can be ranged using dense stereo methods.
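Combining views taken at different pan and tilt angles comes down to rotating each view's points into a common robot frame before merging them. The following is a minimal sketch under stated assumptions: axes are x forward, y left and z up, positive tilt looks downwards, the pan and tilt axes pass through the camera's optical centre, and the head sits at the approximate 1.5 metre height mentioned earlier. None of the names come from the actual GROK2 software.

```python
import math

HEAD_HEIGHT_M = 1.5  # ASSUMPTION: approximate head height, used as a fixed offset


def camera_to_robot(point, pan_deg, tilt_deg, head_height_m=HEAD_HEIGHT_M):
    """Rotate a camera-frame point (x fwd, y left, z up) into the robot frame."""
    x, y, z = point
    pan = math.radians(pan_deg)
    tilt = math.radians(tilt_deg)
    # Tilt: rotation about the y axis (positive tilt pitches the view downwards).
    xt = x * math.cos(tilt) + z * math.sin(tilt)
    zt = -x * math.sin(tilt) + z * math.cos(tilt)
    # Pan: rotation about the vertical z axis.
    xr = xt * math.cos(pan) - y * math.sin(pan)
    yr = xt * math.sin(pan) + y * math.cos(pan)
    return (xr, yr, zt + head_height_m)


def merge_views(views):
    """Merge (pan_deg, tilt_deg, points) views into one robot-frame point list."""
    cloud = []
    for pan_deg, tilt_deg, points in views:
        cloud.extend(camera_to_robot(p, pan_deg, tilt_deg) for p in points)
    return cloud
```

In practice the offsets between the servo axes and the camera would need to be included as well, and any error in the reported servo angles shows up directly as misalignment between the merged views.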
Videos
Some videos of point clouds created from a fixed location.
Mapping the Environment
Considerable progress was made on the mapping problem in early 2011, due to a happy confluence of events - the availability of the first affordable RGBD sensor (Kinect) and the relative maturity of the ROS robot operating system. For the first time this opened up the opportunity for a hobby robot with an onboard PC and inexpensive sensor to be able to achieve comparable mapping performance to research or commercial systems previously costing many thousands of dollars. It was also an example of innovation arising directly from an open source methodology in an area where proprietary efforts, such as Microsoft's Robotics Studio or Webots, had been struggling to gain traction.
Gmapping
In early 2011 I used the gmapping SLAM algorithm to generate 2D occupancy grid maps from the 3D point cloud data returned from the Kinect RGBD sensor. This is a particle filter based method, and produced quite acceptable results, probably good enough for practical use.
This involves creating a fake (virtual) laser scanner positioned at the base of the robot from the 3D point cloud produced by the RGBD sensor. The RGBD sensor is angled downwards towards the ground such that it obtains a good view of the immediate vicinity, and anything more than 20cm tall is considered to be an obstacle and used to update the fake laser scan. Converting the 3D points into a laser scan makes it easy to utilize a variety of well tested legacy SLAM algorithms intended for 2D scanning laser rangefinders.
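The idea of the fake laser scan can be sketched as follows: discard points at or below the 20cm height threshold, then keep the nearest remaining point in each angular bin. This is an illustrative sketch and not the code actually used on the robot. It assumes the points have already been transformed into the robot base frame (x forward, y left, z up, in metres), and the beam count, field of view and maximum range are hypothetical values.

```python
import math

OBSTACLE_HEIGHT_M = 0.20  # height threshold given in the text


def points_to_scan(points, num_beams=180, fov_deg=180.0, max_range_m=5.0,
                   min_height_m=OBSTACLE_HEIGHT_M):
    """Collapse base-frame 3D points into a list of per-beam ranges."""
    fov = math.radians(fov_deg)
    half_fov = fov / 2.0
    beam_width = fov / num_beams
    scan = [max_range_m] * num_beams  # beams with no obstacle report max range
    for x, y, z in points:
        if z <= min_height_m:
            continue  # floor or low clutter, not treated as an obstacle
        bearing = math.atan2(y, x)
        if not -half_fov <= bearing < half_fov:
            continue  # outside the virtual scanner's field of view
        beam = min(int((bearing + half_fov) / beam_width), num_beams - 1)
        rng = math.hypot(x, y)
        if rng < scan[beam]:
            scan[beam] = rng  # keep the nearest obstacle along this beam
    return scan
```

Keeping only the nearest return per beam matches what a real 2D scanner would report, which is what allows the result to be passed to SLAM algorithms written for laser rangefinders.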
To generate the map the robot is manually driven through the environment using an attached joystick, creating the map and localizing within it as it goes.
A disadvantage of this method, as with many other SLAM algorithms, is that the map cannot be continuously updated over the operational lifetime of the robot. So there needs to be a discrete mapping phase, after which the resulting map is saved and cannot subsequently be modified.
Factoids
- The robot's name comes from the term invented by Robert A. Heinlein in the book Stranger in a Strange Land.
- During occasional electrical power cuts the GROK2 robot doubles as a candle stand. Its metallic upper surface makes it unlikely to catch fire.
- The drainpipe neck is actually two pieces of plastic guttering connected together. Drainpipe would have been ideal, but I was unable to buy a drainpipe short enough to fit into the back of my car.
- The stereo cameras tilt on ordinary door hinges.
- GROK2 is also known as "the giraffe" or "ET".
- The "neck" is bolted to the chassis with steel brackets normally used to support shelving.
- The control electronics (mostly Phidgets devices) are contained within a tin box known as the "Tetley brain", because it was previously used to store teabags. This makes the electronics easily portable to other chassis if necessary.
- Whilst testing omnidirectional stereo vision the pan axis servo electronics blew up after it hit a hard stop (the mounting bracket) and continued to try driving through it.
References
- Robot Spatial Perception by Stereoscopic Vision and 3D Evidence Grids, Hans Moravec
- Efficient Large-Scale Stereo Matching, Andreas Geiger, Martin Roser and Raquel Urtasun



