Introduction to the 3D Imaging of Pre-Columbian Artifacts
Philosophy is the theory of multiplicities, each of which is composed of actual and virtual elements. Purely actual objects do not exist. Every actual is surrounded by a cloud of virtual images.
--G. Deleuze, The Actual and the Virtual
The Library of Congress’ Geography and Map Division is home to a large collection of Pre-Columbian archaeological artifacts donated by the collector Jay I. Kislak, many of which are on display as part of the Exploring the Early Americas exhibit in the Thomas Jefferson Building here in Washington, DC. The artifacts that make up the collection range in date from the Olmec culture, around 1000 BCE, through the Classic period Maya (300-900 CE) and Aztec civilizations, and include many objects that date from the period just before contact with the Spanish in the late 15th century.
As the Curator of the Jay I. Kislak Collection, I am always looking for new and innovative ways to make this group of archaeological artifacts more accessible to scholars and educators around the world, who cannot, for whatever reason, make the trip to Washington, DC. Those who can make the trip are always welcome to use the Kislak Study Collection, located in the Geography and Map Division, where the artifacts not currently on display in the gallery are stored.
One way that I am attempting to make the collection more available is through the use of three-dimensional imaging. In the case of material artifacts, two-dimensional images, while helpful, do not allow for the complete examination of an object and can often distort its dimensionality and structure. In order to make proper attributions and comparisons with similar objects in other collections, it is critical for scholars who cannot examine an artifact in person to have realistic views “in the round” of what they are studying. To this end we have embarked on a series of experiments here at the Library of Congress that use three-dimensional structure from motion imaging to reconstruct scaled, true-to-life models of the artifacts in our collection.
Structure from motion imaging is a complex technique that allows for the extraction of three-dimensional information not only from single objects, but also from the architectural features of buildings and ruins, or from landscapes, all derived from a series of two-dimensional images. The technique was developed for computer and robot vision and is the digital equivalent of the task that the brain and eye perform as we humans move through a three-dimensional world using two-dimensional projections.
Figure 1: Hollow Kneeling Male Figure, West Mexico, Jalisco, Terminal Pre-Classic Period, 200 BCE- 300 CE. Kislak Collection 0012. Geography and Map Division, Library of Congress.
I have called this short review of the techniques and applications of this type of imaging to archaeological objects “In Maudslay’s Shadow,” in order to recall his use of the most up-to-date photographic technology of the late 19th century. Alfred Maudslay (1850-1931), over the course of several decades and endless difficult journeys into the jungle, took images of many of the most important Maya archaeological sites and inscriptions then being unearthed in Central America. Besides his photographs, which were critical to the decipherment of the then-unreadable Maya script, he also made many three-dimensional casts of stelae. Maudslay’s images and models ushered in an amazing time of discovery in early American archaeology and revolutionized the practice of both field and museum imaging. For this reason, those of us trying to use new imaging technologies today stand in Maudslay’s long shadow.
Figure 2: Maudslay’s Photograph of Stela H, Copan, 1885,
Geography and Map Division, Library of Congress
Techniques and Examples
The calculations required for structure from motion (SfM) imaging are algorithmically complicated, and are related to the photogrammetric techniques used to sort out the difficult geometry of remote sensing images of the Earth and other planetary bodies taken from satellites.
To make a three-dimensional model of an archaeological object using SfM, a group of two-dimensional images is taken from a variety of vantage points and processed through a pipeline of computer programs that creates a three-dimensional dense point cloud representing all the various surfaces that make up the artifact.
Figure 3: To begin the process of SfM imaging, a series of photographs is taken with a high-resolution digital camera from various vantage points, in order to get a complete set of 2D images of the object from which point correspondences are calculated and used to reconstruct the 3D object.
The first step in any attempt to make three-dimensional models from two-dimensional photographs is the acquisition of the digital images themselves. As in all things computational, the better the input data, the better the resulting model. For structure from motion imaging, the digital inputs must be taken by either rotating the object or walking around it. Most of the images should overlap, and be taken from a variety of angles, to ensure that the resulting point clouds cover the entire surface of the object being modeled. The images are used to determine the keypoints, features, and point correspondences that generate the first sparse point-cloud model of the object, and they are critical to all the calculations that follow.
There are many algorithms currently available for calculating point correspondences, which go by the generic name of keypoint detectors. Perhaps the best, and the one used here, is SIFT (Scale Invariant Feature Transform), developed by David Lowe. SIFT is typically used for object recognition in computer vision and has many features that are important for 3D reconstruction. The keypoints that are selected and matched across multiple images are invariant to scaling and to other transformations such as rotation, allowing the algorithm to be used with uncalibrated camera images.
Figure 4: Camera Positions and Key points output from VisualSfM
Figure 5: Output of calculation yielding camera position and initial 3D reconstruction for the Kislak Olmec Figurine
Figure 6: Image Matching Matrix
The matching of points across the series of digital images can be represented in a feature-matching matrix that gives a visual sense of the connections between images. Typically SIFT will detect tens of thousands of features even in low-resolution images, and hundreds of thousands of stable points in a 10-15 megapixel image.
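As a concrete illustration, here is a minimal sketch of SIFT detection and ratio-test matching using OpenCV in Python. The file names are hypothetical placeholders for any overlapping series of photographs taken while moving around an artifact.

```python
import cv2
import numpy as np
from itertools import combinations

# Placeholder file names: any overlapping photo series of the object will do.
paths = ["figurine_01.jpg", "figurine_02.jpg", "figurine_03.jpg"]
images = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in paths]

sift = cv2.SIFT_create()
features = [sift.detectAndCompute(img, None) for img in images]  # (keypoints, descriptors)

matcher = cv2.BFMatcher()
good_matches = {}  # (i, j) -> matches that survive the ratio test
match_matrix = np.zeros((len(images), len(images)), dtype=int)

for i, j in combinations(range(len(images)), 2):
    # Lowe's ratio test: keep a match only when the best candidate is
    # clearly better than the second best.
    knn = matcher.knnMatch(features[i][1], features[j][1], k=2)
    good_matches[(i, j)] = [m for m, n in knn if m.distance < 0.75 * n.distance]
    match_matrix[i, j] = match_matrix[j, i] = len(good_matches[(i, j)])

print(match_matrix)  # a numeric version of the matching matrix in Figure 6
```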
The actual process of making three-dimensional models using structure from motion consists of two parts. During the first, the computer examines the two-dimensional photographs and finds matching points across multiple images. These points are then used to calculate the actual position in space from which each image was taken. Once the position of each image is known, the locations of the points from all the images are plotted in space, yielding a dense reconstruction of the shape of the object that was photographed. The result is a point cloud very similar to the kind of data one would extract from a laser scan of an object.
In the case of the models being made here at the Library of Congress, we are experimenting with several different computer programs, including VisualSfM, developed by Changchang Wu while he was in the Department of Computer Science at the University of North Carolina at Chapel Hill (he is currently at Google). His program, from which the point-cloud images shown above were reconstructed, combines SIFT with GPU-based feature tracking and matching developed by Sudipta N. Sinha and others.
Reconstruction can be accomplished using as few as two images, but multiple images yield much better results, even though they are computationally more demanding. With multiple images one faces what is known as the structure and motion problem. Put succinctly, the problem asks: given only a set of corresponding points across a sequence of images, recover both the three-dimensional structure of the scene and the motion, that is, the position and orientation, of the camera for each image.
In the case we are working with here, the intrinsic camera parameters are unknown, as the sequence of images is uncalibrated. The problem then reduces to one in projective and epipolar geometry: determining the position of the camera when each image was taken and calculating what is called the fundamental matrix, which relates corresponding points across pairs of images. The exact mathematical details are beyond this short review; the reader is referred to the excellent survey by Olivier Faugeras and Quang-Tuan Luong.
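Continuing the matching sketch above for a single image pair, the fundamental matrix can be estimated robustly with RANSAC; the variable names carry over from the previous example.

```python
# Pixel coordinates of the matched keypoints in images 0 and 1.
matches = good_matches[(0, 1)]
kps0, kps1 = features[0][0], features[1][0]
pts0 = np.float32([kps0[m.queryIdx].pt for m in matches])
pts1 = np.float32([kps1[m.trainIdx].pt for m in matches])

# F is the 3x3, rank-2 fundamental matrix; for a true correspondence the
# epipolar constraint x1^T F x0 = 0 holds. The mask flags RANSAC inliers.
F, inlier_mask = cv2.findFundamentalMat(pts0, pts1, cv2.FM_RANSAC, 1.0)
```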
Once corresponding features have been identified across a large series of images, the movement of the camera around the object, combined with its focal length, is used to precisely reconstruct the original camera positions. This results in the kind of sparse point cloud shown in figures 4 and 5. Although this series of points gives a good impression of the object’s 3D geometry, it is insufficient for a metrically accurate and realistic reconstruction.
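For a simplified two-view illustration of this step (the intrinsic parameters below are assumptions, for instance a focal length estimated from the camera’s EXIF metadata, since our sequence is uncalibrated), the relative camera pose can be recovered from the essential matrix and the matched points triangulated into a sparse cloud:

```python
# Approximate intrinsics: focal length in pixels and the principal point
# at the image center. Both values here are illustrative assumptions.
f, cx, cy = 2400.0, 1500.0, 1000.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])

# Recover the relative rotation R and translation t of the second camera.
E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC)
_, R, t, mask = cv2.recoverPose(E, pts0, pts1, K, mask=mask)

# Projection matrices: first camera at the origin, second displaced by (R, t).
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([R, t])

# Triangulate the matches into 3D (homogeneous -> Euclidean, N x 3).
pts4d = cv2.triangulatePoints(P0, P1, pts0.T, pts1.T)
sparse_cloud = (pts4d[:3] / pts4d[3]).T
```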
Currently, there are many different approaches to generating what is known as a dense point cloud, which yields a more accurate 3D representation. As the name suggests, computing a “dense” cloud is more involved than computing the initial sparse model. One way to manage this difficulty is to divide the task into smaller parts using Patch-based Multi-View Stereo (PMVS) algorithms. These algorithms are very efficient: they take the output from a structure from motion program like VisualSfM and decompose it into a set of clusters of manageable size.
The basic sequence of calculations performed by the programs used here is as follows:
- Extraction of keypoints and linking of features across a group of 2D images
- Image matching and calculation of the camera position at the time each image was taken
- Sparse model reconstruction
- Dense model reconstruction
- Surface meshing and error compensation (bundle adjustment)
The last step, bundle adjustment, is the final problem to be overcome in any 3D reconstruction project. Bundle adjustment is an optimization procedure that reduces the noise introduced by the various projection calculations. This matters because keypoint matching algorithms like SIFT inevitably introduce small errors, which show up as discrepancies between the observed image locations of points and the locations predicted by the reconstruction.
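In its standard formulation (the notation here is generic, not the exact cost function of any particular program), bundle adjustment minimizes the total squared reprojection error over all cameras P_j and 3D points X_i simultaneously:

```latex
\min_{\{P_j\},\,\{X_i\}} \; \sum_{i,j} v_{ij}\, \bigl\| \mathbf{x}_{ij} - \pi(P_j, X_i) \bigr\|^{2}
```

Here x_ij is the observed image location of point i in photograph j, π(P_j, X_i) is the location predicted by projecting the 3D point through the camera, and v_ij is 1 when the point is visible in that photograph and 0 otherwise. The result is a large but sparse nonlinear least-squares problem, typically solved with a Levenberg-Marquardt style optimizer.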
Even though dense point clouds can give an aesthetically and visually pleasing impression of an actual 3D object, the calculated model will, at sufficient magnification, dissolve into its individual points. The point cloud must therefore be processed further into an interactive, scaled, true-to-form model of the object by overlaying a polygonal mesh that represents the artifact’s form mathematically. Polygonal meshes come in many shapes, though they are most commonly triangular, and their form depends on the smoothness of the surface being imaged.
The polygonal mesh reconstructs the surface of the object, approximating, sometimes to extreme accuracy, the shape and features of the original continuous surface. The newly emerging field of discrete differential geometry is allowing faster computation of these meshes, which can be quite large. Current methods of surface reconstruction can be roughly divided into two classes. The first, so-called sculpting methods, start with the convex hull of the entire point cloud and remove pieces until the actual surface of the object has been reached. The second, termed region growing, starts with a minimal triangulation and keeps adding newer and denser triangles until the desired level of realism is reached.
Many algorithms have been created to accomplish this task, from simple Delaunay triangulation to Poisson reconstruction, Marching Cubes, and Power Crust.
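As an illustration of one of these methods, here is a minimal sketch of Poisson reconstruction using the open-source Open3D library (not one of the programs named in this paper); "dense_cloud.ply" is a placeholder for a point cloud exported from the SfM stage.

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("dense_cloud.ply")

# Poisson reconstruction requires oriented normals at every point.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30))

# Higher depth yields a finer (and much larger) triangle mesh.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)
o3d.io.write_triangle_mesh("artifact_mesh.ply", mesh)
```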
At the Library of Congress we are currently experimenting with a combination of programs developed by Autodesk, such as 123D Catch and Meshmixer, to produce the mesh models shown in this paper.
Figure 7: Triangular Mesh Rendering of a three-dimensional model of the Kneeling Male Figure in Figure 1
Besides the density of the triangular mesh, additional features are used to visualize the smoothness and texture of the surface. Techniques like specular shading, the use of lines of reflection, and what are called isophotes, or lines of constant illumination across the surface, help accentuate the three-dimensional data derived from the two-dimensional photographs. In some cases additional algorithms may be used to smooth and de-noise the photographic data, fill holes, or blend the surface curvature to improve the visualization of the artifact.
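A minimal sketch of one such smoothing step, again with Open3D: Taubin filtering de-noises the surface while largely avoiding the shrinkage that plain Laplacian smoothing introduces.

```python
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("artifact_mesh.ply")
smoothed = mesh.filter_smooth_taubin(number_of_iterations=10)
smoothed.compute_vertex_normals()  # needed for properly shaded rendering
o3d.io.write_triangle_mesh("artifact_mesh_smooth.ply", smoothed)
```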
In technical terms these meshes are large undirected graphs with many vertices and faces. The model above, for example, contains more than 400,000 nodes at medium resolution and is a truly complex discrete mathematical object. The underlying mathematics of this reconstruction relies on the geometry of digital spaces, developed over the last decade or two for the creation of realistic virtual and augmented reality experiences and for computer gaming applications.
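A short sketch makes this graph structure concrete: reading the mesh’s triangles, each triangle side becomes an undirected edge between two vertex nodes.

```python
import numpy as np
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("artifact_mesh.ply")
faces = np.asarray(mesh.triangles)  # M x 3 array of vertex indices

# Collect each triangle side once, as a sorted pair, since the graph
# is undirected and adjacent triangles share sides.
edges = set()
for a, b, c in faces:
    for u, v in ((a, b), (b, c), (c, a)):
        edges.add((min(u, v), max(u, v)))

print(len(mesh.vertices), "nodes,", len(edges), "edges")
```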
Figure 8: Photo cluster of images surrounding a Seated Male Figure from the Olmec Middle Pre-Classic Period, 1100-500 BCE. This shows the locations at which each of the photos was taken relative to the object. Jay I. Kislak Collection, Geography and Map Division, Library of Congress.
The main difficulty associated with structure from motion imaging centers on solving a problem in projective and epipolar geometry whose solution relates all of the images taken of a particular object to one another through common points and reference lines. This so-called “geometry of multiple images” problem is an active area of research in computer vision and is being applied increasingly in archaeological contexts. Constructing these geometries allows for the scaling and reconstruction of models that can be measured and compared with similar archaeological artifacts.
Figure 9: Scaling and Measuring the Kneeling Figure shown above.
Geography and Map Division, Library of Congress.
Here in the Geography and Map Division we are just beginning our experiments using this technique with the hope that soon we shall be able to make three-dimensional, dynamic, and interactive models of the Kislak Collection available to scholars around the world who are interested in applying this exciting new technology to their research.
Figure 10: Three-Dimensional Model of Kislak Olmec Figurine 0155.
Jay I. Kislak Collection, Geography and Map Division, Library of Congress.
Figure 11: Seated Olmec Figurine, Kislak Collection 0155, from the Middle Pre-Classic, 1000-500 BCE.
Jay I. Kislak Collection, Geography and Map Division, Library of Congress.
 For more on the mathematics of structure from motion imaging and its relationship to the quickly evolving fields of computer and robotic vision, see Richard Hartley and Andrew Zisserman, Multiple View Geometry in Computer Vision (Cambridge, UK: Cambridge University Press, 2003).
 Alfred Maudslay, Biologia Centrali-Americana: Archaeology, 4 volumes and 16 fascicules of photographs (London: R. H. Porter, 1889-1902).
 Thomas Athol Joyce and Alfred Maudslay, Guide to the Maudslay Collection of Maya Sculptures (Casts and Originals) from Central America (London: British Museum, 1925).
 Ian Graham, Alfred Maudslay and the Maya: A Biography (Norman: University of Oklahoma Press, 2002).
 Sudipta N. Sinha, Jan-Michael Frahm, Marc Pollefeys, and Yakup Genc, “Feature Tracking and Matching in Video Using Programmable Graphics Hardware,” Machine Vision and Applications, November 2007, and “GPU-Based Video Feature Tracking and Matching,” Technical Report 06-012, Department of Computer Science, UNC Chapel Hill, May 2006. http://cs.unc.edu/~ssinha/pubs/Sinha06TechReport.pdf
 Olivier Faugeras and Quang-Tuan Luong, The Geometry of Multiple Images (Cambridge, MA: MIT Press, 2001).
 For more on mesh generation and geometric modeling, see Mario Botsch, Geometric Modeling Based on Triangle Meshes. http://lgg.epfl.ch/publications/2006/botsch_2006_GMT_sg.pdf
 There are many different kinds of both isotropic and anisotropic meshes used in geometric modeling and computation, and many varieties of data structures used to keep track of them. For more on this see Mario Botsch et al., Polygon Mesh Processing (Boca Raton: CRC Press, 2010).
 Alexander I. Bobenko, Discrete Differential Geometry (Providence: American Mathematical Society, 2008)
 Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe, “Poisson Surface Reconstruction,” Proceedings of the Eurographics Symposium on Geometry Processing, 2006.
 William E. Lorensen and Harvey E. Cline, “Marching Cubes: A High Resolution 3D Surface Construction Algorithm,” Computer Graphics 21 (1987).
 Nina Amenta, Sunghee Choi, and Ravi Krishna Kolluri, “The Power Crust, Unions of Balls, and the Medial Axis Transform,” Computational Geometry: Theory and Applications 19 (2001): 127-153.
 Bojan Mohar and Carsten Thomassen, Graphs on Surfaces (Baltimore: Johns Hopkins University Press, 2001).
 Gabor T. Herman, Geometry of Digital Spaces (Boston: Birkhäuser, 1998).
 See Susie Green, Andrew Bevan, and Michael Shapland, “A Comparative Assessment of Structure from Motion Methods for Archaeological Research,” Journal of Archaeological Science 46 (2014): 173-181; Fabio Bruno et al., “From 3D Reconstruction to Virtual Reality: A Complete Methodology for Digital Archaeological Exhibition,” Journal of Cultural Heritage 11 (2010): 42-49; Benjamin Ducke, David Score, and Joseph Reeves, “Multiview 3D Reconstruction of the Archaeological Site at Weymouth from Image Series,” Computers and Graphics 35 (2011): 375-382.