Leaked: “Classified LiDAR” of Pentagon in LAS 1.4 Format

LiDAR leaks have happened! Black helicopters are in the sky! A few days ago a tiny tweet leaked the online location of “classified LiDAR” for Washington, DC. This LiDAR really is “classified” and includes an aerial scan of the Pentagon. For rogue scientists world-wide we offer a secret download link. It links to a file code-named ‘pentagon.laz’ that contains the 8,044,789 “classified” returns of the Pentagon shown below. This “classified file” can be deciphered by any software with native LAZ support. It was encrypted with the “LAS 1.4 compatibility mode” of LASzip. The original LAS 1.4 content was encoded into an inconspicuous-looking LAZ file. New point attributes (such as the scanner channel) were hidden as “extra bytes” for fully lossless encryption. Use ‘laszip’ to fully decode the original “classified” LAS 1.4 file … (-;

Seriously, a tiled LiDAR data set for the District of Columbia flown in 2015 is available for anyone to use on Amazon S3 with a very permissive open data license, namely the Creative Commons Attribution 3.0 License. The LiDAR coverage can be explored via this interactive map. The tiles are provided in LAS 1.4 format and use the new point type 6. We downloaded a few tiles near the White House, the Capitol, and the Pentagon to test the “native LAS 1.4 extension” of our LASzip compressor, which will be released soon (a prototype for testing is already available). As these uncompressed LAS files are YUUUGE, we use the command line utility ‘wget’ for downloading. With option ‘-c’ the download continues where it left off in case the transfer gets interrupted.

LiDAR pulse density from 20 or less (blue) to 100 or more (red) pulses per square meter.

We use lasboundary to create labeled bounding boxes for display in Google Earth and lasgrid to create a false color visualization of pulse density with the command lines shown below. Pulse densities of 20 or below are mapped to blue. Pulse densities of 100 or above are mapped to red. We picked the min value 20 and the max value 100 for this false color mapping by running lasinfo with the ‘-cd’ option to compute an average pulse density and then refining the numbers experimentally. We also use lasoverlap to visualize how flightlines overlap and how well they align. Vertical differences of up to 20 cm are mapped to white and differences of 40 cm or more are mapped to saturated blue or red.

lasboundary -i *.las ^
            -use_bb ^
            -labels ^
            -odir quality -odix _bb -okml

lasgrid -i *.las ^
        -keep_last ^
        -point_density -step 2 ^
        -false -set_min_max 20 100 ^
        -odir quality -odix _d_20_100 -opng ^
        -cores 2

lasoverlap -i *.las ^
           -min_diff 0.2 -max_diff 0.4 ^
           -odir quality -opng ^
           -cores 2
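
The effect of ‘-set_min_max 20 100’ can be pictured as a clamp followed by a linear rescale. The little Python sketch below is only an illustration under that assumption. The function name normalize is made up for this example, and the actual color ramp that lasgrid applies between blue and red is not reproduced here.

```python
def normalize(value, lo=20.0, hi=100.0):
    """Clamp value to [lo, hi] and rescale it linearly to 0.0 .. 1.0.

    0.0 stands for the 'blue' end and 1.0 for the 'red' end of a false
    color ramp. This is an assumed model, not lasgrid's actual code.
    """
    clamped = max(lo, min(hi, value))
    return (clamped - lo) / (hi - lo)


if __name__ == "__main__":
    # hypothetical pulse densities in pulses per square meter
    for density in (5, 20, 60, 100, 250):
        print(density, normalize(density))
```

Everything at or below the min value collapses onto one end of the ramp and everything at or above the max value onto the other, which is why picking the two numbers experimentally matters for a readable picture.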

The visualization of the pulse density and of the flightline overlap both show that there is no LiDAR for the White House or Capitol Hill. We will never know how tall the tomato and kale plants had grown in Michelle Obama’s organic garden on that day. Note that the White House and Capitol Hill were not simply “cut out”. Instead the flight plan of the survey plane was carefully designed to avoid these areas. Surprisingly, the Pentagon did not receive the same treatment and is (almost) fully included in the open LiDAR as mentioned in the dramatic first paragraph. It is interesting how the varying (tidal?) water level of the Potomac River shows up in the visualization of flightline misalignments.

There are a number of issues in these LiDAR files. The most serious ones are reported at the very end of this article. We will now scrutinize the partly-filled tile 2016.las close to the White House with only 11,060,334 returns. A lasvalidate check immediately reports three deviations from the LAS 1.4 specification:

lasvalidate -i 2016.las -o 2016_check.xml
  1. For proper LAS 1.4 files containing point types 6 through 10 all ‘legacy’ point counts in the LAS header should be set to 0. The following six fields in the LAS header should be zero for tile 2016.las (and all other tiles):
    + legacy number of point records
    + legacy number of points by return[0]
    + legacy number of points by return[1]
    + legacy number of points by return[2]
    + legacy number of points by return[3]
    + legacy number of points by return[4]
  2. There should not be any LiDAR return in a valid LAS file whose ‘number of returns of given pulse’ attribute is zero but there are 8 such points in tile 2016.las (and many more in various other tiles).
  3. There should not be any LiDAR return whose ‘return number’ attribute is larger than its ‘number of returns of given pulse’ attribute, but there are 8 such points in tile 2016.las (and many more in various other tiles).
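
The three checks are simple enough to sketch in a few lines of Python. This is not how lasvalidate is implemented; it merely restates the rules above. The function names and the non-zero legacy header values are hypothetical, while the eight (return number, number of returns) pairs are the ones las2txt reports for this tile further down.

```python
from collections import namedtuple

Point = namedtuple("Point", "return_number number_of_returns")


def check_header(legacy_point_count, legacy_counts_by_return, point_format):
    """Rule 1: for point types 6 and up, all six legacy counters must be 0."""
    warnings = []
    if point_format >= 6:
        if legacy_point_count != 0:
            warnings.append("legacy number of point records should be 0")
        for i, count in enumerate(legacy_counts_by_return):
            if count != 0:
                warnings.append(
                    f"legacy number of points by return[{i}] should be 0")
    return warnings


def check_points(points):
    """Rules 2 and 3: count returns that violate the return numbering."""
    zero_returns = sum(1 for p in points if p.number_of_returns == 0)
    number_too_large = sum(
        1 for p in points if p.return_number > p.number_of_returns)
    return zero_returns, number_too_large


if __name__ == "__main__":
    # hypothetical non-zero legacy counters, only for illustration
    print(check_header(100, [50, 30, 15, 4, 1], point_format=6))
    # (return number, number of returns) of the eight offending points
    pairs = [(4, 0), (5, 0), (6, 0), (7, 0), (1, 0), (3, 0), (1, 0), (2, 0)]
    print(check_points([Point(r, n) for r, n in pairs]))
```

Note that a single point with a ‘number of returns’ of zero trips both rule 2 and rule 3, since any valid return number is larger than zero.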

The first issue is trivial. There is an efficient in-place fix using lasinfo that does not require rewriting the entire file. It uses the following command line:

lasinfo -i 2016.las ^
        -nh -nv -nc ^
        -set_number_of_point_records 0 ^
        -set_number_of_points_by_return 0 0 0 0 0

A quick check with las2txt shows us that the second and third issues are caused by the same eight points. Instead of writing an 8 for the ‘number of returns’ attribute the LAS file exporter must have written a 0 (marked in red for all eight returns) and instead of writing an 8 for the ‘return number’ attribute the LAS file exporter must have written a 1 (also marked in red). We can tell this mislabeled last return apart from the true first return via its z coordinate (marked in blue) as the last return should be the lowest of all.

las2txt -i 2016.las ^
        -keep_number_of_returns 0 ^
        -parse xyzrnt

397372.70 136671.62 33.02 4 0 112813299.954811
397372.03 136671.64 28.50 5 0 112813299.954811
397371.28 136671.67 23.48 6 0 112813299.954811
397370.30 136671.68 16.86 7 0 112813299.954811
397369.65 136671.70 12.50 1 0 112813299.954811
397374.37 136671.58 44.17 3 0 112813299.954811
397375.46 136671.56 51.49 1 0 112813299.954811
397374.86 136671.57 47.45 2 0 112813299.954811

With las2las we can change the ‘number of returns’ from 0 to 8 using a ‘-filtered_transform’ as illustrated in the command line below. We suspect that higher numbers of returns such as 9 or 10 might have been mapped to 1 and 2. Fixing those as well as repairing the wrong return numbers will require a more complex tool. We recommend checking all tiles with more scrutiny using the lasreturn tool. But wait … more return numbering issues are to come.

las2las -i 2016.las ^
        -keep_number_of_returns 0 ^
        -filtered_transform ^
        -set_extended_number_of_returns 8 ^
        -odix _fixed -olas
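
The part that the command line above does not repair is the wrong ‘return number’ of the lowest point. A minimal Python sketch of one possible repair is shown below. It assumes that all eight points belong to the same pulse (they share one GPS time stamp) and that later returns are always lower than earlier ones, which holds for this example but is not guaranteed in general. The function name renumber_by_height is made up for this illustration.

```python
# (x, y, z, return number, number of returns) as reported by las2txt
RECORDS = [
    (397372.70, 136671.62, 33.02, 4, 0),
    (397372.03, 136671.64, 28.50, 5, 0),
    (397371.28, 136671.67, 23.48, 6, 0),
    (397370.30, 136671.68, 16.86, 7, 0),
    (397369.65, 136671.70, 12.50, 1, 0),
    (397374.37, 136671.58, 44.17, 3, 0),
    (397375.46, 136671.56, 51.49, 1, 0),
    (397374.86, 136671.57, 47.45, 2, 0),
]


def renumber_by_height(records):
    """Renumber the returns of ONE pulse from highest (1) to lowest (n)."""
    ordered = sorted(records, key=lambda rec: rec[2], reverse=True)
    n = len(ordered)
    return [(x, y, z, i, n)
            for i, (x, y, z, _r, _n) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for record in renumber_by_height(RECORDS):
        print(record)
```

Seven of the eight points keep the return number they already had. Only the lowest point at z = 12.50 changes from 1 to 8, which supports the suspicion that the exporter wrapped the 8 around to a 1.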

A closer look at the scan pattern reveals that the LiDAR survey was flown with a dual-beam system where two laser beams scan the terrain simultaneously. This is evident in the textual representation below as there are multiple “sets of returns” for the same GPS time stamp such as 112813952.110394. We group the returns from the two beams into an orange and a green group. Their coordinates show that the two laser beams point in different directions when they are simultaneously “shot” and therefore hit the terrain far apart from one another.

las2txt -i 2016.las ^
        -keep_gps_time 112813952.110392 112813952.110396 ^
        -parse xyzlurntp

397271.40 136832.35 54.31 0 0 1 1 112813952.110394 117
397277.36 136793.35 38.68 0 1 1 4 112813952.110394 117
397277.35 136793.56 32.89 0 1 2 4 112813952.110394 117
397277.34 136793.88 24.13 0 1 3 4 112813952.110394 117
397277.32 136794.25 13.66 0 1 4 4 112813952.110394 117
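
The grouping into an orange and a green set can be reproduced with a few lines of Python. The sketch below assumes, as the text describes, that the fifth column (‘u’, the user data) identifies the beam while the fourth column (‘l’, the scanner channel) is still zero everywhere. The function names are invented for this example.

```python
import math
from collections import defaultdict

# the las2txt output from above, columns as in '-parse xyzlurntp'
OUTPUT = """\
397271.40 136832.35 54.31 0 0 1 1 112813952.110394 117
397277.36 136793.35 38.68 0 1 1 4 112813952.110394 117
397277.35 136793.56 32.89 0 1 2 4 112813952.110394 117
397277.34 136793.88 24.13 0 1 3 4 112813952.110394 117
397277.32 136794.25 13.66 0 1 4 4 112813952.110394 117
"""


def group_by_beam(text):
    """Group returns by (GPS time, user data). The GPS time stays a string
    so that no floating-point comparison is needed."""
    groups = defaultdict(list)
    for line in text.strip().splitlines():
        x, y, z, _channel, user, r, n, gps_time, _source = line.split()
        groups[(gps_time, int(user))].append(
            (float(x), float(y), float(z), int(r), int(n)))
    return dict(groups)


def first_return_distance(group_a, group_b):
    """Horizontal distance between the first returns of two groups."""
    ax, ay = next((x, y) for x, y, _z, r, _n in group_a if r == 1)
    bx, by = next((x, y) for x, y, _z, r, _n in group_b if r == 1)
    return math.hypot(ax - bx, ay - by)


if __name__ == "__main__":
    groups = group_by_beam(OUTPUT)
    for key, returns in groups.items():
        print(key, len(returns), "returns")
    beam0 = groups[("112813952.110394", 0)]
    beam1 = groups[("112813952.110394", 1)]
    print("distance between beams:", first_return_distance(beam0, beam1))
```

The first returns of the two groups lie tens of meters apart, so they cannot come from the same laser shot even though they carry the same time stamp.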

The information about which point is from which beam is currently stored in the generic ‘user data’ attribute instead of in the dedicated ‘scanner channel’ attribute. This can be fixed with las2las as follows.

las2las -i 2016.las ^
        -copy_user_data_into_scanner_channel ^
        -set_user_data 0 ^
        -odix _fixed -olas

Unfortunately the LiDAR files have much more serious issues in the return numbering. It’s literally a “Total Disaster!” and “Sad!” as the US president will tweet shortly. After grouping all returns with the same GPS time stamp into an orange and a green group there is one more set of returns left unaccounted for.

las2txt -i 2016.las ^
        -keep_gps_time 112813951.416451 112813951.416455 ^
        -parse xyzlurntpi

397286.02 136790.60 45.90 0 0 1 4 112813951.416453 117 24
397286.06 136791.05 39.54 0 0 2 4 112813951.416453 117 35
397286.10 136791.51 33.34 0 0 3 4 112813951.416453 117 24
397286.18 136792.41 21.11 0 0 4 4 112813951.416453 117 0
397286.12 136791.75 30.07 0 0 1 1 112813951.416453 117 47
397291.74 136750.70 45.86 0 1 1 1 112813951.416453 117 105

las2txt -i 2016.las ^
        -keep_gps_time 112813951.408708 112813951.408712 ^
        -parse xyzlurntpi

397286.01 136790.06 45.84 0 0 1 4 112813951.408710 117 7
397286.05 136790.51 39.56 0 0 2 4 112813951.408710 117 15
397286.08 136790.96 33.33 0 0 3 4 112813951.408710 117 19
397286.18 136792.16 17.05 0 0 4 4 112813951.408710 117 0
397286.11 136791.20 30.03 0 0 1 2 112813951.408710 117 58
397286.14 136791.67 23.81 0 0 2 2 112813951.408710 117 42
397291.73 136750.16 45.88 0 1 1 1 112813951.408710 117 142

This can be visualized with lasview and the result is unmistakably clear: The return numbering is messed up. There should be one shot with five returns (not a group of four and a single return) in the first example. And there should be one shot with six returns (not a group of four and a group of two returns) in the second example. Such a broken return numbering results in extra first (or last) returns. These are serious issues that affect any algorithm that relies on the return numbering such as first-return DSM generation or canopy cover computation. Those extra returns will also make the pulse density appear higher and the pulse spacing appear tighter than they really are. The numbers from 20 (blue) to 100 (red) pulses per square meter in our earlier visualization are definitely inflated.

lasview -i 2016.las ^
        -keep_gps_time 112813951.416451 112813951.416455 ^

lasview -i 2016.las ^
        -keep_gps_time 112813951.408708 112813951.408712
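
The inflation of the pulse count can also be shown numerically. The Python sketch below counts the first returns per beam and per GPS time stamp in the two las2txt outputs from above. It rests on the assumption that each beam fires at most once per time stamp, so any additional first return in a group is a surplus one. The function names are hypothetical.

```python
from collections import Counter

# las2txt outputs from above, columns as in '-parse xyzlurntpi'
EXAMPLE_1 = """\
397286.02 136790.60 45.90 0 0 1 4 112813951.416453 117 24
397286.06 136791.05 39.54 0 0 2 4 112813951.416453 117 35
397286.10 136791.51 33.34 0 0 3 4 112813951.416453 117 24
397286.18 136792.41 21.11 0 0 4 4 112813951.416453 117 0
397286.12 136791.75 30.07 0 0 1 1 112813951.416453 117 47
397291.74 136750.70 45.86 0 1 1 1 112813951.416453 117 105
"""

EXAMPLE_2 = """\
397286.01 136790.06 45.84 0 0 1 4 112813951.408710 117 7
397286.05 136790.51 39.56 0 0 2 4 112813951.408710 117 15
397286.08 136790.96 33.33 0 0 3 4 112813951.408710 117 19
397286.18 136792.16 17.05 0 0 4 4 112813951.408710 117 0
397286.11 136791.20 30.03 0 0 1 2 112813951.408710 117 58
397286.14 136791.67 23.81 0 0 2 2 112813951.408710 117 42
397291.73 136750.16 45.88 0 1 1 1 112813951.408710 117 142
"""


def first_returns_per_shot(text):
    """Count the returns numbered 1 for every (GPS time, beam) pair."""
    counts = Counter()
    for line in text.strip().splitlines():
        fields = line.split()
        beam, return_number, gps_time = fields[4], int(fields[5]), fields[7]
        if return_number == 1:
            counts[(gps_time, beam)] += 1
    return counts


def apparent_and_actual_pulses(text):
    """Pulses as counted via first returns versus distinct (time, beam) shots."""
    counts = first_returns_per_shot(text)
    return sum(counts.values()), len(counts)


if __name__ == "__main__":
    for name, text in (("example 1", EXAMPLE_1), ("example 2", EXAMPLE_2)):
        apparent, actual = apparent_and_actual_pulses(text)
        print(name, "apparent pulses:", apparent, "actual pulses:", actual)
```

In both examples a first-return count sees three pulses where only two were fired, which is the kind of inflation that makes the pulse density look higher than it really is.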

After all these troubles, here is something nice: side by side, a first-return TIN and a spike-free TIN (using a freeze of 0.8 m) of the center court cafe in the Pentagon. Especially given all these “fake first returns” in the Washington DC LiDAR we really need the spike-free algorithm to finally “Make a DSM great again!” … (-;

We would like to acknowledge the District of Columbia Office of the Chief Technology Officer (OCTO) for providing this data with a very permissive open data license, namely the Creative Commons Attribution 3.0 License.


LASmoons: Chloe Brown

Chloe Brown (recipient of three LASmoons)
Geosciences, School of Geography
University of Nottingham, UK

Malaysia’s North Selangor peat swamp forest is experiencing rapid and large scale conversion of peat swampland to oil palm agriculture, contrary to prevailing environmental guidelines. Given the global importance of tropical peat lands, and the uncertainties surrounding historical and future oil palm development, quantifying the spatial distribution of ecosystem service values, such as climate mitigation, is key to understanding the trade-offs associated with anthropogenic land use change.
The study explores the capabilities and methods of remote sensing and field-based data sets for extracting relevant metrics for the assessment of carbon stocks held in North Selangor peat swamp forest reserve, estimating both the current carbon stored in the above and below ground biomass, as well as the changes in carbon stock over time driven by anthropogenic land use change. Project findings will feed directly into peat land management practices and environmental accounting in Malaysia through the Tropical Catchments Research Initiative (TROCARI), and support the Integrated Management Plan of the Selangor State Forest Department (see here for a sample).

LiDAR data is now seen as the practical option when assessing canopy height over large scales (Fassnacht et al., 2014), with Lucas et al. (2008) believing LiDAR data to produce more accurate tree height estimates than those derived from manual field-based methods. At this stage of the project, the goal is to produce a high quality LiDAR-derived Canopy Height Model (CHM) following the “pit-free” algorithm of Khosravipour et al. (2014) using the LAStools software.

+ LiDAR provided by the Natural Environment Research Council (NERC) Airborne Research and Survey Facility’s 2014 Malaysia Campaign.
+ covers 685 square kilometers (closed source)
+ collected with Leica ALS50-II LiDAR system
+ average pulse spacing < 1 meter, average pulse density 1.8 per square meter
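
The two figures in the last item fit together if one assumes the usual relation between density and spacing, namely spacing = 1 / sqrt(density). The Python lines below are just that arithmetic; the function names are made up for this sketch.

```python
import math


def spacing_from_density(density):
    """Average spacing in meters for a density in pulses per square meter."""
    return 1.0 / math.sqrt(density)


def density_from_spacing(spacing):
    """Average density in pulses per square meter for a spacing in meters."""
    return 1.0 / (spacing * spacing)


if __name__ == "__main__":
    print("spacing for 1.8 pulses per square meter:", spacing_from_density(1.8))
    print("density for 1.0 meter spacing:", density_from_spacing(1.0))
```

A density of 1.8 pulses per square meter corresponds to a spacing of roughly 0.75 meters, which agrees with the stated spacing of less than 1 meter.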

LAStools processing:
1) Create 1000 meter tiles with 35 meter buffer to avoid edge artifacts [lastile]
2) Remove noise points (class 7) that are already classified [las2las]
3) Classify point clouds into ground (class 2) and non-ground (class 1) [lasground]
4) Generate normalized above-ground heights [lasheight]
5) Create DSM and DTM [las2dem]
6) Generate a pit-free Canopy Height Model (CHM) as described here [lasthin, las2dem, lasgrid]
7) Generate a spike-free Canopy Height Model (CHM) as described here for comparison [las2dem]

Fassnacht, F.E., Hartig, F., Latifi, H., Berger, C., Hernández, J., Corvalán, P., and Koch, B. (2014). Importance of sample size, data type and prediction method for remote sensing-based estimations of above-ground forest biomass. Remote Sensing of Environment, 154, 102–114.
Khosravipour, A., Skidmore, A. K., Isenburg, M., Wang, T., and Hussin, Y. A. (2014). Generating pit-free canopy height models from airborne LiDAR. Photogrammetric Engineering & Remote Sensing, 80(9), 863-872.
Lucas, R. M., Lee, A. C., and Bunting, P. J. (2008). Retrieving forest biomass through integration of CASI and LiDAR data. International Journal of Remote Sensing, 29(5), 1553-1577.

LASmoons: Alen Berta

Alen Berta (recipient of three LASmoons)
Department of Terrestrial Ecosystems and Landscape, Faculty of Forestry
University of Zagreb and Oikon Ltd Institute for Applied Ecology, CROATIA

After becoming an EU member state, Croatia is obliged to fulfill the obligation arising from the Kyoto Protocol: a National Inventory Report (NIR) of greenhouse gases according to the UNFCCC. One of the most important things when creating the NIR is to know how much forested area there is and what its wood stock and increment are. This is needed to calculate the size of the existing carbon pool and its potential for sequestration. Since Croatian legislation does not make it mandatory to calculate the wood stock and yield of degraded forest areas (shrubbery and thickets) when creating the usual forest management plans, this data is missing. So far, only a rough approximation of the wood stock and increment has been used when creating the NIR. However, these areas are expanding every year due to the depopulation of rural areas and the cessation of traditional farming.

very diverse stand structure of degraded forest areas (shrubbery and thickets)

This study will focus on two things: (1) Developing regression models for biomass volume estimation in continental shrubberies and thickets based on airborne LiDAR data. To correlate LiDAR data with biomass volume, over 70 field plots with a radius of 12 meters have been established in more than 550 ha of the hilly and lowland shrubberies in Central Croatia, and all trees and shrubs above 1 cm Diameter at Breast Height (DBH) were recorded with information about tree species, DBH and height. Precise locations of the field plots are measured with survey GNSS and biomass is calculated with parameters from literature. For regression modeling, various statistics from the point clouds matching the field plots will be used (e.g. height percentiles, standard deviation, skewness, kurtosis, …). (2) Testing the developed models for different laser pulse densities to find out if there is a significant deviation in the results if the LiDAR point cloud is thinner. This will be helpful for planning later scans for change detection (increment or degradation).
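
To make the kind of per-plot statistics concrete, here is a small Python sketch that computes a few of them from a list of normalized return heights. It is only an illustration: the function names are invented, the percentile uses linear interpolation, and skewness and kurtosis are the plain population moments, which may differ from the definitions that lascanopy or the study itself uses.

```python
import math
import statistics


def percentile(sorted_values, p):
    """Percentile p (0..100) of pre-sorted values with linear interpolation."""
    k = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def plot_metrics(heights):
    """Height statistics for the returns that fall into one field plot."""
    h = sorted(heights)
    n = len(h)
    mean = statistics.mean(h)
    std = statistics.pstdev(h)
    if std == 0.0:
        # all heights are equal: the shape measures are undefined, report 0
        skewness = kurtosis = 0.0
    else:
        skewness = sum((v - mean) ** 3 for v in h) / (n * std ** 3)
        kurtosis = sum((v - mean) ** 4 for v in h) / (n * std ** 4)
    return {
        "p50": percentile(h, 50),
        "p95": percentile(h, 95),
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


if __name__ == "__main__":
    # hypothetical heights above ground in meters for one plot
    print(plot_metrics([0.4, 1.1, 1.8, 2.5, 2.7, 3.9, 4.2, 6.0]))
```

Each field plot then contributes one row of such metrics to the regression against the biomass measured on the ground.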

+ 641 square km of discrete return LiDAR data around the City of Zagreb, the capital of Croatia (but since it is a highly populated area, only the outskirts of the area will be used)
+ raw geo-referenced LAS files with up to 3 returns and an average last return point density of 1 pts/m².

LAStools processing:
1) extract area of interest [lasclip or las2las]
2) create differently dense versions (for goal no. 2) [lasthin]
3) remove isolated noise points [lasnoise]
4) classify point clouds into ground and non-ground [lasground]
5) create a Digital Terrain Model (DTM) [las2dem]
6) compute height of points above the ground [lasheight]
7) classify point clouds into vegetation and other [lasclassify]
8) normalize height of the vegetation points [lasheight]
9) extract the areas of the field plots [lasclip]
10) compute various metrics for each plot [lascanopy]
11) convert LAZ to TXT for regression modeling in R [las2txt]

LASmoons: Jane Meiforth

Jane Meiforth (recipient of three LASmoons)
Environmental Remote Sensing and Geoinformatics
University of Trier, GERMANY

The New Zealand Kauri trees (or Agathis australis) are under threat from the so-called Kauri dieback disease. This disease is caused by a fungus-like spore, which blocks the transport of nutrients and water in the trunk and finally kills the trees. Symptoms of the disease in the canopy such as dropping of leaves and bare branches offer an opportunity for analysing the state of the disease by remote sensing. The study site covers three areas in the Waitakere Ranges, west of Auckland, with Kauri trees in different growth and health classes.


The main objective of this study is to identify Kauri trees and canopy symptoms of the disease by remote sensing, in order to support the monitoring of the disease. In the first step LAStools will be used to extract the tree crowns and describe their characteristics based on height metrics, shapes and intensity values from airborne LiDAR data. In the second step, the spectral characteristics of the tree crowns will be analyzed based on very high resolution satellite data (WV02 and WV03). Finally the best describing spatial and spectral parameters will be combined in an object-based classification, in order to identify the Kauri trees and different states of the disease.

+ high resolution airborne LiDAR data (15-35p/sqm, ground classified) taken in January 2016
+ 15cm RGB aerial images taken on the same flight as the LiDAR data
+ ground truth field data from 2100 canopy trees in the study areas, recorded January – March 2016
+ helicopter images taken in January – April 2016 from selected Kauri trees by Auckland Council
+ vector layers with infrastructure data like roads and hiking tracks


LAStools processing:
1) create square tiles with edge length of 1000 m and a 25 m buffer to avoid edge artifacts [lastile]
2) generate DTMs and DSMs [las2dem]
3) produce height normalized tiles [lasheight]
4) generate a pit-free Canopy Height Model (CHM) using the method of Khosravipour et al. (2014) with the workflow described here [lasthin, las2dem, lasgrid]
5) extract crown polygons based on the pit-free CHM [inverse watershed method in GIS, las2iso]
6) normalize the points of each crown with constant ground elevation to avoid slope effects [lasclip, las2las with external source for the ground elevation]
7) derive height metrics for each crown on base of the normalized crown points [lascanopy]
8) derive intensity statistics for the crown points [lascanopy with ‘-int_avg’, ‘-int_std’ etc. on first returns]
9) derive metrics correlated with the dropping of leaves like canopy density, canopy cover and gap fraction for the crown points [lascanopy with ‘-cov’, ‘-dns’, ‘-gap’, ‘-fraction’]
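
The metrics named in the last step can be illustrated with a short Python sketch. It assumes the common definitions: canopy cover as the percentage of first returns above a height cutoff, canopy density as the percentage of all returns above that cutoff, and the gap values as their complements. Whether lascanopy computes exactly these quantities is an assumption here, the cutoff of 1.37 m is only an illustrative breast-height value, and the function name is hypothetical.

```python
def canopy_metrics(points, cutoff=1.37):
    """Cover, density and gap percentages for one crown.

    points is a list of (height above ground in meters, return number).
    """
    first_heights = [h for h, r in points if r == 1]
    all_heights = [h for h, _r in points]
    if not first_heights:
        raise ValueError("at least one first return is required")
    cover = 100.0 * sum(h > cutoff for h in first_heights) / len(first_heights)
    density = 100.0 * sum(h > cutoff for h in all_heights) / len(all_heights)
    return {
        "cover": cover,
        "density": density,
        "gap_cover": 100.0 - cover,
        "gap_density": 100.0 - density,
    }


if __name__ == "__main__":
    # hypothetical crown with five pulses and eight returns
    crown = [(12.0, 1), (6.0, 2), (0.1, 3), (0.2, 1),
             (9.0, 1), (0.3, 2), (0.0, 1), (0.0, 1)]
    print(canopy_metrics(crown))
```

Because the cover value is based on first returns only, it is directly affected whenever the return numbering of the input is unreliable.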

Hu B, Li J, Jing L, Judah A. Improving the efficiency and accuracy of individual tree crown delineation from high-density LiDAR data. International Journal of Applied Earth Observation and Geoinformation. 2014; 26: 145-55.
Khosravipour, A., Skidmore, A.K., Isenburg, M., Wang, T.J., Hussin, Y.A., 2014. Generating pit-free Canopy Height Models from Airborne LiDAR. PE&RS = Photogrammetric Engineering and Remote Sensing 80, 863-872.
Li J, Hu B, Noland TL. Classification of tree species based on structural features derived from high density LiDAR data. Agricultural and Forest Meteorology. 2013; 171-172: 104-14.
MPI New Zealand http://www.kauridieback.co.nz – website with information on the kauri dieback disease
Vauhkonen, J., Ene, L., Gupta, S., Heinzel, J., Holmgren, J., Pitkänen, J., Solberg, S., Wang, Y., Weinacker, H., Hauglin, K. M., Lien, V., Packalén, P., Gobakken, T., Koch, B., Næsset, E., Tokola, T. and Maltamo, M. (2012) Comparative testing of single-tree detection algorithms under different types of forest. Forestry, 85, 27-40.

LASmoons: Jakob Iglhaut

Jakob Iglhaut (recipient of three LASmoons)
Program for Geospatial Information Management
Carinthia University of Applied Sciences, Villach, AUSTRIA

As part of the EU LIFE programme two river stretches in Carinthia, Austria have recently been subject to restoration measures. The LIFE project aims at protecting valuable riverine flora and fauna while improving flood protection. By remodelling the river beds and constructing groynes and still water bodies, the river environment was directed towards a more natural morphology and state. The joint R&D project “Remotely Piloted Aircraft Multi Sensor System (RPAMSS)” aims at capturing multi-dimensional environmental data in order to monitor the development of these river stretches in a holistic way. Flights with an RTK-capable fixed-wing UAV are carried out at a particular section of the rivers Gail and Drau respectively. The project site at the Upper Drau is located in the area of Obergottesfeld, Austria (560 m ASL), with an area of approximately 3.5 km² currently remotely monitored by the RPAMSS. The second study area is located close to Feistritz at the river Gail (550 m ASL) with an area of approx. 0.9 km². Apart from being addressed by the LIFE project both study areas are also defined as NATURA 2000 nature protection sites. In both areas frequent UAV flights are carried out collecting high-resolution multi-spectral imagery. Structure from Motion photogrammetry enables the creation of high-density multi-spectral point clouds.


The aim of the project is to assess the morphology and related temporal changes of the described riverine environment based on SfM point clouds. A full processing chain will be developed to take full advantage of the high-density data. Particular interest lies in the extraction of ground points underneath vegetation in leaf-on and leaf-off conditions. Ground points will be gridded to generate DTMs. The quality of the data will be compared against an ALS-acquired DTM. Furthermore, forest metrics will be extracted for the riparian zone in order to quantify its current state and changes.

High-density multi-spectral (R,G,B,NIR) SfM derived point clouds (UAS imagery)
+ Variable point densities, GSD ~3cm.

LAStools processing:
1) fix incoherence owing to SfM [lassort]
2) create 100m tiles (10m buffer) for parallel processing [lastile]
3) remove noise introduced by the SfM algorithm [lasnoise]
4) extract ground points [lasground_new]
5) generate normalized above-ground heights [lasheight]
6) classify based on height-above-ground (low veg, high veg) [lasheight]
7) create DSM and DTM [blast2dem]
8) generate a Canopy Height Model (CHM) using the pit-free method of Khosravipour et al. (2014) with the workflow described here [lasthin, las2dem, lasgrid]
9) sub-sample the point clouds for other (spectral) analyses [lassplit, lasthin, lasmerge]

Westoby, M. J., et al. “Structure-from-Motion photogrammetry: A low-cost, effective tool for geoscience applications.” Geomorphology 179 (2012): 300-314.
Fonstad, Mark A., et al. “Topographic structure from motion: a new development in photogrammetric measurement.” Earth Surface Processes and Landforms 38.4 (2013): 421-430.
Khosravipour, A., Skidmore, A.K., Isenburg, M., Wang, T.J., Hussin, Y.A., 2014. Generating pit-free Canopy Height Models from Airborne LiDAR. PE&RS = Photogrammetric Engineering and Remote Sensing 80, 863-872.
Javernick, L., J. Brasington, and B. Caruso. “Modeling the topography of shallow braided rivers using Structure-from-Motion photogrammetry.” Geomorphology 213 (2014): 166-182.

LASmoons: Asanga Ramanayake

Asanga Ramanayake (recipient of three LASmoons)
BGSU Remote Sensing Lab, School of Earth, Environment and Society
Bowling Green State University, Ohio, USA

Lake Erie is the southernmost of the Great Lakes and it is shared by 4 states and 2 countries. It is the shallowest, warmest, and most biologically productive of all the Great Lakes. At wetland habitats along the Western Lake Erie coast, more than 300 species of plants have been identified. To study land use and to classify vegetation cover it is important to consider the vertical distribution of the vegetation. LiDAR is an active data collection system for generating 3D spatial information of objects. High-resolution Digital Terrain Models (DTMs) and Digital Surface Models (DSMs) can be generated from the available LiDAR points that allow accurate estimates of canopy height.


The main goal of this project is to derive Digital Terrain Models (DTMs) and Digital Surface Models (DSMs) for the coastal areas of Lake Erie using LiDAR data to estimate the height of the canopy. The derived products will be validated with in-situ measurements from other researchers and compared with ASTER Global Digital Elevation Model data.

coastal area LiDAR data coverage for Lake Erie

+ The Ohio Geographically Referenced Information Program (OGRIP) has free downloadable LiDAR data in LAS format that was acquired by the Ohio Statewide Imagery Program (OSIP) in 2006-2008.
+ In 2011-2012 NOAA’s mission was capturing coastal area LiDAR data. This data is served to the public and available in LAZ format.

LAStools processing:
1) create square tiles to avoid edge artifacts [lastile]
2) classify point clouds into ground and non-ground [lasground]
3) generate DTMs and DSMs for the coastal areas of Lake Erie [las2dem]
4) produce height normalized tiles [lasheight]
5) generate a Canopy Height Model (CHM) using the pit-free method of Khosravipour et al. (2014) [lasthin, las2dem, lasgrid]

Herdendorf, Charles E. The ecology of the coastal marshes of western Lake Erie: a community profile. OHIO STATE UNIV COLUMBUS, 1987.
Deems, Jeffrey S., Thomas H. Painter, and David C. Finnegan. “Lidar Measurement of Snow Depth: A Review.” Journal of Glaciology 59.215 (2013): 467–479. IngentaConnect. Web.
Jensen, John R. Remote Sensing of the Environment: An Earth Resource Perspective. 2nd ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2007. Print. Prentice Hall Series in Geographic Information Science.
Khosravipour, A., Skidmore, A.K., Isenburg, M., Wang, T.J., Hussin, Y.A., 2014. Generating pit-free Canopy Height Models from Airborne LiDAR. PE&RS = Photogrammetric Engineering and Remote Sensing 80, 863-872.

Generating Spike-Free Digital Surface Models from LiDAR

A Digital Surface Model (DSM) represents the elevation of the landscape including all vegetation and man-made objects. An easy way to generate a DSM raster from LiDAR is to use the highest elevation value from all points falling into each grid cell. However, this “binning” approach only works when the resolution of the LiDAR is higher than the resolution of the raster. Only then do sufficiently many LiDAR points fall into each raster cell to prevent “empty pixels” and “data pits” from forming. For example, given LiDAR with an average pulse spacing of 0.5 meters one can easily generate a 2.5 meter DSM raster with simple “binning”. But to generate a 0.5 meter DSM raster we need to use an “interpolation” method.
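
The limits of “binning” are easy to demonstrate with a small Python sketch. It scatters 100 made-up returns over a 5 by 5 meter patch, which gives an average spacing of about 0.5 meters, and keeps the highest elevation per cell for a 2.5 meter raster and for a 0.5 meter raster. The function names and the random test points are assumptions for this illustration only.

```python
import random


def bin_highest(points, step, ncols, nrows, xmin=0.0, ymin=0.0):
    """Raster that stores the highest z per cell, or None for empty cells."""
    grid = [[None] * ncols for _ in range(nrows)]
    for x, y, z in points:
        col = int((x - xmin) // step)
        row = int((y - ymin) // step)
        if 0 <= col < ncols and 0 <= row < nrows:
            if grid[row][col] is None or z > grid[row][col]:
                grid[row][col] = z
    return grid


def count_empty(grid):
    """Number of raster cells that did not receive any point."""
    return sum(cell is None for row in grid for cell in row)


random.seed(42)
# hypothetical returns: x and y within a 5 m x 5 m patch, z up to 20 m
POINTS = [(random.uniform(0, 5), random.uniform(0, 5), random.uniform(0, 20))
          for _ in range(100)]

if __name__ == "__main__":
    coarse = bin_highest(POINTS, 2.5, 2, 2)   # 2.5 meter cells
    fine = bin_highest(POINTS, 0.5, 10, 10)   # 0.5 meter cells
    print("empty cells at 2.5 m:", count_empty(coarse))
    print("empty cells at 0.5 m:", count_empty(fine))
```

At 2.5 meters every cell collects many returns, whereas at 0.5 meters a good share of the cells stays empty even though there is on average one return per cell. Those are the “empty pixels” that an interpolating surface avoids.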

Returns of four flightlines on two trees.

Laser pulses and discrete returns of four flightlines.

For the past twenty or so years, GIS textbooks and LiDAR tutorials have recommended using only the first returns to construct the interpolating surface for DSM generation. The intuition is that the first return is the highest return for an airborne survey where the laser beams come (more or less) from above. Hence, an interpolating surface of all first returns is constructed – usually based on a 2D Delaunay triangulation – and the resulting Triangular Irregular Network (TIN) is rasterized onto a grid at a user-specified resolution to create the DSM raster. A Canopy Height Model (CHM) is generated the same way except that elevations are height-normalized either before or after the rasterization step. However, using a first-return interpolation for DSM/CHM generation has two critical drawbacks:

(1) Using only first returns means not all LiDAR information is used and some detail is missing. This is particularly the case for off-nadir scan angles in traditional airborne surveys. It becomes more pronounced with new scanning systems such as UAV or hand-held LiDAR where laser beams no longer come “from above”. Furthermore, in the event of clouds or high noise the first returns are often removed and the remaining returns are not renumbered. Hence, any laser shot whose first return reflects from a cloud or a bird does not contribute its highest landscape hit to the DSM or CHM.

(2) Using all first returns practically guarantees the formation of needle-shaped triangles in vegetated areas and along building roofs that appear as spikes in the TIN. This is because at off-nadir scan angles first returns are often generated far below other first returns as shown in the illustration above. The resulting spikes turn into “data pits” in the corresponding raster that not only look ugly but impact the utility of the DSM or CHM in subsequent analysis, for example, in forestry applications when attempting to extract individual trees.

In the following we present results and command-line examples for the new “spike-free” algorithm by Khosravipour et al. (2015, 2016) that is implemented (as a slow prototype) in the current LAStools release. This completely novel method for DSM generation triangulates all relevant LiDAR returns using a Constrained Delaunay algorithm. This constructs a “spike-free” TIN that is in turn rasterized into a “pit-free” DSM or CHM. This work is both a generalization and an improvement of our previous result of pit-free CHM generation.

We now compare our “spike-free” DSM to a “first-return” DSM on the two small urban data sets “france.laz” and “zurich.laz” distributed with LAStools. Using lasinfo with options ‘-last_only’ and ‘-cd’ we determine that the average pulse spacing is around 0.33 meter for “france.laz” and 0.15 meter for “zurich.laz”. We decide to create a hillshaded 0.25 meter DSM for “france.laz” and a 0.15 meter DSM for “zurich.laz” with the command-lines shown below.

las2dem -i ..\data\france.laz ^
        -keep_first ^
        -step 0.25 ^
        -hillshade ^
        -o france_fr.png
las2dem -i ..\data\france.laz ^
        -spike_free 0.9 ^
        -step 0.25 ^
        -hillshade ^
        -o france_sf.png
las2dem -i ..\data\zurich.laz ^
        -keep_first ^
        -step 0.15 ^
        -hillshade ^
        -o zurich_fr.png
las2dem -i ..\data\zurich.laz ^
        -spike_free 0.5 ^
        -step 0.15 ^
        -hillshade ^
        -o zurich_sf.png

The differences between a first-return DSM and a spike-free DSM are most drastic along building roofs and in vegetated areas. To inspect in more detail the differences between a first-return and our spike-free TIN we use lasview, which allows us to iteratively visualize the construction process of a spike-free TIN.

lasview -i ..\data\france.laz -spike_free 0.9

Pressing <f> and <t> constructs the first-return TIN. Pressing <SHIFT> + <t> destroys the first-return TIN. Pressing <SHIFT> + <y> constructs the spike-free TIN. Pressing <y> once destroys the spike-free TIN. Pressing <y> many times iteratively constructs the spike-free TIN.

One crucial piece of information is still missing: what value should you use as the freeze constraint of the spike-free algorithm, which we set to 0.9 for “france.laz” and to 0.5 for “zurich.laz” as the argument to the command-line option ‘-spike_free’? The optimal value is related to the expected edge length, and we found the 99th percentile of a histogram of edge lengths of the last-return TIN to be useful. Or simpler … try a value that is about three times the average pulse spacing.
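
Both rules of thumb are easy to write down. The Python sketch below applies the “three times the pulse spacing” rule to the spacings reported above and also shows a nearest-rank 99th percentile for a list of edge lengths. The function names are invented, and the edge lengths would have to come from a last-return TIN that this sketch does not construct.

```python
import math


def freeze_from_spacing(pulse_spacing, factor=3.0):
    """Freeze value as a multiple of the average pulse spacing (in meters)."""
    return factor * pulse_spacing


def freeze_from_edge_lengths(edge_lengths, p=99):
    """Nearest-rank p-th percentile of TIN edge lengths (in meters)."""
    if not edge_lengths:
        raise ValueError("at least one edge length is required")
    ordered = sorted(edge_lengths)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    for name, spacing in (("france.laz", 0.33), ("zurich.laz", 0.15)):
        print(name, "suggested freeze:", round(freeze_from_spacing(spacing), 2))
```

For “france.laz” the rule suggests a value close to 1.0 and for “zurich.laz” a value close to 0.45, which is in the neighborhood of the 0.9 and 0.5 used in the command lines above.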

Khosravipour, A., Skidmore, A.K., Isenburg, M. and Wang, T.J. (2015) Development of an algorithm to generate pit-free Digital Surface Models from LiDAR, Proceedings of SilviLaser 2015, pp. 247-249, September 2015.
Khosravipour, A., Skidmore, A.K., Isenburg, M. (2016) Generating spike-free Digital Surface Models using raw LiDAR point clouds: a new approach for forestry applications, (journal manuscript under review).