NRW Open LiDAR: Download, Compression, Viewing

This is the first part of a series on how to process the newly released open LiDAR data for the entire state of North Rhine-Westphalia that was announced a few days ago. Again, kudos to OpenNRW for being the most progressive open data state in Germany. You can follow this tutorial after downloading the latest version of LAStools as well as a pair of DGM and DOM files for your area of interest from these two download pages.

We have downloaded the pair of DGM and DOM files for the Federal City of Bonn. Bonn is the former capital of Germany and was host to the FOSS4G 2016 conference. As both files are larger than 10 GB, we use the wget command line tool with option ‘-c’ that will restart where it left off in case the transmission gets interrupted.

The DGM file and the DOM file are zipped archives that contain the points in 1km by 1km tiles stored as x, y, z coordinates in ETRS89 / UTM 32 projection as simple ASCII text with centimeter resolution (i.e. two decimal digits).

>> more
360000.00 5613026.69 164.35
360000.00 5613057.67 164.20
360000.00 5613097.19 164.22
360000.00 5613117.89 164.08
360000.00 5613145.35 164.03
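
To illustrate why this textual format is bulky, here is a minimal Python sketch (not part of LAStools) that parses two of the sample lines above and stores each coordinate as an integer number of centimeters, which is conceptually how the LAS format stores scaled coordinates. The function name `parse_xyz_line` and the fixed scale of 0.01 are assumptions made for this illustration.

```python
import struct
from decimal import Decimal

SCALE = Decimal("0.01")  # assumed: centimeter resolution, as in the sample above


def parse_xyz_line(line):
    """Parse one 'x y z' text line into three integer centimeter counts."""
    x, y, z = (Decimal(token) for token in line.split())
    return tuple(int(value / SCALE) for value in (x, y, z))


sample = [
    "360000.00 5613026.69 164.35",
    "360000.00 5613057.67 164.20",
]
points = [parse_xyz_line(line) for line in sample]

# Three 32-bit integers need 12 bytes, the text line needs 27 characters.
packed = struct.pack("<3i", *points[0])
print(points[0], len(packed), len(sample[0]))
```

Using `Decimal` avoids floating-point rounding when converting the two decimal digits into whole centimeters.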

There is more than one tile for each square kilometer as the LiDAR points have been split into different files based on their classification and their return type. Furthermore there are also synthetic points that were used by the land survey department to replace certain LiDAR points in order to generate higher quality DTM and DSM raster products.

The zipped DGM archive is 10.5 GB in size and contains 956 *.xyz files totaling 43.5 GB after decompression. The zipped DOM archive is 11.5 GB in size and contains 244 *.xyz files totaling 47.8 GB. Repeatedly loading these 90 GB of text data and parsing these human-readable x, y, and z coordinates is inefficient with common LiDAR software. In the first step we convert the textual *.xyz files into binary *.laz files that can be stored, read and copied more efficiently. We do this with the open source LASzip compressor that is distributed with LAStools using these two command line calls:

laszip -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.xyz ^
       -epsg 25832 -vertical_dhhn92 ^
       -olaz ^
       -cores 2
laszip -i dom1l_05314000_Bonn_EPSG5555_XYZ\*.xyz ^
       -epsg 25832 -vertical_dhhn92 ^
       -olaz ^
       -cores 2

The point coordinates are in EPSG 5555, a compound coordinate reference system that combines horizontal EPSG 25832 aka ETRS89 / UTM zone 32N with vertical EPSG 5783 aka the “Deutsches Haupthoehennetz 1992” or DHHN92. We add this information to each *.laz file during the LASzip compression process with the command line options ‘-epsg 25832’ and ‘-vertical_dhhn92’.

LASzip reduces the file size by a factor of 10. The 956 *.laz DGM files compress down to 4.3 GB from 43.5 GB for the original *.xyz files and the 244 *.laz DOM files compress down to 4.8 GB from 47.8 GB. From here on out we continue to work with the 9 GB of slim *.laz files. But before we delete the 90 GB of bulky *.xyz files we make sure that there are no file corruptions (e.g. disk full, truncated files, interrupted processes, bit flips, …) in the *.laz files.

laszip -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.laz -check
laszip -i dom1l_05314000_Bonn_EPSG5555_XYZ\*.laz -check
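
As an extra precaution before deleting anything, one could also confirm that every *.xyz file has a *.laz counterpart. The following Python sketch is a hypothetical helper, not a LAStools feature, and it assumes the compressed files sit in the same folder and keep the same base name as the text files.

```python
from pathlib import Path


def xyz_without_laz(folder):
    """Return names of *.xyz files that have no *.laz file with the same stem."""
    folder = Path(folder)
    laz_stems = {path.stem for path in folder.glob("*.laz")}
    return sorted(
        path.name for path in folder.glob("*.xyz") if path.stem not in laz_stems
    )


if __name__ == "__main__":
    # Folder name taken from the commands above; adjust to your own download.
    missing = xyz_without_laz("dgm1l_05314000_Bonn_EPSG5555_XYZ")
    print(len(missing), "text files without a compressed counterpart")
```

An empty result only says that no file was skipped. It does not replace the integrity check done by laszip.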

One advantage of having the LiDAR in an industry standard such as the LAS format (or its lossless compressed twin, the LAZ format) is that the header of the file stores the number of points per file, the bounding box, as well as the projection information that we have added. This allows us to very quickly load an overview, for example, into lasview.

lasview -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.laz -GUI

The bounding boxes of the DGM files quickly give us an overview in the GUI when the files are in LAS or LAZ format.
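
To show why such an overview is cheap, here is a small Python sketch that pulls the point count and the bounding box out of the fixed-size public header block without touching any point data. It assumes the 227-byte header layout of LAS 1.0 to 1.2, which LAZ files store uncompressed in the same way. The function name `read_las_header_summary` is made up for this example, and for LAS 1.4 files the point count at this offset is only a legacy field that may be zero.

```python
import struct


def read_las_header_summary(header):
    """Extract point count and bounding box from a LAS 1.0-1.2 header (227 bytes)."""
    if len(header) < 227 or header[:4] != b"LASF":
        raise ValueError("not a LAS/LAZ header")
    # Number of point records: unsigned 32-bit integer at byte offset 107.
    (num_points,) = struct.unpack_from("<I", header, 107)
    # Six doubles at offset 179, stored as max x, min x, max y, min y, max z, min z.
    max_x, min_x, max_y, min_y, max_z, min_z = struct.unpack_from("<6d", header, 179)
    return {
        "points": num_points,
        "min": (min_x, min_y, min_z),
        "max": (max_x, max_y, max_z),
    }


# Synthetic header for demonstration; a real one would be the first 227 bytes of a file.
demo = bytearray(227)
demo[:4] = b"LASF"
struct.pack_into("<I", demo, 107, 1234)
struct.pack_into("<6d", demo, 179, 361000.0, 360000.0, 5614000.0, 5613000.0, 170.0, 160.0)
summary = read_las_header_summary(bytes(demo))
print(summary)
```

For a file on disk one would pass `open(path, "rb").read(227)` to the function.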

Now we want to find a particular site in Bonn such as the World Conference Center Bonn where FOSS4G 2016 was held. Which tile is it in? We need some geospatial context to find it, for example, by creating an overview in the form of KML files that we can load into Google Earth. We use the files from the DOM folder with “fp” in the name as points on buildings are mostly “first returns”. See what our previous blog post writes about the different file names if you cannot wait for the second part of this series. We can create the KML files with lasboundary either via the GUI or in the command line.

lasboundary -i dom1l_05314000_Bonn_EPSG5555_XYZ\dom1l-fp*.laz ^

Only the “fp” tiles from the DOM folder loaded into the GUI of lasboundary.

lasboundary -i dom1l_05314000_Bonn_EPSG5555_XYZ\dom1l-fp*.laz ^
            -use_bb -labels -okml

We zoom in and find the World Conference Center Bonn and load the identified tile into lasview. Well, we did not expect this to happen, but what we see below will make this series of tutorials even more worthwhile. There is a lot of “high noise” in the particular tile we picked. We should have noticed the unusually high z range of 406.42 meters in the Google Earth pop-up. Is this high electromagnetic radiation interfering with the sensors? There are a number of high-tech government buildings with all kinds of antennas nearby (such as the United Nations Bonn Campus the mouse cursor points at).

Significant amounts of high noise are in the first returns of the DOM tile we picked.

But the intended area of interest was found. You can see the iconic “triangulated” roof of the building that is across from the World Conference Center Bonn.

The World Conference Center Bonn is across from the building with the “triangulated” roof.

Please don’t think it is the responsibility of OpenNRW to remove the noise and provide cleaner data. The land survey has already processed this data into whatever products they needed and that is where their job ended. Any additional services – other than sharing the raw data – are not in their job description. We’ll take care of that … (-:

Acknowledgement: The LiDAR data of OpenNRW comes with a very permissive license. It is called “Datenlizenz Deutschland – Namensnennung – Version 2.0” or “dl-de/by-2-0” and allows data and derivative sharing as well as commercial use. It only requires us to name the source. We need to cite the “Land NRW (2017)” with the year of the download in brackets and specify the Uniform Resource Identifier (URI) for both the DOM and the DGM. Done. So easy. Thank you, OpenNRW … (-:

Generating Spike-Free Digital Surface Models from LiDAR

A Digital Surface Model (DSM) represents the elevation of the landscape including all vegetation and man-made objects. An easy way to generate a DSM raster from LiDAR is to use the highest elevation value from all points falling into each grid cell. However, this “binning” approach only works when the resolution of the LiDAR is higher than the resolution of the raster. Only then do sufficiently many LiDAR points fall into each raster cell to prevent “empty pixels” and “data pits” from forming. For example, given LiDAR with an average pulse spacing of 0.5 meters one can easily generate a 2.5 meter DSM raster with simple “binning”. But to generate a 0.5 meter DSM raster we need to use an “interpolation” method.
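
The “binning” approach can be sketched in a few lines of Python. This is only an illustration with made-up points and a hypothetical function name `bin_highest`, not how any LAStools module is implemented. It shows that a coarse step fills every cell that contains data while a step close to the point spacing leaves cells empty.

```python
import math


def bin_highest(points, step):
    """Keep the highest z per grid cell of size `step`; empty cells stay absent."""
    grid = {}
    for x, y, z in points:
        cell = (math.floor(x / step), math.floor(y / step))
        if cell not in grid or z > grid[cell]:
            grid[cell] = z
    return grid


# Three made-up returns along a short line.
points = [(0.2, 0.3, 10.0), (0.4, 0.1, 12.5), (3.1, 0.2, 9.0)]

coarse = bin_highest(points, 2.5)  # two neighboring cells, both filled
fine = bin_highest(points, 0.5)    # cells (1, 0) to (5, 0) receive no point
print(coarse)
print(sorted(fine))
```

The cells missing from `fine` are exactly the “empty pixels” that an interpolating surface is meant to fill.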

Laser pulses and discrete returns of four flightlines on two trees.

For the past twenty or so years, GIS textbooks and LiDAR tutorials have recommended using only the first returns to construct the interpolating surface for DSM generation. The intuition is that the first return is the highest return for an airborne survey where the laser beams come (more or less) from above. Hence, an interpolating surface of all first returns is constructed – usually based on a 2D Delaunay triangulation – and the resulting Triangular Irregular Network (TIN) is rasterized onto a grid at a user-specified resolution to create the DSM raster. A Canopy Height Model (CHM) is generated the same way except that elevations are height-normalized either before or after the rasterization step. However, using a first-return interpolation for DSM/CHM generation has two critical drawbacks:

(1) Using only first returns means not all LiDAR information is used and some detail is missing. This is particularly the case for off-nadir scan angles in traditional airborne surveys. It becomes more pronounced with new scanning systems such as UAV or hand-held LiDAR where laser beams no longer come “from above”. Furthermore, in the event of clouds or high noise the first returns are often removed and the remaining returns are not renumbered. Hence, any laser shot whose first return reflects from a cloud or a bird does not contribute its highest landscape hit to the DSM or CHM.

(2) Using all first returns practically guarantees the formation of needle-shaped triangles in vegetated areas and along building roofs that appear as spikes in the TIN. This is because at off-nadir scan angles first returns are often generated far below other first returns as shown in the illustration above. The resulting spikes turn into “data pits” in the corresponding raster that not only look ugly but impact the utility of the DSM or CHM in subsequent analysis, for example, in forestry applications when attempting to extract individual trees.

In the following we present results and command-line examples for the new “spike-free” algorithm by Khosravipour et al. (2015, 2016) that is implemented (as a slow prototype) in the current LAStools release. This completely novel method for DSM generation triangulates all relevant LiDAR returns using a Constrained Delaunay algorithm. This constructs a “spike-free” TIN that is in turn rasterized into a “pit-free” DSM or CHM. This work is both a generalization and an improvement of our previous result of pit-free CHM generation.

We now compare our “spike-free” DSM to a “first-return” DSM on the two small urban data sets “france.laz” and “zurich.laz” distributed with LAStools. Using lasinfo with options ‘-last_only’ and ‘-cd’ we determine that the average pulse spacing is around 0.33 meter for “france.laz” and 0.15 meter for “zurich.laz”. We decide to create a hillshaded 0.25 meter DSM for “france.laz” and a 0.15 meter DSM for “zurich.laz” with the command-lines shown below.

las2dem -i ..\data\france.laz ^
        -keep_first ^
        -step 0.25 ^
        -hillshade ^
        -o france_fr.png
las2dem -i ..\data\france.laz ^
        -spike_free 0.9 ^
        -step 0.25 ^
        -hillshade ^
        -o france_sf.png
las2dem -i ..\data\zurich.laz ^
        -keep_first ^
        -step 0.15 ^
        -hillshade ^
        -o zurich_fr.png
las2dem -i ..\data\zurich.laz ^
        -spike_free 0.5 ^
        -step 0.15 ^
        -hillshade ^
        -o zurich_sf.png

The differences between a first-return DSM and a spike-free DSM are most drastic along building roofs and in vegetated areas. To inspect the differences between a first-return TIN and our spike-free TIN in more detail we use lasview, which lets us iteratively visualize the construction process of a spike-free TIN.

lasview -i ..\data\france.laz -spike_free 0.9

Pressing <f> and <t> constructs the first-return TIN. Pressing <SHIFT> + <t> destroys the first-return TIN. Pressing <SHIFT> + <y> constructs the spike-free TIN. Pressing <y> once destroys the spike-free TIN. Pressing <y> many times iteratively constructs the spike-free TIN.

One crucial piece of information is still missing. What value should you use as the freeze constraint of the spike-free algorithm, which we set to 0.9 for “france.laz” and to 0.5 for “zurich.laz” as the argument to the command-line option ‘-spike_free’? The optimal value is related to the expected edge length and we found the 99th percentile of a histogram of edge lengths of the last-return TIN to be useful. Or simpler … try a value that is about three times the average pulse spacing.
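
Both rules of thumb are easy to write down. The Python sketch below uses hypothetical helper names and assumes that the edge lengths of the last-return TIN have already been measured by some other means. It merely reproduces the arithmetic and shows that three times the pulse spacing lands close to the 0.9 and 0.5 used above.

```python
import math


def freeze_from_spacing(pulse_spacing, factor=3.0):
    """Simple rule: about three times the average pulse spacing."""
    return factor * pulse_spacing


def freeze_from_edge_lengths(edge_lengths, q=99):
    """Nearest-rank q-th percentile of measured TIN edge lengths."""
    ordered = sorted(edge_lengths)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


france = freeze_from_spacing(0.33)  # about 0.99, we used 0.9
zurich = freeze_from_spacing(0.15)  # about 0.45, we used 0.5
print(round(france, 2), round(zurich, 2))
```

Either value is only a starting point. Inspecting the resulting TIN in lasview remains the best check.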

Khosravipour, A., Skidmore, A.K., Isenburg, M. and Wang, T.J. (2015) Development of an algorithm to generate pit-free Digital Surface Models from LiDAR, Proceedings of SilviLaser 2015, pp. 247-249, September 2015.
Khosravipour, A., Skidmore, A.K. and Isenburg, M. (2016) Generating spike-free Digital Surface Models using raw LiDAR point clouds: a new approach for forestry applications, (journal manuscript under review).

Discriminating Vegetation from Buildings

I came across an interesting blog article by Jarlath O’Neil-Dunne from the University of Vermont on how LiDAR return information can be used as a simple way to discriminate vegetated areas from buildings. He first computes a normalized first-return DSM and a normalized last-return DSM that he subtracts from one another to highlight the vegetation. He writes “This is because the height difference of the first and last returns for buildings is often identical, whereas for trees it is typically much greater.”

Side note: I am not entirely happy with the terminology of a “Normalized Digital Terrain Model (nDTM)”. Jarlath writes: “A similar approach is used to create a Normalized Digital Terrain Model (nDTM).  A DTM is generated from the last returns. The DEM is then subtracted from the DTM to create the nDTM.” I like to reserve the term “Digital Terrain Model (DTM)” for bare-earth terrain computed from returns classified as ground.

Below I radically simplify Jarlath’s workflow by eliminating the two normalization steps. This not only saves the creation of 3 temporary rasters but also removes the requirement to have ground-classified LiDAR:

  1. Create a first-return frDSM
  2. Create a last-return lrDSM
  3. Subtract the lrDSM from the frDSM to get a return-difference rdDEM

This rdDEM has non-zero heights in all areas where the LiDAR produced more than one return. This happens most often and most pronounced in vegetated areas. Here is how to implement this with las2dem of LAStools:

las2dem -i ..\data\fusa.laz -first_only -o frDSM.bil
las2dem -i ..\data\fusa.laz -last_only -o lrDSM.bil
lasdiff -i frDSM.bil -i lrDSM.bil -o rdDEM.laz
lasview -i rdDEM.laz

The return-difference rdDEM shows the height difference between first and last returns.
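
For readers who prefer to see the subtraction spelled out, here is a minimal Python sketch of step 3 on two tiny made-up rasters. It is an illustration only. The nodata marker of -9999 and the function name `return_difference` are assumptions and not what the tools above use internally.

```python
NODATA = -9999.0  # assumed marker for cells without any return


def return_difference(fr_dsm, lr_dsm, nodata=NODATA):
    """Cell-wise first-return minus last-return height; nodata if either is missing."""
    result = []
    for fr_row, lr_row in zip(fr_dsm, lr_dsm):
        row = []
        for fr, lr in zip(fr_row, lr_row):
            row.append(nodata if nodata in (fr, lr) else fr - lr)
        result.append(row)
    return result


# Made-up 2 x 2 rasters: a roof (same height twice) and a tree (large difference).
fr_dsm = [[10.0, 25.0], [NODATA, 8.0]]
lr_dsm = [[10.0, 12.0], [7.0, 8.0]]
rd_dem = return_difference(fr_dsm, lr_dsm)
print(rd_dem)
```

Cells with a difference of zero behave like roofs or bare ground, while the cell with 13 meters of difference behaves like vegetation.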

Does this work well for you? The results on the “fusa.laz” data set are not entirely convincing … maybe because the vegetation was too dense (leaf-on?) so that the LiDAR penetration is not as pronounced. You can switch back and forth between the first-return and the last-return DSM by loading both *.bil files into lasview with the ‘-files_are_flightlines’ option and then pressing hotkeys ‘0’ and ‘1’ to toggle between the points and ‘t’ to triangulate the selected DSM.

lasview -i frDSM.bil lrDSM.bil -files_are_flightlines

first-return DSM

last-return DSM

We should point out that for Jarlath the return difference raster rdDEM is just one part of the pipeline that is followed by an object-based approach in which they integrate the spectral information from aerial imagery and then use iterative expert systems to further improve the tree canopy classification.

Nevertheless, we believe that our way of classifying vegetation and buildings via a pipeline of lasground, lasheight, and lasclassify gives a better and more robust initial guess of what is vegetation and what are buildings than multi-return height differences. Below you see how this is implemented using the new LASlayers concept:

lasground -i ..\data\fusa.laz -city -extra_fine -olay
lasheight -i ..\data\fusa.laz -ilay -olay
lasclassify -i ..\data\fusa.laz -ilay -olay
lasview -i ..\data\fusa.laz -ilay

Automated building and vegetation classification with lasclassify.

With lasgrid there are many ways to easily turn the classified point cloud into a raster so that it can be used for subsequent exploitation together with other image data in raster processing software. An example is shown below.

lasgrid -i ..\data\fusa.laz -ilay -keep_class 5 ^
        -step 0.5 -subcircle 0.1 -occupancy -fill 1 -false ^
        -use_bb -o vegetation.tif
lasgrid -i ..\data\fusa.laz -ilay -keep_class 6 ^
        -step 0.5 -subcircle 0.1 -occupancy -fill 1 -gray ^
        -use_bb -o buildings.tif
gdalwarp vegetation.tif buildings.tif classified.tif


Alternatively we can use lasboundary to create a shapefile describing either the vegetation or the buildings.

lasboundary -i ..\data\fusa.laz -ilay -keep_class 5 ^
            -disjoint -concavity 1.5 -o vegetation.shp
lasboundary -i ..\data\fusa.laz -ilay -keep_class 6 ^
            -disjoint -concavity 1.5 -o buildings.shp

SHP file generated with lasboundary with polygons describing the vegetation.

SHP file generated with lasboundary with polygons describing the buildings.

Using LAStools on Mac OS X with “Wine”

[contributed by guest blogger Yuriy Czoli]

If you want to use LAStools on a Mac running OS X you will have to do some preparations. This is a brief introduction to get you up and running with LAStools on a Mac in the terminal. You may have heard that you can use “Wine” to run LAStools on OS X. Depending on your experience this might sound intuitive, or like utter gibberish. For those who feel more like the latter, let’s walk through this.


If you don’t have Homebrew go ahead and install that now by following the instructions on the site. It should be one line found at the bottom of the page, entered into the terminal. It is a fantastic package manager which has saved me the trouble of dealing with unruly libraries, paths, dependencies, etc.


What is Wine? Wine allows for Windows programs to run on Mac OS X (and other non-Windows platforms like Linux). That is all we are interested in here. Read more about Wine here, if you’d like. Side note: You might see something called WineBottler in your search for information on Wine. You can use WineBottler to transform *.exe files to *.app files. I found it did not work with LAStools, but good to know about for other applications.

Follow these steps!

1. Let’s install Wine with Homebrew:

brew install wine

My build took 3.7 minutes. Time will vary. This next part is based off the code on this site.

2. Download LAStools:

3. Place the download where you like (but avoid spaces and funny symbols in the directory names). Then change directories in the terminal to where the zipped folder is located. Unzip the LAStools distribution:


4. Enter the unzipped folder:

cd lastools

5. Now enter the ‘bin’ directory where the LAStools modules are located:

cd bin

6. Run some tool (here: lasview) by calling wine before the LAStools command:

wine lasview -i pathToYourFile/yourFile.laz

For lasview an OpenGL window should open up and you should see your LiDAR data being rendered (see the README file for all the different visualization options or follow this tutorial). Go ahead and start exploring your data. You can use any of the many LAStools modules by preceding the command with “wine”. Today I happened to be looking at a section of Helsinki:


Then get going with LAStools and follow along the 6 new videos or the 4 step by step tutorials (1: quality checking, 2: LiDAR preparation, 3: derivative generation, 4: manual editing). After having installed Wine you will also be able to use LAStools via the QGIS toolboxes.

For the geospatially inclined, check out Homebrew for installing other libraries. If you are working with geospatial data, you can use brew to install GDAL, PostgreSQL, PostGIS, and many more.

Removal of Cloud Returns With a Coarse DTM

Flying LiDAR in regions with frequent cloud cover presents a significant challenge. If flight plan constraints do not allow staying below all of the clouds, then some of them will be scanned from above. For denser clouds this often means that all of the laser’s energy gets reflected or absorbed by the cloud and no returns on the terrain are generated. Clouds of points that are all cloud points can spell trouble in subsequent processing steps like automated classification with lasground. This is especially true for large dense clouds that start to look like features.


scanned clouds in four neighboring LiDAR flight lines

Airborne LiDAR surveys are often carried out to create an improved Digital Terrain Model (DTM) with higher resolution than previous elevation products. Hence, usually there is already some lower resolution model that – as we will see in the following – can be used to robustly remove or mark all cloud points, at least those sufficiently high above this older ground approximation. We use data from the DREAM LiDAR Project in the Philippines, which often acquires LiDAR in areas with a lot of cloud cover.

Our inputs are the 4 very short LiDAR strips in LAZ format shown above with lots of cloud returns and a coarse Aster DTM in ASC format at 50 m resolution. We will classify all LiDAR points far above the Aster DTM because they correspond to cloud returns. We need to be very conservative (because the Aster DTM is coarse and inaccurate) and only remove points that are really far above the Aster DTM whose ASC header is shown below.

ncols         2109
nrows         2162
xllcorner     512546.427859
yllcorner     1290961.682335
cellsize      50
NODATA_value  -9999
-9999 -9999 -9999 -9999 -9999 [...]

First we convert the Aster DTM from the inefficient ASC format to the efficient LAZ format using lassort. Why lassort? Because that puts the rasters that are on-the-fly converted to points on a grid into a spatially coherent order. This will allow efficient area-of-interest (AOI) queries once we have LAXxed the file with lasindex.

>> lassort -i masbate.asc -o masbate.laz
>> dir masbate*
    10,993,886 masbate.asc
     2,120,442 masbate.laz
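
The idea of turning a raster into points on a grid can be sketched in Python as follows. This is not the LAStools implementation. It assumes a six-line ASC header, one raster row per text line, and that each point is placed at the center of its cell. The function name `asc_to_points` and the tiny demo grid are made up.

```python
def asc_to_points(text):
    """Convert ESRI ASCII grid text into (x, y, z) points, skipping nodata cells."""
    lines = text.strip().splitlines()
    header = {}
    for line in lines[:6]:  # assumed: exactly six 'key value' header lines
        key, value = line.split()
        header[key.lower()] = float(value)
    nrows = int(header["nrows"])
    size = header["cellsize"]
    points = []
    # Rows are stored from north to south, so the first row has the largest y.
    for row, line in enumerate(lines[6:6 + nrows]):
        for col, token in enumerate(line.split()):
            z = float(token)
            if z == header["nodata_value"]:
                continue
            x = header["xllcorner"] + (col + 0.5) * size
            y = header["yllcorner"] + (nrows - row - 0.5) * size
            points.append((x, y, z))
    return points


demo = """ncols 2
nrows 2
xllcorner 100
yllcorner 200
cellsize 50
NODATA_value -9999
-9999 12.5
7.0 -9999
"""
grid_points = asc_to_points(demo)
print(grid_points)
```

Dropping the nodata cells is one reason why the point representation can be much smaller than the raster text.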

Next we create a spatial index for ‘masbate.laz’ with lasindex. The default granularity of lasindex is 100 by 100 meters because it is set for airborne LiDAR. Given that there is only one point every 50 by 50 meters in the point cloud grid of the Aster DTM we increase the granularity to 1000 by 1000 meters.

>> lasindex -i masbate.laz -tile_size 1000 -append
>> dir masbate*
    10,993,886 masbate.asc
     2,132,006 masbate.laz

The ‘masbate.laz’ file is a little bit bigger than before because the tiny ‘masbate.lax’ file (that can also be stored separately) was appended to the end of ‘masbate.laz’ via the ‘-append’ option. Spatial indexing is realized with an underlying quadtree that can be visualized with lasview by pressing ‘Q’ (or with the pop-up menu) and the points corresponding to each quadtree cell can be turned blue by hovering above a cell and pressing ‘q’.

Points of the 50m Aster DTM as a LAZ file and its spatial indexing quadtree generated by lasindex.

Before we use the sorted points from the Aster DTM to classify, flag, or delete the LiDAR returns far above the ground with lasheight we do a visual sanity check with lasview using the GUI settings shown below.

>> lasview -i raw_strips\*.laz ^
           -i masbate.laz ^

GUI settings in lasview for picking a small area

By triangulating only the Aster DTM points we can visually confirm that the LiDAR points on the terrain are also close to the Aster DTM whereas the cloud returns are far up with a clear separation between them. The pink “drop lines” of length 50 meters that are dangling off each point give us a sense of scale (enabled via the pop-up menu that appears with a right click). Note how many last returns get stuck in the clouds. Those would likely cause trouble in subsequent ground classification with lasground.

Finally we run lasheight on 4 cores to classify all points that are 150 meters or more above the Aster DTM as noise (class 7).

>> lasheight -i raw_strips\*.laz ^
             -ground_points masbate.laz ^
             -classify_above 150 7 ^
             -odix _cloud -olaz ^
             -cores 4

Below is the result with the cloud returns in pink (i.e. the points classified as 7). This process scales well even for larger ground points files, because lasheight uses an area-of-interest (AOI) query to load only those ground points from the spatially indexed ‘masbate.laz’ file that fall into the (slightly enlarged) bounding box of the raw LiDAR strip that is being processed.
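
The thresholding idea behind this step can be sketched in Python. This is a deliberate simplification and not how lasheight works internally: instead of interpolating between the ground points, the sketch looks up the coarse DTM cell a point falls into. The function name `classify_above`, the dictionary-based DTM, and the sample points are all made up for illustration.

```python
import math


def classify_above(points, dtm, cellsize, threshold=150.0, noise_class=7):
    """Set the class of points lying `threshold` or more above their DTM cell."""
    result = []
    for x, y, z, cls in points:
        cell = (math.floor(x / cellsize), math.floor(y / cellsize))
        ground = dtm.get(cell)  # None where the coarse DTM has no data
        if ground is not None and z - ground >= threshold:
            cls = noise_class
        result.append((x, y, z, cls))
    return result


dtm = {(0, 0): 20.0}  # one 50 m cell with a ground elevation of 20 m
points = [
    (10.0, 10.0, 25.0, 0),   # near the ground, stays unclassified
    (20.0, 30.0, 400.0, 0),  # 380 m above the DTM, becomes noise
    (60.0, 10.0, 900.0, 0),  # no DTM cell available, left untouched
]
classified = classify_above(points, dtm, cellsize=50.0)
print([cls for _, _, _, cls in classified])
```

Leaving points without DTM coverage untouched matches the conservative spirit of this workflow: only returns that are clearly far above the known terrain are marked.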