Leaked: “Classified LiDAR” of Pentagon in LAS 1.4 Format

LiDAR leaks have happened! Black helicopters are in the sky!  A few days ago a tiny tweet leaked the online location of “classified LiDAR” for Washington, DC. This LiDAR really is “classified” and includes an aerial scan of the Pentagon. For rogue scientists world-wide we offer a secret download link. It links to a file code-named ‘pentagon.laz‘ that contains the 8,044,789 “classified” returns of the Pentagon shown below. This “classified file” can be deciphered by any software with native LAZ support. It was encrypted with the “LAS 1.4 compatibility mode” of LASzip. The original LAS 1.4 content was encoded into a inconspicuous-looking LAZ file. New point attributes (such as the scanner channel) were hidden as “extra bytes” for fully lossless encryption. Use ‘laszip‘ to fully decode the original “classified” LAS 1.4 file … (-;

Seriously, a tiled LiDAR data set for the District of Columbia flown in 2015 is available for anyone to use on Amazon S3 with a very permissive open data license, namely the Creative Commons Attribution 3.0 License. The LiDAR coverage can be explored via this interactive map. The tiles are provided in LAS 1.4 format and use the new point type 6. We downloaded a few tiles near the White House, the Capitol, and the Pentagon to test the “native LAS 1.4 extension” of our LASzip compressor which will be released soon (a prototype for testing is already available). As these uncompressed LAS files are YUUUGE we use the command line utility ‘wget‘ for downloading. With option ‘-c’ the download continues where it left off in case the transfer gets interrupted.

LiDAR pulse density from 20 or less (blue) to 100 or more (red) pulses per square meter.

We use lasboundary to create labeled bounding boxes for display in Google Earth and lasgrid to a create false color visualization of pulse density with the command lines shown below. Pulse densities of 20 or below are mapped to blue. Pulse densities of 100 or above are mapped to red. We picked the min value 20 and the max value 100 for this false color mapping by running lasinfo with the ‘-cd’ option to compute an average pulse density and then refining the numbers experimentally. We also use lasoverlap to visualize how flightlines overlap and how well they align. Vertical differences of up to 20 cm are mapped to white and differences of 40 cm or more are mapped to saturated blue or red.

lasboundary -i *.las ^
            -use_bb ^
            -labels ^
            -odir quality -odix _bb -okml

lasgrid -i *.las ^
        -keep_last ^
        -point_density -step 2 ^
        -false -set_min_max 20 100 ^
        -odir quality -odix _d_20_100 -opng ^
        -cores 2

lasoverlap -i *.las ^
           -min_diff 0.2 -max_diff 0.4 ^
           -odir quality -opng ^
           -cores 2

The visualization of the pulse density and of the flightline overlap both show that there is no LiDAR for the White House or Capitol Hill. We will never know how tall the tomato and kale plants had grown in Michelle Obama’s organic garden on that day. Note that the White House and Capitol Hill were not simply “cut out”. Instead the flight plan of the survey plane was carefully designed to avoid these areas. Surprisingly, the Pentagon did not receive the same treatment and is (almost) fully included in the open LiDAR as mentioned in the dramatic first paragraph. Interesting is how the varying (tidal?) water level of the Potomac River shows up in the visualization of flightline miss-alignments.

There are a number of issues in these LiDAR files. The most serious ones are reported at the very end of this article. We will now scrutinize the partly-filled tile 2016.las close to the White House with only 11,060,334 returns. A lasvalidate check immediately reports three deviations from the LAS 1.4 specification:

lasvalidate -i 2016.las -o 2016_check.xml
  1. For proper LAS 1.4 files containing point type 6 through 10 all ‘legacy’ point counts in the LAS header should be set to 0. The following six fields in the LAS header should be zero for tile 2016.las (and all other tiles):
    + legacy number of point records
    + legacy number of points by return[0]
    + legacy number of points by return[1]
    + legacy number of points by return[2]
    + legacy number of points by return[3]
    + legacy number of points by return[4]
  2. There should not be any LiDAR return in a valid LAS file whose ‘number of returns of given pulse’ attribute is zero but there are 8 such points in tile 2016.las (and many more in various other tiles).
  3. There should not be any LiDAR return whose ‘return number’ attribute is larger than their ‘number of returns of given pulse’ attribute but there are 8 such points in tile 2016.las (and many more in various other tiles).

The first issue is trivial. There is an efficient in-place fix that does not require to rewrite the entire file using lasinfo with the following command line:

lasinfo -i 2016.las ^
        -nh -nv -nc ^
        -set_number_of_point_records 0 ^
        -set_number_of_points_by_return 0 0 0 0 0 ^

A quick check with las2txt shows us that the second and third issue are caused by the same eight points. Instead of writing an 8 for the ‘number of returns’ attribute the LAS file exporter must have written a 0 (marked in red for all eight returns) and instead of writing an 8 for the ‘return number’ attribute the LAS file exporter must have written a 1 (also marked in red). We can tell it from the true first return via its z coordinate (marked in blue) as the last return should be the lowest of all.

las2txt -i 2016.las ^
        -keep_number_of_returns 0 ^
        -parse xyzrnt ^
        -stdout
397372.70 136671.62 33.02 4 0 112813299.954811
397372.03 136671.64 28.50 5 0 112813299.954811
397371.28 136671.67 23.48 6 0 112813299.954811
397370.30 136671.68 16.86 7 0 112813299.954811
397369.65 136671.70 12.50 1 0 112813299.954811
397374.37 136671.58 44.17 3 0 112813299.954811
397375.46 136671.56 51.49 1 0 112813299.954811
397374.86 136671.57 47.45 2 0 112813299.954811

With las2las we can change the ‘number of returns’ from 0 to 8 using a ‘-filtered_transform’ as illustrated in the command line below. We suspect that higher number of returns such as 9 or 10 might have been mapped to 1 and 2. Fixing those as well as repairing the wrong return numbers will require a more complex tool. We would recommend to check all tiles with more scrutiny using the lasreturn tool. But wait … more return numbering issues are to come.

las2las -i 2016.las ^
        -keep_number_of_returns 0 ^
        -filtered_transform ^
        -set_extended_number_of_returns 8 ^
        -odix _fixed -olas

A closer look at the scan pattern reveals that the LiDAR survey was flown with a dual-beam system where two laser beams scan the terrain simultaneously. This is evident in the textual representation below as there are multiple “sets of returns” for the same GPS time stamp such as 112813952.110394. We group the returns from the two beams into an orange and a green group. Their coordinates show that the two laser beams point into different directions when they are simultaneously “shot” and therefore hit the terrain far apart from another.

las2txt -i 2016.las ^
        -keep_gps_time 112813952.110392 112813952.110396 ^
        -parse xyzlurntp ^
        -stdout
397271.40 136832.35 54.31 0 0 1 1 112813952.110394 117
397277.36 136793.35 38.68 0 1 1 4 112813952.110394 117
397277.35 136793.56 32.89 0 1 2 4 112813952.110394 117
397277.34 136793.88 24.13 0 1 3 4 112813952.110394 117
397277.32 136794.25 13.66 0 1 4 4 112813952.110394 117

The information about which point is from which beam is currently stored into the generic ‘user data’ attribute instead of into the dedicated ‘scanner channel’ attribute. This can be fixed with las2las as follows.

las2las -i 2016.las ^
        -copy_user_data_into_scanner_channel ^
        -set_user_data 0 ^
        -odix _fixed -olas

Unfortunately the LiDAR files have much more serious issues in the return numbering. It’s literally a “Total Disaster!” and “Sad!” as the US president will tweet shortly. After grouping all returns with the same GPS time stamp into an orange and a green group there is one more set of returns left unaccounted for.

las2txt -i 2016.las ^
        -keep_gps_time 112813951.416451 112813951.416455 ^
        -parse xyzlurntpi ^
        -stdout
397286.02 136790.60 45.90 0 0 1 4 112813951.416453 117 24
397286.06 136791.05 39.54 0 0 2 4 112813951.416453 117 35
397286.10 136791.51 33.34 0 0 3 4 112813951.416453 117 24
397286.18 136792.41 21.11 0 0 4 4 112813951.416453 117 0
397286.12 136791.75 30.07 0 0 1 1 112813951.416453 117 47
397291.74 136750.70 45.86 0 1 1 1 112813951.416453 117 105
las2txt -i 2016.las ^
        -keep_gps_time 112813951.408708 112813951.408712 ^
        -parse xyzlurntpi ^
        -stdout
397286.01 136790.06 45.84 0 0 1 4 112813951.408710 117 7
397286.05 136790.51 39.56 0 0 2 4 112813951.408710 117 15
397286.08 136790.96 33.33 0 0 3 4 112813951.408710 117 19
397286.18 136792.16 17.05 0 0 4 4 112813951.408710 117 0
397286.11 136791.20 30.03 0 0 1 2 112813951.408710 117 58
397286.14 136791.67 23.81 0 0 2 2 112813951.408710 117 42
397291.73 136750.16 45.88 0 1 1 1 112813951.408710 117 142

This can be visualized with lasview and the result is unmistakably clear: The return numbering is messed up. There should be one shot with five returns (not a group of four and a single return) in the first example. And there should be one shot with six returns (not a group of four and a group of two returns) in the second example. Such a broken return numbering results in extra first (or last) returns. These are serious issues that affect any algorithm that relies on the return numbering such as first-return DSM generation or canopy cover computation. Those extra returns will also make the pulse density appear higher and the pulse spacing appear tighter than they really are. The numbers from 20 (blue) to 100 (red) pulses per square meters in our earlier visualization are definitely inflated.

lasview -i 2016.las ^
        -keep_gps_time 112813951.416451 112813951.416455 ^
        -color_by_return

lasview -i 2016.las ^
        -keep_gps_time 112813951.408708 112813951.408712 ^
        -color_by_return

After all these troubles here something nice. Side-by-side a first-return TIN and a spike-free TIN (using a freeze of 0.8 m) of the center court cafe in the Pentagon. Especially given all these “fake first returns” in the Washington DC LiDAR we really need the spike-free algorithm to finally “Make a DSM great again!” … (-;

We would like to acknowledge the District of Columbia Office of the Chief Technology Officer (OCTO) for providing this data with a very permissive open data license, namely the Creative Commons Attribution 3.0 License.

 

Second German State Goes Open LiDAR

The floodgates of open geospatial data have opened in Germany. Days after reporting about the first state-wide release of open LiDAR, we are happy to follow up with a second wonderful open data story. The state of Thuringia (Thüringen) – also called the “green heart of Germany” – has also implemented an open geospatial data policy. This had already been announced in March 2016 but must have gone online just now. A reader of our last blog article pointed this out in the comments. And it’s not just LiDAR. You can download:

It all comes with the same permissible license as OpenNRW’s data. This is open data madness! Everything you could possibly hope for presented via a very functional download portal. Kudos to TLVermGeo (“Thüringisches Landesamt für Vermessung und Geoinformation”) for creating an open treasure cove of free-for-all geospatial data.

Let us have a look at the LiDAR. We use the interactive portal to zoom to an area of interest. With the recent rise of demagogues it cannot hurt to look at a stark reminder of where such demagoguery can lead. In his 1941 play “The Resistible Rise of Arturo Ui” – a satirical allegory on the rise of Adolf Hitler – Bertolt Brecht writes “… don’t rejoice too soon at your escape. The womb he crawled from is still going strong.”

We are downloading LiDAR data around the Buchenwald concentration camp. According to Wikipedia, it was established in July 1937 and was one of the largest on German soil. Today the remains of Buchenwald serve as a memorial and as a permanent exhibition and museum.

We download the 15 tiles surrounding the blue one: two on its left, two on its right and one corresponding row of five tiles above and below. Each of the 15 zipped archives contains a *.laz file and *.meta file. The *.laz file contains the LiDAR points and *.meta file contains the textual information below where “Lage” and “Höhe” refer to “horizontal” and “vertical”:

Datei: las_655_5653_1_th_2010-2013.laz
Erfassungsdatum: 2011-03
Erfassungsmethode: Airborne Laserscanning
Lasergebiet: Laser_04_2010
EPSG-Code Lage: 25832
EPSG-Code Höhe: 5783
Quasigeoid: GCG2005
Genauigkeit Lage: 0.12m
Genauigkeit Höhe: 0.04m
Urheber: (c) GDI-Th, Freistaat Thueringen, TLVermGeo

Next we will run a few quality checks on the 15 tiles by processing them with lasinfolasoverlap, lasgrid, and las2dem. We output all results into a folder named ‘quality’.

With lasinfo we create one text file per tile that summarizes its contents. The ‘-cd’ option computes the all return and last return density. The ‘-histo point_source 1’ option produces a histogram of point source IDs that are supposed to store which flight line each return came from. The ‘-odir’ and ‘-odix’ options specify the directory for the output and an appendix to the output file name. The ‘-cores 4’ option starts 4 processes in parallel, each working on a different tile.

lasinfo  -i las_*2010-2013.laz ^
         -cd ^
         -histo point_source 1 ^
         -odir quality -odix _info -otxt ^
         -cores 4

If you scrutinize the resulting text files you will find that the average last return density ranges from 6.29 to 8.13 and that the point source IDs 1 and 9999 seem to encode some special points. Likely those are synthetic points added to improve the derived rasters similar to the “ab”, “ag”, and “aw” files in the OpenNRW LiDAR. Odd is the lack of intermediate returns despite return numbers ranging all the way up to 7. Looks like only the first returns and the last returns are made available (like for the OpenNRW LiDAR). That will make those a bit sad who were planning to use this LiDAR for forest or vegetation mapping. The header of the *.laz files does not store geo-referencing information, so we will have to enter that manually. And the classification codes do not follow the standard ASPRS assignment. In red is our (currently) best guess what these classification codes mean:

[...]
histogram of classification of points:
 887223 ground (2) ground
 305319 wire guard (13) building
 172 tower (15) bridges
 41 wire connector (16) synthetic ground under bridges
 12286 bridge deck (17) synthetic ground under building
 166 Reserved for ... (18) synthetic ground building edge
 5642801 Reserved for ... (20) non-ground
[...]

With lasoverlap we can visualize how much overlap the flight lines have and the (potential miss-)alignment between them. We drop the synthetic points with point source IDs 1 and 9999 and add geo-referencing information with ‘-epsg 25832’ so that the resulting images can be displayed as Google Earth overlays. The options ‘-min_diff 0.1’ and ‘-max_diff 0.4’ map elevation differences of up +/- 10 cm to white. Above +/- 10 cm the color becomes increasingly red/blue with full saturation at +/- 40 cm or higher. This difference can only be computed for pixels with two or more overlapping flight lines.

lasoverlap  -i las_*2010-2013.laz ^
            -drop_point_source 1 ^
            -drop_point_source 9999 ^
            -min_diff 0.1 -max_diff 0.4 ^
            -odir quality -opng ^
            -epsg 25832 ^
            -cores 4

With lasgrid we check the density distribution of the laser pulses by computing the point density of the last returns for each 2 by 2 meter pixel and then mapping the computed density value to a false color that is blue for a density of 0 and red for a density of 10 or higher.

lasgrid  -i las_*2010-2013.laz ^
         -drop_point_source 1 ^
         -drop_point_source 9999 ^
         -keep_last ^
         -step 2 -point_density ^
         -false -set_min_max 0 10 ^
         -odir quality -odix _d_0_10 -opng ^
         -epsg 25832 ^
         -cores 4
Pulse density variation due to flight line overlap and flight turbulence.

Pulse density variation due to flight line overlap is expected. But also the contribution of flight turbulence is quite significant.

With las2dem we can check the quality of the already existing ground classification in the LiDAR by producing a hillshaded image of a DTM for visual inspection. Based on our initial guess on the classification codes (see above) we keep those synthetic points that improve the DTM (classification codes 16, 17, and 18) in addition to the ground points (classification code 2).

las2dem  -i las_*2010-2013.laz ^
         -keep_class 2 16 17 18 ^
         -step 1 ^
         -hillshade ^
         -odir quality -odix _shaded_dtm -opng ^
         -epsg 25832 ^
         -cores 4
Problems in the ground classification of LiDAR points are often visible in a hillshaded DTM raster.

Problems in the ground classification of LiDAR points are often visible in a hillshaded DTM.

Wow. We see a number of ground disturbances in the resulting hillshaded DTM. Some of them are expected because if you read up on the history of the Buchenwald concentration camp you will learn that in 1950 large parts of the camp were demolished. However, the laser finds the remnants of those barracks and buildings as clearly visible ground disturbances under the canopy of the dense forest that has grown there since. And then there are also these bumps that look like bomb craters. Are those from the American bombing raid on August 24, 1944?

We are still not entirely sure what those “bumps” arem but our initially assumption that all of those would have to be bomb craters from that fatal American bombing raid on August 24, 1944 seems to be wrong. Below is a close-up with lasview of the triangulated and shaded ground points from the lower right corner of tile ‘las_656_5654_1_th_2010-2013.laz’.

Close-up in lasview on the bumbs in the ground.

Close-up in lasview on the bumbs in the ground.

We are not sure if all the bumps we can see here are there for the same reason. But we found an old map and managed to overlay it on Google Earth. It suggest that at least the bigger bumps are not bomb craters. On the map they are labelled as “Erdfälle” which is German for “sink hole”.

We got a reminder on the danger of demagogues as well as a glimpse into conflict archaeology and geomorphology with this open LiDAR download and processing exercise. If you want to explore this area any further you can either download the LiDAR and download LAStools and process the data yourself or simply get our KML files here.

Acknowledgement: The LiDAR data of TLVermGeo comes with a very permissible license. It is called “Datenlizenz Deutschland – Namensnennung – Version 2.0” or “dl-de/by-2-0” and allows data and derivative sharing as well as commercial use. It only requires us to name the source. We need to cite the “geoportal-th.de (2017)” with the year of the download in brackets and should specify the Universal Resource Identification (URI). We have not found this yet and use this URL as a placeholder until we know the correct one. Done. So easy. Thank you, geoportal Thüringen … (-:

NRW Open LiDAR: Download, Compression, Viewing

UPDATE: (March 6th): Second part merging Bonn into proper LAS files

This is the first part of a series on how to process the newly released open LiDAR data for the entire state of North Rhine-Westphalia that was announced a few days ago. Again, kudos to OpenNRW for being the most progressive open data state in Germany. You can follow this tutorial after downloading the latest version of LAStools as well as a pair of DGM and DOM files for your area of interest from these two download pages.

We have downloaded the pair of DGM and DOM files for the Federal City of Bonn. Bonn is the former capital of Germany and was host to the FOSS4G 2016 conference. As both files are larger than 10 GB, we use the wget command line tool with option ‘-c’ that will restart where it left off in case the transmission gets interrupted.

The DGM file and the DOM file are zipped archives that contain the points in 1km by 1km tiles stored as x, y, z coordinates in ETRS89 / UTM 32 projection as simple ASCII text with centimeter resolution (i.e. two decimal digits).

>> more dgm1l-lpb_32360_5613_1_nw.xyz
360000.00 5613026.69 164.35
360000.00 5613057.67 164.20
360000.00 5613097.19 164.22
360000.00 5613117.89 164.08
360000.00 5613145.35 164.03
[...]

There is more than one tile for each square kilometer as the LiDAR points have been split into different files based on their classification and their return type. Furthermore there are also synthetic points that were used by the land survey department to replace certain LiDAR points in order to generate higher quality DTM and DSM raster products.

The zipped DGM archive is 10.5 GB in size and contains 956 *.xyz files totaling 43.5 GB after decompression. The zipped DOM archive is 11.5 GB in size and contains 244 *.xyz files totaling 47.8 GB. Repeatedly loading these 90 GB of text data and parsing these human-readable x, y, and z coordinates is inefficient with common LiDAR software. In the first step we convert the textual *.xyz files into binary *.laz files that can be stored, read and copied more efficiently. We do this with the open source LASzip compressor that is distributed with LAStools using these two command line calls:

laszip -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.xyz ^
       -epsg 25832 -vertical_dhhn92 ^
       -olaz ^
       -cores 2
laszip -i dom1l_05314000_Bonn_EPSG5555_XYZ\*.xyz ^
       -epsg 25832 -vertical_dhhn92 ^
       -olaz ^
       -cores 2

The point coordinates are is in EPSG 5555, which is a compound datum of horizontal EPSG 25832 aka ETRS89 / UTM zone 32N and vertical EPSG 5783 aka the “Deutsches Haupthoehennetz 1992” or DHHN92. We add this information to each *.laz file during the LASzip compression process with the command line options ‘-epsg 25832’ and ‘-vertical_dhhn92’.

LASzip reduces the file size by a factor of 10. The 956 *.laz DGM files compress down to 4.3 GB from 43.5 GB for the original *.xyz files and the 244 *.laz DOM files compress down to 4.8 GB from 47.8 GB. From here on out we continue to work with the 9 GB of slim *.laz files. But before we delete the 90 GB of bulky *.xyz files we make sure that there are no file corruptions (e.g. disk full, truncated files, interrupted processes, bit flips, …) in the *.laz files.

laszip -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.laz -check
laszip -i dom1l_05314000_Bonn_EPSG5555_XYZ\*.laz -check

One advantage of having the LiDAR in an industry standard such as the LAS format (or its lossless compressed twin, the LAZ format) is that the header of the file stores the number of points per file, the bounding box, as well as the projection information that we have added. This allows us to very quickly load an overview for example, into lasview.

lasview -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.laz -GUI
The bounding boxes of the DGM files quickly display a preview of the data in the GUI when the files are in LAS or LAZ format.

The bounding boxes of the DGM files quickly give us an overview in the GUI when the files are in LAS or LAZ format.

Now we want to find a particular site in Bonn such as the World Conference Center Bonn where FOSS4G 2016 was held. Which tile is it in? We need some geospatial context to find it, for example, by creating an overview in form of KML files that we can load into Google Earth. We use the files from the DOM folder with “fp” in the name as points on buildings are mostly “first returns”. See what our previous blog post writes about the different file names or look at the second part of this series. We can create the KML files with lasboundary either via the GUI or in the command line.

lasboundary -i dom1l_05314000_Bonn_EPSG5555_XYZ\dom1l-fp*.laz ^
            -gui
Only the "fp" tiles from the DOM folder loaded the GUI into lasboundary.

Only the “fp” tiles from the DOM folder loaded the GUI into lasboundary.

lasboundary -i dom1l_05314000_Bonn_EPSG5555_XYZ\dom1l-fp*.laz ^
            -use_bb -labels -okml

We zoom in and find the World Conference Center Bonn and load the identified tile into lasview. Well, we did not expect this to happen, but what we see below will make this series of tutorials even more worthwhile. There is a lot of “high noise” in the particular tile we picked. We should have noticed the unusually high z range of 406.42 meters in the Google Earth pop-up. Is this high electromagnetic radiation interfering with the sensors? There are a number of high-tech government buildings with all kind of antennas nearby (such as the United Nations Bonn Campus the mouse cursor points at).

Significant amounts of high noise are in the first returns of the DOM tile we picked.

Significant amounts of high noise are in the first returns of the DOM tile we picked.

But the intended area of interest was found. You can see the iconic “triangulated” roof of the building that is across from the World Conference Center Bonn.

The World Conference Center Bonn is across from the building with the "triangulated" roof.

The World Conference Center Bonn is across from the building with the “triangulated” roof.

Please don’t think it is the responsibility of OpenNRW to remove the noise and provide cleaner data. The land survey has already processed this data into whatever products they needed and that is where their job ended. Any additional services – other than sharing the raw data – are not in their job description. We’ll take care of that … (-:

Acknowledgement: The LiDAR data of OpenNRW comes with a very permissible license. It is called “Datenlizenz Deutschland – Namensnennung – Version 2.0” or “dl-de/by-2-0” and allows data and derivative sharing as well as commercial use. It only requires us to name the source. We need to cite the “Land NRW (2017)” with the year of the download in brackets and specify the Universal Resource Identification (URI) for both the DOM and the DGM. Done. So easy. Thank you, OpenNRW … (-:

First Open LiDAR in Germany

UPDATE: (January 6th): Our new tutorial “downloading Bonn in LiDAR“.
UPDATE: (January 9th): Now a second state went open LiDAR as well.
UPDATE: (March 6th): New tutorial merging Bonn into proper LAS files

Kudos to OpenNRW for offering online download links for hundreds of Gigabytes of open LiDAR for the entire state of North Rhine-Westphalia (Nordrhein-Westfalen) as announced a few months ago:

More and more countries, states, and municipalities are deciding to make their LiDAR archives accessible to the general public. Some are doing entirely for free with instant online download and a generous open license that allows data sharing and commercial use. Others still charge a “small administrative fee” and require filling our actual paperwork with real signatures in ink and postal mailing of hard drives that can easily take half a year to complete. Some licences are also stricter in terms of data sharing and commercial use.

And then there was Germany where all the LiDAR data has traditionally been locked up in some cave by the 16 state survey departments and was sold for more than just a fee. Financial reasons would usually prohibit residents in Germany from making, for example, an elevation profile for their favorite mountain bike trail for hobby purposes. Or starting a small side business that (for 5 Euro a sheet) sells “Gassi Maps” with elevation-optimized dog walking paths of low incline that go past suitable potty spots and dog-friendly coffee shops.

The reason that many national and state mapping agencies have opened their LiDAR holdings for free and unencumbered access are manifold. A previous blog post had looked at the situation in England whose Environment Agency also used to sell LiDAR data and derivatives. The common argument that government agencies have been using to justify selling data (paid for with taxes) to those very same tax payers was that this would be used to finance future surveys.

It was the “freedom of information” request by Louise Huby on November 21st, 2014 that exposed this as a flawed argument. The total amount of revenue generated for all LiDAR and derivatives sales by the Environment Agency was just around £323,000 per year between 2007 and 2014. This figure was dwarfed by its annual operational budget of £1,025,000,000 in 2007/08. The revenue from LiDAR sales was merely 0.03 percent of the agencies’s budget. That can maybe pay for free coffee and tea in the office, but not for future airborne LiDAR flights. The reaction was swift. In September 2015 the Environment Agency made all their DTM and DSM rasters down to 0.25 meter resolution available online for open access and in March 2016 added the raw point clouds for download in LAZ format with a very permissible license. It has since been an incredible success story and the Environment Agency has been propelled into the role of a “champion for open data” as sketched in my ACRS 2016 keynote talk that is available on video here.

Frustrated with the situation in Germany and inspired by the change in geospatial data policy in England, we have been putting in similar “Frag den Staat” freedom of information requests with 11 of the 16 German state mapping agencies, asking about how much sales revenue was generated annually from their LiDAR and derivative sales. These four states denied the request:

We never heard back from Lower Saxony (Niedersachsen) and Thuringia (Thüringen) wanted more fees than we were willing to spend on this, but our information requests were answered by five states that are here listed with the average amount of reported revenue per year in EUR:

LiDAR acquisitions are expensive and while it would be interesting to also find out how much each state has actually spent on airborne surveys over the past years (another “Frag den Staat” request anyone?), it is obvious that the reported revenues are just a tiny fraction of those costs. Exact details of the reported revenue per year can be accessed via the links to the information requests above. The cost table of each answer letter is copied below.

However, it’s not all bad news in Germany. Some of you may have seen my happy announcement of the OpenNRW initiative that come January 2017 this would also include the raw LiDAR points. And did it happen? Yes it did! Although the raw LiDAR points are maybe a little tricky to find, they are available for free download and they come with a very permissible license.

The license is called “Datenlizenz Deutschland – Namensnennung – Version 2.0” or “dl-de/by-2-0” and allows data and derivative sharing as well as commercial use. It merely requires you to properly name the source. For the LiDAR you need to list the “Land NRW (2017)” with the year of the download in brackets as the source and specify the data set that was used via the respective Universal Resource Identification (URI) for the DOM and/or the DGM.

The OpenNRW portal now also offers the download of the LiDAR "punktwolke" (German for point cloud).

The OpenNRW portal now also offers the download of the LiDAR “punktwolke” (German for point cloud).

Follow this link to get to the open data download portal. Now type in “punktwolke” (German for “point cloud”) into the search field and on this January 3rd 2017 that gives me 2 “Ergebnisse” (German for “results”). The LiDAR point cloud representation is a bit unusual by international standards. One link is for the DGM (German for DTM) and the other for the DOM (German for DSM). Both links are eventually leading you to unstructured LiDAR point clouds that are describing these surfaces. But it’s still a little tricky to find them. First click on the “ATOM” links that get you to XML description with meta information and a lot of links for the DTM and the DSM. Somewhere hidden in there you find the actual download links for hundreds of Gigabytes of LiDAR for the entire state of North Rhine-Westphalia (Nordrhein-Westfalen):

We download the two smallest zipped files DGM and DOM for the municipality of Wickede (Ruhr) to have a look at the data. The point cloud is in EPSG 5555 which is a compound datum of horizontal EPSG 25832 aka ETRS89 / UTM zone 32N and vertical EPSG 5783 aka the “Deutches Haupthohennetz 1992”.

The contents of the DGM zip file contains multiple files per tile.

The contents of the DGM zip file contains multiple files per tile.

The DGM zip file has a total of 212 *.xyz files that list the x, y, and z coordinate for each point in ASCII format. We first compress them and add the EPSG 25832 code with laszip. The compressed LAZ files are less than half the size of the zipped XYZ files. Each file corresponds to a particular square kilometer. The name of each tile contains the lower left coordinate of this square kilometer but there can be multiple files for each square kilometer:

  • 14 files with “ab” in the name contain very few points. They look like additional points for under bridges. The “b” is likely for “Brücke” (German for “bridge”).
  • 38 files with “ag” in the name contain seem to contain only points in areas where buildings used to cover the terrain but with ground elevation. The “g” is likely for “Gebäude” (German for “building”).
  • 30 files with “aw” in the name contain seem to contain only points in areas where there are water bodies but with ground elevation. The “w” is likely for “Wasser” (German for “water”).
  • 14 files with “brk” in the name also contain few points. They look like the original bridge point that are replaced by the points in the files with “ab” in the name to flatten the bridges. The “brk” is also likely for “Brücke” (German for “bridge”).
  • 42 files with “lpb” in the name. They look like the last return LiDAR points that were classified as ground. The “lpb” is likely for “Letzter Pulse Boden” (German for “last return ground”).
  • 42 files with “lpnb” in the name. They look like those last return LiDAR points that were classified as non-ground. The “lpnb” is likely for “Letzter Pulse Nicht Boden” (German for “last return not ground”).
  • 32 files with “lpub” in the name contain very few points. They look like the last return points that are too low and were therefore excluded. The “lpub” is likely for “Letzter Pulse Unter Boden” (German for “last return under ground”).

It is left to an exercise to the user for figure out which of those above sets of files should be used for generating a raster DTM. (-: Give us your ideas in the comments. The DOM zip file has a total of 72 *.xyz files. We also compress them and add the EPSG 25832 code with laszip. The compressed LAZ files are less than half the size of the zipped XYZ files. Again there are multiple files for each square kilometer:

  • 30 files with “aw” in the name contain seem to contain only points in areas where there are water bodies but with ground elevation. The “w” is likely for “Wasser” (German for “water”).
  • 42 files with “fp” in the name. They look like the first return LiDAR points. The “fp” is likely for “Frühester Pulse” (German for “first return”).

If you use all these points from the DOM folder your get the nice DSM shown below … albeit not a spike-free one.

A triangulated first return DSM generated mainly from the file "dom1l-fp_32421_5705_1_nw.laz" with the points from "dom1l-aw_32421_5705_1_nw.laz" for areas with water bodies shown in yellow.

A triangulated first return DSM generated mainly from the file “dom1l-fp_32421_5705_1_nw.laz” with the points from “dom1l-aw_32421_5705_1_nw.laz” for areas with water bodies shown in yellow.

Kudos to OpenNRW for being the first German state to open their LiDAR holdings. Which one of the other 15 German state survey departments will be next to promote their LiDAR as open data. If you are not the last one to do so you can expect to get featured here too … (-;

UPDATE (January 5th): The folks at OpenNRW just tweeted us information about the organization of the zipped archives in the DTM (DGM) and DSM (DOM) folders. We guessed pretty okay which points are in which file but the graphic below (also available here) summarizes it much more nicely and also tells us that “a” was for “ausgefüllt” (German for “filled up”). Maximally two returns per pulse are available: either a single return or a first return plus a last return. There are no intermediate returns, which may be an issue for those interested in vegetation mapping.

Illustration of which LiDAR point is in which file.

Nice illustration of which LiDAR point is in which file. All files with ‘ab’, ‘ag’, or ‘aw’ in the name contain synthetic points that fill up ground areas not properly reached by the laser.

Generating Spike-Free Digital Surface Models from LiDAR

A Digital Surface Model (DSM) represents the elevation of the landscape including all vegetation and man-made objects. An easy way to generate a DSM raster from LiDAR is to use the highest elevation value from all points falling into each grid cell. However, this “binning” approach only works when then the resolution of the LiDAR is higher than the resolution of the raster. Only then sufficiently many LiDAR points fall into each raster cell to prevent “empty pixels” and “data pits” from forming. For example, given LiDAR with an average pulse spacing of 0.5 meters one can easily generate a 2.5 meter DSM raster with simple “binning”. But to generate a 0.5 meter DSM raster we need to use an “interpolation” method.

Returns of four fightlines on two trees.

Laser pulses and discrete returns of four fightlines.

For the past twenty or so years, GIS textbooks and LiDAR tutorials have recommened to use only the first returns to construct the interpolating surface for DSM generation. The intuition is that the first return is the highest return for an airborne survey where the laser beams come (more or less) from above. Hence, an interpolating surface of all first returns is constructed – usually based on a 2D Delaunay triangulation – and the resulting Triangular Irregular Network (TIN) is rasterized onto a grid at a user-specified resolution to create the DSM raster. The same way a Canopy Height Model (CHM) is generated except that elevations are height-normalized either before or after the rasterization step. However, using a first-return interpolation for DSM/CHM generation has two critical drawbacks:

(1) Using only first returns means not all LiDAR information is used and some detail is missing. This is particularly the case for off-nadir scan angles in traditional airborne surveys. It becomes more pronounced with new scanning systems such as UAV or hand-held LiDAR where laser beams no longer come “from above”. Furthermore, in the event of clouds or high noise the first returns are often removed and the remaining returns are not renumbered. Hence, any laser shot whose first return reflects from a cloud or a bird does not contribute its highest landscape hit to the DSM or CHM.

(2) Using all first returns practically guarantees the formation of needle-shaped triangles in vegetated areas and along building roofs that appear as spikes in the TIN. This is because at off-nadir scan angles first returns are often generated far below other first returns as shown in the illustration above. The resulting spikes turn into “data pits” in the corresponding raster that not only look ugly but impact the utility of the DSM or CHM in subsequent analysis, for example, in forestry applications when attempting to extract individual trees.

In the following we present results and command-line examples for the new “spike-free” algorithm by (Khosravipour et. al, 2015, 2016) that is implemented (as a slow prototype) in the current LAStools release. This completely novel method for DSM generation triangulates all relevant LiDAR returns using Contrained Delaunay algorithm. This constructs a “spike-free” TIN that is in turn rasterized into “pit-free” DSM or CHM. This work is both a generalization and an improvement of our previous result of pit-free CHM generation.

We now compare our “spike-free” DSM to a “first-return” DSM on the two small urban data sets “france.laz” and “zurich.laz” distributed with LAStools. Using lasinfo with options ‘-last_only’ and ‘-cd’ we determine that the average pulse spacing is around 0.33 meter for “france.laz” and 0.15 meter for “zurich.laz”. We decide to create a hillshaded 0.25 meter DSM for “france.laz” and a 0.15 meter DSM for “zurich.laz” with the command-lines shown below.

las2dem -i ..\data\france.laz ^
        -keep_first ^
        -step 0.25 ^
        -hillshade ^
        -o france_fr.png
las2dem -i ..\data\france.laz ^
        -spike_free 0.9 ^
        -step 0.25 ^
        -hillshade ^
        -o france_sf.png
las2dem -i ..\data\zurich.laz ^
           -keep_first ^
           -step 0.15 ^
           -hillshade ^
           -o zurich_fr.png
las2dem -i ..\data\zurich.laz ^
        -spike_free 0.5 ^
        -step 0.15 ^
        -hillshade ^
        -o zurich_sf.png

The differences between a first-return DSM and a spike-free DSM are most drastic along building roofs and in vegetated areas. To inspect in more detail the differences between a first-return and our spike-free TIN we use lasview that allows to iteratively visualize the construction process of a spike-free TIN.

lasview -i ..\data\france.laz -spike_free 0.9

Pressing <f> and <t> constructs the first-return TIN. Pressing <SHIFT> + <t> destroys the first-return TIN. Pressing <SHIFT> + <y> constructs the spike-free TIN. Pressing <y> once destroys the spike-free TIN. Pressing <y> many times iteratively constructs the spike-free TIN.

One crucial piece of information is still missing. What value should you use as the freeze constraint of the spike-free algorithm that we set to 0.9 for “france.laz” and to 0.5 for “zurich.laz” as the argument to the command-line option ‘-spike_free’. The optimal value is related to the expected edge-length and we found the 99th percentile of a histogram of edge lengths of the last-return TIN to be useful. Or simpler … try a value that is about three times the average pulse spacing.

References:
Khosravipour, A., Skidmore, A.K., Isenburg, M. and Wang, T.J. (2015) Development of an algorithm to generate pit-free Digital Surface Models from LiDAR, Proceedings of SilviLaser 2015, pp. 247-249, September 2015.
Khosravipour, A., Skidmore, A.K., Isenburg, M (2016) Generating spike-free Digital Surface Models using raw LiDAR point clouds: a new approach for forestry applications, (journal manuscript under review).

Use Buffers when Processing LiDAR in Tiles !!!

We often process LiDAR in tiles for two reasons: first, to keep the number of points per file low and use main memory efficient, and second, to speed up the computation with parallel tile processing and keep all cores of a modern CPU busy. However, it is very (!!!) important to take the necessary precautions to avoid “edge artifacts” when processing LiDAR in tiles. We have to include points from neighboring tiles during certain LAStools processing steps to avoid edge artifacts. Why? Here is an illustration from our PHIL LiDAR tour earlier this year:

Buffers are important to avoid edge artifacts along tile boundaries during DTM creation.

Buffers are important to avoid edge artifacts along tile boundaries.

What you see is the temporary TIN of ground points created (internally) by las2dem or blast2dem that is then rastered at the user-specified step size onto a grid. Without a buffer (right side) there will not always be a triangle to cover every pixel. Especially in the corners of the tile you will often find empty pixels. Furthermore the poorly shaped “sliver triangles” along the boundary of the TIN do not interpolate the ground elevations properly. In contrast, with buffer (left side) the TIN generously covers the entire area that is to be rastered with nicely shaped triangles.

The

Christmas cookie analogy: buffers are like generously rolling out the dough

Here the christmas cookies analogy: You need to roll out the dough larger than the cookies you will cut to make sure your cookies will have nice edges. Think of the TIN as the dough and the square tile as your cookie cutter. You need to use a sufficiently large buffer when you roll out your TIN to assure an edge without crumbles when you cut out the tile … (-: … otherwise you are pretty much guaranteed to get results that – upon closer inspection – have these kind of artifacts:

Without buffers processing artifacts also happen when classifying points with lasground or lasclassify, when calculating height above ground or height-normalizing LiDAR tiles with lasheight, when removing noise with lasnoise, when creationg contours with las2iso or blast2iso, or any other operation where an incomplete neighborhood of points can affect the results. Hence, we need to surround each tile with a temporary buffer of points. Currently there are two ways of working with buffers with LAStools:

  1. creating buffered tiles during the initial tiling step with the ‘-buffer 25’ option of lastile, maintaining buffered tiles throughout processing and finally using the ‘-use_tile_bb’ option of lasgrid, las2dem, blast2dem, or lascanopy to raster the tiles without the temporary buffer.
  2. creating buffered tiles from non-overlapping (= unbuffered) tiles with “on-the-fly” buffering using the ‘-buffered 25’ option of most LAStools such as lasground, lasheight, or las2dem. For some workflows it is useful to also add ‘-remain_buffered’ if buffers are needed again in the next step. Finally, we use the ‘-use_orig_bb’ option of lasgrid, las2dem, blast2dem, or lascanopy to raster the tiles without the temporary buffer.

In the following three (tiny) examples using the venerable ‘fusa.laz’ sample that is distributed with LAStools to illustrate the two types of buffering as well as to show what happens when no buffers are used. In each example we will first cut the small ‘fusa.laz’ sample into nine smaller tiles and then process these separately on 4 cores in parallel.

1. Initial buffer creation with lastile

This is what most of my tutorials teach. It assumes you are the one creating the tiling in the first place. If you do it with lastile and add a buffer right from the start things are pretty easy.

lastile -i ..\data\fusa.laz ^
        -set_classification 0 -set_user_data 0 ^
        -tile_size 100 -buffer 20 ^
        -odir 1_raw -o futi.laz

We cut the input into 100 meter by 100 meter tiles but add a 20 meter buffer around each tile. That means that each tile on disk will contain the points for an area of up to 140 meter by 140 meter. The GUI for LAStools shows the overlap and if you scrutinize the bounding box values that the cursor points to you notice the extra 20 meters in each direction.

tiles_buffered_with_lastile

Now we can forget about the buffers and run the standard workflow consiting of lasground, lasheight, and lasclassify to distinguish ground, vegetation, and building points in the LiDAR tiles.

lasground -i 1_raw\futi*.laz ^
          -city ^
          -odir 1_ground -olaz ^
          -cores 4
lasheight -i 1_ground\futi*.laz ^
          -drop_above 50 ^
          -odir 1_height -olaz ^
          -cores 4
lasclassify -i 1_height\futi*.laz ^
            -odir 1_classify -olaz ^
            -cores 4

At the end – when we generate raster products – we have to remember that the tiles were buffered by lastile and cut off the buffers when we raster the TIN with option ‘-use_tile_bb’ of las2dem.

las2dem -i 1_classify\futi*.laz ^
        -keep_class 2 6 ^
        -step 0.25 -use_tile_bb ^
        -odir 1_dbm -obil ^
        -cores 4

We created a digital terrain model with buildings (DBM) by keeping the points with classification 2 (ground) and 6 (building). After loading the resulting 9 tiles into QGIS and generating a virtual raster we see a nice seamless DBM without any edge artifacts.

The DEM of the 9 tiles computed with buffers created by lastile has no edge artifacts acoss tile boundaries.

The DBM of the 9 tiles computed with buffers created by lastile has no edge artifacts acoss tile boundaries.

If you need to deliver the LiDAR files you should remove the buffers with lastile and option ‘-remove_buffer’.

lastile -i 1_classify\futi*.laz ^
        -remove_buffer ^
        -odir 1_final -olaz ^
        -cores 4

2. On-the-fly buffering

Now assume you are given LiDAR tiles without buffers. We generate them here with lastile.

lastile -i ..\data\fusa.laz ^
        -set_classification 0 -set_user_data 0 ^
        -tile_size 100 ^
        -odir 2_raw -o futi.laz

The only difference is that we do not request the 20 meter buffer and the result is a typical tiling as you may receive it from a vendor or download it from a LiDAR portal. The GUI for LAStools shows that there is no overlap and if you scrutinize the bounding box values that the cursor points to, you see that the tiles is exactly 100 meters bty 100 meters.

tiles_without_buffer

Now we have to think about buffers a lot. When using on-the-fly buffering we should first spatially index the tiles with lasindex for faster access to the points from neighbouring tiles.

lasindex -i 1_raw\futi*.laz -cores 4

Below in red are the modifications for on-the-fly buffering to the standard workflow of lasground, lasheight, and lasclassify. The first lasground run uses ‘-buffered 20’ to add buffers to each tile and ‘-remain_buffered’ to write those buffers to disk. This way they do not have to created again by lasheight and lasclassify.

lasground -i 2_raw\futi*.laz ^
          -buffered 20 -remain_buffered ^
          -city ^
          -odir 2_ground -olaz ^
          -cores 4
lasheight -i 2_ground\futi*.laz ^
          -remain_buffered ^
          -drop_above 50 ^
          -odir 2_height -olaz ^
          -cores 4
lasclassify -i 2_height\futi*.laz ^
            -remain_buffered ^
            -odir 2_classify -olaz ^
            -cores 4

At the end we have to remember that the tiles still have on-the-fly buffers and them cut off with option ‘-use_orig_bb’ of las2dem.

las2dem -i 2_classify\futi*.laz ^
        -keep_class 2 6 ^
        -step 0.25 -use_orig_bb ^
        -odir 2_dbm -obil ^
        -cores 4

Again, we created a digital terrain model with buildings (DBM) by keeping the points with classification 2 (ground) and 6 (building). The resulting hillshade computed from a virtual raster that combines the 9 BIL rastera into one looks perfectly smooth in QGIS.

The hillshaded DEM of the 9 tiles computed with on-the-fly buffering has no edge artifacts acoss tile boundaries.

The hillshaded DBM of 9 tiles computed with on-the-fly buffering has no edge artifacts acoss tile boundaries.

If you need to deliver the LiDAR files you should probably remove the buffers first … but that is not yet implemented. (-:

lastile -i 2_classify\futi*.laz ^
        -remove_buffer ^
        -odir 2_final -olaz ^
        -cores 4

3. Bad: No buffering

Here what you are *not* supposed to do. Assuming you get unbuffered tiles.

lastile -i ..\data\fusa.laz ^
        -set_classification 0 -set_user_data 0 ^
        -tile_size 100 ^
        -odir 3_raw -o futi.laz

Bad. You do not take care about buffering when processing the tiles.

lasground -i 3_raw\futi*.laz ^
          -city ^
          -odir 3_ground -olaz ^
          -cores 4
lasheight -i 3_ground\futi*.laz ^
          -drop_above 50 ^
          -odir 3_height -olaz ^
          -cores 4
lasclassify -i 3_height\futi*.laz ^
            -odir 3_classify -olaz ^
            -cores 4

Bad. You do not take care about buffering when generating the DBM.

las2dem -i 3_classify\futi*.laz ^
        -keep_class 2 6 ^
        -step 0.25 ^
        -odir 3_dbm -obil ^
        -cores 4

Bad. You get crappy results with edge artifacts clearly visible in the hillshade.

The hillshaded DBM of 9 tiles computed WITHOUT using buffers has severe edge artifacts acoss tile boundaries.

The hillshaded DBM of 9 tiles computed WITHOUT using buffers has severe edge artifacts acoss tile boundaries.

Bad. If you zoom in on a corner where 4 tiles meet you find missing pixels and incorrect elevation values. Bad. Bad. Bad. So please folks. Try this on your own data. Notice the horrible edge artifacts. Then always use buffers … (-:

PS: Usually no buffers are needed for running lasgrid, lasoverlap, or lascanopy as they perform simple binning operations that do not make use of neighbour information.

Rasterizing Perfect Canopy Height Models from LiDAR

In literature you sometimes read “we generated a Canopy Height Models (CHM) and then did this and that” without the process that was used to create the CHM being described in detail. One approach computes the CHM as a difference between DSM and DTM: create a DTM from the ground returns and a DSM from the first returns and subtract the two rasters. Also here is still a question left to be answered: how exactly are the DTM and the DSM generated. A different approach computes the CHM directly from height-normalized LiDAR points. And again there are many ways of doing so and we want to look at the possibilities in more detail.

the 100 by 100 meter sample plot 'drawno.laz'

the 100 by 100 meter sample plot ‘drawno.laz’

In the following we demonstrate different alternatives for CHM generation on a 100 by 100 meter sample LiDAR tile ‘drawno.laz‘ from a forest near Drawno in Poland (that you can download here), slowly converging towards the CHM generation method that we recommend using. We start with ground classifying the LiDAR using lasground:

lasground -i drawno.laz ^
          -wilderness ^
          -o ground.laz

Then we height-normalize the LiDAR using lasheight. As we know that there are no trees higher than 28 meters in this plot we drop all LiDAR points that are higher than 30 meters that may be bird hits or other noise.

lasheight -i ground.laz ^
          -drop_above 30 ^
          -replace_z ^
          -o normalized.laz

As the sample plot has an average pulse spacing of around 0.3 meters we decide to use a step size of 0.33333 meters to create a 300 by 300 pixel raster. We produce a false color visualization instead of a height raster for our CHMs so we can include the results here. The simplest method uses lasgrid with option ‘-highest’ that uses for each pixel the highest z coordinate among all LiDAR returns falling into the corresponding 0.33333 by 0.33333 meter area.

lasgrid -i normalized.laz ^
        -step 0.33333 ^
        -highest ^
        -false -set_min_max 0 25 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o chm_grd.png
Gridding the highest point that falls into each 0.33333 meter by 0.33333 meter cell.

Gridding the highest point that falls into each 0.33333 meter by 0.33333 meter cell.

The resulting CHM (shown at a 200 % zoom) is full of empty pixels and so called “pits” that will hamper subsequent analysis for single tree detection, height and crown diameter computation, and the like. A simple improvement can be obtained by replacing each LiDAR return with a small disk. After all, the laser beam has – depending on the flying height – a diameter of 10 to 50 centimeter and approximating this with a single point of zero area seems overly conservative. In lasgrid there is the option ‘-subcircle 0.1’, which replaces each return by a circle with a radius of 10 centimeter or a diameter of 20 centimeter.

lasgrid -i normalized.laz ^
        -subcircle 0.1 ^
        -step 0.33333 ^
        -highest ^
        -false -set_min_max 0 25 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o chm_grd_d20.png
Gridding the highest z value after turning each point into a circle with 20 cm diameter.

Gridding the highest z value after turning each point into a circle with 20 cm diameter.

The resulting CHM (shown again at a 200 % zoom) is much improved but there are still empty pixels and “pits”. We could simply widen the circles further with ‘-subcircle 0.15’, ‘-subcircle 0.2’, or ‘-subcircle 0.25’. As you can see below this produces increasingly smooth CHMs with widening tree crowns. But this “splats” the LiDAR returns into circles that are growing larger and larger than the laser beam diameter and thus have less and less in common with reality.

Gridding the highest returns will often leave empty pixels in the data even when “splatting” the points. Another popular approach avoids this by interpolating all first returns with a triangulated irregular network (TIN) and then rasterizing it onto a grid to create the CHM. This can be implemented with las2dem as shown below:

las2dem -i normalized.laz ^
        -first_only ^
        -step 0.33333 ^
        -false -set_min_max 0 25 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o chm_tin.png
Rasterizing the TIN that interpolates all first returns onto a  0.33333 meter grid.

Rasterizing the TIN that interpolates all first returns onto a 0.33333 meter grid.

The result has no more empty pixels but is full of pits because many laser pulses manage to deeply penetrate the canopy before producing the first return. When combining multiple flight lines some laser pulses may have an unobstructed view of the ground under the canopy without hitting any branches. These “pits” and how to avoid them is discussed in great length in the September 2014 edition of the ASPRS PE&RS journal by a paper of Khosravipour et al. We build upon these ideas in the following. But first we combine the ‘-highest’ gridding with TIN interpolation with a two step approach: (1) keep only one highest return per grid cell with lasthin (2) interpolate all these highest returns with las2dem.

lasthin -i normalized.laz ^
        -step 0.33333 ^
        -highest ^
        -o temp.laz
las2dem -i temp.laz ^
        -step 0.33333 ^
        -false -set_min_max 0 25 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o chm_tin_his.png
Rasterizing the TIN that interpolates only the highest points falling into each  0.33333 meter by 0.33333 meter grid cell.

Rasterizing the TIN that interpolates only the highest points falling into each 0.33333 meter by 0.33333 meter grid cell.

Next we integrate the idea of “splatting” the points into circles with a diameter of 20 centimeter to account for the laser beam diameter by adding option ‘-subcircle 0.1’ to lasthin.

lasthin -i normalized.laz ^
        -subcircle 0.1 ^
        -step 0.33333 ^
        -highest ^
        -o temp.laz
las2dem -i temp.laz ^
        -step 0.33333 ^
        -false -set_min_max 0 25 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o chm_tin_his_d20.png
Rasterizing the TIN that interpolates only the highest points of a 0.33333 meter grid  after first splatting points into circles with 20 cm in diameter.

Rasterizing the TIN that interpolates only the highest points of a 0.33333 meter grid after first splatting points into circles with 20 cm in diameter.

The results are much nicer but there are still pits. However, one may argue at this points that thinning with a step size of 0.33333 is too agressive and removes too many points for subsequent interpolation and rasterization with the exact same step size. So we double the resolution of the temporary point cloud by thinning with half the step size of 0.16667.

lasthin -i normalized.laz ^
        -subcircle 0.1 ^
        -step 0.16667 ^
        -highest ^
        -o temp.laz
las2dem -i temp.laz ^
        -step 0.33333 ^
        -false -set_min_max 0 25 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o chm_tin_hhs_d20.png
Rasterizing the TIN that interpolates only the highest points of a 0.16667 meter grid  after first splatting points into circles with 20 cm in diameter onto a raster with step size 0.33333.

Rasterizing the TIN that interpolates only the highest points of a 0.16667 meter grid after first splatting points into circles with 20 cm in diameter onto a raster with step size 0.33333.

We now have more detail but also many more pits. Furthermore the interpolation of highest returns makes a big error across areas were we do not have any LiDAR returns that are flanked by canopy returns on both sides. This happens in the area circled where wrong canopy is created because the TIN is interpolating canopy returns across a small wet area without any returns, We show now how the pit-free method of Khosravipour et al. can be used to generate the perfect CHM:

rmdir tmp_chm /s /q
mkdir tmp_chm
las2dem -i normalized.laz ^
        -drop_z_above 0.1 ^
        -step 0.33333 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o tmp_chm/chm_ground.bil
lasthin -i normalized.laz ^
        -subcircle 0.1 ^
        -step 0.16667 ^
        -highest ^
        -o temp.laz
las2dem -i temp.laz ^
        -step 0.33333 -kill 1.0 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o tmp_chm/chm_00.bil
las2dem -i temp.laz ^
        -drop_z_below 2 ^
        -step 0.33333 -kill 1.0 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o tmp_chm/chm_02.bil
las2dem -i temp.laz ^
        -drop_z_below 5 ^
        -step 0.33333 -kill 1.0 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o tmp_chm/chm_05.bil
las2dem -i temp.laz ^
        -drop_z_below 10 ^
        -step 0.33333 -kill 1.0 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o tmp_chm/chm_10.bil
las2dem -i temp.laz ^
        -drop_z_below 15 ^
        -step 0.33333 -kill 1.0 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o tmp_chm/chm_15.bil
las2dem -i temp.laz ^
        -drop_z_below 20 ^
        -step 0.33333 -kill 1.0 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o tmp_chm/chm_20.bil
las2dem -i temp.laz ^
        -drop_z_below 25 ^
        -step 0.33333 -kill 1.0 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o tmp_chm/chm_25.bil
lasgrid -i tmp_chm/chm*.bil -merged ^
        -step 0.33333 ^
        -highest ^
        -false -set_min_max 0 25 ^
        -ll 278200 602200 -ncols 300 -nrows 300 ^
        -o chm_pit_free_d20.png
rmdir tmp_chm /s /q
Running the pit-free algorithm on the highest LiDAR returns in a 0.16667 meter grid (after splatting them to circles 20 cm in diameter) and producing a 0.33333 meter raster CHM.

Running the pit-free algorithm on the highest LiDAR returns in a 0.16667 meter grid (after splatting them to circles 20 cm in diameter) and producing a 0.33333 meter raster CHM.

With such a perfectly pit-free output we can now be even more conservative and lower the diameter of the circles that we replace each LiDAR return with from 20 cm to 10 cm by replacing ‘-subcircle 0.1’ with ‘-subcircle 0.05’ in the above script.

Running the pit-free algorithm on the highest LiDAR returns in a 0.16667 meter grid (after splatting them to circles only 10 cm in diameter) and producing a 0.33333 meter raster CHM.

Running the pit-free algorithm on the highest LiDAR returns in a 0.16667 meter grid (after splatting them to circles only 10 cm in diameter) and producing a 0.33333 meter raster CHM.

Compared to the original pit-free algorithm published by Khosravipour et al. there are two minor differences: (1) instead of using all first returns as input we use one highest returns per grid cell after splatting all returns as small circles (instead of area-less points) onto a grid that has twice the output resolution. (2) we also have a ‘-kill 1.0’ threshold to also generate a partial CHM from all points for ‘chm_00.bil’ and add a new ‘chm_ground.bil’ to fill the potential holes. This prevents that higher up canopy returns are wrongly connected across water bodies where there are no LiDAR returns at all.

Reference:
Khosravipour, A., Skidmore, A.K., Isenburg, M., Wang, T.J., Hussin, Y.A., 2014. Generating pit-free Canopy Height Models from Airborne LiDAR. PE&RS = Photogrammetric Engineering and Remote Sensing 80, 863-872.