NRW Open LiDAR: Download, Compression, Viewing

This is the first part of a series on how to process the newly released open LiDAR data for the entire state of North Rhine-Westphalia that was announced a few days ago. Again, kudos to OpenNRW for being the most progressive open data state in Germany. You can follow this tutorial after downloading the latest version of LAStools as well as a pair of DGM and DOM files for your area of interest from these two download pages.

We have downloaded the pair of DGM and DOM files for the Federal City of Bonn. Bonn is the former capital of Germany and was host to the FOSS4G 2016 conference. As both files are larger than 10 GB, we use the wget command line tool with option ‘-c’ that will restart where it left off in case the transmission gets interrupted.

The DGM file and the DOM file are zipped archives that contain the points in 1km by 1km tiles stored as x, y, z coordinates in ETRS89 / UTM 32 projection as simple ASCII text with centimeter resolution (i.e. two decimal digits).

>> more dgm1l-lpb_32360_5613_1_nw.xyz
360000.00 5613026.69 164.35
360000.00 5613057.67 164.20
360000.00 5613097.19 164.22
360000.00 5613117.89 164.08
360000.00 5613145.35 164.03
[...]

There is more than one tile for each square kilometer as the LiDAR points have been split into different files based on their classification and their return type. Furthermore there are also synthetic points that were used by the land survey department to replace certain LiDAR points in order to generate higher quality DTM and DSM raster products.

The zipped DGM archive is 10.5 GB in size and contains 956 *.xyz files totaling 43.5 GB after decompression. The zipped DOM archive is 11.5 GB in size and contains 244 *.xyz files totaling 47.8 GB. Repeatedly loading these 90 GB of text data and parsing these human-readable x, y, and z coordinates is inefficient with common LiDAR software. In the first step we convert the textual *.xyz files into binary *.laz files that can be stored, read and copied more efficiently. We do this with the open source LASzip compressor that is distributed with LAStools using these two command line calls:

laszip -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.xyz ^
       -epsg 25832 -vertical_dhhn92 ^
       -olaz ^
       -cores 2
laszip -i dom1l_05314000_Bonn_EPSG5555_XYZ\*.xyz ^
       -epsg 25832 -vertical_dhhn92 ^
       -olaz ^
       -cores 2

The point coordinates are is in EPSG 5555, which is a compound datum of horizontal EPSG 25832 aka ETRS89 / UTM zone 32N and vertical EPSG 5783 aka the “Deutsches Haupthoehennetz 1992” or DHHN92. We add this information to each *.laz file during the LASzip compression process with the command line options ‘-epsg 25832’ and ‘-vertical_dhhn92’.

LASzip reduces the file size by a factor of 10. The 956 *.laz DGM files compress down to 4.3 GB from 43.5 GB for the original *.xyz files and the 244 *.laz DOM files compress down to 4.8 GB from 47.8 GB. From here on out we continue to work with the 9 GB of slim *.laz files. But before we delete the 90 GB of bulky *.xyz files we make sure that there are no file corruptions (e.g. disk full, truncated files, interrupted processes, bit flips, …) in the *.laz files.

laszip -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.laz -check
laszip -i dom1l_05314000_Bonn_EPSG5555_XYZ\*.laz -check

One advantage of having the LiDAR in an industry standard such as the LAS format (or its lossless compressed twin, the LAZ format) is that the header of the file stores the number of points per file, the bounding box, as well as the projection information that we have added. This allows us to very quickly load an overview for example, into lasview.

lasview -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.laz -GUI
The bounding boxes of the DGM files quickly display a preview of the data in the GUI when the files are in LAS or LAZ format.

The bounding boxes of the DGM files quickly give us an overview in the GUI when the files are in LAS or LAZ format.

Now we want to find a particular site in Bonn such as the World Conference Center Bonn where FOSS4G 2016 was held. Which tile is it in? We need some geospatial context to find it, for example, by creating an overview in form of KML files that we can load into Google Earth. We use the files from the DOM folder with “fp” in the name as points on buildings are mostly “first returns”. See what our previous blog post writes about the different file names if you can not wait for the second part of this series. We can create the KML files with lasboundary either via the GUI or in the command line.

lasboundary -i dom1l_05314000_Bonn_EPSG5555_XYZ\dom1l-fp*.laz ^
            -gui
Only the "fp" tiles from the DOM folder loaded the GUI into lasboundary.

Only the “fp” tiles from the DOM folder loaded the GUI into lasboundary.

lasboundary -i dom1l_05314000_Bonn_EPSG5555_XYZ\dom1l-fp*.laz ^
            -use_bb -labels -okml

We zoom in and find the World Conference Center Bonn and load the identified tile into lasview. Well, we did not expect this to happen, but what we see below will make this series of tutorials even more worthwhile. There is a lot of “high noise” in the particular tile we picked. We should have noticed the unusually high z range of 406.42 meters in the Google Earth pop-up. Is this high electromagnetic radiation interfering with the sensors? There are a number of high-tech government buildings with all kind of antennas nearby (such as the United Nations Bonn Campus the mouse cursor points at).

Significant amounts of high noise are in the first returns of the DOM tile we picked.

Significant amounts of high noise are in the first returns of the DOM tile we picked.

But the intended area of interest was found. You can see the iconic “triangulated” roof of the building that is across from the World Conference Center Bonn.

The World Conference Center Bonn is across from the building with the "triangulated" roof.

The World Conference Center Bonn is across from the building with the “triangulated” roof.

Please don’t think it is the responsibility of OpenNRW to remove the noise and provide cleaner data. The land survey has already processed this data into whatever products they needed and that is where their job ended. Any additional services – other than sharing the raw data – are not in their job description. We’ll take care of that … (-:

Acknowledgement: The LiDAR data of OpenNRW comes with a very permissible license. It is called “Datenlizenz Deutschland – Namensnennung – Version 2.0” or “dl-de/by-2-0” and allows data and derivative sharing as well as commercial use. It only requires us to name the source. We need to cite the “Land NRW (2017)” with the year of the download in brackets and specify the Universal Resource Identification (URI) for both the DOM and the DGM. Done. So easy. Thank you, OpenNRW … (-:

Creating a Better DTM from Photogrammetic Points by Avoiding Shadows

At INTERGEO 2016 in Hamburg, the guys from Aerowest GmbH shared with us a small photogrammetric point cloud from the city of Soest that had been generated with the SURE dense-matching software from nFrames. We want to test whether – using LAStools – we can generate a decent DTM from these points that are essentially a gridded DSM. If this interest you also see this, this, this, and this story.

soest_00_google_earth

Here you can download the four original tiles (tile1, tile2, tile3, tile4) that we are using in these experiments. We first re-tile them into smaller 100 meter by 100 meter tiles with a 20 meter buffer using lastile. We use option ‘-flag_as_withheld’ that flags all the points falling into the buffer using the withheld flag so they can easily be removed on-the-fly later with the ‘-drop_withheld’ filter (see the README for more on this). We also add the missing projection with ‘-epsg 32632’.

lastile -i original\*.laz ^
        -tile_size 100 -buffer 20 ^
        -flag_as_withheld ^
        -epsg 32632 ^
        -odir tiles_raw -o soest.laz

Below is a screenshot from one of the resulting 100 meter by 100 meter tiles (with 20 meter buffer) that we will be focusing on in the following experiments.

The tiles 'soest_437900_5713800.laz'

The tile ‘soest_437900_5713800.laz’ used in all experiments.

Next we use lassort to reorder the points ordered along a coherent space-filling curve as the existing scan-line order has the potential to cause our triangulation engine to slow down. We do this on 4 cores.

lassort -i tiles_raw\*.laz ^
        -odir tiles_sorted -olaz ^
        -cores 4

We then use lasthin to lower the number of points that we consider as ground points (see the README for more info on this tool). We do this because the original 5 cm spacing of the dense matching points is a bit excessive for generating a DTM with a resolution of, for example, 50 cm. Instead we only use the lowest point in each 20 cm by 20 cm cell as a candidate for becoming a ground point, which reduces the number of considered points by a factor of 16. We achieve this by classifying these lowest point to a unique classification code (here: 9) and later tell lasground to ignore all other classifications.

lasthin -i tiles_sorted\*.laz ^
        -step 0.2 -lowest -classify_as 9 ^
        -odir tiles_thinned -olaz ^
        -cores 4
Then we run lasground on 4 cores to classify the ground points with options ‘-step 20’, ‘-bulge 0.5’, ‘-spike 0.5’ and ‘-fine’ that we selected after two trials on a single tile. There are several other options in lasground to play with that may achieve better results on other data sets (see README file for more on this). The ‘-ignore_class 0’ option instructs lasground to ignore all points that are not classified so that only those points that lasthin was classifying as 9 in the previous step are considered.
lasground -i tiles_thinned\*.laz ^
          -step 20 -bulge 0.5 -spike 0.5 -fine ^
          -ignore_class 0 ^
          -odir tiles_ground -olaz ^
          -cores 4
However, when we scrutinize the resulting ground classification notice that there are bumps in the corresponding ground TIN that seem to correspond to areas where the original imagery has dark shadows that in turn seem to significantly affect the geometric accuracy of the points generated by the dense-matching algorithm.
Looking a the bump from below we identify the RGB colors of the points have that form the bump that seem to be of much lower accuracy than the rest. This is an effect that we have noticed in the past for data generated with other dense-matching software and maybe an approach similar to the one we take here could also help with this low noise. It seems that points that are generated from shadowed areas in the input images can have a lot lower accuracy than those from well-lit areas. We use this correlation between RGB color and geometric accuracy to simply exclude all points whose RGB colors indicate that they might be from shadow areas from becoming ground points.
The RGB colors of low-accuracy points are mostly from very dark shadowed areas.

The RGB colors of low-accuracy points are mostly from very dark shadowed areas.

We use las2las with the relatively new ‘-filtered_transform’ option to reclassify all points whose RGB color is close to zero to yet classification code 7 (see README file for more on this). We do this for all points whose red value is between 0 and 30, whose green value is between 0 and 35, and whose blue value is between 0 and 40. Because the RGB values were stored with 16 bits in these files we have to multiply these values with 256 when applying the filter.
las2las -i tiles_thinned\*.laz ^
        -keep_RGB_red 0 7680 ^
        -keep_RGB_green 0 8960 ^
        -keep_RGB_blue 0 10240 ^
        -filtered_transform ^
        -set_classification 7 ^
        -odir tiles_rgb_filtered -olaz ^
        -cores 4
Below you see all points that will no longer be considered because their classification was set to 7 by the command above.
Points whose RGB values indicate they might lie in the shadows.

Points whose RGB values indicate they might lie in the shadows.

We then re-run lasground with the same options ‘-step 20’, ‘-bulge 0.5’, ‘-spike 0.5’ and ‘-fine’ as before but now we ignore all points that are still have classification 0 because they were not classified as 9 by lasthin earlier and we also ignore all points that have been assigned classification 7 by las2las in the previous step.
lasground -i tiles_thinned\*.laz ^
          -step 20 -bulge 0.5 -spike 0.5 -fine ^
          -ignore_class 0 7 ^
          -odir tiles_ground -olaz ^
          -cores 4
The situation has significantly improved for the bumb we saw before as you can see in the images below.

Finally we create a DTM with blast2dem (see README) and a DSM with lasgrid (see README) by merging all points into one file but dropping the buffer points that were marked as withheld by the initial run of lastile (see README).

blast2dem -i tiles_ground\*.laz -merged ^
          -drop_withheld -keep_class 2 ^
          -step 0.5 ^
          -o dtm.bil

lasgrid -i tiles_ground\*.laz -merged ^
        -drop_withheld ^
        -step 0.5 -average ^
        -o dsm.bil
 As our venerable lasview (see README) can directly read BIL rasters as points (just like all the other LAStools), so we can compare the DTM and the DTM by loading them as two flight lines (‘-faf’) and then switch back and forth between the two by pressing ‘0’ and ‘1’ in the viewer.
lasview -i dtm.bil dsm.bil -faf

Above you see the final DTM and the original DSM. So yes, LAStools can definitely create a DTM from point clouds that are the result of dense-matching photogrammetry. We used the correlation between shadowed areas in the source image and geometric errors to remove those points from consideration for ground points that are likely to be too low and result in bumps. For dense-matching algorithms that also output an uncertainty value for each point there is the potential for even better results as our range of eliminated RGB colors may not cover all geometrically uncertain points while at the same time may be too conservative and also remove correct ground points.

Fixing Intensity Differences between Flightlines (“quick and dirty”)

Visiting our users on-site, such as last week at Mariano Marcos State University in Ilocos Norte in the Philippines, we sometimes come across situations as pictured below where the intensity values of the returns of one flightline are drastically different from that of other flightlines.

The intensity of returns in the left most flightline is different from that of other flightlines.

The intensity of returns in the left most flightline is different from that of other flightlines.

Using intensity rasters with such dark strips as an additional input for land cover classification may likely make things worse. Radiometrically “correct” intensity calibration is a complex topic and may not always be possible to do using only the LAZ files without meta information such as the internals of the scanning system and the aircraft trajectory. However, we now describe a “quick and dirty” fix to the situation shown above so that the intensity grids (that were computed as averages of first return intensities) at least “look” as sensible as for the one square tile (shown below) that was corrected by a simple multiplication with 5 for all intensities of the dark strip.

Simply multiplying all intensities of the dark flightline with 5 seems to "fix" the issue.

Simply multiplying all intensities of the dark flightline with 5 seems to “fix” the issue for our test tile.

The number 5 was determined by a quick glance at the intensity histograms that we can generate with lasinfo. We decide to only look at single returns as we expect them to have a higher correlation: Their locations are more likely to be “seen similarly” from and their energy is more likely “reflected similarly” to different flightlines compared to that of multiple returns.

lasinfo -i strip1.laz strip2.laz strip3.laz ^
        -keep_single ^
        -histo intensity 1 ^
        -nmm -nh -nv ^
        -odix _histo_int -otxt

The resulting histograms for the dark ‘strip1.laz’ is quite different from that of the much brighter ‘strip2.laz’ and ‘strip3.laz’. The average single return intensity for the dark ‘strip1.laz’ is a meager 5.13 whereas the  brighter ‘strip2.laz’ and ‘strip3.laz’ have similar averages of 24.15 and 24.50 respectively.

Draw your own conclusion about which scale factor to use. We have the choice to match either the peak of the histograms or their averages. Scaling the peak of 3 for ‘strip1.laz’ to match the 25 of the other two strips is too much of an upscaling. But the average 24.15 divided by 5.13 gives a potential scale of 4.71 and the average 24.50 divided by 5.13 gives a potential scale of 4.77 and we already saw a multiplication by 5 giving reasonable results. So this is how we can fix the intensity:

las2las -i strip1.laz ^
        -scale_intensity 4.75 ^
        -odix _corr_int -olaz

But what if your data is already in tiles? How can you adjust only the intensity of those returns that are from the flightline 1? Assuming that your flightline information is properly stored in the point source ID field of every point this is easily done with the new ‘-filtered_transform’ in LAStools using las2las on as many cores as you have as follows:

las2las -i tiles/*.laz ^
        -keep_point_source 1 ^
        -filtered_transform ^
        -scale_intensity 4.75 ^
        -odir tiles_corr -olaz ^
        -cores 8

This is not currently exposed in the GUI of las2las but you can simply add it by typing it into the ‘RUN’ pop-up window as shown below.

Scaling only the intensities of flightline 1 by 4.9 using the new '-filtered_transform'.

Scaling only the intensities of flightline 1 by 4.9 using the new ‘-filtered_transform’.

After this “quick and dirty” intensity correction we again ran lasgrid as follows:

lasgrid -i tiles_corr/*.laz ^
        -gray -set_min_max 0 60 ^
        -odir tiles_int_rasters -opng ^
        -cores 8

And the result is shown below. The obvious flightline-induced discontinuity in the intensities has pretty much disappeared. Do you have similar flightline-related intensity issues? We like to hear from you whether this technique works or if we need to implement something more clever in the future …

lasgrid_intensity_differences_3

Generating Spike-Free Digital Surface Models from LiDAR

A Digital Surface Model (DSM) represents the elevation of the landscape including all vegetation and man-made objects. An easy way to generate a DSM raster from LiDAR is to use the highest elevation value from all points falling into each grid cell. However, this “binning” approach only works when then the resolution of the LiDAR is higher than the resolution of the raster. Only then sufficiently many LiDAR points fall into each raster cell to prevent “empty pixels” and “data pits” from forming. For example, given LiDAR with an average pulse spacing of 0.5 meters one can easily generate a 2.5 meter DSM raster with simple “binning”. But to generate a 0.5 meter DSM raster we need to use an “interpolation” method.

Returns of four fightlines on two trees.

Laser pulses and discrete returns of four fightlines.

For the past twenty or so years, GIS textbooks and LiDAR tutorials have recommened to use only the first returns to construct the interpolating surface for DSM generation. The intuition is that the first return is the highest return for an airborne survey where the laser beams come (more or less) from above. Hence, an interpolating surface of all first returns is constructed – usually based on a 2D Delaunay triangulation – and the resulting Triangular Irregular Network (TIN) is rasterized onto a grid at a user-specified resolution to create the DSM raster. The same way a Canopy Height Model (CHM) is generated except that elevations are height-normalized either before or after the rasterization step. However, using a first-return interpolation for DSM/CHM generation has two critical drawbacks:

(1) Using only first returns means not all LiDAR information is used and some detail is missing. This is particularly the case for off-nadir scan angles in traditional airborne surveys. It becomes more pronounced with new scanning systems such as UAV or hand-held LiDAR where laser beams no longer come “from above”. Furthermore, in the event of clouds or high noise the first returns are often removed and the remaining returns are not renumbered. Hence, any laser shot whose first return reflects from a cloud or a bird does not contribute its highest landscape hit to the DSM or CHM.

(2) Using all first returns practically guarantees the formation of needle-shaped triangles in vegetated areas and along building roofs that appear as spikes in the TIN. This is because at off-nadir scan angles first returns are often generated far below other first returns as shown in the illustration above. The resulting spikes turn into “data pits” in the corresponding raster that not only look ugly but impact the utility of the DSM or CHM in subsequent analysis, for example, in forestry applications when attempting to extract individual trees.

In the following we present results and command-line examples for the new “spike-free” algorithm by (Khosravipour et. al, 2015, 2016) that is implemented (as a slow prototype) in the current LAStools release. This completely novel method for DSM generation triangulates all relevant LiDAR returns using Contrained Delaunay algorithm. This constructs a “spike-free” TIN that is in turn rasterized into “pit-free” DSM or CHM. This work is both a generalization and an improvement of our previous result of pit-free CHM generation.

We now compare our “spike-free” DSM to a “first-return” DSM on the two small urban data sets “france.laz” and “zurich.laz” distributed with LAStools. Using lasinfo with options ‘-last_only’ and ‘-cd’ we determine that the average pulse spacing is around 0.33 meter for “france.laz” and 0.15 meter for “zurich.laz”. We decide to create a hillshaded 0.25 meter DSM for “france.laz” and a 0.15 meter DSM for “zurich.laz” with the command-lines shown below.

las2dem -i ..\data\france.laz ^
        -keep_first ^
        -step 0.25 ^
        -hillshade ^
        -o france_fr.png
las2dem -i ..\data\france.laz ^
        -spike_free 0.9 ^
        -step 0.25 ^
        -hillshade ^
        -o france_sf.png
las2dem -i ..\data\zurich.laz ^
           -keep_first ^
           -step 0.15 ^
           -hillshade ^
           -o zurich_fr.png
las2dem -i ..\data\zurich.laz ^
        -spike_free 0.5 ^
        -step 0.15 ^
        -hillshade ^
        -o zurich_sf.png

The differences between a first-return DSM and a spike-free DSM are most drastic along building roofs and in vegetated areas. To inspect in more detail the differences between a first-return and our spike-free TIN we use lasview that allows to iteratively visualize the construction process of a spike-free TIN.

lasview -i ..\data\france.laz -spike_free 0.9

Pressing <f> and <t> constructs the first-return TIN. Pressing <SHIFT> + <t> destroys the first-return TIN. Pressing <SHIFT> + <y> constructs the spike-free TIN. Pressing <y> once destroys the spike-free TIN. Pressing <y> many times iteratively constructs the spike-free TIN.

One crucial piece of information is still missing. What value should you use as the freeze constraint of the spike-free algorithm that we set to 0.9 for “france.laz” and to 0.5 for “zurich.laz” as the argument to the command-line option ‘-spike_free’. The optimal value is related to the expected edge-length and we found the 99th percentile of a histogram of edge lengths of the last-return TIN to be useful. Or simpler … try a value that is about three times the average pulse spacing.

References:
Khosravipour, A., Skidmore, A.K., Isenburg, M. and Wang, T.J. (2015) Development of an algorithm to generate pit-free Digital Surface Models from LiDAR, Proceedings of SilviLaser 2015, pp. 247-249, September 2015.
Khosravipour, A., Skidmore, A.K., Isenburg, M (2016) Generating spike-free Digital Surface Models using raw LiDAR point clouds: a new approach for forestry applications, (journal manuscript under review).

Create proper LAS 1.4 files with LAStools (for free)

So your LiDAR processing workflow produces beautiful LAS or LAZ output. You think the files are nothing short of perfect. But they are in LAS 1.2 format and the tender document explicitly requests delivery in the latest LAS 1.4 format. The free and open source las2las module of LAStools – also known as the Swiss Army knife of LiDAR processing – can easily up-convert them.

There are two possible scenarios: (1) The client wants the data as LAS 1.4 files but does not care about which point type is used. (2) The client wants LAS 1.4 and specifically asks for one of the new point types such as point type 6.

Let us assume you have LAS 1.2 content with point type 1 like these LAZ files from Martin county in Minnesota. Use these four tiles 5126-05-42.laz5126-05-43.laz5126-06-42.laz, and 5126-06-43.laz to repeat the LAS 1.4 up-conversion we are doing in the following. You will need to use version 160110 of LAStools (or newer) that you can download here.

A report generated with lasinfo shows geo-referencing information stored as GeoTIFF keys in the first VLR. Then there are 10 useless numbers stored in the second VLR that seem left-over from an earlier version of geo-referencing without EPSG codes. The two vendor-specific ‘NIIRS10’ VLRs should probably be removed unless they are meaningful to you. The LAS header is followed by 1564 user-defined bytes (sometimes used as “padding”) that we can also delete before delivery.

D:\LAStools\bin>lasinfo -i martin\5126-05-42.laz
reporting all LAS header entries:
 file signature: 'LASF'
 file source ID: 0
 global_encoding: 0
 project ID GUID data 1-4: E603549E-B74A-4696-3BAD-EA49F643C221
 version major.minor: 1.2
 system identifier: 'LAStools (c) by Martin Isenburg'
 generating software: 'lassort (110815) unlicensed'
 file creation day/year: 119/2011
 header size: 227
 offset to point data: 2163
 number var. length records: 4
 point data format: 1
 point data record length: 28
 number of point records: 13772409
 number of points by return: 13440460 240598 78743 12608 0
 scale factor x y z: 0.01 0.01 0.01
 offset x y z: 0 0 0
 min x y z: 361818.33 4855898.31 349.89
 max x y z: 364400.98 4859420.79 404.22
variable length header record 1 of 4:
 reserved 43707
 user ID 'LASF_Projection'
 record ID 34735
 length after header 40
 description 'by LAStools of Martin Isenburg'
 GeoKeyDirectoryTag version 1.1.0 number of keys 4
 key 1024 tiff_tag_location 0 count 1 value_offset 1 - GTModelTypeGeoKey: ModelTypeProjected
 key 3072 tiff_tag_location 0 count 1 value_offset 26915 - ProjectedCSTypeGeoKey: NAD83 / UTM 15N
 key 3076 tiff_tag_location 0 count 1 value_offset 9001 - ProjLinearUnitsGeoKey: Linear_Meter
 key 4099 tiff_tag_location 0 count 1 value_offset 9001 - VerticalUnitsGeoKey: Linear_Meter
variable length header record 2 of 4:
 reserved 43707
 user ID 'LASF_Projection'
 record ID 34736
 length after header 80
 description 'GeoTiff double parameters'
 GeoDoubleParamsTag (number of doubles 10)
 500000 0 -93 0.9996 0 1 6.37814e+006 298.257 0 0.0174533
variable length header record 3 of 4:
 reserved 43707
 user ID 'NIIRS10'
 record ID 1
 length after header 26
 description 'NIIRS10 Tile Index'
variable length header record 4 of 4:
 reserved 43707
 user ID 'NIIRS10'
 record ID 4
 length after header 10
 description 'NIIRS10 Timestamp'
the header is followed by 1564 user-defined bytes
LASzip compression (version 2.1r0 c2 50000): POINT10 2 GPSTIME11 2
reporting minimum and maximum for all LAS point record entries ...
 X 36181833 36440098
 Y 485589831 485942079
 Z 34989 40422
 intensity 1 4780
 return_number 1 4
 number_of_returns 1 4
 edge_of_flight_line 0 0
 scan_direction_flag 0 1
 classification 1 10
 scan_angle_rank -24 24
 user_data 0 32
 point_source_ID 5401 8015
 gps_time 243852.663596 403514.323142
number of first returns: 13440460
number of intermediate returns: 91408
number of last returns: 13440349
number of single returns: 13199808
overview over number of returns of given pulse: 13199808 323647 198449 50505 0 0 0
histogram of classification of points:
 6560507 unclassified (1)
 6744481 ground (2)
 401464 medium vegetation (4)
 55433 building (6)
 100 noise (7)
 10157 water (9)
 267 rail (10)

(1) Create LAS 1.4 files with point type 1

The las2las command shown below turns a folder of LAS 1.2 files into a folder of  LAS 1.4 files with option ‘-set_version 1.4’ without changing the point type. It removes the last three VLRs with option ‘-remove_vlrs_from_to 1 3’ and the user-defined bytes in the LAS header with option ‘-remove_padding’ (*). The result is as LAZ with option ‘-olaz’ and the task is run on up to 4 CPUs in parallel with option ‘-cores 4’.

(*) option ‘-remove_padding’ is called  ‘-remove_extra’ in older versions of LAStools.

las2las -i martin/*.laz ^
        -remove_vlrs_from_to 1 3 ^
        -remove_padding ^
        -set_version 1.4 ^
        -odir martin_LAS14_pt1 -olaz ^
        -cores 4

(2) Create LAS 1.4 files with point type 6

The las2las command shown below turns a folder of LAS 1.2 files into a folder of  LAS 1.4 files with option ‘-set_version 1.4’ and changes the point type from 1 to 6 with option ‘-set_point_type 6’. As before some VLRs and the header padding is removed. It also adds a second description of the georeferencing information by rewriting the existing information stored as GeoTIFF tags into a properly formatted OGC WKT string with option ‘-set_ogc_wkt’. Thanks to ESRI’s assault on the LAZ format the output still needs to be in LAS. It can be compressed with laszip in a subsequent step using the ‘LAS 1.4 compatibility mode‘ sponsored by NOAA.

las2las -i martin/*.laz ^
        -remove_vlrs_from_to 1 3 ^
        -remove_padding ^
        -set_version 1.4 ^
        -set_point_type 6 ^
        -set_ogc_wkt ^
        -odir martin_LAS14_pt6 -olas ^
        -cores 4

laszip -i martin_LAS14_pt6/*.las -cores 4

Below the output of lasinfo for the “upgraded” LAS 1.4 file with the OGC WKT string. Some of the key changes have been marked in red.

D:\LAStools\bin>lasinfo -i martin_LAS14_pt6\5126-05-42.las
reporting all LAS header entries:
 file signature: 'LASF'
 file source ID: 0
 global_encoding: 16
 project ID GUID data 1-4: E603549E-B74A-4696-3BAD-EA49F643C221
 version major.minor: 1.4
 system identifier: 'LAStools (c) by rapidlasso GmbH'
 generating software: 'las2las (version 160110)'
 file creation day/year: 119/2011
 header size: 375
 offset to point data: 1135
 number var. length records: 2
 point data format: 6
 point data record length: 30
 number of point records: 0
 number of points by return: 0 0 0 0 0
 scale factor x y z: 0.01 0.01 0.01
 offset x y z: 0 0 0
 min x y z: 361818.33 4855898.31 349.89
 max x y z: 364400.98 4859420.79 404.22
 start of waveform data packet record: 0
 start of first extended variable length record: 0
 number of extended_variable length records: 0
 extended number of point records: 13772409
 extended number of points by return: 13440460 240598 78743 12608 0 0 0 0 0 0 0 0 0 0 0
variable length header record 1 of 2:
 reserved 43707
 user ID 'LASF_Projection'
 record ID 34735
 length after header 40
 description 'by LAStools of Martin Isenburg'
 GeoKeyDirectoryTag version 1.1.0 number of keys 4
 key 1024 tiff_tag_location 0 count 1 value_offset 1 - GTModelTypeGeoKey: ModelTypeProjected
 key 3072 tiff_tag_location 0 count 1 value_offset 26915 - ProjectedCSTypeGeoKey: NAD83 / UTM 15N
 key 3076 tiff_tag_location 0 count 1 value_offset 9001 - ProjLinearUnitsGeoKey: Linear_Meter
 key 4099 tiff_tag_location 0 count 1 value_offset 9001 - VerticalUnitsGeoKey: Linear_Meter
variable length header record 2 of 2:
 reserved 43707
 user ID 'LASF_Projection'
 record ID 2112
 length after header 612
 description 'by LAStools of rapidlasso GmbH'
 WKT OGC COORDINATE SYSTEM:
 PROJCS["NAD83 / UTM 15N",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-93],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","26915"]]
reporting minimum and maximum for all LAS point record entries ...
 X 36181833 36440098
 Y 485589831 485942079
 Z 34989 40422
 intensity 1 4780
 return_number 1 4
 number_of_returns 1 4
 edge_of_flight_line 0 0
 scan_direction_flag 0 1
 classification 1 10
 scan_angle_rank -24 24
 user_data 0 32
 point_source_ID 5401 8015
 gps_time 243852.663596 403514.323142
 extended_return_number 1 4
 extended_number_of_returns 1 4
 extended_classification 1 10
 extended_scan_angle -4000 4000
 extended_scanner_channel 0 0
number of first returns: 13440460
number of intermediate returns: 91408
number of last returns: 13440349
number of single returns: 13199808
overview over extended number of returns of given pulse: 13199808 323647 198449 50505 0 0 0 0 0 0 0 0 0 0 0
histogram of classification of points:
 6560507 unclassified (1)
 6744481 ground (2)
 401464 medium vegetation (4)
 55433 building (6)
 100 noise (7)
 10157 water (9)
 267 rail (10)

Use Buffers when Processing LiDAR in Tiles !!!

We often process LiDAR in tiles for two reasons: first, to keep the number of points per file low and use main memory efficient, and second, to speed up the computation with parallel tile processing and keep all cores of a modern CPU busy. However, it is very (!!!) important to take the necessary precautions to avoid “edge artifacts” when processing LiDAR in tiles. We have to include points from neighboring tiles during certain LAStools processing steps to avoid edge artifacts. Why? Here is an illustration from our PHIL LiDAR tour earlier this year:

Buffers are important to avoid edge artifacts along tile boundaries during DTM creation.

Buffers are important to avoid edge artifacts along tile boundaries.

What you see is the temporary TIN of ground points created (internally) by las2dem or blast2dem that is then rastered at the user-specified step size onto a grid. Without a buffer (right side) there will not always be a triangle to cover every pixel. Especially in the corners of the tile you will often find empty pixels. Furthermore the poorly shaped “sliver triangles” along the boundary of the TIN do not interpolate the ground elevations properly. In contrast, with buffer (left side) the TIN generously covers the entire area that is to be rastered with nicely shaped triangles.

The

Christmas cookie analogy: buffers are like generously rolling out the dough

Here the christmas cookies analogy: You need to roll out the dough larger than the cookies you will cut to make sure your cookies will have nice edges. Think of the TIN as the dough and the square tile as your cookie cutter. You need to use a sufficiently large buffer when you roll out your TIN to assure an edge without crumbles when you cut out the tile … (-: … otherwise you are pretty much guaranteed to get results that – upon closer inspection – have these kind of artifacts:

Without buffers processing artifacts also happen when classifying points with lasground or lasclassify, when calculating height above ground or height-normalizing LiDAR tiles with lasheight, when removing noise with lasnoise, when creationg contours with las2iso or blast2iso, or any other operation where an incomplete neighborhood of points can affect the results. Hence, we need to surround each tile with a temporary buffer of points. Currently there are two ways of working with buffers with LAStools:

  1. creating buffered tiles during the initial tiling step with the ‘-buffer 25’ option of lastile, maintaining buffered tiles throughout processing and finally using the ‘-use_tile_bb’ option of lasgrid, las2dem, blast2dem, or lascanopy to raster the tiles without the temporary buffer.
  2. creating buffered tiles from non-overlapping (= unbuffered) tiles with “on-the-fly” buffering using the ‘-buffered 25’ option of most LAStools such as lasground, lasheight, or las2dem. For some workflows it is useful to also add ‘-remain_buffered’ if buffers are needed again in the next step. Finally, we use the ‘-use_orig_bb’ option of lasgrid, las2dem, blast2dem, or lascanopy to raster the tiles without the temporary buffer.

In the following three (tiny) examples using the venerable ‘fusa.laz’ sample that is distributed with LAStools to illustrate the two types of buffering as well as to show what happens when no buffers are used. In each example we will first cut the small ‘fusa.laz’ sample into nine smaller tiles and then process these separately on 4 cores in parallel.

1. Initial buffer creation with lastile

This is what most of my tutorials teach. It assumes you are the one creating the tiling in the first place. If you do it with lastile and add a buffer right from the start things are pretty easy.

lastile -i ..\data\fusa.laz ^
        -set_classification 0 -set_user_data 0 ^
        -tile_size 100 -buffer 20 ^
        -odir 1_raw -o futi.laz

We cut the input into 100 meter by 100 meter tiles but add a 20 meter buffer around each tile. That means that each tile on disk will contain the points for an area of up to 140 meter by 140 meter. The GUI for LAStools shows the overlap and if you scrutinize the bounding box values that the cursor points to you notice the extra 20 meters in each direction.

tiles_buffered_with_lastile

Now we can forget about the buffers and run the standard workflow consiting of lasground, lasheight, and lasclassify to distinguish ground, vegetation, and building points in the LiDAR tiles.

lasground -i 1_raw\futi*.laz ^
          -city ^
          -odir 1_ground -olaz ^
          -cores 4
lasheight -i 1_ground\futi*.laz ^
          -drop_above 50 ^
          -odir 1_height -olaz ^
          -cores 4
lasclassify -i 1_height\futi*.laz ^
            -odir 1_classify -olaz ^
            -cores 4

At the end – when we generate raster products – we have to remember that the tiles were buffered by lastile and cut off the buffers when we raster the TIN with option ‘-use_tile_bb’ of las2dem.

las2dem -i 1_classify\futi*.laz ^
        -keep_class 2 6 ^
        -step 0.25 -use_tile_bb ^
        -odir 1_dbm -obil ^
        -cores 4

We created a digital terrain model with buildings (DBM) by keeping the points with classification 2 (ground) and 6 (building). After loading the resulting 9 tiles into QGIS and generating a virtual raster we see a nice seamless DBM without any edge artifacts.

The DEM of the 9 tiles computed with buffers created by lastile has no edge artifacts acoss tile boundaries.

The DBM of the 9 tiles computed with buffers created by lastile has no edge artifacts acoss tile boundaries.

If you need to deliver the LiDAR files you should remove the buffers with lastile and option ‘-remove_buffer’.

lastile -i 1_classify\futi*.laz ^
        -remove_buffer ^
        -odir 1_final -olaz ^
        -cores 4

2. On-the-fly buffering

Now assume you are given LiDAR tiles without buffers. We generate them here with lastile.

lastile -i ..\data\fusa.laz ^
        -set_classification 0 -set_user_data 0 ^
        -tile_size 100 ^
        -odir 2_raw -o futi.laz

The only difference is that we do not request the 20 meter buffer and the result is a typical tiling as you may receive it from a vendor or download it from a LiDAR portal. The GUI for LAStools shows that there is no overlap and if you scrutinize the bounding box values that the cursor points to, you see that the tiles is exactly 100 meters bty 100 meters.

tiles_without_buffer

Now we have to think about buffers a lot. When using on-the-fly buffering we should first spatially index the tiles with lasindex for faster access to the points from neighbouring tiles.

lasindex -i 1_raw\futi*.laz -cores 4

Below in red are the modifications for on-the-fly buffering to the standard workflow of lasground, lasheight, and lasclassify. The first lasground run uses ‘-buffered 20’ to add buffers to each tile and ‘-remain_buffered’ to write those buffers to disk. This way they do not have to created again by lasheight and lasclassify.

lasground -i 2_raw\futi*.laz ^
          -buffered 20 -remain_buffered ^
          -city ^
          -odir 2_ground -olaz ^
          -cores 4
lasheight -i 2_ground\futi*.laz ^
          -remain_buffered ^
          -drop_above 50 ^
          -odir 2_height -olaz ^
          -cores 4
lasclassify -i 2_height\futi*.laz ^
            -remain_buffered ^
            -odir 2_classify -olaz ^
            -cores 4

At the end we have to remember that the tiles still have on-the-fly buffers and them cut off with option ‘-use_orig_bb’ of las2dem.

las2dem -i 2_classify\futi*.laz ^
        -keep_class 2 6 ^
        -step 0.25 -use_orig_bb ^
        -odir 2_dbm -obil ^
        -cores 4

Again, we created a digital terrain model with buildings (DBM) by keeping the points with classification 2 (ground) and 6 (building). The resulting hillshade computed from a virtual raster that combines the 9 BIL rastera into one looks perfectly smooth in QGIS.

The hillshaded DEM of the 9 tiles computed with on-the-fly buffering has no edge artifacts acoss tile boundaries.

The hillshaded DBM of 9 tiles computed with on-the-fly buffering has no edge artifacts acoss tile boundaries.

If you need to deliver the LiDAR files you should probably remove the buffers first … but that is not yet implemented. (-:

lastile -i 2_classify\futi*.laz ^
        -remove_buffer ^
        -odir 2_final -olaz ^
        -cores 4

3. Bad: No buffering

Here what you are *not* supposed to do. Assuming you get unbuffered tiles.

lastile -i ..\data\fusa.laz ^
        -set_classification 0 -set_user_data 0 ^
        -tile_size 100 ^
        -odir 3_raw -o futi.laz

Bad. You do not take care about buffering when processing the tiles.

lasground -i 3_raw\futi*.laz ^
          -city ^
          -odir 3_ground -olaz ^
          -cores 4
lasheight -i 3_ground\futi*.laz ^
          -drop_above 50 ^
          -odir 3_height -olaz ^
          -cores 4
lasclassify -i 3_height\futi*.laz ^
            -odir 3_classify -olaz ^
            -cores 4

Bad. You do not take care about buffering when generating the DBM.

las2dem -i 3_classify\futi*.laz ^
        -keep_class 2 6 ^
        -step 0.25 ^
        -odir 3_dbm -obil ^
        -cores 4

Bad. You get crappy results with edge artifacts clearly visible in the hillshade.

The hillshaded DBM of 9 tiles computed WITHOUT using buffers has severe edge artifacts acoss tile boundaries.

The hillshaded DBM of 9 tiles computed WITHOUT using buffers has severe edge artifacts acoss tile boundaries.

Bad. If you zoom in on a corner where 4 tiles meet you find missing pixels and incorrect elevation values. Bad. Bad. Bad. So please folks. Try this on your own data. Notice the horrible edge artifacts. Then always use buffers … (-:

PS: Usually no buffers are needed for running lasgrid, lasoverlap, or lascanopy as they perform simple binning operations that do not make use of neighbour information.