Plots to Stands: Producing LiDAR Vegetation Metrics for Imputation Calculations

Some professionals in remote sensing use LAStools to extract statistical metrics from LiDAR that are then used to make estimates about a larger area of land from a small set of sample plots. Common applications are predicting the timber volume or the above-ground biomass of entire forests based on a number of representative plots where exact measurements were obtained through field work. The same technique can also be used to make estimates about animal habitat or coconut yield, or to classify the type of vegetation that covers the land. In this tutorial we describe the typical workflow for computing common metrics for smaller plots and larger areas using LAStools.

Download these six LiDAR tiles (1, 2, 3, 4, 5, 6) from a Eucalyptus plantation in Brazil to follow along the step by step instructions of this tutorial. This data is courtesy of Suzano Pulp and Paper. Please also download the two shapefiles that delineate the plots where field measurements were taken and the stands for which predictions are to be made. You should download version 170327 (or higher) of LAStools due to some recent bug fixes.

Quality Checking

Before processing newly received LiDAR data we always perform a quality check first. This ranges from visual inspection with lasview, to printing textual content reports and attribute histograms with lasinfo, to flight-line alignment checks with lasoverlap, pulse density and pulse spacing checks with lasgrid and las2dem, and completeness-of-returns check with lassort followed by lasreturn.

lasinfo -i tiles_raw\CODL0003-C0006.laz ^
        -odir quality -odix _info -otxt

The lasinfo report tells us that there is no projection information. However, we remember that this Brazilian data was in the common SIRGAS 2000 projection, so we try a few likely UTM zones and check whether the hillshaded DSM produced by las2dem falls onto the right spot in Google Earth.

las2dem -i tiles_raw\CODL0003-C0006.laz ^
        -keep_first -thin_with_grid 1 ^
        -hillshade -epsg 31983 ^
        -o epsg_check.png

Hillshaded DSM and Google Earth imagery align for EPSG code 31983

The lasinfo report also tells us that the xyz coordinates are stored with millimeter resolution which is a bit of an overkill. For higher and faster LASzip compression we will later lower this to a more appropriate centimeter resolution. It further tells us that the returns are stored using point type 0 and that is a bit unfortunate. This (older) point type does not have a GPS time stamp so that some quality checks (e.g. “completeness of returns” with lasreturn) and operations (e.g. “resorting of returns into acquisition order” with lassort) will not be possible. Fortunately the min-max range of the ‘point source ID’ suggests that this point attribute is correctly populated with flightline numbers so that we can do a check for overlap and alignment of the different flightlines that contribute to the LiDAR in each tile.

lasoverlap -i tiles_raw\*.laz ^
           -min_diff 0.2 -max_diff 0.4 ^
           -epsg 31983 ^
           -odir quality -opng ^
           -cores 3

We run lasoverlap to visualize the amount of overlap between flightlines and the vertical differences between them. The produced images (see below) color code the number of flightlines and the maximum vertical difference between any two flightlines. Most of the area is cyan (2 flightlines) except in the bottom left where the pilot was sloppy and left some gaps in the yellow seams (3 flightlines) so that some spots are only blue (1 flightline). We also see that two tiles in the upper left are partly covered by a diagonal flightline. We will drop that flightline later to create a more uniform density across the tiles. The mostly blue areas in the difference image are aligned with features in the landscape rather than with the flightline pattern. Closer inspection shows that these vertical differences occur mainly in the dense old growth forests with species of different heights that are much harder to penetrate by the laser than the uniform and short-lived Eucalyptus plantation that is more of a “dead forest” with little undergrowth or animal habitat.

Interesting observation: The vertical difference of the lowest return from different flightlines, computed per 2 meter by 2 meter grid cell, could maybe be used as a new forestry metric to help distinguish monocultures from natural forests.
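
To make the idea of such a metric concrete, here is a minimal Python sketch. It is our own illustration and not how lasoverlap is implemented: it assumes the points are available as (x, y, z, flightline) tuples, keeps the lowest elevation per flightline in each grid cell, and reports the spread between those lowest elevations for every cell seen by at least two flightlines. The function name and the sample points are made up.

```python
from collections import defaultdict
from math import floor


def lowest_return_difference(points, step=2.0):
    """Map each grid cell to the spread between per-flightline lowest elevations.

    points: iterable of (x, y, z, flightline) tuples (hypothetical input layout).
    """
    lowest = defaultdict(dict)  # cell -> {flightline: lowest z seen so far}
    for x, y, z, line in points:
        cell = (floor(x / step), floor(y / step))
        per_line = lowest[cell]
        if line not in per_line or z < per_line[line]:
            per_line[line] = z

    differences = {}
    for cell, per_line in lowest.items():
        if len(per_line) >= 2:  # a difference needs at least two flightlines
            differences[cell] = max(per_line.values()) - min(per_line.values())
    return differences


# Made-up sample points, not taken from the Suzano tiles.
sample = [
    (0.5, 0.5, 10.0, 1), (1.5, 0.7, 10.1, 2), (0.9, 1.2, 14.0, 2),  # cell (0, 0)
    (2.5, 0.5, 11.0, 1), (3.0, 1.0, 19.5, 1),                       # cell (1, 0), one flightline
    (4.2, 0.3, 12.0, 1), (4.9, 1.9, 15.5, 3),                       # cell (2, 0)
]
differences = lowest_return_difference(sample)
for cell in sorted(differences):
    print(cell, round(differences[cell], 2))
```

Cells covered by a single flightline produce no value, which mirrors the fact that a vertical difference only exists where flightlines overlap.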

lasgrid -i tiles_raw\*.laz ^
        -keep_last ^
        -step 2 -point_density ^
        -false -set_min_max 10 20 ^
        -epsg 31983 ^
        -odir quality -odix _d_2m_10_20 -opng ^
        -cores 3

lasgrid -i tiles_raw\*.laz ^
        -keep_last ^
        -step 5 -point_density ^
        -false -set_min_max 10 20 ^
        -epsg 31983 ^
        -odir quality -odix _d_5m_10_20 -opng ^
        -cores 3

We run lasgrid to visualize the pulse density per 2 by 2 meter cell and per 5 by 5 meter cell. The produced images (see below) color code the number of last returns per square meter. The impact of the tall Eucalyptus trees on the density per cell computation is evident for the smaller 2 meter cell size in the form of a noisy blue/red diagonal in the top right as well as a noisy blue/red area in the bottom left. Both of those turn to a more consistent yellow for the density per cell computation with larger 5 meter cells. Immediately evident is the higher density (red) for those parts of the two tiles in the upper left that are covered by the additional diagonal flightline. The blueish area left of the center of the image suggests a consistently lower pulse density whose cause remains to be investigated: Lower reflectivity? Missing last returns? Cloud cover?
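
The per-cell computation behind such a density raster can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions (points given as (x, y, return_number, number_of_returns) tuples, density taken as the last-return count divided by the cell area), not a description of the lasgrid internals.

```python
from collections import Counter
from math import floor


def last_return_density(points, step):
    """Return {cell: last returns per square unit} for a grid with the given step.

    points: iterable of (x, y, return_number, number_of_returns) tuples.
    """
    counts = Counter()
    for x, y, return_number, number_of_returns in points:
        if return_number == number_of_returns:  # count one return per pulse: the last
            counts[(floor(x / step), floor(y / step))] += 1
    cell_area = step * step
    return {cell: count / cell_area for cell, count in counts.items()}


# Made-up sample returns for illustration only.
sample = [
    (0.5, 0.5, 1, 1),
    (1.0, 1.5, 1, 2), (1.0, 1.5, 2, 2),
    (1.9, 0.1, 3, 3),
    (2.5, 0.5, 1, 2),  # first of two returns, not counted
    (3.5, 1.0, 2, 2),
]
density = last_return_density(sample, step=2.0)
print(density)
```

With small cells only a handful of pulses land in each cell, so the per-cell value is noisy. Larger cells average over more pulses, which is why the 5 meter raster looks smoother than the 2 meter one.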

The lasinfo report suggests that the tiles are already classified. We could either use the ground classification provided by the vendor or re-classify the data ourselves (using lastile, lasnoise, and lasground). We check the quality of the ground classification by visually inspecting a hillshaded DTM created with las2dem from the ground returns. We buffer the tiles on-the-fly for a seamless hillshade without artifacts along tile boundaries by adding ‘-buffered 25’ and ‘-use_orig_bb’ to the command-line. To speed up reading the 25 meter buffers from neighboring tiles we first create a spatial index with lasindex.

lasindex -i tiles_raw\*.laz ^
         -cores 3

las2dem -i tiles_raw\*.laz ^
        -buffered 25 ^
        -keep_class 2 -thin_with_grid 0.5 ^
        -use_orig_bb ^
        -hillshade -epsg 31983 ^
        -odir quality -odix _dtm -opng ^
        -cores 3

hillshaded DTM tiles generated with las2dem and on-the-fly buffering

The resulting hillshaded DTM shows a few minor issues in the ground classification but also a big bump (above the mouse cursor). Closer inspection of this area (you can cut it from the larger tile using the command-line below) shows that there is a group of misclassified points about 1200 meters below the terrain. Hence, we will start from scratch to prepare the data for the extraction of forestry metrics.

las2las -i tiles_raw\CODL0004-C0006.laz ^
        -inside_tile 207900 7358350 100 ^
        -o bump.laz

lasview -i bump.laz

bump in hillshaded DTM caused by misclassified low noise

Data Preparation

Using lastile we first tile the data into smaller 500 meter by 500 meter tiles with a 25 meter buffer while flagging all points in the buffer as ‘withheld’. In the same step we lower the resolution to centimeters and put a nicer coordinate offset in the LAS header. We also remove the existing classification and classify all points that are much lower than the target terrain as class 7 (aka noise). We also add CRS information and give each tile the base name ‘suzana.laz’.

lastile -i tiles_raw\*.laz ^
        -rescale 0.01 0.01 0.01 ^
        -auto_reoffset ^
        -set_classification 0 ^
        -classify_z_below_as 500.0 7 ^
        -tile_size 500 ^
        -buffer 25 -flag_as_withheld ^
        -epsg 31983 ^
        -odir tiles_buffered -o suzana.laz
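
What rescaling and re-offsetting mean for the stored coordinates can be shown with a small Python sketch. LAS stores each coordinate as an integer that is turned back into a real coordinate via `value * scale + offset`. The offset of 200000 used here is a hypothetical value chosen for illustration; we do not claim it is what ‘-auto_reoffset’ picks for these tiles.

```python
def quantize(coordinate, scale, offset):
    """Integer that a LAS point record stores for a real-world coordinate."""
    return round((coordinate - offset) / scale)


def dequantize(value, scale, offset):
    """Real-world coordinate recovered from the stored integer."""
    return value * scale + offset


x = 207900.123  # an easting in meters with millimeter precision

# millimeter resolution with a zero offset
mm_value = quantize(x, 0.001, 0.0)

# centimeter resolution with a hypothetical offset of 200000
cm_value = quantize(x, 0.01, 200000.0)
cm_x = dequantize(cm_value, 0.01, 200000.0)

print(mm_value, cm_value, round(cm_x, 2))
```

The centimeter version stores a much smaller integer, and the position changes by at most half a centimeter, which is well below the accuracy of an airborne survey.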

With lasnoise we mark the many isolated points that are high above or below the terrain as class 7 (aka noise). Using two tiles we played around with the ‘step’ parameters until we found good settings. See the README of lasnoise for the exact meaning and the choice of parameters for noise classification. We look at one of the resulting tiles with lasview.

lasnoise -i tiles_buffered\*.laz ^
         -step_xy 4 -step_z 2 ^
         -odir tiles_denoised -olaz ^
         -cores 3

lasview -i tiles_denoised\suzana_206000_7357000.laz ^
        -color_by_classification ^
        -win 1024 192

noise points shown in pink: all points (top), only noise points (bottom)

Next we use lasground to classify the last returns into ground (2) and non-ground (1). It is important to ignore the noise points with classification 7 to avoid the kind of bump we saw in the vendor-delivered classification. We again check the quality of the computed ground classification by producing a hillshaded DTM with las2dem. Here the las2dem command-line is slightly different because, instead of buffering on-the-fly, we use the buffers stored with each tile.

lasground -i tiles_denoised\*.laz ^
          -ignore_class 7 ^
          -nature -extra_fine ^
          -odir tiles_ground -olaz ^
          -cores 3

las2dem -i tiles_ground\*.laz ^
        -keep_class 2 -thin_with_grid 0.5 ^
        -hillshade ^
        -use_tile_bb ^
        -odir quality -odix _dtm_new -opng ^
        -cores 3

Finally, with lasheight we compute how high each return is above the triangulated surface of all ground returns and store this height value in place of the elevation value in the z coordinate using the ‘-replace_z’ switch. This height-normalizes the LiDAR in the sense that all ground returns are set to an elevation of 0 while all other returns get an elevation relative to the ground. The results are height-normalized LiDAR tiles that are ready for producing the desired forestry metrics.

lasheight -i tiles_ground\*.laz ^
          -replace_z ^
          -odir tiles_normalized -olaz ^
          -cores 3
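
The core of height normalization can be illustrated for a single ground triangle. The Python sketch below is our own simplification: lasheight triangulates all ground returns, whereas this example assumes the triangle containing the point is already known and simply interpolates the ground elevation with barycentric weights. The triangle and point are made up.

```python
def interpolate_ground(triangle, x, y):
    """Ground elevation at (x, y) inside a triangle of three (x, y, z) ground points."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * z1 + w2 * z2 + w3 * z3


def normalize(point, triangle):
    """Replace the elevation of a point with its height above the triangle."""
    x, y, z = point
    return (x, y, z - interpolate_ground(triangle, x, y))


# Made-up ground triangle lying on the plane z = 100 + x + 2y.
ground = ((0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 120.0))

canopy_return = normalize((2.0, 3.0, 125.0), ground)   # ground below is 108
ground_return = normalize((10.0, 0.0, 110.0), ground)  # a triangle corner
print(canopy_return, ground_return)
```

The ground return ends up at height 0 and the canopy return at its height above the terrain, which is what the ‘-replace_z’ output looks like point by point.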

Metric Production

The tool for computing the metrics for the entire area as well as for the individual field plots is lascanopy. Which metrics are well suited for your particular imputation calculation is your job to determine. Maybe first compute a large number of them and then eliminate the redundant ones. Do not use any point from the tile buffers for these calculations. We had flagged them as ‘withheld’ during the lastile operation, so they are easy to drop. We also want to drop the noise points that were classified as 7. And we were planning to drop that additional diagonal flightline we noticed during quality checking. We generated two lasinfo reports with the ‘-histo point_source 1’ option for the two tiles it was covering. From the two histograms it was easy to see that the diagonal flightline has the number 31.

First we run lascanopy on the 11 plots that you can download here. When running on plots it makes sense to first create a spatial index with lasindex for faster querying so that only those tiny parts of the LAZ file that actually cover the plots need to be loaded.

lasindex -i tiles_normalized\*.laz ^
         -cores 3

lascanopy -i tiles_normalized\*.laz -merged ^
          -drop_withheld ^
          -drop_class 7 ^
          -drop_point_source 31 ^
          -lop WKS_PLOTS.shp ^
          -cover_cutoff 3.0 ^
          -cov -dns ^
          -height_cutoff 2.0 ^
          -c 2.0 999.0 ^
          -max -avg -std -kur ^
          -p 25 50 75 95 ^
          -b 30 50 80 ^
          -d 2.0 5.0 10.0 50.0 ^
          -o plots.csv

The resulting ‘plots.csv’ file can easily be processed in other software packages. It contains one line for each polygonal plot listed in the shapefile, giving its bounding box followed by all the requested metrics. But why is there a zero maximum height (marked in orange) for plots 6 through 10? All height metrics are computed solely from returns that are higher than the ‘height_cutoff’ that was set to 2 meters. We added the ‘-c 2.0 999.0’ absolute count metric to keep track of the number of returns used in these calculations. Apparently in plots 6 through 10 there was not a single return above 2 meters as the count (also marked in orange) is zero for all these plots. It turns out this Eucalyptus stand had recently been harvested and the new seedlings are still shorter than 2 meters.

more plots.csv
index,min_x,min_y,max_x,max_y,max,avg,std,kur,p25,p50,p75,p95,b30,b50,b80,c00,d00,d01,d02,cov,dns
0,206260.500,7358289.909,206283.068,7358312.477,11.23,6.22,1.91,2.26,4.71,6.01,7.67,9.5,26.3,59.7,94.2,5359,18.9,41.3,1.5,76.3,60.0
1,206422.500,7357972.909,206445.068,7357995.477,13.54,7.5,2.54,1.97,5.32,7.34,9.65,11.62,26.9,54.6,92.2,7030,12.3,36.6,13.3,77.0,61.0
2,206579.501,7358125.909,206602.068,7358148.477,12.22,5.72,2.15,2.5,4.11,5.32,7.26,9.76,46.0,73.7,97.4,4901,24.8,28.7,2.0,66.8,51.2
3,206578.500,7358452.910,206601.068,7358475.477,12.21,5.68,2.23,2.64,4.01,5.14,7.18,10.04,48.3,74.1,95.5,4861,25.7,26.2,2.9,68.0,50.2
4,206734.501,7358604.910,206757.068,7358627.478,15.98,10.3,2.18,2.64,8.85,10.46,11.9,13.65,3.3,27.0,91.0,4946,0.6,32.5,44.5,91.0,77.5
5,207043.501,7358761.910,207066.068,7358784.478,15.76,10.78,2.32,3.43,9.27,11.03,12.49,14.11,3.2,20.7,83.3,4819,1.5,24.7,51.0,91.1,76.8
6,207677.192,7359630.526,207699.760,7359653.094,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0
7,207519.291,7359372.366,207541.859,7359394.934,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0
8,207786.742,7359255.850,207809.309,7359278.417,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0
9,208159.017,7358997.344,208181.584,7359019.911,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0
10,208370.909,7358602.565,208393.477,7358625.133,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0
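
The effect of the cutoffs on a harvested plot can be reproduced with a small Python sketch. It is a simplified stand-in for a few of the metrics, not the lascanopy implementation: we assume that height metrics use only returns strictly above the height cutoff, that cover is the percentage of first returns above the cover cutoff, and that density is the percentage of all returns above the cover cutoff. The sample returns are made up.

```python
def plot_metrics(returns, height_cutoff=2.0, cover_cutoff=3.0):
    """Compute a few plot metrics from (height, return_number) tuples."""
    heights = [h for h, _ in returns if h > height_cutoff]
    first = [h for h, rn in returns if rn == 1]
    return {
        "count": len(heights),
        "max": max(heights) if heights else 0.0,
        "avg": sum(heights) / len(heights) if heights else 0.0,
        # canopy cover: share of first returns above the cover cutoff
        "cov": 100.0 * sum(h > cover_cutoff for h in first) / len(first) if first else 0.0,
        # canopy density: share of all returns above the cover cutoff
        "dns": 100.0 * sum(h > cover_cutoff for h, _ in returns) / len(returns) if returns else 0.0,
    }


# Made-up height-normalized returns: (height above ground, return number).
forested = plot_metrics([(12.0, 1), (8.0, 2), (0.0, 3), (1.0, 1), (6.0, 1), (0.0, 2)])
harvested = plot_metrics([(0.1, 1), (0.4, 1), (1.2, 1), (0.0, 2)])
print(forested)
print(harvested)
```

For the harvested plot every return is below 2 meters, so the count is zero and all height metrics fall back to zero, just like the rows for plots 6 through 10.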

Then we run lascanopy on the entire area and produce one raster per tile for each metric. Here we remove the buffered points with the ‘-use_tile_bb’ switch that also ensures that the produced rasters have exactly the extent of the tiles without buffers. Again, it is imperative that you download version 170327 (or higher) of LAStools for this to work correctly.

lascanopy -version
LAStools (by martin@rapidlasso.com) version 170327 (academic)

lascanopy -i tiles_normalized\*.laz ^
          -use_tile_bb ^
          -drop_class 7 ^
          -drop_point_source 31 ^
          -step 10 ^
          -cover_cutoff 3.0 ^
          -cov -dns ^
          -height_cutoff 2.0 ^
          -c 2.0 999.0 ^
          -max -avg -std -kur ^
          -p 25 50 75 95 ^
          -b 30 50 80 ^
          -d 2.0 5.0 10.0 50.0 ^
          -odir tile_metrics -oasc ^
          -cores 3

The resulting rasters in ASC format can easily be previewed using lasview for some “sanity checking” that our metrics make sense and to get a quick overview about what these metrics look like.

lasview -i tile_metrics\suzana_*max.asc
lasview -i tile_metrics\suzana_*p95.asc
lasview -i tile_metrics\suzana_*p50.asc
lasview -i tile_metrics\suzana_*p25.asc
lasview -i tile_metrics\suzana_*cov.asc
lasview -i tile_metrics\suzana_*d00.asc
lasview -i tile_metrics\suzana_*d01.asc
lasview -i tile_metrics\suzana_*b30.asc
lasview -i tile_metrics\suzana_*b80.asc

The maximum height rasters are useful to inspect more closely as they will immediately tell us if any high noise point slipped through the cracks. And indeed it happened, as we see a maximum of 388.55 meters for one of the 10 by 10 meter cells. As we know the expected height of the trees we could have added a ‘-drop_z_above 70’ to the lascanopy command line. Be careful, however, when computing forestry metrics in strongly sloped terrain as the terrain slope can significantly lift up returns to heights much higher than that of the tree. This is guaranteed to happen for LiDAR returns from branches that extend horizontally far over the down-sloped part of the terrain, as shown in this paper here.
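
The size of this slope effect follows from simple geometry. As a rough sketch that assumes a planar slope, a return located a horizontal distance d downslope from the trunk sits above ground that is d · tan(slope) lower than the trunk base, so its normalized height is inflated by that amount. The function name and the numbers are ours.

```python
from math import radians, tan


def slope_lift(horizontal_reach, slope_degrees):
    """Extra normalized height of a return that overhangs downslope terrain.

    horizontal_reach: horizontal distance from the trunk in meters.
    slope_degrees: terrain slope, assumed planar, in degrees.
    """
    return horizontal_reach * tan(radians(slope_degrees))


# A branch reaching 5 meters out over a 30 degree slope.
lift = slope_lift(5.0, 30.0)
print(round(lift, 2))
```

A fixed ‘-drop_z_above’ threshold therefore needs some headroom in steep terrain, otherwise legitimate canopy returns may be dropped together with the noise.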

We did not use the shapefile for the stands in this exercise. We could have clipped the normalized LiDAR points to these stands using lasclip as shown in the GUI below before generating the raster metrics. This would have saved space and computation time as many of the LiDAR points lie outside of the stands. However, it might be better to do that clipping step on the rasters in whichever GIS software or statistics package you are using for the imputation computation to properly account for partly covered raster cells along the stand boundary. This could be subject of another blog article … (-:

not all LiDAR was needed to compute metrics for

Leaked: “Classified LiDAR” of Pentagon in LAS 1.4 Format

LiDAR leaks have happened! Black helicopters are in the sky! A few days ago a tiny tweet leaked the online location of “classified LiDAR” for Washington, DC. This LiDAR really is “classified” and includes an aerial scan of the Pentagon. For rogue scientists world-wide we offer a secret download link. It links to a file code-named ‘pentagon.laz’ that contains the 8,044,789 “classified” returns of the Pentagon shown below. This “classified file” can be deciphered by any software with native LAZ support. It was encrypted with the “LAS 1.4 compatibility mode” of LASzip. The original LAS 1.4 content was encoded into an inconspicuous-looking LAZ file. New point attributes (such as the scanner channel) were hidden as “extra bytes” for fully lossless encryption. Use ‘laszip’ to fully decode the original “classified” LAS 1.4 file … (-;

Seriously, a tiled LiDAR data set for the District of Columbia flown in 2015 is available for anyone to use on Amazon S3 with a very permissive open data license, namely the Creative Commons Attribution 3.0 License. The LiDAR coverage can be explored via this interactive map. The tiles are provided in LAS 1.4 format and use the new point type 6. We downloaded a few tiles near the White House, the Capitol, and the Pentagon to test the “native LAS 1.4 extension” of our LASzip compressor which will be released soon (a prototype for testing is already available). As these uncompressed LAS files are YUUUGE we use the command line utility ‘wget‘ for downloading. With option ‘-c’ the download continues where it left off in case the transfer gets interrupted.

LiDAR pulse density from 20 or less (blue) to 100 or more (red) pulses per square meter.

We use lasboundary to create labeled bounding boxes for display in Google Earth and lasgrid to create a false color visualization of pulse density with the command lines shown below. Pulse densities of 20 or below are mapped to blue. Pulse densities of 100 or above are mapped to red. We picked the min value 20 and the max value 100 for this false color mapping by running lasinfo with the ‘-cd’ option to compute an average pulse density and then refining the numbers experimentally. We also use lasoverlap to visualize how flightlines overlap and how well they align. Vertical differences of up to 20 cm are mapped to white and differences of 40 cm or more are mapped to saturated blue or red.

lasboundary -i *.las ^
            -use_bb ^
            -labels ^
            -odir quality -odix _bb -okml

lasgrid -i *.las ^
        -keep_last ^
        -point_density -step 2 ^
        -false -set_min_max 20 100 ^
        -odir quality -odix _d_20_100 -opng ^
        -cores 2

lasoverlap -i *.las ^
           -min_diff 0.2 -max_diff 0.4 ^
           -odir quality -opng ^
           -cores 2

The visualization of the pulse density and of the flightline overlap both show that there is no LiDAR for the White House or Capitol Hill. We will never know how tall the tomato and kale plants had grown in Michelle Obama’s organic garden on that day. Note that the White House and Capitol Hill were not simply “cut out”. Instead the flight plan of the survey plane was carefully designed to avoid these areas. Surprisingly, the Pentagon did not receive the same treatment and is (almost) fully included in the open LiDAR as mentioned in the dramatic first paragraph. It is interesting how the varying (tidal?) water level of the Potomac River shows up in the visualization of flightline misalignments.

There are a number of issues in these LiDAR files. The most serious ones are reported at the very end of this article. We will now scrutinize the partly-filled tile 2016.las close to the White House with only 11,060,334 returns. A lasvalidate check immediately reports three deviations from the LAS 1.4 specification:

lasvalidate -i 2016.las -o 2016_check.xml
  1. For proper LAS 1.4 files containing point type 6 through 10 all ‘legacy’ point counts in the LAS header should be set to 0. The following six fields in the LAS header should be zero for tile 2016.las (and all other tiles):
    + legacy number of point records
    + legacy number of points by return[0]
    + legacy number of points by return[1]
    + legacy number of points by return[2]
    + legacy number of points by return[3]
    + legacy number of points by return[4]
  2. There should not be any LiDAR return in a valid LAS file whose ‘number of returns of given pulse’ attribute is zero but there are 8 such points in tile 2016.las (and many more in various other tiles).
  3. There should not be any LiDAR return whose ‘return number’ attribute is larger than their ‘number of returns of given pulse’ attribute but there are 8 such points in tile 2016.las (and many more in various other tiles).

The first issue is trivial. There is an efficient in-place fix using lasinfo that does not require rewriting the entire file:

lasinfo -i 2016.las ^
        -nh -nv -nc ^
        -set_number_of_point_records 0 ^
        -set_number_of_points_by_return 0 0 0 0 0

A quick check with las2txt shows us that the second and third issue are caused by the same eight points. Instead of writing an 8 for the ‘number of returns’ attribute the LAS file exporter must have written a 0 (marked in red for all eight returns) and instead of writing an 8 for the ‘return number’ attribute the LAS file exporter must have written a 1 (also marked in red). We can tell it from the true first return via its z coordinate (marked in blue) as the last return should be the lowest of all.

las2txt -i 2016.las ^
        -keep_number_of_returns 0 ^
        -parse xyzrnt ^
        -stdout
397372.70 136671.62 33.02 4 0 112813299.954811
397372.03 136671.64 28.50 5 0 112813299.954811
397371.28 136671.67 23.48 6 0 112813299.954811
397370.30 136671.68 16.86 7 0 112813299.954811
397369.65 136671.70 12.50 1 0 112813299.954811
397374.37 136671.58 44.17 3 0 112813299.954811
397375.46 136671.56 51.49 1 0 112813299.954811
397374.86 136671.57 47.45 2 0 112813299.954811
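
The two rule violations and the mislabeled last return can be checked directly on the eight lines printed above. The Python sketch below is only an illustration: it parses the ‘xyzrnt’ columns, counts the violations, and then orders the returns from highest to lowest elevation on the assumption (ours, and only reasonable for a near-vertical pulse) that earlier returns are higher than later ones.

```python
# las2txt output for '-parse xyzrnt', copied from the listing above.
LINES = """\
397372.70 136671.62 33.02 4 0 112813299.954811
397372.03 136671.64 28.50 5 0 112813299.954811
397371.28 136671.67 23.48 6 0 112813299.954811
397370.30 136671.68 16.86 7 0 112813299.954811
397369.65 136671.70 12.50 1 0 112813299.954811
397374.37 136671.58 44.17 3 0 112813299.954811
397375.46 136671.56 51.49 1 0 112813299.954811
397374.86 136671.57 47.45 2 0 112813299.954811
""".splitlines()


def parse(line):
    x, y, z, r, n, t = line.split()
    return {"z": float(z), "r": int(r), "n": int(n), "t": t}


points = [parse(line) for line in LINES]

# Issue 2: 'number of returns' is zero. Issue 3: 'return number' exceeds it.
zero_number_of_returns = [p for p in points if p["n"] == 0]
return_exceeds_number = [p for p in points if p["r"] > p["n"]]

# Order from highest to lowest and compare with the expected numbering 1..8.
by_height = sorted(points, key=lambda p: -p["z"])
stored = [p["r"] for p in by_height]
mismatches = [(p["z"], p["r"], expected)
              for expected, p in enumerate(by_height, start=1)
              if p["r"] != expected]

print(len(zero_number_of_returns), len(return_exceeds_number))
print(stored)
print(mismatches)
```

Sorted by elevation the stored return numbers read 1 through 7 followed by another 1, and the only mismatch is the lowest point at 12.50 meters, which is the return that should have been numbered 8.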

With las2las we can change the ‘number of returns’ from 0 to 8 using a ‘-filtered_transform’ as illustrated in the command line below. We suspect that higher numbers of returns such as 9 or 10 might have been mapped to 1 and 2. Fixing those as well as repairing the wrong return numbers will require a more complex tool. We recommend checking all tiles with more scrutiny using the lasreturn tool. But wait … more return numbering issues are to come.

las2las -i 2016.las ^
        -keep_number_of_returns 0 ^
        -filtered_transform ^
        -set_extended_number_of_returns 8 ^
        -odix _fixed -olas

A closer look at the scan pattern reveals that the LiDAR survey was flown with a dual-beam system where two laser beams scan the terrain simultaneously. This is evident in the textual representation below as there are multiple “sets of returns” for the same GPS time stamp such as 112813952.110394. We group the returns from the two beams into an orange and a green group. Their coordinates show that the two laser beams point in different directions when they are simultaneously “shot” and therefore hit the terrain far apart from one another.

las2txt -i 2016.las ^
        -keep_gps_time 112813952.110392 112813952.110396 ^
        -parse xyzlurntp ^
        -stdout
397271.40 136832.35 54.31 0 0 1 1 112813952.110394 117
397277.36 136793.35 38.68 0 1 1 4 112813952.110394 117
397277.35 136793.56 32.89 0 1 2 4 112813952.110394 117
397277.34 136793.88 24.13 0 1 3 4 112813952.110394 117
397277.32 136794.25 13.66 0 1 4 4 112813952.110394 117

The information about which point is from which beam is currently stored into the generic ‘user data’ attribute instead of into the dedicated ‘scanner channel’ attribute. This can be fixed with las2las as follows.

las2las -i 2016.las ^
        -copy_user_data_into_scanner_channel ^
        -set_user_data 0 ^
        -odix _fixed -olas

Unfortunately the LiDAR files have much more serious issues in the return numbering. It’s literally a “Total Disaster!” and “Sad!” as the US president will tweet shortly. After grouping all returns with the same GPS time stamp into an orange and a green group there is one more set of returns left unaccounted for.

las2txt -i 2016.las ^
        -keep_gps_time 112813951.416451 112813951.416455 ^
        -parse xyzlurntpi ^
        -stdout
397286.02 136790.60 45.90 0 0 1 4 112813951.416453 117 24
397286.06 136791.05 39.54 0 0 2 4 112813951.416453 117 35
397286.10 136791.51 33.34 0 0 3 4 112813951.416453 117 24
397286.18 136792.41 21.11 0 0 4 4 112813951.416453 117 0
397286.12 136791.75 30.07 0 0 1 1 112813951.416453 117 47
397291.74 136750.70 45.86 0 1 1 1 112813951.416453 117 105
las2txt -i 2016.las ^
        -keep_gps_time 112813951.408708 112813951.408712 ^
        -parse xyzlurntpi ^
        -stdout
397286.01 136790.06 45.84 0 0 1 4 112813951.408710 117 7
397286.05 136790.51 39.56 0 0 2 4 112813951.408710 117 15
397286.08 136790.96 33.33 0 0 3 4 112813951.408710 117 19
397286.18 136792.16 17.05 0 0 4 4 112813951.408710 117 0
397286.11 136791.20 30.03 0 0 1 2 112813951.408710 117 58
397286.14 136791.67 23.81 0 0 2 2 112813951.408710 117 42
397291.73 136750.16 45.88 0 1 1 1 112813951.408710 117 142

This can be visualized with lasview and the result is unmistakably clear: The return numbering is messed up. There should be one shot with five returns (not a group of four and a single return) in the first example. And there should be one shot with six returns (not a group of four and a group of two returns) in the second example. Such a broken return numbering results in extra first (or last) returns. These are serious issues that affect any algorithm that relies on the return numbering such as first-return DSM generation or canopy cover computation. Those extra returns will also make the pulse density appear higher and the pulse spacing appear tighter than they really are. The numbers from 20 (blue) to 100 (red) pulses per square meter in our earlier visualization are definitely inflated.

lasview -i 2016.las ^
        -keep_gps_time 112813951.416451 112813951.416455 ^
        -color_by_return

lasview -i 2016.las ^
        -keep_gps_time 112813951.408708 112813951.408712 ^
        -color_by_return
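
The same finding can be reached without a viewer. The Python sketch below, which is our own illustration and not a LAStools feature, groups the lines of the first ‘xyzlurntpi’ listing above by GPS time stamp and user data (the attribute that holds the beam) and flags every group whose return numbers do not form one complete pulse numbered 1 to n.

```python
from collections import defaultdict

# las2txt output for '-parse xyzlurntpi', copied from the first of the two listings above.
LINES = """\
397286.02 136790.60 45.90 0 0 1 4 112813951.416453 117 24
397286.06 136791.05 39.54 0 0 2 4 112813951.416453 117 35
397286.10 136791.51 33.34 0 0 3 4 112813951.416453 117 24
397286.18 136792.41 21.11 0 0 4 4 112813951.416453 117 0
397286.12 136791.75 30.07 0 0 1 1 112813951.416453 117 47
397291.74 136750.70 45.86 0 1 1 1 112813951.416453 117 105
""".splitlines()


def group_by_pulse(lines):
    """Group (return number, number of returns) by (GPS time, user data)."""
    groups = defaultdict(list)
    for line in lines:
        x, y, z, l, u, r, n, t, p, i = line.split()
        groups[(t, int(u))].append((int(r), int(n)))  # GPS time kept as text
    return groups


def is_consistent(returns):
    """True if the returns are numbered 1..count and all report that count."""
    count = len(returns)
    numbers = sorted(r for r, _ in returns)
    return numbers == list(range(1, count + 1)) and all(n == count for _, n in returns)


groups = group_by_pulse(LINES)
broken = sorted(key for key, returns in groups.items() if not is_consistent(returns))
extra_first = {key: sum(1 for r, _ in groups[key] if r == 1) - 1 for key in broken}

print(broken)
print(extra_first)
```

The beam stored as user data 0 is flagged because its five returns are labeled as a group of four plus a single return, which yields one extra first return for that shot.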

After all these troubles here is something nice: side-by-side, a first-return TIN and a spike-free TIN (using a freeze of 0.8 m) of the center court cafe in the Pentagon. Especially given all these “fake first returns” in the Washington DC LiDAR we really need the spike-free algorithm to finally “Make a DSM great again!” … (-;

We would like to acknowledge the District of Columbia Office of the Chief Technology Officer (OCTO) for providing this data with a very permissive open data license, namely the Creative Commons Attribution 3.0 License.

 

LASmoons: Alen Berta

Alen Berta (recipient of three LASmoons)
Department of Terrestrial Ecosystems and Landscape, Faculty of Forestry
University of Zagreb and Oikon Ltd Institute for Applied Ecology, CROATIA

Background:
After becoming an EU member state, Croatia is obliged to fulfill a requirement arising from the Kyoto Protocol: a National Inventory Report (NIR) of greenhouse gases according to the UNFCCC. One of the most important things when creating the NIR is to know how much forested area there is and what its wood stock and increment are. This is needed to calculate the size of the existing carbon pool and its potential for sequestration. Since Croatian legislation does not require calculating the wood stock and yield of degraded forest areas (shrubbery and thickets) when creating the usual forest management plans, this data is missing. So far, only a rough approximation of the wood stock and increment has been used for the NIR. However, these areas are expanding every year due to depopulation of the rural areas and the cessation of traditional farming.

very diverse stand structure of degraded forest areas (shrubbery and thickets)

Goal:
This study will focus on two things: (1) Developing regression models for biomass volume estimation in continental shrubberies and thickets based on airborne LiDAR data. To correlate LiDAR data with biomass volume, over 70 field plots with a radius of 12 meters have been established in more than 550 ha of the hilly and lowland shrubberies in Central Croatia, and all trees and shrubs above 1 cm Diameter at Breast Height (DBH) were recorded with information about tree species, DBH and height. Precise locations of the field plots are measured with survey GNSS and biomass is calculated with parameters from literature. For regression modeling, various statistics from the point clouds matching the field plots will be used (i.e. height percentiles, standard deviation, skewness, kurtosis, …). (2) Testing the developed models for different laser pulse densities to find out if the results deviate significantly when the LiDAR point cloud is thinner. This will be helpful for planning later scans for change detection (increment or degradation).

Data:
+ 641 square km of discrete returns LiDAR data around the City of Zagreb, the capital of Croatia (but since it is a highly populated area, only the outskirts of the area will be used)
+ raw geo-referenced LAS files with up to 3 returns and an average last return point density of 1 pts/m².

LAStools processing:
1) extract area of interest [lasclip or las2las]
2) create differently dense versions (for goal no. 2) [lasthin]
3) remove isolated noise points [lasnoise]
4) classify point clouds into ground and non-ground [lasground]
5) create a Digital Terrain Model (DTM) [las2dem]
6) compute height of points above the ground [lasheight]
7) classify point clouds into vegetation and other [lasclassify]
8) normalize height of the vegetation points [lasheight]
9) extract the areas of the field plots [lasclip]
10) compute various metrics for each plot [lascanopy]
11) convert LAZ to TXT for regression modeling in R [las2txt]

Density and Spacing of LiDAR

Recently I worked with LiDAR from an Optech Gemini scanner in the Philippines and with LiDAR from a RIEGL Q680i scanner in Thailand. The two devices scan the terrain below with a very different pattern: the Optech uses an oscillating mirror producing zig-zag scan lines, whereas the RIEGL uses a rotating polygon producing parallel scan lines. In the following we investigate the point distribution in a small piece cut from each flightline.

First we compute an estimate for the average point density and the average point spacing with lasinfo:

D:\lastools\bin>lasinfo -i optech.laz ^
                        -nh -nv -nmm -cd
number of last returns: 2330671
covered area in square units/kilounits: 1280956/1.28
point density: all returns 2.25 last only 1.82 (per unit^2)
      spacing: all returns 0.67 last only 0.74 (in units)

The option ‘-nh’ asks lasinfo not to print the header (aka ‘no header’), the option ‘-nv’ means not to print the variable length records (aka ‘no vlr’), the option ‘-nmm’ suppresses the output of minimum and maximum point values (aka ‘no min max’), and the option ‘-cd’ requests that an average point density is computed (aka ‘compute density’) from which the average point spacing is derived.

D:\lastools\bin>lasinfo -i riegl.laz ^
                        -nh -nv -nmm -cd
number of last returns: 3707217
covered area in square units/kilounits: 889816/0.89
point density: all returns 4.58 last only 4.17 (per unit^2)
      spacing: all returns 0.47 last only 0.49 (in units)

What we really want to know are pulse density and pulse spacing which we get by only counting one return from every pulse. Commonly one uses the last return, which is reported as ‘last only’ values by lasinfo. According to lasinfo the Optech and the RIEGL scan have an average pulse density of 1.82 and 4.17 [pulses per square meter] and an average pulse spacing of 0.74 and 0.49 [meters] respectively.
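The relation between the numbers in these reports can be sketched in a few lines of Python. This is only a sanity check, not a description of lasinfo's internals: it assumes that density is the number of last returns divided by the covered area and that spacing is one over the square root of the density, which is consistent with the values printed above. The function name is made up for this illustration.

```python
import math

def density_and_spacing(num_returns, area):
    """Return (density, spacing) for a return count and a covered area.

    Assumption: spacing = 1 / sqrt(density), i.e. the side length of the
    square that each return would occupy if returns were spread evenly.
    """
    density = num_returns / area
    spacing = 1.0 / math.sqrt(density)
    return density, spacing

# Last-return counts and covered areas (in square units) copied from the
# two lasinfo reports above.
scans = {
    "optech": (2330671, 1280956),
    "riegl": (3707217, 889816),
}

for name, (last_returns, area) in scans.items():
    density, spacing = density_and_spacing(last_returns, area)
    print(f"{name}: pulse density {density:.2f}, pulse spacing {spacing:.2f}")
```

Rounded to two digits this gives the same ‘last only’ density and spacing values that lasinfo printed for the two scans.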

Single number averages do not capture anything about the actual distribution of the last returns in the swath of the scan. Let us compute density rasters with lasgrid. Because the density of the two scans is different we use 3 by 3 meter cells for the Optech, which should average 3 * 3 * 1.82 = 16.38 points per cell, and 2 by 2 meter cells for the RIEGL, which should average 2 * 2 * 4.17 = 16.68 points per cell.
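The cell size arithmetic above can also be turned around to pick a step for a desired number of last returns per cell. The following is a small sketch under that reading. The helper names and the target of 16 returns per cell are assumptions made for this illustration and are not part of LAStools.

```python
import math

def expected_per_cell(density, step):
    """Expected number of returns in a step-by-step cell at a given density."""
    return density * step * step

def step_for_target(density, target):
    """Cell size that would average 'target' returns per cell."""
    return math.sqrt(target / density)

# Pulse densities as reported by lasinfo ('last only' values).
for name, density, step in (("optech", 1.82, 3), ("riegl", 4.17, 2)):
    print(f"{name}: {expected_per_cell(density, step):.2f} returns per "
          f"{step} by {step} cell, a target of 16 would need a step of "
          f"{step_for_target(density, 16):.2f}")
```

For a target of 16 the computed steps come out just under 3 and just under 2 meters, which is why whole-meter cells of 3 and 2 meters give comparable counts for the two scans.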

lasgrid -i optech.laz -last_only ^
        -density -step 3 ^
        -odix _density3x3 -obil
lasview -i optech_density3x3.bil

lasgrid -i riegl.laz -last_only ^
        -density -step 2 ^
        -odix _density2x2 -obil
lasview -i riegl_density2x2.bil

A visual comparison of the two resulting density rasters in BIL format with lasview shows first differences in pulse distribution for the two scanners.


Using ‘lasview’ to inspect a BIL density raster: Elevations and colors illustrate the number of last returns per 3 by 3 meter cell for Optech (top) and per 2 by 2 meter cell for RIEGL (bottom). The images use differently scaled color ramps.

It is noticeable that the number of last returns per cell increases at the edges of the swath for the Optech whereas it decreases for the RIEGL. For the Optech it is higher due to the slowing down of the oscillating mirror and the reversing of scan direction when reaching either side of the scanline. This decreases the distances between pulses at the edges of the zig-zag scan and increases the number of last returns falling into cells there. For the rotating polygon scan of the RIEGL these pulse distances are more uniform. Here the numbers are lower because many cells along the edges of the swath are only partly covered by the scan and therefore receive fewer last returns.

lasgrid -i optech.laz -last_only ^
        -density -step 3 ^
        -false -set_min_max 0 20 ^
        -odix _density3x3 -opng

lasgrid -i riegl.laz -last_only ^
        -density -step 2 ^
        -false -set_min_max 0 20 ^
        -odix _density2x2 -opng

Here is another visualization of the pulse distribution, obtained by letting lasgrid false-color the density rasters over a fixed range of 0 to 20. This range is based on the nearly identical expected values of around 16.38 and 16.68 last returns per cell that follow from the average densities reported by lasinfo and the cell sizes used here.

A more quantitative check of how well distributed the pulses are is to generate histograms. You can use lasinfo with the ‘-histo z 1’ option to print a histogram of the z values of the BIL rasters and import these statistics into your favorite software to create a chart.

lasinfo -i optech_density3x3.bil ^
        -nh -nv -nmm -histo z 1
lasinfo -i riegl_density2x2.bil ^
        -nh -nv -nmm -histo z 1
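To make clear what a histogram with a bin size of 1 means for a density raster, here is a minimal Python sketch of the same kind of binning. It does not read BIL files or parse lasinfo output. The function is hypothetical and the cell counts are made-up values that only illustrate the mechanics.

```python
import math
from collections import Counter

def histogram(values, bin_size):
    """Count values per bin [k * bin_size, (k + 1) * bin_size)."""
    counts = Counter(math.floor(v / bin_size) for v in values)
    return {k * bin_size: counts[k] for k in sorted(counts)}

# Made-up per-cell last-return counts, purely for illustration. Real values
# would come from the cells of the density raster.
cells = [14, 15, 16, 16, 17, 17, 17, 18, 20, 23]

for start, n in histogram(cells, 1).items():
    print(f"[{start}, {start + 1}): {'#' * n}")
```

The narrower the band of occupied bins around the expected count, the more evenly the pulses are distributed over the cells.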

Both distributions have their peaks at the expected 16 to 17 last returns per cell although the Optech distribution has a significantly wider spread. The odd two-peak distribution of the Optech scan seems puzzling at first but looking at the density image shows that this is an aliasing artifact of putting a fixed 3 by 3 meter grid over zig-zagging scanlines.

The next experiment uses the new ‘-edge_shortest’ and ‘-edge_longest’ options available in las2dem that compute for each point the length of its shortest or longest edge in a Delaunay triangulation and then raster these lengths. Below you see the command lines to do this and generate histograms of the rastered edge lengths.

las2dem -i optech.laz -last_only ^
        -step 1.5 ^
        -edge_longest ^
        -odix _edge_longest -obil
lasinfo -i optech_edge_longest.bil ^
        -nh -nv -nmm -histo z 0.1 ^
        -o optech_edge_longest_0_10.txt
las2dem -i optech.laz -last_only ^
        -step 1.5 ^
        -edge_shortest ^
        -odix _edge_shortest -obil
lasinfo -i optech_edge_shortest.bil ^
        -nh -nv -nmm -histo z 0.05 ^
        -o optech_edge_shortest_0_05.txt
las2dem -i riegl.laz -last_only ^
        -edge_longest ^
        -odix _edge_longest -obil
lasinfo -i riegl_edge_longest.bil ^
        -nh -nv -nmm -histo z 0.05 ^
        -o riegl_edge_longest_0_05.txt
las2dem -i riegl.laz -last_only ^
        -edge_shortest ^
        -odix _edge_shortest -obil
lasinfo -i riegl_edge_shortest.bil ^
        -nh -nv -nmm -histo z 0.05 ^
        -o riegl_edge_shortest_0_05.txt

These resulting histograms do look quite different.

The more important histograms are those of longest edge lengths. They illustrate the observed spacing between pulses, showing us the maximal distance of each pulse from its surrounding pulses. A perfect pulse distribution that samples the ground evenly in all directions would have one narrow peak. The Optech scan is far from this ideal with a wide and flat peak that tells us that pulses are spaced apart anywhere from 1.2 to 2.2 meters. Visualizing the scan pattern shows that the 2.2 meter spacings happen at the tips of the zig-zag and the 1.2 meter spacings at nadir. The RIEGL scan has a much narrower peak with most pulses spaced apart at most 60 to 80 centimeters.

The histograms of shortest edge lengths illustrate how close pulses are spaced. Again, it helps to visualize the zig-zag scan pattern to explain the odd distribution of shortest edge lengths in the Optech scan: as we get closer to the edges of the flightline the pulse spacing along the scanline becomes increasingly dense. This is represented on the left side in the histogram that is slowly leveling off to zero. Again, the RIEGL scan has a narrower peak with most pulses spaced no closer than 35 to 55 centimeters.

To confirm our findings we illustrate where the scanners produce longest or shortest edges using ranges that we find in the histograms above and create false-colored rasters:

las2dem -i optech.laz -last_only ^
        -step 1.5 -edge_longest ^
        -false -set_min_max 1 2.5 ^
        -odix _edge_longest -opng
las2dem -i optech.laz -last_only ^
        -step 1.5 -edge_shortest ^
        -false -set_min_max 0 0.7 ^
        -odix _edge_shortest -opng
las2dem -i riegl.laz -last_only ^
        -edge_longest ^
        -false -set_min_max 0.5 1.0 ^
        -odix _edge_longest -opng
las2dem -i riegl.laz -last_only ^
        -edge_shortest ^
        -false -set_min_max 0.1 0.7 ^
        -odix _edge_shortest -opng

The color-coding clearly illustrates that the Optech has a much wider pulse spacing at both edges of the flightline but also a much narrower pulse spacing. That sounds contradictory, but the narrow ones are the “within-zig-zag” spacings and the wide ones are the “between-zig-zag” spacings. The RIEGL has an overall much more even distribution of pulse spacings.
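Why a zig-zag pattern produces both the widest and the narrowest gaps at the swath edges can be sketched with a little geometry. The Python below is an idealization that is not taken from the scanner or from LAStools: it assumes a perfect triangle-wave trace on the ground (constant mirror speed and constant flying speed), a cross-track position x normalized so that 0 is nadir and -1 and +1 are the swath edges, and a hypothetical along-track distance of 2 meters per full back-and-forth sweep. A real oscillating mirror also slows down near its turning points, which this sketch ignores.

```python
def along_track_gaps(x, period):
    """Gaps between successive scan-line crossings at cross-track offset x.

    x is normalized: 0 is the center of the swath, -1 and +1 are its edges.
    period is the along-track distance covered during one full
    back-and-forth sweep of the mirror.
    Returns (narrow_gap, wide_gap); the two alternate along the flight path.
    Assumes an idealized triangle-wave trace.
    """
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie between -1 and 1")
    narrow = (1.0 - abs(x)) * period / 2.0
    wide = (1.0 + abs(x)) * period / 2.0
    return narrow, wide

period = 2.0  # hypothetical along-track distance per full sweep, in meters

for x in (0.0, 0.5, 0.9, 1.0):
    narrow, wide = along_track_gaps(x, period)
    print(f"x = {x:3.1f}: gaps alternate between {narrow:.2f} and {wide:.2f}")
```

In this model the two gaps are equal at the center of the swath. Towards the edge one gap shrinks to zero while the other grows to twice the center gap, so the same scan line pairs are both the closest and the farthest apart there.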

As a final confirmation of what causes the pulse spacings to be both widest and narrowest at the edges of the flightline for the Optech scan, we scrutinize the distribution of last returns and their triangulation visually.

Now it should be clear. At the edge of the flightline the Optech scanner has increasingly close-spaced pulses within each zig-zagging scan line but also increasingly distant-spaced pulses between subsequent pairs of zig-zagging scanlines. At the very edge of the flightline the scanner is almost sampling a “linear area” twice but leaves unsampled gaps that are twice as wide as in the center of the flightline. I believe this is one of the reasons why people “cut off” the edges of the flightlines based on the scan angle when using oscillating mirror systems. Now let’s look at the RIEGL.

There is no obvious difference in pulse distribution at the edges and the center of the flightline. The entire width of the swath seems more or less evenly sampled.

The above results show that “average point density” and “average point spacing” do not capture the whole picture. A quality check that properly verifies that a scan has the desired point / pulse density and spacing should measure the actual spacings, especially when operating a scanner with oscillating mirrors. We have done it here by computing for each last return its longest surrounding edge in the Delaunay triangulation, rasterizing these edge lengths, and then generating a histogram of the raster values.

 

Good follow-up reads are Dr. Ullrich’s “Impact of point distribution on information content of point clouds of airborne LiDAR” presented at ELMF 2013 and “Assessing the Information Content of LiDAR Point Clouds” from PhoWo 2013.