CyArk partners with Google, takes over “Don’t be Evil” Mantra, opens LiDAR Archive

One of our most popular (and controversial) blog articles was “Can You Copyright LiDAR“. It was written after we saw the then chief executive director at CyArk commenting “Sweeeet use of CyArk data” on an article describing the creation of a sugary fudge replica of Guatemala’s Tikal temple promoting a series of sugars by multinational agribusiness Tate & Lyle. Yet just a few months earlier our CEO’s university was instructed to take down his Web pages that – using the same data set – were demonstrating how to realize efficient 3D content delivery across the Web. CyArk told university administrators in an email that he was “[…] hosting unauthorized content from CyArk […]”. The full story is here.

Back then, the digital preservation strategy of CyArk was to keep their archaeological scans safe through their partnership with Iron Mountain. In the comment section of “Can You Copyright LiDAR” you can find several entries that are critical of this approach. But that was five years ago. Earlier this year and just after Google removed the “Don’t be Evil” mantra from their code of conduct, CyArk stepped up to take it over and completely changed their tight data control policies. Through their “Open Heritage initiative” CyArk released for the first time their raw LiDAR and imagery with an open license. Here in their own words:

In 2018, CyArk launched the Open Heritage initiative, a
collaboration with Google Arts and Culture to make available
our archive to a broader audience. This was the first time
CyArk has made available primary data sets, including lidar
scans, photogrammetric imagery and corresponding metadata
in a standardized format on a self-serve platform. We are
committed to opening up our archive further as we collect
new data and publishing existing projects where permissions
allow. The data is made available for education, research
and other non-commercial uses via a a Creative Commons
Attribution-NonCommercial 4.0 International License.

This is a HUGE change from the situation in 2013 that resulted in the deletion of our CEO’s Web pages. So we went to download Guatemala’s Tikal temple – the one that got him into trouble back then. It is provided as a single E57 file called ‘Tikal.e57’ with a size of 1074 MB that contains 35,551,759 points in 118 individual scan positions. Using the e572las.exe tool that is part of LAStools we converted this into a single LAZ file ‘Tikal.laz’ with a size of 164 MB.

C:\LAStools\bin>e572las -i c:\data\Tikal\Tikal.e57 ^
                        -o c:\data\Tikal\Tikal.laz

We were not able to find information about the Coordinate Reference System (CRS), but after looking at the coordinate bounding box (see lasinfo report at the end of the article) and the set of projections covering Guatemala, one can make an educated guess that it might be UTM 16 north. Generating a false-colored highest-return 0.5 meter raster with lasgrid and loading it into Google Earth quickly confirms that this is correct.

lasgrid -i c:\data\Tikal\Tikal.laz ^
        -step 0.5 ^
        -highest ^
        -false ^
        -utm 16north ^
        -odix _elev -opng

Now we can laspublish the file with the command line below to create an interactive 3D Web portal using Potree. Unlike five years ago we should now be permitted to create an online portal without the headaches of last time. The CC BY-NC 4.0 license allows to copy and redistribute the material in any medium or format.

laspublish -i c:\data\Tikal\Tikal.laz ^
           -rgb ^
           -utm 16north ^
           -o tikal.html ^
           -title "CyArk's LiDAR Scan of Tikal" ^
           -description "35,551,759 points from 118 individual scans (licensed CC BY-NC 4.0)" ^
           -odir C:\data\Tikal\Tikal -olaz ^
           -overwrite

Below are two screenshots of the online portal that we have just created including some quick distance measurements. This is amazing data. Wow!

Looking at “Templo del Gran Jaguar” from “La Gran Plaza” after taking two measurements.

Overlooking “La Gran Plaza” out of the upper opening of “Templo del Gran Jaguar” with “Templo del las Mascaras” in the back.

We congratulate CyArk to their new Open Heritage initiative and thank them for providing easy access to the Tikal temple LiDAR scans as open data with a useful Creative Commons Attribution-NonCommercial 4.0 International license. Thank you, CyArk, for your contribution to open data and open science. Kudos!

C:\LAStools\bin>lasinfo -i c:\data\Tikal\Tikal.laz
lasinfo (181119) report for 'c:\data\Tikal\Tikal.laz'
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            0
  project ID GUID data 1-4:   00000000-0000-0000-0000-000000000000
  version major.minor:        1.2
  system identifier:          'LAStools (c) by Martin Isenburg'
  generating software:        'e572las.exe (version 180919)'
  file creation day/year:     0/0
  header size:                227
  offset to point data:       227
  number var. length records: 0
  point data format:          2
  point data record length:   26
  number of point records:    35551759
  number of points by return: 35551759 0 0 0 0
  scale factor x y z:         0.001 0.001 0.001
  offset x y z:               220000 1900000 0
  min x y z:                  220854.951 1905881.781 291.967
  max x y z:                  221115.921 1906154.829 341.540
LASzip compression (version 3.2r4 c2 50000): POINT10 2 RGB12 2
reporting minimum and maximum for all LAS point record entries ...
  X              854951    1115921
  Y             5881781    6154829
  Z              291967     341540
  intensity       24832      44800
  return_number       1          1
  number_of_returns   1          1
  edge_of_flight_line 0          0
  scan_direction_flag 0          0
  classification      0          0
  scan_angle_rank     0          0
  user_data           0          0
  point_source_ID     1        118
  Color R 0 65280
        G 0 65280
        B 0 65280
number of first returns:        35551759
number of intermediate returns: 0
number of last returns:         35551759
number of single returns:       35551759
overview over number of returns of given pulse: 35551759 0 0 0 0 0 0
histogram of classification of points:
        35551759  never classified (0)

City of Guadalajara creates first Open LiDAR Portal of Latin America

Small to medium sized LiDAR data sets can easily be published online for exploration and download with laspublish of LAStools, which is an easy-to-use wrapper around the powerful Potree open source software for which rapidlasso GmbH has been a major sponsor. During a workshop on LiDAR processing at CICESE in Ensenada, Mexico we learned that Guadalajara – the city with five “a” in its name – has recently published its LiDAR holdings online for download using an interactive 3D portal based on Potree.

There is a lot more data available in Mexico but only Guadalajara seems to have an interactive download portal at the moment with open LiDAR. Have a look at the map below to get an idea of the LiDAR holdings that are held in the archives of the Instituto Nacional de Estadística y Geografía (INEGI). You can request this data either by filling out this form or by sending an email to atencion.usuarios@inegi.org.mx. You will need to explain the use of the information, but apparently INEGI has a fast response time. I was given the KML files you see below and told that each letter in scale 1: 50,000 is divided into 6 regions (a-f) and each region subdivided into 4 parts. Contact me if you want the KML files or if you can provide further clarification on this indexing scheme and/or the data license.

LiDAR available at the Instituto Nacional de Estadística y Geografía (INEGI)

But back to Guadalajara’s open LiDAR. The tile names become visible when you zoom in closer on the map with the tiling overlay as seen below. An individual tile can easily be downloaded by first clicking so that it becomes highlighted and then pressing the “D” button in the lower left corner. We download the two tiles called ‘F08C04.laz’ and ‘F08C05.laz’ and use lasinfo to determine that their average density is 9.0 and 8.9 last returns per per square meter. This means on average 9 laser pulses were fired at each square meter in those two tiles.

lasinfo -i F08C04.laz -cd
lasinfo -i F08C05.laz -cd

Selecting a tile on the map and pressing the “D” button will download the highlighted tile.

The minimal quality check that we recommend doing for any newly obtained LiDAR data is to verify proper alignment of the flightlines using lasoverlap. For tiles with properly populated ‘point source ID’ fields this can be done using the command line shown below.

lasoverlap -i F08C04.laz F08C05.laz ^
           -min_diff 0.1 -max_diff 0.3 ^
           -odir quality -opng ^
           -cores 2

We notice some slight miss-alignments in the difference image (see other tutorials such as this one for how to interpret the resulting color images). We suggest you follow the steps done there to take a closer look at some of the larger strip-like areas that exhibit some systematic disscolorization (compared to other areas) into overly blueish or reddish tones of with lasview. Overlaying one of the resulting *_diff.png files in the GUI of LAStools makes it easy to pick a suspicious area.

We use the “pick” functionality to view only the building of interest.

Unusual are also the large red and blue areas where some of the taller buildings are. Usually those are just one pixel wide which has to do with the laser of one flightline not being able to see the lower area seen by the laser of the other flightline because the line-of-sight is blocked by the structure. We have a closer look at one of these unusual building colorization by picking the building shown above and viewing it with the different visualization options that are shown in the images below.

No. Those are not the “James Bond movie” kind of lasers that burn holes into the building to get ground returns through several floors. The building facade is covered with glass so that the lasers do not scatter photons when they hit the side of the building. Instead they reflect by the usual rule “incidence angle equals reflection angle” of perfectly specular surfaces and eventually hit the ground next to the building. Some of the photons travel back the same way to the receiver on the plane where they get registered as returns. The LiDAR system has no way to know that the photons did not travel the usual straight path. It only measures the time until the photons return and generates a return at the range corresponding to this time along the direction vector that this laser shot was fired at. If the specular reflection of the photons hits a truck or a tree situated next to to building, then we should find that truck or that tree – mirrored by the glossy surface of the building – on the inside of the building. If you look careful at the “slice” through the building below you may find an example … (-:

Some objects located outside the building are mirrored into the building due to its glossy facade.

Kudos to the City of Guadalajara for becoming – to my knowledge – the first city in Latin America to both open its entire LiDAR holdings and also making it available for download in form of a nice and functional interactive 3D portal.

Estonia leads in Open LiDAR: nationwide & multi-temporal Point Clouds now Online

At the beginning of July 2018 the Baltic country of Estonia – with an area of 45 thousand square kilometers inhabited by around 1.3 million people – opened much of their geospatial data archives and is now offering easy and free download of LiDAR point clouds nationwide via a portal of the Estonian Land Board. What is even more exciting is that multi-temporal data sets flown in different years and seasons are available. Raw LiDAR point clouds collected either during a “regular flight in spring” or during a “forestry flight in summer” can be obtained for multiple years. The 1 km by 1 km tile with map sheet index 377650, for example, is available for four different LiDAR surveys carried out in spring 2011, summer 2013, spring 2015 and summer 2017. This offers incredible potential for studying temporal changes of man-made or natural environments. The screenshot sequence below shows how to navigate to the download site starting from this page.

We found out about this open data release during our hands-on workshop on LiDAR and photogrammetry point cloud processing with LAStools that was part of the UAV remote sensing summer school in Tartu, Estonia. See our calendar for upcoming events or contact us for holding a similar event at your university, agency, company, or conference. 

The LiDAR data provided on the download portal is compressed with LASzip and provided as 1km by 1km LiDAR tiles in LAZ format. You can search for these tiles via their Estonian 1:2000 map sheet indices. To find out which map sheet index corresponds to the tile you are interested in you can overlay the maps sheet indices over an online map. However, you will need to zoom in before you can see the indices as illustrated in the screenshot sequence below. Here a zoom to the map sheet indices for the area that we visited during the social event of the summer school.

One thing we noticed is that the tiles contain only a single layer of points. The overlaps between flightlines were removed which results in a more uniform point density but strips the user of the possibility to do their own flightline alignment checks with lasoverlap. Below you see the spring 2014 acquisition for the tile with map sheet index 475861 colored by classification, elevation, return type, flightline ID and intensity.

The license for the open data of the Estonian Land Board is very permissive and can be found here. Agreeing to the licence gives the licence holder the rights to use data free of charge for an unspecified term, to good purpose in accordance with law and best practice. Licence holder may produce derivatives of data, combine data with its own products or services, use data for commercial or non-commercial purposes and redistribute data. The licence holder obliges to refer to the origin of data when publishing and redistributing data. The reference must include the name of the licensor, the title of data, the age of data (or the date of data extraction).

Scotland’s LiDAR goes Open Data (too)

Following the lead of England and Wales, the Scottish LiDAR is now also open data. The implementation of such an open geospatial policy in the United Kingdom was spear-headed by the Environment Agency of England who started to make all of their LiDAR holdings available as open data. In September 2015 they opened DTM and DSM raster derivatives down to 25 cm resolution and in March 2016 also the raw point clouds went online our compressed and open LAZ format (more info here) – all with the very permissible Open Government Licence v3. This treasure cove of geospatial data was collected by the Environment Agency Geomatics own survey aircraft mainly for flood mapping purposes. The data that had been access restricted for the past 17 years of operation and was made open only after it was shown that restricting access in order to recover costs to finance future operations – a common argument for withholding tax-payer funded data – was nothing but an utter myth. This open data policy has resulted in an incredible re-use of the LiDAR and the Environment Agency has literally been propelled into the role of a “champion for open data” inspiring Wales (possibly the German states of North-Rhine Westfalia and Thuringia) and now also Scotland to open up their geospatial archives as well …

Huge LAS files available for download from the Scottish Open Data portal.

We went to the nice online portal of Scotland to download three files from the Phase II LiDAR for Scotland that are provided as uncompressed LAS files, namely LAS_NN45NE.las, LAS_NN55NE.las, and LAS_NN55NW.las, whose sizes are listed as 1.2 GB, 2.8 GB, and 4.7 GB in the screenshot above. Needless to say that it took quite some time and several restarts (using wget with option ‘-c’) to successfully download these very large LAS files.

laszip -i LAS_NN45NE.las -odix _cm -olaz -rescale 0.01 0.01 0.01 
laszip -i LAS_NN45NE.las -odix _mm -olaz
laszip -i LAS_NN55NE.las -odix _cm -olaz -rescale 0.01 0.01 0.01 
laszip -i LAS_NN55NE.las -odix _mm -olaz
laszip -i LAS_NN55NW.las -odix _cm -olaz -rescale 0.01 0.01 0.01 
laszip -i LAS_NN55NW.las -odix _mm -olaz

After downloading we decided to see how well these files compress with LASzip by running the six commands shown above creating LAZ files when re-scaling of coordinate resolution to centimeter (cm) and LAZ files with the original millimeter (mm) coordinate resolution (i.e. the original scale factors are 0.001 which is somewhat excessive for aerial LiDAR where the error in position per coordinate is typically between 5 cm and 20 cm). Below you see the resulting file sizes for the three different files.

 1,164,141,247 LAS_NN45NE.las
   124,351,690 LAS_NN45NE_cm.laz (1 : 9.4)
   146,651,719 LAS_NN45NE_mm.laz (1 : 7.9)
 2,833,123,863 LAS_NN55NE.las
   396,521,115 LAS_NN55NE_cm.laz (1 : 7.1)
   474,767,495 LAS_NN55NE_mm.laz (1 : 6.0)
 4,664,782,671 LAS_NN55NW.las
   531,454,473 LAS_NN55NW_cm.laz (1 : 8.8)
   629,141,151 LAS_NN55NW_mm.laz (1 : 7.4)

The savings in download time and storage space of storing the LiDAR in LAZ versus LAS are sixfold to tenfold. If I was a tax payer in Scotland and if my government was hosting open data on in the Amazon cloud (i.e. paying for AWS cloud services with my taxes) I would encourage them to store their data in a more compressed format. Some more details on the data.

According to the provided meta data, the Scottish Public Sector LiDAR Phase II dataset was commissioned by the Scottish Government in response to the Flood Risk Management Act (2009). The project was managed by Sniffer and the contract was awarded to Fugro BKS. Airborne LiDAR data was collected for 66 sites (the dataset does not have full national coverage) totaling 3,516 km^2 between 29th November 2012 and 18th April 2014. The point density was a minimum of 1 point/sqm, and approximately 2 points/sqm on average. A DTM and DSM were produced from the point clouds, with 1m spatial resolution. The Coordinate reference system is OSGB 1936 / British National Grid (EPSG code 27700). The data is licensed under an Open Government Licence. However, under the use constraints section it now only states that the following attribution statement must be used to acknowledge the source of the information: “Copyright Scottish Government and SEPA (2014)” but also that Fugro retain the commercial copyright, which is somewhat disconcerting and may require more clarification. According to this tweet a lesser license (NCGL) applies to the raw LiDAR point clouds. Below a lasinfo report for the large LAS_NN55NW.las as well as several visualizations with lasview.

lasinfo (170915) report for LAS_NN55NW.las
reporting all LAS header entries:
 file signature: 'LASF'
 file source ID: 0
 global_encoding: 1
 project ID GUID data 1-4: 00000000-0000-0000-0000-000000000000
 version major.minor: 1.2
 system identifier: 'Riegl LMS-Q'
 generating software: 'Fugro LAS Processor'
 file creation day/year: 343/2016
 header size: 227
 offset to point data: 227
 number var. length records: 0
 point data format: 1
 point data record length: 28
 number of point records: 166599373
 number of points by return: 149685204 14102522 2531075 280572 0
 scale factor x y z: 0.001 0.001 0.001
 offset x y z: 250050 755050 270
 min x y z: 250000.000 755000.000 203.731
 max x y z: 254999.999 759999.999 491.901
reporting minimum and maximum for all LAS point record entries ...
 X -50000 4949999
 Y -50000 4949999
 Z -66269 221901
 intensity 39 2046
 return_number 1 4
 number_of_returns 1 4
 edge_of_flight_line 0 1
 scan_direction_flag 1 1
 classification 1 11
 scan_angle_rank -30 30
 user_data 0 3
 point_source_ID 66 91
 gps_time 38230669.389034 38402435.753789
number of first returns: 149685204
number of intermediate returns: 2813604
number of last returns: 149687616
number of single returns: 135599244
overview over number of returns of given pulse: 135599244 23122229 6754118 1123782 0 0 0
histogram of classification of points:
 287819 unclassified (1)
 109019874 ground (2)
 14476880 low vegetation (3)
 3487218 medium vegetation (4)
 39141518 high vegetation (5)
 165340 building (6)
 13508 rail (10)
 7216 road surface (11)

Kudos to the Scottish government for opening their data. We hereby acknowledge the source of the LiDAR that we have used in the experiments above as “Copyright Scottish Government and SEPA (2014)”.

Leaked: “Classified LiDAR” of Pentagon in LAS 1.4 Format

LiDAR leaks have happened! Black helicopters are in the sky!  A few days ago a tiny tweet leaked the online location of “classified LiDAR” for Washington, DC. This LiDAR really is “classified” and includes an aerial scan of the Pentagon. For rogue scientists world-wide we offer a secret download link. It links to a file code-named ‘pentagon.laz‘ that contains the 8,044,789 “classified” returns of the Pentagon shown below. This “classified file” can be deciphered by any software with native LAZ support. It was encrypted with the “LAS 1.4 compatibility mode” of LASzip. The original LAS 1.4 content was encoded into a inconspicuous-looking LAZ file. New point attributes (such as the scanner channel) were hidden as “extra bytes” for fully lossless encryption. Use ‘laszip‘ to fully decode the original “classified” LAS 1.4 file … (-;

Seriously, a tiled LiDAR data set for the District of Columbia flown in 2015 is available for anyone to use on Amazon S3 with a very permissive open data license, namely the Creative Commons Attribution 3.0 License. The LiDAR coverage can be explored via this interactive map. The tiles are provided in LAS 1.4 format and use the new point type 6. We downloaded a few tiles near the White House, the Capitol, and the Pentagon to test the “native LAS 1.4 extension” of our LASzip compressor which will be released soon (a prototype for testing is already available). As these uncompressed LAS files are YUUUGE we use the command line utility ‘wget‘ for downloading. With option ‘-c’ the download continues where it left off in case the transfer gets interrupted.

LiDAR pulse density from 20 or less (blue) to 100 or more (red) pulses per square meter.

We use lasboundary to create labeled bounding boxes for display in Google Earth and lasgrid to a create false color visualization of pulse density with the command lines shown below. Pulse densities of 20 or below are mapped to blue. Pulse densities of 100 or above are mapped to red. We picked the min value 20 and the max value 100 for this false color mapping by running lasinfo with the ‘-cd’ option to compute an average pulse density and then refining the numbers experimentally. We also use lasoverlap to visualize how flightlines overlap and how well they align. Vertical differences of up to 20 cm are mapped to white and differences of 40 cm or more are mapped to saturated blue or red.

lasboundary -i *.las ^
            -use_bb ^
            -labels ^
            -odir quality -odix _bb -okml

lasgrid -i *.las ^
        -keep_last ^
        -point_density -step 2 ^
        -false -set_min_max 20 100 ^
        -odir quality -odix _d_20_100 -opng ^
        -cores 2

lasoverlap -i *.las ^
           -min_diff 0.2 -max_diff 0.4 ^
           -odir quality -opng ^
           -cores 2

The visualization of the pulse density and of the flightline overlap both show that there is no LiDAR for the White House or Capitol Hill. We will never know how tall the tomato and kale plants had grown in Michelle Obama’s organic garden on that day. Note that the White House and Capitol Hill were not simply “cut out”. Instead the flight plan of the survey plane was carefully designed to avoid these areas. Surprisingly, the Pentagon did not receive the same treatment and is (almost) fully included in the open LiDAR as mentioned in the dramatic first paragraph. Interesting is how the varying (tidal?) water level of the Potomac River shows up in the visualization of flightline miss-alignments.

There are a number of issues in these LiDAR files. The most serious ones are reported at the very end of this article. We will now scrutinize the partly-filled tile 2016.las close to the White House with only 11,060,334 returns. A lasvalidate check immediately reports three deviations from the LAS 1.4 specification:

lasvalidate -i 2016.las -o 2016_check.xml
  1. For proper LAS 1.4 files containing point type 6 through 10 all ‘legacy’ point counts in the LAS header should be set to 0. The following six fields in the LAS header should be zero for tile 2016.las (and all other tiles):
    + legacy number of point records
    + legacy number of points by return[0]
    + legacy number of points by return[1]
    + legacy number of points by return[2]
    + legacy number of points by return[3]
    + legacy number of points by return[4]
  2. There should not be any LiDAR return in a valid LAS file whose ‘number of returns of given pulse’ attribute is zero but there are 8 such points in tile 2016.las (and many more in various other tiles).
  3. There should not be any LiDAR return whose ‘return number’ attribute is larger than their ‘number of returns of given pulse’ attribute but there are 8 such points in tile 2016.las (and many more in various other tiles).

The first issue is trivial. There is an efficient in-place fix that does not require to rewrite the entire file using lasinfo with the following command line:

lasinfo -i 2016.las ^
        -nh -nv -nc ^
        -set_number_of_point_records 0 ^
        -set_number_of_points_by_return 0 0 0 0 0 ^

A quick check with las2txt shows us that the second and third issue are caused by the same eight points. Instead of writing an 8 for the ‘number of returns’ attribute the LAS file exporter must have written a 0 (marked in red for all eight returns) and instead of writing an 8 for the ‘return number’ attribute the LAS file exporter must have written a 1 (also marked in red). We can tell it from the true first return via its z coordinate (marked in blue) as the last return should be the lowest of all.

las2txt -i 2016.las ^
        -keep_number_of_returns 0 ^
        -parse xyzrnt ^
        -stdout
397372.70 136671.62 33.02 4 0 112813299.954811
397372.03 136671.64 28.50 5 0 112813299.954811
397371.28 136671.67 23.48 6 0 112813299.954811
397370.30 136671.68 16.86 7 0 112813299.954811
397369.65 136671.70 12.50 1 0 112813299.954811
397374.37 136671.58 44.17 3 0 112813299.954811
397375.46 136671.56 51.49 1 0 112813299.954811
397374.86 136671.57 47.45 2 0 112813299.954811

With las2las we can change the ‘number of returns’ from 0 to 8 using a ‘-filtered_transform’ as illustrated in the command line below. We suspect that higher number of returns such as 9 or 10 might have been mapped to 1 and 2. Fixing those as well as repairing the wrong return numbers will require a more complex tool. We would recommend to check all tiles with more scrutiny using the lasreturn tool. But wait … more return numbering issues are to come.

las2las -i 2016.las ^
        -keep_number_of_returns 0 ^
        -filtered_transform ^
        -set_extended_number_of_returns 8 ^
        -odix _fixed -olas

A closer look at the scan pattern reveals that the LiDAR survey was flown with a dual-beam system where two laser beams scan the terrain simultaneously. This is evident in the textual representation below as there are multiple “sets of returns” for the same GPS time stamp such as 112813952.110394. We group the returns from the two beams into an orange and a green group. Their coordinates show that the two laser beams point into different directions when they are simultaneously “shot” and therefore hit the terrain far apart from another.

las2txt -i 2016.las ^
        -keep_gps_time 112813952.110392 112813952.110396 ^
        -parse xyzlurntp ^
        -stdout
397271.40 136832.35 54.31 0 0 1 1 112813952.110394 117
397277.36 136793.35 38.68 0 1 1 4 112813952.110394 117
397277.35 136793.56 32.89 0 1 2 4 112813952.110394 117
397277.34 136793.88 24.13 0 1 3 4 112813952.110394 117
397277.32 136794.25 13.66 0 1 4 4 112813952.110394 117

The information about which point is from which beam is currently stored into the generic ‘user data’ attribute instead of into the dedicated ‘scanner channel’ attribute. This can be fixed with las2las as follows.

las2las -i 2016.las ^
        -copy_user_data_into_scanner_channel ^
        -set_user_data 0 ^
        -odix _fixed -olas

Unfortunately the LiDAR files have much more serious issues in the return numbering. It’s literally a “Total Disaster!” and “Sad!” as the US president will tweet shortly. After grouping all returns with the same GPS time stamp into an orange and a green group there is one more set of returns left unaccounted for.

las2txt -i 2016.las ^
        -keep_gps_time 112813951.416451 112813951.416455 ^
        -parse xyzlurntpi ^
        -stdout
397286.02 136790.60 45.90 0 0 1 4 112813951.416453 117 24
397286.06 136791.05 39.54 0 0 2 4 112813951.416453 117 35
397286.10 136791.51 33.34 0 0 3 4 112813951.416453 117 24
397286.18 136792.41 21.11 0 0 4 4 112813951.416453 117 0
397286.12 136791.75 30.07 0 0 1 1 112813951.416453 117 47
397291.74 136750.70 45.86 0 1 1 1 112813951.416453 117 105
las2txt -i 2016.las ^
        -keep_gps_time 112813951.408708 112813951.408712 ^
        -parse xyzlurntpi ^
        -stdout
397286.01 136790.06 45.84 0 0 1 4 112813951.408710 117 7
397286.05 136790.51 39.56 0 0 2 4 112813951.408710 117 15
397286.08 136790.96 33.33 0 0 3 4 112813951.408710 117 19
397286.18 136792.16 17.05 0 0 4 4 112813951.408710 117 0
397286.11 136791.20 30.03 0 0 1 2 112813951.408710 117 58
397286.14 136791.67 23.81 0 0 2 2 112813951.408710 117 42
397291.73 136750.16 45.88 0 1 1 1 112813951.408710 117 142

This can be visualized with lasview and the result is unmistakably clear: The return numbering is messed up. There should be one shot with five returns (not a group of four and a single return) in the first example. And there should be one shot with six returns (not a group of four and a group of two returns) in the second example. Such a broken return numbering results in extra first (or last) returns. These are serious issues that affect any algorithm that relies on the return numbering such as first-return DSM generation or canopy cover computation. Those extra returns will also make the pulse density appear higher and the pulse spacing appear tighter than they really are. The numbers from 20 (blue) to 100 (red) pulses per square meters in our earlier visualization are definitely inflated.

lasview -i 2016.las ^
        -keep_gps_time 112813951.416451 112813951.416455 ^
        -color_by_return

lasview -i 2016.las ^
        -keep_gps_time 112813951.408708 112813951.408712 ^
        -color_by_return

After all these troubles here something nice. Side-by-side a first-return TIN and a spike-free TIN (using a freeze of 0.8 m) of the center court cafe in the Pentagon. Especially given all these “fake first returns” in the Washington DC LiDAR we really need the spike-free algorithm to finally “Make a DSM great again!” … (-;

We would like to acknowledge the District of Columbia Office of the Chief Technology Officer (OCTO) for providing this data with a very permissive open data license, namely the Creative Commons Attribution 3.0 License.

 

Second German State Goes Open LiDAR

The floodgates of open geospatial data have opened in Germany. Days after reporting about the first state-wide release of open LiDAR, we are happy to follow up with a second wonderful open data story. The state of Thuringia (Thüringen) – also called the “green heart of Germany” – has also implemented an open geospatial data policy. This had already been announced in March 2016 but must have gone online just now. A reader of our last blog article pointed this out in the comments. And it’s not just LiDAR. You can download:

It all comes with the same permissible license as OpenNRW’s data. This is open data madness! Everything you could possibly hope for presented via a very functional download portal. Kudos to TLVermGeo (“Thüringisches Landesamt für Vermessung und Geoinformation”) for creating an open treasure cove of free-for-all geospatial data.

Let us have a look at the LiDAR. We use the interactive portal to zoom to an area of interest. With the recent rise of demagogues it cannot hurt to look at a stark reminder of where such demagoguery can lead. In his 1941 play “The Resistible Rise of Arturo Ui” – a satirical allegory on the rise of Adolf Hitler – Bertolt Brecht writes “… don’t rejoice too soon at your escape. The womb he crawled from is still going strong.”

We are downloading LiDAR data around the Buchenwald concentration camp. According to Wikipedia, it was established in July 1937 and was one of the largest on German soil. Today the remains of Buchenwald serve as a memorial and as a permanent exhibition and museum.

We download the 15 tiles surrounding the blue one: two on its left, two on its right and one corresponding row of five tiles above and below. Each of the 15 zipped archives contains a *.laz file and *.meta file. The *.laz file contains the LiDAR points and *.meta file contains the textual information below where “Lage” and “Höhe” refer to “horizontal” and “vertical”:

Datei: las_655_5653_1_th_2010-2013.laz
Erfassungsdatum: 2011-03
Erfassungsmethode: Airborne Laserscanning
Lasergebiet: Laser_04_2010
EPSG-Code Lage: 25832
EPSG-Code Höhe: 5783
Quasigeoid: GCG2005
Genauigkeit Lage: 0.12m
Genauigkeit Höhe: 0.04m
Urheber: (c) GDI-Th, Freistaat Thueringen, TLVermGeo

Next we will run a few quality checks on the 15 tiles by processing them with lasinfolasoverlap, lasgrid, and las2dem. We output all results into a folder named ‘quality’.

With lasinfo we create one text file per tile that summarizes its contents. The ‘-cd’ option computes the all return and last return density. The ‘-histo point_source 1’ option produces a histogram of point source IDs that are supposed to store which flight line each return came from. The ‘-odir’ and ‘-odix’ options specify the directory for the output and an appendix to the output file name. The ‘-cores 4’ option starts 4 processes in parallel, each working on a different tile.

lasinfo  -i las_*2010-2013.laz ^
         -cd ^
         -histo point_source 1 ^
         -odir quality -odix _info -otxt ^
         -cores 4

If you scrutinize the resulting text files you will find that the average last return density ranges from 6.29 to 8.13 and that the point source IDs 1 and 9999 seem to encode some special points. Likely those are synthetic points added to improve the derived rasters similar to the “ab”, “ag”, and “aw” files in the OpenNRW LiDAR. Odd is the lack of intermediate returns despite return numbers ranging all the way up to 7. Looks like only the first returns and the last returns are made available (like for the OpenNRW LiDAR). That will make those a bit sad who were planning to use this LiDAR for forest or vegetation mapping. The header of the *.laz files does not store geo-referencing information, so we will have to enter that manually. And the classification codes do not follow the standard ASPRS assignment. In red is our (currently) best guess what these classification codes mean:

[...]
histogram of classification of points:
 887223 ground (2) ground
 305319 wire guard (13) building
 172 tower (15) bridges
 41 wire connector (16) synthetic ground under bridges
 12286 bridge deck (17) synthetic ground under building
 166 Reserved for ... (18) synthetic ground building edge
 5642801 Reserved for ... (20) non-ground
[...]

With lasoverlap we can visualize how much overlap the flight lines have and the (potential miss-)alignment between them. We drop the synthetic points with point source IDs 1 and 9999 and add geo-referencing information with ‘-epsg 25832’ so that the resulting images can be displayed as Google Earth overlays. The options ‘-min_diff 0.1’ and ‘-max_diff 0.4’ map elevation differences of up +/- 10 cm to white. Above +/- 10 cm the color becomes increasingly red/blue with full saturation at +/- 40 cm or higher. This difference can only be computed for pixels with two or more overlapping flight lines.

lasoverlap  -i las_*2010-2013.laz ^
            -drop_point_source 1 ^
            -drop_point_source 9999 ^
            -min_diff 0.1 -max_diff 0.4 ^
            -odir quality -opng ^
            -epsg 25832 ^
            -cores 4

With lasgrid we check the density distribution of the laser pulses by computing the point density of the last returns for each 2 by 2 meter pixel and then mapping the computed density value to a false color that is blue for a density of 0 and red for a density of 10 or higher.

lasgrid  -i las_*2010-2013.laz ^
         -drop_point_source 1 ^
         -drop_point_source 9999 ^
         -keep_last ^
         -step 2 -point_density ^
         -false -set_min_max 0 10 ^
         -odir quality -odix _d_0_10 -opng ^
         -epsg 25832 ^
         -cores 4
Pulse density variation due to flight line overlap and flight turbulence.

Pulse density variation due to flight line overlap is expected. But also the contribution of flight turbulence is quite significant.

With las2dem we can check the quality of the already existing ground classification in the LiDAR by producing a hillshaded image of a DTM for visual inspection. Based on our initial guess on the classification codes (see above) we keep those synthetic points that improve the DTM (classification codes 16, 17, and 18) in addition to the ground points (classification code 2).

las2dem  -i las_*2010-2013.laz ^
         -keep_class 2 16 17 18 ^
         -step 1 ^
         -hillshade ^
         -odir quality -odix _shaded_dtm -opng ^
         -epsg 25832 ^
         -cores 4
Problems in the ground classification of LiDAR points are often visible in a hillshaded DTM raster.

Problems in the ground classification of LiDAR points are often visible in a hillshaded DTM.

Wow. We see a number of ground disturbances in the resulting hillshaded DTM. Some of them are expected because if you read up on the history of the Buchenwald concentration camp you will learn that in 1950 large parts of the camp were demolished. However, the laser finds the remnants of those barracks and buildings as clearly visible ground disturbances under the canopy of the dense forest that has grown there since. And then there are also these bumps that look like bomb craters. Are those from the American bombing raid on August 24, 1944?

We are still not entirely sure what those “bumps” arem but our initially assumption that all of those would have to be bomb craters from that fatal American bombing raid on August 24, 1944 seems to be wrong. Below is a close-up with lasview of the triangulated and shaded ground points from the lower right corner of tile ‘las_656_5654_1_th_2010-2013.laz’.

Close-up in lasview on the bumbs in the ground.

Close-up in lasview on the bumbs in the ground.

We are not sure if all the bumps we can see here are there for the same reason. But we found an old map and managed to overlay it on Google Earth. It suggest that at least the bigger bumps are not bomb craters. On the map they are labelled as “Erdfälle” which is German for “sink hole”.

We got a reminder on the danger of demagogues as well as a glimpse into conflict archaeology and geomorphology with this open LiDAR download and processing exercise. If you want to explore this area any further you can either download the LiDAR and download LAStools and process the data yourself or simply get our KML files here.

Acknowledgement: The LiDAR data of TLVermGeo comes with a very permissible license. It is called “Datenlizenz Deutschland – Namensnennung – Version 2.0” or “dl-de/by-2-0” and allows data and derivative sharing as well as commercial use. It only requires us to name the source. We need to cite the “geoportal-th.de (2017)” with the year of the download in brackets and should specify the Universal Resource Identification (URI). We have not found this yet and use this URL as a placeholder until we know the correct one. Done. So easy. Thank you, geoportal Thüringen … (-:

NRW Open LiDAR: Download, Compression, Viewing

UPDATE: (March 6th): Second part merging Bonn into proper LAS files

This is the first part of a series on how to process the newly released open LiDAR data for the entire state of North Rhine-Westphalia that was announced a few days ago. Again, kudos to OpenNRW for being the most progressive open data state in Germany. You can follow this tutorial after downloading the latest version of LAStools as well as a pair of DGM and DOM files for your area of interest from these two download pages.

We have downloaded the pair of DGM and DOM files for the Federal City of Bonn. Bonn is the former capital of Germany and was host to the FOSS4G 2016 conference. As both files are larger than 10 GB, we use the wget command line tool with option ‘-c’ that will restart where it left off in case the transmission gets interrupted.

The DGM file and the DOM file are zipped archives that contain the points in 1km by 1km tiles stored as x, y, z coordinates in ETRS89 / UTM 32 projection as simple ASCII text with centimeter resolution (i.e. two decimal digits).

>> more dgm1l-lpb_32360_5613_1_nw.xyz
360000.00 5613026.69 164.35
360000.00 5613057.67 164.20
360000.00 5613097.19 164.22
360000.00 5613117.89 164.08
360000.00 5613145.35 164.03
[...]

There is more than one tile for each square kilometer as the LiDAR points have been split into different files based on their classification and their return type. Furthermore there are also synthetic points that were used by the land survey department to replace certain LiDAR points in order to generate higher quality DTM and DSM raster products.

The zipped DGM archive is 10.5 GB in size and contains 956 *.xyz files totaling 43.5 GB after decompression. The zipped DOM archive is 11.5 GB in size and contains 244 *.xyz files totaling 47.8 GB. Repeatedly loading these 90 GB of text data and parsing these human-readable x, y, and z coordinates is inefficient with common LiDAR software. In the first step we convert the textual *.xyz files into binary *.laz files that can be stored, read and copied more efficiently. We do this with the open source LASzip compressor that is distributed with LAStools using these two command line calls:

laszip -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.xyz ^
       -epsg 25832 -vertical_dhhn92 ^
       -olaz ^
       -cores 2
laszip -i dom1l_05314000_Bonn_EPSG5555_XYZ\*.xyz ^
       -epsg 25832 -vertical_dhhn92 ^
       -olaz ^
       -cores 2

The point coordinates are is in EPSG 5555, which is a compound datum of horizontal EPSG 25832 aka ETRS89 / UTM zone 32N and vertical EPSG 5783 aka the “Deutsches Haupthoehennetz 1992” or DHHN92. We add this information to each *.laz file during the LASzip compression process with the command line options ‘-epsg 25832’ and ‘-vertical_dhhn92’.

LASzip reduces the file size by a factor of 10. The 956 *.laz DGM files compress down to 4.3 GB from 43.5 GB for the original *.xyz files and the 244 *.laz DOM files compress down to 4.8 GB from 47.8 GB. From here on out we continue to work with the 9 GB of slim *.laz files. But before we delete the 90 GB of bulky *.xyz files we make sure that there are no file corruptions (e.g. disk full, truncated files, interrupted processes, bit flips, …) in the *.laz files.

laszip -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.laz -check
laszip -i dom1l_05314000_Bonn_EPSG5555_XYZ\*.laz -check

One advantage of having the LiDAR in an industry standard such as the LAS format (or its lossless compressed twin, the LAZ format) is that the header of the file stores the number of points per file, the bounding box, as well as the projection information that we have added. This allows us to very quickly load an overview for example, into lasview.

lasview -i dgm1l_05314000_Bonn_EPSG5555_XYZ\*.laz -GUI
The bounding boxes of the DGM files quickly display a preview of the data in the GUI when the files are in LAS or LAZ format.

The bounding boxes of the DGM files quickly give us an overview in the GUI when the files are in LAS or LAZ format.

Now we want to find a particular site in Bonn such as the World Conference Center Bonn where FOSS4G 2016 was held. Which tile is it in? We need some geospatial context to find it, for example, by creating an overview in form of KML files that we can load into Google Earth. We use the files from the DOM folder with “fp” in the name as points on buildings are mostly “first returns”. See what our previous blog post writes about the different file names or look at the second part of this series. We can create the KML files with lasboundary either via the GUI or in the command line.

lasboundary -i dom1l_05314000_Bonn_EPSG5555_XYZ\dom1l-fp*.laz ^
            -gui
Only the "fp" tiles from the DOM folder loaded the GUI into lasboundary.

Only the “fp” tiles from the DOM folder loaded the GUI into lasboundary.

lasboundary -i dom1l_05314000_Bonn_EPSG5555_XYZ\dom1l-fp*.laz ^
            -use_bb -labels -okml

We zoom in and find the World Conference Center Bonn and load the identified tile into lasview. Well, we did not expect this to happen, but what we see below will make this series of tutorials even more worthwhile. There is a lot of “high noise” in the particular tile we picked. We should have noticed the unusually high z range of 406.42 meters in the Google Earth pop-up. Is this high electromagnetic radiation interfering with the sensors? There are a number of high-tech government buildings with all kind of antennas nearby (such as the United Nations Bonn Campus the mouse cursor points at).

Significant amounts of high noise are in the first returns of the DOM tile we picked.

Significant amounts of high noise are in the first returns of the DOM tile we picked.

But the intended area of interest was found. You can see the iconic “triangulated” roof of the building that is across from the World Conference Center Bonn.

The World Conference Center Bonn is across from the building with the "triangulated" roof.

The World Conference Center Bonn is across from the building with the “triangulated” roof.

Please don’t think it is the responsibility of OpenNRW to remove the noise and provide cleaner data. The land survey has already processed this data into whatever products they needed and that is where their job ended. Any additional services – other than sharing the raw data – are not in their job description. We’ll take care of that … (-:

Acknowledgement: The LiDAR data of OpenNRW comes with a very permissible license. It is called “Datenlizenz Deutschland – Namensnennung – Version 2.0” or “dl-de/by-2-0” and allows data and derivative sharing as well as commercial use. It only requires us to name the source. We need to cite the “Land NRW (2017)” with the year of the download in brackets and specify the Universal Resource Identification (URI) for both the DOM and the DGM. Done. So easy. Thank you, OpenNRW … (-: