Scan/US GeocoderCL™ (Command Line)

Do you have a high volume of street addresses, coming from multiple sources, and want to attach latitude/longitude coordinates for each address, so you can map the locations or use them in geospatial analysis?

Street address to location translation

When you have a lot of street addresses and need latitude/longitude coordinates for each address, use the Scan/US GeocoderCL.

  • You have: A lot of street addresses (street, city, state, ZIP), spread across multiple files
  • You want: Latitude/longitude coordinates
  • You are looking for: a high-volume geocoding component for a scripted automation solution
  • You need: The Scan/US GeocoderCL

What does this Geocoder do?

Scan/US GeocoderCL is a Windows command line application that adds latitude-longitude coordinates and selected geocodes to records in a file with addresses.

Geographic area codes retrieved for each address location include the Census block, block group, tract and US county, the MSA area, the US Postal Service ZIP code area and the Scan/US Microgrid™.

Scan/US GeocoderCL: a Command-line Geocoder

The Scan/US GeocoderCL application adds latitude and longitude coordinates to records in a file containing addresses.

This is a ‘command line’ Windows application, which can be called using batch (.cmd or .bat), PowerShell (.ps1), or other scripting methods.

If you do not require command-line processing, a better choice is the Scan/US Desktop geocoder. The main use of the command-line GeocoderCL is as a high-volume geocoding component of a scripted automation solution.

Inputs:

Input records in the file are assumed to contain fields specifying a street address. There may also be any number of other fields containing data relating to that address.

The address itself will typically include the street address, city, state, and ZIP code, for any of the 50 states or the District of Columbia (Puerto Rico is not supported). GeocoderCL accepts files in CSV (with comma- or tab-separated values), Excel .XLSX or .XLS, or the legacy DBF format.

Outputs:

The GeocoderCL will append latitude and longitude to each record, with a high success rate for well-specified actual addresses. Geographic area ID codes (GEOIDs) retrieved for each address location include Census blocks, block groups and tracts, US counties, MSA areas, USPS ZIP code areas and Scan/US Microgrids. The output files are by default written to the same folder as the input file, but a different output folder can be specified.

Files of any number of records can be geocoded at a fast rate of throughput [see Note 1]. Records need not be in any particular order. (Some geocoder vendors may require input records to be sorted.)

Processing rate

The example described in Note 1 below is a file of about six million records, geocoded in 231 seconds (just under 4 minutes), at a reported rate of 82 million records per hour. At that pace, a file of 2.5 million records would be geocoded in roughly a minute and a half. The processor in the test log is an 8-core i7-9700K, a 3.6 GHz Intel processor launched in Q4 of 2018, and still a solid high-end desktop processor in the latter half of 2022.

Processing details

Processing of the input file is controlled by a ‘task parameters file’ or ‘parms’ file declared on the command line at the time of program invocation.

The task parameters file specifies the names of the fields that represent an address: street address, city, state and/or ZIP code. An option is provided to compose the address from several of the input fields, concatenating fields such as street number, predirection (typically ‘N’, ‘S’, ‘E’, or ‘W’), street name, postdirection and street type, since input files often store street addresses “deconstructed” into component fields of this type. The address composition function of the GeocoderCL removes the need to re-edit the input file.
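To make the idea of address composition concrete, here is a minimal Python sketch of what concatenating component fields into one street address looks like. It is an illustration only, not GeocoderCL's implementation or its .parms syntax; the function name and the field names (HouseNum, PreDir, StName, StType, PostDir) are hypothetical.

```python
from typing import Mapping, Sequence


def compose_street_address(record: Mapping[str, str], parts: Sequence[str]) -> str:
    """Join the named component fields in the given order, skipping blank ones."""
    values = (record.get(name, "").strip() for name in parts)
    return " ".join(value for value in values if value)


# Hypothetical field names for a "deconstructed" address record.
PART_ORDER = ["HouseNum", "PreDir", "StName", "StType", "PostDir"]

record = {
    "HouseNum": "100",
    "PreDir": "N",
    "StName": "Main",
    "StType": "St",
    "PostDir": "",  # blank components are simply left out
}

street_address = compose_street_address(record, PART_ORDER)
print(street_address)
```

Blank or missing components are skipped, so records with and without a directional produce a clean single-line address.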

Fields other than those containing an address are normally copied into the output record. In the task parameters file, you may modify the list of these copied-through fields, giving you a way to filter them out either by selection or exclusion.
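The difference between filtering by selection and filtering by exclusion can be sketched in a few lines of Python. This only illustrates the concept; the function, its arguments and the field names are hypothetical and do not reflect the actual .parms file keywords.

```python
from typing import Iterable, List, Optional


def copied_through_fields(
    input_fields: Iterable[str],
    address_fields: Iterable[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the non-address fields to copy into the output, in input order."""
    address = set(address_fields)
    fields = [name for name in input_fields if name not in address]
    if include is not None:  # selection: keep only the listed fields
        keep = set(include)
        fields = [name for name in fields if name in keep]
    if exclude is not None:  # exclusion: drop the listed fields
        drop = set(exclude)
        fields = [name for name in fields if name not in drop]
    return fields


# Hypothetical input layout.
columns = ["CustID", "Name", "Street", "City", "State", "ZIP", "Notes"]
address_columns = ["Street", "City", "State", "ZIP"]

print(copied_through_fields(columns, address_columns))
print(copied_through_fields(columns, address_columns, include=["CustID"]))
print(copied_through_fields(columns, address_columns, exclude=["Notes"]))
```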

Geocoding results fields are appended to the output record automatically. These include latitude/longitude coordinates, 2020 Census block code, ZIP+4 code, ZIP code, geocoding quality codes showing how well the geocoder thinks it did on a particular field, and error messages when an address could not be found or was defective in some way.

You may, using the task parameters file, select other types of codes and data resulting from geocoding to be added to the output record, such as fields with the address and lastline updated to USPS standards, Census block group and tract code, FIPS county code, county name, MSA code, MSA name, or Scan/US Microgrid code.

Details on Usage

GeocoderCL is invoked on the command line as follows:
C:\AppPath\GeocoderCL.exe _parms=taskparmsfile [ _in=infilepath ] [ optional-parameters ]

where 'C:\AppPath\GeocoderCL.exe' is the full file path of the GeocoderCL application, 
 'taskparmsfile' is the file name of the task parameters file (with file extension .parms implied), 
 'infilepath' is the full file path of the input file with addresses, and 
 optional-parameters is one or more run-time parameters that are declared as arguments on the 
command line instead of being set as parameters in the .parms file.
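For a scripted automation solution, the invocation above can be assembled programmatically. The following Python sketch builds the argument list from the documented _parms= and _in= parameters and hands it to the standard subprocess module. The helper names and the input path C:\Data\addresses.csv are hypothetical, and how GeocoderCL reports success or failure through its exit code is not documented here, so the return value is passed back unchanged.

```python
import subprocess
from typing import List, Optional, Sequence


def build_geocoder_command(
    exe_path: str,
    parms_file: str,
    in_file: Optional[str] = None,
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Assemble the argument list for one GeocoderCL run."""
    command = [exe_path, f"_parms={parms_file}"]
    if in_file is not None:
        # _in= is optional; when omitted, the .parms file must name the input.
        command.append(f"_in={in_file}")
    command.extend(extra_args)  # other run-time parameters, passed as written
    return command


def run_geocoder(command: List[str]) -> int:
    """Run the command and return the process exit code (meaning not assumed)."""
    return subprocess.run(command).returncode


cmd = build_geocoder_command(
    r"C:\AppPath\GeocoderCL.exe",
    "taskparmsfile",
    in_file=r"C:\Data\addresses.csv",  # hypothetical input file
)
print(cmd)
```

Calling run_geocoder(cmd) in a loop over several input files is one way to process a batch; the sketch deliberately does not launch the program on its own.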

The 'infilepath' argument is optional. When it is not specified on the command line, the task parameters file is expected to include an _in= reference to the address input file to be processed.

The application can also be run directly from a shortcut on the desktop. In this mode, GeocoderCL launches an Open File dialog to browse for a suitable input file. The shortcut may specify, as a target argument, a particular .parms file to control the processing of an entire class of address input files; if none is specified, the application fetches task parameters from the GeocoderCL.parms file located in the “Documents\ScanUS\GeocoderCL” folder.

Details on Outputs

Outputs are written into the input folder by default, and each output file is named after the input file with “_out” appended to the name. Outputs may be redirected to any other folder and renamed using an _out=outputpath parameter, either in the .parms file or supplied as a command line argument. The output file format matches that of the input, except that processing a .dbf file produces a .csv file.
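A script that picks up the results needs to know where they will land. The Python sketch below derives the default output path from the naming rule just described. It assumes that “_out” is appended to the base name ahead of the file extension, which the text implies but does not spell out; the function name and sample paths are hypothetical.

```python
from pathlib import PureWindowsPath


def default_output_path(input_path: str) -> str:
    """Predict the default output path for a given input file path."""
    source = PureWindowsPath(input_path)
    # A .dbf input produces a .csv output; other formats keep their extension.
    suffix = ".csv" if source.suffix.lower() == ".dbf" else source.suffix
    # Assumption: "_out" goes between the base name and the extension.
    return str(source.with_name(f"{source.stem}_out{suffix}"))


print(default_output_path(r"C:\Data\stores.xlsx"))
print(default_output_path(r"C:\Data\legacy.dbf"))
```

PureWindowsPath is used so that the sketch handles Windows-style paths the same way on any platform.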

Logging

Records with addresses that cannot be geocoded are written into an exceptions file named infilename_exc.csv in the output folder. Records with addresses that resolve only to a 5-digit ZIP code centroid can also be treated as exceptions and written to the exceptions file by declaring a _zip5exc= parameter. Records that lack both a state code and a ZIP code, or that contain an invalid state code (e.g. PR), are written into a file named infilename_stzipexc.csv in the output folder.
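The state/ZIP rule can be applied as a pre-check before a run, to estimate how many records would end up in the _stzipexc file. The Python sketch below is a hedged approximation of that rule, not GeocoderCL's own logic: trimming and upper-casing the state code are assumptions, and the function names are hypothetical. The valid codes are the 50 states plus DC, as stated under Inputs.

```python
from pathlib import PureWindowsPath
from typing import Tuple

# Two-letter codes for the 50 states plus the District of Columbia.
VALID_STATE_CODES = frozenset(
    "AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME "
    "MD MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI "
    "SC SD TN TX UT VT VA WA WV WI WY".split()
)


def fails_state_zip_check(state: str, zip_code: str) -> bool:
    """True if the record lacks both state and ZIP, or has an invalid state code."""
    state = state.strip().upper()  # assumption: case and padding are ignored
    zip_code = zip_code.strip()
    if not state and not zip_code:
        return True
    return bool(state) and state not in VALID_STATE_CODES


def exception_file_names(input_path: str) -> Tuple[str, str]:
    """Names of the two exception files, following the documented pattern."""
    stem = PureWindowsPath(input_path).stem
    return f"{stem}_exc.csv", f"{stem}_stzipexc.csv"


print(fails_state_zip_check("PR", "00901"))
print(exception_file_names(r"C:\Data\stores.xlsx"))
```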

Performance Notes

This performance note describes a geocoder run of a test file containing six million records, which runs in just under four minutes.

[Note 1]

The GeocoderCL processing rate, in records per second, depends primarily on the PC configuration and the size of the input file. The overhead of loading the ACGX (address coding guide) files and other indexes results in lower reported throughput for files with a small number of records. The test log reports the run of a file with 6.15M records on a PC with an Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz and 64 GB RAM, with one SSD storing the address input files and another SSD storing the index files.

Your results may vary depending on configuration.

Other questions

Q. Is it a latitude/longitude coordinate geocoder, or does it just attach Census codes?

A. Both. The Scan/US Geocoder attaches latitude/longitude coordinates, and GEOID codes for areas, based on a U.S. street address.

Latitude and longitude are attached on output. If your file already has latitude/longitude coordinates but no street addresses (many cell phone spotting databases fall into this category), then you may need the Scan/US PointCoder instead of a geocoder. The PointCoder takes bare latitude and longitude, and attaches GEOID codes for Census, ZIP, and Scan/US Microgrid cartography.

Q. Is an internet connection needed to access the Geocoder?

A. No. The Scan/US GeocoderCL command line geocoder is "on-premises" rather than "in the cloud." You do NOT need a cloud connection to run the GeocoderCL. You need internet access only to download the initial install setups.

Q. What about Linux?

A. No. The Scan/US GeocoderCL is a Windows command line tool. It requires Microsoft Windows and runs from the Windows command line, NOT the Windows Subsystem for Linux.

Scan/US GeocoderCL ©2022 Scan/US, Inc. All rights reserved. Scan/US is a registered trademark of Scan/US, Inc. MicroGrid is a trademark of Scan/US, Inc. Scan/US GeocoderCL is a trademark of Scan/US, Inc. Other company or product names used herein are for identification purposes only and may be trademarks of their respective companies.