Scraping the Web for Water Levels using PowerShell

I like to take my bike out to ride it on trails whenever I get the time. It’s fun to see how far I can go and a lot of times the scenery is always great. The problem is when it rains, the trail that I typically use sits next to a creek that can and will flood over if we get too much rain. Fortunately for me, there is a web site that the United States Geological Survey uses to monitor and report on the water levels of streams, rivers, etc… that I can bring up to see. The default display is a nice graph that shows the history for the last couple of days of the water level so you can see where it is at and also where it is trending toward now. The link that I use for my travels is this one: http://waterdata.usgs.gov/nwis/uv?site_no=06610795

image

Now, graphs are great and all, but I want to see something with actual data because to me,that is more interesting to see because I can then look at using PowerShell to scrape it from the web. Luckily, there are other options to display the data that happens to reside near the top of the web page for me that includes a table display with the raw data to look at.

image

 

image

This is just a quick snapshot of some of the data that is available to me. I can see the exact height of the water as well as its discharge rate. Now that I know where this resides, I can grab the link and see what I can do to use it.

image

The key pieces of data here are the code for the location that I want to pull data from which is available a couple of different ways (1)From the site that I pull the data from or (2) from the following link which provides you a map that you can then locate the nearest data point and find the code: http://maps.waterdata.usgs.gov/mapper/index.html.

image

My warning here is that the way the data is collected and displayed does not follow a consistent approach. So my way of parsing the data here will be hit or miss depending on the location that you are looking at. My recommendation is that you look at the table data and see if it matches up with what is available here…if it even has this type of data (I did find some areas that just flat out didn’t have anything useful).

I have a little function that works in the instance that I want to use it for in my area, so your use may vary. The first thing that I am going to do is set up my RegEx so it will grab all of the necessary data that I need. RegEx is pretty important in the world of web scraping because it can be difficult to pull the data unless you have some other tool to do so.

[OutputType('Web.USGS.Data')]
[cmdletbinding()]
Param (
    [string]$Location = '06610795', #Ft Crook Rd in Bellevue, NE
    [datetime]$StartDate = (Get-Date).AddDays(-1),
    [datetime]$EndDate = (Get-Date)
)

If ($PSBoundParameters.ContainsKey('Debug')) {
    $DebugPreference = 'Continue'
}
If ($StartDate.Date -eq (Get-Date).Date) {
    $StartDate = $StartDate.AddDays(-1)
}

$__StartDate = $StartDate.ToString('yyyy-MM-dd')
$__EndDate = $EndDate.ToString('yyyy-MM-dd')
$URI = "http://waterdata.usgs.gov/ne/nwis/uv?cb_00065=on&cb_00060=on&format=html&site_no=$($Location)&period=&begin_date=$($__StartDate)&end_date=$($__EndDate)"
$RegEx = [regex]"^(?<DateTime>(?:\d{2}/){2}\d{4}\s\d{2}:\d{2})\s(?<TimeZone>[a-zA-Z]{1,})(?<Height>\d{1,}\.\d{2})(?:A|P)\s{2}(?<Discharge>(?:\d{1,},)*\d{1,})(?:A|P)"

 

As you can see, I am looking for the DateTime, Height and Discharge from the web page. Note that I fill in the blanks on the web page url based on the location and the starting date and time. If I do not use a starting or ending datetime, I just use default values so that way the query will not throw errors. Once that is done, we get to send out the query using Invoke-WebRequest.

 

Try {
    $Data = Invoke-WebRequest -Uri $URI
}
Catch {
    Write-Warning $_
    BREAK
}

If ($Data.ParsedHtml.body.innertext -match 'Redirecting') {
    Write-Verbose "Requesting data older or longer than 120 days, performing redirection"
    $Data = Invoke-WebRequest -Uri $Data.links.href
}

 

The query will go out and bring back all of the web page data. Sometimes, depending on what I am sending it, it may have to redirect me to a different site to get the data. I want to be sure to recognize it and handle that change so I get the data that I need by looking at the ParsedHtml.Body.InnerText for the word ‘Redirecting’ and then look at the link provided.

From there, we will take the data and begin parsing each line to get what we needed.

$Title = ((@($Data.ParsedHtml.getElementsByTagName('Title'))[0].Text -replace '.*USGS(.*)','$1').Trim() -replace ',|\.') -replace ' ','-'
Write-Verbose "[$($Title)]"
@($Data.ParsedHtml.getElementsByTagName('Table'))[3].InnerText -split '\r\n' | ForEach {
    If ($_ -match $RegEx) {
        $Object = [pscustomobject]@{
            Location = $Title
            DateTime = [datetime]$Matches.DateTime
            Height_FT = [decimal]$Matches.Height
            Discharge = [decimal]$Matches.Discharge -replace ','
        }
        $Object.pstypenames.insert(0,'Web.USGS.Data')
        $Object
    }
    Else {
        Write-Debug "[$($_)] No match found!"
    }
}

The end result looks like this:

image

 

We now have our data in object form about the current water levels on a stream, in this case the stream where I happen to ride my bike by most days.

The function that I wrote to help me easily view this is available at https://github.com/proxb/PowerShell_Scripts/blob/master/Get-USGSWaterData.ps1

Using the map provided, you will get limited success in pulling water data depending on the data source that you use. Of course, you can update the code to make it work with other areas.

This entry was posted in powershell and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s