From: ILPI Support <info**At_Symbol_Here**ILPI.COM>
Subject: Re: [DCHAS-L] Chemical Safety headlines (11 articles)
Date: Sat, 1 Sep 2018 10:15:11 -0400
Reply-To: ACS Division of Chemical Health and Safety <DCHAS-L**At_Symbol_Here**PRINCETON.EDU>
Message-ID: A70D57B9-0101-48DB-B31C-B609936C7082**At_Symbol_Here**ilpi.com
In-Reply-To: <57020F67732C4061AEDEE1D783B45B90**At_Symbol_Here**OwnerPC>


On Aug 31, 2018, at 11:14 AM, Bob Buntrock <buntrock16**At_Symbol_Here**ROADRUNNER.COM> wrote:

These news postings are informative except for the fact that too often the locality of the incident can't be identified except to "locals".  Most people know that El Cajon is in California, but is the Melbourne cited in Florida?  Australia?  Elsewhere?

-- Bob Buntrock
Orono, ME

If you look underneath the headline in the table of contents you will see tags listed there.  Those tags also appear on the Pinboard posts of the articles. The El Cajon story is tagged us_CA, and the Melbourne one is tagged Australia.

You're correct that it can be very difficult to make that determination once you've clicked on a link, which is why we tag it. Ralph used to do that completely manually, looking through the web site to determine a country/state of origin, which can be very tedious.  That is why we automated the process a few years ago.

The location determination algorithm I wrote goes through a series of processes to try to guess the location, which is incredibly difficult given that there are no standards for any of this.  Here are the comments from that code, in order:

// Look in article text for up to 3 state names, culling out common street, river names etc.
// We will gather up to three states in this process...

///////////////// Byline search in the article text  /////////////////
// We will first examine the second word of the text to see if it is a state in a byline.
// We will also look after the first comma in first line (if any), just in case

// Expect to find US Postal Service abbreviation for state, but some papers use four letters
// while others use two letters and two periods.    And some don't use bylines at all. Why can't bylines be standardized??
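
To make the byline step concrete, here is a minimal Python sketch. It is not the production code: the abbreviation table, the tag values for states other than us_CA and us_NC, and the function names are all made up for illustration, and a real table would have to cover every state and every abbreviation style.

```python
import re

# Hypothetical, partial lookup table (an assumption for this sketch).
# Keys are abbreviations with punctuation removed and uppercased.
STATE_ABBREVIATIONS = {
    "CA": "us_CA", "CALIF": "us_CA",
    "NC": "us_NC",
    "NJ": "us_NJ",
    "FL": "us_FL", "FLA": "us_FL",
}

def normalize(token):
    """Drop everything but letters and uppercase: 'N.C.' -> 'NC', 'Calif.' -> 'CALIF'."""
    return re.sub(r"[^A-Za-z]", "", token).upper()

def state_from_byline(text):
    """Return a state tag guessed from a byline, or None if nothing matches."""
    stripped = text.strip()
    if not stripped:
        return None
    candidates = []
    # First check: the second word of the text.
    words = stripped.split()
    if len(words) >= 2:
        candidates.append(words[1])
    # Second check: the first word after the first comma on the first line.
    first_line = stripped.splitlines()[0]
    if "," in first_line:
        after_comma = first_line.split(",", 1)[1].split()
        if after_comma:
            candidates.append(after_comma[0])
    for candidate in candidates:
        tag = STATE_ABBREVIATIONS.get(normalize(candidate))
        if tag:
            return tag
    return None
```

With this table, state_from_byline("STATESVILLE, N.C. - ...") gives us_NC via the second word, and "EL CAJON, Calif. (KGTV) - ..." gives us_CA via the word after the comma. A fuller table would need extra guards, since two-letter codes such as IN, ME and OR are also ordinary English words.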

///////////////// Look for country in article text  /////////////////
// We added a synonyms list to Countries so we can find "Canadian" etc.
// Need to null out certain items such as Jersey triggering for New Jersey
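
A rough Python sketch of the country step follows, assuming a small synonyms table and a null-out list. Both lists and the function name are illustrative assumptions; only the "Canadian" synonym and the Jersey/New Jersey collision come from the comments above.

```python
import re

# Hypothetical synonyms table: country tag -> words that signal it.
COUNTRY_SYNONYMS = {
    "Canada": ["Canada", "Canadian"],
    "Australia": ["Australia", "Australian"],
    "Jersey": ["Jersey"],
}

# Phrases blanked out before matching so they cannot trigger a country.
NULL_OUT = ["New Jersey"]

def countries_in_text(text):
    """Return the country tags whose name or synonym appears in the text."""
    cleaned = text
    for phrase in NULL_OUT:
        cleaned = re.sub(re.escape(phrase), " ", cleaned, flags=re.IGNORECASE)
    found = []
    for country, names in COUNTRY_SYNONYMS.items():
        for name in names:
            # Whole-word match, ignoring case.
            if re.search(r"\b" + re.escape(name) + r"\b", cleaned, flags=re.IGNORECASE):
                found.append(country)
                break
    return found
```

Blanking the colliding phrase first keeps the matching loop simple: "New Jersey" never reaches the Jersey check, while a plain "Jersey" still does.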

////////////// Examine URL for known foreign media or top level domains (TLD's)  //////////////

// Extract domain name

// List of known foreign news sites and their country; example www.thestar.com, Canada

BTW I added www.miragenews.com as an Australian news site today.

// And if still no country and no state, try the TLD of the domain name ; example .uk = United Kingdom
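
The URL checks could look something like the Python sketch below. The two site entries and the .uk mapping are the ones mentioned above; the .au and .ca entries, the tag spellings, and the function names are assumptions added for illustration.

```python
from urllib.parse import urlparse

# Known foreign news sites and their country (two entries from this message).
KNOWN_FOREIGN_SITES = {
    "www.thestar.com": "Canada",
    "www.miragenews.com": "Australia",
}

# Country-code top-level domains; only .uk is taken from the comments above.
TLD_COUNTRIES = {
    "uk": "United Kingdom",
    "au": "Australia",
    "ca": "Canada",
}

def domain_of(url):
    """Extract the lowercase host name from a URL ('' if there is none)."""
    return (urlparse(url).hostname or "").lower()

def country_from_known_site(url):
    """Look the host up in the list of known foreign media sites."""
    return KNOWN_FOREIGN_SITES.get(domain_of(url))

def country_from_tld(url):
    """Fallback: map the last label of the host name to a country."""
    tld = domain_of(url).rsplit(".", 1)[-1]
    return TLD_COUNTRIES.get(tld)
```

In this sketch the caller would try country_from_known_site first and fall back to country_from_tld only when no state or country has been found by then.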

////////////// Get the state from this list of known US media sites  //////////////
// These are newspaper web sites (3505 currently in list)

// And these are TV station web sites- (1009 currently in list)

// And radio station web sites
// Pull the news site and state from it.

///// Here we look for large/common/unique city names to assign possible states/country  /////////  Examples, "Dalhousie,Canada" and "San Bernardino,us_CA"

BTW I added El Cajon to this list today

/////////////////////////// End of location routines/////////////////////////////////////////

// Assign state and country now.

// Byline is the most dependable.  If a state was found here, assign it the selection and make other stuff alternates.

// Next most dependable is to find a state in the text itself

// If that's no good, we rely on the country name found in the text itself

//  If we are still empty then go by the domain state

// If that's no good, then the domain country

// And our last-ditch effort, the city check
///////////////// End of creating location choices /////////////////////////////
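
The order of dependability described above can be sketched in Python as a simple ranked list. This is only an illustration of the idea, assuming each earlier routine hands back a tag or nothing; the function name and parameters are invented for the example.

```python
def choose_location(byline_state=None, text_states=(), text_country=None,
                    domain_state=None, domain_country=None, city_guess=None):
    """Return (selection, alternates) using the order of dependability above."""
    ranked = [
        byline_state,              # most dependable
        *list(text_states)[:3],    # up to three states found in the text
        text_country,
        domain_state,
        domain_country,
        city_guess,                # last-ditch effort
    ]
    # Keep non-empty guesses in rank order, dropping repeats.
    ordered = []
    for guess in ranked:
        if guess and guess not in ordered:
            ordered.append(guess)
    if not ordered:
        return None, []
    return ordered[0], ordered[1:]
```

For example, choose_location(text_states=["us_CA"], domain_country="Australia") selects us_CA and keeps Australia as an alternate for the human reviewer.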

This routine comes up with its best guess at locations and offers them for Ralph to approve when reviewing all of the other tags, which are also generated by similar routines.  It's not perfect, but it works pretty well and saves him a fair amount of time in his human curation effort. But he still has to look through them once in a while, so appreciate the time he takes with this ongoing service, folks!

For more on this, please see Stuart R., Toreki, R, "Learning Opportunities in Three Years of Hazmat Headlines" J Chem Health & Safety, 2014,  21(2), 2-8 which is available at https://www.sciencedirect.com/science/article/pii/S1871553213006026

BTW, a team of us is currently looking into using similar algorithmic/heuristic approaches to retrieve chemical safety data for use in the undergraduate setting.

Best wishes,

Rob Toreki

 ======================================================
Safety Emporium - Lab & Safety Supplies featuring brand names
you know and trust.  Visit us at http://www.SafetyEmporium.com
esales**At_Symbol_Here**safetyemporium.com  or toll-free: (866) 326-5412
Fax: (856) 553-6154, PO Box 1003, Blackwood, NJ 08012





-----Original Message----- From: DCHAS Membership Chair
Sent: Friday, August 31, 2018 7:46 AM
To: DCHAS-L**At_Symbol_Here**PRINCETON.EDU
Subject: [DCHAS-L] Chemical Safety headlines (11 articles)

Chemical Safety Headlines From Google
Friday, August 31, 2018 at 7:46:13 AM

 A service of the ACS Division of Chemical Health and Safety
 Connecting Chemistry and Safety at http://www.dchas.org
 All article summaries and tags are archived at http://pinboard.in/u:dchas

Table of Contents (11 articles)

PEPSI BOTTLING COMPANY IN NORTHEAST PHILADELPHIA EVACUATED AFTER AMMONIA LEAK
Tags: us_PA, industrial, release, response, ammonia

UPDATE: TRUCK WITH POOL CHEMICALS CRASHES, CAUSING HAZMAT RESPONSE
Tags: us_CA, transportation, release, response, pool_chemicals

WATER RECLAMATION BUILDING'S ROOF COLLAPSES IN CHICAGO AFTER EXPLOSION; 10 INJURED
Tags: us_IL, industrial, explosion, injury, unknown_chemical

MERCURY LEAKS INSIDE U.S. POSTAL VEHICLE IN JAMES CITY CO. NEIGHBORHOOD
Tags: us_VA, transportation, release, injury, mercury

LAWSUIT ALLEGES CHEMICAL COMPANIES SHOULD PREPARE FOR UNPRECEDENTED STORMS
Tags: us_TX, industrial, follow-up, environmental

HISTORIC TOXIC CHEMICAL BAN PASSES CALIFORNIA LEGISLATURE
Tags: us_CA, public, discovery, environmental, toxics

OFFICIALS SAY CHEMICAL IN DITCH DID NOT REACH RIVER, NO PUBLIC SAFETY HAZARDS
Tags: us_ID, public, release, response, xylene

TOXIC CHEMICAL LEAKS FROM OLD EXTINGUISHER AT STATESVILLE GOODWILL
Tags: us_NC, public, release, response, other_chemical, fire_extinguisher

MAFB REMOVING POSSIBLY DANGEROUS CHEMICAL AFTER CHOTEAU PUBLIC S
Tags: us_MT, laboratory, discovery, response, waste

HYDROGEN TANK GAS LEAK, FLASH-OVER FIRE PROMPTS EVACUATIONS IN EL CAJON
Tags: us_CA, industrial, explosion, response, hydrogen

MELBOURNE FIRE BILLOWS TOXIC SMOKE
Tags: Australia, industrial, explosion, response, acetone






The content of this page reflects the personal opinion(s) of the author(s) only, not the American Chemical Society, ILPI, Safety Emporium, or any other party. Use of any information on this page is at the reader's own risk. Unauthorized reproduction of these materials is prohibited. Send questions/comments about the archive to secretary@dchas.org.
The maintenance and hosting of the DCHAS-L archive is provided through the generous support of Safety Emporium.