Knowledge Graphs for Earthquake Data · RelationalAI
Earthquakes are irregular and devastating natural hazards that affect many regions of the world. Seismology is the scientific, data-driven study of earthquakes.
The field aims to record earthquakes with seismometers and assemble catalogs of past earthquakes, quantifying their spatial distribution, temporal occurrence, and magnitude. With this information, it is possible to forecast the likelihood of future earthquakes, assess hazards, and inform impactful decisions.
However, there are some frustrations when working with seismic data. Firstly, there are many competing data formats used by different institutions. These different standards make it difficult to combine multiple sources of information.
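Furthermore, many tools for handling seismic data are written in imperative languages and support only simple queries. Despite these difficulties, the information conveyed by seismic data tends to be well structured and hierarchical.
These frustrations and opportunities motivate a fresh, modern approach to seismic data, with a relational knowledge graph framework and RelationalAI's declarative querying language, Rel.
Relational Knowledge Graphs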
To approach this problem, I used Rel to construct relational knowledge graphs for two types of seismic data:
Seismometer metadata, and
Earthquake data.
Seismometer metadata contains information such as the geographic locations of seismometers and the time spans over which they were operational. Earthquake data (or catalogs), on the other hand, record the locations and times at which earthquakes occurred.
An element of particular interest is the intersection of these two concepts. If a seismometer is operational during an earthquake, it will record the variations in ground motion. This data is valuable for assessing future hazards, and hence is of great scientific interest.
To explore both seismometer and earthquake data, two knowledge graphs can be constructed, with the aim of querying either one, or both, to obtain useful information.
Illustrating earthquake events that intersect with some operational extents of seismometers. Only seismometers A, B, and D were active during the earthquake, and would record valuable data.
Seismic Station Metadata
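Seismometer metadata conveys information about each instrument’s position, its start and end times, and its place in the organizational structure. An example snippet of this metadata is shown below (collected from IRIS as “stationXML” files, converted into JSON).
{
  "Network": {
    "@code": "AV",
    "Description": "Alaska Volcano Observatory",
    "Station": [
      {
        "@code": "AMKA",
        "Latitude": "51.378682",
        "Longitude": "179.301832",
        "Channel": [
          {
            "@code": "BHE",
            "@locationCode": "00",
            "@startDate": "2005-10-14T00:00:00.0000",
            "@endDate": "2018-08-21T00:00:00.0000",
            "SampleRate": "5E01",
            ... other attributes ...
The information is arranged hierarchically. The different types of entities are networks, stations, channel groups, and channels.
An ORM diagram for seismometer metadata.
One added complexity of the data is that the uniqueness of individual entities is governed by a prescribed set of rules (the FDSN source identifier definition). These rules state that:
Networks have unique names, or codes.
Stations managed by the same network have unique codes.
Different channel groups in the same station have unique location codes.
Channels within the same channel group have unique codes.
These conditions can be translated into logic using Rel when constructing entities, and are validated using a built-in language feature: integrity constraints.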
For example, the constraints on networks are written like this:
// Network manages Station
ic { seis:manages ⊆ (seis:Network, seis:Station) }

// For every Station, Network manages Station
ic { total(seis:Station, transpose[seis:manages]) }

// Network has NetworkCode
ic { seis:network_has_code ⊆ (seis:Network, seis:NetworkCode) }

// For every Network, Network has NetworkCode
ic { total(seis:Network, seis:network_has_code) }
As a result, the data can be modeled easily, with clear, readable instructions!
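To make the two kinds of constraint above more concrete, here is a minimal Python sketch of what they check, with relations modeled as sets of pairs. This is not Rel and not how the RKGS evaluates constraints; the helper names (`is_typed`, `is_total`, `transpose`) and the station code `STA2` are assumptions made up for illustration, and the reading of `total` as "every element of the domain appears in the relation" is my interpretation of the comments in the Rel code.

```python
# Toy entities: "AV" and "AMKA" appear in the article; "STA2" is hypothetical.
networks = {"AV"}
stations = {"AMKA", "STA2"}

# The relation "Network manages Station", stored as (network, station) pairs.
manages = {("AV", "AMKA"), ("AV", "STA2")}


def is_typed(relation, left, right):
    """Analogue of `relation ⊆ (left, right)`: every pair uses known entities."""
    return all(a in left and b in right for a, b in relation)


def transpose(relation):
    """Swap the two columns of a binary relation."""
    return {(b, a) for a, b in relation}


def is_total(domain, relation):
    """Analogue of `total(domain, relation)`: each domain element starts a pair."""
    firsts = {a for a, _ in relation}
    return domain <= firsts


# Network manages Station: both columns hold entities of the right type.
typed_ok = is_typed(manages, networks, stations)

# For every Station, some Network manages it.
total_ok = is_total(stations, transpose(manages))

# Adding an unmanaged (hypothetical) station violates the totality check.
orphan_ok = is_total(stations | {"ORPHAN"}, transpose(manages))
```

In this sketch `typed_ok` and `total_ok` hold, while `orphan_ok` is false because the extra station has no managing network, which is the situation the second integrity constraint is meant to rule out.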
Earthquake Event Data
Earthquake data comes in the form of catalogs, which convey information about the event’s geographic position, time, and magnitude.
This data is usually presented in simple tabular forms, and although catalogs maintained by different institutions (or reviewers) usually have different schemas, they often contain equivalent information. As such, entities and attributes associated with this data can be modeled by an ORM diagram.
Again, Rel provides an easy way of parsing, importing, and forming entities, values, and relations.
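My ORM diagram for earthquake event data.
Results and Querying
Now, the seismic data can be visualized directly in the RAI Console, using Vega-Lite plotting tools.
A map showing a large selection of seismic data across the USA. The geographic positions of both seismic stations (black triangles) and earthquakes (red dots) are shown.
With the seismometer and earthquake data imported into the Relational Knowledge Graph System (RKGS), complex queries can be written in Rel.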
For example, you can filter by the magnitude of the earthquake:
def magnitude_query[value_min,value_max](event) =
    seismic_sources:has_magnitude(event, magnitude) and
    (value_min < magnitude) and (magnitude < value_max)
    from magnitude

// E.g.,
def output = magnitude_query[5.5,6.2]
Or, you could filter geographically:
def latlon_query[lat_min,lat_max,lon_min,lon_max](event) =
    seismic_sources:at_latitude(event, lat) and
    seismic_sources:at_longitude(event, lon) and
    (lat_min < lat) and (lat < lat_max) and
    (lon_min < lon) and (lon < lon_max)
    from lat, lon

// E.g.,
def output = latlon_query[30,50,-90,-70]
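For readers who do not know Rel, the two filters can be mirrored in plain Python. This is only a rough analogue under stated assumptions: the `Event` class and the in-memory `catalog` list are hypothetical stand-ins for the knowledge graph, and the two events whose IDs start with `hypo-` are invented for illustration. The first row reuses the magnitude and coordinates reported for event se60324281.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    magnitude: float
    latitude: float
    longitude: float


catalog = [
    Event("se60324281", 5.1, 36.4743333, -81.0865),  # values from the article
    Event("hypo-0001", 5.8, 40.0, -75.0),            # hypothetical
    Event("hypo-0002", 6.0, 61.0, -150.0),           # hypothetical
]


def magnitude_query(events, value_min, value_max):
    """IDs of events with value_min < magnitude < value_max (strict bounds)."""
    return {e.event_id for e in events if value_min < e.magnitude < value_max}


def latlon_query(events, lat_min, lat_max, lon_min, lon_max):
    """IDs of events strictly inside the latitude/longitude box."""
    return {
        e.event_id
        for e in events
        if lat_min < e.latitude < lat_max and lon_min < e.longitude < lon_max
    }


in_magnitude = magnitude_query(catalog, 5.5, 6.2)
in_box = latlon_query(catalog, 30, 50, -90, -70)

# Because each query returns a set of event IDs, filters combine by intersection.
both = in_magnitude & in_box
```

Because each query yields a set, combining filters is a set intersection here, which loosely corresponds to joining the two conditions with `and` in a single Rel definition.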
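Other attributes of the earthquake can be retrieved, queried here by an event’s catalog ID:
def my_event_id = ^EventID["se60324281"]

def all_attributes(attribute, value) =
    seismic_sources:event_has_id(event, my_event_id) and
    seismic_sources(attribute, event, value)
    from event, my_event_id

def output = all_attributes
This query returns the following output:
Attribute        Value
:at_longitude    -81.0865
:has_magnitude   5.1
:at_latitude     36.4743333
:at_datetime     2020-08-09T12:07:37.650Z
:event_has_id    se60324281
:at_depth        4.14
Now, with the date-time of the earthquake, you can find the seismometers that were operational at that moment and so recorded the event: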
@inline
def in_timerange[datetime,date_start,date_end] =
    (date_start < datetime) and (datetime < date_end)

def time_query[query_time](channel) =
    seis:starts_at(channel, operational_start) and
    seis:ends_at(channel, operational_end) and
    in_timerange[query_time,operational_start,operational_end]
    from operational_start, operational_end

def output = count[
    time_query[all_attributes:at_datetime]
]
This identifies and counts the seismometer channels that were operating when the earthquake occurred, and that would therefore have recorded valuable ground motion data, which allows for further scientific inquiry. Eventually, this information could be used quantitatively in hazard assessments, and inform future policy decisions.
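The same interval test can be sketched in Python to show the logic outside Rel. The assumptions are mine: channels are held in a plain dictionary keyed by (network, station, location, channel) codes, timestamps are treated as UTC with any trailing `Z` dropped, and every channel has an end date. The first entry reuses the AMKA/BHE operational dates from the station metadata and the event time is the one reported for se60324281; the `XX` network and its stations are hypothetical.

```python
from datetime import datetime


def parse_time(text):
    """Parse timestamps such as '2020-08-09T12:07:37.650Z' as naive UTC times."""
    return datetime.strptime(text.rstrip("Z"), "%Y-%m-%dT%H:%M:%S.%f")


# (network, station, location, channel) -> (start, end) of operation.
channels = {
    ("AV", "AMKA", "00", "BHE"): ("2005-10-14T00:00:00.0000", "2018-08-21T00:00:00.0000"),
    ("XX", "STA1", "00", "BHZ"): ("2019-01-01T00:00:00.0000", "2022-01-01T00:00:00.0000"),  # hypothetical
    ("XX", "STA2", "00", "BHZ"): ("2020-09-01T00:00:00.0000", "2023-01-01T00:00:00.0000"),  # hypothetical
}


def in_timerange(moment, start, end):
    """True when start < moment < end, matching the strict bounds in the Rel query."""
    return start < moment < end


def time_query(query_time, channel_table):
    """Channels whose operational window contains query_time."""
    return [
        key
        for key, (start, end) in channel_table.items()
        if in_timerange(query_time, parse_time(start), parse_time(end))
    ]


event_time = parse_time("2020-08-09T12:07:37.650Z")
active = time_query(event_time, channels)
active_count = len(active)
```

With these toy intervals only the hypothetical `STA1` channel is active: the AMKA/BHE channel ended in 2018, before the 2020 event, and `STA2` started afterwards. Channels that are still running and have no end date would need separate handling, which this sketch does not attempt.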
Using Rel and the RKGS
Rel and the RKGS provide an excellent tool for investigating and analyzing seismic data.
This project illustrates an example of working with data that is distributed geographically and temporally. Rel was able to load, process, and query datasets of up to 500 000 seismic events, and many thousands of seismometers, in short lengths of time.
The RKGS also allowed for the integration of multiple data sources into a single, robust framework. Integrity constraints written in Rel enabled the application of the domain-specific logic for entity uniqueness.
Finally, Rel provided a flexible querying language for exploring the data. Complex and creative queries were possible, without having to worry about the underlying implementation.
Overall, Rel could be a useful tool for seismologists in industry and research. This work will be presented at the American Geophysical Union Fall Meeting this December. Come and stop by if you want to learn more!