Title: | Matching Event Data by Location, Time and Type |
---|---|
Description: | Framework for merging and disambiguating event data based on spatiotemporal co-occurrence and secondary event characteristics. It can account for intrinsic "fuzziness" in the coding of events, varying event taxonomies and different geo-precision codes. |
Authors: | Karsten Donnay and Eric Dunford |
Maintainer: | Karsten Donnay <[email protected]> |
License: | LGPL-3 |
Version: | 0.4.3 |
Built: | 2025-02-21 04:45:55 UTC |
Source: | https://github.com/kdonnay/meltt |
meltt is a framework for merging and disambiguating event data based on spatiotemporal co-occurrence and secondary event characteristics. It can account for intrinsic "fuzziness" in the coding of events, varying event taxonomies and different geo-precision codes.
The meltt function iteratively matches multiple datasets by isolating proximate events based on a user-specified spatio-temporal window to determine co-occurrence. It then assesses potential matches by leveraging secondary event characteristics formalized as user-specified input taxonomies.
Karsten Donnay and Eric Dunford
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
meltt, meltt_data, meltt_duplicates, meltt_inspect, tplot, mplot
data(crashMD)
output = meltt(crash_data1, crash_data2, crash_data3,
               taxonomies = crash_taxonomies, twindow = 1, spatwindow = 3)
plot(output)
tplot(output, time_unit = 'days')
This artificial dataset illustrates how meltt can be used to automatically integrate and disambiguate event data. It contains timing and location information about (simulated) car crashes for one month (Jan. 2012) in the state of Maryland, U.S., with information about the model, color, and type of accident.
data(crashMD)
A data.frame
containing observations.
Karsten Donnay and Eric Dunford.
Simulated data.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
This artificial dataset illustrates how meltt can be used to automatically integrate and disambiguate event data. It contains timing and location information about (simulated) car crashes for one month (Jan. 2012) in the state of Maryland, U.S., with information about the model, color, and type of accident.
data(crashMD)
A data.frame
containing observations.
Karsten Donnay and Eric Dunford.
Simulated data.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
This artificial dataset illustrates how meltt can be used to automatically integrate and disambiguate event data. It contains timing and location information about (simulated) car crashes for one month (Jan. 2012) in the state of Maryland, U.S., with information about the model, color, and type of accident.
data(crashMD)
A data.frame
containing observations.
Karsten Donnay and Eric Dunford.
Simulated data.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
These taxonomies formalize how the information about model, color, and type of accident in our three artificial car crash datasets map onto one another.
data(crashMD)
A list of three data.frame objects containing information about the different categories (specific to general) of models, colors, and degrees of damage coded in each dataset.
Karsten Donnay and Eric Dunford.
Simulated data.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
Returns a logical value indicating whether an object is of class meltt.
is.meltt(object)
object |
object to be tested. |
is.meltt returns TRUE or FALSE depending on whether its argument is of class meltt.
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
data(crashMD)
output = meltt(crash_data1, crash_data2, crash_data3,
               taxonomies = crash_taxonomies, twindow = 1, spatwindow = 3)
is.meltt(output)
meltt
merges and disambiguates event data based on spatiotemporal co-occurrence and secondary event characteristics. It can account for intrinsic "fuzziness" in the coding of events through the incorporation of user-specified taxonomies and adjusts for different degrees of geospatial and temporal precision by allowing for the specification of spatiotemporal "windows".
meltt(...,taxonomies, twindow, spatwindow, smartmatch = TRUE, certainty = NA, partial = 0, averaging = FALSE, weight = NA, silent = FALSE)
... |
input datasets. See Details. |
taxonomies |
list of user-specified taxonomies. Each taxonomy maps onto the variable in the input data that has the same name as the taxonomy. See Details. |
twindow |
specification of the temporal window in days. See Details. |
spatwindow |
specification of a spatial window in kilometers. See Details. |
smartmatch |
implement matching using all available taxonomy levels. When false, matching will occur only on a specified taxonomy level. Default = TRUE. See Details. |
certainty |
specification of the exact taxonomy level to match on when |
partial |
specifies whether matches along fewer than the full set of taxonomy dimensions are permitted. Default = 0. See Details. |
averaging |
average all values that events are matched on when matching across multiple data frames. Default = FALSE. See Details. |
weight |
user-specified weights for each taxonomy, increasing or decreasing the importance of each taxonomy's contribution to the matching score. Default = NA. See Details. |
silent |
Boolean specifying whether or not messages are displayed. Default = FALSE. |
meltt expects input datasets to be of class data.frame. At a minimum, each dataset must have the columns "date" (formatted as "YYYY-mm-dd" or "YYYY-mm-dd hh:mm:ss"), "longitude" and "latitude" (both in degrees; we assume global coordinates formatted in WGS-84), and the columns representing the dimensions used in the matching taxonomies. Note that meltt requires at least two datasets as input and can otherwise, in principle, handle any number of datasets.
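The minimal column requirements can be checked before calling meltt. The following is a sketch only: check_meltt_input is a hypothetical helper, not a function of the package, and the example coordinates are invented.

```r
# Hypothetical helper (not part of meltt): verifies the minimal columns
# described above before a data.frame is passed to meltt().
check_meltt_input <- function(df) {
  required <- c("date", "longitude", "latitude")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Dates must start with "YYYY-mm-dd"; a time component is ignored here
  parsed <- as.Date(substr(as.character(df$date), 1, 10), format = "%Y-%m-%d")
  if (anyNA(parsed)) stop("unparseable dates found")
  # Longitude and latitude are expected in degrees (WGS-84)
  if (any(abs(df$longitude) > 180) || any(abs(df$latitude) > 90)) {
    stop("coordinates outside the valid range for degrees")
  }
  invisible(TRUE)
}

# Invented example rows
events <- data.frame(
  date = c("2012-01-03", "2012-01-04 13:30:00"),
  longitude = c(-76.61, -77.03),
  latitude = c(39.29, 38.99)
)
check_meltt_input(events)
```

The taxonomy columns are not checked here because their names depend on the taxonomies supplied.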
The taxonomies input is expected to be of class list, containing one or more taxonomy data frames. Each taxonomy must have a column denoting the "base.category" (i.e. the version of the variable that appears in each data frame) and a "data.source" column that matches the object name of the dataset containing those variables. All subsequent columns in each taxonomy denote the user-specified levels of generalization, which capture the degree to which the taxonomy category generalizes out. The leftmost of these columns must contain the most granular level and the rightmost the broadest. An error is issued if taxonomy levels are not in the correct order.
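A taxonomy with this layout might look as follows. This is an illustrative sketch: the variable name "color", the dataset names, the category labels and the level column names are all invented, and the ordering check is a simple heuristic, not the check meltt performs.

```r
# Hypothetical taxonomy for a variable named "color"
color_taxonomy <- data.frame(
  data.source   = c("data1", "data1", "data1", "data2", "data2"),
  base.category = c("navy", "crimson", "sky blue", "dark blue", "red"),
  level_1       = c("dark blue", "red", "light blue", "dark blue", "red"),
  level_2       = c("blue", "red", "blue", "blue", "red"),
  stringsAsFactors = FALSE
)

# The list element is named after the variable it generalizes
taxonomies <- list(color = color_taxonomy)

# Heuristic: broader levels should not have more distinct categories
level_cols <- setdiff(names(color_taxonomy), c("data.source", "base.category"))
n_categories <- sapply(color_taxonomy[level_cols], function(x) length(unique(x)))
ordered_ok <- all(diff(n_categories) <= 0)
```

Here level_1 has three distinct categories and level_2 has two, so the levels run from granular to broad.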
The twindow
and spatwindow
inputs specify the temporal and spatial dimensions for which entries are considered to be spatio-temporally proximate, and with that, potential matches (i.e. duplicate entries). For all potential matches, meltt
then leverages the secondary information about events (formalized through the mapping of categories specified in taxonomies
) to identify the most likely matches.
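The idea of spatio-temporal proximity can be sketched in a few lines. The text does not state how meltt computes distances or whether the window bounds are inclusive, so the haversine formula and the `<=` comparisons below are assumptions; haversine_km and is_proximate are hypothetical helpers with invented coordinates.

```r
# Great-circle distance in kilometers (haversine, mean Earth radius 6371 km)
haversine_km <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# Two events are proximate if they fall within both windows
is_proximate <- function(e1, e2, twindow, spatwindow) {
  dt <- abs(as.numeric(as.Date(e1$date) - as.Date(e2$date)))  # days
  dx <- haversine_km(e1$longitude, e1$latitude, e2$longitude, e2$latitude)
  dt <= twindow && dx <= spatwindow
}

a     <- list(date = "2012-01-03", longitude = -76.61, latitude = 39.29)
b     <- list(date = "2012-01-04", longitude = -76.62, latitude = 39.30)
c_far <- list(date = "2012-01-04", longitude = -77.03, latitude = 38.99)

near <- is_proximate(a, b, twindow = 1, spatwindow = 3)
far  <- is_proximate(a, c_far, twindow = 1, spatwindow = 3)
```

Events a and b lie roughly 1.4 km and one day apart, so they are candidate matches; c_far is tens of kilometers away and is not.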
By default, meltt uses smartmatch, which leverages all taxonomy levels: agreement on any taxonomy level counts, while agreement at coarser levels is discounted using a matching score. When smartmatch is set to FALSE, certainty must be set, specifying which taxonomy level (i.e., 1 for the base level of the taxonomy, 2 for the next broader level, etc.) two events must agree on to be considered a match.
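The difference between the two modes can be illustrated with a toy comparison. This sketch shows only where two events agree; the matching score itself is not specified here and is not reproduced. finest_agreement and agrees_at are hypothetical helpers and the categories are invented.

```r
# Categories of two events, from most granular (level 1) to broadest
event_a <- c(level_1 = "dark blue",  level_2 = "blue", level_3 = "cool color")
event_b <- c(level_1 = "light blue", level_2 = "blue", level_3 = "cool color")

# smartmatch-like view: the finest level at which the events agree
finest_agreement <- function(x, y) {
  agree <- which(x == y)
  if (length(agree) == 0) NA_integer_ else min(agree)
}

# smartmatch = FALSE view: agreement is required at one fixed level
agrees_at <- function(x, y, certainty) unname(x[certainty] == y[certainty])

finest  <- finest_agreement(event_a, event_b)
at_base <- agrees_at(event_a, event_b, certainty = 1)
at_two  <- agrees_at(event_a, event_b, certainty = 2)
```

With certainty = 1 these two events would not match, while certainty = 2 accepts them.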
partial specifies the number of dimensions along which no matching information is permitted for events to still be considered a potential match. In this case, every dimension not matched is assigned the worst matching score in the calculation of the overall fit. By default, all dimensions are considered, i.e. partial = 0.
averaging allows users to take the average of all input information (date, longitude, latitude, taxonomy, etc.) when merging more than one dataset. When set to FALSE, events use the input information of the first (leftmost) dataset, in the order the data were entered.
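The effect of averaging on date and coordinates can be sketched for one matched pair. The values are invented, and how meltt averages taxonomy information is not described here, so only date, longitude and latitude are shown.

```r
# One event reported by two datasets (invented values)
matched <- data.frame(
  dataset   = c("data1", "data2"),
  date      = as.Date(c("2012-01-03", "2012-01-05")),
  longitude = c(-76.61, -76.63),
  latitude  = c(39.29, 39.31)
)

# averaging = FALSE: keep the entry from the first (leftmost) dataset
kept_first <- matched[1, c("date", "longitude", "latitude")]

# averaging = TRUE: average the information across the matched entries
averaged <- data.frame(
  date      = mean(matched$date),
  longitude = mean(matched$longitude),
  latitude  = mean(matched$latitude)
)
```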
weight allows users to weight matches for different taxonomies in order to discount one (or several) event dimensions relative to others, or vice versa. If weight = NA, the package assumes homogeneous weights of 1. If weights are specified manually, they must sum to the total number of taxonomy dimensions used, i.e., the normalized overall weight always has to be 1. If not, the package returns an error.
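The constraint on weights can be checked up front. validate_weights below is a hypothetical helper that mirrors the rule as stated; it is not the package's own check.

```r
# Hypothetical helper: applies the stated rule that weights default to 1
# and must otherwise sum to the number of taxonomy dimensions.
validate_weights <- function(weight, n_taxonomies) {
  if (length(weight) == 1 && is.na(weight)) {
    return(rep(1, n_taxonomies))  # homogeneous weights
  }
  if (length(weight) != n_taxonomies) {
    stop("one weight per taxonomy is required")
  }
  if (!isTRUE(all.equal(sum(weight), n_taxonomies))) {
    stop("weights must sum to the number of taxonomies")
  }
  weight
}

default_w <- validate_weights(NA, 3)
custom_w  <- validate_weights(c(1.5, 1, 0.5), 3)
```

For three taxonomies, c(1.5, 1, 0.5) is accepted because it sums to 3, whereas c(2, 2, 2) would be rejected.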
Returns an object of class "meltt".
The functions summary, print, and plot overload the standard outputs for objects of class meltt, providing summary information and visualizations specific to the output object. The generic accessor functions meltt_data, meltt_duplicates, tplot, and mplot extract various useful features of the integrated data frame: the unique de-duplicated entries, all duplicate entries (or matches), a histogram of the temporal distribution, and a map of the integrated output.
An object of class "meltt" is a list containing at least the following components. First, a list named "processed" that contains all outputs of the integration process:
complete_index |
a |
deduplicated_index |
a posterior |
event_matched |
Numeric matrix containing indices for each matching event from each input dataset. The leading dataset is the furthest left; every matching event to its right is identified as a duplicate of the initial entry and is removed. |
event_contenders |
Numeric matrix containing indices for each "runner up" event from each input dataset that was identified as a potential but less optimal match based on its matching score. |
episode_matched |
Numeric matrix containing indices for each matching "episode" (i.e. an event that spans more than one time unit, with a start and an end date) from each input dataset. Only contains matches between episodes. Matches between events and episodes must be manually reviewed by users (see |
episode_contenders |
Numeric matrix containing indices for each "runner up" episode from each input dataset that was identified as a potential but less optimal match based on its matching score. |
Second, it contains a comprehensive summary of the input data, parameters and taxonomy specifications. Specifically it returns:
inputData |
List containing the original object name and information of the input data prior to integration. |
parameters |
List containing information on all input parameters on which the data was integrated. |
inputDataNames |
Vector of the object names of the input datasets. These names are carried through the integration process to differentiate between input datasets. The index keys contained in the numeric matrix representations of the data follow the order the data was entered. |
taxonomy |
List containing the taxonomy (secondary assumption criteria) datasets used to integrate the input data. The list contains: the names of the taxonomies (which must match the names of the variables they seek to generalize in the input data), an integer of the number of input taxonomies, a vector containing information on the depth (i.e. the number of columns) of each taxonomy, and a list of the original input taxonomies. |
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
meltt_data
, meltt_duplicates
, meltt_inspect
, tplot
, mplot
data(crashMD)
output = meltt(crash_data1, crash_data2, crash_data3,
               taxonomies = crash_taxonomies, twindow = 1, spatwindow = 3)
plot(output)
# Extract de-duplicated events
dataset = meltt_data(output)
head(dataset)
meltt_data
returns all unique, de-duplicated entries across all input datasets. Function provides a dataset where all overlapping, duplicate entries are removed, offering a version of the input data with no redundancies.
meltt_data(object, columns = NULL, return_all = FALSE)
object |
object of class |
columns |
string vector referencing column names located in the input data. Default is to return all location, time stamp, and taxonomy columns the data was evaluated on. |
return_all |
Boolean specifying whether all columns in any of the original data should be returned. |
meltt_data
returns all unique entries along with the specified columns. The function allows for easy extraction of all de-duplicated entries.
Returns a data.frame where the first columns contain the name of the original input data object from which each entry was drawn and a unique event ID. The subsequent columns are all columns specified in the columns argument, or the location, time stamp, and taxonomy columns the data was evaluated on if columns = NULL.
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
meltt
, meltt_duplicates
, meltt_inspect
data(crashMD)
output = meltt(crash_data1, crash_data2, crash_data3,
               taxonomies = crash_taxonomies, twindow = 1, spatwindow = 3)
dataset = meltt_data(output, columns = c("date", "longitude", "latitude"))
head(dataset)
# Return all original columns
dataset = meltt_data(output, return_all = TRUE)
meltt_duplicates
returns all entries identified as matches during the integration process.
meltt_duplicates(object, columns = NULL)
object |
object of class |
columns |
string vector referencing column names located in the input data. Default is to return all columns contained in the input data. |
meltt_duplicates
returns all duplicated entries along with the specified columns to compare which entries matched. The function allows for easy extraction of all entries identified as duplicates.
Returns a data.frame where the first columns contain an index for the data.source and event for each data frame. The subsequent columns are all columns specified in the columns argument, or all columns contained in the original input data if columns = NULL.
An "event_type" column is added to the output data.frame specifying whether the match was between events or episodes. See meltt_inspect for handling flagged event-to-episode matches.
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
meltt
, meltt_data
, meltt_inspect
data(crashMD)
output = meltt(crash_data1, crash_data2, crash_data3,
               taxonomies = crash_taxonomies, twindow = 1, spatwindow = 3)
duplicates = meltt_duplicates(output, columns = c("date", "longitude", "latitude"))
head(duplicates)
meltt_inspect returns all episode entries that were flagged as matching an event. The function provides a list containing each flagged event and episode to ease comparison and assessment. All flagged entries should be manually reviewed to determine the validity of the match.
If a flagged event-to-episode pair is determined to be a match, the duplicate can be removed by providing a Boolean vector to the confirmed_matches argument. All TRUE episodes will be removed as duplicates, retaining only the event entry.
meltt_inspect(object, columns = NULL, confirmed_matches = NULL)
object |
object of class |
columns |
string vector referencing column names located in the input data. Default is to return all location, time stamp, and taxonomy columns the data was evaluated on. |
confirmed_matches |
boolean vector specifying entries to be removed from deduplicated set. Function returns a |
meltt_inspect returns all episode entries that were flagged as matching an event. The function provides a list containing each flagged event and episode for easy comparison. Matching event-to-episode pairs can be cleaned by specifying a Boolean vector where TRUE identifies that entry as a duplicate and FALSE otherwise.
Returns a list object where each entry in the list contains information on the event and the flagged episode for manual assessment of the match. The information by which the entries are evaluated is specified by the columns argument. If columns = NULL, location, time stamp, and taxonomy information is reported.
Events and episodes confirmed as duplicate entries can be removed by providing a boolean vector to the confirmed_matches
argument. A data.frame
of unique entries (similar to the output of meltt_data
) will be returned.
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
meltt
, meltt_data
, meltt_duplicates
data(crashMD)
output = meltt(crash_data1, crash_data2, crash_data3,
               taxonomies = crash_taxonomies, twindow = 1, spatwindow = 3)
flagged = meltt_inspect(output)
flagged
retain = c(TRUE, TRUE, TRUE, TRUE, TRUE)
dataset = meltt_inspect(output, confirmed_matches = retain)
head(dataset)
Function to efficiently sample a subset of integrated data to generate performance statistics.
meltt_validate(object, description.vars = NULL, sample_prop = .1, within_window = TRUE, spatial_window = NULL, temporal_window = NULL, reset = FALSE)
object |
object of class |
description.vars |
String vector referencing column names located in the input data. These are the variables that will be folded into the description being validated; if none are provided, taxonomy levels are used by default. |
sample_prop |
Proportion of the total matched pairs to sample. The size of this sample is then used to determine the size of the control group (i.e. entries not flagged as matches, which are paired with other unique entries and matches; these pairs should not be matches). For example, if |
within_window |
Use the same spatio-temporal window used in the initial data integration to calculate what counts as a "proximate event" for all entries in the control group. |
spatial_window |
If |
temporal_window |
If |
reset |
If TRUE, the validation step will be reset and a new validation sample frame will be produced. Default = FALSE. |
meltt_validate
offers an efficient method of assessing the performance of meltt for a specific integration, by randomly sampling from a proportion of pairs of matching events flagged by the algorithm as the same event, and then sampling a "control group" of equal size from events that were identified as unique (offering both unique-unique and unique-match pairs). The function compiles the samples and then generates a shiny app to ease assessment. Once all entries in the sample have been assessed, the function then returns accuracy statistics in terms of a confusion matrix. Performance is determined by the difference in the qualitative assessment in comparison to the meltt integration.
Function automatically overwrites input "meltt" object; if validation set has been completely reviewed, then the function prints the performance statistics.
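The kind of confusion matrix described above can be sketched in base R. The two vectors are invented review outcomes, and the layout is illustrative only; it is not the report that meltt_validate prints.

```r
# Invented outcomes for six sampled pairs
meltt_says_match <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
human_says_match <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

# Cross-tabulate the algorithm's decision against the manual coding
confusion <- table(
  meltt = factor(meltt_says_match, levels = c(TRUE, FALSE)),
  human = factor(human_says_match, levels = c(TRUE, FALSE))
)

# Share of pairs on which the algorithm and the reviewer agree
accuracy <- sum(diag(confusion)) / sum(confusion)
```

The off-diagonal cells count pairs that meltt matched but the reviewer rejected, and pairs the reviewer considered matches that meltt left unmatched.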
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
data(crashMD)
output <- meltt(crash_data1, crash_data2, crash_data3,
                taxonomies = crash_taxonomies, twindow = 1, spatwindow = 3)
## Not run:
# app will activate to validate sample.
meltt_validate(output)
# for smaller sample, must reset to overwrite existing validation sample
meltt_validate(output, sample_prop = .1, reset = TRUE)
# override of the validation to get a sense of the report
output$validation$validation_set$coding = 1
meltt_validate(output)
## End(Not run)
Auxiliary function used within meltt
to index events as multiple datasets are processed.
meltt.disambiguate(data, match_output, indexing, priormatches, averaging)
data |
data to be disambiguated passed as |
match_output |
data.frame object of identified matches passed from |
indexing |
data.frame object passed from |
priormatches |
prior matches (if any) passed as |
averaging |
specification if common information among matches should be averaged. Passed from |
Auxiliary function used within meltt
to index events as multiple datasets are processed. Function keeps track of matching and non-matching events as each subsequent data.frame is processed. Using the identified matches from the meltt.match
output, meltt.disambiguate
merges the matches in the data and indexes the match. indexing
is used for correct labeling of matches in case more than two datasets are merged. averaging
averages the common information between matching events. The parameter is specified within the main function meltt
.
meltt.disambiguate
returns a list containing two objects: a data frame with all located matches paired and a new index, specifying the data frame as a single frame, and a running index of all matched events.
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
Auxiliary function that receives the compilation matrix and systematically subsets events and episodes to deal with differences in event duration. The function passes subsets to meltt.match
to be processed. Output includes a full list of matching events and/or episodes.
meltt.episodal(data, indexing, priormatches, twindow, spatwindow, smartmatch, certainty, k, secondary, partial, averaging, weight, silent)
data |
object of class data.frame. |
indexing |
list of indices given the entry location of events and episodes in the original input data. |
priormatches |
prior matches (if any) passed as |
twindow |
specification of the temporal window in days. |
spatwindow |
specification of a spatial window in kilometers. |
smartmatch |
implement matching using all available taxonomy levels. When false, matching will occur only on a specified taxonomy level. Default = TRUE. |
certainty |
specification of the exact taxonomy level to match on when |
k |
number of taxonomies passed from |
secondary |
vector of the number of taxonomy levels for each taxonomy passed from |
partial |
specifies whether matches along fewer than the full set of taxonomy dimensions are permitted. Passed from |
averaging |
average all values that events are matched on when matching across multiple data frames. Default = FALSE. |
weight |
relative weight of each taxonomy in the calculation of the matching score. Passed from |
silent |
Boolean specifying whether or not messages are displayed. Passed from |
Internal function that helps manage the integration of event and episodal data by easing the transition between the two logics. The meltt algorithm tracks event-to-event, episode-to-episode, and event-to-episode matches. meltt.episodal streamlines the transfer between these matching states.
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
Auxiliary function that provides an R wrapper around the main Python function used to process the numerical matrix generated in meltt. Returns a summary of matched entries.
meltt.match(data, twindow, spatwindow, smartmatch, certainty, k, secondary, partial, weight, episodal, silent)
data |
numerical matrix passed from |
twindow |
specification of the temporal window in days passed from |
spatwindow |
specification of a spatial window in kilometers passed from |
smartmatch |
implement matching using all available taxonomy levels. When FALSE, matching will occur only on a specified taxonomy level passed from |
certainty |
specification of the exact taxonomy level to match on when |
k |
number of taxonomies passed from |
secondary |
vector of the number of taxonomy levels for each taxonomy passed from |
partial |
specifies whether matches along fewer than the full set of taxonomy dimensions are permitted. Passed from |
weight |
relative weight of each taxonomy in the calculation of the matching score. Passed from |
episodal |
Boolean specifying whether normal or episodal matches are performed. Automatically determined and passed from |
silent |
Boolean specifying whether or not messages are displayed. Passed from |
Main auxiliary wrapper function that passes the processed data matrix from meltt to the Python code used to manage the matching procedure.
Returns a list object containing output of matching entries and a matrix of optimal selected matches.
meltt.match
requires the Python package NumPy to run. The package automatically checks whether NumPy is installed at runtime and returns an error if it is not.
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
Auxiliary function that maps secondary taxonomies onto the input data and transforms the taxonomies into a numerical matrix.
meltt.taxonomy(data, taxonomies)
data |
object of class data.frame. |
taxonomies |
object of class list, containing data.frames of input taxonomies for secondary matching criteria. |
meltt.taxonomy
maps the user-created taxonomies onto the input data, and converts the taxonomy to a numerical matrix. The taxonomies are used as secondary criteria in the matching procedure.
Returns a numerical matrix that contains all data indices, date/enddate, longitude/latitude, and taxonomies.
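The general idea of mapping categories onto numeric codes can be sketched as follows. The encoding scheme meltt.taxonomy actually uses is not described here, so the integer codes, the taxonomy and the events below are all illustrative assumptions.

```r
# Invented taxonomy and events for a variable named "color"
taxonomy <- data.frame(
  base.category = c("navy", "crimson"),
  level_1       = c("blue", "red"),
  stringsAsFactors = FALSE
)
events <- data.frame(color = c("crimson", "navy", "navy"),
                     stringsAsFactors = FALSE)

# Look up each event's row in the taxonomy
rows <- match(events$color, taxonomy$base.category)

# Encode every taxonomy level as an integer code, one column per level
level_cols <- c("base.category", "level_1")
encoded <- sapply(level_cols, function(col) {
  match(taxonomy[rows, col], unique(taxonomy[[col]]))
})
```

Events that share a code in a given column agree at that taxonomy level, which is what makes a numeric representation convenient for matching.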
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
mplot provides an interactive JavaScript map to plot the spatial distribution of duplicate and unique entries in the integrated data.
mplot(object, matching = FALSE, jitter=.0001)
object |
object of class |
matching |
if TRUE, plot only matching entries (i.e. duplicates and matches), else plot unique and matching entries. Default = |
jitter |
Numeric value to randomly offset longitude and latitude of points for plotting. Useful when points overlap. Default is a small jitter of .0001 degrees. |
mplot generates a spatial map using JavaScript via the Leaflet package. The map identifies unique and duplicative entries (i.e. entries with "matches"). The function provides a concise summary of the integration output across the spatial bounds of the geo-referenced input data. The plot renders in the user's viewer pane (if using RStudio) or in the browser. Images of the map can be saved using the export button.
Returns a JavaScript map, which renders in the user's viewer pane, of all unique event locations (or only duplicate and matching entries if the matching argument is TRUE). Unique events are shown as orange circles, matching entries as blue circles, and duplicate entries as green circles.
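The purpose of the jitter argument can be illustrated with overlapping points. How mplot draws its random offsets is not stated, so the uniform noise below is an assumption and the coordinates are invented.

```r
set.seed(42)  # reproducible offsets for this illustration

jitter_amount <- 1e-4  # degrees, the default value of the jitter argument

# Three events reported at exactly the same location
lon <- rep(-76.61, 3)
lat <- rep(39.29, 3)

# Assumed scheme: uniform offsets within +/- jitter_amount
lon_j <- lon + runif(length(lon), -jitter_amount, jitter_amount)
lat_j <- lat + runif(length(lat), -jitter_amount, jitter_amount)
```

After jittering, the three points no longer share identical coordinates, so each marker can be seen and selected on the map.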
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
data(crashMD)
output = meltt(crash_data1, crash_data2, crash_data3,
               taxonomies = crash_taxonomies, twindow = 1, spatwindow = 3)
mplot(output)
Overloads the default plot()
for objects of class meltt
.
## S3 method for class 'meltt'
plot(x, ...)
x |
object of class |
... |
further arguments passed to or from other methods. |
Returns a bar plot outlining the proportion of events that are unique and duplicates from an object of class meltt
.
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
data(crashMD)
output = meltt(crash_data1, crash_data2, crash_data3,
               taxonomies = crash_taxonomies, twindow = 1, spatwindow = 3)
plot(output)
Overloads the default print()
for objects of class meltt
.
## S3 method for class 'meltt'
print(x, ...)
x |
object of class |
... |
further arguments passed to or from other methods. |
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
Overloads the default summary()
for objects of class meltt
.
## S3 method for class 'meltt'
summary(object, ...)
object |
object of class |
... |
further arguments passed to or from other methods. |
Prints a number of summary statistics regarding inputs (datasets, spatial and temporal windows, taxonomies) and observations (unique, matching, duplicate entries removed). It also prints and returns a data.frame
summarizing the overlap among datasets, i.e., how many entries in any one dataset match up to entries in one or more of the other.
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
tplot provides a histogram of integrated data that plots the temporal distribution of duplicate and unique entries.
tplot(object, time_unit = "months", free_scale = TRUE)
object |
object of class |
time_unit |
character string specifying the temporal bin: "days", "weeks", "months", or "years". |
free_scale |
Boolean specifying whether duplicates should be presented on a different scale from unique entries. A free scale makes it easier to assess the number of duplicate entries and from which input data they emerge, given that there can be relatively few at times. |
tplot
generates a temporal histogram that identifies unique entries after duplicates are removed and a reverse temporal histogram charting the distribution of duplicate entries. The function provides a concise summary of the integration output across the input time period presented in a relevant unit.
Returns a histogram plot where the y-axis is a frequency capturing the total number of events for that time period, and the x-axis is time.
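The counts behind such a temporal histogram can be approximated in base R. This is a sketch of the binning idea only, using invented dates and duplicate flags; it does not reproduce tplot's plotting code.

```r
# Invented event dates and duplicate flags
dates <- as.Date(c("2012-01-01", "2012-01-01", "2012-01-02",
                   "2012-01-09", "2012-01-31"))
is_duplicate <- c(FALSE, TRUE, FALSE, FALSE, TRUE)

# Bin events by week, comparable to time_unit = "weeks"
bins <- cut(dates, breaks = "week")

# Frequency of unique and duplicate entries per bin
counts <- table(bin = bins, duplicate = is_duplicate)
```

Each row of counts corresponds to one bar position on the time axis, with separate tallies for unique and duplicate entries.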
Karsten Donnay and Eric Dunford.
Karsten Donnay, Eric T. Dunford, Erin C. McGrath, David Backer, David E. Cunningham. (2018). "Integrating Conflict Event Data." Journal of Conflict Resolution.
data(crashMD)
output = meltt(crash_data1, crash_data2, crash_data3,
               taxonomies = crash_taxonomies, twindow = 1, spatwindow = 3)
# Free scale
tplot(output, time_unit = "days")
# Relative scale
tplot(output, time_unit = "days", free_scale = FALSE)