--- title: "An introduction to geoGraph" author: "Thibaut Jombart and Andrea Manica" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{geoGraph} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, echo=FALSE} # knitr::opts_chunk$set(fig.width = 7, fig.height = 6) options(digits = 4) ``` *geoGraph*: walking through the geographic space using graphs. ================================================= This document describes the *geoGraph* package for the R software. *geoGraph* aims at implementing graph approaches for geographic data. In *geoGraph*, a given geographic area is modelled by a fine regular grid, where each vertice has a set of spatial coordinates and a set of attributes, which can be for instance habitat descriptors, or the presence/abundance of a given species. 'Travelling' within the geographic area can then be easily modelled as moving between connected vertices. The cost of moving from one vertex to another can be defined according to attribute values, which allows for instance to define friction routes based on habitat. *geoGraph* harnesses the full power of graph algorithms implemented in R by the *graph* and *RBGL* (R Boost Graph Library) packages. In particular, RBGL is an interface between R and the comprehensive *Boost Graph Library* in C++, which provides fast and efficient implementations of a wide range of graph algorithms. Once we have defined frictions for an entire geographic area, we can easily, for instance, find the least costs path from one location to another, or find the most parsimonious way of connecting a set of locations. Interfacing spatial data and graphs can be a complicated task. The purpose of *geoGraph* is to provide tools to achieve and simplify this 'preliminary' step. This is achieved by defining new classes of objects which are essentially geo-referenced graphs with node attributes (`gGraph` objects), and interfaced spatial data (`gData` objects). In this vignette, we show how to install *geoGraph*, construct and handle `gGraph`/`gData` objects, and illustrate some basic features of graph algorithms. # First steps ## Installing the package All the following instructions should be entered from a new R session to avoid errors due to installing attached packages. *devtools* is also needed to install *geoGraph*: ```{r eval = FALSE} install.packages("devtools") ``` Then, to install *geoGraph*, simply type: ```{r eval = FALSE} library(devtools) install_github("thibautjombart/geoGraph") ``` Once installed, the package can be loaded using: ```{r load, message = FALSE} library("geoGraph") ``` If you have an error regarding missing packages, you may need to install manually the packages *graph* and *RBGL* from *Bioconductor*: ```{r eval = FALSE} install.packages("BiocManager") BiocManager::install(c("graph", "RBGL")) ``` And then attempt to reinstall *geoGraph* from GitHub. ## Data representation Data representation refers to the way a given type of data is handled by a computer program. Two types of objects are used in *geoGraph*: `gGraph`, and `gData` objects. Both objects are defined as formal (S4) classes and often have methods for similar generic function (e.g. `getNodes` is defined for both objects). Essentially, `gGraph` objects contain underlying layers of informations, including a spatial grid and possibly node attributes, and covering the area of interest. `gData` are sets of locations (like sampled sites, for instance) which have been interfaced to a `gGraph` object, to allow further manipulations such as finding paths on the grid between pairs of locations. ### gGraph objects The definition of the formal class `gGraph` can be obtained using: ```{r } getClass("gGraph") ``` and a new empty object can be obtained using the constructor: ```{r } new("gGraph") ``` The documentation `?gGraph` explains the basics about the object's content. In a nutshell, these objects are spatial grids with nodes and segments connecting neighbouring nodes, and additional informations on the nodes or on the graph itself. `coords` is a matrix of longitudes and latitudes of the nodes. `nodes.attr` is a data.frame storing attributes of the nodes, such as habitat descriptors; each row corresponds to a node of the grid, while each column corresponds to a variable. `meta` is a list containing miscellanous informations about the graph itself. There is no contraint applying to the components of the list, but some typical components such as `$costs` or `$colors` will be recognised by certain functions. For instance, you can specify plotting rules for representing a given node attribute by a given color by defining a component `$colors`. Similarly, you can associate costs to a given node attribute by defining a component `$costs`. An example of this can be found in already existing `gGraph` objects. For instance, `worldgraph.10k` is a graph of the world with approximately 10,000 nodes, and only on-land connectivity (\textit{i.e.` no travelling on the seas). ```{r } worldgraph.10k worldgraph.10k@meta ``` Lastly, the `graph` component is a `graphNEL` object, which is the standard class for graphs in the *graph* and *RBGL* packages. This object contains all information on the connections between nodes, and the weights (costs) of these connections. Four main `gGraph` are provided with *geoGraph*: `rawgraph.10k`, `rawgraph.40k`, `worldgraph.10k`, and `worldgraph.40k`. These datasets are available using the command `data`. The grid used in these datasets are the best geometric approximation of a regular grid for the surface of a sphere. One advantage of working with these grids is that we do not have to use a projection for geographic coordinates, which is a usual issue in regular GIS. The difference between rawgraphs and worldgraphs is that the first are entirely connected, while in the second connections occur only on land. Numbers `10k' and `40k' indicate that the grids consist of roughly 10,000 and 40,000 nodes. For illustrative purposes, we will often use the 10k grids, since they are less heavy to handle. For most large-scale applications, the 40k versions should provide sufficient resolution. New `gGraph` can be constructed using the constructor (`new(...)`), but this topic is not documented in this vignette. ### gData objects `gData` are essentially sets of locations that are interfaced with a `gGraph` object. During this operation, each location is assigned to the closest node on the grid of the `gGraph`, then allowing for travelling between locations using the grid. Then, it is for instance possible to find the shortest path between two locations through various types of habitats. Like for `gGraph`, the content of the formal class `gData` can be obtained using: ```{r } getClass("gData") ``` and a new empty object can be obtained using the constructor: ```{r } new("gData") ``` As before, the description of the content of these objects can be found in the documentation (`?gData`). `coords` is a matrix of xy (longitude/latitude) coordinates in which each row is a location. `nodes.id` is vector of characters giving the name of the vertices matching the locations; this is defined automatically when creating a new `gData`, or using the function `closestNode`. `data` is a slot storing data associated to the locations; it can be any type of object, but a data.frame should cover most requirements for storing data. Note that this object should be subsettable (i.e. the `[` operator should be defined), so that data can be subsetted when subsetting the `gData` object. Lastly, the slot `gGraph.name` contains the name of the `gGraph` object to which the `gData` has been interfaced. Contrary to `gGraph` objects, `gData` objects will frequently be constructed by the user. In the next sections, we illustrate how we can build and use `gData` objects from a set of locations. # Using *geoGraph* An overview of the material implemented in the package is summarized the package's manpage, accessible via: ```{r eval=FALSE} ?geoGraph ``` The html version of this manpage may be preferred to browse more easily the content of *geoGraph*; it is accessible by typing: ```{r eval=FALSE} help("geoGraph", package = "geoGraph", html = TRUE) ``` To revert help back to text mode, simply type: ```{r eval=FALSE} options(htmlhelp = FALSE) ``` In the following, we go through various tasks that can be achieve using *geoGraph*. ## Importing geographic data GeoGraphic data consist of a set of locations, possibly accompanied by additional information. For instance, one may want to study the migrations amongst a set of biological populations with known geographic coordinates. In *geoGraph*, geographic data are stored in `gData` objects. These objects match locations to the closest nodes on a grid (a `gGraph` object), and store additional data if needed. As a toy example, let us consider four locations: Bordeaux (France), London (UK), Malaga (Spain), and Zagreb (Croatia). Since we will be working with a crude grid (10,000 nodes), locations need not be exact. We enter the longitudes and latitudes (in this order, that is, xy coordinates) of these cities in decimal degrees, as well as approximate population sizes: ```{r cities} Bordeaux <- c(-1, 45) London <- c(0, 51) Malaga <- c(-4, 37) Zagreb <- c(16, 46) cities.dat <- rbind.data.frame(Bordeaux, London, Malaga, Zagreb) colnames(cities.dat) <- c("lon", "lat") cities.dat$pop <- c(1e6, 13e6, 5e5, 1.2e6) row.names(cities.dat) <- c("Bordeaux", "London", "Malaga", "Zagreb") cities.dat ``` We load a `gGraph` object which contains the grid that will support the data: ```{r wg10plot, fig = TRUE, fig.width = 8} worldgraph.10k plot(worldgraph.10k) ``` (we could use `worldgraph.40k` for a better resolution). In this figure, each node is represented with a color depending on the habitat type, either 'sea' (blue) or 'land' (green). We are going to interface the cities data with this grid; to do so, we create a `gData` object using `new` (see `?gData` object): ```{r citiesplot, fig=TRUE} cities <- new("gData", coords = cities.dat[, 1:2], data = cities.dat[, 3, drop = FALSE], gGraph.name = "worldgraph.10k") cities plot(cities, type = "both", reset = TRUE) plotEdges(worldgraph.10k) ``` This figure illustrates the matching of original locations (black crosses) to nodes of the grid (red circles). As we can see, an issue occured for Bordeaux, which has been assigned to a node in the sea (in blue). Locations can be re-assigned to nodes with restrictions for some node attribute values using `closestNode`; for instance, here we constrain matching nodes to have an `habitat` value (defined as node attribute in `worldgraph.10k`) equalling `land` (green points): ```{r closeNode, fig=TRUE} cities <- closestNode(cities, attr.name = "habitat", attr.value = "land") plot(cities, type = "both", reset = TRUE) plotEdges(worldgraph.10k) ``` Now, all cities have been assigned to a `land' node of the grid (again, better accuracy will be gained on 40k or finer grids - we use 10k for illustrative purposes only). Content of `cities` can be accessed via various accessors (see `?gData`). For instance, we can retrieve original locations, assigned nodes, and stored data using: ```{r } getCoords(cities) getNodes(cities) getData(cities) ``` We can also get the coordinates of the matching nodes (\textit{i.e.}, red circle on previous figure) using: ```{r } getCoords(cities, original = FALSE) ``` More interestingly, we can now retrieve all the geographic information contained in the underlying grid (\textit{i.e.}, `gGraph` object) as node attributes: ```{r } getNodesAttr(cities) ``` In this example, the information stored in `worldgraph.10k` is rather crude: `habitat` only distinguishes the land from the sea. However, more complex habitat information could be incorporated, for instance from GIS shapefiles (see dedicated section below). ## Visualizing data An essential aspect of spatial analysis lies in visualizing the data. In *geoGraph*, the spatial grids (`gGraph`) and spatial data (`gData`) can be plotted and browsed using a variety of functions. ### Plotting `gGraph` objects Displaying a `gGraph` object is done through `plot` and `points` functions. The first opens a new plotting region, while the second draws in the current plotting region; functions have otherwise similar arguments (see `?plot.gGraph`). By default, plotting a `gGraph` displays the grid of nodes overlaying a shapefile (by default, the landmasses). Edges can be plotted at the same time (argument `edges`), or added afterwards using `plotEdges`. If the `gGraph` object possesses an adequately formed `meta$colors` component, the colors of the nodes are chosen according to the node attributes and the color scheme specified in `meta$colors`. Alternatively, the color of the nodes can be specified via the `col` argument in `plot`/`points`. Here is an example using `worldgraph.10k`: ```{r wg10kdefplot, fig=TRUE, fig.width=8} worldgraph.10k@meta$colors head(getNodesAttr(worldgraph.10k)) table(getNodesAttr(worldgraph.10k)) plot(worldgraph.10k, reset = TRUE) title("Default plotting of worldgraph.10k") ``` It may be worth noting that plotting `gGraph` objects involves plotting a fairly large number of points and edges. On some graphical devices, the resulting plotting can be slow. For instance, one may want to disable `cairo` under linux: this graphical device yields better graphics than `Xlib`, but at the expense of increase computational time. To switch to `Xlib`, type: ```{r eval=FALSE} X11.options(type = "Xlib") ``` and to revert to `cairo`, type: ```{r eval=FALSE} X11.options(type = "cairo") ``` ### Zooming in and out, sliding, etc. In practice, it is often useful to be able to peer at specific regions, and more generally to navigate inside the graphical representation of the data. For this, we can use the interactive functions `geo.zoomin`, `geo.zoomout`, `geo.slide`, `geo.back`, `geo.bookmark`, and `geo.goto`. The zoom and slide functions require to left-click on the graphics to zoom in, zoom out, or slide to adjacent areas; in all cases, a right click ends the function. Also note that `geo.zoomin` can accept an argument specifying a rectangular region, which will be adapted by the function to fit best a square area with similar position and centre, and zoom to this area (see `?geo.zoomin`). `geo.bookmark` and `geo.goto` respectively set and go to a bookmark, *i.e.* a tagged area. This is most useful when one has to switch between distant areas repeatedly. Here are some examples based on the previous plotting of `worldgraph.10k`: Zooming in: ```{r eval=FALSE} geo.zoomin() ``` ```{r zoomin, echo=FALSE} geo.zoomin(c(-19.25, 44.77, 15.09, 66.19)) ``` Zooming out: ```{r eval=FALSE} geo.zoomout() ``` ```{r zoomout, echo=FALSE} geo.zoomin(c(-51.26, 76.78, -23.38, 90.00)) ``` Sliding to the east: ```{r eval=FALSE} geo.slide() ``` ```{r geoslide, echo=FALSE} geo.zoomin(c(41.76, 169.80, -28.28, 99.76)) ``` One important thing which makes plotting `gGraph` objects different from most other plotting in R is that `geoGraph` keeps the changes made to the plotting area in memory. This allows to undo one or several moves using `geo.back`. Moreover, even if the graphical device is killed, plotting again a `gGraph` will use the old parameters by default. To disable this behavior, set the argument `reset=TRUE` when calling upon `plot`. Technically, this 'plotting memory' is implemented by storing plotting information in an environment defined as the hidden environment `geoGraph:::.geoGraphEnv`: ```{r} ls(env = geoGraph:::.geoGraphEnv) ``` You can inspect individual variables within this environment: ```{r} get("last.plot.param", envir = geoGraph:::.geoGraphEnv) ``` However, it is recommended not to modify these objects directly, unless you really know what you are doing. In any case, plotting a `gGraph` object with argument `reset=TRUE` will remove previous plotting history and undo possible wrong manipulations. ### Plotting `gData` objects `gData` objects are by default plotted overlaying the corresponding `gGraph`. For instance, using the `cities` example from above: ```{r citiesPlot2, fig=TRUE} plot(cities, reset = TRUE) text(getCoords(cities), rownames(getData(cities))) ``` Note the argument `reset=TRUE`, which tells the plotting function to adapt the plotting area to the geographic extent of the dataset. To plot additional information, it can be useful to extract the spatial coordinates from the data. This is achieved by `getCoords`. This method takes an extra argument `original`, which is TRUE if original spatial coordinates are seeked, or FALSE for coordinates of the nodes on the grid. We can use this to represent, for instance, the population sizes for the different cities: ```{r cities_plot, fig=TRUE} transp <- function(col, alpha = .5) { res <- apply(col2rgb(col), 2, function(c) rgb(c[1] / 255, c[2] / 255, c[3] / 255, alpha)) return(res) } plot(cities, reset = TRUE) par(xpd = TRUE) text(getCoords(cities) + -.5, rownames(getData(cities))) symbols(getCoords(cities)[, 1], getCoords(cities)[, 2], circ = sqrt(unlist(getData(cities))), inch = .2, bg = transp("red"), add = TRUE) ``` ## Editing `gGraphs` Editing graphs is an essential task in *geoGraph*. While available `gGraph` objects provide a basis to work with (see `?worldgraph.10k`), one may want to adapt a graph to a specific case. For instance, connectivity should be defined according to biological knowledge of the organism under study. `gGraph` can be modified in different ways: by changing the connectivity, the costs of edges, or the attribute values. ### Changing the global connectivity of a `gGraph` There are two main ways of changing the connectivity of a `gGraph`, which match two different objectives. The first approach is to perform global and systematic changes of the connectivity of the graph. Typically, one will want to remove all connections over a given type of landscape, which is uncrossable by the organism under study. Let's assume we are interested in saltwater fishes. To model fish dispersal, we have to define a graph which connects only nodes overlaying the sea. We load the `gGraph` object `rawgraph.10k`, and zoom in to a smaller area (Madagascar) to illustrate changes in connectivity: ```{r plot_edges, fig=TRUE} geo.zoomin(c(35, 54, -26, -10)) plotEdges(rawgraph.10k) ``` We shall set a bookmark for this area, in case we would want to get back to this place later on: ```{r } geo.bookmark("madagascar") ``` What we now want to do is remove all but sea-sea connections. To do so, the easiest approach is to i) define costs for the edges based on habitat, with land being given large costs and ii) remove all edges with large costs. Costs of a given node attribute (here, `habitat') are indicated in the `meta$costs` slot: ```{r } rawgraph.10k@meta$costs newGraph <- rawgraph.10k newGraph@meta$costs[2:6, 2] <- 100 newGraph@meta$costs[1, 2] <- 1 newGraph@meta$costs ``` We have just changed the costs associated to habitat type, but this change is not yet effective on edges between nodes. We use `setCosts` to set the cost of an edge to the average of the costs of its nodes: ```{r fig = TRUE, fig.width = 8} newGraph <- setCosts(newGraph, attr.name = "habitat") plot(newGraph, edge = TRUE) ``` On this new graph, we represent the edges with a width inversely proportional to the associated cost; that is, bold lines for easy travelling and light edges/dotted lines for more costly mouvement. This is not enough yet, since travelling on land is still possible. However, we can tell *geoGraph* to remove all edges associated to too strong a cost, as defined by a given threshold (using `dropDeadEdges`). Here, only sea-sea connections shall be retained, that is, edges with cost 1. ```{r fig=TRUE} newGraph <- dropDeadEdges(newGraph, thres = 1.1) plot(newGraph, edge = TRUE) ``` Here we are: `newGraph` only contains connections in the sea. Note that, although we restrained the plotting area to Madagascar, this change is effective everywhere. For instance, travelling to the nort-west Australian coasts: ```{r bookmark, fig=TRUE} geo.zoomin(c(110, 130, -27, -12)) geo.bookmark("australia") ``` ### Changing local properties of a `gGraph` A second approach to changing a `gGraph` is to refine the graph by hand, adding or removing locally some connections, or altering the attributes of some nodes. This can be necessary to connect components such as islands to the main landmasses, or to correct erroneous data. Adding and removing edges from the grid of a `gGraph` can be achieved by `geo.add.edges` and `geo.remove.edges`, respectively. These functions are interactive, and require the user to select individual nodes or a rectangular area in which edges are added or removed. See `?geo.add.edges` for more information on these functions. For instance, we can remove a few odd connections in the previous graph, near the Australian coasts (note that we have to save the changes using `<-`): ```{r eval=FALSE} geo.goto("australia") newGraph <- geo.remove.edges(newGraph) ``` ```{r eval=TRUE, echo=FALSE} newGraph <- readRDS("Robjects/newGraph.RDS") #load("Robjects/newGraph.RData") ``` ![img](figs/georemove.png) When adding connections within an area or in an entire graph, node addition is based on another `gGraph`,*i.e.* only connections existing in another `gGraph` serving as reference can be added to the current `gGraph`. For graphs based on 10k or 40k grids, the raw graphs provided in `geoGraph` should be used, (`rawgraph.10k`, `rawgraph.40k`), since they are fully connected. In addition to changing grid connectivity, we may also want to modify the attributes of specific nodes. This is again done interactively, using the function `geo.change.attr`. For instance, here, we define a new value `shalowwater` (plotted in light blue) for the attribute `habitat`, selecting affected nodes using the 'area' mode first, and refining the changes using the 'point' mode: ```{r eval=FALSE} plot(newGraph, edge = TRUE) temp <- geo.change.attr(newGraph, mode = "area", attr.name = "habitat", attr.value = "shallowwater", newCol = "deepskyblue") temp <- geo.change.attr(temp, attr.name = "habitat", attr.value = "shallowwater", newCol = "deepskyblue") newGraph <- temp ``` ```{r echo=FALSE} newGraph <- readRDS("Robjects/newGraph2.RDS") #load("Robjects/newGraph2.RData") ``` ```{r fig=TRUE} newGraph@meta$colors plot(newGraph, edge = TRUE) ``` Again, note that the changes made to the graph have to be save in an object (using `<-`) to be effective. ## Extracting information from GIS shapefiles An important feature of *geoGraph* is serving as an interface between *geographic information system* (GIS) layers and geographic data. As currently implemented, *geoGraph* can extract information from shapefiles with the Arc GIS (http://www.esri.com/software/arcgis/index.html) format, using the function `extractFromLayer`. Here, we illustrate this procedure using the shapefile `world-countries.shp` provided with the package. *geoGraph* still relies on `SpatialPolygonDataFrame` objects from the `sp` package. A move to `sf` objects is planned, but for the moment we have to use `sf::st_read()` to read a GIS shapefile into an `sf` object, and then cast it to a `SpatialPolygonDataFrame` with `sf::as_Spatial()`: ```{r } library(sf) world.countries <- sf::st_read(system.file("files/shapefiles/world-countries.shp", package = "geoGraph")) world.countries <- sf::as_Spatial(world.countries) class(world.countries) summary(world.countries) ``` The summary of `world.countries` shows the data (attributes) stored in the layer. Let us assume that we are interested in retrieving continent and country information for the `worldgraph.10k` object. Note that `extractFromLayer` can extract information to other types of objects than `gGraph` (see `?extractFromLayer`) ```{r } summary(getNodesAttr(worldgraph.10k)) newGraph <- extractFromLayer(worldgraph.10k, layer = world.countries, attr = c("CONTINENT", "NAME")) summary(getNodesAttr(newGraph)) ``` The new object `newGraph` is a `gGraph` which now includes, for each node of the grid, the corresponding continent and country retrieved from the GIS layer. We can use the newly acquired information for plotting `newGraph`, by defining new color rules: ```{r fig=TRUE} temp <- unique(getNodesAttr(newGraph)$"NAME") col <- c("transparent", rainbow(length(temp) - 1)) colMat <- data.frame(NAME = temp, color = col) head(colMat) tail(colMat) plot(newGraph, col.rules = colMat, reset = TRUE) ``` This information could in turn be used to define costs for travelling on the grid. For instance, one could import habitat descriptors from a GIS, use these values to formulate a habitat model, and derive costs for dispersal on the grid. As soon as a GIS layer has been extracted to a `gGraph`, this information becomes also available for any `gData` interfaced with this object. For instance, we can re-use the `cities` example defined in a previous section, and interface it with `newGraph` to retrieve continent and country information for the cities of the dataset: ```{r } cities.dat cities <- new("gData", coords = cities.dat[, 1:2], data = cities.dat[, 3, drop = FALSE], gGraph.name = "newGraph") cities <- closestNode(cities, attr.name = "habitat", attr.value = "land") getData(cities) getNodesAttr(cities) ``` ## Finding least-cost paths One of the most useful applications of *geoGraph* is the research of least-cost paths between couples of locations. This can be achieved using the functions `dijkstraFrom` and `dijkstraBetween` on a `gData` object which contains all the locations of interest. These functions return least-cost paths with the format `gPath`. `dijkstraFrom` compute the paths from a given node of the grid to all locations of the `gData`, while `dijkstraBetween` computes the paths between pairs of locations of the `gData`. Below, we detail the example of the documentation of these functions, which uses the famous dataset of native Human populations, HGDP: ```{r fig=TRUE, fig.width = 8} hgdp plot(hgdp, reset = TRUE) ``` Populations of the dataset are shown by red circles, while the underlying grid (`worldgraph.40k`) is represented with colors depending on habitat (blue: sea; green: land; pink: coasts). Population genetics predicts that genetic diversity within populations should decay as populations are located further away from the geographic origin of the species. Here, we verify this relationship for a theoretical origin in Addis abeba, Ethiopia. We shall seek all paths through landmasses to the HGDP populations. First, we check that all populations are connected on the grid using `isConnected`: ```{r } isConnected(hgdp) ``` Note that in practice, we may often want to assess graphically the connectivity of the underlying grid, especially if not all locations of the `gData` are connected. This can be done using `connectivityPlot`, which has methods for both `gGraph` and `gData`, and represents different connected components using different colors. For instance, for `worldgraph.10k`: ```{r connectivityPlot, fig.width = 8} connectivityPlot(worldgraph.10k, edges = TRUE, seed = 1) ``` ```{r fig=TRUE} geo.zoomin(c(90, 150, 18, -25)) title("Different connected components\n in worldgraph.10k") ``` Since all locations in `hgdp` are connected, we can proceed further. We have to set the costs of edges in the `gGraph` grid. To do so, we can choose between i) strictly uniform costs (using `dropCosts`) ii) distance-based costs -- roughly uniform -- (using `setDistCosts`) or iii) attribute-driven costs (using `setCosts`). We shall first illustrate the strictly uniform costs. After setting a `gGraph` with uniform costs, we use `dijkstraFrom` to find the shortest paths between Addis abeba and the populations of `hgdp`: ```{r } myGraph <- dropCosts(worldgraph.40k) hgdp@gGraph.name <- "myGraph" addis <- cbind(38, 9) ori <- closestNode(myGraph, addis) paths <- dijkstraFrom(hgdp, ori) ``` The object `paths` contains the identified paths, which are stored as a list with class `gPath` (see `?gPath`). Paths can be plotted easily: ```{r fig=TRUE, fig.width = 8} addis <- as.vector(addis) plot(myGraph, col = NA, reset = TRUE) plot(paths) points(addis[1], addis[2], pch = "x", cex = 2) text(addis[1] + 35, addis[2], "Addis abeba", cex = .8, font = 2) points(hgdp, col.node = "black") ``` In this graph, each path is plotted with a different color, but several paths overlap in several places. We can extract the distances from the `origin' using `gPath2dist`, and then examine the relationship between genetic diversity within populations (stored in `hgdp`) and the distance from the origin: ```{r fig=TRUE} div <- getData(hgdp)$"Genetic.Div" dgeo.unif <- gPath2dist(paths, res.type = "vector") plot(div ~ dgeo.unif, xlab = "GeoGraphic distance (arbitrary units)", ylab = "Genetic diversity") lm.unif <- lm(div ~ dgeo.unif) abline(lm.unif, col = "red") summary(lm.unif) title("Genetic diversity vs geographic distance \n uniform costs ") ``` Alternatively, we can use costs based on habitat. As a toy example, we will consider that coasts are four times more favourable for dispersal than the rest of the landmasses. We define these new costs, and then compute and plot the corresponding shortest paths: ```{r } myGraph@meta$costs[7, ] <- c("coast", 0.25) myGraph@meta$costs myGraph <- setCosts(myGraph, attr.name = "habitat") paths.2 <- dijkstraFrom(hgdp, ori) ``` ```{r fig=TRUE, fig.width = 8} plot(newGraph, col = NA, reset = TRUE) plot(paths.2) points(addis[1], addis[2], pch = "x", cex = 2) text(addis[1] + 35, addis[2], "Addis abeba", cex = .8, font = 2) points(hgdp, col.node = "black") ``` The new paths are slightly different from the previous ones. We can examine the new relationship with genetic distance: ```{r fig=TRUE} dgeo.hab <- gPath2dist(paths.2, res.type = "vector") plot(div ~ dgeo.hab, xlab = "GeoGraphic distance (arbitrary units)", ylab = "Genetic diversity") lm.hab <- lm(div ~ dgeo.hab) abline(lm.hab, col = "red") summary(lm.hab) title("Genetic diversity vs geographic distance \n habitat costs ") ``` Of course, the distinction between coasts and inner landmasses is a somewhat poor description of habitat. In practice, complex habitat models can be used as simply.