Title: | Strain Elevation Tension Spring Embedding |
---|---|
Description: | An R implementation of the Strain Elevation Tension Spring embedding (SETSe) algorithm from Bourne (2020) <doi:10.1007/s41109-020-00329-4>. The package embeds graphs and networks using SETSe, which represents the network as a physical system in which edges are elastic and nodes exert a force either up or down based on node features. SETSe positions the nodes vertically such that, for every node in the network, the vertical tension in the node's edges is equal and opposite to the force it exerts. The resulting structure can then be analysed by looking at the node elevation and the edge strain and tension. The algorithm works on weighted and unweighted networks, as well as on networks with or without explicit node features. Edge elasticity can be derived from existing edge weights or kept constant. |
Authors: | Jonathan Bourne [aut, cre] |
Maintainer: | Jonathan Bourne |
License: | GPL-3 |
Version: | 0.5.0 |
Built: | 2025-02-24 05:04:42 UTC |
Source: | https://github.com/jonnob/rsetse |
This data set can be used to explore different embedding methods on a very simple graph.
biconnected_network
An igraph network with 7 nodes and 19 edges which forms three biconnected components:
The name of the edge connecting the two vertices
weight: The edge weight between the two vertices. It is 1000 for edges among nodes A to D, 500 for edges among nodes E to G, and 100 for the edge connecting nodes D and E.
force: The force produced by each node, calculated by subtracting the network's mean node centrality from each node's centrality.
group: The group each node is in. This can be used to generate force if required.
## Not run: plot(biconnected_network)
This function adds the edge attribute "Area", the cross-sectional area of each edge.
calc_spring_area(g, value, minimum_value, range)
g |
an igraph object. The graph representing the network |
value |
a character string. The name of the edge attribute used as the value from which Area will be calculated |
minimum_value |
a numeric value. The minimum value of Area, i.e. the area of the thinnest edge |
range |
a numeric value. This gives the range of A values above the minimum. |
This function is fairly niche: it calculates a cross-sectional area for each edge, which is useful when you wish to calculate the spring constant k using Young's modulus. The function coerces an edge characteristic into a fixed range of values, preventing negative, zero or infinite values.
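A minimal base-R sketch of the coercion described above, assuming a linear min-max rescaling of the edge characteristic onto the interval from minimum_value to minimum_value + range. The helper name rescale_area is hypothetical, and the exact formula used inside calc_spring_area may differ.

```r
# Hypothetical helper: linearly map an edge characteristic onto
# [minimum_value, minimum_value + range] (assumed min-max scaling).
rescale_area <- function(value, minimum_value, range) {
  spread <- max(value) - min(value)
  # a constant characteristic carries no information, so every edge gets the minimum
  if (spread == 0) return(rep(minimum_value, length(value)))
  minimum_value + range * (value - min(value)) / spread
}

# the same edge characteristic as in the example: 16 levels, 10 edges each
area <- rescale_area(rep(1:16, each = 10), minimum_value = 10, range = 20)
c(min(area), max(area))  # 10 30
```

Because the output is bounded below by a positive minimum_value, no edge can end up with a zero or negative area.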
An igraph object with the new edge attribute "Area" for each edge.
library(igraph)
set.seed(234)
g_prep <- generate_peels_network("A") %>%
  set.edge.attribute(., name = "edge_characteristic", value = rep(1:16, each = 10))
g <- calc_spring_area(g_prep, value = "edge_characteristic", minimum_value = 10, range = 20)
get.edge.attribute(g, "Area")
This function adds the edge attribute k, the spring constant for a given Area and Young's modulus.
calc_spring_constant(g, youngs_mod = "E", A = "Area", distance = "distance")
g |
an igraph object. The graph representing the network |
youngs_mod |
a character string. The name of the edge attribute containing Young's modulus. The default is "E" |
A |
a character string. The name of the edge attribute containing the cross-sectional area of the edge. The default is "Area". See Details on values of A |
distance |
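A character string. The name of the edge attribute containing the horizontal distance of the edge. The default is "distance". See Details on values of distance |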
When A and distance are both set to 1, k = E and the spring constant is equivalent to Young's modulus. In this case there is no need to call this function, as the edge attribute representing Young's modulus can be used directly as k.
The values A and distance are edge attributes referring to the cross-sectional area of the edge and the horizontal distance of the edge, in other words the distance between the two nodes at each end of the edge. These values can be set to anything the user wishes, and they may or may not be constant. However, think carefully before setting them to anything other than 1: there needs to be a clear reasoning, or the results will be meaningless.
For example, setting the distance of an edge that represents an electrical cable to the length of that cable will return very different results compared with a constant of one. However, the physical distance between two points does not necessarily have an impact on the loading of the line, so the results would not be interpretable. In contrast, setting the distance to some function of the line resistance may be meaningful and appropriate. As a general rule, distance and area should be set to 1.
An igraph object with an edge attribute called k, whose value is E * A / distance.
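The relationship k = E * A / distance can be checked with a one-line base-R helper. The name spring_constant is hypothetical and operates on plain numeric vectors rather than on igraph edge attributes; it only illustrates the arithmetic, including the special case where A and distance are 1 and k reduces to E.

```r
# Hypothetical helper mirroring k = E * A / distance on plain numeric vectors
spring_constant <- function(E, A = 1, distance = 1) {
  E * A / distance
}

spring_constant(2e5)                          # A = distance = 1, so k equals E
spring_constant(1e5, A = 20, distance = 2)    # k = 1e6
spring_constant(c(1e5, 5e5), A = c(10, 30))   # vectorised over edges: k = 1e6 and 1.5e7
```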
calc_spring_area
library(igraph)
set.seed(234)
g_prep <- generate_peels_network("A") %>%
  set.edge.attribute(., name = "edge_characteristic", value = rep(1:16, each = 10)) %>%
  # set some pretend Young's modulus values
  set.edge.attribute(., name = "E", value = rep(c(1e5, 5e5, 2e5, 3e5), each = 40)) %>%
  # calculate the spring area from another edge characteristic
  calc_spring_area(., value = "edge_characteristic", minimum_value = 10, range = 20) %>%
  prepare_edges() %>%
  prepare_categorical_force(., node_names = "name", force_var = "class")
g <- calc_spring_constant(g_prep, youngs_mod = "E", A = "Area", distance = "distance")
This function calculates the line tension and strain characteristics for the edges in a graph. It is called by default by all the embedding functions (setse_*) but is included here for completeness.
calc_tension_strain( g, height_embeddings_df, distance = "distance", edge_name = "edge_name", k = "k" )
g |
An igraph object of the network. |
height_embeddings_df |
A data frame. The node embeddings produced by the SETSe functions, for example the node_embeddings element of the list returned by setse_auto |
distance |
A character string. The name of the edge attribute that contains the distance between two nodes. The default is "distance" |
edge_name |
A character string. The name of the edge attribute that contains the edge name. The default is "edge_name". |
k |
A character string. The name of the edge attribute that contains the spring constant. The default is "k" |
Whilst the node embeddings data frame contains the elevations produced by the SETSe algorithm, this function produces a data frame that contains the tension and strain. The returned data frame contains a substantial amount of line information, so it may be necessary to reduce the number of variables before merging it with previously generated data, to avoid duplicate columns holding the same values. This function is called by default at the end of all setse functions.
The function returns a data frame of 7 columns: the edge name, the change in elevation, the final distance between the two nodes (the hypotenuse of the original distance and the vertical distance), the spring constant k, the edge tension, the edge strain, and the mean elevation.
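The following base-R sketch shows how the seven columns relate to one another for a single-dimension embedding. It assumes linear (Hookean) springs whose rest length is the horizontal distance, so that tension = k * extension and strain = extension / distance, and it assumes the mean elevation is the mean of the two end-node elevations. The function and column names are hypothetical, not the package's.

```r
# Hypothetical helper: per-edge tension and strain from the elevations of the
# two end nodes, assuming Hooke's law acting on the spring extension.
edge_tension_strain <- function(edge_name, z_from, z_to, distance, k) {
  dz <- z_to - z_from                          # change in elevation along the edge
  final_distance <- sqrt(distance^2 + dz^2)    # hypotenuse of horizontal and vertical distance
  extension <- final_distance - distance
  data.frame(
    edge_name = edge_name,
    dz = dz,
    final_distance = final_distance,
    k = k,
    tension = k * extension,
    strain = extension / distance,
    mean_elevation = (z_from + z_to) / 2       # assumed: mean of the two node elevations
  )
}

res <- edge_tension_strain(c("A_B", "B_C"), z_from = c(0, 0.75), z_to = c(0.75, 0.75),
                           distance = 1, k = 500)
res
```

The first edge rises by 0.75 over a horizontal distance of 1, so its length becomes 1.25, its strain 0.25 and its tension 500 * 0.25 = 125; the second edge is flat and carries no tension.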
# set the random seed for generating the network
set.seed(234)
g <- generate_peels_network(type = "E") %>%
  prepare_edges(k = 500, distance = 1) %>%
  # prepare the network for a binary embedding
  prepare_categorical_force(., node_names = "name", force_var = "class")
# embed the network using setse_auto
embeddings <- setse_auto(g, force = "class_A")
edge_embeddings_df <- calc_tension_strain(g, embeddings$node_embeddings)
all.equal(embeddings$edge_embeddings, edge_embeddings_df)
This function calculates the line tension and strain characteristics for the edges in a graph embedded across multiple force dimensions, as produced by setse_auto_hd. It is called by default by the embedding functions (setse_*) but is included here for completeness.
calc_tension_strain_hd( g, height_embeddings_df, distance = "distance", edge_name = "edge_name", k = "k" )
g |
An igraph object of the network. |
height_embeddings_df |
A data frame. The node embeddings produced by the SETSe functions, for example the node_embeddings element of the list returned by setse_auto_hd |
distance |
A character string. The name of the edge attribute that contains the distance between two nodes. The default is "distance" |
edge_name |
A character string. The name of the edge attribute that contains the edge name. The default is "edge_name". |
k |
A character string. The name of the edge attribute that contains the spring constant. The default is "k" |
Whilst the node embeddings data frame contains the elevations produced by the SETSe algorithm, this function produces a data frame that contains the tension and strain. The returned data frame contains a substantial amount of line information, so it may be necessary to reduce the number of variables before merging it with previously generated data, to avoid duplicate columns holding the same values. This function is called by default at the end of all setse functions.
The function returns a data frame of 7 columns: the edge name, the change in elevation, the final distance between the two nodes (the hypotenuse of the original distance and the vertical distance), the spring constant k, the edge tension, the edge strain, and the mean elevation.
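For the high-dimensional variant, a plausible generalisation, assumed here rather than taken from the package source, is that each force variable gets its own elevation axis, so the vertical offset of an edge becomes a vector and the stretched length uses its Euclidean norm. The helper edge_length_hd is hypothetical; tension and strain then follow from the extension exactly as in the one-dimensional case.

```r
# Hypothetical helper: stretched edge length when each force variable has its
# own elevation axis (assumed Euclidean geometry across the dimensions).
edge_length_hd <- function(z_from, z_to, distance) {
  dz <- z_to - z_from                 # one row per edge, one column per dimension
  sqrt(distance^2 + rowSums(dz^2))
}

z_from <- matrix(c(0, 0), nrow = 1)     # one edge, node elevations in two dimensions
z_to <- matrix(c(0.45, 0.6), nrow = 1)
final_distance <- edge_length_hd(z_from, z_to, distance = 1)
k <- 500
tension <- k * (final_distance - 1)     # Hooke's law on the extension
strain <- (final_distance - 1) / 1
c(final_distance = final_distance, tension = tension, strain = strain)  # about 1.25, 125, 0.25
```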
g <- biconnected_network %>%
  prepare_edges(., k = 1000) %>%
  # prepare the continuous features as normal
  prepare_continuous_force(., node_names = "name", force_var = "force") %>%
  # prepare the categorical features as normal
  prepare_categorical_force(., node_names = "name", force_var = "group")
# embed them using the high-dimensional function
two_dimensional_embeddings <- setse_auto_hd(g, force = c("group_A", "force"), k = "weight")
# use the same k attribute as the embedding so the two results are comparable
edge_embeddings_df <- calc_tension_strain_hd(g, two_dimensional_embeddings$node_embeddings, k = "weight")
all.equal(two_dimensional_embeddings$edge_embeddings, edge_embeddings_df)
Separates the network into a series of biconnected components that can be solved separately. Solving smaller subgraphs using the biconnected component method reduces the risk of network divergence. This function is seldom called independently of setse_bicomp.
create_balanced_blocks(g, force = "force", bigraph = bigraph)
g |
An igraph object. The network for which embeddings will be found |
force |
A character vector. The name of the node attribute that is the force exerted by the nodes |
bigraph |
A list. The biconnected components produced by the igraph function biconnected_components. That function takes a non-trivial amount of time on large graphs, so passing its result in avoids calling it more often than necessary. |
When a network is separated into biconnected subgraphs, or blocks, the overall network balance needs to be maintained. create_balanced_blocks maintains the balance by summing the net force of all the nodes that are removed from each subgraph. Therefore, a node that is an articulation point has a force value equal to the total force of all the nodes on the adjacent biconnected component.
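The balancing rule can be sketched in base R on a toy graph with the same block structure as described for biconnected_network (blocks A to D, D to E, and E to G). The node forces are hypothetical values that sum to 0, and the helpers neighbours and balance_block are illustrative only; the package itself works on igraph objects.

```r
# Toy graph: blocks A-B-C-D, D-E and E-F-G; the forces are hypothetical and sum to 0
edges <- rbind(
  c("A", "B"), c("A", "C"), c("A", "D"), c("B", "C"), c("B", "D"), c("C", "D"),
  c("D", "E"),
  c("E", "F"), c("E", "G"), c("F", "G")
)
force <- c(A = 3, B = -1, C = -2, D = 1, E = 0.5, F = -0.5, G = -1)

# all neighbours of a node in an undirected edge list
neighbours <- function(node, edges) {
  c(edges[edges[, 1] == node, 2], edges[edges[, 2] == node, 1])
}

# Add to each block node the summed force of every outside node that hangs off
# it. Because the blocks of a connected graph form a tree, each outside node is
# reached through exactly one node of the block (an articulation point).
balance_block <- function(block, edges, force) {
  out <- force[block]
  outside <- setdiff(names(force), block)
  for (node in block) {
    reached <- character(0)
    frontier <- intersect(neighbours(node, edges), outside)
    while (length(frontier) > 0) {           # breadth-first search outside the block
      reached <- union(reached, frontier)
      nxt <- unlist(lapply(frontier, neighbours, edges = edges))
      frontier <- setdiff(intersect(nxt, outside), reached)
    }
    out[node] <- out[node] + sum(force[reached])
  }
  out
}

blocks <- list(c("A", "B", "C", "D"), c("D", "E"), c("E", "F", "G"))
balanced <- lapply(blocks, balance_block, edges = edges, force = force)
sapply(balanced, sum)  # every block is balanced: 0 0 0
```

In the middle block, D carries the combined force of A to D and E carries the combined force of E to G, which is the articulation-point behaviour described above.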
A list of igraph objects, one per biconnected component, where each component is balanced to have a net force of 0.
library(igraph)
# create a list of balanced networks using the biconnected_network dataset
balanced_list <- create_balanced_blocks(biconnected_network,
  bigraph = biconnected_components(biconnected_network))
# count the edges in each of the biconnected components
sapply(balanced_list, ecount)
Aggregates edge strain and tension to node level
create_node_edge_df(embeddings_data, function_names = c("mean", "median"))
embeddings_data |
A list. The output of any of the setse embedding functions |
function_names |
A character vector. The names of the aggregation functions to be used |
Often it can be useful to have edge data at node level, for example when plotting nodes by their tension or strain. This requires the edge embeddings to be aggregated to node level and joined to the appropriate node. This function takes the output of the setse embedding functions and any number of aggregation functions, and produces a data frame that is convenient to use.
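A minimal base-R sketch of this aggregation, assuming each edge contributes its value to both of its end nodes and each named function is applied per node. The helper aggregate_to_nodes and its prefix argument are hypothetical; the column naming follows the "strain_x" pattern described below.

```r
# Hypothetical helper: each edge contributes its value to both end nodes, and
# each named aggregation function is applied per node.
aggregate_to_nodes <- function(from, to, value, function_names = c("mean", "median"),
                               prefix = "strain") {
  node <- c(from, to)
  val <- c(value, value)
  out <- data.frame(node = sort(unique(node)))
  for (fn in function_names) {
    agg <- tapply(val, node, match.fun(fn))       # one aggregated value per node
    out[[paste(prefix, fn, sep = "_")]] <- as.numeric(agg[out$node])
  }
  out
}

# a path A - B - C with strains 0.1 and 0.3 on its two edges
aggregate_to_nodes(from = c("A", "B"), to = c("B", "C"), value = c(0.1, 0.3),
                   function_names = c("mean", "sum"))
```

Node B touches both edges, so its mean strain is 0.2 and its summed strain is 0.4. Note that match.fun("mode") would pick up base R's mode(), which returns an object's storage mode rather than its most frequent value.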
A dataframe with node names, node force, node elevation and strain and tension aggregated using the named functions. The strain and tension columns are returned with names in the form "strain_x" where "x" is the name of the function used to aggregate. The total number of columns is dependent on the number of aggregation functions.
embeddings_data <- biconnected_network %>%
  prepare_edges(.) %>%
  prepare_continuous_force(., node_names = "name", force_var = "force") %>%
  setse_auto(., k = "weight")
# base R's mode() returns the storage mode, not the most frequent value, so median is used
out <- create_node_edge_df(embeddings_data, function_names = c("mean", "median", "sum"))
Aggregates edge strain and tension to node level
create_node_edge_df_hd(embeddings_data, function_names = c("mean", "median"))
embeddings_data |
A list. The output of any of the setse embedding functions |
function_names |
A character vector. The names of the aggregation functions to be used |
Often it can be useful to have edge data at node level, for example when plotting nodes by their tension or strain. This requires the edge embeddings to be aggregated to node level and joined to the appropriate node. This function takes the output of the setse embedding functions and any number of aggregation functions, and produces a data frame that is convenient to use.
A dataframe with node names, node force, node elevation and strain and tension aggregated using the named functions. The strain and tension columns are returned with names in the form "strain_x" where "x" is the name of the function used to aggregate. The total number of columns is dependent on the number of aggregation functions.
g <- biconnected_network %>%
  prepare_edges(.) %>%
  # prepare the continuous features as normal
  prepare_continuous_force(., node_names = "name", force_var = "force") %>%
  # prepare the categorical features as normal
  prepare_categorical_force(., node_names = "name", force_var = "group")
# embed them using the high-dimensional function
two_dimensional_embeddings <- setse_auto_hd(g, force = c("group_A", "force"), k = "weight")
# base R's mode() returns the storage mode, not the most frequent value, so median is used
out <- create_node_edge_df_hd(two_dimensional_embeddings, function_names = c("mean", "median", "sum"))
Creates an example of a network from Peel's quintet of the specified type.
generate_peels_network( type, k_values = c(1000, 500, 100), single_component = TRUE )
type |
A character; one of the capital letters A to E |
k_values |
A numeric vector. The spring constants for the three edge types: within a sub-class, within a class but not a sub-class, and between classes. The default value is c(1000, 500, 100), meaning the strongest connection is between nodes in the same sub-class and the weakest is between nodes in different classes |
single_component |
Logical. Guarantees a single component network. Set to TRUE as default |
This function generates networks matching the 5 types described in Peel et al. 2018 (doi:10.1073/pnas.1713019115). All networks have 40 nodes, 160 edges, two node classes and four node sub-classes. The connections between the classes are equal across all 5 types, so all the generated networks have identical assortativity. However, as the sub-classes have different connection probabilities, the structures produced are very different. When projected into SETSe space, each network type occupies its own region; see Bourne 2020 (doi:10.1007/s41109-020-00329-4) for details.
An igraph object matching one of the 5 Peel's quintet types. The nodes are labelled with class and sub-class. The edges have the attribute k, the spring constant of the edge given the relationship between the nodes it connects.
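The mapping from node relationships to k_values can be illustrated with a small hypothetical helper, edge_k, working on plain vectors of class and sub-class labels. The sub-class labels "A1", "A2" and "B1" are made up for the example; the package's own labels may differ.

```r
# Hypothetical helper showing how k_values map onto node pairs:
# same sub-class -> k_values[1], same class only -> k_values[2],
# different classes -> k_values[3].
edge_k <- function(class_from, class_to, sub_from, sub_to,
                   k_values = c(1000, 500, 100)) {
  ifelse(class_from == class_to & sub_from == sub_to, k_values[1],
         ifelse(class_from == class_to, k_values[2], k_values[3]))
}

# sub-class labels "A1", "A2" and "B1" are made up for illustration
edge_k(class_from = c("A", "A", "A"), class_to = c("A", "A", "B"),
       sub_from = c("A1", "A1", "A1"), sub_to = c("A1", "A2", "B1"))
# 1000 500 100
```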
set.seed(234)
g <- generate_peels_network(type = "E")
plot(g)
This function adjusts the mass of the nodes so that the force in each direction over the mass for that direction produces an acceleration of 1.
mass_adjuster(g, force = "force", resolution_limit = TRUE)
g |
An igraph object. the network |
force |
A character string. The name of the node attribute containing the node forces. Default is "force" |
resolution_limit |
Logical. If TRUE and the forces in the network are smaller than the square root of the machine floating-point precision, the mass is set to one. Default is TRUE |
This function can help stabilise the convergence of networks by preventing major imbalances between the force in the network and the mass of the nodes. In certain cases acceleration can become very large or very small if force and mass are not well parametrised.
This function means that if the network were reduced to two nodes, where each node contained all the mass and all the force of one of the two directions, then each node would have an acceleration of 1 m s^-2.
The function can become important when using setse_bicomp, as the force-to-mass ratio of the biconnected components can vary widely from the force-to-mass ratio of the whole network.
A numeric value giving the adjusted mass of the nodes in the network.
# set the random seed for generating the network
set.seed(234)
g <- generate_peels_network(type = "E") %>%
  prepare_edges(k = 500, distance = 1) %>%
  # prepare the network for a binary embedding
  prepare_categorical_force(., node_names = "name", force_var = "class")
mass_adjuster(g, force = "class_B", resolution_limit = TRUE)
This function prepares a network with categorical node features for SETSe projection.
prepare_categorical_force(g, node_names, force_var, sum_to_one = TRUE)
g |
an igraph object |
node_names |
a character string. A vertex attribute which contains the node names. |
force_var |
A character vector naming the categorical node attributes used to create forces. All named attributes must be character or factor attributes. |
sum_to_one |
Logical. Whether the total positive force sums to 1. If FALSE, the total is the sum of the positive cases |
The function takes in an igraph object and produces an undirected igraph object that can be used with the embedding functions.
The purpose of the function is to make it easy to project categorical features using SETSe. The function creates a new variable for each level of the categorical variables. For embedding, only n - 1 of the n levels are needed.
The function creates one vertex attribute per level of each categorical variable, representing the force produced by each node for that level; a variable with n levels therefore produces n of these attributes. Each attribute name is the variable name and the level name separated by an underscore. For example, for a variable group with levels A and B, the created force variables will be "group_A" and "group_B". Each of these variables sums to 0 across the nodes.
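A base-R sketch of how one balanced force variable per level could be built, assuming each level's 0/1 indicator is mean-centred and, when sum_to_one is TRUE, scaled so that its positive forces sum to 1. The helper categorical_forces is hypothetical and returns a plain matrix rather than vertex attributes.

```r
# Hypothetical sketch: one balanced force column per level of a categorical variable
categorical_forces <- function(x, var_name, sum_to_one = TRUE) {
  lvls <- sort(unique(as.character(x)))
  out <- vapply(lvls, function(lev) {
    f <- as.numeric(x == lev)
    f <- f - mean(f)                              # forces now sum to 0
    # (a level shared by every node would give 0 / 0; not handled here)
    if (sum_to_one) f <- f / (sum(abs(f)) / 2)    # positive forces sum to 1
    f
  }, numeric(length(x)))
  colnames(out) <- paste(var_name, lvls, sep = "_")
  out
}

forces <- categorical_forces(c("A", "A", "B", "B", "B"), var_name = "group")
forces
```

With two levels, group_B is exactly the negative of group_A, which is why only n - 1 of the level variables carry information for the embedding.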
A network with the correct node attributes for the embeddings process.
setse, setse_auto, setse_bicomp, setse_auto_hd
Other prepare_setse: prepare_continuous_force(), prepare_edges()
# set the random seed for generating the network
set.seed(234)
g <- generate_peels_network(type = "E")
embeddings <- g %>%
  prepare_edges(k = 500, distance = 1) %>%
  # prepare the network for a binary embedding
  prepare_categorical_force(., node_names = "name", force_var = "class") %>%
  # embed the network using setse_auto
  setse_auto(., force = "class_A")
This function prepares a network with continuous node features for SETSe projection. It works for networks with a single feature or with high-dimensional features. The function takes in an igraph object and produces an undirected igraph object that can be used with the embedding functions.
prepare_continuous_force( g, node_names, k = NULL, force_var, sum_to_one = TRUE, distance = 1 )
g |
an igraph object |
node_names |
a character string. A vertex attribute which contains the node names. |
k |
The spring constant. This value is either a numeric value giving the spring constant for all edges, or NULL. If NULL is used, the k value will not be added to the network; this is useful when k is created through some other process. |
force_var |
A character vector. The node attributes to be used as the force variables. All the attributes must be numeric or integer and cannot contain NAs. For a single-variable embedding this is usually "force" |
sum_to_one |
Logical. Whether the total positive force sums to 1. If FALSE, the total is the sum of the positive cases |
distance |
a positive numeric value. The horizontal distance between the two nodes of every edge. The default is 1 |
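The function subtracts the mean from all the values so that the system is balanced. If sum_to_one is TRUE, every value is then divided by half the sum of the absolute values; because the centred positive and negative values cancel, this makes the positive forces sum to 1 and the negative forces sum to -1.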
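The balancing step can be written as a short base-R function. This is a sketch of the arithmetic described above on a plain numeric vector; the name balance_force is hypothetical, and the package applies the step to vertex attributes.

```r
# Hypothetical sketch: mean-centre a feature so the forces sum to 0, then
# optionally divide by half the absolute sum so the positive forces total 1.
balance_force <- function(x, sum_to_one = TRUE) {
  stopifnot(is.numeric(x), !anyNA(x))
  f <- x - mean(x)
  if (sum_to_one) f <- f / (sum(abs(f)) / 2)
  f
}

balance_force(c(2, 4, 9))   # -0.75 -0.25 1.00
```

Here the mean is 5, so the centred forces are -3, -1 and 4; half their absolute sum is 4, giving a positive total of 1 and a negative total of -1.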
The function adds the node attribute 'force' and the edge attribute 'k' unless k=NULL. The purpose of the function is to easily be able to project continuous networks using SETSe.
The function creates several variables
force: a vertex attribute representing the force produced by each node. The sum of this variable will be 0
k: The spring constant representing the stiffness of the spring.
edge_name: The name of the edges. It takes the form "from_to", where "from" is the origin node and "to" is the destination node, as given by the as_data_frame function from igraph
A network with the correct edge and node attributes for the embeddings process.
Other prepare_setse: prepare_categorical_force(), prepare_edges()
embeddings <- biconnected_network %>%
  # k is already present in the data, so it is left NULL in the preparation function
  prepare_edges(k = NULL, distance = 1) %>%
  # prepare the network for a continuous embedding
  prepare_continuous_force(., node_names = "name", force_var = "force") %>%
  # embed the network using setse_auto;
  # in the biconnected_network dataset the edge weights are used directly as k values
  setse_auto(k = "weight")
This function helps prepare the network edges for embedding
prepare_edges(g, k = NULL, distance = 1, create_edge_name = TRUE)
g |
an igraph object |
k |
The spring constant. This value is either a numeric value giving the spring constant for all edges, or NULL. If NULL is used, the k value will not be added to the network; this is useful when k is created through some other process. |
distance |
The horizontal distance between the two nodes of each edge. This value is either a numeric value giving the distance for all edges, or NULL. If NULL is used, the distance value will not be added to the network; this is useful when distance is created through some other process. |
create_edge_name |
Logical. Whether or not to create an edge name attribute. |
The function prepares the edge characteristics of the network so that they can be embedded using the setse family of functions. It returns the network with the edge attributes k and distance added (unless the corresponding argument is NULL) and, if create_edge_name is TRUE, an edge_name attribute.
setse, setse_auto, setse_bicomp, setse_auto_hd
Other prepare_setse: prepare_categorical_force(), prepare_continuous_force()
# set the random seed for generating the network
set.seed(234)
g <- generate_peels_network(type = "E")
embeddings <- g %>%
  prepare_edges(k = 500, distance = 1) %>%
  # prepare the network for a binary embedding
  prepare_categorical_force(., node_names = "name", force_var = "class") %>%
  # embed the network using setse_auto
  setse_auto(., force = "class_A")
Returns an R list containing the character vector c("foo", "bar") and the numeric vector c(0, 1).
rcpp_hello()
rcpp_hello()
Keeps only the largest component of a graph.
remove_small_components(g)
g |
An igraph object of the graph to embed. |
As SETSe only works on connected graphs, this function removes all but the largest component. It is a helper function for quickly projecting a network with SETSe.
An igraph object.
library(igraph)
set.seed(1284)
# generate a random Erdos-Renyi graph with 100 nodes and 150 edges
g <- sample_gnm(n = 100, m = 150)
# count the number of components
components(g)$no
# remove all but the largest component
g2 <- remove_small_components(g)
# now there is only 1 component
components(g2)$no
Embeds/smooths a feature network using the basic SETSe algorithm. Generally, setse_auto or setse_bicomp is preferred.
setse( g, force = "force", distance = "distance", edge_name = "edge_name", k = "k", tstep = 0.02, mass = 1, max_iter = 20000, coef_drag = 1, tol = 1e-06, sparse = FALSE, two_node_solution = TRUE, sample = 1, static_limit = NULL, noisy_termination = TRUE )
g |
An igraph object |
force |
A character string. This is the node attribute that contains the force the nodes exert on the network. |
distance |
A character string. The edge attribute that contains the original/horizontal distance between nodes. |
edge_name |
A character string. This is the edge attribute that contains the edge_name of the edges. |
k |
A character string. The name of the edge attribute containing the spring constant. For the moment this should be left as "k" |
tstep |
A numeric. The time interval used to iterate through the network dynamics. |
mass |
A numeric. The mass constant of the nodes. In normalised networks this is set to 1 |
max_iter |
An integer. The maximum number of iterations before stopping. Larger networks usually need more iterations. |
coef_drag |
A numeric. The drag coefficient |
tol |
A numeric. The tolerance factor for early stopping. |
sparse |
Logical. Whether or not the function should be run using sparse matrices. This must match the type of the actual matrix |
two_node_solution |
Logical. If TRUE, the Newton-Raphson algorithm is used to find the correct angle |
sample |
Integer. The dynamics will be stored only if the iteration number is a multiple of sample. This can greatly reduce the size of the results for large numbers of iterations. max_iter must be a multiple of sample |
static_limit |
Numeric. The maximum value the static force can reach before the algorithm terminates early. This prevents calculation in a diverging system. The value should be set to some multiple, greater than one, of the force in the system. If left NULL, the static limit is twice the system's absolute mean force |
noisy_termination |
Logical. Whether to stop the process if the static force does not monotonically decrease |
This is the basic SETSe embedding algorithm; it outputs all elements of the embeddings as well as the convergence dynamics. It is a wrapper around the core SETSe algorithm, which requires data preparation and only produces node embeddings and network dynamics.
There is little reason to use this function, as setse_auto and setse_bicomp are faster and easier to use.
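The physical picture behind setse (node forces balanced by the vertical component of edge tension, iterated with a time step, a mass and a drag coefficient) can be illustrated with a small base-R sketch. It is not the rsetse implementation: the Hookean tension k * (len - d), the semi-implicit Euler update and the definition of the static force as the summed absolute net force are all assumptions made for illustration, and sketch_setse is a hypothetical name.

```r
# Hypothetical sketch of a damped spring embedding; NOT rsetse's internal code.
sketch_setse <- function(edges, force, d = 1, k = 1, tstep = 0.02, mass = 1,
                         coef_drag = 1, max_iter = 20000, tol = 1e-6) {
  n <- length(force)
  m <- nrow(edges)
  # Incidence matrix: +1 for the first node of each edge, -1 for the second
  B <- matrix(0, nrow = m, ncol = n)
  B[cbind(seq_len(m), edges[, 1])] <- 1
  B[cbind(seq_len(m), edges[, 2])] <- -1
  z <- numeric(n)  # elevations; every node starts at height 0
  v <- numeric(n)  # vertical velocities
  for (iter in seq_len(max_iter)) {
    dz <- -as.vector(B %*% z)              # z[second] - z[first] per edge
    len <- sqrt(d^2 + dz^2)                # current length of each spring
    tension <- k * (len - d)               # Hookean tension (assumed form)
    fz <- tension * dz / len               # vertical pull on the first node
    spring <- as.vector(crossprod(B, fz))  # +fz on first node, -fz on second
    static_force <- sum(abs(force + spring))
    if (static_force < tol) break          # treated as converged
    accel <- (force + spring - coef_drag * v) / mass
    v <- v + accel * tstep                 # semi-implicit Euler: velocity first
    z <- z + v * tstep                     # then position
  }
  list(elevation = z, iterations = iter, static_force = static_force)
}

# Two nodes joined by one edge: one pushed up, one pushed down
res <- sketch_setse(edges = matrix(c(1, 2), ncol = 2), force = c(1, -1))
res$elevation  # node 1 settles above 0, node 2 the same distance below
```

Because the spring forces always sum to zero, equilibrium is only reachable when the node forces also sum to zero, which is why mean-centred forces are used.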
A list containing 4 dataframes.
The network dynamics describing several key figures of the network during the convergence process, including the static_force.
The node embeddings. Includes all data on the nodes: the forces exerted on them, and their position and dynamics at simulation termination.
Time taken: the amount of time taken per component, including the number of edges and nodes.
The edge embeddings. Includes all data on the edges as well as the strain and tension values.
Other setse: setse_auto_hd(), setse_auto(), setse_bicomp(), setse_expanded()
set.seed(234) #set the random seed for generating the network
g <- generate_peels_network(type = "E")
embeddings <- g %>%
  prepare_edges(k = 500, distance = 1) %>%
  #prepare the network for a binary embedding
  prepare_categorical_force(., node_names = "name", force_var = "class") %>%
  #embed the network using setse
  setse(., force = "class_A")
Embeds/smooths a feature network using the SETSe algorithm automatically finding convergence parameters using a grid search.
setse_auto( g, force = "force", distance = "distance", edge_name = "edge_name", k = "k", tstep = 0.02, mass = 1, max_iter = 1e+05, tol = 0.002, sparse = FALSE, hyper_iters = 100, hyper_tol = 0.01, hyper_max = 30000, drag_min = 0.01, drag_max = 100, tstep_change = 0.2, sample = 100, static_limit = NULL, verbose = FALSE, include_edges = TRUE, noisy_termination = TRUE )
g |
An igraph object |
force |
A character string. This is the node attribute that contains the force the nodes exert on the network. |
distance |
A character string. The edge attribute that contains the original/horizontal distance between nodes. |
edge_name |
A character string. This is the edge attribute that contains the edge_name of the edges. |
k |
A character string. The edge attribute that contains the spring constant, k. For the moment, do not change it. |
tstep |
A numeric. The time interval used to iterate through the network dynamics. |
mass |
A numeric. The mass constant of the nodes. In normalised networks this is set to 1. |
max_iter |
An integer. The maximum number of iterations before stopping. Larger networks usually need more iterations. |
tol |
A numeric. The tolerance factor for early stopping. |
sparse |
Logical. Whether or not the function should be run using sparse matrices. The value must match the actual matrix type (this could probably be automated). |
hyper_iters |
Integer. The hyper parameter that determines the number of iterations allowed to find an acceptable convergence value. |
hyper_tol |
Numeric. The convergence tolerance when trying to find the minimum value. |
hyper_max |
Integer. The maximum number of iterations that SETSe will go through whilst searching for the minimum. |
drag_min |
Numeric. A power of ten. The lowest drag value to be used in the search. |
drag_max |
Numeric. A power of ten. If the drag exceeds this value, tstep is reduced. |
tstep_change |
Numeric. A value between 0 and 1 that determines how much the time step is reduced. The default value is 0.2. |
sample |
Integer. The dynamics will be stored only if the iteration number is a multiple of sample. This can greatly reduce the size of the results file for large numbers of iterations. max_iter must be a multiple of sample. |
static_limit |
Numeric. The maximum value the static force can reach before the algorithm terminates early. This prevents calculation in a diverging system. The value should be set to some multiple (greater than one) of the force in the system. If left as NULL, the static limit is the system absolute mean force. |
verbose |
Logical. Whether messages generated during the process are printed (TRUE) or suppressed (FALSE). |
include_edges |
Logical. Whether to calculate the edge tension and strain. Default is TRUE. Included for ease of integration into the bicomponent functions. |
noisy_termination |
Logical. Whether to stop the process if the static force does not monotonically decrease. |
This is one of the most commonly used SETSe functions. It automatically selects the convergence time-step and drag values to ensure efficient convergence.
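The parameter search can be pictured as follows, based only on the argument descriptions: drag values are powers of ten between drag_min and drag_max, and when the best drag reaches drag_max the time step is shrunk by tstep_change. This is a hypothetical sketch, not the rsetse search: run_static stands in for "run SETSe and report the final static force", multiplying tstep by tstep_change is an assumption, and the binary-search refinement and hyper_tol stopping rule are omitted.

```r
# Hypothetical sketch of a drag/timestep search; not the rsetse implementation.
# run_static(drag, tstep) stands in for "run SETSe and report the static force".
sketch_drag_search <- function(run_static, tstep = 0.02, drag_min = 0.01,
                               drag_max = 100, tstep_change = 0.2,
                               max_shrinks = 10) {
  # Candidate drag values: every power of ten from drag_min to drag_max
  drags <- 10^(round(log10(drag_min)):round(log10(drag_max)))
  for (shrink in 0:max_shrinks) {
    static <- vapply(drags, run_static, numeric(1), tstep = tstep)
    best <- which.min(static)
    if (drags[best] < drag_max) {
      return(list(coef_drag = drags[best], tstep = tstep,
                  static_force = static[best]))
    }
    # The best drag sits at drag_max, so shrink the timestep and search again
    tstep <- tstep * tstep_change
  }
  stop("no acceptable drag/timestep combination found")
}

toy_static <- function(drag, tstep) abs(log10(drag)) + tstep  # made-up objective
sketch_drag_search(toy_static)$coef_drag  # 1
```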
The noisy_termination parameter is needed because, in some cases, the convergence process can get stuck in the noisy zone of SETSe space. To prevent this, the process is stopped early if the static force does not monotonically decrease. On large networks this greatly speeds up the search for good parameter values and increases the chance of successful convergence. More detail on auto-SETSe can be found in the paper "The spring bounces back" (Bourne 2020).
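The two early-stopping rules (static_limit for divergence, noisy_termination for a static force that does not keep falling) can be expressed as a single check. In this hypothetical helper, static_force is assumed to be the vector of sampled static force values recorded so far; the rsetse code may apply the rules differently.

```r
# Hypothetical helper mirroring the two early-stopping rules described above
stop_early <- function(static_force, static_limit, noisy_termination = TRUE) {
  if (any(static_force > static_limit)) {
    return(TRUE)  # static force passed the limit: the system is diverging
  }
  if (noisy_termination && length(static_force) > 1 &&
      any(diff(static_force) >= 0)) {
    return(TRUE)  # static force failed to decrease monotonically: noisy zone
  }
  FALSE
}

stop_early(c(5, 3, 2, 1), static_limit = 10)  # FALSE: steadily decreasing
stop_early(c(5, 3, 4), static_limit = 10)     # TRUE: the static force rose
```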
A list containing 5 dataframes.
The network dynamics describing several key figures of the network during the convergence process, including the static_force.
The node embeddings. Includes all data on the nodes: the forces exerted on them, and their position and dynamics at simulation termination.
Time taken: the amount of time taken per component, including the number of edges and nodes of each component.
The edge embeddings. Includes all data on the edges as well as the strain and tension values.
memory_df: a dataframe recording the iteration history of the convergence of each component.
Other setse: setse_auto_hd(), setse_bicomp(), setse_expanded(), setse()
set.seed(234) #set the random seed for generating the network
g <- generate_peels_network(type = "E")
embeddings <- g %>%
  prepare_edges(k = 500, distance = 1) %>%
  #prepare the network for a binary embedding
  prepare_categorical_force(., node_names = "name", force_var = "class") %>%
  #embed the network using setse_auto
  setse_auto(., force = "class_A")
setse_auto_hd( g, force = "force", distance = "distance", edge_name = "edge_name", k = "k", tstep = 0.02, mass = 1, max_iter = 1e+05, tol = 0.002, sparse = FALSE, hyper_iters = 100, hyper_tol = 0.1, hyper_max = 30000, drag_min = 0.01, drag_max = 100, tstep_change = 0.2, sample = 100, static_limit = NULL, verbose = FALSE, include_edges = TRUE, noisy_termination = TRUE )
g |
An igraph object |
force |
A character vector. The node attributes that contain the forces the nodes exert on the network. |
distance |
A character string. The edge attribute that contains the original/horizontal distance between nodes. |
edge_name |
A character string. This is the edge attribute that contains the edge_name of the edges. |
k |
A character string. The edge attribute that contains the spring constant, k. For the moment, do not change it. |
tstep |
A numeric. The time interval used to iterate through the network dynamics. |
mass |
A numeric. The mass constant of the nodes. In normalised networks this is set to 1. |
max_iter |
An integer. The maximum number of iterations before stopping. Larger networks usually need more iterations. |
tol |
A numeric. The tolerance factor for early stopping. Setting this value to be 0.1 |
sparse |
Logical. Whether or not the function should be run using sparse matrices. The value must match the actual matrix type (this could probably be automated). |
hyper_iters |
Integer. The hyper parameter that determines the number of iterations allowed to find an acceptable convergence value. |
hyper_tol |
Numeric. The convergence tolerance when trying to find the minimum value. When the ratio between the current static force and the previous static force is smaller than this value, the search terminates. Values between 0.1 and 0.3 often seem to work: too small and you waste time fine-tuning the parameters instead of converging; too big and you end up with poorly parametrised values. |
hyper_max |
Integer. The maximum number of iterations that SETSe will go through whilst searching for the minimum. |
drag_min |
Numeric. A power of ten. The lowest drag value to be used in the search. |
drag_max |
Numeric. A power of ten. If the drag exceeds this value, tstep is reduced. |
tstep_change |
Numeric. A value between 0 and 1 that determines how much the time step is reduced. The default value is 0.2. |
sample |
Integer. The dynamics will be stored only if the iteration number is a multiple of sample. This can greatly reduce the size of the results file for large numbers of iterations. max_iter must be a multiple of sample. |
static_limit |
Numeric. The maximum value the static force can reach before the algorithm terminates early. This prevents calculation in a diverging system. The value should be set to some multiple (greater than one) of the force in the system. If left as NULL, the static limit is the system absolute mean force. |
verbose |
Logical. Whether messages generated during the process are printed (TRUE) or suppressed (FALSE). Printing is useful for large networks, which can take a long time to converge, but can be turned off for smaller ones. |
include_edges |
Logical. Whether to calculate the edge tension and strain. Default is TRUE. Included for ease of integration into the bicomponent functions. |
noisy_termination |
Logical. Whether to stop the process if the static force does not monotonically decrease. |
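The hyper_tol rule compares successive static force values. A one-function sketch of that comparison, assuming both values are positive; hyper_converged is a hypothetical name, not an rsetse function.

```r
# Hypothetical helper: TRUE when the search may stop under the hyper_tol rule
hyper_converged <- function(current_static, previous_static, hyper_tol = 0.1) {
  current_static / previous_static < hyper_tol
}

hyper_converged(0.05, 1)  # TRUE: the static force fell by more than a factor of 10
hyper_converged(0.5, 1)   # FALSE: the ratio is not yet below hyper_tol
```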
A list of four elements: a data frame with the height embeddings of the network, a data frame of the edge embeddings, the convergence dynamics dataframe for the network, and the search history for the convergence criteria of the network.
Uses a grid search and a binary search to find appropriate convergence conditions. This is one of the most commonly used SETSe functions. It automatically selects the convergence time-step and drag values to ensure efficient convergence.
The noisy_termination parameter is needed because, in some cases, the convergence process can get stuck in the noisy zone of SETSe space. To prevent this, the process is stopped early if the static force does not monotonically decrease. On large networks this greatly speeds up the search for good parameter values and increases the chance of successful convergence. More detail on auto-SETSe can be found in the paper "The spring bounces back" (Bourne 2020).
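With several force variables, each node gets one elevation per variable. A plausible way to picture this, and only an assumption here, is that each edge remains a single spring whose length combines the horizontal distance with the elevation offsets in all dimensions, so its tension is shared out across the dimensions in proportion to those offsets. The sketch computes the resulting spring forces; hd_spring_forces is a hypothetical name and this is not the setse_auto_hd code.

```r
# Hypothetical sketch: spring forces when each node has one elevation per force variable
hd_spring_forces <- function(edges, Z, d = 1, k = 1) {
  first <- edges[, 1]
  second <- edges[, 2]
  dZ <- Z[second, , drop = FALSE] - Z[first, , drop = FALSE]  # m x p offsets
  len <- sqrt(d^2 + rowSums(dZ^2))  # a single spring length per edge
  tension <- k * (len - d)          # Hookean tension (assumed form)
  fZ <- dZ * (tension / len)        # pull on the first node, split by dimension
  net <- matrix(0, nrow(Z), ncol(Z))
  for (e in seq_len(nrow(edges))) {
    net[first[e], ] <- net[first[e], ] + fZ[e, ]
    net[second[e], ] <- net[second[e], ] - fZ[e, ]
  }
  net
}

# Two nodes, two force dimensions: node 2 sits 3 units up in one and 4 in the other
net <- hd_spring_forces(matrix(c(1, 2), ncol = 2), Z = rbind(c(0, 0), c(3, 4)))
net  # node 1 is pulled towards node 2 in both dimensions; the rows sum to zero
```

With a single column of Z this reduces to the one-dimensional case.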
g <- biconnected_network %>%
  prepare_edges(.) %>%
  #prepare the continuous features as normal
  prepare_continuous_force(., node_names = "name", force_var = "force") %>%
  #prepare the categorical features as normal
  prepare_categorical_force(., node_names = "name", force_var = "group")
#embed them using the high dimensional function
two_dimensional_embeddings <- setse_auto_hd(g, force = c("group_A", "force"), k = "weight")
Other setse: setse_auto(), setse_bicomp(), setse_expanded(), setse()
setse
Embeds/smooths a feature network using the SETSe algorithm, automatically finding convergence parameters using a grid search. In addition, it breaks the network into biconnected components, solves each sub-component individually, and reassembles them into a single network. This is the most reliable method to perform SETSe embeddings and can be substantially quicker on certain network topologies.
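The first step, splitting a graph into biconnected components joined at articulation points, can be seen with igraph, the graph package whose objects rsetse works on. The snippet only illustrates the decomposition on a made-up four-node graph; it is not the code setse_bicomp runs internally.

```r
library(igraph)

# A triangle (1, 2, 3) joined to a pendant edge (3, 4): node 3 is an articulation point
g <- make_graph(c(1, 2, 2, 3, 3, 1, 3, 4), directed = FALSE)
bc <- biconnected_components(g)
bc$no                               # 2 biconnected components
as.integer(bc$articulation_points)  # 3
lapply(bc$components, as.integer)   # node ids in each component
```

Note that the single edge 3-4 counts as its own component, which is how a sub-simulation can end up with only two nodes and one edge.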
setse_bicomp( g, force = "force", distance = "distance", edge_name = "edge_name", k = "k", tstep = 0.02, tol = 0.01, max_iter = 20000, mass = NULL, sparse = FALSE, sample = 100, static_limit = NULL, hyper_iters = 100, hyper_tol = 0.1, hyper_max = 30000, drag_min = 0.01, drag_max = 100, tstep_change = 0.2, verbose = FALSE, noisy_termination = TRUE )
g |
An igraph object |
force |
A character string. This is the node attribute that contains the force the nodes exert on the network. |
distance |
A character string. The edge attribute that contains the original/horizontal distance between nodes. |
edge_name |
A character string. This is the edge attribute that contains the edge_name of the edges. |
k |
A character string. The edge attribute that contains the spring constant, k. For the moment, do not change it. |
tstep |
A numeric. The time interval used to iterate through the network dynamics. |
tol |
A numeric. The tolerance factor for early stopping. |
max_iter |
An integer. The maximum number of iterations before stopping. Larger networks usually need more iterations. |
mass |
A numeric. The mass constant of the nodes in normalised networks. The default is NULL, in which case mass_adjuster is called to set the mass for each biconnected component. |
sparse |
Logical. Whether sparse matrices will be used. This becomes valuable for larger networks. |
sample |
Integer. The dynamics will be stored only if the iteration number is a multiple of sample. This can greatly reduce the size of the results file for large numbers of iterations. max_iter must be a multiple of sample. |
static_limit |
Numeric. The maximum value the static force can reach before the algorithm terminates early. This prevents calculation in a diverging system. The value should be set to some multiple (greater than one) of the force in the system. If left as NULL, the static limit is the system absolute mean force. |
hyper_iters |
Integer. The hyper parameter that determines the number of iterations allowed to find an acceptable convergence value. |
hyper_tol |
Numeric. The convergence tolerance when trying to find the minimum value. |
hyper_max |
Integer. The maximum number of iterations that SETSe will go through whilst searching for the minimum. |
drag_min |
Numeric. A power of ten. The lowest drag value to be used in the search. |
drag_max |
Numeric. A power of ten. If the drag exceeds this value, tstep is reduced. |
tstep_change |
Numeric. A value between 0 and 1 that determines how much the time step is reduced. The default value is 0.2. |
verbose |
Logical. Whether messages generated during the process are printed (TRUE) or suppressed (FALSE). |
noisy_termination |
Logical. Whether to stop the process if the static force does not monotonically decrease. |
Embedding the network by solving each biconnected component and then reassembling can be faster for larger graphs, graphs with many nodes of degree 2, or networks with a low clustering coefficient. This is because, although SETSe is very efficient, the topology of larger graphs makes them more difficult to converge. Large graphs tend to be made of one very large biconnected component and many very small biconnected components. As the mass of the system is concentrated in the major biconnected component, smaller ones can be knocked around by minor movements of the largest component. This can lead to long convergence times. By solving all biconnected components separately and then reassembling the block tree at the end, the system can be converged considerably faster.
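Reassembly is possible because spring tension depends only on elevation differences, so a separately solved block can be translated vertically without changing its internal strain. The sketch below assumes, for illustration only, that reassembly is such a translation: the child block is shifted until its copy of the shared articulation node matches the parent's elevation. stitch_blocks and the node names are hypothetical, and how rsetse distributes forces onto articulation points is not shown.

```r
# Hypothetical sketch: join two separately embedded blocks at a shared articulation node
stitch_blocks <- function(parent, child, articulation) {
  # Vertical shift that puts the child's copy of the articulation node on the parent's
  shift <- parent[[articulation]] - child[[articulation]]
  child <- child + shift
  # Keep the parent's articulation node once and append the child's other nodes
  c(parent, child[setdiff(names(child), articulation)])
}

parent <- c(A = 0.5, B = -0.2, C = 0.1)  # illustrative elevations of one block
child <- c(C = 0, D = -0.4)              # a two-node block solved on its own
stitch_blocks(parent, child, "C")        # D ends up at -0.3
```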
Setting mass to the absolute system force divided by the total number of nodes often leads to faster convergence. As such, when mass is left at the default of NULL, the mean absolute force value is used.
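The mass rule in the paragraph above is the mean absolute node force. A one-line illustration with made-up, mean-centred forces; default_mass is a hypothetical name.

```r
# Illustration of the mass rule: sum of absolute node forces divided by node count
default_mass <- function(force) sum(abs(force)) / length(force)
force <- c(-2, -1, 0, 1, 2)  # made-up, mean-centred node forces
default_mass(force)          # 6 / 5 = 1.2, identical to mean(abs(force))
```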
A list containing 5 dataframes.
The node embeddings. Includes all data on the nodes: the forces exerted on them, and their position and dynamics at simulation termination.
The network dynamics describing several key figures of the network during the convergence process, including the static_force.
memory_df: a dataframe recording the iteration history of the convergence of each component.
Time taken. A data frame giving the time taken for the simulation as well as the number of nodes and edges. Node and edge counts are given because they may differ from the total number of nodes and edges in the network, depending on the method used for convergence. For example, if setse_bicomp is used, some simulations may contain as few as two nodes and one edge.
The edge embeddings. Includes all data on the edges as well as the strain and tension values.
Other setse: setse_auto_hd(), setse_auto(), setse_expanded(), setse()
set.seed(234) #set the random seed for generating the network
g <- generate_peels_network(type = "E")
embeddings <- g %>%
  prepare_edges(k = 500, distance = 1) %>%
  #prepare the network for a binary embedding
  prepare_categorical_force(., node_names = "name", force_var = "class") %>%
  #embed the network
  setse_bicomp(., force = "class_A")
This is a special case function of SETSe which keeps the history of all node movements during convergence. It is useful for demonstrations, or for parametrising difficult networks.
setse_expanded( g, force = "force", distance = "distance", edge_name = "edge_name", k = "k", tstep = 0.02, mass = 1, max_iter = 20000, coef_drag = 1, tol = 1e-06, sparse = FALSE, verbose = TRUE, two_node_solution = TRUE )
g |
An igraph object. The network |
force |
A character string. The node attribute that contains the force the nodes exert on the network. |
distance |
A character string. The name of the graph attribute that contains the graph distance |
edge_name |
A character string. This is the edge attribute that contains the edge_name of the edges. |
k |
A character string. The edge attribute that contains the spring constant, k. For the moment, do not change it. |
tstep |
A numeric. The time in seconds that elapses between each iteration |
mass |
A numeric. The mass in kg of the nodes, this is arbitrary and commonly 1 is used. |
max_iter |
An integer. The maximum number of iterations before terminating the simulation |
coef_drag |
A numeric. A multiplier used to tune the damping. Generally there is no need to adjust it. |
tol |
A numeric. Early termination. If the dynamics of the nodes fall below this value the algorithm will be classed as "converged" and the simulation terminates. |
sparse |
Logical. Whether or not the function should be run using sparse matrices. The value must match the actual matrix type (this could probably be automated). |
verbose |
Logical value. Whether the function should output messages or run quietly. |
two_node_solution |
Logical. Whether the Newton-Raphson algorithm is used to find the correct angle. |
A dataframe equivalent to the node_embeddings dataframe of the other SETSe methods. However, it includes a row for each node in each iteration of the simulation, as well as an additional column identifying the iteration number. This dataframe can be very large, as it contains n × m rows, where n is the number of nodes and m is the number of iterations in the simulation.
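The n × m size is easy to check. This illustration builds an empty long-format grid with one row per node per iteration; it only mimics the shape of the output and assumes the simulation runs for every iteration rather than stopping early.

```r
# Illustrative only: the long format has one row per node per iteration
n_nodes <- 7  # e.g. the 7-node biconnected_network
n_iter <- 3   # a tiny number of iterations for display
history <- expand.grid(node = LETTERS[seq_len(n_nodes)], t = seq_len(n_iter))
nrow(history)    # 21 = n_nodes * n_iter
n_nodes * 20000  # 140000 rows if all 20000 default iterations are run
```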
Other setse: setse_auto_hd(), setse_auto(), setse_bicomp(), setse()
g_prep <- biconnected_network %>%
  prepare_edges(.) %>%
  prepare_continuous_force(., node_names = "name", force_var = "force", k = NULL)
#the base configuration does not work
divergent_result <- setse_expanded(g_prep, k = "weight", tstep = 0.1)
#with a smaller timestep the algorithm converges
convergent_result <- setse_expanded(g_prep, k = "weight", tstep = 0.01)
## Not run:
library(ggplot2)
#plot the results for a given node
convergent_result %>%
  ggplot(aes(x = t, y = net_force, colour = node)) +
  geom_line()
#re-plot with divergent_result to see what it looks like
## End(Not run)
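Why does tstep = 0.1 diverge while 0.01 converges in the example above? A single damped spring integrated with a semi-implicit Euler step shows the same behaviour. The stiffness k = 1000 mirrors the largest edge weight in biconnected_network, but the one-spring system and the integration scheme are assumptions chosen for illustration, not the exact SETSe dynamics.

```r
# Illustrative single damped spring integrated with semi-implicit Euler (assumed scheme)
simulate_spring <- function(tstep, k = 1000, mass = 1, coef_drag = 1,
                            x0 = 1, n_steps = 100) {
  x <- x0  # displacement from equilibrium
  v <- 0   # velocity
  for (step in seq_len(n_steps)) {
    v <- v + tstep * (-k * x - coef_drag * v) / mass  # update velocity first
    x <- x + tstep * v                                # then position
  }
  abs(x)
}

simulate_spring(tstep = 0.1, n_steps = 100)    # grows explosively: unstable
simulate_spring(tstep = 0.01, n_steps = 2000)  # shrinks towards 0: stable
```

For this scheme the step must roughly satisfy tstep < 2 / sqrt(k / mass), about 0.063 here, so stiffer springs need smaller time steps.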
The basic setse function with added timestep adjustment. The time shift functionality automatically adjusts the timestep if the convergence process is noisy.
setse_shift( g, force = "force", distance = "distance", edge_name = "edge_name", k = "k", tstep = 0.02, mass = 1, max_iter = 20000, coef_drag = 1, tol = 1e-06, sparse = FALSE, two_node_solution = TRUE, sample = 1, static_limit = NULL, tstep_change = 0.5 )
g |
An igraph object |
force |
A character string. This is the node attribute that contains the force the nodes exert on the network. |
distance |
A character string. The edge attribute that contains the original/horizontal distance between nodes. |
edge_name |
A character string. This is the edge attribute that contains the edge_name of the edges. |
k |
A character string. The edge attribute that contains the spring constant, k. For the moment, do not change it. |
tstep |
A numeric. The time interval used to iterate through the network dynamics. |
mass |
A numeric. The mass constant of the nodes. In normalised networks this is set to 1. |
max_iter |
An integer. The maximum number of iterations before stopping. Larger networks usually need more iterations. |
coef_drag |
A numeric. A multiplier used to tune the damping. |
tol |
A numeric. The tolerance factor for early stopping. |
sparse |
Logical. Whether or not the function should be run using sparse matrices. The value must match the actual matrix type (this could probably be automated). |
two_node_solution |
Logical. Whether the Newton-Raphson algorithm is used to find the correct angle. |
sample |
Integer. The dynamics will be stored only if the iteration number is a multiple of sample. This can greatly reduce the size of the results file for large numbers of iterations. max_iter must be a multiple of sample. |
static_limit |
Numeric. The maximum value the static force can reach before the algorithm terminates early. This prevents calculation in a diverging system. The value should be set to some multiple (greater than one) of the force in the system. If left as NULL, the static limit is twice the system absolute mean force. |
tstep_change |
A numeric scalar between 0 and 1: the fraction of the previous timestep that the new timestep will be. This can stop the momentum of the nodes forcing a divergence, but can also slow down the process. The default is 0.5. |
This is the basic SETSe embedding algorithm; it outputs all elements of the embeddings as well as the convergence dynamics. It is a wrapper around the core SETSe algorithm, which requires data preparation and only produces node embeddings and network dynamics.
There is little reason to use this function, as setse_auto and setse_bicomp are faster and easier to use.
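The time shift described for setse_shift can be read as: when the static force rises, multiply the time step by tstep_change. A minimal sketch of that rule, assuming static_history holds the static force values recorded so far; shift_tstep is a hypothetical name and the package may trigger the shift differently.

```r
# Hypothetical sketch of a timestep shift rule: shrink tstep when the static force rises
shift_tstep <- function(static_history, tstep, tstep_change = 0.5) {
  n <- length(static_history)
  if (n >= 2 && static_history[n] > static_history[n - 1]) {
    tstep <- tstep * tstep_change  # noisy step detected: reduce the timestep
  }
  tstep
}

shift_tstep(c(3, 2, 1), tstep = 0.02)    # 0.02: still decreasing, no change
shift_tstep(c(3, 2, 2.5), tstep = 0.02)  # 0.01: the static force rose
```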
A list of three elements. A data frame with the height embeddings of the network, a data frame of the edge embeddings as well as the convergence dynamics dataframe for the network.
## Not run:
biconnected_network %>%
  prepare_continuous_force(., node_names = "name", force_var = "force") %>%
  #embed the network using setse_shift
  setse_shift(., k = "weight", tstep = 0.000029)
## End(Not run)