---
title: "Working with Allbus 2010 ego-centered network data using egor"
author: "Till Krenz"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Working with Allbus 2010 ego-centered network data using egor}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 5
)
```

***Note: The data used in this vignette is simulated based on the original Allbus 2010 SPSS data by GESIS. The dataset simulates 100 respondents and does not resemble any actual Allbus respondents. Each variable is randomly generated based on the range of the original variable; co-variations between variables are disregarded. The data's purpose is purely to demonstrate how to technically work with the Allbus data using egor and R - no analytical assumptions should be made based on this data! The code in this vignette works with the original Allbus 2010 data, which can be obtained [here](https://www.gesis.org/en/allbus/allbus-home).***

## The Allbus 2010: ego-centered network data

The Allbus 2010 splits the respondents into two groups. Each group was presented with a different name generator.

- Allbus name generator - the generated alters are called "Freunde" (friends in German) in the data (max. 3 persons, "spent time with in private, not living in same household")
- GSS name generator - these alters are called "Kontakte" (contacts in German) in the dataset (max. 5 persons, "discussed important matters")

For more information please consult the Allbus documentation.

## Load packages and data

```{r message=FALSE, warning=FALSE}
library(egor)
library(purrr)
library(haven)
```

In addition to *egor*, this vignette uses the *haven* package to import the SPSS file of the Allbus 2010 and the *[purrr](https://purrr.tidyverse.org/)* package, which provides enhanced functional programming functions.
The *purrr* functions used in this vignette are the _map*()_ functions, which are similar in functionality to base R's *lapply()*.

Importing the original Allbus data with *haven* would look like this:

```
raw_data <- read_sav("ZA4610_A10.SAV")
```

For the purpose of this vignette we load a simulated dataset instead.

```{r}
data("allbus_2010_simulated")
raw_data <- allbus_2010_simulated
```

The Allbus variable names are quite technical, ranging from V1 to V981. Fortunately, the *haven* data import preserves the SPSS variable labels, which describe each variable in more detail. We are going to convert these labels into a format that allows us to use them as variable names. The code below extracts all variable labels, removes punctuation characters from the labels and substitutes spaces with underscores.

```{r}
var_labels <- map_chr(raw_data, ~attr(., "label"))
var_labels <- gsub("[,\\.:;>
split_freunde <- raw_data %>% filter(FRAGEBOGENSPLIT_F020 == 1)
```

Now we use the *onefile_to_egor()* function to convert the data to an egor object. This function needs a few arguments in order to locate the alter data and the alter-alter tie data in the dataset.

```{r}
e_freunde <- onefile_to_egor(
  egos = split_freunde,
  ID.vars = list(ego = "IDENTIFIKATIONSNUMMER_DES_BEFRAGTEN"),
  netsize = split_freunde$ANZ_GENANNTER_NETZWERKPERS_SPLIT_1,
  attr.start.col = "GESCHLECHT",
  attr.end.col = "SPANNUNGEN_KONFLIKTE2",
  aa.first.var = "KENNEN_SICH_A_B",
  max.alters = 3)
```

The *onefile_to_egor()* function prints some messages during the conversion that are meant to help us identify problems in case something goes wrong. We also see a NOTE telling us that we need to filter out invalid alter-alter ties. In this case those are ties with a weight of 2, since Allbus codes non-existent ties with 2 here.

```{r}
attr(raw_data$KENNEN_SICH_A_B, "labels")
```

"KENNEN SICH NICHT" means "don't know each other" in German.
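
The label lookup above can be tried on a toy vector. The following is a minimal sketch, not Allbus data: it builds a small numeric vector with a `labels` attribute, similar to what *haven* attaches on import, and uses it to find the code that marks a non-existent tie. The object names (`toy_ties`, `tie_labels`, `invalid_code`, `valid_ties`) and the label "KENNEN SICH" for code 1 are assumptions made for this example.

```{r}
# Hypothetical toy vector: only "KENNEN SICH NICHT" = 2 is taken from the
# text above; the label for code 1 is an assumption for illustration.
toy_ties <- structure(
  c(1, 2, 1, 2, 1),
  labels = c("KENNEN SICH" = 1, "KENNEN SICH NICHT" = 2)
)

# Look up the value labels, as done above for KENNEN_SICH_A_B.
tie_labels <- attr(toy_ties, "labels")

# Find the numeric code that stands for "don't know each other".
invalid_code <- unname(tie_labels[names(tie_labels) == "KENNEN SICH NICHT"])

# Keep only the values that represent existing ties.
valid_ties <- toy_ties[toy_ties != invalid_code]
valid_ties
```

Looking up the code by its label, instead of hard-coding the number, makes it easier to spot that the two Allbus splits use different codes for non-existent ties.
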
We can filter the alter-alter ties using the *activate()* and *filter()* functions.

```{r}
# Drop alter-alter ties coded 2 (alters do not know each other).
e_freunde <- e_freunde %>%
  activate(aatie) %>%
  filter(weight != 2) %>%
  activate(ego)
```

Next we repeat the same steps for split 2. Here we need to filter out the weight value 3 from the alter-alter ties and adjust some arguments according to the position of the data in the dataset and the maximum number of alters that the respondents were allowed to nominate.

```{r}
split_kontakte <- raw_data %>% filter(FRAGEBOGENSPLIT_F020 == 2)

e_kontakte <- onefile_to_egor(
  egos = split_kontakte,
  ID.vars = list(ego = "IDENTIFIKATIONSNUMMER_DES_BEFRAGTEN"),
  netsize = split_kontakte$ANZ_GENANNTER_NETZWERKPERS_SPLIT_2,
  attr.start.col = "GESCHLECHT3",
  attr.end.col = "SPANNUNGEN_KONFLIKTE7",
  aa.first.var = "KENNEN_SICH_KONTAKT_A_B",
  max.alters = 5)

# Drop alter-alter ties coded 3 for this split.
e_kontakte <- e_kontakte %>%
  activate(aatie) %>%
  filter(weight != 3) %>%
  activate(ego)
```

## Visualize and analyze

Now we can visualize and analyze the Allbus data. A few demonstrations follow. For an overview of available options, please see the main vignette of egor "Using `egor` to analyse ego-centered network data".

```{r}
plot(e_freunde, ego_no = 4, x_dim = 2, y_dim = 1)
plot(e_kontakte, ego_no = 4, x_dim = 2, y_dim = 1)
```

```{r}
# Convert the labelled alter variables to factors before plotting egograms.
e_freunde <- e_freunde %>%
  activate(alter) %>%
  mutate(WO_GEBOREN = droplevels(as_factor(WO_GEBOREN)),
         KONTAKTE = droplevels(as_factor(KONTAKTE)))

plot_egograms(e_freunde,
              ego_no = 4,
              x_dim = 1,
              y_dim = 1,
              venn_var = "KONTAKTE",
              pie_var = "WO_GEBOREN")

e_kontakte <- e_kontakte %>%
  activate(alter) %>%
  mutate(WO_GEBOREN = droplevels(as_factor(WO_GEBOREN)),
         KONTAKTE = droplevels(as_factor(KONTAKTE)))

plot_egograms(e_kontakte,
              ego_no = 4,
              x_dim = 1,
              y_dim = 1,
              venn_var = "KONTAKTE",
              pie_var = "WO_GEBOREN")
```
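
The plots above cover the visualization side. As a minimal sketch of the analysis side, the chunk below computes the density of each ego network with `ego_density()`. To stay independent of the Allbus objects, it uses the `egor32` example dataset that ships with egor; the object name `dens` is chosen only for illustration. The same call is expected to work on `e_freunde` and `e_kontakte` once the invalid alter-alter ties have been filtered out as shown earlier.

```{r}
library(egor)

# Example egor object shipped with the package (not Allbus data).
data("egor32")

# Density of the alter-alter network for each ego.
dens <- ego_density(egor32)
head(dens)

# Average density across egos, skipping networks where it is undefined.
mean(dens$density, na.rm = TRUE)
```
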