Saturday, April 7, 2012

NODEXL @ VGSOM (PowerPoint Presentation)

Terms and Concepts required to analyze and interpret NodeXL output


“A network or graph is defined as a collection of n nodes connected by m edges. A network can be directed, meaning the edges point in one direction, or undirected, meaning the edges go in both directions. The edges can join more than two vertices together. Such graphs are called hypergraphs. The edges can be weighted, contain self loops, and have different properties within the edges or nodes.”

Our Explanation: To better understand the network perspective, consider the social network of Twitter users shown in the following figure. It is an example of a sociogram, also called a network graph, which is a common way of visualizing networks. Like all networks, it consists of two primary building blocks: vertices (also called nodes or agents) and edges (also called ties or connections). The vertices are represented by images of the Twitter users, and the edges are represented by the lines that point from one vertex to another. The size of each Twitter user’s profile image is determined by the user’s total number of tweets as reported by the Twitter Application Programming Interface (API), which gives sophisticated users access to powerful services. This is one example of how attribute data (e.g., data that describe a person) can be overlaid onto a network. A line, or edge, exists between two people when one “follows” the other or when one user “mentions” or “replies” to the other. In aggregate, these connections reveal the emergent structure of two distinct groups with few connecting links. This accurately represents the way the workshop brought together previously separate clusters of people from different disciplines. The graph also helps identify individuals who fill important positions in the network, such as those whom many people follow and those who are connected to both clusters.


“In graph theory, a vertex (plural vertices) or node is the fundamental unit out of which graphs are formed: an undirected graph consists of a set of vertices and a set of edges (unordered pairs of vertices), while a directed graph consists of a set of vertices and a set of arcs (ordered pairs of vertices). From the point of view of graph theory, vertices are treated as featureless and indivisible objects, although they may have additional structure depending on the application from which the graph arises; for instance, a semantic network is a graph in which the vertices represent concepts or classes of objects.”

Our Explanation: Vertices, also called nodes, agents, entities, or items, can represent many things. Often they represent people or social structures such as workgroups, teams, organizations, institutions, states, or even countries. At other times they represent content such as web pages, keyword tags, or videos. They can even represent physical or virtual locations or events. They often correspond to the primary building blocks of a social media platform: friends in social networking sites, and posts or authors in blogs.
Although not necessary for network analysis, attribute data that describe each of the vertices can add insight to the analysis and visualizations. For example, the figure shown above used descriptive attribute data about the total number of posts to convey a sense of who is most active on Twitter. Other attribute data from Twitter, such as the number of followers, the number of people a user follows, and the user’s join date, can also be mapped to visual attributes. More generally, attribute data may describe demographic characteristics of a person (age, gender, race), the person’s use of a system (number of logins, messages posted, edits made), or other characteristics such as income or location. In network visualization tools like NodeXL, attribute data can be mapped to visual properties such as the size, color, or opacity of the vertices.
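Mapping an attribute to a visual property can be illustrated with a short Python sketch. It assumes a simple linear scaling of made-up tweet counts into a hypothetical size range of 1 to 10; the helper name `scale_attribute` and the data are illustrative only, and NodeXL’s own mapping options may scale values differently.

```python
def scale_attribute(values, min_size=1.0, max_size=10.0):
    """Linearly rescale numeric attribute values into [min_size, max_size].

    Hypothetical helper: the smallest value maps to min_size, the largest
    to max_size, and everything else falls proportionally in between.
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Every vertex has the same value, so give them all the midpoint size.
        mid = (min_size + max_size) / 2
        return {vertex: mid for vertex in values}
    span = hi - lo
    return {
        vertex: min_size + (x - lo) / span * (max_size - min_size)
        for vertex, x in values.items()
    }


# Made-up tweet counts used as the attribute to visualize.
tweet_counts = {"alice": 120, "bob": 4500, "carol": 900}
sizes = scale_attribute(tweet_counts)
for user, size in sizes.items():
    print(f"{user}: size {size:.2f}")
```

A linear scale keeps differences proportional; for heavily skewed counts such as tweets or followers, a logarithmic scale is a common alternative so that a few very active users do not dwarf everyone else.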

B.      EDGES
“An edge can thus be defined as a set of two vertices or an ordered pair, in the case of a directed graph. An edge (a set of two elements) is drawn as a line connecting two vertices, called endpoints or (less often) endvertices. An edge with endvertices x and y is denoted by xy (without any symbol in between). The edge set of G is usually denoted by E(G), or E when there is no danger of confusion.
The size of a graph is the number of its edges, i.e. |E(G)|.”
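The set-based definition quoted above translates directly into Python. In this illustrative sketch, with made-up vertices x, y, and z, undirected edges are stored as frozensets (unordered pairs) and directed edges as tuples (ordered pairs), so the same input pairs give a different size |E(G)| depending on whether the graph is directed.

```python
# Undirected edges are unordered pairs: {x, y} and {y, x} are the same edge,
# so a frozenset collapses them into one element of the edge set.
undirected_E = {frozenset(pair) for pair in [("x", "y"), ("y", "x"), ("y", "z")]}

# Directed edges (arcs) are ordered pairs: (x, y) and (y, x) are different arcs.
directed_E = {("x", "y"), ("y", "x"), ("y", "z")}


def size(edge_set):
    """Size of a graph, |E(G)|: the number of its edges."""
    return len(edge_set)


def endpoints(edge_set):
    """Vertices that appear as an endpoint of at least one edge.

    Isolated vertices have no edges, so they never appear here;
    a full vertex set V(G) has to be stored separately.
    """
    return {vertex for edge in edge_set for vertex in edge}


print(size(undirected_E), size(directed_E))
print(sorted(endpoints(directed_E)))
```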

Our explanation: Edges, also known as links, ties, connections, and relationships, are the building blocks of networks. An edge connects two vertices together. Edges can represent many different types of relationships, such as proximity, collaborations, kinship, friendship, trade partnerships, citations, investments, hyperlinking, transactions, and shared attributes. A tie can be said to exist if it has some official status, is recognized by the participants, or is observed through exchange or interaction between them. A tie is any form of relationship or connection between two entities. The two major types of connections are directed and undirected edges. Directed edges (also known as asymmetric edges) have a clear origin and destination: money is lent from one person to another, a Twitter user follows another user, an email is sent to a recipient, or a web page links to another web page. They are represented on a graph as a line with an arrow pointing from the source vertex to the recipient vertex. Directed edges may or may not be reciprocated: if I send you a message, you may send one back in return, or not. An undirected edge (also known as a symmetric edge) simply exists between two people or things: a couple is married, two Facebook users are friends, or two people are members of the same organization. No origin or destination is clear in these mutual relationships, and they cannot exist unless they are reciprocated. Undirected edges are represented on a graph as a line connecting two vertices with no arrows.
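Whether directed edges are reciprocated can be checked mechanically. The Python sketch below assumes a hypothetical list of (sender, recipient) message pairs and computes one simple reciprocity measure: the share of directed edges whose reverse edge also exists. The function names are illustrative, and network tools may define reciprocity slightly differently.

```python
def mutual_pairs(directed_edges):
    """Unordered pairs {a, b} connected in both directions (a -> b and b -> a)."""
    edges = set(directed_edges)
    return {frozenset((a, b)) for a, b in edges if a != b and (b, a) in edges}


def reciprocity(directed_edges):
    """Fraction of directed edges (ignoring self loops) that are reciprocated."""
    edges = {(a, b) for a, b in directed_edges if a != b}
    if not edges:
        return 0.0
    reciprocated = sum(1 for a, b in edges if (b, a) in edges)
    return reciprocated / len(edges)


# Hypothetical messages: ann and ben write to each other, cat never replies.
messages = [("ann", "ben"), ("ben", "ann"), ("ann", "cat")]
print(mutual_pairs(messages))
print(reciprocity(messages))
```

Two of the three messages have a reply in the opposite direction, so the reciprocity here is 2/3.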

Because network data differ from attribute data, there are different ways of representing them.
With attribute data, it is common to create a data matrix where each row represents an individual and each column represents individuals’ characteristics, behaviors, or answers to survey questions. A related approach can be used to represent relational data. As in an attribute matrix, each row represents an individual in the network. However, unlike in an attribute matrix, each column also represents an individual, as shown in the following table:
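The matrix representation of relational data can be sketched in Python as follows. The three people and their ties are made up for illustration; a 1 in row i, column j means the person in row i points to the person in column j, and 0 means there is no tie.

```python
def adjacency_matrix(vertices, directed_edges):
    """Binary adjacency matrix: cell [i][j] is 1 if vertex i points to vertex j."""
    index = {vertex: i for i, vertex in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for source, target in directed_edges:
        matrix[index[source]][index[target]] = 1
    return matrix


people = ["Ann", "Ben", "Cat"]  # hypothetical individuals (rows and columns)
ties = [("Ann", "Ben"), ("Ben", "Ann"), ("Ben", "Cat"), ("Cat", "Ann")]
matrix = adjacency_matrix(people, ties)

print("     " + " ".join(f"{p:>3}" for p in people))
for person, row in zip(people, matrix):
    print(f"{person:>3}  " + " ".join(f"{cell:>3}" for cell in row))
```

In this layout, a row sum counts a person’s outgoing ties and a column sum counts incoming ties; for an undirected network the matrix would be symmetric.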


An alternate network representation is called an “edge list.” As its name suggests, it is simply a list of all edges in the network, as shown in the next table. This is the same network as shown in the previous table. Individuals in the Vertex1 column “point to” those in the Vertex2 column. Unless data describing the value of each edge are provided in additional columns, the network is implied to be binary: each edge is either present or absent.

[Table: edge list with columns Vertex 1 and Vertex 2]
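An edge list is easy to read from a spreadsheet export or CSV file. This Python sketch uses the standard `csv` module with hypothetical in-memory files; it treats rows containing only Vertex1 and Vertex2 as binary ties and reads an optional third column as the edge’s value. The column layout is an assumption for illustration.

```python
import csv
import io


def read_edge_list(lines):
    """Parse edge-list rows into {(vertex1, vertex2): value}.

    Two-column rows are binary ties (value 1.0). If a third column is
    present, it is read as the edge's value. In this sketch a repeated
    (vertex1, vertex2) row simply overwrites the earlier one.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    edges = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        vertex1, vertex2 = row[0], row[1]
        value = float(row[2]) if len(row) > 2 and row[2] else 1.0
        edges[(vertex1, vertex2)] = value
    return edges


binary_file = io.StringIO("Vertex1,Vertex2\nAnn,Ben\nBen,Cat\n")
weighted_file = io.StringIO("Vertex1,Vertex2,Value\nAnn,Ben,3\nBen,Cat,0.5\n")

print(read_edge_list(binary_file))
print(read_edge_list(weighted_file))
```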


A full or complete network contains all the people or entities of interest and the connections among them. All egos are treated equally. A full network is often created and available when a single system, such as a social media platform, acts as a hub among a group of connected people or groups. For example, the Twitter network includes all users of the service and the connections between them. In practice, it is not always feasible (or particularly insightful) to analyze a full network. Instead, analysts create a partial network by selecting a sample or slice of the full network.

Network analysts call the individual that is the focus of attention “ego” and the people he or she is connected to “alters.” Some networks, called egocentric networks, only include individuals who are connected to a specified ego. For example, a network of your personal Facebook friends would be an egocentric network because you are, by definition, connected to all other vertices. Other egocentric networks and their associated “subgraphs” may extend out from an ego, reaching not only friends but also friends of friends. More generally, egocentric networks can extend out any number of “degrees” from ego. The basic “1-degree” ego network consists of the ego and their alters. The “1.5-degree” ego network extends the 1-degree network by including connections between all of the alters. For example, a Facebook 1.5-degree ego network would characterize which of your friends know each other.
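The difference between 1-degree and 1.5-degree ego networks can be shown with a small Python sketch. The friendship data and the function names `build_adjacency` and `ego_network` are hypothetical; the network is treated as undirected, as Facebook friendships are.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets built from (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def ego_network(adjacency, ego, include_alter_ties=False):
    """Edges of ego's 1-degree network, or 1.5-degree if include_alter_ties."""
    alters = adjacency.get(ego, set())
    # 1-degree: ego plus one tie to each alter.
    edges = {frozenset((ego, alter)) for alter in alters}
    if include_alter_ties:
        # 1.5-degree: also keep ties between pairs of alters.
        edges |= {
            frozenset((a, b))
            for a in alters
            for b in adjacency[a]
            if b in alters and a != b
        }
    return edges


# Hypothetical friendships; zoe is a friend of a friend of "me".
friendships = [("me", "ann"), ("me", "ben"), ("ann", "ben"), ("ben", "zoe")]
adjacency = build_adjacency(friendships)
one_degree = ego_network(adjacency, "me")
one_and_half = ego_network(adjacency, "me", include_alter_ties=True)
print(sorted(tuple(sorted(e)) for e in one_degree))
print(sorted(tuple(sorted(e)) for e in one_and_half))
```

The 1.5-degree version adds the ann-ben tie (two of ego’s friends who know each other), while zoe stays out because reaching her would require a 2-degree network.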

Networks that contain only one type of entity are called unimodal networks because they include one type (i.e., mode) of vertex. They connect users to users or documents to documents, but they don’t include both users and documents.

Other networks include different types of vertices, creating multimodal networks. For example, a network may connect users to the discussion forums and blog posts they have commented on. Each vertex on the graph would represent a user, a forum, or a blog post, which could be visually distinguished by different colors or shapes.

Data for multimodal networks often include individuals and some event, activity, or content with which they are affiliated, creating an affiliation network. For example, an affiliation network may connect users with wiki pages they edit. People are affiliated with pages. In this network, no two users would directly connect to each other. Likewise, no two wiki pages would directly connect to each other in this type of network.
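The defining property of an affiliation network, that edges only ever join vertices of different modes, is easy to check in code. This Python sketch uses a hypothetical user/wiki-page network; `vertex_mode`, `is_affiliation_network`, and `affiliations` are illustrative names, not part of any tool’s API.

```python
# Hypothetical affiliation network: users linked to the wiki pages they edit.
vertex_mode = {"ann": "user", "ben": "user", "Home": "page", "FAQ": "page"}
edits = [("ann", "Home"), ("ben", "Home"), ("ben", "FAQ")]


def is_affiliation_network(edges, vertex_mode):
    """True if every edge links vertices of different modes.

    With two modes this means no user-user and no page-page edges.
    """
    return all(vertex_mode[u] != vertex_mode[v] for u, v in edges)


def affiliations(edges, vertex_mode, mode):
    """For each vertex of the given mode, the set of vertices it is affiliated with."""
    result = {}
    for u, v in edges:
        member, item = (u, v) if vertex_mode[u] == mode else (v, u)
        result.setdefault(member, set()).add(item)
    return result


print(is_affiliation_network(edits, vertex_mode))
print(is_affiliation_network(edits + [("ann", "ben")], vertex_mode))
print(affiliations(edits, vertex_mode, "user"))
```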

Some networks have multiple types of connections; these are called multiplex networks. For example, the Twitter network may include three types of directed edges: following relationships, “reply to” relationships, and “mention” relationships. The graph could have represented each type of edge distinctly by using color, different edge styles (e.g., dotted lines, solid lines), or edge labels.
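A multiplex network can be stored as an edge list with a type column. The Python sketch below uses hypothetical Twitter data and a made-up mapping from edge type to line style; it extracts one relationship layer and finds pairs linked by more than one type of relationship.

```python
from collections import Counter

# Hypothetical multiplex Twitter data: (source, target, edge type).
twitter_edges = [
    ("ann", "ben", "follows"),
    ("ann", "ben", "mentions"),
    ("ben", "cat", "replies_to"),
    ("cat", "ann", "follows"),
]

# One possible (made-up) mapping from edge type to a line style for drawing.
EDGE_STYLE = {"follows": "solid", "mentions": "dotted", "replies_to": "dashed"}


def layer(edges, edge_type):
    """The single-relation layer of a multiplex network for one edge type."""
    return [(source, target) for source, target, kind in edges if kind == edge_type]


def multiplex_pairs(edges):
    """Ordered pairs connected by more than one type of relationship."""
    types_per_pair = {}
    for source, target, kind in edges:
        types_per_pair.setdefault((source, target), set()).add(kind)
    return {pair for pair, kinds in types_per_pair.items() if len(kinds) > 1}


print(Counter(kind for _, _, kind in twitter_edges))
print(layer(twitter_edges, "follows"))
print(multiplex_pairs(twitter_edges))
for source, target, kind in twitter_edges:
    print(f"{source} -> {target}: {kind}, drawn as a {EDGE_STYLE[kind]} line")
```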

A number of metrics describe entire networks. In some cases, a single network is broken into several disconnected pieces, called components. Some aggregate network metrics only work on networks where all of the vertices are connected in a single component, whereas others can be applied to entire networks even if they are split up.
For example, centralization is an aggregate metric that characterizes the extent to which the network is centered on one or a few important nodes. Centralized networks have many edges that emanate from a few important vertices, whereas decentralized networks have little variation in the number of edges each vertex possesses.
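Both ideas in this paragraph can be sketched in Python. The first function splits an undirected network into connected components with breadth-first search (applied to a directed network, it would give weakly connected components, since direction is ignored). The second computes Freeman’s degree centralization, one common definition of centralization, which is 1 for a star and 0 when every vertex has the same degree. It assumes a simple undirected graph with at least three vertices, and the centralization values NodeXL reports may be defined differently. Function names and example graphs are illustrative.

```python
from collections import defaultdict, deque


def connected_components(vertices, edges):
    """Split an undirected network into connected components using BFS."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def degree_centralization(vertices, edges):
    """Freeman degree centralization of a simple undirected graph.

    C = sum(max_degree - degree_v) / ((n - 1) * (n - 2))
    The denominator is the numerator's value for a star graph,
    so C runs from 0 (all degrees equal) to 1 (a perfect star).
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("needs at least three vertices")
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    max_degree = max(degree.values())
    return sum(max_degree - d for d in degree.values()) / ((n - 1) * (n - 2))


star = [("hub", leaf) for leaf in "abcd"]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(degree_centralization(["hub", "a", "b", "c", "d"], star))
print(degree_centralization(["a", "b", "c", "d"], ring))
print(connected_components("abcdef", [("a", "b"), ("b", "c"), ("d", "e")]))
```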

Another set of metrics identifies individuals’ positions within a network. Paramount among these is the set of centrality measures, which describe the extent to which a particular vertex can be said to be in the “middle” of a network. These measures stem from the idea that a person with fewer connections might have more “important” connections than someone with more connections. One connection can be more important than another in different ways. Some are valuable because they bridge otherwise separated portions of the network, whereas others are important because they connect to well-connected people. The following centrality metrics provide quantifiable measures for these concepts:
I. Degree Centrality
II. Betweenness Centrality: Bridge Scores for Boundary Spanners
III. Closeness Centrality: Distance Scores for Broadly Connected People
IV. Eigenvector Centrality: Influence Scores for Strategically Connected People
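The four measures listed above can be sketched compactly in Python from their standard textbook definitions for an unweighted, undirected graph: degree counts neighbors, closeness is the reciprocal of the average shortest-path distance to reachable vertices, betweenness uses Brandes’ algorithm, and eigenvector centrality uses power iteration. Normalization conventions vary between tools, so the values NodeXL reports may differ in scale. The example network, two triangles joined by a single bridge edge c-d, is hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def degree_centrality(adj):
    """Degree: the number of direct connections."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reciprocal of the mean distance to the other reachable vertices."""
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        scores[v] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness_centrality(adj):
    """Brandes' algorithm: how often a vertex lies on shortest paths."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # visit vertices in order of decreasing distance from s
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted once from each end.
    return {v: b / 2 for v, b in bc.items()}


def eigenvector_centrality(adj, iterations=200):
    """Power iteration: a vertex scores highly if its neighbors score highly.

    Iterating with (A + I) instead of A has the same leading eigenvector
    but avoids oscillation on bipartite graphs. Scores are scaled so the
    top vertex gets 1.0.
    """
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(new.values())
        x = {v: s / top for v, s in new.items()}
    return x


# Hypothetical network: two triangles joined by the bridge edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

for name, metric in [("degree", degree_centrality),
                     ("closeness", closeness_centrality),
                     ("betweenness", betweenness_centrality),
                     ("eigenvector", eigenvector_centrality)]:
    scores = metric(adj)
    print(name, {v: round(scores[v], 3) for v in sorted(scores)})
```

In this example, the bridge vertices c and d top every measure. Betweenness separates them most sharply, since every shortest path between the two triangles has to cross the c-d edge; that is the “boundary spanner” role described in item II.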


A network approach contrasts with those that presume the existence and boundaries of groups. In a network perspective, people occupy many relationships and are potentially members in many groups and less defined clusters. Defining exact boundaries in networks may be difficult, reflecting the reality of multiple and shifting memberships. From a network perspective, a group is a collection of vertices that are more connected to one another than they are to others. Relatively more cohesive or densely connected sets of vertices form regions, also called clusters, that may reflect the existence of groups without regard to whether they are officially recognized or even whether members recognize their connections to one another. A rapidly growing body of research describes clustering algorithms, also called community detection algorithms, that automatically identify these clusters based on network structure.
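Clustering algorithms need a way to score how well a proposed grouping fits the idea of vertices being more connected to one another than to others. One widely used score is Newman’s modularity Q, which greedy methods such as the Clauset-Newman-Moore algorithm try to maximize. The Python sketch below is not a clustering algorithm itself; it only scores a given partition of a hypothetical network made of two four-person cliques joined by a single edge, showing that the two-group split scores higher than lumping everyone together.

```python
from itertools import combinations


def modularity(edges, communities):
    """Newman's modularity Q for a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (D_c / (2m))**2,
    where m is the number of edges, L_c the number of edges inside c, and
    D_c the total degree of the vertices in c. Higher Q means denser groups.
    """
    m = len(edges)
    group_of = {v: i for i, group in enumerate(communities) for v in group}
    inside = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edges:
        degree_sum[group_of[u]] += 1
        degree_sum[group_of[v]] += 1
        if group_of[u] == group_of[v]:
            inside[group_of[u]] += 1
    return sum(inner / m - (deg / (2 * m)) ** 2
               for inner, deg in zip(inside, degree_sum))


# Hypothetical network: two cliques of four people linked by one edge d-e.
left, right = "abcd", "efgh"
edges = list(combinations(left, 2)) + list(combinations(right, 2)) + [("d", "e")]

two_groups = modularity(edges, [set(left), set(right)])
one_group = modularity(edges, [set(left + right)])
print(f"two clusters: Q = {two_groups:.3f}")
print(f"one cluster:  Q = {one_group:.3f}")
```

Putting every vertex in a single group always gives Q = 0, so a positive score indicates that the proposed clusters are denser than chance would predict.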