1. WHAT IS A NETWORK?
“A network or graph is defined as a collection of n nodes connected by m edges.
A network can be directed, meaning the edges point in one direction, or
undirected, meaning the edges go in both directions. The edges can join more
than two vertices together. Such graphs are called hypergraphs. The edges can be
weighted, contain self loops, and have different properties within the edges or
nodes.”
Our Explanation: To better understand the
network perspective, consider the social network of Twitter users shown in the
following figure. It is an example of a sociogram, also called a network graph,
which is a common way of visualizing networks. Like all networks, it consists
of two primary building blocks: vertices (also called nodes or agents) and
edges (also called ties or connections). The vertices are represented by images
of the Twitter users, and the edges are represented by the lines that point
from one vertex to another. The size of each Twitter user’s profile image is
determined by the user’s total number of tweets as reported by the Twitter
Application Programmer Interface (API), which gives sophisticated users access
to powerful services. This is one example of how attribute data (e.g., data
that describe a person) can be overlaid onto a network. A line, or edge, exists
between two people when one “follows” the other or if one user “mentions” or
“replies” to the other. All of these connections in aggregate reveal the
emergent structure of two distinct groups with few connecting links. This
accurately represents the way the workshop brought together previously separate
clusters of people from different disciplines. It also helps identify individuals
who fill important positions in the network, such as those whom many people
follow and those who are connected to both clusters.
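The definition above can be sketched in a few lines of Python. This is only an illustration: the class name Graph and its methods are hypothetical and do not come from any library, the example names are made up, and self loops and weights are left out to keep the sketch short.

```python
class Graph:
    """Tiny illustrative graph; `directed` controls how edges are stored."""

    def __init__(self, directed=False):
        self.directed = directed
        self.neighbors = {}  # node -> set of nodes reachable over one edge

    def add_edge(self, u, v):
        self.neighbors.setdefault(u, set()).add(v)
        self.neighbors.setdefault(v, set())
        if not self.directed:
            # An undirected edge goes in both directions.
            self.neighbors[v].add(u)

    def node_count(self):
        return len(self.neighbors)

    def edge_count(self):
        total = sum(len(targets) for targets in self.neighbors.values())
        # Each undirected edge was stored twice, once per direction.
        # (Self loops are not handled in this sketch.)
        return total if self.directed else total // 2


follows = Graph(directed=True)
follows.add_edge("Ann", "Bob")    # Ann follows Bob
follows.add_edge("Carol", "Ann")  # Carol follows Ann

friends = Graph(directed=False)
friends.add_edge("Ann", "Bob")    # Ann and Bob are friends
```

In the directed graph, Bob does not point back to Ann; in the undirected graph, the single edge can be read from either end.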
2. WHAT ARE ITS COMPONENTS?
A. VERTICES
“In graph theory,
a vertex (plural vertices) or node is the fundamental unit out of which
graphs are formed: an undirected
graph consists of a set of vertices and a set of edges (unordered
pairs of vertices), while a directed graph consists of a set of vertices and a
set of arcs (ordered pairs of vertices). From the point of view of graph
theory, vertices are treated as featureless and indivisible objects, although
they may have additional structure depending on the application from which the
graph arises; for instance, a semantic
network is a graph in which the vertices represent concepts or
classes of objects.”
Our Explanation: Vertices, also called nodes, agents, entities, or items, can
represent many things. Often they represent people or social structures such as
workgroups, teams, organizations, institutions, states, or even countries. At
other times they represent content such as web pages, keyword tags, or videos.
They can even represent physical or virtual locations or events. They often
correspond to the primary building blocks of social media platforms, such as
friends in social networking sites and posts or authors in blogs.
Although not necessary for network analysis, having
attribute data that describe each of the vertices can add insights to the
analysis and visualizations. For example, the figure shown above used descriptive attribute data about the total number of
posts to convey a sense of who is most active on Twitter. Other attribute data
from Twitter, such as a user’s number of followers, the number of people they
follow, and their join date, can also be mapped to visual attributes. More
generally, attribute data may describe a person’s demographic characteristics
(age, gender, race), the person’s use of a system (number of logins, messages
posted, edits made), or other characteristics such as income or location. In
network visualization tools like NodeXL, attribute data can be
mapped to visual properties such as the size, color, or opacity of the
vertices.
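As a rough illustration of mapping an attribute to a visual property, the sketch below rescales a count to a vertex size. It is an assumption made for illustration, not NodeXL’s actual method: the function name scale_to_size, the size range, and the tweet counts are all hypothetical.

```python
def scale_to_size(values, min_size=10.0, max_size=50.0):
    """Linearly rescale attribute values to a vertex-size range."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        # All vertices share one value, so give them all a mid-range size.
        return {name: (min_size + max_size) / 2 for name in values}
    span = high - low
    return {
        name: min_size + (value - low) / span * (max_size - min_size)
        for name, value in values.items()
    }


# Hypothetical tweet counts, made up for illustration.
tweet_counts = {"Ann": 1200, "Bob": 200, "Carol": 700}
sizes = scale_to_size(tweet_counts)
```

The most active user gets the largest size and the least active the smallest; the same idea applies to color or opacity.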
B. EDGES
“An edge can thus be defined as a set of two vertices or an ordered pair, in the
case of a directed graph. An edge (a set of two elements) is drawn as a line
connecting two vertices, called endpoints or (less often) endvertices. An edge
with endvertices x and y is denoted by xy (without any symbol in between). The
edge set of G is usually denoted by E(G), or E when there is no danger of
confusion. The size of a graph is the number of its edges, i.e. |E(G)|.”
Our Explanation: Edges, also known as links, ties,
connections, and relationships, are the building blocks of networks. An edge
connects two vertices together. Edges can represent many different types of
relationships like proximity, collaborations, kinship, friendship, trade
partnerships, citations, investments, hyperlinking, transactions, and shared
attributes. A tie can be said to exist if it has some official status, is
recognized by the participants, or is
observed by exchange or interaction between them. A tie is any form
of relationship or connection between two entities. Edges come in two major
types: undirected and directed. Directed edges (also
known as asymmetric edges) have a clear origin and destination: money is lent
from one person to another, a Twitter user follows another user, an email is
sent to a recipient, or a web page links to another web page. They are
represented on a graph as a line with an arrow pointing from the source vertex
to the recipient vertex. Directed edges may be reciprocated or not. If I send
you a message, you may or may not send one back. An undirected edge (also
known as a symmetric edge) simply exists between two people or things: a couple
is married, two Facebook users are friends, or two people are members of the same
organization. No origin or destination is clear in these mutual relationships.
They cannot exist unless they are reciprocated. Undirected edges are
represented on a graph as a line connecting two vertices with no arrows.
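Reciprocation of directed edges can be checked mechanically. The sketch below assumes each directed edge is stored as a (source, recipient) pair; the function name reciprocated_pairs and the message data are hypothetical.

```python
def reciprocated_pairs(directed_edges):
    """Return the unordered pairs that are connected in both directions."""
    edge_set = set(directed_edges)
    return {
        frozenset((source, target))
        for source, target in edge_set
        if source != target and (target, source) in edge_set
    }


# Hypothetical messages: Ann and Bob write to each other, Carol never replies.
messages = [("Ann", "Bob"), ("Bob", "Ann"), ("Ann", "Carol")]
mutual = reciprocated_pairs(messages)
```

Only the Ann and Bob pair is reciprocated here; the edge from Ann to Carol stays one-way.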
C. HOW ARE THE DATA REPRESENTED?
Because network data differ from attribute data,
they are represented in different ways.
A. MATRIX
With attribute data, it is common to create a data
matrix where each row represents an individual and each column represents one
of the individuals’ characteristics, behaviors, or answers to survey questions.
A related approach can be used to represent relational data. As in an attribute
matrix, each row represents an individual in the network. Unlike in an
attribute matrix, however, each column also represents an individual. A 1 in a
cell indicates a tie from the row’s individual to the column’s individual, and
a 0 indicates no tie, as shown in the following table:
|       | Ann | Bob | Carol |
| Ann   | 0   | 1   | 1     |
| Bob   | 0   | 0   | 0     |
| Carol | 1   | 0   | 0     |
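The matrix can be held in code as a list of rows. This is a minimal sketch that reads rows as “pointing to” columns, which is the reading the edge list for this same network implies; the helper names are hypothetical.

```python
names = ["Ann", "Bob", "Carol"]
matrix = [
    [0, 1, 1],  # Ann points to Bob and Carol
    [0, 0, 0],  # Bob points to no one
    [1, 0, 0],  # Carol points to Ann
]


def has_edge(source, target):
    """True if the cell in source's row and target's column is 1."""
    return matrix[names.index(source)][names.index(target)] == 1


# Row sums count outgoing ties; column sums count incoming ties.
out_degree = {name: sum(row) for name, row in zip(names, matrix)}
in_degree = {
    name: sum(row[j] for row in matrix) for j, name in enumerate(names)
}
```

Because the matrix is not symmetric (Ann points to Bob, but Bob does not point to Ann), this particular network is directed.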
B. EDGE LIST
An alternative network representation is called an
“edge list.” As its name suggests, it is simply a list of all edges in the
network, as shown in the next table. This is the same network as shown in the
previous table. Individuals in the Vertex 1 column “point to” those in the
Vertex 2 column. Unless data describing the value of each edge are provided in
additional columns, the network is implied to be a binary one.
| Vertex 1 | Vertex 2 |
| Ann      | Bob      |
| Ann      | Carol    |
| Carol    | Ann      |
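To see that the edge list and the matrix describe the same network, the edge list can be converted into a matrix. A minimal sketch follows; the function name to_matrix is hypothetical, and the row and column order is an assumption chosen to match the matrix table.

```python
edge_list = [("Ann", "Bob"), ("Ann", "Carol"), ("Carol", "Ann")]


def to_matrix(edges, names):
    """Build a binary matrix with a 1 wherever a row name points to a column name."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


matrix = to_matrix(edge_list, ["Ann", "Bob", "Carol"])
```

The edge list stores only the ties that exist, so it is usually far shorter than the matrix when most pairs are unconnected.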
3. WHAT ARE THE TYPES OF NETWORK?
A. FULL AND PARTIAL NETWORK
A full or complete network contains all
the people or entities of interest and the connections among them. All egos are
treated equally. A full network is often created and available when a single
system, such as a social media platform, acts as a hub among a group of connected
people or groups. For example, the Twitter network includes all users of the
service and the connections between them. In practice, it is not always feasible
(or particularly insightful) to analyze a full network. Instead, analysts
create a partial network by selecting a sample or slice of the full network.
B. EGOCENTRIC NETWORK
Network analysts call the individual who
is the focus of attention “ego” and the people he or she is connected to “alters.” Some networks, called
egocentric networks, only include individuals who are connected to a specified
ego. For example, a network of your personal Facebook friends would be an
egocentric network because you are, by definition, connected to all other
vertices. Other egocentric networks and their associated “subgraphs” may extend out from an ego, reaching not only
friends, but also friends of friends. More generally, egocentric networks can
extend out any number of “degrees” from ego. The basic “1-degree” ego network
consists of the ego and their alters. The “1.5-degree” ego network extends the
1-degree network by including connections between all of the alters. For example,
a Facebook 1.5-degree ego network would show which of your friends know
each other.
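The 1-degree and 1.5-degree ego networks can be extracted from an undirected edge list as sketched below. The function name ego_network, its include_alter_ties flag, and the friendship data are hypothetical.

```python
def ego_network(edges, ego, include_alter_ties=False):
    """Return the edges of an ego network taken from undirected `edges`.

    include_alter_ties=False gives the 1-degree network (ego-alter ties only);
    include_alter_ties=True gives the 1.5-degree network (adds alter-alter ties).
    """
    alters = {v for u, v in edges if u == ego} | {u for u, v in edges if v == ego}
    alters.discard(ego)
    kept = []
    for u, v in edges:
        touches_ego = ego in (u, v)
        between_alters = u in alters and v in alters
        if touches_ego or (include_alter_ties and between_alters):
            kept.append((u, v))
    return kept


friendships = [
    ("You", "Ann"), ("You", "Bob"), ("You", "Carol"),
    ("Ann", "Bob"),     # two of your friends know each other
    ("Carol", "Dave"),  # Dave is a friend of a friend, not an alter
]
one_degree = ego_network(friendships, "You")
one_and_a_half = ego_network(friendships, "You", include_alter_ties=True)
```

Dave appears in neither result because he is two steps away from the ego; a 2-degree network would be needed to reach him.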
C. UNIMODAL NETWORK
Networks that contain a single type of entity are called unimodal networks
because they include one type (i.e., mode) of vertex. They connect users to
users or they connect documents to documents, but they don’t include both users
and documents.
D. MULTIMODAL NETWORK
Networks that include different types of vertices are called multimodal
networks. For example, a network may connect users to the discussion forums and
blog posts they have commented on. Each vertex on the graph would represent a
user, a forum, or a blog post, and the three types could be visually
distinguished by different colors or shapes.
E. AFFILIATION
Data for multimodal networks often include
individuals and some event, activity, or content with which they are
affiliated, creating an affiliation
network. For example, an affiliation network may
connect users with wiki pages they edit. People are affiliated with pages. In
this network, no two users would directly connect to each other. Likewise, no two
wiki pages would directly connect to each other in this type of network.
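The rule that users connect only to pages, and never to each other, can be checked directly. The sketch below is an illustration under assumed data: the function names, user names, and wiki page titles are all hypothetical.

```python
def is_affiliation_network(edges, users, pages):
    """True if every edge joins exactly one user and one page."""
    return all(
        (u in users and v in pages) or (u in pages and v in users)
        for u, v in edges
    )


def editors_by_page(edges, users):
    """Group users under the page they are affiliated with."""
    result = {}
    for u, v in edges:
        user, page = (u, v) if u in users else (v, u)
        result.setdefault(page, set()).add(user)
    return result


users = {"Ann", "Bob", "Carol"}
pages = {"Sociogram", "Hypergraph"}  # hypothetical wiki page titles
edits = [("Ann", "Sociogram"), ("Bob", "Sociogram"), ("Carol", "Hypergraph")]

valid = is_affiliation_network(edits, users, pages)
by_page = editors_by_page(edits, users)
```

Ann and Bob are not directly connected, but grouping by page shows that they share an affiliation with the same page.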
F. MULTIPLEX NETWORK
Networks that have multiple types of connections are called multiplex networks. For example, the Twitter network
may include three types of directed edges: following relationships, “reply to” relationships,
and “mention” relationships. A graph can distinguish each
type of edge by using color, different line styles (e.g., dotted lines, solid lines),
or edge labels.
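One simple way to hold a multiplex network in code is to store the edge type alongside each edge. This sketch assumes (source, target, type) triples; the type names, the line-style table, and the data are hypothetical.

```python
from collections import defaultdict

# Each directed edge carries the kind of relationship it represents.
typed_edges = [
    ("Ann", "Bob", "follows"),
    ("Ann", "Bob", "mentions"),
    ("Carol", "Ann", "replies_to"),
]

# Hypothetical mapping from edge type to a drawing style.
LINE_STYLE = {"follows": "solid", "mentions": "dotted", "replies_to": "dashed"}


def group_by_type(edges):
    """Split a multiplex edge list into one plain edge list per type."""
    grouped = defaultdict(list)
    for source, target, kind in edges:
        grouped[kind].append((source, target))
    return dict(grouped)


def relations_between(edges, source, target):
    """List every type of edge that runs from source to target."""
    return sorted(kind for s, t, kind in edges if (s, t) == (source, target))


layers = group_by_type(typed_edges)
ann_to_bob = relations_between(typed_edges, "Ann", "Bob")
```

The same pair of vertices can be joined by more than one type of edge, as Ann and Bob are here.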
4. NETWORK ANALYSIS METRICS
A. AGGREGATE NETWORK METRICS
A number of metrics describe entire
networks. In some cases, a single network is broken into several disconnected pieces,
called components. Some aggregate network metrics only work on networks where
all of the vertices are connected in a single component, whereas others can be
applied to entire networks even if they are split up.
For example, centralization is an aggregate metric that characterizes the extent to which
the network is centered on one or a few important nodes. Centralized networks
have many edges that emanate from a few important vertices, whereas
decentralized networks have little variation between the numbers of edges each
vertex possesses.
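The text does not say which centralization formula is meant. One common formalization, used here as an assumption, is Freeman’s degree centralization for an undirected graph without self loops: sum the gaps between the largest degree and every vertex’s degree, then divide by the largest possible total, (n - 1)(n - 2). The function name and the two example graphs are hypothetical.

```python
def degree_centralization(edges, nodes):
    """Freeman-style degree centralization for an undirected simple graph.

    Returns 1.0 for a star and 0.0 when every vertex has the same degree.
    """
    n = len(nodes)
    if n < 3:
        raise ValueError("needs at least three vertices")
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    top = max(degree.values())
    return sum(top - d for d in degree.values()) / ((n - 1) * (n - 2))


# A star: every edge emanates from one important vertex.
star = [("Hub", "A"), ("Hub", "B"), ("Hub", "C")]
# A ring: every vertex has the same number of edges.
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

star_score = degree_centralization(star, ["Hub", "A", "B", "C"])
ring_score = degree_centralization(ring, ["A", "B", "C", "D"])
```

The star is the most centralized arrangement possible under this measure, and the ring is fully decentralized.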
B. VERTEX SPECIFIC NETWORK METRICS
Another set of metrics identifies individuals’ positions within a network.
Paramount among these is the set of centrality measures, which describe how a
particular vertex can be said to be in the “middle” of a network. These
measures rest on the idea that a person with fewer connections might have more
“important” connections than someone with more connections. One connection can
be more important than another in different ways. Some are more important
because they bridge otherwise separated portions of the network, whereas others
are important because they connect to well-connected people. The following
centrality metrics provide
quantifiable measures for these concepts:
I. Degree Centrality
II. Betweenness Centrality: Bridge Scores for Boundary Spanners
III. Closeness Centrality: Distance Scores for Broadly Connected People
IV. Eigenvector Centrality: Influence Scores for Strategically Connected People
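The four metrics listed above can be sketched in plain Python for a small undirected, unweighted, connected graph. All function names and the example graph are hypothetical. Betweenness uses Brandes’ algorithm and is left unnormalized, closeness is (n - 1) divided by the total distance to all other vertices, and eigenvector centrality is approximated by repeated averaging over neighbors with each vertex’s own score added so the iteration settles. Each of these is one common formulation, chosen as an assumption.

```python
from collections import deque


def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_centrality(adjacency):
    return {node: len(nbrs) for node, nbrs in adjacency.items()}


def closeness_centrality(adjacency):
    """(n - 1) / total distance to all others; assumes a connected graph."""
    result = {}
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[start] = (len(adjacency) - 1) / total if total else 0.0
    return result


def betweenness_centrality(adjacency):
    """Brandes' algorithm, unnormalized, for unweighted undirected graphs."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        stack, dist = [], {source: 0}
        preds = {node: [] for node in adjacency}
        sigma = {node: 0 for node in adjacency}  # number of shortest paths
        sigma[source] = 1
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in adjacency}
        while stack:  # accumulate dependencies, farthest vertices first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


def eigenvector_centrality(adjacency, iterations=100):
    score = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        new = {
            node: score[node] + sum(score[nbr] for nbr in adjacency[node])
            for node in adjacency
        }
        top = max(new.values())
        score = {node: value / top for node, value in new.items()}
    return score


# Two triangles (A-B-C and E-F-G) joined only through the bridge vertex D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
adjacency = build_adjacency(edges)
degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency)
betweenness = betweenness_centrality(adjacency)
eigen = eigenvector_centrality(adjacency)
```

In this example D has only two connections, fewer than C or E, yet it has the highest betweenness because every shortest path between the two triangles passes through it.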
C. CLUSTERING AND COMMUNITY DETECTION ALGORITHM
A network approach contrasts with those that presume the
existence and boundaries of groups. In a network perspective, people occupy
many relationships and are potentially members in many groups and less defined
clusters. Defining exact boundaries in networks may be difficult, reflecting
the reality of multiple and shifting memberships. From a network perspective, a
group is a collection of vertices that are more connected to one another than
they are to others. Relatively more cohesive or densely connected sets of
vertices form regions, also called clusters, that may reflect the existence of
groups without regard to whether they are officially recognized or even if
members recognize their connections to one another. A rapidly growing body of
research describes clustering algorithms, also called community detection
algorithms, that automatically identify these clusters based on network
structure.
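The network definition of a group can be checked for any candidate set of vertices by counting ties inside the set against ties that leave it. The sketch below is not a community detection algorithm; it only tests the definition for a group that is already given. The function name and the example graph are hypothetical.

```python
def internal_external_ties(edges, group):
    """Count edges with both ends inside `group` and edges that cross its boundary."""
    internal = sum(1 for u, v in edges if u in group and v in group)
    external = sum(1 for u, v in edges if (u in group) != (v in group))
    return internal, external


# Two triangles (A-B-C and E-F-G) joined only through the vertex D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]

cohesive = internal_external_ties(edges, {"A", "B", "C"})
scattered = internal_external_ties(edges, {"A", "D", "G"})
```

The set A, B, C has three internal ties and one external tie, so it behaves like a cluster. The set A, D, G has no internal ties at all, so it does not, even though each of its members belongs to the same network.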