Semantic Network Analysis

Andrej Mrvar & Klement Podnar

Semantic network analysis (SNA) is both a theoretical framework and a research method. It is used to present visually a set of concepts that are semantically related to one another, and it supports quantitative and qualitative analysis and interpretation of semantic structures in large networks. Semantic networks are cognitively based and usually refer to pairwise relations between words. As such, they represent semantic relations among concepts and can be understood as one form of knowledge representation. They allow us to work with “natural language,” focusing on the relations between the words used and the meaning shared among participants.

In the context of corporate reputation, SNA can be applied in different ways and for different purposes. It is particularly useful for analyzing stakeholders’ perceptions, corporate associations, evaluations, complaints, and other cognitive structures concerning corporations or brands. It can also be used to investigate the meanings ascribed to the concept of corporate reputation or to any related phenomenon. Compared with other methods for measuring corporate reputation, SNA has an important advantage: Because it is based on data gathered with a free-association technique, it reveals the dimensions of corporate reputation directly instead of calculating them from indicators determined in advance, which may not be relevant at a given time or place. It also allows a detailed observation of the structure of corporate reputation associations.

This entry first provides an overview of semantic networks and then describes five examples of conducting an SNA. The entry ends with a discussion of how SNA has been used in the context of corporate reputation.

Semantic Networks

Concepts in semantic networks can represent a wide variety of entities. These entities come both from everyday life (for example, ontologies as networks, networks of word associations, the concept of food quality, and corporate associations) and from science (for example, the connections among scientific papers in selected scientific disciplines). Such representations were first used by Charles S. Peirce in 1909, who called them existential graphs.

Networks in general consist of a set of vertices (sometimes also called nodes) and a set of lines. In semantic networks, concepts are represented by vertices, and relations are represented by undirected lines (edges) or directed lines (arcs). The relation “is similar to” is symmetric and is therefore represented by edges. The relation “is a part of” is not symmetric and is therefore represented by arcs. Relations in semantic networks can represent explanations, similarities, associations, and so on.
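A small data structure makes the distinction between edges and arcs concrete. The following Python sketch is a hypothetical minimal representation: the class `SemanticNetwork` and its methods are illustrative names and do not reflect the format of any particular program. An edge is stored in both directions, and an arc is stored in one direction only.

```python
from collections import defaultdict


class SemanticNetwork:
    """Minimal semantic network: vertices are concepts, lines are labelled relations."""

    def __init__(self):
        # successors[u] holds (v, relation) pairs reachable from u in one step
        self.successors = defaultdict(set)
        self.vertices = set()

    def add_edge(self, u, v, relation):
        """Undirected line (edge), e.g. 'is similar to': stored in both directions."""
        self.vertices.update((u, v))
        self.successors[u].add((v, relation))
        self.successors[v].add((u, relation))

    def add_arc(self, u, v, relation):
        """Directed line (arc), e.g. 'is a part of': stored from u to v only."""
        self.vertices.update((u, v))
        self.successors[u].add((v, relation))

    def related(self, u):
        """Concepts directly reachable from u, with the relation used."""
        return sorted(self.successors[u])


net = SemanticNetwork()
net.add_edge("car", "automobile", "is similar to")
net.add_arc("wheel", "car", "is a part of")
print(net.related("automobile"))  # [('car', 'is similar to')]
print(net.related("car"))         # [('automobile', 'is similar to')]
```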

Semantic networks can be analyzed with the methods of social network analysis, using simple or more sophisticated approaches. This serves to analyze the semantic content of a particular concept and to identify which associations respondents most commonly relate to it. Two aspects of network analysis are particularly important for our purposes: centrality and cohesion.

Centrality

Centrality addresses the question of which vertices (terms, concepts) in a network are the most central (important). Several centrality measures are available, and the most commonly used are degree, closeness, and betweenness. In directed networks, centrality measures can be further divided into input and output measures, and the term prestige (also prominence) is often used instead of centrality. Input measures are also called measures of support, and output measures are called measures of influence.

Degree Centrality

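Degree centrality is the simplest measure: A vertex is central in a network if it is active, in the sense that it has many links to other vertices. It is computed by counting the lines at the vertex. In a directed network these are incoming arcs (input degree) or outgoing arcs (output degree), and in an undirected network they are edges.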
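The following is a minimal sketch that assumes a directed network given as a list of (tail, head) arcs with hypothetical concept labels. Input and output degree are then simple counts. In an undirected network, each edge would instead be counted once for both of its endpoints.

```python
from collections import Counter


def degree_centrality(arcs):
    """Return (input_degree, output_degree) Counters for a list of (tail, head) arcs."""
    indeg, outdeg = Counter(), Counter()
    for tail, head in arcs:
        outdeg[tail] += 1  # the arc leaves the tail vertex
        indeg[head] += 1   # the arc enters the head vertex
    return indeg, outdeg


# Hypothetical arcs: ("a", "b") means concept a's description uses concept b.
arcs = [("a", "b"), ("c", "b"), ("d", "b"), ("b", "a"), ("a", "c")]
indeg, outdeg = degree_centrality(arcs)
print(indeg.most_common(1))   # [('b', 3)]
print(outdeg.most_common(1))  # [('a', 2)]
```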

Closeness Centrality

Closeness centrality measures how close the selected vertex is to all other vertices in the network. In other words, it measures in how many steps the vertex can reach all other vertices (output closeness) or can be reached from them (input closeness). Unlike degree centrality, it takes into account not only the direct connections among vertices but also the indirect ones.
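The following Python sketch computes closeness under two assumptions: it uses the common normalization of (n - 1) divided by the sum of distances, and every vertex must be reachable. Other normalizations and variants for disconnected networks exist. Passing successor lists gives output closeness, and passing predecessor lists gives input closeness. The adjacency dictionary is a hypothetical example. Every vertex must appear as a key, even if it has no neighbors.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in steps) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, v):
    """Closeness of v: (n - 1) / sum of distances from v to all other vertices.

    Assumes every vertex is a key of adj and is reachable from v.
    """
    dist = bfs_distances(adj, v)
    n = len(adj)
    if len(dist) < n:
        raise ValueError("not all vertices are reachable from " + repr(v))
    return (n - 1) / sum(dist.values())


# Hypothetical undirected path a - b - c - d (each edge listed in both directions).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(closeness(adj, "b"))  # 0.75
print(closeness(adj, "a"))  # 0.5
```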

Betweenness Centrality

In communication networks, the distance to other vertices is not the only property that makes a vertex central. Vertices that lie on many of the shortest paths between pairs of other vertices are also considered central, because they control the flow of information in the network. The betweenness centrality of a selected vertex is a sum taken over all pairs of other vertices. Each term is the probability that a shortest path between the pair passes through the selected vertex, that is, the fraction of their shortest paths that do so.
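Betweenness can be computed efficiently with Brandes' algorithm. It counts shortest paths by breadth-first search from every vertex and then accumulates each vertex's share of those paths. The sketch below assumes an unweighted, undirected network stored as an adjacency dictionary, and it returns unnormalized scores. The example networks are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected networks.

    For each vertex v, sums over unordered pairs {s, t} (s, t != v) the
    fraction of shortest s-t paths that pass through v.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate dependencies, farthest vertices first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(betweenness(star))   # the center "c" lies on all 3 leaf-to-leaf paths
print(betweenness(cycle))  # each vertex carries half of one opposite pair
```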

Cohesion

Cohesion concerns the most densely connected groups in a network (also called clusters or communities). There are several ways to find cohesive groups, among them components, cliques, cores, and communities.

Components

A strongly connected component is a maximal group of vertices in which every vertex can be reached from every other vertex of the group when the directions of lines are taken into account. If the directions of lines are ignored (the network is treated as undirected), such a maximal group is called a weakly connected component.
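Strong and weak components can be found with standard graph searches. The sketch below finds strong components with Kosaraju's two-pass depth-first search. It finds weak components with a plain search on the undirected version of the network. The recursive search assumes a small network, because it is limited by Python's recursion depth. The example arcs are hypothetical.

```python
def strong_components(adj):
    """Kosaraju's algorithm: maximal groups in which every vertex reaches every other."""
    order, seen = [], set()

    def visit(u):  # first pass: record vertices in order of finishing time
        seen.add(u)
        for v in adj.get(u, ()):
            if v not in seen:
                visit(v)
        order.append(u)

    for u in adj:
        if u not in seen:
            visit(u)

    # Reverse every arc.
    radj = {u: [] for u in adj}
    for u, vs in adj.items():
        for v in vs:
            radj.setdefault(v, []).append(u)

    comps, assigned = [], set()

    def collect(u, comp):  # second pass on the reversed network
        assigned.add(u)
        comp.add(u)
        for v in radj.get(u, ()):
            if v not in assigned:
                collect(v, comp)

    for u in reversed(order):
        if u not in assigned:
            comp = set()
            collect(u, comp)
            comps.append(comp)
    return comps


def weak_components(adj):
    """Components obtained when the direction of arcs is ignored."""
    undirected = {}
    for u, vs in adj.items():
        undirected.setdefault(u, set())
        for v in vs:
            undirected[u].add(v)
            undirected.setdefault(v, set()).add(u)
    comps, seen = [], set()
    for s in undirected:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(undirected[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


# Hypothetical arcs: a -> b -> c -> a forms a cycle; c -> d leaves it.
adj = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
print(strong_components(adj))  # two strong components: {a, b, c} and {d}
print(weak_components(adj))    # one weak component containing all four vertices
```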

A vertex whose removal disconnects a network (so that the network falls into several pieces) is called an articulation point. A biconnected component is a maximal group of vertices that stays connected when any single vertex is removed, so no vertex is crucial for the group’s connectedness.
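Articulation points can be found in a single depth-first search, following the Hopcroft-Tarjan approach. For each vertex, the search compares its discovery time with the earliest vertex reachable from its subtree. This is a sketch for small, simple undirected networks (no multiple lines). The example network is hypothetical: two triangles share a vertex, and a pendant vertex is attached to one of them.

```python
def articulation_points(adj):
    """Vertices whose removal disconnects an undirected network (Hopcroft-Tarjan)."""
    disc, low, points = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root u is critical if v's subtree cannot climb above u.
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])  # back line to an ancestor
        # The root of the search is critical if it has two or more children.
        if parent is None and children > 1:
            points.add(u)

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return points


# Triangles a-b-x and x-c-d share x; pendant vertex e hangs on a.
adj = {
    "a": ["b", "x", "e"], "b": ["a", "x"], "x": ["a", "b", "c", "d"],
    "c": ["x", "d"], "d": ["x", "c"], "e": ["a"],
}
print(sorted(articulation_points(adj)))  # ['a', 'x']
```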

Cores and Cliques

A k-core is a maximal group of vertices in which every vertex is connected to at least k other vertices from the same group. A group in which every vertex is connected to all other vertices of the group is called a clique.
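A k-core can be obtained by repeatedly removing vertices that have fewer than k neighbors among the remaining vertices, until no such vertex is left. The sketch below does this for a simple undirected network, given as symmetric adjacency sets without loops. It also adds a direct check of the clique definition. The example is hypothetical.

```python
def k_core(adj, k):
    """Maximal set of vertices in which every vertex has at least k neighbours inside it."""
    core = {u: set(vs) for u, vs in adj.items()}  # working copy
    removed = True
    while removed:
        removed = False
        for u in list(core):
            if len(core[u]) < k:
                for v in core[u]:
                    core[v].discard(u)  # u no longer counts as v's neighbour
                del core[u]
                removed = True
    return set(core)


def is_clique(adj, group):
    """True if every vertex in group is linked to all other vertices in group."""
    return all(v in adj[u] for u in group for v in group if u != v)


# Hypothetical network: a 4-clique {a, b, c, d} with a tail a - e - f.
adj = {
    "a": {"b", "c", "d", "e"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
    "d": {"a", "b", "c"}, "e": {"a", "f"}, "f": {"e"},
}
print(sorted(k_core(adj, 3)))                # ['a', 'b', 'c', 'd']
print(is_clique(adj, {"a", "b", "c", "d"}))  # True
```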

Communities

Communities are dense groups with more lines inside the groups than between them. When searching for communities, we try to maximize modularity. Simply stated, modularity is the fraction of lines that fall inside groups minus the fraction expected if lines were placed at random while keeping each vertex’s degree. Several methods are available to find such communities, for example, the Louvain method and VOS (visualization of similarities) clustering.
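The sketch below makes the notion of modularity concrete. It computes Newman's modularity, the quality function that the Louvain method maximizes, for a given partition of an undirected, unweighted network. It does not implement the Louvain method or VOS clustering themselves. The two-triangle example is hypothetical.

```python
def modularity(edges, community):
    """Newman's modularity of a partition of an undirected, unweighted network.

    edges: list of (u, v) pairs; community: dict mapping each vertex to a group label.
    Q = sum over groups c of [L_c / m - (D_c / (2m)) ** 2], where L_c is the number
    of lines inside c, D_c the total degree of c's vertices, and m the number of lines.
    """
    m = len(edges)
    inside, degree_sum = {}, {}
    for u, v in edges:
        for x in (u, v):  # each line adds 1 to the degree of both endpoints
            c = community[x]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            inside[c] = inside.get(c, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


# Two triangles joined by the single line c - d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = dict.fromkeys("abcdef", 1)
print(modularity(edges, two_groups))  # about 0.357 (exactly 5/14)
print(modularity(edges, one_group))   # 0.0
```

Splitting the network along the single connecting line gives a clearly positive modularity. Putting every vertex in one group always yields zero.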

The examples that follow apply these two aspects of network analysis. For centrality, the most central vertices according to several centrality measures are reported. For cohesion, cohesive groups obtained with the Louvain method and VOS clustering are visualized. The main topics of network analysis and the Pajek commands used to obtain the results are not reported here. Explanations of basic network analysis concepts and of the required commands can be found in the works listed in the Further Readings at the end of this entry.

From Words/Concepts to Networks

Dictionary Network

To get some insight into networks, let’s first generate a large “nonsemantic” dictionary network. Networks can be generated from words in a dictionary in different ways. In Knuth’s English Dictionary, there exist 52,652 English words having from two to eight letters. Two words are connected by an edge if one can be reached from the other by

changing a single letter (e.g., “wine”—“wide”)
adding/removing a single letter (e.g., “ever”—“fever”)

The network obtained has 89,038 edges. The network is very sparse: Its density is 0.0000642. Figure 1 shows the shortest path for changing wine into water by changing, adding, or removing a single letter in each step. The length of the shortest path is four, so four local transformations are needed.
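The construction of the dictionary network and the search for a shortest path can be sketched as follows. The word list here is a tiny hypothetical sample, not Knuth's dictionary. The `density` function reproduces the reported figure from the vertex and edge counts, using the undirected formula 2m / (n(n - 1)).

```python
from collections import deque
from itertools import combinations


def one_step_apart(w1, w2):
    """True if w2 is obtained from w1 by changing, adding, or removing one letter."""
    if len(w1) == len(w2):
        return sum(a != b for a, b in zip(w1, w2)) == 1
    if abs(len(w1) - len(w2)) != 1:
        return False
    shorter, longer = sorted((w1, w2), key=len)
    # Removing one letter from the longer word must give the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def build_network(words):
    """Undirected dictionary network: one edge per pair of words one step apart."""
    adj = {w: [] for w in words}
    for w1, w2 in combinations(words, 2):
        if one_step_apart(w1, w2):
            adj[w1].append(w2)
            adj[w2].append(w1)
    return adj


def shortest_path(adj, source, target):
    """Breadth-first search; returns the words on one shortest path, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = previous[u]
            return path[::-1]
        for v in adj[u]:
            if v not in previous:
                previous[v] = u
                queue.append(v)
    return None


def density(n, m):
    """Density of an undirected network with n vertices and m edges."""
    return 2 * m / (n * (n - 1))


words = ["wine", "wane", "wide", "wade", "wader", "water", "ever", "fever"]
adj = build_network(words)
path = shortest_path(adj, "wine", "water")
print(path)
print(len(path) - 1)                      # 4
print(round(density(52652, 89038), 7))    # 6.42e-05
```

On the full 52,652-word list, comparing every pair of words would require more than a billion comparisons. A practical build would instead index words by shared patterns, such as the word with one position blanked out.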

Figure 1 Transforming Wine to Water in Four Steps (by Changing, Adding, or Removing a Single Letter in Each Step) in Knuth’s Dictionary Network

Network of Associations

A network of associations is collected in the following way: Researchers say a selected word to several people and ask each of them for the first word that comes to mind. Such connections among words are called associations. The Edinburgh Associative Thesaurus is a set of word association norms showing the counts of word associations as collected from students. It is a large network: It contains 23,219 vertices (words) and 325,624 arcs (associations). Figure 2 shows a community in this network that was found using VOS clustering.
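The collection procedure amounts to counting responses. In the hypothetical data below, each distinct (stimulus, response) pair becomes an arc whose weight is the number of respondents who gave that response. The pairs bread-to-butter and butter-to-bread are counted separately, so the resulting network is directed, as the Edinburgh Associative Thesaurus network is.

```python
from collections import Counter, defaultdict

# Hypothetical responses: (stimulus said by the researcher, first word given back).
responses = [
    ("bread", "butter"), ("bread", "butter"), ("bread", "loaf"),
    ("butter", "bread"), ("butter", "yellow"), ("bread", "butter"),
]

# Each distinct (stimulus, response) pair becomes an arc weighted by its count.
arcs = Counter(responses)

# Group arcs by stimulus so that the strongest associations can be listed.
out_arcs = defaultdict(list)
for (stimulus, response), count in arcs.items():
    out_arcs[stimulus].append((response, count))


def strongest(stimulus, k=2):
    """The k responses most often given to a stimulus, with their counts."""
    return sorted(out_arcs[stimulus], key=lambda rc: (-rc[1], rc[0]))[:k]


print(strongest("bread"))  # [('butter', 3), ('loaf', 1)]
```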

Figure 2 Cluster in a Word Associations Network Obtained by VOS Clustering

Semantic Networks Obtained From Online Dictionaries

Several online dictionaries are available on the Web in which each term is described using other terms. Two such dictionaries are presented here: (1) ODLIS: Online Dictionary of Library and Information Science and (2) FOLDOC: Free On-Line Dictionary of Computing. Only a few results of the network analysis of the two dictionaries are presented.

Online Dictionary of Library and Information Science

ODLIS includes the terminology of library science and information studies. Explanations of terms that appear in the publishing, printing, binding, and book trades are also available. The ODLIS network includes 2,909 vertices (terms) and 18,419 arcs (explanations). The centrality measures are as follows:

Input: Terms that are most often used to explain other terms are book, library, work, printing, and publishing.
Output: Terms that are not easy to explain (their explanations use many other terms) or terms that have several different meanings and/or explanations are periodical, catalog, bibliography, index, title, editor, journal, illustration, and serial.
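A dictionary network of this kind can be sketched as follows. The sketch assumes that an arc runs from each term to every term used in its explanation. Input degree then counts how often a term is used to explain others, and output degree counts how many terms its own explanation uses. The glossary below is a hypothetical toy example, not ODLIS data.

```python
from collections import Counter

# Hypothetical glossary: each term mapped to the glossary terms its explanation uses.
glossary = {
    "serial": ["periodical", "publishing", "book"],
    "periodical": ["publishing", "serial"],
    "journal": ["periodical", "publishing"],
    "book": ["printing", "publishing"],
    "printing": [],
    "publishing": ["book", "printing"],
}

# One arc term -> used_term for every term mentioned in an explanation.
arcs = [(term, used) for term, used_terms in glossary.items() for used in used_terms]

input_degree = Counter(used for _, used in arcs)   # how often a term explains others
output_degree = Counter(term for term, _ in arcs)  # how many terms explain this one

print(input_degree.most_common(1))   # [('publishing', 4)]
print(output_degree.most_common(1))  # [('serial', 3)]
```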

Figure 3 shows a community obtained by the Louvain method from the ODLIS dictionary. For simplicity, bidirectional arcs are replaced by edges. The term intellectual freedom is an articulation point, while the most central word according to input and output measures in this community is censorship.
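Replacing bidirectional arcs by edges, as done for Figure 3, can be sketched as a single pass over the set of arcs. The function name and example arcs are hypothetical. Loops are left as arcs.

```python
def merge_bidirectional(arcs):
    """Split arcs into edges (pairs linked in both directions) and one-way arcs."""
    arc_set = set(arcs)
    edges, one_way = set(), set()
    for u, v in arc_set:
        if u != v and (v, u) in arc_set:
            # u -> v together with v -> u becomes a single undirected edge
            edges.add(frozenset((u, v)))
        else:
            one_way.add((u, v))  # loops and one-directional arcs stay arcs
    return edges, one_way


arcs = [
    ("censorship", "intellectual freedom"),
    ("intellectual freedom", "censorship"),
    ("banned book", "censorship"),
]
edges, one_way = merge_bidirectional(arcs)
print(len(edges), len(one_way))  # 1 1
```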

Figure 3 Selected Community in Online Dictionary of Library and Information Science

Free On-Line Dictionary of Computing

FOLDOC is a similar dictionary on computing. It contains the vocabulary one would expect to find in a computer dictionary, for example, on computers, operating systems, programming languages, tools, networking, architecture, mathematics, electronics, standards, and telecoms. The FOLDOC dictionary is much larger than ODLIS: It contains 13,425 terms and 125,302 arcs. The centrality measures are as follows:

Input: Terms that are most often used to explain other terms in computer science are Unix, C, Usenet, IBM, Internet, operating system, protocol, MS-DOS, standard, ASCII, Macintosh, algorithm, and IBM PC.
Output: Terms that are not easy to explain in computer science or terms that have several different meanings and/or explanations are ASCII, operating system, Commonwealth Hackish, University of Edinburgh, chat, symbolic mathematics, Amiga, GCC, and W2K.

Figure 4 shows a densely connected part of the network extracted by VOS clustering. There are no articulation points in this community.

Figure 4 Small Cluster in Free On-Line Dictionary of Computing Obtained by VOS Clustering

Wordnet

Wordnet is one of the most often cited real semantic networks. It is a lexical database for the English language. It groups nouns, verbs, adjectives, and adverbs into sets of cognitive synonyms (synsets), each expressing a distinct concept.

The network contains 111,223 vertices and 137,279 lines (both arcs and edges). It has been shown that Wordnet has the properties characteristic of small-world networks: The network is sparse, the average path length between concepts is short, and local clustering is strong.
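The small-world properties mentioned above can be measured directly. The sketch below computes the average local clustering coefficient and the average shortest-path length for an undirected network given as adjacency sets. It assumes the common convention that vertices with fewer than two neighbors have a clustering of 0. Judging whether a network is small-world typically also requires comparing these values with those of a random network of the same size and density, which the sketch does not do. The example networks are hypothetical.

```python
from collections import deque
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering: share of a vertex's neighbour pairs that are linked."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # clustering taken as 0 for vertices of degree < 2
        links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all pairs of vertices that can reach each other."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


k4 = {v: {u for u in "abcd" if u != v} for v in "abcd"}  # complete network on 4 vertices
path3 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}       # path a - b - c
print(average_clustering(k4), average_path_length(k4))  # 1.0 1.0
print(average_clustering(path3), average_path_length(path3))
```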

Figure 5 shows a community (obtained by VOS clustering) representing entailment pointers. The verb Y is entailed by X if by doing X you must be doing Y; for example, “to sleep” is entailed by “to snore.” The community has a treelike structure: It contains no nontrivial strongly connected component, that is, no cycle.
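Kahn's algorithm can check whether a directed network such as this community is free of cycles. It repeatedly removes vertices that have no incoming arcs, and the network is acyclic exactly when every vertex is eventually removed. Only the snore-sleep pair below comes from the text; the other verb pairs are illustrative. An arc from X to Y means that X entails Y. None of the pairs are extracted from Wordnet.

```python
from collections import defaultdict, deque


def is_acyclic(arcs):
    """Kahn's algorithm: True if the directed network has no cycle."""
    succ, indeg = defaultdict(list), defaultdict(int)
    vertices = set()
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
        vertices.update((u, v))
    # Repeatedly remove vertices that no remaining arc points to.
    queue = deque(v for v in vertices if indeg[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Vertices on a cycle never reach in-degree 0, so they are never removed.
    return removed == len(vertices)


entailment = [("snore", "sleep"), ("buy", "pay"), ("limp", "walk")]
print(is_acyclic(entailment))                  # True
print(is_acyclic([("a", "b"), ("b", "a")]))    # False
```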

Figure 5 Entailment Pointers in Wordnet Lexical Database

Social Network Analysis Software

Several programs for the analysis and visualization of networks are now available. Some of the most widely used are UCINET, Pajek, igraph, visone, Gephi, ORA, NetworkX, NodeXL, statnet, and SIENA. The results and visualizations in this entry were obtained with Pajek (Slovenian for “spider”). Pajek was chosen because it is the only general program on the market that can handle very large networks (up to 1 billion vertices and lines). The program is available for free (for noncommercial use) on its website. The website also provides the networks used in this entry: Knuth’s English Dictionary network, the Edinburgh Associative Thesaurus network, and the ODLIS, FOLDOC, and Wordnet networks.

Semantic Network Analysis and Corporate Reputation

Pajek has also been used to capture the concepts that individuals associate with the term reputation. Klement Podnar, Urška Tuškej, and Urša Golob performed semantic network analyses of these associations. They found a few strong determinants of reputation, which mainly coincide with established reputational measures. The network of reputational associations consisted of 187 associations connected by 474 lines, and its density was only 0.014. This means that individuals in the study gave many idiosyncratic free associations with corporate reputation, but very few associations were shared by many respondents.
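The reported density can be reproduced from the vertex and line counts. This assumes that the density was computed as the number of lines divided by the number of ordered vertex pairs, n(n - 1), as for a directed network. The undirected formula, 2m / (n(n - 1)), would give about 0.027.

```latex
D = \frac{m}{n(n-1)} = \frac{474}{187 \cdot 186} = \frac{474}{34\,782} \approx 0.0136 \approx 0.014
```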

Using the k-core method, the study identified the key reputational associations as good business performance, quality, good attitude toward consumers, and recognition. Cliques on four vertices show which associations form key associative knots for the largest number of respondents. These associations were quality, good business performance, good attitude toward employees, good attitude toward consumers, recognition, fairness, innovation, positive media image, social responsibility, and quality products/services. Good business performance and quality appeared in most cliques. Articulation points showed that the association most important for holding the reputational associative network together is good business performance, followed by quality, innovation, and profit. Centrality measures also confirmed that good business performance is the most central association in the reputation semantic network, followed by quality and good attitude toward employees.
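A brute-force search can count how many four-vertex cliques each association belongs to, but it is feasible only for small networks. The associations and links below are a hypothetical toy example, not the study's data.

```python
from collections import Counter
from itertools import combinations


def cliques_of_size(adj, k):
    """All groups of k vertices in which every pair is linked (brute-force search)."""
    return [
        group
        for group in combinations(sorted(adj), k)
        if all(v in adj[u] for u, v in combinations(group, 2))
    ]


def clique_membership(adj, k=4):
    """Number of k-vertex cliques that each vertex belongs to."""
    return Counter(v for group in cliques_of_size(adj, k) for v in group)


# Hypothetical links among associations (not the study's data).
links = [
    ("quality", "business performance"), ("quality", "attitude to employees"),
    ("quality", "attitude to consumers"), ("business performance", "attitude to employees"),
    ("business performance", "attitude to consumers"),
    ("attitude to employees", "attitude to consumers"),
    ("quality", "innovation"), ("business performance", "innovation"),
    ("attitude to consumers", "innovation"),
]
adj = {}
for u, v in links:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(len(cliques_of_size(adj, 4)))  # 2
print(clique_membership(adj).most_common())
```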

Further Readings

Batagelj, V., & Mrvar, A. (1998). Pajek: A program for large network analysis. Connections, 21, 47–57.

Batagelj, V., Mrvar, A., & Zaveršnik, M. (2002). Network analysis of texts. In T. Erjavec & J. Gros (Eds.), Proceedings of the Fifth International Multi-Conference Information Society: Language Technologies (pp. 143–148). Ljubljana, Slovenia: Jezikovne Tehnologije.

de Nooy, W., Mrvar, A., & Batagelj, V. (2011). Exploratory social network analysis with Pajek (Rev. and expanded 2nd ed.). New York: Cambridge University Press.

Doerfel, M. L., & Barnett, G. A. (1999). A semantic network analysis of the International Communication Association. Human Communication Research, 25, 589–603.

Drieger, P. (2013). Semantic network analysis as a method for visual text analytics. Procedia—Social and Behavioral Sciences, 79, 4–17.

Fellbaum, C. (Ed.). (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT Press.

Kiss, G. R., Armstrong, C., Milroy, R., & Piper, J. (1973). An associative thesaurus of English and its computer analysis. In A. J. Aitken, R. W. Bailey, & N. Hamilton-Smith (Eds.), The computer and literary studies (pp. 153–165). Edinburgh, UK: Edinburgh University Press.

Knuth, D. E. (1993). The Stanford GraphBase. New York: ACM Press.

Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38, 39–41.

Mrvar, A., & Batagelj, V. (2015). Pajek and PajekXXL: Programs for analysis and visualization of very large networks (reference manual, list of commands with short explanation). Retrieved January 27, 2016, from http://mrvar.fdv.uni-lj.si/pajek/pajekman.pdf

Podnar, K., Tuškej, U., & Golob, U. (2012). Mapping semantic meaning of corporate reputation in global economic crisis context: A Slovenian study. Public Relations Review, 38, 906–915.

Princeton University. (2010). Wordnet: Princeton University “About WordNet.” Retrieved January 27, 2016, from http://wordnet.princeton.edu/

Quillian, M. R. (1968). Semantic memory. In M. Minsky (Ed.), Semantic information processing (pp. 227–270). Cambridge: MIT Press.

Sowa, J. F. (2015). Semantic networks. Retrieved January 27, 2016, from http://www.jfsowa.com/pubs/semnet.htm

Steyvers, M., & Tenenbaum, J. B. (2005). The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science, 29, 41–78.

van Atteveldt, W. (2008). Semantic network analysis: Techniques for extracting, representing, and querying media content. Charleston, SC: BookSurge.
