The “Gender Offender” analysis: How and Why We Did It (Part Two)

Read Part One of this post here.

PART 2. Visualising Male Producer Networks

This aspect of the project was undertaken by Stuart Palmer.

The data on creative roles in films contains the information needed to describe the networked structure of creative team relationships in the set of films analysed. Social network analysis (SNA) provides methods for visualising these group relations and, through quantitative measures that characterise network features, for identifying strategically important components and participants in the network. It therefore also points to ways in which these networks can be most effectively “undone”.

For network visualisation of the connections between producers and other creatives on a given film, a set of network ‘edge’ descriptors was required. Each descriptor represents a single network edge of the form ‘producer’ connects to ‘other creative’.

How We Did It

To produce the SNA visualisations, some data cleaning was required on the names of people, as a small number of people who had made creative contributions on multiple films had their names recorded differently from one film to another.

Further preparation of the data was required to turn multi-word person names and film titles into single terms for ease of further processing.

The following step-by-step description details the process for film producers, but the same process was also followed for directors and writers.

The preceding data on creative roles was output as an MS Excel file, and the following data processing steps were completed in MS Excel before work on the visualisations could begin:

  • Build a list of all the creative names associated with each film title;
  • For each producer on a film (remember there are likely to be multiple producers), link the list of creatives on that film with each producer name;
  • Remove any producer-connects-to-producer self edges from the spreadsheet;
  • Export the set of producer-connects-to-other-creative edge descriptors as a comma-separated values (CSV) file (for network visualisation);
  • Compile a list of individual producer/creative names and their gender;
  • Export the producer/creative gender list as a CSV file (for network visualisation);
  • Attach a gender to each ‘end’ of every producer-connects-to-other-creative edge; and
  • Consolidate all edge descriptor information for each producer so descriptive statistics about the propensity of producers to work with either gender can be compiled.
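The spreadsheet steps above can also be sketched in code. The following is a minimal illustration only, not the Excel workflow actually used: the credit records, the names in them, the `build_edges` and `to_csv` helpers, and the `Gender` column are all hypothetical. The `Source`, `Target` and `Id` column headers are the ones Gephi's spreadsheet importer conventionally recognises, but treat that as an assumption to check against your Gephi version.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical credit records: (film, person, role, gender).
# Names and titles are already collapsed into single terms.
credits = [
    ("Film_A", "Pat_Producer", "producer", "M"),
    ("Film_A", "Dana_Director", "director", "F"),
    ("Film_A", "Will_Writer", "writer", "M"),
    ("Film_B", "Pat_Producer", "producer", "M"),
    ("Film_B", "Will_Writer", "writer", "M"),
]


def build_edges(credits):
    """Return one (producer, other_creative) edge per shared film."""
    people_by_film = defaultdict(dict)
    for film, person, role, _gender in credits:
        people_by_film[film].setdefault(person, set()).add(role)
    edges = []
    for people in people_by_film.values():
        producers = [p for p, roles in people.items() if "producer" in roles]
        for producer in producers:
            for person in people:
                if person != producer:  # drop self edges
                    edges.append((producer, person))
    return edges


def to_csv(header, rows):
    """Serialise rows to CSV text (written to memory here, not to disk)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


edges = build_edges(credits)
gender = {person: g for _film, person, _role, g in credits}

edges_csv = to_csv(["Source", "Target"], edges)
nodes_csv = to_csv(["Id", "Gender"], sorted(gender.items()))

# Tally, per producer, the gender at the 'other creative' end of each edge.
collaborators = defaultdict(Counter)
for producer, other in edges:
    collaborators[producer][gender[other]] += 1
```

In this toy data, `collaborators["Pat_Producer"]` records two links to male creatives and one to a female creative, which is the kind of per-producer summary the final step describes.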

The free and open source network visualisation software package Gephi was then used for network visualisations.

The following general steps were performed to produce the creative network visualisations:

  • The CSV file set of producer-connects-to-other-creative edge descriptors was imported into Gephi. Gephi parses and stores this information in two tables:
    1. A node table with a single entry for each producer/creative name included in the CSV file;
    2. An edge table with a single entry for each producer-connects-to-other-creative link included in the CSV file – the direction of the edge (source (producer) to target (other creative)) is implied by the source/target data order, and a weight is recorded for each edge that is equal to the number of times that the link occurs;
  • The CSV file containing the producer/creative gender list was imported into Gephi and attached to the node table;
  • The network was laid out using the OpenOrd algorithm, one of the standard layout algorithms supplied with Gephi, and one that highlights node clustering present in the network data;
  • Nodes were sized in proportion to their degree, which is the sum of all edge connections to/from a node;
  • Nodes were coloured by gender – with different colours for male and female;
  • Edges were coloured by mixing the colour of the source (producer) node and target (other creative) node; and
  • Curved edges were used, with the direction of the edge being clockwise from the source (producer) node to the target (other creative) node.
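To make the node table, edge weights and degree-based sizing more concrete, here is a small sketch of the equivalent bookkeeping outside Gephi. It is not Gephi's own code: the edge rows are invented, degree is taken here as the count of distinct edges touching a node, and the linear `node_size` scaling with its size limits is a hypothetical stand-in for Gephi's sizing controls.

```python
from collections import Counter

# Hypothetical rows from the edge CSV: (source producer, target creative).
edge_rows = [
    ("Pat_Producer", "Dana_Director"),
    ("Pat_Producer", "Will_Writer"),
    ("Pat_Producer", "Will_Writer"),
    ("Quinn_Producer", "Will_Writer"),
]

# Edge table: one entry per distinct directed link,
# with weight equal to the number of times the link occurs.
edge_table = Counter(edge_rows)

# Node table: one entry per distinct name.
node_table = sorted({name for row in edge_rows for name in row})

# Degree: number of distinct edges arriving at or leaving each node.
degree = Counter()
for source, target in edge_table:
    degree[source] += 1
    degree[target] += 1


def node_size(name, min_size=10.0, max_size=50.0):
    """Scale a node's size linearly with its degree (assumed scheme)."""
    low, high = min(degree.values()), max(degree.values())
    if high == low:
        return min_size
    return min_size + (degree[name] - low) / (high - low) * (max_size - min_size)
```

Here the repeated Pat_Producer to Will_Writer link collapses into a single edge of weight 2, while Pat_Producer and Will_Writer, each touched by two distinct edges, receive the largest node size.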

Using the Gephi layout procedure described above, the following network shows the connections between producers and other creatives in the entire data set used – purple nodes are male.

Using the Gephi layout procedure described above, and the source data for male creatives only, the following network shows the connections between male producers and other male creatives – pink nodes are male creatives who never worked with a female creative in the data set used.

Using the Visualisations to Understand How Change Is Possible

Network visualisations are useful for observing the implicit structure in the film data, and identifying the key connected creative players. By adding the dimension of gender to these visualisations we can clearly see the influence of gender on patterns of domination in the film industry. Our concern, however, was to see beyond these patterns and look for ways in which the data could suggest the most effective interventions for changing the status quo.

There is some precedent for approaching network visualisations in this way. Crime experts and counter-terrorism specialists, for example, have used “criminal network analysis” to identify opportunities to undermine the coherence of dominant groups.

Drawing on the literature on the use of social network analysis to characterise criminal networks and identify key nodes whose removal would disrupt the network (e.g., Borgatti, 2006; Rostami & Mondani, 2015; Schwartz & Rouselle, 2009), we investigated the male-only network of producers and other creatives shown immediately above. We used Borgatti’s network fragmentation factor (F) (equation 4 in Borgatti, 2006) as a quantitative measure of network disruption. F is 0 when there is no fragmentation (all nodes connected in a single component), and is 1 when all nodes are isolated. Gephi provides the data required to compute F for a given network configuration. F was calculated for the initial network, a candidate node was removed, and F was recalculated to assess the impact of the node removal on network fragmentation.
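As an illustration of the remove-and-recalculate procedure, the sketch below assumes the component-size form of the fragmentation measure, F = 1 - Σ s_k(s_k - 1) / (n(n - 1)), where s_k is the number of nodes in component k and n is the total number of nodes. That form matches the two limiting cases described above, but check it against Borgatti (2006) before relying on it. The toy network, the function names, and the choice to treat edges as undirected are all assumptions made for the example. This is not the calculation pipeline used in the project.

```python
from collections import defaultdict


def component_sizes(nodes, edges):
    """Sizes of the connected components, treating edges as undirected."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        sizes.append(size)
    return sizes


def fragmentation(nodes, edges):
    """F = 1 - sum(s_k * (s_k - 1)) / (n * (n - 1)) over components k."""
    n = len(nodes)
    if n < 2:
        return 0.0  # assumption: treat a trivial network as unfragmented
    connected = sum(s * (s - 1) for s in component_sizes(nodes, edges))
    return 1 - connected / (n * (n - 1))


def fragmentation_without(nodes, edges, removed):
    """Recalculate F after deleting one node and its edges."""
    kept_nodes = [v for v in nodes if v != removed]
    kept_edges = [(a, b) for a, b in edges if removed not in (a, b)]
    return fragmentation(kept_nodes, kept_edges)


# Hypothetical toy network: hub A, with D bridging to E.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]

baseline = fragmentation(nodes, edges)
increase = {v: fragmentation_without(nodes, edges, v) - baseline for v in nodes}
```

In this toy network the baseline F is 0, and removing the hub A produces the largest increase, followed by the bridging node D. Removing any of the peripheral nodes leaves F unchanged.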

The increase in F obtained from removing each of a range of male producer nodes from the initial network was calculated and compared. The largest male producer node in the centre of the network above suggests itself as a node whose removal would significantly increase the network fragmentation, and its removal did indeed yield the largest increase in network fragmentation. Those male producer nodes whose removal from the initial network yielded relatively large increases in network fragmentation were also observed to have relatively high values of ‘betweenness centrality’, as computed by Gephi for the initial network. Node betweenness centrality measures how often a node appears on shortest paths between other nodes in the network. A high betweenness centrality in the initial network therefore provides a heuristic for identifying candidate nodes whose removal would significantly increase the network fragmentation.
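The betweenness heuristic can be sketched as follows. This is an illustrative, unnormalised implementation using the standard breadth-first shortest-path counting approach on an undirected, unweighted graph. It is not Gephi's implementation, and Gephi's values may differ depending on whether the graph is treated as directed and whether scores are normalised. The toy network and the function name are hypothetical.

```python
from collections import defaultdict, deque


def betweenness(nodes, edges):
    """Unnormalised betweenness centrality, undirected and unweighted."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted twice.
    return {v: total / 2 for v, total in score.items()}


# Hypothetical toy network: hub A, with D bridging to E.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]

scores = betweenness(nodes, edges)
candidates = sorted(nodes, key=lambda v: scores[v], reverse=True)
```

Ranking nodes by this score puts the hub A first and the bridging node D second, so in this toy case the betweenness ranking picks out the same nodes whose removal would fragment the network most.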


Borgatti, S. P. (2006). Identifying sets of key players in a social network. Computational & Mathematical Organization Theory, 12(1), 21-34. doi:10.1007/s10588-006-7084-x

Rostami, A., & Mondani, H. (2015). The Complexity of Crime Network Data: A Case Study of Its Consequences for Crime Control and the Study of Networks. PLOS ONE, 10(3), e0119309. doi:10.1371/journal.pone.0119309

Schwartz, D. M., & Rouselle, T. (2009). Using social network analysis to target criminal networks. Trends in Organized Crime, 12(2), 188-207. doi:10.1007/s12117-008-9046-9