Commit c664b4d8 authored by schwabmi's avatar schwabmi

updated README and added links to libraries in community_detection notebook

parent a555514e
@@ -63,3 +63,4 @@ their difficulty (* = simple, ** = advanced, *** = sophisticated):
[[http://cs.brown.edu/people/apapouts/faculty_dataset.html][dataset of 2200 faculty in 50 top US computer science graduate
programs]] (**)
- [[file:classification.ipynb][Classification]] :: basic machine learning classification example (*)
- [[file:community_detection.ipynb][Community detection]] :: apply community detection algorithms to network graphs
%% Cell type:markdown id: tags:
<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Graph" data-toc-modified-id="Graph-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Graph</a></span></li><li><span><a href="#Comunity-detection" data-toc-modified-id="Comunity-detection-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Comunity detection</a></span></li><li><span><a href="#Visualization" data-toc-modified-id="Visualization-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Visualization</a></span></li><li><span><a href="#Further-analysis" data-toc-modified-id="Further-analysis-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Further analysis</a></span></li><li><span><a href="#Karate-club-library" data-toc-modified-id="Karate-club-library-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Karate club library</a></span><ul class="toc-item"><li><span><a href="#Graph" data-toc-modified-id="Graph-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Graph</a></span></li><li><span><a href="#Algorithm" data-toc-modified-id="Algorithm-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Algorithm</a></span></li><li><span><a href="#Evaluation" data-toc-modified-id="Evaluation-5.3"><span class="toc-item-num">5.3&nbsp;&nbsp;</span>Evaluation</a></span></li></ul></li></ul></div>
%% Cell type:markdown id: tags:
# Community detection in network graphs
In this notebook, we detect communities in network graphs.
First, we import all necessary modules:
- `networkx`, `karateclub`: community detection algorithms
- `matplotlib`: visualization
- `ipywidgets`: interactive features
- [networkx](https://networkx.github.io/)
- [karateclub](https://github.com/benedekrozemberczki/karateclub)
- [matplotlib](https://matplotlib.org/)
- [ipywidgets](https://ipywidgets.readthedocs.io)
%% Cell type:code id: tags:
```
import matplotlib.pyplot as plt
import networkx as nx
from networkx.algorithms.community.centrality import girvan_newman
from networkx.algorithms import community
from karateclub import EgoNetSplitter
import itertools
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
```
%% Cell type:markdown id: tags:
## Graph
First, we load a graph.
%% Cell type:code id: tags:
```
# see https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.generators.social.karate_club_graph.html
G = nx.karate_club_graph()
print(G.edges)
```
%% Cell type:code id: tags:
```
# graph information
print("count nodes:" , len(G.nodes))
print("count edges:" , len(G.edges))
```
%% Cell type:markdown id: tags:
Now we can draw the graph to get an impression of what it looks like.
%% Cell type:code id: tags:
```
fig=plt.figure(figsize=(18, 16), dpi= 80, facecolor='w', edgecolor='k')
nx.draw_spring(G, with_labels=True)
plt.show()
```
%% Cell type:markdown id: tags:
## Community detection
We use the Girvan-Newman algorithm to detect communities:
%% Cell type:code id: tags:
```
# Girvan-Newman algorithm
communities = girvan_newman(G)
# save results in a list, each element of the list consists of the communities on one level.
# necessary to use the interactive slider
com_lvl_lst = []
for com_on_lvlk in itertools.islice(communities, len(G.nodes)):
    com_lvl_lst.append(com_on_lvlk)
```
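%% Cell type:markdown id: tags:
As an aside, the core step of Girvan-Newman can be sketched in a few lines: remove the edge with the highest edge betweenness, recompute, and repeat until the graph falls apart into one more connected component. The helper name `split_once` is made up for this illustration, and the sketch assumes an undirected graph with at least one edge. It is only meant to show the idea, not to replace `girvan_newman`, which repeats this kind of step until no edges remain.
%% Cell type:code id: tags:
```python
import networkx as nx


def split_once(graph):
    """Remove highest-betweenness edges until the graph gains one component.

    Hypothetical helper for illustration; works on a copy of the graph.
    """
    g = graph.copy()
    start = nx.number_connected_components(g)
    while nx.number_connected_components(g) == start:
        # betweenness has to be recomputed after every removal
        betweenness = nx.edge_betweenness_centrality(g)
        edge = max(betweenness, key=betweenness.get)
        g.remove_edge(*edge)
    return [set(c) for c in nx.connected_components(g)]


G = nx.karate_club_graph()
first_split = split_once(G)
print("communities after the first split:", len(first_split))
print("community sizes:", [len(c) for c in first_split])
```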
%% Cell type:markdown id: tags:
Now we have a list of the communities on each level.
Since Girvan-Newman is a hierarchical method that only terminates when no edges remain, we have to choose the level ourselves.
For a small graph, we can visualize the results and take a close look at which community structure fits best.
## Visualization
%% Cell type:code id: tags:
```
def get_communities_per_lvl(c, k):
    return c[k]
def draw_communities(c, k):
    communities = get_communities_per_lvl(c, k)
    # values[node] = index of the community the node belongs to on level k
    values = [0]*len(G.nodes)
    for i, lst in enumerate(communities):
        for el in lst:
            values[el] = i
    fig = plt.figure(figsize=(18, 16), dpi=80, facecolor='w', edgecolor='k')
    nx.draw_networkx(G, cmap=plt.get_cmap('jet'), node_color=values, with_labels=True)
    return values
# use a slider to see the different levels of the community structures
w = interactive(draw_communities,c=fixed(com_lvl_lst), k=(0, len(G.nodes)-2, 1))
display(w)
```
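%% Cell type:markdown id: tags:
Instead of choosing the level purely by eye, one option is to score every level with a quality measure and look at the best one. The sketch below uses modularity from `networkx` as one possible measure; this is an assumption on our side, not the only reasonable choice. The names `levels`, `scores` and `best_k` are introduced for this example only, and the graph is reloaded so that the cell runs on its own.
%% Cell type:code id: tags:
```python
import itertools

import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity

G = nx.karate_club_graph()

# one entry per level: level k holds k + 2 communities
levels = list(itertools.islice(girvan_newman(G), len(G.nodes)))

# weight=None: treat the graph as unweighted, like girvan_newman does by default
scores = [modularity(G, level, weight=None) for level in levels]

best_k = max(range(len(scores)), key=scores.__getitem__)
print("level with the highest modularity:", best_k)
print("number of communities on that level:", len(levels[best_k]))
print("modularity: {:.4f}".format(scores[best_k]))
```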
%% Cell type:markdown id: tags:
## Further analysis
Sometimes it is interesting to see how communities are connected to each other. To get an overview, we merge all nodes of one community into a single community node and add an edge between two community nodes if at least one node of the first community is connected to a node of the second community in the original graph.
%% Cell type:code id: tags:
```
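G_merged = nx.Graph()
# community index of every node on the level currently selected with the slider
classified_nodes = w.result
community_count = len(set(classified_nodes))
# add the community nodes in index order so that colors and labels line up
G_merged.add_nodes_from(range(community_count))
for n in G.edges:
    node_1 = classified_nodes[n[0]]
    node_2 = classified_nodes[n[1]]
    # edges inside a community would only create self-loops, so skip them
    if node_1 != node_2:
        G_merged.add_edge(node_1, node_2)
fig = plt.figure(figsize=(18, 16), dpi=80, facecolor='w', edgecolor='k')
nx.draw_spring(G_merged, node_color=list(range(community_count)), with_labels=True)
plt.show()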
```
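%% Cell type:markdown id: tags:
The merged graph only tells us whether two communities are connected. A possible extension, sketched here under our own assumptions, is to count how many original edges run between each pair of communities and to store that count as an edge weight. The names `membership`, `crossing` and `G_weighted` are introduced for this example, and the level with four communities is picked arbitrarily.
%% Cell type:code id: tags:
```python
import itertools
from collections import Counter

import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()

# third level of the hierarchy (index 2), i.e. four communities
level = next(itertools.islice(girvan_newman(G), 2, None))
membership = {node: i for i, com in enumerate(level) for node in com}

# count the original edges that cross from one community to another
crossing = Counter()
for u, v in G.edges:
    cu, cv = membership[u], membership[v]
    if cu != cv:
        crossing[tuple(sorted((cu, cv)))] += 1

G_weighted = nx.Graph()
G_weighted.add_nodes_from(range(len(level)))
for (cu, cv), count in crossing.items():
    G_weighted.add_edge(cu, cv, weight=count)

for cu, cv, data in G_weighted.edges(data=True):
    print("communities", cu, "and", cv, "share", data["weight"], "edges")
```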
%% Cell type:markdown id: tags:
## Karate club library
So far, we have used only the `networkx` library. The `karateclub` library provides more advanced algorithms.
Let's try it out with a bigger graph.
See: https://github.com/benedekrozemberczki/karateclub
### Graph
%% Cell type:code id: tags:
```
from karateclub import GraphReader
reader = GraphReader("facebook")
graph = reader.get_graph()
target = reader.get_target()
```
%% Cell type:code id: tags:
```
print("count nodes:" , len(graph.nodes))
print("count edges:" , len(graph.edges))
```
%% Cell type:markdown id: tags:
The graph consists of 22470 nodes and 171002 edges, which is too large to visualize in a meaningful way.
### Algorithm
In the following, we use the label propagation algorithm to detect communities.
%% Cell type:code id: tags:
```
from karateclub import LabelPropagation
model = LabelPropagation()
model.fit(graph)
cluster_membership = model.get_memberships()
```
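%% Cell type:markdown id: tags:
To get a feeling for what label propagation does, here is a minimal sketch of the general idea on the small karate club graph: every node starts with its own label and repeatedly adopts the label that is most frequent among its neighbors. This is our own simplified version (the function name, the tie-breaking rule and the parameters `max_rounds` and `seed` are assumptions); the implementation in `karateclub` may differ in its details.
%% Cell type:code id: tags:
```python
import random
from collections import Counter

import networkx as nx


def label_propagation(graph, max_rounds=100, seed=42):
    """Simplified label propagation; returns a dict node -> label."""
    rng = random.Random(seed)
    labels = {node: node for node in graph.nodes}  # every node starts alone
    nodes = list(graph.nodes)
    for _ in range(max_rounds):
        rng.shuffle(nodes)  # visit the nodes in random order
        changed = False
        for node in nodes:
            counts = Counter(labels[nbr] for nbr in graph.neighbors(node))
            if not counts:
                continue  # isolated node keeps its label
            top = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == top)
            # keep the current label if it is already among the most frequent
            if labels[node] in candidates:
                continue
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break  # no node changed its label in a full round
    return labels


G = nx.karate_club_graph()
labels = label_propagation(G)
print("number of communities found:", len(set(labels.values())))
```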
%% Cell type:markdown id: tags:
### Evaluation
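The nodes are already classified into communities. The labels are stored in the variable `target`.
With this information, we can compare our prediction with the labels using the normalized mutual information (NMI) score, which measures the agreement between two clusterings: 0 means no mutual information, 1 means the clusterings match perfectly. The score does not depend on how the clusters are named.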
%% Cell type:code id: tags:
```
from sklearn.metrics.cluster import normalized_mutual_info_score
cluster_membership = [cluster_membership[node] for node in range(len(cluster_membership))]
nmi = normalized_mutual_info_score(target, cluster_membership)
print('NMI: {:.4f}'.format(nmi))
```
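%% Cell type:markdown id: tags:
The following sketch computes the NMI by hand on tiny label lists to show what the score measures. It assumes the arithmetic mean of the two entropies as normalization, which, as far as we know, is the default in recent scikit-learn versions; the function names are our own. Two clusterings that differ only in the names of the clusters get a score of 1, two independent clusterings a score of 0.
%% Cell type:code id: tags:
```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy of a labeling (natural logarithm)."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        c / n * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (assumed convention)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings consist of a single cluster
    return mutual_information(a, b) / ((h_a + h_b) / 2)


same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print('same clustering, renamed labels: {:.4f}'.format(same_up_to_renaming))
print('independent clusterings: {:.4f}'.format(independent))
```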
%% Cell type:markdown id: tags:
TODO:
- Add different algorithms.
- Add measures (how "good" is the community detection? How can it be measured?)
- Use a "real-world" example (Facebook data / metadata / co-authorship)
- Use the karate club library and try out one of the machine learning algorithms
- Use the same color for a community and its nodes in both graphs.