Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Commits on Source (169)
-AirBnB_Use_Berlin.ipynb !filter
-Das_Haus_vom_Nikolaus.ipynb !filter
-wikipedia_language_editions.ipynb !filter
+notebooks/AirBnB_Use_Berlin.ipynb !filter
+notebooks/Art.ipynb !filter
+notebooks/Das_Haus_vom_Nikolaus.ipynb !filter
+notebooks/DraCor.ipynb !filter
+notebooks/Drama_Speakers.ipynb !filter
+notebooks/FCA.ipynb !filter
+notebooks/FCA.ipynb !filter
+notebooks/goethes_words.ipynb !filter
+notebooks/Hamming.ipynb !filter
+notebooks/Video_Games.ipynb !filter
+notebooks/Weinbewertungen_Vivino.ipynb !filter
+notebooks/World_Risk_and_Happiness.ipynb !filter
+notebooks/distances.ipynb !filter
+notebooks/wikipedia_language_editions.ipynb !filter
+notebooks/wikipedia_regex.ipynb !filter
+notebooks/Wordsiblings.ipynb !filter
_site
_posts
_tags
public
# from JK
variables:
  GEM_HOME: "$CI_PROJECT_DIR/.cache/vendor/ruby/gems"
  PYTHON_VERSION: "3.8"
default:
  image: ruby:3.2
  cache:
    key: $CI_COMMIT_REF_SLUG
    paths:
      - $CI_PROJECT_DIR/.cache/vendor/ruby/bundle
      - $CI_PROJECT_DIR/.cache/vendor/ruby/cache
  before_script:
    # update only once, even if multiple commands are missing
    # - '( command -v ssh-agent >/dev/null && command -v zip >/dev/null && command -v rsync >/dev/null) || apt-get update -y'
    # - 'command -v ssh-agent >/dev/null || apt-get install openssh-client -y '
    # - 'command -v zip >/dev/null || apt-get install zip -y '
    # - 'command -v rsync >/dev/null || apt-get install rsync -y'
    - '( command -v jupyter-nbconvert >/dev/null && command -v jupyter >/dev/null && command -v bundle >/dev/null) || apt-get update -y'
    - 'command -v bundle >/dev/null || apt-get install ruby-bundler -y'
    - 'command -v jupyter-nbconvert >/dev/null || apt-get install jupyter-nbconvert python3-nbconvert -y'
    # Should automatically be installed by the above...
    # - 'command -v jupyter >/dev/null || apt-get install jupyter-core -y'
    - mkdir -p "$CI_PROJECT_DIR/.cache/vendor/ruby/bundle" "$CI_PROJECT_DIR/.cache/vendor/ruby/cache"
    # Allow Caching
    - bundle config set --local path 'vendor/ruby/bundle'
    - bundle config set --local cache_path 'vendor/ruby/cache'
    # For Debugging
    # - ruby -v
    # - gem -v
    # - rsync -V
pages:
  stage: build
  script:
    - bundle install -j $(nproc)
    - bundle exec jekyll build --config _config_prod.yml --strict_front_matter true -d public
  artifacts:
    paths:
      - public
  only:
    - master
%% Cell type:markdown id:framed-filename tags:
# Computer-generated Art
%% Cell type:markdown id:apart-death tags:
## Unknown Pleasures
This artwork is inspired by [Peter Saville's](https://en.wikipedia.org/wiki/Peter_Saville_(graphic_designer)) cover of the [Joy Division](https://en.wikipedia.org/wiki/Joy_Division) album [Unknown Pleasures](https://en.wikipedia.org/wiki/Unknown_Pleasures). The original [is based on some interesting real data](https://theconversation.com/joy-division-40-years-on-from-unknown-pleasures-astronomers-have-revisited-the-pulsar-from-the-iconic-album-cover-119861) but here we try to imitate it with random data (creating unique pictures). The code demonstrates some mathematical approaches like translation, scaling and composition of functions.
%% Cell type:code id:desperate-hands tags:
```
import numpy as np
import matplotlib.pyplot as plt

k = 50 # number of horizontal points
n = 80 # number of vertical lines
plt.rcParams['figure.figsize'] = (15, 15)
plt.rcParams['figure.dpi'] = 140
plt.style.use('dark_background')
plt.axis('off')
plt.xlim(-100, 200)
plt.ylim(-60, 240)
xs = np.linspace(0, 100, k)

def sq(x, minx, maxx, miny, maxy, invert=False):
    x = (x - minx) / (maxx - minx)
    if invert:
        x = 1 - x
    if x < 0.5: # ascent
        fx = 2*x**2
    else: # descent
        fx = (1 - 2*(1-x)**2)
    return fx * (maxy - miny) + miny

def f(x, left, right, width, height):
    if x < left or x > right: # left + right
        return np.abs(np.random.normal(0, 0.5))
    if x < left + width: # ascent
        return np.abs(np.random.normal(0, 1.0)) + sq(x, left, left + width, 0, height)
    if x > right - width: # descent
        return np.abs(np.random.normal(0, 1.0)) + sq(x, right - width, right, 0, height, True)
    else: # middle
        return np.random.exponential(2) + height

for i in range(1, n+1):
    data = [f(x, 25, 75, 10, 5) + i*2 for x in xs]
    plt.plot(xs, data, color="white", zorder=n-i, linewidth=1)
    plt.fill_between(xs, data, color="black", zorder=n-i)
plt.show()
```
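The scaling helper `sq` is where translation, scaling and composition meet, so it may help to look at it in isolation. The following is a minimal sketch: `sq` is copied from the cell above, except that the normalised value is stored in a separate variable `t` (a renaming made here for readability). The sample arguments correspond to the ascent segment used by `f(x, 25, 75, 10, 5)`, that is `left = 25`, `left + width = 35` and `height = 5`.

```python
def sq(x, minx, maxx, miny, maxy, invert=False):
    # translate and scale x from [minx, maxx] to t in [0, 1]
    t = (x - minx) / (maxx - minx)
    if invert:                      # mirror the curve for the descent
        t = 1 - t
    if t < 0.5:                     # first half: accelerating quadratic
        ft = 2 * t**2
    else:                           # second half: decelerating quadratic
        ft = 1 - 2 * (1 - t)**2
    # scale and translate the result from [0, 1] to [miny, maxy]
    return ft * (maxy - miny) + miny

print(sq(25, 25, 35, 0, 5))               # 0.0 (start of the ascent)
print(sq(30, 25, 35, 0, 5))               # 2.5 (halfway up)
print(sq(35, 25, 35, 0, 5))               # 5.0 (plateau height reached)
print(sq(35, 25, 35, 0, 5, invert=True))  # 0.0 (mirrored curve)
```

The two quadratic pieces meet at `t = 0.5` with the same value, which is why the flanks of each line join the plateau without a visible kink.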
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Notebooks
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: >-
      Institute of Library and Information Science at
      Humboldt-Universität zu Berlin
    city: Berlin
    address: Dorotheenstraße 26
    post-code: '10117'
    country: DE
    alias: IBI
repository-code: 'https://scm.cms.hu-berlin.de/ibi/notebooks'
abstract: >-
  A repository to collect and share Jupyter notebooks.
  All notebooks should ideally work without any extra files, use only
  standard Python libraries (pandas, scikit-learn, etc.), and gather
  their data from the web.
license: Apache-2.0
commit: ef8a0fbb06130c7bab0ab4881be86c343950c62e
date-released: '2024-08-22'
source "https://rubygems.org"
# Hello! This is where you manage which Jekyll version is used to run.
# When you want to use a different version, change it below, save the
# file and run `bundle install`. Run Jekyll with `bundle exec`, like so:
#
# bundle exec jekyll serve
#
# This will help ensure the proper Jekyll version is running.
# Happy Jekylling!
# This is the default theme for new Jekyll sites. You may change this to anything you like.
gem 'jekyll', '4.3.3'
gem 'jekyll-sass-converter', '3.0.0'
gem 'sass-embedded', '1.69.5'
gem 'google-protobuf', '3.25.2'
gem 'jekyll-jupyter-notebook'
gem 'jekyll-datapage-generator'
gem 'jekyll-tagging'
gem "minima", "~> 2.5"
#gem "jekyll-jupyter-notebook"
# If you want to use GitHub Pages, remove the "gem "jekyll"" above and
# uncomment the line below. To upgrade, run `bundle update github-pages`.
# gem "github-pages", group: :jekyll_plugins
# If you have any plugins, put them here!
group :jekyll_plugins do
  gem "jekyll-feed", "~> 0.12"
end
# Windows and JRuby does not include zoneinfo files, so bundle the tzinfo-data gem
# and associated library.
platforms :mingw, :x64_mingw, :mswin, :jruby do
  gem "tzinfo", ">= 1", "< 3"
  gem "tzinfo-data"
end
# Performance-booster for watching directories on Windows
gem "wdm", "~> 0.1.1", :platforms => [:mingw, :x64_mingw, :mswin]
# Lock `http_parser.rb` gem to `v0.6.x` on JRuby builds since newer versions of the gem
# do not have a Java counterpart.
gem "http_parser.rb", "~> 0.6.0", :platforms => [:jruby]
GEM
remote: https://rubygems.org/
specs:
addressable (2.8.6)
public_suffix (>= 2.0.2, < 6.0)
colorator (1.1.0)
concurrent-ruby (1.2.3)
em-websocket (0.5.3)
eventmachine (>= 0.12.9)
http_parser.rb (~> 0)
eventmachine (1.2.7)
ffi (1.16.3)
forwardable-extended (2.6.0)
google-protobuf (3.25.2-x86_64-linux)
http_parser.rb (0.8.0)
i18n (1.14.1)
concurrent-ruby (~> 1.0)
jekyll (4.3.3)
addressable (~> 2.4)
colorator (~> 1.0)
em-websocket (~> 0.5)
i18n (~> 1.0)
jekyll-sass-converter (>= 2.0, < 4.0)
jekyll-watch (~> 2.0)
kramdown (~> 2.3, >= 2.3.1)
kramdown-parser-gfm (~> 1.0)
liquid (~> 4.0)
mercenary (>= 0.3.6, < 0.5)
pathutil (~> 0.9)
rouge (>= 3.0, < 5.0)
safe_yaml (~> 1.0)
terminal-table (>= 1.8, < 4.0)
webrick (~> 1.7)
jekyll-datapage-generator (1.4.0)
jekyll-feed (0.17.0)
jekyll (>= 3.7, < 5.0)
jekyll-jupyter-notebook (0.0.5)
jekyll
jekyll-sass-converter (3.0.0)
sass-embedded (~> 1.54)
jekyll-seo-tag (2.8.0)
jekyll (>= 3.8, < 5.0)
jekyll-tagging (1.1.0)
nuggets
jekyll-watch (2.2.1)
listen (~> 3.0)
kramdown (2.4.0)
rexml
kramdown-parser-gfm (1.1.0)
kramdown (~> 2.0)
liquid (4.0.4)
listen (3.8.0)
rb-fsevent (~> 0.10, >= 0.10.3)
rb-inotify (~> 0.9, >= 0.9.10)
mercenary (0.4.0)
minima (2.5.1)
jekyll (>= 3.5, < 5.0)
jekyll-feed (~> 0.9)
jekyll-seo-tag (~> 2.1)
nuggets (1.6.1)
pathutil (0.16.2)
forwardable-extended (~> 2.6)
public_suffix (5.0.4)
rake (13.1.0)
rb-fsevent (0.11.2)
rb-inotify (0.10.1)
ffi (~> 1.0)
rexml (3.2.6)
rouge (4.2.0)
safe_yaml (1.0.5)
sass-embedded (1.69.5)
google-protobuf (~> 3.23)
rake (>= 13.0.0)
terminal-table (3.0.2)
unicode-display_width (>= 1.1.1, < 3)
unicode-display_width (2.5.0)
webrick (1.8.1)
PLATFORMS
x86_64-linux
DEPENDENCIES
google-protobuf (= 3.25.2)
http_parser.rb (~> 0.6.0)
jekyll (= 4.3.3)
jekyll-datapage-generator
jekyll-feed (~> 0.12)
jekyll-jupyter-notebook
jekyll-sass-converter (= 3.0.0)
jekyll-tagging
minima (~> 2.5)
sass-embedded (= 1.69.5)
tzinfo (>= 1, < 3)
tzinfo-data
wdm (~> 0.1.1)
BUNDLED WITH
2.5.5
%% Cell type:markdown id: tags:
# Graphing a kind of "Hamming similarity" for strings
This notebook explores a (probably slightly weird) similarity measure for strings.
## Equal characters in strings
Given two strings, our idea is to consider the positions where their characters match:
%% Cell type:code id: tags:
```
v = "Wiesbaden"
w = "Potsdam"
#       s a   – the matching characters of the two strings
```
%% Cell type:markdown id: tags:
We can extract those characters with a loop:
%% Cell type:code id: tags:
```
m = [] # resulting equal characters
for i in range(min(map(len, [v, w]))): # loop over the shortest word's length
    if v[i] == w[i]: # equal characters at this position?
        m.append(v[i]) # collect equal character
m
```
%% Cell type:markdown id: tags:
Let's create a function that, given two strings, returns their equal characters:
%% Cell type:code id: tags:
```
def equal_chars(v, w):
    m = [] # resulting equal characters
    for i in range(min(map(len, [v, w]))): # loop over the shortest word's length
        if v[i] == w[i]: # check character equality
            m.append(v[i]) # add character
    return m
```
%% Cell type:markdown id: tags:
By the way: thanks to Python's [list comprehensions](https://docs.python.org/3/howto/functional.html#generator-expressions-and-list-comprehensions) we can write the body of the function in one line:
%% Cell type:code id: tags:
```
def equal_chars(v, w):
    return [v[i] for i in range(min(map(len, [v, w]))) if v[i] == w[i]]
```
%% Cell type:markdown id: tags:
Let's test our newly defined function:
%% Cell type:code id: tags:
```
equal_chars(v, w)
```
%% Cell type:markdown id: tags:
And with two different words:
%% Cell type:code id: tags:
```
equal_chars("Washington", "Massachusetts")
```
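As a side note, the index arithmetic can be avoided altogether. The sketch below is an equivalent formulation, not the notebook's own code: the name `equal_chars_zip` is made up here, and it relies on the built-in `zip`, which stops as soon as the shorter string is exhausted.

```python
def equal_chars_zip(v, w):
    """Return the characters that v and w share at the same positions."""
    # zip pairs up v[0] with w[0], v[1] with w[1], ... and stops
    # at the end of the shorter string
    return [a for a, b in zip(v, w) if a == b]

print(equal_chars_zip("Wiesbaden", "Potsdam"))         # ['s', 'a']
print(equal_chars_zip("Washington", "Massachusetts"))  # ['a', 's']
```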
%% Cell type:markdown id: tags:
## Similarity
Now we regard the number of equal characters in two strings as a similarity measure. For example, the similarity of our two strings is:
%% Cell type:code id: tags:
```
len(equal_chars(v, w))
```
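A few properties of this measure can be checked directly. The sketch below wraps the count in a helper called `sim` (a name introduced here for illustration; the notebook itself only uses `len(equal_chars(v, w))`). Unlike the classic Hamming distance, the measure counts matching rather than differing positions and also accepts strings of different lengths.

```python
def equal_chars(v, w):
    return [v[i] for i in range(min(map(len, [v, w]))) if v[i] == w[i]]

def sim(v, w):
    """Number of positions at which v and w have the same character."""
    return len(equal_chars(v, w))

v, w = "Wiesbaden", "Potsdam"
print(sim(v, w))                # 2
print(sim(v, w) == sim(w, v))   # True: the measure is symmetric
print(sim(v, v) == len(v))      # True: a string matches itself everywhere
print(sim("Kiel", "Mainz"))     # 0: no position with equal characters
```

Because only the first `min(len(v), len(w))` positions are compared, the similarity can never exceed the length of the shorter string.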
%% Cell type:markdown id: tags:
## Graph
Now given a set of strings, for example, the 16 capitals of all German states:
%% Cell type:code id: tags:
```
capitals_de = ["Berlin", "Bremen", "Dresden", "Düsseldorf", "Erfurt",
"Hamburg", "Hannover", "Kiel", "Magdeburg", "Mainz", "München",
"Potsdam", "Saarbrücken", "Schwerin", "Stuttgart", "Wiesbaden"]
```
%% Cell type:markdown id: tags:
or the names of the 16 German states:
%% Cell type:code id: tags:
```
states_de = ["Baden-Württemberg", "Bayern", "Berlin", "Brandenburg",
"Bremen", "Hamburg", "Hessen", "Mecklenburg-Vorpommern",
"Niedersachsen", "Nordrhein-Westfalen", "Rheinland-Pfalz",
"Saarland", "Sachsen", "Sachsen-Anhalt",
"Schleswig-Holstein", "Thüringen"]
```
%% Cell type:markdown id: tags:
we can create a graph with the strings as nodes by connecting strings whose similarity is larger than zero, that is, they have at least one position with equal characters:
%% Cell type:code id: tags:
```
import networkx as nx

def sim_graph(words):
    G = nx.Graph() # resulting graph
    for k, v in enumerate(words): # first node
        for l, w in enumerate(words): # second node
            if k > l: # avoid reverse duplicates
                ec = equal_chars(v, w) # equal characters
                sim = len(ec) # similarity
                if sim > 0: # ignore dissimilar words
                    G.add_edge(v, w, label="".join(ec), weight=sim) # add edge
    return G
```
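To see which edges such a graph contains without installing networkx, the pairwise comparison can be sketched with the standard library alone. The helper name `sim_edges` and the three-word sample are assumptions made for this illustration. `itertools.combinations` yields every unordered pair exactly once, which plays the role of the `k > l` check above.

```python
from itertools import combinations

def equal_chars(v, w):
    return [a for a, b in zip(v, w) if a == b]

def sim_edges(words):
    """Return (v, w, label, weight) for every pair with similarity > 0."""
    edges = []
    for v, w in combinations(words, 2):   # each unordered pair once
        ec = equal_chars(v, w)
        if ec:                            # skip pairs without any match
            edges.append((v, w, "".join(ec), len(ec)))
    return edges

print(sim_edges(["Berlin", "Bremen", "Hamburg"]))
# [('Berlin', 'Bremen', 'Bn', 2)]
```

Here "Hamburg" shares no position with either of the other two words, so it produces no edge. In `sim_graph` such a word is not even added as a node, because nodes are only created through `add_edge`.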
%% Cell type:markdown id: tags:
Let's compute the graph for our set of capitals or states:
%% Cell type:code id: tags:
```
g = sim_graph(states_de)
```
%% Cell type:markdown id: tags:
A good way to understand a graph is to visualise it:
%% Cell type:code id: tags:
```
%matplotlib inline
from networkx.drawing.nx_agraph import graphviz_layout
import matplotlib.pyplot as plt
pos = graphviz_layout(g, prog='dot')
nx.draw(g, pos, with_labels=True, arrows=False)
nx.draw_networkx_edge_labels(g, pos, edge_labels=nx.get_edge_attributes(g, 'label'), font_color='blue')
plt.show()
```
%% Cell type:markdown id: tags:
This layout is not ideal, so it is better to let Graphviz render the graph directly:
%% Cell type:code id: tags:
```
from networkx.drawing.nx_pydot import write_dot
import pydot
from IPython.display import HTML, display
import random
write_dot(g, "graph.dot")
graph = pydot.graph_from_dot_file("graph.dot")
graph[0].write_svg("graph.svg")
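# the random query string keeps the browser from showing a cached graph.svg;
# randint needs integer bounds (a float such as 2e9 raises a TypeError in Python 3.12+)
display(HTML('<img src="graph.svg?{0}">'.format(random.randint(0, 2_000_000_000))))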
```
 GNU GENERAL PUBLIC LICENSE
 Version 3, 29 June 2007
-Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.
@@ -631,8 +631,8 @@ to attach them to the start of each source file to most effectively
 state the exclusion of warranty; and each file should have at least
 the "copyright" line and a pointer to where the full notice is found.
-Notebooks
-Copyright (C) 2020 Institut für Bibliotheks- und Informationswissenschaft
+<one line to give the program's name and a brief idea of what it does.>
+Copyright (C) 2024 Prof. Dr. Robert Jäschke
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
@@ -645,14 +645,14 @@ the "copyright" line and a pointer to where the full notice is found.
 GNU General Public License for more details.
 You should have received a copy of the GNU General Public License
-along with this program. If not, see <http://www.gnu.org/licenses/>.
+along with this program. If not, see <https://www.gnu.org/licenses/>.
 Also add information on how to contact you by electronic and paper mail.
 If the program does terminal interaction, make it output a short
 notice like this when it starts in an interactive mode:
-Notebooks Copyright (C) 2020 Institut für Bibliotheks- und Informationswissenschaft
+<program> Copyright (C) 2024 Prof. Dr. Robert Jäschke
 This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
 This is free software, and you are welcome to redistribute it
 under certain conditions; type `show c' for details.
@@ -664,11 +664,11 @@ might be different; for a GUI interface, you would use an "about box".
 You should also get your employer (if you work as a programmer) or school,
 if any, to sign a "copyright disclaimer" for the program, if necessary.
 For more information on this, and how to apply and follow the GNU GPL, see
-<http://www.gnu.org/licenses/>.
+<https://www.gnu.org/licenses/>.
 The GNU General Public License does not permit incorporating your program
 into proprietary programs. If your program is a subroutine library, you
 may consider it more useful to permit linking proprietary applications with
 the library. If this is what you want to do, use the GNU Lesser General
 Public License instead of this License. But first, please read
-<http://www.gnu.org/philosophy/why-not-lgpl.html>.
+<https://www.gnu.org/licenses/why-not-lgpl.html>.
@@ -21,35 +21,74 @@ contributing!*
 ** List of Notebooks
-So far, notebooks are listed alphabetically and stars shall indicate
-their difficulty (☆ = simple, ☆☆ = advanced, ☆☆☆ = sophisticated):
-- [[file:AirBnB_Use_Berlin.ipynb][AirBnB Use in Berlin]] :: exemplary (and excellent) term paper for the
-module "Datenanalyse & -auswertung" (☆☆)
-- [[file:amazon_reviews.ipynb][Amazon reviews]] :: crawling web sites with [[https://scrapy.org/][Scrapy]], processing JSON
-data, basic statistics and visualisation (☆☆)
-- [[file:Art.ipynb][Computer-generated art]] :: translation, scaling and composition of
-functions (☆☆)
-- [[file:classification.ipynb][Classification]] :: basic machine learning classification example (☆)
-- [[file:community_detection.ipynb][Community detection]] :: applying community detection algorithms to
-network graphs (☆☆)
-- [[file:crawling_a_blog.ipynb][Crawling a blog]] :: crawling web sites, basic text mining, basic
-statistics and visualisation (☆☆)
-- [[file:distances.ipynb][Distances]] :: comprehensive interactive simulation of recovering
-information from noisy data (namely, point positions given their
-noisy distance matrix) (☆☆☆)
-- [[file:Dracor.ipynb][DraCor]] :: retrieving data from a REST API, text transformation and
-classification (☆☆)
-- [[file:exponential_smoothing.ipynb][Exponential smoothing]] :: using [[https://ipywidgets.readthedocs.io/en/latest/examples/Widget%2520Basics.html][Jupyter's interactive widget]] to
-explore [[https://en.wikipedia.org/wiki/Exponential_smoothing][exponential smoothing]] (☆)
-- [[file:Hamming.ipynb][Hamming]] :: a graph visualising a strange type of word similarity (☆)
-- [[file:Jupyter-Demo.ipynb][Jupyter demo]] :: demo of some Jupyter features useful for creating
-learning material (☆)
-- [[file:Mondrian.ipynb][Mondrian]] :: turtle graphics, recursion, art (☆☆)
-- [[file:Das_Haus_vom_Nikolaus.ipynb][Nikolaus]] :: graph [[https://en.wikipedia.org/wiki/Graph_traversal][traversal]] and drawing (☆☆☆)
-- [[file:statistics_top50faculty.ipynb][Statistics top 50 faculty]] :: exploratory statistical analysis of the
-[[http://cs.brown.edu/people/apapouts/faculty_dataset.html][dataset of 2200 faculty in 50 top US computer science graduate
-programs]] (☆☆)
-- [[file:Twitter.ipynb][Twitter]] :: analysing Twitter data (raw JSON from Twitter's API) (☆)
-- [[file:wikipedia_language_editions.ipynb][Wikipedia]] :: plotting the depth and number of articles of different
-Wikipedia language editions (☆)
+So far, notebooks are listed by difficulty, indicated by stars (☆ =
+simple, ☆☆ = advanced, ☆☆☆ = sophisticated), then alphabetically:
+- [[file:notebooks/classification.ipynb][Classification]] :: basic machine learning classification example (☆)
+- [[file:notebooks/exponential_smoothing.ipynb][Exponential smoothing]] :: using [[https://ipywidgets.readthedocs.io/en/latest/examples/Widget%2520Basics.html][Jupyter's interactive widget]] to
+explore [[https://en.wikipedia.org/wiki/Exponential_smoothing][exponential smoothing]] (☆)
+- [[file:notebooks/Hamming.ipynb][Hamming]] :: a graph visualising a strange type of word similarity (☆)
+- [[file:notebooks/Jupyter-Demo.ipynb][Jupyter-Demo]] :: demo of some Jupyter features useful for creating
+learning material (☆)
+- [[file:notebooks/Twitter.ipynb][Twitter]] :: analysing Twitter data (raw JSON from Twitter's API) (☆)
+- [[file:notebooks/wikipedia_language_editions.ipynb][Wikipedia language editions]] :: plotting the depth and number of
+articles of different Wikipedia language editions (☆)
+- [[file:notebooks/wikipedia_regex.ipynb][Regular expressions]] :: simple information extraction from Wikipedia
+articles (☆)
+- [[file:notebooks/amazon_reviews.ipynb][Amazon reviews]] :: crawling web sites with [[https://scrapy.org/][Scrapy]], processing JSON
+data, basic statistics and visualisation (☆☆)
+- [[file:notebooks/Art.ipynb][Art]] :: Creating computer-generated art by translation, scaling and
+composition of functions (☆☆)
+- [[file:notebooks/community_detection.ipynb][Community detection]] :: applying community detection algorithms to
+network graphs (☆☆)
+- [[file:notebooks/crawling_a_blog.ipynb][Crawling a blog]] :: crawling web sites, basic text mining, basic
+statistics and visualisation (☆☆)
+- [[file:notebooks/Dashboard.ipynb][Dashboard]] :: building data exploration dashboards using plotly and
+dash (☆☆)
+- [[file:notebooks/DraCor.ipynb][DraCor]] :: retrieving data from a REST API, text transformation and
+classification (☆☆)
+- [[file:notebooks/FCA.ipynb][FCA]] :: analysing characters in plays using Formal Concept Analysis
+(☆☆)
+- [[file:notebooks/goethes_words.ipynb][Goethes Wörter]] :: finding the most frequent nouns (verbs,
+adjectives, ...) in Goethe's works (here: Faust) (☆☆)
+- [[file:notebooks/machine_learning.ipynb][Machine Learning]] :: recipes for common machine learning tasks (☆☆)
+- [[file:notebooks/Mondrian.ipynb][Mondrian]] :: turtle graphics, recursion, art (☆☆)
+- [[file:notebooks/statistics_top50faculty.ipynb][Statistics top 50 faculty]] :: exploratory statistical analysis of the
+[[http://cs.brown.edu/people/apapouts/faculty_dataset.html][dataset of 2200 faculty in 50 top US computer science graduate
+programs]] (☆☆)
+- [[file:notebooks/Wordsiblings.ipynb][Word Siblings]] :: Visualizing a graph of words that are related to
+each other (☆☆)
+- [[file:notebooks/Prefix_search.ipynb][Prefix search]] :: Implementing
+prefix search using a Trie and comparing runtime with a naive approach (☆☆)
+- [[file:notebooks/distances.ipynb][Distances]] :: comprehensive interactive simulation of recovering
+information from noisy data (namely, point positions given their
+noisy distance matrix) (☆☆☆)
+- [[file:notebooks/Das_Haus_vom_Nikolaus.ipynb][Das Haus vom Nikolaus]] :: graph [[https://en.wikipedia.org/wiki/Graph_traversal][traversal]] and drawing (☆☆☆)
+- [[file:notebooks/scrape_review_blog.ipynb][Scrape review blog]] :: Here, we use the python package scrapy to
+download all reviews of a literature blog (☆☆☆)
+- [[file:notebooks/Optimization.ipynb][Optimization]] :: Solving an optimization problem using the example
+of assigning reviewers of papers to time slots (☆☆☆)
+*** Module "Datenanalyse & -auswertung"
+Exemplary (and excellent) computational essays from students of our module:
+- [[file:notebooks/AirBnB_Use_Berlin.ipynb][AirBnB Use in Berlin]] :: /Untersuchung der AirBnB Nutzung in Berlin/
+by Juliane Köhler
+- [[file:notebooks/Drama_Speakers.ipynb][Gender of Characters in Drama]] :: /Die Repräsentanz von weiblichen
+Sprecherinnen in den Theaterstücken der deutschen und französischen
+DramaCorpora/ by Janina Pingel und Vivian Schlosser
+- [[file:notebooks/Video_Games.ipynb][Video Games Sales]] :: /Analysis of Video Games Sales Data/ by Jan
+Raoul Weber
+- [[file:notebooks/Weinbewertungen_Vivino.ipynb][Weinbewertungen Vivino]] :: /Untersuchung von Weinbewertungen des
+Online-Weinmarktplatzes Vivino/ by Heike Wilhelm
+- [[file:notebooks/World_Risk_and_Happiness.ipynb][World Risk and Happiness]] :: /World Risk Poll 2021 and World
+Happiness Report 2021/ by Helene Hellmich
+** Wishlist for new Notebooks
+- Creating a word list is not trivial but we need it for [[file:notebooks/Wordsiblings.ipynb][the word
+siblings notebook]]. We have [[file:notebooks/Prefix_search.ipynb][a prefix search notebook]]
+which can be used to filter words which are a prefix of other words. This can help in filtering
+invalid words which are the stem of other words and should not be in the list on their own but
+this would also remove valid words if they are part of a compound word.
\ No newline at end of file
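The filtering idea from the README wishlist (drop every word that is a prefix of another word) can be sketched in a few lines. This is an illustration under assumptions, not the code of the prefix search notebook: instead of a Trie it uses the fact that in a sorted list of distinct words, any word that is a prefix of another word is also a prefix of its direct successor. The function name `drop_prefix_words` and the sample words are made up for this example.

```python
def drop_prefix_words(words):
    """Keep only words that are not a prefix of any other word."""
    ordered = sorted(set(words))          # distinct words, lexicographic order
    kept = []
    # pair every word with its successor (None for the last word)
    for current, following in zip(ordered, ordered[1:] + [None]):
        if following is not None and following.startswith(current):
            continue                      # current is a prefix of another word
        kept.append(current)
    return kept

words = ["Haus", "Haustür", "Tür", "Baum", "Bau"]
print(drop_prefix_words(words))  # ['Baum', 'Haustür', 'Tür']
```

The output also shows the drawback mentioned in the wishlist: "Haus" and "Bau" are valid words on their own, yet both are removed because a longer word starts with them.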
# config file
title: Notebooks am IBI
email: robert.jaeschke@hu-berlin.de
description:
  Humboldt-Universität zu Berlin
  Unter den Linden 6
  10099 Berlin
baseurl: "/notebooks" # the subpath of your site, e.g. /blog
url: "" # the base hostname & protocol for your site, e.g. http://example.com
twitter_username: IBI_HU
# github_username: jekyll
# Build settings
# source: notebooks
#collections:
# notebooks:
# output: true
theme: minima
include:
  - notebooks
  - jekyll-feed
  - jekyll-seo-tag
plugins:
  - jekyll-feed
  - jekyll-jupyter-notebook
  - jekyll-datapage-generator
# datapage gen
page_gen-dirs: false
page_gen:
  - data: 'notebooks'
    name: 'notebook_name'
    dir: 'notebooks'
# tagcloud
tag_page_layout: tag_page
tag_page_dir: tag
# exclude:
# - .sass-cache/
# - .jekyll-cache/
# - gemfiles/
# - Gemfile
# - Gemfile.lock
# - node_modules/
# - vendor/bundle/
# - vendor/cache/
# - vendor/gems/
# - vendor/ruby/
# config file
title: Notebooks am IBI
email: robert.jaeschke@hu-berlin.de
description:
  Humboldt-Universität zu Berlin
  Unter den Linden 6
  10099 Berlin
baseurl: "/ibi/notebooks" # the subpath of your site, e.g. /blog
url: "https://pages.cms.hu-berlin.de/" # the base hostname & protocol for your site, e.g. http://example.com
twitter_username: IBI_HU
# github_username: jekyll
# Build settings
# source: notebooks
#collections:
# notebooks:
# output: true
theme: minima
include:
  - notebooks
  - jekyll-feed
  - jekyll-seo-tag
plugins:
  - jekyll-feed
  - jekyll-jupyter-notebook
  - jekyll-datapage-generator
# datapage gen
page_gen-dirs: false
page_gen:
  - data: 'notebooks'
    name: 'notebook_name'
    dir: 'notebooks'
# tagcloud
tag_page_layout: tag_page
tag_page_dir: tag
# exclude:
# - .sass-cache/
# - .jekyll-cache/
# - gemfiles/
# - Gemfile
# - Gemfile.lock
# - node_modules/
# - vendor/bundle/
# - vendor/cache/
# - vendor/gems/
# - vendor/ruby/
- notebook_name: AirBnB_Use_Berlin
tags: [datavis, statistics, tabular, daa, student]
title: Airbnb Use Berlin
author: Juliane Köhler
difficulty: 3
language: German
description: Untersuchung der Einflüsse auf Übernachtungspreise, Bewertungen, Absagen, etc. von Airbnb-Inseraten in Berlin.
- notebook_name: amazon_reviews
tags: [crawling, datavis, web, tabular]
title: Amazon Reviews
author: Michael Paris
difficulty: 3
language: English
description: Learning to analyze web data by analyzing Amazon reviews with Python.
- notebook_name: Art
tags: [image, art]
title: Art
author: Robert Jäschke
difficulty: 2
language: English
description: Creating an artwork inspired by Peter Saville's cover of the Joy Division album Unknown Pleasures.
- notebook_name: Das_Haus_vom_Nikolaus
tags: [graph, search]
title: Das Haus vom Nikolaus
author: Robert Jäschke
difficulty: 3
language: German
description: Wir suchen alle möglichen Lösungen für das Zeichenspiel "Das Haus vom Nikolaus".
- notebook_name: Dashboard
tags: [datavis]
title: Dashboard
author: Robert Jäschke
difficulty: 2
language: English
description: This notebook shows an example of how to create a dashboard for data exploration using plotly dash.
- notebook_name: DraCor
tags: [classification, literature, text, dracor]
title: Textklassifikation mit DraCor
author: Robert Jäschke
difficulty: 2
language: German
description: In diesem Notebook testen wir, inwiefern sich die Autor:innen von Dramen anhand ihrer Texte identifizieren lassen.
- notebook_name: Drama_Speakers
tags: [datavis, literature, tabular, dracor, student, daa, statistics]
title: Gender of Characters in Dramas
author: Janina Pingel & Vivian Schlosser
difficulty: 3
language: German
description: Untersuchung der Beschaffenheit der weiblichen Repräsentanz anhand der landesspezifischen DramaCorpora Deutschlands und Frankreichs.
- notebook_name: FCA
tags: [fca, literature, dracor, datavis, clustering]
title: FCA & DraCor
author: Robert Jäschke
difficulty: 2
language: English
description: We want to analyse which patterns can be discovered when we consider which characters speak in scenes of different plays. Therefore, we extract the corresponding information from DraCor and analyse it using Formal Concept Analysis (FCA).
- notebook_name: Hamming
tags: [similarity, text, graph]
title: Hamming Similarity
author: Robert Jäschke
difficulty: 1
language: English
description: This notebook explores a (probably slightly weird) similarity measure for strings.
- notebook_name: Jupyter-Demo
tags: [jupyter]
title: Jupyter Demo
author: Robert Jäschke
difficulty: 1
language: German
description: Dieses Notebook enthält Beispiele für verschiedene Anwendungsfälle von Jupyter-Notebooks.
- notebook_name: Mondrian
tags: [image, art, turtle]
title: Digital Mondrian
author: Robert Jäschke
difficulty: 2
language: English
description: This notebook aims to automatically create images similar to those of Piet Mondrian.
- notebook_name: Twitter
tags: [text, web]
title: Analyzing Twitter Data
author: Robert Jäschke
difficulty: 1
language: English
description: Different examples of Twitter data analysis for raw JSON from Twitter's API.
- notebook_name: Video_Games
tags: [datavis, statistics, tabular, daa, student]
title: Video Games Sales
author: Jan Raoul Weber
difficulty: 3
language: English
description: Examine the sales of video games across different regions of the world to find out what video games are sold where and how the regional markets differ regarding preferences for e.g. Publishers.
- notebook_name: Weinbewertungen_Vivino
tags: [datavis, statistics, tabular, daa, student]
title: Weinbewertungen Vivino
author: Heike Wilhelm
difficulty: 3
language: German
description: Untersuchung von Weinbewertungen des Online-Weinmarktplatzes Vivino und ihrer Zusammenhänge mit Preis, Herkunftsland und Sorte der Weine.
- notebook_name: World_Risk_and_Happiness
tags: [datavis, statistics, tabular, daa, student]
title: World Risk and Happiness
author: Helene Hellmich
difficulty: 3
language: English
description: A closer look at the World Worry Index of the "World Risk Poll 2021" to determine if there is a relationship between responses or demographic factors of the respondents and the worry index, then comparison to the demographic's ranking in the "World Happiness Report".
- notebook_name: classification
tags: [classification, datamining]
title: Classification
author: Robert Jäschke
difficulty: 1
language: English
description: This notebook shows the basic steps to apply a machine learning classification algorithm. The example uses a decision tree but the procedure is very similar for other algorithms.
- notebook_name: community_detection
tags: [datamining, graph, clustering]
title: Community Detection in Networks
author: Michel Schwab
difficulty: 2
language: English
description: In this notebook we want to detect communities in network graphs.
- notebook_name: crawling_a_blog
tags: [crawling, web, text]
title: Crawling a Blog
author: Robert Jäschke
difficulty: 2
language: English
description: This Jupyter notebook shows how to crawl a web blog and extract information from it for later analysis.
- notebook_name: distances
tags: [similarity, datavis]
title: Distances
author: Robert Jäschke
difficulty: 3
language: English
description: If we have a number of devices in a room which can communicate with each other and measure the distance to each other (e.g., using round-trip time or signal strength), can we use that information to reconstruct the positions of the devices?
- notebook_name: exponential_smoothing
tags: [jupyter, datavis, statistics]
title: Exponential Smoothing
author: Robert Jäschke
difficulty: 1
language: English
description: Using Jupyter's interactive widget to explore exponential smoothing.
- notebook_name: goethes_words
tags: [literature, text, dracor]
title: Wörter im Faust
author: Robert Jäschke
difficulty: 2
language: German
description: Finding the most frequent nouns (verbs, adjectives, …) in Goethe's works (Faust in our example).
- notebook_name: machine_learning
tags: [datamining, classification, tabular]
title: Machine Learning Recipes
author: Robert Jäschke
difficulty: 2
language: English
description: This notebook contains recipes for common machine learning tasks.
- notebook_name: Optimization
tags: [optimization, tabular, integerlinearprogramming]
title: Optimization Example
author: Frederik Arnold
difficulty: 3
language: English
description: This notebook shows an example of how to use linear programming to solve an optimization problem using Python and the PuLP library.
- notebook_name: scrape_review_blog
tags: [crawling, web, text]
title: Scraping Reviews
author: Michel Schwab
difficulty: 3
language: English
description: We use the Python package scrapy to download all reviews of a literature blog.
- notebook_name: statistics_top50faculty
tags: [datavis, tabular, statistics]
title: Statistics Top 50 Faculty
author: Robert Jäschke
difficulty: 2
language: English
description: Basic statistics using the top 50 faculty dataset.
- notebook_name: wikipedia_language_editions
tags: [datavis, web, wikipedia]
title: Comparing Wikipedia Language Editions
author: Robert Jäschke
difficulty: 1
language: English
description: Wikipedia (as of January 2022) has more than 300 active language editions. We can compare (some of) these editions quantitatively and qualitatively using the table from https://meta.wikimedia.org/wiki/Wikipedia_article_depth.
- notebook_name: wikipedia_regex
tags: [crawling, web, wikipedia]
title: Wikipedia RegEx
author: Robert Jäschke
difficulty: 1
language: German
description: Applying regular expressions to Wikipedia pages.
- notebook_name: Wordsiblings
tags: [datavis, text, graph]
title: Visualizing Word Siblings
author: Robert Jäschke
difficulty: 2
language: English
description: Visualizing a graph of words that are related (similar) to each other.
- notebook_name: Prefix_search
tags: [text, trie, search, graph]
title: Prefix Search
author: Frederik Arnold
difficulty: 2
language: English
description: Implementing prefix search using a Trie and comparing runtime with a naive approach.
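Each entry above is expected to carry the same seven keys (`notebook_name`, `tags`, `title`, `author`, `difficulty`, `language`, `description`), and the list template reads fields such as `notebook.language` and `notebook.difficulty` directly, so a misspelled key would simply render as empty. A minimal sketch of a key check, assuming the entries have already been parsed into plain objects; the `findMissingKeys` helper and the sample data are hypothetical and not part of the repository:

```javascript
// Keys every entry in the data file is expected to have (taken from the entries above).
const REQUIRED_KEYS = [
  'notebook_name', 'tags', 'title', 'author',
  'difficulty', 'language', 'description',
];

// Hypothetical helper: report entries that lack any of the expected keys.
function findMissingKeys(entries) {
  return entries
    .map(entry => ({
      name: entry.notebook_name,
      missing: REQUIRED_KEYS.filter(key => !(key in entry)),
    }))
    .filter(result => result.missing.length > 0);
}

// Sample data for illustration only: the second entry misspells a key.
const sample = [
  { notebook_name: 'a', tags: ['text'], title: 'A', author: 'X', difficulty: 1, language: 'English', description: 'd' },
  { notebook_name: 'b', tags: ['web'], title: 'B', author: 'Y', difficulty: 2, lang: 'English', description: 'd' },
];

const problems = findMissingKeys(sample);
console.log(problems);
```

Running such a check before the site build would flag the affected entry by its `notebook_name` rather than leaving an empty field on the rendered page.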
<ul class="notebooks">
{% for notebook in site.data.notebooks %}
<li class="notebook" data-tags="{{ notebook.tags | join: ' ' }}">
<a class="title" href="{{ notebook.notebook_name | datapage_url: 'notebooks' }}">{{ notebook.title }}</a>
{% include stars.html %}
<span class="language"> {{ notebook.language }}</span>
<p class="author">by <em>{{ notebook.author }}</em></p>
<p class="description">{{ notebook.description }}</p>
</li>
{% endfor %}
</ul>
<span class="stars">{% for i in (1..3) %}{% if i <= notebook.difficulty %}★{% else %}☆{% endif %}{% endfor %}</span>
&nbsp; <!-- TODO: workaround → better use flexbox or the like -->
document.addEventListener('DOMContentLoaded', () => {
// Show all notebooks by default
showAllNotebooks();
});
function filterByTag(tag, element) {
const tags = document.querySelectorAll('.tag');
const allTag = document.querySelector('.tag.all');
// If the clicked tag is already active, deactivate it and fall back to the "All" tag
if (element.classList.contains('active')) {
tags.forEach(tag => tag.classList.remove('active'));
allTag.classList.add('active');
showAllNotebooks();
} else {
// Otherwise, remove the active class from all tags and activate the clicked tag
tags.forEach(tag => tag.classList.remove('active'));
element.classList.add('active');
// If "All" tag is clicked, show all notebooks
if (tag === 'all') {
showAllNotebooks();
} else {
filterNotebooksByTag(tag);
}
}
}
function showAllNotebooks() {
const notebooks = document.querySelectorAll('.notebook');
notebooks.forEach(notebook => {
notebook.style.display = 'list-item';
});
}
function filterNotebooksByTag(tag) {
const notebooks = document.querySelectorAll('.notebook');
notebooks.forEach(notebook => {
const tags = notebook.getAttribute('data-tags').split(' ');
if (tags.includes(tag)) {
notebook.style.display = 'list-item';
} else {
notebook.style.display = 'none';
}
});
}
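The filtering step above splits each item's `data-tags` attribute on spaces and tests for membership. A minimal, DOM-free sketch of that same check, which can be run under Node.js; the `matchesTag` and `visibleNotebooks` helpers and the sample records are hypothetical and only illustrate the logic:

```javascript
// Hypothetical helper mirroring the membership test in filterNotebooksByTag.
function matchesTag(dataTags, tag) {
  // data-tags is a space-separated string, e.g. "datavis web wikipedia"
  return dataTags.split(' ').includes(tag);
}

// Hypothetical helper: names of the records that would stay visible for a tag.
function visibleNotebooks(records, tag) {
  if (tag === 'all') {
    return records.map(r => r.name);
  }
  return records.filter(r => matchesTag(r.dataTags, tag)).map(r => r.name);
}

// Sample records for illustration only.
const records = [
  { name: 'Twitter', dataTags: 'text web' },
  { name: 'distances', dataTags: 'similarity datavis' },
  { name: 'Wordsiblings', dataTags: 'datavis text graph' },
];

console.log(visibleNotebooks(records, 'text'));
console.log(visibleNotebooks(records, 'all'));
```

Because the match is on whole tokens, a tag such as `data` does not match an item tagged `datavis`.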
<div class="word-cloud">
<!-- Add an "All" tag to reset the filter; it is highlighted by default -->
<span class="tag all active" onclick="filterByTag('all', this)">All</span>
{% assign tags = "" | split: "" %}
{% for notebook in site.data.notebooks %}
{% for tag in notebook.tags %}
{% unless tags contains tag %}
{% assign tags = tags | push: tag %}
{% endunless %}
{% endfor %}
{% endfor %}
{% assign tags = tags | sort %}
{% for tag in tags %}
<span class="tag" onclick="filterByTag('{{ tag }}', this)">{{ tag }}</span>
{% endfor %}
</div>
<!DOCTYPE html>
<html lang="{{ page.lang | default: site.lang | default: 'en' }}">
{%- include head.html -%}
<body>
<main class="page-content" aria-label="Content">
<div class="wrapper">
{{ content }}
</div>
</main>
<footer class="site-footer">
<div class="wrapper">
<div class="footer-content">
<div class="footer-left">
<p><b>{{ site.title }}</b></p>
<p><a href="https://www.ibi.hu-berlin.de/de">https://www.ibi.hu-berlin.de/de</a></p>
<div class="social-links">
{%- include social.html -%}
</div>
</div>
<div class="footer-right">
<p>Humboldt-Universität zu Berlin<br>
Institut für Bibliotheks- und Informationswissenschaft<br>
Unter den Linden 6<br>
10099 Berlin</p>
</div>
</div>
</div>
</footer>
</body>
</html>
---
layout: base_no-nav
---
<article class="post">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{{ page.title }}</title>
<style>
.stars {
color: gold;
font-size: 1.2em;
}
</style>
</head>
<div>
<span><strong>Author:</strong> {{ page.author }}</span> |
<span class="stars">{% for i in (1..3) %}{% if i <= page.difficulty %}★{% else %}☆{% endif %}{% endfor %}</span>
<span><strong>Language:</strong> {{ page.language }}</span>
<!-- <br> -->
<!-- <span><strong>Description:</strong> {{ page.description }}</span> -->
</div>
<div class="post-content">
<div class="post-content e-content" itemprop="articleBody">
<p><a href="https://scm.cms.hu-berlin.de/ibi/notebooks/-/raw/master/notebooks/{{ page.notebook_name }}.ipynb?ref_type=heads&inline=false">Download Notebook</a></p>
<div class="jupyter-notebook" style="position: relative; width: 100%; margin: 0 auto;">
<div class="jupyter-notebook-iframe-container">
<iframe src="{{ page.notebook_name }}.ipynb.html" style="position: absolute; top: 0; left: 0; border-style: none;" width="100%" height="100%" onload="this.parentElement.style.paddingBottom = (this.contentWindow.document.documentElement.scrollHeight + 10) + 'px'"></iframe>
</div>
</div>
</div>
</div>
<a class="gitlab-link" href="https://scm.cms.hu-berlin.de/ibi/notebooks/-/blob/master/notebooks/{{ page.notebook_name }}.ipynb?ref_type=heads">GitLab Link</a>
</article>
# _plugins/ext.rb allows you to define custom plugins and/or require
# plugins that otherwise wouldn't work with Jekyll
# Activate jekyll-tagging
# https://github.com/pattex/jekyll-tagging
require 'jekyll/tagging'