"# Graphing a kind of \"Hamming Similarity\" of strings\n",
"\n",
"This notebook explores a slightly weird similarity measure for strings.\n",
"\n",
"## Equal characters in strings\n",
"\n",
"Given two strings, the idea is to consider the positions where their characters match:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"v = \"Wiesbaden\"\n",
"w = \"Potsdam\"\n",
"#       s a   – the matching characters of the two strings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can extract those characters with a loop:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"m = []  # resulting equal characters\n",
"for i in range(min(len(v), len(w))):  # loop over the shorter string's length\n",
"    if v[i] == w[i]:  # check character equality\n",
"        m.append(v[i])  # add character\n",
"m"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create a function that, given two strings, returns their equal characters:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def equal_chars(v, w):\n",
"    m = []  # resulting equal characters\n",
"    for i in range(min(len(v), len(w))):  # loop over the shorter string's length\n",
"        if v[i] == w[i]:  # check character equality\n",
"            m.append(v[i])  # add character\n",
"    return m"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By the way: thanks to Python's [list comprehensions](https://docs.python.org/3/howto/functional.html#generator-expressions-and-list-comprehensions) we can write the body of the function in one line:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def equal_chars(v, w):\n",
"    return [v[i] for i in range(min(len(v), len(w))) if v[i] == w[i]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Similarity\n",
"\n",
"The number of positions at which two strings have equal characters defines a similarity measure. For example, the similarity of our two strings is:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"len(equal_chars(v, w))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Graph\n",
"\n",
"Now given a set of strings, for example, the 16 capitals of all German states:"
"we can create a graph with the strings as nodes by connecting strings whose similarity is larger than zero, that is, they have at least one position with equal characters:"
# Graphing a kind of "Hamming Similarity" of strings
This notebook explores a slightly weird similarity measure for strings.
## Equal characters in strings
Given two strings, the idea is to consider the positions where their characters match:
%% Cell type:code id: tags:
```
v = "Wiesbaden"
w = "Potsdam"
#       s a   – the matching characters of the two strings
```
%% Cell type:markdown id: tags:
We can extract those characters with a loop:
%% Cell type:code id: tags:
```
m = []  # resulting equal characters
for i in range(min(len(v), len(w))):  # loop over the shorter string's length
    if v[i] == w[i]:  # check character equality
        m.append(v[i])  # add character
m
```
%% Cell type:markdown id: tags:
Let's create a function that, given two strings, returns their equal characters:
%% Cell type:code id: tags:
```
def equal_chars(v, w):
    m = []  # resulting equal characters
    for i in range(min(len(v), len(w))):  # loop over the shorter string's length
        if v[i] == w[i]:  # check character equality
            m.append(v[i])  # add character
    return m
```
%% Cell type:markdown id: tags:
By the way: thanks to Python's [list comprehensions](https://docs.python.org/3/howto/functional.html#generator-expressions-and-list-comprehensions) we can write the body of the function in one line:
%% Cell type:code id: tags:
```
def equal_chars(v, w):
    return [v[i] for i in range(min(len(v), len(w))) if v[i] == w[i]]
```
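As an aside, the explicit index range can probably be avoided: `zip` pairs the characters position by position and stops at the end of the shorter string. The following is a minimal sketch of that variant; the name `equal_chars_zip` is hypothetical and does not appear in the notebook.

```python
def equal_chars_zip(v, w):
    # zip yields (v[0], w[0]), (v[1], w[1]), ... and stops at the shorter string
    return [a for a, b in zip(v, w) if a == b]


equal_chars_zip("Wiesbaden", "Potsdam")  # ['s', 'a']
```

Like the index-based version, this ignores any characters beyond the length of the shorter string.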
%% Cell type:markdown id: tags:
## Similarity
The number of positions at which two strings have equal characters defines a similarity measure. For example, the similarity of our two strings is:
%% Cell type:code id: tags:
```
len(equal_chars(v, w))
```
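For two strings of equal length, this count equals the length minus the Hamming distance, which is presumably the reason for the name in the title. A minimal sketch of that relationship, assuming hypothetical helpers `similarity` and `hamming_distance` that are not part of the notebook:

```python
def similarity(s, t):
    # number of positions (up to the shorter length) with equal characters
    return sum(a == b for a, b in zip(s, t))


def hamming_distance(s, t):
    # the Hamming distance is only defined for strings of equal length
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(s, t))


similarity("Wiesbaden", "Potsdam")  # 2

# For equal-length strings, matches and mismatches add up to the length:
s, t = "Berlin", "Bremen"
similarity(s, t) + hamming_distance(s, t) == len(s)  # True
```

Unlike the Hamming distance, the similarity used here is also defined for strings of different lengths, because the comparison stops at the end of the shorter string.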
%% Cell type:markdown id: tags:
## Graph
Now given a set of strings, for example, the 16 capitals of all German states:
we can create a graph with the strings as nodes by connecting strings whose similarity is larger than zero, that is, they have at least one position with equal characters:
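The notebook's own list of capitals and its graph code are not shown in this excerpt, so the following is only a sketch. It assumes the usual German spellings of the 16 capitals and represents the graph as a plain dictionary of neighbour sets rather than with any particular graph library; `similarity` and `build_graph` are hypothetical names.

```python
from itertools import combinations

# Assumption: the 16 state capitals in their usual German spelling
capitals = [
    "Berlin", "Bremen", "Dresden", "Düsseldorf", "Erfurt", "Hamburg",
    "Hannover", "Kiel", "Magdeburg", "Mainz", "München", "Potsdam",
    "Saarbrücken", "Schwerin", "Stuttgart", "Wiesbaden",
]


def similarity(v, w):
    # number of positions (up to the shorter length) with equal characters
    return sum(a == b for a, b in zip(v, w))


def build_graph(strings):
    # every string is a node; an undirected edge joins two strings
    # whose similarity is greater than zero
    graph = {s: set() for s in strings}
    for v, w in combinations(strings, 2):
        if similarity(v, w) > 0:
            graph[v].add(w)
            graph[w].add(v)
    return graph


graph = build_graph(capitals)
sorted(graph["Potsdam"])  # neighbours of Potsdam, including Wiesbaden
```

Storing each edge in both directions keeps neighbour lookups simple; a graph library could be substituted if drawing or further analysis is needed.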
- [[file:crawling_a_blog.ipynb][Crawling a blog]] :: crawling web sites, basic text mining, basic
statistics and visualisation (☆☆)
- [[file:distances.ipynb][Distances]] :: comprehensive interactive simulation of recovering
information from noisy data (namely, point positions given their
noisy distance matrix) (☆☆☆)
- [[file:exponential_smoothing.ipynb][Exponential smoothing]] :: using [[https://ipywidgets.readthedocs.io/en/latest/examples/Widget%2520Basics.html][Jupyter's interactive widget]] to