ISourceCode

Make the frequent cases fast and the rare case correct

Category Archives: Strings

Wagner–Fischer algorithm – To find Levenshtein distance ( C program )

The Wagner–Fischer algorithm is a dynamic programming algorithm that measures the Levenshtein distance between two strings of characters.

For example, the Levenshtein distance between “kitten” and “sitting” is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

kitten → sitten (substitution of ‘s’ for ‘k’)
sitten → sittin (substitution of ‘i’ for ‘e’)
sittin → sitting (insertion of ‘g’ at the end).

Computing the Levenshtein distance is based on the observation that if we reserve a matrix to hold the Levenshtein distances between all prefixes of the first string and all prefixes of the second, then we can compute the values in the matrix by flood filling the matrix, and thus find the distance between the two full strings as the last value computed.

                k 	i 	t 	t 	e 	n
	0 	1 	2 	3 	4 	5 	6
s 	1 	1 	2 	3 	4 	5 	6
i 	2 	2 	1 	2 	3 	4 	5
t 	3 	3 	2 	1 	2 	3 	4
t 	4 	4 	3 	2 	1 	2 	3
i 	5 	5 	4 	3 	2 	2 	3
n 	6 	6 	5 	4 	3 	3 	2
g 	7 	7 	6 	5 	4 	4 	3

CODE:

/* http://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm */
#include <stdio.h>
#include <math.h>
int d[100][100];
#define MIN(x,y) ((x) < (y) ? (x) : (y))
main()
{
	int i,j,m,n,temp,tracker;
	char s[] = "kitten";
	char t[] = "sitting";
	m = strlen(s);
	n = strlen(t);

	for(i=0;i<=m;i++)
		d[0][i] = i;
	for(j=0;j<=n;j++)
		d[j][0] = j;

	for (j=1;j<=m;j++)
	{
		for(i=1;i<=n;i++)
		{
			if(s[i-1] == t[j-1])
			{
				tracker = 0;
			}
			else{
				tracker = 1;
			}
			temp = MIN((d[i-1][j]+1),(d[i][j-1]+1));
			d[i][j] = MIN(temp,(d[i-1][j-1]+tracker));
		}
	}
	printf("the Levinstein distance is %d\n",d[n][m]);

}

OUTPUT:
$ ./a.exe
the Levinstein distance is 3

References: Wikipedia entry for Levenshtein distance.

Thanks for reading !

Algorithm to find the Lexicographical Next Permutation of a Number or a String – Code in Python

Firstly I am back to writing. I hope to put many articles and code here in the coming weeks.

Problem statement is simple: We need to find the next permutation of a number or string.

For example 112 has next permutation of 121
and a string for example chin has next permutation of chni.

The code has explanation.

def NextPerm(s):
	Len = len(s)
	# char at the end of string

	EndChar = s[Len-1]
	

	for i in range (Len-2,-1,-1):

		# compare with char to left of the last char

		c = s[i]

		# if smaller, then it has to be substituted

		if(c < EndChar):

		# left to the char under check is remaining the same and put into buffer

			buff = s[:i]
			#print buff

			# find next biggest digit on the right starting from end and notice the last char is the smallest)

			for j in range(Len-1,i,-1):

				# check if the digit is bigger

				if(s[j] > c):

					break

			# Add the char to buffer

			buff = buff + s[j]
			

			# book keep the index
			index = j

				

			# add remaining chars in ascending order(reverse order) and put also insert C in its index

				 

			for j in range(Len-1,i,-1):

				# check if the the char was used to replace 'c' 

				if(j == index):

					# if yes, keep 'c' in its place

					buff = buff + c

				else:

					# else copy next char

					buff = buff + s[j]

						

			return(buff)

			

		# last char is stored for the loop process

		EndChar = c

		

		# if char are in descending order its the biggest permutation

	return("its already the biggest permutation")

input = raw_input("Enter any number or string : ")

print "The next perm is :",NextPerm(input)

OUTPUT:

labuser@ubuntu:~$ python NextPerm.py
Enter any number or string : abc
The next perm is : acb
labuser@ubuntu:~$ python NextPerm.py
Enter any number or string : acb
The next perm is : bac
labuser@ubuntu:~$ python NextPerm.py
Enter any number or string : bac
The next perm is : bca
labuser@ubuntu:~$ python NextPerm.py
Enter any number or string : bca
The next perm is : cab
labuser@ubuntu:~$ python NextPerm.py
Enter any number or string : cab
The next perm is : cba
labuser@ubuntu:~$ python NextPerm.py
Enter any number or string : cba
The next perm is : its already the biggest permutation
labuser@ubuntu:~$
labuser@ubuntu:~$ python NextPerm.py
Enter any number or string : 112
The next perm is : 121
labuser@ubuntu:~$ python NextPerm.py
Enter any number or string : 121
The next perm is : 211
labuser@ubuntu:~$ python NextPerm.py
Enter any number or string : 211
The next perm is : its already the biggest permutation
labuser@ubuntu:~$

Thanks for reading ! Watch out for more articles this month.

Find the minimum number of letters which must be removed from two words so that the remaining portions of the two words become anagram

Problem taken from here

Two words are said to be anagrams of each other if the letters from one word can be rearranged to form the other word. For example, occurs is an anagram of succor; however, dear is not an anagram of dared (because the d appears twice in dared, but only once in dear). The most famous anagram pair (in English) is dog and god.
The anagrammatic distance between any two words is the minimum number of letters which must be removed so that the remaining portions of the two words become anagrams. For example, given the words sleep and leap, we need to remove a minimum of three letters —two from sleep and one from leap —before what’s left are anagrams of each other (in each case, lep). With words such as dog and cat, where the two have absolutely no letters in common, the anagrammatic distance is an extreme (explicitly 6) since all the letters need to be removed. (Here, a word is always an anagram of itself.)
You must write a program to calculate the anagrammatic distance between any two given words.

Input
The first line of the input will contain a positive integer value N (less than 60,000) indicating the number of cases. Each case will consist of two words, possibly empty, each given on a single line (for a total of 2N additional lines).
Although they may have zero length, the words are simple —the letter are all lowercase and are taken from the usual twenty-six letter English alphabet (abcdefghijklmnopqrstuvwxyz). The longest word is pneumonoultramicroscopicsilicovolcanoconiosis.

Output
The output should consist of the case number and the anagrammatic distance, formatted as shown.

Sample Input

4
crocus
succor
dares
seared
empty

smell
lemon

Sample Output

Case #1: 0
Case #2: 1
Case #3: 5
Case #4: 4

import math
p = []
for i in range(0,26):
        p.append(0)
j = 0
s1 = raw_input("Enter the first string : ")
s2 = raw_input("Enter the second string : ")
for i in range(0,len(s1)):
        t=ord(s1[i])-ord('a')
        p[t] = p[t] + 1

for i in range(0,len(s2)):
        t=ord(s2[i])-ord('a')
        p[t] = p[t] - 1

cnt=0
for i in range(0,26):
        cnt =cnt + math.fabs(p[i])
print "The anagrammatic distance between the two given words are :",cnt

OUTPUT:

labuser@ubuntu:~$ python anagdist.py
Enter the first string : dares
Enter the second string : seared
The anagrammatic distance between the two given words are : 1.0
labuser@ubuntu:~$
labuser@ubuntu:~$
labuser@ubuntu:~$ python anagdist.py
Enter the first string : empty
Enter the second string :
The anagrammatic distance between the two given words are : 5.0
labuser@ubuntu:~$
labuser@ubuntu:~$
labuser@ubuntu:~$ python anagdist.py
Enter the first string : crocus
Enter the second string : succor
The anagrammatic distance between the two given words are : 0.0
labuser@ubuntu:~$
labuser@ubuntu:~$
labuser@ubuntu:~$ python anagdist.py
Enter the first string : smell
Enter the second string : lemon
The anagrammatic distance between the two given words are : 4.0
labuser@ubuntu:~$

I chose to write the program this way rather than the prescribed input

Computing the length of “Longest common subsequence” of TWO strings and displaying the LCS

Longest common subsequence is a classic dynamic programming problem.

Below Literature is from Wikipedia. Included some introduction to the problem in this article.

The longest common subsequence (LCS) problem is to find the longest subsequence common to all sequences in a set of sequences (often just two) – Wikipedia

For the general case of an arbitrary number of input sequences, the problem is NP-hard .When the number of sequences is constant, the problem is solvable in polynomial time by dynamic programming

It should be noted that the sequence is non-contiguous unlike longest common substring problem where the substring is contiguous.

Additional Reading

Algorithm:
Computing the length of the LCS

function  LCSLength(X[1..m], Y[1..n])
    C = array(0..m, 0..n)
    for i := 0..m
       C[i,0] = 0
    for j := 0..n
       C[0,j] = 0
    for i := 1..m
        for j := 1..n
            if X[i] = Y[j]
                C[i,j] := C[i-1,j-1] + 1
            else:
                C[i,j] := max(C[i,j-1], C[i-1,j])
    return C[m,n]

Implementation:

import java.util.Scanner;
public class LCS
{
	public static void main(String[] args)
	{
		Scanner input = new Scanner(System.in);
		System.out.println("Enter the first string");
		String str1 = input.next();
		System.out.println("Enter the Second string");
		String str2 = input.next();
		int[][] lcs = new int[str1.length()+1][str2.length()+1];
		int i;
		for(i=0;i<=str1.length();i++)
			lcs[i][0] = 0;
		for(i=0;i<=str2.length();i++)
			lcs[0][i] = 0;
		for(i=1;i<=str1.length();i++)
		{
			for(int j=1;j<=str2.length();j++)
			{
				if(str1.charAt(i-1) == str2.charAt(j-1))
					lcs[i][j]=lcs[i-1][j-1]+1;
				else
					lcs[i][j]=Math.max(lcs[i-1][j], lcs[i][j-1]);
			}
			
		}
		System.out.println("The Length of LCS is: "+lcs[str1.length()][str2.length()]);
		System.out.println("The LCS is: ");
		int x = 0, y = 0;
        	while(x < str1.length() && y < str2.length()) 
		{
            		if (str1.charAt(x) == str2.charAt(y)) 
			{
    		            	System.out.print(str1.charAt(x));
                		x++;
                		y++;
            		}
            		else if (lcs[x+1][y] >= lcs[x][y+1]) x++;
            		else y++;
        	}
        	System.out.println();		
	}
}

OUTPUT:

labuser@ubuntu:~$ java LCS
Enter the first string
AGCAT
Enter the Second string
GAC
The Length of LCS is: 2
The LCS is:
GA

Also this algorithm has formed the basis for UNIX diff command .The diff utility was developed in the early 1970s on the Unix operating system which was emerging from AT&T Bell Labs in Murray Hill, New Jersey. The final version, first shipped with the 5th Edition of Unix in 1974, was entirely written by Douglas McIlroy. This research was published in a 1976 paper co-written with James W. Hunt who developed an initial prototype of diff The algorithm this paper described became known as the Hunt-McIlroy algorithm. – Wikipedia

Algorithm to find all Permutations of a string

The problem can be solved by first trying to solve on paper for a 3 char string

For eg : a string XYZ can be chosen
we can set aside the first char and permute the remaining char. in this case
initial= X and permute(YZ) = YZ and ZY
Next we can insert X into the available position of this string
YZ — XYZ, YXZ, YZX
ZY — XZY, ZXY, ZYX

To achieve this we use Recursion.

import java.util.*;

public class StringPermutation {
	public static ArrayList<String> PermutationFinder(String s) {
	ArrayList<String> perm = new ArrayList<String>();
	if (s == null) { // error case
		return null;
	} else if (s.length() == 0) 
	{ 
		perm.add(""); // initial 
		return perm;
	}	
	char initial = s.charAt(0); // first character
	String rem = s.substring(1); // Full string without first character
	ArrayList<String> words = PermutationFinder(rem);
	for (String str : words) 
	{
		for (int i = 0; i <= str.length(); i++) 
		{
			perm.add(charinsert(str, initial, i));
		}
	}	
	return perm;
	}
	public static String charinsert(String str, char c, int j) {
	String begin = str.substring(0, j);
	String end = str.substring(j);
	return begin + c + end;
	}
	public static  void main(String[] args) {
    	String s = "CODE";
	ArrayList<String> value = PermutationFinder(s);
	System.out.println("\nThe Permutations are : \n" + value);
	}
}

OUTPUT:
laptop:~/code$ java StringPermutation

The Permutations are :
[CODE, OCDE, ODCE, ODEC, CDOE, DCOE, DOCE, DOEC, CDEO, DCEO, DECO, DEOC, COED, OCED, OECD, OEDC, CEOD, ECOD, EOCD, EODC, CEDO, ECDO, EDCO, EDOC]

Thanks for reading

Follow

Get every new post delivered to your Inbox.