Sunday, October 13, 2019
Modified Insertion Sort Algorithm: Binary Search Technique
Modified Insertion Sort Algorithm: Binary Search Technique Modified Insertion Sort Algorithm with Binary Search Technique: Application to Ranking of Images Retrieved by CBIR M. K. I. Rahmani M. A. Ansari Abstractââ¬âDue to the abundance of the high quality digital images in image repositories of very huge size on the ever growing Internet by enterprise houses, research institutions, medical healthcare organizations and academic institutions etc., finding a set of useful images from those image repositories with better precision and recall is a difficult task. Content Based Image Retrieval is a very efficient technology for retrieval of digital images from those databases. The process of image retrieval through CBIR has various phases like: Image segmentation, Feature extraction, Indexing, Clustering, Image matching through similarity measurement and Ranking of retrieved images through ordering them according to similarity value. The performance of a Content Based Image Retrieval system can be improved by improving the performance of some or all of these phases through designing better algorithms. Ranking of the Image data is very important to display the desired images to the in tended users. Images are retrieved according to the matching criteria was involved in the retrieval process. Retrieved images are ordered before they are displayed. For this ranking of the retrieved images are obtained through some easy and efficient sorting algorithm. Insertion sort is one of such algorithms but it is slow because of sequential search technique used to find the actual position of the next key element into the sorted portion of data. In this paper we have modified the insertion sort algorithm by using a novel technique of using binary search mechanism for finding the sorted location of the next key item into the previously sorted portion of the data quicker than conventional insertion sort algorithm. Performance on running time of the new algorithm has been compared with those of other conventional sorting algorithms. The results obtained on image matching parameter show that the new algorithm is better in performance than the conventional insertion sort and merge s ort algorithms. Performance of this algorithm is comparable to that of quick sort. Consequently, the new algorithm will improve the overall performance of Content Based Image Retrieval systems. Index Termsââ¬âAlgorithm, Binary search, Sequential search, Insertion sort, Rahmani sort, Ranking, Image Ranking I. INTRODUCTION Many improvements have been introduced in searching and sorting algorithms during the last decade. Sorting is the process of arranging the elements in some ordered sequence which can be either in ascending, descending or lexicographic order [1]. Searching is the technique of finding the location of a key element or item in a database or a file. It is estimated that more than 25% of all computing time is spent on sorting the keys and some installations spending more than 50% of their computing time in sorting files [2]. As a matter of fact there has been done much research on the topic of sorting searching [3]. But there is not a single sorting technique which can be considered the best among the rest [2]. Bubble sort, selection sort and exchange sort are applicable for small input size, insertion sort for medium input size whereas quick sort, merge sort and heap sort are applicable for an application expecting large to huge data size [4, 5, 6]. All of the above sorting algorithms are comparison based algorithms and hence can be no faster than O(nlog2n) [5, 6], where O and n have their usual meanings. In this paper a new enhanced sorting algorithm has been introduced which shows more efficiency than the insertion sort and other sorting algorithms like bubble sort, quick sort and merge sort. The technique used for the enhancement in insertion sort is application of improved binary search, adapted from binary search, through which the location of the next element to be placed in the sorted left sub array can be found more quickly than the conventional sequential search used to find that location. The entire paper is organised in the following manner. In section II, the step by step method of the insertion sort is explained after some background work related to sorting technique. The other sorting algorithms like merge sort and quick sort are explained in section III. The new sorting algorithm, Rahmani sort is introduced and discussed in section IV. The analysis of Rahmani sort is done in section V. Results and comparison of performance of various sorting algorithms have been discussed in tabular forms in section VI along with the graphical description of the performance of various sorting algorithms. Finally the conclusions have been drawn and future scope of the research is mentioned in the section VII. Sorting Sorting is a process of arranging the available data items into an ordered sequence. The known ordered sequences have been increasing order, decreasing order, non-increasing order, non-decreasing order and lexicographic order. The process of sorting is applied to a collection of items prior to any such operation which may consume more time and/or space if applied without prior sorting. Definition of Sorting Formally a sorting technique can be defined based on partial order relation. The definition of partial order is given as below. Definition 1. Let R be a relation on a set S. For a, b, c à ââ¬Å¾ S, if R is: a) Reflexive, i.e. aRa for every a à ââ¬Å¾ S; b) Transitive, i.e. aRb Ã¢Ë § bRc ââ¡â aRc; and c) Antisymmetric, i.e. aRb Ã¢Ë § bRa ââ¡â a = b, then, R is a partial order on set S. Sorting is generally defined as an arrangement of a list of randomly input data by their key or themselves into a partial order R, where R implies âⰠ¤ particularly. Definition 2. For N elements a(1), a(2), , a(N) à ââ¬Å¾ S, sorting is a rearrangement of the elements in order to obtain a partial order a(si) R a(si+1) for âËâ¬si, 1 âⰠ¤ si a(s1) âⰠ¤ a(s2) âⰠ¤ , , âⰠ¤ a(si) âⰠ¤ , , âⰠ¤ a(sN) Importance of sorting in computation There are two direct applications of sorting: first as an aid for searching and second as a tool to match entries in files. Broad areas of application of sorting fall in the solution of many other more complex problems, from database systems, networking, MIS, operations research and optimization problems. Sorting algorithm is one of the most fundamental techniques in computer science because of the following reasons. First, it is the basis of many other algorithms such as searching, pattern matching, information retrieval, knowledge based systems, digital filters, database systems, data statistics and processing, data warehousing, and data communications [1]. Second, it plays an important role in the teaching of design and analysis of algorithms, programming methodology, data structures and programming. Furthermore, it is a very challenging problem which has been widely and thoroughly studied [19-24]; the performance is dramatically improved [25-30] and considered the lower-bound of complexity has been reached [19, 20, 29, 30]. It is estimated that over 25% of all computing time is spent on sorting with some installations spending more than 50% of their computing time in sorting files. Consequently, study of sorting algorithms has great importance in the field of computing. A good knack of comprehension of the theoretical intricacies involved in the design and analysis of the underlying sorting algorithm is very much expected of a person who needs to implement the algorithm in real life applications. A Need of Sorting Algorithm with Reduced Complexity Unfortunately, there is no any single sorting technique which may be called the best among the rest. Bubble sort, insertion sort, selection sort and exchange sort are applicable for input data of small to medium size whereas quick sort, merge sort and heap sort are applicable for an application expecting large to huge data size. These sorting algorithms are caparison based and hence can be no faster than O (n log n). There are a few algorithms claiming to run in linear time but for specialized case of input data. So, there is an urgent need of a new sorting algorithm which may be implemented for all input data and it may also beat the lower bound (O (n log n)) of the problem of sorting. This work is an effort in that direction. What is a sorting algorithm? Sorting is a process of arranging the available data items into an ordered sequence. A sorting algorithm is a set of steps arranged in a particular sequence that puts the available data items into a certain order. The well-known ordered sequences have been increasing order, decreasing order, non-increasing order, non-decreasing order and lexicographic order. An efficient sorting mechanism is important to optimizing the design of other algorithms that require sorted data items to work correctly. Well-known ordered sequences Let r1, r2, r3, â⬠¦ rn, be n number of input data items. Then any one of the following conditions must be satisfied for the input data items to be in a sorted sequence. Increasing order: For all 1 à ¯Ã¢â¬Å¡Ã £ i à ¯Ã¢â¬Å¡Ã £ n, ri à ¯Ã¢â ¬Ã ¼ ri+1. Decreasing order: For all 1 à ¯Ã¢â¬Å¡Ã £ i à ¯Ã¢â¬Å¡Ã £ n, ri à ¯Ã¢â ¬Ã ¾ ri+1. Non-decreasing order: For all 1 à ¯Ã¢â¬Å¡Ã £ i à ¯Ã¢â¬Å¡Ã £ n, ri à ¯Ã¢â¬Å¡Ã £ ri+1. Non-increasing order: For all 1 à ¯Ã¢â¬Å¡Ã £ i à ¯Ã¢â¬Å¡Ã £ n, ri à ¯Ã¢â¬Å¡Ã ³ ri+1. Lexicographic order: This is the order in which all the words of the English language are arranged in a dictionary. II. Background Work A. Basic Concepts Sorting [1] is a process of rearranging the available data items into an ordered sequence. An ordered sequence can be any one of the known ordered sequences: increasing order, decreasing order, non-increasing order, non-decreasing order or lexicographic order [2]. A sorting algorithm is a set of steps arranged in a particular sequence that puts the available data items into a certain order. An efficient sorting technique is important to optimize the design of other algorithms that would need sorted key items to work properly and efficiently. For an application, a sorting algorithm is selected according to its computational complexity and ease of implementation. For a typical sorting algorithm ideal behavior is O(n), good behavior is O(n logn) and bad behavior is O(nà ²) [1, 2]. The lower bound of time complexity of sorting algorithms using only comparison operation keys is O(n logn) [5, 6]. A sorting algorithm is easier to implement if its number of passes and the number of comparisons along with the actual number of swaps required to be performed can be easily predicted. Efficiency of the algorithm can be improved whenever it becomes possible to reduce the number of comparisons along with the actual number of swaps required to be performed. B. Classical Insertion Sort Algorithm This approach is based on the natural technique of sorting in day to day life by the human beings. Insertion sort is the simple sorting algorithm used in computation for the medium size data items or files. In the classical insertion sort approach the sorting of array elements is performed by inserting each element into its proper position in the previously sorted array. Insertion sort is considered to be faster than bubble sort and selection sort. It is very suitable algorithm for implementation using linked lists though its array implementation is more popular. C. The procedure The array is considered to be logically partitioned into two parts namely the first part and the second part. The first logical part has to be remained sorted always. Initially the first part is having only one element which is the first element of the input array and the second part comprises the rest of the input array. In the beginning, first part is automatically sorted because a single element is sorted by the definition of sorting. In each pass of the algorithm, the first element of the second part is separated from it before it is inserted into the first partââ¬â¢s proper position so that after its insertion the first part remains sorted. Before the start of the last pass of insertion sort, there is only one element remaining in the second part of the array, which is inserted into a proper position of the first part of the array and then the algorithm terminates. Shifting of elements may be required before we insert the current element in its sorted position. Shift operatio ns cost the most in array implementation of insertion sort. A formal description of Insertion sort algorithm InsertionSort (a, n) ââ¬Ëaââ¬â¢ is an array of size ââ¬Ënââ¬â¢ starting at position 1; elements of ââ¬Ëaââ¬â¢ will be sorted on termination. 1 for j ââ 2 to n do 2 key ââ a[j] 3 i ââ j 1 4 while i > 0 and a[i] > key do 5 a[i+1] ââ a[i] 6 i ââ i-1 7 a[i+1] ââ key Time complexity of Insertion sort is O(n2) and space complexity is O(n). Performance of insertion sort can be improved by quickly finding the location of an element and then by minimizing the number of shift operations required to be performed in its each iteration. Working of Insertion Sort algorithm Fig. 1 The operation of Insertion Sort on the array A = (14, 8, 20, 4, 6, 1) III. Other Sorting Techniques A. Merge Sort Merge sort is based on divide and conquer paradigm. The elements which are to be sorted are collected into an array. This array is divided into two sub arrays of almost equal sizes in top-down manner. Each one of the two sub arrays are again divided into their two constituent sub arrays of almost equal sizes respectively. This division process of the newly formed sub arrays will continue unless their size becomes unity. At size of unity, first of all, the sub array cannot be further divided and secondly the single element in the sub array is sorted, by the definition of sorting. After the last stage of division process, when all newly formed sub arrays are of unit size, the merging of the relevant unsorted sub arrays starts taking place in bottom-up manner with a view to form a sorted sub array (which was previously unsorted) for the next stage. The process of merging continues in the same manner unless the original array gets sorted. While division is a trivial job, the algorithm has to do the most critical job while merging the unsorted sub arrays into a sorted one. Time complexity of the Merge sort algorithm is ÃË(n logn) which is optimal. The major benefit of Merge sort is its stability and ease of implementation. The drawback associated with this algorithm is additional space requirement of ÃË(n) for the auxiliary array. B. Quick Sort Quick sort is also based on divide and conquer principle. Quick sort works by partitioning a given array A[p . . r] into two sub arrays A[p . . q] and A[q+1 . . r] such that every key in A[p . . q] is less than or equal to every key in A[q+1 . . r]. Then the two sub arrays are sorted through recursive calls to Quick sort. The exact position of the partition depends on the given array and index ââ¬Ëqââ¬â¢ is computed as a part of the partitioning process. The main advantages of Quick sort is that it only uses an auxiliary stack and requires only n logn time to sort n items. The drawback associated with this algorithm is that it requires quadratic (i.e. n2) times in worst case. In this case, the situation can be simply overlooked by mistake and hence may cause serious problems. IV. Rahmani Sort Algorithm A. The Concept In the classical insertion sort, we place the first element from the second logical sub array into a proper position of the previously sorted first logical sub array. But while finding the proper position of the element to be inserted, in the left sub array, a simple linear search approach is used which has a time complexity of O(n). Even this linear time complexity of searching the proper location of the element to be inserted may be quite considerable. That is why insertion sort is not a suitable sorting algorithm for sorting large number of elements. So, by improving the search procedure adopted in insertion sort algorithm, somehow or the other, the performance of insertion sort can be improved. The proposed new sorting algorithm called Rahmani sort algorithm is based on the new concept of inserting the first element of unsorted sub array into the sorted position of the sorted sub array. The classical Insertion sort takes O(n2) time. Rahmani sort algorithm is enhancement of Insertion sort by decreasing the time of finding the position of the new element in the sorted sub array. In the following sub section the differences between the Insertion sort and the Rahmani sort are being discussed. B. The Procedure The procedure of Rahmani sort for arranging the input array in ascending order is being describes as below: C. The Algorithm Rahmani Sort is comprising of two sub algorithms, one is RahmaniSort(a, n) and another one is iBinary(a, lower, upper, mid). Here, a = Array of key items to be sorted. n = Total number of elements in the array ââ¬Ëaââ¬â¢. lower = Lower index of the array ââ¬Ëaââ¬â¢. upper = Upper index of the array ââ¬Ëaââ¬â¢. mid = Middle index of the array ââ¬Ëaââ¬â¢. Algorithm for Rahmani Sort RahmaniSort(a, n) 1 for i ââ 2 to n do 2 temp ââ a[i] 3 j ââ iBinary(a, 0, i ââ¬â 1, temp) 4 while i > j do 5 a[i] ââ a[i ââ¬â 1] 6 i ââ i ââ¬â 1 7 a[j] ââ temp 8 return In the above algorithm, the element would be inserted in its proper position in the left sub array after shifting the rest of the array to the right side by one position. The iBinarySearch algorithm below above is used for finding the position of the largest element which is less than the key element stored in ââ¬Ëtempââ¬â¢. After finding this position, each element of the sub array from this position onwards will be shifted to the right by one position. The shifting will start from the right hand side. Algorithm for Improved Binary Search iBinary(a, lower, upper, temp) 1 flag ââ 0 2 loc ââ 0 3 mid ââ (lower + upper)/2 4 repeat while lower 5 mid ââ (lower + upper)/2 6 if mid = a[mid] then 7 loc ââ mid + 1 8 flag ââ 1 9 if mid 10 upper ââ mid ââ¬â 1 11 else 12 lower ââ mid + 1 13 if flag = 0 then 14 return lower 15 else 16 return loc à ¯Ã â⬠ºÃ ¯Ã¢â ¬Ã à ¯Ã à This paragraph of the first footnote will contain the date on which you submitted your paper for review. It will also contain support information, including sponsor and financial support acknowledgment. For example, ââ¬Å"This work was supported in part by the U.S. Departà ment of Comà merce under Grant BS123456â⬠. The next few paragraphs should contain the authorsââ¬â¢ current affiliations, including current address and e-mail. For example, F. A. Author is with the National Institute of Standards and Technology, Boulder, CO 80305 USA (e-mail: [emailprotected] boulder.nist.gov). S. B. Author, Jr., was with Rice University, Houston, TX 77005 USA. He is now with the Department of Physics, Colorado State University, Fort Collins, CO 80523 USA (e-mail: [emailprotected]).
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.