---------------------------------------------------------
The Liberty Basic Newsletter - Issue #41 - JUN 99
"Knowledge is a gift we receive from others."
		- Michael T. Rankin
---------------------------------------------------------
In this issue:

Searching Arrays - Part 1, by Ian Davies

Thanks, Ian for this great article -
and for all of your contributions to
the LB community!


In future issues:

Searching Arrays - Part 2, by Ian Davies
GUI Design
Menus
---------------------------------------------------------
Copyright (c) 1999, Ian Davies:
mailto:davies@ipc.kit.ac.jp


Searching arrays - Part 1

This edition of the Liberty BASIC newsletter will concentrate 
on how to efficiently search arrays in Liberty BASIC and 
introduces the concept of binary searching. The second part 
of the newsletter will focus on a more general example of 
binary searching. 

If the explanation within this newsletter is difficult to 
understand then feel free just to adapt the code to your 
own needs. The main point should be that you try and 
incorporate the binary search technique within your programmes.


Array Basics

As we all probably know, arrays are very useful tools to 
store lists of information. A simple example would be an 
address book. If we want to store the names of 20 friends i
n our programme then we can use an array in the following manner:

  ' First we should DIM the array to a value larger than we'll ever use
DIM name$(100) 
  ' Next we fill in the array with names of our friends  
name$(1)="John" 
name$(2)="Janet" 
.....
.....
.....
name$(20)="Susan"

Well, this isn't a very useful basis for an address book as 
it only contains the names of our friends. In fact, this 
example is the simplest type of array and called a 
"1 dimensional" array as each row, i.e., name$(1), 
name$(2), etc., can store only one piece of information - 
in this case the person's name.

It would be much more useful to have an array that can store 
several pieces of information for each row. For example, in 
an address book we might want to store the person's name, 
sex, birthday, telephone number, and address. In this case 
we need to store 5 pieces of information for each person so 
we need a 5 dimenional array as follows:

DIM address$(100,5)
address$(1,1)="John" : address$(1,2)="M" : address$(1,3)="June 4th"
address$(1,4)="(0123) 456789 : address$(1,5)="3 High Street, Anytown, Anyplace"

address$(2,1)="Janet" : address$(2,2)="F" : address$(2,3)="October 23rd"
address$(2,4)="(0987) 654321 : address$(2,5)="45 Main Road, Bigtown, Overthere"
.....
.....
.....
address$(20,1)="Susan" : address$(20,2)="F" : address$(20,3)="February 16th"
address$(20,4)="(0321) 987654 : address$(20,5)="87 River Avenue, Smallvillage, Nowhere"


Why do we want to search an array?

Suppose that we want to write a programme to store and 
the names and addresses of our friends. The programme would 
need a way to search through this array in order, for example, 
to check if a new name we want to input is already in our 
address book. An easy way to search through an array would 
be the following:

	' Number of rows occupied in the array
N=20     
	' The name of the person we wish to search for
Search$="Lisa" 
	' Uses a flag to check if we find a match
Flag=0   
	' Counts how many times the FOR ... NEXT loop is executed
Loops=0 
FOR x=1 TO N
			' Increments Loops by 1
		Loops=Loops+1
		IF  Search$=address$(x,1) THEN Flag=1
NEXT x
IF Flag=1 THEN PRINT "This person is already in the database" ELSE _
     PRINT "No match was found"

This simple programme will check all N names in our array 
(N=20 for our case) to search for a match. If there is a 
match then Flag=1 and we can use the value of the flag to 
print an appropriate message or jump to a subroutine. Note 
that in this case we were only interested in whether ANY 
duplicates existed. If we wanted to know HOW MANY duplicates 
existed then we could change "Flag=1" to "Flag=Flag+1". 

In the case that we are concerned only whether ANY duplicates 
exist, we can see that the above code is very inefficient. The 
reason being that we search through all N rows of the array 
whether or not a duplicate is found. In this case the value of 
Loops will always be N at the end of the programme. However, 
if we want to search efficiently (i.e., quickly) through an 
array then we need to find a way to make the final value of 
"Loops" as small as possible.

One simple method to improve the efficiency of our search is 
to exit the FOR ... NEXT loop as soon as we find a match:

FOR x=1 to N
	Loops=Loops+1
		' Note the extra instruction in the following line
	IF Search$=address$(x,1) THEN Flag=1 : GOTO [Match]
NEXT x
' The programme will only reach this point if there WAS NOT a match
PRINT "No match was found"
GOTO [No_match] 
	' The programme will only reach the following point if there WAS a match
[Match] 
PRINT "This person is already in the database"

This routine might look fine to start with but in fact it 
contains an example of bad programming that we should try 
to avoid. The reason is that we should never use a GOTO 
statement to jump out of a FOR .... NEXT loop. Instead we 
should use something like the following:

FOR x=1 to N
		Loops=Loops+1
			' Note the change of x=N in the following
		IF Search$=address$(x,1) THEN Flag=1 : x=N
NEXT x
IF Flag=1 then PRINT "This person is already in the database" ELSE _
     PRINT "No match was found"

We can see that, by setting x=N when a match exists, the 
programme will exit the FOR .... NEXT loop at the next NEXT 
statement. Generally it is always advisable to exit a FOR ... 
NEXT loop by setting a flag variable and making x=N (or whatever 
is appropriate for your case).

In our very first example we saw that Loops=N when we exited 
the FOR ... NEXT loop. How about this new case? Well, sometimes 
we will find a match quickly (when x is small), sometimes we'll 
find a match near the end of searching (when x is nearly equal 
to N) and other times we will not find any match (in which case 
x will be equal to N). The answer is that, on average, if the 
array contains one match then Loops=N/2. Indeed, if the array 
contains m matches then typically Loops=N/(m+1).

If we assume there to be a single match then we have changed 
Loops from N in our first example to N/2 in the present 
example. This shows that we can decrease searching time 
by 50% with hardly any programming effort.

Well, if your arrays only have 10 or 20 rows then you could 
stop at this point and be satisfied with a 50% decrease in 
searching time. However, what if our address programme 
contains details for everyone in our street or town or 
country? What if, instead of N being 10 or 20, we have 
N equal to 1,000 or 1,000,000?

We can imagine that even a 50% decrease is searching time 
is not going to help us much if we still have to search 
through 500,000 rows. On a normal PC this might take 30 
minutes or so. What if I said that a method exists to 
decrease our searching time by 99.998%, i.e., make the 
searching speed 50,000 times quicker?

I thought that you would be impressed.


Binary searching

Binary searching is a very efficient method to search ordered 
arrays. We can present the idea of binary searching through 
a simple example as follows.

Imagine that you have a ruler whose length is 32 centimetres 
instead of the normal 30 centimetres. Now think of a number 
between 1 and 32 ..... say 11.

Put your finger at the halfway (i.e., 16 cm) mark.
 
Is the number we want smaller or larger than where we are now? 
The answer is "smaller" as 11<16. Now move your finger 1/2*1/2 
of the ruler (i.e., 1/4*32=8) towards the 0 cm mark. Your finger 
should now be on the 8 cm mark of the ruler.

Is the number that we want smaller or larger than where we 
are now? The answer is "larger" as 11>8. Now move your finger 
1/2*1/2*1/2 of the ruler (i.e., 1/8 *32=4) towards the 
32 cm mark. Your finger should now be on the 12 cm mark.

Is the number we want smaller or larger than where we are 
now? The answer is "smaller" as 11<12. Now move your finger 
1/2*1/2*1/2*1/2 of the ruler (i.e., 1/16*32=2) towards the 
0 cm mark. Your finger should now be on the 10 cm mark.

Is the number we want smaller or larger than where we are 
now? The answer is "larger" as 11>10. Now move your finger 
1/2*1/2*1/2*1/2*1/2 of the ruler (i.e., 1/32*32=1) towards 
the 32 cm mark. Your finger should now be on the 11 cm mark.

You can see that we "zoomed" into our search value, i.e., 
11, using only 4 steps (or 5 if you include putting our 
finger at the midpoint). Each step would be equivalent to 
one loop in our earlier code. We can also see that the 
amount we moved quickly became smaller ... 8 ... 4 ... 2 ... 1.
Anyone familiar with computers should recognise these 
numbers to be the first four of the series 2^x where x 
is referred to as the number of bits. The series is 
1 ... 2 ... 4 ... 8 ... 16 ... 32 ... 64 ... 128 ... 256 ... 
1024 ... etc. For example, an 8 bit display can show 
2^8 or 2x2x2x2x2x2x2x2 or 256 colours. 

Well, what value of x do we need for the case of our ruler, 
i.e., 2^x=32. The answer is x=5 and not by coincidence was 
this was the number of steps we needed to find the 11 
centimetre mark. In fact, if we want to binary search an 
array with N rows then the number of steps we need is 
given by 2^x=N. In fact, we'll see in the second part of 
this newsletter that this isn't always true but for the 
present examples we can assume it to be true.

What we are saying is that, for an array of size N, the 
value of "Loops" when our programme ends will be given by 
2^Loops=N or we can write it another way as Loops=log(N)/log(2). 
Our first searching technique gave us Loops=N, then Loops=N/2, 
and now Loops=log(N)/log(2).
 
Now let's calaculate the value of "Loops" for arrays with 
different sizes, i.e., different values of N.

N=32
Method 1 : Loops=32
Method 2 : Loops=16
Binary method: Loops=4

N=1024
Method 1: Loops=1024
Method 2: Loops=512
Method 3: Loops=10

N=1,048,576
Method 1: Loops=1,048,576
Method 2: Loops=524,288
Method 3: Loops=20

If we assume that it takes about the same time to perform 
one loop for each searching method then we can say that an 
array with 1,048,576 rows can be searched more than 26,000 
times quicker using the binary search technique. Even for a 
small array with 32 rows this technique is at least 4 times 
quicker.

By this point you might be asking the following questions:

(i) How can searching for numbers on a ruler be applied to 
a real array?

(ii) How can the binary method be applied to my array when it 
isn't 2^x rows in size?

(iii) What happens if I binary search an array where there 
aren't any matches?


In this newsletter we'll only consider the first question. 
The other questions will be answered in part two of the 
newsletter.

To answer the first question, imagine that our ruler is an 
array and each centimetre mark is one row such that we have 
ruler(1), ruler(2), ruler(3), ruler(4) ...... ruler(32). 
The actual values of each row are not important for the 
binary search; only that the array should be sorted with 
ruler(1) the smallest and ruler(32) the largest. This is 
important ....

Only sorted arrays can be searched using the binary 
method.

The SORT command in Liberty Basic will correctly and quickly 
sort arrays of any dimension.

Well, enough of this talking. Here is some code that shows 
the result of binary searching an array with 8192 (i.e., 2^13) 
rows. The array is called "address" and to make it easy, the 
array values are simply address(1)=1, address(2)=2, address(3)=3 
.... address(8192)=8192. However, they could have been filled 
with names, dates, colours, country names, etc., and it wouldn't 
make any difference. Of course, you'd have to use address$ 
instead of address if the array contained text.

When you run the programme it will take a few seconds before 
anything happens. This is due to the filling of the array with 
its 16,384 values together with the SORT command (which was 
not strictly necessary in this case as the array was already 
sorted when created). A random number between 1 and 16,384 is 
then picked and this number is searched for within the array. 
Results for each search will be printed meaning that 20 
searches have been performed between the time the first 
text appears and the time the text stops moving up the screen. 

In fact, searching through an array with 268 million rows 
(i.e., 16384^2) would take only twice as long to complete 
but actually creating the array prior to searching would 
take forever even if the computer had sufficient memory. 
If you want to be brave then try increasing the array size 
to N=32768 or N=65536. You will see that the search time 
itself is only slightly increased.


    ' Simple programme to show binary search method
    ' Ian Davies
    '
    ' First we should DIM the array to a value
    ' larger than we'll ever use
DIM address(70000)
    ' Actual size of the array we use
N=16384
    ' How many times to perform the search
Cycles=20


    GOSUB [Setup_Array]
FOR y=1 to Cycles
    GOSUB [Compare]
    GOSUB [Results]
NEXT y
    END


[Setup_Array]
        ' Create an simple array filled with
        ' integer numbers
    FOR x=1 to N
        address(x)=x
    NEXT x
        ' In this case the array is already sorted
        ' but you might need to SORT for your array
    SORT address(,1,N
RETURN


[Compare]
        ' Picks a number between 1 and N
        ' to search for
    SearchNumber=INT(RND(1)*N)+1
        ' Sets initial pointer to N/2
        ' i.e., the midpoint of the array
    CurrentPointer=N/2
        ' Sets the first movement of the pointer to N/4
    CurrentMove=N/4
        ' Sets the number of loops to zero
    Loops=0
        ' The important part starts here
    WHILE CurrentMove>0.25
        OldPointer=CurrentPointer
        ' Moves the pointer up or down the array
        IF SearchNumber>address(INT(CurrentPointer)) THEN _
            CurrentPointer=CurrentPointer+CurrentMove ELSE _
            CurrentPointer=CurrentPointer-CurrentMove
        ' Decreases the pointer movement value by 1/2
        CurrentMove=CurrentMove/2
        ' Increments the Loops variable
        Loops=Loops+1
    WEND
RETURN


[Results]
IF SearchNumber=address(INT(OldPointer)) THEN _
    PRINT "Value searched for ";SearchNumber: _
    PRINT "Value found        ";address(INT(OldPointer)) _
    ELSE _
    PRINT "Value searched for ";SearchNumber: _
    PRINT "Value found        ";address(INT(OldPointer+1))
    PRINT "Number of loops    ";Loops
    ' Checks that we really found the value
IF SearchNumber<>address(INT(OldPointer)) AND _
    SearchNumber<>address(INT(OldPointer+1)) THEN _
    PRINT "ERROR ERROR ERROR ERROR ERROR ERROR ERROR"
PRINT : PRINT
RETURN


If you want to understand exactly what is happening then try 
using N=4 or N=8 and follow the programme through with the 
debugger.

One result from this programme is that the value we are 
searching for, "SearchNumber", will be either 
address(OldPointer) or address(OldPointer+1). We have 
set up the array so that a match definitely exists which 
means that if SearchNumber is not equal to address(OldPointer) 
or address(OldPointer+1) then something went wrong and an 
error message will be printed. It would be possible to 
create a binary search that definitely picks the correct 
array row. However, it isn't that bad if we search through 
16384 rows and end up with two possible matches as it's 
only an extra line of code to decide which of these 
possibilities is the correct one. 

Unlike the non-binary search methods, we didn't check for 
a match each time that we went through the WHILE ..... WEND 
loop. You'll see that we only checked for a match when the 
"WHILE CurrentMove>0.25" became false, meaning that we had 
already zoomed into the likely target. The reason is that 
we are dealing with large arrays so it's unlikely that 
we'll accidently land on the correct array row until that 
condition is met. For example, if it takes us 20 loops to 
search through an array with 1 million rows then the chance 
of us accidently finding the correct row until the very end 
is extremely unlikely.

The most important line within the programme is:

WHILE CurrentMove>0.25

as this decides at what point we believe we are close enough 
to the target to stop searching.
 
You can try changing this value to something larger such as 
0.5 but then you'll notice that some of the results show an 
error as the correct array row wasn't found. So, at least for 
the way this programme is written, "WHILE CurrentMove>0.25" is 
the correct condition that decides when to stop searching.


Summary

In summary, we considered several different methods to search 
arrays. If we deal with large arrays then the binary method is
hundreds or thousand times quicker than other methods. The only 
constraint for binary searches is that the array must be sorted. 
The example of binary searching we presented showed that an array 
with 16,384 rows can be searched very quickly. In the next part 
of this newsletter we will look at a more general case of binary 
searching of arrays. 
---------------------------------------------------------
 Newsletter compiled and edited by: Brosco and Alyce.
 Comments, requests or corrections: Hit 'REPLY' now!
            mailto:brosco@orac.net.au
                       or
           mailto:awatson@mail.wctc.net
---------------------------------------------------------

