Monday 7 April 2014

Constant Time Access, End of the Course!!!

I will first talk a little bit about constant time access. I found it interesting how Python lists have constant time access by index, and dictionaries have (on average) constant time access by key, while binary search trees take log(n) time to insert and retrieve elements. At first, I didn't understand why this was the case, but after reviewing the lecture notes again, I now understand.

The hash function reminds me a lot of what we are learning in linear algebra. A hash function is similar to a linear transformation: it takes in a vector and converts it to a new vector in another vector space. I also found it interesting that, since there are far more possible keys than there are slots to store them in, a possible solution is to drop the injectivity requirement. To me, this seems like a slightly messy solution, since it means two different keys can end up in the same slot (a collision), and the table has to be prepared to handle that.
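
To make the collision idea concrete, here is a minimal sketch of a hash table that handles collisions by 'chaining': each bucket holds a little list of key-value pairs, so two keys that land in the same bucket can coexist. The class name and all the details are my own invention, not the implementation from lecture (and not how Python's dict actually works under the hood):

class ChainingHashTable:
    """A toy hash table that resolves collisions by chaining.
    (My own sketch, not the lecture's implementation.)
    """

    def __init__(self, num_buckets: int=8) -> None:
        # each bucket is a list of (key, value) pairs
        self.buckets = [[] for _ in range(num_buckets)]

    def put(self, key, value) -> None:
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))       # new key: may share a bucket

    def get(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)

table = ChainingHashTable()
table.put('apple', 1)
table.put('banana', 2)
print(table.get('banana'))  # 2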

Well, that's it! We have officially finished learning all the material for CSC148! I have really enjoyed this course because it was much more challenging than CSC108, and I am always up for a challenge! I am not worried about the exam because I already understand most of the concepts in this course, and I haven't even started studying yet! When I do start studying, I plan to simply do many of the past exams for the course.

Here is my quick review of the course:

Assignments: Challenging, but fair.
Exercises: Easy.
Midterm 1: Fair.
Midterm 2: A bit too challenging, though I must admit I made some careless mistakes.
SLOGs: Not a fan. I feel like they don't have much to do with computer science.
Labs: Very useful. Provided me with a better understanding of the course.

Overall...I loved this course!

Sunday 30 March 2014

Sorting and Efficiency

The types of sorts that we have talked about this semester are quick sort, selection sort, insertion sort, and merge sort. First, however, I will talk about efficiency and Big-Oh.

The performance of a function can be described using Big-Oh notation. If the worst-case runtime of f(L), where L is a list of n elements, is in O(n), that means there is a constant 'c' such that f(L) never performs more than cn steps once n gets large enough. In general, if the runtime of a function is f(n) and f(n) is in O(g(n)), then f(n) is at most cg(n) (for some constant 'c') for all sufficiently large n.
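
For a concrete example (my own, not from lecture): if f(n) = 3n + 5, then f(n) is in O(n), because picking c = 4 gives 3n + 5 <= 4n for every n >= 5. The constants and the behaviour at small n don't matter; only the growth rate does.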

Now I will talk about the different types of sorts.

Quick sort is a fairly simple type of sort. It has a base case: if the list contains 0 or 1 elements, it is simply returned, since it is already sorted. Otherwise, a pivot is chosen and removed from the list, and 2 separate lists are created: one containing all the elements less than the pivot (let's call this 'left'), and the other containing all the elements greater than or equal to the pivot (let's call this 'right'). The function then returns the recursive call on 'left', plus the pivot (in a one-element list), plus the recursive call on 'right'. Here is an implementation of quick sort:

def quick(L):
    """Produce copy of L in sorted order by quick sort.
    >>> quick([3, 1, 2])
    [1, 2, 3]
    """
    if len(L) > 1:
        pivot = L[0]  # there are much better ways of choosing the pivot!
        smaller_than_pivot = [x for x in L[1:] if x < pivot]
        larger_than_pivot = [x for x in L[1:] if x >= pivot]
        return quick(smaller_than_pivot) + [pivot] + quick(larger_than_pivot)
    else:
        return L

The way the pivot is chosen is fairly arbitrary. In this example, the pivot is simply the first element of the list; however, it can be chosen in many other ways, such as by choosing the middle element or a random element. The average runtime of quick sort is O(nlogn) because a (good) pivot splits the problem roughly in half, hence the 'logn' part, and every level of splitting compares every element to a pivot, hence the 'n' part.
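
For instance, here is a sketch of the same sort using the middle element as the pivot (the function name and details are my own; a random pivot would work similarly):

def quick_middle(L):
    """Quick sort using the middle element as the pivot (my own variant)."""
    if len(L) <= 1:
        return L
    mid = len(L) // 2
    pivot = L[mid]
    rest = L[:mid] + L[mid + 1:]  # everything except the pivot
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quick_middle(smaller) + [pivot] + quick_middle(larger)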

Selection sort involves searching through every element: for the current index 'i', find the minimum of L[i:], and if it is less than the element at the current index, swap the two elements. Here is an implementation of the sort:

def select(L):
    """Produce copy of L in sorted order by selection sort.
    >>> select([3, 1, 2])
    [1, 2, 3]
    """
    L1 = L[:]
    for i in range(len(L1)):
        m = min(L1[i:])        # smallest remaining value
        j = L1.index(m, i)     # its position, searching from index i onward
        L1[i], L1[j] = L1[j], L1[i]
    return L1

The average runtime of this sort is O(n^2): it takes about n comparisons to find the minimum of a list of length n, and this happens once for every index of the list.
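
To be a bit more precise (my own arithmetic): finding the minimum of L[i:] takes about n - i comparisons, so the total is (n - 1) + (n - 2) + ... + 1 = n(n - 1)/2 comparisons, which is in O(n^2).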

Insertion sort involves going through every element in the list and 'inserting' each one into its correct position among the already-sorted elements before the current index. Here is an implementation of the sort:

def insertion_sort(s):
    """Sort list s in place, in non-decreasing order, by insertion sort.
    Note: this modifies s and returns None.
    """
    for i in range(1, len(s)):
        val = s[i]
        j = i - 1
        # shift larger elements one position right to make room for val
        while (j >= 0) and (s[j] > val):
            s[j+1] = s[j]
            j = j - 1
        s[j+1] = val
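
Since this version works in place and returns None (unlike quick and select above), here is a quick usage sketch (my own):

lst = [3, 1, 2]
insertion_sort(lst)  # modifies lst in place; returns None
print(lst)           # [1, 2, 3]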



The average runtime of this sort is also O(n^2) because it iterates through the list n times, and each iteration makes at most n comparisons to 'insert' the element.

Finally, merge sort involves dividing the list in half, merge sorting the 2 halves, and then 'merging' them back together. Here is an implementation of the sort:


def merge(L1: list, L2: list) -> list:
    """Return the merge of sorted lists L1 and L2.
    NOTE: modifies L1 and L2
    >>> merge([1, 3, 5], [2, 4, 6])
    [1, 2, 3, 4, 5, 6]
    >>> merge([1, 2, 3], [0, 4, 5])
    [0, 1, 2, 3, 4, 5]
    >>> merge([0], [1, 2, 3, 4])
    [0, 1, 2, 3, 4]
    """
    L = []
    L1.reverse()
    L2.reverse()
    # repeatedly take the smaller of the two front elements
    while L1 and L2:
        if L1[-1] < L2[-1]:
            L.append(L1.pop())
        else:
            L.append(L2.pop())
    L1.reverse()
    L2.reverse()
    return L + L1 + L2


def merge_sort(L):
    """Produce copy of L in non-decreasing order.
    >>> merge_sort([1, 5, 3, 4, 2])
    [1, 2, 3, 4, 5]
    """
    if len(L) < 2:
        return L
    else:
        left_sublist = L[:len(L) // 2]
        right_sublist = L[len(L) // 2:]
        return merge(merge_sort(left_sublist),
                     merge_sort(right_sublist))

As you can see, a helper function is used to simplify the problem. The helper function takes two sorted lists and 'merges' them together to create one sorted list. The runtime of merge sort is O(nlogn). Like quick sort, the merge_sort function has a base case: if the list contains 0 or 1 elements, it simply returns the list, since it is already sorted. Otherwise, the problem is continually cut in half, hence the 'logn' part, and every time the problem is cut, the two halves are 'merged', which requires about n steps, hence the 'n' part. Finally, although merge sort and quick sort are both in O(nlogn) on average, merge sort keeps that bound even in the worst case, while quick sort can degrade to O(n^2) if the pivots are chosen badly.
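
Another way to see the O(nlogn) bound (my own summary of the standard argument): the runtime satisfies the recurrence T(n) = 2T(n/2) + cn, since sorting n elements means sorting two halves plus about cn steps of merging. Unrolling it gives log(n) levels, each costing about cn, so T(n) is in O(nlogn).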

Sunday 23 March 2014

Sorting, Midterm 2!

During the lecture this week, I had a little trouble understanding how quick sort worked and why its performance was O(nlog(n)). Specifically, I was confused as to how a pivot was chosen, and what the pivot was for. So after the lecture I decided to do some research of my own, and I finally understood how it works. Basically, the choice of pivot is essentially arbitrary, but usually it is either chosen randomly or the element at the middle index is used. The list is then divided into 2 lists: one containing the elements less than the pivot and one containing the elements greater than the pivot. The function is then called recursively on those 2 lists.

The reason the average performance of quick sort is O(nlog(n)) is that each pivot splits the problem roughly in half, so there are about log(n) levels of splitting. And every level requires about n steps (since every element must be compared to some pivot), therefore n * log(n) steps, which is O(nlog(n)).

I find merge sort to be a very interesting way to sort a list. The idea behind it is to, as stated in lecture, 'divide the list in half, (merge) sort the halves, then merge the sorted results'. After reading over some online explanations and implementations of merge sort, I have become fairly comfortable with merge sort and I am confident I can properly implement it myself.

I am a bit nervous about the second midterm this Wednesday, mainly because there was only one practice midterm posted, and it contained very little about what will be on this upcoming midterm. Therefore it is hard to predict what kinds of questions will appear. Something I don't understand is why our midterms and final exams are written on paper, since this is a course that is done solely on computers. Maybe it's a logistical issue with the university, but I feel like this should change soon, since in real life we would be able to test our code on a computer before being certain it works, unlike on our midterms.

I agree with what is written on this blog -  http://katherine-karin.blogspot.ca - when it says that solutions to the lab exercises should be posted. Some of the labs are very challenging and it would be nice to be able to see a correct solution to study off of.

Sunday 16 March 2014

Binary Search Trees, Searching, Sorting, and Big-Oh

This week was mostly review for me. This is because searching and sorting were review from high school AND from CSC108, and big-Oh notation was review from CSC165.

I found the problem of deleting a node from a binary search tree to be both difficult and interesting. After thinking about it for a while and actually drawing out some trees on paper, I realized that it wasn't a very difficult problem, and I went ahead and created a delete method for the binary search tree class. The idea is simple: to delete a node with 2 subtrees, replace its value with the largest value in the left subtree (which isn't always a leaf, since that node can still have a left child of its own). Another method would be to replace it with the smallest value in the right subtree. I chose the first way.
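
Here is a sketch of the idea, written as a standalone function over a hypothetical node class with 'value', 'left' and 'right' attributes (the names are mine, not necessarily the ones from our class code):

def delete(node, value):
    """Return the root of the subtree rooted at node, with value deleted.
    A sketch only: assumes nodes have .value, .left and .right attributes.
    """
    if node is None:                 # value not found, nothing to do
        return None
    if value < node.value:
        node.left = delete(node.left, value)
    elif value > node.value:
        node.right = delete(node.right, value)
    else:
        # found it; with at most one child, just promote that child
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # two children: copy in the largest value from the left subtree,
        # then delete that value from the left subtree
        largest = node.left
        while largest.right is not None:
            largest = largest.right
        node.value = largest.value
        node.left = delete(node.left, largest.value)
    return node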

Searching and sorting were fairly easy to understand because, like I said before, it was all review. One type of sort we didn't talk about was Merge Sort, which I did learn in high school, although I forget how to implement it. I remember Merge Sort being one of the most efficient sorting methods, so I am excited to learn about it.

Big-Oh was very easy to understand since it was covered in CSC165, which I took last semester. Everything we learned about Big-Oh this week was very basic to me because in CSC165 I learned about it much more in depth, including its formal symbolic definition and proofs involving Big-Oh. I wonder if we are going to learn about Big-Omega and Big-Theta as well.

I like how this blog http://csc148sloguoft.blogspot.ca describes the overlap of material between CSC148 and CSC165. This person talks about how this course "focused on application and many examples concerning sorting algorithms were shown. The other course goes deeper into the mathematical theory." I thought that this was a very accurate way of describing how Big-Oh was taught in the two courses.

Sunday 9 March 2014

Linked Lists, Assignment 1

This week we talked A LOT about linked lists. The two types of linked lists we talked about were: lists made up of an item and the remaining list, and lists as objects with a value and a reference to other similar objects.

I prefer the type made up of an item and the remaining list because I find it a lot simpler and easier to understand. With the wrapper class, you have to create some helper functions, which I find can be a tad confusing. Something I currently don't understand is when one type of linked list is preferable to the other (or if it never really matters).
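
To keep the two flavours straight for myself, here is a minimal sketch of each, with a sum function for both (all the names and details are my own):

class LLNode:
    """Linked list as objects: a value plus a reference to the next node."""
    def __init__(self, value, rest=None):
        self.value = value
        self.rest = rest

def node_sum(node):
    """Sum the values in a node-based linked list."""
    return 0 if node is None else node.value + node_sum(node.rest)

def pair_sum(lst):
    """Sum a list built as (item, rest_of_list) pairs ending in None."""
    return 0 if lst is None else lst[0] + pair_sum(lst[1])

print(node_sum(LLNode(1, LLNode(2, LLNode(3)))))  # 6
print(pair_sum((1, (2, (3, None)))))              # 6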

Right now, I am a little worried about the types of problems about linked lists that will appear on the second midterm and the exam and whether or not I will be able to answer them correctly. I definitely plan to practice linked list problems more to get more comfortable with the topic. I also hope that the lab this week will further my understanding of the topic.

Assignment 1 has been going pretty well so far. At first, I had a bit of trouble understanding what the purpose of Part 1 was, but after getting helpful information from my peers on Piazza, I understood what the assignment was asking me to do and had no problem finishing Part 1. After reading the blog at http://dlam15.blogspot.ca, I can see that I am not the only one who had this problem! The most confusing aspect of the assignment was how the regex class differs from the normal tree class that we discussed in class, but now that's all cleared up! I have even finished 2 of the functions for Part 2! However, since Part 2 isn't due for a while, I think I'll start working on Exercise 3 now.

Friday 28 February 2014

Recursion

Recursion is a very interesting topic. It can be very useful in many cases, but it can also be very difficult to understand.

The idea of recursion sounds pretty simple: find a base case, and use recursion to handle all other cases (general cases). When I say 'use recursion', what I mean is that you actually call the function inside the function itself! A classic example is finding the sum of all the elements in a nested list. To solve this problem, you would start by creating a 'sum' variable, and then creating a for-loop to iterate through every element of the list. The 'base case' for this problem is when an element is an integer; in that case, you simply add the element to the sum. Otherwise, you call the function again, passing the current element (itself a list) as the argument, and add the result to the sum variable. Finally, you return the sum variable, and you're done! Here is some code to demonstrate:

def nested_sum(L: list) -> int:
    """ Return the sum of the nested list L.
    Precondition: L is a nested list of integers
    """

    list_sum = 0 # create sum variable

    for element in L:
        # base case: when element is an integer
        if isinstance(element, int):
            list_sum += element
        # general case
        elif isinstance(element, list):
            # call the function using 'element', which is a list, as the parameter
            # this is the key part of recursion
            list_sum += nested_sum(element)

    return list_sum

Something to notice about recursion is that by calling the function with one of the elements of the list, you are making the problem smaller and smaller until you reach the base case.

This function is actually significantly longer than it needs to be. I showed you this version simply to thoroughly explain what recursion is and how it works. However, the whole body of this function can be written in only one line of code! This can be done using a list comprehension and Python's built-in function 'sum', which returns the sum of a non-nested list. Here is what that would look like:

def nested_sum(L: list) -> int:
    """ Return the sum of the nested list L.
    Precondition: L is a nested list of integers
    """

    return sum([nested_sum(x) if isinstance(x, list) else x for x in L])

Now let me explain this. The first thing you should notice is that I am returning the sum of some list. This list is the original input list, but with any nested lists replaced by their sums (e.g. [1, [2, 3], 4] becomes [1, 5, 4]). I create this list using a list comprehension: if an element is a list, then use the nested_sum of that list; otherwise, simply use the element itself. Once I have created this list, I can simply use Python's built-in 'sum' function to find the overall sum and finish the problem.

Recursion can often be a tricky concept, and I have found that the best way to learn it is to try problems out yourself.

Another note I want to add is that not all problems have only one base case. For example, the Fibonacci sequence, 0, 1, 1, 2, 3, 5, 8, 13, 21..., is a sequence where the nth term is the sum of the 2 terms before it; however, the first 2 terms are defined to be 0 and 1. Therefore, if you wanted to create a function that returns the nth value of this sequence, you would need a recursive function with 2 base cases. Here is what that would look like:

def fib_seq(n: int) -> int:
    """Return the n'th term of the Fibonacci Sequence."""

    # Base case 1: if n == 0
    if n == 0:
        return 0
    # Base case 2: if n == 1
    elif n == 1:
        return 1
    # If n > 1, return the sum of the previous 2 terms (general case)
    else:
        return fib_seq(n-1) + fib_seq(n-2)

This may look easy right now; however, I must warn you that reading and understanding a recursive function is MUCH different from writing one yourself to solve a problem. Finding the base case(s) is usually the easy part (for me, at least); handling the general case can get pretty difficult. Like I said before, practice is the best way to learn recursion!

Saturday 15 February 2014

Binary Trees, Term Test!!!

This week we learned about binary trees. I found the concept of binary trees fascinating and not very difficult. I understand all the different functions involved in binary trees, and I am confident that I can implement one by myself (I have yet to try it, but plan to during reading week). One thing that I don't understand is what the purpose of binary trees is. I fail to see any practical use of a binary tree.

I am not too stressed about the term test for this course. This is mainly because I learned a good amount of the course material in high school, and I have understood all the material that was new to me. One thing I am a bit worried about for the midterm is the recursion questions. This is because, although I can now do most recursion problems without any trouble on my computer, it will be difficult to do them on paper without a computer to verify my results! To check any recursive function I write on the test, I will have to trace it manually, and with recursive functions, that can take a while! That's why I plan on studying recursion by redoing all the problems I have tried on paper, and then testing them on my computer to see if I was right.

After reading the Week 6 post on http://whatsgabbening.blogspot.ca, I see that other people felt the same way about this week's lab, in the sense that it was fairly easy (I also left early). However, I disagree about having to understand the use of 'filter'. As long as you understood the purpose of the function, there was no need to understand what 'filter' does. During the lab, we had to rewrite a bunch of code without using list comprehensions. I like this style of coding better, because it makes it much clearer what the code is actually doing.