The Definitive Guide to Algorithms: A Data Science Guide

January 26, 2020

If you have any interest in computer engineering, especially as it applies to data science, you’ve likely heard the word “algorithm” tossed around. Even people unfamiliar with much of computing have probably heard the idea used in reference to Google’s search algorithm. You may be wondering, though, what the concept means and why it matters. Let’s take a closer look at what algorithms are and how they’re put to use.


What is an Algorithm?

Every solution to every problem imaginable has some sort of description attached to it. If you’re baking cookies, for example, there’s a series of steps you have to follow. The ingredient list alone isn’t enough, because you can’t just throw a bunch of ingredients together and get cookies. You have to combine the wet and dry ingredients separately, then bring everything together and bake it. The same notion applies to any problem in computer science: the steps have to be enumerated and then followed.

Many challenges in the coding world lend themselves to a highly structured approach. This is where the algorithm comes into the picture. Let’s say you want to compare a set of projections in a financial application. A common approach is to run what’s called a Monte Carlo simulation. While a Monte Carlo simulation is designed to deal with unpredictable variables, the process of coding one is highly structured.

In the Monte Carlo simulation, the algorithm represents the steps needed to run the simulation. This includes:

Generating static points and random pairs to chart
Inducing noise in the system
Calculating distances between points
Sorting the distances
Recording data
Repeating the process for the desired number of iterations
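The steps above are generic. The following minimal sketch maps them onto a simple financial projection. It assumes, hypothetically, that annual returns are normally distributed with a fixed mean and volatility. The function names and parameters are illustrative and are not taken from any particular library. Noise comes from `random.Random.gauss`. Each path is calculated by compounding, and the recorded outcomes are then sorted so that percentiles can be read off directly.

```python
import random
import statistics

def simulate_projection(start_value, years, mean_return, volatility, rng):
    """Run one simulated path: compound the value with a noisy annual return."""
    value = start_value
    for _ in range(years):
        # Induce noise: each year's return is drawn from a normal distribution
        # (a simplifying assumption for illustration).
        value *= 1 + rng.gauss(mean_return, volatility)
    return value

def monte_carlo(start_value, years, mean_return, volatility, iterations, seed=42):
    """Repeat the simulation, record every outcome, and summarize the spread."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    outcomes = [
        simulate_projection(start_value, years, mean_return, volatility, rng)
        for _ in range(iterations)
    ]
    outcomes.sort()  # sorting lets us read percentiles by index
    return {
        "p10": outcomes[int(0.10 * iterations)],
        "median": statistics.median(outcomes),
        "p90": outcomes[int(0.90 * iterations)],
    }

if __name__ == "__main__":
    # Hypothetical inputs: $10,000 over 10 years, 5% mean return, 15% volatility.
    print(monte_carlo(10_000, 10, 0.05, 0.15, 10_000))
```

To compare projections, you would call `monte_carlo` once per scenario, for example with different volatility assumptions, and compare the resulting percentile bands.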

Programmers often discuss algorithms in terms of pseudocode. Rather than focusing on the exact code in a particular programming language, a programmer focuses on the broad structure of the solution. This lets them reuse the algorithm across many programming languages. Some languages are better suited to certain purposes than others (you probably wouldn’t choose PHP for a mission-critical financial application, while Python and Java are strong choices), but a well-described algorithm should be applicable in virtually any environment.

What Can Algorithms Do?

Ideally, a good algorithm takes a task and boils it down to its simplest form without leaving anything out. Every extra moving part in code consumes processor cycles, and at scale those cycles add up. Eliminating even a little redundant work can yield dramatic savings when code runs billions of times.
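Here is a small illustration of how extra moving parts cost cycles. Both hypothetical functions below answer the same question: does a list contain a duplicate? The first compares every pair of items. The second remembers what it has already seen in a set. The set version assumes the items are hashable.

```python
def has_duplicates_nested(items):
    """Compare every pair of items: roughly n * (n - 1) / 2 comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_set(items):
    """Single pass: each membership check is an average O(1) set lookup."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

For a list of 10,000 distinct items, the nested version performs 49,995,000 comparisons. The set version does about 10,000 lookups and inserts to reach the same answer.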

For example, Google has been estimated to use about 15 exabytes (15,000,000,000,000,000,000 bytes) of storage, and it serves billions of queries. Economizing one line of code might shave less than a millisecond off a single search. Repeat that saving across billions of queries, though, and it adds up to an enormous amount of time that people get back in their lives because the search system works better. It also reduces the time that servers, storage systems and networks spend running, which cuts wear and tear on equipment, electrical costs and even greenhouse gas emissions.
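To see how small savings compound, here is a back-of-the-envelope calculation. The inputs are assumptions chosen for illustration, not Google figures: 0.5 ms saved per query and 5 billion queries per day.

$$
T_{\text{saved}} = \Delta t \times Q = \left(0.5 \times 10^{-3}\ \text{s}\right) \times \left(5 \times 10^{9}\ \text{queries/day}\right) = 2.5 \times 10^{6}\ \text{s/day} \approx 694\ \text{hours/day}
$$

Over a year, that is roughly 253,000 hours under these assumptions. This is from a change that no single user would ever notice.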

Quality matters just as much as efficiency. Ask Jeeves (now Ask.com) has largely been relegated to the dustbin of history because the company couldn’t compete in web search, largely due to low-quality results. Algorithms often decide the winners and losers in the tech world.

Even when two organizations build on the same well-known idea, such as the Markov chains behind Google search, having the better algorithm means a lot. The PageRank algorithm at the foundation of Google search improves on pre-existing knowledge of Markov chains. It models a surfer who randomly follows links and ranks each page by the long-run probability of landing on it. Google also builds layers upon layers of algorithms on top of PageRank to handle things like identifying fraudulent links, detecting semantic differences between two pages and even determining which page will look best on your phone versus your desktop.
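Here is a minimal power-iteration sketch of the PageRank idea. It assumes a tiny hypothetical link graph and the commonly cited damping factor of 0.85. It illustrates only the random-surfer model; Google’s production ranking is far more elaborate and is not shown here.

```python
def pagerank(links, damping=0.85, iterations=200, tol=1e-12):
    """Rank pages by the long-run probability that a random surfer lands on them.

    `links` maps every page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Otherwise the surfer follows one of the current page's links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank

if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical link graph
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph, C comes out on top because both A and B link to it. The scores always sum to 1 because they form a probability distribution.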

Where Do You Even Begin?

Everyone who has ever become a serious computer engineer, data scientist or programmer started somewhere. Let’s look at a few of the basic algorithms that virtually every programmer on the planet has had to learn at some point.

The Sort Algorithms

The primary value of computing is its ability to handle quantities of data that no human could even joke about dealing with. At the core of this work is a collection of tools known as sorting algorithms. Many of them are named for the strategy they use to put data in order:

Merging sorted runs of items (merge sort)
Partitioning items around a pivot to sort them quickly (quicksort)
Placing items into buckets (bucket sort)
Building a heap (heapsort)
Counting items (counting sort)
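Here is a minimal sketch of two of the strategies above, written for clarity rather than speed. Merge sort splits and merges. Counting sort tallies values, which assumes the inputs are small non-negative integers.

```python
def merge_sort(items):
    """Split the list in half, sort each half recursively, then merge them."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # Taking from the left half on ties keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged

def counting_sort(items, max_value):
    """Sort non-negative integers no larger than max_value by counting occurrences."""
    counts = [0] * (max_value + 1)
    for x in items:
        counts[x] += 1
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result
```

Merge sort works on anything that can be compared and takes O(n log n) time. Counting sort avoids comparisons entirely, but it only pays off when the range of values is small.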

Suppose you’re working as a web developer building an ecommerce website. A major function of any good shopping site is letting shoppers sort products by price, popularity and ratings. You also need to check inventory counts before telling a potential customer that they can buy something. Every one of these actions relies on sorting and counting algorithms.
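In practice, you rarely write these sorts by hand. Python’s built-in `sorted` uses Timsort, a hybrid of merge sort and insertion sort, and it accepts a `key` function. The sketch below uses a hypothetical `Product` record and made-up catalog data to show the shop’s sort options and an inventory check.

```python
from dataclasses import dataclass

@dataclass
class Product:
    # Hypothetical product record; field names are illustrative only.
    name: str
    price: float
    rating: float
    units_sold: int
    stock: int

catalog = [
    Product("Mug", 12.0, 4.6, 1500, 40),
    Product("Kettle", 35.0, 4.2, 800, 0),
    Product("Teapot", 28.0, 4.8, 650, 12),
]

by_price = sorted(catalog, key=lambda p: p.price)
# Highest rating first; ties broken by popularity (units sold).
by_rating = sorted(catalog, key=lambda p: (-p.rating, -p.units_sold))
by_popularity = sorted(catalog, key=lambda p: p.units_sold, reverse=True)
# Inventory check: only offer products that are actually in stock.
in_stock = [p for p in catalog if p.stock > 0]
```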

Indexing, Keys, Security and Hashes

Assigning unique identifiers makes a huge difference in a wide range of computing tasks. Some identifiers are necessary for tracking down a user’s information by tying that data to an ID. Routers rely on routing tables that map destination addresses to the next hop along the path. Hash maps (hash tables) use a hash function to assign each key to a slot, which makes looking up an element by its key fast, typically constant time on average. On the security end, IDs, indexes and keys also help provide features such as multi-factor authentication for logins.
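To show what happens inside a hash map, here is a minimal sketch of a hash table that uses separate chaining. It is for illustration only; real code would simply use Python’s built-in `dict`. The class name and the resize threshold are hypothetical choices.

```python
class HashTable:
    """A minimal hash table using separate chaining (a list of buckets)."""

    def __init__(self, num_buckets=8):
        self._buckets = [[] for _ in range(num_buckets)]
        self._size = 0

    def _bucket(self, key):
        # hash() turns the key into an integer; modulo picks a bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._resize()  # keep chains short so lookups stay fast

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def _resize(self):
        # Double the bucket count and redistribute every stored pair.
        old_items = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for key, value in old_items:
            self._bucket(key).append((key, value))

    def __len__(self):
        return self._size
```

Usage: `users = HashTable()`, `users.put("user42", {"name": "Ada"})`, then `users.get("user42")` retrieves the record by its ID without scanning every entry.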

Mathematical Shortcuts

Computation is one of the biggest costs in terms of processing power. Many encryption algorithms depend on gigantic numbers to provide a high level of security, and working with 1024-bit RSA numbers isn’t computationally simple. In fact, that’s the point. The difficulty of the underlying math is what makes security keys secure. Even the most cryptographically advanced organizations on earth would have to spend significant resources just to break a single key.

You don’t want your iPhone to struggle every time it has to unlock a stored key protected by facial recognition, so cryptographic code leans on mathematical shortcuts to keep this work fast. One of the most common is binary exponentiation (exponentiation by repeated squaring). It lets a program calculate something like 2^32 in 5 squaring steps rather than 31 separate multiplications.
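Here is a minimal sketch of binary exponentiation with an optional modulus, the form used in cryptography. The function also returns how many multiplications it performed, so you can check the step count. Python’s built-in three-argument `pow(base, exp, mod)` already does modular exponentiation efficiently; this hypothetical helper just shows the idea.

```python
def power_by_squaring(base, exponent, modulus=None):
    """Compute base**exponent (optionally mod modulus) by repeated squaring.

    Returns (result, multiplications) so the step count can be inspected.
    """
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = None  # nothing accumulated yet; avoids a pointless 1 * base
    multiplications = 0
    while exponent > 0:
        if exponent & 1:  # lowest bit set: fold the current power into the result
            if result is None:
                result = base
            else:
                result *= base
                multiplications += 1
            if modulus is not None:
                result %= modulus
        exponent >>= 1
        if exponent:  # square only if more bits remain
            base *= base
            multiplications += 1
            if modulus is not None:
                base %= modulus
    if result is None:
        result = 1  # any base to the power 0 is 1
    if modulus is not None:
        result %= modulus
    return result, multiplications
```

`power_by_squaring(2, 32)` returns `(4294967296, 5)`. Those 5 squarings are 2 → 4 → 16 → 256 → 65536 → 2^32. The cost grows with the number of bits in the exponent rather than with the exponent itself, which is what makes exponents hundreds of bits long practical.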

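Similar work can be done using sieves. Suppose you wanted to find all the prime numbers between 0 and 10 trillion. You could brute-force the answer by testing each number, but mathematicians know enough about primes to rule out many of the candidates without doing all those calculations. The classic Sieve of Eratosthenes, for example, crosses out every multiple of each prime it finds, so composite numbers are eliminated without ever being tested individually. This operates as a sieve, narrowing the field until only the primes remain.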
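Here is a minimal sketch of the Sieve of Eratosthenes for a modest limit. A flat list like this one would not fit in memory for a range as large as 10 trillion. At that scale, a segmented sieve, which processes the range in chunks, would be needed.

```python
def primes_up_to(limit):
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out multiples of p starting at p * p; smaller multiples
            # were already crossed out by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, prime in enumerate(is_prime) if prime]
```

`primes_up_to(30)` returns `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`. No number is ever divided by anything; composites simply get crossed out.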

Intriguingly, there are also deliberately heavy, inefficient computational workloads that serve a purpose of their own: benchmarking. For example, the widely used PassMark CPU benchmarks are calculated from a series of computationally challenging tests, including floating-point calculations, encryption, compression and other math-intensive operations. Rather than trying to make things run better, these workloads stress the processor with massive computational loads. The processor’s score is then based on how quickly it gets through these tasks.
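Here is a toy sketch of the same principle. It is not PassMark’s test suite. It runs a few hypothetical workloads and times each one with `time.perf_counter`. SHA-256 hashing from `hashlib` stands in for encryption, because Python’s standard library has no cipher. The workload sizes are arbitrary.

```python
import hashlib
import math
import time
import zlib

def float_workload(n=200_000):
    """Floating-point math: accumulate sqrt(i) * sin(i)."""
    total = 0.0
    for i in range(1, n):
        total += math.sqrt(i) * math.sin(i)
    return total

def hash_workload(rounds=20_000):
    """Repeated SHA-256 hashing (a stand-in for a cryptographic load)."""
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest

def compress_workload(size=1_000_000):
    """Compress a synthetic byte pattern at the highest zlib level."""
    data = bytes(i % 251 for i in range(size))
    return zlib.compress(data, 9)

def run_benchmark():
    """Time each workload in seconds; lower times mean a faster machine."""
    results = {}
    for name, workload in [("float", float_workload),
                           ("hash", hash_workload),
                           ("compress", compress_workload)]:
        start = time.perf_counter()
        workload()
        results[name] = time.perf_counter() - start
    return results

if __name__ == "__main__":
    print(run_benchmark())
```

A real benchmark would repeat each test many times, control for background load, and convert the times into a normalized score. Absolute times from this sketch will vary from machine to machine.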

Machine Learning and AI

Few fields are as replete with algorithms as machine learning and artificial intelligence. Compressing massive datasets into models that allow machines to make well-informed decisions quickly is essential.

Look at machine vision applications. Massive amounts of training data are used to help a machine recognize something an adult human would find intuitive, such as distinguishing a bird from a car.

It’s neither efficient nor desirable for a machine to learn this from scratch on the fly. Instead, companies like Nvidia and Tesla use gigantic training sets to build models. For example, a training set might be fed into a system that employs many GPUs, each with a few thousand cores, and draws several kilowatts of power. Once a viable model has been derived from many iterations of training, that model can be loaded into a much simpler machine.
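The split between expensive training and cheap deployment can be seen even in a tiny model. The sketch below uses a nearest-centroid classifier, a deliberately simple stand-in that is not what Nvidia or Tesla use. Training compresses the labeled data into one average vector per class. Only that small model is exported and loaded for inference. The 2-D features and labels are hypothetical.

```python
import json
import math

def train_centroids(samples):
    """'Training': compress (features, label) pairs into one mean vector per class."""
    sums, counts = {}, {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}

def predict(model, features):
    """'Inference': pick the class whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical 2-D features; real vision models learn features from images.
training_data = [([1.0, 1.2], "bird"), ([0.8, 1.0], "bird"),
                 ([5.0, 4.8], "car"), ([5.2, 5.1], "car")]
model = train_centroids(training_data)
exported = json.dumps(model)          # ship only the small model, not the dataset
deployed = json.loads(exported)       # e.g. loaded on a low-power device
print(predict(deployed, [0.9, 1.1]))  # → bird
```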

For example, Nvidia’s Jetson modules have only a couple hundred GPU cores and can run on as little as 5 watts. That is a massive reduction in power draw once a model is put to work in the field. It’s critical for real-world applications where systems may only be able to get a little bit of energy each day from something like a solar panel.

In many cases, these learned approaches can outperform brute-force, search-heavy solutions. The widely used chess engine Stockfish lost decisively to Google’s DeepMind-developed AlphaZero in DeepMind’s published matches. AlphaZero learned the game through self-play. The version of Stockfish it faced relied on a hand-tuned evaluation function and a deep search through millions of positions per second. AlphaZero examined far fewer positions, yet it played more human-looking moves and frequently grasped broader tactical and strategic concepts that Stockfish missed.

Conclusion

The diversity of problems that are solved using algorithms is impressive. This is one of the reasons that algorithms are regularly taught to computer science and engineering students. Simply put, it’s rarely worth the effort to reinvent the wheel.

Especially for beginners, knowing how to walk through an array is a skill that allows them to produce powerful results quickly. This not only instills confidence, but it helps them understand why further study of algorithms is essential.

Algorithms are a part of the toolbox that every computer scientist and engineer, data scientist and programmer should care about. With a strong knowledge of algorithms, you can tackle big problems quickly and produce systems that will deliver impressive results.

Did learning about algorithms interest you? University of Silicon Valley offers comprehensive Computer Science & Engineering degree programs taught by entrepreneurs who are in the thick of the industry. In the project-intensive Software Engineering concentration, you’ll not only cover the fundamental concepts of the software development process, but also explore the different ways that complex software systems are changing the world.

University of Silicon Valley is uniquely poised to offer a meaningful and valuable education for 21st century students. We believe in an education that directly correlates with the work you’ll be doing after you graduate. Interested in learning more? Contact Us today.