# latest Blogs

Cleve Moler is the author of the first MATLAB, one of the founders of MathWorks, and is currently Chief Mathematician at the company. He writes here about MATLAB, scientific computing and interesting mathematics.
Updated: 1 day 2 hours ago

### My First Computer, the Burroughs 205 Datatron

Mon, 10/09/2017 - 22:30

The first computer that I seriously used was the Burroughs 205 Datatron. I actually used two different machines, one at Caltech for two years in 1959-61 and one at the University of Utah in the summer of 1960. Both the technology and the style of usage were wildly different from today's computers.

The blinking lights on the console made the B205 a featured player in many movies and TV shows during the late '50s and early '60s, including Batman and Lost in Space. Here is an animated gif on a reproduction of the console.

image credit: http://www.angelfire.com/scifi/B205

ContentsBurroughs Corporation

Wikipedia tells us that the American Arithmometer Company was founded in 1886 to manufacture and sell adding machines. The company name was changed to Burroughs Adding Machine in 1904 to honor the founder, William Seward Burroughs, who, coincidentally, was the grandfather of beat generation poet William S. Burroughs. At one time it was the largest adding machine in the country. The company was merged with Sperry Corporation to form Unisys in 1986.

In 1956 Burroughs got into the new computer business by acquiring Electro Data and began marketing their Datatron, which had been introduced two years earlier. Throughout the '50s, '60s, and '70s, Burroughs was a force in computing, particularly in the academic and banking markets. In the 1960's, IBM completely dominated the industry, but there were enough smaller companies that all together they were known as "IBM and the Seven Dwarfs."

Room Full

In 1959 the Burroughs 205 Datatron cost a few hundred thousand dollars and occupied an entire room. (Here is a 1957 price list). In the foreground of this photo are the console and an electric typewriter. In the background are a magnetic tape drive and several cabinets containing a total of about 1600 vacuum tubes.

I like to joke that this was a "personal computer". Only one person could use it at a time. We would sign up for machine time, usually in half-hour chunks, at all hours of the day or night.

Vacuum Tubes

This was before transistors and semiconductors. The primary logic unit was the electronic vacuum tube. Here is one plugin module with eight tubes. The computer worked in base 10 and this module held one decimal digit. Four of the tubes held one digit using the binary combinations 0 through 9. The binary values 10 through 15 were never used; if they showed up, it was an error condition and caused an immediate halt.

Drum

Before the magnetic core, a huge problem with early computers was memory. People knew how to produce the electronics for the arithmetic, logic and control units, but where do you store programs and data?

The 205 had drum memory. Think of the rotating drum in a small clothes dryer. Here is a photo of a machine that has not been used for years and that has not been stored carefully. The drum is at the lower right. Four registers, each containing ten of the single decimal digit modules, are at the upper left.

photo credit: http://www.angelfire.com/scifi/B205/ecd.html

The drum held 4000 ten-digit words. If you regard each digit as half a byte on a modern binary machine, that is 20K bytes. The words were stored in 20 bands of 200 words each. The drum's speed was 3570 rpm. So, on average, it would take about 1/30-th of a second to access the start of a band.

In addition to the primary drum storage, there were four "high speed" bands where a block of 20 words is duplicated ten times around the drum. This was sort of like cache memory on today's computers. One machine instruction would duplicate 20 words, usually a piece of a program, ten times around a high speed band where it could be accessed ten times as fast.

Caltech and Utah

I learned a little about the 205 during my sophomore year at Caltech in the spring of 1959, and more when I was a junior and took numerical analysis from Prof. John Todd in 1959-60. We used Caltech's 205 for some of the homework in that the class. Then I went back home to Salt Lake City for the summer of 1960. It turned out the University of Utah also had a 205 and they were happy to hire me for a summer job.

There were just four people working in Utah's computer center at the beginning of that summer -- a woman who served as director, me, and two part timers, a secretary and a grad student from EE who maintained the machine. Then, a few weeks into the summer, the director resigned. So I found myself -- two months before I turned 20 -- Acting Director of the University of Utah Computer Center.

I returned to Caltech in the fall, and had a chance to do individual study under Todd and more work on that 205 until I graduated in 1961. I did a project involving Hilbert matrices. It was my introduction to matrix computation.

Paper Tape

At first, at both Caltech and Utah, we did not have a compiler or an assembler. We wrote our programs in absolute numeric machine language and punched them on paper tape, like this.

The machine was designed so that op code 00 meant "read one word from paper tape into the current location and go to the next location". There was a button on the console that would erase the drum, set the location counter to 0000 and the op code to 00, and then start the tape reader. That served as the "operating system". It would read the entire contents of the paper tape onto the drum. When the tape ran off the end, you would turn off the reader, set the location counter back to 0000, and press start. That would begin executing the program you had just read from tape onto the drum.

Registers

There were five registers. The A register -- the accumulator -- is where arithmetic happens. It holds ten decimal digits, plus a sign bit. For example, one instruction is "Add the contents of memory location XXXX to the A register". The R register -- the remainder -- is an extension of the A register. Multiplication of two ten digit (fixed point) numbers produces a twenty digit result across A and R. Division produces a quotient in A and a remainder in R.

The C register -- the command register -- holds the instructions as they are being executed. The D register holds a single word as it is being transferred to or from the drum or external I/O equipment.

The B register -- the index register -- is only four digits wide. If the sign bit of an instruction is a 1, the contents of the B register is added to the address portion of the instruction, thereby providing an array index. The value of the B register is automatically increased or decreased by one after each indexing instruction.

Floating Point

A floating point arithmetic option was available for an additional \$12,500. (Floating point was also an option with the Intel chips used in the first PC's. The 8087 chip cost about \$150.) The floating point word has a two-digit exponent and an eight-digit fraction. So floating point numbers range from about 10^(-50) to 10^50 and have an accuracy of about 10^(-8).

Program

Here is an example program from the appendix of the machine manual. This writeup is much fancier than the programs we would actually write. It's on a coding sheet; we would just use lined yellow notebooks. We would never need to write out 6980, 6981, etc. And the Remarks are much more detailed. This example is intended to be carefully explanatory.

Skip on Minus

One of the programs that I wrote during the summer at the University of Utah involved a loop that was exactly 20 instructions long. That meant the entire loop could be loaded into the high speed memory. It didn't even need the jump instruction from the end of the loop back to the beginning. It would just continue execution from the duplicate of the code in the next block on the drum. The trouble was there was no way to exit the loop.

I needed a "Skip on Minus" instruction. There was a "Skip on Plus", but no "Skip on Minus". The EE grad student who maintained the machine was a guy named Joe. I wish I could remember his last name. Anyway, I told him I needed a "Skip on Minus" instruction. He said, "I'll one in". He disappeared behind the cabinets with circuit diagrams and a soldering iron. He called out, "the op code will be 51".

I wrote my program with a "51" instruction. It worked perfectly. And so the University of Utah had the only Burroughs 205 Datatron in the world with a "Skip on Minus" instruction.

\n'); d.write(code_string); // Add copyright line at the bottom if specified. if (copyright.length > 0) { d.writeln(''); d.writeln('%%'); if (copyright.length > 0) { d.writeln('% _' + copyright + '_'); } } d.write('\n'); d.title = title + ' (MATLAB code)'; d.close(); } -->

Get the MATLAB code (requires JavaScript)

Published with MATLAB® R2017a

> % % image credit: http://www.angelfire.com/scifi/B205 %% Burroughs Corporation % Wikipedia tells us that the American Arithmometer Company was founded % in 1886 to manufacture and sell adding machines. The company name was % changed to Burroughs Adding Machine in 1904 to honor the founder, % William Seward Burroughs, who, coincidentally, was the grandfather of % beat generation poet William S. Burroughs. At one time it was the % largest adding machine in the country. The company was merged with % Sperry Corporation to form Unisys in 1986. %% % In 1956 Burroughs got into the new computer business by acquiring % Electro Data and began marketing their Datatron, which had been % introduced two years earlier. % Throughout the '50s, '60s, and '70s, Burroughs was a force in computing, % particularly in the academic and banking markets. In the 1960's, IBM % completely dominated the industry, but there were enough smaller % companies that all together they were known as "IBM and the Seven % Dwarfs." %% Room Full % In 1959 the Burroughs 205 Datatron cost a few hundred thousand dollars % and occupied an entire room. (Here is a 1957 % ). In the foreground % of this photo are the console and an electric typewriter. % In the background are a magnetic tape drive % and several cabinets containing a total of about 1600 vacuum tubes. % % <> %% % I like to joke that this was a "personal computer". Only one person % could use it at a time. We would sign up for machine time, usually % in half-hour chunks, at all hours of the day or night. %% Vacuum Tubes % This was before transistors and semiconductors. The primary logic unit % was the electronic vacuum tube. Here is one plugin module with eight % tubes. The computer worked in base 10 and this module held one decimal % digit. Four of the tubes held one digit % using the binary combinations 0 through 9. The binary values 10 through % 15 were never used; if they showed up, it was an error condition and % caused an immediate halt. % % <> % % photo credit: http://www.cs.kent.edu/~rothstei/10051/history/UVa-page_files/ComputerHistory1.html %% Drum % Before the magnetic core, a huge problem with early computers was % memory. People knew how to produce the electronics for the arithmetic, % logic and control units, but where do you store programs and data? %% % The 205 had drum memory. Think of the rotating drum in a small clothes % dryer. Here is a photo of a machine that has not been used for years % and that has not been stored carefully. The drum is at the lower right. % Four registers, each containing ten of the single decimal digit modules, % are at the upper left. % % <> % % photo credit: http://www.angelfire.com/scifi/B205/ecd.html % % The drum held 4000 ten-digit words. If you regard each digit as half % a byte on a modern binary machine, that is 20K bytes. The words were % stored in 20 bands of 200 words each. The drum's speed was 3570 rpm. % So, on average, it would take about 1/30-th of a second to access the % start of a band. %% % In addition to the primary drum storage, there were four "high speed" % bands where a block of 20 words is duplicated ten times around the drum. % This was sort of like cache memory on today's computers. One machine % instruction would duplicate 20 words, usually a piece of a program, % ten times around a high speed band where it could be accessed % ten times as fast. %% Caltech and Utah % I learned a little about the 205 during my sophomore year at Caltech % in the spring of 1959, and more when I was a junior and took numerical % analysis from Prof. John Todd in 1959-60. We used Caltech's 205 % for some of the homework in that the class. Then I went back home % to Salt Lake City for the summer of 1960. It turned out the % University of Utah also had a 205 and they were happy % to hire me for a summer job. %% % There were just four people working in Utah's computer center at the % beginning of that summer REPLACE_WITH_DASH_DASH a woman who served as director, me, and % two part timers, a secretary and a grad student from EE who maintained % the machine. Then, a few weeks into the summer, the director resigned. % So I found myself REPLACE_WITH_DASH_DASH two months before I turned 20 REPLACE_WITH_DASH_DASH Acting Director % of the University of Utah Computer Center. %% % I returned to Caltech in the fall, and had a chance to do individual % study under Todd and more work on that 205 until I graduated in 1961. % I did a project involving Hilbert matrices. It was my introduction % to matrix computation. %% Paper Tape % At first, at both Caltech and Utah, we did not have a compiler or an % assembler. We wrote our programs in absolute numeric machine language % and punched them on paper tape, like this. % % <> %% % The machine was designed so that op code 00 meant "read one word from % paper tape into the current location and go to the next location". % There was a button on the console that would erase the drum, % set the location % counter to 0000 and the op code to 00, and then start the tape reader. % That served as the "operating system". It would read the entire contents % of the paper tape onto the drum. % When the tape ran off the end, you would turn off the % reader, set the location counter back to 0000, and press start. That % would begin executing the program you had just read from tape onto % the drum. %% Registers % There were five registers. The A register REPLACE_WITH_DASH_DASH the accumulator REPLACE_WITH_DASH_DASH is % where arithmetic happens. It holds ten decimal digits, plus a sign bit. % For example, one instruction is "Add the contents of memory location % XXXX to the A register". The R register REPLACE_WITH_DASH_DASH the remainder REPLACE_WITH_DASH_DASH is an % extension of the A register. Multiplication of two ten digit % (fixed point) numbers produces a twenty digit result across A and R. % Division produces a quotient in A and a remainder in R. %% % The C register REPLACE_WITH_DASH_DASH the command register REPLACE_WITH_DASH_DASH holds the instructions as they % are being executed. The D register holds a single word as it is being % transferred to or from the drum or external I/O equipment. %% % The B register REPLACE_WITH_DASH_DASH the index register REPLACE_WITH_DASH_DASH is only four digits wide. % If the sign bit of an instruction is a 1, the contents of the B register % is added to the address portion of the instruction, thereby providing % an array index. The value of the B register is automatically increased % or decreased by one after each indexing instruction. %% Floating Point % A floating point arithmetic option was available for an additional % \$12,500. (Floating point was also an option with the Intel chips used % in the first PC's. The 8087 chip cost about \$150.) The floating point % word has a two-digit exponent and an eight-digit fraction. So floating % point numbers range from about 10^(-50) to 10^50 and have an accuracy % of about 10^(-8). %% Program % Here is an example program from the appendix of the % . This writeup is much fancier than the programs we would % actually write. It's on a coding sheet; we would just use lined yellow % notebooks. We would never need to write out 6980, 6981, etc. % And the Remarks are much more detailed. This example is intended to be % carefully explanatory. % % <> %% Skip on Minus % One of the programs that I wrote during the summer at the University of % Utah involved a loop that was exactly 20 instructions long. That meant % the entire loop could be loaded into the high speed memory. It didn't % even need the jump instruction from the end of the loop back to the % beginning. It would just continue execution from the duplicate of the % code in the next block on the drum. The trouble was there was no way % to exit the loop. %% % I needed a "Skip on Minus" instruction. There was a "Skip on Plus", % but no "Skip on Minus". The EE grad student who maintained the machine % was a guy named Joe. I wish I could remember his last name. Anyway, % I told him I needed a "Skip on Minus" instruction. He said, "I'll % one in". He disappeared behind the cabinets with circuit diagrams and % a soldering iron. He called out, "the op code will be 51". %% % I wrote my program with a "51" instruction. It worked perfectly. % And so the University of Utah had the only Burroughs 205 Datatron in the % world with a "Skip on Minus" instruction. ##### SOURCE END ##### 95f11082495b4f85b8b30cb5f23fe0b2 -->

### How Far Apart Are Two Random Points in a Hypercube?

Thu, 09/28/2017 - 04:29

Two days ago I wrote about random points in a square. At the last minute I added the paragraph asking about the generalization to random points in a cube. I have to admit that I didn't check the Web to see what was known about the question.

Shortly after my post appeared, Pieterjan Robbe of KU Leuven in Belgium submitted a comment pointing out that the problem has been studied extensively. He provided several pointers. The solution even has a name; it's called Robbins Constant. I have listed a few of the references below.

ContentsRobbins Constant

Robbins Constant is the value of this six-fold integral.

$$\delta = \int_0^1 \int_0^1 \int_0^1 \int_0^1 \int_0^1 \int_0^1 \! \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2 + (x_3-y_3)^2} \ \mathrm{d}x_1 \mathrm{d}y_1 \mathrm{d}x_2 \mathrm{d}y_2 \mathrm{d}x_3 \mathrm{d}y_3$$

A change of variables makes it a triple integral.

$$\delta = 8 \int_0^1\int_0^1 \int_0^1 \! \sqrt{x^2 + y^2 + z^2} \ (1-x)(1-y)(1-z) \ \mathrm{d}x \mathrm{d}y \mathrm{d}z$$

Numerical Integration

Numerical integration with the default tolerances gets about nine decimal places.

format long F = @(x,y,z) sqrt(x.^2+y.^2+z.^2).*(1-x).*(1-y).*(1-z) delta = 8*integral3(F,0,1,0,1,0,1) F = function_handle with value: @(x,y,z)sqrt(x.^2+y.^2+z.^2).*(1-x).*(1-y).*(1-z) delta = 0.661707182095901 Analytic Solution

The exact analytic solution is impressive. I can't explain where it comes from.

$$\delta = \frac{1}{105}(4 + 17\sqrt{2} - 6\sqrt{3} + 21\log{(1+\sqrt{2})} + 84\log{(1+\sqrt{3})} - 42\log{(2)} - 7\pi)$$

Here is the numeric value.

delta = (1/105)*(4 + 17*sqrt(2) - 6*sqrt(3) + 21*log(1+sqrt(2)) + ... 84*log(1+sqrt(3)) - 42*log(2) - 7*pi) delta = 0.661707182267176 Hypercubes and Vectorize

I also received email from MathWorks colleague Matt Tearle informing me about vecnorm, a new function that is in MATLAB Release R2017b. Unlike the norm function that computes matrix norms, vecnorm treats the elements along a specified dimension as vectors and calculates the norm of each vector. Here is the documentation page.

I don't yet have R2017b on this computer, but Matt's email prompted me to think about vectorizing the simulation, even without vecnorm. While we're at it, let's generalize to d dimensional hypercubes. And, I'll make it a one-liner.

n = 10000000; % n samples. for d = 1:10 % d dimensions. delta = mean(sqrt(sum((rand(n,d)-rand(n,d)).^2,2))); fprintf('%6d %7.4f\n',d,delta) end 1 0.3335 2 0.5215 3 0.6617 4 0.7778 5 0.8786 6 0.9692 7 1.0517 8 1.1282 9 1.1997 10 1.2675 Thanks

Thanks to Pieterjan Robbe and Matt Tearle.

References

D. P. Robbins, Problem 2629, Average distance between two points in a box, Amer. Mathematical Monthly, 85.4, 1978, p. 278.

D. H. Bailey, J. M. Borwein and R. E. Crandall, Advances in the Theory of Box Integrals, <http://crd-legacy.lbl.gov/~dhbailey/dhbpapers/BoxII.pdf>.

Johan Philip, The Probability Distribution of the Distance Between Two Random Points in a Box, https://people.kth.se/~johanph/habc.pdf.

Eric W. Weisstein, Hypercube Line Picking. MathWorld, <http://mathworld.wolfram.com/HypercubeLinePicking.html>.

N. J. A. Sloane, editor, The On-Line Encyclopedia of Integer Sequences, Decimal Expansion of Robbins Constant, <http://oeis.org/A073012>.

\n'); d.write(code_string); // Add copyright line at the bottom if specified. if (copyright.length > 0) { d.writeln(''); d.writeln('%%'); if (copyright.length > 0) { d.writeln('% _' + copyright + '_'); } } d.write('\n'); d.title = title + ' (MATLAB code)'; d.close(); } -->

Get the MATLAB code (requires JavaScript)

Published with MATLAB® R2017a

. At the last minute I added the paragraph % asking about the generalization to random points in a cube. % I have to admit that I didn't check the Web to see what was known % about the question. % % Shortly after my post appeared, Pieterjan Robbe of KU Leuven in Belgium % submitted a comment pointing out that the problem has been studied % extensively. He provided several pointers. The solution even has % a name; it's called Robbins Constant. I have listed a few of the % references below. %% Robbins Constant % Robbins Constant is the value of this six-fold integral. % % $$\delta = \int_0^1 \int_0^1 \int_0^1 \int_0^1 \int_0^1 \int_0^1 \! \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2 + (x_3-y_3)^2} \ \mathrm{d}x_1 \mathrm{d}y_1 \mathrm{d}x_2 \mathrm{d}y_2 \mathrm{d}x_3 \mathrm{d}y_3$$ % % A change of variables makes it a triple integral. % % $$\delta = 8 \int_0^1\int_0^1 \int_0^1 \! \sqrt{x^2 + y^2 + z^2} \ (1-x)(1-y)(1-z) \ \mathrm{d}x \mathrm{d}y \mathrm{d}z$$ %% Numerical Integration % Numerical integration with the default tolerances gets about nine % decimal places. format long F = @(x,y,z) sqrt(x.^2+y.^2+z.^2).*(1-x).*(1-y).*(1-z) delta = 8*integral3(F,0,1,0,1,0,1) %% Analytic Solution % The exact analytic solution is impressive. I can't explain where it % comes from. % % $$\delta = \frac{1}{105}(4 + 17\sqrt{2} - 6\sqrt{3} + 21\log{(1+\sqrt{2})} + % 84\log{(1+\sqrt{3})} - 42\log{(2)} - 7\pi)$$ %% % Here is the numeric value. delta = (1/105)*(4 + 17*sqrt(2) - 6*sqrt(3) + 21*log(1+sqrt(2)) + ... 84*log(1+sqrt(3)) - 42*log(2) - 7*pi) %% Hypercubes and Vectorize % I also received email from MathWorks colleague Matt Tearle informing % me about |vecnorm|, a new function that is in MATLAB Release R2017b. % Unlike the |norm| function that computes matrix norms, |vecnorm| % treats the elements along a specified dimension as vectors and % calculates the norm of each vector. % % the documentation page. %% % I don't yet have R2017b on this computer, but Matt's email prompted % me to think about vectorizing the simulation, even without |vecnorm|. % While we're at it, let's generalize to |d| dimensional hypercubes. % And, I'll make it a one-liner. n = 10000000; % n samples. for d = 1:10 % d dimensions. delta = mean(sqrt(sum((rand(n,d)-rand(n,d)).^2,2))); fprintf('%6d %7.4f\n',d,delta) end %% Thanks % Thanks to Pieterjan Robbe and Matt Tearle. %% References % D. P. Robbins, Problem 2629, Average distance between two points in a box, % Amer. Mathematical Monthly, 85.4, 1978, p. 278. % % D. H. Bailey, J. M. Borwein and R. E. Crandall, % Advances in the Theory of Box Integrals, % . % % Johan Philip, The Probability Distribution of the Distance Between % Two Random Points in a Box, % . % % Eric W. Weisstein, Hypercube Line Picking. MathWorld, % . % % N. J. A. Sloane, editor, The On-Line Encyclopedia of Integer Sequences, % Decimal Expansion of Robbins Constant, % . ##### SOURCE END ##### c1cdb632a661445c81ccd107144f0003 -->

### How Far Apart Are Two Random Points in a Square?

Tue, 09/26/2017 - 01:02

How far apart can you expect two points chosen at random in the unit square to be? I found this problem on the YouTube channel maintained by Presh Talwalkar, Mind Your Decisions. He correctly calls it a very hard puzzle. At first, I guessed the answer might be $1/2$. But the correct answer is more interesting than that.

ContentsSimulation

Let's do a Monte Carlo simulation to get a numerical estimate. Sampling one million pairs of points doesn't take much time.

n = 1000000; sum = 0; rng(0) for k = 1:n x = rand(1,2); y = rand(1,2); delta = norm(x-y); sum = sum + delta; end format short delta = sum/n delta = 0.5214

It turns out that this run of the simulation generates a result that is accurate to the four digit precision of format short. But can we find the exact value?

The expected distance, $\delta$, can be expressed as this quadruple integral, but the Symbolic Toolbox cannot find a closed form.

$$\delta = \int_0^1 \int_0^1 \int_0^1 \int_0^1 \! \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2} \ \mathrm{d}x_1 \mathrm{d}y_1 \mathrm{d}x_2 \mathrm{d}y_2$$

Double Integral

Make the substitutions $x = x_1 - y_1$ and $y = x_2 - y_2$ and consider the integral over the four regions where these variables are positive or negative. The four integrals are equal to each other and we obtain this double integral.

$$\delta = 4 \int_0^1 \int_0^1 \! \sqrt{x^2 + y^2} \ (1-x)(1-y) \ \mathrm{d}x \mathrm{d}y$$

Numerical Integration

Let's tackle this double integral numerically.

F = @(x,y) 4*sqrt(x.^2+y.^2).*(1-x).*(1-y); delta = integral2(F,0,1,0,1) delta = 0.5214 Polar Coordinates

Switch to polar coordinates, $r$ and $\theta$. The $\sqrt{x^2+y^2}$ term is simply $r$ and the double integral has two equal halves about the $45^o$ line, $\theta = \pi/4$.

$$\delta/8 = \int_0^{\pi/4} \int_0^{\sec{\theta}} r (1-r\cos{\theta}) (1-r\sin{\theta}) \ r \mathrm{d}r \mathrm{d}\theta$$

Symbolic Integration

The integrand is a polynomial in $r$.

syms r theta real F = expand(r^2*(1-r*cos(theta))*(1-r*sin(theta))) F = r^2 - r^3*sin(theta) - r^3*cos(theta) + r^4*cos(theta)*sin(theta)

The Toolbox can integrate this polynomial easily.

inner = int(F,r,0,sec(theta)) inner = 1/(12*cos(theta)^3) - sin(theta)/(20*cos(theta)^4)

The Toolbox can also do the outer integral over $\theta$.

outer = int(inner,theta,0,pi/4) outer = log(2^(1/2) + 1)/24 + 2^(1/2)/120 + 1/60

Multiply by 8.

delta = 8*outer delta = log(2^(1/2) + 1)/3 + 2^(1/2)/15 + 2/15

Generate a latex representation for $\delta$ to cut and paste later into this post.

latex(delta); Numerical value format long delta = double(delta) delta = 0.521405433164721 This Is My Final Answer

Here is the result.

$$\delta = \frac{\ln\left(\sqrt{2}+1\right)}{3}+\frac{\sqrt{2}}{15}+\frac{2}{15}$$

Three Dimensions

What about three dimensions? How far apart can you expect two points chosen at random in the unit cube to be? I'll leave that as a challenge and invite anyone who thinks they know the answer to post a comment.

Thanks

Thanks to Presh Talwalkar for this little nugget.

\n'); d.write(code_string); // Add copyright line at the bottom if specified. if (copyright.length > 0) { d.writeln(''); d.writeln('%%'); if (copyright.length > 0) { d.writeln('% _' + copyright + '_'); } } d.write('\n'); d.title = title + ' (MATLAB code)'; d.close(); } -->

Get the MATLAB code (requires JavaScript)

Published with MATLAB® R2017a

. % He correctly calls it . % At first, I guessed the answer might be $1/2$. But the correct % answer is more interesting than that. % % <> %% Simulation % Let's do a Monte Carlo simulation to get a numerical estimate. % Sampling one million pairs of points doesn't take much time. n = 1000000; sum = 0; rng(0) for k = 1:n x = rand(1,2); y = rand(1,2); delta = norm(x-y); sum = sum + delta; end format short delta = sum/n %% % It turns out that this run of the simulation generates a result % that is accurate to the four digit precision of |format short|. % But can we find the exact value? %% Quadruple Integral % The expected distance, $\delta$, can be expressed as this quadruple % integral, but the Symbolic Toolbox cannot find a closed form. % % $$\delta = \int_0^1 \int_0^1 \int_0^1 \int_0^1 \! \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2} \ \mathrm{d}x_1 \mathrm{d}y_1 \mathrm{d}x_2 \mathrm{d}y_2$$ %% Double Integral % Make the substitutions $x = x_1 - y_1$ and $y = x_2 - y_2$ and consider % the integral over the four regions where these variables are positive % or negative. The four integrals are equal to each other and we obtain % this double integral. % % $$\delta = 4 \int_0^1 \int_0^1 \! \sqrt{x^2 + y^2} \ (1-x)(1-y) \ \mathrm{d}x \mathrm{d}y$$ %% Numerical Integration % Let's tackle this double integral numerically. F = @(x,y) 4*sqrt(x.^2+y.^2).*(1-x).*(1-y); delta = integral2(F,0,1,0,1) %% Polar Coordinates % Switch to polar coordinates, $r$ and $\theta$. The $\sqrt{x^2+y^2}$ % term is simply $r$ and the double integral has two equal halves about % the $45^o$ line, $\theta = \pi/4$. % % $$\delta/8 = \int_0^{\pi/4} \int_0^{\sec{\theta}} r (1-r\cos{\theta}) (1-r\sin{\theta}) \ r \mathrm{d}r \mathrm{d}\theta$$ %% Symbolic Integration % The integrand is a polynomial in $r$. syms r theta real F = expand(r^2*(1-r*cos(theta))*(1-r*sin(theta))) %% % The Toolbox can integrate this polynomial easily. inner = int(F,r,0,sec(theta)) %% % The Toolbox can also do the outer integral over $\theta$. outer = int(inner,theta,0,pi/4) %% % Multiply by 8. delta = 8*outer %% % Generate a |latex| representation for $\delta$ to cut and paste later % into this post. latex(delta); %% Numerical value format long delta = double(delta) %% This Is My Final Answer % Here is the result. % % $$\delta = % \frac{\ln\left(\sqrt{2}+1\right)}{3}+\frac{\sqrt{2}}{15}+\frac{2}{15}$$ %% Three Dimensions % What about three dimensions? % How far apart can you expect two points chosen at random in the unit % cube to be? % I'll leave that as a challenge and invite anyone who thinks they know % the answer to post a comment. %% Thanks % Thanks to Presh Talwalkar for this little nugget. ##### SOURCE END ##### 4605bcb308bf4a25bc7dcabdf3c3a242 -->

### A Logo Zoetrope

Thu, 09/14/2017 - 02:47
A Logo Zoetrope

A quick post to point my readers to Ned Gulley's post in MATLAB Community, A Galloping Logo Zoetrope. Don't know what a Zoetrope is? Ned explains.

\n'); d.write(code_string); // Add copyright line at the bottom if specified. if (copyright.length > 0) { d.writeln(''); d.writeln('%%'); if (copyright.length > 0) { d.writeln('% _' + copyright + '_'); } } d.write('\n'); d.title = title + ' (MATLAB code)'; d.close(); } -->

Get the MATLAB code (requires JavaScript)

Published with MATLAB® R2017a

. Don't know what a _Zoetrope_ is? % Ned explains. ##### SOURCE END ##### 8b8b9a9f6ea84a5f99ae99beb8679770 -->

Mon, 09/11/2017 - 22:30

Do not get involved in a bar bet with Brad Efron and his dice until you have read this.

Bradley Efron is the Max H. Stein Professor of Humanities and Sciences, Professor of Statistics, and Professor of Biostatistics at Stanford. He is best known for his invention of the bootstrap resampling technique, which is a fundamental tool in computational statistics. In 2005, he received the National Medal of Science. But I think of him for something much less esoteric -- his nontransitive dice. And I also remember him as one of my roommates when we were both students at Caltech.

If you come across Brad in a bar some night, he might offer you the following wager. He would show you four individual dice, like these.

Photo credit: http://www.grand-illusions.com

You would get to pick one of the dice. He would then pick one for himself. You would then roll the dice a number of times, with the player rolling the highest number each time winning a dollar. (Rolling the same number is a push. No money changes hands.) Do you want to play? I'll reveal the zinger -- no matter which of the four dice you choose, Brad can pick one that will win money from you in the long run.

Transitivity

A binary mathematical relationship, $\rho$, is said to be transitive if $x \ \rho \ y$ and $y \ \rho \ z$ implies $x \ \rho \ z$. Equality is the obvious example. If $x = y$ and $y = z$ then $x = z$. Other examples include greater than, divides, and subset.

On the other hand, a relationship is nontransitive if the implication does not necessarily hold. Despite their efforts to make it transitive, Facebook friend is still nontransitive.

Ordinarily, the term dominates, or beats is transitive. And, ordinarily, you might think this applies to dice games. But, no, such thinking could cost you money. Brad's dice are nontransitive.

Here are the six values on the faces of Brad's four dice.

A = [4 4 4 4 0 0]; B = [3 3 3 3 3 3]; C = [6 6 2 2 2 2]; D = [5 5 5 1 1 1];

How do these compare when paired up against each other? It is obvious that A wins money from B because B rolls a 3 all of the time, while A rolls a winning 4 two-thirds of the time. The odds are 2:1 in A's favor. Similarly B's constant 3's dominate C's 2's two-thirds of the time. The odds are 2:1 in B's favor.

Comparing C with D is more complicated because neither rolls a constant value.. We have to compare each possible roll of one die with all the possible rolls of the other. That's 36 comparisons. The comparison is facilitated by constructing the following matrix. The comparison of a row vector with a column vector is allowed under MATLAB's new rules of singleton expansion.

Z = (C > D') - (C < D') Z = 1 1 -1 -1 -1 -1 1 1 -1 -1 -1 -1 1 1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The result is a matrix Z with +1 where C beats D and a -1 where D beats C. There is a submatrix of -1s in the upper right, but no solid column or row. All that remains is to count the number of +1's and -1's to get the odds.

odds = [nnz(Z == +1) nnz(Z == -1)] odds = 24 12

There are twice as many +1s as -1s, so, again, the odds are 2:1 that C beats D.

But now how does D fair against A?

Z = (D > A') - (D < A') odds = [nnz(Z == +1) nnz(Z == -1)] Z = 1 1 1 -1 -1 -1 1 1 1 -1 -1 -1 1 1 1 -1 -1 -1 1 1 1 -1 -1 -1 1 1 1 1 1 1 1 1 1 1 1 1 odds = 24 12

Again, there are twice as many +1s as -1s. The odds are 2:1 in D's favor.

To summarize, A beats B, B beats C, C beats D, and D beats A. No matter which die you choose, Brad can pick one that will beat it -- and by a substantial margin. It's like "Rock, Paper, Scissors" with a fourth option, but you have to make the first move.

Compare

Let's capture that comparison in a function. The function can print and label the matrix Z, as well as provide for pushes, which we'll need later.

type compare function odds = compare(X,Y) Z = (X > Y') - (X < Y'); odds = [nnz(Z==1) nnz(Z==0) nnz(Z==-1)]; fprintf('\n X %3d %5d %5d %5d %5d %5d\n',X) fprintf(' Y\n') fprintf(' %5d %5d %5d %5d %5d %5d %5d\n',[Y' Z]') end

Use the function to compare A with C.

AvC = compare(A,C) X 4 4 4 4 0 0 Y 6 -1 -1 -1 -1 -1 -1 6 -1 -1 -1 -1 -1 -1 2 1 1 1 1 -1 -1 2 1 1 1 1 -1 -1 2 1 1 1 1 -1 -1 2 1 1 1 1 -1 -1 AvC = 16 0 20

Dividing out a common factor, that is odds of 4:5 against A.

The final possible comparison is B versus D. B's unimaginative play results in complete rows of +1 or -1.

BvD = compare(B,D) X 3 3 3 3 3 3 Y 5 -1 -1 -1 -1 -1 -1 5 -1 -1 -1 -1 -1 -1 5 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 BvD = 18 0 18

So B and D are an even match. Here is the complete summary of all possible pairings among Efron's Dice,

A B C D A 2:1 4:5 1:2 B 1:2 2:1 1:1 C 5:4 1:2 2:1 D 2:1 1:1 1:2Simulate

In case we don't believe these comparisons, or because we'd like to see the shape of the distribution, or just because it's fun, let's do some Monte Carlo simulations. Because 108 is divisible by 36, I'll make 108 rolls one game and simulate 10000 games. Here is the code.

type simulate function simulate(X,Y) m = 108; % Number of rolls in a single game. n = 10000; % Number of games in a simulation. s = zeros(1,n); for k = 1:n for l = 1:m i = randi(6); j = randi(6); if X(i) > Y(j) s(k) = s(k) + 1; elseif X(i) < Y(j) s(k) = s(k) - 1; % else X(i) == Y(j) % push counts as a roll end end end clf shg histogram(s) line([0 0],get(gca,'ylim'),'color','k','linewidth',2) title(sprintf('mean = %6.2f',mean(s))) end

I'll initialize the random number generator so that I get the same results every time I publish this post.

rng(1)

Now verify that A beats B.

simulate(A,B);

Sure enough, the 2:1 odds imply that A can be expected to be ahead by 108/3 dollars after 108 flips. There is a vertical black line at zero to emphasize that this isn't a fair game.

How about those 4:5 odds for A versus C? A can be expected to lose 108/9 dollars with 108 flips.

simulate(A,C) Three dice nontransitive set

Efron's dice were introduced in a Scientific American article by Martin Gardner in 1970. Since then a number of people have found other sets of nontransitive dice with interesting properties. The Wikipedia article on nontransitive dice has an abundance of information, including the following set.

A = [1 1 3 5 5 6]; B = [2 3 3 4 4 5]; C = [1 2 2 4 6 6];

This set is very close to a conventional set. The sum of the numbers on each die is 21, the same as a conventional die. And each die differs from a standard 1:6 die in only two places. Nevertheless, the set is nontransitive.

Since this is so close to a conventional set, we expect the odds to be closer to even. Let's see.

AvB = compare(A,B) X 1 1 3 5 5 6 Y 2 -1 -1 1 1 1 1 3 -1 -1 0 1 1 1 3 -1 -1 0 1 1 1 4 -1 -1 -1 1 1 1 4 -1 -1 -1 1 1 1 5 -1 -1 -1 0 0 1 AvB = 17 4 15

Since the dice have matching values, we have some zeros in the matrix. A has only 17 winning values compared to B's 15.

simulate(A,B)

Here is the result all three pairwise comparisons.

A B C A 17:15 15:17 B 15:17 17:15 C 17:15 15:17 \n'); d.write(code_string); // Add copyright line at the bottom if specified. if (copyright.length > 0) { d.writeln(''); d.writeln('%%'); if (copyright.length > 0) { d.writeln('% _' + copyright + '_'); } } d.write('\n'); d.title = title + ' (MATLAB code)'; d.close(); } -->

Get the MATLAB code (requires JavaScript)

Published with MATLAB® R2017a

> % % Photo credit: http://www.grand-illusions.com % % You would get to pick one of the dice. He would then pick one for % himself. You would then roll the dice a number of times, with the % player rolling the highest number each time winning a dollar. % (Rolling the same number is a _push_. No money changes hands.) % Do you want to play? I'll reveal the zinger REPLACE_WITH_DASH_DASH no matter which of % the four dice you choose, Brad can pick one that will win money from % you in the long run. %% Transitivity % A binary mathematical relationship, $\rho$, is said to be _transitive_ % if $x \ \rho \ y$ and $y \ \rho \ z$ implies $x \ \rho \ z$. % _Equality_ is the obvious example. If $x = y$ and $y = z$ then $x = z$. % Other examples include _greater than_, _divides_, and _subset_. %% % On the other hand, a relationship is _nontransitive_ if the % implication does not necessarily hold. Despite their efforts to % make it transitive, _Facebook friend_ is still nontransitive. %% % Ordinarily, the term _dominates_, or _beats_ is % transitive. And, ordinarily, you might think this applies to % dice games. But, no, such thinking could cost you money. % Brad's dice are _nontransitive_. %% Brad's dice % Here are the six values on the faces of Brad's four dice. A = [4 4 4 4 0 0]; B = [3 3 3 3 3 3]; C = [6 6 2 2 2 2]; D = [5 5 5 1 1 1]; %% % How do these compare when paired up against each other? % It is obvious that |A| wins money from |B| because |B| rolls a % 3 all of the time, while |A| rolls a winning 4 two-thirds of the % time. The odds are 2:1 in |A|'s favor. Similarly |B|'s constant % 3's dominate |C|'s 2's two-thirds of the time. The odds are 2:1 % in |B|'s favor. %% % Comparing |C| with |D| is more complicated because neither rolls % a constant value.. We have to compare each possible % roll of one die with all the possible rolls of the other. That's % 36 comparisons. The comparison is facilitated by constructing % the following matrix. % The comparison of a row vector with a column vector is allowed under % MATLAB's new rules of _singleton expansion_. Z = (C > D') - (C < D') %% % The result is a matrix |Z| % with +1 where |C| beats |D| and a -1 where |D| beats |C|. There is a % submatrix of -1s in the upper right, but no solid column or row. % All that remains is to count the number of +1's and -1's to get the odds. odds = [nnz(Z == +1) nnz(Z == -1)] %% % There are twice as many +1s as -1s, so, again, the odds are 2:1 that % |C| beats |D|. %% % But now how does |D| fair against |A|? Z = (D > A') - (D < A') odds = [nnz(Z == +1) nnz(Z == -1)] %% % Again, there are twice as many +1s as -1s. The odds are 2:1 in |D|'s % favor. %% % To summarize, |A| beats |B|, |B| beats |C|, |C| beats |D|, and % |D| beats |A|. No matter which die you choose, Brad can pick % one that will beat it REPLACE_WITH_DASH_DASH and by a substantial margin. It's % like "Rock, Paper, Scissors" with a fourth option, but you have % to make the first move. %% Compare % Let's capture that comparison in a function. % The function can print and label the matrix |Z|, as well as % provide for pushes, which we'll need later. type compare %% % Use the function to compare |A| with |C|. AvC = compare(A,C) %% % Dividing out a common factor, that is odds of 4:5 against |A|. %% % The final possible comparison is |B| versus |D|. |B|'s unimaginative % play results in complete rows of +1 or -1. BvD = compare(B,D) %% % So |B| and |D| are an even match. Here is the complete summary % of all possible pairings among Efron's Dice, %% % A B C D % A 2:1 4:5 1:2 % B 1:2 2:1 1:1 % C 5:4 1:2 2:1 % D 2:1 1:1 1:2 %% Simulate % In case we don't believe these comparisons, or because we'd like to see % the shape of the distribution, or just because it's fun, % let's do some Monte Carlo simulations. Because 108 is divisible % by 36, I'll make 108 rolls one game and simulate 10000 games. % Here is the code. type simulate %% % I'll initialize the random number generator so that I get the same % results every time I |publish| this post. rng(1) %% % Now verify that |A| beats |B|. simulate(A,B); %% % Sure enough, the 2:1 odds imply that |A| can be expected to be % ahead by 108/3 dollars after 108 flips. There is a vertical % black line at zero to emphasize that this isn't a fair game. %% % How about those 4:5 odds for |A| versus |C|? |A| can be expected % to lose 108/9 dollars with 108 flips. simulate(A,C) %% Three dice nontransitive set % Efron's dice were introduced in a _Scientific American_ article % by Martin Gardner in 1970. Since then a number of people have % found other sets of nontransitive dice with interesting properties. % The has an abundance of information, % including the following set. A = [1 1 3 5 5 6]; B = [2 3 3 4 4 5]; C = [1 2 2 4 6 6]; %% % This set is very close to a conventional set. The sum of the % numbers on each die is 21, the same as a conventional die. % And each die differs from a standard |1:6| die in only two % places. Nevertheless, the set is nontransitive. %% % Since this is so close to a conventional set, we expect the % odds to be closer to even. Let's see. AvB = compare(A,B) %% % Since the dice have matching values, we have some zeros in % the matrix. |A| has only 17 winning values compared to |B|'s 15. simulate(A,B) %% % Here is the result all three pairwise comparisons. % % A B C % A 17:15 15:17 % B 15:17 17:15 % C 17:15 15:17 ##### SOURCE END ##### fbd59e85e13747dcaaa91b54ebefd563 -->

### C^5, Cleve’s Corner Collection Card Catalog

Mon, 08/28/2017 - 22:30

I have been writing books, programs, newsletter columns and blogs since 1990. I have now collected all of this material into one repository. Cleve's Corner Collection consists of 458 "documents", all available on the internet. There are

• 150 posts from Cleve's Corner blog.
• 43 columns from Cleve's Corner News and Notes edition.
• 33 chapters from two books, Experiments with MATLAB and Numerical Computing with MATLAB.
• 218 programs from Cleve's Laboratory, EXM and NCM.
• 14 video transcripts from MIT Open Courseware and elsewhere.

C^5 is an app, a search tool that acts like a traditional library card catalog. It allows you to do keyword based searches through the collection and follow links to the material on the net. Responses to queries are ordered by the scores generated by Latent Semantic Indexing, LSI, which employs the singular value decomposition of a term-document matrix of key word counts.

ContentsOpening figure

Here is the opening window for C^5.

c5

Enter a query, usually just a single key word, in the edit box at the top. This is a term. The names of the various documents that are relevant to the term are then displayed, one at a time, in the document box.

The arrow keys allow the document list to be scanned and changed. The LSI score determines the ordering of the list. The term count is the number of times, if any, that the query term appears in the document. The web button accesses a copy of the document on the internet.

Don Knuth

For my first example, let's search for material I have written that mentions Stanford Emeritus Professor of Computer Science, Donald Knuth. Enter "knuth" in the query box or on the command line.

c5 knuth

The first document name "blog/c5_blog.m" refers to this blog post, so there is a bit of self reference happening here. The document suffix is .m because the source texts for my blog are MATLAB programs processed by the publish command.

The term count "10, 10/29" indicates that "knuth" appears 10 times in this document and that so far we have seen 10 out of the 29 times that "knuth" appears in the entire collection.

Click on the log button and then click a dozen or so times on the right arrow. This log of the successive displays is printed in the command window. Document names, dates, and term counts are displayed in decreasing order of LSI score.

% knuth % arrow document term counts lsi date % blog/c5_blog.m 10 10/29 0.594 28-Aug-2017 % > blog/easter_blog.m 5 15/29 0.553 18-Mar-2013 % > blog/lambertw_blog.m 3 18/29 0.183 02-Sep-2013 % > news/stiff_equations.txt 2 20/29 0.182 01-May-2003 % > blog/hilbert_blog.m 2 22/29 0.139 02-Feb-2013 % > blog/random_blog.m 2 24/29 0.139 17-Apr-2015 % > exmm/easter.m 1 25/29 0.112 2016 % > blog/gef.m 3 28/29 0.100 07-Jan-2013 % > ncmm/ode23tx.m 0 28/29 0.086 2016 % > news/normal_behavior.txt 0 28/29 0.070 01-May-2001 % > blog/magic_2.m 0 28/29 0.059 05-Nov-2012 % .......... % >> blog/denorms.m 1 29/29 0.010 21-Jul-2014 .........

The second most relevant document, "easter_blog.m", is a post from 2013 that describes an algorithm, popularized by Knuth, for computing the date each year in the Western, or Gregorian Calendar that Easter Sunday is celebrated. The term count is "5, 15/29", so the first two documents account for slightly over half of the total appearances of the search term.

The next six lines tell us that "knuth" appears in blog posts about the Lambert W function, Hilbert matrices, random numbers, and George Forsythe (gef), as well as a MATLAB News and Notes column in 2003 about stiff differential equations, and the actual MATLAB program from EXM for computing the date of Easter.

The following results with term counts of zero are blog posts that do not contain "knuth", but which have LSI scores indicating they might be relevant. Finally, the blog post named "denorms" is about denormal floating point numbers. It is reached by right-clicking on the right arrow to skip over documents with term counts of zero.

c5setup

I don't know how to parse .html or .pdf files, so I have collected the original source material for everything that I have written that is now available on the web. There are .m files for the blog and MATLAB programs, .tex files for the LaTeX of the book chapters, and .txt files for the newsletter columns and transcripts of the videos. There are 458 files totaling about 3.24 megabytes of text.

I have a program, c5setup, that I run on my own laptop to extract all the individual words and produce the term-document matrix. This is a sparse matrix whose (k,j) -th entry is the number of times that the k -th term appears in the j -th document. It is saved in c5database.mat for use by the c^5 app.

This setup processing eliminates frequently occurring English language words, like "the", on a list of stopwords.

length(stopwords) ans = 177 c5database clear load c5database whos Name Size Bytes Class Attributes A 16315x458 1552776 double sparse D 458x1 40628 string L 1x1 120156 struct T 16315x1 1132930 string
• A is the term-document matrix.
• D is a string array of the file names in my personal repository of the source documents.
• L is a struct containing string arrays used to generate URLs of the documents on the web.
• T is a string array of key words or terms.
Sparsity

The sparsity of the term-document matrix is a little over one percent.

sparsity = nnz(A)/numel(A) sparsity = 0.0130 Spy

Spy plot of the first 1000 rows of the term-document matrix.

clf spy(A(1:1000,:)) Most frequent terms

The row sums are the total term counts.

ttc = sum(A,2);

Find the terms that occur at least 1000 times.

k = find(ttc >= 1000); fprintf('%-10s %6s\n',[T(k) num2str(ttc(k))]') function 1806 matlab 1407 matrix 1499 one 1262 two 1090

Surprise. I write a lot about MATLAB and matrices.

Singular values

We might as well compute all the singular values of the full matrix. It takes less than a second. It's important to use the economical version of the SVD that produces a U the same size as A. Otherwise we'd have a 16,315-by-16,315 U.

tic [U,S,V] = svd(full(A),'econ'); toc Elapsed time is 0.882556 seconds.

A logarithmic plot of the singular values shows that they do not decrease very rapidly.

clf semilogy(diag(S),'.','markersize',10) axis([-10 450 1 1000]) title('singular values') Reduced rank approximation

I wrote a post about Latent Semantic Indexing a month ago. LSI employs a reduced rank approximation to the term-document matrix. c^5 has a slider for choosing the rank. The plot of the singular values shows that the accuracy of the approximation is pretty much independent of the chosen value. Any value except very small values or large values near full rank gives an approximation good to between one and ten percent. The power of LSI does not derive from the approximation accuracy. I usually take the rank to be about half the number of columns.

n = size(A,2); k = n/2; Uk = U(:,1:k); Sk = S(1:k,1:k); Vk = V(:,1:k); relerr = norm(Uk*Sk*Vk'-A)/S(1,1) relerr = 0.0357 Arrow keys

The three arrow keys in the c^5 app can be clicked with either the left or right mouse button (or control-click on a one-button mouse).

• left >: next document, any term count.
• right >: next document with nonzero term count.
• left <: previous document, any term count.
• right <: previous document with nonzero term count.
• left ^: use the root of the current document for the query.
• right ^: use a random term for the query.

Repeatedly clicking the up arrow with the right button (an alt click) is a good way to browse the entire collection.

Lothar Collatz

Let's see the logs for two more examples. Lothar Collatz has a short log.

c5 Collatz % collatz % % arrow document term counts lsi date % blog/threenplus1_blog.m 9 9/19 0.904 19-Jan-2015 % >> blog/collatz_inequality.m 4 13/19 0.108 16-Mar-2015 % >> blog/c5_blog.m 5 18/19 0.075 28-Aug-2017 % >> ncm/intro.tex 1 19/19 -0.003 2004

Collatz appears in two posts from 2015, one on his 3n+1 problem and one on an elegant inequality that produces a surprising graphic, and in the section of this blog post about c^5 that you are now reading, He is also mentioned in the introduction to the NCM book, but the LSI value of very small. The double arrow at the beginning of each line signifies a right click, skipping over documents that do not mention him.

Blackjack

I have written a lot about the card game Blackjack.

c5 blackjack % blackjack % arrow document term counts lsi date % news/simulating_blackjack.txt 19 19/68 0.536 01-Oct-2012 % >> ncmm/ncmgui.m 4 23/68 0.372 2016 % >> blog/random_blog2.m 4 27/68 0.266 04-May-2015 % >> ncmm/Contents.m 2 29/68 0.244 2016 % >> blog/c5_blog.m 5 34/68 0.206 28-Aug-2017 % >> ncm/random.tex 13 47/68 0.148 2004 % >> lab/thumbnails2.m 2 49/68 0.088 2017 % >> lab/lab2.m 1 50/68 0.061 2017 % >> news/numerical_computing.txt 1 51/68 0.025 01-Jun-2004 % >> blog/lab_blog.m 1 52/68 0.004 31-Oct-2016 % >> ncmm/blackjack.m 8 60/68 -0.023 2016 % >> lab/blackjack.m 8 68/68 -0.026 2017

We can see two newsletter columns, three blogs, a portion of a book chapter, several code segments, and two copies of the blackjack app. Again, I am using right clicks.

Levenshtein distance

I recently wrote a blog post about Levenshtein Edit Distance Between Strings. If c^5 does not recognize the key word in a query, it uses Levenshtein distance to find the closest match in the term list to the unrecognized query. This easily corrects simple spelling mistakes, like missing letters. For example the missing "i" in "polynomal" is corrected to become "polynomial". And "Levenstein" becomes "levenshtein".

I received a pleasant surprise when I entered "Molar", expecting it to become "moler". Instead, I got "polar" because only one substitution is required to convert "Molar" to "polar", but two substitutions are required to turn "Molar" into "moler". (Microsoft Word spelling correction used to turn "MATLAB" into "Meatball".)

Multi-word queries

I'm not quite sure what to do with queries consisting of more than one term. What is the expected response to a query of "Wilkinson polynomial", for example? Is it documents that contain either "Wilkinson" or "polynomial"? This is what LSI would provide. But it is probably better to look for documents that contain both "Wilkinson" and "polynomial". I'm not sure how to do this.

Worse yet, I can't look for an exact match to the two-word string "Wilkinson polynomial" because the first thing the setup program does is to break text into individual words.

Stemming

This project is not finished. If I work on it any more, I am going to have learn about scraping, stemming and lemmatization of the source texts. This involves relatively simple tasks like removing possessives and plurals and more complicated tasks like combining all the words with the same root or lemma. The sentence

"the quick brown fox jumped over the lazy dog's back"

becomes

"the quick brown fox jump over the lazi dog' back"

Loren's guest blogger Toshi Takeuchi posted an article in 2015 about Latent Semantic Analysis with MATLAB. He references MATLAB code for stemming.

Parsing queries

I can imagine doing a better job of parsing queries, although I could never approach the sophistication of a system like Google or Siri.

Limitations

A significant fraction of what I have written is not prose -- it is mathematics or code. It cannot be parsed with the techniques of text analytics. For example, the source texts for the books NCM and EXM have hundreds of snippets of LaTeX like

\begin{eqnarray*} A V \eqs U \Sigma , \\ A^H U \eqs V \Sigma^H . \end{eqnarray*}

And earlier in this blog post I had

tic [U,S,V] = svd(full(A),'econ'); toc

My c5setup program now has to skip over everything like this. In doing so, it misses much the message.

Software

I had updated Cleve's Laboratory in the Central File Exchange to include c5.m and c5database.mat.

\n'); d.write(code_string); // Add copyright line at the bottom if specified. if (copyright.length > 0) { d.writeln(''); d.writeln('%%'); if (copyright.length > 0) { d.writeln('% _' + copyright + '_'); } } d.write('\n'); d.title = title + ' (MATLAB code)'; d.close(); } -->

Get the MATLAB code (requires JavaScript)

Published with MATLAB® R2017a

### Levenshtein Edit Distance Between Strings

Mon, 08/14/2017 - 22:30

How can you measure the distance between two words? How can you find the closest match to a word in a list of words? The Levenshtein distance between two strings is the number of single character deletions, insertions, or substitutions required to transform one string into the other. This is also known as the edit distance.

Vladimir Levenshtein is a Russian mathematician who published this notion in 1966. I am using his distance measure in a project that I will describe in a future post. Other applications include optical character recognition (OCR), spell checking, and DNA sequencing.

Contents"Hi ho, hi ho"

For my examples, I will use these seven famous names.

T = {"Doc","Grumpy","Happy","Sleepy","Bashful","Sneezy","Dopey"} T = 1×7 cell array Columns 1 through 5 ["Doc"] ["Grumpy"] ["Happy"] ["Sleepy"] ["Bashful"] Columns 6 through 7 ["Sneezy"] ["Dopey"]

The strings "Sleepy" and "Sneezy" are close to each other because they are the same length, and we can transform "Sleepy" to "Sneezy" by two edit operations, change 'l' to 'n' and 'p' to 'z'. The Levenshtein distance between these two words is 2.

On the other hand, "Bashful" is not close to his friends. His name is longer and the only letter he shares with another is an 'a' with "Happy". So his distance to "Happy" is 6, while the distance to any of the others is 7, the length of his name.

Here is the matrix of pair-wise distances

for i = 1:7 for j = 1:7 D(i,j) = lev(T{i},T{j}); end end D D = 0 6 5 6 7 6 3 6 0 4 4 7 5 5 5 4 0 4 6 5 3 6 4 4 0 7 2 4 7 7 6 7 0 7 7 6 5 5 2 7 0 4 3 5 3 4 7 4 0

Bashful's column is all 7's, except for a 6 in Happy's row and a 0 on the diagonal.

A bar graph of the total distances shows Bashful's name is the furthest from all the others, and that Dopey's is the closest, but just barely.

bar(sum(D)) title('Total Levenshtein Distance') set(gca,'xticklabels',T) Recursive

Here is a recursive program that provides one way to compute lev(s,t). It compares each character in a string with all of the characters in the other string. For each comparison, a cost c is added to a total that is accumulated by the recursion. The cost of one comparison is 0 if the pair is a match and 1 if it isn't.

type levr function d = levr(s,t,m,n) % Levenshtein distance between strings, recursive implementation. % levr(s,t) is the number of deletions, insertions, % or substitutions required to transform s to t. % m = length(s), n = length(t), levr(s,t,m,n) initiates recursion. % https://en.wikipedia.org/wiki/Levenshtein_distance if nargin == 2 s = char(s); t = char(t); d = levr(s,t,length(s),length(t)); elseif m == 0 d = n; elseif n == 0 d = m; else c = s(m) ~= t(n); % c = 0 if chars match, 1 if not. d = min([levr(s,t,m-1,n) + 1 levr(s,t,m,n-1) + 1 levr(s,t,m-1,n-1) + c]); end end

Like all recursive programs, this code is impractical to use in practice with long strings or long lists of words because it repeatedly compares the same pairs of characters. The complexity is exponential in the lengths of the strings.

Matrix Memory

A time-memory tradeoff can be made by allocating storage to save the results of individual comparisons. The memory involved is a matrix of size (m+1)-by-(n+1) where m and n are the lengths of the two strings, so it's O(m*n). The time complexity is also O(m*n).

type levm function d = levm(s,t) % Levenshtein distance between strings, matrix implementation. % levr(s,t) is the number of deletions, insertions, % or substitutions required to transform s to t. % https://en.wikipedia.org/wiki/Levenshtein_distance s = char(s); t = char(t); m = length(s); n = length(t); D = zeros(m+1,n+1); D(:,1) = (0:m)'; D(1,:) = (0:n); for i = 1:m for j = 1:n c = s(i) ~= t(j); % c = 0 if chars match, 1 if not. D(i+1,j+1) = min([D(i,j+1) + 1 D(i+1,j) + 1 D(i,j) + c]); end end levm_print(s,t,D) d = D(m+1,n+1); end Details

I've included a print routine so we can see some detail. Let's begin by finding the distance between a single letter and a word that doesn't contain that letter. The distance is the length n of the word because one substitution and n-1 deletions are required.

d = levm("S","Dopey") D o p e y S 1 2 3 4 5 d = 5

Now let's have one character match. In this case the character is 'e'.

d = levm("Sle","Dopey") D o p e y S 1 2 3 4 5 l 2 2 3 4 5 e 3 3 3 3 4 d = 4

Finally, two full words. The distance is the last entry in the last row or column of the matrix.

d = levm("Sleepy","Dopey") D o p e y S 1 2 3 4 5 l 2 2 3 4 5 e 3 3 3 3 4 e 4 4 4 3 4 p 5 5 4 4 4 y 6 6 5 5 4 d = 4 Best Program

We don't need storage for the whole matrix, just two rows. The storage cost is now linear in the lengths of the strings. This is the most efficient of my three functions.

type lev function d = lev(s,t) % Levenshtein distance between strings or char arrays. % lev(s,t) is the number of deletions, insertions, % or substitutions required to transform s to t. % https://en.wikipedia.org/wiki/Levenshtein_distance s = char(s); t = char(t); m = length(s); n = length(t); x = 0:n; y = zeros(1,n+1); for i = 1:m y(1) = i; for j = 1:n c = (s(i) ~= t(j)); % c = 0 if chars match, 1 if not. y(j+1) = min([y(j) + 1 x(j+1) + 1 x(j) + c]); end % swap [x,y] = deal(y,x); end d = x(n+1); end Metric Space

The function lev makes the set of strings a metric space. That's because of four properties. The distance from an element to itself is zero.

d(x,x) = 0

Otherwise the distance is positive.

d(x,y) > 0 if x ~= y

Distance is symmetric.

d(x,y) = d(y,x)

The triangle inequality, for any three elements,

d(x,y) <= d(x,w) + d(w,y) 44 Programming Languages

I usually avoid programming language debates, but here are implementations in 44 different programming languages, including one in MATLAB. I prefer my own code.

\n'); d.write(code_string); // Add copyright line at the bottom if specified. if (copyright.length > 0) { d.writeln(''); d.writeln('%%'); if (copyright.length > 0) { d.writeln('% _' + copyright + '_'); } } d.write('\n'); d.title = title + ' (MATLAB code)'; d.close(); } -->

Get the MATLAB code (requires JavaScript)

Published with MATLAB® R2017a

. %% "Hi ho, hi ho" % For my examples, I will use these seven famous names. T = {"Doc","Grumpy","Happy","Sleepy","Bashful","Sneezy","Dopey"} %% % The strings "Sleepy" and "Sneezy" are close to each other because they % are the same length, and we can transform "Sleepy" to "Sneezy" by two % edit operations, change 'l' to 'n' and 'p' to 'z'. The Levenshtein % distance between these two words is 2. %% % On the other hand, "Bashful" is not close to his friends. His name % is longer and the only letter he shares with another is an 'a' with % "Happy". So his distance to "Happy" is 6, while the distance to % any of the others is 7, the length of his name. %% % Here is the matrix of pair-wise distances for i = 1:7 for j = 1:7 D(i,j) = lev(T{i},T{j}); end end D %% % Bashful's column is all 7's, except for a 6 in Happy's row and a 0 % on the diagonal. %% % A bar graph of the total distances shows Bashful's name is the % furthest from all the others, and that Dopey's is the closest, % but just barely. bar(sum(D)) title('Total Levenshtein Distance') set(gca,'xticklabels',T) %% Recursive % Here is a recursive program that provides one way to compute |lev(s,t)|. % It compares each character in a string with all of the characters in % the other string. For each comparison, a cost |c| is added to a % total that is accumulated by the recursion. The cost of one comparison % is 0 if the pair is a match and 1 if it isn't. type levr %% % Like all recursive programs, this code is impractical to use in % practice with long strings or long lists of words because it % repeatedly compares the same pairs of characters. The complexity % is exponential in the lengths of the strings. %% Matrix Memory % A time-memory tradeoff can be made by allocating storage to save the % results of individual comparisons. The memory involved is a matrix of % size (m+1)-by-(n+1) where m and n are the lengths of the two strings, % so it's O(m*n). The time complexity is also O(m*n). type levm %% Details % I've included a print routine so we can see some detail. % Let's begin by finding the distance between a single letter and a word % that doesn't contain that letter. The distance is the length n of the % word because one substitution and n-1 deletions are required. d = levm("S","Dopey") %% % Now let's have one character match. In this case the character is 'e'. d = levm("Sle","Dopey") %% % Finally, two full words. The distance is the last entry in the last % row or column of the matrix. d = levm("Sleepy","Dopey") %% Best Program % We don't need storage for the whole matrix, just two rows. % The storage cost is now linear in the lengths of the strings. % This is the most efficient of my three functions. type lev %% Metric Space % The function |lev| makes the set of strings a _metric space_. % That's because of four properties. The distance from an element to % itself is zero. % % d(x,x) = 0 % % Otherwise the distance is positive. % % d(x,y) > 0 if x ~= y % % Distance is _symmetric_. % % d(x,y) = d(y,x) % % The triangle inequality, for any three elements, % % d(x,y) <= d(x,w) + d(w,y) %% 44 Programming Languages % I usually avoid programming language debates, but % are implementations in 44 different programming languages, % including one in MATLAB. I prefer my own code. ##### SOURCE END ##### 6edce509ea604c6fbef7aee4ee314bbd -->

### Latent Semantic Indexing, SVD, and Zipf’s Law

Mon, 07/31/2017 - 22:30

Latent Semantic Indexing, LSI, uses the Singular Value Decomposition of a term-by-document matrix to represent the information in the documents in a manner that facilitates responding to queries and other information retrieval tasks. I set out to learn for myself how LSI is implemented. I am finding that it is harder than I thought.

ContentsBackground

Latent Semantic Indexing was first described in 1990 by Susan Dumais and several colleagues at Bellcore, the descendant of the famous A.T.&T. Bell Labs. (Dumais is now at Microsoft Research.) I first heard about LSI a couple of years later from Mike Berry at the University of Tennessee. Berry, Dumais and Gavin O'Brien have a paper, "Using Linear Algebra for Intelligent Information Retrieval" in SIAM Review in 1995. Here is a link to a preprint of that paper.

An important problem in information retrieval involves synonyms. Different words can refer to the same, or very similar, topics. How can you retrieve a relevant document that does not actually use a key word in a query? For example, consider the terms

• FFT (Finite Fast Fourier Transform)
• SVD (Singular Value Decomposition)
• ODE (Ordinary Differential Equation

Someone looking for information about PCA (Principal Component Analysis) would be more interested in documents about SVD than those about the other two topics. It is possible to discover this connection because documents about PCA and SVD are more likely to share terms like "rank" and "subspace" that are not present in the others. For information retrieval purposes, PCA and SVD are synonyms. Latent Semantic Indexing can reveal such connections.

Strings

I will make use of the new string object, introduced in recent versions of MATLAB. The double quote has been an illegal character in MATLAB. But now it delineate strings.

s = "% Let $A$ be the 4-by-4 magic square from Durer's _Melancholia_." s = "% Let $A$ be the 4-by-4 magic square from Durer's _Melancholia_."

To erase the leading percent sign and the LaTeX between the dollar signs.

s = erase(s,'%') s = eraseBetween(s,'$','$','Boundaries','inclusive') s = " Let $A$ be the 4-by-4 magic square from Durer's _Melancholia_." s = " Let be the 4-by-4 magic square from Durer's _Melancholia_." Cleve's Corner

The documents for this project are the MATLAB files that constitute the source text for this blog. I started writing the blog edition of Cleve's Corner a little over five years ago, in June 2012.

Each post comes from a MATLAB file processed by the publish command. Most of the lines in the file are comments beginning with a single %. These become the prose in the post. Comments beginning with a double %% generate the title and section headings. Text within '|' characters is rendered with a fixed width font to represent variables and code. Text within '$' characters is LaTeX that is eventually typeset by MathJax. I have collected five years' worth of source files in one directory, Blogs, on my computer. The statement D = doc_list('Blogs'); produces a string array of the file names in Blogs. The first few lines of D are D(1:5) ans = 5×1 string array "apologies_blog.m" "arrowhead2.m" "backslash.m" "balancing.m" "biorhythms.m" The statements n = length(D) n/5 n = 135 ans = 27 tell me how many posts I've made, and how many posts per year. That's an average of about one every two weeks. The file that generates today's post is included in the library. for j = 1:n if contains(D{j},"lsi") j D{j} end end j = 75 ans = 'lsi_blog.m' Read posts The following code prepends each file name with the directory name, reads each post into a string s, counts the total number of lines that I've written, and computes the average number of lines per post. lines = 0; for j = 1:n s = read_blog("Blogs/"+D{j}); lines = lines + length(s); end lines avg = lines/n lines = 15578 avg = 115.3926 Stop words The most common word in Cleve's Corner blog is "the". By itself, "the" accounts for over 8 percent of the total number of words. This is roughly the same percentage seen in larger samples of English prose. Stop words are short, frequently used words, like "the", that can be excluded from frequency counts because they are of little value when comparing documents. Here is a list of 25 commonly used stop words. I am going to use a brute force, easy to implement, stategy for stop words. I specify an integer parameter, stop, and simply ignore all words with stop or fewer characters. A value of zero for stop means do not ignore any values. After some experiments which I am about to show, I will decide to take stop = 5. Find terms Here is the core code of this project. The input is the list of documents D and the stop parameter stop. The code scans all the documents, finding and counting the individual words. In the process it • skips all noncomment lines • skips all embedded LaTeX • skips all code fragments • skips all URLs • ignores all words with stop or fewer characters The code prints some intermediate results and then returns three • T, string array of unique words. These are the terms. • C, the frequency counts of T. • A, the (sparse) term/document matrix. The term/document matrix is m -by- n, where • m = length of T is the number of terms, • n = length of D is the number of documents. type find_terms function [T,C,A] = find_terms(D,stop) % [T,C,A] = find_terms(D,stop) % Input % D = document list % stop = length of stop words % Output % T = terms, sorted alphabetically % C = counts of terms % A = term/document matrix [W,Cw,wt] = words_and_counts(D,stop); T = terms(W); m = length(T); fprintf('\n\n stop = %d\n words = %d\n m = %d\n\n',stop,wt,m) A = term_doc_matrix(W,Cw,T); C = full(sum(A,2)); [Cp,p] = sort(C,'descend'); Tp = T(p); % Terms sorted by frequency freq = Cp/wt; ten = 10; fprintf(' index term count fraction\n') for k = 1:ten fprintf('%8d %12s %8d %9.4f\n',k,Tp(k),Cp(k),freq(k)) end total = sum(freq(1:ten)); fprintf('%s total = %7.4f\n',blanks(24),total) end No stop words Let's run that code with stop set to zero so there are no stop words. The ten most frequently used words account for over one-quarter of all the words and are useless for characterizing documents. Note that, by itself, "the" accounts for over 8 percent of all the words. stop = 0; find_terms(D,stop); stop = 0 words = 114512 m = 8896 index term count fraction 1 the 9404 0.0821 2 of 3941 0.0344 3 a 2885 0.0252 4 and 2685 0.0234 5 is 2561 0.0224 6 to 2358 0.0206 7 in 2272 0.0198 8 that 1308 0.0114 9 for 1128 0.0099 10 this 969 0.0085 total = 0.2577 Three characters or less Skipping all words with three or less characters cuts the total number of words almost in half and eliminates "the", but still leaves mostly uninteresting words in the top ten. stop = 3; find_terms(D,stop); stop = 3 words = 67310 m = 8333 index term count fraction 1 that 1308 0.0194 2 this 969 0.0144 3 with 907 0.0135 4 matrix 509 0.0076 5 from 481 0.0071 6 matlab 472 0.0070 7 have 438 0.0065 8 about 363 0.0054 9 first 350 0.0052 10 point 303 0.0045 total = 0.0906 Five characters or less Setting stop to 4 is a reasonable choice, but let's be even more aggressive. With stop equal to 5, the ten most frequent words are now much more characteristic of this blog. All further calculations use these results. stop = 5; [T,C,A] = find_terms(D,stop); m = length(T); stop = 5 words = 38856 m = 6459 index term count fraction 1 matrix 509 0.0131 2 matlab 472 0.0121 3 function 302 0.0078 4 number 224 0.0058 5 floating 199 0.0051 6 precision 184 0.0047 7 computer 167 0.0043 8 algorithm 164 0.0042 9 values 160 0.0041 10 numerical 158 0.0041 total = 0.0653 Zipf's Law This is just an aside. Zipf's Law is named after George Kingsley Zipf, although he did not claim originality. The law states that in a sample of natural language, the frequency of any word is inversely proportional to its index in the frequency table. In other words, if the frequency counts C are sorted by frequency, then a plot of log(C(1:k)) versus log(1:k) should be a straight line. Our frequency counts only vaguely follow this law, but a log-log plot makes many quantities look proportional. (We get a little better conformance to the Law with smaller values of stop.) Zipf(C) SVD Now we're ready to compute the SVD. We might as well make A full; it has only n = 135 columns. But A is very tall and skinny, so it's important to use the 'econ' flag so that U also has only 135 columns. Without this flag, we would asking for a square m -by- m matrix U with m over 6000. [U,S,V] = svd(full(A),'econ'); Query Here is a preliminary stab at a function to make queries. type query function posts = query(queries,k,cutoff,T,U,S,V,D) % posts = query(queries,k,cutoff,T,U,S,V,D) % Rank k approximation to term/document matrix. Uk = U(:,1:k); Sk = S(1:k,1:k); Vk = V(:,1:k); % Construct the query vector from the relevant terms. m = size(U,1); q = zeros(m,1); for i = 1:length(queries) % Find the index of the query key word in the term list. wi = word_index(queries{i},T); q(wi) = 1; end % Project the query onto the document space. qhat = Sk\Uk'*q; v = Vk*qhat; v = v/norm(qhat); % Pick off the documents that are close to the query. posts = D(v > cutoff); query_plot(v,cutoff,queries) end Try it Set the rank k and the cutoff level. rank = 40; cutoff = .2; Try a few one-word queries. queries = {"singular","roundoff","wilkinson"}; for j = 1:length(queries) queryj = queries{j} posts = query(queryj,rank,cutoff,T,U,S,V,D) snapnow end queryj = "singular" posts = 4×1 string array "eigshowp_w3.m" "four_spaces_blog.m" "parter_blog.m" "svdshow_blog.m" queryj = "roundoff" posts = 5×1 string array "condition_blog.m" "floats.m" "partial_pivot_blog.m" "refine_blog.m" "sanderson_blog.m" queryj = "wilkinson" posts = 2×1 string array "dubrulle.m" "jhw_1.m" Note that the query about Wilkinson found the post about him and also the post about Dubrulle, who improved a Wilkinson algorithm. For the finale, merge all the queries into one. queries posts = query(queries,rank,cutoff,T,U,S,V,D) queries = 1×3 cell array ["singular"] ["roundoff"] ["wilkinson"] posts = 5×1 string array "four_spaces_blog.m" "jhw_1.m" "parter_blog.m" "refine_blog.m" "svdshow_blog.m" ------------------------------------------------------------------ Code doc_list type doc_list function D = doc_list(dir) % D = doc_list(dir) is a string array of the file names in dirtory dir. D = ""; [status,L] = system(['ls ' dir]); if status ~= 0 error(['Cannot find ' dir]) end k1 = 1; i = 1; for k2 = find(L == newline) D(i,:) = L(k1:k2-1); i = i+1; k1 = k2+1; end end erase_stuff type erase_stuff function sout = erase_stuff(s) % s = erase_stuff(s) % erases noncomments, !sysltem, LaTeX, URLs, Courier, and punctuation. j = 1; k = 1; while k <= length(s) sk = lower(char(s(k))); if length(sk) >= 4 if sk(1) ~= '%' % Skip non comment break elseif any(sk=='!') && any(sk=='|') % Skip !system | ... break elseif all(sk(3:4) == '$') sk(3:4) = ' '; % Skip display latex while all(sk~='$') k = k+1; sk = char(s(k)); end else % Embedded latex sk = eraseBetween(sk,'$','$','Boundaries','inclusive'); % URL if any(sk == '<') if ~any(sk == '>') k = k+1; sk = [sk lower(char(s(k)))]; end sk = eraseBetween(sk,'<','>','Boundaries','inclusive'); sk = erase(sk,'>'); end if contains(string(sk),"http") break end % Courier f = find(sk == '|'); assert(length(f)==1 | mod(length(f),2)==0); for i = 1:2:length(f)-1 w = sk(f(i)+1:f(i+1)-1); if length(w)>2 && all(w>='a') && all(w<='z') sk(f(i)) = ' '; sk(f(i+1)) = ' '; else sk(f(i):f(i+1)) = ' '; end end % Puncuation sk((sk<'a' | sk>'z') & (sk~=' ')) = []; sout(j,:) = string(sk); j = j+1; end end k = k+1; end skip = k-j; end find_words type find_words function w = find_words(s,stop) % words(s,stop) Find the words with length > stop in the text string. w = ""; i = 1; for k = 1:length(s) sk = [' ' char(s{k}) ' ']; f = strfind(sk,' '); for j = 1:length(f)-1 t = strtrim(sk(f(j):f(j+1))); % Skip stop words if length(t) <= stop continue end if ~isempty(t) w{i,1} = t; i = i+1; end end end query_plot type query_plot function query_plot(v,cutoff,queries) clf shg plot(v,'.','markersize',12) ax = axis; yax = 1.1*max(abs(ax(3:4))); axis([ax(1:2) -yax yax]) line(ax(1:2),[cutoff cutoff],'color','k') title(sprintf('%s, ',queries{:})) end read_blog type read_blog function s = read_blog(filename) % read_blog(filename) % skip over lines that do not begin with '%'. fid = fopen(filename); line = fgetl(fid); k = 1; while ~isequal(line,-1) % -1 signals eof if length(line) > 2 && line(1) == '%' s(k,:) = string(line); k = k+1; end line = fgetl(fid); end fclose(fid); end term_doc_matrix type term_doc_matrix function A = term_doc_matrix(W,C,T) % A = term_doc_matrix(W,C,T) m = length(T); n = length(W); A = sparse(m,n); for j = 1:n nzj = length(W{j}); i = zeros(nzj,1); s = zeros(nzj,1); for k = 1:nzj i(k) = word_index(W{j}(k),T); s(k) = C{j}(k); end A(i,j) = s; end end word_count type word_count function [wout,count] = word_count(w) % [w,count] = word_count(w) Unique words with counts. w = sort(w); wout = ""; count = []; i = 1; k = 1; while k <= length(w) c = 1; while k < length(w) && isequal(w{k},w{k+1}) c = c+1; k = k+1; end wout{i,1} = w{k}; count(i,1) = c; i = i+1; k = k+1; end [count,p] = sort(count,'descend'); wout = wout(p); end word_index type word_index function p = word_index(w,list) % Index of w in list. % Returns empty if w is not in list. m = length(list); p = fix(m/2); q = ceil(p/2); t = 0; tmax = ceil(log2(m)); % Binary search while list(p) ~= w if list(p) > w p = max(p-q,1); else p = min(p+q,m); end q = ceil(q/2); t = t+1; if t == tmax p = []; break end end end words_and_counts type words_and_counts function [W,Cw,wt] = words_and_counts(D,stop) % [W,Cw,wt] = words_and_counts(D,stop) W = []; % Words Cw = []; % Counts wt = 0; n = length(D); for j = 1:n s = read_blog("Blogs/"+D{j}); s = erase_stuff(s); w = find_words(s,stop); [W{j,1},Cw{j,1}] = word_count(w); wt = wt + length(w); end end Zipf type Zipf function Zipf(C) % Zipf(C) plots log2(C(1:k)) vs. log2(1:k) C = sort(C,'descend'); % Sort counts by frequency figure(1) clf shg k = 256; plot(log2(1:k),log2(C(1:k)),'.','markersize',12) axis([-1 8 4 10]) xlabel('log2(index)') ylabel('log2(count)') title('Zipf''s Law') drawnow end \n'); d.write(code_string); // Add copyright line at the bottom if specified. if (copyright.length > 0) { d.writeln(''); d.writeln('%%'); if (copyright.length > 0) { d.writeln('% _' + copyright + '_'); } } d.write('\n'); d.title = title + ' (MATLAB code)'; d.close(); } --> Get the MATLAB code (requires JavaScript) Published with MATLAB® R2017a in 1995. % % to a preprint of that paper. %% % An important problem in information retrieval involves synonyms. % Different words can refer to the same, or very similar, topics. % How can you retrieve a relevant document that does not actually % use a key word in a query? For example, consider the terms % % * FFT (Finite Fast Fourier Transform) % * SVD (Singular Value Decomposition) % * ODE (Ordinary Differential Equation % % Someone looking for information about PCA (Principal Component % Analysis) would be more interested in documents about SVD than those % about the other two topics. It is possible to discover this connection % because documents about PCA and SVD are more likely to share terms % like "rank" and "subspace" that are not present in the others. % For information retrieval purposes, PCA and SVD are synonyms. % Latent Semantic Indexing can reveal such connections. %% Strings % I will make use of the new |string| object, introduced in recent versions % of MATLAB. The double quote has been an illegal character in MATLAB. % But now it delineate strings. s = "% Let$A$be the 4-by-4 magic square from Durer's _Melancholia_." %% % To erase the leading percent sign and the LaTeX between the dollar signs. s = erase(s,'%') s = eraseBetween(s,'$','$','Boundaries','inclusive') %% Cleve's Corner % The documents for this project are the MATLAB files that constitute the % source text for this blog. I started writing the blog edition of % Cleve's Corner a little over five years ago, in June 2012. %% % Each post comes from a MATLAB file processed by the |publish| command. % Most of the lines in the file are comments beginning with a single |%|. % These become the prose in the post. Comments beginning with a double % |%%| generate the title and section headings. % Text within '|' characters % is rendered with a fixed width font to represent variables and code. % Text within '$' characters is LaTeX that is eventually typeset by % MathJax. %% % I have collected five years' worth of source files in one directory, % |Blogs|, on my computer. The statement D = doc_list('Blogs'); %% % produces a string array of the file names in |Blogs|. % The first few lines of |D| are D(1:5) %% % The statements n = length(D) n/5 %% % tell me how many posts I've made, and how many posts per year. % That's an average of about one every two weeks. %% % The file that generates today's post is included in the library. for j = 1:n if contains(D{j},"lsi") j D{j} end end %% Read posts % The following code prepends each file name with the directory name, % reads each post into a string |s|, counts the total number of lines % that I've written, and computes the average number of lines per post. lines = 0; for j = 1:n s = read_blog("Blogs/"+D{j}); lines = lines + length(s); end lines avg = lines/n %% Stop words % The most common word in Cleve's Corner blog is "the". % By itself, "the" accounts for over 8 percent of the total number of % words. This is roughly the same percentage seen in larger samples % of English prose. %% % _Stop words_ are short, frequently used words, like "the", that can be % excluded from frequency counts because they are of little value when % comparing documents. % of 25 commonly used stop words. %% % I am going to use a brute force, easy to implement, stategy for stop % words. I specify an integer parameter, |stop|, and simply ignore % all words with |stop| or fewer characters. A value of zero for |stop| % means do not ignore any values. After some experiments which I am % about to show, I will decide to take |stop = 5|. %% Find terms % Here is the core code of this project. The input is the list of % documents |D| and the stop parameter |stop|. The code scans all % the documents, finding and counting the individual words. % In the process it % % * skips all noncomment lines % * skips all embedded LaTeX % * skips all code fragments % * skips all URLs % * ignores all words with |stop| or fewer characters %% % The code prints some intermediate results and then returns three % % * T, string array of unique words. These are the terms. % * C, the frequency counts of T. % * A, the (sparse) term/document matrix. %% % The term/document matrix is |m| -by- |n|, where % % * |m| = length of T is the number of terms, % * |n| = length of D is the number of documents. type find_terms %% No stop words % Let's run that code with |stop| set to zero so there are no stop words. % The ten most frequently used words account for over one-quarter of % all the words and are useless for characterizing documents. % Note that, by itself, "the" accounts for over 8 percent of all the words. stop = 0; find_terms(D,stop); %% Three characters or less % Skipping all words with three or less characters cuts the total number % of words almost in half and eliminates "the", % but still leaves mostly uninteresting words in the top ten. stop = 3; find_terms(D,stop); %% Five characters or less % Setting |stop| to 4 is a reasonable choice, but let's be even more % aggressive. With |stop| equal to 5, the ten most frequent words are % now much more characteristic of this blog. All further calculations % use these results. stop = 5; [T,C,A] = find_terms(D,stop); m = length(T); %% Zipf's Law % This is just an aside. is named after George Kingsley Zipf, although he did % not claim originality. The law states that in a sample of natural % language, the frequency of any word is inversely proportional to % its index in the frequency table. %% % In other words, if the frequency counts |C| are sorted by % frequency, then a plot of |log(C(1:k))| versus |log(1:k)| % should be a straight line. Our frequency counts % only vaguely follow this law, but a log-log plot makes many % quantities look proportional. (We get a little better conformance % to the Law with smaller values of |stop|.) Zipf(C) %% SVD % Now we're ready to compute the SVD. We might as well make |A| full; % it has only |n| = 135 columns. But |A| is very tall and skinny, % so it's important to use the |'econ'| flag so that |U| also has only % 135 columns. Without this flag, we would asking for a square % |m| -by- |m| matrix |U| with |m| over 6000. [U,S,V] = svd(full(A),'econ'); %% Query % Here is a preliminary stab at a function to make queries. type query %% Try it %% % Set the rank k and the cutoff level. rank = 40; cutoff = .2; %% % Try a few one-word queries. queries = {"singular","roundoff","wilkinson"}; for j = 1:length(queries) queryj = queries{j} posts = query(queryj,rank,cutoff,T,U,S,V,D) snapnow end %% % Note that the query about Wilkinson found the post about him % and also the post about Dubrulle, who improved a Wilkinson algorithm. %% % For the finale, merge all the queries into one. queries posts = query(queries,rank,cutoff,T,U,S,V,D) %% % REPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASHREPLACE_WITH_DASH_DASH %% Code % *doc_list* type doc_list %% % *erase_stuff* type erase_stuff %% % *find_words* type find_words %% % *query_plot* type query_plot %% % *read_blog* type read_blog %% % *term_doc_matrix* type term_doc_matrix %% % *word_count* type word_count %% % *word_index* type word_index %% % *words_and_counts* type words_and_counts %% % *Zipf* type Zipf ##### SOURCE END ##### 24969ac7161f427885d977875c2db9e3 -->

### What is the Condition Number of a Matrix?

Tue, 07/18/2017 - 07:28

A couple of questions in comments on recent blog posts have prompted me to discuss matrix condition numbers.

• Can you comment on when the condition number gives a tight estimate of the error in a computed inverse and whether there is a better estimator?

• Do you have any idea of the slowdown factor ... for large linear equation solves with extremely ill-conditioned matrices?

My short answers are that the error estimate is rarely tight, but that it is not possible to find a better one, and that it takes the same amount of time to solve ill-conditioned linear equations as it does to solve well-conditioned systems.

ContentsCondition Number for Inversion

I should first point out that there are many different condition numbers and that, although the questioners may not have realized it, they were asking about just one of them -- the condition number for inversion. In general, a condition number applies not only to a particular matrix, but also to the problem being solved. Are we inverting the matrix, finding its eigenvalues, or computing the exponential? The list goes on. A matrix can be poorly conditioned for inversion while the eigenvalue problem is well conditioned. Or, vice versa.

A condition number for a matrix and computational task measures how sensitive the answer is to perturbations in the input data and to roundoff errors made during the solution process.

When we simply say a matrix is "ill-conditioned", we are usually just thinking of the sensitivity of its inverse and not of all the other condition numbers.

Norms

In order to make these notions more precise, let's start with a vector norm. Specifically, the Euclidean norm or 2- norm.

$$\|x\| \ = \ (\sum_i x_i^2)^{1/2}$$

This is the "as the crow flies" distance in n-dimensional space.

The corresponding norm of a matrix $A$ measures how much the mapping induced by that matrix can stretch vectors.

$$M \ = \ \|A\| \ = \ {\max {{\|Ax\|} \over {\|x\|}}}$$

It is sometimes also important to consider how much a matrix can shrink vectors.

$$m \ = \ {\min {{\|Ax\|} \over {\|x\|}}}$$

The reciprocal of the minimum stretching is the norm of the inverse, because

$$m \ = \ {\min {{\|Ax\|} \over {\|x\|}}} \ = \ {\min {{\|y\|} \over {\|A^{-1} y\|}}} \ = \ 1/{\max {{\|A^{-1} y\|} \over {\|y\|}}} \ = \ 1/\|A^{-1}\|$$

A singular matrix is one that can map nonzero vectors into the zero vector. For a singular matrix

$$m \ = \ 0$$

and the inverse does not exist.

The ratio of the maximum to minimum stretching is the condition number for inversion.

$$\kappa(A) \ = \ {{M} \over {m}}$$

An equivalent definition is

$$\kappa(A) \ = \ \|A\| \|A^{-1}\|$$

If a matrix is singular, then its condition number is infinite.

Linear equations

The condition number $\kappa(A)$ is involved in the answer to the question: How much can a change in the right hand side of a system of simultaneous linear equations affect the solution? Consider a system of equations

$$A x \ = \ b$$

and a second system obtained by altering the right-hand side

$$A(x + \delta x) = b + \delta b$$

Think of $\delta b$ as being the error in $b$ and $\delta x$ as being the resulting error in $x$, although we need not make any assumptions that the errors are small. Because $A (\delta x) = \delta b$, the definitions of $M$ and $m$ immediately lead to

$$\|b\| \leq M \|x\|$$

and

$$\|\delta b\| \geq m \|\delta x\|$$

Consequently, if $m \neq 0$,

$${\|\delta x\| \over \|x\|} \leq \kappa(A) {\|\delta b\| \over \|b\|}$$

The quantity $\|\delta b\|/\|b\|$ is the relative change in the right-hand side, and the quantity $\|\delta x\|/\|x\|$ is the resulting relative change in the solution. The advantage of using relative changes is that they are dimensionless --- they are not affected by overall scale factors.

This inequality shows that the condition number is a relative error magnification factor. Changes in the right-hand side can cause changes $\kappa(A)$ times as large in the solution.

Example

Let's investigate a system of linear equations involving

$$A=\left(\begin{array}{cc} 4.1 & 2.8 \\ 9.7 & 6.6 \end{array}\right)$$

Take $b$ to be the first column of $A$, so the solution to $Ax = b$ is simply

$$x = \left(\begin{array}{c} 1 \\ 0 \end{array}\right)$$

Switch to MATLAB

A = [4.1 2.8; 9.7 6.6] b = A(:,1) x = A\b A = 4.1000 2.8000 9.7000 6.6000 b = 4.1000 9.7000 x = 1.0000 -0.0000

Now add 0.01 to the first component of b.

b2 = [4.11; 9.7] b2 = 4.1100 9.7000

The solution changes dramatically.

x2 = A\b2 x2 = 0.3400 0.9700

This sensitivity of the solution x to changes in the right hand side b is a reflection of the condition number.

kappa = cond(A) kappa = 1.6230e+03

The upper bound on the possible change in x indicates changes in all of the significant digits.

kappa*norm(b-b2)/norm(b) ans = 1.5412

The actual change in x resulting from this perturbation is

norm(x-x2)/norm(x) ans = 1.1732

So this particular change in the right hand side generated almost the largest possible change in the solution.

Close to singular

A large condition number means that the matrix is close to being singular. Let's make a small change in the second row of A.

A A2 = [4.1 2.8; 9.676 6.608] A = 4.1000 2.8000 9.7000 6.6000 A2 = 4.1000 2.8000 9.6760 6.6080

The resulting matrix is effectively singular. If we try to compute its inverse, we get a warning message.

A2inv = inv(A2) Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 1.988677e-17. A2inv = 1.0e+15 * -1.4812 0.6276 2.1689 -0.9190

The quantity RCOND in the warning is an estimate of the reciprocal of the condition number. Using the reciprocal is left over from the days before we had IEEE floating point arithmetic with Inf to represent overflow and infinity. For this example RCOND is on the order of eps(1) and the scale factor for A2inv implies that its elements are useless. It's impossible to compute something that doesn't exist.

A fairly recent addition to MATLAB is the function condest that estimates $\kappa(A)$ . Now condest(A) is preferable to 1/rcond(A).

Inverse

The condition number $\kappa(A)$ also appears in the bound for how much a change $E$ in a matrix $A$ can affect its inverse.

$${{\|(A+E)^{-1} - A^{-1}\|} \over {\|A^{-1}\|}} \ \le \ \kappa(A) {{\|E\|} \over {\|A\|}}$$

Jim Wilkinson's work about roundoff error in Gaussian elimination showed that each column of the computed inverse is a column of the exact inverse of a matrix within roundoff error of the given matrix. Let's fudge this a bit and say that inv(A) computes the exact inverse of $A+E$ where $\|E\|$ is on the order of roundoff error compared to $\|A\|$ .

We don't know $E$ exactly, but for an n -by- n matrix we have the estimate

norm(E) $\approx \space$ n*eps(norm(A))

So we have a simple estimate for the error in the computed inverse, relative to the unknown exact inverse.

X = $\space$ exact inverse of A

Z = inv(A)

norm(Z - X)/norm(X) $\approx$ n*eps*cond(A)

For our 2-by-2 example the estimate of the relative error in the computed inverse is

2*eps*condest(A) ans = 9.9893e-13

This says we can expect 12 or 13 (out of 16) significant digits.

Wilkinson had to assume that every individual floating point arithmetic operation incurs the maximum roundoff error. But only a fraction of the operations have any roundoff error at all and even for those the errors are smaller than the maximum possible. So this estimate can be expected to an overestimate. But no tighter estimate is possible.

For our example, the computed inverse is

format long Z = inv(A) Z = -66.000000000000057 28.000000000000025 97.000000000000085 -41.000000000000036

It turns out that the exact inverse has the integer entries produced by

X = round(Z) X = -66 28 97 -41

We can compare the actual relative error with the estimate.

format short norm(Z - X)/norm(X) ans = 8.7342e-16

So we actually have more than 15 significant digits of accuracy.

Speed

In simplified outline, the algorithm for computing the inverse of an $n$ -by- $n$ matrix, or for solving a system of $n$ linear equations, involves loops of length $n$ nested three deep. Each of the $n^2$ elements is accessed roughly $n$ times. So the computational complexity is proportional to $n^3$. Lots of things can be done that affect the coefficient of proportionality, but it's still order $n^3$.

The actual numbers in the matrix (generally) don't affect the execution time. A nearly singular matrix can be inverted just as fast as a well-conditioned one. The answer might not be very accurate if the condition number is large, but $\kappa(A)$ does not play a role in the speed.

\n'); d.write(code_string); // Add copyright line at the bottom if specified. if (copyright.length > 0) { d.writeln(''); d.writeln('%%'); if (copyright.length > 0) { d.writeln('% _' + copyright + '_'); } } d.write('\n'); d.title = title + ' (MATLAB code)'; d.close(); } -->

Get the MATLAB code (requires JavaScript)

Published with MATLAB® R2017a