What are good resources for learning more about Zernike Polynomials?

I was recently asked what good resources exist for learning about Zernike polynomials, ranging from introductory through intermediate texts.

I have relied on Dr. Wyant’s course notes on the topic, which I consider to range from introductory to intermediate: http://wyant.optics.arizona.edu/zernikes/Zernikes.pdf

Dr. Mahajan’s books on the subject qualify as intermediate to expert-level resources, and cover the historical background as well as practical application: https://www.spie.org/Publications/Book/2029485 (one example among several).

Born and Wolf’s Principles of Optics was probably the first text that introduced Zernike polynomials to me, although I haven’t revisited it recently and can’t comment too strongly on how good a resource it is for them (the book as a whole is incredible, though, and a must-have if you’ve got spare money burning a hole in your pocket): Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light, Max Born and Emil Wolf, ISBN 9780521642224.

Lastly, while searching I found a tutorial paper that is a fast, to-the-point introduction and practical guide to using Zernike polynomials: https://www.tandfonline.com/doi/abs/10.1080/09500340.2011.554896 (seriously, it’s like two weeks of a course in nine pages; pretty impressive stuff).

To the community at large, do you have any other recommendations for texts, videos, lectures, etc. that explain the origin of Zernike polynomials and their application to optical systems? I will specifically ask @Isaac, @bdube, @heejoooptics, and @hkang, as I know all of you have great experience with optical design and Zernike polynomials.


I’ll let the resources that @lrgraves mentioned speak to this point in more depth, but in my opinion the most important practical take-away is knowing when not to use Zernike polynomials. They are such a ubiquitous tool in the optical community, from design to test, that I think they often get used without careful consideration. This knowledge can come from understanding their mathematical definitions, which the resources certainly cover, but some of the biggest misuse comes from applying them to non-circular apertures. So if you encounter a non-circular aperture, take pause and research how the polynomials have been modified to handle that case!
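As a quick, minimal illustration of the problem (the mode formulas below are just the standard defocus and primary spherical terms written out by hand; nothing here is library-specific): two Zernike terms that are orthogonal over the unit disk stop being orthogonal the moment the pupil is centrally obscured.

import numpy as np

# Minimal sketch: defocus (Z4) and primary spherical (Z11) are orthogonal
# over the unit disk, but not over an annular (centrally obscured) pupil.
y, x = np.mgrid[-1:1:512j, -1:1:512j]
rho = np.hypot(x, y)
z4 = 2*rho**2 - 1
z11 = 6*rho**4 - 6*rho**2 + 1

for eps in (0.0, 0.3):  # unobscured disk vs. a 30% central obscuration
    mask = (rho <= 1) & (rho >= eps)
    print(eps, (z4 * z11)[mask].mean())  # ~0 for the disk, clearly nonzero for the annulus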

To extend the list of resources to the design side, I think Kyle Fuerschbach’s dissertation is a great resource as it covers nodal aberration theory and can be used for diving deeper into that subject matter.

Start by learning about orthogonal polynomials and perhaps the Gram-Schmidt process, by which any set of polynomials can be orthogonalized. The Zernike radial polynomials are the radial component of a set that is orthogonal under an inner product defined over the unit disk.
You can describe your wavefront errors in a non-orthogonal set too, but the coefficients you get will depend on which terms you include. When your lens is not round, you can pick some other basis as well.
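If you want to play with Gram-Schmidt numerically, here is a minimal sketch in plain numpy (orthonormalizing sampled monomials on [-1, 1] this way produces Legendre-like modes; all names are illustrative):

import numpy as np

# Classical Gram-Schmidt on discretely sampled monomials 1, x, x^2, x^3,
# orthonormalized under the discrete inner product on this grid.
x = np.linspace(-1, 1, 200)
modes = [x**n for n in range(4)]

ortho = []
for m in modes:
    v = m.copy()
    for q in ortho:
        v -= (v @ q) * q            # remove the projection onto each prior mode
    ortho.append(v / np.linalg.norm(v))

gram = np.array([[a @ b for b in ortho] for a in ortho])
print(np.allclose(gram, np.eye(4)))  # True: the set is orthonormal on this grid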

I often use Dr. Schwiegerling’s material from his optical specifications course; it is a helpful visualization for me:

Also, just thought I’d share my two most frequent mistakes/misconceptions about using Zernikes:

  1. Zernike terms are only orthogonal over the normalized (unit) radius, so over-sizing OR under-sizing the aperture relative to that radius yields a non-orthogonal set.
  2. Zernike surfaces are 100% orthogonal before they are sampled, but once sampled (especially coarsely) they are no longer exactly orthogonal. Sometimes this is a numerical precision issue; other times it is a significant deviation (see the quick check after this list).
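A quick numerical check of point 2 (again with hand-written defocus and spherical terms; nothing library-specific): the sampled inner product of two Zernike terms only approaches zero as the grid gets finer.

import numpy as np

# Sketch for point 2: the discrete inner product of defocus (Z4) and primary
# spherical (Z11) over the unit disk is zero only in the continuous limit.
for N in (16, 64, 256, 1024):
    y, x = np.mgrid[-1:1:N*1j, -1:1:N*1j]
    rho = np.hypot(x, y)
    mask = rho <= 1
    z4 = 2*rho**2 - 1
    z11 = 6*rho**4 - 6*rho**2 + 1
    print(N, (z4 * z11)[mask].mean())  # shrinks toward 0 as N grows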

I think it depends what you want, but if you want the “why” beyond “orthogonal, duh,” read Zernike’s original paper, Diffraction Theory of the Knife-Edge Test and its Improved Form, The Phase-Contrast Method. The genesis of Zernike polynomials is entirely to make diffraction integrals closed form. Along that path are Nijboer’s thesis, The Diffraction Theory of Aberrations, his subsequent papers through the 1940s, and Emil Wolf’s paper of, unfortunately, exactly(!) the same title, which is now a chapter in Principles of Optics. Wolf’s paper/book chapter may be interesting because it includes the analog to the Zernike expansion of the E field in the focal plane, but done laboriously by hand for the first few of what later became known as Hopkins polynomials (W020, W040 and the like).

ENZ (Extended Nijboer-Zernike) theory is a “pure math” bent on optics, which can give you many unusual insights (for example, exactly why a vortex coronagraph works, beyond “because Mathematica says so”), based on the closed-form focal-plane E fields of Zernike polynomials.

Greg Forbes’ first papers on 2D-Q polynomials touch on Zernikes and connect them to Jacobi polynomials. They were the first papers I found that made the connection, although an early “modern era” ENZ paper actually did it first. I think that connection is very valuable, because it lets you design your own polynomials from scratch using the same methods (if you have an implementation of Jacobi polynomials, Zernikes are only about five lines away from your fingertips). It also allows you to forgo Gram-Schmidt orthogonalization if you want to design over a different aperture (annulus, hexagon, … it matters not). That is nice because it is closed form, versus the numerical nature of Gram-Schmidt.
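To make the “five lines away” point concrete, here is a minimal sketch using scipy.special.jacobi and the standard relation R_n^m(\rho) = (-1)^{(n-m)/2} \rho^m P_{(n-m)/2}^{(m,0)}(1 - 2\rho^2) (the function name is just illustrative):

import numpy as np
from scipy.special import jacobi  # classical Jacobi polynomials P_k^(a, b)

def zernike_radial(n, m, rho):
    """Zernike radial polynomial R_n^m via its Jacobi connection."""
    m = abs(m)
    k = (n - m) // 2
    return (-1)**k * rho**m * jacobi(k, m, 0)(1 - 2*rho**2)

rho = np.linspace(0, 1, 5)
print(np.allclose(zernike_radial(2, 0, rho), 2*rho**2 - 1))             # defocus
print(np.allclose(zernike_radial(4, 0, rho), 6*rho**4 - 6*rho**2 + 1))  # spherical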

I think there are a lot of misconceptions about Zernikes, and they are put on some sort of pedestal, when in fact they are the simplest possible orthogonal polynomials on a circle, a slight twist on something Zernike saw at a maths conference. Non-orthogonal should not be scary; the standard numerical methods we use for pretty much everything (design, metrology, fitting, the lot of it) care very little for orthogonality and do not at all require it. It isn’t a given, either, that nature “wants” orthogonality. Hopkins polynomials are often a more compact set than Zernikes, despite being simple monomials. Many people I interact with who don’t come from a formal optics background think that Zernike polynomials are aberrations and that there is nothing else. A reduction in their emphasis would, I think, benefit our industry.

Thanks for the extra content and recommendations! These are excellent resources to further our understanding of Zernike polynomials and the role they play in optical sciences.

@ted thank you for bringing up Gram-Schmidt!

This is certainly a powerful numerical method, and one that is great to be aware of when we have discretely sampled data, as opposed to the closed-form route through Jacobi polynomials that @bdube mentions.

This also ties into what @dsommitz mentions about their orthogonality after sampling:

@bdube I’m glad you brought up the point about Zernike polynomials being put on a pedestal, and that a reduction in their emphasis would benefit our industry. I think the same is true for the Hopkins expansion of the wavefront error. At its core, I see this stemming from what happens when a particular set of conventions or approximations on top of the physics gets used for so long it becomes the physics in people’s world. I do not mean to take away from the utility of making such simplifications, as they are often necessary and fit the needs, but when we forget that they are a tool and not the thing itself we lose some creative freedom to question our basis and invent something new.

I see this becoming increasingly relevant as we move into an age where design requirements produce light fields that do not fit the rotationally symmetric paradigm that has produced most optical systems. To continue to innovate, we need to understand where our tools no longer work, or are not the best fit, and this discussion about Zernike polynomials is providing the information for people to make that assessment.

Hi Brandon, I would disagree with your statement that the standard numerical methods we use for “pretty much everything” do not require orthogonality. For example, when a set of Zernike polynomials is used for the alignment of an optical system, orthogonality can become critical: it is important to map the right aberrations to the correct degrees of freedom, or the system will easily become less aligned. More generally, when a basis lacks orthogonality, two correlated coefficients can become very large with opposite signs, which becomes a stability issue. Perhaps I am misunderstanding your point?

What I mean is that fitting does not care if things are orthogonal:

import numpy as np
from prysm import polynomials  # v020-dev branch

x = np.linspace(-1, 1, 100)
coefs = np.random.rand(5)

# one orthogonal basis (Chebyshev polynomials of the first kind) and one
# non-orthogonal basis (plain monomials)
modes_ortho = list(polynomials.cheby1_sequence([0, 1, 2, 3, 4], x))
modes_monomials = [x**n for n in range(5)]

# synthesize data as the same weighted sum of each set of modes
data1 = np.tensordot(coefs, modes_ortho, axes=(0, 0))
data2 = np.tensordot(coefs, modes_monomials, axes=(0, 0))

# residuals between the fitted and true coefficients
polynomials.lstsq(modes_ortho, data1) - coefs
>>> array([-1.04083409e-17,  4.44089210e-16,  0.00000000e+00,  3.33066907e-16,
        6.66133815e-16])

polynomials.lstsq(modes_monomials, data2) - coefs
>>> array([ 3.98986399e-16,  1.11022302e-16,  0.00000000e+00, -2.77555756e-16,
        0.00000000e+00])

Whether your modes are orthogonal or not, the least-squares fit finds the right answer to within a few epsilon. That’s because the guts of least squares is almost universally the SVD, and in forming “U” and “V^T”, an orthogonal basis for the input modes is constructed anyway.
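To see that concretely, here is a minimal sketch in plain numpy (the monomial modes are just an example of a non-orthogonal set):

import numpy as np

# Stack non-orthogonal monomial modes as the columns of a design matrix A.
x = np.linspace(-1, 1, 100)
A = np.stack([x**n for n in range(5)], axis=1)

# The SVD factors A into orthonormal bases whether or not A's columns are
# orthogonal; least squares then solves a well-conditioned problem in U.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U.T @ U, np.eye(5)))  # True: the columns of U are orthonormal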

It is basically a certainty that when you align an optical system you don’t have a diagonal control matrix (mapping from actuation to mode) if your modes are Zernike polynomials. For example, your decenters are going to introduce a lot of tilt and coma when all you want is coma, since W_{111} and W_{131} are both essentially part of Z7 or Z8. It is also true that the span of \{Z2, Z3, Z7, Z8\} and \{W_{111}, W_{131}\} is the same, so :man_shrugging:

If your fitting software is bad, or your fitting setup is bad (handling of data dropout, for example), then you can get problems. The Zernike fitting in Mx is so-so (it was very bad circa MetroPro 7 or so, but is better now). In 4Sight it is egregiously bad (to me). But I would never blame the concept for a bad implementation.

I’ve never had any problems aligning an optical system “human closed loop” regardless of what I fit with. We used the W (Hopkins’ expansion) to make a \lambda/100 class system when I was in college, no problem.

Oh, I’m glad you brought up the Hopkins expansion again; I’ve never heard of that before. Is there any difference between the Hopkins expansion and Seidel aberrations?

The Hopkins aberrations are the Ws: W_{abc} = H^a \cdot \rho^b \cdot \cos^c(\theta); you can use a sine term if you want the other axis. The Seidel aberrations are defined primarily for rays, and are scaled linear combinations of the Hopkins terms (or the other way around, if you think of it that way). For example, W_{040} = \rho^4 is spherical aberration; the corresponding Seidel sum is S_I (Seidel spherical), related by a scale factor (W_{040} = \frac{1}{8} S_I in the usual convention).
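If it helps to see the pattern, the familiar low-order terms under that convention are: W_{020} = \rho^2 (defocus), W_{111} = H\rho\cos\theta (tilt), W_{040} = \rho^4 (spherical), W_{131} = H\rho^3\cos\theta (coma), W_{222} = H^2\rho^2\cos^2\theta (astigmatism), W_{220} = H^2\rho^2 (field curvature), and W_{311} = H^3\rho\cos\theta (distortion).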

Ok thanks, I was never aware that’s what they were called!

When working with a misaligned system (one that was nominally rotationally symmetric), one aberration we commonly observe is an astigmatism that is linearly field dependent, W_{122}, instead of the standard quadratically field-dependent term, W_{222}. Is W_{122} captured within the Hopkins expansion?

a, b, c can take any value in the Hopkins expansion. What you’re observing is probably field-linear, field-conjugate astigmatism, covered by nodal aberration theory (my first area of research!).

Aha, in that case I see how the Hopkins terms could be more easily implemented than the Zernike polynomials, especially since the Zernike polynomials themselves say nothing about field dependence.

I am also reflecting on the point you made about SVD forming an orthogonal basis even if the terms we start with aren’t orthogonal themselves. That’s a great point I had not considered, and as you guessed, SVD is what I’ve used as well for my least-squares fitting.

So if I can paraphrase what you’ve said: you can use whatever basis you want with least squares, but some bases will be a better fit than others, in the sense that you get the most meaningful/significant terms earlier in the expansion?

Yes, least-squares fitting via SVD is stable even if the modes are not orthogonal. I can imagine some tools implemented their own least squares because Not Invented Here, but if you use the lstsq function from any package (numpy, scipy, MATLAB, Mathematica, etc.), it is going to be SVD-based, and thus not care whether the modes are orthogonal.

You can imagine that the most common introduction to fitting, monomial polynomial fitting (fit y = ax + bx^2 + cx^3 + \ldots), would be super fraught with problems, but it’s fine. For non-orthogonal modes, though, there can be crosstalk from the terms not included in the fit. Say you have a system which contains Z2..Z100 but only fit Z2..Z11: the out-of-band terms Z12..Z100 do nothing to your fit. If you used non-orthogonal monomials instead, the fitted coefficients would change based on whether or not the higher-order terms are included.
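Here is a minimal sketch of that out-of-band effect in plain numpy, with Legendre polynomials standing in as the orthogonal set (on a uniform grid their discrete orthogonality is only approximate, per the earlier sampling point, but it is close enough to show the contrast):

import numpy as np
from numpy.polynomial import legendre as L

x = np.linspace(-1, 1, 1000)
true = np.array([1.0, 0.5, 0.25, 0.1, 0.05, 0.5, 0.5])  # last two are "out of band"

# Legendre modes are (nearly) orthogonal on this grid; monomials are not.
modes_leg = np.stack([L.legval(x, np.eye(7)[k]) for k in range(7)], axis=1)
modes_mono = np.stack([x**k for k in range(7)], axis=1)

data_leg = modes_leg @ true
data_mono = modes_mono @ true

# Fit only the first five modes; columns 5 and 6 are deliberately left out.
fit_leg, *_ = np.linalg.lstsq(modes_leg[:, :5], data_leg, rcond=None)
fit_mono, *_ = np.linalg.lstsq(modes_mono[:, :5], data_mono, rcond=None)

print(fit_leg - true[:5])   # tiny: out-of-band terms barely perturb the fit
print(fit_mono - true[:5])  # large: x^5 and x^6 alias into the low-order coefficients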

Fitting a judicious number of terms abates that, and the crosstalk only exists between terms that “look like” each other. So W_{222} and W_{242} may crosstalk, for example, but W_{080} and W_{222} would not (significantly). There is a natural decay anyway: high-spatial-frequency components are smaller than low-spatial-frequency ones in almost all signals. You could call this “weakly band limited.” Eventually, adding more high-order terms does basically nothing, because the full spectral content is already captured.

In your last paragraph: yes, a judicious choice of basis makes the fit more or less sparse and/or compact. Having a small number of significant terms can be nice.