Exit Pupil Wavefront Fitting by ELE Optics

Background & Theory
To fit a wavefront to the real wavefront in the exit pupil, we must first sample our real wavefront. What do we mean by wavefront? Let us first start with our source, and for simplicity let us imagine our source, or object, is an infinitely small point in object space (there is actually some physical approximation for this, check out the hydrogen atom!). This point will emit light outwards in all directions, over a full sphere. We also know that light can be described as an electro-magnetic wave. Specifically, Maxwell described that light is a propagating wave with orthogonal electric and magnetic field components which oscillate [3]. We further can state that inherent to the field is the wavelength and phase of the electro-magnetric wave [4]. But, this is all getting fairly technical (if you are interested in this topic we encourage the reader to investigate Electrodynamics. References [4] through [8] are all excellent resources).

We can instead summarize the point as this: for a given emitter of light, a wavefront is the surface in space and time that describes where the phase (between the electrical and magnetic components of our light field) is the same for all points of the light field that was emitted. For our simple point emitter, a sphere expanding outwards will describe, for propagation in free space, a wavefront. For our optical system however, we only capture a portion of the light emitted by an object. The surface area over which we capture an incoming wavefront is described by the system’s entrance pupil. Once in the system, light propagates through the system and interacts in complex ways with the various components of the optical system (this is where we simplify things and assume a geometrical ray tracing model). After traversing the optical system, the wavefront is then output, from the…you guessed it, exit pupil of our system. It is at this point that it can be extremely useful to understand what the shape of our wavefront is. For an imaging system, perhaps we want something that is described as an ideal sphere, whose radius of curvature matches our desired focal point (of course we as experienced designers know not to forget about diffraction effects!). Or, lets consider a different case, maybe your optical system is afocal, and it matters more what the wavefront shape is in absolute terms rather than is it focusing or not. In this case, our tool takes the rays at the exit pupil, and determines their pointing vectors. Using these we can leverage mathematical systems to ‘fit’ a wavefront to our data.

Currently, we offer two methods for fitting a wavefront to the data in the exit pupil. First however, what do we mean by ‘fitting’? Specifically, we have data which samples the real wavefront, which are the traced rays at the exit pupil. However, we want to be able to describe any point in our wavefront at the exit pupil! Thus, we can construct a mathematical equation that well fits our existing data points, and we assume it will also represents that data in between the known points well. This specifically means we can use our equation to interpolate the wavefront data. However, while it can be useful to fit an equation to our data and thus be able to interpolate data we don’t have, many scientists simply find the equation used to fit the data important. So, how do we fit and equation to our data? For basic fitting such as fitting a line to a set of points, we could make a function that depends on one variable and outputs the result. But for our wavefront, we are depending on 2 variables, x and y location at a z plane, to define the normal of our wavefront. Further, we are dealing with an optical wavefront, which has been affected by the optical elements that it interacted with. Lastly, we traditionally in optical systems deal with circular apertures. Thus, we want an equation which a) represents common wavefront aberrations imparted by optical elements, b) is ortho-normal over our aperture (traditionally a circle), and c) that our equation forms what is known as a complete set. Fortunately for us, Fritz Zernike in 1934 discovered a polynomial set, the Zernike Polynomials, which well described common optical aberrations found in existing metrology tools at the time, which was continuous over a unit circle, and formed a complete set. As an aside, and to further the pool of optical knowledge, Fritz Zernike won the Noble Prize in 1953 for inventing the phase-contrast microscope (wow, this guy was smart)!

At first blush it may appear that Zernike polynomials are the perfect tool for fitting our optical wavefront. This is not always true. This cannot be stressed enough, as it is a common misconception. In particular cases, namely: wavefronts which are continuous over a unit circular aperture that primarily suffer from common optical aberrations, Zernike polynomials work quite well. However, in software, it is important to note that we never have continuous data, we only deal with discrete data. We may design an optical system that has non-circular apertures, indeed, this is becoming increasingly common with freeform and unique cutting edge optics. Further, our primary aberrations may not be due to common optical aberrations; such as the case when turbulence is a primary factor. We strongly encourage all readers to refer to Wyant and Creath for an in depth but approachable discussion of optical aberrations and how Zernike polynomials relate (and where they fall short) [8].

What alternatives do we have to Zernike polynomials, and where are they most applicable? Currently, we only offer Zernike polynomial fitting, and we plan on offering Chebyshev and Lagrange polynomial fitting as well. Chebyshev gradient polynomials are ortho-normal over a rectangular aperture (while various optical systems have begun to leverage rectangular apertures, almost all systems use rectangular detectors). Further, they can be recursively generated easily, allowing for fitting of high numbers of polynomials terms (tens of thousands) to data. For a full discussion on Chebyshev gradient polynomials and their applicability to optical wavefronts, see Aftab et. al., [9].

Whether we use Zernike or Chebyshev gradient polynomials, fitting is done in the following way. We start with our sample data points; the ray pointing vectors over our exit pupil plane. We then determine the set of polynomials we wish to fit to the data; i.e. we cannot fit an infinite number of polynomials, so we start with the first 65 Zernike polynomials, or the first 254 Chebyshev gradient polynomials. Using a matrix method, we can state that our sampled data points are known data that represents the gradient of our wavefront (the pointing vector’s x and y slope represents the local slope of the wavefront), and the coefficients applied to the polynomials are unknown. This takes the form of the matrix problem A = bC, where A is our known sampled points, b are the unknown coefficients we wish to solve for, and C is our matrix of the polynomials. A Graham-Schmidt process can be applied to apply ortho-normalization to the fitting process.

For the interested scientist and user, we offer the following algorithm description of the fitting process. The following section lays out the algorithms used:

Step 1: User creates an object which which will generate a wavefront of rays that will enter the system.

Step 2: Ray transfer

The rays are geometrically propagated through the optical system using the following algorithms
First, find intercept between ray and object:

For a spherical surface we use Line Sphere Intersection:

d^2 ( \vec{l} \cdot \vec{l}) + 2d(\vec{l} \cdot (o-c)) + (o - c) \cdot (o - c) - r^2 = 0 (1)

Where \vec{l} is the ray vector, o is the origin of the ray, c is the sphere center, r is the radius of curvature of the sphere, and d is the distance from the ray origin, along it’s pointing vector, to the intercept. Thus, we solve for d by using the quadratic formula.

For a plane surface we use Line Plane Intersection:

d = \frac{(p_0 - o) \cdot \vec{n}}{\vec{l} \cdot \vec{n}} (2)

Where `p_0` is a point on the plane, `o` is the origin of the ray, \vec{n} is the normal vector of the plane, \vec{l} is the ray vector, and `d` is the distance from the ray origin, along it’s pointing vector, to the intercept. We solve for `d` directly in this case.

Step 3: Ray Deviation

After the intercept at the next surface is determined, the ray is deviated by solving the vector form of Snell’s Law. The following algorithm is utilized:

Given the incident ray vector \vec{l}_{inc}, and the normal of the incident point, \vec{n}, we can define the cosine of the angle of incidence \theta _{inc} as

\cos \theta _{inc} = -\vec{n} \cdot \vec{l}_{inc} (3)

\cos \theta _{inc} must be positive, if it is not then \vec{n} sign is changed to positive in eq. 3.

Next, we determine the cosine of the angle of transmission, \cos \theta _{trans}, as:

\cos \theta _{trans} = \sqrt{(1 - \frac{n_2}{n_1}^2) * (1 - \cos \theta _{inc}^2}) (4)

if \cos \theta _{trans} is less than zero, total internal reflection occurred. The Fresnel coefficients are next calculated as:

R_s = \left\lvert \frac{n_1 \cos \theta _{inc} - n_2 \cos \theta _{trans}}{n_1 \cos \theta _{inc} + n_2 \cos \theta _{trans}} \right\rvert ^2 (5)
R_p = \left\lvert \frac{n_1 \cos \theta _{trans} - n_2 \cos \theta _{inc}}{n_1 \cos \theta _{trans} + n_2 \cos \theta _{inc}} \right\rvert ^2 (6)
R = 0.5 * (R_s + R_p) (7)

Where R_s is the reflectance for s-polarized light, while R_p is the reflectance for p-polarized light, and R is the effective reflectivity.

We approximate the interaction at this stage, and state that if R is greater than 0.5, the ray reflects entirely, with the power scaled by the effective reflectivity, otherwise, it transmits, with the power scaled by 1 - R. The direction for the reflected ray, \vec{l} _{refl}, is given as:

\vec{l} _{refl} = \vec{l} _{inc} + 2 \cos \theta _{inc} \vec{n} (8)

While the direction for the refracted ray, \vec{l} _{trans}, is given as:

\vec{l} _{trans} = R * \vec{l} _{inc} + (R * \cos \theta _{inc} - \cos \theta _{trans}) \vec{n} (9)

Step 4: Exit Pupil Rays (Zernikes)
The rays from Step 1 are traced through the system to find the rays in the exit pupil plane. The ray directions represent the gradient of the wavefront in the exit pupil. We assume we can describe our wavefront at the exit pupil as:

W(x,y) = \sum\limits_{mn} c_{mn} Z_{mn} (x, y) = \sum\limits_{i=1} ^j c_i Z_i (x, y) (10)

where W is our wavefront, c represent the Zernike coefficients, and Z are the Zernike polynomials. We utilize ANSI Standard numbering scheme for representing n and m, which follow the following rules:

n: the power of the radial coordinate r.

m: the multiplication factor of the angular coordinate \theta.

n and m obey: m\leq n and (n-m) is even.

To see the full breakdown of the primary index, j, and its relationship to the indices n and m, and the corresponding coefficients, please select the table view in the plot options.

We define Z as:

Z_{mn} (r,\theta) = \begin{cases} R_{mn}(r), & m=0 \\R_{mn}(r)cos(m\theta), & m>0\\ R_{mn}(r)sin(m\theta), & m<0 \end{cases} (11)
R_{mn}(r)=\sum\limits_{k=0}^{(n-m)/2} \frac{(-1)^k (n-k)!}{k!(\frac{n+m}{2}-k)!(\frac{n-m}{2}-k)!}r^{n-2k} (12)

We can represent the gradient of the wavefront as:

S_x = \frac{\partial W(x,y)}{\partial x} = \sum\limits_{i=1} ^j c_i \frac{\partial Z_i (x, y)}{\partial x} (13)
S_y = \frac{\partial W(x,y)}{\partial x} = \sum\limits_{i=1} ^j c_i \frac{\partial Z_i (x, y)}{\partial y} (14)

which we can express in matrix for as :

S_{xy} = Ac (15)

where S_{xy} is a vector of the wavefront gradients with alternating x and y values:

S_{xy} = \left[ \frac{\partial W(x_1, y_1)}{\partial x_1}\frac{\partial W(x_1, y_1)}{\partial y_1}...\frac{\partial W(x_n, y_n)}{\partial x_n}\frac{\partial W(x_n, y_n)}{\partial y_n} \right]^T (16)

where T denotes the transposed matrix. A is the matrix of the partial derivatives of the Zernike polynomials, and lastly c is the column vector of the Zernike coefficients.

Our goal is to solve for the coefficients to fit to our data. Therefore, S_{xy} is known as the wavefront gradients from the raytrace, and A is pre-calculated, so c is defined as:

c = A^\dagger S_{xy} = U_{nm}S_{nm}^\dagger V_{nm}^T S_{xy} (17)

where A^\dagger is the psuedoinverse of A, obtained from the SVD. For the SVD, the columns of U_{nm} are the eigenvectors of AA^T, S_{nm} is the diagonal matrix, and the columns of V_{nm} are the orthonormal eigenvectors of A^T A. Moser, et al. provides a robust discussion on the mathematics behind the above process [11], as does Zhao & Burge [12,13].

If you have any questions regarding these algorithms, or find any errors in our algorithms, please let us know by contacting us at mail@eleoptics.com


[1] Lakshminarayanan, Vasudevan & Fleck, Andre. (2011). Zernike polynomials: A guide. Journal of Modern Optics - J MOD OPTIC. 58. 1678-1678. 10.1080/09500340.2011.633763

[2] Aftab, M., Burge, J. H., Smith, G. A., Graves, L., Oh, C. J., & Kim, D. W. (2018). Chebyshev gradient polynomials for high resolution surface and wavefront reconstruction. In R. Williamson, D. W. Kim, & R. Rascher (Eds.), Optical Manufacturing and Testing XII [1074211] (Proceedings of SPIE - The International Society for Optical Engineering; Vol. 10742). SPIE. https://doi.org/10.1117/12.2320804

[3] James Clerk Maxwell. A Treatise on Electricity & Magnetism - Volume 2. Accessed June 8, 2020. http://archive.org/details/ATreatiseOnElectricityMagnetism-Volume2.

[4] “Introduction to Electrodynamics: Griffiths, David J.: 9781108420419: Amazon.Com: Books.” Accessed June 8, 2020. https://www.amazon.com/Introduction-Electrodynamics-David-J-Griffiths/dp/1108420419.

[5] Jackson, John David. Classical Electrodynamics. 3rd ed. New York: Wiley, 1999. Print.

[6] “The Feynman Lectures on Physics, Volume II” in The Feynman Lectures on Physics New Millennium Edition https://www.feynmanlectures.caltech.edu/II_toc.html

[7] Zernike, F. (1934). “Beugungstheorie des Schneidenverfahrens und Seiner Verbesserten Form, der Phasenkontrastmethode”. Physica. 1 (8): 689–704. Bibcode:1934Phy…1…689Z. doi:10.1016/S0031-8914(34)80259-5

[8] http://wyant.optics.arizona.edu/zernikes/Zernikes.pdf

[9] Aftab, Maham, James H. Burge, Greg A. Smith, Logan Graves, Chang-jin Oh, and Dae Wook Kim. “Modal Data Processing for High Resolution Deflectometry.” International Journal of Precision Engineering and Manufacturing-Green Technology 6, no. 2 (April 1, 2019): 255–70. https://doi.org/10.1007/s40684-019-00047-y.

[10] R. J. Noll, “Zernike polynomials and atmospheric turbulence,” J. Opt. Soc. Am. 66, 207-211 (1976).

[11]Moser, Steve, Peter Lee, and Adrian G.H. Podoleanu. “An FPGA Architecture for Extracting Real-Time Zernike Coefficients from Measured Phase Gradients.” Measurement Science Review, 15 (2). Pp. 92-100. E-ISSN 1335-8871. (doi:10.1515/msr-2015-0014 ).

[12] Chunyu Zhao and James H. Burge, “Orthonormal vector polynomials in a unit circle, Part I: basis set derived from gradients of Zernike polynomials,” Opt. Express 15, 18014-18024 (2007)

[13] Chunyu Zhao and James H. Burge, “Orthonormal vector polynomials in a unit circle, Part II : completing the basis set,” Opt. Express 16, 6586-6591 (2008)