{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"authorship_tag":"ABX9TyPhtgJ8p0b/WaZJy8f1SN6u"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# Sampling standard Gaussian random variables"],"metadata":{"id":"856tFswmENXK"}},{"cell_type":"code","execution_count":1,"metadata":{"id":"V0_GOOUdY9X_","executionInfo":{"status":"ok","timestamp":1697483838271,"user_tz":-120,"elapsed":399,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"outputs":[],"source":["import numpy as np\n","import matplotlib.pyplot as plt\n","import math\n","from scipy.stats import norm # To access the CDF of the standard Gaussian distribution"]},{"cell_type":"markdown","source":["Throughout this tutorial, we are only allowed to use Numpy's random number generator to sample uniform random variables."],"metadata":{"id":"OfdXU_IGvjye"}},{"cell_type":"code","source":["def Unif():\n","  return np.random.uniform()"],"metadata":{"id":"65qU4uPjvxW0","executionInfo":{"status":"ok","timestamp":1697483844843,"user_tz":-120,"elapsed":9,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"execution_count":2,"outputs":[]},{"cell_type":"code","source":["Unif()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"-LmH8Ve0v0y9","executionInfo":{"status":"ok","timestamp":1697483847416,"user_tz":-120,"elapsed":580,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}},"outputId":"90b0b048-a8d1-4877-d291-61733357bbc7"},"execution_count":3,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.27943576228382394"]},"metadata":{},"execution_count":3}]},{"cell_type":"markdown","source":["The purpose of the tutorial is merely to implement the Box--Muller and Ziggurat algorithms, we will not try to optimise their efficiency."],"metadata":{"id":"x9GuSRDixM9M"}},{"cell_type":"markdown","source":["## 1. The Box--Muller method\n","\n","We recall that the Box--Muller method is based on the remark that if $(U_1, U_2)$ are independent $\\mathcal{U}[0,1]$ variables, then\n","$$X_1 = \\sqrt{-2\\log(U_1)} \\cos(2\\pi U_2), \\qquad X_2 = \\sqrt{-2\\log(U_1)} \\sin(2\\pi U_2),$$\n","are independent $\\mathcal{N}(0,1)$ variables.\n","\n","Complete the function `BoxMuller()` so that it returns the variable $X_1$ as above."],"metadata":{"id":"e9XL05DXL_CB"}},{"cell_type":"code","source":["def BoxMuller():\n","  # Your code here\n","  return 0"],"metadata":{"id":"Mklav__ovfcr","executionInfo":{"status":"ok","timestamp":1697487127327,"user_tz":-120,"elapsed":235,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"execution_count":42,"outputs":[]},{"cell_type":"code","source":["BoxMuller()"],"metadata":{"id":"hfv2ACrNwcgN"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["We check the validity of the function `BoxMuller()` by plotting the histogram of 1000 realisations together with the density of the $\\mathcal{N}(0,1)$ law."],"metadata":{"id":"vpE-9zbFxjH8"}},{"cell_type":"code","source":["# Sample\n","N = 5000\n","Sample = np.zeros(N)\n","for k in range(N):\n","  Sample[k] = BoxMuller()\n","\n","plt.hist(Sample, bins=50, density=True)\n","\n","# Theoretical density\n","x_max = max(abs(Sample))\n","xx = np.linspace(-x_max, x_max, 500)\n","den = np.exp(-xx**2/2)/(2*math.pi)**.5\n","\n","plt.plot(xx,den,color='red')\n","\n","plt.show()"],"metadata":{"id":"C9FKpNdbx4To"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## 2. The Ziggurat algorithm\n","\n","This is adapted from Wikipedia's page on the [Ziggurat algorithm](https://en.wikipedia.org/wiki/Ziggurat_algorithm), which was published in 2000.\n","\n","The algorithm primarily generates a positive random variable $X'$ distributed as $|X|$ with $X \\sim \\mathcal{N}(0,1)$, so that $X'$ has density\n","$$\\sqrt{\\frac{2}{\\pi}} \\exp\\left(-\\frac{x^2}{2}\\right)$$\n","on $[0,+\\infty)$.\n","\n","To generate a realisation of $X$ from a realisation of $X'$ you may use the following function."],"metadata":{"id":"zpKIMSDyPWoK"}},{"cell_type":"code","source":["def RandomSign(x):\n","  return (1-2*(Unif()<.5))*x"],"metadata":{"id":"EoF3c8Jo6-O7","executionInfo":{"status":"ok","timestamp":1697484078940,"user_tz":-120,"elapsed":235,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"execution_count":7,"outputs":[]},{"cell_type":"markdown","source":["The Ziggurat algorithm does not depend on the density of $X'$ being normalised so we set\n","$$f(x) = \\exp\\left(-\\frac{x^2}{2}\\right).$$"],"metadata":{"id":"R_RHpB327eOr"}},{"cell_type":"code","source":["def f(x):\n","  return np.exp(-x**2/2)\n","\n","y_max = f(0)\n","\n","def f_inv(y): # inverse function\n","  return (-2*np.log(y))**.5"],"metadata":{"id":"fizYUEu1_OUq","executionInfo":{"status":"ok","timestamp":1697484080715,"user_tz":-120,"elapsed":15,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"execution_count":8,"outputs":[]},{"cell_type":"markdown","source":["### 1. Precomputing the ziggurat\n","\n","It may help to have in mind what a [ziggurat](https://en.wikipedia.org/wiki/Ziggurat) actually is.\n","\n","The basis of the Ziggurat algorithm consists in covering the graph of $f$ with a predefined number $n$ of horizontal layers defined as follows.\n","\n","The base is defined as all points inside the distribution and below $y_1 = f(x_1)$. This consists of a rectangular region from $(0, 0)$ to $(x_1, y_1)$, and the (infinite) tail of the distribution, where $x > x_1$ (and $y < y_1$).\n","\n","This layer (call it layer $0$) has area $A$. On top of this, add a rectangular layer of width $x_1$ and height $A/x_1$, so it also has area $A$. The top of this layer is at height $y_2 = y_1 + A/x_1$, and intersects the density function at a point $(x_2, y_2)$, where $y_2 = f(x_2)$. This layer includes every point in the density function between $y_1$ and $y_2$, but (unlike the base layer) also includes points such as $(x_1, y_2)$ which are not in the distribution. Further layers are then stacked on top, until $y_n > y_\\mathrm{max} = f(0)$ in which case we set $x_n=0$.\n","\n","The following function takes $x_1$ as an argument and returns a pair $(x,y)$ with $x$ and $y$ are two lists which respectively contain $(x_1, \\ldots, x_n)$ and $(y_1, \\ldots, y_n)$."],"metadata":{"id":"1rW988tY_O2X"}},{"cell_type":"code","source":["def layers(x1):\n","  # Your code here\n","  return 0"],"metadata":{"id":"dDPIJXz17dl6","executionInfo":{"status":"ok","timestamp":1697484458856,"user_tz":-120,"elapsed":232,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"execution_count":9,"outputs":[]},{"cell_type":"markdown","source":["The covering of $f$ by the ziggurat is visualised as follows."],"metadata":{"id":"RvgsjTDPFW3M"}},{"cell_type":"code","source":["x1 = 2.5\n","plt.plot(np.linspace(0,1.2*x1,100),f(np.linspace(0,1.2*x1,100)),color='red')\n","\n","[x,y] = layers(x1)\n","n = len(x)\n","\n","\n","plt.plot([0,1.2*x1],[0,0],color='blue')\n","plt.plot([0,x[0]],[y[0],y[0]],color='blue')\n","plt.plot(np.linspace(x1,1.2*x1,20),f(np.linspace(x1,1.2*x1,20)),color='blue')\n","\n","for i in range(n-1):\n","  plt.plot([x[i],x[i]],[y[i],y[i+1]],color='blue')\n","  plt.plot([0,x[i]],[y[i+1],y[i+1]],color='blue')\n","\n","print(\"Number of layers:\", n)"],"metadata":{"id":"1e8n6102FuWC"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### 2. Rejection sampling\n","\n","To sample from the (normalised) ziggurat, we first draw a layer $i \\in \\{0, \\ldots, n-1\\}$ uniformly. Then we draw a point $(x,y)$ uniformly in the layer with the following algorithm.\n","\n","* If $i \\geq 1$, the layer is a rectangle. We first draw $x$ uniformly in $[0,x_i]$. If $x \\leq x_{i+1}$ then whatever the choice of $y$ the point $(x,y)$ will be below the graph of $f$ so there is no actual need to generate $y$ and we may return $X=x$ directly. If $x \\in (x_{i+1}, x_i]$, then we draw $y$ uniformly in $[y_i,y_{i+1}]$. If $y \\leq f(x)$ then we return $X=x$ ortherwise we reject the draw and restart from the beginning.\n","\n","* If $i=0$, we first define $x_0 = A/y_1 > x_1$ and draw $x$ uniformly on $[0,x_0]$. If $x \\leq x_1$ then we return $X=x$. Otherwise we need to draw $X$ in the tail of the distribution, namely according to the distribution $$\\frac{f(x)1_{x>x_1}}{\\int f(x)1_{x>x_1} dx}.$$ To do so we use the following *fallback algorithm*: let $x'=-\\log(U_1)/x_1$, $y'=-\\log(U_2)$; if $2y'>x'^2$ return $X=x'+x_1$ otherwise reject.\n","\n","Implement this algorithm in the next function."],"metadata":{"id":"z8nH7VaHLsXS"}},{"cell_type":"code","source":["x1 = 3.5\n","[x,y] = layers(x1)\n","n = len(x)\n","\n","def Ziggurat():\n","  # Your code here\n","  return 0"],"metadata":{"id":"A842qW3WEjq-","executionInfo":{"status":"ok","timestamp":1697486979932,"user_tz":-120,"elapsed":211,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"execution_count":34,"outputs":[]},{"cell_type":"code","source":["Ziggurat()"],"metadata":{"id":"dZcvfzwwF0py"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["We draw the associated histogram."],"metadata":{"id":"4cPnyQeuGkDj"}},{"cell_type":"code","source":["# Sample\n","N = 5000\n","Sample = np.zeros(N)\n","for k in range(N):\n","  Sample[k] = RandomSign(Ziggurat())\n","\n","plt.hist(Sample, bins=50, density=True)\n","\n","# Theoretical density\n","x_max = max(abs(Sample))\n","xx = np.linspace(-x_max, x_max, 500)\n","den = np.exp(-xx**2/2)/(2*math.pi)**.5\n","\n","plt.plot(xx,den,color='red')\n","\n","plt.show()"],"metadata":{"id":"aFFN8pN6A7iU"},"execution_count":null,"outputs":[]}]}