{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"authorship_tag":"ABX9TyPZKA8xUuJx7PnLkPiJEnk1"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# Variance reduction"],"metadata":{"id":"856tFswmENXK"}},{"cell_type":"code","execution_count":66,"metadata":{"id":"V0_GOOUdY9X_","executionInfo":{"status":"ok","timestamp":1698055014098,"user_tz":-120,"elapsed":627,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"outputs":[],"source":["import numpy as np\n","import matplotlib.pyplot as plt"]},{"cell_type":"markdown","source":["## 1. Control variate\n","\n","For $t>0$ and $x \\in R$, we set $$f_t(x) = \\frac{1}{1+t x^2}, \\qquad g_t(x) = 1-t x^2.$$"],"metadata":{"id":"e9XL05DXL_CB"}},{"cell_type":"code","source":["def f(t,x):\n","  return 1/(1+t*x**2)\n","\n","def g(t,x):\n","  return 1-t*x**2"],"metadata":{"id":"Mklav__ovfcr","executionInfo":{"status":"ok","timestamp":1698047197420,"user_tz":-120,"elapsed":233,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"execution_count":8,"outputs":[]},{"cell_type":"markdown","source":["For $X \\sim \\mathcal{N}(0,1)$, our aim is to estimate the value of $\\mathcal{I}:=E[f_t(X)]$, using $Y=g_t(X)$ as a control variate. Note that $E[Y] = 1-t$."],"metadata":{"id":"aq08HibqKlOW"}},{"cell_type":"markdown","source":["With the notation of the lecture notes, complete the following code so that the function `MCandCV(t,X)` takes in input a number $t>0$ and an array $X$ which contains $n$ realisations of independent standard Gaussian variables, and returns the list $(\\hat{\\mathcal{I}}_n, \\hat{\\sigma}_n, \\hat{\\mathcal{I}}^\\mathsf{CV}_n, \\hat{\\sigma}^\\mathsf{CV}_n)$."],"metadata":{"id":"0SEbykCDKWh3"}},{"cell_type":"code","source":["def MCandCV(t,X):\n","  # Your code here\n","\n","  return [0,0,0,0]"],"metadata":{"id":"hfv2ACrNwcgN","executionInfo":{"status":"ok","timestamp":1698048832693,"user_tz":-120,"elapsed":246,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"execution_count":19,"outputs":[]},{"cell_type":"markdown","source":["We use this function to plot the estimated values of $\\mathcal{I}$ as a function of $t$ obtained with both the naive MC estimator and the CV estimator, together with respective $95\\%$ confidence intervals."],"metadata":{"id":"RwL3Fk3gnzgU"}},{"cell_type":"code","source":["# Sample\n","nX = 1000\n","X = np.random.normal(0,1,nX)\n","\n","# Values of t\n","nt = 100\n","t = np.linspace(.1,10,nt)\n","\n","# Estimators and standard deviations\n","estim_MC = np.zeros(nt)\n","sigma_MC = np.zeros(nt)\n","estim_CV = np.zeros(nt)\n","sigma_CV = np.zeros(nt)\n","\n","for i in range(nt):\n","  [estim_MC[i],sigma_MC[i],estim_CV[i],sigma_CV[i]] = MCandCV(t[i],X)\n","\n","# Plot of estimators and 95% confidence intervals\n","plt.plot(t,estim_MC,label='Naive MC',color='blue')\n","plt.fill_between(t, estim_MC-1.96*sigma_MC/nX**.5, estim_MC+1.96*sigma_MC/nX**.5, color='blue', alpha=.1)\n","\n","plt.plot(t,estim_CV,label='Control Variate',color='orange')\n","plt.fill_between(t, estim_CV-1.96*sigma_CV/nX**.5, estim_CV+1.96*sigma_CV/nX**.5, color='orange', alpha=.1)\n","\n","plt.legend()\n","plt.show()"],"metadata":{"id":"QiSil_xyNGKX"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Using the quantities computed above, plot as a function of $t$ the ratio $\\hat{\\sigma}^\\mathsf{CV}_n/\\hat{\\sigma}_n$, which quantifies the variance reduction due to the use of the control variate. What do you conclude?"],"metadata":{"id":"vpE-9zbFxjH8"}},{"cell_type":"code","source":["# Your code here\n"],"metadata":{"id":"C9FKpNdbx4To"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## 2. Importance Sampling\n","\n","For $p \\in [0,1]$, we let $(X_n)_{n \\geq 1}$ be a sequence of iid random variables such that $P(X_1=1)=p$, $P(X_1=-1)=1-p$. We next define $S_0 = 0$ and $S_{n+1} = S_n + X_{n+1}$, and for $N \\geq 1$ we let $$T^\\pm_N = \\inf\\{n \\geq 1: S_n = \\pm N\\}.$$\n","\n","We are interested in the evaluation of the probability of the event $\\{T^+_N < T^-_N\\}$, which is *small* when $p$ is close to $0$. In fact, it can be proved that we have\n","$$\\mathcal{I} := P(T^+_N < T^-_N) = \\begin{cases}\n","  1/2 & \\text{if $p=1/2$;}\\\\\n","  \\frac{1-\\alpha^{-N}}{\\alpha^N-\\alpha^{-N}} & \\text{otherwise;}\n","\\end{cases} \\qquad \\alpha = \\frac{1-p}{p}.$$\n","So when $p$ is small, this probability is equivalent to $p^N$.\n","\n","We first fix values for $N$ and $p$."],"metadata":{"id":"zpKIMSDyPWoK"}},{"cell_type":"code","source":["p = .25 # Must be lower than .5\n","N = 5\n","\n","proba_to_estim = (1-(1/p-1)**(-N))/((1/p-1)**N-(1/p-1)**(-N))\n","\n","print(\"Probability to estimate:\",proba_to_estim)"],"metadata":{"id":"BjRoZ-uEzdK9"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Our purpose is to fix $p$ small and to use importance sampling with random variables $(\\tilde{X}_n)_{n \\geq 1}$ which are such that $P(\\tilde{X}_1=1)=q$, $P(\\tilde{X}_1=-1)=1-q$. We denote by $\\tilde{T}^\\pm_N$ the associated hitting times of $\\pm N$. We want to study variance reduction as a function of $q$.\n","\n","Complete the following code so that the function `OneTraj(q)` returns a realisation of the pair $(\\tilde{T}^+_N \\wedge \\tilde{T}^-_N, 1_{\\tilde{T}^+_N < \\tilde{T}^-_N})$."],"metadata":{"id":"vhIUlwlWtV-d"}},{"cell_type":"code","source":["def OneTraj(q):\n","  # Your code here\n","\n","  return [0,False]"],"metadata":{"id":"EoF3c8Jo6-O7","executionInfo":{"status":"ok","timestamp":1698055786818,"user_tz":-120,"elapsed":239,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"execution_count":83,"outputs":[]},{"cell_type":"markdown","source":["We now claim that for any $k \\geq N$,\n","$$P(\\tilde{T}^+_N \\wedge \\tilde{T}^-_N = k, \\tilde{T}^+_N < \\tilde{T}^-_N) = \\left(\\frac{q}{p}\\right)^{(k+N)/2}\\left(\\frac{1-q}{1-p}\\right)^{(k-N)/2}P(T^+_N \\wedge T^-_N = k, T^+_N < T^-_N),$$\n","therefore the IS estimator of $\\mathcal{I}$ writes\n","$$\\hat{\\mathcal{I}}^{\\mathsf{IS},q}_n = \\frac{1}{n}\\sum_{i=1}^n w_q(\\tilde{T}^{+,i}_N) 1_{\\tilde{T}^{+,i}_N < \\tilde{T}^{-,i}_N},$$\n","with\n","$$w_q(k) = \\left(\\frac{p}{q}\\right)^{(k+N)/2}\\left(\\frac{1-p}{1-q}\\right)^{(k-N)/2}.$$\n","\n","Complete the following so that the function `IS(q,n)` returns an array of $n$ realisations $w_q(\\tilde{T}^{+,i}_N) 1_{\\tilde{T}^{+,i}_N < \\tilde{T}^{-,i}_N}$."],"metadata":{"id":"R_RHpB327eOr"}},{"cell_type":"code","source":["def IS(q,n):\n","  # Your code here\n","\n","  return np.zeros(n)"],"metadata":{"id":"fizYUEu1_OUq","executionInfo":{"status":"ok","timestamp":1698055789944,"user_tz":-120,"elapsed":408,"user":{"displayName":"Julien Reygner","userId":"06465841509771183939"}}},"execution_count":84,"outputs":[]},{"cell_type":"markdown","source":["To check that the IS estimator is unbiased, we plot its value as a function of $q$, together with the true value of $\\mathcal{I}$."],"metadata":{"id":"Jk_cIltG_Gms"}},{"cell_type":"code","source":["# Values of q\n","nq = 20\n","q = np.linspace(.05,.95,nq)\n","\n","estim_IS = np.zeros(nq)\n","\n","for i in range(nq):\n","  isq = IS(q[i],10000)\n","  estim_IS[i] = np.mean(isq)\n","\n","plt.plot(q,estim_IS,label='IS estimator')\n","plt.axhline(y=proba_to_estim, color='black',label='True probability')\n","plt.legend()\n","plt.show()"],"metadata":{"id":"8II-RIbA_WXg"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Using a similar code, plot the empirical variance of $\\hat{\\mathcal{I}}^{\\mathsf{IS},q}_n$ as a function of $q$. Which value of $q$ seems to be optimal?"],"metadata":{"id":"CZrywyom-y4I"}},{"cell_type":"code","source":["# Your code here\n"],"metadata":{"id":"DNLAkfxl2Ixr"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["You may now try to play with the values of $p$ and $N$ to make the event $\\{T^+_N < T^-_N\\}$ even more unlikely, and see how IS performs."],"metadata":{"id":"dv5q938lAVzo"}}]}