{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# HPX + Cling + Jupyter\n",
    "This tutorial works in a special Jupyter notebook that can be used in one of two ways:\n",
    "* From this website: https://hpx-jupyter.cct.lsu.edu\n",
    "* From the docker image: stevenrbrandt/fedora-hpx-cling\n",
    "* Normally, each cell should contain declarations, e.g. definitions of functions,\n",
    "  variables, or `#include` statements.\n",
    "  <div style='border: solid black 1pt;'>\n",
    "  ```#include <iostream>\n",
    "using namespace std;```</div>\n",
    "* If you wish to evaluate an expression, e.g. ```cout << \"hello, world\\n\"```,\n",
    "  put ```.expr``` at the front of the cell.\n",
    "  <div style='border: solid black 1pt;'>\n",
    "  ```.expr cout << \"hello, world\\n\";```</div>\n",
    "* Sometimes you will want to test a cell because you are uncertain whether\n",
    "  it might cause a segfault or some other error that will kill your kernel.\n",
    "  Other times, you might want to test a definition without permanently adding\n",
    "  it to the current namespace. You can do this by prefixing your cell with\n",
    "  ```.test```. Whatever is calculated in a test cell will be thrown away\n",
    "  after evaluation and will not kill your kernel.\n",
    "  <div style='border: solid black 1pt;'>\n",
    "  ```.test.expr int foo[5];\n",
    "foo[10] = 1;```</div>\n",
    "## Docker Instructions\n",
    "* First, install Docker on your local machine\n",
    "* Second, start Docker, e.g. ```sudo service docker start```\n",
    "* Third, run the fedora-hpx-cling container, e.g.\n",
    "\n",
    "    <div style='border: solid black 1pt;'>```$ docker pull stevenrbrandt/fedora-hpx-cling\n",
    "$ docker run -it -p 8000:8000 stevenrbrandt/fedora-hpx-cling```</div>\n",
    "    \n",
    "    After you do this, docker will respond with something like\n",
    "    \n",
    "    <div style='border: solid black 1pt;'>`http://0.0.0.0:8000/?token=5d1eb8a4797851910de481985a54c2fdc3be80280023bac5`</div>\n",
    "    \n",
    "    Paste that URL into your browser, and you will be able to interact with the notebook.\n",
    "* Fourth, play with the existing ipynb files or create new ones.\n",
    "* Fifth, save your work! This is an important step. If you simply quit the container, everything you did will be lost. To save your work, first find your container ID using ```docker ps```.\n",
    "\n",
    "    <div style='border: solid black 1pt;'>```$ docker ps\n",
    "CONTAINER ID        IMAGE                            COMMAND                  CREATED             STATUS              PORTS                    NAMES\n",
    "4f806b5f4fb3        stevenrbrandt/fedora-hpx-cling   \"/bin/sh -c 'jupyter \"   11 minutes ago      Up 11 minutes       0.0.0.0:8000->8000/tcp   dreamy_turing```</div>\n",
    "\n",
    "    Once you have it (in this case, it's 4f806b5f4fb3), you can use ```docker cp``` to transfer files to or from your container.\n",
    "    \n",
    "    <div style='border: solid black 1pt;'>```$ docker cp 4f806b5f4fb3:/home/jup/HPX_by_example.ipynb .\n",
    "$ docker cp HPX_by_example.ipynb 4f806b5f4fb3:/home/jup```</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#include <hpx/hpx.hpp>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "using namespace std;\n",
    "using namespace hpx;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# What is a (the) Future?\n",
    "\n",
    "There are many ways to obtain a future; the simplest is to call `async`, which works like `std::async`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "int universal_answer() { return 42; }\n",
    "void deep_thought()\n",
    "{\n",
    "  future<int> promised_answer = async(util::annotated_function(&universal_answer,\"universal answer\"));\n",
    "  // do other things for 7.5 million years\n",
    "  cout << promised_answer.get() << endl; // prints 42\n",
    "  apex::dump(true);\n",
    "}"
   ]
  },
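  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For comparison, here is a minimal sketch of the same pattern that uses only the C++ standard library. It assumes C++14 with thread support, e.g. compiling with `-pthread`. `std::async` with `std::launch::async` runs the callable on a separate thread and returns a `std::future`, and `get()` blocks until the value is available. The helper name `ask_deep_thought` is hypothetical.\n",
    "\n",
    "```cpp\n",
    "#include <future>\n",
    "\n",
    "// Hypothetical helper: start the computation, then collect its result.\n",
    "int ask_deep_thought()\n",
    "{\n",
    "  // std::launch::async forces the task onto a separate thread.\n",
    "  std::future<int> promised_answer =\n",
    "      std::async(std::launch::async, [] { return 42; });\n",
    "  // ... other work could happen here while the task runs ...\n",
    "  return promised_answer.get();  // blocks until the task has finished\n",
    "}\n",
    "```\n",
    "\n",
    "HPX's `async` follows the same interface. Its tasks, however, run as lightweight HPX threads rather than operating-system threads."
   ]
  },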
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we want to do something other than a declaration, use the \".expr\" prefix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "42\n",
      "\n",
      "Elapsed time: 65.7861 seconds\n",
      "Cores detected: 4\n",
      "Worker Threads observed: 2\n",
      "Available CPU time: 131.572 seconds\n",
      "\n",
      "Timer                                                :  #calls  |    mean  |   total  |  % total  \n",
      "------------------------------------------------------------------------------------------------\n",
      "                                           <unknown> :        1   0.00e+00   0.00e+00      0.000\n",
      "                                           APEX MAIN :        1   6.58e+01   6.58e+01    100.000\n",
      "                                     background_work :        1   2.01e-05   2.01e-05      0.000\n",
      "                       call_startup_functions_action :        1   0.00e+00   0.00e+00      0.000\n",
      "                              load_components_action :        1   0.00e+00   0.00e+00      0.000\n",
      "                                            pre_main :        1   0.00e+00   0.00e+00      0.000\n",
      "                                          run_helper :        1   0.00e+00   0.00e+00      0.000\n",
      "                                  task_object::apply :        1   6.58e+01   6.58e+01     49.984\n",
      "                                    universal answer :        1   1.49e-05   1.49e-05      0.000\n",
      "                                           APEX Idle : 6.58e+01     50.016\n",
      "------------------------------------------------------------------------------------------------\n",
      "                                        Total timers : 9\n"
     ]
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    ".expr deep_thought()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
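    "# Compositional Facilities\n",
    "\n",
    "Calling `.then()` on a future attaches a continuation that runs once the future becomes ready. The continuation receives the ready future as its argument, so calling `.get()` inside it does not block."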
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "future<string> make_string()\n",
    "{\n",
    "    future<int> f1 = async([]()->int { return 123; });\n",
    "    future<string> f2 = f1.then(\n",
    "      [](future<int> f) -> string\n",
    "      {\n",
    "        return to_string(f.get()); // here .get() won't block\n",
    "      });\n",
    "    return f2;\n",
    "}"
   ]
  },
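  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In C++14, `std::future` has no `then()`. The sketch below approximates the same composition with a hypothetical helper, `then_sketch`. It starts a task that receives the input future and passes it to the continuation. Unlike HPX's `then`, this task occupies a thread while it waits inside `get()`, so it illustrates the data flow rather than HPX's scheduling.\n",
    "\n",
    "```cpp\n",
    "#include <future>\n",
    "#include <string>\n",
    "#include <utility>\n",
    "\n",
    "// Hypothetical helper (not a standard or HPX API): returns a future for\n",
    "// fn(f). The launched task blocks inside fn while f is not yet ready.\n",
    "template <typename T, typename F>\n",
    "auto then_sketch(std::future<T> f, F fn)\n",
    "{\n",
    "  return std::async(std::launch::async,\n",
    "                    [fn](std::future<T> ready) { return fn(std::move(ready)); },\n",
    "                    std::move(f));\n",
    "}\n",
    "\n",
    "// Same shape as make_string above, using the sketch helper.\n",
    "std::future<std::string> make_string_sketch()\n",
    "{\n",
    "  std::future<int> f1 = std::async(std::launch::async, [] { return 123; });\n",
    "  return then_sketch(std::move(f1), [](std::future<int> f) {\n",
    "    return std::to_string(f.get());\n",
    "  });\n",
    "}\n",
    "```\n",
    "\n",
    "Calling `make_string_sketch().get()` yields the string `123`, matching the HPX version."
   ]
  },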
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Elapsed time: 5.3791 seconds\n",
      "Cores detected: 4\n",
      "Worker Threads observed: 2\n",
      "Available CPU time: 10.7582 seconds\n",
      "\n",
      "Timer                                                :  #calls  |    mean  |   total  |  % total  \n",
      "------------------------------------------------------------------------------------------------\n",
      "                                           <unknown> :        1   0.00e+00   0.00e+00      0.000\n",
      "                                           APEX MAIN :        1   5.38e+00   5.38e+00    100.000\n",
      "                                     background_work :        1   2.06e-05   2.06e-05      0.000\n",
      "                       call_startup_functions_action :        1   0.00e+00   0.00e+00      0.000\n",
      "                              load_components_action :        1   0.00e+00   0.00e+00      0.000\n",
      "                                            pre_main :        1   0.00e+00   0.00e+00      0.000\n",
      "                                          run_helper :        1   0.00e+00   0.00e+00      0.000\n",
      "                                  task_object::apply :        1   5.42e+00   5.42e+00     50.420\n",
      "                                    universal answer :        1   0.00e+00   0.00e+00      0.000\n",
      "                                           APEX Idle : 5.33e+00     49.579\n",
      "------------------------------------------------------------------------------------------------\n",
      "                                        Total timers : 9\n",
      "123\n"
     ]
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    ".expr cout << make_string().get() << endl;\n",
    "apex::dump(true);\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Parallel Algorithms\n",
    "HPX lets you write parallel loop algorithms generically. How the parallelism is achieved (e.g. threads, distributed execution, CUDA) is specified separately, through execution policies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#include <hpx/include/parallel_for_each.hpp>\n",
    "#include <hpx/parallel/algorithms/transform.hpp>\n",
    "#include <boost/iterator/counting_iterator.hpp>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "vector<int> v = { 1, 2, 3, 4, 5, 6 };"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Transform\n",
    "Here we demonstrate the transformation of a vector and the various mechanisms by which it can be performed in parallel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    ".expr\n",
    "// This parallel transformation of vector v\n",
    "// is done using thread parallelism. An\n",
    "// implicit barrier is present at the end.\n",
    "parallel::transform (\n",
    "  parallel::par,\n",
    "  begin(v), end(v), begin(v),\n",
    "  [](int i) -> int\n",
    "  {\n",
    "    return i+1;  \n",
    "  });\n",
    "for(int i : v) cout << i << \",\";"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    ".expr\n",
    "// This parallel transformation of vector v\n",
    "// is done using thread parallelism. There\n",
    "// is no implicit barrier. Instead, the\n",
    "// transform returns a future.\n",
    "auto f = parallel::transform (\n",
    "  parallel::par (parallel::v3::task),\n",
    "  begin(v), end(v), begin(v),\n",
    "  [](int i) -> int\n",
    "  {\n",
    "    return i+1;  \n",
    "  });\n",
    "  \n",
    "// wait for the future to be ready.\n",
    "f.wait();\n",
    "\n",
    "for(int i : v) cout << i << \",\";"
   ]
  },
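  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following hand-written sketch uses standard threads to show roughly what a parallel `transform` does underneath. It splits the range into chunks, transforms each chunk in its own task, and then waits for all tasks. The function `chunked_transform` and its `num_chunks` parameter are hypothetical, since HPX chooses the partitioning itself. Waiting at the end corresponds to the implicit barrier of the `par` policy. Returning the futures instead would correspond to the task policy.\n",
    "\n",
    "```cpp\n",
    "#include <algorithm>\n",
    "#include <cstddef>\n",
    "#include <future>\n",
    "#include <vector>\n",
    "\n",
    "// Hypothetical sketch: transform v in place using up to num_chunks tasks.\n",
    "// num_chunks must be at least 1.\n",
    "template <typename F>\n",
    "void chunked_transform(std::vector<int>& v, F op, std::size_t num_chunks)\n",
    "{\n",
    "  std::vector<std::future<void>> tasks;\n",
    "  std::size_t const n = v.size();\n",
    "  std::size_t const chunk = (n + num_chunks - 1) / num_chunks;  // round up\n",
    "  for (std::size_t first = 0; first < n; first += chunk)\n",
    "  {\n",
    "    std::size_t const last = std::min(first + chunk, n);\n",
    "    // Each task writes a disjoint sub-range, so no locking is needed.\n",
    "    tasks.push_back(std::async(std::launch::async, [&v, op, first, last] {\n",
    "      std::transform(v.begin() + first, v.begin() + last,\n",
    "                     v.begin() + first, op);\n",
    "    }));\n",
    "  }\n",
    "  // Waiting here plays the role of the par policy's implicit barrier.\n",
    "  for (auto& t : tasks) t.get();\n",
    "}\n",
    "```\n",
    "\n",
    "For example, calling it on `{1, 2, 3, 4, 5, 6}` with `[](int i) { return i + 1; }` and 4 chunks leaves the vector as `{2, 3, 4, 5, 6, 7}`."
   ]
  },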
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#include <hpx/include/parallel_fill.hpp>\n",
    "#include <hpx/include/compute.hpp>\n",
    "#include <hpx/include/parallel_executors.hpp>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "auto host_targets = hpx::compute::host::get_local_targets();\n",
    "typedef hpx::compute::host::block_executor<> executor_type;\n",
    "executor_type exec(host_targets);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    ".expr\n",
    "// Print out a list of the localities, i.e. hosts\n",
    "// that can potentially be involved in this calculation.\n",
    "// This notebook will probably show 1, alas.\n",
    "for(auto host : host_targets)\n",
    "  cout << host.get_locality() << endl;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    ".expr\n",
    "// This parallel transformation of vector v\n",
    "// is done using the block executor exec, which\n",
    "// spreads the work over the host targets listed above.\n",
    "parallel::transform (\n",
    "  parallel::execution::par.on(exec),\n",
    "  begin(v), end(v), begin(v),\n",
    "  [](int i) -> int\n",
    "  {\n",
    "    return i+1;  \n",
    "  });\n",
    "\n",
    "for(int i : v) cout << i << \",\";"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Other Algorithms\n",
    "There are a great many algorithms. Here we demonstrate \"fill\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    ".expr\n",
    "std::vector<float> vd;\n",
    "for(int i=0;i<10;i++) vd.push_back(1.f);\n",
    "parallel::fill(parallel::execution::par.on(exec),vd.begin(),vd.end(),0.0f);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let’s Parallelize It – Adding Real Asynchrony\n",
    "\n",
    "Here we take a step back. Instead of using a pre-designed parallel operation on a vector, we instead introduce task-level parallelism to an existing program.\n",
    "\n",
    "Calculate Fibonacci numbers in parallel (1st attempt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "uint64_t fibonacci(uint64_t n)\n",
    "{\n",
    "  // if we know the answer, we return the value\n",
    "  if (n < 2) return n;\n",
    "  // asynchronously calculate one of the sub-terms\n",
    "  future<uint64_t> f = async(launch::async, &fibonacci, n-2);\n",
    "  // synchronously calculate the other sub-term\n",
    "  uint64_t r = fibonacci(n-1);\n",
    "  // wait for the future and calculate the result\n",
    "  return f.get() + r;\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    ".expr cout << fibonacci(10) << endl;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let’s Parallelize It – Introducing Control of Grain Size\n",
    "Parallel calculation, switching to serial execution below a given threshold."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "const int threshold = 20;\n",
    "\n",
    "uint64_t fibonacci_serial(uint64_t n)\n",
    "{\n",
    "  if (n < 2) return n;\n",
    "  uint64_t f1 = fibonacci_serial(n-2);\n",
    "  uint64_t f2 = fibonacci_serial(n-1);\n",
    "  return f1 + f2;\n",
    "}\n",
    "\n",
    "uint64_t fibonacci2(uint64_t n)\n",
    "{\n",
    "  if (n < 2) return n;\n",
    "  if (n < threshold) return fibonacci_serial(n);\n",
    "  // asynchronously calculate one of the sub-terms\n",
    "  future<uint64_t> f = async(launch::async, &fibonacci2, n-2);\n",
    "  // synchronously calculate the other sub-term\n",
    "  uint64_t r = fibonacci2(n-1);\n",
    "  // wait for the future and calculate the result\n",
    "  return f.get() + r;\n",
    "}"
   ]
  },
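  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same grain-size control can be sketched with the standard library alone. The names `fib_serial` and `fib_par` and the default cutoff of 20 are assumptions that mirror `fibonacci_serial` and `threshold` above. Below the cutoff, starting a new thread costs more than the work it would do, so the recursion stays serial.\n",
    "\n",
    "```cpp\n",
    "#include <cstdint>\n",
    "#include <future>\n",
    "\n",
    "std::uint64_t fib_serial(std::uint64_t n)\n",
    "{\n",
    "  return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);\n",
    "}\n",
    "\n",
    "// Spawns a task for one sub-term only while n is at or above the cutoff.\n",
    "std::uint64_t fib_par(std::uint64_t n, std::uint64_t cutoff = 20)\n",
    "{\n",
    "  if (n < 2 || n < cutoff) return fib_serial(n);\n",
    "  // Asynchronously compute one sub-term on a new thread ...\n",
    "  std::future<std::uint64_t> f =\n",
    "      std::async(std::launch::async, &fib_par, n - 2, cutoff);\n",
    "  // ... while this thread computes the other.\n",
    "  std::uint64_t r = fib_par(n - 1, cutoff);\n",
    "  return f.get() + r;\n",
    "}\n",
    "```\n",
    "\n",
    "`fib_par(22)` returns 17711, the same value as `fibonacci2(22)`."
   ]
  },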
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    ".expr cout << fibonacci2(22) << endl;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let’s Parallelize It – Apply Futurization\n",
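    "Futurize the algorithm to remove suspension points. Instead of blocking on `get()`, return a future and let `dataflow` combine the two sub-results once both are ready."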
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "future<uint64_t> fibonacci3(uint64_t n)\n",
    "{\n",
    "  if(n < 2) return make_ready_future(n);\n",
    "  if(n < threshold) return make_ready_future(fibonacci_serial(n));\n",
    "\n",
    "  future<uint64_t> f = async(launch::async, &fibonacci3, n-2);\n",
    "  future<uint64_t> r = fibonacci3(n-1);\n",
    "\n",
    "  return dataflow(\n",
    "    [](future<uint64_t> f1, future<uint64_t> f2) {\n",
    "      return f1.get() + f2.get();\n",
    "    },\n",
    "    f, r);\n",
    "}\n"
   ]
  },
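  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Standard C++ has no `dataflow`. For illustration only, the sketch below approximates it with a hypothetical `dataflow_sketch`. This helper starts a task that receives both input futures and calls the function with them. HPX's `dataflow` instead defers the call until both inputs are ready, without blocking a thread in the meantime.\n",
    "\n",
    "```cpp\n",
    "#include <cstdint>\n",
    "#include <future>\n",
    "#include <utility>\n",
    "\n",
    "// Hypothetical helper: returns a future for fn(a, b). The launched task\n",
    "// may block inside fn while a or b is not yet ready.\n",
    "template <typename F, typename A, typename B>\n",
    "auto dataflow_sketch(F fn, std::future<A> a, std::future<B> b)\n",
    "{\n",
    "  return std::async(std::launch::async,\n",
    "                    [fn](std::future<A> x, std::future<B> y) {\n",
    "                      return fn(std::move(x), std::move(y));\n",
    "                    },\n",
    "                    std::move(a), std::move(b));\n",
    "}\n",
    "\n",
    "// Usage: combine two independently computed values.\n",
    "std::future<std::uint64_t> sum_sketch()\n",
    "{\n",
    "  auto f1 = std::async(std::launch::async, [] { return std::uint64_t{40}; });\n",
    "  auto f2 = std::async(std::launch::async, [] { return std::uint64_t{2}; });\n",
    "  return dataflow_sketch(\n",
    "      [](std::future<std::uint64_t> x, std::future<std::uint64_t> y) {\n",
    "        return x.get() + y.get();\n",
    "      },\n",
    "      std::move(f1), std::move(f2));\n",
    "}\n",
    "```"
   ]
  },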
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    ".expr cout << fibonacci3(22).get() << endl;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
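    "# Let’s Parallelize It – Unwrap Argument Futures\n",
    "\n",
    "Wrapping the lambda in `unwrapped` lets it take plain values instead of futures, because the wrapper calls `.get()` on each argument before invoking the lambda. `dataflow` only invokes it once both futures are ready, so these calls do not block."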
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#include <hpx/util/unwrapped.hpp>\n",
    "\n",
    "using hpx::util::unwrapped;\n",
    "\n",
    "future<uint64_t> fibonacci4(uint64_t n)\n",
    "{\n",
    "  if(n < 2) return make_ready_future(n);\n",
    "  if(n < threshold) return make_ready_future(fibonacci_serial(n));\n",
    "\n",
    "  future<uint64_t> f = async(launch::async, &fibonacci4, n-2);\n",
    "  future<uint64_t> r = fibonacci4(n-1);\n",
    "\n",
    "  return dataflow(\n",
    "    unwrapped([](uint64_t f1, uint64_t f2) {\n",
    "      return f1+f2;\n",
    "    }),\n",
    "    f, r);\n",
    "}\n"
   ]
  },
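  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of `unwrapped` can be sketched as an adapter that calls `get()` on each future argument and forwards the plain values. The name `unwrap_sketch` is hypothetical. This two-argument version is also far simpler than HPX's, which accepts many more kinds of arguments.\n",
    "\n",
    "```cpp\n",
    "#include <future>\n",
    "\n",
    "// Hypothetical adapter: turns fn(A, B) into a callable that accepts two\n",
    "// futures and passes their values on. The futures should already be\n",
    "// ready (as they are inside dataflow); otherwise get() blocks.\n",
    "template <typename F>\n",
    "auto unwrap_sketch(F fn)\n",
    "{\n",
    "  return [fn](auto a, auto b) { return fn(a.get(), b.get()); };\n",
    "}\n",
    "```\n",
    "\n",
    "For example, `unwrap_sketch([](int x, int y) { return x + y; })` can be called with two `std::future<int>` objects and returns the sum of their values."
   ]
  },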
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    ".expr cout << fibonacci4(22).get() << endl;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
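    "### Exercise: Parallelize a sort\n",
    "\n",
    "The cells below implement a serial quicksort, `myqsort`. Parallelize it, for example by sorting the `pre` and `post` partitions concurrently."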
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#include <unistd.h>\n",
    "#include <stdlib.h>\n",
    "#include <iostream>\n",
    "#include <vector>\n",
    "#include <functional>\n",
    "using namespace std;\n",
    "function<void(vector<int>&)> myqsort = [](vector<int>& v)->void {};"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    ".expr\n",
    "myqsort = [](vector<int>& v)->void {\n",
    "  if(v.size()<2) return;\n",
    "  vector<int> pre, eq, post;\n",
    "  int pivot = v[rand() % v.size()];\n",
    "  for(int val : v) {\n",
    "    if(val < pivot) pre.push_back(val);\n",
    "    else if(pivot < val) post.push_back(val);\n",
    "    else eq.push_back(val);\n",
    "  }\n",
    "  myqsort(pre);\n",
    "  myqsort(post);\n",
    "  for(size_t i=0;i<pre.size();i++) v[i] = pre[i];\n",
    "  for(size_t i=0;i<eq.size();i++) v[i+pre.size()] = eq[i];\n",
    "  for(size_t i=0;i<post.size();i++) v[i+pre.size()+eq.size()] = post[i];\n",
    "};\n",
    "vector<int> vv;  // start empty; the 20 random values are added below\n",
    "for(int i=0;i<20;i++) vv.push_back(rand() % 100);\n",
    "for(int val : vv) cout << val << \" \";\n",
    "cout << endl;\n",
    "myqsort(vv);\n",
    "for(int val : vv) cout << val << \" \";\n",
    "cout << endl;"
   ]
  },
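  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible approach, sketched here with `std::async` rather than HPX, keeps the partitioning of `myqsort`. It sorts the `pre` partition in a separate task while the current thread sorts `post`. The name `par_qsort` and the cutoff of 1000 elements are assumptions. Porting the sketch to `hpx::async`, or futurizing it as in the Fibonacci examples, remains the exercise.\n",
    "\n",
    "```cpp\n",
    "#include <cstddef>\n",
    "#include <cstdlib>\n",
    "#include <future>\n",
    "#include <vector>\n",
    "\n",
    "// Hypothetical parallel quicksort using the same three-way partition.\n",
    "void par_qsort(std::vector<int>& v, std::size_t cutoff = 1000)\n",
    "{\n",
    "  if (v.size() < 2) return;\n",
    "  std::vector<int> pre, eq, post;\n",
    "  int pivot = v[std::rand() % v.size()];\n",
    "  for (int val : v)\n",
    "  {\n",
    "    if (val < pivot) pre.push_back(val);\n",
    "    else if (pivot < val) post.push_back(val);\n",
    "    else eq.push_back(val);\n",
    "  }\n",
    "  if (v.size() < cutoff)\n",
    "  {\n",
    "    // Small input: stay serial to avoid thread overhead.\n",
    "    par_qsort(pre, cutoff);\n",
    "    par_qsort(post, cutoff);\n",
    "  }\n",
    "  else\n",
    "  {\n",
    "    // pre and post are disjoint, so they can be sorted concurrently.\n",
    "    std::future<void> left =\n",
    "        std::async(std::launch::async, [&pre, cutoff] { par_qsort(pre, cutoff); });\n",
    "    par_qsort(post, cutoff);\n",
    "    left.get();\n",
    "  }\n",
    "  // Concatenate the three partitions back into v.\n",
    "  v.clear();\n",
    "  v.insert(v.end(), pre.begin(), pre.end());\n",
    "  v.insert(v.end(), eq.begin(), eq.end());\n",
    "  v.insert(v.end(), post.begin(), post.end());\n",
    "}\n",
    "```"
   ]
  },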
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "C++14",
   "language": "",
   "name": "cling-cpp14"
  },
  "language_info": {
   "codemirror_mode": "c++",
   "file_extension": ".c++",
   "mimetype": "text/x-c++src",
   "name": "c++"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}