
OpenCL Actors – Adding Data Parallelism to Actor-Based Programming with CAF

Chapter in: Programming with Actors

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10789)

Abstract

The actor model of computation was designed to seamlessly support concurrency and distribution. However, it remains unspecific about data-parallel program flows, even as the processing power of modern many-core hardware, such as graphics processing units (GPUs) and coprocessors, makes data parallelism increasingly relevant for general-purpose computation.

In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). They offer a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the CAF runtime environment and enables transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels encapsulated in C++ actors can be composed, so that they operate in a multi-stage fashion on data that remains resident on the GPU. This enables developers to build complex data-parallel programs from primitives without leaving the actor paradigm or sacrificing performance. Our evaluations on commodity GPUs, an Nvidia Tesla, and an Intel Xeon Phi reveal the expected linear scaling behavior when offloading larger workloads. For sub-second tasks, however, the efficiency of offloading differs considerably between devices. Moreover, our findings indicate negligible overhead compared with programming against the native OpenCL API.
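The following is a minimal sketch in plain C++17 that makes the composition idea concrete. It uses neither CAF nor OpenCL and does not reproduce the chapter's API. The names `kernel_actor`, `buffer_ref`, and `run_pipeline` are hypothetical and chosen only for illustration. The sketch rests on three stand-ins:

- A thread draining a mailbox stands in for an actor.
- A few `std::thread` "lanes" stand in for OpenCL work items executing one kernel.
- A `std::shared_ptr` to the buffer stands in for a reference to device-resident memory, so the second stage receives a handle to the first stage's output rather than a copy.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device memory reference: passed, never copied.
using buffer_ref = std::shared_ptr<std::vector<float>>;

// Hypothetical actor: one worker thread drains the mailbox, one message at a time.
class kernel_actor {
 public:
  kernel_actor(std::function<float(float)> kernel, unsigned lanes)
      : kernel_(std::move(kernel)), lanes_(lanes == 0 ? 1 : lanes),
        worker_([this] { run(); }) {}

  ~kernel_actor() {
    {
      std::lock_guard<std::mutex> lock(mtx_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();  // pending messages are still processed before exit
  }

  // Asynchronous send: the future yields the same handle after in-place update.
  std::future<buffer_ref> send(buffer_ref buf) {
    std::promise<buffer_ref> reply;
    std::future<buffer_ref> result = reply.get_future();
    {
      std::lock_guard<std::mutex> lock(mtx_);
      mailbox_.push(message{std::move(buf), std::move(reply)});
    }
    cv_.notify_one();
    return result;
  }

 private:
  struct message {
    buffer_ref buf;
    std::promise<buffer_ref> reply;
  };

  void run() {
    for (;;) {
      message msg;
      {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return done_ || !mailbox_.empty(); });
        if (mailbox_.empty()) return;  // shutting down, nothing left to do
        msg = std::move(mailbox_.front());
        mailbox_.pop();
      }
      apply(*msg.buf);
      msg.reply.set_value(std::move(msg.buf));
    }
  }

  // Data-parallel step: split the index space into contiguous chunks and
  // run the kernel on each chunk in its own thread ("lane").
  void apply(std::vector<float>& data) const {
    const std::size_t n = data.size();
    const std::size_t chunk = (n + lanes_ - 1) / lanes_;
    std::vector<std::thread> lanes;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
      const std::size_t end = std::min(n, begin + chunk);
      lanes.emplace_back([this, &data, begin, end] {
        for (std::size_t i = begin; i < end; ++i) data[i] = kernel_(data[i]);
      });
    }
    for (std::thread& t : lanes) t.join();
  }

  std::function<float(float)> kernel_;
  std::size_t lanes_;
  std::mutex mtx_;
  std::condition_variable cv_;
  std::queue<message> mailbox_;
  bool done_ = false;
  std::thread worker_;  // declared last so it starts after all other members
};

// Two-stage composition: stage two receives stage one's handle, not a copy.
buffer_ref run_pipeline(kernel_actor& first, kernel_actor& second,
                        buffer_ref input) {
  assert(input && "pipeline input must not be null");
  buffer_ref intermediate = first.send(std::move(input)).get();
  return second.send(std::move(intermediate)).get();
}
```

Consider `run_pipeline(scale, shift, buf)`, where `scale` doubles each element and `shift` adds one. An input of `{1, 2, 3, 4}` comes back as `{3, 5, 7, 9}` in the same buffer object. Passing a handle instead of the data mirrors the abstract's goal of keeping intermediate results on the device between kernel stages.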



Acknowledgments

The authors would like to thank Marian Triebe and Sebastian Bartels for implementing benchmarks, testing, and bugfixing. We further want to thank Matthias Vallentin for raising the indexing use case, and the iNET working group for vivid discussions and inspiring suggestions. Funding by the German Federal Ministry of Education and Research within the projects ScaleCast and X–CHECK is gratefully acknowledged.

Author information

Correspondence to Thomas C. Schmidt.


Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Hiesgen, R., Charousset, D., Schmidt, T.C. (2018). OpenCL Actors – Adding Data Parallelism to Actor-Based Programming with CAF. In: Ricci, A., Haller, P. (eds) Programming with Actors. Lecture Notes in Computer Science, vol 10789. Springer, Cham. https://doi.org/10.1007/978-3-030-00302-9_3


  • DOI: https://doi.org/10.1007/978-3-030-00302-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00301-2

  • Online ISBN: 978-3-030-00302-9

  • eBook Packages: Computer Science, Computer Science (R0)
