Testing Spark and Scala
1. Testing Spark and Scala
https://github.com/ganeshayadiyala/Scalatest-library-to-unit-test-spark/
2. ● Ganesha Yadiyala
● Big data consultant at
datamantra.io
● Consults on Spark and Scala
● ganeshayadiyala@gmail.com
3. Agenda
● What is testing
● Different types of testing
● Unit tests using ScalaTest
● Different styles in ScalaTest
● Using assertions
● Sharing fixtures
● Matchers
● Async Testing
● Testing Spark batch operations
● Unit testing streaming operations
4. What is testing
Software testing is the process of executing a program or application with the intent
of finding software bugs.
It can also be described as the process of validating and verifying that a software
application:
● Meets the business and technical requirements that guided its design and
development
● Works as expected
5. A few types of tests
● Unit tests
● Integration tests
● Functional tests
6. Unit tests
● Unit testing verifies that individual units of code (mostly functions) work
as expected
● Assumes everything else works
● Tests one specific condition or flow
Advantages:
● Code is more reusable. Unit testing requires modular code, and modular
code is easier to reuse.
● Debugging is easier. When a test fails, only the latest changes need to be
debugged.
7. Integration tests
● Test the interoperability of multiple subsystems
● Include real components, databases, etc.
● Test the connectivity between components
● Hard to cover all cases (the number of combinations grows quickly)
● Hard to localize errors (a test may break for many different reasons)
● Much slower than unit tests
8. Functional tests
● Functional testing checks the application against its business
requirements
● Uses real components and real data
10. ScalaTest
● We use ScalaTest for unit tests in Scala
● For every class in src/main/scala, write a test class in src/test/scala
● A suite is a collection of test cases
● You define test classes by composing a Suite style trait with mixin traits
● You can test both Scala and Java code
● Offers deep integration with tools such as JUnit, TestNG, Ant, Maven, sbt,
ScalaCheck, JMock, EasyMock, Mockito, ScalaMock, Selenium, Eclipse,
NetBeans, and IntelliJ
11. Using the scalatest-maven-plugin
We have to disable the Maven Surefire plugin and enable the ScalaTest plugin:
● Specify <skipTests>true</skipTests> in the maven-surefire-plugin configuration
● Add the scalatest-maven-plugin and set its goal to test
12. Different styles in ScalaTest
● FunSuite
● FlatSpec
● FunSpec
● WordSpec
● FreeSpec
● PropSpec
● FeatureSpec
13. FunSuite
● In a FunSuite, tests are function values.
● You denote tests with test and provide the name of the test as a string
enclosed in parentheses, followed by the code of the test in curly braces
Ex : com.ganesh.scalatest.specs.FunSuitTest.scala
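The style described above can be sketched as follows. This is a minimal example, not the contents of the referenced file; it assumes ScalaTest 3.0.x (where the trait is org.scalatest.FunSuite; from 3.1 onwards the equivalent is org.scalatest.funsuite.AnyFunSuite), and Calculator is a hypothetical object defined only for illustration.

```scala
import org.scalatest.FunSuite

// Hypothetical code under test, defined here only to keep the example self-contained.
object Calculator {
  def add(a: Int, b: Int): Int = a + b
}

class CalculatorFunSuite extends FunSuite {

  // test name in parentheses, test body in curly braces
  test("add returns the sum of two integers") {
    assert(Calculator.add(2, 3) == 5)
  }

  test("add is commutative") {
    assert(Calculator.add(2, 3) == Calculator.add(3, 2))
  }
}
```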
14. FlatSpec
● Tests are written flat, without nesting, in contrast to FunSpec and WordSpec.
● Uses the "behavior of" clause
Ex : com.ganesh.scalatest.specs.FlatSpecTest.scala
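A flat specification might look like the sketch below. It assumes ScalaTest 3.0.x (org.scalatest.FlatSpec) and tests the standard library List rather than any class from the referenced repository.

```scala
import org.scalatest.FlatSpec

class EmptyListFlatSpec extends FlatSpec {

  // "behavior of" names the subject; each "it should" adds one flat test
  behavior of "An empty List"

  it should "have size 0" in {
    assert(List.empty[Int].size == 0)
  }

  it should "throw NoSuchElementException when head is invoked" in {
    assertThrows[NoSuchElementException] {
      List.empty[Int].head
    }
  }
}
```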
15. FunSpec
● Tests are combined with text that specifies the behavior of the test.
● Uses describe clause
Ex : com.ganesh.scalatest.specs.FunSpecTest.scala
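For comparison, the same kind of checks in the nested describe/it form. A sketch only, assuming ScalaTest 3.0.x (org.scalatest.FunSpec); the subject under test is the standard library List.

```scala
import org.scalatest.FunSpec

class ListFunSpec extends FunSpec {

  // describe blocks can be nested; "it" registers the actual test
  describe("A List") {
    describe("when empty") {
      it("has size 0") {
        assert(List.empty[Int].size == 0)
      }

      it("throws NoSuchElementException when head is invoked") {
        assertThrows[NoSuchElementException] {
          List.empty[Int].head
        }
      }
    }
  }
}
```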
16. WordSpec
● Your specification text is structured by placing words such as when and should after strings
● Uses the should and in clauses
Ex : com.ganesh.scalatest.specs.WordSpecTest.scala
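A WordSpec version might read as follows. This is an illustrative sketch assuming ScalaTest 3.0.x (org.scalatest.WordSpec), again using a standard library collection as the subject.

```scala
import org.scalatest.WordSpec

class SetWordSpec extends WordSpec {

  // Strings are followed by the words when / should / in
  "A Set" when {
    "empty" should {
      "have size 0" in {
        assert(Set.empty[Int].size == 0)
      }

      "produce NoSuchElementException when head is invoked" in {
        assertThrows[NoSuchElementException] {
          Set.empty[Int].head
        }
      }
    }
  }
}
```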
17. Using Assertions
ScalaTest makes three assertions available by default in any style trait
● assert - for general assertion.
● assertResult - to differentiate expected from actual values.
● assertThrows - to ensure a bit of code throws an expected exception.
ScalaTest assertions are defined in the trait Assertions, which also provides some
other APIs such as intercept, fail, cancel, assume and withClue.
Ex : com.ganesh.scalatest.features.AssertionsTest.scala
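The three default assertions could be used roughly like this. A minimal sketch assuming ScalaTest 3.0.x; the suite name and values are made up for illustration.

```scala
import org.scalatest.FunSuite

class AssertionsExampleSuite extends FunSuite {

  test("assert checks a boolean condition") {
    val words = List("spark", "scala")
    assert(words.contains("scala"))
  }

  test("assertResult takes the expected value first, then the actual one") {
    assertResult(4) {
      2 + 2
    }
  }

  test("assertThrows passes only if the block throws the given exception type") {
    assertThrows[NumberFormatException] {
      "abc".toInt
    }
  }
}
```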
18. Ignoring the test
● ScalaTest allows a test to be ignored.
● We can ignore a test temporarily, for example while its implementation is
being changed or because it is slow, and run it later.
● We use the ignore clause to ignore a single test
● We use the @Ignore annotation to ignore all the tests in a suite.
Ex : com.ganesh.scalatest.features.IgnoreTest.scala
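Both ways of ignoring tests are sketched below, assuming ScalaTest 3.0.x. The suite names are hypothetical. In a FunSuite, changing test to ignore is enough to skip a single test; the @Ignore annotation takes effect when the suite is discovered by the ScalaTest runner.

```scala
import org.scalatest.{FunSuite, Ignore}

class IgnoreExampleSuite extends FunSuite {

  test("runs normally") {
    assert(1 + 1 == 2)
  }

  // Registered but not executed; the runner reports it as ignored.
  ignore("temporarily disabled while the implementation changes") {
    assert(false)
  }
}

// Every test in this suite is ignored when the runner discovers the class.
@Ignore
class WholeSuiteIgnored extends FunSuite {
  test("slow end-to-end check") {
    assert(true)
  }
}
```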
19. Sharing fixture
A test fixture is composed of the objects and other artifacts that tests use to do
their work.
When multiple tests need to work with the same fixture, we can share the fixture
between them.
This reduces code duplication.
20. By calling get-fixture methods
If you need to create the same mutable fixture objects in multiple tests, you can use
a get-fixture method
● A get-fixture method returns a new instance of a needed fixture object each
time it is called
● Not appropriate if those objects need to be cleaned up after the test
Ex : com.ganesh.scalatest.fixtures.GetFixtureTest.scala
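A get-fixture method can be sketched as below. This assumes ScalaTest 3.0.x; the Fixture case class and its fields are hypothetical names chosen for the example.

```scala
import org.scalatest.FunSuite
import scala.collection.mutable.ListBuffer

class GetFixtureExampleSuite extends FunSuite {

  // Hypothetical holder for the mutable objects the tests need.
  case class Fixture(builder: StringBuilder, buffer: ListBuffer[String])

  // Each call returns brand-new objects, so tests cannot affect one another.
  def fixture: Fixture = Fixture(new StringBuilder("Spark "), new ListBuffer[String])

  test("a test may mutate its own fixture") {
    val f = fixture
    f.builder.append("is fast")
    f.buffer += "one"
    assert(f.builder.toString == "Spark is fast")
    assert(f.buffer.size == 1)
  }

  test("the next test starts from a clean fixture") {
    val f = fixture
    assert(f.builder.toString == "Spark ")
    assert(f.buffer.isEmpty)
  }
}
```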
21. By Instantiating fixture-context objects
When different tests need different combinations of fixture objects, define the
fixture objects as instance variables of fixture-context objects.
● In this approach we initialize the fixture objects inside a trait or class.
● We create a new instance of the fixture trait in each test that needs it.
● We can also mix several of these fixture traits together.
Ex : com.ganesh.scalatest.fixtures.FixtureContextTest.scala
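The fixture-context approach might look like this sketch, assuming ScalaTest 3.0.x. The traits Builder and Buffer are hypothetical; each test instantiates only the combination it needs.

```scala
import org.scalatest.FlatSpec
import scala.collection.mutable.ListBuffer

class FixtureContextExampleSpec extends FlatSpec {

  // Fixture objects live as instance variables of small traits.
  trait Builder {
    val builder = new StringBuilder("Spark ")
  }

  trait Buffer {
    val buffer = ListBuffer("Scala", "is")
  }

  // This test needs only the StringBuilder fixture.
  "A test" should "use a single fixture trait" in new Builder {
    builder.append("works")
    assert(builder.toString == "Spark works")
  }

  // This test mixes both fixture traits together.
  it should "use two fixture traits mixed together" in new Builder with Buffer {
    buffer += "fun"
    builder.append(buffer.mkString(" "))
    assert(builder.toString == "Spark Scala is fun")
  }
}
```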
22. By using withFixture
● Allows cleaning up of fixtures at the end of the tests
● If we have no object to pass to the test case, we can override
withFixture(NoArgTest).
● If we have one or more objects to pass to the test case, we need to
use withFixture(OneArgTest).
Ex : com.ganesh.scalatest.fixtures.WithFicture*.scala
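Both variants are sketched below, assuming ScalaTest 3.0.x (where the one-argument style lives in org.scalatest.fixture.FunSuite; later versions rename it). The temporary file is just an example of a resource that needs cleanup, and the suite names are hypothetical.

```scala
import java.nio.file.{Files, Path}
import org.scalatest.{FunSuite, Outcome, fixture}

// NoArgTest: the fixture is kept in a field, nothing is passed to the test.
class WithFixtureNoArgSuite extends FunSuite {
  private var tempFile: Path = _

  override def withFixture(test: NoArgTest): Outcome = {
    tempFile = Files.createTempFile("scalatest", ".txt")
    try super.withFixture(test)             // run the test body
    finally Files.deleteIfExists(tempFile)  // cleanup runs even if the test fails
  }

  test("the file written by the test has the expected size") {
    Files.write(tempFile, "hello".getBytes("UTF-8"))
    assert(Files.size(tempFile) == 5L)
  }
}

// OneArgTest: the fixture object is passed into each test as a parameter.
class WithFixtureOneArgSuite extends fixture.FunSuite {
  type FixtureParam = Path

  override def withFixture(test: OneArgTest): Outcome = {
    val file = Files.createTempFile("scalatest", ".txt")
    try withFixture(test.toNoArgTest(file))
    finally Files.deleteIfExists(file)
  }

  test("the test receives the file as an argument") { file =>
    Files.write(file, "spark".getBytes("UTF-8"))
    assert(Files.size(file) == 5L)
  }
}
```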
23. By using BeforeAndAfter
● The fixture-sharing methods used so far run during the
test.
● If an exception occurs while creating the fixture, it is reported as a test
failure.
● With BeforeAndAfter, setup happens before the test starts and
cleanup happens once the test has completed.
● So if an exception occurs during setup, it aborts the entire suite and no more
tests are attempted.
Ex : com.ganesh.scalatest.fixtures.BeforeAndAfterTest.scala
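A small sketch of BeforeAndAfter, assuming ScalaTest 3.0.x. The shared ListBuffer is a hypothetical fixture; before and after communicate with the tests only through this shared mutable state.

```scala
import org.scalatest.{BeforeAndAfter, FunSuite}
import scala.collection.mutable.ListBuffer

class BeforeAndAfterExampleSuite extends FunSuite with BeforeAndAfter {

  private val buffer = new ListBuffer[String]

  // Runs before each test, outside the test itself.
  before {
    buffer += "Spark"
  }

  // Runs after each test, whether it passed or failed.
  after {
    buffer.clear()
  }

  test("a test can add to the prepared buffer") {
    buffer += "Scala"
    assert(buffer == ListBuffer("Spark", "Scala"))
  }

  test("each test sees a freshly prepared buffer") {
    assert(buffer == ListBuffer("Spark"))
  }
}
```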
24. Matchers
ScalaTest provides a domain specific language (DSL) for expressing assertions in
tests using the word should.
Ex : com.ganesh.scalatest.features.MatchersTest.scala
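A few common matcher expressions are shown below as a sketch. It assumes ScalaTest 3.0.x, where the trait is org.scalatest.Matchers (from 3.1 onwards it is org.scalatest.matchers.should.Matchers); the values are made up for illustration.

```scala
import org.scalatest.{FlatSpec, Matchers}

class MatchersExampleSpec extends FlatSpec with Matchers {

  "Matchers" should "express common checks with the word should" in {
    val langs = List("scala", "java")
    val sum = 1 + 1

    sum should be (2)                           // equality
    "Apache Spark" should startWith ("Apache")  // string matching
    langs should contain ("scala")              // collection membership
    langs should have size 2                    // collection size
    langs should not be empty                   // negation

    // expected exceptions
    an [NumberFormatException] should be thrownBy {
      "abc".toInt
    }
  }
}
```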
25. Asynchronous testing
● Given a Future returned by the code you are testing, you need not block until
the Future completes before performing assertions against its value.
● We can instead map those assertions onto the Future and return the resulting
Future[Assertion] to ScalaTest.
● ScalaTest completes the test asynchronously, once that Future completes.
Ex : com.ganesh.scalatest.features.AsyncTest.scala
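Mapping an assertion onto a Future could look like this sketch. It assumes ScalaTest 3.0.x (org.scalatest.AsyncFunSuite, which supplies an implicit execution context); addSoon is a hypothetical asynchronous function standing in for the code under test.

```scala
import org.scalatest.AsyncFunSuite
import scala.concurrent.Future

class AsyncExampleSuite extends AsyncFunSuite {

  // Hypothetical asynchronous code under test.
  def addSoon(a: Int, b: Int): Future[Int] = Future { a + b }

  test("addSoon eventually yields the sum") {
    // No blocking: the test returns a Future[Assertion] to ScalaTest.
    addSoon(1, 2).map { sum =>
      assert(sum == 3)
    }
  }
}
```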
26. Testing private methods
● ScalaTest can also test methods that are private to a class.
● The PrivateMethodTester trait provides this.
● Its invokePrivate operator calls the private method.
Ex : com.ganesh.scalatest.features.PrivateMethodTest.scala
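Calling a private method through PrivateMethodTester might be sketched as follows, assuming ScalaTest 3.0.x. Greeter and its greet method are hypothetical and exist only for this example.

```scala
import org.scalatest.{FunSuite, PrivateMethodTester}

// Hypothetical class with a private method we want to test.
class Greeter {
  private def greet(name: String): String = s"Hello, $name"
}

class PrivateMethodExampleSuite extends FunSuite with PrivateMethodTester {

  test("greet builds the greeting") {
    // The type parameter is the return type; the symbol is the method name.
    val greet = PrivateMethod[String](Symbol("greet"))
    val result = new Greeter() invokePrivate greet("Spark")
    assert(result == "Hello, Spark")
  }
}
```

The method is looked up reflectively by name, so a rename of greet is only detected when the test runs, not at compile time.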
29. Complexities
● A Spark context is needed for all the tests
● Testing operations such as map, flatMap and reduce
● Testing streaming applications (DStream operations)
● Making sure that there is only one context for each test case
30. Setup
● Instead of creating the required contexts in each test suite, we create
a trait which extends BeforeAndAfter, and all our suites extend this trait.
● The trait initializes all the contexts in its before method
● All the contexts are destroyed in its after method
Ex : com.ganesh.scalatest.sparkbatch.EnvironmentInitializerSC.scala
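The setup described above could be sketched like this. It is not the contents of the referenced file: SparkContextSetup, WordCount and WordCountSuite are hypothetical names, and the sketch assumes ScalaTest 3.0.x with spark-core on the test classpath and a local[2] master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.scalatest.{BeforeAndAfter, FunSuite, Suite}

// Hypothetical shared trait: one fresh SparkContext per test.
trait SparkContextSetup extends BeforeAndAfter { this: Suite =>
  protected var sc: SparkContext = _

  before {
    val conf = new SparkConf().setMaster("local[2]").setAppName("unit-test")
    sc = new SparkContext(conf)
  }

  after {
    // Stop the context so the next test can create its own.
    if (sc != null) {
      sc.stop()
      sc = null
    }
  }
}

// Hypothetical logic under test, kept separate from context creation.
object WordCount {
  def count(lines: RDD[String]): RDD[(String, Int)] =
    lines.flatMap(_.split("\\s+")).filter(_.nonEmpty).map(word => (word, 1)).reduceByKey(_ + _)
}

class WordCountSuite extends FunSuite with SparkContextSetup {
  test("count returns the number of occurrences of each word") {
    val result = WordCount.count(sc.parallelize(Seq("a b", "a"))).collect().toMap
    assert(result == Map("a" -> 2, "b" -> 1))
  }
}
```

One consequence of this design: BeforeAndAfter allows before and after to be registered only once per suite, so suites that mix in such a trait cannot add their own before or after blocks.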
31. Spark Streaming test
● Full control over the clock is needed to manually manage batches, slides and
windows.
● Spark Streaming provides the necessary abstraction over the system clock: the
ManualClock class.
● But it is a private class, so we cannot access it in our test cases
● So we use a wrapper class to use the ManualClock instance in our test cases.
Ex : com.ganesh.scalatest.sparkstreaming
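One commonly used form of such a wrapper is sketched below. It is an assumption-heavy illustration, not the code from the referenced package: ClockWrapper is a hypothetical name, and the sketch relies on Spark internals (ssc.scheduler.clock being visible inside the org.apache.spark.streaming package, and org.apache.spark.util.ManualClock being visible inside org.apache.spark), which may differ between Spark versions. It also assumes the test sets spark.streaming.clock to org.apache.spark.util.ManualClock in its SparkConf.

```scala
// Placed in Spark's own package so that package-private members are accessible.
package org.apache.spark.streaming

import org.apache.spark.util.ManualClock

// Hypothetical wrapper exposing the ManualClock of a StreamingContext to tests.
class ClockWrapper(ssc: StreamingContext) {

  // Only valid when spark.streaming.clock is configured as ManualClock.
  private def manualClock: ManualClock =
    ssc.scheduler.clock.asInstanceOf[ManualClock]

  def getTimeMillis(): Long = manualClock.getTimeMillis()

  def setTime(timeToSet: Long): Unit = manualClock.setTime(timeToSet)

  // Move time forward, e.g. by one batch interval, to trigger the next batch.
  def advance(timeToAdd: Long): Unit = manualClock.advance(timeToAdd)

  def waitTillTime(targetTime: Long): Long = manualClock.waitTillTime(targetTime)
}
```

A test would then start the StreamingContext, push input, call advance with the batch duration in milliseconds, and assert on the collected output.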
32. Summary
● We can select any of the styles provided by ScalaTest; they differ only in how
tests are written, and all offer the same features.
● Make use of the assertions and matchers provided by ScalaTest for better test
cases.
● When testing Spark we need to test the logic, so keep your code modular so
that each piece of logic can be tested individually.
● There is an external library called spark-testing-base, which provides many
functions for asserting at the DataFrame level and has traits which provide the
contexts needed for the tests.