Eradicating Non-Determinism in Tests

An automated regression suite can play a vital role on a software project, valuable both for reducing defects in production and for enabling evolutionary design. In talking with development teams I've often heard about the problem of non-deterministic tests - tests that sometimes pass and sometimes fail. Left uncontrolled, non-deterministic tests can completely destroy the value of an automated regression suite. In this article I outline how to deal with non-deterministic tests. Initially quarantine helps to reduce their damage to other tests, but you still have to fix them soon. Therefore I discuss treatments for the common causes of non-determinism: lack of isolation, asynchronous behavior, remote services, time, and resource leaks.

14 April 2011



I've enjoyed watching Thoughtworks tackle many difficult enterprise applications, bringing successful deliveries to many clients who have rarely seen success. Our experiences have been a great demonstration that agile methods, deeply controversial and distrusted when we wrote the manifesto a decade ago, can be used successfully.

There are many flavors of agile development out there, but in what we do there is a central role for automated testing. Automated testing was a core approach to Extreme Programming from the beginning, and that philosophy has been the biggest inspiration to our agile work. So we've gained a lot of experience in using automated testing as a core part of software development.

Automated testing can look easy when presented in a textbook. And indeed the basic ideas are really quite simple. But in the pressure-cooker of a delivery project, trials come up that are often not given much attention in texts. As I know too well, authors have a habit of skimming over many details in order to get a core point across. In my conversations with our delivery teams, one recurring problem is tests that have become unreliable - so unreliable that people don't pay much attention to whether they pass or fail. A primary cause of this unreliability is that some tests have become non-deterministic.

A test is non-deterministic when it passes sometimes and fails sometimes, without any noticeable change in the code, tests, or environment. Such tests fail, then you re-run them and they pass. Test failures for such tests are seemingly random.

Non-determinism can plague any kind of test, but it's particularly prone to affect tests with a broad scope, such as acceptance or functional tests.

Why non-deterministic tests are a problem

Non-deterministic tests have two problems: firstly, they are useless; secondly, they are a virulent infection that can completely ruin your entire test suite. As a result they need to be dealt with as soon as possible, before your entire deployment pipeline is compromised.

I'll start by expanding on their uselessness. The primary benefit of having automated tests is that they provide a bug detection mechanism by acting as regression tests1. When a regression test goes red, you know you've got an immediate problem, often because a bug has crept into the system without you realizing.

1: Yes, I know many advocates of TDD consider that a primary virtue of testing is the way it drives requirements and design. I agree that this is a big benefit, but I consider the regression suite to be the single biggest benefit that automated tests give us. Even without TDD tests are worth the cost for that.

Having such a bug detector has huge benefits. Most obviously it means that you can find and fix bugs just after they are introduced. Not only does this give you the warm fuzzies because you kill bugs quickly, it also makes them easier to remove, since you know the bug got in with the last set of changes - changes that are fresh in your mind. As a result you know where to look for the bug, which is more than half the battle in squashing it.

The second level of benefit is that as you gain confidence in your bug detector, you gain the courage to make big changes, knowing that when you goof the bug detector will go off and you can fix the mistake quickly.2 Without this, teams are frightened to make the changes the code needs in order to be kept clean, which leads to a rotting code base and plummeting development speed.

2: Sometimes, of course, a test failure is due to a change in what the code is supposed to do, but the test hasn't been updated to reflect the new behavior. This is essentially a bug in the tests, but is equally easy to fix if it's caught right away.

The trouble with non-deterministic tests is that when they go red, you have no idea whether it's due to a bug or just part of the non-deterministic behavior. Usually with these tests a non-deterministic failure is relatively common, so you end up shrugging your shoulders when these tests go red. Once you start ignoring a regression test failure, then that test is useless and you might as well throw it away.3

3: There is a useful role for non-deterministic tests. Tests seeded from a randomizer can help hunt out edge cases. Performance tests will always come back with different values. But these kinds of tests are quite different from automated regression tests, which are my focus here.

Indeed you really ought to throw a non-deterministic test away, since if you don't it has an infectious quality. If you have a suite of 100 tests with 10 non-deterministic tests in it, then that suite will often fail. Initially people will look at the failure report and notice that the failures are in non-deterministic tests, but soon they'll lose the discipline to do that. Once that discipline is lost, then a failure in the healthy deterministic tests will get ignored too. At that point you've lost the whole game and might as well get rid of all the tests.

Quarantine

My principal aim in this article is to outline common cases of non-deterministic tests and how to eliminate the non-determinism. But before I get there I offer one piece of essential advice: quarantine your non-deterministic tests. If you have non-deterministic tests, keep them in a different test suite from your healthy tests. That way you can continue to pay attention to what's going on with your healthy tests and get good feedback from them.

Place any non-deterministic test in a quarantined area. (But fix quarantined tests quickly.)

Then the question is what to do with the quarantined test suites. They are useless as regression tests, but they do have a future as work items for cleaning up. You should not leave them in quarantine indefinitely, since while they sit there they contribute nothing to your regression coverage.

A danger here is that tests keep getting thrown into quarantine and forgotten, which means your bug detection system is eroding. As a result it's worthwhile to have a mechanism that ensures that tests don't stay in quarantine too long. I've come across various ways to do this. One is a simple numeric limit: e.g. only allow 8 tests in quarantine. Once you hit the limit you must spend time to clear all the tests out. This has the advantage of batching up your test-cleaning if that's how you like to do things. Another route is to put a time limit on how long a test may be in quarantine, such as no longer than a week.
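
To make the numeric limit concrete, here is a minimal sketch of how it might be enforced as part of the build. The quarantine.txt file (one quarantined test per line), the limit of 8, and the class name are all assumptions for the example:

        // Sketch: fail the build when the quarantine list grows beyond the limit.
        // Assumes quarantined tests are listed one per line in quarantine.txt.
        import java.nio.file.Files;
        import java.nio.file.Paths;
        import java.util.List;

        public class QuarantineLimitCheck {
            private static final int QUARANTINE_LIMIT = 8;

            public static void main(String[] args) throws Exception {
                List<String> lines = Files.readAllLines(Paths.get("quarantine.txt"));
                long count = lines.stream().filter(line -> !line.trim().isEmpty()).count();
                if (count > QUARANTINE_LIMIT) {
                    System.err.printf("quarantine holds %d tests (limit %d): clear them out%n",
                            count, QUARANTINE_LIMIT);
                    System.exit(1);   // break the build until the quarantine is cleaned
                }
            }
        }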

The general approach with quarantine is to take the quarantined tests out of the main deployment pipeline so that you still get your regular build process. However a good team can be more aggressive. Our Mingle team puts its quarantine suite into the deployment pipeline one stage after its healthy tests. That way it can get the feedback from the healthy tests but is also forced to ensure that it sorts out the quarantined tests quickly. 4

4: This works well for the Mingle team as they are skillful enough to find and fix non-deterministic tests quickly and disciplined enough to ensure they do it quickly. If your build remains broken for long due to your quarantine tests failing you will lose the value of continuous integration. So for most teams I'd advise keeping the quarantined tests out of the main pipeline.

Lack of Isolation

In order to get tests to run reliably, you must have clear control over the environment in which they run, so you have a well-known state at the beginning of the test. If one test creates some data in the database and leaves it lying around, it can corrupt the run of another test which may rely on a different database state.

Therefore I find it's really important to focus on keeping tests isolated. Properly isolated tests can be run in any sequence. As you get to the larger operational scope of functional tests, it gets progressively harder to keep tests isolated. When you are tracking down non-determinism, lack of isolation is a common and frustrating cause.

Keep your tests isolated from each other, so that execution of one test will not affect any others.

There are a couple of ways to get isolation - either always rebuild your starting state from scratch, or ensure that each test cleans up properly after itself. In general I prefer the former, as it's often easier - and in particular easier to find the source of a problem. If a test fails because it didn't build up the initial state properly, then it's easy to see which test contains the bug. With clean-up, however, one test will contain the bug, but another test will fail - so it's hard to find the real problem.

Starting from a blank state is usually easy with unit tests, but can be much harder with functional tests 5 - particularly if you have a lot of data in a database that needs to be there. Rebuilding the database each time can add a lot of time to test runs, so that argues for switching to a clean-up strategy.6

5: There's no hard-and-fast definitions here, but I'm using the early Extreme Programming terminology of using “unit test” to mean something fine-grained and “functional test” as a test that's more end-to-end and feature related.

6: One trick is to create the initial database and copy it using file system commands before opening it for each test run. File system copies are often faster than loading data using the database commands.

One trick that's handy when you're using databases is to conduct your tests inside a transaction, and then to roll back the transaction at the end of the test. That way the transaction manager cleans up for you, reducing the chance of errors7.

7: Of course this trick only works when you can conduct the test without committing any transactions.
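
As an illustration, here is a rough sketch of the rollback trick with plain JDBC in a JUnit-style test. The in-memory database URL and the orders table are placeholders, and many test frameworks provide this pattern ready-made:

        // A sketch of the rollback trick with plain JDBC.
        // The database URL and the orders table are hypothetical.
        import java.sql.Connection;
        import java.sql.DriverManager;

        public class RollbackIsolationTest {
            private Connection connection;

            public void setUp() throws Exception {
                connection = DriverManager.getConnection("jdbc:h2:mem:testdb");
                connection.setAutoCommit(false);   // open a transaction we never commit
            }

            public void testInsertingAnOrder() throws Exception {
                connection.createStatement()
                          .executeUpdate("INSERT INTO orders (id, total) VALUES (1, 100)");
                // ... assertions run against the uncommitted data ...
            }

            public void tearDown() throws Exception {
                connection.rollback();             // the transaction cleans up after us
                connection.close();
            }
        }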

Another approach is to do a single build of a mostly-immutable starting fixture before running a group of tests. Then ensure that the tests don't change that initial state (or if they do, they reverse the changes in tear-down). This tactic is more error-prone than rebuilding the fixture for each test, but it may be worthwhile iff it takes too long to build the fixture each time.

Although databases are a common cause of isolation problems, there are plenty of times you can get these in-memory too. In particular, be wary of static data and singletons. A good example of this kind of problem is contextual environment, such as the currently logged-in user.

If you have an explicit tear-down in a test, be wary of exceptions that occur during the tear-down. If this happens the test can pass, but cause isolation failures for subsequent tests. So ensure that if you do get a problem in a tear-down, it makes a loud noise.
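
For example, a tear-down might guard its clean-up like this; the clean-up hook is an assumption for the sketch:

        // Sketch: never let a tear-down failure pass silently.
        public class LoudTearDownTest {
            interface TestData { void removeAll() throws Exception; }   // assumed clean-up hook

            private TestData testData;   // wired up in set-up

            public void tearDown() {
                try {
                    testData.removeAll();
                } catch (Exception e) {
                    // Fail loudly: a swallowed tear-down failure surfaces later as a
                    // mysterious failure in some unrelated test.
                    throw new IllegalStateException(
                            "tear-down failed; later tests may see polluted state", e);
                }
            }
        }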

Some people prefer to put less emphasis on isolation and more on defining clear dependencies to force tests to run in a specified order. I prefer isolation because it gives you more flexibility in running subsets of tests and parallelizing tests.

Asynchronous Behavior

Asynchrony is a boon that allows you to keep software responsive while taking on long-running tasks. Ajax calls allow a browser to stay responsive while going back to the server for more data; asynchronous messages allow a server process to communicate with other systems without being tied to their laggardly latency.

But in testing, asynchrony can be a curse. The common mistake here is to throw in a sleep:

        //pseudo-code
        makeAsyncCall;
        sleep(aWhile);
        readResponse;
      

This can bite you two ways. First off, you'll want to set the sleep time long enough that it gives plenty of time to get the response. But that means you'll spend a lot of time idly waiting for the response, thus slowing down your tests. The second bite is that, however long you sleep, sometimes it won't be enough. There will be some change in the environment that causes you to exceed the sleep - and you'll get a false failure. As a result I strongly urge you to never use bare sleeps like this.

Never use bare sleeps to wait for asynchronous responses: use a callback or polling.

There are basically two tactics you can use for testing an asynchronous response. The first is for the asynchronous service to take a callback which it can call when done. This is the best approach, since it means you'll never have to wait any longer than you need to8. The biggest problem with this is that the environment needs to be able to do this and the service provider needs to implement it. This is one of the advantages of having the development team integrated with testing - if they can provide a callback then they will.

8: Although you'll still need a timeout in case you never get a reply - and that time out is subject to the same danger when you move to a different environment. Fortunately you can set that timeout to be pretty high, which minimizes the chances of that biting you.
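
As a sketch of the callback tactic in Java, here's a test that uses a CountDownLatch to wait for the callback, with a generous safety-net timeout; the AsyncService interface is assumed for the example:

        // Sketch of the callback tactic using java.util.concurrent.CountDownLatch.
        // AsyncService and its callback hook are assumed interfaces for the example.
        import java.util.concurrent.CountDownLatch;
        import java.util.concurrent.TimeUnit;
        import java.util.concurrent.atomic.AtomicReference;

        public class CallbackStyleTest {
            interface AsyncService {
                void makeCall(java.util.function.Consumer<String> onDone);
            }

            private AsyncService asyncService;   // wired up elsewhere in the test

            public void testCallCompletesViaCallback() throws InterruptedException {
                CountDownLatch done = new CountDownLatch(1);
                AtomicReference<String> response = new AtomicReference<>();

                asyncService.makeCall(result -> {   // the provider invokes this when finished
                    response.set(result);
                    done.countDown();
                });

                // The timeout is only a generous safety net, not a tuned wait.
                if (!done.await(30, TimeUnit.SECONDS)) {
                    throw new RuntimeException("no callback received within 30 seconds");
                }
                // ... assert on response.get() ...
            }
        }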

The second option is to poll on the answer. This means not just looking once, but looking regularly, something like this:

        //pseudo-code
        makeAsyncCall;
        startTime = Time.now;
        while (!responseReceived) {
          if (Time.now - startTime > waitLimit)
            throw new TestTimeoutException;
          sleep(pollingInterval);
        }
        readResponse;
      

The point of this approach is that you can set the pollingInterval to a pretty small value, and know that that's the maximum amount of dead time you'll lose to waiting for a response. This means you can set the waitLimit very high, which minimizes the chance of hitting it unless something serious has gone wrong. 9

9: In that case, however, the tests will run very slowly. You may want to consider aborting the whole test suite if you reach the wait limit.

Make sure you use a clear exception class that indicates this is a test timeout that's failing. This will help make it clear what's gone wrong should it happen, and perhaps allow a more sophisticated test harness to take account of this information in its display.

The time values, in particular the waitLimit, should never be literal values. Make sure they are values that can be easily set in bulk, either as constants or through the runtime environment. That way if you need to tweak them (and you will) you can tweak them all quickly.
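
Here's a sketch of both of these points together - a distinct timeout exception and time values settable in bulk. The property names and defaults are illustrative assumptions:

        // A distinct exception type makes timeout failures unmistakable in reports.
        public class TestTimeoutException extends RuntimeException {
            public TestTimeoutException(String message) { super(message); }
        }

        // Sketch: keep the time values tweakable in one place, overridable at the
        // command line, e.g. -Dtest.waitLimitMillis=60000.
        final class TestTimeouts {
            private TestTimeouts() {}

            public static final long WAIT_LIMIT_MILLIS =
                    Long.getLong("test.waitLimitMillis", 30_000);
            public static final long POLLING_INTERVAL_MILLIS =
                    Long.getLong("test.pollingIntervalMillis", 50);
        }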

All this advice is handy for async calls where you expect a response from the provider, but what about those where there is no response? These are calls where we invoke a command on something and expect it to happen without any acknowledgment. This is the trickiest case, since you can test for your expected outcome, but there's nothing you can do to detect a failure other than timing out. If the provider is something you're building, you can handle this by ensuring the provider implements some way of indicating that it's done - essentially some form of callback. Even if only the testing code uses it, it's worth it - although often you'll find this kind of functionality is valuable for other purposes too10. If the provider is someone else's work, you can try persuasion, but otherwise you may be stuck. This is also a case where using Test Doubles for remote services is worthwhile (which I'll discuss more in the next section).

10: If your asynchronous behavior is triggered from the UI, it's often a good UI choice to have some indicator to show an asynchronous operation is in progress. Having this be part of the UI also helps testing as the hooks required to stop this indicator can be the same hooks as detecting when to progress the test logic.

If you have a general failure in something asynchronous, such that it's not responding at all, then you'll always be waiting for timeouts and your test suite will take a long time to fail. To combat this it's a good idea to use a smoke test to check that the asynchronous service is responding at all and stop the test run right away if it isn't.

You can also often side-step the asynchrony completely. Gerard Meszaros's Humble Object pattern says that whenever you have some logic that's in a hard-to-test environment, you should isolate the logic you need to test from that environment. In this case it means put most of the logic you need to test in a place where you can test it synchronously. The asynchronous behavior should be as minimal (humble) as possible, that way you don't need that much testing of it.
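
A minimal sketch of that separation, with an invented pricing example: the calculation logic is a plain synchronous class you can test directly, while the asynchronous shell merely dispatches to it:

        // A sketch of Humble Object: the logic worth testing sits in a plain
        // synchronous class; the asynchronous wrapper is too thin to need much testing.
        // The pricing example is invented for illustration.
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;

        public class PriceCalculator {              // test this directly, synchronously
            public int discountedPrice(int basePrice, int discountPercent) {
                return basePrice - (basePrice * discountPercent / 100);
            }
        }

        class AsyncPriceService {                   // the humble part
            private final PriceCalculator calculator = new PriceCalculator();
            private final ExecutorService executor = Executors.newSingleThreadExecutor();

            public Future<Integer> priceFor(int basePrice, int discountPercent) {
                return executor.submit(() -> calculator.discountedPrice(basePrice, discountPercent));
            }
        }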

Remote Services

Sometimes I'm asked if Thoughtworks does any integration work, which I find somewhat amusing since there's hardly any project we do that doesn't involve a fair bit of integration. By their nature, enterprise applications involve a great deal of combining data from different systems. These systems are maintained by other teams operating to their own schedules, teams that often use a very different software philosophy to our heavily test-driven agile approach.

Testing with such remote systems brings a number of problems, and non-determinism is high on the list. Often remote systems don't have a test system we can call, which means hitting a live system. If there is a test system, it may not be stable enough to provide deterministic responses.

In this situation it's vital to ensure determinism, so it's time to reach for a Test Double - a component that looks like the remote service, but is really just a pretend version that mimics the remote system's behavior. The double needs to be set up so that it provides the right kind of response in interaction with our system, but in a way we control. In this manner we can ensure determinism.

Using a double has a downside, in particular when we are testing across a broad scope. How can we be sure that the double behaves in the same way that the remote system does? We can tackle this again using tests, a form of test that I call Contract Tests. These run the same interaction with the remote system and the double, and check that the two match. In this case 'match' may not mean coming up with the same result (due to the non-determinism), but results that share the same essential structure. Integration Contract Tests need to be run frequently, but they need not be part of our system's deployment pipeline. Periodic running based on the rate of change of the remote system is usually best.

For writing these kinds of test doubles, I'm a big fan of Self Initializing Fakes - since these are very simple to manage.
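
As a sketch of the idea, here's a self-initializing fake for an invented exchange-rate service: the first call for a currency goes to the real service and is recorded; every later call replays the recording. In practice you'd persist the recordings so later runs never touch the remote system at all:

        // Sketch of a self-initializing fake. RemoteRateService is an assumed
        // interface; real recordings would be persisted to disk between runs.
        import java.util.HashMap;
        import java.util.Map;

        interface RemoteRateService {
            double rateFor(String currencyCode);
        }

        public class SelfInitializingRateFake implements RemoteRateService {
            private final RemoteRateService realService;
            private final Map<String, Double> recordings = new HashMap<>();

            public SelfInitializingRateFake(RemoteRateService realService) {
                this.realService = realService;
            }

            @Override
            public double rateFor(String currencyCode) {
                // First request goes to the real service and is recorded;
                // every later request replays the deterministic recording.
                return recordings.computeIfAbsent(currencyCode, realService::rateFor);
            }
        }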

Some people are firmly against using Test Doubles in functional tests, believing that you must test with a real connection in order to ensure end-to-end behavior. While I sympathize with their argument, automated tests are useless if they are non-deterministic. So any advantage you gain by talking to the real system is overwhelmed by the need to stamp out non-determinism11.

11: There are other advantages to using a test double in these circumstances, even if the remote system is deterministic. Often response time is too slow to use a remote system. If you can only talk to a live system, then your tests can generate significant, and unappreciated, load on that system.

Time

Few things are more non-deterministic than a call to the system clock. Each time you call it, you get a new result, and any tests that depend on it can thus change. Ask for all the todos due in the next hour, and you regularly get a different answer12.

12: You could reseed your datastore for each test based on the current time. But that's a lot of work, and fraught with potential timing errors.

The most important thing here is to ensure that you always wrap the system clock with routines that can be replaced with a seeded value for testing. A clock stub can be set to a particular time and frozen at that time, allowing your tests to have complete control over its movements. That way you can synchronize your test data to the values in the seeded clock.13 14

13: In this case the clock stub is a common way to break isolation; each test that uses it should ensure it's properly re-initialized.

14: One of my colleagues likes to force a test run just before and after midnight in order to catch tests that use the current time and assume it's the same day an hour or two later. This is especially good at times like the last day of the month.

Always wrap the system clock, so it can be easily substituted for testing.
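
A minimal sketch of such a wrapper: production code asks Clock for the time, while tests substitute the frozen, manually-advanced stub. (Newer JDKs ship java.time.Clock, whose Clock.fixed serves the same purpose.)

        // Minimal sketch of wrapping the system clock so tests can freeze time.
        public interface Clock {
            long nowMillis();
        }

        class SystemClock implements Clock {       // production
            public long nowMillis() { return System.currentTimeMillis(); }
        }

        class StubClock implements Clock {         // tests: frozen, movable
            private long frozenAt;
            StubClock(long frozenAt) { this.frozenAt = frozenAt; }
            public long nowMillis() { return frozenAt; }
            public void advanceMillis(long millis) { frozenAt += millis; }  // controlled movement
        }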

One thing to watch with this is that eventually your test data might start having problems because it's too old, and you get conflicts with other time-based factors in your application. In this case you can move the data and your clock seeds to new values. When you do this, ensure that this is the only thing you do. That way you can be sure that any tests that fail are due to the time movement in the test data.

Another area where time can be a problem is when you rely on other behaviors of the clock. I once saw a system that generated random keys based on clock values. This system started failing when it was moved to a faster machine that could allocate multiple ids within a single clock tick.15

15: Although, of course, this isn't always a non-determinism bug, but one that's due to a change in environment. Depending on how close the clock ticks are to the id allocation, it could result in non-deterministic behavior.

I've heard of so many problems due to direct calls to the system clock that I'd argue for finding a way to use code analysis to detect any direct calls to the system clock and failing the build right there. Even a simple regex check might save you a frustrating debugging session after a call at an ungodly hour.
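
A crude sketch of what such a check might look like, scanning production sources for banned calls; the source path and the patterns are assumptions to adapt to your codebase:

        // Sketch: scan production sources for direct clock calls and fail the
        // build on any hit. Path and patterns are assumptions for the example.
        import java.io.IOException;
        import java.io.UncheckedIOException;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.nio.file.Paths;
        import java.util.List;
        import java.util.regex.Pattern;
        import java.util.stream.Collectors;
        import java.util.stream.Stream;

        public class NoDirectClockCallsCheck {
            private static final Pattern BANNED =
                    Pattern.compile("System\\.currentTimeMillis|new\\s+Date\\s*\\(");

            public static void main(String[] args) throws IOException {
                try (Stream<Path> sources = Files.walk(Paths.get("src/main/java"))) {
                    List<Path> offenders = sources
                            .filter(p -> p.toString().endsWith(".java"))
                            .filter(NoDirectClockCallsCheck::containsBannedCall)
                            .collect(Collectors.toList());
                    offenders.forEach(p -> System.err.println("direct clock call in " + p));
                    if (!offenders.isEmpty()) {
                        System.exit(1);   // fail the build right there
                    }
                }
            }

            private static boolean containsBannedCall(Path file) {
                try {
                    return BANNED.matcher(Files.readString(file)).find();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }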

Resource Leaks

If your application has some kind of resource leak, this will lead to randomly failing tests, since it's simply whichever test pushes the leak over a limit that gets the failure. This case is awkward because any test can fail intermittently due to this problem. If the non-determinism isn't confined to a single test, then resource leaks are a good candidate to investigate.

By resource leak, I mean any resource that the application has to manage by acquiring and releasing. In non-memory-managed environments, the obvious example is memory. Memory management did much to remove this problem, but other resources still need to be managed, such as database connections.

Usually the best way to handle these kinds of resources is through a Resource Pool. If you do this, then a good tactic is to configure the pool to a size of 1 and make it throw an exception should it get a request for a resource when it has none left to give. That way the first test to request a resource after the leak will fail - which makes it a lot easier to find the problem test.
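
Here's a minimal sketch of such a pool. Constructed with a single resource in tests, the first acquire after a leak fails immediately, pointing straight at the neighborhood of the leaking test:

        // Sketch of a pool that throws when exhausted rather than blocking.
        import java.util.ArrayDeque;
        import java.util.Collection;
        import java.util.Deque;

        public class StrictPool<T> {
            private final Deque<T> available = new ArrayDeque<>();

            public StrictPool(Collection<T> resources) {
                available.addAll(resources);
            }

            public synchronized T acquire() {
                if (available.isEmpty()) {
                    // With a pool of size 1, the leak happened just before this failure.
                    throw new IllegalStateException("pool exhausted - probable resource leak");
                }
                return available.pop();
            }

            public synchronized void release(T resource) {
                available.push(resource);
            }
        }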

This idea of limiting resource pool sizes is about increasing constraints to make errors more likely to crop up in tests. This is good because we want errors to show in tests so we can fix them before they manifest themselves in production. This principle can be used in other ways too. One story I heard was of a system which generated randomly named temporary files, didn't clean them up properly, and crashed on a collision. This kind of bug is very hard to find, but one way to manifest it is to stub the randomizer for testing so that it always returns the same value. That way you can surface the problem more quickly.
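
For instance, a sketch of swapping the random name source for a fixed one in tests, so the collision happens on every run; the interface and names are invented for the example:

        // Sketch: a fixed name source makes the collision bug deterministic.
        import java.util.UUID;

        public interface TempFileNamer {
            String nextName();
        }

        class RandomTempFileNamer implements TempFileNamer {     // production
            public String nextName() { return "tmp-" + UUID.randomUUID(); }
        }

        class FixedTempFileNamer implements TempFileNamer {      // tests
            public String nextName() { return "tmp-always-the-same"; }  // every call collides
        }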



Acknowledgments

As usual, I need to thank many Thoughtworks colleagues for sharing their experiences and thus providing the material to put together this article.

Michael Dietz, Danilo Sato, Badrinath Janakiraman, Matt Savage, Krystan Vingrys and Brandon Byers read the article and gave me some further feedback.

Ed Sykes reminded me about the approach of using a file-system copy of a database file to create initial databases for each test.

Significant Revisions

14 April 2011: First published

24 March 2011: draft posted for review within Thoughtworks

16 February 2011: started article
