Wednesday, January 6, 2016

Oracle 11g JDBC driver blocked by /dev/random

This time I was promoting my new application to the testers' environment. Everything had been going quite well: deployments were made with our CD tool, everything was automated, and I had already deployed the application several times in the dev environment. Everything was happiness until I hit a wall: I couldn't connect to the database ¬¬. I'm not going to make this story any longer; the main problem has to do with the secure random numbers the Oracle driver uses to log in. This random number generation is delegated to the OS, in this case Unix. In my very particular case, the problem had to do with the configuration of the server where I deployed my application.

It was only after printing out the stack trace of my application that I saw it was hung because the Oracle driver was waiting for a secure random number to be generated:


Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2,5,main]
      java.io.FileInputStream.readBytes(Native Method)
      java.io.FileInputStream.read(FileInputStream.java:255)
      sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedBytes(SeedGenerator.java:539)
      sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:144)
      sun.security.provider.SecureRandom$SeederHolder.(SecureRandom.java:203)
      sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:221)
      java.security.SecureRandom.nextBytes(SecureRandom.java:468)
      oracle.security.o5logon.O5Logon.a(Unknown Source)
      oracle.security.o5logon.O5Logon.(Unknown Source)
      oracle.jdbc.driver.T4CTTIoauthenticate.(T4CTTIoauthenticate.java:582)
      oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:401)
      oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:553)
      oracle.jdbc.driver.T4CConnection.(T4CConnection.java:254)
      oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
      oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)
      com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:134)
      com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182)
      com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171)
      com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:137)
      com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1014)
      com.mchange.v2.resourcepool.BasicResourcePool.access$800(BasicResourcePool.java:32)
      com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1810)
      com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:547)

java.security.SecureRandom was waiting for some bytes. Strange. I googled. Then I had to learn a bit about entropy and noise, and I realized, with a little help from my Unix friends, that one of the servers was missing a package that generates the "noise".

As suggested in a post I found, I ran the following commands:
cat /proc/sys/kernel/random/entropy_avail
23
cat /proc/sys/kernel/random/poolsize
4096

So there was too little entropy available to generate a secure random number.

After installing the package:
cat /proc/sys/kernel/random/entropy_avail
4096
Afterwards I had no more apparent deadlocks, and my application could get connections to the DB.
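
The same check can be done from Java, for example at application startup. This is only a minimal sketch: the class name and the threshold of 200 are my own assumptions for illustration, not values taken from the kernel or from Oracle, and the /proc file only exists on Linux.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class EntropyCheck {
    // Hypothetical threshold chosen for this sketch; tune it for your own servers.
    static final int LOW_ENTROPY_THRESHOLD = 200;

    /** Parses the single integer found in a /proc-style file's content. */
    static int parseCount(String content) {
        return Integer.parseInt(content.trim());
    }

    /** True when the available entropy is below the (assumed) threshold. */
    static boolean isLow(int available) {
        return available < LOW_ENTROPY_THRESHOLD;
    }

    /** Reads and parses the integer stored in the given file. */
    static int readCount(Path path) throws IOException {
        return parseCount(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path avail = Paths.get("/proc/sys/kernel/random/entropy_avail");
        if (!Files.isReadable(avail)) {
            System.out.println("entropy_avail is not readable here (not Linux?)");
            return;
        }
        int available = readCount(avail);
        String note = isLow(available) ? " (low: SecureRandom seeding may block)" : "";
        System.out.println("entropy_avail = " + available + note);
    }
}
```

With the value of 23 shown above, `isLow` would report the pool as low; with 4096 it would not.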

Wednesday, September 2, 2015

A coding solo

It was quite close to the end of the day, and I was packing up my stuff to happily go back home after a pleasant day of coding.

Suddenly it started to rain very heavily. It was insane! Instantly I recalled the morning news, which had warned about three category 4 (out of 5) hurricanes off the west coast of Mexico. This is crazy; I don't remember ever having more than two hurricanes at the same time. Climate change... So I decided to stay at work until the rain stopped, which took a long while. I ended up going home enjoying the rain.

Before packing up again, I decided to finish my coding task. It was almost done. I started typing in my test; it went red, then turned green very fast, and so did the next one.

I was working at a very high speed. I don't know if it was because of the rain, or because there were fewer people in the office, or if I just got into what some of my friends and I call the coding solo, inspired by the guitar solo.

It's that moment when you really show off to yourself, when you get a huge smile from ear to ear but don't tell anyone until later. I would bet blindly that you have felt this before. Such a nice feeling, don't you think? Everything goes green quite fast, you realize that what you have done is gonna help your team, you ask your teammates for help and they reply very fast and keep you going, and you can be very proud of your coding.

There were still a few of my friends at work, even though it was quite late, almost 9 pm. By the way, time flew for me. I was very happy and started talking to my friends there. I was so excited that they could tell, and they shared my happiness.

I don't want to belittle pair programming. I just want to make it clear that sometimes you enter this mood, or coding solo as I call it, where nothing can distract you and you get a lot of work done well.

In the end it didn't stop raining and I had to go home. Getting wet didn't bother me at all; I was enjoying the rain while still very excited about my coding solo.

Friday, August 28, 2015

Saturday, July 11, 2015

The tiny change

Last weekend we released a gigantic version of the system, which was presumably perfect despite its size. At least that's what the final tests revealed that Saturday.

At the beginning of this week, on Monday, the system was working quite well until 9:30ish, when one of the components started to publish malformed messages of a certain type. Fortunately that message type had no huge impact on the business, though it was very unpleasant for the team to end up in a situation like this.

In the evening, we went into the details of the failure and I came to some conclusions:

What happened?
The request for change was triggered by the business to improve this message type in certain conditions.
5 LOC changed in svn before the move to git. (Nobody noticed the change, as there was no pull/merge request.)
0 UT for that change
0 FT for that change
0 pre-existing tests for that message type.

Given these conditions, the software engineer who applied the change thought it was so easy that implementing a test would not bring more benefit than it would cost. So he applied the change, and now you know what happened.

Fortunately...
The system is quite modular and the main operation was not affected at all by this glitch, though it could be even more modular.
The malfunctioning module/service was rolled back, though it could have been done much, much faster.
We have shown management, and convinced them of, the benefits of changing the way software is built and of adopting practices like (A)TDD, CD, DevOps culture, and automation, among others. In fact some changes have already been implemented, for instance the move from svn to git, code inspections, katas, and a (small) reading club.

What could have prevented that glitch?
Certainly (A)TDD. In fact, TDD is being adopted, and software engineers have seen the huge benefits of working this way. Unfortunately, the glitch was introduced before this adoption and nobody could catch it earlier.

I always remember a piece of code refactored by Uncle Bob in one of his books, "Clean Code: A Handbook of Agile Software Craftsmanship", where there was a method doing something with dates. He decided to clean that code up, so the first thing he did was to test the current implementation until he had a hundred percent code coverage. Only then did he refactor the piece of code, without any collateral damage.
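
To make the idea concrete, here is a minimal sketch of pinning down current behavior before touching the code. It is not the code from the book: `LegacyDates.daysInMonth` is a hypothetical stand-in for whatever legacy method you want to refactor, and I use plain Java checks instead of a test framework so the example stays self-contained.

```java
class LegacyDatesCharacterization {
    // Fails loudly when the observed behavior differs from what we recorded.
    static void check(int expected, int actual, String label) {
        if (expected != actual) {
            throw new AssertionError(label + ": expected " + expected + " but was " + actual);
        }
    }

    // Each line records what the code does today, including the edge cases.
    static void run() {
        check(31, LegacyDates.daysInMonth(1, 2015), "January");
        check(30, LegacyDates.daysInMonth(4, 2015), "April");
        check(28, LegacyDates.daysInMonth(2, 2015), "February, common year");
        check(29, LegacyDates.daysInMonth(2, 2016), "February, leap year");
        check(28, LegacyDates.daysInMonth(2, 1900), "century not divisible by 400");
        check(29, LegacyDates.daysInMonth(2, 2000), "century divisible by 400");
    }

    public static void main(String[] args) {
        run();
        System.out.println("Current behavior pinned; now it is safe to refactor.");
    }
}

// Hypothetical "legacy" code standing in for the method to be cleaned up.
class LegacyDates {
    static int daysInMonth(int month, int year) {
        if (month == 2) {
            if (year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)) return 29;
            return 28;
        }
        if (month == 4 || month == 6 || month == 9 || month == 11) return 30;
        return 31;
    }
}
```

Once every branch is covered like this, any refactoring that changes the behavior makes one of the checks fail immediately.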

Software inspections/Pair programming. It surprises me how many people say software inspections and pair programming are a waste of time. Some of them have told me they tried it but didn't get any benefit; others say it just doesn't work and they don't have time, though they have plenty of time to fix bugs...

There are quite a few books that explain how to do software inspections and pair programming, and I feel I can help with this analogy: software is like a paper, report or document. It has a structure, and you are supposed to understand what the document, the code, says/does. Once you finish, talking about inspections, you go to your supervisor and ask them to review (inspect) it. Your supervisor might tell you to change your wording and focus on the reader, and if they find anything difficult to understand, they will ask you to rephrase or reorder it. Exactly the same happens in software while pair programming or inspecting: if someone does not understand something, I'm a hundred percent sure they will ask why it is like that, or what it does, or raise some other question or improvement.

Continuous Delivery. Of course, if we had had continuous delivery, we could have restored the service in a few seconds/minutes without causing more damage to the business. Also, we wouldn't have deployed that gigantic change where we didn't have control of all the changes included.

I'm being very brief on this topic, but do not get me wrong: CD is not just deploying software automatically, it goes far beyond that. There are quite a few books, blogs, and papers that can explain what CD is much better than I can.

Software craftsman mindset. This is a huge topic and I encourage reading "The Software Craftsman: Professionalism, Pragmatism, Pride".

DevOps culture. This topic also needs a dedicated post. I'm going to write more about it when I've got more experience and can share thoughts about implementing that cultural change at my work.

To conclude:

Unfortunately, when it comes to releasing software at the Mexican Stock Exchange, this is not the exception but the rule: releases like this are the norm, and they happen only twice a year. Changes are under way to give the stock exchange the ability to release more often and better. It is a very long road, but this is not the first company to travel it, and it won't be the last to take this road to happiness and professionalism.

About the glitch, I hope everybody understands the importance of anything we do, even if it's quite a small change. It might have catastrophic results, as history shows. Glitches like this have caused casualties and the loss of millions of pounds, dollars, Mexican pesos, euros, you name it.

And please, do not start coding if a test does not exist!

Friday, June 19, 2015

Some conclusions about mentoring TDD

Today I finished the first round of katas to try out TDD with all members of the software engineering team I'm part of.

I learned quite a lot from them, and the best part was that they seem enthusiastic about incorporating TDD in their next developments. Practices like thinking in tiny steps and pair programming have taught them another way to build better software, faster.

Some of them have seen the benefits of not overengineering. Others have seen the importance of refactoring, though there is a lack of refactoring practice, which I'm quite confident we can solve with katas.

My intention is to keep doing katas all the time, and I hope the team starts doing the same. I mean, they have seen the benefits of the katas for themselves.

There is still a long way to go to accomplish my goal, which is to transform the way software is built here at the Mexican Stock Exchange. It's such a good start. I can only promise to try hard, learn, share and mentor whenever I can.

Finally, if I had to offer an analogy for doing katas, I would say it is like football players, or in fact any athlete: you have to train before you perform. Building software is the same: we must train before performing, though we play a match every day, not twice a week like a football player.

Tuesday, June 9, 2015

Teaching TDD

I decided to show and teach TDD to my colleagues at work and encourage them to try it. A few of my colleagues and I had done some katas before. They were quite successful, and two colleagues in particular sounded quite excited.

As part of inviting more people to try TDD, last night a colleague of mine and I did a kata, the word wrap kata. Unfortunately it was not as good as the previous ones.

We started by typing a test, then made the class and the method. Everything compiled and we got the red bar! We went green and started adding more tests, with the red bar turning green about six times. Then we decided to add a more complex test, which was supposed to be the final test.

During this final test we struggled quite a lot and couldn't get the green bar. On the contrary, we broke two tests. That was quite frustrating. It was around 8:20 pm and we decided to continue today.

On my way home, riding my bike, I was still thinking about how it could have failed. I had done that kata a few times before, but this was totally different. I started analyzing our kata and thought that our main mistake was, perhaps, refactoring too late. By the time we decided to implement the last test, our code was too complex.

Maybe we were tired and kept forcing the bar to turn green instead of thinking about refactoring.

Today we refactored what we needed and happily turned green. There were several lessons learned during this kata. Perhaps the most important is to have a bottle of water next to the keyboard, to force us to stop for a while and rethink what we are doing!

Saturday, May 23, 2015

JMS Encapsulation

How many times have you implemented JMS? There are quite a lot of Message Oriented Middleware (MOM) based architectures out there, and even where there aren't, JMS communication is often required. This leads you to implement JMS every time you need it. It might be within the same system, between different components, or you might finish a project, move to another one, and reimplement JMS there.

Spring offers a great JMS template that does all the plumbing for us, but you still need to reimplement some features.

I decided to create this JMS Wrapper and give it some key communication responsibilities. For instance, how many times have you dealt with a JMS broker failure when the client was not configured to be fault tolerant? I talked to management and to the different areas within the Mexican Stock Exchange (BMV) that use JMS, and the idea was well received. I read quite a lot about JMS and ended up implementing this JMS Wrapper to fit basic communication needs.

In terms of software architecture, this is a component that acts as a connector. Communication responsibilities are delegated to this component. Its key responsibilities are some availability features, like reestablishing communication once the JMS broker is alive again, and of course encapsulation. You define the functional responsibilities through the main interface, without coupling to a JMS provider. Features are responsibilities of this JMS Wrapper, not of the JMS provider, so if a broker does not implement a responsibility you may extend the JMS Wrapper to provide it. Again, you avoid coupling with the JMS provider.

This JMS Wrapper was originally implemented to be used only within the BMV systems. After some chats with management we decided to open source it. Someone might need it!
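
To give a rough idea of the shape of such a connector, here is a minimal sketch. It is not the BmvMQ API: `MessageSender`, `RetryingSender` and `FlakyBroker` are hypothetical names invented for this example, and an in-memory fake stands in for a real JMS provider so the code runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

class WrapperDemo {
    public static void main(String[] args) {
        FlakyBroker broker = new FlakyBroker(2); // fails twice, then recovers
        MessageSender sender = new RetryingSender(broker, 5, 10L);
        sender.send("orders", "hello");
        System.out.println(broker.delivered);
    }
}

/** Provider-neutral contract: callers never see a JMS provider class. */
interface MessageSender {
    void send(String destination, String body);
}

/** Availability feature that lives in the wrapper, not in the provider. */
class RetryingSender implements MessageSender {
    private final MessageSender delegate;
    private final int maxAttempts;
    private final long pauseMillis;

    RetryingSender(MessageSender delegate, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
        this.pauseMillis = pauseMillis;
    }

    @Override
    public void send(String destination, String body) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                delegate.send(destination, body);
                return; // delivered
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
                if (attempt < maxAttempts) pause();
            }
        }
        throw last; // every attempt failed
    }

    private void pause() {
        try {
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}

/** In-memory fake broker used only for this sketch. */
class FlakyBroker implements MessageSender {
    private int failuresLeft;
    final List<String> delivered = new ArrayList<>();

    FlakyBroker(int failuresLeft) {
        this.failuresLeft = failuresLeft;
    }

    @Override
    public void send(String destination, String body) {
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new IllegalStateException("broker down");
        }
        delivered.add(destination + ":" + body);
    }
}
```

The caller only depends on `MessageSender`, so swapping the fake for an implementation backed by a real broker would not change the calling code.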

Feel free to fork it!
https://github.com/gusvmx/BmvMQ

I implemented the .NET version as well. I'm gonna publish it soon.