Thursday, 23 August 2007

Trainee position

This month is coming to an end and my time working for COSS ends too. I'm going to have a working version that checks for errors and warnings, but there is just not enough time to make checks for everything.

Also, analysing larger rule bases is going to need a database that stores meta data about the rules. Data can then be loaded into the rule base according to the rule or rules under analysis. This is when the impact analysis is going to be helpful; at the moment the impact analysis data is collected but not really used that much. When only the necessary data is loaded, adding new rules and checking them immediately will of course be faster. This is required for the JBoss Drools BRMS, which makes it possible to control rules through a browser, and for the Eclipse plugin.

Because there is still work to do, last week I was offered a position as a trainee at the JBoss Drools team. I'm going to take the job; rule engines and the analytics module have proven to be really interesting. Hopefully there is going to be work related to them later in my life too.

Tuesday, 14 August 2007

More Redundancy Checks

Since my last blog entry I have improved my meta data model. This model describes the AST in a way that makes it easier for me to write Drools rules for analytics. Coding and designing the new features took longer than I thought, but without them full testing would not be possible.

The more time I spend doing this, the more questions pop up. Right now I'm not that convinced that the rule base can be checked for redundancy in every situation. The most puzzling cases are the eval keyword, functions, the accumulate feature and the RHS of the rule. All of these give the rule base engineer a chance to write his own code into the rule. Simplifying the different kinds of rules and comparing them is not easy, and code written by the user makes it even harder. Right now code sections can even include a database query, and this query can have restrictions that are impossible to map.

I have mentioned in my blog before that currently only MVEL code analysis is going to be implemented. At the moment Java is also used in rules, and Java is probably a lot more common in the old rule bases, as the MVEL dialect was only just introduced. In the future there might be other dialects; this is why the mapping of the code parts needs to be flexible enough that it can some day cover all of the dialects.

After I finished making the data model, I continued to write the redundancy and subsumption rules for Drools. All the restrictions, patterns and different combinations of patterns and rules need to be checked. At this point the restriction rules are almost done; after that I can use data from those rules to check for redundancy in patterns. The following is a simple example:

Pattern( x == 1 && x == 1 )

The field x is checked for literal 1 twice in the same pattern. This doesn't prevent this rule from firing but it does make the rule base a bit slower. After this I'll do the pattern redundancy checks and finally the rule redundancy.
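
The idea behind this check can be sketched outside the rule engine. The following is only an illustration in Python, not the Drools rules I am writing: the (field, operator, value) tuples and the function name find_redundant_restrictions are made up for this example, and it only covers restrictions that are joined with &&.

```python
from collections import Counter


def find_redundant_restrictions(restrictions):
    """Return the restrictions that appear more than once in one pattern.

    Each restriction is a hypothetical (field, operator, value) tuple and
    all of them are assumed to be joined with &&.
    """
    counts = Counter(restrictions)
    return [r for r, n in counts.items() if n > 1]


# Pattern( x == 1 && x == 1 )
pattern = [("x", "==", 1), ("x", "==", 1)]
redundant = find_redundant_restrictions(pattern)
```

Here redundant holds the single restriction x == 1, which is what a redundancy warning would point at.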

Monday, 30 July 2007

Subsumption And Redundancy

I made the unit tests for range checks. Those were really useful and I found some bugs in my logic. At the end of last week I started to make unit tests and some rules to check subsumption and redundancy. To find these I'm going to need the data from the RHS of the rule, since subsumption and redundancy should only be checked when the RHS is the same, or actually, when the RHS does the same things. Simple string comparison is not enough. Lines of code can be written in a different order and still do the same thing. Also, lines that do not do anything, like comments and simple text printing to the console, need to be ignored. I also need the information from the RHS to solve possible rule flows and actions between rules.

The rule flow that I use for subsumption and redundancy checks creates a lot of meta data that I can use when I'm checking for other issues in the rule base. Right now I can see how I could use them when checking for rule equivalence, combination and deduction.

The new Drools release had some new functions, which means that I need to add them to my analysis model of the Drools AST. Many of the new features seem to be easy to add. I'm trying to build my model so that it has only the data I need, in as simple a form as possible. At the same time I'm going to add data about the constraint and restriction orders; this will help me when I'm doing the checks for optimisation.
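
To show why plain string comparison falls short, here is a rough Python sketch of comparing two RHS blocks. Everything in it is an assumption made for illustration: the function names, the choice to treat lines starting with # or // as comments, and treating System.out.print calls as the only console output. Sorting the statements is also only safe when they do not depend on each other, so a real check would need much more than this.

```python
def normalize_rhs(rhs_text):
    """Reduce an RHS to a comparable list of statements (a simplification)."""
    statements = []
    for line in rhs_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", "//")):
            continue
        # Skip simple console printing.
        if line.startswith("System.out.print"):
            continue
        statements.append(line)
    # Sorting ignores statement order; only valid for independent statements.
    return sorted(statements)


def same_rhs(a, b):
    """True when both RHS blocks contain the same effective statements."""
    return normalize_rhs(a) == normalize_rhs(b)


rhs_a = "insert( new Alarm() );\n# raise the alarm\nretract( $f );"
rhs_b = 'System.out.println("fired");\nretract( $f );\ninsert( new Alarm() );'
equal = same_rhs(rhs_a, rhs_b)
```

The two blocks differ as strings, but after dropping the comment and the print line and ignoring order they contain the same two statements.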

Monday, 23 July 2007

Unit Testing

Since last time I have improved the range checks, done some planning for redundancy and subsumption checks, improved output and made some unit tests. JBoss Rules also changed its name back to Drools, so I'm going to use that name from now on.

I started to create unit tests so that I could make sure that I have covered all the cases for the range checks. Unit testing rules is not that simple, since rules are usually fired in groups, and what I need for the testing is to separate the rule or rules that I'm currently testing and give them only the minimum facts to fire. Debugging the rules is also difficult; at this moment only the RHS (right-hand side of the rule) can be debugged. At the same time that I started to think about unit testing, Dr. Gernot Starke made a blog entry on the subject. I found out about this because Mark Proctor mentioned it in his blog.

For better testing of rules I made a helper object that I pass to Drools as a global fact. With this object I can get the data of all the fired rules and the causes for the firings. This kind of data is not normally needed, so I'm only using this object for debugging and testing. The whole thing is still under development, but I hope that I can make something useful that might also help others.

My model for AST data is still changing; one reason for this is the new Drools release, which has new features that I need to take care of.

Another thing that has given me some headaches is the redundancy and subsumption checks; after a lot of brain work I found a way to test them. Creating rules seems to be all about asking the right questions. I still find myself thinking like a Java programmer, not like a Drools rule programmer.

I talked with Mark Proctor and asked him for new directions with testing the rules using impact analysis. The next step is to map the data in the RHS part of the rule. At this point Drools has two dialects, Java and MVEL; this mapping will start with MVEL and maybe later cover Java too. I will personally change the dialect used in my tests to MVEL, because it seems practical for rules.
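
The shape of such a helper can be sketched roughly like this. The sketch is in Python and is not my actual helper: the class name FiringLog and its methods are invented for illustration, and in Drools each rule's RHS would have to call something like record() itself.

```python
class FiringLog:
    """Hypothetical test helper that records which rules fired and why."""

    def __init__(self):
        self.firings = []

    def record(self, rule_name, causes):
        """Store one firing together with the facts that caused it."""
        self.firings.append((rule_name, tuple(causes)))

    def fired(self, rule_name):
        """True when the named rule fired at least once."""
        return any(name == rule_name for name, _ in self.firings)

    def causes_of(self, rule_name):
        """All cause tuples recorded for the named rule."""
        return [causes for name, causes in self.firings if name == rule_name]


log = FiringLog()
log.record("Missing range", ["Foo.bar > 42", "Foo.bar < 42"])
```

A unit test can then assert on log.fired("Missing range") instead of inspecting the working memory.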

Monday, 9 July 2007

More Range Checks

Now I have all the solutions for the patterns inside the rule, and all the possibilities for constraints inside patterns. I can use this information to compare two rules for redundancy and subsumption, and check for incoherence inside one rule.

Range checks for numbers, dates and variables are done; I even found a way to check for variable ranges. Basically the structure of the rule that finds missing ranges is the same for all of them. Here is an example of two rules where the check for bar == $value is missing.

rule "Sample 1"
    when
        Something( $value : value )
        Foo( bar < $value )
    then
        # Do something.
end

rule "Sample 2"
    when
        Something( $value : value )
        Foo( bar > $value )
    then
        # Do something.
end

In this case the variable points to the field value in class Something; the variable could also be any number, like 42, or a date like 21-Oct-2007. In my last blog entry I talked about finding all the values that bar is being compared to. That was not needed, as I'm using the rule engine to solve this problem. All I needed to do was to find a constraint that compares field bar from Foo to any value, for example ( bar > 42 ), and then check whether there is a constraint that checks for ( bar == 42 ) or ( bar <= 42 ). Of course, if 42 is an integer, there is the need to add +1 or -1 to 42. The rule engine then does the work for me and goes through all the cases.

The checks for numbers in patterns are still undone. When field x is compared to 10, 20, 30, 50 and 60 there should be a warning that x == 40 is missing. I don't think there is any way to check for all of the possible patterns, but at least addition, subtraction, division and multiplication could be possible.

The notifications that RAM gives needed an update; they can now be caused by multiple rows and multiple rules. After the range checks are done I'll be doing the redundancy and subsumption checks.
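
The same question can be asked in plain code. This Python sketch is an illustration only and not the Drools rule itself: the (operator, value) pairs and the function name uncovered_boundaries are assumptions, and it handles literal values only, not variables like $value. For every strict comparison it asks whether any constraint on the field accepts the boundary value.

```python
import operator

# Maps the comparison operators used in constraints to Python functions.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def uncovered_boundaries(constraints):
    """Return boundary values that no constraint on the field accepts.

    `constraints` is a hypothetical list of (operator, value) pairs that
    all restrict the same field, possibly in different rules.
    """
    missing = []
    for op, value in constraints:
        if op not in ("<", ">"):
            continue  # only strict comparisons leave their boundary open
        covered = any(OPS[o](value, v) for o, v in constraints)
        if not covered and value not in missing:
            missing.append(value)
    return missing


gaps = uncovered_boundaries([("<", 42), (">", 42)])
```

With ( bar < 42 ) and ( bar > 42 ) the value 42 is reported as missing. Adding ( bar == 42 ) or ( bar <= 42 ) makes the list empty, and so does the integer case of ( bar > 42 ) with ( bar < 43 ).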
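
For the simplest case, values that grow by a constant step, a check could look something like the following Python sketch. It is an assumption-laden illustration: the function name missing_in_sequence is invented, the values are assumed to be integers, and the most common difference between neighbours is taken as the intended step.

```python
from collections import Counter


def missing_in_sequence(values):
    """Guess the step of an additive pattern and return the skipped values."""
    ordered = sorted(set(values))
    if len(ordered) < 3:
        return []  # too few values to guess a pattern
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    # Assume the most common difference is the intended step.
    step, _ = Counter(diffs).most_common(1)[0]
    missing = []
    for a, b in zip(ordered, ordered[1:]):
        gap = b - a
        if gap > step and gap % step == 0:
            missing.extend(range(a + step, b, step))
    return missing


skipped = missing_in_sequence([10, 20, 30, 50, 60])
```

For 10, 20, 30, 50 and 60 the guessed step is 10 and the only skipped value is 40. Patterns based on multiplication or division would need a different test.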

Friday, 29 June 2007

Range checks

Last week I looked at the output that RAM returns. There will be three kinds of notifications: notes, warnings and errors. I made a result object that returns these notifications to the user; at this point results can be exported as XML or Java objects. XML can be transformed to, for example, HTML, and Java objects can be used by Eclipse or any other program. Each notification contains a unique id, an explanation, a line number and a rule name.

I continued to make meta data from rules for RAM. I found a way to make all the possible clauses for patterns from the AST, but data for testing solutions between patterns is still undone. This will probably use something similar to the solutions inside patterns. Mark Proctor gave me some advice on impact analysis: how to map the data about rules so that RAM could know which fields from which classes are used in which rules. This way, if a field is removed from a class, it is possible to see which rules are affected, or if a new rule is added it is possible to check which existing rules this new rule might have an impact on.

The next step is trying to figure out how to check for missing ranges. ( x > 1 || x < 1 ) should give a warning that x == 1 is not taken care of. To solve this one I need to find all the possible solutions that a rule can have to fire. Then I need to know which rules use the field x; impact analysis will help me with this. After that I can get every possible value that x is being compared to. By comparing these values, I can find the missing gaps. There is only one "but" here: what if x is compared to a variable and that variable is bound to a field in a class? I'm sure there is a solution for that too.

I have been reading that Expert Systems book and it's really interesting. Expert systems are used to replace and help human experts in their work. With expert systems, human expertise can be inserted into a machine. JBoss Rules is an expert system shell and a tool for building expert systems. The book introduces another tool called CLIPS. The idea behind CLIPS and Rules is similar, so this book helps to understand Rules.
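
As an illustration of the XML export, here is a small Python sketch using the standard xml.etree.ElementTree module. The element and attribute names (result, id, rule, line) and the dictionary keys are made up for this example; they are not the format RAM actually produces.

```python
import xml.etree.ElementTree as ET


def notifications_to_xml(notifications):
    """Serialize notification dictionaries into one XML string.

    The severity (note, warning or error) becomes the element name.
    """
    root = ET.Element("result")
    for n in notifications:
        node = ET.SubElement(
            root,
            n["severity"],
            id=str(n["id"]),
            rule=n["rule"],
            line=str(n["line"]),
        )
        node.text = n["explanation"]
    return ET.tostring(root, encoding="unicode")


xml_text = notifications_to_xml([
    {"severity": "warning", "id": 1, "rule": "Sample 1", "line": 4,
     "explanation": "bar == $value is not covered"},
])
```

The resulting string can be handed to an XSLT stylesheet or any other tool that turns XML into HTML.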
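
The mapping itself is essentially an index from fields to rules. A minimal Python sketch of that idea follows; the function name build_field_index, the (class, field) tuples and the rule names are assumptions for illustration and say nothing about how RAM stores this data.

```python
from collections import defaultdict


def build_field_index(rule_fields):
    """Map each (class, field) pair to the set of rules that use it."""
    index = defaultdict(set)
    for rule, fields in rule_fields.items():
        for field in fields:
            index[field].add(rule)
    return index


# Hypothetical meta data: which fields each rule touches.
rule_fields = {
    "Sample 1": {("Something", "value"), ("Foo", "bar")},
    "Sample 2": {("Something", "value"), ("Foo", "bar")},
    "Other": {("Foo", "xyz")},
}
index = build_field_index(rule_fields)
affected = sorted(index[("Foo", "bar")])
```

If the field bar is removed from Foo, the lookup shows that "Sample 1" and "Sample 2" are affected while "Other" is not. The same lookup tells which existing rules a new rule using Foo.bar could interact with.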

Friday, 15 June 2007

Looking at the AST

I spent last weekend testing JRuby. I found out that some things would be easier using JRuby and some things using Java; this was of course to be expected.

I spent the first days of this week designing a model that would help me to test rules.

JBoss Rules forms an abstract syntax tree (AST) from the rules it gets from the rule base. This AST is done using Java, so the problem is that rule engines do not support an object-oriented model that well, and as I am mainly using JBoss Rules to find conflicts I need to find another way. So Michael Neale told me to think about relations like the ones used in SQL databases. Using this advice I added an identifier to all my objects, plus information about parent objects, so that the relations could be solved. After this I could test small cases that were under And or Or descriptions, but this is not enough, because I would need to form loops to check for conflicts in the entire rule. Loops would be too messy to use, so I needed another solution.

After some brain work, I realized that if I can get a list of all of the simpler clauses that can be formed from one rule, I can use those clauses to test conflicts inside this rule. Let's look at how this looks in the Rules drl file:
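
The relational idea can be shown with a small Python sketch that turns a nested tree into flat rows, each with its own id and the id of its parent. The dictionary layout, the function name flatten and the row format are assumptions for illustration and not the real JBoss Rules AST classes.

```python
from itertools import count


def flatten(node, parent_id=None, ids=None, rows=None):
    """Turn a nested tree into (id, parent_id, kind, text) rows."""
    ids = count(1) if ids is None else ids
    rows = [] if rows is None else rows
    node_id = next(ids)
    rows.append((node_id, parent_id, node["kind"], node.get("text")))
    for child in node.get("children", []):
        flatten(child, node_id, ids, rows)
    return rows


# A hand-written tree for Foo( bar == "baz" && ( xyz == "123" || bar != "baz" ) )
tree = {"kind": "pattern", "text": "Foo", "children": [
    {"kind": "and", "children": [
        {"kind": "restriction", "text": 'bar == "baz"'},
        {"kind": "or", "children": [
            {"kind": "restriction", "text": 'xyz == "123"'},
            {"kind": "restriction", "text": 'bar != "baz"'},
        ]},
    ]},
]}
rows = flatten(tree)
```

Each row can then be inserted as a separate fact, and a rule can join a restriction to its parent by matching parent_id against id, much like a join in SQL.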

rule "Rule that causes warning"
    when
        Foo( bar == "baz" && ( xyz == "123" || bar != "baz" ) )
    then
        # Do something
end


This rule looks for Foo objects in working memory; if a Foo object has parameters that match the definitions set inside the brackets, it does something.
So all the simpler clauses for the definitions inside object Foo would be:
bar == "baz" && xyz == "123"
bar == "baz" && bar != "baz"

This rule has an error, because obviously parameter bar cannot be equal to "baz" and at the same time be unequal to "baz". In this kind of situation RAM could check the rules and inform the user that he or she has a rule that can be true, but that has a condition that can never be true. Another warning could be, for example, for Foo( x > 42 || x < 42 ); this throws a warning that the possibility x == 42 is not taken care of. The only problem now is how to form all the possible clauses from the AST.
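
One possible way to form the clauses is to expand the nested And/Or structure recursively. The Python sketch below is only an illustration of that idea, not the solution used in RAM: the tuple layout ("and", ...), ("or", ...), ("test", field, operator, value) and both function names are invented here, and the contradiction check only catches a field being == and != to the same value.

```python
from itertools import product


def clauses(node):
    """Expand a nested and/or tree into a list of AND-only clauses."""
    kind = node[0]
    if kind == "and":
        parts = [clauses(child) for child in node[1:]]
        # Pick one clause from every child and join them together.
        return [sum(combo, ()) for combo in product(*parts)]
    if kind == "or":
        # Each alternative contributes its own clauses.
        return [c for child in node[1:] for c in clauses(child)]
    return [(node[1:],)]  # a single test: (field, operator, value)


def is_contradiction(clause):
    """True when a field is tested == and != against the same value."""
    equal = {(f, v) for f, op, v in clause if op == "=="}
    unequal = {(f, v) for f, op, v in clause if op == "!="}
    return bool(equal & unequal)


# Foo( bar == "baz" && ( xyz == "123" || bar != "baz" ) )
rule = ("and",
        ("test", "bar", "==", "baz"),
        ("or",
         ("test", "xyz", "==", "123"),
         ("test", "bar", "!=", "baz")))
expanded = clauses(rule)
flags = [is_contradiction(c) for c in expanded]
```

For the rule above this gives the same two clauses as listed earlier, and only the second one, bar == "baz" && bar != "baz", is flagged as a condition that can never be true.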

Yesterday and today I have been looking at how the feedback from rule checks should work. Michael Neale said that the feedback would be in XML and that it could then be transformed to, for example, HTML. Proctor and Neale also suggested some books that could help, so last Tuesday I got Expert Systems: Principles and Programming by Joseph Giarratano and Gary D. Riley. I'll be reading that next weekend.