|
Hi All,
I am working on a banking fraud detection project with Drools Fusion, which matches each transaction against hundreds of rules to check whether it is suspicious. Some rules use a time-based sliding window to calculate an account's average transaction amount over the past 3 or 6 months. One such rule looks like this:

rule "Single Large Amount Transaction"
    dialect "mvel"
when
    $account : Account($number : number)
    $averageAmount : BigDecimal() from accumulate(
        TransactionCompletedEvent(fromAccountNumber == $account.number, $amount : amount)
            over window:time(90d) from entry-point TransactionStream,
        bigDecimalAverage($amount))
    $t1 : TransactionCreatedEvent(fromAccountNumber == $account.number,
        amount > $account.creditAmount * 0.5,
        amount > $averageAmount * 3.0) from entry-point TransactionStream
then
end

In such cases the Fusion engine holds every TransactionCompletedEvent in memory for 90 days. We have about 1 billion accounts in total, so the number of TransactionCompletedEvent facts will be huge and we will run out of memory very soon. I have been blocked on this for a long time!

Is it possible to distribute a single StatefulKnowledgeSession across multiple JVMs or machines using a distributed memory cache such as Hazelcast? If yes, could you give me your opinion on the solution? Or is this the problem the Drools Grid project tries to handle? Or is there another technology for handling very large numbers of facts or events?

As far as I know, Drools Grid distributes multiple ksessions across multiple machines in the grid, with one or several ksessions per node. Is my understanding right?

Any response or opinion would be appreciated! Thank you very much! |
|
Yes, you've got it right.
Drools Grid (or at least what is in the source code right now) is about session virtualization. It allows you to access a session hosted in a different JVM. What you are looking for is, I'm afraid, not possible right now, because in order to distribute one session across multiple JVMs the RETE algorithm needs to be split. There is some experimental work around this, but nothing has been released yet. Mark can probably give you more details about that.
Cheers
On Thu, Jun 14, 2012 at 1:06 PM, chrisLi <[hidden email]> wrote:
Hi All,

- MyJourney @ http://salaboy.wordpress.com
- Co-Founder @ http://www.jugargentina.org
- Co-Founder @ http://www.jbug.com.ar
- Salatino "Salaboy" Mauricio
_______________________________________________
rules-users mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/rules-users |
|
In reply to this post by chrisLi
Maybe you could approach this differently: instead of keeping all objects for 90 days, keep only the accumulated result for each previous day.
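The suggestion above (one accumulated result per day instead of every event) could be sketched roughly as follows. This is plain Java, not Drools API: the class `DailyAverageBuckets` and its methods are hypothetical names made up for illustration, and the sketch assumes that day-level granularity is acceptable for the 90-day average.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.LocalDate;
import java.util.TreeMap;

// Hypothetical helper, not part of Drools: keeps one (sum, count) pair per day
// for a single account instead of every TransactionCompletedEvent.
class DailyAverageBuckets {
    private static final class Bucket {
        BigDecimal sum = BigDecimal.ZERO;
        long count = 0;
    }

    private final int windowDays;
    private final TreeMap<LocalDate, Bucket> buckets = new TreeMap<>();

    DailyAverageBuckets(int windowDays) {
        this.windowDays = windowDays;
    }

    // Fold one completed transaction into the bucket for its day.
    void add(LocalDate day, BigDecimal amount) {
        Bucket b = buckets.computeIfAbsent(day, d -> new Bucket());
        b.sum = b.sum.add(amount);
        b.count++;
    }

    // Average over the windowDays days ending at 'today' (inclusive).
    // Buckets older than the window are dropped as whole days.
    BigDecimal average(LocalDate today) {
        LocalDate oldest = today.minusDays(windowDays - 1L);
        buckets.headMap(oldest, false).clear();
        BigDecimal sum = BigDecimal.ZERO;
        long count = 0;
        for (Bucket b : buckets.headMap(today, true).values()) {
            sum = sum.add(b.sum);
            count += b.count;
        }
        if (count == 0) {
            return BigDecimal.ZERO;
        }
        return sum.divide(BigDecimal.valueOf(count), 2, RoundingMode.HALF_UP);
    }

    // Number of day buckets currently held (at most windowDays after average()).
    int bucketCount() {
        return buckets.size();
    }
}
```

With this layout an account needs at most 90 small buckets regardless of how many transactions it has. The trade-off is that data expires one whole day at a time, so the result approximates an event-level `window:time(90d)` rather than reproducing it exactly.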
----- Original message -----
From: "chrisLi" <[hidden email]>
To: [hidden email]
Sent: Thursday, 14 June 2012 18:06:50
Subject: [rules-users] Is a single StatefulKnowledgeSession with Distributed Memory cache possible?
--
View this message in context: http://drools.46999.n3.nabble.com/Is-a-single-StatefulKnowledgeSession-with-Distributed-Memory-cache-possible-tp4017968.html
Sent from the Drools: User forum mailing list archive at Nabble.com. |
|
Yes, that's another option too. It really depends on how you can accumulate your data or split the data to be analyzed.
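As one illustration of "splitting the data": every pattern in the rule from the original question joins on the account number, so all events for one account could be routed to the same independent session. The sketch below only shows the routing step in plain Java. `AccountPartitioner` is a hypothetical name, and the assumption that no rule needs to correlate events across different accounts would have to hold for this to be valid.

```java
// Hypothetical router, not part of Drools or Drools Grid: maps an account
// number to one of N partitions, each of which could host its own session.
class AccountPartitioner {
    private final int partitions;

    AccountPartitioner(int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        this.partitions = partitions;
    }

    // The same account number always maps to the same partition index
    // in the range [0, partitions).
    int partitionFor(String accountNumber) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(accountNumber.hashCode(), partitions);
    }
}
```

Each partition then only has to hold the events of its own share of the accounts, at the cost of giving up any rule that spans accounts in different partitions.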
Cheers
On Thu, Jun 14, 2012 at 1:22 PM, Vincent LEGENDRE <[hidden email]> wrote:
May be you can think differently : Instead of keeping all objects for 90 days, keep only the accumulate results for each previous day.

- Salatino "Salaboy" Mauricio |
|
Hi,
Thank you very much for such a quick response. I can hardly believe it!

As far as I know, the Fusion engine has to store the details of every event in a sliding window, because when an event in the window expires, the engine still needs that event's details to update the accumulate result. So I think storing only the per-day accumulate results would not fit Fusion's logic.

Thank you, all! |
|
It's difficult proposing alternatives without knowing all the requirements.
If there are no rules that require the presence of transactions with unknown/indeterminate account numbers, a caching strategy might be considered. Clearly, this would rule out relying on Fusion features, and (temporary) retractions would have to be done explicitly.

The most frequently used accounts would remain in working memory (WM) for the period you have to consider. Others would drop out after being idle for a certain period. A new transaction for a swapped-out account would trigger reloading of its old transactions. (This is akin to the page swapping strategies used to implement virtual memory in operating systems.)

This isn't a simple task, but given the huge number of transactions you have to cope with, you may not have any feasible alternative "out of the box".

-W

On 15 June 2012 03:32, chrisLi <[hidden email]> wrote:
Hi, |
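The caching strategy described in the last reply could be sketched along these lines. This is a minimal illustration in plain Java, not Drools code. It swaps in a capacity-bound least-recently-used policy (via `LinkedHashMap` in access order) in place of the idle-period timeout mentioned in the reply, and it assumes completed transactions are already persisted elsewhere so they can be reloaded. `AccountHistoryCache`, its `loader` function, and the use of `long` amounts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: keeps the transaction history of recently active
// accounts in memory and reloads swapped-out accounts on demand.
class AccountHistoryCache {
    // Assumed to read an account's stored transaction amounts from
    // some external store; not a real Drools or database API.
    private final Function<String, List<Long>> loader;
    private final List<String> evicted = new ArrayList<>();
    private final LinkedHashMap<String, List<Long>> resident;

    AccountHistoryCache(int capacity, Function<String, List<Long>> loader) {
        this.loader = loader;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.resident = new LinkedHashMap<String, List<Long>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<Long>> eldest) {
                boolean evict = size() > capacity;
                if (evict) {
                    // In a rule engine, this is where the account's facts
                    // would have to be retracted explicitly.
                    evicted.add(eldest.getKey());
                }
                return evict;
            }
        };
    }

    // A new transaction touches the account; a swapped-out account is reloaded first.
    List<Long> onTransaction(String account, long amount) {
        List<Long> history = resident.get(account);
        if (history == null) {
            history = new ArrayList<>(loader.apply(account));
            resident.put(account, history);
        }
        history.add(amount);
        return history;
    }

    boolean isResident(String account) {
        return resident.containsKey(account);
    }

    List<String> evictedAccounts() {
        return evicted;
    }
}
```

The sketch does not write evicted history back to the store, so it only works if the store is the system of record. How eviction and reloading would map onto explicit inserts and retractions in a session is left open here.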
