Friday, August 22, 2014

Cast Iron sFTP Connector Does Not Support SSH Key-Based Authentication

After searching on forums and through the IBM documentation, I've found many people asking the question: "Does IBM Cast Iron support sFTP using SSH Key Based Authentication?"

The answer is simple and spelled out in the documentation: the sFTP Connector does not support key- or certificate-based authentication.  Per this technote from IBM, key-based authentication "is not supported at this time."


Thursday, April 10, 2014

Notice for CIOS 6.x and 7.x Users Concerned about Heartbleed

Notice for CIOS 6.x and 7.x Users Concerned about Heartbleed

People everywhere are talking about the new Heartbleed vulnerability discovered in OpenSSL; see heartbleed.com for more details on the vulnerability.  Essentially, this vulnerability exploits a defect in OpenSSL's implementation of the TLS heartbeat extension which allows attackers to obtain sensitive information from your server, including the private key your server uses to establish secure HTTP connections.  Once an attacker has your private key, communication with that server can be decrypted and your sensitive data can be accessed.  You may be wondering whether CIOS is affected by this vulnerability.  The short answer is that if you are running CIOS 6.x you are not affected; however, if you are running CIOS 7.x you are affected by the Heartbleed vulnerability.  IBM is working on a patch for CIOS 7.x; see this link for more details.

How Do I know what version of CIOS I am running?

To determine which version of CIOS you are running, log in to the CLI via SSH and type the command system show version.  You may also log in to the WMC and go to the System tab.

I'm running CIOS 7.x, What Risks am I exposed to?

If nothing else, the Web Management Console uses SSL, so theoretically an attacker could gain access to the WMC and take down your appliance.  Most users do not expose the WMC to the internet, so the risk is limited to your own network.  If you use the HTTP Receive or Provide Web Service activities with SSL enabled, you are also vulnerable.  Again, most customers do not expose these web services to the internet, so your exposure is limited by which networks have access to your services.

I'm running CIOS 7.x, what do I do next?

IBM is working on a patch; if you are running CIOS 7.x you should apply it as soon as it becomes available.  Contact IBM Support for more details.  Once you have applied the patch you should also generate a new certificate for your appliance in case your system has already been compromised.  Once an attacker has your private key, the only way to secure your connections again is to create a new key (generate a new certificate).  See the Security section of the Cast Iron Online Help for further details.
 

Tuesday, December 3, 2013

QT015: Use Filter Recurring Nodes to Apply XPath Predicates within Map Activities

Quick Tip #015: Use Filter Recurring Nodes to Apply XPath Predicates within Map Activities

The Cast Iron Map Activity is a powerful and easy-to-use tool for transforming XML documents.  For most transformations the simple point-and-click / drag-and-drop interface for mapping fields and inserting functions is intuitive and largely self-explanatory.  However, there are a few features that are not well known and are sometimes forgotten because they can only be accessed through right-click menus.  This article covers the usage of one such feature, Filter Recurring Nodes.

What is Filter Recurring Nodes?

The Filter Recurring Nodes option, which is accessible by right-clicking a recurring node on the target side of a Map, allows you to apply an XPath predicate to the source document as part of the map.  XPath is a language for navigating XML hierarchies; it has a simple syntax and includes various functions for transforming data.  An XPath predicate is a filter applied within an XPath expression that selects only the nodes meeting the specified criteria.  Predicates can use sub-path expressions, functions, and comparison operators to identify the nodes that should be included.
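For example (the element names here are purely illustrative), the expression /order/item[price > 100] selects only those item elements whose price child is greater than 100; the bracketed portion is the predicate.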

How do I use Filter Recurring Nodes?

The best way to understand how to use Filter Recurring Nodes is with an example.  For this example we will use a common design pattern for the Salesforce.com Connector.  When working with this connector you may have noticed that, in order to determine whether or not your operation completed successfully, you must check the results output parameter of the Salesforce.com activity.  The results output parameter is an XML document which contains a result element for each sObject passed to the input parameter of the connector.  Each result contains a boolean flag indicating whether or not the operation on the associated record was successful (results are returned in the same order as the input data).  Assuming you want to report on all the records that errored, you will need to collect those records and write the error messages to a database or send them in an email.
Without Filter Recurring Nodes you might use a For Each loop on the results object with an If Then and an expand-occurrences map.  This is a very inefficient way to collect these records (expand-occurrences maps are expensive), and your orchestration would be cluttered with unnecessary activities, which increases the potential for mistakes and hinders readability.  Fortunately, Filter Recurring Nodes provides an efficient and compact way to accomplish the same goal.
First, we will create a Salesforce.com Upsert activity and go to the Map Outputs pane.  Create two variables based on the results output parameter of the activity and name them successes and failures.  Add the newly created variables to the map.  Now you can drag the recurring result node of the output parameter to the recurring result node of both the successes and failures variables on the target side of the map.  Next, right-click the result recurring node of the successes variable and choose Filter Recurring Nodes.  This opens the Filter Recurring Nodes dialog, where you will see an XPath expression with an empty predicate (/results/result[]).  Now you can fill in the predicate to complete the expression and filter for only the nodes where success equals true.  To do this, enter the following text into the box:
*:success = true()  
Repeat this filter recurring nodes step for the failures variable and use the XPath predicate:
*:success = false()

That's it: you now have a variable called successes that contains all the successful records and a variable called failures that contains all the errors.  The failures variable can now be transformed into an email message, logged to a database, etc.
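To make the effect concrete, here is a rough sketch of what a results document might look like (the field names and structure are simplified; the actual connector output may contain additional elements and namespaces):

<results>
  <result>
    <id>001A000000AbcDE</id>
    <success>true</success>
  </result>
  <result>
    <success>false</success>
    <errors>
      <message>Required field missing: Name</message>
    </errors>
  </result>
</results>

With the two predicates applied, the first result would land in the successes variable and the second in failures.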

How do I know if Filter Recurring Nodes has been Applied?

When reviewing an orchestration, especially one created by another developer, it is important to understand whether this feature is being used and which XPath predicate has been applied.  When a map contains a Filter Recurring Nodes condition, an icon is displayed next to the recurring node where it was applied.  You may also hover over the node to see the XPath predicate that was used.  See the screenshot below; the Filter Recurring Nodes icon and the predicate are highlighted in green.


Notes on XPath predicates

You may have noticed a couple of things about the predicates above.  First, the *: before the success field name.  This is a namespace wildcard and is a common idiom in XPath expressions like these because it is often difficult to know which namespace prefixes have been declared.  Be careful when using this wildcard that you do not happen to have two fields in your document with the same name and different prefixes.  Second, the use of the true() function instead of a literal true.  XPath does not reserve a true keyword because XML does not reserve true as a keyword; the bare word true would therefore be interpreted as a path matching a node named true rather than as the boolean value.  To get around this limitation, XPath provides the true() and false() functions, which return their respective boolean values.
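Putting the pieces together, the completed expressions behind the two variables read /results/result[*:success = true()] and /results/result[*:success = false()].  Writing /results/result[*:success = true] instead would, per the note above, compare success against child nodes named true, which is not what we want.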

Further Reading

In order to use this feature effectively you will want to have a good background in XPath predicates.  The following resources may be helpful in understanding XPath:

Friday, August 2, 2013

QT014: A Quick Note on Upgrading to 6.3.0.0 When Using HTTP Receive

Quick Tip #014:  A Quick Note on Upgrading to 6.3.0.x When Using HTTP Receive

Several of our customers have experienced issues when migrating existing orchestrations that begin with an HTTP Receive activity to version 6.3.0.x.  (We've seen this issue in both 6.3.0.0 and 6.3.0.1; we have not tested 6.4.0.0 yet.)  We believe the issues mostly come up when you are manually parsing the URI; simple use cases that do nothing with the URI seem to work just fine.  Our recommendation is that if you are using the HTTP Receive activity as a starter for any of your orchestrations, you thoroughly regression test those orchestrations.  Additionally, if you are manually parsing the URI, we recommend rebuilding the activity to use the built-in parsing functionality.  See this post for more details on the new features of the HTTP Receive activity.  There are two main problems that we have encountered so far:

  • The URI string is now passed to the orchestration URL-encoded.  If you were previously parsing this value with JavaScript or a flat file definition, consider using the new built-in functionality or the new HTTP Header functions available in the Functions tab to parse out the path or extract a query parameter (see the example after this list).
  • Certain URI strings cause the activity to throw an exception and prevent the job from starting.  This is a bug that we discovered at a client today: if you have a query string in your URL with a parameter but no value, the HTTP Receive activity will throw an exception.  If you encounter this issue, the workaround is to make sure that you pass a value with every query parameter.
    • this throws an exception: http://www.example.com/MyTestOrchestration?value
    • this does not throw an exception: http://www.example.com/MyTestOrchestration?parameter=value
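To illustrate the encoding change with a purely hypothetical request: if a client calls http://www.example.com/MyTestOrchestration?name=John%20Smith, the URI string your orchestration receives now retains the %20 escape (where earlier versions appear to have passed the decoded value), so parsing logic that expects "John Smith" will need to decode the value or switch to the new header functions.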
Both of these problems have easy workarounds, and the new functionality for parsing path and query parameters is certainly a welcome enhancement.  However, as always when upgrading to a new version, be sure to thoroughly regression test your orchestrations.

Monday, July 1, 2013

QT013: Resolving DNS Resolution Issues in CIOS

Quick Tip #013: Resolving DNS Resolution Issues in CIOS

The Domain Name System (DNS) is a globally distributed network of servers used to resolve hostnames such as blog.conexus-inc.com to IP addresses that the network layer can use to connect to a remote host.

Common DNS Issues

DNS issues usually manifest themselves as UnknownHostExceptions in the system log.  An UnknownHostException is generated when an activity tries to resolve the hostname of a remote host and the DNS server either does not respond or does not have an entry for that host.  Another common issue occurs when your Cast Iron appliance resides behind the same firewall as the remote host and the DNS server returns the external address rather than the internal address.

Ensure DNS Servers are Properly Configured

You may specify multiple DNS servers in the networking configuration of CIOS.  This can be done via the Command Line Interface (CLI) using the net set nameserver command or under the Networking panel in the Web Management Console (WMC).  The first step in troubleshooting DNS is making sure these settings are correct.  If you are able to resolve other hostnames and have isolated the issue to a specific remote host, there are a few other options to consider.
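For example, a nameserver can be set from the CLI along these lines (the IP address is illustrative, and the exact argument syntax may differ between CIOS versions, so check the CLI reference for your release):

net set nameserver 192.0.2.53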

Use the IP Address Instead

Often you can bypass the DNS resolution process by replacing the hostname of a remote server with its IP address.  This is the simplest solution; however, it has some drawbacks under certain conditions.  By bypassing DNS you take responsibility for ensuring that if the IP address of the remote host changes, the change is made in your configuration properties.  Some endpoints, such as Domino, connect to a gateway which redirects the connection to another hostname.  In these cases, specifying the gateway's IP address will not resolve the problem unless the gateway is configured to redirect to an IP address rather than a hostname; otherwise you will still need to make sure that Cast Iron can resolve the hostname.  There are also SSL implications to using the IP address instead of the DNS name.  SSL typically checks that the Common Name of the certificate presented by a remote server matches the hostname used to make the connection; if you connect by IP address this check will fail.  You could disable hostname verification, but there is a better way . . .

Add an etc/hosts Entry

CIOS is, underneath the covers, a Linux server, and Linux servers have their own internal name resolution process that happens before reaching out to the DNS server.  If your DNS servers do not properly resolve a given hostname, you may statically add it to the etc/hosts file.  When resolving a hostname, CIOS will first check for an etc/hosts entry and contact the DNS server only if the address is not resolved there.  Again, by bypassing DNS you take responsibility for maintaining the hostname-to-IP-address mapping.  However, this method has the benefit of allowing your orchestrations to connect using the actual hostname, which means you can maintain the hostname-to-IP mapping in one place and SSL can still perform hostname verification.  etc/hosts entries can be added via the CLI using the net add etchost address <ip> hostname <name> command, where <ip> is the remote host's IP address and <name> is the FQDN.
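For example (the IP address and hostname below are illustrative; substitute your own values):

net add etchost address 192.0.2.25 hostname crm.internal.example.com

This maps crm.internal.example.com to 192.0.2.25 for all orchestrations on the appliance while still allowing SSL hostname verification against that name.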

Monday, June 3, 2013

QT012: Calling Apex Web Services from IBM Cast Iron

Quick Tip #012: Calling Apex Web Services from IBM Cast Iron

The standard interface between Cast Iron and Salesforce.com is the Salesforce SOAP API.  The SOAP API provides standard CRUD operations that work with all standard and custom objects in Salesforce.com.  This is a powerful and robust interface; however, you may want to encapsulate complex business logic within Salesforce, much as you might with a database stored procedure.  This is possible using Apex Web Services.

Exposing Apex Web Services with Salesforce

Salesforce.com allows you to expose your own custom web services using the Apex programming language.  Method declarations in Apex prefixed with the webservice keyword will be exposed as operations in an Apex web service.  Webservice methods must be declared as static methods within a global class.

global class MyWebService{
 ...
 webservice static String getName(Id id){
  Account a = [select Name from Account WHERE id=:id];
  return a.Name;
 }
 ...
}

Any static method declared with the webservice keyword within a global class will be exposed as a web service operation.  To generate a WSDL for the class, navigate to the class page under the Setup screen in SFDC and click the Generate WSDL button.  The generated WSDL will contain the definition for all web service methods in your class.


Can I Use Complex Types in an Apex Web Service?

You can define complex data structures to pass to and return from your web service operations by defining a global class to contain the data and declaring the fields of your class with the webservice keyword.
global class MyReturnParams{
 webservice Id id;
 webservice String message;
}

Calling Apex Web Services from Cast Iron

Once you have created your Apex Web Service and downloaded the WSDL, you can import the WSDL into Cast Iron Studio and create a Web Service Endpoint.

You are now ready to add your invoke Webservice Activity to your orchestration.

What About Authentication?

Salesforce.com Web Services uses a proprietary authentication handshake to obtain a token which must be passed to your Apex Web Service.  These tokens can go stale and expire, so to avoid building logic to deal with this, we will let Cast Iron do it for us.  CIOS has built-in logic to maintain a pool of sessions with SFDC, so we will leverage the SFDC Connector to get a token that we can pass to our web service.  To do this, we will need a Salesforce.com endpoint and an activity that interacts with SFDC and returns a SessionHeader.  Any activity will do, so we will use a simple Get Server Timestamp call to Salesforce.  If you have used the SFDC connector before, you may have noticed that the connector returns the session header it used to make the call to SFDC.  You will need to copy this header and pass it into your Apex Web Service call.  You will also need to capture the server URL that is returned and use it to construct the appropriate URL for your WSDL.  Because server URLs can change between environments and may change over time, it is a good idea not to hard code the URL of your instance and instead use the one that SFDC returns.  To do this you will need to write a simple JavaScript function to parse the instance of your organization and use it to build the URL passed to the location optional parameter of your Invoke Web Service activity; a minimal sketch follows.
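Here is a minimal sketch of such a script (the function name is hypothetical; it assumes the serverUrl returned by the connector looks like https://<instance>.salesforce.com/services/Soap/u/<version>/... and that the Apex class endpoint follows the /services/Soap/class/<ClassName> path shown in your generated WSDL, so adjust it to match your WSDL):

// serverUrl comes from the Salesforce.com activity (e.g. the Get Server Timestamp call)
function buildApexEndpoint(serverUrl, className) {
    // Keep only the protocol and host, e.g. "https://na1.salesforce.com"
    var instance = serverUrl.replace(/^(https:\/\/[^\/]+)\/.*$/, "$1");
    // Append the Apex class web service path taken from the generated WSDL
    return instance + "/services/Soap/class/" + className;
}

// Example: buildApexEndpoint(serverUrl, "MyWebService")
//   -> "https://na1.salesforce.com/services/Soap/class/MyWebService"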

There are a few other required parameters that you will need to pass to various headers in the web service call.  To see the headers, right-click the target side of the Map Inputs step of the Invoke Web Service activity and choose Show Optional Parameters.  Here is a list of the required parameters and what they do:
  • headers/SessionHeader/SessionId: This is where you will map the session id from the get Server Timestamp activity.
  • headers/DebuggingHeader/DebugLevel:  This field determines how much debugging info SFDC will return to your call.  For production this value should be None, as there is a governor limit for returning debugging details.  You want to avoid exceeding the limit on calls that do not need to be debugged so you can actually get debugging info when necessary.
  • headers/AllowFieldTruncationHeader/allowFieldTruncation:  This field defines the default behavior for the Database.DMLOptions allowFieldTruncation property.  Prior to version 15.0, SFDC would truncate strings when their value exceeded the size of the field.  This behavior was changed in version 15.0 to throw an error by default when a string exceeds the field length.  This parameter allows you to specify the previous behavior as the default.  I believe this parameter will not override the property if it is specified in your Apex code.
Note: It appears that recent versions of SFDC now have a security header which allows you to specify a username and password for authentication.  This option could be used instead of the traditional method of obtaining a session header.  I have never used this method, but there seem to be some trade-offs to consider.  With the method above, you actually make two calls to SFDC: one to get a session ID and another to invoke the web service.  (However, any call to SFDC will return the header, and I often find that I can get the session header from another call within the orchestration, such as a query that must be done before the web service invocation.)  Specifying a username and password would let you do everything in one step; however, you would be authenticating on each call to SFDC, which can be an expensive operation in itself.  Also, the method above uses SFDC to retrieve the appropriate server URL from the login server; without that step you would have to hard code the server URL, which could change if SFDC moves your instance.  Salesforce.com provides plenty of warning before making such changes, so that may not be a major concern.  More investigation is needed to determine whether the security header provides a cleaner solution to this problem.

Thursday, May 23, 2013

AP001: High Availability for IBM Cast Iron

Architectural Patterns #001: High Availability for IBM Cast Iron

Some folks out there have been asking about High Availability (HA).  With CIOS there are two general purpose high availability options: for physical appliances you can use an HA Pair setup, and in a HyperVisor environment you have several levels of HA built into VMWare.  Every situation is different, but we typically recommend VMWare as an HA mechanism because it offers more flexibility and many of our customers already have VMWare infrastructure and expertise.

First a bit of Background on High Availability and Fault Tolerance 

When designing a system you inevitably spend a lot of time thinking about what happens when something goes wrong.  Error handling logic is often the most time consuming part of system design, and it inevitably extends beyond your orchestrations to the platform itself.  System availability, measured as percentage uptime, is a common metric used when defining a Service Level Agreement (SLA).  For example, a system with 99% uptime can be down for roughly 1.7 hours a week (1% of a 168-hour week), while a system with 99.999% uptime ("five nines" is a common idiom when it comes to availability) can be down for only about 5 minutes a year.  Typically, when we talk about system availability, we are concerned with maximizing the amount of time that the system is available to process transactions.  There are two main reasons a system can be unavailable: maintenance or a system failure.  Having no maintenance windows, in what is typically referred to as a "Zero Downtime" architecture, is extremely difficult.  We've never attempted that kind of scenario with Cast Iron because, when it comes down to it, not many users can justify the expense of that kind of SLA; most simply schedule downtime when the system is not heavily used, and fallbacks such as allowing transactions to queue can be used while system maintenance occurs.

Avoiding downtime due to system failures is referred to as "High Availability" or "Fault Tolerance."  How a system failure will affect system availability depends on how long it takes to recover from an outage.  Like anything, there is a spectrum of system availability options.  A typical set of options is as follows:
Zero Redundancy: You have no backup unit; you need to call and order a replacement part and wait for it to be delivered and installed before you can power your unit back up.
Cold Spare: You have a backup unit, but it's sitting in the box in your data center.  You need to unrack the old unit, rack the new unit, plug in the network and power cables, boot the unit, patch it, load your projects, configure them, and finally start your orchestrations.
Hot Spare: Your spare unit is already racked and patched, with orchestrations loaded; you just need to switch the IP addresses and start your orchestrations.
High Availability: With a high availability solution, the process is fully automated.  You have reserved capacity to accommodate failover and an automated process to recover in under 10 minutes.
Fault Tolerant: With a Fault Tolerant system, the process is fully automated and resources are not only reserved but already allocated.  Failover is seamless to external systems; recovery time is under 10 seconds, ideally instantaneous.

What is a Physical HA Pair and How does it work?

With a physical HA Pair you actually have two physical appliances tied together in a Master/Slave setup.  The appliances have special hardware and dedicated network connections between them so they can replicate state and detect failure scenarios.  One of the appliances is designated the "Active" appliance and the other runs in "Passive" mode.  The Active appliance carries out all the work that a standalone appliance would do; however, it replicates all changes to the Work In Progress (WIP) memory area to the Passive appliance.  The WIP is the persistent store that the appliance uses to record the state of all of your variables before any connector activity.  With the WIP replicated to the Passive appliance, should anything happen to the Active appliance, the Passive appliance is ready to take over as soon as it detects a failure.  When the Passive appliance takes over, it assumes the MAC addresses of the former Active appliance, so to external systems there is no change.  On the system availability spectrum this solution is somewhere between HA and FT: recovery is automatic and close to instantaneous; however, because the system recovers at the last state of the WIP, network connections to endpoints need to be reestablished and you need to understand the nature of the interactions with your endpoints.  Some endpoints support Exactly Once semantics, where the connector guarantees that an operation is only performed once.  For example, the database connector does this by using control tables to synchronize a key between the appliance and the database: it inserts a key into the control table, and the presence of that key is checked before repeating an operation; if the key is present, the operation has already been completed.  We generally recommend that you design all processes to be idempotent so that it won't matter if a single interaction with an endpoint is repeated.  This is the easiest way to recover from errors, but it often requires careful design to achieve.

What Options Do I have with VMWare?

VMWare actually gives you several levels of high availability to choose from depending on the resources that you want to allocate to HA.  The simplest option is VMWare High Availability, which requires a VMWare cluster with VMotion and VMWare HA configured.  In this mode VMWare will detect a server or appliance fault and automatically bring up the VM on a new server in the cluster.  There is a potential for some downtime while the appliance is started on the new server; however, the appliance will recover where it left off using the last state of the WIP before the crash.  The advantage of this setup is that resources do not have to be allocated to a redundant appliance until a failure occurs; essentially, the resources required to recover from a failure are reserved rather than allocated and can therefore be pooled.  VMWare also offers a higher level of availability called VMWare Fault Tolerance.  With VMWare Fault Tolerance, failover resources are preallocated and VMWare actively replicates the state of your virtual machine to another server.  This method provides near-instantaneous recovery in the event of a failure, and unlike a physical appliance the replication goes beyond the WIP; therefore, in Fault Tolerance mode the failover can occur transparently even in the middle of an interaction with an external resource.  The disadvantage of this approach is that you need additional dedicated network resources for FT and you need to preallocate the memory and CPU resources for it.  Therefore, FT effectively requires more than double the resources of HA due to the extra network requirements and the load required to replicate state.  See this post for more details on setting up CIOS HyperVisor Edition.

Active/Active and Load Balancing Scenarios

The active/passive model works well when you want High Availability and your load does not exceed the capacity of a single appliance.  It is a simple but elegant design that provides transparent recovery in the event of failure.  This ease of use is perfectly aligned with what a typical customer expects from Cast Iron.  Further, in our experience a single appliance, when orchestrations are designed properly, is more than adequate for most Cast Iron users.

That being said, there are other options out there for load balancing and high availability using multiple appliances; however, most are dependent on the use case and the endpoints involved.  If you are using CIOS to host web services over HTTP, you can use an HTTP load balancer to distribute load across multiple appliances; most HTTP load balancers have some means of detecting failed nodes and redirecting traffic.  For database sources you can use multiple buffer tables and write triggering logic to balance the load.  Other source systems such as SAP and JMS are also easily set up for load balancing across multiple appliances.

In the past we have also used a dispatcher model to distribute load; this is particularly effective when the load is generated by use cases with complex logic that leads to longer running jobs.  With a dispatcher model, however, eliminating the dispatcher as a single point of failure can prove difficult and is use case dependent.

What About Disaster Recovery?

Disaster Recovery (DR) is a question of how to deal with catastrophic failure, such as when a hurricane destroys your datacenter.  Again, how quickly you can recover and what level of service you can provide in such an event will depend on your architecture and your budget.  The lowest cost DR solutions are usually manual workarounds that allow business to continue when a catastrophic failure occurs.  True seamless DR requires a remote datacenter with hardware replicating the main data center and automated recovery.  In most DR plans the recovery requires some manual processes, and in most there is ongoing maintenance that needs to occur to keep project versions and patch levels in sync.  Most DR plans call for DR appliances to be racked, mounted, and powered up at all times, but that too is a consideration and a cost.  Most customers who opt for a hardware solution will have an HA pair for their main appliance and a single node in a remote data center for DR.  Typically, the DR node is racked, mounted, and powered on with all the orchestrations loaded and configured but undeployed; when it comes time to activate the DR appliance, it is theoretically just a matter of starting up the projects on it.  Virtual appliance users typically have a DR plan for their virtual infrastructure, and Cast Iron falls in line with that plan.  However, planning for DR is typically application specific and requires thinking about the problem from end to end.  You need to understand the DR plan for any endpoints that you are integrating with and also understand where the state of your integrations is stored.  In the end there is a serious cost-benefit analysis that must be considered when planning for HA / FT and DR.  The business must decide where the proper balance is between SLA and budget.