SharePoint is a great tool, and adding workflows to it makes it a marvellous one. SharePoint workflow solutions, task automation and process streamlining mean less work and a much more efficient information flow. Unfortunately, as with any powerful tool, deploying workflows the wrong way can produce the opposite of the expected results. There are many ways to achieve a given functionality, but only one of them can be optimal. With so many switches and levers available to workflow designers working on top of a highly customizable platform like SharePoint, process implementation can become a disaster if it is done without basic knowledge of the workflow architecture and the environment it runs in. And it usually does not matter whether you define the workflow in SharePoint Designer, Visual Studio or a third-party tool.
If you are responsible for designing and implementing workflows, read on to learn about the five most common things you can do to turn your workflow solution into a failure and, fortunately, advice on how to avoid these traps.
1 Avoid Planning and Testing
With SharePoint’s powerful customization options and easy-to-define workflows (especially when using third-party workflow tools), it is easy to build and deploy solutions quickly. Because SharePoint Designer (SPD) and third-party tools allow you to create and implement workflows without installing anything on the servers (after the initial product setup, of course), there is a great temptation to deploy solutions to production without the necessary testing and stabilization phase, or even to build solutions directly in the production environment. The planning phase is also easy to skip: we can just gather the requirements and rapidly create an application that meets them. When we pass it to the business for feedback, we sometimes find that they liked it so much that they started using it, even though it was just a prototype.
Skipping or shortening the planning phase because workflow tools make prototyping so easy can have serious consequences. You might become a victim of your own success. A workflow solution that is successful for the users of one department will not necessarily be successful for the entire enterprise. Workflows that run on hundreds of items for twenty users will most probably not work efficiently for thousands of items and hundreds of users unless they were planned to do so. Supporting very large lists and many users requires careful planning and a careful choice of tools.
The problem with this approach is that, because the tools give us so much power, we feel that we can cut corners. And there are two big workflow-specific traps that can ruin our entire solution.
The first is the complexity of the workflow solution. Almost all workflows involve two or more people and an exchange of information between them. Building a working solution is one thing; testing all possible scenarios is something completely different. Because many people are involved in a single workflow, we need to make sure that all users, with their specific permissions and needs, get the information they need, in the form they need it, to make the right decisions. This requires developers and testers to identify and support all possible combinations of users, permissions and workflow stages.
Failing to identify all these combinations has much more serious consequences in a workflow solution than in, for example, web parts, because of the second trap: workflows cannot be changed once started. When you start a workflow in SharePoint, it runs according to the definition that was valid at the moment it started, and you cannot modify it. You can only terminate the workflow instance and start a new one based on the new workflow definition. If you start a hundred long-running workflows and users begin to report errors after a month, all the workflows running that definition will have the same error, and you will not be able to fix it without terminating all of them.
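To make the second trap concrete, here is a minimal Python sketch of the behaviour described above. It is only a model: the class and function names are hypothetical and have nothing to do with the SharePoint object model. It shows that every running instance stays bound to the definition it started with, so a fix reaches it only through termination and restart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowDefinition:
    """One published version of a workflow (hypothetical model)."""
    version: int
    steps: tuple


@dataclass
class WorkflowInstance:
    """A running workflow; it keeps the definition it was started with."""
    item_id: int
    definition: WorkflowDefinition
    terminated: bool = False


def publish_fix(instances, fixed_definition):
    """Running instances cannot be patched: terminate each and start anew."""
    restarted = []
    for instance in instances:
        if instance.terminated:
            continue
        instance.terminated = True  # the only way out of the old definition
        restarted.append(WorkflowInstance(instance.item_id, fixed_definition))
    return restarted


v1 = WorkflowDefinition(1, ("submit", "aprove"))   # definition with a bug
v2 = WorkflowDefinition(2, ("submit", "approve"))  # corrected definition

running = [WorkflowInstance(item_id, v1) for item_id in range(100)]
restarted = publish_fix(running, v2)
print(len(restarted), "instances had to be terminated and restarted")
```

In this model all one hundred instances lose their progress, which is why the definition has to be right before the first workflow starts.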
The solution to this problem is quite simple, but implementing it requires hard work and the involvement of the entire team: follow best practices for software development. Even if you are not a developer and you do not use developer tools, you are still creating a software solution. Make sure you plan your workflows: know what data you need, what exactly your workflow should do, what the limitations of your platform are, who is going to use your tool and what load to expect. Plan for testing and perform the tests, make sure you include all the potential actors in the scenarios, and remember to make a test deployment first. If you expect to have hundreds of workflows running every day, be sure they run according to a flawless workflow definition that models the process correctly.
Resources:
– Plan workflows – http://technet.microsoft.com/en-us/library/cc263134.aspx
– Content type and workflow planning – http://technet.microsoft.com/en-us/library/cc262735.aspx
– Creating and Managing Workflows – Planning for Workflow Deployment http://allcomputers.us/windows_server/sharepoint-2010—creating-and-managing-workflows—planning-for-workflow-deployment.aspx
– Estimate performance and capacity planning for workflow in SharePoint Server 2010 – http://technet.microsoft.com/en-us/library/gg508755.aspx
2 Mess Up Your Farm
If you have a professional SharePoint administrator taking care of your farm, consider yourself lucky and skip this point. If not, continue reading to see what can happen to your workflows when your farm is not in good condition.
SharePoint workflows run on your SharePoint farm. If your farm is not healthy, your workflows will not be healthy either. SharePoint farms come in different sizes and flavors. Each can have a different purpose, load pattern and set of requirements, which lead to a different configuration of both hardware and software. The number of platform configuration options is mind-boggling: it starts with OS and database versions and editions, continues through the number of front-end, application and database servers and the combination of services running on each of them, and ends with specific settings for each farm, web application, site collection, site and list, down to the definition of a single item. With such a variety of options it’s quite easy to pick the wrong ones. Unfortunately, workflows have a strange ability to quickly find all the configuration problems and crash on them.
If you want to run into workflow problems, simply don’t care about your environment. Do a plain setup and start building your solution. Don’t worry about the health analyzer errors or some strange entries in the event log – SharePoint is working, so no worries, right? Probably as soon as you put some more load on the farm, add more users and a couple of thousand items, you will see workflows not starting correctly, crashing, hanging and failing to perform whatever task you want them to do.
Unfortunately, there is no silver bullet for this problem. Because the number of possible configurations is almost infinite, each issue needs to be diagnosed and solved individually. Here are some areas that deserve your special attention:
> SharePoint installation – do the “next, next, next, finish” setup only on simple development or demonstration environments. For anything else (yes, staging and testing as well), plan and prepare before starting the SharePoint installer.
> Ensure appropriate resources – remember to plan for performance, as workflows can be heavy resource consumers. Make sure you have enough web front-ends (WFEs), as workflows run on them, and enough database resources, as workflows are stored in the database and are constantly moved between the WFE and database servers. Monitor your resources: processor, RAM, network and disks, the last especially on the database machines. Look for peaks and try to identify usage trends so you can optimize utilization.
> Accounts and permissions – learn and implement Microsoft’s recommendations regarding the farm’s accounts and their permissions at the local server, domain and database levels. Make sure you have configured the accounts for all application pools, OWSTimer, search and profiles, because all the services are connected in some way and a feature that seems irrelevant to workflows can sometimes cause trouble. Inappropriate permission settings can lead to incomprehensible “Thread is being aborted” or “Unexpected error occurred” errors.
> IIS and SharePoint timer – SharePoint workflows are run by the w3wp and OWSTimer processes. Make sure these processes are recycled regularly, have sufficient resources assigned and run under an account with appropriate permissions. Clogged IIS and timer processes will cause SharePoint to respond more slowly, lose queries and threads, and ultimately become unresponsive. More importantly, monitoring can show normal resource utilization even while IIS and the timer are starving for RAM.
> SharePoint monitoring – regularly check and fix the problems reported by the health analyzer, event logs and diagnostic logs. They might be scary at first, but there are great tools that help you tame them. And remember: the search engine of your choice is your best friend when resolving the “Exception from HRESULT: 0x80004004” error.
> Workflow thresholds – SharePoint allows you to configure how workflows are processed by the w3wp and OWSTimer duet. Use the “workitem-eventdelivery” and “workflow-eventdelivery” properties to specify the workflow event paging size, the workflow job timeout and other workflow-related parameters.
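As an illustration of the last point, the sketch below only composes stsadm command lines so they can be reviewed before an administrator runs them on a farm server; it does not execute anything. The `stsadm -o setproperty -pn … -pv …` form and the throttle property come from Microsoft's Stsadm reference; the batch size property name is given as recalled from that reference and should be confirmed there, and both values are arbitrary examples, not recommendations.

```python
def stsadm_setproperty(name, value):
    """Build the argument list for `stsadm -o setproperty` (nothing is run)."""
    return ["stsadm", "-o", "setproperty", "-pn", name, "-pv", str(value)]


# Example values only: measure your own farm before changing anything.
planned_changes = {
    "workflow-eventdelivery-throttle": 25,
    # Property name as recalled from the Stsadm reference; verify before use.
    "workflow-eventdelivery-batchsize": 150,
}

commands = [
    " ".join(stsadm_setproperty(name, value))
    for name, value in planned_changes.items()
]

for command in commands:
    print(command)
```

Keeping the planned values in one reviewed place makes it easier to apply the same settings to a test farm first and to production later.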
All of the above tasks are quite well documented on MSDN. Take some time to read about them so you won’t waste time fixing errors. See the “Resources” paragraph below for recommended reading and tools.
Resources:
– Planning and architecture for SharePoint Server 2010 – http://technet.microsoft.com/en-us/library/cc261834.aspx
– Monitoring and maintaining SharePoint Server 2010 – http://technet.microsoft.com/en-us/library/ff758658.aspx
– Why all these SharePoint service accounts? – http://www.ericharlan.com/Moss_SharePoint_2007_Blog/why-all-these-sharepoint-service-accounts-a181.html
– SharePoint Administration Toolkit – http://www.microsoft.com/downloads/en/details.aspx?FamilyID=718447d8-0814-427a-81c3-c9c3d84c456e&displaylang=en
– SharePoint LogViewer – http://sharepointlogviewer.codeplex.com/ – tool for viewing ULS logs with live monitoring, search, filtering and sorting
3 Ignore Fine-Grained Permission Restrictions
If you are asking “what fine-grained permission restrictions?”, then this road to failure may soon be yours. Sooner or later you will come upon a scenario where you need to set unique permissions at the item or document level. These scenarios usually involve a requirement to hide an item from all users except the author, person responsible, approver or reviewer, or to restrict viewing, editing or managing each item to a group of people somehow related to it, usually through the metadata values.
The easiest way to handle this scenario is to use a workflow to change the permissions of the current item or document. At some point in the workflow, simply check who the author, person responsible, approver or reviewer is and grant this person additional permissions.
If you do this on a list with over a thousand items (better yet, with all the items in one root folder), you append permissions to the existing ones, and on top of that you realize how large the list is only after the business has started dropping in hundreds of items every day, then get ready for the ride of your lifetime.
What will happen is that your solution strangely begins to slow down: users wait longer than expected for the list to load or for a workflow to move forward. The entire farm, not only your solution, gets very, very slow, yet your WFEs are fine, your application servers are fine, and only SQL Server seems a bit overloaded. When the waiting time reaches a few minutes, your users will walk away, leaving you stunned.
The reason for all these problems is fine-grained permissions: having separate permissions for many items in large lists or document libraries. Because of the way SharePoint handles permission management, having many security scopes, especially on one hierarchical level, puts a very heavy load on SQL Server. Each query related to such a list, including search, forces SQL Server to fetch the list of all the security scopes and their membership. For a list with five thousand items and 300 users, one (just one!) query can take ten seconds. If just 5% of your users try to open the list simultaneously, your entire SQL Server is busy serving these queries and has no time for anything else for over 2 minutes.
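The "over 2 minutes" figure follows directly from the numbers in the paragraph above. The short calculation below reproduces it, under the simplifying assumption (mine, not a measured fact) that SQL Server works through the heavy queries one after another.

```python
users = 300                # users of the list
concurrent_share = 0.05    # 5% of them open the list at the same time
seconds_per_query = 10     # one query against ~5,000 uniquely secured items

concurrent_queries = round(users * concurrent_share)
busy_seconds = concurrent_queries * seconds_per_query

print(f"{concurrent_queries} queries keep SQL Server busy for "
      f"{busy_seconds} s ({busy_seconds / 60:.1f} min)")
```

Fifteen simultaneous visitors are enough to occupy the database server for two and a half minutes in this model, and every other site on the farm waits with them.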
The fine-grained permissions (FGP) restrictions have been known for some time, but they are still not widely known. If you have a solution that requires FGP, start by reading Microsoft’s white paper “Best practices for using fine-grained permissions”. It explains where the limitations come from and describes possible solutions to the problem. In general, Microsoft advises avoiding FGP.
Workarounds include:
> Put users into groups and assign permissions to these groups. Try to avoid SharePoint groups, as assigning permissions to a SharePoint group will start a full search crawl.
> Try to group items with the same permissions and put them into containers (folders, lists, sites), then assign permissions on the containers. This way permission inheritance is broken only at the container level.
> Use publishing levels to control access.
> Use the list’s ReadSecurity and WriteSecurity settings to control which items authors can read and edit.
> Remove user permissions to the list and create your own solution that displays lists and items using elevated privileges and a custom permission infrastructure.
> If you are using custom code, use the new AddToCurrentScopeOnly method to assign permissions; this way you avoid updating the scopes of all parent objects.
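The container workaround in the list above can be planned before any permissions are touched. The following sketch uses made-up item names and reader sets and a hypothetical helper, `plan_containers`; it groups items whose access lists are identical, so that each group can live in one folder with a single set of unique permissions.

```python
from collections import defaultdict


def plan_containers(item_readers):
    """Group items whose reader sets are identical (hypothetical helper)."""
    containers = defaultdict(list)
    for item, readers in item_readers.items():
        containers[frozenset(readers)].append(item)
    return dict(containers)


# Made-up data: which users or groups must be able to read each item.
item_readers = {
    "invoice-001": {"Finance", "Alice"},
    "invoice-002": {"Finance", "Alice"},
    "invoice-003": {"Finance", "Bob"},
    "invoice-004": {"Finance", "Bob"},
    "invoice-005": {"Finance", "Alice"},
}

containers = plan_containers(item_readers)
print(f"{len(item_readers)} items need {len(containers)} uniquely secured folders")
```

Five item-level security scopes collapse into two folder-level scopes here; on a real list the same grouping shows quickly whether containers are a realistic option.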
Also keep these best practices in mind:
> Limit the number of items on one hierarchical level to 2000.
> Reduce the number of uniquely permissioned parent objects.
> Try to manage permissions at the web site level, or at least at the highest possible level.
> Avoid copying permissions from the parent at all costs. When you break inheritance and add permissions for a chosen user, all parent objects up to the first uniquely permissioned web receive “Limited access” permissions for this user to ensure the user’s access. If you assign permissions on 500 different items to 500 different users on the same hierarchical level, all these parent objects will have 500 “Limited access” entries. If you do not pay attention, you will end up with all your items having 500 “Limited access” permissions, which will cause the SQL query to grow very, very large.
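The "Limited access" effect described in the last point can be counted with a small model. This is a sketch under simplified assumptions: the hierarchy is a plain list of hypothetical parent names, and every grant on an item adds the user to each parent up to the first uniquely permissioned web, as the text describes.

```python
def limited_access_entries(parents, assignments):
    """Collect the users each parent must list with 'Limited access'."""
    limited = {parent: set() for parent in parents}
    for _item, user in assignments:
        for parent in parents:
            # Every parent up to the uniquely permissioned web gets an entry.
            limited[parent].add(user)
    return limited


parents = ["folder", "list", "web"]  # hypothetical chain above the items
assignments = [(f"item-{n}", f"user-{n}") for n in range(500)]

limited = limited_access_entries(parents, assignments)
for parent, users in limited.items():
    print(parent, "carries", len(users), "Limited access entries")
```

Each parent ends up with 500 entries, and any item that later copies its permissions from one of those parents inherits all of them.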
Resources:
– Best practices for using fine-grained permissions – http://www.microsoft.com/download/en/details.aspx?id=9030
– Security planning for sites and content – http://technet.microsoft.com/en-us/library/cc262939.aspx
– SharePoint 2010 Performance with Item Level Permissions http://e-junkie-chronicles.blogspot.com/2011/03/sharepoint-2010-performance-with-item.html – part 1 and 2
4 Forget about Workflow Running Context
This scenario applies to workflows that do not run in elevated privileges mode, that is, as the application pool account. Unfortunately most workflows don’t: built-in workflows, SharePoint Designer workflows and most third-party workflows run as the workflow initiator, with an option to run as the workflow author (designer).
You can run into problems quite easily. It’s enough to have just one activity in your workflow (add item, update item) that requires permissions the workflow initiator does not have, and your workflow will stop with the “Error occurred” status. Also, if you are using a lookup and you do not ensure that the user who started the workflow has permissions to the object the lookup points to, you will receive a blank string instead of the value you expected, which can cause your workflow to malfunction. If you try to work around this problem using the new “impersonation step”, you might run into bigger problems. On the one hand, as the workflow designer you can be sure that if the workflow runs properly for you, it will run properly for everyone else, regardless of their permissions. On the other hand, if the workflow designer loses his or her permissions (for example, changes position or is dismissed), all workflows created by this person will fail on the first operation that requires permissions.
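The failure modes above can be reproduced in a few lines. The model below is purely illustrative: the permission table, list name, account names and functions are hypothetical and are not SharePoint APIs. It shows a blank lookup and a failed update when the workflow runs as the initiator, and a different failure once the impersonated author loses permissions.

```python
class WorkflowError(Exception):
    """Stands in for the workflow's "Error occurred" status."""


# Hypothetical permissions on one list: who may read and who may edit.
PERMISSIONS = {"Contracts": {"read": {"designer"}, "edit": {"designer"}}}


def lookup(user, list_name, value):
    """A lookup the acting user cannot read yields a blank string, not an error."""
    return value if user in PERMISSIONS[list_name]["read"] else ""


def update_item(user, list_name):
    """An update the acting user is not allowed to make stops the workflow."""
    if user not in PERMISSIONS[list_name]["edit"]:
        raise WorkflowError("Error occurred")
    return "updated"


def run_workflow(initiator, author, impersonate):
    """Run as the author inside an impersonation step, else as the initiator."""
    acting_user = author if impersonate else initiator
    approver = lookup(acting_user, "Contracts", "manager@example.com")
    return approver, update_item(acting_user, "Contracts")


print(run_workflow("clerk", "designer", impersonate=True))
```

Calling `run_workflow("clerk", "designer", impersonate=False)` raises `WorkflowError` in this model, and so does the impersonated run once "designer" is removed from the permission table.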
To avoid the issues mentioned above you have the following options:
> Code your workflows to run as the system account, or use a workflow solution that runs as the system account (like Datapolis Workbox).
> Use one dedicated account to deploy workflows and perform entire workflows in the impersonation step. This account should have the permissions the workflows need to run properly. Before deploying a workflow, make sure it does not contain any malicious code and does not give access to sensitive data.
> Plan the permissions in your workflow very carefully. Make sure you know exactly where you are getting data from and what data you are going to modify, and ensure that every potential workflow user has permissions to all these objects.
Resources:
– Plan for workflow security and user management – http://technet.microsoft.com/en-us/library/ee428324.aspx
– Impersonating user in Workbox vs impersonating user in SharePoint Designer – http://wbblog.datapolis.com/2011/04/impersonating-user-in-workbox-vs.html
– Declarative Workflows and User Context – http://blogs.msdn.com/b/sharepointdesigner/archive/2008/09/28/declarative-workflows-and-user-context.aspx
5 Start Large Numbers of Workflows Simultaneously
SharePoint is able to run tens of thousands of workflows without problems. If you provide enough front-ends and database resources, even large solutions can run efficiently. There is, however, a limit on how many workflows can be processed simultaneously. By default, no more than 15 workflows can be starting or activating simultaneously on your farm. This limit can be raised with the “workflow-eventdelivery-throttle” property, set through PowerShell or stsadm. If you try to start more than 15 workflows at the same time, the workflows exceeding the limit are queued and wait for OWSTimer to start them when more resources are available. Unfortunately, under large loads this mechanism does not work as it should.
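A rough model of the throttle may help to picture the numbers. The sketch below is not how SharePoint is implemented; it only splits a list of start requests into those admitted immediately and those left in a queue for the timer service, using the default limit of 15 quoted above. The function and variable names are hypothetical.

```python
from collections import deque

DEFAULT_THROTTLE = 15  # default limit quoted in the text


def dispatch(start_requests, throttle=DEFAULT_THROTTLE):
    """Split start requests into those started at once and those queued."""
    started = list(start_requests[:throttle])
    queued = deque(start_requests[throttle:])  # left for the timer service
    return started, queued


batch = [f"invoice-{n:04d}" for n in range(1000)]

started, queued = dispatch(batch)
raised_started, raised_queued = dispatch(batch, throttle=50)

print(len(started), "started immediately,", len(queued), "queued")
```

With a batch of a thousand documents, 985 workflow starts depend on the queue, which is exactly the part of the mechanism that misbehaves under load.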
You can easily run into problems if you implement the following scenario: your business requires an approval workflow on invoices. Invoices are scanned with a scanner that cannot save directly to a SharePoint library, so the scans are sent by e-mail to an e-mail enabled SharePoint library. Scanning is done in batches once a day, and about a thousand documents are scanned in one batch. When the e-mails with the documents start to arrive at the library, SharePoint takes the attachments, adds them as new documents to the library and then automatically starts the workflow on document creation. The first 15 workflows are started; the rest are queued and picked up by OWSTimer. At some point you start to notice that some workflows (2% to 10% of the total number of new workflows, depending on the environment load) are acting strangely. Some have the status “Failed on start (restarting)”, some have an empty status, and some show “Starting” for quite a long time. Since starting a thousand workflows can be a resource-hungry task, you assume that SharePoint will restart the failed workflows, that the items with an empty workflow status will have their workflows started once resources are freed, and that the “Starting” workflows will start. But after a few hours of the system idling you notice that the workflows are not fixing themselves. You also notice that while the batch was being processed, other workflows were trying to start in other sites and applications, and they too are hanging in these strange states. This means that some important workflows are not starting correctly and no one knows about it.
Avoiding this problem can be tricky, but here are some tips:
> The first thing you can do is to monitor whether your resources are fully used during the batch scanning. If they are not, raise the workflow throttle and the other “workflow-eventdelivery” and “workitem-eventdelivery” parameters to make use of your system’s resources. This way more workflows will be able to start simultaneously and will not be handed over to OWSTimer.
> Add more front-ends to your farm. Microsoft’s tests show that adding up to 3 WFE servers has a significant effect on workflow processing. Of course you can add more WFEs; this will not do much for workflow starts, but it can take the request processing load off the servers that start the workflows. Such a move will allow you to raise the workflow threshold and use your resources.
> Instruct your users to avoid flooding SharePoint with items on lists with auto-starting workflows. Instead of one batch of a thousand documents, have them do 4 batches of 250 documents.
> Clear the SharePoint configuration cache and remember to recycle IIS and the timer service.
> Implement your own workflow starting mechanism. Turn off automatic workflow start on item creation and set up a timer-based solution that starts workflows sequentially on items that do not have one. In other words, check every 5 minutes whether the selected list contains any items without a running workflow and, if so, start a workflow on the first item found, then the second, then the third. This way you do not force SharePoint to handle hundreds of starting workflows at once, but wait until the previous workflow has started before starting a new one. This solution is a bit slower than auto-starting workflows, but it gives you much more control over the workflow starting process and lets you be confident that all the processes that need starting are starting correctly.
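Two of the tips above, smaller batches and a sequential starter, can be sketched as follows. This is only an outline under stated assumptions: `has_workflow` and `start_workflow` are hypothetical callables that a real solution would implement against SharePoint, `start_workflow` is assumed to return only after the workflow has started, and the five-minute scheduling is left to whatever timer mechanism you use.

```python
def batches(items, size):
    """Split a large drop of documents into smaller batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def start_pending(items, has_workflow, start_workflow):
    """Start workflows one at a time on items that do not have one yet."""
    started = []
    for item in items:
        if has_workflow(item):
            continue
        start_workflow(item)  # assumed to return only once the start succeeded
        started.append(item)
    return started


# In-memory stand-ins for the list and the workflow engine.
running = {"doc-2"}
documents = [f"doc-{n}" for n in range(1, 6)]

started = start_pending(documents, running.__contains__, running.add)
print("started:", started)
print("batches of a 1000-document drop:", len(batches(list(range(1000)), 250)))
```

Because the starter skips items that already have a workflow, running it again on every timer tick is harmless, and an item that failed to start is simply picked up on the next pass.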
Resources:
– Workflow management: Stsadm properties – http://technet.microsoft.com/en-us/library/cc262633(office.12).aspx
– Clear SharePoint cache – http://blogs.msdn.com/b/josrod/archive/2007/12/12/clear-the-sharepoint-configuration-cache-for-timer-job-and-psconfig-errors.aspx