Yesterday I was helping another developer with an external API that had a weird limitation. The API only “processed” 75 strings per call but allowed users to have tens of thousands of strings. So the question became how to chunk a list of, say, 20,000 strings into batches of 75.
There are a lot of ways to do this. I decided to use the C# yield keyword to solve this problem. The basic idea of yield is to return a value from the middle of an iterator and resume from that same point on the next call. A function that uses yield return must have a return type of IEnumerable, IEnumerable<T>, IEnumerator, or IEnumerator<T>.
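To make the "return and resume" behaviour concrete, here is a minimal sketch. The class and method names (YieldDemo, CountToThree) are made up for illustration and are not part of the chunking code in this post.

```csharp
using System;
using System.Collections.Generic;

public static class YieldDemo
{
    // Hypothetical iterator: yields 1, 2, 3 and logs each time it resumes.
    public static IEnumerable<int> CountToThree()
    {
        Console.WriteLine("  iterator: before first yield");
        yield return 1;
        Console.WriteLine("  iterator: resumed after 1");
        yield return 2;
        Console.WriteLine("  iterator: resumed after 2");
        yield return 3;
    }

    public static void Main()
    {
        // Nothing inside CountToThree runs until foreach asks for a value.
        foreach (int n in CountToThree())
        {
            Console.WriteLine("caller: got " + n);
        }
    }
}
```

The iterator and caller messages interleave: each "caller: got" line appears before the iterator's next "resumed" line, because the method body only advances when the loop requests the next value.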
There are a few things to note when defining a function that uses yield. The yield statement can only appear inside an iterator block, which can be implemented as the body of a method, operator, or accessor. The body of such a method, operator, or accessor is subject to the following restrictions:
- Unsafe blocks are not allowed.
- Parameters to the method, operator, or accessor cannot be ref or out.
- A yield return statement cannot be located anywhere inside a try-catch block. It can be located in a try block if the try block is followed by a finally block.
- A yield break statement may be located in a try block or a catch block but not a finally block.
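The try-catch restriction is the one most likely to get in the way. One common workaround, sketched below, is to do the risky work inside the try-catch and place the yield return after it. SafeParsing and ParseAll are hypothetical names chosen for this example only.

```csharp
using System;
using System.Collections.Generic;

public static class SafeParsing
{
    // Hypothetical example: parse each input and skip entries that fail.
    public static IEnumerable<int> ParseAll(IEnumerable<string> inputs)
    {
        foreach (string text in inputs)
        {
            int value = 0;
            bool ok = false;
            try
            {
                // A "yield return" here would be a compile error,
                // because this try block has a catch clause.
                value = int.Parse(text);
                ok = true;
            }
            catch (FormatException)
            {
                Console.WriteLine("skipping '" + text + "'");
            }

            // The yield sits outside the try-catch, so the compiler accepts it.
            if (ok)
            {
                yield return value;
            }
        }
    }

    public static void Main()
    {
        var inputs = new List<string> { "1", "two", "3" };
        foreach (int n in ParseAll(inputs))
        {
            Console.WriteLine(n);
        }
    }
}
```

With the sample input, the loop prints 1 and 3 and reports "two" as skipped.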
Enough theory. Now let’s see an example in action:
using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // load from some data source
            List<string> mydataset = new List<string>()
            {
                "aaa", "bbb", "ccc",
                "ddd", "eee", "fff",
                "ggg", "hhh", "iii"
            };

            // process the data in chunks of 2
            foreach (List<string> chunks in ChunkMe(mydataset, 2))
            {
                // process the data
                Console.WriteLine("Processing batch...size of " + chunks.Count);
                chunks.ForEach(s => Console.WriteLine(s));
            }
        }

        /// <summary>
        /// Chunks a long list into smaller pieces.
        /// </summary>
        /// <param name="data">long list of strings</param>
        /// <param name="chunkSize">the maximum number of entries per chunk</param>
        /// <returns>successive chunks of the list</returns>
        public static IEnumerable<List<string>> ChunkMe(List<string> data, int chunkSize)
        {
            // error checks on input params; because this is an iterator,
            // they run when enumeration starts, not when ChunkMe is called
            if (data == null) throw new ArgumentNullException("data");
            // a chunk size of 0 would otherwise loop forever
            if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

            // start at the beginning of the list
            int currentChunkStart = 0;

            // the chunk of data to return
            List<string> currentChunk = null;

            // while the list has more data
            while (currentChunkStart < data.Count)
            {
                // get data to return
                currentChunk = data
                    // skip already processed entries
                    .Skip(currentChunkStart)
                    // take the next batch
                    .Take(chunkSize)
                    // materialize it as a list
                    .ToList();

                // set the next return point
                currentChunkStart += chunkSize;

                // return in the middle
                yield return currentChunk;
            }
        }
    }
}
As you can see from the above code, the ChunkMe function returns an IEnumerable, and the calling code just iterates over the values it returns to get chunks of data to process. Below is an example output:
Processing batch...size of 2
aaa
bbb
Processing batch...size of 2
ccc
ddd
Processing batch...size of 2
eee
fff
Processing batch...size of 2
ggg
hhh
Processing batch...size of 1
iii
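ChunkMe is tied to List<string> and calls Skip for every batch, which on some runtimes may walk the list from the start each time. A possible generalization, sketched here under my own naming (Chunker, InChunksOf, and Iterate are not from the code above), buffers items in a single pass and works on any IEnumerable<T>. Argument checks live in a non-iterator wrapper so they fire at the call rather than on first enumeration.

```csharp
using System;
using System.Collections.Generic;

public static class Chunker
{
    // Validates eagerly, then hands off to the lazy iterator below.
    public static IEnumerable<List<T>> InChunksOf<T>(IEnumerable<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        return Iterate(source, chunkSize);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int chunkSize)
    {
        var current = new List<T>(chunkSize);
        foreach (T item in source)
        {
            current.Add(item);
            if (current.Count == chunkSize)
            {
                // Hand the full chunk to the caller, then start a fresh one.
                yield return current;
                current = new List<T>(chunkSize);
            }
        }

        // Emit the final, possibly shorter, chunk.
        if (current.Count > 0)
        {
            yield return current;
        }
    }

    public static void Main()
    {
        var data = new List<string> { "aaa", "bbb", "ccc", "ddd", "eee" };
        foreach (List<string> chunk in InChunksOf(data, 2))
        {
            Console.WriteLine(string.Join(",", chunk));
        }
    }
}
```

On .NET 6 and later, the built-in Enumerable.Chunk extension method does the same job and returns each batch as an array, so a hand-written helper like this is mainly useful on older frameworks.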
looks like a cool technique to have handy
Very good. I needed a solution that can be used within a try-catch, and I ended up with generics without yield...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // load from some data source
            List<string> mydataset = new List<string>()
            {
                "aaa", "bbb", "ccc",
                "ddd", "eee", "fff",
                "ggg", "hhh", "iii", "jjj", "kkk"
            };

            // process the data in chunks of 4
            foreach (var chunks in ChunkMe(mydataset, 4))
            {
                // process the data
                Console.WriteLine("Processing batch...size of " + chunks.Count);
                chunks.ForEach(s => Console.WriteLine(s));
            }
        }

        public static IEnumerable<List<string>> ChunkMe(
            List<string> mydataset, int chunkSize)
        {
            // number of full chunks, plus the size of the trailing partial chunk
            int chunkCount = mydataset.Count / chunkSize;
            int lastChunkSize = mydataset.Count % chunkSize;
            var retVal = new List<List<string>>(chunkCount + (lastChunkSize == 0 ? 0 : 1));
            for (int index = 0; index < chunkCount; index++)
            {
                retVal.Add(ChunkOne(mydataset, chunkSize, index * chunkSize));
            }
            if (lastChunkSize > 0)
            {
                retVal.Add(ChunkOne(mydataset, lastChunkSize, mydataset.Count - lastChunkSize));
            }
            return retVal;
        }

        // copies chunkSize entries starting at position into a new list
        public static List<string> ChunkOne(List<string> mydataset, int chunkSize, int position)
        {
            var retVal = new List<string>(chunkSize);
            for (int cIndex = 0; cIndex < chunkSize; cIndex++)
            {
                retVal.Add(mydataset[position + cIndex]);
            }
            return retVal;
        }
    }
}
In C#, the yield keyword is used to return each element of an enumerable collection one at a time, without creating an entire collection in memory. It simplifies the implementation of iterators and makes it possible to handle large data sets with minimal resource usage.